

m: number of training examples
\(n = n_x\): dimension of the input feature vector \(x\), so \(x \in R^{n_x}\)
y \(\in\) {0, 1}: the label (output) for binary classification
Training set: {\((x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), ..., (x^{(m)}, y^{(m)})\)}, where \((x^{(i)}, y^{(i)})\) is the \(i^{th}\) training example

Stacking data by columns:
  • X = [\(x^{(1)}, x^{(2)}, ..., x^{(m)}\)] \(\in R^{n_x \times m} \) is a matrix whose columns are the feature vectors
  • Y = [\(y^{(1)}, y^{(2)}, ..., y^{(m)}\)] \(\in R^{1 \times m} \) is a row vector (a \(1 \times m\) matrix) whose columns are the labels
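A minimal sketch of the column stacking above, in plain Python with no libraries. Matrices are stored as lists of rows; the function name `stack_examples` and the toy data are illustrative assumptions, not part of these notes.

```python
def stack_examples(examples):
    """Stack (x, y) pairs column-wise.

    examples: list of m pairs, x a list of n_x floats, y in {0, 1}.
    Returns X (n_x x m) and Y (1 x m), each a list of rows.
    """
    m = len(examples)
    n_x = len(examples[0][0])
    # X[r][c] is feature r of example c, so column c holds x^(c+1)
    X = [[examples[c][0][r] for c in range(m)] for r in range(n_x)]
    # Y has a single row; entry c is the label y^(c+1)
    Y = [[examples[c][1] for c in range(m)]]
    return X, Y


training_set = [([1.0, 2.0, 3.0], 1), ([4.0, 5.0, 6.0], 0)]  # m = 2, n_x = 3
X, Y = stack_examples(training_set)
print(X)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]], i.e. 3 x 2
print(Y)  # [[1, 0]], i.e. 1 x 2
```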
Sigmoid function: \(\sigma(z) = \frac{1}{(1 + e^{-z})} \) where \(z = w^T x + b\) with \(w \in R^{n_x}\), \(b \in R\)
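A small sketch of the sigmoid and of the single-example output \(\sigma(w^T x + b)\), assuming w and x are plain Python lists of length \(n_x\); `sigmoid` and `predict_one` are hypothetical helper names. The sigmoid branches on the sign of z so that `math.exp` is never called with a large positive argument, which would overflow.

```python
import math


def sigmoid(z):
    """sigma(z) = 1 / (1 + e^{-z}), branching on sign to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)        # equals e^{-|z|}, so it cannot overflow
    return e / (1.0 + e)   # same value as 1 / (1 + e^{-z})


def predict_one(w, b, x):
    """sigma(w^T x + b) for a single example; w and x are lists of n_x floats."""
    z = sum(w_k * x_k for w_k, x_k in zip(w, x)) + b  # z = w^T x + b
    return sigmoid(z)


print(sigmoid(0.0))                                # 0.5
print(predict_one([0.5, -0.25], 0.0, [2.0, 4.0]))  # z = 1.0 - 1.0 + 0.0 = 0.0, so 0.5
```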

w: weight vector, a column vector \(\in R^{n_x}\) (shape \(n_x \times 1\)), so \(w^T\) is a \(1 \times n_x\) row vector
b: bias term, a real number (\(b \in R\))
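With w stored as an \(n_x \times 1\) column vector and b a scalar, all m examples can be processed at once: \(Z = w^T X + b\) is \(1 \times m\), with b added to every entry. The sketch below assumes the list-of-rows layout for X; `forward_logistic` is a hypothetical name.

```python
import math


def sigmoid(z):
    """Numerically stable 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward_logistic(w, b, X):
    """Return A = sigma(w^T X + b) as a 1 x m matrix.

    w: the n_x x 1 column vector, stored flat as a list of n_x floats
    b: scalar bias, added to each of the m entries of w^T X
    X: n_x x m matrix stored as a list of n_x rows
    """
    n_x, m = len(X), len(X[0])
    if len(w) != n_x:
        raise ValueError("w must have n_x entries")
    # Entry c of w^T X is the dot product of w with column c of X
    Z = [sum(w[k] * X[k][c] for k in range(n_x)) + b for c in range(m)]
    return [[sigmoid(z) for z in Z]]


w, b = [1.0, -1.0], 0.0
X = [[1.0, 2.0, 3.0],
     [1.0, 0.0, 5.0]]  # n_x = 2, m = 3
A = forward_logistic(w, b, X)
print(A)  # one row of 3 values; the first is sigma(0) = 0.5
```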
\(n^{[i]}\): number of units in the \(i^{th}\) layer

\(W^{[i]}, b^{[i]}\): the weight matrix and bias vector of the \(i^{th}\) layer of the neural network
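In this notation \(W^{[i]}\) maps the \(n^{[i-1]}\) values coming from the previous layer to the \(n^{[i]}\) units of layer i, so \(W^{[i]}\) has shape \(n^{[i]} \times n^{[i-1]}\) and \(b^{[i]}\) has shape \(n^{[i]} \times 1\). Below is a sketch that builds parameters with those shapes, assuming small Gaussian random weights and zero biases (a common choice, not something these notes specify); `init_parameters`, `layer_dims`, `scale`, and `seed` are hypothetical names.

```python
import random


def init_parameters(layer_dims, scale=0.01, seed=0):
    """layer_dims = [n^[0], n^[1], ..., n^[L]] with n^[0] = n_x.

    Returns dicts W, b keyed by layer index i = 1..L, where
    W[i] is n^[i] x n^[i-1] and b[i] is n^[i] x 1 (lists of rows).
    """
    rng = random.Random(seed)
    W, b = {}, {}
    for i in range(1, len(layer_dims)):
        rows, cols = layer_dims[i], layer_dims[i - 1]
        # Small random weights, one row per unit in layer i
        W[i] = [[rng.gauss(0.0, 1.0) * scale for _ in range(cols)]
                for _ in range(rows)]
        # One bias per unit, kept as a single column
        b[i] = [[0.0] for _ in range(rows)]
    return W, b


W, b = init_parameters([3, 4, 1])  # n_x = 3, a hidden layer of 4 units, 1 output unit
print(len(W[1]), len(W[1][0]))  # 4 3
print(len(W[2]), len(W[2][0]))  # 1 4
print(len(b[1]), len(b[1][0]))  # 4 1
```

The weights are random rather than all zero so that units in the same layer do not start out computing identical values.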

\(a^{[i]}\): activations of layer \(i\), i.e. the values layer \(i\) passes on to the next layer
\(z_i^{[j]}\): value computed at the \(i^{th}\) node of layer \(j\) before its activation is applied (the input layer is layer 0)
\(a^{[j](i)}\): activation of layer \(j\) for the \(i^{th}\) training example
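Putting the layer notation together, a forward pass starts from \(a^{[0]} = x\) and, for each layer \(j \ge 1\), computes \(z^{[j]} = W^{[j]} a^{[j-1]} + b^{[j]}\) followed by \(a^{[j]} = g(z^{[j]})\) for an activation function g. The sketch below processes all m examples at once, so column i of each matrix holds the values for one example, i.e. \(a^{[j](i)}\). It assumes the sigmoid as g in every layer and dict-of-matrices storage; `forward` and `matmul` are hypothetical helpers.

```python
import math


def sigmoid(z):
    """Numerically stable 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def forward(W, b, X):
    """Forward pass over all m columns of X.

    W[j] is n^[j] x n^[j-1] and b[j] is n^[j] x 1 for j = 1..L.
    Returns dicts Z, A with A[0] = X (input layer is layer 0),
    Z[j] = W[j] A[j-1] + b[j] and A[j] = sigmoid(Z[j]).
    With 0-based Python indices, Z[j][r][i] is z_{r+1}^{[j]} for example i+1,
    and column i of A[j] is a^{[j](i+1)}.
    """
    A, Z = {0: X}, {}
    for j in range(1, len(W) + 1):
        WA = matmul(W[j], A[j - 1])
        # b[j] has one column; add b[j][r][0] to every example (column) in row r
        Z[j] = [[WA[r][c] + b[j][r][0] for c in range(len(WA[0]))]
                for r in range(len(WA))]
        A[j] = [[sigmoid(z) for z in row] for row in Z[j]]
    return Z, A


# n_x = 2 inputs, a hidden layer of 3 units, 1 output unit, m = 2 examples
W = {1: [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], 2: [[1.0, -1.0, 0.0]]}
b = {1: [[0.0], [0.0], [-1.0]], 2: [[0.0]]}
X = [[0.0, 1.0],
     [0.0, -1.0]]
Z, A = forward(W, b, X)
print(Z[1])  # pre-activations of layer 1, shape 3 x 2
print(A[2])  # network output, shape 1 x 2
```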
