Notations

m: number of training examples
n = n_x: dimension of the input feature vector x, so x ∈ ℝ^{n_x}
y ∈ {0, 1} for binary classification
Training set: {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))}, where (x^(i), y^(i)) is the i-th training example

Stacking data by columns:
  • X = [x^(1), x^(2), ..., x^(m)] ∈ ℝ^{n_x × m}: matrix with one feature vector per column
  • Y = [y^(1), y^(2), ..., y^(m)] ∈ ℝ^{1 × m}: row vector with one label per column
Sigmoid function: σ(z) = 1 / (1 + e^{-z}), where z = w^T x + b with w ∈ ℝ^{n_x}, b ∈ ℝ
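
To make these shape conventions concrete, here is a minimal NumPy sketch of the stacked data and the vectorized sigmoid; the concrete sizes (n_x = 4, m = 10) and the random data are illustrative assumptions, not part of the notes:

```python
import numpy as np

# Illustrative sizes (assumptions): n_x features, m training examples.
n_x, m = 4, 10

X = np.random.randn(n_x, m)            # each column X[:, i] is one example x^(i)
Y = np.random.randint(0, 2, (1, m))    # row of labels y^(i) in {0, 1}

w = np.zeros((n_x, 1))                 # weight vector w in R^{n_x}
b = 0.0                                # bias term b in R

def sigmoid(z):
    """Element-wise sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

Z = w.T @ X + b                        # shape (1, m): z^(i) = w^T x^(i) + b for all i at once
A = sigmoid(Z)                         # shape (1, m): predicted probabilities

assert X.shape == (n_x, m) and Y.shape == (1, m) and A.shape == (1, m)
```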

w: weight vector, w ∈ ℝ^{n_x} (so w^T in z = w^T x + b is a 1 × n_x row vector)
b: bias term, b ∈ ℝ
n^[i]: number of units in the i-th layer

W^[i], b^[i]: weight matrix and bias vector of the i-th layer of the neural network

a^[i]: activations of layer i, the values that layer i passes on to the next layer
z^[j]_i: the i-th node in the j-th layer (the input layer is layer 0)
a^[j](i): activation of layer j for the i-th training example
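
As a rough sketch of how this layer notation maps to array shapes, the loop below propagates activations through a small network; the layer sizes, the random initialization, and the use of a sigmoid in every layer are assumptions made only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative layer sizes (assumptions): n^[0] = 4 input units,
# n^[1] = 3 hidden units, n^[2] = 1 output unit; m = 10 examples.
layer_sizes = [4, 3, 1]
m = 10

rng = np.random.default_rng(0)
A = rng.standard_normal((layer_sizes[0], m))   # a^[0] = X, shape (n^[0], m)

for i in range(1, len(layer_sizes)):
    W = rng.standard_normal((layer_sizes[i], layer_sizes[i - 1]))  # W^[i]: (n^[i], n^[i-1])
    b = np.zeros((layer_sizes[i], 1))                              # b^[i]: (n^[i], 1)
    Z = W @ A + b       # z^[i]: row k holds node z^[i]_k, column j comes from example j
    A = sigmoid(Z)      # a^[i]: activations that layer i passes on to the next layer

# Column j of the final A is a^[2](j): the output activation for the j-th example.
assert A.shape == (layer_sizes[-1], m)
```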
