A neural network can be viewed as stacking multiple logistic-regression-like units, both within a layer and across layers, where the output of the previous layer is used as the input to the current layer.
Single hidden layer neural network (a 2-layer neural network, since the input layer is not counted):
- Input layer (a[0])
- Hidden layer: its values are not observed in the training data (a[1])
- Output layer (a[2])
Vertically, the entries of each vector correspond to the nodes (units) in that layer.
Forward pass:
Non-vectorized implementation:
z[1](1)=W[1]x(1)+b[1], z[1](2)=W[1]x(2)+b[1], z[1](3)=W[1]x(3)+b[1], with a[1](i)=g(z[1](i))
z[2](1)=W[2]a[1](1)+b[2], z[2](2)=W[2]a[1](2)+b[2], z[2](3)=W[2]a[1](3)+b[2], with a[2](i)=g(z[2](i))
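As a rough sketch, the loop version might look like this in NumPy (the function and parameter names `forward_loop`, `W1`, `b1`, `W2`, `b2` are illustrative, not from the notes; sigmoid is used for both layers just for simplicity):

```python
import numpy as np

def g(z):
    # sigmoid activation, used here purely for illustration
    return 1 / (1 + np.exp(-z))

# Non-vectorized forward pass: loop over the m training examples.
# Assumed shapes: X (nx, m), W1 (n1, nx), b1 (n1, 1), W2 (1, n1), b2 (1, 1).
def forward_loop(X, W1, b1, W2, b2):
    m = X.shape[1]
    outputs = []
    for i in range(m):
        x_i = X[:, i:i+1]        # keep the column shape (nx, 1)
        z1 = W1 @ x_i + b1       # z[1](i)
        a1 = g(z1)               # a[1](i)
        z2 = W2 @ a1 + b2        # z[2](i)
        a2 = g(z2)               # a[2](i), the prediction for example i
        outputs.append(a2)
    return np.hstack(outputs)    # shape (1, m)
```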
Vectorized implementation by stacking all training examples horizontally:
Z[1]=W[1]X+b[1], A[1]=g(Z[1]), Z[2]=W[2]A[1]+b[2], A[2]=g(Z[2]), where X∈R^(nx×m) stacks the m training examples as columns
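A minimal vectorized sketch under the same assumptions (illustrative names, sigmoid for both layers); all m examples are processed in one matrix multiplication per layer, with broadcasting adding b1 and b2 to every column:

```python
import numpy as np

def g(z):
    # sigmoid activation, for illustration only
    return 1 / (1 + np.exp(-z))

# Vectorized forward pass over all m examples at once.
# X has shape (nx, m); each column is one training example.
def forward_vectorized(X, W1, b1, W2, b2):
    Z1 = W1 @ X + b1     # (n1, m)
    A1 = g(Z1)           # (n1, m)
    Z2 = W2 @ A1 + b2    # (1, m)
    A2 = g(Z2)           # (1, m), predictions for all examples
    return A2
```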
Activation functions (non-linear):
So far we have used sigmoid as the activation function, but it is usually not the best choice for hidden layers.
- sigmoid: a = g(z) = 1/(1 + e^(−z)), output in (0, 1)
- g′(z)=g(z)×(1−g(z))=a(1−a)
- tanh(z): hyperbolic tangent (a scaled and shifted sigmoid), output in (−1, 1)
- g′(z)=1−tanh²(z)=1−a²
- ReLU (rectified linear unit): a = max(0, z) (commonly used)
- g′(z) = 0 if z < 0, 1 if z ≥ 0
- Leaky ReLU: a = max(0.01z, z)
- g′(z) = 0.01 if z < 0, 1 if z ≥ 0
Exception: for the output layer, if the output is either 0 or 1 (binary classification), using sigmoid as the activation makes sense.
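As a quick reference, here is a minimal NumPy sketch of these activations and their derivatives (the function names below are my own, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    a = sigmoid(z)
    return a * (1 - a)              # g'(z) = a(1 - a)

def tanh_prime(z):
    a = np.tanh(z)
    return 1 - a ** 2               # g'(z) = 1 - a^2

def relu(z):
    return np.maximum(0, z)

def relu_prime(z):
    return (z >= 0).astype(float)   # 0 for z < 0, 1 for z >= 0

def leaky_relu(z, alpha=0.01):
    return np.maximum(alpha * z, z)

def leaky_relu_prime(z, alpha=0.01):
    return np.where(z < 0, alpha, 1.0)  # 0.01 for z < 0, 1 for z >= 0
```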