Neural Network using Logistic Regression

A neural network can be viewed as stacking multiple logistic regression units into layers (several units per layer, and several layers deep), using the outputs of the previous layer as the inputs of the current layer.

Single hidden layer neural network (a 2-layer neural network, not counting the input layer):
  • Input layer (\(a^{[0]}\))
  • Hidden layer: its values are not observed in the training set (\(a^{[1]}\))
  • Output layer (\(a^{[2]}\))
In the stacked matrix notation below, the horizontal dimension (columns) corresponds to training examples, and the vertical dimension (rows) corresponds to the nodes (units) in each layer.

Forward pass:
Non-vectorized implementation (looping over training examples):
\(z^{[1](1)} = W^{[1]} x^{(1)} + b^{[1]}\), \(z^{[1](2)} = W^{[1]} x^{(2)} + b^{[1]}\), \(z^{[1](3)} = W^{[1]} x^{(3)} + b^{[1]}\)
\(z^{[2](1)} = W^{[2]} a^{[1](1)} + b^{[2]}\), \(z^{[2](2)} = W^{[2]} a^{[1](2)} + b^{[2]}\), \(z^{[2](3)} = W^{[2]} a^{[1](3)} + b^{[2]}\)
where \(a^{[l](i)} = g(z^{[l](i)})\) is the activation applied element-wise, so each layer feeds the next.

Vectorized implementation by stacking all training examples horizontally (as columns):
\(Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}\), \(A^{[l]} = g(Z^{[l]})\), where \(A^{[0]} = X \in R^{n_x \times m}\) and \(b^{[l]}\) is broadcast across the \(m\) columns
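A minimal NumPy sketch of this vectorized forward pass for the 2-layer network above; the parameter shapes, the tanh hidden activation, and the sigmoid output are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(X, W1, b1, W2, b2):
    """Vectorized forward pass over all m examples (columns of X)."""
    Z1 = W1 @ X + b1          # (n1, m); b1 is broadcast across columns
    A1 = np.tanh(Z1)          # hidden-layer activation (assumed tanh)
    Z2 = W2 @ A1 + b2         # (n2, m)
    A2 = sigmoid(Z2)          # output-layer activation for 0/1 labels
    return Z1, A1, Z2, A2

# Example: n_x = 3 input features, 4 hidden units, 1 output unit, m = 5 examples
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
W1, b1 = rng.standard_normal((4, 3)) * 0.01, np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)) * 0.01, np.zeros((1, 1))
_, _, _, A2 = forward_pass(X, W1, b1, W2, b2)
print(A2.shape)  # (1, 5): one prediction per training example
```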

Activation functions (non-linear):
So far we have used sigmoid as the activation function, but it is usually not the best choice for hidden layers.
  • sigmoid
    • \(g'(z) = g(z) \times ( 1 - g(z)) = a(1-a)\)
  • tanh(z): hyperbolic tangent, a rescaled and shifted sigmoid whose output lies in \((-1, 1)\)
    • \(g'(z) = 1 - g(z)^2 = 1 - a^2\)
  • ReLU (rectified linear unit): a = max(0, z) (commonly used)
    • \(g'(z) = 0\) if \(z < 0\), \(1\) if \(z \geq 0\)
  • Leaky ReLU: a = max (0.01z, z)
    • \(g'(z) = 0.01\) if \(z < 0\), \(1\) if \(z \geq 0\)
Exception: for the output layer, if the output label is either 0 or 1 (binary classification), using sigmoid as the activation makes sense.
If we use only linear activation functions, the whole network collapses to a linear function of the inputs, so the hidden layers add no expressive power. The activations and their derivatives are sketched in code below.
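A small sketch of these activations and their derivatives in NumPy, expressed in terms of the activation value \(a\) where the notes do so; the function names and the sample inputs are my own.

```python
import numpy as np

def sigmoid(z):
    a = 1.0 / (1.0 + np.exp(-z))
    return a, a * (1 - a)            # g(z), g'(z) = a(1 - a)

def tanh(z):
    a = np.tanh(z)
    return a, 1 - a ** 2             # g(z), g'(z) = 1 - a^2

def relu(z):
    return np.maximum(0, z), (z >= 0).astype(float)    # g'(z) = 0 if z < 0 else 1

def leaky_relu(z, slope=0.01):
    return np.maximum(slope * z, z), np.where(z < 0, slope, 1.0)

z = np.linspace(-2, 2, 5)
for g in (sigmoid, tanh, relu, leaky_relu):
    a, da = g(z)
    print(g.__name__, a.round(2), da.round(2))
```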

How to determine the dimensions of W and b:
In general, \(W^{[l]} \in R^{n^{[l]} \times n^{[l-1]}}\) and \(b^{[l]} \in R^{n^{[l]} \times 1}\), where \(n^{[l]}\) is the number of units in layer \(l\).
Suppose \(X \in R^{3 \times m}\), the hidden layer has 4 units, and the output layer has 2 units;
then \( W^{[1]} \in R^{4 \times 3}, b^{[1]} \in R^{4 \times 1},  W^{[2]} \in R^{2 \times 4}, b^{[2]} \in R^{2 \times 1} \)
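A quick NumPy shape check for this example; the helper name init_params and the small random initialization scale are assumptions for illustration.

```python
import numpy as np

def init_params(layer_sizes):
    """Initialize W[l] with shape (n[l], n[l-1]) and b[l] with shape (n[l], 1)."""
    rng = np.random.default_rng(1)
    params = {}
    for l in range(1, len(layer_sizes)):
        params[f"W{l}"] = rng.standard_normal((layer_sizes[l], layer_sizes[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_sizes[l], 1))
    return params

params = init_params([3, 4, 2])   # n_x = 3, 4 hidden units, 2 output units
for name, value in params.items():
    print(name, value.shape)
# W1 (4, 3), b1 (4, 1), W2 (2, 4), b2 (2, 1) -- matches the dimensions above
```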

Backward pass: Gradient descent
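The notes stop here; as a sketch of what the backward pass computes for this 2-layer network, assuming a sigmoid output with cross-entropy loss and tanh hidden units, the vectorized gradients and one gradient descent update could look like the following (function names are my own).

```python
import numpy as np

def backward_pass(X, Y, A1, A2, W2):
    """Vectorized gradients for a 2-layer network (sigmoid output, tanh hidden)."""
    m = X.shape[1]
    dZ2 = A2 - Y                               # from cross-entropy loss + sigmoid output
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)         # tanh'(z) = 1 - a^2
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

def gradient_descent_step(params, grads, lr=0.01):
    """One update: each parameter moves against its gradient by the learning rate."""
    W1, b1, W2, b2 = params
    dW1, db1, dW2, db2 = grads
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2
```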
