Computation Graph for Logistic Regression

Logistic Regression:
$z = w^T x + b$
$\hat{y} = a = \sigma(z)$
$\mathcal{L}(a, y) = -\big(y \log(a) + (1-y) \log(1-a)\big)$
Then, stepping backwards through the computation graph:
$\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}$ and $\frac{\partial a}{\partial z} = a(1-a)$ (the derivative of the sigmoid function),
then $\frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial a} \cdot \frac{\partial a}{\partial z} = a - y$.
Finally,
$\frac{\partial \mathcal{L}}{\partial w_1} = \frac{\partial \mathcal{L}}{\partial z} \cdot \frac{\partial z}{\partial w_1} = (a-y)\,x_1$, $\quad\frac{\partial \mathcal{L}}{\partial x_1} = \frac{\partial \mathcal{L}}{\partial z} \cdot \frac{\partial z}{\partial x_1} = (a-y)\,w_1$
$\frac{\partial \mathcal{L}}{\partial w_2} = \frac{\partial \mathcal{L}}{\partial z} \cdot \frac{\partial z}{\partial w_2} = (a-y)\,x_2$, $\quad\frac{\partial \mathcal{L}}{\partial x_2} = \frac{\partial \mathcal{L}}{\partial z} \cdot \frac{\partial z}{\partial x_2} = (a-y)\,w_2$
$\frac{\partial \mathcal{L}}{\partial b} = \frac{\partial \mathcal{L}}{\partial z} \cdot \frac{\partial z}{\partial b} = (a-y) \cdot 1 = a - y$
From here, we can use the gradient descent algorithm to update $w$ and $b$.
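A minimal NumPy sketch of this single-example forward and backward pass (the feature values, weights, bias, and label below are made-up placeholders, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass for a single example (n_x = 2); all values are hypothetical
x = np.array([1.5, -0.3])   # features x1, x2
w = np.array([0.2, 0.4])    # current weights w1, w2
b = 0.1                     # current bias
y = 1.0                     # true label

z = np.dot(w, x) + b        # z = w^T x + b
a = sigmoid(z)              # a = sigma(z) = y_hat
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Backward pass: chain rule through the computation graph
dz = a - y                  # dL/dz = a - y
dw = dz * x                 # dL/dw1 = (a - y) x1, dL/dw2 = (a - y) x2
db = dz                     # dL/db = a - y
```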

Algorithm for $m$ training examples (suppose $n_x = 2$); initialize $J = 0$, $dw_1 = 0$, $dw_2 = 0$, $db = 0$, then:
for i from 1 to m: {

  • $z^{(i)} = w^T x^{(i)} + b$
  • $a^{(i)} = \hat{y}^{(i)} = \sigma(z^{(i)})$
  • $dz^{(i)} = a^{(i)} - y^{(i)}$
  • $J \mathrel{+}= -\big(y^{(i)} \log(a^{(i)}) + (1 - y^{(i)}) \log(1 - a^{(i)})\big)$
  • $dw_1 \mathrel{+}= x_1^{(i)}\, dz^{(i)}$
  • $dw_2 \mathrel{+}= x_2^{(i)}\, dz^{(i)}$
  • $db \mathrel{+}= dz^{(i)}$

}
Then $J \mathrel{/}= m$, $dw_1 \mathrel{/}= m$, $dw_2 \mathrel{/}= m$, $db \mathrel{/}= m$ (average over the $m$ examples).
Then repeat the simultaneous update of $w_1$, $w_2$, and $b$ until convergence.
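A direct, still non-vectorized translation of this loop into Python; the helper name `loop_gradients` and the $(n_x, m)$ layout of `X` are assumptions made for the sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loop_gradients(X, Y, w, b):
    """One non-vectorized pass over m examples; X is (n_x, m) with n_x = 2, Y is (1, m)."""
    m = X.shape[1]
    J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0              # accumulators start at zero
    for i in range(m):
        z_i = w[0] * X[0, i] + w[1] * X[1, i] + b     # z(i) = w^T x(i) + b
        a_i = sigmoid(z_i)                            # a(i) = sigma(z(i))
        dz_i = a_i - Y[0, i]                          # dz(i) = a(i) - y(i)
        J += -(Y[0, i] * np.log(a_i) + (1 - Y[0, i]) * np.log(1 - a_i))
        dw1 += X[0, i] * dz_i
        dw2 += X[1, i] * dz_i
        db += dz_i
    # Divide by m to get the cost and the averaged gradients
    return J / m, dw1 / m, dw2 / m, db / m
```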

Vectorization: get rid of explicit for-loops by taking advantage of parallelism.
z = np.dot(w, x) + b
Rule of thumb: whenever possible, avoid explicit for-loops.
$u = Av$ => u = np.dot(A, v)
Given $v$, find $u$ with $u_i = e^{v_i}$ (element-wise) => u = np.exp(v)
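An illustrative timing comparison of the two rules above (the array size is arbitrary and the exact timings depend on the machine):

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
v = np.random.rand(1_000_000)

# Vectorized dot product
tic = time.time()
u = np.dot(a, v)
print("vectorized dot:", 1000 * (time.time() - tic), "ms")

# Explicit for-loop doing the same work, typically orders of magnitude slower
tic = time.time()
u_loop = 0.0
for i in range(len(a)):
    u_loop += a[i] * v[i]
print("for-loop dot:  ", 1000 * (time.time() - tic), "ms")

# Element-wise exponential, also vectorized
u_exp = np.exp(v)
```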

The above algorithm in vectorized form in Python, using NumPy:
Z = np.dot(w.T, X) + b (the scalar $b$ is made into a $(1 \times m)$ row vector by Python's broadcasting)
$A = \sigma(Z)$
$dZ = A - Y$
$dw = \frac{1}{m} X\, dZ^T$
$db = \frac{1}{m}\,\texttt{np.sum}(dZ)$
where $X \in \mathbb{R}^{n_x \times m}$; $Y, Z, A \in \mathbb{R}^{1 \times m}$; $w, dw \in \mathbb{R}^{n_x \times 1}$; $b \in \mathbb{R}$.
Then repeat the simultaneous update $w := w - \alpha\, dw$, $b := b - \alpha\, db$.

We still need a for (or while) loop over multiple iterations of gradient descent
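Putting the vectorized steps inside that remaining loop, a minimal sketch; the function name, the zero initialization, and the hyperparameters alpha and num_iters are illustrative choices, not from the notes:

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def logistic_regression_gd(X, Y, alpha=0.01, num_iters=1000):
    """Vectorized gradient descent; X is (n_x, m), Y is (1, m)."""
    n_x, m = X.shape
    w = np.zeros((n_x, 1))              # illustrative initialization
    b = 0.0
    for _ in range(num_iters):          # the remaining explicit loop: gradient descent iterations
        Z = np.dot(w.T, X) + b          # (1, m); scalar b broadcast across the row
        A = sigmoid(Z)                  # (1, m)
        dZ = A - Y                      # (1, m)
        dw = np.dot(X, dZ.T) / m        # (n_x, 1)
        db = np.sum(dZ) / m             # scalar
        w = w - alpha * dw              # simultaneous update
        b = b - alpha * db
    return w, b
```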

