Logistic Regression:
$z = w^T x + b$
$\hat{y} = a = \sigma(z)$
$\mathcal{L}(a, y) = -\left(y \log(a) + (1 - y) \log(1 - a)\right)$
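For concreteness, here is a minimal NumPy sketch of this forward pass and loss for a single example; the sigmoid helper and the toy values of w, b, x, y are my own choices for illustration, not from the notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy single example with n_x = 2 (values made up for illustration)
w = np.array([0.5, -0.3])    # weights
b = 0.1                      # bias
x = np.array([1.2, 0.7])     # features x_1, x_2
y = 1                        # label

z = np.dot(w, x) + b                              # z = w^T x + b
a = sigmoid(z)                                    # y_hat = a = sigma(z)
L = -(y * np.log(a) + (1 - y) * np.log(1 - a))    # loss L(a, y)
print(z, a, L)
```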
Then, from the computation graph:
$\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}$ and $\frac{\partial a}{\partial z} = a(1 - a)$ (derivative of the sigmoid function)
then, $\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} = \left(-\frac{y}{a} + \frac{1 - y}{1 - a}\right) a(1 - a) = -y(1 - a) + (1 - y)a = a - y$
Finally,
$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial w_1} = (a - y) \, x_1$, $\quad \frac{\partial L}{\partial x_1} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial x_1} = (a - y) \, w_1$
$\frac{\partial L}{\partial w_2} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial w_2} = (a - y) \, x_2$, $\quad \frac{\partial L}{\partial x_2} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial x_2} = (a - y) \, w_2$
$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial b} = (a - y) \cdot 1 = a - y$
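As a sanity check on these formulas, the sketch below (again with made-up toy values and an assumed loss helper) computes the single-example gradients and compares $\frac{\partial L}{\partial w_1}$ against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    a = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Same toy example as above (values made up for illustration)
w, b = np.array([0.5, -0.3]), 0.1
x, y = np.array([1.2, 0.7]), 1

a = sigmoid(np.dot(w, x) + b)
dz = a - y                    # dL/dz   = a - y
dw = dz * x                   # dL/dw_1 = (a - y) x_1, dL/dw_2 = (a - y) x_2
db = dz                       # dL/db   = a - y

# Finite-difference check of dL/dw_1
eps = 1e-6
num_dw1 = (loss(w + np.array([eps, 0.0]), b, x, y) - loss(w, b, x, y)) / eps
print(dw[0], num_dw1)         # the two values should nearly agree
```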
From here, we can use the gradient descent algorithm to update $w$ and $b$.
Algorithm for $m$ training examples (suppose $n_x = 2$); initialize $J = 0$, $dw_1 = 0$, $dw_2 = 0$, $db = 0$, then:
for i from 1 to m: {
- $z^{(i)} = w^T x^{(i)} + b$
- $a^{(i)} = \hat{y}^{(i)} = \sigma(z^{(i)})$
- $dz^{(i)} = a^{(i)} - y^{(i)}$
- $J \mathrel{+}= -\left(y^{(i)} \log(a^{(i)}) + (1 - y^{(i)}) \log(1 - a^{(i)})\right)$
- $dw_1 \mathrel{+}= x_1^{(i)} \, dz^{(i)}$
- $dw_2 \mathrel{+}= x_2^{(i)} \, dz^{(i)}$
- $db \mathrel{+}= dz^{(i)}$
}
Then $J \mathrel{/}= m$, $dw_1 \mathrel{/}= m$, $dw_2 \mathrel{/}= m$, $db \mathrel{/}= m$.
Then repeat the simultaneous update $w_1 := w_1 - \alpha \, dw_1$, $w_2 := w_2 - \alpha \, dw_2$, $b := b - \alpha \, db$ until convergence.
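A direct Python translation of this loop might look like the following sketch; the random toy data, variable names, and learning rate alpha are illustrative assumptions, not part of the original notes. It accumulates the cost and gradients over the $m$ examples, averages them, and performs one simultaneous update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data set: n_x = 2 features, m = 5 examples (made up for illustration)
rng = np.random.default_rng(0)
m = 5
X = rng.normal(size=(2, m))            # column i is example x^(i)
Y = rng.integers(0, 2, size=m)         # labels y^(i) in {0, 1}
w1, w2, b = 0.0, 0.0, 0.0              # parameters
alpha = 0.1                            # learning rate

J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0   # accumulators
for i in range(m):                     # explicit loop over the m examples
    z_i = w1 * X[0, i] + w2 * X[1, i] + b
    a_i = sigmoid(z_i)
    dz_i = a_i - Y[i]
    J   += -(Y[i] * np.log(a_i) + (1 - Y[i]) * np.log(1 - a_i))
    dw1 += X[0, i] * dz_i
    dw2 += X[1, i] * dz_i
    db  += dz_i
J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m

# One simultaneous gradient-descent update
w1, w2, b = w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db
print(J, w1, w2, b)
```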
Vectorization: get rid of explicit for-loops by taking advantage of parallelism, e.g.
z = np.dot(w, x) + b (computes $z = w^T x + b$ with no explicit loop over the features)
Rule of thumb: whenever possible, avoid explicit for-loops.
$u = Av$ => u = np.dot(A, v)
Given $v$, find $u$ with $u_i = e^{v_i}$ => u = np.exp(v)
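To see why this rule of thumb matters, here is a small timing sketch (the array size of one million and the use of time.time are my own choices) comparing an explicit for-loop with np.dot on the same dot product.

```python
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Explicit for-loop version of the dot product
tic = time.time()
c_loop = 0.0
for i in range(n):
    c_loop += a[i] * b[i]
toc = time.time()
print("for-loop:  ", c_loop, f"{1000 * (toc - tic):.1f} ms")

# Vectorized version of the same computation
tic = time.time()
c_vec = np.dot(a, b)
toc = time.time()
print("vectorized:", c_vec, f"{1000 * (toc - tic):.1f} ms")
```

On typical hardware the vectorized version runs orders of magnitude faster, which is the point of the rule above.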
The above algorithm in vectorized form, in Python using NumPy:
$Z$ = np.dot(w.T, X) + b (the scalar $b$ is broadcast by Python into a $(1 \times m)$ row vector)
$A = \sigma(Z)$
$dZ = A - Y$
$dw = \frac{1}{m} X \, dZ^T$
$db = \frac{1}{m}$ np.sum(dZ)
where $X \in \mathbb{R}^{n_x \times m}$, $Y, Z, A \in \mathbb{R}^{1 \times m}$, $w, dw \in \mathbb{R}^{n_x \times 1}$, and $b \in \mathbb{R}$.
Then repeat the simultaneous update $w := w - \alpha \, dw$, $b := b - \alpha \, db$ until convergence.
We still need a for (or while) loop over the multiple iterations of gradient descent.
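Putting the vectorized pieces together, one possible training loop is sketched below; the toy data, iteration count, and learning rate are illustrative assumptions rather than part of the notes. Only the loop over gradient-descent iterations remains explicit, as noted above.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

# Toy data: n_x = 2 features, m = 100 examples (made up for illustration)
rng = np.random.default_rng(1)
n_x, m = 2, 100
X = rng.normal(size=(n_x, m))                        # shape (n_x, m)
Y = (X[0] + X[1] > 0).astype(float).reshape(1, m)    # shape (1, m)

w = np.zeros((n_x, 1))                               # shape (n_x, 1)
b = 0.0
alpha = 0.1                                          # learning rate

for _ in range(100):                 # loop over gradient-descent iterations
    Z = np.dot(w.T, X) + b           # shape (1, m); scalar b is broadcast
    A = sigmoid(Z)                   # shape (1, m)
    dZ = A - Y                       # shape (1, m)
    dw = np.dot(X, dZ.T) / m         # shape (n_x, 1)
    db = np.sum(dZ) / m              # scalar
    w = w - alpha * dw               # simultaneous parameter update
    b = b - alpha * db

J = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # cost after training
print(J, w.ravel(), b)
```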