ŷ = σ(z) = 1 / (1 + e^{-z}), where z = w^T x + b, with w ∈ R^{n_x} and b ∈ R
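As a minimal sketch of this prediction step (not part of the original notes), in NumPy:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """yhat = sigmoid(w^T x + b) for a single example x (vector of n_x features)."""
    z = np.dot(w, x) + b
    return sigmoid(z)
```

With w = 0 and b = 0, the model is maximally uncertain: predict returns 0.5 for any input.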
Goal: given a set of m training examples, find w and b so that ŷ^{(i)} ≈ y^{(i)}, i.e. the estimate is close to the ground truth. In other words, minimize the following cost function:
Loss (error) function, on a single training example: L(ŷ, y) = −(y log(ŷ) + (1 − y) log(1 − ŷ))
Note: squared error does not work well with logistic regression because it makes the cost function non-convex
Intuition: if ŷ is close to y (edge cases: either 0 or 1), L is close to 0, which is what we want
Cost function (over the entire training set): J(w, b) = (1/m) Σ_{i=1}^{m} L(ŷ^{(i)}, y^{(i)})
Gradient Descent: find w and b that minimize J(w, b). Algorithm:
Repeat the following updates until convergence: {
w := w − α ∂J(w, b)/∂w
b := b − α ∂J(w, b)/∂b
}
where α is the learning rate, which controls how big a step is taken in each iteration


Derivatives Intuition:
slope = height / width
f(a) = 3a
Let a = 2; then f(2) = 6. If we nudge a a little bit to 2.001, then f(2.001) = 6.003,
so the slope (derivative) of f(a) at a = 2 is df(a)/da = Δy/Δx = 0.003 / 0.001 = 3
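This nudge-and-measure idea can be checked numerically (a small illustration, not from the notes):

```python
def numeric_derivative(f, a, eps=1e-3):
    """Approximate df/da at a by nudging a by eps, as in the notes."""
    return (f(a + eps) - f(a)) / eps

def f(a):
    return 3 * a

# slope of f(a) = 3a at a = 2 comes out as (6.003 - 6) / 0.001 = 3
```

Since f(a) = 3a is linear, the approximation matches the exact derivative 3 regardless of where we nudge.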