Logistic Regression for Binary Classification

$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$ where $z = w^T x + b$, with $w \in \mathbb{R}^{n_x}$, $b \in \mathbb{R}$
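
As a minimal sketch of this forward computation in NumPy (the function names `sigmoid` and `predict` are my own, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """y_hat = sigma(w^T x + b) for a single example x of shape (n_x,)."""
    z = np.dot(w, x) + b
    return sigmoid(z)
```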

Goal: given a set of $m$ training examples $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, find $w$ and $b$ so that $\hat{y}^{(i)} \approx y^{(i)}$, i.e. the estimate is close to the ground truth. In other words, minimize the following cost function:

Loss (error) function: $\mathcal{L}(\hat{y}, y) = -\big(y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})\big)$, defined on a single training example
Note: squared error does not work well with logistic regression because it makes the optimization problem non-convex
Intuition: if $\hat{y}$ is close to $y$ (edge cases: either 0 or 1), $\mathcal{L}$ is close to 0, which is what we want

Cost function (over the entire training set): $J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)})$
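
A sketch of the loss and cost in NumPy, assuming the $m$ examples are stacked as columns of a matrix `X` of shape $(n_x, m)$ with labels `Y` of shape $(m,)$ (this layout and the function name are my own choices):

```python
import numpy as np

def cost(w, b, X, Y):
    """Cross-entropy cost J(w, b) averaged over m training examples."""
    m = Y.shape[0]
    Y_hat = 1.0 / (1.0 + np.exp(-(np.dot(w, X) + b)))            # y_hat for all examples
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))  # per-example loss L
    return np.sum(losses) / m
```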

Gradient Descent: find $w$ and $b$ that minimize $J(w, b)$. Algorithm:
Repeat the following updates until convergence: {
    $w := w - \alpha \frac{\partial J(w, b)}{\partial w}$
    $b := b - \alpha \frac{\partial J(w, b)}{\partial b}$
}
where $\alpha$ is the learning rate, which controls how big a step is taken in each iteration
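
Putting these pieces together, a minimal batch gradient descent sketch (same data layout as above; the analytic gradients $\frac{\partial J}{\partial w} = \frac{1}{m} X (\hat{y} - y)$ and $\frac{\partial J}{\partial b} = \frac{1}{m} \sum_i (\hat{y}^{(i)} - y^{(i)})$ are standard for this loss, and the default hyperparameters are arbitrary):

```python
import numpy as np

def gradient_descent(X, Y, alpha=0.01, num_iters=1000):
    """Fit logistic regression parameters w, b by batch gradient descent."""
    n_x, m = X.shape
    w, b = np.zeros(n_x), 0.0
    for _ in range(num_iters):
        Y_hat = 1.0 / (1.0 + np.exp(-(np.dot(w, X) + b)))  # forward pass: y_hat for all m examples
        dw = np.dot(X, (Y_hat - Y)) / m                    # dJ/dw
        db = np.sum(Y_hat - Y) / m                         # dJ/db
        w = w - alpha * dw                                 # gradient step on w
        b = b - alpha * db                                 # gradient step on b
    return w, b
```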

Derivatives Intuition:
$\text{slope} = \frac{\text{height}}{\text{width}}$
$f(a) = 3a$
Let $a = 2$; then $f(2) = 6$. If we nudge $a$ a little bit to $2.001$, then $f(2.001) = 6.003$,
so the slope (derivative) of $f(a)$ at $a = 2$ is $\frac{df(a)}{da} = \frac{\Delta y}{\Delta x} = 3$
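
The same nudge can be checked numerically with a finite difference (a small illustration, not from the notes):

```python
def f(a):
    return 3 * a

a, eps = 2.0, 0.001
slope = (f(a + eps) - f(a)) / eps  # (6.003 - 6) / 0.001
print(slope)                       # approximately 3.0, matching df/da = 3
```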
