Slides from Politecnico di Torino on Optimization for Machine Learning, covering Bayesian inference and estimation, including MAP estimation and prior selection, and illustrating the Beta prior for a Bernoulli likelihood.


Posterior ∝ Likelihood × Prior
T. Bayes
G.C. Calafiore (Politecnico di Torino)

Outline
Introduction

Bayesian inference is a mathematical procedure that applies probabilities to statistical problems. It provides the tools to update one's beliefs in the light of new data.
p(A|B) = p(A) p(B|A) / p(B),

where A is some statement (e.g., "the subject is pregnant"), and B is some data or evidence (e.g., "the human chorionic gonadotropin (HCG) test is positive").
p(pregnant|HCG+) = p(pregnant) p(HCG+|pregnant) / p(HCG+).

p(pregnant|HCG+): the probability that the subject is pregnant, given the information that the HCG test is positive.

p(pregnant): the probability of being pregnant, before looking at any evidence: it is the a-priori plausibility of statement A, for instance based on age, nationality, etc.
p(HCG+) = p(HCG+|preg.)p(preg.) + p(HCG+|not preg.)p(not preg.)
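The HCG example can be worked through numerically. This is a sketch in Python; the prior, sensitivity and false-positive rate below are made-up illustration values, not figures from the slides.

```python
# Made-up illustration values (assumptions, not from the slides)
p_preg = 0.10            # prior p(pregnant)
p_pos_given_preg = 0.99  # sensitivity p(HCG+ | pregnant)
p_pos_given_not = 0.05   # false-positive rate p(HCG+ | not pregnant)

# Law of total probability for the evidence p(HCG+)
p_pos = p_pos_given_preg * p_preg + p_pos_given_not * (1 - p_preg)

# Bayes' rule: posterior p(pregnant | HCG+)
p_preg_given_pos = p_pos_given_preg * p_preg / p_pos
print(p_preg_given_pos)
```

Note how a highly sensitive test still yields a moderate posterior when the prior is low: the evidence updates the prior, it does not replace it.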
Bayes' Theorem

In pictures: [Venn diagram of two overlapping events A and B.] p(A∩B) is the joint probability of A and B, often also denoted by p(A, B).
p(A|B) = p(A∩B) / p(B)
p(B|A) = p(B∩A) / p(A) = p(A|B) p(B) / p(A),

which is Bayes' rule.
Law of total probability
If the events Bi form a partition of the sample space, then

p(A) = Σi p(A∩Bi) = Σi p(A|Bi) p(Bi)
p(Bk|A) = p(A|Bk) p(Bk) / p(A) = p(A|Bk) p(Bk) / Σi p(A|Bi) p(Bi).
Bayes' rule for probability density functions (pdf)
p(θ|D) = p(D|θ) p(θ) / p(D),

where p(θ|D) is the posterior distribution, representing the updated state of knowledge about θ, after we see the data.
Posterior ∝ Likelihood × Prior.
p(D) = ∫ p(D|θ) p(θ) dθ.
[Diagram: sequential updating.] As data batches D1, D2, D3, ... arrive, Bayes' rule is applied repeatedly: the prior p(θ) is updated to the posterior p(θ|D1), which becomes the new prior and is updated to p(θ|D1, D2), and so on.
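This sequential scheme can be sketched with the conjugate Beta–Bernoulli pair (the Beta prior for a Bernoulli likelihood): each posterior is again a Beta and serves as the next prior. The data batches below are made-up illustration values.

```python
a, b = 1.0, 1.0                      # Beta(1,1) = uniform prior on theta
batches = [[1, 1, 0], [1, 0], [1]]   # made-up data batches D1, D2, D3

for D in batches:
    # Bayes' rule for the conjugate pair: Beta(a, b) prior plus
    # Bernoulli data gives a Beta(a + #ones, b + #zeros) posterior
    a += sum(D)
    b += len(D) - sum(D)
    print(f"posterior Beta({a}, {b}), mean = {a / (a + b):.3f}")
```

Processing all batches at once or one at a time yields the same final posterior, which is exactly the point of the diagram.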
Bayesian Estimators
Loss functions
Quadratic: L(θ − θ̂) = (θ − θ̂)²

Absolute-value: L(θ − θ̂) = |θ − θ̂|

Hit-or-miss: L(θ − θ̂) = 0 if |θ − θ̂| < δ, 1 if |θ − θ̂| ≥ δ

Huber loss: L(θ − θ̂) = ½(θ − θ̂)² if |θ − θ̂| ≤ δ, δ(|θ − θ̂| − δ/2) if |θ − θ̂| > δ
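The four losses above, written out as plain Python functions (a sketch; `delta` is the tolerance parameter written as δ on the slide):

```python
def quadratic(e):
    # quadratic loss on the error e = theta - theta_hat
    return e ** 2

def absolute(e):
    # absolute-value loss
    return abs(e)

def hit_or_miss(e, delta):
    # 0 inside the tolerance window, 1 outside
    return 0.0 if abs(e) < delta else 1.0

def huber(e, delta):
    # quadratic near zero, linear in the tails
    if abs(e) <= delta:
        return 0.5 * e ** 2
    return delta * (abs(e) - delta / 2)
```

The Huber loss interpolates between the quadratic and absolute-value losses, which makes it less sensitive to outliers than the pure quadratic.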
Estimation
The Bayesian estimator minimizes the posterior expected loss:

θ̂ = arg min_θ̂ ∫ L(θ − θ̂) p(θ|D) dθ = arg min_θ̂ E{L(θ − θ̂) | D}.
Estimation: Minimum Mean-Square Error (MMSE)
E{L(θ − θ̂) | D} = ∫ (θ − θ̂)² p(θ|D) dθ

∂/∂θ̂ ∫ (θ − θ̂)² p(θ|D) dθ = ∫ ∂/∂θ̂ (θ − θ̂)² p(θ|D) dθ = −2 ∫ (θ − θ̂) p(θ|D) dθ

−2 ∫ (θ − θ̂) p(θ|D) dθ = 0 ⇔ θ̂ ∫ p(θ|D) dθ = ∫ θ p(θ|D) dθ ⇔ θ̂ = ∫ θ p(θ|D) dθ = E{θ|D},

since ∫ p(θ|D) dθ = 1.
θ̂_MMSE = ∫ θ p(θ|D) dθ = E{θ|D}, i.e., the mean of the posterior pdf p(θ|D).
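A quick numerical sanity check of this result: on a grid, the candidate estimate that minimizes the expected quadratic loss coincides with the posterior mean. The Beta(3, 2)-shaped posterior below is a made-up example for illustration.

```python
import numpy as np

# Made-up posterior: Beta(3, 2) shape, discretized on a grid
theta = np.linspace(0.0, 1.0, 2001)
w = theta ** 2 * (1 - theta)
w /= w.sum()                    # discrete posterior weights

# Expected quadratic loss for every candidate estimate theta_hat
risk = [(w * (theta - th) ** 2).sum() for th in theta]
theta_mmse = theta[np.argmin(risk)]
posterior_mean = (w * theta).sum()
print(theta_mmse, posterior_mean)   # both close to 0.6
```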
Estimation: Minimum Absolute Error (MAE)
E{L(θ − θ̂) | D} = ∫ |θ − θ̂| p(θ|D) dθ = ∫_{−∞}^{θ̂} (θ̂ − θ) p(θ|D) dθ + ∫_{θ̂}^{∞} (θ − θ̂) p(θ|D) dθ

Differentiating with respect to θ̂ and setting the derivative to zero gives

∫_{−∞}^{θ̂} p(θ|D) dθ = ∫_{θ̂}^{∞} p(θ|D) dθ ⇔ ∫_{−∞}^{θ̂} p(θ|D) dθ = 1/2,

which is the definition of the median of the posterior pdf.
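The same kind of numerical check works for the absolute-value loss: its minimizer lands on the posterior median. As before, the Beta(3, 2)-shaped grid posterior is a made-up illustration.

```python
import numpy as np

# Made-up posterior: Beta(3, 2) shape, discretized on a grid
theta = np.linspace(0.0, 1.0, 2001)
w = theta ** 2 * (1 - theta)
w /= w.sum()

# Expected absolute loss for every candidate estimate theta_hat
risk = [(w * np.abs(theta - th)).sum() for th in theta]
theta_mae = theta[np.argmin(risk)]

# Median: first grid point where the cdf reaches 1/2
cdf = np.cumsum(w)
median = theta[np.searchsorted(cdf, 0.5)]
print(theta_mae, median)
```

For this skewed posterior the median (about 0.614) differs from the mean (0.6), so the MAE and MMSE estimators genuinely disagree.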
Estimation: Maximum A-Posteriori Estimator (MAP)
With the hit-or-miss loss, L(θ − θ̂) = 0 if |θ − θ̂| < δ and 1 if |θ − θ̂| ≥ δ, we have

E{L(θ − θ̂) | D} = ∫_{−∞}^{θ̂−δ} p(θ|D) dθ + ∫_{θ̂+δ}^{∞} p(θ|D) dθ = 1 − ∫_{θ̂−δ}^{θ̂+δ} p(θ|D) dθ.

Minimizing the expected loss thus amounts to maximizing ∫_{θ̂−δ}^{θ̂+δ} p(θ|D) dθ; for small δ, this is achieved by placing θ̂ at the maximum (mode) of the posterior pdf.
Estimation: MAP and Maximum Likelihood (ML) estimators
θ̂_MAP = arg max_θ p(θ|D) = [by Bayes' rule] arg max_θ p(D|θ) p(θ),

where p(D|θ) is the likelihood, and p(θ) is the prior.
θ̂_ML = arg max_θ p(D|θ), and it is equivalent to the MAP estimator under a uniform prior.
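For the Beta–Bernoulli pair these estimators have standard closed forms, which makes the MAP/ML relationship easy to see in code (a sketch; the counts k and n below are made-up values):

```python
def theta_ml(k, n):
    # ML estimate for k successes in n Bernoulli trials
    return k / n

def theta_map(k, n, a, b):
    # MAP estimate with a Beta(a, b) prior: the posterior is
    # Beta(k + a, n - k + b), whose mode is the value below
    return (k + a - 1) / (n + a + b - 2)

k, n = 7, 10                     # made-up data: 7 successes in 10 trials
print(theta_ml(k, n))            # 0.7
print(theta_map(k, n, 1, 1))     # uniform Beta(1,1) prior: same as ML
print(theta_map(k, n, 2, 2))     # informative prior pulls toward 1/2
```

With the uniform Beta(1, 1) prior the MAP and ML estimates coincide exactly, as stated above; any non-uniform prior shrinks the estimate toward the prior's mode.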
Estimation: Summary

Quadratic loss → posterior mean (MMSE); absolute-value loss → posterior median (MAE); hit-or-miss loss → posterior mode (MAP).
Example: Estimating the outcome probability of a Bernoulli experiment
p(y = 1) = θ.
Ber(x|θ) = θ^{1(x)} (1 − θ)^{1 − 1(x)} = { θ for x = 1; 1 − θ for x = 0 },

where 1(x) is equal to one for x = 1 and it is zero for x = 0.
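The Bernoulli pmf above is a one-liner in Python (the function name is my own choice for illustration):

```python
def bernoulli_pmf(x, theta):
    # Ber(x | theta): theta for x = 1, (1 - theta) for x = 0
    return theta if x == 1 else 1 - theta

print(bernoulli_pmf(1, 0.3))   # 0.3
```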