MLE
• D1 = {1,1,1,0,0,0}
• D2 = {1,0,1,0,1,0}
• A function of the data ϕ(Y) is a sufficient statistic if the following is true:
  $$\sum_{i \in D_1} \phi(y_i) = \sum_{i \in D_2} \phi(y_i) \;\Rightarrow\; L(\theta; D_1) = L(\theta; D_2)$$
(C) Dhruv Batra
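The sufficiency condition can be checked numerically on the two datasets above. This is a minimal sketch, assuming each observation is an i.i.d. Bernoulli(θ) draw and that ϕ is the identity, so the statistic is simply the number of ones; the function name `bernoulli_likelihood` is illustrative and not from the slides.

```python
# Assumption: i.i.d. Bernoulli(theta) observations, with phi(y) = y,
# so the sum of phi(y_i) is the number of ones in the dataset.

def bernoulli_likelihood(theta, data):
    """L(theta; D) = product over y in D of theta^y * (1 - theta)^(1 - y)."""
    likelihood = 1.0
    for y in data:
        likelihood *= theta if y == 1 else (1.0 - theta)
    return likelihood

D1 = [1, 1, 1, 0, 0, 0]
D2 = [1, 0, 1, 0, 1, 0]

# Both datasets have the same value of the statistic.
stat_D1 = sum(D1)
stat_D2 = sum(D2)

# The likelihoods should agree (up to floating-point rounding) for any theta.
for theta in (0.2, 0.5, 0.8):
    print(theta, bernoulli_likelihood(theta, D1), bernoulli_likelihood(theta, D2))
```

The order of the observations differs between D1 and D2, but the likelihood depends on the data only through the count of ones, which is what makes that count sufficient under this model.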

MAP for Beta distribution
• MAP: use most likely parameter:
• Beta prior equivalent to extra W/L matches
• As N → ∞, prior is “forgotten”
• But, for small sample size, prior is important!
Slide Credit: Carlos Guestrin
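The effect of the prior at different sample sizes can be sketched as follows. This assumes a coin with a Beta(a, b) prior on its heads probability, so the posterior is Beta(heads + a, tails + b) and its mode (valid when both posterior parameters exceed 1) is the MAP estimate. The prior values and the 70% heads rate are hypothetical choices for illustration.

```python
def mle(heads, tails):
    """Maximum likelihood estimate of the heads probability."""
    return heads / (heads + tails)

def map_estimate(heads, tails, a, b):
    """Mode of the Beta(heads + a, tails + b) posterior.

    Assumes both posterior parameters are greater than 1.
    """
    return (heads + a - 1) / (heads + tails + a + b - 2)

# Hypothetical prior: in the MAP formula, Beta(3, 3) acts like
# a - 1 = 2 extra wins and b - 1 = 2 extra losses.
PRIOR_A, PRIOR_B = 3, 3

def gap(n):
    """Absolute difference between MAP and MLE when 70% of n flips are heads."""
    heads = 7 * n // 10
    tails = n - heads
    return abs(map_estimate(heads, tails, PRIOR_A, PRIOR_B) - mle(heads, tails))

for n in (10, 100, 10_000):
    print(n, gap(n))
```

Under these assumptions the gap between the two estimates shrinks as n grows, which is the sense in which the prior is "forgotten".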

Using Bayesian posterior
• Posterior distribution:
• Bayesian inference:
  – No longer single parameter:
  – Integral is often hard to compute
Slide Credit: Carlos Guestrin
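To make "no longer a single parameter" concrete, the sketch below averages over every value of θ instead of plugging in one estimate. It assumes the Beta-Bernoulli coin model, where the integral happens to have a closed form, and compares that closed form against a simple midpoint-rule approximation. The function names and the example counts are illustrative assumptions.

```python
def predictive_numeric(heads, tails, a, b, steps=10_000):
    """Approximate P(next flip is heads | data) = integral of theta * p(theta | data).

    Uses the midpoint rule on the unnormalized Beta posterior; the step
    width cancels in the ratio, so it is omitted.
    """
    numerator = 0.0
    normalizer = 0.0
    for i in range(steps):
        theta = (i + 0.5) / steps
        weight = theta ** (heads + a - 1) * (1.0 - theta) ** (tails + b - 1)
        numerator += theta * weight
        normalizer += weight
    return numerator / normalizer

def predictive_closed_form(heads, tails, a, b):
    """Mean of the Beta(heads + a, tails + b) posterior."""
    return (heads + a) / (heads + tails + a + b)

# Hypothetical data: 3 heads, 1 tail, with a Beta(2, 2) prior.
print(predictive_numeric(3, 1, 2, 2))
print(predictive_closed_form(3, 1, 2, 2))
```

For models without a conjugate prior, no closed form like `predictive_closed_form` is available, and a numerical approximation of this kind becomes expensive in more than a few dimensions.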

Bayesian learning for multinomial
• What if you have a k-sided coin?
• Likelihood function if categorical:
• Conjugate prior for multinomial is Dirichlet:
• Observe n data points, n_i from assignment i, posterior: (Homework 1!)
• Prediction:
Slide Credit: Carlos Guestrin
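The conjugate update for a k-sided coin can be sketched as below. This relies on the standard result that a Dirichlet(α_1, …, α_k) prior combined with counts n_1, …, n_k gives a Dirichlet(α_1 + n_1, …, α_k + n_k) posterior, and that the predictive probability of outcome i is the posterior mean. The prior and counts are hypothetical, and the function names are not from the slides.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters: add the observed counts to the prior."""
    return [a + n for a, n in zip(alpha, counts)]

def predictive(alpha, counts):
    """P(next outcome = i | data), i.e. the mean of the posterior Dirichlet."""
    posterior = dirichlet_posterior(alpha, counts)
    total = sum(posterior)
    return [p / total for p in posterior]

# Hypothetical 3-sided coin: uniform Dirichlet(1, 1, 1) prior, 10 observations.
alpha = [1, 1, 1]
counts = [5, 3, 2]

print(dirichlet_posterior(alpha, counts))
print(predictive(alpha, counts))
```

As with the Beta prior, the α_i behave like pseudo-counts that are added to the observed counts before normalizing.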

What about continuous variables?
• Boss says: If I want to bet on continuous variables, like stock prices, what can you do for me?
• You say: Let me tell you about Gaussians…

Central Limit Theorem
• Simplest Form
  – X_1, X_2, …, X_N are IID random variables
  – Mean μ, variance σ²
  – The distribution of the sample mean S_N approaches a Gaussian for large N
• Demo
  – http://www.stat.sc.edu/~west/javahtml/CLT.html
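A small simulation can stand in for the demo. This sketch assumes the X_i are Uniform(0, 1) draws (so μ = 0.5 and σ² = 1/12), a choice made here only for illustration. It checks that the sample means cluster around μ with standard deviation close to σ/√N, and that roughly 68% of them fall within one such standard deviation, as a Gaussian would predict.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N = 50          # number of variables averaged in each sample mean
TRIALS = 5_000  # number of sample means to draw

def sample_mean(n):
    """Mean of n independent Uniform(0, 1) draws."""
    return sum(random.random() for _ in range(n)) / n

means = [sample_mean(N) for _ in range(TRIALS)]

mu = 0.5
sigma = math.sqrt(1.0 / 12.0)
predicted_std = sigma / math.sqrt(N)  # standard deviation of S_N

observed_mean = statistics.mean(means)
observed_std = statistics.stdev(means)

# For a Gaussian, about 68.3% of the mass lies within one standard deviation.
within_one_std = sum(abs(m - mu) <= predicted_std for m in means) / TRIALS

print(observed_mean, observed_std, predicted_std, within_one_std)
```

The uniform distribution is far from bell-shaped, yet the averages behave approximately like Gaussian draws at N = 50; other starting distributions with finite variance could be substituted for `random.random()`.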

Curse of Dimensionality
• Consider: Sphere of radius 1 in d-dims
• Consider: an outer ε-shell in this sphere
• What is $\frac{\text{shell volume}}{\text{sphere volume}}$?
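The ratio can be worked out from the fact that the volume of a d-dimensional ball of radius r scales as r^d: the inner ball of radius 1 − ε holds a fraction (1 − ε)^d of the unit ball, so the shell holds 1 − (1 − ε)^d. The sketch below evaluates this for a few dimensions; the value ε = 0.01 and the function name are illustrative choices.

```python
def shell_fraction(eps, d):
    """Fraction of a unit d-ball's volume lying in the outer shell of thickness eps.

    Volume scales as radius**d, so the inner ball of radius (1 - eps)
    holds (1 - eps)**d of the total volume.
    """
    return 1.0 - (1.0 - eps) ** d

EPS = 0.01  # hypothetical shell thickness: 1% of the radius

for d in (1, 2, 10, 100, 1000):
    print(d, shell_fraction(EPS, d))
```

Even for a thin shell, the fraction tends to 1 as d grows, so in high dimensions almost all of the volume sits near the surface.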

Why Gaussians?
• Why does the entire world seem to always be harping on about Gaussians?
  – Central Limit Theorem!
  – They’re easy (and we like easy)
  – Closely related to squared loss (will see in regression)
  – Mixtures of Gaussians are sufficient to approximate many distributions (will see in clustering)

Some properties of Gaussians
• Affine transformation
  – multiplying by scalar and adding a constant
  – X ~ N(μ, σ²)
  – Y = aX + b ⇒ Y ~ N(aμ + b, a²σ²)
• Sum of Independent Gaussians
  – X ~ N(μ_X, σ²_X)
  – Y ~ N(μ_Y, σ²_Y)
  – Z = X + Y ⇒ Z ~ N(μ_X + μ_Y, σ²_X + σ²_Y)
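Both properties can be sanity-checked by sampling. The sketch below compares empirical means and variances against the formulas above; it only checks the first two moments and does not test that the results are themselves Gaussian. All parameter values are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
NUM_SAMPLES = 20_000

# Affine transformation: X ~ N(mu, sigma^2), Y = a*X + b.
mu, sigma = 1.0, 2.0
a, b = 3.0, -1.0
xs = [random.gauss(mu, sigma) for _ in range(NUM_SAMPLES)]
ys = [a * x + b for x in xs]

affine_mean = statistics.mean(ys)      # formula: a*mu + b
affine_var = statistics.pvariance(ys)  # formula: a^2 * sigma^2

# Sum of independent Gaussians: Z = X + Y with separate, independent draws.
mu_x, sigma_x = 1.0, 2.0
mu_y, sigma_y = -2.0, 3.0
zs = [
    random.gauss(mu_x, sigma_x) + random.gauss(mu_y, sigma_y)
    for _ in range(NUM_SAMPLES)
]

sum_mean = statistics.mean(zs)      # formula: mu_x + mu_y
sum_var = statistics.pvariance(zs)  # formula: sigma_x^2 + sigma_y^2

print(affine_mean, a * mu + b)
print(affine_var, a ** 2 * sigma ** 2)
print(sum_mean, mu_x + mu_y)
print(sum_var, sigma_x ** 2 + sigma_y ** 2)
```

Note that `random.gauss` takes the standard deviation, not the variance, as its second argument, which is why σ rather than σ² is passed in.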