Slide #1.

ECE 5424: Introduction to Machine Learning • Topics: – Gaussians – (Linear) Regression • Readings: Barber 8.4, 17.1, 17.2 • Stefan Lee, Virginia Tech


Slide #2.

Administrivia • HW1 – Due tomorrow night by 11:55 pm • Project Proposal – Due: 09/21, 11:55 pm – <= 2 pages, NIPS format (C) Dhruv Batra


Slide #3.

Recap of last time


Slide #4.

Statistical Estimation • Frequentist Tool – Maximum Likelihood • Bayesian Tools – Maximum A Posteriori – Bayesian Estimation


Slide #5.

MLE • D1 = {1,1,1,0,0,0} • D2 = {1,0,1,0,1,0} • A function of the data ϕ(Y) is a sufficient statistic if the following is true: $\sum_{i \in D_1} \phi(y_i) = \sum_{i \in D_2} \phi(y_i) \Rightarrow L(\theta; D_1) = L(\theta; D_2)$


Slide #6.

Beta prior distribution – P(θ) • Demo: – http://demonstrations.wolfram.com/BetaDistribution/ Slide Credit: Carlos Guestrin


Slide #7.

MAP for Beta distribution • MAP: use most likely parameter: • Beta prior equivalent to extra W/L matches • As N → ∞, prior is “forgotten” • But, for small sample size, prior is important! Slide Credit: Carlos Guestrin


Slide #8.

Effect of Prior • Prior = Beta(2,2) – θ_prior = 0.5 • Dataset = {H} – L(θ) = θ – θ_MLE = 1 • Posterior = Beta(3,2) – θ_MAP = (3-1)/(3+2-2) = 2/3
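The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: the helper names `beta_posterior` and `beta_mode` are made up for illustration, and the 700/300 dataset is a hypothetical example added to show the prior being "forgotten" as N grows.

```python
def beta_posterior(alpha, beta, heads, tails):
    """Conjugate update: Beta(alpha, beta) prior plus observed coin-flip counts."""
    return alpha + heads, beta + tails


def beta_mode(alpha, beta):
    """Mode of Beta(alpha, beta); this formula assumes alpha > 1 and beta > 1."""
    return (alpha - 1) / (alpha + beta - 2)


# Slide example: prior Beta(2, 2), dataset = {H}.
a_post, b_post = beta_posterior(2, 2, heads=1, tails=0)   # Beta(3, 2)
theta_map = beta_mode(a_post, b_post)                     # (3 - 1) / (3 + 2 - 2)
theta_mle = 1 / 1                                         # heads / total flips

# Hypothetical larger dataset: the MAP estimate moves close to the MLE of 0.7.
a_big, b_big = beta_posterior(2, 2, heads=700, tails=300)
theta_map_big = beta_mode(a_big, b_big)

print(theta_mle, theta_map, theta_map_big)
```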


Slide #9.

Effect of Prior • Starting from different priors


Slide #10.

Using Bayesian posterior • Posterior distribution: • Bayesian inference: – No longer single parameter: – Integral is often hard to compute Slide Credit: Carlos Guestrin


Slide #11.

Bayesian learning for multinomial • What if you have a k-sided coin? • Likelihood function if categorical: Slide Credit: Carlos Guestrin


Slide #12.

Simplex Slide Credit: Erik Sudderth


Slide #13.

Bayesian learning for multinomial • What if you have a k-sided coin? • Likelihood function if categorical: • Conjugate prior for multinomial is Dirichlet: Slide Credit: Carlos Guestrin


Slide #14.

Dirichlet Probability Densities Mean: Mode:
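The mean and mode formulas did not survive extraction. As a sketch, the standard closed forms for a Dirichlet with parameters α_1, …, α_K are mean_i = α_i / Σ_j α_j and, when every α_i > 1, mode_i = (α_i − 1) / (Σ_j α_j − K). The function names and the example parameters below are illustrative assumptions, not taken from the slide.

```python
def dirichlet_mean(alpha):
    """Mean of Dirichlet(alpha): each alpha_i divided by the sum of all alphas."""
    total = sum(alpha)
    return [a / total for a in alpha]


def dirichlet_mode(alpha):
    """Mode of Dirichlet(alpha); this closed form requires every alpha_i > 1."""
    if any(a <= 1 for a in alpha):
        raise ValueError("closed-form mode needs all alpha_i > 1")
    k = len(alpha)
    total = sum(alpha)
    return [(a - 1) / (total - k) for a in alpha]


alpha = [2.0, 3.0, 5.0]          # hypothetical parameters for a 3-sided coin
mean = dirichlet_mean(alpha)     # proportional to (2, 3, 5)
mode = dirichlet_mode(alpha)     # proportional to (1, 2, 4)
print(mean, mode)
```

Both vectors sum to 1, so each is a point on the simplex.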


Slide #15.

Dirichlet Probability Densities • Matlab Demo – Written by Iyad Obeid


Slide #16.

Dirichlet Samples Slide Credit: Erik Sudderth


Slide #17.

Bayesian learning for multinomial • What if you have a k-sided coin? • Likelihood function if categorical: • Conjugate prior for multinomial is Dirichlet: • Observe n data points, n_i from assignment i, posterior: Homework 1! • Prediction:
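A minimal numerical sketch of the conjugate update, assuming the standard result that a Dirichlet(α) prior combined with counts n_1, …, n_k gives a Dirichlet(α + n) posterior, and that the predictive probability of outcome i is (α_i + n_i) / Σ_j (α_j + n_j). The data, the uniform prior, and the function names are hypothetical; the derivation itself is left to the homework.

```python
from collections import Counter


def dirichlet_posterior(alpha, counts):
    """Add the observed count of each outcome to the matching prior parameter."""
    return [a + n for a, n in zip(alpha, counts)]


def predictive(alpha_post):
    """Probability of each outcome on the next draw (the posterior mean)."""
    total = sum(alpha_post)
    return [a / total for a in alpha_post]


k = 3
rolls = [0, 2, 2, 1, 2, 0]                 # hypothetical outcomes of a 3-sided coin
tally = Counter(rolls)
counts = [tally[i] for i in range(k)]      # n_i for each outcome i

alpha_prior = [1.0, 1.0, 1.0]              # assumed uniform prior over the simplex
alpha_post = dirichlet_posterior(alpha_prior, counts)
p_next = predictive(alpha_post)
print(counts, alpha_post, p_next)
```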


Slide #18.

Plan for Today • Gaussians – PDF – MLE/MAP estimation of mean • Regression – Linear Regression – Connections with Gaussians


Slide #19.

Gaussians


Slide #20.

What about continuous variables? • Boss says: If I want to bet on continuous variables, like stock prices, what can you do for me? • You say: Let me tell you about Gaussians…


Slide #21.

Why Gaussians? • Why does the entire world seem to always be telling you about Gaussians? – Central Limit Theorem!


Slide #22.

Central Limit Theorem • Simplest Form – X_1, X_2, …, X_N are IID random variables – Mean μ, variance σ² – Sample mean S_N approaches Gaussian for large N • Demo – http://www.stat.sc.edu/~west/javahtml/CLT.html
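A small simulation can stand in for the demo. The sketch below assumes Uniform(0, 1) draws (mean 1/2, variance 1/12) and N = 50; both choices, along with the function names, are illustrative. It checks that the sample mean has roughly the variance σ²/N predicted by the theorem, and that about 68% of sample means fall within one standard deviation of μ, as they would for a Gaussian.

```python
import random
import statistics


def draw_uniform(rng):
    """One draw from Uniform(0, 1): mean 1/2, variance 1/12."""
    return rng.random()


def sample_means(draw, n, trials, rng):
    """Repeat `trials` times: average n IID draws and record the sample mean."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


rng = random.Random(0)                     # fixed seed so the run is repeatable
n = 50
means = sample_means(draw_uniform, n=n, trials=4000, rng=rng)

emp_mean = statistics.fmean(means)         # should be close to 1/2
emp_var = statistics.pvariance(means)      # should be close to (1/12) / n
clt_var = (1 / 12) / n

# Fraction of sample means within one predicted standard deviation of 1/2.
sd = clt_var ** 0.5
within_1sd = sum(abs(m - 0.5) <= sd for m in means) / len(means)
print(emp_mean, emp_var, clt_var, within_1sd)
```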


Slide #23.

Curse of Dimensionality • Consider: Sphere of radius 1 in d-dims • Consider: an outer ε-shell in this sphere • What is $\frac{\text{shell volume}}{\text{sphere volume}}$?
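The ratio has a closed form. The volume of a d-dimensional ball scales as r^d, so the inner ball of radius 1 − ε holds a fraction (1 − ε)^d of the total and the shell holds 1 − (1 − ε)^d. The sketch below evaluates that for a few dimensions; ε = 0.01 and the function name are illustrative choices.

```python
def shell_fraction(eps, d):
    """Fraction of a unit d-ball's volume lying in the outer shell of thickness eps."""
    return 1.0 - (1.0 - eps) ** d


# With a thin 1% shell, the fraction climbs toward 1 as the dimension grows.
for d in (1, 2, 10, 100, 1000):
    print(d, shell_fraction(0.01, d))
```

In high dimensions almost all of the volume sits in the thin outer shell.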


Slide #24.

Image Credit: http://en.wikipedia.org/wiki/Bean_machine


Slide #25.

Why Gaussians? • Why does the entire world seem to always be harping on about Gaussians? – Central Limit Theorem! – They’re easy (and we like easy) – Closely related to squared loss (will see in regression) – Mixtures of Gaussians are sufficient to approximate many distributions (will see in clustering)


Slide #26.

Some properties of Gaussians • Affine transformation – multiplying by scalar and adding a constant – $X \sim N(\mu, \sigma^2)$ – $Y = aX + b \Rightarrow Y \sim N(a\mu + b, a^2\sigma^2)$ • Sum of Independent Gaussians – $X \sim N(\mu_X, \sigma_X^2)$ – $Y \sim N(\mu_Y, \sigma_Y^2)$ – $Z = X + Y \Rightarrow Z \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$
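Both properties can be sanity-checked by sampling. The sketch below uses arbitrary illustrative parameters (none come from the slides) and compares empirical means and variances with the predicted ones. It checks only the first two moments, not that the result is itself Gaussian.

```python
import random
import statistics

rng = random.Random(1)                 # fixed seed for repeatability
n = 50_000
mu_x, sigma_x = 2.0, 3.0               # X ~ N(2, 3^2), illustrative values
mu_y, sigma_y = -1.0, 4.0              # Y ~ N(-1, 4^2), drawn independently of X
a, b = 0.5, 7.0                        # affine map aX + b

xs = [rng.gauss(mu_x, sigma_x) for _ in range(n)]
ys = [rng.gauss(mu_y, sigma_y) for _ in range(n)]

# Affine transformation: expect mean a*mu_x + b = 8.0, variance a^2 * sigma_x^2 = 2.25.
affine = [a * x + b for x in xs]
affine_mean = statistics.fmean(affine)
affine_var = statistics.pvariance(affine)

# Sum of independent Gaussians: expect mean 1.0, variance 9 + 16 = 25.
total = [x + y for x, y in zip(xs, ys)]
sum_mean = statistics.fmean(total)
sum_var = statistics.pvariance(total)

print(affine_mean, affine_var, sum_mean, sum_var)
```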


Slide #27.

Learning a Gaussian • Collect a bunch of data – Hopefully, i.i.d. samples – e.g., exam scores • Learn parameters – Mean – Variance


Slide #28.

MLE for Gaussian • Prob. of i.i.d. samples D = {x_1, …, x_N}: • Log-likelihood of data: Slide Credit: Carlos Guestrin
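The two expressions were images on the original slide. Assuming the standard univariate Gaussian density with mean μ and standard deviation σ, they would read as follows (a reconstruction, not a copy of the slide):

```latex
\begin{align*}
P(D \mid \mu, \sigma)
  &= \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}}
     \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) \\
\ln P(D \mid \mu, \sigma)
  &= -N \ln\!\left(\sigma\sqrt{2\pi}\right)
     - \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2}
\end{align*}
```

The product over samples follows from the i.i.d. assumption, and taking the log turns it into a sum.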


Slide #29.

Your second learning algorithm: MLE for mean of a Gaussian • What’s MLE for mean? Slide Credit: Carlos Guestrin


Slide #30.

MLE for variance • Again, set derivative to zero: Slide Credit: Carlos Guestrin
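The derivation itself is missing from the extracted text. A sketch of the usual steps, assuming the standard Gaussian log-likelihood for i.i.d. samples x_1, …, x_N (restated on the first line), is:

```latex
\begin{align*}
\ln P(D \mid \mu, \sigma)
  &= -N \ln\!\left(\sigma\sqrt{2\pi}\right)
     - \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2} \\
\frac{\partial}{\partial \mu} \ln P(D \mid \mu, \sigma)
  &= \sum_{i=1}^{N} \frac{x_i - \mu}{\sigma^2} = 0
  \;\Rightarrow\;
  \hat{\mu}_{MLE} = \frac{1}{N} \sum_{i=1}^{N} x_i \\
\frac{\partial}{\partial \sigma} \ln P(D \mid \mu, \sigma)
  &= -\frac{N}{\sigma} + \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{\sigma^3} = 0
  \;\Rightarrow\;
  \hat{\sigma}^2_{MLE} = \frac{1}{N} \sum_{i=1}^{N} \left(x_i - \hat{\mu}_{MLE}\right)^2
\end{align*}
```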


Slide #31.

Learning Gaussian parameters • MLE:
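As a sketch, the MLE for a Gaussian amounts to the sample mean and the average squared deviation (dividing by N). The exam scores below are made-up numbers and `gaussian_mle` is an illustrative name. The last lines compare against the N − 1 version, since the MLE variance is known to be biased low.

```python
import statistics


def gaussian_mle(xs):
    """Return the maximum-likelihood mean and variance (variance divides by N)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var


scores = [72.0, 85.0, 90.0, 66.0, 77.0]    # hypothetical exam scores
mu_hat, var_hat = gaussian_mle(scores)

# Rescaling by N / (N - 1) gives the unbiased estimate that statistics.variance uses.
unbiased_var = var_hat * len(scores) / (len(scores) - 1)
print(mu_hat, var_hat, unbiased_var, statistics.variance(scores))
```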


Slide #32.

Bayesian learning of Gaussian parameters • Conjugate priors – Mean: Gaussian prior – Variance: Inverse Gamma or Wishart Distribution • Prior for mean: Slide Credit: Carlos Guestrin


Slide #33.

MAP for mean of Gaussian Slide Credit: Carlos Guestrin
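The formula on this slide was lost in extraction. A hedged sketch, assuming a known data variance σ² and a Gaussian prior on the mean with mean `prior_mean` and variance `prior_var` (names chosen here, not taken from the slide): setting the derivative of the log-posterior to zero gives a precision-weighted average of the data and the prior mean.

```python
def map_mean(xs, sigma2, prior_mean, prior_var):
    """MAP estimate of a Gaussian mean under a Gaussian prior, with known variance.

    mu_MAP = (sum(x_i) / sigma2 + prior_mean / prior_var) / (N / sigma2 + 1 / prior_var)
    """
    n = len(xs)
    precision = n / sigma2 + 1.0 / prior_var
    return (sum(xs) / sigma2 + prior_mean / prior_var) / precision


xs = [4.0, 6.0]                                   # hypothetical data, MLE mean = 5.0

# A prior centred at 0 with unit variance pulls the estimate toward 0.
shrunk = map_mean(xs, sigma2=1.0, prior_mean=0.0, prior_var=1.0)

# A very broad prior carries almost no information, so MAP is close to the MLE.
nearly_mle = map_mean(xs, sigma2=1.0, prior_mean=0.0, prior_var=1e9)
print(shrunk, nearly_mle)
```

As with the Beta prior earlier, the influence of the prior shrinks as N grows, because the N / σ² term comes to dominate the precision.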


Slide #34.

New Topic: Regression


Slide #35.

1-NN for Regression • Often bumpy (overfits) Figure Credit: Andrew Moore
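A minimal one-dimensional sketch of 1-NN regression shows where the bumpiness comes from: the prediction is the target of the single closest training point, so the fitted function is piecewise constant and copies any noise in the labels. The data and the function name are hypothetical.

```python
def one_nn_predict(train_x, train_y, query):
    """Return the target of the training point closest to `query` (1-D inputs)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[nearest]


train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.1, 0.9, 2.3, 2.8]       # hypothetical noisy samples of roughly y = x

# The prediction jumps wherever the nearest training point changes.
queries = (0.4, 0.6, 1.4, 1.6)
preds = [one_nn_predict(train_x, train_y, q) for q in queries]
print(preds)
```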


Slide #36.

Slide Credit: Greg Shakhnarovich


Slide #37.

Slide Credit: Greg Shakhnarovich


Slide #38.

Slide Credit: Greg Shakhnarovich


Slide #39.

Linear Regression • Demo – http://hspm.sph.sc.edu/courses/J716/demos/LeastSquares/LeastSquaresDemo.html
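For readers without access to the demo, here is a sketch of a least-squares line fit in one dimension, using the closed-form solution for slope and intercept. The data and the name `fit_line` are illustrative. Minimizing squared error this way gives the same answer as maximum likelihood under Gaussian noise on y, which is the connection with Gaussians mentioned in the plan for today.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for 1-D inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error(xs, ys, slope, intercept):
    """Sum of squared residuals, the quantity least squares minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]            # hypothetical noise-free points on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept, squared_error(xs, ys, slope, intercept))
```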


Slide #40.

Slide Credit: Greg Shakhnarovich


Slide #41.

Slide Credit: Greg Shakhnarovich


Slide #42.

Slide Credit: Greg Shakhnarovich