## Slide #1.

ECE 5424: Introduction to Machine Learning

Topics:
- Gaussians
- (Linear) Regression

Readings: Barber 8.4, 17.1, 17.2

Stefan Lee, Virginia Tech

## Slide #2.

Administrativia
- HW1
  - Due tomorrow night by 11:55 pm
- Project Proposal
  - Due: 09/21, 11:55 pm
  - ≤ 2 pages, NIPS format

(C) Dhruv Batra

## Slide #3.

Recap of last time

## Slide #4.

Statistical Estimation
- Frequentist Tool
  - Maximum Likelihood
- Bayesian Tools
  - Maximum A Posteriori
  - Bayesian Estimation

## Slide #5.

MLE
- $D_1 = \{1,1,1,0,0,0\}$
- $D_2 = \{1,0,1,0,1,0\}$
- A function of the data $\phi(Y)$ is a sufficient statistic if the following is true:

$$\sum_{i \in D_1} \phi(y_i) = \sum_{i \in D_2} \phi(y_i) \;\Rightarrow\; L(\theta; D_1) = L(\theta; D_2)$$
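
A minimal sketch of the sufficient-statistic idea for the two coin-flip datasets above, assuming a Bernoulli model with $P(y = 1) = \theta$. The function name `bernoulli_likelihood` is illustrative and not from the slides. Both datasets contain three ones and three zeros, so the count of ones acts as the statistic and the likelihoods agree for every $\theta$.

```python
import math

def bernoulli_likelihood(theta, data):
    """Likelihood of i.i.d. binary outcomes under P(y = 1) = theta."""
    ones = sum(data)              # the statistic: number of ones
    zeros = len(data) - ones
    return theta ** ones * (1 - theta) ** zeros

D1 = [1, 1, 1, 0, 0, 0]
D2 = [1, 0, 1, 0, 1, 0]

# Same number of ones in both datasets, so the likelihoods should match
# regardless of the order in which the outcomes were observed.
for theta in (0.1, 0.5, 0.9):
    l1 = bernoulli_likelihood(theta, D1)
    l2 = bernoulli_likelihood(theta, D2)
    print(theta, l1, l2, math.isclose(l1, l2))
```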

## Slide #6.

Beta prior distribution: $P(\theta)$
- Demo: http://demonstrations.wolfram.com/BetaDistribution/

Slide Credit: Carlos Guestrin

## Slide #7.

MAP for Beta distribution
- MAP: use the most likely parameter, $\hat{\theta}_{MAP} = \arg\max_{\theta} P(\theta \mid D)$
- Beta prior equivalent to extra W/L matches
- As N → ∞, prior is “forgotten”
- But, for small sample size, prior is important!

Slide Credit: Carlos Guestrin

## Slide #8.

Effect of Prior
- Prior = Beta(2,2)
  - $\theta_{prior} = 0.5$
- Dataset = {H}
  - $L(\theta) = \theta$
  - $\theta_{MLE} = 1$
- Posterior = Beta(3,2)
  - $\theta_{MAP} = (3-1)/(3+2-2) = 2/3$
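
The arithmetic on this slide can be checked with a short sketch. It assumes a Beta(a, b) prior on the heads probability and uses the mode of the Beta posterior as the MAP estimate. The helper names `beta_map` and `mle` are hypothetical. The second call uses made-up counts to illustrate how the prior matters less as the sample grows.

```python
def beta_map(heads, tails, a, b):
    """Mode of the Beta(a + heads, b + tails) posterior.

    The mode formula (alpha - 1) / (alpha + beta - 2) is valid only when
    both posterior parameters exceed 1.
    """
    post_a, post_b = a + heads, b + tails
    return (post_a - 1) / (post_a + post_b - 2)

def mle(heads, tails):
    """Maximum-likelihood estimate of the heads probability."""
    return heads / (heads + tails)

# Slide example: prior Beta(2, 2), dataset {H}.
print("one flip:  MLE =", mle(1, 0), " MAP =", beta_map(1, 0, 2, 2))

# Made-up larger sample: the same prior now barely moves the estimate.
print("1000 flips: MLE =", mle(700, 300), " MAP =", beta_map(700, 300, 2, 2))
```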

## Slide #9.

Effect of Prior

Starting from different priors

## Slide #10.

Using Bayesian posterior
- Posterior distribution:
- Bayesian inference:
  - No longer single parameter:
  - Integral is often hard to compute

Slide Credit: Carlos Guestrin
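
As a hedged illustration of Bayesian inference with a full posterior, the sketch below integrates over $\theta$ instead of plugging in a single estimate. It assumes a Beta(3, 2) posterior over a coin's heads probability and approximates $\int f(\theta)\, P(\theta \mid D)\, d\theta$ with a midpoint rule. The names `beta_pdf` and `posterior_expectation` are illustrative. In this conjugate case the integral has a closed form, a / (a + b), which is what makes the numerical answer checkable; in general it does not.

```python
import math

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm
                    + (a - 1) * math.log(theta)
                    + (b - 1) * math.log(1 - theta))

def posterior_expectation(f, a, b, steps=10_000):
    """Midpoint-rule approximation of E[f(theta)] under Beta(a, b)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h      # midpoint of the k-th sub-interval
        total += f(theta) * beta_pdf(theta, a, b)
    return total * h

a, b = 3, 2
# Probability that the next flip is heads, averaged over the posterior.
p_heads = posterior_expectation(lambda t: t, a, b)
print("numerical:", p_heads, " closed form:", a / (a + b))
```

The predictive probability averages over all values of $\theta$, so it need not equal the posterior mode.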

## Slide #11.

Bayesian learning for multinomial
- What if you have a k-sided coin???
- Likelihood function if categorical:

Slide Credit: Carlos Guestrin

## Slide #12.

Simplex

Slide Credit: Erik Sudderth

## Slide #13.

Bayesian learning for multinomial
- What if you have a k-sided coin???
- Likelihood function if categorical:
- Conjugate prior for multinomial is Dirichlet:

Slide Credit: Carlos Guestrin

## Slide #14.

Dirichlet Probability Densities
- Mean:
- Mode:
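
For reference, the standard mean and mode of a Dirichlet distribution with parameters $\alpha_1, \dots, \alpha_K$ are given below. The notation is assumed here rather than taken from the slide, and the mode expression holds only when every $\alpha_j > 1$.

$$
\mathbb{E}[\theta_i] = \frac{\alpha_i}{\sum_{j=1}^{K} \alpha_j},
\qquad
\operatorname{mode}(\theta_i) = \frac{\alpha_i - 1}{\sum_{j=1}^{K} \alpha_j - K}
$$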

## Slide #15.

Dirichlet Probability Densities
- Matlab Demo
  - Written by Iyad Obeid

## Slide #16.

Dirichlet Samples

Slide Credit: Erik Sudderth

## Slide #17.

Bayesian learning for multinomial
- What if you have a k-sided coin???
- Likelihood function if categorical:
- Conjugate prior for multinomial is Dirichlet:
- Observe n data points, $n_i$ from assignment i, posterior: Homework 1!!!!
- Prediction:
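
A small sketch of the standard Dirichlet-categorical conjugate update, under the assumption that the prior is Dirichlet($\alpha_1, \dots, \alpha_k$) and that $n_i$ counts the observations of outcome i. The posterior is then Dirichlet($\alpha_1 + n_1, \dots, \alpha_k + n_k$), and the predictive probability of outcome i is $(\alpha_i + n_i) / (\sum_j \alpha_j + n)$. The function names and the example counts are made up for illustration.

```python
def dirichlet_posterior(alpha, counts):
    """Add observed counts to the prior pseudo-counts."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + n for a, n in zip(alpha, counts)]

def predictive(alpha_post):
    """Probability of each outcome on the next draw (posterior mean)."""
    total = sum(alpha_post)
    return [a / total for a in alpha_post]

alpha = [1.0, 1.0, 1.0]   # uniform prior over a 3-sided "coin"
counts = [3, 0, 1]        # hypothetical observations per side

alpha_post = dirichlet_posterior(alpha, counts)
probs = predictive(alpha_post)
print(alpha_post, probs)
```

Note that the side never observed still receives non-zero predictive probability, because the prior contributes one pseudo-count to it.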

## Slide #18.

Plan for Today
- Gaussians
  - PDF
  - MLE/MAP estimation of mean
- Regression
  - Linear Regression
  - Connections with Gaussians

## Slide #19.

Gaussians

## Slide #20.

What about continuous variables?
- Boss says: If I want to bet on continuous variables, like stock prices, what can you do for me?
- You say: Let me tell you about Gaussians…

## Slide #21.

Why Gaussians?
- Why does the entire world seem to always be telling you about Gaussians?
  - Central Limit Theorem!

## Slide #22.

Central Limit Theorem
- Simplest Form
  - $X_1, X_2, \dots, X_N$ are IID random variables
  - Mean $\mu$, variance $\sigma^2$
  - Sample mean $S_N$ approaches Gaussian for large N
- Demo
  - http://www.stat.sc.edu/~west/javahtml/CLT.html
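
The Central Limit Theorem statement can be explored with a quick simulation. This is a sketch under stated assumptions, not a proof: the draws are Uniform(0, 1), and the sample size, trial count, and seed are arbitrary choices. It compares the spread of the sample means with the Gaussian prediction $N(\mu, \sigma^2 / N)$, where $\mu = 1/2$ and $\sigma^2 = 1/12$ for this distribution.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Return `trials` sample means, each over `n` Uniform(0, 1) draws."""
    return [statistics.fmean(rng.random() for _ in range(n))
            for _ in range(trials)]

rng = random.Random(0)    # fixed seed so the run is repeatable
n = 50
means = sample_means(n, trials=5000, rng=rng)

# Theory: the sample mean is approximately N(1/2, 1 / (12 n)) for large n.
predicted_std = (1 / (12 * n)) ** 0.5
observed_mean = statistics.fmean(means)
observed_std = statistics.pstdev(means)

# A Gaussian puts roughly 68% of its mass within one standard deviation.
within_one_std = sum(abs(m - 0.5) <= predicted_std for m in means) / len(means)

print(f"mean of sample means: {observed_mean:.4f} (theory 0.5)")
print(f"std of sample means:  {observed_std:.4f} (theory {predicted_std:.4f})")
print(f"fraction within one std: {within_one_std:.3f}")
```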

## Slide #23.

Curse of Dimensionality
- Consider: Sphere of radius 1 in d-dims
- Consider: an outer ε-shell in this sphere
- What is $\dfrac{\text{shell volume}}{\text{sphere volume}}$?
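
One way to answer the question: the volume of a d-dimensional ball of radius r scales as $r^d$, so the inner ball of radius $1 - \varepsilon$ holds a fraction $(1 - \varepsilon)^d$ of the unit ball and the shell holds the rest. The sketch below evaluates that ratio. The function name `shell_fraction` and the chosen values of ε and d are illustrative.

```python
def shell_fraction(eps, d):
    """Fraction of a unit d-ball's volume lying within eps of its surface."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    # Volume scales as r**d, so the inner ball keeps (1 - eps)**d of it.
    return 1.0 - (1.0 - eps) ** d

for d in (1, 2, 10, 100, 1000):
    print(d, shell_fraction(0.1, d))
```

Even for a thin shell, the fraction approaches 1 as d grows, so in high dimensions almost all of the volume sits near the surface.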

## Slide #24.

Image Credit: http://en.wikipedia.org/wiki/Bean_machine

## Slide #25.

Why Gaussians?
- Why does the entire world seem to always be harping on about Gaussians?
  - Central Limit Theorem!
  - They’re easy (and we like easy)
  - Closely related to squared loss (will see in regression)
  - Mixtures of Gaussians are sufficient to approximate many distributions (will see in clustering)

## Slide #26.

Some properties of Gaussians
- Affine transformation (multiplying by a scalar and adding a constant)
  - $X \sim N(\mu, \sigma^2)$
  - $Y = aX + b \;\Rightarrow\; Y \sim N(a\mu + b,\, a^2\sigma^2)$
- Sum of independent Gaussians
  - $X \sim N(\mu_X, \sigma_X^2)$
  - $Y \sim N(\mu_Y, \sigma_Y^2)$
  - $Z = X + Y \;\Rightarrow\; Z \sim N(\mu_X + \mu_Y,\, \sigma_X^2 + \sigma_Y^2)$
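
A rough numerical sanity check of the two properties, using simulated draws. All parameter values, the seed, and the variable names are arbitrary assumptions. The second Gaussian is called `w` here to keep it distinct from the affine output `y`. The check compares sample means and variances with the predicted ones and does not test Gaussianity itself.

```python
import random
import statistics

rng = random.Random(1)          # fixed seed for repeatability
n = 20_000
mu_x, sigma_x = 2.0, 3.0        # X ~ N(2, 3^2)
mu_w, sigma_w = -1.0, 4.0       # W ~ N(-1, 4^2), independent of X
a, b = 0.5, 1.0

xs = [rng.gauss(mu_x, sigma_x) for _ in range(n)]
ws = [rng.gauss(mu_w, sigma_w) for _ in range(n)]

ys = [a * x + b for x in xs]            # affine transformation of X
zs = [x + w for x, w in zip(xs, ws)]    # sum of independent Gaussians

y_mean, y_var = statistics.fmean(ys), statistics.pvariance(ys)
z_mean, z_var = statistics.fmean(zs), statistics.pvariance(zs)

print("Y: sample", (y_mean, y_var),
      "predicted", (a * mu_x + b, a ** 2 * sigma_x ** 2))
print("Z: sample", (z_mean, z_var),
      "predicted", (mu_x + mu_w, sigma_x ** 2 + sigma_w ** 2))
```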

## Slide #27.

Learning a Gaussian
- Collect a bunch of data
  - Hopefully, i.i.d. samples
  - e.g., exam scores
- Learn parameters
  - Mean
  - Variance

## Slide #28.

MLE for Gaussian
- Prob. of i.i.d. samples $D = \{x_1, \dots, x_N\}$:
- Log-likelihood of data:

Slide Credit: Carlos Guestrin
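
For a univariate Gaussian with mean $\mu$ and standard deviation $\sigma$ (symbols assumed here, following the usual convention), the two quantities named above take the standard form:

$$
P(D \mid \mu, \sigma) = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)
$$

$$
\ln P(D \mid \mu, \sigma) = -N \ln\!\left(\sigma\sqrt{2\pi}\right) - \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2}
$$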

## Slide #29.

Your second learning algorithm: MLE for mean of a Gaussian
- What’s MLE for mean?

Slide Credit: Carlos Guestrin
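
A sketch of the standard derivation, assuming the univariate Gaussian log-likelihood $\ln P(D \mid \mu, \sigma) = -N \ln(\sigma\sqrt{2\pi}) - \sum_{i=1}^{N} (x_i - \mu)^2 / (2\sigma^2)$. Setting the derivative with respect to $\mu$ to zero gives the sample mean:

$$
\frac{\partial}{\partial \mu} \ln P(D \mid \mu, \sigma) = \sum_{i=1}^{N} \frac{x_i - \mu}{\sigma^2} = 0
\;\Rightarrow\;
\hat{\mu}_{MLE} = \frac{1}{N} \sum_{i=1}^{N} x_i
$$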

## Slide #30.

MLE for variance
- Again, set derivative to zero:

Slide Credit: Carlos Guestrin
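
A sketch of the corresponding step for the variance, again assuming the univariate Gaussian log-likelihood $\ln P(D \mid \mu, \sigma) = -N \ln(\sigma\sqrt{2\pi}) - \sum_{i=1}^{N} (x_i - \mu)^2 / (2\sigma^2)$ and writing $\hat{\mu}$ for the sample mean:

$$
\frac{\partial}{\partial \sigma} \ln P(D \mid \mu, \sigma) = -\frac{N}{\sigma} + \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{\sigma^3} = 0
\;\Rightarrow\;
\hat{\sigma}^2_{MLE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})^2
$$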

## Slide #31.

Learning Gaussian parameters
- MLE:
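
A minimal sketch of the MLE computation for a univariate Gaussian: the estimated mean is the sample mean, and the estimated variance is the average squared deviation, dividing by N. The function name `gaussian_mle` and the exam scores are made up for illustration.

```python
def gaussian_mle(xs):
    """Return (mean, variance) maximizing the Gaussian likelihood of xs."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    mu = sum(xs) / n
    # The MLE divides by n, not n - 1.
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

scores = [72.0, 85.0, 90.0, 66.0, 77.0]   # hypothetical exam scores
mu_hat, var_hat = gaussian_mle(scores)
print("mean:", mu_hat, "variance:", var_hat)
```

Dividing by N makes the variance estimate biased on average; the common unbiased alternative divides by N - 1 instead.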

## Slide #32.

Bayesian learning of Gaussian parameters
- Conjugate priors
  - Mean: Gaussian prior
  - Variance: Inverse Gamma (for a covariance matrix, inverse Wishart; for a precision matrix, Wishart)
- Prior for mean:

Slide Credit: Carlos Guestrin

## Slide #33.

MAP for mean of Gaussian

Slide Credit: Carlos Guestrin
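
A hedged sketch of the MAP estimate for the mean, assuming the data variance $\sigma^2$ is known and the prior on the mean is Gaussian, $\mu \sim N(\eta, \lambda^2)$. The symbols $\eta$ and $\lambda$ and the function name `map_mean` are assumptions made here. Under these assumptions the standard result is $\hat{\mu}_{MAP} = \left(\sum_i x_i / \sigma^2 + \eta / \lambda^2\right) / \left(N / \sigma^2 + 1 / \lambda^2\right)$, a precision-weighted blend of the data and the prior mean.

```python
def map_mean(xs, sigma2, prior_mean, prior_var):
    """MAP estimate of a Gaussian mean with known variance sigma2.

    The prior on the mean is N(prior_mean, prior_var).
    """
    n = len(xs)
    # Precisions (inverse variances) of the data term and the prior add up.
    precision = n / sigma2 + 1.0 / prior_var
    return (sum(xs) / sigma2 + prior_mean / prior_var) / precision

data = [2.0, 4.0]   # made-up observations
print(map_mean(data, sigma2=1.0, prior_mean=0.0, prior_var=1.0))
print(map_mean(data, sigma2=1.0, prior_mean=0.0, prior_var=1e6))
```

With a very broad prior the estimate approaches the sample mean, and with no data it falls back to the prior mean.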

## Slide #34.

New Topic: Regression

## Slide #35.

1-NN for Regression
- Often bumpy (overfits)

Figure Credit: Andrew Moore
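
A minimal sketch of 1-nearest-neighbour regression on one-dimensional inputs, assuming absolute difference as the distance. The training points are made up. The prediction is the target of the single closest training point, so the fitted function is piecewise constant and jumps wherever the nearest neighbour changes, which is one way to see the bumpiness.

```python
def one_nn_predict(train_x, train_y, query):
    """Return the target of the training point closest to `query`."""
    if not train_x:
        raise ValueError("training set is empty")
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[nearest]

train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.1, 0.9, 4.2, 8.8]   # hypothetical noisy targets

for q in (0.4, 0.6, 1.4, 1.6, 2.9):
    print(q, one_nn_predict(train_x, train_y, q))
```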

## Slide #36.

Slide Credit: Greg Shakhnarovich

## Slide #37.

Slide Credit: Greg Shakhnarovich

## Slide #38.

Slide Credit: Greg Shakhnarovich

## Slide #39.

Linear Regression
- Demo
  - http://hspm.sph.sc.edu/courses/J716/demos/LeastSquares/LeastSquaresDemo.html
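
A minimal sketch of least-squares linear regression with a single input, using the closed-form slope and intercept that minimize the sum of squared residuals. The function name `fit_line` and the data points are made up. If the noise on the targets is assumed to be i.i.d. Gaussian, maximizing the likelihood leads to this same squared-error objective, which is the connection with Gaussians.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]   # hypothetical noisy samples of a line

slope, intercept = fit_line(xs, ys)
print(f"y ≈ {slope:.3f} x + {intercept:.3f}")
```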

## Slide #40.

Slide Credit: Greg Shakhnarovich

## Slide #41.

Slide Credit: Greg Shakhnarovich

## Slide #42.

Slide Credit: Greg Shakhnarovich