Bayesian learning of Gaussian parameters

• Conjugate priors
  – Mean: Gaussian prior
  – Variance: Inverse Gamma or Wishart distribution
• Prior for mean: $\mu \sim \mathcal{N}(\eta, \sigma^2)$

(C) Dhruv Batra. Slide credit: Carlos Guestrin
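A minimal R sketch of the conjugate update for the mean (all numbers are illustrative, not from the slide): with known data variance and a Gaussian prior on the mean, the posterior is again Gaussian, with a precision-weighted mean.

```r
# Conjugate update for the mean of a Gaussian with known variance sigma2.
# Prior: mu ~ N(mu0, tau2). All values below are assumed for illustration.
set.seed(1)
sigma2 <- 4                      # known data variance
mu0 <- 0; tau2 <- 1              # prior mean and variance
x <- rnorm(50, mean = 2, sd = sqrt(sigma2))
n <- length(x)

# Posterior precision adds; posterior mean is a precision-weighted average
post_var  <- 1 / (1 / tau2 + n / sigma2)
post_mean <- post_var * (mu0 / tau2 + n * mean(x) / sigma2)
c(post_mean = post_mean, post_var = post_var)
```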
Priors and Posteriors

• The Achilles heel of Bayesian econometrics
• Noninformative and informative priors for estimation of parameters
  – Noninformative (diffuse) priors: how to incorporate a total lack of prior belief into the Bayesian estimator. The estimator becomes solely a function of the likelihood.
  – Informative prior: some prior information enters the estimator. The estimator mixes the information in the likelihood with the prior information.
• Improper and proper priors
  – P(θ) is uniform over the allowable range of θ.
  – It cannot integrate to 1.0 if the range is infinite.
  – Salvation: improper but noninformative priors will fall out of the posterior.
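A small R sketch of the "solely a function of the likelihood" point (data and grid are illustrative): under a flat prior, the posterior is proportional to the likelihood, so the posterior mode coincides with the MLE.

```r
# Grid posterior for a normal mean under a flat prior (illustrative data)
set.seed(6)
x <- rnorm(30, mean = 1)
theta <- seq(-2, 4, length.out = 1001)
loglik <- sapply(theta, function(m) sum(dnorm(x, m, 1, log = TRUE)))
post <- exp(loglik - max(loglik)); post <- post / sum(post)
theta[which.max(post)]   # posterior mode under a flat prior...
mean(x)                  # ...matches the MLE (the sample mean)
```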
Mixing Prior and Sample Information

A typical result (exact for sampling from the normal distribution with known variance):

$$\text{Posterior mean} = w \cdot \text{Prior mean} + (1-w) \cdot \text{MLE} = w \cdot (\text{Prior mean} - \text{MLE}) + \text{MLE}$$

so

$$w = \frac{\text{Posterior mean} - \text{MLE}}{\text{Prior mean} - \text{MLE}} = \frac{.3333 - .32}{.5 - .32} = .073889$$

Approximate result:

$$\text{Posterior mean} \approx w \cdot \text{Prior mean} + (1-w) \cdot \text{MLE}, \qquad w = \frac{1/\text{Prior variance}}{1/\text{Prior variance} + 1/\text{Asymptotic variance}}$$

With prior variance $1/12$ and asymptotic variance $.008704$:

$$w = \frac{1/(1/12)}{1/(1/12) + 1/(.008704)} = .09457$$
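The same arithmetic in a few lines of R (the prior variance of 1/12 is presumably the variance of a uniform prior on [0, 1]):

```r
# Reproduce the slide's precision-weighted posterior mean
prior_mean <- 0.5;  prior_var <- 1 / 12   # uniform[0,1] prior has variance 1/12
mle <- 0.32;        asy_var <- 0.008704
w <- (1 / prior_var) / (1 / prior_var + 1 / asy_var)
w                                # about 0.09457
w * prior_mean + (1 - w) * mle   # approximate posterior mean
```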
Priors

• What are priors?
  – Express beliefs before experiments are conducted
  – Computational ease: lead to "good" posteriors
  – Help deal with unseen data
  – Regularizers: more about this in later lectures
• Conjugate priors
  – A prior is conjugate to a likelihood if it leads to itself as the posterior
  – Closed-form representation of the posterior

(C) Dhruv Batra
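A minimal sketch of conjugacy using the Beta-Bernoulli pair (an example of my choosing, not from the slide): the Beta prior leads to a Beta posterior, so the update is closed form.

```r
# Beta(a, b) prior on a coin's heads probability; Bernoulli likelihood.
# Posterior is Beta(a + heads, b + tails) -- same family, closed form.
a <- 2; b <- 2                   # assumed prior hyperparameters
heads <- 7; tails <- 3           # illustrative data
a_post <- a + heads; b_post <- b + tails
a_post / (a_post + b_post)       # closed-form posterior mean
```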
Priors – Where do they come from?

• What does the prior contain?
  – Informative priors: real prior information
  – Noninformative priors: mathematical complications
• Diffuse priors
  – Uniform
  – Normal with huge variance
• Improper priors
• Conjugate priors

$$p(\theta \mid \text{data}) = \frac{L(\text{data} \mid \theta)\, p(\theta)}{\int L(\text{data} \mid \theta)\, p(\theta)\, d\theta}$$
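A short R sketch of the formula above (model, prior, and data are illustrative): compute the posterior by normalizing likelihood times prior with numerical integration.

```r
# Posterior for a normal mean with an informative N(0, 4) prior (assumed)
set.seed(7)
x <- rnorm(20, mean = 0.5)
like  <- function(m) sapply(m, function(mm) prod(dnorm(x, mm, 1)))
prior <- function(m) dnorm(m, 0, 2)
denom <- integrate(function(m) like(m) * prior(m), -Inf, Inf)$value  # the denominator integral
posterior <- function(m) like(m) * prior(m) / denom
integrate(posterior, -Inf, Inf)$value   # integrates to 1 by construction
```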
Noninformative Inverse Wishart Priors

• As R → 0, the posterior approaches the likelihood.
• This implies a very small prior covariance matrix and runs into the same problems as an inverse gamma prior with small parameters:
  – Too much weight is placed on small variances, so the prior is not really noninformative.
  – Study effects are shrunk toward their mean.
• Could instead choose R with reasonable diagonal elements that match reasonable standard deviations.
• Still assumes independence.
• A single degree-of-freedom parameter implies the same amount of prior information about all variance parameters.
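A hedged R sketch of the contrast above (dimension, degrees of freedom, and scale matrices are all assumed): draws from an inverse Wishart prior with a tiny scale matrix pile up on small variances, while a scale matrix with reasonable diagonals does not.

```r
# If W ~ Wishart(df, solve(R)), then solve(W) ~ Inverse-Wishart(df, R)
set.seed(3)
draw_invwish <- function(df, R) solve(rWishart(1, df, solve(R))[, , 1])
d <- 3; df <- d + 2                          # assumed dimension and df
sd_small <- replicate(2000, sqrt(diag(draw_invwish(df, diag(1e-4, d)))))
sd_reas  <- replicate(2000, sqrt(diag(draw_invwish(df, diag(1, d)))))
apply(sd_small, 1, median)   # mass piled on tiny standard deviations
apply(sd_reas, 1, median)    # diagonals chosen to match plausible SDs
```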
Bayesian Priors

Prior densities:

$$\beta_i \sim N[\bar\beta, V], \text{ which implies } \beta_i = \bar\beta + w_i,\ w_i \sim N[0, V]$$

$$\sigma_j \sim \text{Inverse Gamma}[v, s_j] \text{ (looks like chi-squared)}, \quad v = 3,\ s_j = 1$$

Priors over structural model parameters:

$$\bar\beta \sim N[\bar\beta_0, aV], \quad \bar\beta_0 = 0$$

$$V^{-1} \sim \text{Wishart}[v_0, V_0], \quad v_0 = 8,\ V_0 = 8I$$
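A hedged R sketch that simulates one draw from each prior layer above (the dimension K, the constant a, and the inverse gamma convention are assumptions the slide does not spell out):

```r
# One draw from each layer of the hierarchical prior (illustrative settings)
set.seed(4)
K <- 3; a <- 1
Vinv <- rWishart(1, df = 8, Sigma = diag(8, K))[, , 1]  # V^{-1} ~ Wishart[8, 8I]
V <- solve(Vinv)
bbar <- as.vector(t(chol(a * V)) %*% rnorm(K))          # bbar ~ N[0, aV]
beta_i <- bbar + as.vector(t(chol(V)) %*% rnorm(K))     # beta_i = bbar + w_i, w_i ~ N[0, V]
sigma2_j <- (3 * 1) / rchisq(1, df = 3)  # v * s_j / chi-square(v): one common convention (assumed)
```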
Bayesian learning for multinomial

• What if you have a k-sided coin?
• Likelihood function if categorical: $P(\mathcal{D} \mid \theta) = \prod_{j=1}^{k} \theta_j^{n_j}$
• Conjugate prior for multinomial is Dirichlet: $P(\theta) \propto \prod_{j=1}^{k} \theta_j^{\alpha_j - 1}$

(C) Dhruv Batra. Slide credit: Carlos Guestrin
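A minimal R sketch of the Dirichlet-multinomial update (counts and hyperparameters are illustrative): conjugacy means the posterior is Dirichlet with the observed counts added to the concentration parameters.

```r
# Dirichlet prior + categorical counts -> Dirichlet posterior
set.seed(5)
alpha  <- c(1, 1, 1)               # prior for a 3-sided "coin"
counts <- c(4, 9, 2)               # observed counts for each side
alpha_post <- alpha + counts       # the entire conjugate update

# Sample from the posterior via normalized gamma draws (no extra packages)
rdirichlet1 <- function(a) { g <- rgamma(length(a), shape = a); g / sum(g) }
rdirichlet1(alpha_post)
alpha_post / sum(alpha_post)       # posterior mean of the category probabilities
```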
Simulation-Based Estimation

• Bayesian: limited to convenient priors (normal, inverse gamma, and Wishart) that produce mathematically tractable posteriors. Largely simple RPMs without heterogeneity.
• Classical: use any distributions for any parts of the heterogeneity that can be simulated. Rich, layered model specifications:
  – Comparable to Bayesian (normal)
  – Constrain parameters to be positive (triangular, lognormal)
  – Limit ranges of parameters (uniform, triangular)
  – Produce particular shapes of distributions, such as small tails (beta, Weibull, Johnson SB)
  – Heteroscedasticity and scaling heterogeneity
  – Nesting and multilayered correlation structures
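A small R sketch of the "any distribution that can be simulated" point (parameter values are illustrative): each heterogeneity distribution is generated from the same uniform draws by inverting its CDF.

```r
# Heterogeneity draws for simulation-based estimation, all from one set of uniforms
set.seed(8)
u <- runif(1000)
b_normal     <- qnorm(u, mean = 1, sd = 0.5)         # comparable to the Bayesian normal
b_lognormal  <- qlnorm(u, meanlog = 0, sdlog = 0.5)  # constrained positive
b_beta       <- qbeta(u, 2, 5)                       # particular shape, small tail
b_triangular <- ifelse(u < 0.5,                      # limited range on [0, 1]
                       sqrt(u / 2), 1 - sqrt((1 - u) / 2))
```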
Conjugate Acid-Base Pairs

Conjugate acid-base pairs:
• H2S and HS-: HS- is the conjugate base of H2S.
• NH3 and NH4+: NH4+ is the conjugate acid of NH3.

An acid donates protons to form its conjugate base. A base accepts a proton to form its conjugate acid.

Brønsted-Lowry acid-base reaction: an acid and a base react to form their conjugate base and conjugate acid, respectively.

acid1 + base1 ⇌ base2 + acid2
Nonlinear Models and Simulation

• Bayesian inference over parameters in a nonlinear model:
  1. Parameterize the model.
  2. Form the likelihood conditioned on the parameters.
  3. Develop the priors: a joint prior for all model parameters.
  4. The posterior is proportional to likelihood times prior. (Usually requires conjugate priors to be tractable.)
  5. Draw observations from the posterior to study its characteristics.
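A hedged R sketch walking through steps 1-5 for a made-up nonlinear model (the model, noise level, prior, and sampler settings are all assumptions): with no conjugate form available, step 5 uses a random-walk Metropolis sampler to draw from the posterior.

```r
# 1. Model: y = exp(theta * x) + noise with known sd 0.1 (illustrative)
set.seed(2)
x <- runif(100); theta_true <- 0.7
y <- exp(theta_true * x) + rnorm(100, sd = 0.1)

# 2. Likelihood conditioned on the parameter; 3. prior theta ~ N(0, 1)
loglik   <- function(th) sum(dnorm(y, mean = exp(th * x), sd = 0.1, log = TRUE))
logprior <- function(th) dnorm(th, 0, 1, log = TRUE)

# 4.-5. Posterior is proportional to likelihood times prior; draw from it
# with a random-walk Metropolis sampler
draws <- numeric(5000); cur <- 0
for (i in seq_along(draws)) {
  prop <- cur + rnorm(1, sd = 0.05)
  if (log(runif(1)) < loglik(prop) + logprior(prop) - loglik(cur) - logprior(cur))
    cur <- prop
  draws[i] <- cur
}
mean(draws[-(1:1000)])   # posterior mean after discarding burn-in
```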
Your second learning algorithm: MLE for the mean of a Gaussian

• What's the MLE for the mean?

(C) Dhruv Batra. Slide credit: Carlos Guestrin
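The worked answer (a standard result; the derivation is mine, not transcribed from the slide): set the derivative of the log-likelihood with respect to $\mu$ to zero.

$$\log p(x_{1:N} \mid \mu, \sigma^2) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2$$

$$\frac{\partial}{\partial \mu}\log p = \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i - \mu) = 0 \quad\Longrightarrow\quad \hat\mu_{\text{MLE}} = \frac{1}{N}\sum_{i=1}^{N} x_i$$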
(1) Test for a normal distribution

hist(DATA) (use this to take a look)
Shapiro-Wilk test: shapiro.test(DATA)
• P ≤ 0.05: reject the null hypothesis of a normal distribution ("non-normal")
• P > 0.05: cannot reject the null of a normal distribution ("normal")

(2) Test for equal variance

var.test(DATA1, DATA2)
• P ≤ 0.05: reject the null (unequal variance)
• P > 0.05: cannot reject the null (equal variance)

If the data are normal, the variance test picks the t-test variant:
• Equal variance: t.test(DATA1, DATA2, var.equal = T)
• Unequal variance: t.test(DATA1, DATA2, var.equal = F)

If the data are non-normal, use the Wilcoxon test (aka Mann-Whitney U): wilcox.test(DATA1, DATA2).

Once you have established that the data are non-normal, you need to use wilcox.test. The variance test is there to better understand the situation: if you end up with a failed variance test, you still run wilcox.test, but you should proceed with caution. There is no var.equal = F vs. T version of wilcox.test.

Adding F means FALSE, as in "equal variance is false": the populations in the analysis have unequal variance. T means TRUE: the populations have equal variance.

One thing left: are the data paired or independent? If independent, then you are all good. If they are paired, add paired = T to the call. You can add it to either wilcox.test or t.test. A runnable sketch of the whole workflow follows.
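A minimal end-to-end sketch of the decision tree above (the sample data and the normality check on pooled, centered values are my simplifications):

```r
# Two illustrative samples
set.seed(9)
g1 <- rnorm(30, mean = 5); g2 <- rnorm(30, mean = 6)

# (1) Normality: Shapiro-Wilk on centered, pooled values (a simplification)
if (shapiro.test(c(g1 - mean(g1), g2 - mean(g2)))$p.value > 0.05) {
  # "Normal": (2) the variance test chooses the t-test variant
  equal_var <- var.test(g1, g2)$p.value > 0.05
  print(t.test(g1, g2, var.equal = equal_var, paired = FALSE))
} else {
  # "Non-normal": Wilcoxon / Mann-Whitney U
  print(wilcox.test(g1, g2, paired = FALSE))
}
```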