Part 25: Bayesian [15/57]

Priors and Posteriors: the Achilles heel of Bayesian econometrics

Noninformative and informative priors for estimation of parameters:
• Noninformative (diffuse) priors: how to incorporate a total lack of prior belief into the Bayesian estimator. The estimator becomes solely a function of the likelihood.
• Informative priors: some prior information enters the estimator. The estimator mixes the information in the likelihood with the prior information.

Improper and proper priors:
• If P(θ) is uniform over the allowable range of θ, it cannot integrate to 1.0 when the range is infinite.
• Salvation: the improper but noninformative prior falls out of (cancels in) the posterior.

Mixing Prior and Sample Information

A typical result (exact for sampling from the normal distribution with known variance):

  Posterior Mean = w × Prior Mean + (1 − w) × MLE
                 = w × (Prior Mean − MLE) + MLE

Solving for the weight with the slide's numbers (Prior Mean = .5, MLE = .32, Posterior Mean = .3333):

  w = (Posterior Mean − MLE) / (Prior Mean − MLE) = (.3333 − .32) / (.5 − .32) = .073889

Approximate (large-sample) result: the weight averages precisions (inverse variances), not variances:

  Posterior Mean ≈ w × Prior Mean + (1 − w) × MLE, where

  w = (1 / Prior Variance) / [(1 / Prior Variance) + (1 / Asymptotic Variance)]
    = (1 / (1/12)) / [(1 / (1/12)) + (1 / .008704)]
    ≈ .0946
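The weights above can be checked with a few lines of arithmetic. This is a minimal sketch that simply recomputes the slide's numbers: the exact weight recovered from the stated means, and the approximate precision-weighted version using the slide's prior variance of 1/12 and asymptotic variance of .008704.

```python
# Posterior mean as a weighted average of prior mean and MLE.
# All inputs are the numbers stated on the slide.

prior_mean, mle, posterior_mean = 0.5, 0.32, 0.3333

# Exact weight, recovered from posterior_mean = w * prior_mean + (1 - w) * mle
w_exact = (posterior_mean - mle) / (prior_mean - mle)
print(round(w_exact, 6))        # 0.073889

# Approximate (large-sample) weight: precisions are averaged, not variances.
prior_var, asy_var = 1 / 12, 0.008704
w_approx = (1 / prior_var) / (1 / prior_var + 1 / asy_var)
print(round(w_approx, 4))       # 0.0946

# The small weight on the prior reflects how informative the sample is here.
posterior_approx = w_approx * prior_mean + (1 - w_approx) * mle
print(round(posterior_approx, 3))
```

Because the asymptotic variance (.008704) is so much smaller than the prior variance (1/12), the likelihood dominates and the posterior mean lands close to the MLE.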


Priors

• What are priors?
  – Express beliefs before experiments are conducted
  – Computational ease: lead to "good" posteriors
  – Help deal with unseen data
  – Regularizers: more about this in later lectures
• Conjugate priors
  – A prior is conjugate to a likelihood if it leads to itself as the posterior
  – Closed-form representation of the posterior

(C) Dhruv Batra
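Conjugacy is easiest to see with the Beta-Bernoulli pair: a Beta prior combined with coin-flip data yields a Beta posterior, so the update is just parameter addition. A minimal sketch (the function name is illustrative):

```python
# Conjugacy demo: Beta(a, b) prior + Bernoulli data (h heads, t tails)
# -> Beta(a + h, b + t) posterior. Same family, updated parameters:
# that is exactly what "leads to itself as posterior" means.

def beta_bernoulli_update(a, b, heads, tails):
    """Return the posterior Beta parameters after observing the data."""
    return a + heads, b + tails

a0, b0 = 2.0, 2.0                       # mildly informative prior, mean 0.5
a1, b1 = beta_bernoulli_update(a0, b0, heads=7, tails=3)
print(a1, b1)                           # 9.0 5.0
print(a1 / (a1 + b1))                   # posterior mean, about 0.643
```

The closed form is the computational payoff: no integration is needed to obtain the posterior.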

Priors: Where Do They Come From?

What does the prior contain?
• Informative priors: real prior information
• Noninformative priors: bring mathematical complications
  – Diffuse: uniform, or normal with huge variance
• Improper priors
• Conjugate priors

  p(θ | data) = L(data | θ) p(θ) / ∫ L(data | θ) p(θ) dθ
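The formula above can be illustrated numerically with a brute-force grid approximation: evaluate likelihood times prior over a grid of θ values and divide by a Riemann-sum approximation of the integral. This sketch uses a Bernoulli likelihood with a uniform (diffuse) prior, so the posterior is driven entirely by the likelihood.

```python
# Grid approximation of p(theta | data) = L(data|theta) p(theta) / integral.

heads, n = 7, 10
grid = [i / 1000 for i in range(1, 1000)]     # theta values in (0, 1)

def likelihood(theta):
    return theta ** heads * (1 - theta) ** (n - heads)

prior = 1.0                                   # uniform prior on (0, 1)
unnorm = [likelihood(t) * prior for t in grid]
norm_const = sum(unnorm) * 0.001              # Riemann approximation of the integral
posterior = [u / norm_const for u in unnorm]

# The normalized posterior integrates to (approximately) one:
print(round(sum(posterior) * 0.001, 6))       # 1.0
# and with a flat prior its mode sits at the MLE, heads / n = 0.7:
print(grid[posterior.index(max(posterior))])  # 0.7
```

With a uniform prior, p(θ) cancels from numerator and denominator, which is exactly why a diffuse prior leaves the estimator "solely a function of the likelihood."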

Noninformative Inverse Wishart Priors
• As R → 0, the posterior approaches the likelihood
• This implies a very small prior covariance matrix and runs into the same problems as an inverse gamma prior with small parameters
  – Too much weight is placed on small variances, so the prior is not really noninformative
  – Study effects are shrunk toward their mean
• Could instead choose R with reasonable diagonal elements that match a reasonable standard deviation
  – This still assumes independence
• A single degree-of-freedom parameter implies the same amount of prior information about all variance parameters
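The "too much weight on small variances" point can be made concrete with the inverse gamma kernel itself. For InvGamma(ε, ε) with tiny ε, the density is proportional to x^(−ε−1) · exp(−ε/x), which blows up near zero. A small sketch comparing the density at a small variance versus a moderate one:

```python
# Why InvGamma(eps, eps) with tiny eps is not really "noninformative":
# its kernel x**(-eps - 1) * exp(-eps / x) piles mass onto small variances.

import math

def invgamma_kernel(x, shape, scale):
    """Unnormalized inverse-gamma density at x."""
    return x ** (-shape - 1) * math.exp(-scale / x)

eps = 0.001
ratio = invgamma_kernel(0.01, eps, eps) / invgamma_kernel(1.0, eps, eps)
print(round(ratio))   # about 91: a variance of 0.01 gets ~90x the density of 1
```

A prior that favors tiny variances this strongly shrinks group-level effects hard toward their mean, which is the behavior described above.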

Part 25: Bayesian [41/57]

Bayesian Priors

Prior densities:

  β_i ~ N[β̄, V], which implies β_i = β̄ + w_i, w_i ~ N[0, V]
  σ_j ~ Inverse Gamma[v, s_j] (looks like chi-squared), v = 3, s_j = 1

Priors over structural model parameters:

  β̄ ~ N[β⁰, aV]
  V⁻¹ ~ Wishart[v₀, V₀], v₀ = 8, V₀ = 8I
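One draw from this hierarchical prior can be sketched directly. The dimensions and the diffuseness factor a below are illustrative (a hypothetical 2-parameter model); the Wishart draw uses the direct construction valid for integer degrees of freedom, a sum of v₀ outer products of N(0, V₀) vectors.

```python
# Sketch of one draw from the hierarchical prior:
#   V^{-1} ~ Wishart[v0, V0], beta_bar ~ N[beta0, a V], beta_i = beta_bar + w_i.

import numpy as np

rng = np.random.default_rng(0)
k, v0 = 2, 8                        # hypothetical 2-parameter model, v0 from slide
V0 = 8 * np.eye(k)                  # Wishart scale matrix from the slide
beta0, a = np.zeros(k), 100.0       # diffuse location prior (a chosen large here)

# V^{-1} ~ Wishart[v0, V0]: sum of v0 outer products of N(0, V0) draws
x = rng.multivariate_normal(np.zeros(k), V0, size=v0)
V_inv = x.T @ x
V = np.linalg.inv(V_inv)

# beta_bar ~ N[beta0, a V], then individual-level beta_i = beta_bar + w_i
beta_bar = rng.multivariate_normal(beta0, a * V)
beta_i = beta_bar + rng.multivariate_normal(np.zeros(k), V, size=5)

print(V.shape, beta_i.shape)        # (2, 2) (5, 2)
```

In a full sampler these would be the prior draws at the top of a Gibbs cycle; here they only illustrate the structure of the densities listed above.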


Bayesian Learning for the Multinomial
• What if you have a k-sided coin?
• Likelihood function if categorical: P(D | θ) = θ₁^n₁ · … · θ_k^n_k, where n_i counts outcomes of category i
• Conjugate prior for the multinomial is the Dirichlet: P(θ) ∝ θ₁^(α₁−1) · … · θ_k^(α_k−1)

Slide credit: Carlos Guestrin
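Just as in the Beta-Bernoulli case, the Dirichlet-multinomial update is parameter addition: a Dirichlet(α) prior plus category counts n gives a Dirichlet(α + n) posterior. A minimal sketch (function name illustrative):

```python
# Dirichlet-multinomial conjugacy for a k-sided "coin":
# Dirichlet(alpha) prior + counts n -> Dirichlet(alpha + n) posterior.

def dirichlet_update(alpha, counts):
    return [a + n for a, n in zip(alpha, counts)]

alpha = [1.0, 1.0, 1.0]             # uniform prior over a 3-sided die
counts = [5, 3, 2]                  # observed rolls
post = dirichlet_update(alpha, counts)
print(post)                         # [6.0, 4.0, 3.0]

# Posterior mean for each face probability:
total = sum(post)
print([round(a / total, 3) for a in post])   # [0.462, 0.308, 0.231]
```

The prior pseudo-counts α act like previously observed rolls, which is why a larger α pulls the posterior mean harder toward the prior.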

Part 26: Bayesian vs. Classical [37/45]

Simulation-Based Estimation

Bayesian: limited to convenient priors (normal, inverse gamma, and Wishart) that produce mathematically tractable posteriors. Largely simple RPMs without heterogeneity.

Classical: can use any distributions for any parts of the heterogeneity that can be simulated, giving rich, layered model specifications:
• Comparable to Bayesian (normal)
• Constrain parameters to be positive (triangular, lognormal)
• Limit ranges of parameters (uniform, triangular)
• Produce particular shapes of distributions, such as small tails (beta, Weibull, Johnson SB)
• Heteroscedasticity and scaling heterogeneity
• Nesting and multilayered correlation structures
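The classical flexibility above amounts to choosing the simulation distribution of each random coefficient to impose the desired constraint. A sketch (the variable names and numeric values are illustrative, not from any specific model): a lognormal draw guarantees a one-signed price coefficient, and a triangular draw guarantees a bounded one.

```python
# Simulated random parameters with constraint-imposing distributions.

import numpy as np

rng = np.random.default_rng(1)
R = 10_000                                   # simulation draws

# Sign constraint via lognormal: exp(mu + sigma * z) > 0, negated here so the
# simulated price coefficient is always negative.
beta_price = -rng.lognormal(mean=-1.0, sigma=0.5, size=R)

# Range constraint via a triangular distribution on [0, 2].
beta_time = rng.triangular(left=0.0, mode=1.0, right=2.0, size=R)

print(beta_price.max() < 0, beta_time.min() >= 0, beta_time.max() <= 2)
```

Every draw respects the constraint by construction, which is the point: the simulated likelihood can average over any distribution you can sample from, with no conjugacy requirement.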

Conjugate Acid-Base Pairs

Examples of conjugate acid-base pairs:
• H2S and HS−: HS− is the conjugate base of H2S.
• NH3 and NH4+: NH4+ is the conjugate acid of NH3.

An acid donates protons to form its conjugate base. A base accepts a proton to form its conjugate acid.

Brønsted-Lowry acid-base reaction: an acid and a base react to form their conjugate base and conjugate acid, respectively:

  acid1 + base1 ⇌ base2 + acid2


Part 25: Bayesian [26/57]

Nonlinear Models and Simulation

Bayesian inference over parameters in a nonlinear model:
1. Parameterize the model.
2. Form the likelihood conditioned on the parameters.
3. Develop the priors: a joint prior for all model parameters.
4. The posterior is proportional to the likelihood times the prior. (Usually requires conjugate priors to be tractable.)
5. Draw observations from the posterior to study its characteristics.
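The five steps above can be sketched end to end with a random-walk Metropolis sampler, which sidesteps the need for a conjugate prior. Everything here is illustrative: a toy nonlinear model y = exp(θx) + noise with known noise standard deviation, a standard normal prior, and simulated data at θ = 0.5.

```python
# Steps 1-5 above for a toy nonlinear model, via random-walk Metropolis.

import numpy as np

rng = np.random.default_rng(42)

# 1. Parameterize the model and simulate data at true theta = 0.5
x = rng.uniform(0, 1, size=50)
y = np.exp(0.5 * x) + rng.normal(0, 0.1, size=50)

# 2. Log-likelihood conditioned on theta (noise sd 0.1 treated as known)
def log_like(theta):
    resid = y - np.exp(theta * x)
    return -0.5 * np.sum((resid / 0.1) ** 2)

# 3. Prior: theta ~ N(0, 1)
def log_prior(theta):
    return -0.5 * theta ** 2

# 4-5. Posterior proportional to likelihood * prior; draw from it with MH
draws, theta = [], 0.0
for _ in range(5000):
    prop = theta + rng.normal(0, 0.1)
    if np.log(rng.uniform()) < (log_like(prop) + log_prior(prop)
                                - log_like(theta) - log_prior(theta)):
        theta = prop
    draws.append(theta)

# Posterior mean from the retained draws should sit near the true 0.5
print(float(np.mean(draws[2000:])))
```

Only the ratio of posterior kernels is ever evaluated, so the normalizing integral, the usual obstacle in nonlinear models, never needs to be computed.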


(1) Test for a normal distribution

  hist(DATA)            # use this to take a look first
  shapiro.test(DATA)    # Shapiro-Wilk test

• P ≤ 0.05: reject the null hypothesis of a normal distribution ("non-normal")
• P > 0.05: cannot reject the null of a normal distribution ("normal")

(2) Test for equal variance

  var.test(DATA)

• P ≤ 0.05: reject the null; unequal variance
• P > 0.05: cannot reject the null; equal variance

Which test to run:
• Non-normal data: Wilcoxon test (aka Mann-Whitney U), wilcox.test
• Normal data, unequal variance: t.test(DATA, var.equal=FALSE)
• Normal data, equal variance: t.test(DATA, var.equal=TRUE)

Once you have established that the data are non-normal, you need to use wilcox.test; the variance test is there to better understand the situation. If you end up with a failed variance test, you still run wilcox.test, but proceed with caution, since there is no var.equal=FALSE vs. var.equal=TRUE version of wilcox.test. In t.test, var.equal=FALSE means the populations in the analysis have unequal variance, and var.equal=TRUE means they have equal variance.

One thing left: are the data paired or independent? If they are independent, you are all set. If they are paired, add paired=TRUE to the call; this works for both wilcox.test and t.test.
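The decision flow above can be captured in a tiny helper. This is a sketch in Python (the function name is hypothetical) that takes the Shapiro-Wilk p-value, the variance-test p-value, and the paired flag, and returns the R call you would run plus any option or caveat.

```python
# The test-selection flowchart as a decision helper.

def choose_test(p_shapiro, p_var, paired=False):
    """Map (normality p, variance p, paired?) to the R test to run."""
    if p_shapiro <= 0.05:                   # non-normal -> Wilcoxon
        test = "wilcox.test"
        note = "caution: unequal variances" if p_var <= 0.05 else ""
    else:                                   # normal -> t-test
        test = "t.test"
        note = "var.equal=FALSE" if p_var <= 0.05 else "var.equal=TRUE"
    if paired:
        note = (note + ", " if note else "") + "paired=TRUE"
    return test, note

print(choose_test(0.30, 0.40))               # ('t.test', 'var.equal=TRUE')
print(choose_test(0.01, 0.40))               # ('wilcox.test', '')
print(choose_test(0.01, 0.02, paired=True))
```

Note that for the Wilcoxon branch the variance result only changes the caveat, never the test, which mirrors the point above that wilcox.test has no var.equal option.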