The central limit theorem
Most population distributions are not Normal. What is the shape of the sampling distribution of sample means when the population distribution isn’t Normal? It is a remarkable fact that as the sample size increases, the distribution of sample means changes its shape: it looks less like that of the population and more like a Normal distribution!
CENTRAL LIMIT THEOREM
Draw an SRS of size n from any population with mean µ and finite standard deviation σ. The central limit theorem says that when n is large, the sampling distribution of the sample mean x̄ is approximately Normal: x̄ is approximately N(µ, σ/√n).
The central limit theorem allows us to use Normal probability calculations to answer questions about sample means from many observations even when the population distribution is not Normal.

The Central Limit Theorem
Most population distributions are not Normal. What is the shape of the sampling distribution of sample means when the population distribution isn’t Normal? It is a remarkable fact that as the sample size increases, the distribution of sample means changes its shape: it looks less like that of the population and more like a Normal distribution! When the sample is large enough, the distribution of sample means is very close to Normal, no matter what shape the population distribution has, as long as the population has a finite standard deviation.
Draw an SRS of size n from any population with mean µ and finite standard deviation σ. The central limit theorem (CLT) says that when n is large, the sampling distribution of the sample mean x̄ is approximately Normal: x̄ is approximately N(µ, σ/√n).
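The statement above can be checked with a small simulation. The sketch below is illustrative only: the Exponential(1) population (mean 1, standard deviation 1, strongly right-skewed), the sample sizes, the repetition count, and the helper names `sample_means` and `skewness` are all assumptions chosen for the example, not part of the original slides.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Return `reps` sample means, each computed from n draws of an
    Exponential(1) population (an assumed, strongly skewed population)."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

def skewness(values):
    """Sample skewness; close to 0 for a symmetric shape such as the Normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (1, 5, 30, 100):
    means = sample_means(n, 5000, rng)
    # Store (center, spread, skewness) of the simulated sampling distribution.
    results[n] = (
        statistics.fmean(means),
        statistics.pstdev(means),
        skewness(means),
    )
    print(n, results[n])
```

As n grows, the simulated center should stay near 1, the spread should shrink toward 1/√n, and the skewness should move toward 0, which is the "more like a Normal distribution" behavior the theorem describes.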

The Sampling Distribution of x̄
When we choose many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean µ and is less spread out than the population distribution. Here are the facts.
The Sampling Distribution of Sample Means
Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ. Then:
The mean of the sampling distribution of x̄ is µ_x̄ = µ.
The standard deviation of the sampling distribution of x̄ is σ_x̄ = σ/√n.
Note: These facts about the mean and standard deviation of x̄ are true no matter what shape the population distribution has.
If individual observations have the N(µ, σ) distribution, then the sample mean of an SRS of size n has the N(µ, σ/√n) distribution regardless of the sample size n.
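One way to see where these two facts come from is the short derivation below. It is a sketch that assumes the n observations x_1, ..., x_n are independent, which is approximately true for an SRS drawn from a large population; the notation E[·] and Var(·) is introduced here and does not appear in the slides.

```latex
\begin{aligned}
\mu_{\bar{x}}
  &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right]
   = \frac{1}{n}\sum_{i=1}^{n} E[x_i]
   = \frac{1}{n}\cdot n\mu
   = \mu, \\[4pt]
\sigma_{\bar{x}}^{2}
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
   = \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(x_i)
   = \frac{1}{n^{2}}\cdot n\sigma^{2}
   = \frac{\sigma^{2}}{n}, \\[4pt]
\sigma_{\bar{x}}
  &= \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```

Neither step uses the shape of the population distribution, which is why the facts hold for any population with mean µ and standard deviation σ.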

The normal distribution is very important in statistics when we study sampling distributions.
sampling distribution of x̄: a probability distribution for all values of x̄ that are possible with (random) samples of size n
parent population: the population from which a sample is to be selected
µ_x̄ is used to represent the mean of the population consisting of the possible values of x̄ from a random sample of size n, that is, the mean of the sampling distribution of x̄ with random samples of size n.
σ_x̄ is used to represent the standard deviation of the population consisting of the possible values of x̄ from a random sample of size n, that is, the standard deviation of the sampling distribution of x̄ with random samples of size n. σ_x̄ is called the standard error of the mean.
Class Exercise #4 illustrates why the normal distribution is so important when we study sampling distributions:

Stat Trek Sampling Distributions
Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a statistic (e.g., a mean, proportion, standard deviation) for each sample. The probability distribution of this statistic is called a sampling distribution. And the standard deviation of this statistic is called the standard error.
Variability of a Sampling Distribution
The variability of a sampling distribution is measured by its variance or its standard deviation. The variability of a sampling distribution depends on three factors:
N: The number of observations in the population.
n: The number of observations in the sample.
The way that the random sample is chosen.
If the population size is much larger than the sample size, then the sampling distribution has roughly the same standard error whether we sample with or without replacement. On the other hand, if the sample represents a significant fraction (say, 1/20) of the population size, the standard error will be meaningfully smaller when we sample without replacement.
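The effect of the sampling method can be seen by literally enumerating "all possible samples", as the passage describes. The sketch below does this for the sample mean with n = 2; both populations and the helper name `exact_standard_error` are hypothetical, chosen only to keep the enumeration small.

```python
import itertools
import math
import statistics

def exact_standard_error(population, n, with_replacement):
    """Exact standard deviation of the sample mean, computed over every
    equally likely ordered sample of size n."""
    if with_replacement:
        samples = itertools.product(population, repeat=n)
    else:
        samples = itertools.permutations(population, n)
    means = [statistics.fmean(sample) for sample in samples]
    return statistics.pstdev(means)

# Hypothetical small population: the sample is 2/8 = 1/4 of the population.
small_pop = [2, 4, 4, 5, 7, 9, 10, 11]
small_with = exact_standard_error(small_pop, 2, with_replacement=True)
small_without = exact_standard_error(small_pop, 2, with_replacement=False)

# Hypothetical larger population: the sample is 2/200 = 1/100 of the population.
large_pop = list(range(1, 201))
large_with = exact_standard_error(large_pop, 2, with_replacement=True)
large_without = exact_standard_error(large_pop, 2, with_replacement=False)

print("small population ratio:", small_without / small_with)
print("large population ratio:", large_without / large_with)
# With replacement, the exact value matches sigma / sqrt(n).
print(small_with, statistics.pstdev(small_pop) / math.sqrt(2))
```

The ratio of the two standard errors is well below 1 for the small population and very close to 1 for the larger one, matching the qualitative claim above.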

The sampling distribution of x̄
When we choose many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean µ and is less spread out than the population distribution. Here are the facts.
MEAN AND STANDARD DEVIATION OF A SAMPLE MEAN
Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ. Then the sampling distribution of x̄ has mean µ and standard deviation σ/√n.
We say the statistic x̄ is an unbiased estimator of the parameter µ. Because its standard deviation is σ/√n, averages are less variable than individual observations, and the results of large samples are less variable than the results of small samples.
SAMPLING DISTRIBUTION OF A SAMPLE MEAN
If individual observations have the N(µ, σ) distribution, then the sample mean of an SRS of size n has the N(µ, σ/√n) distribution.
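A short calculation shows what "averages are less variable than individual observations" means in practice. The numbers here (a Normal population with mean 100 and standard deviation 15, samples of size 25, and a cutoff of 106) are hypothetical values picked for illustration; `NormalDist` is the Normal distribution class from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population parameters and sample size (not from the slides).
mu, sigma, n = 100.0, 15.0, 25
cutoff = 106.0

# One observation follows N(mu, sigma); the mean of n observations
# follows N(mu, sigma / sqrt(n)).
individual = NormalDist(mu, sigma)
sample_mean = NormalDist(mu, sigma / sqrt(n))

p_individual = 1 - individual.cdf(cutoff)  # P(one observation > cutoff)
p_mean = 1 - sample_mean.cdf(cutoff)       # P(sample mean > cutoff)

print(f"sd of one observation: {individual.stdev:.2f}")
print(f"sd of the sample mean: {sample_mean.stdev:.2f}")
print(f"P(x > {cutoff}) = {p_individual:.4f}")
print(f"P(x-bar > {cutoff}) = {p_mean:.4f}")
```

Under these assumed values a single observation exceeds 106 about a third of the time, while the mean of 25 observations does so only about 2% of the time, because its standard deviation is 15/√25 = 3 rather than 15.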

Stat Trek Sampling Distribution of the Mean
Suppose we draw all possible samples of size n from a population of size N. Suppose further that we compute a mean score for each sample. In this way, we create a sampling distribution of the mean. We know the following about the sampling distribution of the mean. The mean of the sampling distribution (μ_x̄) is equal to the mean of the population (μ). And the standard error of the sampling distribution (σ_x̄) is determined by the standard deviation of the population (σ), the population size (N), and the sample size (n). These relationships are shown in the equations below:
μ_x̄ = μ and σ_x̄ = [ σ / sqrt(n) ] * sqrt[ (N - n) / (N - 1) ]
In the standard error formula, the factor sqrt[ (N - n) / (N - 1) ] is called the finite population correction or fpc. When the population size is very large relative to the sample size, the fpc is approximately equal to one, and the standard error formula can be approximated by: σ_x̄ = σ / sqrt(n). You often see this "approximate" formula in introductory statistics texts. As a general rule, it is safe to use the approximate formula when the sample size is no bigger than 1/20 of the population size.
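To get a feel for how close the fpc is to one, the formula above can be evaluated directly. This is a minimal sketch; the function names `fpc` and `standard_error` and the example values of σ, n, and N are assumptions made for illustration.

```python
import math

def fpc(N, n):
    """Finite population correction: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error(sigma, n, N=None):
    """Standard error of the mean. With N given, apply the fpc;
    without N, use the approximate formula sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= fpc(N, n)
    return se

# Hypothetical values: sigma = 10, sample size 50, several population sizes.
sigma, n = 10.0, 50
for N in (100, 1_000, 100_000):
    exact = standard_error(sigma, n, N)
    approx = standard_error(sigma, n)
    print(f"N={N:>7}  n/N={n / N:.4f}  fpc={fpc(N, n):.4f}  "
          f"exact={exact:.4f}  approx={approx:.4f}")
```

At n/N = 1/20 (here N = 1,000) the fpc is about 0.975, so the approximate formula overstates the standard error by roughly 2.5%, which gives a sense of the size of error the 1/20 rule tolerates.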