createconfigs script, in src/mpp-mpred-3.2.0/p95/mu11:

    #!/bin/bash
    for g in .1 .2 .4 .7 .9
    do
      sed -i -e "s/dMNsdsThr=[^ ]*/dMNsdsThr=$g/" t.config
      for h in .1 .2 .4 .7 .9
      do
        sed -i -e "s/dMNsdsExp=[^ ]*/dMNsdsExp=$h/" t.config
        cp t.config configs/a$g$h.config
      done
    done

It creates 25 files (-rw-r--r--) in src/mpp-mpred-3.2.0/p95/mu11/configs:

    a.1.1.config  a.1.2.config  a.1.4.config  a.1.7.config  a.1.9.config
    a.2.1.config  a.2.2.config  a.2.4.config  a.2.7.config  a.2.9.config
    a.4.1.config  a.4.2.config  a.4.4.config  a.4.7.config  a.4.9.config
    a.7.1.config  a.7.2.config  a.7.4.config  a.7.7.config  a.7.9.config
    a.9.1.config  a.9.2.config  a.9.4.config  a.9.7.config  a.9.9.config

submit script, run in src/mpp-mpred-3.2.0:

    #!/bin/bash
    for g in .1 .2 .4 .7 .9; do
      for h in .1 .2 .4 .7 .9; do
        ./mpp-submit -S -i Data/p95test.txt -c p95/mu11/configs a$g$h.out -t .05 -d ./p95/mu11
      done
    done

It produces 25 subdirectories (drwxr-xr-x) in mpp-mpred-3.2.0, one per config (a.1.1, a.1.2, ..., a.9.9), written between 10:15 and 10:18. It also produces the 25 output files a.1.1.out through a.9.9.out in src/mpp-mpred-3.2.0 itself.

For example, a.9.9 contains:

    a.9.9.config
    hi-a.9.9.txt
    hi-a.9.9.txt.answers
    lo-a.9.9.txt
    lo-a.9.9.txt.answers
    p95test.txt.predictions
    p95test.txt.rmse

p95test.txt.predictions (excerpt; each movie ID is followed by its predicted ratings):

    12641:
    1.22
    3.65
    2.55
    4.04
    1.85
    12502:
    4.71
    3.54
    3.87
    3.33
    2.97
    10811:
    4.05
    3.49
    3.94
    3.39
    12069:
    3.20
    3.48
    ...

p95test.txt.rmse (excerpt):

    Movie: 12641:
    0: Ans: 1 Pred: 1.22 Error: 0.04840
    1: Ans: 4 Pred: 3.65 Error: 0.12250
    2: Ans: 2 Pred: 2.55 Error: 0.30250
    3: Ans: 4 Pred: 4.04 Error: 0.00160
    4: Ans: 2 Pred: 1.85 Error: 0.02250
    Sum: 0.49750 Total: 5 RMSE: 0.315436
    Running RMSE: 0.315436 / 5 predictions
    Movie: 12502:
    0: Ans: 4 Pred: 4.71 Error: 0.50410
    1: Ans: 5 Pred: 3.54 Error: 2.13160
    2: Ans: 5 Pred: 3.87 Error: 1.2769
    3: Ans: 3 Pred: 3.33 Error: 0.10890
    4: Ans: 2 Pred: 2.97 Error: 0.94090
    Sum: 4.96240 Total: 5 RMSE: 0.996233
    Running RMSE: .738911 / 10 predictions
    Movie: 10811:
    0: Ans: 5 Pred: 4.05 Error: 0.90250
    1: Ans: 3 Pred: 3.49 Error: 0.24010
    2: Ans: 4 Pred: 3.94 Error: 0.00360
    3: Ans: 3 Pred: 3.39 Error: 0.15210
    Sum: 1.29830 Total: 4 RMSE: 0.569715
    Movie: 12069:
    ...
    Running RMSE: 0.964397 / 743 preds

I copy the .out files to src/mpp-mpred-3.2.0/dotouts. In dotouts is a script, createtablermse:

    #!/bin/bash
    for g in .1 .2 .4 .7 .9; do
      for h in .1 .2 .4 .7 .9; do
        grep 'RMSE: ' "a$g$h.out" >> rmse
      done
    done

and a script, createtablejob:

    #!/bin/bash
    for g in .1 .2 .4 .7 .9; do
      for h in .1 .2 .4 .7 .9; do
        grep 'Input:   lo' "a$g$h.out" >> job
      done
    done

Resulting rmse (Sum, Total, RMSE) and job (Input) tables, side by side:

    Sum        Total  RMSE      Input
    692.82510  745    0.964348  lo-a.1.1.txt
    691.59330  745    0.963490  lo-a.1.2.txt
    691.90610  745    0.963708  lo-a.1.4.txt
    691.90610  745    0.963708  lo-a.1.7.txt
    691.90610  745    0.963708  lo-a.1.9.txt
    691.84690  745    0.963667  lo-a.2.1.txt
    690.47330  745    0.962710  lo-a.2.2.txt
    691.90610  745    0.963708  lo-a.2.4.txt
    691.90610  745    0.963708  lo-a.2.7.txt
    691.90610  745    0.963708  lo-a.2.9.txt
    693.27970  745    0.964664  lo-a.4.1.txt
    691.90610  745    0.963708  lo-a.4.2.txt
    691.90610  745    0.963708  lo-a.4.4.txt
    691.90610  745    0.963708  lo-a.4.7.txt
    691.90610  745    0.963708  lo-a.4.9.txt
    691.90610  745    0.963708  lo-a.7.1.txt
    691.90610  745    0.963708  lo-a.7.2.txt
    691.90610  745    0.963708  lo-a.7.4.txt
    691.90610  745    0.963708  lo-a.7.7.txt
    691.90610  745    0.963708  lo-a.7.9.txt
    691.90610  745    0.963708  lo-a.9.1.txt
    691.90610  745    0.963708  lo-a.9.2.txt
    691.90610  745    0.963708  lo-a.9.4.txt
    691.90610  745    0.963708  lo-a.9.7.txt
    691.90610  745    0.963708  lo-a.9.9.txt
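Instead of reading the table by eye, the sweep can be ranked directly from the .out files in dotouts. This is a minimal sketch. It assumes each aX.Y.out holds one summary line of the form `Sum: X Total: N RMSE: R`, which is the line layout the rmse table suggests. The function name best_config is hypothetical and is not part of mpp-mpred.

```shell
#!/bin/bash
# Hypothetical helper, not part of mpp-mpred: report the config with the lowest RMSE.
# Assumption: each a$g$h.out contains a summary line "Sum: X Total: N RMSE: R".
best_config() {
  local dir=${1:-.}
  local g h f rmse
  for g in .1 .2 .4 .7 .9; do
    for h in .1 .2 .4 .7 .9; do
      f="$dir/a$g$h.out"
      [ -f "$f" ] || continue        # skip runs that produced no output file
      # On lines whose first field is "Sum:", print the field after "RMSE:".
      rmse=$(awk '$1 == "Sum:" { for (i = 1; i < NF; i++) if ($i == "RMSE:") print $(i + 1) }' "$f" | tail -n 1)
      if [ -n "$rmse" ]; then
        printf 'a%s%s %s\n' "$g" "$h" "$rmse"
      fi
    done
  done | LC_ALL=C sort -k2,2n | head -n 1   # numeric sort on RMSE, keep the smallest
}
```

Usage: `best_config src/mpp-mpred-3.2.0/dotouts`. With the RMSE values listed on this slide, the lowest is a.2.2 (0.962710).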




In dotouts is a script, createtablermse:

    #!/bin/bash
    for g in .1 .2 .4 .7 .9; do
      for h in .1 .2 .4 .7 .9; do
        grep 'RMSE: ' "a$g$h.out" >> rmse
      done
    done

and a script, createtablejob:

    #!/bin/bash
    for g in .1 .2 .4 .7 .9; do
      for h in .1 .2 .4 .7 .9; do
        grep 'Input:   lo' "a$g$h.out" >> job
      done
    done

Together they produce (rmse: Sum, Total, RMSE; job: Input):

    Sum        Total  RMSE      Input
    692.82510  745    0.964348  lo-a.1.1.txt
    691.59330  745    0.963490  lo-a.1.2.txt
    691.90610  745    0.963708  lo-a.1.4.txt
    691.90610  745    0.963708  lo-a.1.7.txt
    691.90610  745    0.963708  lo-a.1.9.txt
    691.84690  745    0.963667  lo-a.2.1.txt
    690.47330  745    0.962710  lo-a.2.2.txt
    691.90610  745    0.963708  lo-a.2.4.txt
    691.90610  745    0.963708  lo-a.2.7.txt
    691.90610  745    0.963708  lo-a.2.9.txt
    693.27970  745    0.964664  lo-a.4.1.txt
    691.90610  745    0.963708  lo-a.4.2.txt
    691.90610  745    0.963708  lo-a.4.4.txt
    691.90610  745    0.963708  lo-a.4.7.txt
    691.90610  745    0.963708  lo-a.4.9.txt
    691.90610  745    0.963708  lo-a.7.1.txt
    691.90610  745    0.963708  lo-a.7.2.txt
    691.90610  745    0.963708  lo-a.7.4.txt
    691.90610  745    0.963708  lo-a.7.7.txt
    691.90610  745    0.963708  lo-a.7.9.txt
    691.90610  745    0.963708  lo-a.9.1.txt
    691.90610  745    0.963708  lo-a.9.2.txt
    691.90610  745    0.963708  lo-a.9.4.txt
    691.90610  745    0.963708  lo-a.9.7.txt
    691.90610  745    0.963708  lo-a.9.9.txt
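Both scripts loop in the same order, so line i of rmse and line i of job belong to the same config. The two files can therefore be joined with `paste`, which also makes it easy to recompute RMSE = sqrt(Sum/Total) as a consistency check. This is a sketch that assumes the line layouts shown above ("Sum: X Total: N RMSE: R" and "Input:   lo-aX.Y.txt"). The function name join_tables is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: join the rmse and job tables line by line and
# recompute RMSE = sqrt(Sum / Total) next to the reported value.
join_tables() {
  local rmse_file=${1:-rmse} job_file=${2:-job}
  # After paste, the fields are: $1 "Sum:" $2 sum $3 "Total:" $4 count
  # $5 "RMSE:" $6 reported RMSE $7 "Input:" $8 input file name
  paste -d ' ' "$rmse_file" "$job_file" |
    LC_ALL=C awk '{ printf "%s %s %.6f\n", $8, $6, sqrt($2 / $4) }'
}
```

Each output line reads "input reported recomputed". A row where the last two columns disagree would point to a parsing or line-order problem rather than a real result.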




Sampling Distributions

A sample mean is the sum of the observations divided by the total number of observations:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $x_i$ = observations of a quality characteristic such as time, $n$ = total number of observations, and $\bar{x}$ = mean.

The distribution of sample means can be approximated by the normal distribution.

© 2007 Pearson Education
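The formula can be checked on a few observations. This is a minimal sketch. The function name sample_mean and the example observation times (in minutes) are hypothetical, not data from the slide.

```shell
#!/bin/bash
# Minimal sketch: x-bar = (x_1 + ... + x_n) / n over the arguments.
sample_mean() {
  [ "$#" -gt 0 ] || return 1   # a mean needs at least one observation
  printf '%s\n' "$@" | LC_ALL=C awk '{ sum += $1; n++ } END { printf "%.4f\n", sum / n }'
}
```

For example, `sample_mean 2.5 3.0 4.5` prints 3.3333, because (2.5 + 3.0 + 4.5) / 3 = 10 / 3.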




The central limit theorem

Most population distributions are not Normal. What is the shape of the sampling distribution of sample means when the population distribution isn't Normal?

It is a remarkable fact that as the sample size increases, the distribution of sample means changes its shape: it looks less like that of the population and more like a Normal distribution!

CENTRAL LIMIT THEOREM
Draw an SRS of size $n$ from any population with mean $\mu$ and finite standard deviation $\sigma$. The central limit theorem says that when $n$ is large, the sampling distribution of the sample mean $\bar{x}$ is approximately Normal:

$$\bar{x} \text{ is approximately } N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

The central limit theorem allows us to use Normal probability calculations to answer questions about sample means from many observations even when the population distribution is not Normal.
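A worked illustration follows; the numbers are hypothetical, not from the slide. Suppose waiting times follow a right-skewed population distribution, such as an exponential, with mean $\mu = 1$ minute and standard deviation $\sigma = 1$ minute, and we average an SRS of $n = 100$ of them. The theorem lets us approximate the chance that the sample mean exceeds 1.2 minutes with a Normal calculation:

```latex
\begin{align*}
\bar{x} &\approx N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right)
         = N\!\left(1, \frac{1}{\sqrt{100}}\right) = N(1,\, 0.1) \\
P(\bar{x} > 1.2) &\approx P\!\left(Z > \frac{1.2 - 1}{0.1}\right) = P(Z > 2) \approx 0.0228
\end{align*}
```

Here 0.0228 is the standard Normal upper-tail area beyond $z = 2$.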




The Central Limit Theorem

Most population distributions are not Normal. What is the shape of the sampling distribution of sample means when the population distribution isn't Normal?

It is a remarkable fact that as the sample size increases, the distribution of sample means changes its shape: it looks less like that of the population and more like a Normal distribution! When the sample is large enough, the distribution of sample means is very close to Normal, no matter what shape the population distribution has, as long as the population has a finite standard deviation.

Draw an SRS of size $n$ from any population with mean $\mu$ and finite standard deviation $\sigma$. The central limit theorem (CLT) says that when $n$ is large, the sampling distribution of the sample mean $\bar{x}$ is approximately Normal:

$$\bar{x} \text{ is approximately } N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
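A small simulation makes the theorem concrete. This sketch draws `reps` samples of size n from an exponential population with $\mu = 1$ and $\sigma = 1$, which is strongly right-skewed and not Normal. It then summarizes the sample means. The population choice, the function name clt_sim, and the seed are illustrative assumptions. If $\bar{x}$ is close to $N(\mu, \sigma/\sqrt{n})$, about 95% of the sample means should fall within $\mu \pm 1.96\,\sigma/\sqrt{n}$.

```shell
#!/bin/bash
# Hypothetical CLT demo: sample means of n draws from Exp(1) (mu = sigma = 1).
# Prints: mean of x-bars, SD of x-bars, fraction inside mu +/- 1.96*sigma/sqrt(n).
clt_sim() {
  local n=${1:-30} reps=${2:-2000} seed=${3:-1}
  LC_ALL=C awk -v n="$n" -v reps="$reps" -v seed="$seed" 'BEGIN {
    srand(seed)
    mu = 1; sigma = 1
    half = 1.96 * sigma / sqrt(n)                       # half-width of the 95% band
    for (r = 1; r <= reps; r++) {
      s = 0
      for (i = 1; i <= n; i++) s += -log(1 - rand())    # one Exp(1) draw by inversion
      xbar = s / n
      sum += xbar; sumsq += xbar * xbar
      if (xbar > mu - half && xbar < mu + half) inside++
    }
    m = sum / reps
    sd = sqrt((sumsq - reps * m * m) / (reps - 1))      # sample SD of the x-bars
    printf "%.4f %.4f %.3f\n", m, sd, inside / reps
  }'
}
```

With `clt_sim 30 2000 42`, the mean of the x-bars should land near 1, their SD near $1/\sqrt{30} \approx 0.183$, and the coverage near 0.95. The exact digits depend on the awk implementation's random number generator.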




The Sampling Distribution of $\bar{x}$

When we choose many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean $\mu$ and is less spread out than the population distribution. Here are the facts.

The Sampling Distribution of Sample Means
Suppose that $\bar{x}$ is the mean of an SRS of size $n$ drawn from a large population with mean $\mu$ and standard deviation $\sigma$. Then:
The mean of the sampling distribution of $\bar{x}$ is $\mu_{\bar{x}} = \mu$.
The standard deviation of the sampling distribution of $\bar{x}$ is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

Note: These facts about the mean and standard deviation of $\bar{x}$ are true no matter what shape the population distribution has.

If individual observations have the $N(\mu, \sigma)$ distribution, then the sample mean of an SRS of size $n$ has the $N(\mu, \sigma/\sqrt{n})$ distribution regardless of the sample size $n$.
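The formula $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ implies that quadrupling the sample size halves the spread of $\bar{x}$. This is a minimal sketch; the function name se_mean and the value $\sigma = 10$ are hypothetical.

```shell
#!/bin/bash
# Minimal sketch: sigma_xbar = sigma / sqrt(n) for an SRS from a large population.
se_mean() {
  local sigma=$1 n=$2
  LC_ALL=C awk -v s="$sigma" -v n="$n" 'BEGIN { printf "%.4f\n", s / sqrt(n) }'
}
```

For example, `se_mean 10 25` prints 2.0000 and `se_mean 10 100` prints 1.0000: four times the observations, half the standard deviation.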




The normal distribution is very important in statistics when we study sampling distributions.

sampling distribution of $\bar{x}$: a probability distribution for all values of $\bar{x}$ that are possible with (random) samples of size $n$
parent population: the population from which a sample is to be selected

$\mu_{\bar{x}}$ is used to represent the mean of the population consisting of the possible values of $\bar{x}$ from a random sample of size $n$, that is, the mean of the sampling distribution of $\bar{x}$ with random samples of size $n$.
$\sigma_{\bar{x}}$ is used to represent the standard deviation of the population consisting of the possible values of $\bar{x}$ from a random sample of size $n$, that is, the standard deviation of the sampling distribution of $\bar{x}$ with random samples of size $n$.
$\sigma_{\bar{x}}$ is called the standard error of the mean.

Class Exercise #4 illustrates why the normal distribution is so important when we study sampling distributions:
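The idea of a "population consisting of the possible values of $\bar{x}$" can be made concrete by brute force. As a sketch, take a hypothetical four-value parent population {1, 2, 3, 4}, with $\mu = 2.5$ and $\sigma^2 = 1.25$. List all 16 ordered samples of size $n = 2$ drawn with replacement, then take the mean and standard deviation of the 16 resulting $\bar{x}$ values. The function name xbar_population is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: enumerate every ordered sample of size 2, drawn with
# replacement from the arguments, and print mu_xbar and sigma_xbar of the
# resulting population of x-bar values (divisor = number of samples).
xbar_population() {
  local pop=("$@") a b
  for a in "${pop[@]}"; do
    for b in "${pop[@]}"; do
      echo "$a $b"
    done
  done | LC_ALL=C awk '{ xbar = ($1 + $2) / 2; sum += xbar; sumsq += xbar * xbar; k++ }
    END { m = sum / k; printf "%.4f %.4f\n", m, sqrt(sumsq / k - m * m) }'
}
```

`xbar_population 1 2 3 4` prints "2.5000 0.7906". The first value equals $\mu$, and the second equals $\sigma/\sqrt{n} = \sqrt{1.25/2}$.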




Stat Trek

Sampling Distributions

Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a statistic (e.g., a mean, proportion, standard deviation) for each sample. The probability distribution of this statistic is called a sampling distribution. And the standard deviation of this statistic is called the standard error.

Variability of a Sampling Distribution

The variability of a sampling distribution is measured by its variance or its standard deviation. The variability of a sampling distribution depends on three factors:
N: The number of observations in the population.
n: The number of observations in the sample.
The way that the random sample is chosen.

If the population size is much larger than the sample size, then the sampling distribution has roughly the same standard error, whether we sample with or without replacement. On the other hand, if the sample represents a significant fraction (say, 1/20) of the population size, the standard error will be meaningfully smaller when we sample without replacement.
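The effect of the sampling method shows up clearly when the sample is a large fraction of the population. As a sketch, take a hypothetical population {1, 2, 3, 4} and samples of size 2, so n is half of N, far above the 1/20 guideline. The sketch enumerates all ordered samples with replacement and all ordered samples of distinct elements without replacement, then compares the standard error of $\bar{x}$ under each scheme. The function name se_two_ways is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: standard error of x-bar for samples of size 2,
# enumerated with replacement (all ordered pairs) and without (distinct pairs).
# Prints: "with_replacement without_replacement".
se_two_ways() {
  local pop=("$@") i j
  local N=${#pop[@]}
  for ((i = 0; i < N; i++)); do
    for ((j = 0; j < N; j++)); do
      echo "1 ${pop[i]} ${pop[j]}"                          # tag 1: with replacement
      if (( i != j )); then echo "0 ${pop[i]} ${pop[j]}"; fi  # tag 0: without replacement
    done
  done | LC_ALL=C awk '{
      x = ($2 + $3) / 2; s[$1] += x; ss[$1] += x * x; k[$1]++
    } END {
      for (t = 1; t >= 0; t--) {
        m = s[t] / k[t]
        printf "%s%.4f", (t ? "" : " "), sqrt(ss[t] / k[t] - m * m)
      }
      printf "\n"
    }'
}
```

`se_two_ways 1 2 3 4` prints "0.7906 0.6455". With n = N/2, sampling without replacement shrinks the standard error noticeably.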




The sampling distribution of $\bar{x}$

When we choose many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean $\mu$ and is less spread out than the population distribution. Here are the facts.

MEAN AND STANDARD DEVIATION OF A SAMPLE MEAN
Suppose that $\bar{x}$ is the mean of an SRS of size $n$ drawn from a large population with mean $\mu$ and standard deviation $\sigma$. Then the sampling distribution of $\bar{x}$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. We say the statistic $\bar{x}$ is an unbiased estimator of the parameter $\mu$. Because its standard deviation is $\sigma/\sqrt{n}$, averages are less variable than individual observations, and the results of large samples are less variable than the results of small samples.

SAMPLING DISTRIBUTION OF A SAMPLE MEAN
If individual observations have the $N(\mu, \sigma)$ distribution, then the sample mean of an SRS of size $n$ has the $N(\mu, \sigma/\sqrt{n})$ distribution.
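The last box says that a Normal population gives a Normal $\bar{x}$ even for small $n$. The sketch below checks the mean and spread by simulation. Normal draws come from the Box-Muller transform, and the parameters ($\mu = 50$, $\sigma = 10$, $n = 4$), the function name normal_xbar_sim, and the seed are all hypothetical.

```shell
#!/bin/bash
# Hypothetical demo: x-bar from an N(mu, sigma) population with small n.
# Prints: mean of x-bars, SD of x-bars, and sigma/sqrt(n) for comparison.
normal_xbar_sim() {
  local mu=${1:-50} sigma=${2:-10} n=${3:-4} reps=${4:-4000} seed=${5:-7}
  LC_ALL=C awk -v mu="$mu" -v sigma="$sigma" -v n="$n" -v reps="$reps" -v seed="$seed" 'BEGIN {
    srand(seed)
    pi = atan2(0, -1)
    for (r = 1; r <= reps; r++) {
      s = 0
      for (i = 1; i <= n; i++) {
        u1 = 1 - rand(); u2 = rand()                # u1 in (0, 1], so log(u1) is finite
        z = sqrt(-2 * log(u1)) * cos(2 * pi * u2)   # Box-Muller: one N(0, 1) draw
        s += mu + sigma * z
      }
      xbar = s / n; sum += xbar; sumsq += xbar * xbar
    }
    m = sum / reps
    printf "%.3f %.3f %.3f\n", m, sqrt((sumsq - reps * m * m) / (reps - 1)), sigma / sqrt(n)
  }'
}
```

With the defaults, the first two numbers should be close to 50 and 5, and the third is exactly $10/\sqrt{4} = 5$. The exact simulated digits depend on the awk implementation's random number generator.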




Stat Trek

Sampling Distribution of the Mean

Suppose we draw all possible samples of size n from a population of size N. Suppose further that we compute a mean score for each sample. In this way, we create a sampling distribution of the mean.

We know the following about the sampling distribution of the mean. The mean of the sampling distribution ($\mu_{\bar{x}}$) is equal to the mean of the population ($\mu$). And the standard error of the sampling distribution ($\sigma_{\bar{x}}$) is determined by the standard deviation of the population ($\sigma$), the population size ($N$), and the sample size ($n$). These relationships are shown in the equations below:

$$\mu_{\bar{x}} = \mu \qquad \text{and} \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

In the standard error formula, the factor $\sqrt{(N - n)/(N - 1)}$ is called the finite population correction or fpc. When the population size is very large relative to the sample size, the fpc is approximately equal to one, and the standard error formula can be approximated by:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

You often see this "approximate" formula in introductory statistics texts. As a general rule, it is safe to use the approximate formula when the sample size is no bigger than 1/20 of the population size.
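The formula with the finite population correction can be evaluated directly. This is a sketch; the function name se_fpc and the example numbers are hypothetical. At the 1/20 boundary (n = 50, N = 1000), the fpc is about 0.975, so it shrinks the standard error by only about 2.5%.

```shell
#!/bin/bash
# Hypothetical helper for sigma_xbar = (sigma / sqrt(n)) * sqrt((N - n) / (N - 1)).
# Prints: "fpc standard_error".
se_fpc() {
  local sigma=$1 n=$2 N=$3
  LC_ALL=C awk -v s="$sigma" -v n="$n" -v N="$N" 'BEGIN {
    fpc = sqrt((N - n) / (N - 1))                 # finite population correction
    printf "%.4f %.4f\n", fpc, s / sqrt(n) * fpc
  }'
}
```

`se_fpc 10 50 1000` prints "0.9752 1.3791", compared with $10/\sqrt{50} \approx 1.4142$ without the correction. For a tiny population, `se_fpc 1.1180339887 2 4` (so $\sigma^2 = 1.25$, $N = 4$, $n = 2$) prints "0.8165 0.6455". That is the standard error you get by listing every sample of 2 distinct values from {1, 2, 3, 4}.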