Slide #1.

Evaluating What’s Been Learned


Slide #2.

Cross-Validation
• The foundation is a simple idea – “holdout” – hold out a certain amount of the data for testing and use the rest for training
• The separation should NOT be done by convenience
  – It should at least be random
  – Better: “stratified” random – the division preserves the relative proportions of the classes in both training and test data
• Enhancement: repeated holdout – enables using more data in training while still getting a good test
• 10-fold cross-validation has become the standard
• This is improved if the folds are chosen in a “stratified” random way (see the sketch below)
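A minimal sketch of stratified fold assignment in plain Python. The deal-by-class scheme and the name `stratified_folds` are illustrative, not WEKA's implementation:

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=42):
    """Split instance indices into k folds while preserving class proportions:
    group indices by class, shuffle each group, then deal them round-robin
    so every fold mirrors the overall class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0  # stagger the dealing so small classes don't pile into fold 0
    for group in by_class.values():
        rng.shuffle(group)
        for j, idx in enumerate(group):
            folds[(offset + j) % k].append(idx)
        offset += len(group)
    return folds

# Each fold serves once as the test set; the other k-1 folds form training data.
labels = ["yes"] * 9 + ["no"] * 5
for test in stratified_folds(labels, k=2):
    train = [i for i in range(len(labels)) if i not in set(test)]
    # ... build the model on `train`, evaluate on `test` ...
```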


Slide #3.

For Small Datasets
• Leave One Out
• Bootstrapping
• To be discussed in turn


Slide #4.

Leave One Out
• Train on all but one instance, test on that one (percent correct always equals 100% or 0%)
• Repeat until every instance has been tested on, then average the results
• Really just N-fold cross-validation where N = number of instances available (sketched below)
• Plusses:
  – Always trains on the maximum possible training data (without cheating)
  – No repeated runs needed (since the fold contents are not randomized)
  – No stratification, no random sampling necessary
• Minuses:
  – Guarantees a non-stratified sample – the correct class will always be at least a little under-represented in the training data
  – Statistical tests are not appropriate
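A minimal leave-one-out loop. It assumes a hypothetical `train_and_test(train_set, held_out)` callback that returns 1 for a correct prediction and 0 otherwise:

```python
def leave_one_out(instances, train_and_test):
    """N-fold cross-validation with N = len(instances): hold out each
    instance once, train on the remaining N-1, score the single
    prediction as 1 (correct) or 0 (wrong), and average."""
    scores = [
        train_and_test(instances[:i] + instances[i + 1:], instances[i])
        for i in range(len(instances))
    ]
    return sum(scores) / len(scores)

# Usage with any classifier wrapped as a callback, e.g. 1-nearest-neighbour
# on toy (feature, label) pairs:
def nn_callback(train, held_out):
    nearest = min(train, key=lambda row: abs(row[0] - held_out[0]))
    return int(nearest[1] == held_out[1])

data = [(1.0, "yes"), (1.2, "yes"), (3.0, "no"), (3.1, "no")]
print(leave_one_out(data, nn_callback))  # 1.0 on this toy set
```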


Slide #5.

Bootstrapping
• Sampling is done with replacement to form a training dataset
• Particular approach – the 0.632 bootstrap:
  – A dataset of n instances is sampled n times
  – Some instances will be included multiple times
  – Those not picked will be used as test data
  – On a large enough dataset, 0.632 of the data instances will end up in the training dataset; the rest will be in the test set
• This is a bit of a pessimistic estimate of performance, since only about 63% of the data is used for training (vs 90% in 10-fold cross-validation)
• May try to balance this by also weighting in performance on the training data (p 129)
• This procedure can be repeated any number of times, allowing statistical tests (a one-round sketch follows below)
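A sketch of one bootstrap round in plain Python. The 0.632/0.368 weighting in `combined_error` is my reading of the slide's p. 129 pointer, stated here as an assumption:

```python
import random

def bootstrap_split(n, seed=0):
    """Draw n indices with replacement as the training set; the instances
    never drawn form the test set.  P(never picked in n draws) =
    (1 - 1/n)^n -> 1/e ≈ 0.368, so ≈ 63.2% of distinct instances train."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    test = [i for i in range(n) if i not in set(train)]
    return train, test

def combined_error(err_test, err_train):
    """Offset the pessimism of training on ~63% of the data by mixing in
    the (optimistic) training-set error: 0.632*e_test + 0.368*e_train."""
    return 0.632 * err_test + 0.368 * err_train

train, test = bootstrap_split(1000)
print(len(set(train)) / 1000)  # ≈ 0.632 distinct instances in training
```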


Slide #6.

Counting the Cost
• Some mistakes are more costly to make than others
• Giving a loan to a defaulter is more costly than denying somebody who would be a good customer
• Sending a mail solicitation to somebody who won’t buy is less costly than missing somebody who would buy (opportunity cost)
• Looking at a confusion matrix, each position could have an associated cost (or a benefit, for the correct positions)
• The measurement could be average profit/loss per prediction (see the sketch below)
• To be fair, a cost-benefit analysis should also factor in the cost of collecting and preparing the data, building the model, …
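A sketch of cost-sensitive evaluation: multiply each confusion-matrix cell by its payoff and average over all predictions. The loan-screening counts and payoffs below are hypothetical, for illustration only:

```python
def average_profit(confusion, payoff):
    """Average profit (negative = loss) per prediction: sum the element-wise
    product of counts and payoffs, divide by the number of predictions."""
    total = sum(confusion[i][j] * payoff[i][j]
                for i in range(len(confusion))
                for j in range(len(confusion[i])))
    n = sum(map(sum, confusion))
    return total / n

# Rows = actual, columns = predicted; hypothetical numbers.
confusion = [[40, 10],   # good customer: approved / denied
             [5, 45]]    # defaulter:     approved / denied
payoff = [[100, -20],    # interest earned / opportunity cost of denial
          [-500, 0]]     # loss on a default / correctly denied
print(average_profit(confusion, payoff))  # (4000 - 200 - 2500) / 100 = 13.0
```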


Slide #7.

Information Retrieval (IR) Measures
• The IR community has developed 3 measures (all three are coded below):
  – Recall = (number of documents retrieved that are relevant) / (total number of documents that are relevant)
  – Precision = (number of documents retrieved that are relevant) / (total number of documents that are retrieved)
  – F-measure = (2 × recall × precision) / (recall + precision)
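The three definitions directly as code, a minimal sketch over sets of document ids:

```python
def ir_measures(retrieved, relevant):
    """Recall, precision, and F-measure from sets of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents retrieved that are relevant
    recall = hits / len(relevant)
    precision = hits / len(retrieved)
    f = (2 * recall * precision / (recall + precision)) if hits else 0.0
    return recall, precision, f

# 3 of 4 retrieved docs are relevant; 6 relevant docs exist in total.
print(ir_measures({1, 2, 3, 9}, {1, 2, 3, 4, 5, 6}))  # (0.5, 0.75, 0.6)
```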


Slide #8.

WEKA
• Part of the results provided by WEKA (that we’ve ignored so far)
• Let’s look at an example (Naïve Bayes on my-weather-nominal; the numbers are recomputed in the sketch below):

=== Detailed Accuracy By Class ===

  TP Rate  FP Rate  Precision  Recall  F-Measure  Class
  0.667    0.125    0.8        0.667   0.727      yes
  0.875    0.333    0.778      0.875   0.824      no

=== Confusion Matrix ===

  a b   <-- classified as
  4 2 | a = yes
  1 7 | b = no

• TP rate and recall are the same = TP / (TP + FN)
  – For yes = 4 / (4 + 2); for no = 7 / (7 + 1)
• FP rate = FP / (FP + TN)
  – For yes = 1 / (1 + 7); for no = 2 / (2 + 4)
• Precision = TP / (TP + FP)
  – For yes = 4 / (4 + 1); for no = 7 / (7 + 2)
• F-measure = 2TP / (2TP + FP + FN)
  – For yes = 2×4 / (2×4 + 1 + 2) = 8/11
  – For no = 2×7 / (2×7 + 2 + 1) = 14/17
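A sketch that recomputes WEKA's per-class numbers from the confusion matrix; `per_class_metrics` is an illustrative helper in plain Python, not a WEKA API:

```python
def per_class_metrics(confusion, classes):
    """Per-class TP rate (= recall), FP rate, precision and F-measure from a
    confusion matrix with rows = actual class, columns = predicted class."""
    n = sum(map(sum, confusion))
    for k, label in enumerate(classes):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # actual k, predicted other
        fp = sum(row[k] for row in confusion) - tp  # predicted k, actually other
        tn = n - tp - fn - fp
        print(label,
              round(tp / (tp + fn), 3),               # TP rate / recall
              round(fp / (fp + tn), 3),               # FP rate
              round(tp / (tp + fp), 3),               # precision
              round(2 * tp / (2 * tp + fp + fn), 3))  # F-measure

# The my-weather-nominal matrix from this slide:
per_class_metrics([[4, 2],
                   [1, 7]], ["yes", "no"])
# yes 0.667 0.125 0.8 0.727
# no 0.875 0.333 0.778 0.824
```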


Slide #9.

WEKA – with more than two classes
• Contact Lenses with Naïve Bayes:

=== Detailed Accuracy By Class ===

  TP Rate  FP Rate  Precision  Recall  F-Measure  Class
  0.8      0.053    0.8        0.8     0.8        soft
  0.25     0.1      0.333      0.25    0.286      hard
  0.8      0.444    0.75       0.8     0.774      none

=== Confusion Matrix ===

   a  b  c   <-- classified as
   4  0  1 | a = soft
   0  1  3 | b = hard
   1  2 12 | c = none

• Class exercise – show how to calculate recall, precision, and F-measure for each class (one class is worked below as a pattern)
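As a pattern for the exercise, the soft class worked by hand from the matrix; hard and none follow the same way:

```python
# For "soft": TP = 4, FN = 0 + 1 = 1, FP = 0 + 1 = 1, TN = 24 - 4 - 1 - 1 = 18
tp, fn, fp, tn = 4, 1, 1, 18
print(tp / (tp + fn))               # recall / TP rate  = 0.8
print(tp / (tp + fp))               # precision         = 0.8
print(2 * tp / (2 * tp + fp + fn))  # F-measure         = 0.8
print(fp / (fp + tn))               # FP rate = 1/19    ≈ 0.053
```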


Slide #10.

Applying Action Rules to Change Detractor to Passive
/Accuracy ~ Precision, Coverage ~ Recall/

Let’s assume that we built action rules from the classifiers for Promoter & Detractor. The goal is to change Detractors -> Promoters.

The confidence of the action rule is 0.993 × 0.849 ≈ 0.84. Our action rule can target only 4.2 (out of 10.2) detractors, so we can expect 4.2 × 0.84 ≈ 3.53 detractors moving to promoter status (arithmetic spelled out below).
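The slide's arithmetic spelled out; reading the two factors as the confidences of the rule's two component classification rules is an assumption on my part:

```python
confidence = 0.993 * 0.849      # ≈ 0.843, shown as 0.84 on the slide
targetable = 4.2                # detractors the rule can reach (of 10.2)
expected = targetable * 0.84    # ≈ 3.53 expected moves to promoter
print(round(confidence, 3), round(expected, 2))
```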