An Example Data Set and Decision Tree

  #   Outlook  Company  Sailboat  |  Sail?
  1   sunny    big      small     |  yes
  2   sunny    med      small     |  yes
  3   sunny    med      big       |  yes
  4   sunny    no       small     |  yes
  5   sunny    big      big       |  yes
  6   rainy    no       small     |  no
  7   rainy    med      small     |  yes
  8   rainy    big      big       |  yes
  9   rainy    no       big       |  no
  10  rainy    med      big       |  no

Induced decision tree:

  outlook
    sunny -> yes
    rainy -> company
               no  -> no
               med -> sailboat
                        small -> yes
                        big   -> no
               big -> yes
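The induced tree can be read as a nested set of attribute tests. A minimal Python sketch, assuming a nested-dict encoding of the tree above (the encoding and the classify helper are illustrative, not part of the original slides):

```python
# Hypothetical encoding of the sailing tree: internal nodes are
# {attribute: {value: subtree}}, leaves are class labels.
sail_tree = {"outlook": {
    "sunny": "yes",
    "rainy": {"company": {
        "no": "no",
        "med": {"sailboat": {"small": "yes", "big": "no"}},
        "big": "yes",
    }},
}}

def classify(tree, example):
    """Follow attribute tests down the tree until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

print(classify(sail_tree, {"outlook": "rainy", "company": "med", "sailboat": "big"}))
# -> "no" (matches example 10 in the table)
```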
Analysis of Severe Trauma Patients Data

  PH_ICU (the worst pH value at ICU)
    < 7.2      -> Death 0.0 (0/15)
    7.2 - 7.33 -> APPT_WORST (the worst active partial thromboplastin time)
                    < 78.7  -> Well 0.82 (9/11)
                    >= 78.7 -> Death 0.0 (0/7)
    > 7.33     -> Well 0.88 (14/16)

PH_ICU and APPT_WORST are exactly the two factors (theoretically) advocated as the most important ones in the study by Rotondo et al., 1997.
Breast Cancer Recurrence

  Degree of Malig
    < 3  -> Tumor Size
              < 15  -> no rec 125 / recurr 39
              >= 15 -> Age
                         ... -> no rec 4 / recurr 1
                         ... -> no rec 32 / recurr 0
    >= 3 -> Involved Nodes
              < 3  -> no rec 30 / recurr 18
              >= 3 -> recurr 27 / no rec 10

Tree induced by Assistant Professional. An interesting point: the accuracy of this tree compared to that of medical specialists.
Prostate Cancer Recurrence

  Secondary Gleason Grade
    1,2 -> No
    3   -> PSA Level
             <= 14.9 -> No
             > 14.9  -> Stage
                          T1ab, T3           -> No
                          T1c, T2a, T2b, T2c -> Yes
    4   -> Primary Gleason Grade
             2,3 -> No
             4   -> Yes
    5   -> Yes
TDIDT Algorithm
• Also known as ID3 (Quinlan)
• To construct decision tree T from learning set S:
  – If all examples in S belong to some class C, then make a leaf labeled C
  – Otherwise:
    • select the "most informative" attribute A
    • partition S according to A's values
    • recursively construct subtrees T1, T2, ... for the subsets of S (a sketch follows below)
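A minimal Python sketch of this recursion, assuming examples are (attribute-dict, class-label) pairs; choose_attribute is a stand-in for the "most informative" criterion discussed on the following slides:

```python
from collections import Counter

def tdidt(examples, attributes, choose_attribute):
    """Build a decision tree from (features_dict, label) pairs."""
    labels = [label for _, label in examples]
    # If all examples belong to one class, make a leaf labeled with it.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: fall back to the majority class (a common tie-breaking rule).
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Select the "most informative" attribute and partition S on its values.
    a = choose_attribute(examples, attributes)
    remaining = [x for x in attributes if x != a]
    tree = {}
    for v in {feats[a] for feats, _ in examples}:
        subset = [(f, l) for f, l in examples if f[a] == v]
        # Recursively construct the subtree for this subset.
        tree[(a, v)] = tdidt(subset, remaining, choose_attribute)
    return tree
```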
Another Example

  #   Outlook   Temperature  Humidity  Windy  |  Play
  1   sunny     hot          high      no     |  N
  2   sunny     hot          high      yes    |  N
  3   overcast  hot          high      no     |  P
  4   rainy     moderate     high      no     |  P
  5   rainy     cold         normal    no     |  P
  6   rainy     cold         normal    yes    |  N
  7   overcast  cold         normal    yes    |  P
  8   sunny     moderate     high      no     |  N
  9   sunny     cold         normal    no     |  P
  10  rainy     moderate     normal    no     |  P
  11  sunny     moderate     normal    yes    |  P
  12  overcast  moderate     high      yes    |  P
  13  overcast  hot          normal    no     |  P
  14  rainy     moderate     high      yes    |  N
Complicated Tree

  Temperature
    hot      -> Outlook
                  sunny    -> N
                  overcast -> P
                  rainy    -> null
    cold     -> Outlook
                  sunny    -> P
                  overcast -> P
                  rainy    -> Windy
                                yes -> N
                                no  -> P
    moderate -> Outlook
                  sunny    -> Humidity
                                high   -> N
                                normal -> P
                  overcast -> P
                  rainy    -> Windy
                                yes -> N
                                no  -> P
Attribute Selection Criteria
• Main principle
  – Select the attribute which partitions the learning set into subsets that are as "pure" as possible
• Various measures of purity
  – Information-theoretic
  – Gini index
  – χ²
  – ReliefF
  – ...
• Various improvements
  – probability estimates
  – normalization
  – binarization, subsetting
Information-Theoretic Approach
• To classify an object, a certain amount of information is needed
  – I, the information
• After we have learned the value of attribute A, we only need some remaining amount of information to classify the object
  – Ires, the residual information
• Gain
  – Gain(A) = I − Ires(A)
• The most "informative" attribute is the one that minimizes Ires, i.e., maximizes Gain
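Spelled out with the standard entropy definitions (implied by the slide, though not written on it), with p_c the class probabilities in S and S_v the subset of S where A = v:

  I(S)    = − Σ_c p_c · log2 p_c
  Ires(A) = Σ_v p(v) · I(S_v)
  Gain(A) = I(S) − Ires(A)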
Triangles and Squares

  #   Color   Outline  Dot  |  Shape
  1   green   dashed   no   |  triangle
  2   green   dashed   yes  |  triangle
  3   yellow  dashed   no   |  square
  4   red     dashed   no   |  square
  5   red     solid    no   |  square
  6   red     solid    yes  |  triangle
  7   green   solid    no   |  square
  8   green   dashed   no   |  triangle
  9   yellow  solid    yes  |  square
  10  red     solid    no   |  square
  11  green   solid    yes  |  square
  12  yellow  dashed   yes  |  square
  13  yellow  solid    no   |  square
  14  red     dashed   yes  |  triangle
Triangles and Squares
Data Set: a set of classified objects (the 14 examples in the table above)
Information Gain of the Attribute
• Attributes
  – Gain(Color) = 0.246
  – Gain(Outline) = 0.151
  – Gain(Dot) = 0.048
• Heuristic: the attribute with the highest gain is chosen (see the check below)
• This heuristic is local (local minimization of impurity)
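A self-contained check of these numbers on the triangles-and-squares table, using the entropy and gain formulas from the previous slide (variable names are illustrative):

```python
from collections import Counter
from math import log2

# Triangles-and-squares data: (color, outline, dot, class).
DATA = [
    ("green",  "dashed", "no",  "triangle"), ("green",  "dashed", "yes", "triangle"),
    ("yellow", "dashed", "no",  "square"),   ("red",    "dashed", "no",  "square"),
    ("red",    "solid",  "no",  "square"),   ("red",    "solid",  "yes", "triangle"),
    ("green",  "solid",  "no",  "square"),   ("green",  "dashed", "no",  "triangle"),
    ("yellow", "solid",  "yes", "square"),   ("red",    "solid",  "no",  "square"),
    ("green",  "solid",  "yes", "square"),   ("yellow", "dashed", "yes", "square"),
    ("yellow", "solid",  "no",  "square"),   ("red",    "dashed", "yes", "triangle"),
]

def entropy(labels):
    """I(S) = -sum_c p_c log2 p_c over the class distribution."""
    n = len(labels)
    return -sum(k / n * log2(k / n) for k in Counter(labels).values())

def gain(column):
    """Gain(A) = I(S) - Ires(A) for the attribute in the given column."""
    labels = [row[-1] for row in DATA]
    i_res = 0.0
    for value in set(row[column] for row in DATA):
        subset = [row[-1] for row in DATA if row[column] == value]
        i_res += len(subset) / len(DATA) * entropy(subset)
    return entropy(labels) - i_res

for name, col in [("Color", 0), ("Outline", 1), ("Dot", 2)]:
    print(name, round(gain(col), 3))
# prints Color 0.247, Outline 0.152, Dot 0.048
# (the 0.246/0.151 on this slide are alternate roundings of the same values)
```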
A Defect of Ires
• Ires favors attributes with many values
• Such an attribute splits S into many subsets, and if these are small, they will tend to be pure anyway
• One way to rectify this is through the gain ratio, a corrected measure of information gain (see below)
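The standard correction (Quinlan's gain ratio; the formula itself is not spelled out on the slide) divides the gain by the entropy of the split, which grows with the number of attribute values:

  GainRatio(A) = Gain(A) / ( − Σ_v p(v) · log2 p(v) )

A sketch extending the DATA and gain() helpers from the previous block:

```python
def gain_ratio(column):
    """Gain(A) divided by the split information -sum_v p(v) log2 p(v)."""
    n = len(DATA)
    counts = Counter(row[column] for row in DATA)
    split_info = -sum(k / n * log2(k / n) for k in counts.values())
    return gain(column) / split_info

# gain_ratio: Color 0.156, Outline 0.152, Dot 0.049
# (matches the GainRatio column on the next slide)
```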
Three Impurity Measures

  A        Gain(A)  GainRatio(A)  GiniGain(A)
  Color    0.247    0.156         0.058
  Outline  0.152    0.152         0.046
  Dot      0.048    0.049         0.015

• These impurity measures assess the effect of a single attribute
• The "most informative" criterion they define is local (and "myopic")
• It does not reliably predict the effect of several attributes applied jointly
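For completeness, a sketch of the GiniGain column, again extending the helpers above. The table's values match the convention Gini(S) = (1 − Σ_c p_c²) / 2, i.e. half the more common 1 − Σ_c p_c²; that halving is an observation about these particular numbers, not something stated on the slide:

```python
def gini(labels):
    """Gini impurity, in the convention that reproduces the table above."""
    n = len(labels)
    return (1 - sum((k / n) ** 2 for k in Counter(labels).values())) / 2

def gini_gain(column):
    """GiniGain(A) = Gini(S) - sum_v p(v) Gini(S_v)."""
    labels = [row[-1] for row in DATA]
    g_res = 0.0
    for value in set(row[column] for row in DATA):
        subset = [row[-1] for row in DATA if row[column] == value]
        g_res += len(subset) / len(DATA) * gini(subset)
    return gini(labels) - g_res

# gini_gain: Color 0.058, Outline 0.046, Dot 0.015 (matches the table)
```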