Slide #1.

Induction of Decision Trees
Blaž Zupan and Ivan Bratko
magix.fri.uni-lj.si/predavanja/uisp


Slide #2.

An Example Data Set and Decision Tree

 #   Outlook  Company  Sailboat  Sail? (class)
 1   sunny    big      small     yes
 2   sunny    med      small     yes
 3   sunny    med      big       yes
 4   sunny    no       small     yes
 5   sunny    big      big       yes
 6   rainy    no       small     no
 7   rainy    med      small     yes
 8   rainy    big      big       yes
 9   rainy    no       big       no
10   rainy    med      big       no

Induced decision tree:
  Outlook
    sunny: yes
    rainy: Company
      no:  no
      med: Sailboat
        small: yes
        big:   no
      big: yes


Slide #3.

Classification

 #   Outlook  Company  Sailboat  Sail? (class)
 1   sunny    no       big       ?
 2   rainy    big      small     ?

The tree induced on the previous slide (Outlook, then Company, then Sailboat) is used to classify these two new examples; a sketch of how this is done follows below.
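A minimal sketch of applying such a tree, assuming an illustrative nested-dict encoding and a classify helper (not the authors' code):

# Minimal sketch (illustrative encoding, not the authors' code): the induced tree
# as a nested dict, applied to the two unclassified examples from this slide.
tree = {"Outlook": {
    "sunny": "yes",
    "rainy": {"Company": {
        "no": "no",
        "med": {"Sailboat": {"small": "yes", "big": "no"}},
        "big": "yes",
    }},
}}

def classify(tree, example):
    """Walk the tree until a leaf (a class label string) is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

examples = [
    {"Outlook": "sunny", "Company": "no",  "Sailboat": "big"},
    {"Outlook": "rainy", "Company": "big", "Sailboat": "small"},
]
for ex in examples:
    print(ex, "->", classify(tree, ex))   # both come out as "yes"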


Slide #4.

Induction of Decision Trees

• Data Set (Learning Set)
  – Each example = Attributes + Class
• Induced description = Decision tree
• TDIDT
  – Top Down Induction of Decision Trees
• Recursive Partitioning


Slide #5.

Some TDIDT Systems

• ID3 (Quinlan 79)
• CART (Breiman et al. 84)
• Assistant (Cestnik et al. 87)
• C4.5 (Quinlan 93)
• See5 (Quinlan 97)
• ...
• Orange (Demšar, Zupan 98-03)


Slide #6.

Analysis of Severe Trauma Patients Data

PH_ICU (the worst pH value at ICU)
  < 7.2      -> Death 0.0 (0/15)
  7.2 - 7.33 -> APPT_WORST (the worst activated partial thromboplastin time)
                  < 78.7  -> Well 0.82 (9/11)
                  >= 78.7 -> Death 0.0 (0/7)
  > 7.33     -> Well 0.88 (14/16)

PH_ICU and APPT_WORST are exactly the two factors (theoretically) advocated to be the most important ones in the study by Rotondo et al., 1997.


Slide #7.

Breast Cancer Recurrence

Degree of Malig
  < 3:  Tumor Size
          < 15:  no rec 125, recurr 39
          >= 15: Age (split value not legible)
                   no rec 4,  recurr 1
                   no rec 32, recurr 0
  >= 3: Involved Nodes
          < 3:  no rec 30, recurr 18
          >= 3: recurr 27, no rec 10

Tree induced by Assistant Professional.
Interesting: the accuracy of this tree compared to medical specialists.


Slide #8.

Prostate cancer recurrence

Secondary Gleason Grade
  1,2: No
  3:   PSA Level
         <= 14.9: No
         > 14.9:  Stage
                    T1ab, T3:           No
                    T1c, T2a, T2b, T2c: Yes
  4:   Primary Gleason Grade
         2,3: No
         4:   Yes
  5:   Yes


Slide #9.

TDIDT Algorithm

• Also known as ID3 (Quinlan)
• To construct decision tree T from learning set S:
  – If all examples in S belong to some class C
    Then make leaf labeled C
  – Otherwise
    • select the “most informative” attribute A
    • partition S according to A’s values
    • recursively construct subtrees T1, T2, ..., for the subsets of S
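A minimal Python sketch of this recursion (illustrative names, not the authors' implementation; how the “most informative” attribute is chosen is discussed on the following slides):

from collections import Counter

def tdidt(examples, attributes, select_attribute):
    """Sketch of TDIDT/ID3 following this slide (illustrative names).

    examples         -- list of (attribute_value_dict, class_label) pairs (the set S)
    attributes       -- attribute names still available for splitting
    select_attribute -- returns the "most informative" attribute (see later slides)
    """
    classes = [c for _, c in examples]
    # If all examples in S belong to some class C, make a leaf labeled C
    if len(set(classes)) == 1:
        return classes[0]
    # If no attributes remain, fall back to the majority class (a common tie-break)
    if not attributes:
        return Counter(classes).most_common(1)[0][0]
    # Otherwise: select attribute A, partition S by A's values,
    # and recursively build a subtree T_i for each subset
    a = select_attribute(examples, attributes)
    subtrees = {}
    for v in {ex[0][a] for ex in examples}:
        subset = [ex for ex in examples if ex[0][a] == v]
        subtrees[v] = tdidt(subset, [b for b in attributes if b != a], select_attribute)
    return {a: subtrees}

With an information-gain selector plugged in (see the later slides), this reproduces the sailing tree shown earlier.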


Slide #10.

TDIDT Algorithm

• Resulting tree T is:

[Diagram: a root node testing attribute A, with branches for A’s values v1, v2, ..., vn leading to subtrees T1, T2, ..., Tn]


Slide #11.

Another Example

 #   Outlook    Temperature  Humidity  Windy  Play (class)
 1   sunny      hot          high      no     N
 2   sunny      hot          high      yes    N
 3   overcast   hot          high      no     P
 4   rainy      moderate     high      no     P
 5   rainy      cold         normal    no     P
 6   rainy      cold         normal    yes    N
 7   overcast   cold         normal    yes    P
 8   sunny      moderate     high      no     N
 9   sunny      cold         normal    no     P
10   rainy      moderate     normal    no     P
11   sunny      moderate     normal    yes    P
12   overcast   moderate     high      yes    P
13   overcast   hot          normal    no     P
14   rainy      moderate     high      yes    N


Slide #12.

Simple Tree

Outlook
  sunny: Humidity
    high:   N
    normal: P
  overcast: P
  rainy: Windy
    yes: N
    no:  P


Slide #13.

Complicated Tree

[Figure: a much larger tree induced from the same data, with Temperature at the root and repeated tests on Outlook, Windy, and Humidity in its subtrees]


Slide #14.

Attribute Selection Criteria

• Main principle
  – Select attribute which partitions the learning set into subsets as “pure” as possible
• Various measures of purity
  – Information-theoretic
  – Gini index
  – χ2
  – ReliefF
  – ...
• Various improvements
  – probability estimates
  – normalization
  – binarization, subsetting


Slide #15.

Information-Theoretic Approach

• To classify an object, a certain amount of information is needed
  – I, information
• After we have learned the value of attribute A, we only need some remaining amount of information to classify the object
  – Ires, residual information
• Gain
  – Gain(A) = I – Ires(A)
• The most “informative” attribute is the one that minimizes Ires, i.e., maximizes Gain


Slide #16.

Entropy

• The average amount of information I needed to classify an object is given by the entropy measure
• For a two-class problem:

[Plot: entropy as a function of p(c1)]
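The formula itself appeared as an image on the original slide; in standard notation (a reconstruction, not copied from the slide) the entropy measure is

  I = -\sum_{c} p(c) \log_2 p(c)

and for a two-class problem

  I = -p(c_1) \log_2 p(c_1) - p(c_2) \log_2 p(c_2)

The plot referred to above shows this quantity as a function of p(c_1); it peaks at 1 bit when the two classes are equally likely.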


Slide #17.

Residual Information

• After applying attribute A, S is partitioned into subsets according to the values v of A
• Ires is equal to the weighted sum of the amounts of information for the subsets
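The formula on this slide was also an image; the standard definition it describes (a reconstruction) is

  I_{res}(A) = -\sum_{v \in values(A)} p(v) \sum_{c} p(c \mid v) \log_2 p(c \mid v)

i.e., the entropy of each subset weighted by the probability p(v) of landing in it.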


Slide #18.

Triangles and Squares

 #   Color   Outline  Dot  Shape (class)
 1   green   dashed   no   triangle
 2   green   dashed   yes  triangle
 3   yellow  dashed   no   square
 4   red     dashed   no   square
 5   red     solid    no   square
 6   red     solid    yes  triangle
 7   green   solid    no   square
 8   green   dashed   no   triangle
 9   yellow  solid    yes  square
10   red     solid    no   square
11   green   solid    yes  square
12   yellow  dashed   yes  square
13   yellow  solid    no   square
14   red     dashed   yes  triangle


Slide #19.

Triangles and Squares

(the same 14-example data set as on the previous slide)

Data Set: a set of classified objects

[Figure: the 14 objects drawn as triangles and squares]


Slide #20.

Entropy

[Figure: the 14 objects]

• 5 triangles
• 9 squares
• class probabilities
• entropy
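The class probabilities and the entropy value were shown as images; recomputed from the data set above (a reconstruction) they are

  p(triangle) = 5/14,  p(square) = 9/14
  I = -(5/14) \log_2 (5/14) - (9/14) \log_2 (9/14) \approx 0.940 bits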


Slide #21.

Entropy reduction by data set partitioning

[Figure: the data set split by Color? into red, green, and yellow subsets]


Slide #22.

Entropy of the attribute’s values

[Figure: the Color? split, with the entropy of each of the red, green, and yellow subsets]
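The per-subset entropies were images as well; recomputed from the data (a reconstruction consistent with the gain quoted two slides later) they are

  red (2 triangles, 3 squares):    I \approx 0.971 bits
  green (3 triangles, 2 squares):  I \approx 0.971 bits
  yellow (0 triangles, 4 squares): I = 0 bits

  I_{res}(Color) = (5/14) \cdot 0.971 + (5/14) \cdot 0.971 + (4/14) \cdot 0 \approx 0.694 bits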


Slide #23.

Information Gain

[Figure: the Color? split, annotated with the information gain for Color]
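The gain computation itself was an image; from the values above (a reconstruction)

  Gain(Color) = I - I_{res}(Color) \approx 0.940 - 0.694 \approx 0.246 bits

which matches the value quoted on the next slide.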


Slide #24.

Information Gain of the Attribute

• Attributes
  – Gain(Color) = 0.246
  – Gain(Outline) = 0.151
  – Gain(Dot) = 0.048
• Heuristic: the attribute with the highest gain is chosen
• This heuristic is local (local minimization of impurity)


Slide #25.

[Figure: the data set split by Color?; the gains below are computed within one of the resulting subsets]

Gain(Outline) = 0.971 – 0 = 0.971 bits
Gain(Dot) = 0.971 – 0.951 = 0.020 bits


Slide #26.

[Figure: the partially built tree, with an Outline? split already added; the gains below are computed within another subset]

Gain(Outline) = 0.971 – 0.951 = 0.020 bits
Gain(Dot) = 0.971 – 0 = 0.971 bits


Slide #27.

[Figure: the tree so far: Color? at the root, with the red branch split on Dot? (yes/no) and the green branch split on Outline? (solid/dashed)]


Slide #28.

Decision Tree

Color
  red: Dot
    yes: triangle
    no:  square
  green: Outline
    dashed: triangle
    solid:  square
  yellow: square


Slide #29.

A Defect of Ires

• Ires favors attributes with many values
• Such an attribute splits S into many subsets, and if these are small, they will tend to be pure anyway
• One way to rectify this is through a corrected measure, the information gain ratio


Slide #30.

Information Gain Ratio

• I(A) is the amount of information needed to determine the value of attribute A
• Information gain ratio
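The two formulas on this slide were images; the standard (C4.5-style) definitions they refer to, consistent with the numbers two slides later, are

  I(A) = -\sum_{v \in values(A)} p(v) \log_2 p(v)
  GainRatio(A) = Gain(A) / I(A)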


Slide #31.

Information Gain Ratio

[Figure: the Color? split, used to compute the gain ratio for Color]
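Recomputed from the data (a reconstruction)

  I(Color) = -(5/14) \log_2 (5/14) - (5/14) \log_2 (5/14) - (4/14) \log_2 (4/14) \approx 1.58 bits
  GainRatio(Color) = 0.247 / 1.58 \approx 0.156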


Slide #32.

Information Gain and Information Gain Ratio

A        |v(A)|   Gain(A)   GainRatio(A)
Color    3        0.247     0.156
Outline  2        0.152     0.152
Dot      2        0.048     0.049


Slide #33.

Gini Index

• Another sensible measure of impurity (i and j are classes)
• After applying attribute A, the resulting Gini index is
• Gini can be interpreted as expected error rate
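The formulas were lost with the slide images; the definitions that reproduce the GiniGain values on the last slide (a reconstruction, with each unordered pair of classes counted once) are

  Gini(S) = \sum_{i < j} p(c_i) p(c_j)
  Gini_{res}(A) = \sum_{v} p(v) \, Gini(S_v)

For two classes this is half of the more familiar form 1 - \sum_i p(c_i)^2.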


Slide #34.

Gini Index

[Figure: the 14 objects, with the Gini index of the whole data set]
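With the definition above, recomputed for the whole data set (a reconstruction)

  Gini(S) = (5/14)(9/14) \approx 0.230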


Slide #35.

Gini Index for Color

[Figure: the Color? split, with the Gini index of each of the red, green, and yellow subsets]
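Recomputed with the same definition (a reconstruction)

  red subset:    (2/5)(3/5) = 0.24
  green subset:  (3/5)(2/5) = 0.24
  yellow subset: 0

  Gini_{res}(Color) = (5/14) \cdot 0.24 + (5/14) \cdot 0.24 + (4/14) \cdot 0 \approx 0.171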


Slide #36.

Gain of Gini Index
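The formula and the worked example on this slide were images; with the definitions above (a reconstruction)

  GiniGain(A) = Gini(S) - Gini_{res}(A)
  GiniGain(Color) \approx 0.058

matching the value in the table on the next slide.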


Slide #37.

Three Impurity Measures

A        Gain(A)   GainRatio(A)   GiniGain(A)
Color    0.247     0.156          0.058
Outline  0.152     0.152          0.046
Dot      0.048     0.049          0.015

• These impurity measures assess the effect of a single attribute
• The “most informative” criterion that they define is local (and “myopic”)
• It does not reliably predict the effect of several attributes applied jointly
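As a cross-check, the three columns can be recomputed from the triangles-and-squares data. A minimal Python sketch (illustrative names throughout, using the pairwise Gini definition from the Gini Index slide) that reproduces the table up to rounding:

from math import log2

# Sketch: recomputing the table above from the triangles-and-squares data.
# Variable and function names are illustrative; the Gini definition (one term
# per unordered pair of classes) is the one that matches the GiniGain column.
data = [  # (Color, Outline, Dot, Shape); Shape is the class
    ("green",  "dashed", "no",  "triangle"), ("green",  "dashed", "yes", "triangle"),
    ("yellow", "dashed", "no",  "square"),   ("red",    "dashed", "no",  "square"),
    ("red",    "solid",  "no",  "square"),   ("red",    "solid",  "yes", "triangle"),
    ("green",  "solid",  "no",  "square"),   ("green",  "dashed", "no",  "triangle"),
    ("yellow", "solid",  "yes", "square"),   ("red",    "solid",  "no",  "square"),
    ("green",  "solid",  "yes", "square"),   ("yellow", "dashed", "yes", "square"),
    ("yellow", "solid",  "no",  "square"),   ("red",    "dashed", "yes", "triangle"),
]
attrs = {"Color": 0, "Outline": 1, "Dot": 2}

def class_probs(examples):
    labels = [e[-1] for e in examples]
    return [labels.count(c) / len(labels) for c in set(labels)]

def entropy(examples):
    return -sum(p * log2(p) for p in class_probs(examples) if p > 0)

def gini(examples):
    ps = class_probs(examples)
    return sum(ps[i] * ps[j] for i in range(len(ps)) for j in range(i + 1, len(ps)))

def split(examples, a):
    # Partition the examples according to the values of attribute a
    return [[e for e in examples if e[attrs[a]] == v]
            for v in {e[attrs[a]] for e in examples}]

def residual(examples, a, impurity):
    # Weighted sum of subset impurities: I_res(A) or Gini_res(A)
    return sum(len(s) / len(examples) * impurity(s) for s in split(examples, a))

for a in attrs:
    gain = entropy(data) - residual(data, a, entropy)
    i_a = -sum(len(s) / len(data) * log2(len(s) / len(data)) for s in split(data, a))
    gini_gain = gini(data) - residual(data, a, gini)
    print(f"{a:8} Gain={gain:.3f}  GainRatio={gain / i_a:.3f}  GiniGain={gini_gain:.3f}")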