Slide #1.

Where are we at? Perfecting FAUST Clustering (using distance-dominated functional gap analysis). The primary functional is the dot-product projection, F_d(x) = DPP_d(x) = x o d, computed for every row x1, x2, ..., xN of the cluster X. Why optimize the variance of F(X)? Because if there is low dispersion, there can't be lots of large gaps. But just because there is high dispersion does not mean there IS a large gap (a simple example follows).
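A minimal sketch (Python, not from the slides) of the DPP functional and of the dispersion-vs-gap point above: both toy data sets below are widely dispersed, but only one has a large interior gap, so a high variance of F(X) is necessary but not sufficient for finding a gap.

import numpy as np

def dpp(X, d):
    # DPP_d(X): dot-product projection of every row of X onto the unit vector d
    d = d / np.linalg.norm(d)
    return X @ d

def max_gap(F):
    # largest gap between consecutive sorted projection values
    v = np.sort(F)
    return float(np.max(np.diff(v))) if len(v) > 1 else 0.0

gapped = np.array([[0.], [1.], [2.], [10.], [11.], [12.]])    # big interior gap
spread = np.array([[0.], [2.4], [4.8], [7.2], [9.6], [12.]])  # evenly spread, no gap
d = np.array([1.0])
for name, X in [("gapped", gapped), ("spread", spread)]:
    F = dpp(X, d)
    print(name, "variance =", round(float(F.var()), 2), "max gap =", max_gap(F))
# Both variances are large, but only 'gapped' shows a large gap (8 vs 2.4).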


Slide #2.

CONC4150 (C, W, FA, A); akk = (104320, 30955, 605471, 4683). d = hill-climbed unit gradient starting at (0, 0, 1, 0): F(d) rises 11376, 12002, 12318, 12493, 12598, 12665, 12708, 12736 as d moves to (-.33, -.09, .86, -.37).
[Figure residue: F-value histograms (FX, Ct, gp4, (F-MN)/8) and V(d) hill-climb traces for each round. First cut: C1 = [0,22) with 0L 14M 0H; C2 = [22,66) with 43L 38M 55H. Second round on C2 (akk = 106405, 30207, 613481, 3653): C2.1 = [0,19) with 0L 10M 1H; C2.2 = [19,61) with 43L 28M 54H. Third round on C2.2 (akk = 104883, 29672, 618463, 2618): C2.2.1 = [0,18) with 32L 13M 0H; C2.2.2 = [18,40) with 11L 10M 47H; C2.2.3 = [40,49) with 0L 3M 6H; [49,51) with 0L 2M 1H.]
The method fails on CONCRETE4150. On the next slide I investigate whether that failure might become a success if a different starting point is used. I will try using d = akk/|akk|.
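A sketch (Python, my own reading of the hill climb above, not the author's exact update rule) of climbing V(d) = Var(DPP_d(X)) over unit vectors d, starting from d = (0,0,1,0) as in the trace. It uses the fact that the gradient of Var(Xd) in d is proportional to Cov(X)·d; the loader name is hypothetical.

import numpy as np

def variance_of_projection(X, d):
    # V(d) = variance of DPP_d(X) = variance of X @ d, for a unit vector d
    d = d / np.linalg.norm(d)
    return float((X @ d).var())

def hill_climb_unit_gradient(X, d0, step=0.1, iters=100, tol=1e-6):
    # climb V(d) on the unit sphere; the gradient of d'Cov(X)d is 2*Cov(X)@d
    d = d0 / np.linalg.norm(d0)
    C = np.cov(X, rowvar=False)
    best = variance_of_projection(X, d)
    for _ in range(iters):
        cand = d + step * (C @ d)
        cand = cand / np.linalg.norm(cand)      # stay on the unit sphere
        v = variance_of_projection(X, cand)
        if v <= best + tol:
            break
        d, best = cand, v
    return d, best

# Usage sketch (attribute order C, W, FA, A as above; loader is hypothetical):
# X = load_concrete4150()
# d, v = hill_climb_unit_gradient(X, np.array([0., 0., 1., 0.]))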


Slide #3.

CONCRETE4150, akk = (104320, 30955, 605471, 4683): try starting the hill climb at d = akk/|akk| = (0.17, 0.05, 0.98, 0.01) instead of (0, 0, 1, 0). V(d) rises 9327, 10920, 11594, ..., 12773 as d moves to (-.40, -.09, .84, -.36).
[Figure residue: hill-climb trace (d1 d2 d3 d4 V(d)) and F-value histogram (FX, Ct, gp4, (F-MN)/8); first cut C1 = [0,21) with 0L 14M 1H.]
From here it fails again. CONC4150 (C, W, FA, A).


Slide #4.

IRIS4150: start at d = akk/|akk|; akk = (3414, 933, 1398, 144), so d = (0.90, 0.24, 0.37, 0.04) with V(d) = 180; the climb moves d toward (0.24, -0.05, 0.94, 0.23) with V(d) around 405-411.
[Figure residue: F-value histograms (FX, Ct, gp4, (F-MN)/8) and per-round akk / V(d) hill-climb traces. First cut: C1 = [0,23) with 50 setosa, 0 versicolor, 1 virginica; C2 = [23,69) with 0 setosa, 50 versicolor, 49 virginica. Further rounds on C2 and its subclusters give, e.g., C2.1 = [0,7) 0 setosa 4 versicolor 0 virginica; C2.2 with 4 versicolor 44 virginica; C2.3 with 0 setosa 0 versicolor 5 virginica; and C2.2 splits into [0,7) 0 setosa 19 versicolor 1 virginica, [7,21) 0 setosa 32 versicolor 13 virginica, [21,23) 0 setosa 0 versicolor 4 virginica, [23,36) 0 setosa 0 versicolor 27 virginica, plus several small mixed pieces.]


Slide #5.

Should we maximize variance? Consider eight sequences of 11 values each (their consecutive differences, CDs, are shown beside them):

sequence                      consecutive differences   std   var   mean  avgCD  maxCD  |mean-median|
0 1 2 3 4 5 6 7 8 9 10        1 1 1 1 1 1 1 1 1 1       3.16  10.0  5.00  1.00    1     0.00
0 0 0 0 0 0 0 0 0 0 10        0 0 0 0 0 0 0 0 0 10      2.87   8.3  0.91  1.00   10     0.91
0 5 5 5 5 5 5 5 5 5 10        5 0 0 0 0 0 0 0 0 5       2.13   4.5  5.00  1.00    5     0.00
0 0 2 2 4 4 6 6 8 8 10        0 2 0 2 0 2 0 2 0 2       3.20  10.2  4.55  1.00    2     0.55
0 0 0 3 3 3 6 6 6 9 10        0 0 3 0 0 3 0 0 3 1       3.35  11.2  4.18  1.00    3     1.18
0 0 0 0 6 6 6 6 9 9 10        0 0 0 6 0 0 0 3 0 1       3.82  14.6  4.73  1.00    6     1.27
0 0 0 0 0 9 9 9 9 9 10        0 0 0 0 9 0 0 0 0 1       4.57  20.9  5.00  1.00    9     4.00
0 0 0 0 0 0 10 10 10 10 10    0 0 0 0 0 10 0 0 0 0      4.98  24.8  4.55  1.00   10     4.55

MEAN-MEDIAN picks out the last two sequences, which have the best gaps (discounting outlier gaps at the extremes). Sooo... to find a good unit vector d for the Dot Product Projection functional, DPP, that maximizes gaps:

Maximize, w.r.t. d, |Mean(DPP_d(X)) - Median(DPP_d(X))| subject to Σ_{i=1..n} d_i² = 1.

Mean(DPP_d(X)) = (1/N) Σ_{i=1..N} Σ_{j=1..n} x_{i,j} d_j = Σ_{j=1..n} ( (1/N) Σ_{i=1..N} x_{i,j} ) d_j = Σ_{j=1..n} X̄_j d_j.

But how do we compute Median(DPP_d(X))? We want to use only pTree processing, and we want to end up with a formula involving d and numbers only (like the one above for the mean, which involves only the vector d and the column means X̄_1, ..., X̄_n). A heuristic is to substitute the Vector of Medians (VOM) for Median(DPP_d(X))???
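A small sketch (Python, an illustration of the two formulas above, not slide code): the mean of DPP_d(X) needs only the column means and d, and the VOM heuristic replaces the true median of the projections by the projection of the Vector of Medians.

import numpy as np

def mean_of_dpp(X, d):
    # Mean(DPP_d(X)) = sum_j Xbar_j * d_j : only the column means and d are needed
    return float(X.mean(axis=0) @ d)

def vom(X):
    # Vector of Medians: column-wise medians, the pTree-friendly surrogate for the median
    return np.median(X, axis=0)

def mean_minus_vom(X, d):
    # heuristic objective |Mean(DPP_d(X)) - DPP_d(VOM)|, with d normalized to a unit vector
    d = d / np.linalg.norm(d)
    return abs(mean_of_dpp(X, d) - float(vom(X) @ d))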


Slide #6.

0 1 VOM mean on IRIS150(SL,SW,PL,PW) and 5 1 10 1 k 7 2 11 1 spread out F values using G=(F-minF)*2 (since 13 2 12 2 16 3 13 7 3 PL 17 1 F-minF ranges over [0,60]). 14 12 19 1 15 14 20 2 16 7 21 1 17 4 22 1 18 1 Splitting at the MaxGap=24 first, [76,100] 23 2 _________[0.25) 50 setosa 1 virginica CLUS1 19 2 24 5 30 1 25 1 CLUS2 33 2 26 2 35 2 Split at next hi MaxGap=6, [7,13], [70,76] (outliers) 27 1 36 1 28 2 37 1 29 1 38 1 30 2 39 3 Split at thinning [39,44] 31 1 40 5 32 2 41 3 33 1 42 4 34 2 43 2 35 4 44 4 36 3 45 8 37 2 46 3 _________[0.39) 2 vericolor 44 virginica CLUS_1.1 38 1 47 5 _________[25,49) 46 versicolor 2 virginica CLUS2.1 39 2 48 3 40 1 49 5 CLUS2.2 41 2 50 4 42 1 51 8 43 1 52 2 5 versicolor 3 virginica CLUS_1.2 44 1 _________[49,54) 4 versicolor 17 virginica CLUS2.2.1 53 2 45 . 4 54 2 [45,76) 42 versicolor 1 virginica CLUS_1.3 46 3 CLUS2.2.2 55 3 47 4 56 6 48 3 57 3 49 2 58 3 50 1 59 2 51 2 60 2 53 3 61 3 54 3 63 1 56 3 64 1 57 2 66 1 58 4 67 2 59 1 _________[54,70) 4 versicolor 30 virginica 69 1 60 1 61 1 CLUS2.2 F[49,69] 47_Virginica with 4_Versicolor errors; 63 1 65 1 CLUS2.1 F[25,48] 46_Versicolor with 2_Virginica errors; 66 1 68 1 70 2 CLUS1 F[0, 25] 50_Setosa with 1 Virginica error. _________[0.88) 50 versicolor 48 virginica CLUS_1 76 1 100 1 So the classification accuracy is 143/150 or 95.3% 50 setosa 2 virginica CLUS_2 102 1 103 1 105 1 107 3 d=VOMMN DPP on IRIS_150_SEI_(SL,SW,PL,PW). 108 6 109 2 110 8 CLUS1.1 F[0,39] 44_Virginica with 2_Versicolor errors; 111 8 112 5 CLUS1.3 F[45,76] 42_Versicolor with 1_Virginica error; 113 7 114 1 CLUS2 F[80,120] 50_Setosa with 2_Virginica errors; and 115 3 116 1 CLUS1.2 F[39,44], if classified as 5_Versicolor has 3_Virginica errors. 117 1 119 1 120 1 So the classification accuracy is 142/150 or 94.6% Choosing d=e to be the k w max std (Here d=e = e on IRIS150 IRIS


Slide #7.

VOMmean on WINE150hl(FA,FS,TS,AL) 0 1 1 4 2 4 3 5 4 3 5 8 6 7 7 1 8 7 _________[0 9 2 . 10) 42 low 0 high CLUS_1.1.1.1.1.1 _________[0 . 19) 56 low 10 5 [10,16) 15 low 12 high CLUS 1.1.1.1.1.2 [19,22) 1 low 12 high 11 4 But no algorithm would pick 19 as a cut! ___12 _____[0 . 13) 56 low 5 1 low 12 high But no alg would pick a 13 cut! 13 7 [13,16) 14 3 _________[0 15 3 . 16) 57 low 12 high CLUS 1.1.1.1.1 35 high CLUS 1.1.1.1.2 17 2 [16,31) 0 low 18 5 19 4 20 3 21 1 22 4 23 4 24 5 25 1 26 2 27 1 29 2 30 1 31 1 _________[0 . 31) 57 low 47 high CLUS_1.1.1.1.1 32 1 28 high CLUS 1.1.1.1.2 34 1 [31,58) 0 low 36 1 37 3 38 1 39 2 40 1 43 6 44 4 46 2 47 1 48 1 50 1 52 1 _________[0 .58) 57 low 75 high CLUS_1.1.1.1 56 1 5 high CLUS 1.1.1.2 60 1 [58,70) 0 low 63 1 65 2 _________[0 .70) 57 low 80 high CLUS_1.1.1 67 1 [70,78) 0 low 2 high CLUS 1.1.2 72 1 _________[0 .78) 57 low 82 high CLUS_1.1 74 1 82 1 7 high CLUS1.2 83 1 [78,94) 0 low 85 1 86 1 87 2 _________[0.94) 57 low 89 high CLUS_1 88 1 99 1 0 low 4 high CLUS_2 105 1 113 1 119 1 d=VOMMEAN DPP on WINE_150_HL_(FA,FSO2,TSO2,ALCOHOL). Some agglomeration required: CLUS1.1.1.1.1.1 is LOW_Quality F[0,10], else HIGH Quality F[13,119] with 15 LOW error. Classification accuracy = 90% (if it had been cut 13, 99.3% accuracy!) 7 1 8 4 STDs=(1.9,9,23,1.2) 9 4 maxSTD=23 for 10 5 11 4 d=e TS on WN150hl(FA,FS,TS,AL 12 7 13 7 14 8 _________[0 CLUS 1.1.1.1.1.1 15 2 . 16) 42 low 12 high CLUS 1.1.1.1.1.2 16 5 [16,22) 15 low 17 4 18 5 . 19 7 20 3 _________[0 . 22) 57 low 12 high CLUS_1.1.1.1.1 21 3 32 high CLUS 1.1.1.1.2 23 2 [22,33) 0 low 24 9 25 3 26 1 27 4 28 4 29 5 30 1 31 2 _________[0 . 33) 57 low 44 high CLUS_1.1.1.1.1 32 1 31 high CLUS 1.1.1.1.2 34 2 [33,60) 0 low 35 1 36 1 37 1 39 1 41 1 42 3 43 1 44 2 45 1 47 6 48 4 49 2 50 1 51 1 53 1 55 1 _________[0 .60) 57 low 75 high CLUS_1.1.1.1 59 1 5 high CLUS 1.1.1.2 63 1 [60,72) 0 low 65 1 67 2 _________[0.72) 57 low 89 high CLUS1.1.1 69 1 74 1 [72,80] high CLUS1.1 CLUS1.1.2 _________[0.80) 57 low 892 high 75 1 84 1 7 high CLUS1.2 85 1 [80,95] 86 1 87 1 88 2 _________[0.95) 57 low 89 high CLUS1 89 1 100 1 4 high CLUS2 106 1 113 1 119 1 Identical cuts and accuracy! Tells us that d=eTotal_SO2 is responsible for all separation. WINE


Slide #8.

VOMmean w G=DPP(xod*10) SEED4150(AREA,LENKER,ASYMCOEF,LENKERGRV) 0 1 3 1 6 4 9 3 11 1 12 7 14 1 15 3 16 2 _________[0.19) 0 Kama 17 10 20 10 [19,62) 50 Kama _________[19,23) 0 Kama 22 1 24 1 [23,62) 50 Kama 25 1 26 2 27 1 28 4 _________[23,30) 6 Kama 29 1 31 3 [30,62) 44 Kama _________[30,33) 5 Kama 32 3 34 3 [33,62) 39 Kama _________[33,36) 6 Kama 35 4 CLUS_1.2.2.2.2.1 37 2 38 2 [36,62) 33 Kama CLUS_1.2.2.2.2.2 39 3 40 1 41 2 42 4 43 5 _________[36,45) 18 Kama 44 1 46 5 [45,62) 15 Kama 48 2 _________[45,50) 8 Kama 49 2 _________ 51 3 [50,52) 0 Kama 53 1 _________ 54 4 [52,55) 3 Kama 56 3 _________ 57 3 [55,58) 3 Kama CLUS_1.2.2.2.2.2.2.2.2.2.1 59 5 _________[0.62) Kama 1 Kama 60 1 [58,62) 50 1.2.2.2.2.2.2.2.2.2.2 64 1 [62,89) 0 Kama 66 1 67 1 69 1 70 2 71 1 72 1 73 3 75 4 78 8 81 6 84 1 11 errors, 85 2 87 1 88 1 [0,14) 1 Kama 42 Canada But no algorithm would pick 14 as a cut! 0 Rosa 33 Canada 16 Rosa 17 Canada 0 Rosa 11 Canada 16 Rosa 6 Canada CLUS_1.1 CLUS_1.2 CLUS_1.2.1 CLUS_1.2.2 0 Rosa 16 Rosa 0 Rosa 16 Rosa 0 Rosa 4 Canada 2 Canada 1 Canada 1 Canada 1 Canada 18 Kama . CLUS_1.2.2.1 [14,15) CLUS_1.2.2.2 But no algorithm would cut at 15. CLUS_1.2.2.2.1 [15,16) 13 Kama 2 Rosa no alg w cut. CLUS_1.2.2.2.2 [16,17) 7 Kama 6 Risa 16 Rosa 0 Canada 2 Rosa 14 Rosa 0 Canada 0 Canada CLUS_1.2.2.2.2.2.1 CLUS_1.2.2.2.2.2.2 1 Rosa 3 Rosa 2 Rosa 0 Canada 0 Canada 0 Canada CLUS_1.2.2.2.2.2.2.1 CLUS_1.2.2.2.2.2.2.2.1 CLUS_1.2.2.2.2.2.2.2.2.1 3 Rosa 0 Canada 16 Rosa 50 Canada 5 Rosa 0 Canada 34 Rosa 0 Canada [13,14) 10 Kama 8 Canada . That's either 8 or 10 errors and no algorithm would cut at 14. STDs=(2.9, .6, 1.6, .5)) maxSTD=2.9 for e1 d=eA SEED4150(A,LK,AC,LCG) 11 18 12 25 . 13 18 14 18 15 15 _________[0.17) 49 Kama 8 Rosa 50 Canada CLUS_1 16 13 CLUS_2 17 8 [17,22) 1 Kama 42 Rosa 0 Canada 18 8 19 21 20 2 But that's the only thinning! 21 4 Therefore, we are unable to separate Kama and Canada at all. CLUS_1 CLUS CLUS_2 so accuracy = 93% SEEDS


Slide #9.

VOMmean w F=(DPP-MN)/4 Concrete4150(C, W, FA, Ag) 0 1 1 1 5 1 6 1 7 1 _________[0,9) 8 4 9 1 [9,14) 10 1 11 2 12 1 13 5 14 1 [14,18) 15 3 16 3 17 4 18 1 [18,23) 19 3 20 9 21 4 22 3 23 7 24 2 [23,31) 25 4 26 8 27 7 28 7 29 10 30 3 31 1 [31,36) 32 3 33 6 34 4 35 5 37 2 [36,39) 38 2 40 1 [39,52) 42 3 43 1 44 1 45 1 46 4 ______ 49 1 56 1 [52,90) 58 1 61 1 65 1 66 1 69 1 71 1 77 1 80 1 83 1 _________[0.90) 86 1 100 1 [90,113) 103 1 105 1 108 2 112 1 e4 accuracy rate = 104/150= 69% 2 Low 1 Low 5 Medium 3 Medium 5 High 5 High CLUS_1.1.1.1.1.1.1.1.1 CLUS_1.1.1.1.1.1.1.1.2 5 Low 2 Medium 4 High 6 Low 1 Medium 13 High CLUS_1.1.1.1.1.1.2 25 Low 4 Medium 19 High CLUS_1.1.1.1.1.2 3 Low 7 Medium 11 High CLUS_1.1.1.1.2 1 Low 1 Medium 2 High 0 Low 12 Medium 1 High 0 Low 11 Medium CLUS_1.1.1.1.1.1.1.2 . 0 High CLUS_1.1.1.2 CLUS_1.1.2 CLUS_1.2 43 Low 46 Medium 55 High CLUS_1 0 Low 6 Medium 0 High CLUS_2 d=e4 Conc4150 ________=0 0 17 ________=1 1 11 ________=3 3 12 ________=6 6 35 _______ ________=13 13 25 _______ 22 25 =22 24 8 =24 44 7 ________=44 67 ________=67 4 89 2 ________=89 91 4 13 Lo 2 Lo 12 Lo 13 Lo 3 Lo 0 Lo 0 Lo 0 Lo 0 Lo 0 Lo 4 Med 9 Med 0 Med 5 Med 3 Med 6 Med 8 Med 7 Med 4 Med 4 Med 0 4 STD=(101,28,99,81) d=e1 Conc4150 6 3 7 2 8 Low 7 Medium 0 High CLUS_1.1.1.1.1.2 12 10 [9,16) 13 2 14 3 CLUS_1.1.1.1.2 18 9 [16,31) 18 Low 11 Medium 0 High 20 4 22 5 23 3 24 5 27 3 1 Low 3 Medium 3 High CLUS_1.1.1.2 31 3 [31,39) . 36 4 5 Low 5 Medium 7 High CLUS_1.1.2 41 2 [39,52) 42 1 43 4 44 3 46 3 48 2 ______ 49 2 55 13 4 Low 17 Medium 38High CLUS_1.2 58 8 [52,80) 60 6 Entirely inconclusive using e ! 62 5 1 65 4 71 16 72 4 ________[0.80) 43Low 46Medium 55High CLUS_1 74 3 7 High CLUS_2 82 4 [80,101) 0 Low 7 Medium 83 7 97 2 100 1 VOM MN 0 Hi 0 Hi 0 Hi 17 Hi 19 Hi 19 Hi 0 Hi 0 Hi 0 Hi 0 Hi CLUS_11 CLUS_10 CLUS_9 CLUS_8 CLUS_7 . CLUS_6 . CLUS_5 CLUS_3 CLUS_1 CLUS_2 d=e3 Conc4150 0 15 3 2 ________ 5 4 17 3 19 8 21 1 29 3 ________ 41 28 46 3 47 8 48 3 52 4 ________ 53 15 58 3 62 4 63 4 64 1 65 7 67 3 69 4 72 3 73 12 75 2 78 5 83 1 100 4 2,4 [0.9) [9,32) 3 Low 1 Low 16 Medium 8 Medium 2 High CLUS_1.1 6 High CLUS_1.2 [0.32) 4 Low 24 Medium 8 High CLUS_1 [32,101) 39 Low 28 Medium 47 High CLUS_2 [32,55) 21 Low 12 Medium 28 High CLUS_2.1 [55,101) 1 Low 8 Medium 6 High CLUS_2.2 0 2 3 4 ________[0,5) 3 Lo 1 Med 2 Hi CLUS_2 8 4 ________=8 0 Lo 1 Med 3 Hi CLUS_3 10 2 ________=10 0 Lo 2 Med 0 Hi CLUS_4 12 8 ________[10,14) 1 Lo 4 Med 9 Hi CLUS_5 13 4 15 4 16 14 17 3 18 1 19 3 ________[14,21) 9 Lo 9 Med 15 Hi CLUS_6 20 8 22 15 ________[21,25) 3 Lo 1 Med 15 Hi CLUS_7 24 4 ________[25,28) 5 Lo 1 Med 3 Hi CLUS_8 27 9 29 3 e2 accuracy= 30 6 31 3 Inconclusive on e2! 93/150=62% 32 3 33 7 34 4 d=e2 ________[28,36) 14 Lo 12 Med 8 Hi CLUS_9 35 8 0 Hi CLUS_10 5 [36.40) 7 Lo 1 Med Conc4150 37 38 3 0 1 2 3 4 5 6 7 8 9 10 11 12 13 18 19 20 21 29 30 31 32 34 35 36 62 64 93 121 125 2 5 6 12 2 1 6 6 1 4 12 11 5 3 2 9 10 4 1 4 9 4 9 4 1 2 5 4 2 4 2,4 =0 2L 0M 0H C14 =1 1L 4M 0H C13 [2,4) 11L 6M 1H C12 =4 0L 2M 0H C11 [5,9) 14L 0M 0H C10 [9,11) 3L 0 M 13H C9 =11 =12 =13 [15,25) 5L 5L 0L 2L 2 M 4H 0 M 0H 3 M 0H 4 M 19H C8 C7 C6 C5 [25,33) 0L 0 M 18H C4 [33,50) 0L 14 M 0H C3 [50,93) 0L 7 M 0H C2 [93,m) 0L 10M 0H C1 Accuracy= 127/150=85% CONCRETE


Slide #10.

Concrete4150(C, W, FA, Ag): redo without cheating ;-) Even though the accuracy is high, no algorithm would make all of those cuts. Using VOM_{2,4} - MN_{2,4}, cut only at gaps >= 5 on the first round; then iteratively repeat on each subcluster.
First round: C5 = [0,15) 41L 17M 18H; C4 = [15,25) 2L 4M 19H (gap=5); C3 = [25,50) 0L 14M 18H (gap=8); C2 = [50,78) 0L 7M 0H (gap=26); C1 = [78,max) 0L 10M 0H (gap=29). Accuracy = 100% on C1 and C2, so we skip them and concentrate on C3, C4, C5 to see if a second round will purify them.
[Figure residue: second-round F = (F-MN)/4 and (F-MN)/3 histograms for C5, C4 and C3 with their sub-cuts and outlier notes.] Second round on C5: accuracy = 100%. Second round on C3: 1 error (one Medium at F=31 falls in a High subcluster). So there is but 1 error across C1, C2, C3 and C5 (in the C3 step), for an accuracy of 149/150 = 99.3%.
However, I realized I am still cheating ;-( How would I know to use VOM_{2,4} - MN_{2,4} for the first round instead of VOM_{1,2,3,4} - MN_{1,2,3,4}? I need to redo this using all 4 attributes. Another issue: how can we follow this with an agglomeration step which might glue the intra-class subclusters back together? Agglomerate after FAUST Gap Clustering, using the "separation of the subcluster medians" [or means?] as the measure?!?
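A sketch (Python, my own rendering of the "cut at gaps >= threshold, then recurse on each subcluster" procedure above; the projection function is passed in so each subcluster can recompute its own VOM - mean direction).

import numpy as np

def gap_cuts(F, min_gap):
    # cut points placed in the middle of every gap >= min_gap among the sorted F-values
    v = np.sort(np.unique(F))
    return [(v[i] + v[i + 1]) / 2.0 for i in np.where(np.diff(v) >= min_gap)[0]]

def recursive_gap_cluster(X, project, min_gap, depth=0, max_depth=3):
    # round 1: cut at all gaps >= min_gap; then repeat on each resulting subcluster
    if len(X) < 2 or depth >= max_depth:
        return [X]
    F = project(X)
    cuts = gap_cuts(F, min_gap)
    if not cuts:
        return [X]
    clusters, edges = [], [-np.inf] + cuts + [np.inf]
    for lo, hi in zip(edges[:-1], edges[1:]):
        sub = X[(F >= lo) & (F < hi)]
        if len(sub):
            clusters += recursive_gap_cluster(sub, project, min_gap, depth + 1, max_depth)
    return clusters

# Usage sketch: project each subcluster onto its own (VOM - mean) direction.
def vom_mean_projection(X):
    d = np.median(X, axis=0) - X.mean(axis=0)
    return X @ (d / np.linalg.norm(d))
# clusters = recursive_gap_cluster(X, vom_mean_projection, min_gap=5)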


Slide #11.

VOMmean w F=(DPP-MN)/4 Concrete4150(C, W, FA, Ag) 0 1 1 1 5 1 6 1 7 1 8 4 med=14 9 1 10 1 11 2 12 1 13 5 14 1 15 3 med=18 16 3 17 4 18 1 19 3 20 9 21 4 22 3 23 7 24 2 med=40 25 4 26 8 27 7 28 7 med=56 29 10 30 3 31 1 32 3 33 6 med=61 34 4 35 5 37 2 38 2 40 1 42 3 43 1 44 1 45 1 46 4 ______ CLUS 4 gap=7 49 1 56 1 [52,74) 0L 7M 0H CLUS_3 58 1 61 1 65 1 66 1 69 1 ______ gap=6 71 1 77 1 [74,90) 0L 4M 0H CLUS_2 80 1 83 1 ________ gap=14 86 1[0.90) 43L 46 M 55H 100 1 [90,113) 0L 6M 0H CLUS_1 103 1 105 1 108 2 112 1 _____________At this level, FinalClus1={17M} 0 errors C1 C2 C3 C4 med=10 med=9 med=17 med=21 med=23 med=34 med=33 med=57 med=62 med=71 med=71 med=86 Redo with all 4 attributes and Fgap5 (which is actually gap=5*4=20). CLUS 4 (F=(DPP-MN)/2, Fgap2 _______ 0L 0M 3H CLUS 4.4.1 gap=7 0 3 =0 0L 0M 4H CLUS 4.4.2 gap=2 7 4 =7 9 1 [8,14] 1L 5M 22H CLUS 4.4.3 1L+5M err H 10 12 11 8 gap=3 12 7 ______ 0L 0M 4H CLUS 4.3.1 gap=3 15 4 =15 18 10 0L 0M 10H CLUS 4.3.2 gap=3 21 3 =18 22 7 ______ 23 2 [20,24) 0L 10M 2H CLUS 4.7.2 gap=2 25 2 [24,30) 10L 0M 0H CLUS_4.7.1 26 3 27 1 28 2 gap=2 29 1 31 3 CLUS 4.2.1 gap=2 32 1 [30,33] 0L 4M 0H Avg=32.3 34 2 0L 2M 0H CLUS 4.2.2 gap=6 40 4 =34 ______ 0L 4M 0H CLUS_4.2.3 gap=7 47 3 =40 52 1 0L 3M 0H CLUS_4.2.4 gap=5 53 3 =47 54 3 55 4 56 2 57 3 ______ gap=2 58 1 [50,59) 12L 1M 4H CLUS 4.8.1 L60 2 8L 0M 0H CLUS_4.8.2 61 2 [59,63) gap=2 62 4 ______ =64 2L 0M 2H CLUS 4.6.1 gap=3 64 4 [66,70) 10L 0M 0H CLUS 4.6.2 67 2 gap=3 68 1 71 7 ______ gap=7 72 3 [70,79) 10L 0M 0H CLUS_4.5 79 5 5L 0M 0H CLUS_4.1.1 gap=6 85 1 =79 87 2 [74,90) 2L 0M 1H CLUS_4.1 1 Merr in L Median=0 Avg=0 Median=7 Avg=7 Median=11 Avg=10.7 Median=15 Avg=15 Median=18 Avg=18 Median=22 Avg=22 2H errs in L Median=26 Avg=26 Median=31 Median=34 Avg=34 Median=40 Avg=40 Median=47 Avt=47 Accuracy=90% Median=55 Avg=55 1M+4H errs in Median=61.5 Avg=61.3 Median=64 Avg=64 2 H errs in L Median=67 Avg=67.3 Median=71 Avg=71.7 Median=79 Avg=79 Median=87 Avg=86.3 Let's review agglomerative clustering in general next (dendograms) Agglomerate (build dendogram) by iteratively gluing together clusters with min Median separation. Should I have normalize the rounds? Should I have used the same Fdivisor and made sure the range of values was the same in 2nd round as it was in the 1st round (on CLUS 4)? Can I normalize after the fact, I by multiplying 1st round values by 100/88=1.76? Agglomerate the 1st round clusters and then independently agglomerate 2nd round clusters? CONCRETE


Slide #12.

Hierarchical Clustering
[Dendrogram diagram residue: leaves A, B, C, D, E, F, G; internal nodes BC, DE, FG, ABC, DEFG; root above them.]
Any maximal anti-chain (a maximal set of nodes in which no 2 are directly connected, i.e., no node is an ancestor of another) is a clustering (a dendrogram offers many).
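A small sketch (Python with scipy, a hypothetical 1-D example, not from the slide) showing one particular anti-chain: cutting a single-link dendrogram at one height gives a "horizontal" anti-chain, i.e., one of the many clusterings the tree offers.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# seven 1-D points standing in for the leaves A..G of the dendrogram above
X = np.array([[0.], [1.], [2.], [10.], [11.], [20.], [21.]])
Z = linkage(X, method="single")               # single-link dendrogram

# cut the whole tree at height 5: a "horizontal" maximal anti-chain
labels = fcluster(Z, t=5.0, criterion="distance")
print(labels)    # three clusters: {A,B,C}, {D,E}, {F,G}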


Slide #13.

Hierarchical Clustering. But the "horizontal" anti-chains are the clusterings resulting from the top-down (or bottom-up) method(s).


Slide #14.

VOMmean w F=(DPP-MN)/4 Concrete4150(C, W, FA, Ag) 0 1 1 1 5 1 6 1 7 1 8 4 med=14 9 1 10 1 11 2 12 1 13 5 14 1 15 3 med=18 16 3 17 4 18 1 19 3 20 9 21 4 22 3 23 7 24 2 med=40 25 4 26 8 27 7 28 7 med=56 29 10 30 3 31 1 32 3 33 6 med=61 34 4 35 5 37 2 38 2 40 1 42 3 43 1 44 1 45 1 46 4 ______ CLUS 4 gap=7 49 1 56 1 [52,74) 0L 7M 0H CLUS_3 58 1 61 1 65 1 66 1 69 1 ______ gap=6 71 1 77 1 [74,90) 0L 4M 0H CLUS_2 80 1 83 1 ________ gap=14 86 1[0.90) 43L 46 M 55H 100 1 [90,113) 0L 6M 0H CLUS_1 103 1 105 1 108 2 112 1 _____________At this level, FinalClus1={17M} 0 errors C1 C2 C3 C4 med=10 med=9 med=17 med=21 med=23 med=34 med=33 med=57 med=62 med=71 med=71 med=86 CLUS 4 (F=(DPP-MN)/2, Fgap2 _______ 0L 0M 3H CLUS 4.4.1 gap=7 0 3 =0 0L 0M 4H CLUS 4.4.2 gap=2 7 4 =7 9 1 [8,14] 1L 5M 22H CLUS 4.4.3 1L+5M err H 10 12 11 8 gap=3 12 7 ______ 0L 0M 4H CLUS 4.3.1 gap=3 15 4 =15 18 10 0L 0M 10H CLUS 4.3.2 gap=3 21 3 =18 22 7 ______ 23 2 [20,24) 0L 10M 2H CLUS 4.7.2 gap=2 25 2 [24,30) 10L 0M 0H CLUS_4.7.1 26 3 27 1 28 2 gap=2 29 1 31 3 CLUS 4.2.1 gap=2 32 1 [30,33] 0L 4M 0H Avg=32.3 34 2 0L 2M 0H CLUS 4.2.2 gap=6 40 4 =34 ______ 0L 4M 0H CLUS_4.2.3 gap=7 47 3 =40 52 1 0L 3M 0H CLUS_4.2.4 gap=5 53 3 =47 54 3 55 4 56 2 57 3 ______ gap=2 58 1 [50,59) 12L 1M 4H CLUS 4.8.1 L60 2 8L 0M 0H CLUS_4.8.2 61 2 [59,63) gap=2 62 4 ______ =64 2L 0M 2H CLUS 4.6.1 gap=3 64 4 [66,70) 10L 0M 0H CLUS 4.6.2 67 2 gap=3 68 1 71 7 ______ gap=7 72 3 [70,79) 10L 0M 0H CLUS_4.5 79 5 5L 0M 0H CLUS_4.1.1 gap=6 85 1 =79 87 2 [74,90) 2L 0M 1H CLUS_4.1 1 Merr in L Median=0 Avg=0 Median=7 Avg=7 Median=11 Avg=10.7 Median=15 Avg=15 Median=18 Avg=18 Median=22 Avg=22 2H errs in L Median=26 Avg=26 Median=31 Median=34 Avg=34 Median=40 Avg=40 Median=47 Avt=47 Accuracy=90% Median=55 Avg=55 1M+4H errs in Median=61.5 Avg=61.3 Median=64 Avg=64 2 H errs in L Median=67 Avg=67.3 Median=71 Avg=71.7 Median=79 Avg=79 Median=87 Avg=86.3 Suppose we know (or want) 3 clusters, Low, Medium and High Strength. Then we find Suppose we know that we want 3 strength clusters, Low, Medium and High. We can use an antichain that gives us exactly 3 subclusters two ways, one show in brown and the other in purple Which would we choose? The brown seems to give slightly more uniform subcluster sizes. Brown error count: Low (bottom) 11, Medium (middle) 0, High (top) 26, so 96/133=72% accurate. The Purple error count: Low 2, Medium 22, High 35, so 74/133=56% accurate. What about agglomerating using single link agglomeration (minimum pairwise distance? Agglomerate (build dendogram) by iteratively gluing together clusters with min Median separation. Should I have normalize the rounds? Should I have used the same Fdivisor and made sure the range of values was the same in 2nd round as it was in the 1st round (on CLUS 4)? Can I normalize after the fact, I by multiplying 1st round values by 100/88=1.76? Agglomerate the 1st round clusters and then independently agglomerate 2nd round clusters? CONCRETE


Slide #15.

Agglomerating using single link (minimum pairwise distance = minimum gap size): glue the minimum-gap adjacent clusters first.
[Same CLUS 4 second-round breakdown (F = (DPP-MN)/2, F-gap >= 2, subclusters CLUS 4.1 ... 4.8 with medians, averages and error notes, accuracy = 90%) as on Slide #11.]
The first thing we notice is that outliers mess up agglomerations which are supervised by knowledge of the number of subclusters expected. Therefore we might remove outliers by backing away from all gap >= 5 agglomerations, then looking for a 3-subcluster maximal anti-chain. What we have done is to declare F < 7 and F > 84 as extreme tripleton outlier sets, and F = 79, F = 40 and F = 47 as singleton outlier sets, because they are F-gapped by at least 5 (which is actually 10) on either side.
The brown anti-chain gives more uniform sizes. Brown errors: Low (bottom) 8, Medium (middle) 12 and High (top) 6, so 107/133 = 80% accurate. The one decision to agglomerate C4.7.1 to C4.7.2 (gap=3) instead of C4.3.2 to C4.7.2 (gap=3) causes lots of error. C4.7.1 and C4.7.2 are problematic since they do separate out, but in increasing F-order the pattern is H M L M L, so if we suspected this pattern we would look for 5 subclusters. The 5 orange errors in increasing F-order are 6, 2, 0, 0, 8, so 127/133 = 95% accurate. If you have ever studied concrete, you know it is a very complex material. The fact that it clusters out with an F-order pattern of HMLML is just bizarre! So we should expect errors.
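A sketch (Python, my own rendering of the rule above, not slide code): on a 1-D projection, the single-link distance between adjacent subclusters is just the gap between their F-intervals, so agglomeration repeatedly merges the adjacent pair with the smallest gap until the desired number of clusters remains.

def single_link_agglomerate(intervals, target_k):
    """intervals: sorted list of (lo, hi) F-value intervals, one per subcluster."""
    intervals = sorted(intervals)
    while len(intervals) > target_k:
        # gap between each pair of adjacent subclusters
        gaps = [intervals[i + 1][0] - intervals[i][1] for i in range(len(intervals) - 1)]
        i = gaps.index(min(gaps))                      # closest adjacent pair
        merged = (intervals[i][0], intervals[i + 1][1])
        intervals = intervals[:i] + [merged] + intervals[i + 2:]
    return intervals

# e.g. single_link_agglomerate([(0, 5), (8, 14), (20, 24), (30, 33)], target_k=2)
# merges (0,5)+(8,14) first (gap 3), then (0,14)+(20,24) (gap 6), leaving two clusters.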


Slide #16.

meanVOM w F=(DPP-MN)/4 Concrete4150(C, W, FA, Ag) F Ct gap (gap5) 0 2 1 2 1 ___ 0L 6M 0H 2 2 ___1 [0,20) 24 1 22 25 2 1 ___ 26 1 ___1 [20,30) 0L 4M 0H 45 2 19 46 2 1 47 1 1 ___ ___ 48 2 1 [30,40) 0L 7M 0H 66 7 18 67 2 1 68 1 1 69 3 1 70 9 1 71 2 1 72 7 1 73 1 1 75 1 2 76 1 1 77 5 1 78 2 1 79 7 1 80 8 1 81 2 1 82 1 1 83 3 1 84 14 1 85 1 1 86 6 1 87 16 1 88 7 1 89 1 1 90 4 1 91 6 1 92 4 1 93 8 1 94 3 1 95 1 1 C1 C2 C3 C4 Redo: Weight d with |MN-VOM|/Len C4 Ct gap (range is ~2 so gaps3) 0 1 20 1 20 ___ 1 ___9 [0,30) 0L 3M 0H C41 29 36 1 7 ___ 1 ___1 [30,40) 0L 2M 0H C42 37 ___ 1 ___5 [40,43) 0L 0M 1H C47 42 44 1 2 45 1 1 ___ 2 ___1 [43,48) 0L 4M 0H C48 46 49 2 3 [48,105) 43L 23M 51H C49 51 1 2 53 1 2 54 2 1 56 1 2 57 2 1 58 3 1 59 3 1 60 2 1 61 4 1 62 5 1 63 2 1 64 4 1 65 2 1 66 2 1 67 2 1 68 4 1 69 2 1 70 1 1 71 3 1 72 1 1 74 1 2 75 1 1 76 4 1 77 6 1 79 1 2 80 1 1 81 2 1 83 6 2 84 1 1 85 3 1 86 1 1 87 1 1 88 10 1 89 1 1 90 1 1 91 1 1 92 1 1 93 2 1 94 5 1 95 3 1 96 3 1 97 1 1 98 1 1 99 4 1 100 2 1 102 2 __ 2 ___ . 103 3 __ 1 [105,110) ___ 0L 0M 2H C46 . 107 2 4 [110,115) 0L 1M 0H C45 . ___ 112 1 __ 5 ___ 0L 0M 1H C44 . 118 1 __ 6 [115,120) 0L 2M 0H C43 122 2 4 [120,123) C49=(f-mn)/5 F Ct gap ___ 0 1 ___ 10 1 10 11 1 1 12 1 1 13 3 1 ___ 14 1 ___1 17 2 3 18 1 1 19 1 1 20 2 1 22 4 2 23 3 1 24 5 1 25 4 1 26 2 1 27 4 1 28 1 1 29 6 1 30 2 1 31 2 1 32 1 1 33 1 1 35 1 2 36 1 1 37 3 1 38 2 1 39 4 1 40 2 1 41 2 1 42 2 1 43 2 1 44 5 1 45 2 1 46 1 1 47 3 1 48 4 1 49 6 1 50 3 1 52 2 2 54 8 2 55 2 1 56 2 1 57 2 1 58 1 1 60 3 2 ___ 61 1 ___1 ___ 64 2 ___3 74 1 ___ 10 ___ 76 1 2 (same range so gaps3) [0,5) 0L 1M 0H C491 [5,15) 2L 4M 1H C492 [15,62) 41L 17M 47H C493 [62,70) 0L 0M 2H C494 [70,77) 0L 1M 1H C495 1 err This uncovers the fact that repeated applications of meanVOM can be non-productive when each applications basically removes sets of outliers at the extremes of the F-value array (because when outliers are removed, the VOM may move toward the mean). CONCRETE


Slide #17.

APPENDIX
Functional Gap Clustering using F_{p,q}(x) = RND[(x-p)o(q-p)/|q-p| - minF] on the Spaeth image data (p = the average of X; q runs over the points z1, z2, z3, ...).
[Figure residue: the 15-point data set X (x1, x2 coordinates), the 15 Value_Arrays and 15 Count_Arrays (one for each q = z1..zf), the level-0 point-set mask pTrees, and the F_{p=MN,q=z1} gap analysis, where gaps at F=6 and F=10 split X into 3 z1-clusters whose mask pTrees (z11, z12, z13) are obtained by ORing point masks.]
The FAUST algorithm:
1. Project onto each pq line using the dot product with the unit vector from p to q.
2. Generate the ValueArrays (also generate the CountArray and the mask pTrees).
3. Analyze all gaps and create sub-cluster pTree masks.
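A compact sketch (Python, an array-based stand-in for the pTree version, illustrating steps 1-3 above): it computes F_{p,q}, builds the value/count arrays, and returns one boolean mask per gap-separated subcluster.

import numpy as np

def faust_gap_masks(X, p, q, min_gap=2):
    # step 1: project onto the p->q line with the unit vector from p to q
    u = (q - p) / np.linalg.norm(q - p)
    F = np.rint((X - p) @ u).astype(int)
    F -= F.min()                                        # the "- minF" shift
    # step 2: ValueArray and CountArray
    values, counts = np.unique(F, return_counts=True)
    # step 3: analyze all gaps and build one sub-cluster mask per gapped interval
    cuts = [(values[i] + values[i + 1]) / 2.0
            for i in range(len(values) - 1)
            if values[i + 1] - values[i] >= min_gap]
    edges = [-np.inf] + cuts + [np.inf]
    return [(F >= lo) & (F < hi) for lo, hi in zip(edges[:-1], edges[1:])]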


Slide #18.

Gap Revealer. Z = the 15 Spaeth points z1..zf; d = M - p (M the mean); F = z o d = 11, 27, 23, 34, 53, 80, 118, 114, 125, 114, 110, 121, 109, 125, 83.
We look for gaps of width >= 2^4 = 16, so we compute all pTree combinations down to p4 and p4' (the bit slices above position 4 and their complements); each combination masks one 16-wide F-interval, and its count is obtained by pTree ANDs (no looping over x is required).
[000 xxxx] = [0,16): 1 point, z1 (F=11), which is only 5 units from the right edge, so z1 is not yet declared an outlier. Next we check the minimum of the next interval (the min calculation is also a pTree process): [001 xxxx] = [16,32) has minimum z3 with F=23, 7 units from its left edge, so z1's right-side gap is only 5+7 = 12 < 16 and z1 is declared a 2^4 inlier.
[010 xxxx] = [32,48): z4 (F=34) is within 2 of 32, so z4 is not declared an anomaly. [011 xxxx] = [48,64): z5 (F=53) is 19 from z4 (> 2^4) and 11 from 64, but the next interval [100 xxxx] = [64,80) is empty, so z5 is at least 27 from its right neighbor; z5 is declared an outlier and we put a subcluster cut through z5. [64,80) itself is clearly a 2^4 gap.
[101 xxxx] = [80,96): z6 (F=80) and zf (F=83); [110 xxxx] = [96,112): zb (F=110) and zd (F=109). {z6, zf} is declared a doubleton outlier set, since it is gapped by >= 16 on both sides. [111 xxxx] = [112,128): z7, z8, z9, za, zc, ze, with no 2^4 gaps. But we can consult SpS(d^2(x,y)), the actual pairwise distances, which reveal a 5.8 gap between {z7,z8,z9,za} and {zb,zc,zd,ze}; that analysis is messy, and the gap would be revealed by the next round on this subcluster anyway.
[Residue: the pTree bit-slice columns p6..p0 and their complements, and the pairwise-distance tables for the points with F in [80,128).]
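A small sketch (Python, an array stand-in for the pTree AND computations, not slide code): bucket the F-values by their bits above position k (F // 2^k); an empty bucket is a guaranteed 2^k gap, and near-empty buckets flag outlier candidates whose exact gaps still need checking, exactly as in the walk-through above.

import numpy as np

def reveal_gaps_by_bit_slices(F, k=4):
    width = 1 << k                               # 2**k = 16 for k=4
    buckets = np.asarray(F) // width             # pTree version: AND p6, p5, p4 / complements
    counts = np.bincount(buckets, minlength=int(buckets.max()) + 1)
    empty = np.where(counts == 0)[0]             # guaranteed 2**k gaps, e.g. [64,80)
    singles = np.where(counts == 1)[0]           # outlier candidates such as z1, z4, z5
    return counts, empty, singles

F = [11, 27, 23, 34, 53, 80, 118, 114, 125, 114, 110, 121, 109, 125, 83]
print(reveal_gaps_by_bit_slices(F, k=4))
# counts = [1 2 1 1 0 2 2 6]: bucket 4 (= [64,80)) is empty; buckets 0, 2, 3 are singletons.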


Slide #19.

Barrel Clustering (this method attempts to build barrel-shaped gaps around clusters). It allows a better fit around convex clusters that are elongated in one direction (not round).
Exhaustive search for all barrel gaps takes two parameters for a pseudo-exhaustive search (exhaustive modulo a grid width):
1. A StartPoint, p (an n-vector, so n-dimensional).
2. A UnitVector, d (an n-direction, so (n-1)-dimensional: a grid on the surface of the sphere in R^n).
Then for every choice of (p, d) (e.g., in a grid of points in R^{2n-1}), two functionals are used to enclose subclusters in barrel-shaped gaps:
a. the SquareBarrelRadius functional, SBR(y) = (y-p)o(y-p) - ((y-p)od)^2
b. the BarrelLength functional, BL(y) = (y-p)od
Gaps in the dot-product lengths [projections] on the line (the BL values) give the barrel-cap gap widths; gaps in the SBR values give the barrel-radius gap widths.
Given a p, do we need a full grid of ds (directions)? No! d and -d give the same BL-gaps. Given d, do we need a full grid of p starting points? No! All p' such that p' = p + cd give the same gaps. So hill-climb the gap width from a good starting point and direction.
MATH: we need the dot-product projection length and the dot-product projection distance.
Projection length of y on f: (yof)/|f|; the projection vector is ((yof)/(fof)) f.
Squared projection distance of y from the line through the origin in direction f: |y - ((yof)/(fof)) f|^2 = yoy - 2(yof)^2/(fof) + (yof)^2/(fof) = yoy - (yof)^2/(fof).
Squared projection distance of y-p from the line from p to q: (y-p)o(y-p) - ((y-p)o(q-p))^2 / ((q-p)o(q-p)).
For the dot-product length projections (caps) we already needed (y-p)o(M-p)/|M-p| = ( yo(M-p) - po(M-p) ) / |M-p|.
That is, we need to compute the constant terms and the dot-product functionals in an optimal way (and then do the PTreeSet additions/subtractions/multiplications). What is optimal? Minimizing PTreeSet functional creations and PTreeSet operations.
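A minimal sketch (Python, not from the slide) of the two barrel functionals; cap gaps are gaps in the sorted BL values and radial gaps are gaps in the sorted SBR values.

import numpy as np

def barrel_functionals(Y, p, d):
    # BL(y)  = (y-p) o d                      : length along the barrel axis
    # SBR(y) = (y-p)o(y-p) - ((y-p)od)**2     : squared radial distance from the axis
    d = d / np.linalg.norm(d)
    V = Y - p
    BL = V @ d
    SBR = np.einsum("ij,ij->i", V, V) - BL ** 2
    return BL, SBR

# d and -d give the same BL-gaps, and any p' = p + c*d gives the same gaps,
# so the (p, d) search grid can be reduced before hill-climbing the gap width.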


Slide #20.

Cone Clustering (finding cone-shaped clusters): F = (y-M)o(x-M)/|x-M| - mn, restricted to a cosine cone, on IRIS.
[Figure residue: F-value histograms for cones anchored at corner points x = s1, s2, e1, i1 and at vectors such as maxs, maxs-to-mins, naaa-xaaa, xnnn-nxxx, aaan-aaax, with cone thresholds .1, .54, .707, .9, .925, .93, .939, .95. Sample tallies: x=s2, cone=.1 captures 14 i and 100 s/e, so it picks out i; w maxs cone=.93: 27/29 are i's; w maxs cone=.925: 31/34 are i's; w xnnn-nxxx cone=.95: 43/50 are i's; w aaan-aaax cone=.54: 100/104 are s or e, so it picks out i.]
Gap in the dot-product projections onto the corner-points line; cosine cone gap (over some angle). Cosine conical gapping seems quick and easy (cosine = dot product divided by both lengths). The length of the fixed vector, x-M, is a one-time calculation. The length |y-M| changes with y, so build the PTreeSet.
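A small sketch (Python, not slide code) of the cosine-cone membership test: keep the points y whose angle at M toward a fixed point x has cosine at least the cone threshold; |x-M| is computed once, |y-M| per point.

import numpy as np

def cone_mask(Y, M, x, cone=0.707):
    f = x - M
    f_len = np.linalg.norm(f)                 # one-time calculation
    V = Y - M
    v_len = np.linalg.norm(V, axis=1)         # changes with y (a PTreeSet in the slides)
    cos = (V @ f) / (f_len * v_len)           # cosine = dot product / both lengths
    return cos >= cone

# Gaps in the F-values of the points inside the cone are then analyzed as usual.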


Slide #21.

FAUST Classifier: separate class_r and class_v using the midpoint of their means. Project onto a direction d and cut at the midpoint of the two projected class means; the class-r mask is then P_r = P(x o d < cut), and class v is the other side.
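A small sketch (Python, my reading of the midpoint-of-means rule above; the choice d = (mean_v - mean_r)/|mean_v - mean_r| is an assumption, since the slide fragment does not name the projection direction).

import numpy as np

def fit_midpoint_of_means(Xr, Xv):
    # project onto the line between the class means and cut at the midpoint
    mr, mv = Xr.mean(axis=0), Xv.mean(axis=0)
    d = (mv - mr) / np.linalg.norm(mv - mr)        # assumed projection direction
    cut = float(((mr + mv) / 2.0) @ d)
    return d, cut

def is_class_r(x, d, cut):
    # P_r side of the cut: x o d < cut -> class r, otherwise class v
    return float(x @ d) < cut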


Slide #22.

2/2/13 Datamining Big Data

Big data: up to trillions of rows (or more) and, possibly, thousands of columns (or many more). I structure data vertically (pTrees) and process it horizontally. Looping across thousands of columns can be orders of magnitude faster than looping down trillions of rows. So sometimes that means a task can be done in human time only if the data is vertically organized.

Data mining is [largely] CLASSIFICATION or PREDICTION (assigning a class label to a row based on a training set of classified rows). What about clustering and ARM? They are important and related! Roughly, clustering creates/improves training sets, and ARM is used to data mine more complex data (e.g., relationship matrices, etc.).

CLASSIFICATION is [largely] case-based reasoning. To make a decision we typically search our memory for similar situations (near-neighbor cases) and base our decision on the decisions we made in those cases (we do what worked before for us or others). We let near neighbors vote. "The Magical Number Seven, Plus or Minus Two... Information" [2] is cited to argue that the number of objects (contexts) an average human can hold in working memory is 7 ± 2. We can think of classification as providing a better 7 (so it's decision support, not decision making). One can say that all classification methods (even model-based ones) are a form of near-neighbor classification. E.g., in Decision Tree Induction (DTI) the classes at the bottom of a decision branch ARE the near-neighbor set, due to the fact that the sample arrived at that leaf.

Rows of an entity table (e.g., Iris(SL,SW,PL,PW) or Image(R,G,B)) describe instances of the entity (irises or image pixels). Columns are descriptive information on the row instances (e.g., Sepal Length, Sepal Width, Petal Length, Petal Width, or Red, Green, Blue photon counts). If the table consists entirely of real numbers, then the row set can be viewed [as a subset of] a real vector space with dimension = the number of columns. Then the notion of "near" [in classification and clustering] can be defined using a dissimilarity (~distance) or a similarity: two rows are near if the distance between them is low or their similarity is high. Near for columns can be defined using a correlation (e.g., Pearson's, Spearman's, ...).

If the columns also describe instances of an entity, then the table is really a matrix or relationship between instances of the row entity and the column entity. Each matrix cell measures some attribute of that relationship pair (the simplest: 1 if that row is related to that column, else 0; the most complex: an entire structure of data describing that pair, i.e., that row instance and that column instance). In Market Basket Research (MBR), the row entity is customers and the column entity is items; each cell is 1 iff that customer has that item in the basket. In Netflix Cinematch, the row entity is customers, the column entity is movies, and each cell has the 5-star rating that customer gave to that movie. In bioinformatics, the row entity might be experiments and the column entity might be genes, and each cell has the expression level of that gene in that experiment; or the row and column entities might both be proteins and each cell has a 1-bit iff the two proteins interact in some way. In Facebook the rows might be people and the columns might also be people (and a cell has a 1-bit iff the row and column persons are friends). Even when the table appears to be a simple entity table with descriptive feature columns, it may be viewable as a relationship between 2 entities.
E.g., Image(R,G,B) is a table of pixel instances with columns R, G, B. The R-values count the photons in a "red" frequency range detected at that pixel over an interval of time. That red frequency range is determined more by the camera technology than by any scientific definition. If we had separate CCD cameras that could count photons in each of a million very thin adjacent frequency intervals, we could view the column values of that image as instances of a frequency entity. Then the image would be a relationship matrix between the pixel and frequency entities. So an entity table can often be usefully viewed as a relationship matrix. If so, it can also be rotated, so that the former column entity is viewed as the new row entity and the former row entity is viewed as the new set of descriptive columns. The bottom line is that we can often data mine a table of data in many ways: as an entity table (classification and clustering), as a relationship matrix (ARM), or, upon rotation of that matrix, as another entity table. For a rotated entity table, the concepts of nearness that can be used also rotate (e.g., the cosine correlation of two columns morphs into the cosine of the angle between 2 vectors as a row similarity measure).


Slide #23.

DBs and DWs are merging as in-memory DBs: SAP® In-Memory Computing, Enabling Real-Time Computing. SAP® In-Memory enables real-time computing by bringing together online transaction processing, OLTP (DB), and online analytical processing, OLAP (DW). Combining advances in hardware technology with SAP In-Memory Computing empowers business – from shop floor to boardroom – by giving real-time business processes instantaneous access to data, eliminating today's information lag for your business.

In-memory computing is already under way. The question isn't if this revolution will impact businesses, but when and how. In-memory computing won't be introduced because a company can afford the technology; it will be because a business cannot afford to allow its competitors to adopt it first. Here is a sample of what in-memory computing can do for you:
• Enable mixed workloads of analytics, operations, and performance management in a single software landscape.
• Support smarter business decisions by providing increased visibility of very large volumes of business information.
• Enable users to react to business events more quickly through real-time analysis and reporting of operational data.
• Deliver innovative real-time analysis and reporting.
• Streamline the IT landscape and reduce total cost of ownership.

Product managers will still look at inventory and point-of-sale data, but in the future they will also receive, e.g., customers' broadcast dissatisfaction with a product over Twitter. Or they might be alerted to a negative product review released online that highlights some unpleasant product features requiring immediate action. From the other side, small businesses running real-time inventory reports will be able to announce to their Facebook and Twitter communities that a high-demand product is available, how to order, and where to pick up. Bad movies have been able to enjoy a great opening weekend before crashing in the second weekend, when negative word-of-mouth feedback cools enthusiasm. That week-long grace period is about to disappear for silver-screen flops. Consumer feedback won't take a week, a day, or an hour. The very second showing of a movie could suffer from a noticeable falloff in attendance due to consumer criticism piped instantaneously through the new technologies. It will no longer be good enough to have weekend numbers ready for executives on Monday morning. Executives will run their own reports on revenue, Twitter their reviews, and by Monday morning have acted on their decisions. The final example is from the utilities industry: the most expensive energy a utility provides is energy to meet unexpected demand during peak periods of consumption. If the company could analyze trends in power consumption based on real-time meter reads, it could offer customers – in real time – extra low rates for the week or month if they reduce their consumption during the following few hours. In manufacturing enterprises, in-memory computing technology will connect the shop floor to the boardroom, and the shop-floor associate will have instant access to the same data as the board [[shop floor = daily transaction processing; boardroom = executive data mining]]. The shop floor will then see the results of their actions reflected immediately in the relevant Key Performance Indicators (KPIs). This advantage will become much more dramatic when we switch to electric cars; predictably, those cars are recharged the minute the owners return home from work.

Hardware: blade servers and multicore CPUs, with memory capacities measured in terabytes.
Software: an in-memory database with highly compressible row/column storage designed to maximize in-memory computing technology. SAP BusinessObjects Event Insight software is key. In what used to be called exception reporting, the software deals with huge amounts of real-time data to determine immediate and appropriate action for a real-time situation. [[Both row and column storage! They convert to column-wise storage only for Long-Lived-High-Value data?]] Parallel processing takes place in the database layer rather than in the app layer, as it does in the client-server architecture.

Total cost is 30% lower than traditional RDBMSs due to:
• Leaner hardware and less system capacity required, as mixed workloads of analytics, operations, and performance management are in a single system, which also reduces redundant data storage. [[Back to a single DB rather than a DB for TP and a DW for boardroom decision support.]]
• Less extract-transform-load (ETL) between systems and fewer prebuilt reports, reducing the support required to run the software.

Report runtime improvements of up to 1000 times; compression rates of up to 10 times. Performance improvements are expected to be even higher in SAP apps natively developed for in-memory DBs. Initial results: a reduction of computing time from hours to seconds. However, in-memory computing will not eliminate the need for data warehousing. Real-time reporting will solve old challenges and create new opportunities, but new challenges will arise. SAP HANA 1.0 software supports real-time database access to data from the SAP apps that support OLTP. Formerly, operational reporting functionality was transferred from OLTP applications to a data warehouse. With in-memory computing technology, this functionality is integrated back into the transaction system. Adopting in-memory computing results in an uncluttered architecture based on a few, tightly aligned core systems enabled by service-oriented architecture (SOA) to provide harmonized, valid metadata and master data across business processes. Some of the most salient shifts and trends in future enterprise architectures will be:
• A shift to BI self-service apps like data exploration, instead of static report solutions.
• Central metadata and master-data repositories that define the data architecture, allowing data stewards to work across all business units and all platforms.

Real-time in-memory computing technology will cause a decline in Structured Query Language (SQL) satellite databases. The purpose of those databases as flexible, ad hoc, more business-oriented, less IT-static tools might still be required, but their offline status will be a disadvantage and will delay data updates. Some might argue that satellite systems with in-memory computing technology will take over from satellite SQL DBs. SAP Business Explorer tools that use in-memory computing technology represent a paradigm shift: instead of waiting for IT to work on a long queue of support tickets to create new reports, business users can explore large data sets and define reports on the fly.


Slide #24.

IRIS(SL,SW,PL,PW) -> DPP_{MinVec,MaxVec}
[Data residue: the 150 IRIS samples (setosa, versicolor, virginica) listed with their attribute values, their bit-sliced pTree encodings, and their DPP values.]


Slide #25.

Dot Product Projection (DPP): check F(y) = (y-p)o(q-p)/|q-p| for gaps or thin intervals, then check actual pairwise distances at the sparse ends of the F-range. To illustrate the DPP algorithm we use IRIS and see how close it comes to separating the 3 known classes (s=setosa, e=versicolor, i=virginica). We require a DPP gap of at least 4, and we check any sparse ends of the DPP range for outliers using a table of pairwise distances. We start with p = MinVector of the 4 column minimums and q = MaxVector of the 4 column maximums; in later rounds we replace some of those coordinates with the column average (p=aaax, q=aaan, then p=anxa, q=axna, where n=min, x=max, a=avg per coordinate).
Round 1 (p=nnnn, q=xxxx, gap>=4): the F-count histogram has gaps at [15,19] and [21,26]. Checking distances in [12,28] shows s16, i39, e49 and e11 are outliers and {e8,e44} is a doubleton outlier set; checking the sparse lower end [0,4] reveals s42 as an outlier (F(s42)=1, and it is at least 4 from the other low-F points). Separating at 17 and 23 gives CLUS1 = {F<17} = 50 Setosa (with s16 and s42 declared outliers), plus CLUS2 and CLUS3 above it.
Round 2, CLUS3 with outliers removed (p=aaax, q=aaan): no thinning. Checking the sparse low end [0,8] shows i30, i35 and i20 are outliers (each at least 4 from the rest) and {e34,i34} is a doubleton outlier set; no other sparse ends.
Round 3 (p=anxa, q=axna): a thinning at [6,7] splits the remainder at 6.5. Checking the sparse upper end [16,19] shows e15 is an outlier. Result: CLUS3.1 (F<6.5) = 42 Versicolor; CLUS3.2 (F>6.5) = 39 Virginica plus 2 Versicolor (the method is unable to separate these 2 Versicolor from the 39 Virginica).
(The supporting F-count histograms and pairwise-distance tables are not reproduced here.)
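Below is a minimal NumPy sketch of this check (the helper names dpp and gaps are mine; scikit-learn's load_iris returns the cm-scaled IRIS, whereas the slides use a x10 integer coding, so the slide's gap threshold of 4 becomes 0.4 here):

import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data                      # 150 x 4: SL, SW, PL, PW (in cm)

def dpp(X, p, q):
    # DPP_pq(y) = (y - p) o (q - p) / |q - p|, for every row y of X
    d = (q - p) / np.linalg.norm(q - p)
    return (X - p) @ d

def gaps(F, min_gap=0.4):
    # (lo, hi) pairs where consecutive sorted F-values differ by at least min_gap
    s = np.sort(F)
    return [(s[i], s[i + 1]) for i in range(len(s) - 1) if s[i + 1] - s[i] >= min_gap]

p, q = X.min(axis=0), X.max(axis=0)       # p = MinVector, q = MaxVector
F = dpp(X, p, q)
print(gaps(F))                            # candidate linear cluster breaks

# Outlier check at a sparse end: pairwise distances among the lowest-F points.
low = np.argsort(F)[:7]
D = np.linalg.norm(X[low, None, :] - X[None, low, :], axis=2)
print(np.round(D, 1))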


Slide #26.

"Gap Hill Climbing": mathematical analysis One way to increase the size of the functional gaps is to hill climb the standard deviation of the functional, F (hoping that a "rotation" of d toward a higher STDev would increase the likelihood that gaps would be larger ( more dispersion allows for more and/or larger gaps). We can also try to grow one particular gap or thinning using support pairs as follows: F-slices are hyperplanes (assuming F=dotd) so it would makes sense to try to "re-orient" d so that the gap grows. Instead of taking the "improved" p and q to be the means of the entire n-dimensional half-spaces which is cut by the gap (or thinning), take as p and q to be the means of the F-slice (n-1)-dimensional hyperplanes defining the gap or thinning. This is easy since our method produces the pTree mask of each F-slice ordered by increasing F-value (in fact it is the sequence of F-values and the sequence of counts of points that give us those value that we use to find large gaps in the first place.). The d2-gap is much larger than the d1=gap. It is still not the optimal gap though. Would it be better to use a weighted mean (weighted by the distance from the gap - that is weighted by the d-barrel radius (from the center of the gap) on which each point lies?) In this example it seems to make for a larger gap, but what weightings should be used? (e.g., 1/radius2) (zero weighting after the first gap is identical to the previous). Also we really want to identify the Support vector pair of the gap (the pair, one from one side and the other from the other side which are closest together) as p and q (in this case, 9 and a but we were just lucky to draw our vector through them.) We could check the d-barrel radius of just these gap slice pairs and select the closest pair as p and q??? 0 1 2 3 4 5 6 7 8 9 a b c d e f 1 0 2 3 4 5 6 7 8 =p 9 d 2-gap d2 d 1-g ap j d k c e m n r f s o g p h i d1 l q f e d c b a 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 a b c d e f 1 2 3 4 5p 6 7 8 9d 1-g ap d 2-gap a q=b d2 f e d c b a 9 8 7 6 5 4 3 2 1 0 a b d d1 j k qc e q f


Slide #27.

HILL CLIMBING GAP WIDTH. Starting from p=aaan, q=aaax, the F-count histogram gives CLUS1 = {F < 7} (50 Setosa), then [7,16) holding 4 Virginica + 48 Versicolor, and F > 16 holding 46 Virginica + 2 Versicolor.
Next we attempt to hill-climb the gap at 16 using the half-space averages: on CLUS2 union CLUS3 with p = avg{F < 16} and q = avg{F > 16} there are no conclusive gaps. Checking the sparse low end [0,9] shows i39, e49 and e11 are singleton outliers and {e8,e44} is a doubleton outlier set. Checking the sparse high end [38,47] shows i10, i18, i19, i32 and i36 are singleton outliers and {i6,i23} is a doubleton outlier set. There is a thinning at 22, and it is the same one as before, but it is not more prominent.
Next we attempt to hill-climb the gap at 16 using the means of the half-space boundary slices (i.e., p = avg of the F=14 slice, q = avg of the F=17 slice). On CL123 with this p and q, the gap between CLUS1 and CLUS2 is made more pronounced (why?), but the thinning between CLUS2 and CLUS3 seems even more obscure. Although this doesn't prove anything, it is not good news for the method: it did not grow the gap we wanted to grow (between CLUSTER2 and CLUSTER3).
(The F-count histograms and pairwise-distance tables supporting these checks are not reproduced here.)
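Spelled out, the two endpoint choices compared on this slide look like this (a sketch only; X and F are assumed to hold the CLUS2-union-CLUS3 points and their current integer-coded DPP values from the previous pass):

import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# (a) half-space averages around the gap at 16
p_half = X[F < 16].mean(axis=0)
q_half = X[F > 16].mean(axis=0)
F_half = (X - p_half) @ unit(q_half - p_half)

# (b) means of the boundary slices only (the F=14 and F=17 slices)
p_slice = X[np.rint(F) == 14].mean(axis=0)
q_slice = X[np.rint(F) == 17].mean(axis=0)
F_slice = (X - p_slice) @ unit(q_slice - p_slice)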


Slide #28.

CAINE 2013 Call for Papers. 26th International Conference on Computer Applications in Industry and Engineering, September 25-27, 2013, Omni Hotel, Los Angeles, California, USA. Sponsored by the International Society for Computers and Their Applications (ISCA). CAINE-2013 will feature contributed papers as well as workshops and special sessions. Papers will be accepted into oral presentation sessions. The topics will include, but are not limited to, the following areas: Agent-Based Systems; Image/Signal Processing; Autonomous Systems; Information Assurance; Big Data Analytics; Information Systems/Databases; Bioinformatics, Biomedical Systems/Engineering; Internet and Web-Based Systems; Computer-Aided Design/Manufacturing; Knowledge-Based Systems; Computer Architecture/VLSI; Mobile Computing; Computer Graphics and Animation; Multimedia Applications; Computer Modeling/Simulation; Neural Networks; Computer Security; Pattern Recognition/Computer Vision; Computers in Education; Rough Set and Fuzzy Logic; Computers in Healthcare; Robotics; Computer Networks; Fuzzy Logic Control Systems; Sensor Networks; Data Communication; Scientific Computing; Data Mining; Software Engineering/CASE; Distributed Systems; Visualization; Embedded Systems; Wireless Networks and Communication. Important Dates: Workshop/special session proposals May 25, 2013; Full paper submission June 5, 2013; Notification of acceptance July 5, 2013; Pre-registration and camera-ready paper due August 5, 2013; Event dates September 25-27, 2013.
SEDE Conf is interested in gathering researchers and professionals in the domains of SE and DE to present and discuss high-quality research results and outcomes in their fields. SEDE 2013 aims at facilitating cross-fertilization of ideas in Software and Data Engineering. The conference topics include, but are not limited to: Requirements Engineering for Data Intensive Software Systems. Software Verification and Model Checking. Model-Based Methodologies. Software Quality and Software Metrics. Architecture and Design of Data Intensive Software Systems. Software Testing. Service- and Aspect-Oriented Techniques. Adaptive Software Systems. Information System Development. Software and Data Visualization. Development Tools for Data Intensive Software Systems. Software Processes. Software Project Mgmt. Applications and Case Studies. Engineering Distributed, Parallel, and Peer-to-Peer Databases. Cloud Infrastructure, Mobile, Distributed, and Peer-to-Peer Data Management. Semi-Structured Data and XML Databases. Data Integration, Interoperability, and Metadata. Data Mining: Traditional, Large-Scale, and Parallel. Ubiquitous Data Management and Mobile Databases. Data Privacy and Security. Scientific and Biological Databases and Bioinformatics. Social Networks, Web, and Personal Information Management. Data Grids, Data Warehousing, OLAP. Temporal, Spatial, Sensor, and Multimedia Databases. Taxonomy and Categorization. Pattern Recognition, Clustering, and Classification. Knowledge Management and Ontologies. Query Processing and Optimization. Database Applications and Experiences. Web Data Mgmt and Deep Web. Dates: May 23, 2013 Paper Submission Deadline; June 30, 2013 Notification of Acceptance; July 20, 2013 Registration and Camera-Ready Manuscript. Conference Website: http://theory.utdallas.edu/SEDE2013/
ACC-2013 provides an international forum for presentation and discussion of research on a variety of aspects of advanced computing and its applications, and communication and networking systems.
Important Dates: May 5, 2013 - Special Sessions Proposal; June 5, 2013 - Full Paper Submission; July 5, 2013 - Author Notification; Aug. 5, 2013 - Advance Registration & Camera-Ready Paper Due.
CBR International Workshop on Case-Based Reasoning (CBR-MD 2013), July 19, 2013, New York, USA. Topics of interest include (but are not limited to): CBR for signals, images, video, audio and text; Similarity assessment; Case representation and case mining; Retrieval and indexing; Conversational CBR; Meta-learning for model improvement and parameter setting for processing with CBR; Incremental model improvement by CBR; Case-base maintenance for systems; Case authoring; Lifetime of a CBR system; Measuring coverage of case bases; Ontology learning with CBR. Submission Deadline: March 20th, 2013; Notification Date: April 30th, 2013; Camera-Ready Deadline: May 12th, 2013.
Workshop on Data Mining in Life Sciences (DMLS): Discovery of high-level structures, including e.g. association networks; Text mining from biomedical literature; Medical image mining; Biomedical signal mining; Temporal and sequential data mining; Mining heterogeneous data; Mining data from molecular biology, genomics, proteomics, phylogenetic classification. With regard to different methodologies and case studies: Data mining project development methodology for biomedicine; Integration of data mining in the clinic; Ontology-driven data mining in life sciences; Methodology for mining complex data, e.g. a combination of laboratory test results, images, signals, genomic and proteomic samples; Data mining for personal disease management; Utility considerations in DMLS, including e.g. cost-sensitive learning. Submission Deadline: March 20th, 2013; Notification Date: April 30th, 2013; Camera-Ready Deadline: May 12th, 2013; Workshop date: July 19th, 2013.
Workshop on Data Mining in Marketing (DMM'2013): In the business environment, data warehousing - the practice of creating huge, central stores of customer data that can be used throughout the enterprise - is becoming more and more common practice and, as a consequence, the importance of data mining is growing stronger. Topics: Applications in Marketing; Methods for User Profiling; Mining Insurance Data; E-Marketing with Data Mining; Logfile Analysis; Churn Management; Association Rules for Marketing Applications; Online Targeting and Controlling; Behavioral Targeting; Juridical Conditions of E-Marketing, Online Targeting and so on; Control of Online-Marketing Activities; New Trends in Online Marketing; Aspects of E-Mailing Activities and Newsletter Mailing. Submission Deadline: March 20th, 2013; Notification Date: April 30th, 2013; Camera-Ready Deadline: May 12th, 2013; Workshop date: July 19th, 2013.
Workshop on Data Mining in Agriculture (DMA 2013): Data Mining on Sensor and Spatial Data from Agricultural Applications; Analysis of Remote Sensor Data; Feature Selection on Agricultural Data; Evaluation of Data Mining Experiments; Spatial Autocorrelation in Agricultural Data. Submission Deadline: March 20th, 2013; Notification Date: April 30th, 2013; Camera-Ready Deadline: May 12th, 2013; Workshop date: July 19th, 2013.


Slide #29.

Claim: d is a unit vector iff $dd^T$ is, i.e. $|d|=1 \Leftrightarrow |dd^T|=1$ (taking the Frobenius norm of the $n\times n$ matrix $dd^T$). Indeed,
$|dd^T|^2 = \sum_{j=1..n}\sum_{i=1..n} d_i^2 d_j^2 = \Big(\sum_{i=1..n} d_i^2\Big)\Big(\sum_{j=1..n} d_j^2\Big) = |d|^4$,
so $|dd^T| = |d|^2$. Hence if $|d|=1$ then $|dd^T|=1$, and conversely if $|dd^T|=1$ then $|d|^2=1$, i.e. $|d|=1$.
Why this matters: with $F_d(x) = DPP_d(x) = x \circ d$, the variance of $F_d$ over $X$ is the quadratic form
$V(d) = Var(DPP_d X) = \sum_{i=1..n}\sum_{j=1..n} \big(\overline{X_iX_j} - \overline{X_i}\,\overline{X_j}\big)\, d_i d_j = d^T V d$,
where $V$ is the matrix with entries $V_{ij} = \overline{X_iX_j} - \overline{X_i}\,\overline{X_j}$ (overbars denote column averages over the rows $x_i$ of $X$). Equivalently, $V(d)$ is the component-wise (Frobenius) dot product of $V$ with the rank-one matrix $dd^T$, so maximizing $Var(DPP_d X)$ over unit vectors $d$ is maximizing the quadratic form $d^T V d$ on the unit sphere.
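A quick numerical check of both identities (a sketch with stand-in data; np.cov with bias=True gives exactly the mean(XiXj) - mean(Xi)mean(Xj) matrix used above):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((150, 4)) * 10              # stand-in data matrix (N x n)
d = rng.random(4); d /= np.linalg.norm(d)  # a unit vector

# |d d^T|_F = |d|^2, so d d^T has Frobenius norm 1 exactly when d is a unit vector.
print(np.linalg.norm(np.outer(d, d)))      # -> 1.0

# Var(DPP_d(X)) = d^T V d, where V is the (population) covariance matrix of X.
F = X @ d
V = np.cov(X, rowvar=False, bias=True)     # entries mean(XiXj) - mean(Xi)mean(Xj)
print(F.var(), d @ V @ d)                  # the two values agree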


Slide #30.

FAUST Functional-Gap clustering (FAUST = Functional Analytic Unsupervised and Supervised machine Teaching) relies on choosing a distance dominating functional (a map F to R^1 such that |F(x)-F(y)| <= Dis(x,y) for all x, y), so that any F-gap implies a linear cluster break.
Dot Product Projection: DPP_d(y) ≡ (y-p)od, where the unit vector d can be obtained as d = (p-q)/|p-q| for points p and q.
Coordinate Projection is the simplest DPP: e_j(y) ≡ y_j.
Square Distance functional: SD_p(y) ≡ (y-p)o(y-p).
Dot Product Radius: DPR_pq(y) ≡ sqrt( SD_p(y) - DPP_pq(y)^2 ).
Square Dot Product Radius: SDPR_pq(y) ≡ SD_p(y) - DPP_pq(y)^2.
Note: the same DPP gaps are revealed by DP_d(y) ≡ yod, since (y-p)od = yod - pod, so DP just shifts all DPP values by pod.
Finding a good unit vector d for the dot-product functional DPP, to maximize gaps. Method-1: maximize Var(DPP_d(X)) with respect to d. Letting an overbar denote the column average,
$Var(DPP_d X) = \overline{(X \circ d)^2} - \big(\overline{X \circ d}\big)^2$
$= \tfrac{1}{N}\sum_{i=1..N}\Big(\sum_{j=1..n} x_{i,j} d_j\Big)\Big(\sum_{k=1..n} x_{i,k} d_k\Big) - \Big(\tfrac{1}{N}\sum_{i=1..N}\sum_{j=1..n} x_{i,j} d_j\Big)^2$
$= \tfrac{1}{N}\sum_{i=1..N}\Big(\sum_{j=1..n} x_{i,j}^2 d_j^2 + 2\sum_{j<k} x_{i,j} x_{i,k} d_j d_k\Big) - \Big(\sum_{j=1..n} \overline{X_j}\, d_j\Big)^2$
$= \sum_{j=1..n}\sum_{k=1..n}\big(\overline{X_j X_k} - \overline{X_j}\,\overline{X_k}\big)\, d_j d_k.$
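The four functionals, written as NumPy helpers (a sketch; note I use d = (q-p)/|q-p| so DPP increases from p toward q, which only flips the sign of the slide's d = (p-q)/|p-q| and therefore changes no gaps):

import numpy as np

def dpp(Y, p, q):
    # Dot Product Projection DPP_pq(y) = (y - p) o d,  d = (q - p)/|q - p|
    d = (q - p) / np.linalg.norm(q - p)
    return (Y - p) @ d

def sd(Y, p):
    # Square Distance functional SD_p(y) = (y - p) o (y - p)
    return ((Y - p) ** 2).sum(axis=1)

def sdpr(Y, p, q):
    # Square Dot Product Radius SDPR_pq(y) = SD_p(y) - DPP_pq(y)^2
    return sd(Y, p) - dpp(Y, p, q) ** 2

def dpr(Y, p, q):
    # Dot Product Radius DPR_pq(y) = sqrt(SDPR_pq(y))
    return np.sqrt(sdpr(Y, p, q))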


Slide #31.

Algorithm-1 (a heuristic): compute the vector of column variances $(\overline{X_1^2}-\overline{X_1}^2, ..., \overline{X_n^2}-\overline{X_n}^2)$. The unit vector A ≡ (a_1,...,a_n) maximizing YoA is A = Y/|Y|, so let D ≡ $(\sqrt{\overline{X_1^2}-\overline{X_1}^2}, ..., \sqrt{\overline{X_n^2}-\overline{X_n}^2})$ (the vector of column standard deviations), d ≡ D/|D|, and F(x) = xod.
On IRIS this gives CLUS.1 = {F < 15} (50 Setosa) and CLUS.2 = {F > 15} (50 Versicolor, 50 Virginica). In the neighborhood of F=15, the pairwise-distance check flags i39, e49 and e11 as outliers (gap = (13,28), or 15) and {e8,e44} as a doubleton outlier set. Within CLUS.2: CLUS.2.1 = {F < 19} (42 Versicolor, 14 Virginica) and CLUS.2.2 = {F >= 19} (2 Versicolor, 29 Virginica). Within CLUS.2.1: CLUS.2.1.1 = {F < 14} (30 Versicolor, 2 Virginica) and CLUS.2.1.2 = {F >= 14} (12 Versicolor, 12 Virginica). A further pass on CLUS.2.1.2 breaks it into intervals [0,20), (20,30), (30,40), (40,50), (50,60), (60,88], each holding only a few Virginica and Versicolor, most of them flagged as (possible) outliers. A sparse-high-end distance check flags i35 as an outlier, and the sparse high end of CLUS.2 is checked with the pairwise-distance table of {i31, i8, i36, i10, i6, i23, i32, i19, i18}.
Algorithm-2: take the a_i corresponding to max STD(Y_i). STD(PL)=17 is over twice the others, so F(x) = x_3. This gives CLUS.1 = {F < 25} (1 Virginica, 50 Setosa), CLUS.2 = {25 < F < 49} (3 Virginica, 46 Versicolor) and CLUS.3 = {49 < F < 70} (46 Virginica, 4 Versicolor). But would one pick out 49 as a gap/thinning?
(The F-count histograms and pairwise-distance tables are not reproduced here.)
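A sketch of both heuristics on IRIS (the x10 scaling mimics the integer coding the slides use; on the raw cm data the petal-length STD is about 1.8 rather than 17):

import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data * 10                 # x10 integer-style coding, as on the slides

# Algorithm-1 (heuristic): point d at the vector of per-column standard deviations.
D = X.std(axis=0)                         # ( sqrt(mean(Xj^2) - mean(Xj)^2) )_j
d = D / np.linalg.norm(D)
F1 = X @ d                                # look for gaps / thinnings in F1

# Algorithm-2: use only the single column with the largest standard deviation.
j = int(np.argmax(D))
F2 = X[:, j]                              # on IRIS this is petal length (STD ~ 17)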


Slide #32.

Algorithm-1 (a heuristic): compute $(\overline{X_1^2}-\overline{X_1}^2, ..., \overline{X_n^2}-\overline{X_n}^2)$, let D be the vector of column standard deviations, d ≡ D/|D|, F(x) = xod. Here we redo CLUS.2, spreading the F-values out as 2(F - min).
As on the previous slide, the full-IRIS histogram gives CLUS.1 = {F < 15} (50 Setosa) and CLUS.2 = {F > 15} (50 Versicolor, 50 Virginica); the F=15-neighborhood check flags i39, e49 and e11 as outliers and {e8,e44} as a doubleton outlier set; the sparse high end of CLUS.2 flags i10, i32, i19 and i18 as outliers and {i6,i23} as a doubleton outlier set.
CLUS.2 rescaled as 2(F-mn), interval by interval:
[0,1)    2 Vers,  0 Virg   all outliers
[1,2]    1 Vers,  0 Virg   outlier
(2,11)   9 Vers,  1 Virg
[11,12]  4 Vers,  0 Virg   quad outlier set?
[14,16)  3 Vers,  0 Virg   all outliers?
(16,22)  7 Vers,  0 Virg   || [0,22) totals 26 Vers, 1 Virg
(22,31) 13 Vers,  3 Virg   || [0,31) totals 39 Vers, 4 Virg
[31,35)  4 Vers,  8 Virg
(35,38)  1 Vers,  1 Virg   outliers
(38,41)  2 Vers,  1 Virg   outliers?
(41,50)  0 Vers, 12 Virg
(50,53)  0 Vers,  4 Virg
(53,56)  0 Vers,  3 Virg   outliers?
(56,58)  0 Vers,  3 Virg   outliers?
(58,61)  0 Vers,  3 Virg   outliers?
(61,71)  0 Vers,  3 Virg   outlier
(The underlying F-count histograms and distance tables are not reproduced here.)
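The rescaling step, as a sketch (X2 and d are assumed to be the CLUS.2 rows and the Algorithm-1 direction from the previous pass; neither is defined on this slide):

import numpy as np

# Re-project just the CLUS.2 rows and spread the values out as 2*(F - min) so that
# finer gaps/thinnings show up in the integer F-count histogram.
F  = X2 @ d
F2 = np.rint(2 * (F - F.min())).astype(int)
for v, c in zip(*np.unique(F2, return_counts=True)):
    print(v, c)                           # the F-count column used to spot gaps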


Slide #33.

Algorithm-1: compute $(\overline{X_1^2}-\overline{X_1}^2, ..., \overline{X_n^2}-\overline{X_n}^2)$, set D to the vector of column standard deviations, d ≡ D/|D|, and project F(x) = xod. Applied to Sat150 (the Satlog dataset restricted to 150 pixels).
This Satlog dataset is 150 rows (pixels) and 4 feature columns (R, G, IR1, IR2). There are 6 row-classes with these row counts:
Count  Class#  Class description
19     c=1     red soil
32     c=2     cotton crop
50     c=3     grey soil
12     c=4     damp grey soil
10     c=5     soil with vegetation stubble
27     c=7     very damp grey soil
The F-count histogram (F from 0 to about 109, broken into intervals such as [35,46), [46,61), [61,68), [68,83), ...) shows no significant gaps. There is some localization of classes with respect to F, but in a strictly unsupervised setting that localization would be impossible to detect. This is somewhat expected, since the changes in ground-cover class are gradual and smooth (in general), so the classes butt up against one another (no gaps between them).


Slide #34.

Algorithm-1 on Concrete149 (Strength = class label; features Mix, Water, FineAggregate, Age): D ≡ $(\sqrt{\overline{X_1^2}-\overline{X_1}^2}, ..., \sqrt{\overline{X_n^2}-\overline{X_n}^2})$, d ≡ D/|D|, F(x) = xod, histogrammed as (F-MN)/4.
The Concrete149 dataset has 149 rows, 1 class column and 4 feature columns (ST, MX, WA, FA, AG). There are 4 Strength classes with these row counts:
Count  Class#  Class description (Concrete Strength of ...)
19     c=0     [0,10)
32     c=2     [20,30)
50     c=4     [40,50)
12     c=6     [60,100)
I deleted the Strength 10's, 30's and 50's so as to introduce gaps to identify. I really didn't find any! The (F-MN)/4 histogram breaks into intervals from [0,4) up to about [78,81), and the per-interval class counts show the remaining strength classes intermixed. (The annotated histogram is not reproduced here.)


Slide #35.

Algorithm-2 on Concrete149 (Strength = class label; features Mix, Water, FineAggregate, Age): STD(MX)=101, STD(WA)=28, STD(FA)=99, STD(AG)=81, so Algorithm-2 picks MX, i.e. F = MX. For comparison, the slide also shows the F = FA histogram and the projection on d = STDVector = (101, 28, 99, 81) divided by its length.
F = MX: the histogram breaks into intervals [0,5), [5,10), [10,16), [16,19), [19,21), [21,25), [25,29), [29,31), [31,39), [39,45), [45,52), [52,56), [56,64), [64,66), [66,78), [78,90), [90,101), each annotated with its per-class counts (e.g. [52,56): 1 c=0, 1 c=2, 3 c=4, 12 c=6); the strength classes remain mixed across the intervals. F = FA and F = xod with d = STDVector give similar interval-by-interval mixes. (The full histograms with class annotations are not reproduced here.)
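The column-selection step, as a sketch (C is a hypothetical array holding the Concrete149 feature columns MX, WA, FA, AG in that order; the dataset itself is not reproduced in these slides):

import numpy as np

stds = C.std(axis=0)                      # the slide reports roughly (101, 28, 99, 81)
print(dict(zip(["MX", "WA", "FA", "AG"], np.round(stds, 1))))

F_alg2 = C[:, int(np.argmax(stds))]       # Algorithm-2: project on the max-STD column (MX)

d = stds / np.linalg.norm(stds)           # or weight every column by its spread
F_std = C @ d                             # the d = STDVector projection from the slide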


Slide #36.

Algorithm-1 on SEEDS210 (CLS123; features area, compact, asym_coef, len_kernel_groove). The classes are Class1 = Kama, Class2 = Rosa, Class3 = Canadian. The F-count histogram (F roughly 56-90) breaks into intervals [56,58), [58,60), [60,72), [72,76), [76,81), [81,91), each annotated with per-class counts (e.g. one large interval holds 59 of c=1, 2 of c=2 and 51 of c=3; another holds 3 of c=1, 14 of c=2 and 11 of c=3); the three seed classes are only partially localized in F, with substantial overlap. (The annotated histogram is not reproduced here.)


Slide #37.

Algorithm-1 on SEEDS210 (CLS123; features area, perimeter, compact, kern_length, kern_width, asym_coef, len_kernel_groove). The F-count histogram (F roughly 85-125) breaks into intervals with these per-class counts:
[85,92)     6 c=1   0 c=2   6 c=3
[92,106)   60 c=1   4 c=2  61 c=3
[106,112)   4 c=1  33 c=2   3 c=3
[112,126)   0 c=1  33 c=2   0 c=3
So class 2 (Rosa) separates out at the high end of F, while classes 1 and 3 remain mixed. (The histogram itself is not reproduced here.)


Slide #38.

Algorithm-1 on WINE_Quality150 (150 wine samples, 4 feature columns, 0-10 quality levels, of which only 4-7 occur). The F-count histogram (F roughly 6-79) breaks into intervals [6,15), [15,17), [17,23), [23,30), [30,39), [39,51), [51,59), [59,63), [63,71), [71,79), each annotated with counts of the four quality levels (e.g. one interval holds 1 c=4, 12 c=5, 8 c=6 and 3 c=7); the quality levels stay intermixed throughout the F-range. (The annotated histogram is not reproduced here.)


Slide #39.

Algorithm-1 on WN150468 (149 wine samples, 4 features chosen for highest std/(max-min), quality levels 4, 6 and 8 only). The F-count histogram breaks into intervals [0,31), [31,41), [41,51), [51,91); as far as the class annotations can be read, [0,31) holds about 22 c=4 and 85 c=6, [31,41) about 4 c=4 and 14 c=6, and the remaining few c=4 and c=6 samples together with the c=8 samples fall in the two sparse upper intervals. Again the quality levels are not separated by any clean F-gap. (The annotated histogram is not reproduced here.)


Slide #40.

Two runs on WN150468 (149 wine samples, 4 features chosen for highest std/(max-min)), this time grouped into just 2 quality levels: L (low, labeled 3-4 on the slide) and H (high, 7+8).
Algorithm-1, F(x) = xod with d from the column standard deviations; per-interval class counts:
[0,21)     23 c=L  26 c=H
[21,37)    13 c=L  37 c=H
[37,42)     2 c=L   3 c=H
[42,60)    10 c=L  13 c=H
[60,72)     3 c=L   2 c=H
[72,80)     2 c=L   2 c=H
[80,100)    2 c=L   6 c=H
[100,122)   2 c=L   2 c=H
DPP with p = MinVec and q = MaxVec on the same data; per-interval class counts:
[0,11)     38 c=L  67 c=H
[11,20)    13 c=L  16 c=H
[20,30)     4 c=L   8 c=H
In both runs the L and H quality groups remain mixed in every interval. (The histograms themselves are not reproduced here.)

