WEKA – with more than two classes

• Contact Lenses with Naïve Bayes

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.8       0.053     0.8         0.8      0.8         soft
0.25      0.1       0.333       0.25     0.286       hard
0.8       0.444     0.75        0.8      0.774       none

=== Confusion Matrix ===

  a  b  c   <-- classified as
  4  0  1 | a = soft
  0  1  3 | b = hard
  1  2 12 | c = none

• Class exercise – show how to calculate recall, precision, and F-measure for each class
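As a worked answer to the class exercise, the per-class numbers can be read straight off the confusion matrix: for each class, TP is the diagonal entry, FN is the rest of that row, and FP is the rest of that column. A minimal Python sketch (not part of the original slide) that reproduces the table above:

import numpy as np

# Confusion matrix from the slide: rows = actual class, columns = predicted class.
cm = np.array([[4, 0, 1],    # a = soft
               [0, 1, 3],    # b = hard
               [1, 2, 12]])  # c = none
classes = ["soft", "hard", "none"]

for i, name in enumerate(classes):
    tp = cm[i, i]
    fn = cm[i].sum() - tp          # actual class i, predicted as something else
    fp = cm[:, i].sum() - tp       # predicted class i, actually something else
    tn = cm.sum() - tp - fn - fp
    recall = tp / (tp + fn)        # same as TP rate
    precision = tp / (tp + fp)
    fp_rate = fp / (fp + tn)
    f_measure = 2 * tp / (2 * tp + fp + fn)
    print(f"{name}: recall={recall:.3f} precision={precision:.3f} "
          f"FP rate={fp_rate:.3f} F={f_measure:.3f}")

Running it prints the soft/hard/none rows of the Detailed Accuracy table (recall 0.8/0.25/0.8, precision 0.8/0.333/0.75, F-measure 0.8/0.286/0.774).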




CCR (Corners of Circumscribing Coordinate Rectangle) (rnd-f, rnd-g, then lin-d)

f ≡ MinVecX ≡ (minXx1..minXxn), g ≡ MaxVecX ≡ (maxXx1..maxXxn), d ≡ (g−f)/|g−f|. Sequence thru the main diagonal pairs {f, g} lexicographically; for each, create d.

CCR-1. Do SpS((x−f)∘(x−f)) round gap analysis.
CCR-2. Do SpS((x−g)∘(x−g)) round gap analysis.
CCR-3. Do SpS(x∘d) linear gap analysis.

Note: no calculation is required to find f and g (assuming MaxVecX and MinVecX have been calculated and residualized when PTreeSetX was captured).

Trace (gap threshold > 4):

Start: f1=MnVec, RnGp>4 none. g1=MxVec, RnGp>4: 0 7 vir18..., 1 47 ver30 | 0 53 ver49..., 0 74 set14 – splitting X into SubClus1 and SubClus2. Lin>4 none.

SubCluster2: f2=0001 RnGp>4 none; g2=1110 RnGp>4 none; Lin>4 none. This ends SubClus2 = 47 setosa only.

SubCluster1: every corner pair f1=0000/g1=1111 thru f8=0111/g8=1000 gives RnGp>4 none and Lin>4 none, with two exceptions:
  f6=0101 RnGp>4: 1 19 set26, 0 28 ver49, 0 31 set42, 0 31 ver8, 0 32 set36, 0 32 ver44, 1 35 ver11, 0 41 ver13
  f7=0110 RnGp>4: 1 28 ver13, 0 33 vir49
This ends SubClus1 = 95 ver and vir samples only.

Checking pairwise distances among the points isolated by f6:

         ver49  set42  ver8   set36  ver44  ver11
ver49     0.0   19.8    3.9   21.3    3.9    7.2
set42    19.8    0.0   21.6   10.4   21.8   23.8
ver8      3.9   21.6    0.0   23.9    1.4    4.6
set36    21.3   10.4   23.9    0.0   24.2   27.1
ver44     3.9   21.8    1.4   24.2    0.0    3.6
ver11     7.2   23.8    4.6   27.1    3.6    0.0

The four versicolors are mutually close while the setosas are far from them, giving Subc2.1 = {ver49, ver8, ver44, ver11}.

Observations: step 3 (and possibly step 2) may be unproductive in finding new subclusters, either because step 1 finds almost all of them or because 2 and/or 3 find the same ones, and could be skipped. This is very likely when the dimension is high: the main diagonal corners are then typically far from X, so the radii of the round gaps are large, and large-radius round gaps are nearly linear, suggesting step 1 will find all the subclusters that 2 and 3 would find. Here, though, step 2 is needed (without it, setosa is not separated from versicolor+virginica), while step 3 is unproductive; so it is productive to compute 1 and 2, but having done that, 3 will probably add nothing. Next, consider step 3 alone, to see whether it is as productive as 1+2.

CCR alone is as good as the combo (the projection on d appears to be as accurate as the combination of squared lengths from f and from g). This is probably because the round gaps (centered at the corners) are nearly linear by the time they reach the set X itself.

To compare the time costs: for the combo, (p−x)∘(p−x) = p∘p + x∘x − 2x∘p = p∘p + Σ_{k=1..n} x_k² + Σ_{k=1..n} (−2p_k)x_k, which has n multiplications in the second term and n scalar multiplications plus n additions in the third. Doing this for both p=f and p=g takes 2n multiplications, 2n scalar multiplications, and 2n additions. For CCR, x∘d = Σ_{k=1..n} d_k x_k involves only n scalar multiplications and n additions, so it appears to be cheaper timewise.
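The three CCR series can be prototyped without pTrees as ordinary horizontal scans. A minimal numpy sketch (my stand-in for the vertical SpS computations; the gap threshold of 4 is taken from the slide, but whether it applies to squared or unsquared distances is an assumption here):

import numpy as np

def gap_positions(values, min_gap=4):
    """Sort a 1-D series and report gaps wider than min_gap."""
    order = np.argsort(values)
    v = values[order]
    gaps = np.diff(v)
    return [(order[i], order[i + 1], gaps[i]) for i in np.where(gaps > min_gap)[0]]

def ccr_pass(X, min_gap=4):
    """One CCR pass: round gaps from corners f and g, then linear gaps along d."""
    f = X.min(axis=0)                      # MinVecX
    g = X.max(axis=0)                      # MaxVecX
    d = (g - f) / np.linalg.norm(g - f)    # unit main diagonal
    sqdist_f = ((X - f) ** 2).sum(axis=1)  # SpS((x-f)o(x-f))
    sqdist_g = ((X - g) ** 2).sum(axis=1)  # SpS((x-g)o(x-g))
    proj_d = X @ d                         # SpS(xod)
    return (gap_positions(sqdist_f, min_gap),
            gap_positions(sqdist_g, min_gap),
            gap_positions(proj_d, min_gap))

Each returned tuple names the two points on either side of a gap and the gap width; a gap splits the cluster into the points below and above it, which are then scanned recursively, as in the trace above.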




CCR(fgd) (Corners of Circumscribing Coordinate Rectangle)

f1 = minVecX ≡ (minXx1..minXxn) (0000); g1 = MaxVecX ≡ (MaxXx1..MaxXxn) (1111); d = (g−f)/|g−f|. Sequence thru the main diagonal pairs {f, g} lexicographically; for each, create d.

CCR(f): Do SpS((x−f)∘(x−f)) round gap analysis.
CCR(g): Do SpS((x−g)∘(x−g)) round gap analysis.
CCR(d): Do SpS(x∘d) linear gap analysis.

Notes: no calculation is required to find f and g (assuming MaxVecX and minVecX have been calculated and residualized when PTreeSetX was captured). If the dimension is high, the main diagonal corners are likely far from X, and the resulting large radii make the round gaps nearly linear.

The trace is the same as on the preceding slide: g1=MxVec RnGp>4 splits X into SubClus1 (95 versicolor and virginica samples) and SubClus2 (47 setosa only, which no further corner pair splits). Within SubClus1, all corner pairs f1..f8 / g1..g8 give RnGp>4 none and Lin>4 none, except f6=0101 (1 19 set26, 0 28 ver49, 0 31 set42, 0 31 ver8, 0 32 set36, 0 32 ver44, 1 35 ver11, 0 41 ver13) and f7=0110 (1 28 ver13, 0 33 vir49), again isolating Subc2.1 = {ver49, ver8, ver44, ver11} (pairwise distance check above).




EyeMed Vision Plans

                      Vision Exams           Eyeglass Lenses        Frames                                                             Contact Lenses
EyeMed Base Plan      Once every 12 months   Once every 12 months   Once every 24 months; $100 allowance; 20% off balance over $130   Once every 12 months
EyeMed Buy Up Plan    Once every 12 months   Once every 12 months   Once every 12 months; $130 allowance; 20% off balance over $130   Once every 12 months

In-Network Benefit                                     Base Plan                                    Buy Up Plan
Vision Exam                                            $5                                           $5
Single Vision Lenses                                   $15                                          $15
Bifocal Lenses                                         $15                                          $15
Trifocal & Lenticular Lenses                           $15                                          $15
Standard Progressive Lens                              $65                                          $65
Conventional Contact Lenses (in lieu of eyeglasses)    $0 Copay; $100 allowance; 15% off balance    $0 Copay; $130 allowance; 15% off balance
Disposable Contact Lenses (in lieu of eyeglasses)      $0 Copay; $100 allowance; plus balance       $0 Copay; $130 allowance; plus balance

Monthly Premium            Base Plan   Buy Up Plan
Employee Only Coverage     $3.91       $4.83
Family Coverage            $10.02      $12.36

For customer service, visit www.eyemed.com or call 866-804-0982. ICUBA is on the INSIGHT Network!
* Please refer to the EyeMed Plan Summary for more detailed information and out-of-network options.

EyeMed makes it easy for you to buy glasses and contact lenses online if you are not able to visit an optical dispensary. Visit www.ContactsDirect.com or www.Glasses.com to use your in-network benefits on a large variety of glasses and contact lenses. You can even virtually “try on” your frames and have contact lenses delivered to your door!




Comparison with other methods

Recently, Tjong and Zhou (2007) developed a neural network method for predicting DNA-binding sites. In their method, for each surface residue, the PSSM and solvent accessibilities of the residue and its 14 neighbors were used as input to a neural network in the form of vectors. In their publication, Tjong and Zhou showed that their method achieved better performance than other previously published methods. In the current study, the 13 test proteins were obtained from the study of Tjong and Zhou. Thus, we can compare the method proposed in the current study with Tjong and Zhou's neural network method using the 13 proteins.

[Figure 1. Tradeoff between coverage and accuracy]

In their publication, Tjong and Zhou also used coverage and accuracy to evaluate the predictions. However, they defined accuracy using a loosened criterion of "true positive": if a predicted interface residue is within the four nearest neighbors of an actual interface residue, it is counted as a true positive. Here, in the comparison of the two methods, the strict definition of true positive is used, i.e., a predicted interface residue is counted as a true positive only when it is a true interface residue. The original data were obtained from Table 1 of Tjong and Zhou (2007); the accuracy for the neural network method was recalculated using this strict definition (Table 3). The coverage of the neural network was taken directly from Tjong and Zhou (2007).

For each protein, Tjong and Zhou's method reports one coverage and one accuracy. In contrast, the method proposed in this study allows users to trade off coverage against accuracy based on their actual need. For the purpose of comparison, for each test protein, top-ranking patches are added to the set of predicted interface residues one by one in decreasing order of rank until the coverage is the same as or higher than the coverage that the neural network method achieved on that protein. Then the coverage and accuracy of the two methods are compared. On a test protein, method A is better than method B if accuracy(A) > accuracy(B) and coverage(A) ≥ coverage(B).

Table 3 shows that the graph kernel method proposed in this study achieves better results than the neural network method on 7 proteins (in bold font in Table 3). On 4 proteins (shown in gray shading in Table 3), the neural network method is better than the graph kernel method. On the remaining 2 proteins (in italic font in Table 3), no conclusion can be drawn, because the two conditions, accuracy(A) > accuracy(B) and coverage(A) ≥ coverage(B), never hold at the same time: when coverage(graph kernel) ≥ coverage(neural network), we have accuracy(graph kernel) < accuracy(neural network), and vice versa. Note that the coverage of the graph kernel method increases in a discontinuous fashion as more patches are used to predict DNA-binding sites; on these two proteins, we were not able to reach a point where the two methods have identical coverage. Given this, we consider the two methods to tie on these 2 proteins. Thus, these comparisons show that the graph kernel method achieves better results than the neural network method on 7 of the 13 proteins (bold font in Table 3) and ties with it on the other 2 proteins (italic font in Table 3). When averaged over the 13 proteins, the coverage and accuracy of the graph kernel method are 59% and 64%.

It is worth pointing out that, in the current study, the predictions are made using protein structures that are not bound with DNA. In contrast, the data we obtained from Tjong and Zhou's study were produced using protein structures bound with DNA. In their study, Tjong and Zhou showed that when unbound structures were used, the average coverage decreased by 6.3% and the average accuracy by 4.7% for the 14 proteins (the data for each individual protein were not shown).
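The comparison protocol above lends itself to a short sketch. A minimal Python rendering (all names are hypothetical; coverage = TP / number of actual interface residues and accuracy = TP / number of predicted residues, under the strict criterion defined above):

def compare_methods(patches, nn_coverage, nn_accuracy, interface_residues):
    """Sketch of the comparison protocol described above.

    patches: residue sets ordered by decreasing patch rank.
    interface_residues: set of actual interface residues (strict criterion).
    """
    predicted = set()
    coverage = accuracy = 0.0
    for patch in patches:                        # add top-ranking patches one by one
        predicted |= patch
        tp = len(predicted & interface_residues)
        coverage = tp / len(interface_residues)  # fraction of true interface found
        accuracy = tp / len(predicted)           # strict precision of the prediction
        if coverage >= nn_coverage:              # stop once NN coverage is matched
            break
    if coverage >= nn_coverage and accuracy > nn_accuracy:
        return "graph kernel better"
    if coverage <= nn_coverage and accuracy < nn_accuracy:
        return "neural network better"
    return "tie (no simultaneous win on both criteria)"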




WEKA

• Part of the results provided by WEKA (that we've ignored so far)
• Let's look at an example (Naïve Bayes on my-weather-nominal)

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.667     0.125     0.8         0.667    0.727       yes
0.875     0.333     0.778       0.875    0.824       no

=== Confusion Matrix ===

 a b   <-- classified as
 4 2 | a = yes
 1 7 | b = no

• TP rate and recall are the same = TP / (TP + FN)
  – For Yes = 4 / (4 + 2); For No = 7 / (7 + 1)
• FP rate = FP / (FP + TN)
  – For Yes = 1 / (1 + 7); For No = 2 / (2 + 4)
• Precision = TP / (TP + FP)
  – For Yes = 4 / (4 + 1); For No = 7 / (7 + 2)
• F-measure = 2TP / (2TP + FP + FN)
  – For Yes = 2*4 / (2*4 + 1 + 2) = 8 / 11
  – For No = 2*7 / (2*7 + 2 + 1) = 14 / 17
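These hand calculations can be cross-checked outside WEKA. A quick sketch using scikit-learn (an assumption of this note, not part of WEKA), rebuilding the label pairs from the confusion matrix above:

from sklearn.metrics import precision_recall_fscore_support

# 4 yes->yes, 2 yes->no, 1 no->yes, 7 no->no (from the confusion matrix).
y_true = ["yes"] * 6 + ["no"] * 8
y_pred = ["yes"] * 4 + ["no"] * 2 + ["yes"] * 1 + ["no"] * 7

p, r, f, support = precision_recall_fscore_support(
    y_true, y_pred, labels=["yes", "no"])
print(p)  # precision for yes, no -> about [0.8, 0.778]
print(r)  # recall (= TP rate)    -> about [0.667, 0.875]
print(f)  # F-measure             -> about [0.727, 0.824]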




Experiment 4 - Results

9 Classes:

Method                AUC     CA      F1      Precision   Recall
Naïve Bayes           0.818   0.958   0.552   0.471       0.667
Classification Tree   0.820   0.977   0.684   0.725       0.650
Logistic Regression   0.790   0.981   0.700   0.875       0.583
SVM                   0.912   0.984   0.800   0.769       0.833

Similar to results from Experiment 2.

8 Classes – combine the Useful and Ubuntu classes into 1 (see next slide for reasoning):

Method                AUC     CA      F1      Precision   Recall
Naïve Bayes           0.986   0.974   0.852   0.742       1.000
Classification Tree   0.925   0.987   0.907   0.971       0.852
Logistic Regression   0.933   0.987   0.909   0.952       0.870
SVM                   0.971   0.984   0.898   0.846       0.957

Better.

Back to the binary classification scheme with 2 classes, but now with Ubuntu and Useful combined:

Method                AUC     CA      F1      Precision   Recall
Naïve Bayes           0.980   0.965   0.797   0.663       1.000
Classification Tree   0.941   0.982   0.877   0.861       0.894
Logistic Regression   0.975   0.994   0.954   0.955       0.954
SVM                   0.970   0.984   0.896   0.843       0.955

Best.




d3 linear gap analysis splits X into two subclusters:
  SubClus1: 0 10 set23 ... 1 19 set25 (the 50 setosa plus vir39)
  SubClus2: 0 30 ver49 ... 0 69 vir19 (50 versicolor and 49 virginica)

d5 (f5=vir23, g5=set14): none; f5 none; g5 none
d5 (f5=vir32, g5=set14): none; f5 none; g5 none
d5 (f5=vir6, g5=set14): none; f5 none; g5 none
d5 (f5=vir19, g5=set14): none; f5: 1 0.0 vir19, 0 4.1 vir23 (clus2); g5 none
d5 (f5=vir18, g5=set14): none; f5: 1 0.0 vir18, 1 4.1 vir32, 0 8.2 vir6 (clus2); g5 none

(d1+d3)/√2 on clus1: none
(d1+d3)/√2 on clus2: 0 57.3 ver49, 0 58.0 ver8, 0 58.7 ver44, 1 60.1 ver11, 0 64.3 ver10 – isolating {ver49, ver8, ver44, ver11}; pairwise distances:

         ver49  ver8   ver44  ver11
ver49     0.0   3.9    3.9    7.2
ver8      3.9   0.0    1.4    4.6
ver44     3.9   1.4    0.0    3.6
ver11     7.2   4.6    3.6    0.0

(d3+d4)/√2 on clus1: none; on clus2: none
(d1+d3+d4)/√3 on clus1: 1 44.5 set19, 0 55.4 vir39; on clus2: none
(d1+d2+d3+d4)/√4 on clus1: none; on clus2: none
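The composite directions such as (d1+d3)/√2 are unit vectors along sums of coordinate axes. A short numpy sketch of how such a projection series could be formed (the gap scan itself is the same as in the CCR sketch earlier; dims uses 0-based indices, so (0, 2) corresponds to the slide's d1 and d3):

import numpy as np

def diagonal_projection(X, dims):
    """Project X onto the unit vector along the sum of the given coordinate
    axes, e.g. dims=(0, 2) gives (d1+d3)/sqrt(2) in the slide's notation."""
    d = np.zeros(X.shape[1])
    d[list(dims)] = 1.0
    d /= np.sqrt(len(dims))   # normalize so |d| = 1
    return X @ d              # the SpS(xod) series for this composite d

# Example: scan the (d1+d3)/sqrt(2) series of a cluster for gaps > 4.
# v = np.sort(diagonal_projection(X_clus2, (0, 2)))
# print(np.where(np.diff(v) > 4))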




• Using guard digits to avoid excessive error. For example, in a 10-digit calculator, 1/3 is represented as 0.333 333 333 3, and multiplying by 3 gives 0.999 999 999 9, not 1. In a calculator with 2 guard digits, however, 1/3 is carried internally as 0.333 333 333 333 (while still displayed as 0.333 333 333 3), and multiplying by 3 then rounds back to 1.
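The same effect can be reproduced with Python's decimal module, using the context precision as a stand-in for hardware guard digits (a sketch of the idea; real calculators implement this in hardware):

from decimal import Decimal, getcontext

# 10 significant digits, no guard digits:
getcontext().prec = 10
x = Decimal(1) / Decimal(3)          # 0.3333333333
print(x * 3)                         # 0.9999999999 -- not 1

# Carry 2 guard digits (12 internally), then round back to 10:
getcontext().prec = 12
y = (Decimal(1) / Decimal(3)) * 3    # 0.999999999999
getcontext().prec = 10
print(+y)                            # 1.000000000 -- rounding the guarded result recovers 1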




Model Structure

Tree Structure Specified for the Nested Logit Model
(Sample proportions are marginal, not conditional. Choices marked with * are excluded for the IIA test.)

Trunk (prop.)       Limb (prop.)      Branch (prop.)    Choice (prop.)   Weight   IIA
Trunk{1} 1.00000    TRAVEL 1.00000    PRIVATE .55714    AIR    .27619    1.000
                                                        CAR    .28095    1.000
                                      PUBLIC  .44286    TRAIN  .30000    1.000
                                                        BUS    .14286    1.000

Model Specification: each table entry is the attribute that multiplies the indicated parameter.

Choice        Parameters
   (Row 1)    GC         TTME       INVT       INVC       A_AIR
   (Row 2)    AIR_HIN1   A_TRAIN    TRA_HIN3   A_BUS      BUS_HIN4

AIR     1     GC         TTME       INVT       INVC       Constant
        2     HINC       none       none       none       none
CAR     1     GC         TTME       INVT       INVC       none
        2     none       none       none       none       none
TRAIN   1     GC         TTME       INVT       INVC       none
        2     none       Constant   HINC       none       none
BUS     1     GC         TTME       INVT       INVC       none
        2     none       none       none       Constant   HINC
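Reading the specification table row by row gives the implied utility functions. A LaTeX sketch (the generic slope names β_GC, β_TTME, β_INVT, β_INVC are my additions; the alternative-specific constants and income terms A_AIR, AIR_HIN1, A_TRAIN, TRA_HIN3, A_BUS, BUS_HIN4 come from the table):

\begin{align*}
U_{\mathrm{AIR}}   &= \mathrm{A\_AIR} + \beta_{GC}\,GC + \beta_{TTME}\,TTME + \beta_{INVT}\,INVT + \beta_{INVC}\,INVC + \mathrm{AIR\_HIN1}\cdot HINC \\
U_{\mathrm{CAR}}   &= \beta_{GC}\,GC + \beta_{TTME}\,TTME + \beta_{INVT}\,INVT + \beta_{INVC}\,INVC \\
U_{\mathrm{TRAIN}} &= \mathrm{A\_TRAIN} + \beta_{GC}\,GC + \beta_{TTME}\,TTME + \beta_{INVT}\,INVT + \beta_{INVC}\,INVC + \mathrm{TRA\_HIN3}\cdot HINC \\
U_{\mathrm{BUS}}   &= \mathrm{A\_BUS} + \beta_{GC}\,GC + \beta_{TTME}\,TTME + \beta_{INVT}\,INVT + \beta_{INVC}\,INVC + \mathrm{BUS\_HIN4}\cdot HINC
\end{align*}

The four generic coefficients are shared across alternatives, while each alternative other than CAR (the base) gets its own constant and its own household-income interaction, exactly as the Row 1 / Row 2 layout indicates.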



