NeMoFinder adapts SPIN [27] to extract frequent trees and expands them into non-isomorphic graphs.[8] NeMoFinder utilizes frequent size-n trees to partition the input network into a collection of size-n graphs, afterward finding frequent size-n sub-graphs by expansion of frequent trees edge-by-edge until getting a complete size-n graph Kn. The algorithm finds NMs in undirected networks and is not limited to extracting only induced sub-graphs. Furthermore, NeMoFinder is an exact enumeration algorithm and is not based on a sampling method. As Chen et al. claim, NeMoFinder is applicable for detecting relatively large NMs, for instance, finding NMs up to size-12 from the whole S. cerevisiae (yeast) PPI network as the authors claimed.[28] NeMoFinder consists of three main steps. First, finding frequent size-n trees, then utilizing repeated size-n trees to divide the entire network into a collection of size-n graphs, finally, performing sub-graph join operations to find frequent size-n sub-graphs.[26] In the first step, the algorithm detects all non-isomorphic size-n trees and mappings from a tree to the network. In the second step, the ranges of these mappings are employed to partition the network into size-n graphs. Up to this step, there is no distinction between NeMoFinder and an exact enumeration method. However, a large portion of non-isomorphic size-n graphs still remain. NeMoFinder exploits a heuristic to enumerate non-tree size-n graphs by the obtained information from preceding steps. The main advantage is in third step, which generates candidate sub-graphs from previously enumerated sub-graphs. This generation of new size-n sub-graphs is done by joining each previous sub-graph with derivative sub-graphs from itself called cousin sub-graphs. These new sub-graphs contain one additional edge in comparison to the previous sub-graphs. However, there exist some problems in generating new sub-graphs: There is no clear method to derive cousins from a graph, joining a sub-graph with its cousins leads to redundancy in generating particular sub-graph more than once, and cousin determination is done by a canonical representation of the adjacency matrix which is not closed under join operation. NeMoFinder is an efficient network motif finding algorithm for motifs up to size 12 only for protein-protein interaction networks, which are presented as undirected graphs. And it is not able to work on directed networks which are so important in the field of complex and biological networks. The pseudocode of NeMoFinder is shown here: NeMoFinder Input: G - PPI network; N - Number of randomized networks; K - Maximal network motif size; F - Frequency threshold; S - Uniqueness threshold; Output: U - Repeated and unique network motif set; D ← ∅; for motif-size k from 3 to K do T ← FindRepeatedTrees(k); GDk ← GraphPartition(G, T) D ← D ∪ T; D′ ← T; i ← k; while D″ = ∅ and i ≤ k × (k - 1) / 2 do D′ ← FindRepeatedGraphs(k, i, D′); D ← D ∪ D′; i ← i + 1; end while end for for counter i from 1 to N do Grand ← RandomizedNetworkGeneration(); for each g ∈ D do GetRandFrequency(g, Grand); end for end for U ← ∅; for each g ∈ D do s ← GetUniqunessValue(g); if s ≥ S then U ← U ∪ {g}; end if end for return U Grochow and Kellis [29] proposed an exact alg for enumerating sub-graph appearances, which is based on a motif-centric approach, which means that the frequency of a given sub-graph,called the query graph, is exhaustively determined by searching for all possible mappings from the query graph into the larger network. It is claimed [29] that a motif-centric method in comparison to network-centric methods has some beneficial features. First of all it avoids the increased complexity of sub-graph enumeration. Also, by using mapping instead of enumerating, it enables an improvement in the isomorphism test. To improve the performance of the alg, since it is an inefficient exact enumeration alg, the authors introduced a fast method which is called symmetry-breaking conditions. During straightforward sub-graph isomorphism tests, a sub-graph may be mapped to the same sub-graph of the query graph multiple times. In Grochow-Kellis alg symmetrybreaking is used to avoid such multiple mappings. GK alg and symmetry-breaking condition which eliminates redundant isomorphism tests. (a) graph G, (b) illustration of all automorphisms of G that is showed in (a). From set AutG we can obtain a set of symmetrybreaking conditions of G given by SymG in (c). Only the first mapping in AutG satisfies the SynG conditions; so, by applying SymG in Isomorphism Extension module alg only enumerate each match-able sub-graph to G once. Note that SynG is not a unique set for an arbitrary graph G. The GK alg discovers the whole set of mappings of a given query graph to the network in two major steps. It starts with the computation of symmetry-breaking conditions of the query graph. Next, by means of a branch-and-bound method, alg tries to find every possible mapping from the query graph to the network that meets the associated symmetry-breaking conditions. Computing symmetry-breaking conditions requires finding all automorphisms of a given query graph. Even though, there is no efficient (or polynomial time) algorithm for the graph automorphism problem, this problem can be tackled efficiently in practice by McKay’s tools.[24][25] As it is claimed, using symmetry-breaking conditions in NM detection lead to save a great deal of running time. Moreover, it can be inferred from the results in [29][30] that using (a) graph G, (b) illustration of all automorphisms of G that is showed in (a). From set AutG we can obtain a set the symmetry-breaking conditions results in high efficiency particularly for directed networks in comparison to undirected of symmetry-breaking conditions of G given by SymG networks. The symmetry-breaking conditions used in the GK algorithm are similar to the restriction which ESU algorithm in (c). Only the first mapping in AutG satisfies the applies to the labels in EXT and SUB sets. In conclusion, the GK algorithm computes the exact number of appearance of a SynG conditions; as a result, by applying SymG in the given query graph in a large complex network and exploiting symmetry-breaking conditions improves the algorithm Isomorphism Extension module the algorithm only performance. Also, GK alg is 1 of the known algorithms having no limitation for motif size in implementation and potentially enumerate each match-able sub-graph in network to G once. SynG is not a unique set for an arbitrary graph G. it can find motifs of any size.

MOTIFs: Cliques, k-plexes, k-cores and other communities are subgraphs defined in terms of the count of their edges (an internal count). Subgraph is a Motif if its isomorphic occurrences in the graph (an external count) is higher than “expected”. Wikipedia: motifs are defined as recurrent and statistically significant sub-graphs or patterns. Network motifs are sub-graphs that repeat themselves in a specific network or even among various networks. Each of these sub-graphs, defined by a particular edge pattern between vertices, may mean that a particular function is achieved efficiently. Indeed, motifs are of importance because they may reflect functional properties Motif detection is computationally challenging. Most algorithms for Motif discovery are used to find induced Motifs (induced sub-graphs). Graph G′ is a sub-graph of G (G′⊆G) if V′⊆V and E′⊆E∩(V′×V′). If G′⊆G and G′ contains all of the edges ‹u,v›∈E with u,v ∈V′, then G′ is an induced sub-graph of G. G′ and G are isomorphic (G′↔G), if there exists a bijection (one-to-one) f:V′→V with ‹u,v› ∈E′ ⇔‹f(u), f(v)› ∈E u,v∈V′. When G″⊂G and an isomorphism between sub-graph G″ and a graph G′, this mapping represents an appearance of G′ in G. The number of appearances of graph G′ in G is called the frequency FG of G′ in G. G is recurrent or frequent in G, when its frequency FG(G′) is above a predefined threshold or cut-off value. We used terms pattern and frequent sub-graph in this review interchangeably. motif discovery methods can be classified as exact counting, sampling, pattern growth methods and so on. Motif discovery has 2 steps: calc # of occurrences then, evaluating significance. Mfinder implements full enumeration and sampling. Until 2004, the only exact counting method for network motif detection was brute-force proposed by Milo et al.[3] It was successful at discovering small motifs, but for finding even size 5 or 6 motifs it was not computationally feasible. Hence, a new approach to this problem was needed. Kashtan et al. [9] sampling NM algorithm, based on edge sampling throughout the network, estimates concentrations of induced sub-graphs and can be utilized in directed or undirected networks. The sampling procedure starts from an arbitrary edge that leads to a sub-graph of size two, and then expands the sub-graph by choosing a random edge that is incident to the current sub-graph. Then, it continues choosing random neighboring edges until a sub-graph of size n is obtained. Finally, the sampled sub-graph is expanded to include all of the edges that exist in the network between these n nodes. In sampling, taking unbiased samples is important. Kashtan et al. proposed a weighting scheme that assigns different weights to the different sub-graphs.[9], which exploits the info of the sampling probability for each sub-graph, i.e. the probable sub-graphs will obtain comparatively less weights in comparison to the improbable sub-graphs; hence, the algorithm must calculate the sampling probability of each sub-graph that has been sampled. This weighting technique assists mfinder to determine sub-graph concentrations impartially. If expanded to include sharp contrast to exhaustive search, the computational time of the algorithm surprisingly is asymptotically independent of the network size. An analysis of the computational time has shown that it takes O(nn) for each sample of a subgraph of size n from the network. On the other hand, there is no analysis in [9] on the classification time of sampled sub-graphs that requires solving the graph isomorphism problem for each sub-graph sample. Additionally, an extra computational effort is imposed on the algorithm by the sub-graph weight calculation. But it is unavoidable to say that theconclusion, algorithm may sample the same sub-graph multiple timessearch, – spending gathering any information.[10] In by sampling, mfinder is faster than exhaustive but ittime onlywithout determines sub-graphs concentrations approximately. This algorithm can find motifs up to size 6 because of its main implementation, and as result it gives the most significant motif. Also, it is necessary to mention that this tool has no option of visual presentation. mfinder: Es=set of picked edges. Vs= set of all nodes that are touched by the edges in E. Init Vs and Es be empty sets. 1. Pick a random edge e1 = (vi, vj). Update Es = {e1}, Vs = {vi, vj} 2. Make a list L of all neighbor edges of Es. Omit from L all edges between members of Vs. 3. Pick a random edge e = {vk,vl} from L. Update Es = Es ⋃ {e}, Vs = Vs ⋃ {vk, vl}. 4. Repeat steps 2-3 until completing an n-node subgraph (until |Vs| = n). 5. Calculate the probability to sample the picked n-node subgraph. (M1 – M4) are different occurrences of sub-graph (b) in graph (a). For frequency concept F1, M1, M2, M3, M4 represent all matches, F1 = 4. For F2, one of the two set M1, M4 or M2, M3 are possible matches, F2 = 2. For F3, merely one of the matches (M1 to M4) is allowed, therefore F3 = 1. Frequency decreases as the usage of network elements are restricted.

Schreiber, Schwöbbermeyer [12] proposed flexible pattern finder (FPF) in a system Mavisto.[23] It exploits downward closure , applicable for frequency concepts F2 and F3. The downward closure property asserts that the frequency for sub-graphs decrease monotonically by increasing the size of sub-graphs; but it does not hold necessarily for frequency concept F1. FPF is based on a pattern tree (see figure) consisting of nodes that represents different graphs (or patterns), where the parent is a sub-graph of its children nodes; i.e., corresp. graph of each pattern tree’s node is expanded by adding a new edge of its parent node. At first, FPF enumerates and maintains info of all matches of a sub-graph at the root of the pattern tree. Then builds child nodes of previous node by adding 1 edge supported by a matching edge in target graph, tries to expand all of previous info about matches to the new sub-graph (child node).[In next step, it decides whether the frequency of the current pattern is lower than a predefined threshold or not. If it is lower and if downward closure holds, FPF can abandon that path and not traverse further in this part of the tree; as a result, unnecessary computation is avoided. This procedure is continued until there is no remaining path to traverse. It does not consider infrequent sub-graphs and tries to finish the enumeration process as soon as possible; therefore, it only spends time for promising nodes in the pattern tree and discards all other nodes. As an added bonus, the pattern tree notion permits FPF to be implemented and executed in a parallel manner since it is possible to traverse each path of the pattern tree independently. But, FPF is most useful for frequency concepts F2 and F3, because downward closure is not applicable to F1. Still the pattern tree is still practical for F1 if the algorithm runs in parallel. It has no limitation on motif size, which makes it more amenable to improvements. ESU (FANMOD) Sampling bias of Kashtan et al. [9] provided great impetus for designing better algs for NM discovery, Even after weighting scheme, this method imposed an undesired overhead on the running time as well a more complicated impl. It supports visual options and is time efficient. But it doesn’t allow searching for motifs of size 9. Wernicke [10] RAND-ESU is better than jfinder, based on the exact enumeration algorithm ESU, has been implemented as an app called FANMOD.[10] Rand-esu is a discovery alg applicable for both directed and undirected networks. It effectively exploits an unbiased node sampling, and prevents overcounting sub-graphs. RAND-ESU uses DIRECT for determining sub-graph significance instead of an ensemble of random networks as a Null-model. DIRECT estimates sub-graph # w/oexplicitly generating random networks.[10] Empirically, DIRECT is more efficient than random network ensemble for sub-graphs with a very low concentration. But classical Null-model is faster than DIRECT for highly concentrated sub-graphs.[3][10] ESU alg: We show how this exact algorithm can be modified efficiently to RAND-ESU that estimates sub-graphs concentrations. The algorithms ESU and RAND-ESU are fairly simple, and hence easy to implement. ESU first finds the set of all induced sub-graphs of size k, let Sk be this set. ESU can be implemented as a recursive function; the running of this function can be displayed as a tree-like structure of depth k, called the ESU-Tree (see figure). Each of the ESU-Tree nodes indicate the status of the recursive function that entails two consecutive sets SUB and EXT. SUB refers to nodes in the target network that are adjacent and establish a partial sub-graph of size |SUB|≤k. If |SUB|=k, alg has found induced complete sub-graph, Sk=SUB ∪Sk. If |SUB|v} graphs of size 3 in the target graph. call ExtendSubgraph({v}, VExtension, v) endfor Leaves: set S3 or all of size-3 induced sub-graphs of the target graph (a). ESUtree nodes incl 2 adjoining sets: adjacent ExtendSubgraph(VSubgraph, VExtension, v) nodes called SUB and EXT=all adjacent if |VSubG|=k output G[VSubG] return 1 SUB node and where their numerical While VExt≠∅ do Remove arbitrary vertex w from VExt labels > SUB nodes labels. EXT set is VExtension′←VExtension∪{u∈Nexcl(w,VSubgraph)|u>v} utilized by the alg to expand a SUB set call ExtendSubgraph(VSubgraph ∪ {w}, VExtension′, v) until it reaches a desired size placed at return lowest level of ESU-Tree (or its leaves).

Graph kernel: Kernel methods are a popular method with broad applications in data mining. In a simple way, a kernel function can be considered as a positive definite matrix that measures the similarities between each pair of input data. It the currently study, a graph kernel method, namely shortest-path kernel, developed by Borgwart and Kriegel, is used to compute the similarities between graphs. The first step of the shortest-path kernel is to transform original graphs into shortest-path graphs. A shortest-path graph has the same nodes as its original graph, and between each pair of nodes, there is an edge labeled with the shortest distance between the two nodes in the original graph. In the current study, the edge label will be referred to as the weight of the edge. This transformation can be done using any algorithm that solves the all-pairs-shortest-paths problem. In the current study, the Floyd-Warshall algorithm was used. Let G1 and G2 be two original graphs. They are transformed into shortest-path graphs S1(V1, E1) and S2(V2, E2), where V1 and V2 are the sets of nodes in S1 and S2, and E1 and E2 are the sets of edges in S1 and S2. Then a kernel function is used to calculate the similarity between G1 and G2 by comparing all pairs of edges between S1 and S2. where, kedge( ) is a kernel function for comparing two edges (including the node labels and the edge weight). Let e1 be the edge between nodes v1 and w1, and e2 be the edge between nodes v2 and w2. Then, where, knode( ) is a kernel function for comparing the labels of two nodes, and kweight( ) is a kernel function for comparing the weights of two edges. These two functions are defined as in Borgward et al.(2005): where, labels(v) returns the vector of attributes associated with node v. Note that Knode() is a Gaussian kernel function. was set to 72 by trying different values between 32 and 128 with increments of 2. where, weight(e) returns the weight of edge e. Kweight( ) is a Brownian bridge kernel that assigns the highest value to the edges that are identical in length. Constant c was set to 2 as in Borgward et al.(2005). Classification and cross-validation When the shortest-path graph kernel is used to compute similarities between graphs, the results are affected by the sizes of the graphs. Consider the case that graph G is compared with graphs Gx and Gy separately using the graph kernel: If Gx has more nodes than Gy does, then |Ex|>|Ey|, where Ex and Ey are the sets of edges in the shortest-path graphs of Gx and Gy. Therefore, the summation (i.e., SS( ) ) in K(G, Gx ) includes more items than the summation in K(G, Gy) does. Each item (i.e., kedge( )) inside the summation has a non-negative value. The consequence is that if K(G, Gx)>K(G,Gy) it may not necessary indicate that Gx is more similar to G than Gy is, in stead, it could be an artifact of the fact that Gx has more nodes than Gy. To overcome this problem, a voting strategy is developed for predicting whether a graph (or a patch) is an interface patch: Algoritm Voting_Stategy (G) Input: graph G Output: G is an interface patch or non-interface patch Let T be the set of proteins in the training setLet v be the number of votes given to “G is an interface patch” v=0 While (T is not empty) { Take one protein (P) out of T Let Gint and Gnon-int be the interface and non-interface patches from P. If K(G, Gint)>K(G,Gnon-int), then increase v by 1 } If , then G is an interface patch Else G is a non-interface patch Using this strategy, when K(G, Gint) is compared with K(G, Gnon-int), Gint and Gnon-int are guaranteed to have identical number of nodes, since they are the interface and non-interface patches extracted from the same protein (see section 2.4 for details). Each time K(G, Gint)>K(G, Gnon-int) is true, one vote is given to “G is an interface patch”. In the end G is predicted to be an interface patch if “G is an interface patch” gets more than half of the total votes, i.e.,. Leave-one-out cross-validation was performed at protein level. In one round of the experiment, the interface patch and non-interface patch of a

Introduction In this project, the Co-PI will develop graph models for protein representation. The graph models will maintain crucial structural information and encode the spatial distribution of multiple features on the proteins. Then, the Co-PI will develop new graph kernel methods that can fully exploit the rich information contained in the graphs for binding site prediction. Justification and Feasibility Stable interactions between macromolecules require the binding sites to possess favorable conditions in multiple aspects, including geometric complementarities, evolutionary conservation, hydrophobic force, electrostatic force and other physical and chemical forces. Thus, a method must assess features covering these aspects of the proteins in order to identify the binding sites. The graph models proposed in this work incorporate a wide range of features covering these aspects, and the proposed graph kernels combined with machine-learning methods are capable of discover complex patterns in the proposed graph models. Thus, we believe that the proposed methods can achieve success in binding site prediction and analysis. To test the feasibility of the proposed approach, we conducted a preliminary study on using graph kernel method to predict DNA-binding sites. We used a dataset of 171 DNA-binding proteins collected in our previous study19. We divided the protein surface into overlapping patches, such that each patch included a surface residue and its neighboring residues. Each patch was assigned to either positive class or negative class depending on whether the center residue was in the binding sites. Then, each patch was represented as a graph, such that each amino acid residue was represented using a vertex and an edge was added between two vertices if the corresponding residues were within a distance of 3.5 Å. Each vertex was then labeled with six features of the corresponding amino acid residue, including residue identity, sequence conservation score, structural conservation score, solvent accessibility, electrostatic potential and surface curvature. We used a shortest-path graph kernel to calculate the similarity between graphs. For more details about the shortest-path graph kernels, please see section C.3.ii.d. Briefly, a shortestpath graph kernel method compares all-pairs shortest paths between graphs. The comparison of two paths includes the comparison of path length and source and destination vertices. A Gaussian function was used to compare vertices based on the vertex labels. A Brownian kernel was used to compare the path length. The graph kernel was embedded into a support vector machine (SVM) to build a predictor for DNAbinding site prediction. When evaluated using leave-one-out cross-validation, the predictor achieved 89% accuracy, 90% specificity and 88% sensitivity. We also evaluated how each of the six features affected the prediction performance. When the feature number increased from one to six, the accuracy gradually increased from 86% to 89%. To further evaluate the method, we tested it using an independent set of 13 proteins used in a previous study21 whose apo-state structure (i.e. unbounded with DNA) and holo-state structure (i.e. bounded with DNA) were available. We used the predictor to predict DNA-binding sites on the apo-state structures and the predictions were compared against the actual binding sites gleaned from the holo-state structures. For each test protein, we ranked the surface patches by the prediction score from high to low. Remarkably, the top 1 patch in all of the 13 proteins belonged to the actual DNA-binding sites. This preliminary study shows that the graph kernel approach is able to discover predictive patterns on DNA-binding sites. The results are very encouraging. The top 1 prediction patch accurately indicates the location of the DNA-binding sites in all independent test proteins. This level of success would allow the method to make significant contribution in real applications.

E1 0&1= 0 01 1 1 1 E1 SubGraph Path pTrees E2 13 0 0 0 1 E3 14 0 1 1 0 13 0 0 0 1 1 0 1 1 134 1 1 1 01 0 1 0 C1 0 0 1 1 134 1 0 0 0 1 0 1 1 142 0 10 0 1 0 1 0 E2 0 0 0 1 14 0 0 1 0 142 0 0 0 0 24 1 0 1 0 143 1 10 0 1 0 1 0 241 0 0 1 0 143 1 0 0 0 243 1 0 0 0 E3 1 10 0 1 0 1 1 C3 1 0 0 1 31 0 0 0 1 31 0 0 0 1 34 1 1 0 0 314 0 0 1 0 3411 0 0 0 1 1 1 0 1 0 1 1 314 0 10 1 1 1 1 0 E4 1 10 1 1 1 1 0 1 0 1 1 41 0 0 1 0 34 1 0 0 0 341 0 0 1 0 342 0 10 0 1 0 1 0 342 0 0 0 0 C4 1 0 1 0 1 0 1 1 413 0 10 0 1 0 1 1 41 0 0 1 0 42 0 0 0 0 1 0 1 1 42 0 0 0 0 413 0 0 0 1 43 1 0 0 0 1 0 1 1 431 0 10 0 1 0 1 1 43 1 0 0 0 1 2 3 4 G1 C in orange 1 PC= 01 431 0 0 0 1 1 To get the C Path pTree, remove all C’ pTrees. & each G pTree with P C. Kill the 2nd bit (or keep vertex2 having no incident edges (then all pTrees are the same depth and can operate on each other. Diameter of C? Cdiamk is the max of the min path lengths from k to the other Cvertices. For each k, proceed down from C k a level at a time and record the first occurrence of kh , hk. DiamC = maxkV(Diamk) = 1 CDiam1=max{fo13 fo14}=max{11}=1 Diam3=max{fo31 fo34}=max{11}=1 Diam4=max{fo41 fo43}=max{11}=1 Always use pop-count for 1-counts as we AND, then C is a clique iff all C level-1 counts are |V C|-1. In fact one can mine out all cliques by just analyzing the G level=1 counts. Note: If one creates the G Path pTree, lots of tasks become easy! E.g., clique mining, shortest path mining, degree community mining, density community mining! What else? A k-plex is a maximal subgraph in which each vertex is adjacent to all other vertices of the subgraph except at most k of them. A k-core is a maximal subgraph in which each vertex is adjacent to at least k other vertices of the subgraph. In any graph there is a whole hierarchy of cores of different order. k-plex existence alg (using the GPpT): C is a k-plex iff vC|Cv| |VC|2–k2 k-plex inheritance thm: Every induced subgraph of a k-plex is a k-plex. Mine all max k-plexes: Use |Cv| vC . k-core inheritance thm: If a cover of G by induced k-cores, G is a k-core. Mine all max k-cores: Use |Cv| vC k-core existence alg (using the GPpT): C is a k-core iff vC, |V | k C Clique=community s.t. edge between each vertex pair. Recommenders: # edges = 1MB (1015) Community=subgraph w more edges inside than linked to its outside. Gene-Gene Ints: # edges = 1B (10 9) Person-Tweet Security: # edges = 7B*10K= 10 14 Friends Social: # edges = 4BB (1018) Stock-Price: # edges = 1013 An Induced SubGraph (ISG) C, is a subgraph that inherits all of G’s edges on its own vertices. A k-ISG (k vertices), C, is a k-clique iff all of its (k-1)-Sub-ISGs are (k-1)-cliques. 1 3:2 As a Rolodex card C Ekey 1,3 1,4 2,4 3,4 2:3 12 2 3 1 V2 V1 4:3 1:2 2:3 3:2 4:3 E=Adj matrix 1:2 2:3 3:2 4:3 V1 1 | 1 | 2 | 3 | PVL,1 1 1 1 1 Bit offset 1 2 V (vertex tbl) 3 Vkey VL 4 1 2 5 6 2 3 7 3 2 8 4 3 9 10 11 PVL,0 PC 12 13 0 1 0 14 1 1 15 0 1 16 1 Ekey 1,1 1,2 1,3 1,4_ 2,1 2,2 2,3 2,4_ 3,1 3,2 3,3 3,4_ 4,1 4,2 4,3 4,4 PE 0 0 1 1_ 0 0 0 1_ 1 0 0 1_ 1 1 1 0 PU 0 0 1 1_ 0 0 0 1_ 0 0 0 1_ 0 0 0 0 EL 0 0 1 2_ 0 0 0 3_ 0 0 0 1_ 2 3 1 0 PEL.,1 0 0 0 1_ 0 0 0 1_ 0 0 0 1_ 1 1 0 0 PEL.,0 0 0 1 0_ 0 0 0 1_ 1 0 0 0_ 0 1 1 0 P1 1 1 1 1_ 0 0 0 0_ 0 0 0 0_ 0 0 0 0 P2 0 0 0 0_ 1 1 1 1_ 0 0 0 0_ 0 0 0 0 P3 0 0 0 0_ 0 0 0 0_ 1 1 1 1_ 0 0 0 0 P4 0 0 0 0_ 0 0 0 0_ 0 0 0 0_ 1 1 1 1 PEC=PE&PC 0 0 1 1_ 0 0 0 0_ 1 0 0 1_ 1 0 1 0 PUC=PU&PC 0 0 1 1_ 0 0 0 0_ 0 0 0 1_ 0 0 0 0 (C=Induced SubGraph with VC={1,3,4}) Clique Existence Alg is induced SG a clique. Edge Count existence thm (EC): |EC||PUC|=COMB(|VC|,2)|VC|!/((|VC|-2)!2!) 1 2 V2 ELabel 3 1 4 2 4 3 4 1 3 Apply EC 3vertex ISGs (3-Clique iff |PU|= 3!/(2!1!)=3) 1 VC={1,3,4} VD={1,2,3} VF={1,2,4} PUC 0 0 1 1_ 0 0 0 0_ 0 0 0 1_ 0 0 0 0 Ct=3 VH={2,3,4} PUD 0 0 1 0_ 0 0 0 0_ 0 0 0 0_ 0 0 0 0 Ct=1 PUF 0 0 0 1_ 0 0 0 1_ 0 0 0 0_ 0 0 0 0 Ct=2 PUH 0 0 0 0_ 0 0 0 1_ 0 0 0 1_ 0 0 0 0 Ct=2 C only 3-Clique. SubGraph existence theorem (SG): (VC,EC) is a k-clique iff every induced k-1 subgraph, (V D,ED) is a (k-1)-clique. SG or EC better? Extend to quasi-cliques? Extend to mine out all cliques? A Clique Mining algorithm finds all cliques in a graph. For Clique-Mining we can use an ARM-Apriori-like downward closure property: CSkkCliqueSet, CCSk+1Candidatek+1CliqueSet By the SG clique thm, CCSk+1= all s of CSk pairs having k-1 common vertices. Let CCCSk+1 be a union of two k-cliques with k-1 common vertices. Let v and w be the kth vertices (different) of the two k-cliques, then CCSk+1 iff (PE)(v,w)=1. (We just need to check a single bit in P E.) Form CCSk+1: Union CSk pairs sharing k-1 vertices, check single PE bit. Below, k=2, so we check edge pairs sharing 1 vertex, then check the 1 new edge bit in P E. CS2=E={13 14 24 34} PE(3,4) = PE(4*[3-1]+4=12)=1 134CS3 Already have 134 PE(2,3)=PE(4*[2-1]+3=7)=0 PE(1,2) = PE(4*[1-1]+2=2)=0 The only expensive part of this is forming CCSk. And that is expensive only for CCS3 (as in Apriori ARM) Next? List out CS3 = {134} form CCS4 = . Done. Internal degree of C, kCint =vC kvint 2=|PC&PE&Pv1|=kv1int Int/Ext degree of v∈C, kv =# edges v to wC/C’ 0=|P’C&PE&Pv1|=kv1ext Intra-cluster density δint(C)=|edges(C,C)|/(nc(nc−1)/2)=|PE&PC&PLT|/(3*2/2)=3/3=1 int/wxt int External degree of C, kCext =vC kvext 2=|PC&PE&Pv3| =kv3int 2=|PC&PE&Pv4|=kv4int 6=kC ext kC=7 Total degree of C, kC= kCint +kCext 0=|P’C&PE&Pv3|=kv3ext 1=|P’C&PE&Pv4|=kv4ext 1=kC δ Cδ C=1–1/3=2/3 int ext Inter-cluster density δext(C)=|edges(C,C’)| / (nc(n-nc)) =|PE&P’C&PLT|=1/(3*1)=1/3 Tradeoff between large δint(C) and small δext(C) is goal of many community mining algorithms. A simple approach is to Maximize differences. Density Difference algorithm for Communities: δint(C)−δext(C) >Threshold? Degree Difference algorithm: kCint – kCext > Threshold? Easy to compute w pTrees, even for Big Graphs. Graphs are ubiquitous for complex data in all of science. Ignoring Subgraphs of 2 vertices, the four 3-vertex subgraphs are: C={1,3,4}, D={1,2,3}, F={1,2,4}, H={2,3,4} δint(D) =|PE&PD&PLT|/(3*2/2)=1/3 δext(D)=|PE&P’D&PLT|=1/(3*1)=3/3=1 δintD - δextD=1/3–1=-2/3 δext(H)=|PE&P’H&PLT|=1/(3*1)=2/3 D δint(H) =|PE&PH&PLT|/(3*2/2)=2/3 δintH - δextH=2/3-2/3=0 δint(F) =|PE&PF&PLT|/(3*2/2)=2/3 δext(F)=|PE&P’F&PLT|=1/(3*1)=2/3 δintF - δextF=2/3-2/3=0 F H

Mining for Communities with more relaxed definitions than cliques (taken from Fortunato’s survey) There are many cohesiveness definitions other than a Clique. Another criterion for subgraph cohesion relies on adjacency of its vertices. The idea is that a vertex must be adjacent to some minimum number of other vertices in the subgraph. In the literature on social network analysis there are two complementary ways of expressing this. A k-plex is a maximal subgraph in which each vertex is adjacent to all other vertices of the subgraph except at most k of them. A k-core is a maximal subgraph in which each vertex is adjacent to at least k other vertices of the subgraph. In any graph there is a whole hierarchy of cores of different order. A k-core is essentially the same as a p-quasi complete subgraph, which is a subgraph such that the degree of each vertex is larger than p(k-1) , where p is a real number in [0; 1] and k the order of the subgraph. As cohesive as a subgraph can be, it would hardly be a community if there is strong cohesion also between the subgraph and the rest of the graph. Therefore, it is important to compare the internal and external cohesion of a subgraph. In fact, this is what is usually done in the most recent definitions of community. The first recipe, however, is not recent and stems from social network analysis. An LS-set is a subgraph such that the internal degree of each vertex is greater than its external degree. This condition is quite strict and can be relaxed into the so-called weak definition of community, for which it suffices that the internal degree of the subgraph exceeds its external degree. A community is strong if the internal degree of any vertex exceeds the number of edges that the vertex shares with any other community. A community is weak if its total internal degree exceeds the number of edges shared by the community with the other communities. Another definition focuses on the robustness of clusters to edge removal and uses the concept of edge connectivity. Edge connectivity of a pair of vertices is the minimal number of edges that need to be removed in order to disconnect them (no path between). A lambda set is a subgraph such that any pair of vertices of the subgraph has a larger edge connectivity than any pair formed by one vertex of the subgraph and one outside the subgraph. However, vertices of a lambda-set need not be adjacent and may be quite distant from each other. Communities can also be identified by a fitness measure, expressing to which extent a subgraph satisfies a given property related to its cohesion. The larger the fitness, the more definite is the community. This is the same principle behind quality functions, which give an estimate of the goodness of a graph partition. The simplest fitness measure for a cluster is its intra-cluster density int(C) (see slide 1). One could say subgraph C with k vertices is a cluster if int(C)>threshold. Finding such subgraphs is NP-complete, as it coincides with the NP-complete Clique Problem when the threshold =1. It is better to fix the size of the subgraph because, without this conditions, any clique would be one of the best possible communities, including trivial two-cliques (simple edges). Variants of this problem focus on the number of internal edges of the subgraph. Another measure is the relative density (C) of a subgraph C, defined as the ratio between the internal and the total degree of C (see slide1). Finding subgraphs of a given size with (C) larger than a threshold is NP-complete. Fitness measures can also be associated to the connectivity of the subgraph to the other vertices of the graph. A good community is expected to have a small cut size, i. e. small # of edges joining it to the rest of the graph.

Seismic attribute-assisted interpretation of incised valley fill geometries: A case study of Anadarko Basin Red Fork interval. Yoscel Suarez*, Chesapeake Energy and The University of Oklahoma, USA Kurt J. Marfurt, The University of Oklahoma, USA Mark Falk, Chesapeake Energy, USA Al Warner , Chesapeake Energy, USA Seismic Attribute Generation Edge Detection Relative Acoustic Impedance The Relative Acoustic Impedance (RAI) is a simplified inversion. This attribute is widely used for lithology discrimination and as a thickness variation indicator. Since the RAI enhances impedance contrast boundaries, it may help delimit different facies within an incised valley-fill complex. Figure 15 shows the better delineation of the different valley-fill episodes. The impedance amplitude variations within the system may be correlated to sand/shale ratios. Higher values of RAI seem to be related to sandier intervals (black arrow). Coherence According to Chopra and Marfurt (2007) coherence is a measure of similarity between waveforms or traces. Peyton et al. (1998) showed the value of this edge detection attribute to identify channel boundaries in the Red Fork level. Figure 11 shows the results of the modern coherence algorithm and the interpretation. The modern coherence algorithm is slightly superior. It shows additional features (blue arrows), and enhances the edge of Phase II (pink arrow). It also shows that the current outlines of Phase II could be modified in the encircled areas. Figure 15. Relative Acoustic Impedance (RAI) at the Red Fork level. Figure 11. Modern coherency horizon slice at the Red Fork level Figure 12. Other modern edge-detector attributes: a) Sobel coherence. b) Energy ratio coherence Energy Weighted Coherent Amplitude Gradients Chopra and Marfurt (2007), by using a wedge model, demonstrate that waveform difference detection algorithms are insensitive to waveform changes below tuning frequencies. In this study the energy ratio coherence, defined by the coherent energy normalized by the total energy of the traces within the calculation window, and the Sobel coherence, which is a measure of relative changes in amplitude were used. Figure 12 shows a horizon slice of the energy ratio coherence and the Sobel coherence at the Red Fork level. The results from these two energy weighted routines are very similar to the coherence attribute, however the level of detail of the coherency algorithm is greater in the encircled areas. Even though both algorithms show similar features, the Sobel coherence seems to be more affected by the acquisition footprint than does the energy ratio coherence. Seismic Attribute Blending Peak Frequency and Peak Amplitude Displays Liu and Marfurt (2007) show that by combining the peak frequency and peak amplitude volumes extracted from the spectral decomposition analysis, the interpreter can identify highly tuned intervals. Low peak frequency values correlate with thicker intervals and high peak frequencies with thinner features. Figures 16 (a,b) show the peak frequency and peak amplitude volumes respectively. Figure 16(c) shows the combination of both displays, which simplifies the interpretation of multiple volumes of data. Figure 16(d) shows the blended image with the overlain geological interpretation. This combination iof attributes shows a better definition of the Phases boundaries especially the Phase II in the NW corner of the survey, in between the two valley branches. The changes in facies within the Phase V are evident in the southernmost green arrow. The differentiation between the Phase III and Phase V is sharper (northernmost green arrow). Outside of the incised valley system the lithology relationship with frequency is still unclear. The dashed orange lines show the proposed changes to the Phase II outline. Curvature Although successful in delineating channels in Mesozoic rocks in Alberta, Canada (Chopra and Marfurt, 2008), for this study, volumetric curvature does not provide images of additional interpretational value. While the Red Fork channel boundaries can be delineated using this attribute (Figure 13), the results shown by the coherence and spectral decomposition are superior. In this situation the acquisition footprint negatively impacts the lateral resolution quality of the attribute. Blue arrows indicate channel edges. Figure 13. Other modern edge-detector attributes: a) Sobel coherence. b) Energy ratio coherence Figure 16. Peak Frequency and Peak Amplitude analysis at the Red Fork level. (a) Peak Frequency volume, red corresponds to higher frequencies. (b) Peak Amplitude volume, white corresponds to higher peak amplitude values. (c) Peak frequency and peak amplitude blended volume. The co-rendered image shows valley-fill boundaries. (d) co-rrendered image with interpretation Spectral Decomposition Matching pursuit spectral decomposition was used to generate individual frequency volumes as well as peak amplitude and peak frequency datasets. Castagna et al. (2003) discuss the value of using matching pursuit spectral decomposition and how we can associate different “tuning frequencies” to different reservoir properties like fluid content, thickness and/or lithology. Figure 14 shows a matching pursuit 36 Hz spectral component at the Red Fork level. The level of detail using matching pursuit spectral decomposition is superior to that provided by the DFT Figure 14. 36 Hz matching pursuit spectral decomposition. Note the enhanced level of detail offered by the matching pursuit spectral decomposition. a) without geological interpretation b) with geological interpretation This study has identified correlations between attribute expressions of Red Fork channels that can be applied to underexploited exploration areas in the Mid-continent, and to fluvial deltaic channels in Paleozoic rocks in general. When it comes to answer the key questions discussed at the beginning of this paper, we learned that the coherence and energy weighted attributes help improve the resolution of subtle features like small channels and channel levees. They also help differentiate the cutbank from the gradational inner bank. It is also evident from this study that even though there have been some improvements in the coherence routines, the differences between current algorithms with the ones applied by Peyton et al. in 1998 are minimal. Additionally, detailed channel geomorphology and lithology discrimination were possible by introducing the spectral decomposition and relative acoustic impedance attributes in the analysis. On one hand, the use of spectral decomposition helped define different facies within the channel system and increased the resolution of channel boundaries. On the other hand, the variations in the RAI values were found to be correlative to lithology infill, for instance higher values of RAI show direct relationship to shalier intervals within the channel complex. One of the key findings of this study is the great value that blended images of attributes bring to the interpreter. Such technology was not available ten years ago. But today, by combining multiple attributes, fluvial facies delineation is possible when co-rendering edge detection attributes with lithology indicators. It is important to mention that the signal/noise ratio of the data is a key factor that will determine the resolution and quality of the seismic attribute response. In this study, curvature did not provide images of additional interpretational value. These unsatisfactory results may be related to acquisition footprint contamination. Therefore, footprint removal methods will be performed in an attempt to enhance signal-tonoise ratio. Acknowledgments We thank Chesapeake Energy for their support in this research effort. We give special thanks to Larry Lunardi, Carroll Shearer, Mike Horn, Mike Lovell and Travis Wilson for their valuable contribution and feedback. And to my closest friends Carlos Santacruz and Luisa Aurrecoechea for cheering me up at all times. Amplitude Variability Semblance of the Relative Acoustic Impedance Chopra and Marfurt (2007) define semblance as “the ratio of the energy of the average trace to the average energy of all the traces along a specified dip.” Since RAI has sharper facies boundaries the semblance computed from RAI should be crisper than semblance computed from the conventional seismic. Figure 17 shows the value of combining these attributes. Outside of the channel complex the lithology relationship with frequency is still unclear(red arrow). The yellow arrow points to a potential fluvial channel outside of the incised valley-system. The dashed orange lines show the proposed changes to the Phase II outline. Conclusions Figure 17 a) the Semblance of the RAI and b) RAI and RAI semblance blended image. The combination of both attributes helps delineate Relative Acoustic Impedance boundaries.

Classification Model Similarity Measure Similarity measure classifiers determine the feature to feature similarity between vectors Jaccard Vectors are numeric and categorical Matching Count Categories are defined as LOW, MED, or HIGH Test Vector Mean Negative Train Vector • Threshold is Numeric for Cosine • Threshold is match count for Jaccard Threshold Threshold Mean Positive Train Vector Cosine Numeric Similarity-P Similarity-N Correct Correct Classification Classification Incorrect Incorrect Classification Classification True True Positive Positive True True Negative Negative False False Positive Positive False False Negative Negative

TSCS and Corpus Size • Similarity is a value [0,1] and it is not clear what is the threshold to use to assert two documents are “similar” • Cosine similarity varies with changes in corpus size • We ran an experiment to see how the similarity of two seeded document varies with changes in corpus size (a=0.5) Size of Corpus 4 5 10 15 20 30 40 Similarity of Set #1 0.89 0.89 0.90 0.90 0.91 0.92 0.92 Similarity of Set #2 0.52 0.53 0.54 0.54 0.56 0.57 0.59 Similarity variations with different corpus sizes using TSCS Size of Corpus 4 5 10 15 20 30 40 Similarity of Set #1 0.85 0.85 0.87 0.86 0.89 0.90 0.91 Similarity of Set #2 0.48 0.50 0.51 0.52 0.44 0.57 0.60 Similarity variations with different corpus sizes using Cosine

APPENDIX Always use pop-count for 1-counts as we AND, then C is a clique iff all C level-1 counts are |V C|-1. In fact one can mine out all cliques by just analyzing the PT counts. Note: If one creates PT, lots of tasks become easy! E.g., clique mining, shortest path mining, degree community mining, density community mining! What else? 1 2 3 4 G1 A k-plex is a maximal subgraph in which each vertex is adjacent to all other vertices of the subgraph except at most k of them. A k-core is a maximal subgraph in which each vertex is adjacent to at least k other vertices of the subgraph. In any graph there is a whole hierarchy of cores of different order. k-plex existence alg (using the GPpT): C is a k-plex iff vC|Cv| |VC|2–k2 k-core existence alg (using the GPpT): C is a k-core iff vC, |VC| k . k-plex inheritance thm: Every induced subgraph of a k-plex is a k-plex. Mine all max k-plexes: Use |Cv| vC k-core inheritance thm: If a cover of G by induced k-cores, G is a k-core. Mine all max k-cores: Use |Cv| vC Clique=community s.t. edge between each vertex pair. Community=subgraph w more edges inside than linked to its outside. Gene-Gene Ints: # edges = 1B (10 9) Person-Tweet Security: # edges = 7B*10K= 10 14 Recommenders: # edges = 1MB (1015) Friends Social: # edges = 4BB (1018) Stock-Price: # edges = 1013 An Induced SubGraph (ISG) C, is a subgraph that inherits all of G’s edges on its own vertices. A k-ISG (k vertices), C, is a k-clique iff all of its (k-1)-Sub-ISGs are (k-1)-cliques. 1 3:2 As a Rolodex card C Ekey 1,3 1,4 2,4 3,4 2:3 12 2 3 1 V2 V1 4:3 1:2 V1 1 | 1 | 2 | 3 | 2:3 3:2 4:3 E=Adj matrix 1:2 V2 ELabel 3 1 4 2 4 3 4 1 PVL,1 1 1 1 1 Bit offset 1 2 V (vertex tbl) 3 Vkey VL 4 1 2 5 6 2 3 7 3 2 8 4 3 9 10 11 PVL,0 PC 12 13 0 1 14 0 1 1 15 0 1 16 1 Ekey 1,1 1,2 1,3 1,4_ 2,1 2,2 2,3 2,4_ 3,1 3,2 3,3 3,4_ 4,1 4,2 4,3 4,4 PE 0 0 1 1_ 0 0 0 1_ 1 0 0 1_ 1 1 1 0 PU 0 0 1 1_ 0 0 0 1_ 0 0 0 1_ 0 0 0 0 EL 0 0 1 2_ 0 0 0 3_ 0 0 0 1_ 2 3 1 0 PEL.,1 0 0 0 1_ 0 0 0 1_ 0 0 0 1_ 1 1 0 0 PEL.,0 0 0 1 0_ 0 0 0 1_ 1 0 0 0_ 0 1 1 0 P1 1 1 1 1_ 0 0 0 0_ 0 0 0 0_ 0 0 0 0 P2 0 0 0 0_ 1 1 1 1_ 0 0 0 0_ 0 0 0 0 P3 0 0 0 0_ 0 0 0 0_ 1 1 1 1_ 0 0 0 0 P4 0 0 0 0_ 0 0 0 0_ 0 0 0 0_ 1 1 1 1 PEC=PE&PC 0 0 1 1_ 0 0 0 0_ 1 0 0 1_ 1 0 1 0 PUC=PU&PC 0 0 1 1_ 0 0 0 0_ 0 0 0 1_ 0 0 0 0 (C=Induced SubGraph with VC={1,3,4}) 2:3 3:2 4:3 PUC 0 0 1 1_ 0 0 0 0_ 0 0 0 1_ 0 0 0 0 Ct=3 PUD 0 0 1 0_ 0 0 0 0_ 0 0 0 0_ 0 0 0 0 Ct=1 PUF 0 0 0 1_ 0 0 0 1_ 0 0 0 0_ 0 0 0 0 Ct=2 PUH 0 0 0 0_ 0 0 0 1_ 0 0 0 1_ 0 0 0 0 Ct=2 1 2 3 1 Clique Existence Alg is induced SG a clique. Edge Count existence thm (EC): |EC||PUC|=COMB(|VC|,2)|VC|!/((|VC|-2)!2!) Apply EC 3vertex ISGs (3-Clique iff |PU|= 3!/(2!1!)=3) VC={1,3,4} VD={1,2,3} VF={1,2,4} VH={2,3,4} C only 3-Clique. SubGraph existence theorem (SG): (VC,EC) is a k-clique iff every induced k-1 subgraph, (V D,ED) is a (k-1)-clique. SG or EC better? Extend to quasi-cliques? Extend to mine out all cliques? A Clique Mining algorithm finds all cliques in a graph. For Clique-Mining we can use an ARM-Apriori-like downward closure property: CSkkCliqueSet, CCSk+1Candidatek+1CliqueSet By the SG clique thm, CCSk+1= all s of CSk pairs having k-1 common vertices. Let CCCSk+1 be a union of two k-cliques with k-1 common vertices. Let v and w be the kth vertices (different) of the two k-cliques, then CCSk+1 iff (PE)(v,w)=1. (We just need to check a single bit in P E.) Form CCSk+1: Union CSk pairs sharing k-1 vertices, check single PE bit. Below, k=2, so we check edge pairs sharing 1 vertex, then check the 1 new edge bit in P E. CS2=E={13 14 24 34} PE(3,4) = PE(4*[3-1]+4=12)=1 134CS3 Already have 134 PE(2,3)=PE(4*[2-1]+3=7)=0 PE(1,2) = PE(4*[1-1]+2=2)=0 The only expensive part of this is forming CCSk. And that is expensive only for CCS3 (as in Apriori ARM) Next? List out CS3 = {134} form CCS4 = . Done. Internal degree of C, kCint =vC kvint 2=|PC&PE&Pv1|=kv1int Int/Ext degree of v∈C, kv =# edges v to wC/C’ 0=|P’C&PE&Pv1|=kv1ext Intra-cluster density δint(C)=|edges(C,C)|/(nc(nc−1)/2)=|PE&PC&PLT|/(3*2/2)=3/3=1 int/wxt int External degree of C, kCext =vC kvext 2=|PC&PE&Pv3| =kv3int 2=|PC&PE&Pv4|=kv4int 6=kC ext kC=7 Total degree of C, kC= kCint +kCext 0=|P’C&PE&Pv3|=kv3ext 1=|P’C&PE&Pv4|=kv4ext 1=kC δ Cδ C=1–1/3=2/3 int ext Inter-cluster density δext(C)=|edges(C,C’)| / (nc(n-nc)) =|PE&P’C&PLT|=1/(3*1)=1/3 Tradeoff between large δint(C) and small δext(C) is goal of many community mining algorithms. A simple approach is to Maximize differences. Density Difference algorithm for Communities: δint(C)−δext(C) >Threshold? Degree Difference algorithm: kCint – kCext > Threshold? Easy to compute w pTrees, even for Big Graphs. Graphs are ubiquitous for complex data in all of science. Ignoring Subgraphs of 2 vertices, the four 3-vertex subgraphs are: C={1,3,4}, D={1,2,3}, F={1,2,4}, H={2,3,4} δint(D) =|PE&PD&PLT|/(3*2/2)=1/3 δext(D)=|PE&P’D&PLT|=1/(3*1)=3/3=1 δintD - δextD=1/3–1=-2/3 δext(H)=|PE&P’H&PLT|=1/(3*1)=2/3 D δint(H) =|PE&PH&PLT|/(3*2/2)=2/3 δintH - δextH=2/3-2/3=0 δint(F) =|PE&PF&PLT|/(3*2/2)=2/3 δext(F)=|PE&P’F&PLT|=1/(3*1)=2/3 δintF - δextF=2/3-2/3=0 F H

The Exception Class Hierarchy • Part of the error and exception class hierarchy in the Java API: https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html https://docs.oracle.com/javase/7/docs/api/java/lang/Error.html https://docs.oracle.com/javase/7/docs/api/java/lang/Exception.html https://docs.oracle.com/javase/7/docs/api/java/lang/RuntimeException.html Java Foundations, 3rd Edition, Lewis/DePasquale/Chase 10 - 22