Slide #1.

Social Networks Community Analysis Modified from R. Zafarani, M. A. Abbasi, and H. Liu, Social Networks Mining: An Introduction, Cambridge University Press, 2014.
More slides like this


Slide #2.

Social Community [real-world] community A group of individuals with common economic, social, or political interests or characteristics, often living in relative proximity. 2
More slides like this


Slide #3.

Why analyze communities? Analyzing communities helps better understand users – Users form groups based on their interests Groups provide a clear global view of user interactions • E.g., find polarization Some behaviors are only observable in a group setting and not on an individual level – Some republican can agree with some democrats, but their parties can disagree 3
More slides like this


Slide #4.

Example: political blogs 1 Digbys Blog 2 JamesWalcott 3 Pandago n 4 blog.johnkerry.com 5 OliverWillis 6 AmericaBlog 7 CrookedTimber 8 DailyKos 9 AmericanProspect 10Eschaton 11Wonkette 12TalkLeft 13PoliticalWire 14TalkingPointsMemo 15Matthew Yglesia s 16Washing tonMonthly 17MyDD 18JuanCole 19Left Coaster 20Bradford DeLong (A) 1 21 2 3 4 6 7 9 10 8 25 26 15 14 16 (B) (C) 20 33 35 34 37 29 30 32 31 19 27 28 12 17 23 24 11 13 18 22 5 38 40 39 36 21 JawaReport 22VokaPundit 23Roger LSimon 24TimBlair 25Andrew Sulli van 26 Instapundit 27BlogsforBush 28 Little GreenFootballs 29Belmont Club 30Captain’sQuarters 31Powerline 32 Hugh Hewitt 33 INDC Journal 34RealClearPolitics 35Winds ofChange 36Allahpundi t 37MichelleMalkin 38WizBang 39Dean’sWorld 40Volokh (Aug 29th – Nov 15th, 2004) A) all citations between A-list blogs in 2 months preceding the 2004 election B) citations between A-list blogs with at least 5 citations in both directions C) edges further limited to those exceeding 25 combined citations only 15% of the citations bridge communities 4 source: Adamic & Glance, LinkKDD2005
More slides like this


Slide #5.

Social Media Communities • Formation: – When like-minded users on social media form a link and start interacting with each other • More Formal Formation: 1. 2. • A set of at least two nodes sharing some interest, and Interactions with respect to that interest. Social Media Communities – Explicit (emic): formed by user subscriptions – Implicit (etic): implicitly formed by social interactions • Example: individuals calling Canada from the United States • Phone operator considers them one community for promotional offers • Other community names: group, cluster, cohesive subgroup, or module 5
More slides like this


Slide #6.

Examples of Explicit Social Media Communities Facebook has groups and communities. Users can – post messages and images, – can comment on other messages, – can like posts, and – can view activities of other users – Circles in Google+ represent communities – Communities form as lists. – Users join lists to receive information in the form of tweets – LinkedIn provides Groups and Associations. – Users can join professional groups where they can post and share information related to the group 6
More slides like this


Slide #7.

Finding Implicit Communities: An Example A simple graph in which three implicit communities are found, enclosed by the dashed circles 7
More slides like this


Slide #8.

Overlapping vs. Disjoint Communities Overlapping Communities Disjoint Communities 8
More slides like this


Slide #9.

Subjectivity of Community Definition 9 A densely-knit community 9 Each component is a community Definition of a community can be subjective. 9
More slides like this


Slide #10.

Implicit communities in other domains Protein-protein interaction networks – Communities are likely to group proteins having the same specific function within the cell World Wide Web – Communities may correspond to groups of pages dealing with the same or related topics Metabolic networks – Communities may be related to functional modules such as cycles and pathways Food webs – Communities may identify compartments 10
More slides like this


Slide #11.

Real-world Implicit Communities Collaboration network between scientists working at the Santa Fe Institute The colors indicate high level communities and correspond to research divisions of the institute 11
More slides like this


Slide #12.

What is Community Analysis? • Community detection – Discovering implicit communities • Community evolution – Studying temporal evolution of communities • Community evaluation – Evaluating Detected Communities 12
More slides like this


Slide #13.

Community Detection 13
More slides like this


Slide #14.

What is community detection? • The process of finding clusters of nodes (‘‘communities’’) – – – – With Strong internal connections and Weak connections between different communities Given: a social network Output: community membership of (some) actors • Ideal decomposition of a large graph – Completely disjoint communities – There are no interactions between different communities • In practice, – find community partitions that are maximally decoupled 14
More slides like this


Slide #15.

Why Detecting Communities is Important? Zachary's karate club Interactions between 34 members of a karate club for over two years http:// networkkarate.tumblr.com • The club members split into two groups (gray and white) • Disagreement between the administrator of the club (node 34) and the club’s instructor (node 1) • The same of communities The members one group left to start their own club can be found using community detection 15
More slides like this


Slide #16.

Why Community Detection? Network Summarization – A community can be considered as a summary of the whole network – Easier to visualize and understand – Forming the basis for other tasks such as data mining Preserve Privacy – [Sometimes] a community can reveal some properties without releasing the individuals’ private information 16
More slides like this


Slide #17.

Community Detection vs. Clustering Clustering •   • Data is often non-linked (matrix rows) • Clustering works on the distance or similarity matrix, – e.g., -means • If you use -means with adjacency matrix rows, you are only considering the ego-centric network Community detection • Data is linked (a graph) • Network data tends to be “discrete”, leading to algorithms using the graph property directly – -clique, quasi-clique, or edge-betweenness 17
More slides like this


Slide #18.

Community Detection Algorithms Group Users based on Group attributes Group Users based on Member attributes 18
More slides like this


Slide #19.

Member-Based Community Detection 19
More slides like this


Slide #20.

Member-Based Community Detection •• Look at node characteristics; and   • Identify nodes with similar characteristics and consider them a community Node Characteristics A. Degree – Complete Mutuality – Example: cliques B. Reachability – Nodes that are close (small shortest paths) are in one community – Example: -cliques, -clubs, and -clans C. Similarity – Similar nodes are in the same community 20
More slides like this


Slide #21.

A. Complete Mutuality Most common subgraph searched for: • Clique: a maximum complete subgraph in which all nodes inside the subgraph adjacent to each other Find communities by searching for 1. The maximum clique: the one with the largest number of vertices, or 2. All maximal cliques: cliques that are not subgraphs of a larger clique; i.e., cannot be expanded further To overcome this, we can I. Brute Force II. Relax cliques III. Use cliques as the core for larger communities Both problems are NPhard 21
More slides like this


Slide #22.

I. Brute-Force Method •Can   find all the maximal cliques in the graph For each vertex , we find the maximal clique that contains node  Impractical for large networks: • For a complete graph of only 100 nodes, the algorithm will generate at least different cliques starting from any node in the graph 22
More slides like this


Slide #23.

Enhancing the Brute-Force Performance Pruning can help: •   • When searching for cliques of size or larger • If the clique is found, each node should have a degree equal to or more than • We can first prune all nodes (and edges connected to them) with degrees less than – More nodes will have degrees less than – Prune them recursively • For large , many nodes are pruned as social media networks follow a power-law degree distribution 23
More slides like this


Slide #24.

Maximum Clique: Pruning… •Example.   to find a clique , remove all nodes with degree – Remove nodes 2 and 9 – Remove nodes 1 and 3 – Remove node 4 Even with pruning, cliques are less desirable – – – – Cliques are rare A clique of 1000 nodes, has 999x1000/2 edges A single edge removal destroys the clique That is less than 0.0002% of the edges! 24
More slides like this


Slide #25.

II. Relaxing Cliques: k-core, k-plex • Each node should have a certain number of connections to nodes within the group – k-core: a substracture that each node connects to at least k members within the group – k-plex: for a group with ns nodes, each node should be adjacent no fewer than ns-k in the group • The definitions are complementary – A k-core is a (ns-k)-plex – However, the set of all k-cores for a given k is not same as the set of all (ns-k)-plexes as group size ns can vary for each k-core 25
More slides like this


Slide #26.

II. Relaxing Cliques: k-plex •• -plex. a set of vertices in which we have   • is the degree of in the induced subgraph – Number of nodes from that are connected to • Clique of size is a -plex • Finding the maximum -plex: NP-hard – In practice, relatively easier due to smaller search space  Maximal -plexes 26
More slides like this


Slide #27.

III. Using Cliques as a seed of a Community •Clique Percolation Method (CPM)   – Uses cliques as seeds to find larger communities – CPM finds overlapping communities • Input – A parameter , and a network • Procedure – Find out all cliques of size in the given network – Construct a clique graph • Two cliques are adjacent if they share nodes – Each connected components in the clique graph form a community 27
More slides like this


Slide #28.

Clique Percolation Method: Example   Cliques of size 3: ,, ,,,, ,   Communities: , , 28
More slides like this


Slide #29.

B. Node Reachability The two extremes Nodes are assumed to be in the same community 1. If there is a path between them (regardless of the distance) or 2. They are so close as to be immediate neighbors How? Find using BFS/DFS Challenge: most nodes are in one community (giant component) How? Finding Cliques Challenge: Cliques are challenging to find and are rarely observed Solution: find communities that are in between cliques and connected components in terms of connectivity and have small shortest paths between their nodes 29
More slides like this


Slide #30.

Special Subgraphs 1. •  -Clique: a maximal subgraph in which the largest geodesic distance between any nodes is less than or equal to 2. -Club: follows the same definition as a -clique – Additional Constraint: nodes on the shortest paths should be part of the subgraph 3. -Clan: a -clique where for all shortest paths within the subgraph the distance is equal or less than – All -clans are -cliques, but not vice versa 30
More slides like this


Slide #31.

Reachability: k-clique, k-club • Any node in a group should be reachable in k hops • k-clique: a maximal subgraph in which the largest geodesic distance between any nodes <= k • A k-clique can have diameter larger than k within the subgraph – e.g., 2-clique {12, 4, 10, 1, 6} – Within the subgraph d(1, 6) = 3 • k-club: a substructure of diameter <= k – e.g., {1,2,5,6,8,9}, {12, 4, 10, 1} are 2-clubs 31
More slides like this


Slide #32.

C. Node Similarity •• Similar (or most similar) nodes are assumed to be in the same   community. – A classical clustering algorithm (e.g., -means) is applied to node similarities to find communities. • Node similarity can be defined – Using the similarity of node neighborhoods (Structural Equivalence) – Ch. 3 – Similarity of social circles (Regular Equivalence) – Ch. 3 Structural equivalence: two nodes are structurally equivalent iff. they are connecting to the same set of actors Nodes 1 and 3 are structurally equivalent, So are nodes 5 and 7. 32
More slides like this


Slide #33.

Node Similarity (Structural Equivalence) Jaccard Similarity Cosine similarity 33
More slides like this


Slide #34.

Vector Similarity 1 a vector structurally equivalent 2 3 4 1 5 5 6 7 8 9 10 1 1 1 2 1 3 1 8 1 1 1 9 1 1 1 Cosine Similarity: 1 1 sim (5,8)   2 3 6 Jaccard Similarity: J (5,8)  |{6}| |{1, 2 , 6 ,13}| 1 / 4 34 34
More slides like this


Slide #35.

Within-Outside Ties: LS sets • LS sets: Any of its proper subsets has more ties to other nodes in the group than outside the group – Too strict, not reasonable for network analysis • A relaxed definition is k-component – Require the computation of edge-connectivity between any pair of nodes via minimum-cut, maximum-flow algorithm – 1-component is a connected component 35
More slides like this


Slide #36.

Recap of Node-Centric Communities • Each node has to satisfy certain properties – – – – Nodal degrees (clique, k-plex, k-core) Reachability (k-clique, k-clan, k-club) Node similarity (structural, cosine, jaccard, …) Within-Outside ties • Limitations: – Too strict, but can be used as the core of a community – Not scalable, commonly used in network analysis with small-size network – Sometimes not consistent with property of large-scale networks • e.g., nodal degrees for scale-free networks 36
More slides like this


Slide #37.

Group-Based Community Detection 37
More slides like this


Slide #38.

Group-Based Community Detection • In   group-based community detection, we are interested in communities that have certain group properties Properties: I. Balanced: Spectral clustering II. Robust: -connected graphs III. Modular: Modularity Maximization IV. Dense: Quasi-cliques V. Hierarchical: Hierarchical clustering 38
More slides like this


Slide #39.

I. Balanced Communities • Community detection can be thought of graph clustering • Graph clustering: we cut the graph into several partitions and assume these partitions represent communities • Cut: partitioning (cut) of the graph into two (or more) sets (cutsets) – The size of the cut is the number of edges that are being cut • Minimum cut (min-cut) problem: find a graph partition such that the number of edges between the two sets is minimized Min-cut Min-cuts can be computed efficiently using the max-flow mincut theorem Min-cut often returns an imbalanced partition, with one set being a singleton 39
More slides like this


Slide #40.

Ratio Cut and Normalized Cut • To   mitigate the min-cut problem we can change the objective function to consider community size • is the complement cut set • is the size of the cut 40
More slides like this


Slide #41.

Ratio Cut & Normalized Cut: Example B For Cut A A For Cut B Both ratio cut and normalized cut prefer a balanced partition 41
More slides like this


Slide #42.

Spectral Clustering Both ratio/normalized cut can be reformulated as   , , is a diagonal degree matrix • Spectral relaxation:   Optimal Solution is the top eigenvectors with the smallest eigenvalues 44
More slides like this


Slide #43.

Spectral Clustering: Example   Two communities:   2 Eigenvectors i.e., we want 2 communities -means,   46
More slides like this


Slide #44.

MDS-example 1, 2, 3, 4, 10, 12 5, 6, 7, 8, 9, 11, 13 k-means S Geodesic Distance Matrix 1 2 3 4 5 6 7 8 9 10 11 12 13 1 0 1 1 1 2 2 3 1 1 2 4 2 2 2 1 0 2 2 1 2 3 2 2 3 4 3 3 3 1 2 0 2 3 3 4 2 2 3 4 1 2 2 0 3 2 3 2 2 1 5 4 3 1 3 3 5 2 1 3 3 0 1 2 2 2 2 3 3 3 6 2 2 3 2 1 0 1 1 1 1 2 2 2 7 3 3 4 3 2 1 0 2 2 2 1 3 3 8 1 2 2 2 2 1 2 0 2 2 3 3 1 9 1 2 2 2 2 1 2 2 0 2 3 3 1 10 2 3 3 1 2 1 2 2 2 0 3 1 3 11 4 4 5 4 3 2 1 3 3 3 0 4 4 12 2 3 3 1 3 2 3 3 3 1 4 0 4 13 2 3 3 3 3 2 3 1 1 3 4 4 0 MDS -1.22 -0.88 -2.12 -1.01 0.43 0.78 1.81 -0.09 -0.09 0.30 2.85 -0.47 -0.29 -0.12 -0.39 -0.29 1.07 -0.28 0.04 0.02 -0.77 -0.77 1.18 0.00 2.13 -1.81 48
More slides like this


Slide #45.

Block-Model Approximation After Reordering Network Interaction Matrix Block Structure Objective: Minimize the difference between an interaction S is a matrix and a block structure community indicator matrix Challenge: S is discrete, difficult to solve Relaxation: Allow S to be continuous satisfying Solution: the top eigenvectors of A Post-Processing: Apply k-means to S to find the partition 49
More slides like this


Slide #46.

II. Robust Communities • The goal is find subgraphs robust enough such   that removing some edges or vertices does not disconnect the subgraph • -vertex connected (-connected) graph: – is the minimum number of nodes that must be removed to disconnect the graph • -edge connected: at least edges must be removed to disconnect the graph • Examples: – Complete graph of size : unique -connected graph – A cycle: -connected graph 50
More slides like this


Slide #47.

III. Modular Communities a graph , where the degrees are known •Consider   beforehand however edges are not – Consider two vertices and with degrees and . What is an expected number of edges between and ? • For any edge going out of randomly the probability of this edge getting connected to vertex is 51
More slides like this


Slide #48.

Modularity and Modularity Maximization • Given a degree distribution, we know the expected number of edges between any pairs of vertices • We assume that real-world networks should be far from random. – Therefore, the more distant they are from this randomly generated network, the more structural they are. • Modularity defines this distance and modularity maximization tries to maximize this distance 52
More slides like this


Slide #49.

Modularity Maximization: Example Two Communities: {1, 2, 3, 4} and {5, 6, 7, 8, 9}  -means 2 eigenvectors Modularity Matrix 55
More slides like this


Slide #50.

IV. Dense Communities: -dense • The density of a graph defines how close a   graph is to a clique • A subgraph is a -dense (or quasi-clique) if • A -dense graph is a clique • We can find quasi-cliques using the brute force algorithm discussed previously, but there are more efficient methods. 56
More slides like this


Slide #51.

Finding Maximal -dense Quasi-Cliques We can use a two-step procedure consisting of “local search” and “heuristic pruning” Local search • Sample a subgraph, and find a maximal -dense quasi-clique • A greedy approach is to expand a quasi-clique by all of its highdegree neighbors until the density drops below  Heuristic pruning • For a -dense quasi-clique of size k, we recursively remove nodes with degree less than  k and incident edges • We can start from low-degree nodes and recursively remove all nodes with degree less that  k 57
More slides like this


Slide #52.

V. Hierarchical Communities • Previous methods consider communities at a   single level – Communities may have hierarchies. • Each community can have sub/super communities. – Hierarchical clustering deals with this scenario and generates community hierarchies. • Initially members are considered as either 1 or communities in hierarchical clustering. These communities are gradually – merged (agglomerative hierarchical clustering) or – split (divisive hierarchical clustering) 58
More slides like this


Slide #53.

Hierarchical Community Detection • Goal: build a hierarchical structure of communities based on network topology • Allow the analysis of a network at different resolutions • Representative approaches: – Divisive Hierarchical Clustering – Agglomerative Hierarchical clustering 59
More slides like this


Slide #54.

Divisive Hierarchical Clustering • Divisive clustering – Partition nodes into several sets – Each set is further divided into smaller ones – Network-centric partition can be applied for the partition • One particular example: Girvan-Newman Algorithm: recursively remove the “weakest” links within a “community” – Find the edge with the weakest link – Remove the edge and update the corresponding strength of each edge • Recursively apply the above two steps until a network is discomposed into a desired number of connected components. • Each component forms a community 60
More slides like this


Slide #55.

Edge Betweenness • To determine weakest links, the algorithm uses edge betweenness. Edge betweenness is the number of shortest paths that pass along with the edge • Edge betweenness measures the “bridgeness” of an edge between two communities • The edge with high betweenness tends to be the bridge between two communities. 61
More slides like this


Slide #56.

Edge Betweenness: Example  The edge betweenness of is , as all the shortest paths from to have to either pass or, and is the shortest path between and 62
More slides like this


Slide #57.

The Girvan-Newman Algorithm 1.Calculate edge betweenness for all edges in the graph 2.Remove the edge with the highest betweenness 3.Recalculate betweenness for all edges affected by the edge removal 4.Repeat until all edges are removed 63
More slides like this


Slide #58.

Edge Betweenness Divisive Clustering: Example Initial betweenness value the first edge that needs to be removed is e(4, 5) ( or e(4, 6) ) By removing e(4, 5), we compute the edge betweenness once again; this time, e(4, 6) has the highest betweenness value: 20  This is because all shortest paths between nodes {1,2,3,4} to nodes {5,6,7,8,9} must pass e(4, 6); therefore, it has betweenness 45 = 64
More slides like this


Slide #59.

Divisive clustering on Edge-Betweenness • Progressively remove edges with the 3 highest betweenness 3 3 5 – Remove e(2,4), e(3, 5) – Remove e(4,6), e(5,6) – Remove e(1,2), e(2,3), e(3,1) 5 4 4 root V1,v2,v3 v 1 v 2 V4, v5, v6 v 3 v 4 v 5 v 6 65
More slides like this


Slide #60.

Agglomerative Hierarchical Clustering • Initialize each node as a community • Choose two communities satisfying certain criteria and merge them into larger ones – Maximum Modularity Increase – Maximum Node Similarity roo t V4, v5, v6 V1, v2, v3 v 3 V1,v2 v 1 v 2 v 4 (Based on Jaccard Similarity) V1,v2 v 5 v 6 66
More slides like this


Slide #61.

Recap of Hierarchical Clustering • Most hierarchical clustering algorithm output a binary tree – Each node has two children nodes – Might be highly imbalanced • Agglomerative clustering can be very sensitive to the nodes processing order and merging criteria adopted. • Divisive clustering is more stable, but generally more computationally expensive 67
More slides like this


Slide #62.

Hierarchical clustering: Zachary Karate Club source: Girvan and Newman, PNAS June 11, 2002 99(12):7821-7826 68
More slides like this


Slide #63.

Is hierarchical clustering really this bad? Zachary karate club data hierarchical clustering tree using edge-independent path counts 69
More slides like this


Slide #64.

betweenness clustering algorithm & the karate club data set 70
More slides like this


Slide #65.

Community Detection Algorithms 71
More slides like this


Slide #66.

How Communities Evolve? 72
More slides like this


Slide #67.

Community Evolution • Communities also expand, shrink, or dissolve in dynamic networks 73
More slides like this


Slide #68.

Community Detection in Evolving Networks 74
More slides like this


Slide #69.

Extending Previous Methods •1.  Take t snapshots of the network, , where is a snapshot at time 2. Perform a static community detection algorithm (all methods discussed before) on all snapshots independently 3. Assign community members based on communities found in all time stamps. – e.g., Assign nodes to communities based on voting (assign nodes to communities they belong to the most over time) Unstable in highly dynamic networks as community memberships are always changing 75
More slides like this


Slide #70.

Evolutionary Clustering • Assume that communities don’t change most of the time • Minimize an objective function that considers – Snapshot Cost. Communities at different times (SC) – Temporal Cost. How communities evolve (TC) • Objective function is defined as 76
More slides like this


Slide #71.

Evolutionary Clustering •• If  we use spectral clustering for each snapshot • Then, , the objective function at time is • is the community membership matrix at • To define TC we can use • Challenges with this definition – Assumes that we have the same number of communities at time and – is non-unique (any orthogonal transformation is still a solution) 77
More slides like this


Slide #72.

Evolutionary Clustering • Assuming Normalized Laplacian is used   Similar to Spectral Clustering can be obtained by taking the top eigenvectors of 79
More slides like this


Slide #73.

Community Evaluation 80
More slides like this


Slide #74.

Evaluating the Communities   We are given objects of two different kinds (, ) • The perfect communities would be that objects of the same type are in the same community. • Evaluation with ground truth • Evaluation without ground truth 81
More slides like this


Slide #75.

Evaluation with Ground Truth • When ground truth is available – We have partial knowledge of what communities should look like – We are given the correct community (clustering) assignments • Measures – Precision and Recall, or F-Measure – Purity – Normalized Mutual Information (NMI) 82
More slides like this


Slide #76.

Precision and Recall     True Positive (TP) : • When similar members are assigned to the same communities • A correct decision. False Negative (FN) : • When similar members are assigned to different communities • An incorrect decision True Negative (TN) : • When dissimilar members are assigned to different communities • A correct decision False Positive (FP) : • When dissimilar members are assigned to the same communities • An incorrect decision 83 83
More slides like this


Slide #77.

Precision and Recall: Example 84
More slides like this


Slide #78.

F-Measure •Either   or measures one aspect of the performance, – To integrate them into one measure, we can use the harmonic mean of precision of recall For the example earlier, 85
More slides like this


Slide #79.

Purity We can assume the majority of a community represents the community – We use the label of the majority against the label of each member to evaluate the communities Purity. The fraction of instances that have labels equal to the community’s majority label  • • • • : the number of communities : total number of nodes, :• thePoints set of instances with label in all communities being singleton communities : the set of members in community Purity can be easily tampered by • or by Very large communities (of size 1); 86
More slides like this


Slide #80.

Mutual Information • Mutual information (MI). The amount of information that two random variables share – By knowing one of the variables, it measures the amount of uncertainty reduced regarding the others  • and are labels and found communities; • and are the number of data points in community and with label , respectively; • is the number of nodes in community and with label ; and is the number of nodes 87
More slides like this


Slide #81.

Normalized Mutual Information: Example Found communities (H) – [1,1,1,1,1,1, 2,2,2,2,2,2,2,2] Actual Labels (L) – [2,1,1,1,1,1, 2,2,2,2,2,2,1,1]   nh nl nh,l h=1 6 7 h=1 5 1 h=2 8 7 h=2 2 6 88
More slides like this


Slide #82.

Normalized Mutual Information  • where and are known (with labels) and found communities, respectively • and are the number of members in the community and , respectively, • is the number of members in community and labeled , • is the size of the dataset • NMI values close to – one indicate high similarity between communities found and labels – zero indicate high dissimilarity between them 91
More slides like this


Slide #83.

Evaluation without Ground Truth • Evaluation with Semantics – A simple way of analyzing detected communities is to analyze other attributes (posts, profile information, content generated, etc.) of community members to see if there is a coherency among community members – The coherency is often checked via human subjects • Or through labor markets: Amazon Mechanical Turk – To help analyze these communities, one can use word frequencies. By generating a list of frequent keywords for each community, human subjects determine whether these keywords represent a coherent topic • Evaluation Using Clustering Quality Measures – Use clustering quality measures (Sum of Squared Error) – Use more than two community detection algorithms and compare the results and pick the algorithm with better quality measure 92
More slides like this