Slide #1.

CS6604 Digital Libraries: Social Communities & Knowledge Management
Social Interactome: Final Term Project Presentation
Presenter: Prashant Chandrasekar {peecee}@vt.edu
Instructor: Dr. Edward A. Fox
Virginia Polytechnic Institute and State University, Blacksburg, VA 24061
May 2, 2017


Slide #2.

Acknowledgements
• Dr. Edward A. Fox
• Global events team
• Social Interactome team
• The Social Interactome of Recovery: Social Media as Therapy Development (NIH Grant 1R01DA039456-01)
• Xuan Zhang and Yufeng Ma
• Mostafa Mohammed


Slide #3.

Outline
• Background
  • Social network community; Social Interactome
• Data
• Challenges
• Goal
• Approaches
  • Network Classification
  • Learning via Markov Logic Networks
• Future Work


Slide #4.

Background: Social Interactome (SI)
• Social Interactome
  • NIH-funded project conducted by a team of researchers
  • Studies the community of people who are recovering from addiction
  • Studies their interactions in an online social network built to provide support for and management of their recovery
• The project is broken down into a set of "test vs. control" experiments with these variables defined:
  • Duration of study
  • Number of participants required
  • Avenue of recruitment
  • Null and alternative hypotheses


Slide #5.

Background: SI Setup
• The project is broken down into a set of clinical trials. For each clinical trial, the team:
  • Decides on a set of null and alternative hypotheses and the duration of the trial
  • Recruits participants for the trial
  • Organizes the participants into one of two (or more) 128-node social networks
• Participants interact with the website and their assigned friends
• Two 16-week clinical trials have been completed, along with a set of smaller-scale trials executed via Amazon Mechanical Turk


Slide #6.

Background: SI Participant Info
[Diagram: data collected about each participant (19,070 questions; ~10 psychology-based measures; 16 surveys)]
• Demographics; family info; family's history with addiction; past social network experience
• Primary and secondary addiction; relapse; DSM-V; Addiction Severity Index
• Recovery Participation Scale; Recovery Capital Scale; Religious Commitment Inventory; Minute Discounting; Social Connected Scale; Adult Social Network Index; Big 5 Personalities


Slide #7.

Background: SI Website Use Data
[Diagram: participant activity captured on the SI website]
• TES modules / TES scores
• Pictures/links uploads
• News stories
• Posts/likes/shares/comments
• Success stories
• Responses from admin posts
• Unpaid assessments
• Private messages
• Video meetings


Slide #8.

Overall Challenges
• How do you organize the data?
• How do you validate/clean the data?
• What do you analyze first? And in what order do you go about it?
• How do you make sense of the data?
  • How to interpret psychology-related measures?
• Big goal: How to streamline the entire process, from data collection to analyses to presentation, such that it is reproducible and extensible?


Slide #9.

Goal
• Goal: Investigate/explore ways to model the data and recommend an approach.
• Approaches to understand the data:
  • Frequency distributions / histograms
  • Time series
  • Checking for correlations
  • Comparing means and standard deviations
    • t-tests (a minimal sketch follows this list)
  • Statistical modeling
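As a concrete example of the comparing-means step, a minimal Python sketch of an independent-samples t-test on engagement between the test and control arms. The file name and the columns "arm" and "weekly_logins" are hypothetical placeholders, not the actual SI schema:

```python
# Minimal sketch: compare mean weekly engagement between two study arms.
# "participants.csv", "arm", and "weekly_logins" are hypothetical names.
import pandas as pd
from scipy import stats

df = pd.read_csv("participants.csv")
test = df.loc[df["arm"] == "test", "weekly_logins"]
control = df.loc[df["arm"] == "control", "weekly_logins"]

# Welch's t-test (does not assume equal variances across arms)
t_stat, p_value = stats.ttest_ind(test, control, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```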


Slide #10.

Approaches
• Statistical modeling
  • What do we model?
    • Substance relapse
    • Engagement / change in engagement
    • Change in psychology-related measures
    • Change in behavior
    • Homophily
    • Friendship or trust
  • Factors
    • Classification: What would be the predictor variables? Response variables?
    • PGMs: Directed or undirected? What would be the factors?


Slide #11.

Approaches
• Classification
  • Network classification using NetKit-SRL (Statistical Relational Learning) [1] [focus of this presentation]
  • Learning using Markov Logic Networks [2]
[1] Macskassy, S. A., & Provost, F. (2007). Classification in Networked Data: A Toolkit and a Univariate Case Study. Journal of Machine Learning Research, 8(May), 935-983.
[2] Domingos, P., & Richardson, M. (2007). Markov Logic: A Unifying Framework for Statistical Relational Learning. In L. Getoor and B. Taskar (eds.), Introduction to Statistical Relational Learning (pp. 339-371). Cambridge, MA: MIT Press.


Slide #12.

Network Classification
• Idea: take advantage of relational information, in addition to attribute information, for entity classification. Example: networked data.
• Focuses on within-network classification
  • Networks of web pages, research papers, social networks, etc.
• NetKit-SRL: toolkit developed to employ statistical relational learning and inference


Slide #13.

Network Classification
• NetKit-SRL
  • Network learning toolkit for classification and inference
  • Developed by Dr. Macskassy & Dr. Provost
  • Has 3 components:
    • Non-relational model
    • Relational model
    • Collective inference
• Specific outcomes:
  • Maximize P(x | G^K), where x is the set of labels to be estimated and G^K is everything known in the network (a formal statement follows)
  • Estimating the joint distribution over the labels
• Input:
  • Graph with edges describing relationships, plus attributes of nodes
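The objective can be written out explicitly. A hedged rendering in the spirit of the paper's notation, where x^U denotes the labels of the unknown nodes:

```latex
% Within-network classification objective: choose the label assignment x^U
% for the unknown nodes that is most probable given the known part G^K.
\hat{x}^{U} = \arg\max_{x^{U}} \, P\!\left(x^{U} \mid G^{K}\right)
```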


Slide #14.

Network Classification
• NetKit-SRL components (sketched in code after this table)

Component: Local (non-relational) classifier
  Purpose: Returns a model which uses only the attributes of a node to estimate its class label.
  Approaches: 1) Uniform prior; 2) Class prior

Component: Relational classifier
  Purpose: Returns a model which uses not only the local attributes of a node but also the attributes of related nodes, including their (estimated) class membership.
  Approaches: 1) Weighted-vote relational neighbor; 2) Class-distributional relational neighbor; 3) Network-only multinomial Bayes classifier with Markov Random Field estimation

Component: Collective inference
  Purpose: Applies collective inference in order to (approximately) maximize the joint probability of the labels of all nodes in the graph whose labels were initially unknown.
  Approaches: 1) Relaxation labeling; 2) Iterative classification; 3) Gibbs sampling
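To show how the three components fit together, a minimal Python sketch of the overall loop. This mirrors the table above but is not NetKit-SRL's actual (Java) API; all names and data shapes here are illustrative:

```python
# Illustrative composition of the three NetKit-style components.
# `graph` maps each node to its neighbors; `known` maps labeled nodes to
# their classes; the three classifiers are passed in as callables.
def within_network_classification(graph, known, local_clf,
                                  relational_clf, collective_inference):
    # 1) The local (non-relational) classifier seeds a prior class
    #    distribution for every node whose label is unknown.
    estimates = {v: local_clf(v) for v in graph if v not in known}
    # 2) Collective inference repeatedly re-applies the relational
    #    classifier, which reads neighbors' (estimated) labels, until the
    #    estimates stabilize.
    return collective_inference(graph, known, estimates, relational_clf)
```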


Slide #15.

Network Classification
• Possible instantiations:

Chakrabarti et al. (1998) [1]
  Non-relational classifier: Naïve Bayes classifier
  Relational classifier: Naïve Bayes Markov Random Field
  Collective inference: Relaxation labeling

Lu & Getoor (2003) [2]
  Non-relational classifier: Logistic regression
  Relational classifier: Logistic regression
  Collective inference: Iterative classification

Macskassy & Provost (2003) [3]
  Non-relational classifier: Class priors
  Relational classifier: Majority vote of neighboring classes
  Collective inference: Relaxation labeling

[1] Chakrabarti, S., Dom, B., & Indyk, P. (1998). Enhanced Hypertext Categorization Using Hyperlinks. Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 307-318).
[2] Lu, Q., & Getoor, L. (2003). Link-Based Classification. International Conference on Machine Learning, ICML-2003 (pp. 496-503).
[3] Macskassy, S. A., & Provost, F. (2003). A Simple Relational Classifier. Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM-2003) at KDD-2003 (pp. 64-76).


Slide #16.

Network Classification
• Weighted-vote relational neighbor classifier (wv-RN)
  • Authors: Macskassy, S. A., & Provost, F. (2003)
  • Estimates class membership by assuming the existence of homophily
  • A node's class-membership estimate is the weighted mean of the class-membership probabilities of the entities in D_e, where D_e is the set of neighbors of entity/node e (a minimal sketch follows)
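A minimal Python sketch of wv-RN under stated assumptions: `neighbors[v]` maps each neighbor u of node v to the edge weight w(v, u), and `probs[u]` is u's current class-membership distribution. The data shapes and example classes are illustrative:

```python
# Weighted-vote relational neighbor (wv-RN): a node's class distribution is
# the edge-weighted mean of its neighbors' class distributions.
def wvrn(v, neighbors, probs):
    total = sum(neighbors[v].values())        # normalizer over edge weights
    estimate = {}
    for u, weight in neighbors[v].items():
        # probs[u] is e.g. {"alcohol": 0.7, "opioids": 0.3}
        for cls, p in probs[u].items():
            estimate[cls] = estimate.get(cls, 0.0) + weight * p / total
    return estimate
```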


Slide #17.

Network Classification
• Collective inference using relaxation labeling
  • Collective inference (approximately) maximizes the joint probability of the labels of all nodes whose labels were initially unknown.
  • Relaxation labeling is similar to, but different from, Gibbs sampling in that it:
    • Keeps track of class probability estimates for X^U (the unknown-label nodes)
    • Instead of updating the graph one node at a time, updates the class probabilities of all vertices at iteration t+1 based on the estimates from iteration t (see the sketch below)
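A minimal sketch of relaxation labeling built on the wv-RN sketch above; the function names and shapes are illustrative, not the toolkit's API:

```python
# Relaxation labeling: every unknown node's distribution at step t+1 is
# recomputed from the *frozen* step-t estimates (contrast with Gibbs
# sampling, which resamples one node at a time).
def relaxation_labeling(unknown, neighbors, probs, classify, iters=100):
    for _ in range(iters):
        # classify() (e.g., the wvrn sketch above) reads only the previous
        # iteration's estimates, so all nodes update simultaneously.
        new_probs = {v: classify(v, neighbors, probs) for v in unknown}
        probs = {**probs, **new_probs}        # known nodes keep their labels
    return probs
```

The full algorithm in the toolkit also smooths successive estimates with a decay factor; this sketch omits that detail.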


Slide #18.

Network Classification: Experiment
• Experiment
  • Rationale: Participants who are "homophilous" (who have a shared background in common) have common interests.
  • Hypothesis: Given a set of common interests between pairs of participants, one can predict the homophily measures with good accuracy.
• Input graph
  • Nodes: participants
  • Attributes: addiction, education, income
  • Edges: edge weight is the number of news stories + success stories + educational modules that both nodes (connected via the edge) have viewed in common (see the sketch below)
• Predicted attribute: addiction
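A hypothetical sketch of the edge construction just described, assuming `views` maps each participant id to the set of content ids (news stories, success stories, educational modules) that participant viewed:

```python
# Build weighted edges: the weight between two participants is the number
# of content items both have viewed. `views` is a hypothetical mapping
# {participant_id: set_of_content_ids}.
from itertools import combinations

def build_edges(views):
    edges = {}
    for a, b in combinations(views, 2):
        common = len(views[a] & views[b])     # items viewed in common
        if common > 0:
            edges[(a, b)] = common            # edge weight
    return edges
```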


Slide #19.

Network Classification: Experiment Config
• Possible experiment configurations
  • Non-relational classifier: none
  • Relational classifier (options):
    • Weighted-vote relational neighbor
    • Class-distributional relational neighbor
  • Collective inference (options):
    • Relaxation labeling
    • Gibbs sampling
    • Iterative classification
• Data: nodes and edges extracted from experiment 1 replicate 2 (E1R2) participant interactions
• Goal: predict 1) primary addiction (given graph); 2) education (given graph); 3) income bracket (given graph)


Slide #20.

Network Classification: Experiment
• E1R2 data statistics
  • # of nodes: 256; # of edges: 436
[Bar chart: primary substance breakdown among the 256 participants. The largest category has 139 participants; remaining counts are 41, 30, 18, 17, 7, and several categories with 1. Categories include stimulants, alcohol, opioids, dissociatives, cocaine, prescription tranquilizers, depressants, cannabis, nicotine, and other.]


Slide #21.

Network Classification: Experiment
• E1R2 data statistics
  • Edge weight breakdown
[Bar chart: edge weight distribution. Heavily skewed: 317 edges have weight 1, 66 have weight 2, and 24 have weight 3; the remaining weights range up to 12, each with small counts.]


Slide #22.

Network Classification: Experiment Results
• Network classification framework results for the various experiment configurations (metric: accuracy)
• Goal: predict "primary addiction" of participants

Relational classifier / collective inference method:
  Weighted-vote relational neighbor (wvRN): relaxation labeling 0.36601; Gibbs sampling 0.37908; iterative classification 0.39216
  Class-distributional relational neighbor: relaxation labeling 0.15686; Gibbs sampling 0.22222; iterative classification 0.18954


Slide #23.

Network Classification: Experiment Results
• Predicted response/class: primary addiction
• Configuration: wvRN with relaxation labeling
• Confusion matrix [shown as a figure]


Slide #24.

Network Classification: Experiment Results
• Predicted response/class: education
• Configuration: wvRN with relaxation labeling
• Confusion matrix [shown as a figure]


Slide #25.

Network Classification: Experiment Results
• Predicted response/class: income
• Configuration: wvRN with relaxation labeling
• Confusion matrix [shown as a figure]


Slide #26.

Network Classification: Experiment Conclusion
• Conclusion
  • The highest accuracy across all experiment configurations for predicting primary addiction (slide 22) is 0.392.
  • The confusion matrices for predicting primary addiction, education, and income give more detail on the accuracy for each class.
  • The accuracy is low.
    • This is probably because our experiment configuration does NOT include a non-relational component.
    • Furthermore, our graph edges and node attributes carry only 1-3 fields each. The graph needs to be denser, with much more information, to be useful for network-based inference.


Slide #27.

Network Classification: Next Steps
• Possible extensions of the work:
  • Build the graph with different representations of edges
  • Construct more node attributes for the non-relational (local) classifier step
  • Try experiments with priors learnt from various traditional classification models
• Problem/challenge
  • Extension or further work is open-ended.
  • Part of doctoral work: build a logical flowchart of inquiries/hypotheses.
  • The logical flowchart of inquiries can be consulted and called upon based on the user's line of inquiry.


Slide #28.

Learning via Markov Logic Networks
• A Markov Logic Network (MLN) is a set of pairs (F, w), where
  • F is a formula in first-order logic
  • w is a real number (example formulas below)
• Together with a set of constants, it defines a Markov network with
  • One node for each grounding of each predicate in the MLN
  • One feature for each grounding of each formula F in the MLN, with the corresponding weight w
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt
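The next slides walk through the standard smoking example from the cited deck. For reference, a hedged rendering of the two weighted formulas that example uses ("smoking causes cancer"; "friends have similar smoking habits"); the weights shown are the usual illustrative values, not fitted ones:

```latex
% Example MLN: two first-order formulas, each paired with a real weight.
1.5 \quad \forall x \;\, \mathit{Smokes}(x) \Rightarrow \mathit{Cancer}(x)
1.1 \quad \forall x, y \;\, \mathit{Friends}(x, y) \Rightarrow
        \bigl(\mathit{Smokes}(x) \Leftrightarrow \mathit{Smokes}(y)\bigr)
```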


Slide #29.

Learning via Markov Logic Networks
• Two constants: Anna (A) and Bob (B)
[Diagram: ground atoms Smokes(A), Smokes(B), Cancer(A), Cancer(B)]
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt


Slide #30.

Learning via Markov Logic Networks
[Diagram: the Friends atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B) are added to the ground network alongside the Smokes and Cancer atoms]
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt


Slide #31.

Learning via Markov Logic Networks
[Diagram: the completed ground Markov network, with edges between atoms that appear together in a grounding of some formula]
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt


Slide #32.

Learning via Markov Logic Networks
[Diagram: the same ground Markov network as on the previous slide]
• Probability of a world x:
$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i \, n_i(x) \right)$
where $w_i$ is the weight of formula $i$ and $n_i(x)$ is the number of true groundings of formula $i$ in $x$ (a worked sketch follows).
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt
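A worked Python sketch of this formula for the two-constant Smokes/Cancer example: with the Friends relation held fixed, the domain has only 2^4 possible worlds over the Smokes and Cancer atoms, so Z can be computed by brute-force enumeration. The weights are the illustrative values from the example above, not fitted ones:

```python
# Compute P(x) = (1/Z) * exp(sum_i w_i * n_i(x)) by enumerating all worlds
# over the Smokes/Cancer atoms for constants A and B. Friends is held fixed.
from itertools import product
from math import exp

PEOPLE = ["A", "B"]
FRIENDS = {("A", "B"), ("B", "A")}    # Anna and Bob are friends
W_SC, W_FR = 1.5, 1.1                 # illustrative formula weights

def score(world):                     # world: {("Smokes","A"): True, ...}
    # n_1: true groundings of Smokes(x) => Cancer(x)
    n_sc = sum(not world[("Smokes", p)] or world[("Cancer", p)]
               for p in PEOPLE)
    # n_2: true groundings of Friends(x,y) => (Smokes(x) <=> Smokes(y))
    n_fr = sum((x, y) not in FRIENDS
               or world[("Smokes", x)] == world[("Smokes", y)]
               for x in PEOPLE for y in PEOPLE)
    return exp(W_SC * n_sc + W_FR * n_fr)

atoms = [(pred, p) for pred in ("Smokes", "Cancer") for p in PEOPLE]
worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=4)]
Z = sum(score(w) for w in worlds)     # partition function
print(f"P(nobody smokes, nobody has cancer) = {score(worlds[0]) / Z:.4f}")
```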


Slide #33.

Learning via Markov Logic Networks
• Tasks/applications: basics, logistic regression, hypertext classification, information retrieval, entity resolution, hidden Markov models, information extraction, statistical parsing, semantic processing, Bayesian networks, relational models, robot mapping, planning and MDPs, and practical tips
*Slide source: http://www.cs.washington.edu/homes/pedrod/psrai.ppt


Slide #34.

Future Work
• Next steps
  • Extract more attributes for each participant
  • Compile different ways to represent edge weight
  • Build a local classifier and test results for NetKit-SRL
  • Use Alchemy to represent the data using Markov Logic Networks


Slide #35.

Questions?


Slide #36.

Network Classification
• Other works:
  • Inductive logic programming
  • Markov random fields
  • Conditional random fields
  • Probabilistic relational models
  • Relational Bayesian networks
  • Relational dependency networks
  • Relational Markov networks