Slide #1.

Education for Big Data and Big Data for Education: Towards Integration of Big Data and Education
ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign, USA
BDSE2016, May 25, 2016, Guiyang, China


Slide #2.

The Big Data revolution: “DataScope” enhances human perception
Microscope, Telescope, DataScope


Slide #3.

DataScope enables prediction & optimal decision making
[Diagram labels: Real World; Sensor 1 … Sensor k; Non-Text Data; Text Data; Joint Mining of Non-Text and Text; Multiple Predictors (Features); Predictive Model; Predicted Values of Real World Variables; Change the World; Teacher; Student]


Slide #4.

Big Data creates both challenges and opportunities for education
• Challenges for education: Education for Big Data
  – Educate many data scientists & engineers quickly and affordably
• Opportunities for education: Big Data for Education
  – Leverage Big Data technology to scale up and improve education
• Big Data and education are mutually beneficial → Integration!
  – Education supplies the workforce for developing innovative Big Data technology and applications
  – Big Data supplies technology for scaling up and improving the quality of education


Slide #5.

Rest of the talk
1. Education for Big Data
2. Big Data for Education
3. Integration of Big Data and Education


Slide #6.

Part 1: Education for Big Data
“…(in the next few years) we project a need for 1.5 million additional analysts in the United States who can analyze data effectively…”
-- McKinsey Big Data Study, 2012
The need is global …


Slide #7.

Educating the workforce for Big Data
• Question 1: What to teach in Big Data?
  – PhD, MS, BS in Data Science
• Question 2: How to teach Big Data effectively at large scale and low cost?
  – Massive Open Online Courses (MOOCs)


Slide #8.

What to teach? New degrees in Data Science?
Highly interdisciplinary!
[Diagram: stages Acquisition, Aggregation, Analysis, Application, with related topics:]
• Sensor network, Internet of things, Statistical sampling, …
• Databases, Information retrieval, NLP, Computer vision, …
• Data mining, Machine learning, Statistical modeling, Scalable systems, …
• Cloud computing, Artificial intelligence, Operations research, Human-computer interactions, …
• + Health, Medicine, Finance, Smart City, Education, …


Slide #9.

How to teach? Emergence of Massive Open Online Courses (MOOCs)
• Many platforms: Coursera, edX, Udacity, …
• Characteristics
  – Free/affordable education at large scale on all kinds of topics
  – Limited assessment support, but strong online community support
  – Partnership with universities
• Early stage of “education revolution” enabled by IT & Big Data (more later)


Slide #10.

My experience with MOOCs
• Taught 2 MOOCs in 2015 (= CS410 Text Info Systems at UIUC)
  – Text Retrieval and Search Engines
  – Text Mining and Analytics
• Coordinated Data Mining Specialization: 5 courses + Capstone
  – Pattern Discovery
  – Cluster Analysis
  – Text Retrieval
  – Text Mining
  – Visualization
  – Capstone Project


Slide #11.

Text Retrieval & Text Mining MOOCs
• Each lasted 4 weeks
  – Modularized video lectures
  – Weekly quizzes
  – Programming assignment (open challenge with a leaderboard) with auto grading
• Enrollment
  – ~50,000 signed up
  – > 10,000 seriously watched lecture videos
  – 1,000~1,500 completed the course
  – 700~900 did programming assignments


Slide #12.

Students are from all over the world!
64,651 learners, 181 countries


Slide #13.

The majority of learners are 25~44 years old


Slide #14.

US, India, and China have most of the learners
[Chart labels: United States, India, China]


Slide #15.

Most learners have a full-time job and a BS or MS degree


Slide #16.

Challenges in teaching “big data” at large scale
• General challenges in MOOCs
  – Variable student background
  – Variable student needs
  – Reliability of assessment
• Special challenges to “big data”
  – Programming assignments are essential: variable student resources & background
  – Availability of interesting real-world data sets
  – Automated grading of programming assignments


Slide #17.

Self-Sustaining Data Set Annotations & Open Challenge
[Diagram labels: Raw Data Set; Annotation Assignment; Annotations; Test Collection; Open Challenge Competition Assignment; Auto Grader; Leaderboard (#1 Team1 0.81, #2 Team 2 0.75, …)]
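One way to picture the annotation-and-leaderboard loop on this slide is the rough Python sketch below. It is an illustration under assumptions, not the course's actual auto grader: the majority-vote aggregation, the precision-at-k metric, and every name in it (`build_test_collection`, `grade_submission`, `leaderboard`, the toy queries and document IDs) are hypothetical and do not come from the slides.

```python
from collections import Counter


def build_test_collection(annotations):
    """Aggregate student annotations into relevance judgments by majority vote.

    annotations: iterable of (query, doc_id, label) with label 0 or 1.
    Returns {query: set of doc_ids judged relevant}.
    (Majority voting is an assumption; the slides do not say how labels are merged.)
    """
    votes = {}
    for query, doc_id, label in annotations:
        votes.setdefault((query, doc_id), Counter())[label] += 1
    relevant = {}
    for (query, doc_id), counts in votes.items():
        relevant.setdefault(query, set())
        if counts[1] > counts[0]:
            relevant[query].add(doc_id)
    return relevant


def precision_at_k(ranked_docs, relevant_docs, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked_docs[:k] if d in relevant_docs) / k


def grade_submission(run, test_collection, k=2):
    """Auto-grade one submission: mean precision@k over all queries (assumed metric)."""
    if not test_collection:
        return 0.0
    scores = [precision_at_k(run.get(q, []), rel, k)
              for q, rel in test_collection.items()]
    return sum(scores) / len(scores)


def leaderboard(submissions, test_collection, k=2):
    """Rank teams by score, best first."""
    scored = [(team, grade_submission(run, test_collection, k))
              for team, run in submissions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: three students annotate q1/d1; other pairs have fewer labels.
annotations = [
    ("q1", "d1", 1), ("q1", "d1", 1), ("q1", "d1", 0),
    ("q1", "d2", 0), ("q1", "d2", 0),
    ("q1", "d3", 1),
    ("q2", "d4", 1), ("q2", "d5", 0),
]
test_collection = build_test_collection(annotations)

submissions = {
    "Team1": {"q1": ["d1", "d3"], "q2": ["d4", "d5"]},
    "Team 2": {"q1": ["d2", "d1"], "q2": ["d5", "d4"]},
}
board = leaderboard(submissions, test_collection)
for rank, (team, score) in enumerate(board, start=1):
    print(f"#{rank} {team} {score:.2f}")
```

In this sketch the annotations produced by one assignment become the test collection that grades the next one, which is the sense in which the data set is self-sustaining.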


Slide #18.

Example of a new data set (for online course retrieval)
High grades → More reliable annotations


Slide #19.

Search Engine Contest: Leaderboard


Slide #20.

Overall lessons from the MOOCs
• Learners of MOOCs are a different crowd than the on-campus students
  – Practical mindset, self-motivated, but less background and less time
  – A pre-quiz is necessary for such technical courses (sets realistic expectations)
  – Learners form self-supporting online communities
• Short modularized lecture videos are preferred
• Programming assignments are very much appreciated
• Crowdsourcing annotations and open competition worked well → MOOC goes beyond education to support research!
• Limitations of current MOOCs
  – Lack of “individual care” (students don’t all get the needed help)
  – Rely solely on peer grading of sophisticated assignments (unreliable grading & ineffective feedback to students)


Slide #21.

Current Trend: Integration of MOOCs and Traditional Education
[Chart axes: Quality, Scalability; endpoints: Traditional Classrooms, MOOC]
• HIGH cost: Campus Degree + Flipped/Blended classroom
• LOW cost: Online Degree + High Engagement component
• MINIMUM cost: Specialization Certificate
• MINIMUM cost: Course Certificate
• FREE: No Certificate


Slide #22.

A new online MOOC-based program: MCS-DS at UIUC
• MCS-DS = Master of Computer Science in Data Science
• Tuition = $20,000
• Courses = MOOCs + High Engagement Components
• Interdisciplinary
  – Courses mostly offered by Computer Science Department
    • Data Mining Specialization
    • Cloud Computing Specialization
    • Machine Learning
  – Other units include School of Information Science & Statistics Department


Slide #23.

Part 2: Big Data for Education
Towards Intelligent MOOC
[Chart axes: Quality, Scalability; points: Small Classrooms, MOOC, Scalable Intelligent MOOC]
“Big Data Technology”:
• Automate grading with machine learning
• Automate question answering on forums


Slide #24.

Traditional Manual Grading
[Diagram labels: Submitted Assignments; Graded Assignments (Grade: 93, 85, …)]
Proposed Automated Grading
[Diagram labels: Submitted Assignments; Multi-dimensional Grade Predictor; Clustering; Improvement; Grade Verification; Batch grading; Graded Assignments; Detailed Grading Results; Performance & Behavior Analysis]
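The clustering, grade verification, and batch grading boxes of the proposed pipeline can be illustrated with the minimal Python sketch below. It is a deliberately simplified stand-in, not the system described in the talk: it swaps the multi-dimensional grade predictor for plain token-overlap (Jaccard) similarity with greedy clustering, and the function names, the 0.5 threshold, and the sample answers are all hypothetical.

```python
def tokens(text):
    """Lowercased bag of distinct words; a crude stand-in for real features."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_submissions(submissions, threshold=0.5):
    """Greedy clustering: each submission joins the first cluster whose
    representative (its first member) is at least `threshold` similar,
    otherwise it starts a new cluster."""
    clusters = []
    for student, text in submissions.items():
        for cluster in clusters:
            representative = submissions[cluster[0]]
            if jaccard(tokens(text), tokens(representative)) >= threshold:
                cluster.append(student)
                break
        else:
            clusters.append([student])
    return clusters


def batch_grade(clusters, verified_grades):
    """Propagate the grade a human verified for each cluster's
    representative to every submission in that cluster."""
    grades = {}
    for cluster in clusters:
        grade = verified_grades[cluster[0]]
        for student in cluster:
            grades[student] = grade
    return grades


# Hypothetical short answers to a case-study question.
submissions = {
    "s1": "fever suggests infection order blood test",
    "s2": "fever suggests infection order a blood test",
    "s3": "no treatment needed",
}
clusters = cluster_submissions(submissions)

# The grader only verifies one representative per cluster.
verified_grades = {"s1": 90, "s3": 40}
grades = batch_grade(clusters, verified_grades)
print(clusters)
print(grades)
```

The point of the sketch is the workload shape: a human verifies one grade per cluster rather than one per submission, so grading effort grows with the number of distinct answer types instead of the number of students.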


Slide #25.

Preliminary results on grading medical case assignments are promising [Geigle et al. 2016]
Chase Geigle, ChengXiang Zhai, Duncan Ferguson, An Exploration of Automated Grading of Complex Assignments, ACM Learning at Scale 2016.


Slide #26.

Towards Intelligent MOOC: Limitations of Current MOOC
• Instruction materials limited to those pre-defined by an instructor → can’t take advantage of useful materials on the Web
• Limited search capability inside a course → can’t easily find the most relevant video clip or discussion posts about a topic
• No understanding of students → can’t personalize the instruction and learning experience
• Limited support for collaborative learning → can’t leverage massive student behavior data to recommend materials for individual students
• Limited support for interactions with students → can’t engage students in a natural dialogue


Slide #27.

Novel Features of an Intelligent MOOC
• Seamless integration of MOOC and Web search → enable students to learn from the Web
• Concept/Topic search, navigation, and summarization → enable students to quickly find all materials about a concept or topic
• Dynamic and adaptive student modeling → enable deep understanding of student state of knowledge
• Lifetime learning from student behavior data → enable effective support of collaborative learning
• Interactive personalized teaching → enable personalized natural conversations between students and the system


Slide #28.

Current MOOC
[Diagram labels: Traditional MOOC Platform; Student Record; MOOC Course Content; MOOC Activity Log]


Slide #29.

An Intelligent MOOC
[Diagram labels: Student Model; Student Modeler; Open Web; Concept Recommender; Interactive Teaching Interface; Personalized Search agent; Concept/Topic Search agent; Concept Navigator; Topic/Concept Graph generator; MOOC Activity Log; MOOC Course Content]


Slide #30.

Part 3: Integration of Big Data and Education
[Diagram labels: Intelligent MOOC Platform; Improve Scalability & Quality; Educate; ?; Applied to; MOOC Log; Education; Big Data; Research & Develop; Big Data Technology]


Slide #31.

Toward a Cloud-based Big Data Virtual Lab
[Diagram labels: Big Data Education System; App Data 1 … App Data N; Log Data; Big Data Tool 1, Big Data Tool 2, …; Leaderboard (#1 Team1 0.81, #2 Team 2 0.75, …); Leaderboard (#1 Team1 0.5, #2 Team 2 0.3, …)]


Slide #32.

Unification of education, research, and applications!
1. Directly work on industry data sets and problems → Remove gap between education & applications
2. Encourage open exploration (research) → Remove gap between education & research
3. Well-archived interaction history → Reproducibility of research
4. Industry data sets not released to students & researchers → Privacy-preserving Big Data education & research


Slide #33.

Final Thoughts: Education Revolution & Automation
• Big Data and IT enable education revolution and automation toward more affordable high-quality education
  – IT enables one teacher to teach many more students than before (efficiency)
  – Big Data technology would enable “automated” TA/instructor (scalability)
  – Intelligent MOOC would improve quality of education at low cost
• Implications: Many traditional boundaries will likely disappear!
  – No strict distinction between a teacher and a student (everyone learns from each other)
  – No strict distinction between grade levels or age groups (learn at your own pace)
  – No inherent boundaries between different courses (due to high modularization)
  – No boundaries of subject areas (due to high modularization)
  – No boundaries of institutions (MOOCs unify all institutions!)


Slide #34.

Thank You! Questions/Comments?