by Jingchen (Monika) Hu and Ming-Wen An
Vassar College hosted its first DataFest, a weekend-long data analysis competition sponsored by the American Statistical Association, from Friday, April 8 to Sunday, April 10, 2016. (See also Vassar DataFest 2017.) Student response was enthusiastic: approximately 40 students (9 teams), representing 9 different single majors and 6 different double-major combinations, participated. Generous support was provided by Vassar administrative offices and academic departments across disciplines, as well as external companies.
On Friday evening, the large and messy datasets from Ticketmaster were revealed to the students in a kick-off event. Over the next 48 hours, each team developed their own research question and worked together to analyze and gain insights into the data. Throughout the weekend, over a dozen Vassar faculty members and professionals from the Poughkeepsie area volunteered as consultants. On Sunday afternoon, each team presented their findings to the other teams and a panel of volunteer judges.
The weekend involved challenging, multi-dimensional problem solving. Students faced questions such as: How do we formulate an appropriate research question? Can a subset of variables in the dataset answer the research question? How can we best visualize the data? Do we need any external sources to complement our analysis, and if so, can we access them? What are the best statistical methods to tackle the question? How do we implement the chosen methods on a large dataset using statistical software? What key messages should we draw from our findings? How can we most effectively communicate our findings to a panel of judges and the other participants in 15 minutes? How do we work with others who have different skill sets and backgrounds? How can we best utilize the strengths of all team members? How do we work under time pressure and make compromises, if necessary?
Overall, participating in DataFest involved scientific reasoning, critical thinking, teamwork, communication and more. Such an experience is valuable for students at any stage of their academic career.
Quite a number of students expressed strong interest in competing again next year. While many of them appreciated the R and MATLAB workshops held prior to DataFest, participating in DataFest and being challenged with a real-world data problem made them realize how much statistics knowledge and data processing skill they still want to gain through future coursework and hands-on experience. Students also enjoyed working in teams and learning about the approaches other teams had taken. Faculty and professional volunteers were amazed at the students’ turnout, excitement and enthusiasm for a not-for-credit event.
As statisticians, we are building a sequence of statistics courses and potentially a correlate track at Vassar. We are passionate about promoting statistics both in and beyond the classroom. DataFest is one such example, where our students have the opportunity to practice their data analytic skills outside the classroom and use real-world data from industry. Statistics is a naturally interdisciplinary subject, and we believe every student should become data/statistics literate as they pursue their own academic interests at Vassar and beyond.