Better Data Beats Big Data


3. DATA
We consider a large set of CLCT student usage data collected in 2010. Although the tutor was used in several thousand schools across the United States, its full logging capability was activated in only about 20% of the schools in which it was used. Our initial dataset covered 144,080 registered students in 899 schools, with close to 473 million records overall, including activity unrelated to targeted problem solving, such as signing in, signing out, and solving practice problems. After extracting targeted, substantive problem-solving activity, we arrived at a dataset of 342 schools, 72,082 students, and 88.6 million problem-solving actions.
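The extraction step above can be illustrated with a minimal sketch. The record layout, the event-type labels, and the helper names are hypothetical, since the text does not give the actual CLCT log schema; the sketch only shows the general idea of dropping non-substantive activity and then counting schools, students, and actions.

```python
from collections import namedtuple

# Hypothetical log record; the real CLCT log schema is not given in the text.
LogRecord = namedtuple("LogRecord", ["school_id", "student_id", "event_type"])

# Assumed labels for activity unrelated to targeted problem solving.
NON_SUBSTANTIVE = {"sign_in", "sign_out", "practice_problem"}


def extract_problem_solving(records):
    """Keep only records whose event type is not in NON_SUBSTANTIVE."""
    return [r for r in records if r.event_type not in NON_SUBSTANTIVE]


def summarize(records):
    """Count distinct schools, distinct students, and total actions."""
    schools = {r.school_id for r in records}
    students = {(r.school_id, r.student_id) for r in records}
    return {
        "schools": len(schools),
        "students": len(students),
        "actions": len(records),
    }


# Toy data for illustration only; not taken from the study.
toy_log = [
    LogRecord("S1", "a", "sign_in"),
    LogRecord("S1", "a", "problem_step"),
    LogRecord("S1", "a", "problem_step"),
    LogRecord("S1", "b", "practice_problem"),
    LogRecord("S2", "c", "problem_step"),
    LogRecord("S2", "c", "sign_out"),
]

toy_summary = summarize(extract_problem_solving(toy_log))
```

In this toy log, student `b` disappears from the summary because that student has only practice-problem activity, which mirrors how the student and school counts shrink after extraction.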
We queried the National Center for Education Statistics (NCES) and internal data for school metadata that included the number of students enrolled (as a proxy for the school’s relative size), the student-teacher ratio, the number of students eligible to receive free or reduced-price lunch (as a proxy for socio-economic status), and the approximate setting of the school’s location: rural, suburban, or urban. Although some of the school metadata from NCES and internal records were from 2011, we assume that year-to-year fluctuations in these numbers are negligible for our analyses. We were able to match full NCES and internal records for a subset of 232 schools, narrowing our selection to 55,012 students with substantive usage (i.e., attempting more than one unit of instruction) and 67.3 million problem-solving actions.
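The matching and narrowing described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the field names in `REQUIRED_FIELDS`, the dictionary layouts, and both function names are hypothetical, and "substantive usage" is read as attempting at least two distinct units.

```python
# Hypothetical metadata fields; None marks a value missing from NCES or internal records.
REQUIRED_FIELDS = ("enrollment", "student_teacher_ratio", "lunch_eligible", "locale")


def schools_with_full_metadata(metadata):
    """Return the ids of schools for which every required field is present."""
    return {
        school_id
        for school_id, record in metadata.items()
        if all(record.get(field) is not None for field in REQUIRED_FIELDS)
    }


def substantive_students(units_by_student, matched_schools, min_units=2):
    """Keep students in matched schools who attempted at least min_units distinct units.

    units_by_student maps (school_id, student_id) to a set of unit ids.
    """
    return {
        key
        for key, units in units_by_student.items()
        if key[0] in matched_schools and len(units) >= min_units
    }


# Toy data for illustration only; not taken from the study.
toy_metadata = {
    "S1": {"enrollment": 800, "student_teacher_ratio": 16.0,
           "lunch_eligible": 320, "locale": "urban"},
    "S2": {"enrollment": 450, "student_teacher_ratio": 14.5,
           "lunch_eligible": 90, "locale": None},  # incomplete record
    "S3": {"enrollment": 1200, "student_teacher_ratio": 18.2,
           "lunch_eligible": 600, "locale": "suburban"},
}
toy_units = {
    ("S1", "a"): {"u1", "u2"},
    ("S1", "b"): {"u1"},              # only one unit: not substantive
    ("S2", "c"): {"u1", "u2", "u3"},  # school lacks full metadata
    ("S3", "d"): {"u2", "u4"},
}

matched = schools_with_full_metadata(toy_metadata)
kept = substantive_students(toy_units, matched)
```

Filtering on school metadata first and on student usage second matches the order of the text, where the school subset determines which students remain in the selection.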
In addition to the school metadata, we computed student performance statistics from our logs. For each school, we computed the average number of distinct units that students attempted and the standard error of the number of units attempted. To further characterize schools we ran a mixed effects logisti...

... middle of paper ...

