Advanced, Analytic, Automated (AAA) Measurement of Engagement During Learning

Sidney D’Mello,* Ed Dieterle, and Angela Duckworth

Abstract

It is generally acknowledged that engagement plays a critical role in learning. Unfortunately, the study of engagement has been stymied by a lack of valid and efficient measures. We introduce the advanced, analytic, and automated (AAA) approach to measure engagement at fine-grained temporal resolutions. The AAA measurement approach is grounded in embodied theories of cognition and affect, which advocate a close coupling between thought and action. It uses machine-learned computational models to automatically infer mental states associated with engagement (e.g., interest, flow) from machine-readable behavioral and physiological signals (e.g., facial expressions, eye tracking, click-stream data) and from aspects of the environmental context. We present 15 case studies that illustrate the potential of the AAA approach for measuring engagement in digital learning environments. We discuss strengths and weaknesses of the AAA approach, concluding that it has significant promise to catalyze engagement research.

Keywords: engagement, measurement, machine learning, digital learning

In the popular 1999 Hollywood film The Matrix, the character Trinity learns to fly a helicopter in a matter of seconds by downloading the training program directly into her brain. Another character, Neo, learns Kung-Fu in much the same way. If only learning could be this efficient and effortless. Alas, most meaningful learning takes considerable time and effort (but see Shibata, Watanabe, Sasaki, and Kawato (2011), who appear to have made initial progress towards Matrix-style learning). It also requires sustained engagement, a point widely recognized by researchers, practitioners, and policy-makers (Loveless, 2015; PISA, 2012). Researchers have also made significant advances in conceptualizing student engagement or academic engagement as a complex multi-componential, multitemporal construct spanning a diverse range of phenomena, from momentary affective states of interest and enjoyment to long-term dispositions about school (Christenson, Reschly, & Wylie, 2012; Linnenbrink-Garcia & Pekrun, 2011; Sinatra, Heddy, & Lombardi, 2015).

Unfortunately, methodological advances have lagged theoretical developments (Azevedo, 2015; Sinatra et al., 2015). Traditional measures of engagement include self-report questionnaires, experience-sampling methods, online observations, video coding, teacher ratings, and discourse analysis (Fredricks & McColskey, 2012; Henrie, Halverson, & Graham, 2015). Methodological advances have so far been limited to iterative refinement of traditional measures or combining methods (Greene, 2015). In our view, radical improvements require a qualitatively different measurement approach.

The digital revolution has fundamentally transformed how students engage in learning. In parallel, a new and exciting digital measurement approach is emerging as a viable complement to traditional measures. This approach uses advanced computational techniques for the analytic measurement of fine-grained components of engagement in a fully automated fashion. This advanced, analytic, and automated (AAA) measurement approach is theoretically grounded in the embodied affective and cognitive sciences, while its methodological footing stems from the fields of digital signal processing and machine learning.
The AAA approach espouses measures that are fine-grained and contextually coupled with unfolding learning events, so these measures can answer questions about why a learner is engaged, what an engaged interaction looks like, and how engagement changes over time. This information, in turn, can be used to develop interventions that dynamically respond to periods of waning engagement, thereby facilitating change in tandem with measurement.

We believe that AAA-based measures fill a critical gap in educational measurement. Contributors to a recent special issue of Educational Psychologist on “The Challenges of Defining and Measuring Student Engagement in Science” highlighted the need for new and innovative measures of engagement, especially micro-level measures to complement existing macro-level measures. For example, in their introductory article, the guest editors Sinatra et al. (2015) noted, “Also absent [from the special issue] are studies using more micro-level analyses of engagement such as eye tracking, physiology measures, and even brain imaging work” (p. 15). Such measures have been in development for over a decade in specialized research areas (e.g., affective computing and augmented cognition) that might be unfamiliar to most educational psychologists. However, an interdisciplinary approach is precisely what is needed to catalyze innovation in measurement of a complex construct like engagement. This point is aptly made by Azevedo (2015) in his commentary on the special issue and his perspective on the future of the field: “It is important to explicitly highlight that the first path [to develop an overarching and unifying theoretical framework to account for the majority of critical elements of the construct] is challenging and that many researchers may not be willing to pursue it for a variety of reasons (…). Such a challenge will require interdisciplinary research efforts currently witnessed in several fields” (p. 88). We respond to this call to action by providing an accessible introduction, selective review, and analysis of the AAA measurement approach that has emerged at the intersection of the psychological and computing sciences.

What is Engagement?

A scientific definition of engagement remains elusive. Reschly and Christenson (2012) note that the term engagement has been used to describe diverse behaviors, thoughts, perceptions, feelings, and attitudes, and at the same time, diverse terms have been used by different authors to refer to similar constructs. Theorists generally agree that engagement is a multidimensional construct, although the number and nature of the dimensions are unclear. Fredricks, Blumenfeld, and Paris (2004) proposed three components of engagement. Emotional engagement encompasses feelings and attitudes about the learning task or learning context, such as feelings of interest towards a particular subject or teacher (Renninger & Bachrach, 2015), or general satisfaction about school. Behavioral engagement broadly refers to learners’ participation in learning, including effort, persistence, and concentration. Cognitive engagement pertains to learners’ investment in the learning task, such as how they allocate effort toward learning, and their understanding and mastery of the material. Reeve and Tseng (2011) recently suggested a fourth dimension: agentic engagement, characterized by learners proactively contributing to the learning process.
Alternatively, Pekrun and Linnenbrink-Garcia (2012) proposed a five-component model that includes cognitive (e.g., attention and memory processes), motivational (e.g., intrinsic and extrinsic motivation), behavioral (e.g., effort and persistence), social-behavioral (e.g., participating with peers), and cognitive-behavioral (e.g., strategy use and self-regulation) aspects of engagement.

We can trace the diverse components of engagement to different theoretical traditions. Theories of motivation, including self-determination theory (Deci & Ryan, 1985; Ryan & Deci, 2000), expectancy-value theory (Eccles & Wigfield, 2002), and self-efficacy theory (Bandura, 1986, 1997; Schunk & Pajares, 2005), focus on precursors of engagement, such as self-efficacy, interest in and value of a learning activity, autonomy, and the alignment between skill and challenge. Cognitive theories focus instead on the extent to which the learning activity engages the cognitive system (Eastwood, Frischen, Fenske, & Smilek, 2012). For example, the Interactive-Constructive-Active-Passive (ICAP) framework proposes four levels of cognitive engagement based on the level of interactivity afforded by the learning activity (Chi & Wylie, 2014). The levels, in decreasing order of expected engagement and learning, are Interactive (e.g., reciprocal teaching), Constructive (e.g., self-explanation), Active (e.g., verbatim note taking), and Passive (e.g., viewing a lecture). Author (year) extend ICAP to ICAP-A (attention) by suggesting that attentional control follows a similar pattern in that learners will maximally attend to interactive tasks and minimally to passive tasks (i.e., I > C > A > P). Finally, affective theories, including the control-value theory of academic emotions (Pekrun & Linnenbrink-Garcia, 2012), the assimilation-accommodation framework (Fiedler & Beier, 2014), and discrepancy-interruption and goal appraisal theories (Author, year; Mandler, 1990; Stein & Levine, 1991), emphasize the role of physiological arousal and cognitive appraisal in triggering emotions during learning, as well as the influence of affect on cognition and instrumental action.

Thus, engagement has emerged as a broad and complex construct pertaining to diverse aspects of the educational experience (e.g., showing up, completing homework, feelings of belongingness, graduating) and across multiple time scales (e.g., momentary affective episodes, stable dispositions such as general disengagement with school, and life-altering outcomes like dropping out of school). As Eccles and Wang (2012) note, these broad, all-encompassing definitions make the construct more accessible for policy-makers and the educated lay person, but less useful for scientific research, where precise definitions are of greater value, especially when it comes to elucidating cause-and-effect relationships. Thus, measuring “general” engagement might be as theoretically diffuse as measuring “cognition” or “emotion.” It may be more fruitful to study specific aspects of this complex construct with an eye for broader assimilation across measures. In this vein, Sinatra et al. (2015) conceptualize engagement along a continuum, anchored by person-oriented perspectives at one extreme, context-oriented at the other, and person-in-context in between.
Person-oriented perspectives focus on the cognitive, affective, and motivational states of the student at the moment of learning and are best captured with fine-grained physiological and behavioral measures (e.g., electrodermal activity, facial expressions, actions). The context-oriented perspective emphasizes the environmental context as the analytic unit. Here, the focus is on macro-level structures like teachers, classrooms, schools, and the community, rather than the individual student. Finally, the intermediate-grain-size, person-in-context perspective conceptualizes engagement at the level of the interaction between student and context (e.g., how students interact with each other or with technology).

We adopt a multi-componential perspective and operationalize engagement in terms of the affective states, cognitive states, and behaviors that arise from interactions with the learning environment. We conceptualize engagement as a goal-directed state of active and focused involvement in a learning activity. It is temporally constrained in that we are concerned with the state (not trait) of engagement across micro-level time scales ranging from seconds to minutes. Thus, our operationalization of engagement, and the AAA measurement approach derived from it, aligns with the person-oriented level of analysis of Sinatra et al. (2015). We should clarify that the term person-oriented does not imply that engagement is stable over time; rather, it refers to a micro-level analysis centered on the thoughts, feelings, and behaviors that emerge from a person’s interaction with his or her environment. It is also distinct from a person-in-context level of analysis because the focus is on the person rather than his or her interaction with the environment.

Contemporary Engagement Measures

The most widely used measures of engagement are self-report questionnaires; see Fredricks and McColskey (2012), Greene (2015), and Henrie et al. (2015) for reviews. Although relatively inexpensive, easy to administer, and generally reliable, questionnaires have well-known limitations (Author, year; Krosnick, 1999). For instance, when endorsing items, respondents must compare the target (e.g., a teacher rating a student, a student rating himself or herself) to some implicit standard, and standards may vary from respondent to respondent. To one student, “I am a hard worker” may be exemplified by doing five hours of homework each day; for others, the same statement may be exemplified by simply showing up for class. For both informant-report and self-report questionnaires, biases that arise from heterogeneous frames of reference reduce validity (Heine, Lehman, Peng, & Greenholtz, 2002). For self-report questionnaires, social desirability bias is another important limitation (Krosnick, 1999), both when respondents aim to appear admirable to others and also when they inflate responses to preserve their own self-esteem. Likewise, memory recall limitations and acquiescence bias can influence self-reports, and halo effects can influence informant-reports (Podsakoff, MacKenzie, Lee, & Podsakoff, 2003).

Several non-questionnaire engagement measures have also been developed. Examples include experience-sampling methods (ESM) (Csikszentmihalyi & Larson, 1987), day reconstruction (Kahneman, Krueger, Schkade, Schwarz, & Stone, 2004), and interviews (Turner & Meyer, 2000). However, because they still rely on self- and informant-reports, they are subject to similar biases as questionnaires.
Observational methods are an attractive alternative to self- and informant-reports because they are arguably more objective (Nystrand & Gamoran, 1991; Pianta, Hamre, & Allen, 2012; Renninger & Bachrach, 2015; Ryu & Lombardi, 2015; Volpe, DiPerna, Hintze, & Shapiro, 2005). Unfortunately, these methods entail considerable human effort, which might not be a major limitation for small-scale studies, but poses a significant challenge for repeated long-term measurement at scale. Further, observations cannot be conducted in some learning contexts, such as students’ homes. Researchers have attempted to circumvent some of the limitations of observational methods by combining automated data collection with semi-automated or manual data coding. For example, the Electronically Activated Recorder (EAR) is a device that randomly samples audio clips in naturalistic environments (Mehl, Pennebaker, Crow, Dabbs, & Price, 2001). Data collection with the EAR is efficient and cost-effective; however, the data still need to be transcribed and coded by humans, which increases cost and reduces scalability. Similarly, engagement can be coded from videos by researchers (Author, year) or even teachers (Author, year), but video coding is a labor- and time-intensive effort. Finally, engagement can be inferred from academic and behavior records, such as homework completion, absences, achievement test scores, and teacher ratings of classroom conduct (Lehr, Sinclair, & Christenson, 2004; Skinner & Belmont, 1993), but these measures are limited in what they can reveal about engagement at the micro-analytic level espoused here.

The Advanced, Analytic, Automated (AAA) Measurement Approach

An AAA-based measure provides continual assessments of person-oriented components of engagement at a fine-grained temporal resolution, all with no human involvement. These measures have several advantages over their traditional counterparts. They are uniquely suited to track person-oriented components of engagement because they operate at fine-grained time scales ranging from seconds to a few minutes. They are more objective because computers provide the measurements, thereby partially obviating reference, social desirability, acquiescence, and other biases associated with self- and observer-reports. AAA-based measures are also unaffected by momentary lapses in attention or by fatigue, as can occur with human raters. They vastly reduce time and effort, which is a limitation of ESM, day reconstruction, video coding, and observations.

In this paper we introduce the theoretical and methodological foundation of the AAA approach, highlight exemplary AAA-based measures, and analyze the approach and the measures derived from it. To keep the scope manageable, we emphasize measures that are nonintrusive, cost-effective, and usable in the near term. These include analyzing machine-readable aspects of a learning session, such as log files recorded during interactions with digital learning environments, facial features, eye gaze, and physiology. Several of these signals have a long history in the psychological sciences, including the measurement of cognitive engagement (Miller, 2015). However, they have mainly been used as passive data sources that humans analyze offline. The AAA approach stands apart because it combines machine sensing and machine analysis to provide measurement that is real-time and fully automated.
Theoretical and Methodological Foundations

We ground the AAA measurement approach in the aforementioned person-oriented operationalization of engagement as the momentary affective and cognitive states that arise throughout the learning process. Embodied theories of cognition and affect posit that these mental states manifest in the body in multiple ways because cognition and affect are in the service of action and bodies are the agents of action (Barsalou, 2008; deVega, Glenberg, & Graesser, 2008; Ekman, 1992; Niedenthal, 2007; Russell, Bachorowski, & Fernandez-Dols, 2003). For example, there is increased activation in the sympathetic nervous system during fight-or-flight responses (Larsen, Berntson, Poehlmann, Ito, & Cacioppo, 2008). Similarly, there are well-known relationships between facial expressions and affective states (Ekman, 1984; Keltner & Ekman, 2000; Matsumoto, Keltner, Shiota, O’Sullivan, & Frank, 2008), for example the furrowed brow during experiences of confusion (Author, year; Darwin, 1872). Researchers have also identified bodily and physiological correlates of cognitive states like attention and cognitive load. Eye movements are an invaluable tool to investigate visual attention due to the so-called eye-mind link (Deubel & Schneider, 1996; Hoffman & Subramaniam, 1995; Rayner, 1998), while electroencephalography (EEG) can index mental workload via a brain-mind link (Berka et al., 2007).

The mind-body link suggests that observable bodily responses can be used to infer unobservable mental states, an idea at the heart of the AAA measurement approach. The basic assumption is that cognitive and affective states reflecting different components of engagement are associated with responses at multiple levels (neurobiological, physiological, bodily expressions, overt actions, metacognitive, and subjective), which in turn influence the states themselves in a form of circular causality (Lewis, 2005). Some of these responses are implicit (e.g., neurobiological and some physiological changes) in that they occur outside of conscious awareness, while others are more explicit (e.g., metacognitive reflections, subjective feelings). The states are modulated by individual differences as well as contextual, social, and cultural influences (Elfenbein & Ambady, 2002; Kappas, 2013; Mesquita & Boiger, 2014). Some of these responses are detectable by machine sensors and human observers, but others are only accessible to the self. In particular, external observers only have access to visible behaviors (e.g., facial expressions, gestures, actions), information on the environmental context, and some physiological changes (e.g., respiration rate), and must rely more heavily on inference to decode a person’s mental state (Mehu & Scherer, 2012). In contrast, the self has privileged access to subjective feelings, memories, metacognitive reflections, and some physiological changes, but not to other responses (e.g., involuntary expressions and neurobiological changes). Machine sensors can measure neurobiological, bodily, physiological, and action-oriented responses beyond what is available to humans (e.g., via thermal cameras or electroencephalograms), but they cannot infer the mental state from the measurements, nor can they interpret contextual cues on par with humans. Thus, the core problem faced by machines is to infer the latent mental states associated with engagement (e.g., concentration, interest) from machine-readable signals and from aspects of the environmental context.
AAA measurement begins when sensors record low-level signals. Signals are then processed to obtain high-level abstractions, called features. For instance, a video is the signal recorded from a webcam (the sensor). Sample features, computed by applying computer vision techniques to the video, include activations of specific facial muscles (also called action units, such as inner brow raise or lip pucker; Ekman & Friesen, 1978), facial textures, and head position and orientation (Pantic & Patras, 2006; Valstar, Mehu, Jiang, Pantic, & Scherer, 2012). Similarly, digital signal processing techniques in the speech domain (Eyben, Wöllmer, & Schuller, 2010) are used to extract paralinguistic (also called acoustic-prosodic) features such as pitch and amplitude from an audio signal recorded with a microphone (the sensor). Researchers can also use this paradigm to analyze spoken content; in this case they leverage automatic speech recognition and natural language understanding techniques. In general, signal processing methods (denoising, filtering, smoothing, feature extraction, etc.) are required to compute features from the raw signals (see Author, year; St. John, Kobus, Morrison, and Schmorrow (2004) for details on these methods).

The next step in the AAA measurement approach entails inferring mental states from the corresponding features. This is done with machine learning, which prescribes methods to learn a program (or computational model) from data (Domingos, 2012). Machine learning has many subfields, of which supervised learning is most widely used in the AAA approach. Supervised learning (see Figure 1) requires training data, consisting of features (extracted from signals recorded by sensors, as noted above) along with temporally synchronized annotations of mental states (e.g., from self-reports or observer judgments), collected at multiple points in a learning session. In a training phase, supervised learning methods automatically model (learn) the relationship between the features and the human annotations to yield a computational model. The degree of overlap between the model-generated and human-provided annotations is assessed in a validation phase. The model can then take sensor data collected at some future time and/or from a new set of students and automatically generate estimates of mental states without needing human annotations.

Figure 1. Major steps involved in building an automated engagement measure.

The computational model can take on many forms depending on the supervised learning method. Selecting a computational model is a design decision with multiple tradeoffs: separability of the feature space (i.e., the data), transparency of internal representations, accuracy, generalizability, computational efficiency, robustness to noisy data, and others not discussed here. Hence, in contemplating how to select an appropriate computational model, it is prudent to first ask: appropriate for what purpose? One important factor involves the linear separability of the data, i.e., whether the different classes (e.g., bored vs. curious vs. confused) as represented in feature space can be discriminated by linear functions, such as lines for two-dimensional data or hyperplanes for higher-dimensional data. Linear models are attractive in their simplicity, but are ineffective when the data are non-linearly separable, which is usually the case (a minimal sketch of the full training-and-validation pipeline, using one such non-linear model, appears after this paragraph).
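To make the training and validation phases concrete, the following is a minimal sketch in Python. It assumes scikit-learn and NumPy are available and uses randomly generated placeholder features and annotations together with an RBF-kernel support vector machine; these choices are illustrative assumptions and do not reflect the specific pipeline of any study reviewed here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import cohen_kappa_score

# Hypothetical training data: one row per annotated moment in a learning session.
# Real features might be facial action unit activations, gaze statistics, or
# log-file patterns; y holds temporally synchronized human annotations
# (e.g., 0 = bored, 1 = engaged, 2 = confused).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))      # placeholder feature matrix
y = rng.integers(0, 3, size=600)    # placeholder annotations

# Training phase: learn the feature-to-annotation mapping on one portion of the data.
# An RBF-kernel SVM can handle feature spaces that are not linearly separable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

# Validation phase: compare model-generated estimates with held-out human annotations.
estimates = model.predict(X_test)
print("Chance-corrected agreement (Cohen's kappa):",
      cohen_kappa_score(y_test, estimates))

# Deployment: the trained model labels new sensor data without human annotation.
new_estimates = model.predict(rng.normal(size=(5, 12)))
```

With real features and synchronized annotations substituted for the placeholders, the same skeleton accommodates any of the model families discussed next.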
These situations require more sophisticated models; for example, support vector machines transform a non-linearly separable feature space into a linearly separable space by projecting it into higher dimensions (Cortes & Vapnik, 1995). The added sophistication does incur a price, especially for some of the more complex models, which have internal representations that are not inspectable. This so-called “black box” problem is a frequent critique of machine learning. Although the concern is valid for some models (e.g., neural networks), other models are much more transparent: for instance, models that operate by rule induction (e.g., if blink rate is high and heart rate is low, then boredom = high), organize rules into decision trees, or compute conditional probabilities of mental states given features (e.g., P(Boredom | blink rate = high, heart rate = low)). In most cases, it is sufficient to select a model with inspectable representations and adequate performance. However, priorities might shift when models are intended for real-time measurement, such as to trigger technological interventions aimed at re-engaging a bored learner (Author, year). Here, computational efficiency (both in terms of clock time and computational resources) and robustness (in the face of noisy or missing data) might take precedence over transparency and performance.

This leads to another issue: how to quantify performance. Given that the goal is to use the model to provide accurate estimates of engagement on unseen data, two key performance metrics are accuracy and generalizability. Accuracy (similar to convergent validity) is measured as the alignment between automated estimates and an external standard, typically self- or observer-annotations. The alignment can be quantified by a number of standard metrics (e.g., recognition rate, kappa, correlation). Although it is difficult to specify exact bounds on what constitutes “good” accuracy (as discussed in detail later on), at a minimum it should exceed random guessing (chance).

Generalizability concerns the robustness of the model when applied to data beyond what was used to train it. It is usually established by dividing the data into two sets (A and B), training the model on one set (A or B), and testing it on its complement (B or A). Cross-validation is a widely used variant of this procedure in which each subset serves as both training and testing data across multiple folds. For example, in 3-fold cross-validation, the data are divided into three sets A, B, and C, and folds are created as follows: Fold 1: train on A and B, test on C; Fold 2: train on A and C, test on B; Fold 3: train on B and C, test on A. This method ensures that every data point is tested exactly once. The level of generalizability achieved depends on the data and how the folds are constructed. Instance-level validation ensures that individual cases are either in the training or the testing set, but data points (albeit different ones) from the same person can be in both sets. The resulting model risks over-fitting to individual characteristics and may not generalize to new people. In contrast, person- or student-level validation ensures that data from the same person are either in the training or the testing set but never both. This provides more confidence that the model will generalize to new people with similar characteristics (the sketch below contrasts these two validation schemes).
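The practical difference between instance-level and person-level validation is easy to see in code. The following is a hedged sketch assuming scikit-learn; the feature matrix, engagement labels, and person identifiers are hypothetical placeholders.

```python
import numpy as np
from sklearn.model_selection import KFold, GroupKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))            # placeholder feature matrix
y = rng.integers(0, 2, size=300)         # placeholder engagement labels (0/1)
persons = np.repeat(np.arange(30), 10)   # 30 hypothetical students, 10 instances each

clf = LogisticRegression(max_iter=1000)

# Instance-level validation: instances from the same student can land in both the
# training and testing folds, which risks over-fitting to individual characteristics.
instance_scores = cross_val_score(
    clf, X, y, cv=KFold(n_splits=3, shuffle=True, random_state=1))

# Person-level (student-level) validation: GroupKFold keeps all of a student's
# instances in a single fold, so the model is always tested on unseen students.
person_scores = cross_val_score(
    clf, X, y, groups=persons, cv=GroupKFold(n_splits=3))

print("Instance-level accuracy:", instance_scores.mean())
print("Person-level accuracy:", person_scores.mean())
```

The same grouping idea extends to the population-level validation described next: the group variable is set to a demographic attribute (e.g., gender) rather than a person identifier.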
In population-level validation, the data are split on some population characteristic (e.g., gender), and the model is trained on one subset and tested on its complement (e.g., train on males and test on females, and vice versa).

Case Studies

We now discuss representative case studies featuring the AAA approach to measure person-oriented components of engagement during learning with technology. We have selected 15 studies to emphasize key dimensions of the measurement approach, including sensor-free vs. sensor-based measurement, annotations by the self vs. external observers, unimodal vs. multimodal sensing, lab vs. classroom research, learning activities with varying levels of interactivity, and different validation methods. We prioritized studies that can be considered pioneering in the field, such as the first study showcasing multimodal engagement measurement in real-world classrooms (Arroyo et al., 2009), the first study emphasizing generalizability beyond the individual (Ocumpaugh, Baker, Gowda, Heffernan, & Heffernan, 2014), or the first person-independent automated measure of mind wandering (Author, year). We acknowledge that our choice of case studies is both subjective and incomplete, but our goal is to provide an overview of a promising new approach rather than review a well-established paradigm. We hope that the studies covered here will pique interest and inspire further inquiry into AAA-based measures.

Table 1 provides an overview of the studies. Despite the considerable variability, each study followed the basic approach discussed above and summarized in Figure 1. Step 1 consists of recording signals (video, physiology, log files, etc.) as students complete a learning activity within a particular learning context (Step 1a), followed by computing features from the raw signals (Step 1b). In Step 2, annotations of mental states reflecting various components of engagement are obtained from the students themselves, from external observers, or via some other method (see Author (year) for a review of methods to annotate mental states in learning contexts). In Step 3, supervised learning methods computationally model the relationship between the features and the temporally synchronized annotations. In Step 4, the resulting model produces computer-generated engagement estimates that are compared to human-provided annotations for validation.

Table 1. Overview of case studies

Study | Learning Context | Engagement Component | Annotation Method | Sensor | Signal | Features | Supervised Learning Method | Generalizes to New Students | Accuracy (Metric) | Improvement over Chance

Discussed in main text:
Author (year) | Computer literacy from AutoTutor | Boredom, flow, confusion, frustration | Offline video coding by self, peers, trained judges | None | Log files | Discourse features and interaction patterns | Varied | No | 0.71 (RR) | 42% (a)
Pardos et al. (2013) | Math with ASSISTments | Boredom, frustration, confusion, engaged concentration | Online observations by researchers | None | Log files | Interaction patterns | Varied | Yes | 0.68 (A′) | 30% (b)
Gobert et al. (2015) | Science microworlds with Inq-ITS | Disengaged from task goal | Offline coding of logs | None | Log files | Interaction patterns | PART | Yes | 0.81 (A′) | 41% (b)
Whitehill et al. (2014) | Cognitive skills training on iPad | Behavioral engagement (4 levels) | Video coding by researchers | Webcam | Video | Facial expressions | Support vector machine | Yes | 0.73 (2AFC) | 31% (b)
Mota and Picard (2003) | Constraint satisfaction game | Interest (3 levels) | Video coding by teachers | Pressure-sensitive pads | Pressure maps | Body movements/posture | Hidden Markov Models | Yes | 0.77 (RR) | 61% (c)
Author (year) | Research methods from text | Probe-caught mind wandering (yes or no) | Online self-reports | Eye tracker | Eye gaze & log files | Eye movements, contextual cues | Bayesian | Yes | 0.70 (RR) | 25% (b)
Arroyo et al. (2009) | Math with Wayang Outpost | Interest, confidence, excitement, frustration (1–5 scale) | Online self-reports | Webcam, physiological sensor, pressure-mouse, pressure-pads | Log files, video, pressure maps, time series | Interaction features, facial expressions, skin conductance, pressure exerted on mouse, body movements/posture | Linear Regression | No | 0.47 (R2) | 47% (d)
Bosch et al. (2016) | Newtonian Physics with Physics Playground | Boredom, engaged concentration, confusion, frustration, delight | Online observations by researchers | Webcam | Video | Facial expressions and body movements | Varied | Yes | 0.69 (AUC) | 37% (e)

Discussed in supplementary review (Online Supplement A):
Baker et al. (2012) | Algebra with a Cognitive Tutor | Boredom, engaged concentration, confusion, frustration | Online observations by researchers | None | Log files | Interaction patterns | Varied | Yes | 0.85 (A′) | 30% (b)
Author (year) | Writing proficiency with computer interface | Engagement/flow, boredom | Offline self-reports | None | Log files | Keystrokes, individual attributes, task appraisals | REP Tree | Yes | 0.87 (RR) | 37% (b)
Sabourin, Mott, and Lester (2011) | Microbiology with Crystal Island | Positive affect (curious, focused, excited) vs. other (anxious, bored, confused, frustrated) | Online self-reports | None | Log files | Interaction patterns and individual attributes | Dynamic Bayesian Network | Yes | 0.73 (RR) | 43% (c)
Drummond and Litman (2010) | Biology from text | Zone outs (high vs. low) | Online self-reports | Microphone | Audio | Acoustic-prosodic features | J48 Decision Trees | No | 0.64 (RR) | 22% (f)
Author (year) | Research methods from text | Probe-caught mind wandering (yes or no) | Online self-reports | Wearable physiological sensors | Time series | Skin conductance and skin temperature (& context) | Filtered Classifier; LAD Tree | Yes | 0.58 (RR) | 19% (b)
Kapoor and Picard (2005) | Constraint satisfaction game | Interest (3 levels) | Offline video coding by teachers | Infrared camera, pressure-pads | Log files, video, pressure maps | Interaction context, facial expressions, body movements/posture | Mixture of Gaussian Processes | No | 0.87 (RR) | 73% (f)
Author (year) | Creative writing | Behavioral engagement (2 levels) | Online self-reports, offline video coding by self | Depth camera | Depth maps, color video | Facial expressions, facial textures, heart rate | Updatable Naive Bayes | Yes | 0.75 (AUC) | 49% (e)

Note. RR = recognition rate; 2AFC = 2-alternative forced choice; AUC = area under the receiver operating characteristic (ROC) curve; A′ = A-prime; R2 = coefficient of determination.
Footnotes (a)–(f) denote the method used to obtain the percent improvement above chance: (a) estimated as proportion improvement in RR over the base rate; (b) Cohen’s kappa as reported; (c) Cohen’s kappa as estimated from reported classification tables; (d) R2 as reported in the paper (assumes that a chance model will yield an R2 of 0); (e) estimated as improvement of the achieved AUC over the chance AUC of 0.5; (f) estimated as proportion improvement in RR over the majority baseline (classifying all instances as the majority class).

In the interest of brevity, we discuss eight case studies below and present the remaining seven in Supplementary Material A. We organize the case studies by the sensors used. Sensor-free measures analyze digital traces recorded in log files, while sensor-based measures use physical sensors. We further categorize the sensor-based measures as sensor-light if they use sensors that are readily available in contemporary digital devices (e.g., webcams, microphones) or sensor-heavy if they require nonstandard sensors like eye trackers, pressure pads, and physiological sensors (see Figure 2).

Figure 2.
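As a worked illustration of how the improvement-over-chance figures in Table 1 can be derived, the sketch below implements one plausible reading of footnotes (a), (e), and (f), namely proportional improvement of an observed score over a chance-level score, along with a chance-corrected Cohen's kappa (cf. footnotes b and c). The numbers are invented, and the exact computations in the original papers may differ.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def proportional_improvement(observed, chance):
    """Proportion improvement of an observed score over a chance-level score."""
    return (observed - chance) / chance

# Recognition rate compared against a hypothetical base-rate/majority baseline
# (cf. footnotes a and f); both values are invented for illustration.
rr, baseline = 0.75, 0.55
print("RR improvement over baseline:",
      round(proportional_improvement(rr, baseline), 2))

# AUC compared against the chance AUC of 0.5 (cf. footnote e).
auc = 0.70
print("AUC improvement over chance:",
      round(proportional_improvement(auc, 0.5), 2))

# Cohen's kappa corrects for chance agreement directly; here it is computed from
# hypothetical human annotations and model-generated estimates (cf. footnotes b and c).
human = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 1])
model = np.array([0, 1, 0, 0, 1, 0, 1, 1, 1, 1])
print("Cohen's kappa:", round(cohen_kappa_score(human, model), 2))
```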