Background
Educational and psychological assessments are ubiquitous in school settings around the world. Assessments can range from low-stakes formative activities that guide instructional planning to high-stakes exams that inform individual college entrance. Moreover, international studies routinely administer standardized assessments as a way of rank ordering educational systems globally. Researchers from multiple disciplines use educational and psychological assessments as part of program evaluation activities.
Irrespective of their specific use, virtually all educational and psychological assessments involve the presentation of items of varied difficulty levels to ascertain individual differences in knowledge (e.g., math achievement) and/or ability (e.g., IQ tests). Assessments are typically administered in quiet settings using standardized instructions and protocols to provide the greatest assurance that a student’s performance represents their optimal knowledge/ability. This traditional approach to assessment assumes that individual and situational differences that may contribute to a student’s test performance (e.g., test anxiety, fatigue) are ignorable and/or can be mitigated (e.g., through rapport building with assessors, judicious decisions about test length, the provision of breaks). Contrary to this assumption, there is consistent evidence that test performance reflects an unknown mix of knowledge/ability and effort, especially in low-stakes tests (Duckworth et al., 2011). Here, “effort” refers to a range of motivational, dispositional, and energetic factors that contribute to student performance and that are distinct from knowledge/ability.
The conflation of knowledge/ability with effort undermines the use of test data for decision making. For example, a variety of international exams are used to facilitate between-country comparisons in academic achievement and problem-solving skills (e.g., Programme for International Student Assessment). Whereas these exams have low stakes for students, they have high stakes for governments and policy makers, who use these data to inform educational and economic policy. Students routinely endorse low levels of effort when completing these tests, and the degree of effort varies across countries (Akyol et al., 2021).
Similarly, elementary school educators routinely conduct universal testing to identify students who are eligible to participate in gifted and accelerated learning programs (Carman et al., 2018). Whereas this testing is perceived as having low stakes by children, it is perceived as high stakes by their parents. To the extent that students vary in the amount of effort they put forth (which is typically unknown), decisions about which students receive enrichment activities are potentially biased and inequitable.
Two general approaches have been adopted to address the conflation of effort and knowledge/ability in test scores (Rios, 2021; Wise & DeMars, 2005). One approach involves increasing student effort to engage in testing. Strategies for improving effort include increasing the perceived stakes of testing (e.g., via motivational interviewing), incentivizing performance, and providing clear feedback. In low-stakes assessment settings, experimental application of such strategies (particularly the provision of external incentives and increasing relevancy) provides benefits to both test effort and test performance (Rios, 2021). A second approach involves identifying students who put forth low effort, such as through self-reported measures like the Student Opinion Scale or by identifying low performance relative to expectations (Steedle, 2014). Excluding these students' data from analyses (sometimes referred to as “motivational filtering”) is argued to improve the validity of test scores (Wise & DeMars, 2010).
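As a rough illustration of motivational filtering, the sketch below drops records whose self-reported effort falls below a cutoff and compares mean scores before and after. The records, the 1–5 effort scale, and the cutoff of 3.0 are hypothetical values chosen for illustration; they are not drawn from the cited studies.

```python
from statistics import mean


def motivational_filter(records, cutoff):
    """Keep only records whose self-reported effort is at or above the cutoff."""
    return [r for r in records if r["effort"] >= cutoff]


# Hypothetical data: a test score and a self-reported effort rating
# (assumed 1-5 scale) for four students.
records = [
    {"student": "A", "score": 78, "effort": 4.5},
    {"student": "B", "score": 41, "effort": 1.5},
    {"student": "C", "score": 83, "effort": 4.0},
    {"student": "D", "score": 52, "effort": 2.0},
]

# The cutoff is an assumption; in practice it would need to be justified.
kept = motivational_filter(records, cutoff=3.0)

mean_all = mean(r["score"] for r in records)
mean_kept = mean(r["score"] for r in kept)
print(f"Mean score, all students: {mean_all}")
print(f"Mean score, after filtering: {mean_kept}")
```

Because the filter depends entirely on the self-reported rating, any error in that rating carries over directly to the decision about whose data are excluded.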
Researchers have disproportionately relied on subjective self-reports to measure effort during testing (David et al., 2024), despite concerns about the validity and reliability of this approach (Naismith & Cavalcanti, 2015; Vanhove et al., 2016). Although directly asking students how much effort they put forth during a test appears straightforward, it makes the strong assumption that all students are equally proficient at introspection. Moreover, it is unclear when in development children can make reliable and valid inferences about their own effort. Although response time metrics may provide a more objective index of effort, they are limited to a subset of tests, conditions (computerized assessment), and item types.
Researchers from multiple disciplines have used wearable sensors to measure task engagement, mental effort, and fatigue (Adão Martins et al., 2021). A unifying idea is that individuals exhibit changes in central and autonomic nervous system activity when they encounter a mentally challenging task. That is, engaging in cognitively effortful activity has multiple biological “signatures” (e.g., changes in the amount of oxygenated blood flow to the brain or in sweat gland activity in the skin). Much of this research has focused on occupational settings, including identifying the onset of cognitive fatigue to prevent workplace injuries and to promote workplace productivity (Giorgi et al., 2021; Krämer et al., 2022). While many methods show promise, none are ideally suited for widespread use in educational and psychological assessment. For example, many of these methods require specialized equipment, which can be costly, and they often require substantial technical expertise related to signal processing. The use of wearable sensors in educational and psychological assessments also runs the risk of eliciting reactivity in students, thereby altering their performance. As we will elaborate, thermal imaging is a new approach that leverages the general promise of using sensors to objectively measure cognitive effort, while overcoming key limitations of existing methods (i.e., low-cost thermal cameras are available; thermal imaging does not involve any direct sensor contact with students; automated methods for acquiring and scoring of thermal imaging data are being developed).
Thermal Imaging—Overview
Thermal imaging is a measure of infrared (IR) radiation emitted over a three-dimensional space that is rendered into a two-dimensional image. Thermal imaging camera sensors detect IR electromagnetic energy that is naturally released in the environment in the form of heat. Thermal resolution is typically in the hundredths of a degree Celsius, with improving camera resolution and decreasing cost available as technology advances (Cardone & Merla, 2017). Thermal imaging has been used for industrial (e.g., detecting heat loss in buildings), agricultural (e.g., detecting plant disease), and medical (e.g., diagnostics) purposes (Merla & Romani, 2006; Vadivambal & Jayas, 2011). In medicine, a key insight has been that collecting a series of sequential thermal images can be used to generate quantitative measures of functional processes within the body (i.e., temporal changes in heat) and thereby provide insight into physiological activity. For example, thermal imaging has been used clinically to screen for tumors based on differing heat patterns that accompany changes in vasculature or metabolic activity (Kateb et al., 2009). IR measurements of skin temperature reflect electromagnetic energy up to a few hundred microns deep into the body’s surface and are sensitive to factors such as blood flow (Or & Duffy, 2007).
Psychophysiologists have used thermal imaging to noninvasively measure autonomic nervous system activity, with special interest in dynamic changes in heat in the facial region (Cardone et al., 2021). The continuous recording of thermal images during an activity is known as functional infrared thermal imaging (FITI). A typical FITI study occurs in an indoor (often laboratory) setting with stable temperature and humidity and involves a camera with a clear view of a person’s face, often at a distance of 1–3 m. Before the visit, participants may be discouraged from consuming substances that may affect vasomotor control (e.g., caffeine, alcohol); in some cases, the use of facial moisturizers or makeup may also be discouraged (Cardone & Merla, 2017; Ioannou et al., 2014). Participants in a FITI procedure typically engage in a baseline or resting task, which is followed by a cognitively challenging or emotionally eliciting task. Thermal imaging cameras record sequential images of the face that are converted to temperature-coded maps. Temperature data are available for each pixel in each image across the entire span of time in which a recording was made. These time series data are typically checked for data quality (e.g., motion artifact) and may be preprocessed (e.g., smoothed to remove high-frequency noise).
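The smoothing step mentioned above can be sketched with a centered moving average, one simple way to attenuate high-frequency noise. This is an illustrative choice, not the filter used in any of the cited studies; the function name, the window length, and the temperature values are assumptions.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks at the edges of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical temperature readings (deg C) for one pixel across six frames;
# the fifth value mimics a brief spike such as a motion artifact.
pixel_series = [34.0, 34.2, 33.8, 34.0, 35.0, 34.0]
smoothed = moving_average(pixel_series, window=3)
print([round(t, 2) for t in smoothed])
```

A wider window removes more noise but also blurs rapid temperature changes, so the window length trades off noise reduction against temporal precision.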
Regions of interest (ROIs), which include an array of pixels at specific locations on the face, are often defined. The ROIs (e.g., nose, forehead, periorbital regions) differ in their underlying vasculature, with changes in skin temperature reflecting changes in blood flow (Ioannou et al., 2014). Although ROIs can be manually constructed, computer vision methods are often used to automate the detection of ROIs (Cardone et al., 2021). The average temperature value for an ROI is computed for each frame of video, which yields time series data that represent temporal changes in skin temperature in each ROI and that reflect dynamic changes in blood flow. Ioannou et al. (2014) summarized the patterns of temperature changes in specific ROIs as they relate to a range of emotions. In addition to general emotional state, some FITI studies have described how patterns of temperature at multiple ROIs can be used to detect specific emotions and behaviors, including lying and deceit (Moliné et al., 2018; Pavlidis et al., 2002), as well as joy and guilt in infants and young children (Ioannou et al., 2013; Nakanishi & Imai-Matsumura, 2008).
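A minimal sketch of the ROI averaging step follows, assuming each frame is a small grid of per-pixel temperatures and the ROI is a fixed rectangle. The 3 x 3 frames, the ROI coordinates, and the function name are hypothetical; real pipelines would track the ROI as the face moves.

```python
def roi_mean_series(frames, roi):
    """Return the mean temperature inside a rectangular ROI for each frame.

    frames: list of 2D lists (rows of per-pixel temperatures in deg C)
    roi: (row_start, row_end, col_start, col_end), end indices exclusive
    """
    r0, r1, c0, c1 = roi
    series = []
    for frame in frames:
        pixels = [t for row in frame[r0:r1] for t in row[c0:c1]]
        if not pixels:
            raise ValueError("ROI does not overlap the frame")
        series.append(sum(pixels) / len(pixels))
    return series


# Two hypothetical 3 x 3 thermal frames; the warmer middle-row pixels
# stand in for a nose ROI.
frames = [
    [[30.0, 30.0, 30.0], [30.0, 34.0, 36.0], [30.0, 30.0, 30.0]],
    [[30.0, 30.0, 30.0], [30.0, 33.0, 35.0], [30.0, 30.0, 30.0]],
]
nose_roi = (1, 2, 1, 3)  # assumed location: row 1, columns 1-2

nose_series = roi_mean_series(frames, nose_roi)
print(nose_series)
```

Each frame contributes one value, so a recording of N frames yields an ROI time series of length N.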
A small number of studies have used FITI methods to index cognitive effort. These studies typically involve a laboratory setup wherein a thermal camera is positioned to continuously record temperature data over a participant’s face while they perform multiple tasks (see Figure 1 for an example). Dynamic changes in skin temperature at specific ROIs are understood to reflect changes in blood flow that are under control of the autonomic nervous system and to correlate with mental effort (Marinescu et al., 2018; Or & Duffy, 2007). Most studies have focused on the nose region. Changes in nose temperatures have been observed when participants engage in cognitively demanding tasks. The emphasis is on within-person changes in nose temperature (i.e., changes in temperature from a baseline condition to task engagement), not on between-person differences in absolute nose temperature. Many studies have reported that (at least a subset of) participants exhibit reductions in nose temperatures when engaging in cognitively challenging tasks. Reductions in nose temperature reflect activation of the sympathetic nervous system in response to a challenge, which includes decreased blood flow to the periphery and increased blood flow to the brain (Or & Duffy, 2007). That is, cognitive effort has an associated “physiological signature,” including changes in nose temperature.
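The within-person contrast described above can be sketched as the mean ROI temperature during the task minus the mean during baseline. The series values, the number of baseline frames, and the function name are hypothetical; published studies differ in how they define the baseline and task windows.

```python
from statistics import mean


def task_minus_baseline(series, n_baseline):
    """Mean temperature during the task minus mean temperature at baseline.

    series: ROI temperatures (deg C), baseline frames first, task frames after
    n_baseline: number of leading frames that belong to the baseline period
    """
    if not 0 < n_baseline < len(series):
        raise ValueError("need at least one baseline frame and one task frame")
    baseline = mean(series[:n_baseline])
    task = mean(series[n_baseline:])
    return task - baseline


# Hypothetical nose ROI series for one participant: three baseline frames
# followed by three task frames.
nose_series = [34.0, 34.0, 34.0, 33.5, 33.0, 33.0]

change = task_minus_baseline(nose_series, n_baseline=3)
print(f"Within-person change: {change:.2f} deg C")
```

A negative value indicates that the nose cooled during the task relative to the participant's own baseline, which is the direction most of the reviewed studies report.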
Table 1 summarizes key features of FITI studies that have broadly considered mental effort and the nose as an ROI. Notably, all studies involved a small number of adult participants and were exclusively conducted in controlled settings. Of the 12 studies in Table 1, 10 reported decreases in nose temperature when participants engaged in cognitively challenging activities, such as a Stroop task, Trier Social Stress Test, or mental arithmetic task. In one study, this effect varied by task (Moliné et al., 2018). That is, when subjects participated in a realistic job interview, they demonstrated a decrease in nose temperature (average reduction of 2.4°C relative to the anticipatory phase); however, when they participated in a simulated activity, they demonstrated an increase in nose temperature (average increase of 0.9°C relative to the anticipatory phase). Among the two studies that did not observe expected temperature changes, one reported a mean increase or no changes in nose temperature during verbal memory tasks (Cardone et al., 2022). The other study failed to find a consistent thermal pattern (Stemberger et al., 2010).
Features of tasks and participant perceptions of task difficulty were often invoked to help explain individual differences in facial temperature that are evident within studies. Multiple studies also collected subjective measures of mental workload or physiological measures (e.g., galvanic skin response). In general, subjective ratings and physiological measures helped to validate the experimental paradigms and/or served as criterion measures for changes in facial temperature. Although most studies have focused on task-evoked changes in nose temperature, changes in temperature occur on the order of seconds (Abdelrahman et al., 2017). Hence, thermal imaging methods could conceivably be used to track item-by-item changes in effort.
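One way item-by-item tracking might look is to split an ROI time series at the frames where each test item was presented and average within each window. This is a sketch under stated assumptions: the onset indices, the temperature values, and the function name are hypothetical, and it presumes that frame timestamps can be aligned with item presentation times.

```python
def item_means(series, onsets):
    """Mean ROI temperature within each item window.

    series: ROI temperatures (deg C), one value per frame
    onsets: strictly increasing frame indices at which each item was
            presented; each window runs until the next onset or the
            end of the series
    """
    bounds = list(onsets) + [len(series)]
    means = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        window = series[start:end]
        if not window:
            raise ValueError("each item window must contain at least one frame")
        means.append(sum(window) / len(window))
    return means


# Hypothetical nose ROI series covering three test items, two frames each.
series = [34.0, 34.0, 33.0, 33.0, 32.0, 32.0]
onsets = [0, 2, 4]

per_item = item_means(series, onsets)
print(per_item)
```

The resulting per-item values could then be compared with item difficulty or item-level performance.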
Thermal Imaging—Opportunities for Educational and Psychological Assessment?
Thermal imaging methods represent an untested but promising approach for objectively measuring student effort during cognitive and academic testing. Thermal imaging methods are unobtrusive and should be relatively easy to incorporate into individualized assessments. Moreover, many school settings share features similar to laboratory settings (e.g., stable temperature, humidity, lighting). To the extent that thermal imaging methods can yield objective indicators of student effort during academic or cognitive testing, this could improve the quality of decisions based on these data. There may also be opportunities to use thermal imaging metrics to improve test development, similar to the current use of response time metrics (Wise, 2017). Finally, although we have emphasized the use of thermal imaging metrics in conjunction with test performance, the ability to objectively measure effort is interesting on its own. For example, thermal imaging metrics may be relevant in studies that test the extent to which certain childhood diseases impact children’s cognitive function and academic performance via cognitive fatigue (Kyriklaki et al., 2019; Milner et al., 2020).
At least three challenges complicate the current incorporation of thermal imaging methods into educational and psychological assessments. First, we need automated (versus manual) methods for facial and ROI detection that are appropriate for use with children from diverse backgrounds. It is unclear how well existing facial and ROI detection algorithms, which were developed using adult faces, will work with children of varied ages (differing face sizes) and backgrounds (skin tone). Second, thermal cameras range from a few hundred to a few thousand dollars each. Establishing the reliability and validity of lower cost cameras for detecting small changes in facial temperature will address potential financial barriers to this work. Third, students are tested in multiple formats (e.g., paper/pencil versus computerized; individual versus group administration). Thermal imaging methods will be most easily implemented in the context of individualized, computer-based assessments (facilitating linkages between individual changes in face temperature with task performance).
Conclusion
The prospect that thermal imaging methods could be used to objectively measure the amount of effort students put forth during testing is provocative and has potentially widespread implications. Although thermal imaging methods are not a new technology, their application to educational and psychological assessments is. These methods have the potential to overcome specific limitations of measuring student effort using subjective self-reports or wearable sensors. Thermal imaging methods may open new possibilities for both student assessment (e.g., determining when to discontinue testing) and personalized learning protocols (e.g., dynamically adjusting the difficulty level of content presented in educational apps). We hope that this brief helps to motivate more widespread interest in using thermal imaging methods in educational and psychological assessment.
Data Availability Statement
In this publication, we do not report on, analyze, or generate any data.
RTI Press Associate Editor: Jonathan Stern