Using Thermal Imaging to Measure Mental Effort: Does the Nose Know?

Michael T. Willoughby; Timothy Slade; Pooja Gaur; Amanda Wylie; Carmen Strigel

doi:10.3768/rtipress.2025.rb.0041.2503

Key Points

Student test performance reflects a mix of ability (knowledge) and effort.
Difficulties in defining and measuring the degree of effort that students put forth in any assessment undermine the optimal use of assessment data.
We describe a literature that has considered the use of thermal imaging technology as a strategy for objectively measuring effort during testing.
Thermal imaging methods have the potential to overcome specific limitations of measuring student effort using subjective self-reports or wearable sensors.
Thermal imaging methods may open new possibilities for both student assessment and personalized learning protocols.

Background

Educational and psychological assessments are ubiquitous in school settings around the world. Assessments can range from low-stakes formative activities that guide instructional planning to high-stakes exams that inform individual students' college admissions. Moreover, international studies routinely administer standardized assessments as a way of rank ordering educational systems globally. Researchers from multiple disciplines use educational and psychological assessments as part of program evaluation activities.

Irrespective of their specific use, virtually all educational and psychological assessments involve the presentation of items of varied difficulty levels to ascertain individual differences in knowledge (e.g., math achievement) and/or ability (e.g., IQ tests). Assessments are typically administered in quiet settings using standardized instructions and protocols to provide the greatest assurance that a student’s performance represents their optimal knowledge/ability. This traditional approach to assessment assumes that individual and situational differences that may contribute to a student’s test performance (e.g., test anxiety, fatigue) are ignorable and/or can be mitigated (e.g., through rapport building with assessors, judicious decisions about test length, the provision of breaks). Contrary to this assumption, there is consistent evidence that test performance reflects an unknown mix of knowledge/ability and effort, especially in low-stakes tests (Duckworth et al., 2011). Here, “effort” refers to a range of motivational, dispositional, and energetic factors that contribute to student performance and that are distinct from knowledge/ability.

The conflation of knowledge/ability with effort undermines the use of test data for decision making. For example, a variety of international exams are used to facilitate between-country comparisons in academic achievement and problem-solving skills (e.g., the Programme for International Student Assessment). Whereas these exams have low stakes for students, they have high stakes for governments and policy makers, who use these data to inform educational and economic policy. Students routinely endorse low levels of effort when completing these tests, and the degree of effort varies across countries (Akyol et al., 2021).

Similarly, elementary school educators routinely conduct universal testing to identify students who are eligible to participate in gifted and accelerated learning programs (Carman et al., 2018). Whereas this testing is perceived as having low stakes by children, it is perceived as high stakes by their parents. To the extent that students vary in the amount of effort they put forth (which is typically unknown), decisions about which students receive enrichment activities are potentially biased and inequitable.

Two general approaches have been adopted to address the conflation of effort and knowledge/ability in test scores (Rios, 2021; Wise & DeMars, 2005). One approach involves increasing the effort students put into testing. Strategies for improving effort include increasing the perceived stakes of testing (e.g., via motivational interviewing), incentivizing performance, and providing clear feedback. In low-stakes assessment settings, experimental application of such strategies (particularly the provision of external incentives and increasing relevancy) benefits both test effort and test performance (Rios, 2021). A second approach involves identifying students who put forth low effort, either through self-report measures such as the Student Opinion Scale or by identifying performance that is low relative to expectations (Steedle, 2014). Excluding these students' data from analyses (sometimes referred to as "motivational filtering") is argued to improve the validity of test scores (Wise & DeMars, 2010).
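To make motivational filtering concrete, the following minimal Python sketch drops students whose effort score falls below a cutoff before computing a group mean. The StudentRecord type, the motivational_filter and mean_score helpers, the cutoff, and the data are all hypothetical illustrations; in practice the effort score might come from a self-report instrument such as the Student Opinion Scale, and any cutoff would need to be justified for that instrument.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    student_id: str
    test_score: float
    effort_score: float  # hypothetical: e.g., a self-reported effort subscale total


def motivational_filter(records, effort_cutoff):
    """Split records into (retained, excluded) using a hypothetical effort cutoff.

    Students whose effort score falls below the cutoff are excluded from
    analyses; everyone else is retained.
    """
    retained = [r for r in records if r.effort_score >= effort_cutoff]
    excluded = [r for r in records if r.effort_score < effort_cutoff]
    return retained, excluded


def mean_score(records):
    """Mean test score for a non-empty list of records."""
    return sum(r.test_score for r in records) / len(records)


# Illustrative (made-up) data for three students.
records = [
    StudentRecord("s1", 80.0, 20.0),
    StudentRecord("s2", 40.0, 8.0),
    StudentRecord("s3", 70.0, 18.0),
]
retained, excluded = motivational_filter(records, effort_cutoff=12.0)
print(mean_score(records), mean_score(retained))  # retained mean is 75.0
```

Because the filtered group excludes the lowest-effort student, its mean is higher than the unfiltered mean; whether that adjustment improves validity depends on how well the effort score captures effort in the first place.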

Researchers have disproportionately relied on subjective self-reports to measure effort during testing (David et al., 2024), despite concerns about the validity and reliability of this approach (Naismith & Cavalcanti, 2015; Vanhove et al., 2016). Although directly asking students how much effort they put forth during a test appears straightforward, it makes the strong assumption that all students are equally proficient at introspection. Moreover, it is unclear when in development children can make reliable and valid inferences about their own effort. Although response time metrics may provide a more objective index of effort, they are limited to a subset of tests, conditions (computerized assessment), and item types.
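Response time metrics are typically computed from item-level timing logs. As one illustration (a sketch, not a method prescribed in this brief), the code below flags each response as "solution behavior" when its response time meets an item-specific threshold and summarizes a student's effort as the proportion of items that meet it. The function names, thresholds, and response times are hypothetical.

```python
def solution_behavior(response_times, thresholds):
    """Flag each item response as solution behavior (1) or a rapid response (0).

    A response counts as solution behavior when its time (seconds) is at or
    above that item's threshold. Thresholds here are hypothetical.
    """
    if len(response_times) != len(thresholds):
        raise ValueError("need exactly one threshold per item")
    return [1 if rt >= th else 0 for rt, th in zip(response_times, thresholds)]


def response_time_effort(response_times, thresholds):
    """Proportion of items (0 to 1) answered with solution behavior."""
    flags = solution_behavior(response_times, thresholds)
    return sum(flags) / len(flags)


# Illustrative data for one student on a five-item computerized test.
times = [12.4, 1.1, 8.0, 0.9, 15.2]
cutoffs = [3.0, 3.0, 5.0, 5.0, 5.0]
rte = response_time_effort(times, cutoffs)
print(rte)  # 3 of 5 items meet their threshold -> 0.6
```

The reliance on logged response times is why such metrics are tied to computerized administration, and item-specific thresholds are harder to set for item types whose reasonable response times vary widely.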

Researchers from multiple disciplines have used wearable sensors to measure task engagement, mental effort, and fatigue (Adão Martins et al., 2021). A unifying idea is that individuals exhibit changes in central and autonomic nervous system activity when they encounter a mentally challenging task. That is, engaging in cognitively effortful activity has multiple biological "signatures" (e.g., changes in the amount of oxygenated blood flow to the brain or in sweat gland activity in the skin). Much of this research has focused on occupational settings, including identifying the onset of cognitive fatigue to prevent workplace injuries and to promote workplace productivity (Giorgi et al., 2021; Krämer et al., 2022). While many methods show promise, none are ideally suited for widespread use in educational and psychological assessment. For example, many of these methods require specialized equipment, which can be costly, and they often require substantial technical expertise related to signal processing. The use of wearable sensors in educational and psychological assessments also runs the risk of eliciting reactivity in students, thereby altering their performance. As we will elaborate, thermal imaging is a new approach that leverages the general promise of using sensors to objectively measure cognitive effort, while overcoming key limitations of existing methods (i.e., low-cost thermal cameras are available; thermal imaging does not involve any direct sensor contact with students; automated methods for acquiring and scoring thermal imaging data are being developed).

Thermal Imaging—Overview

Thermal imaging is a measure of infrared (IR) radiation emitted over a three-dimensional space that is rendered into a two-dimensional image. Thermal imaging camera sensors detect IR electromagnetic energy that is naturally released in the environment in the form of heat. Thermal resolution is typically in the hundredths of a degree Celsius, and camera resolution continues to improve while costs decrease as the technology advances (Cardone & Merla, 2017). Thermal imaging has been used for industrial (e.g., detecting heat loss in buildings), agricultural (e.g., detecting plant disease), and medical (e.g., diagnostics) purposes (Merla & Romani, 2006; Vadivambal & Jayas, 2011). In medicine, a key insight has been that a series of sequential thermal images can be used to generate quantitative measures of functional processes within the body (i.e., temporal changes in heat) and thereby provide insight into physiological activity. For example, thermal imaging has been used clinically to screen for tumors based on differing heat patterns that accompany changes in vasculature or metabolic activity (Kateb et al., 2009). IR measurements of skin temperature reflect electromagnetic energy emitted from up to a few hundred microns below the skin surface and are sensitive to factors such as blood flow (Or & Duffy, 2007).

Psychophysiologists have used thermal imaging to noninvasively measure autonomic nervous system activity, with special interest in dynamic changes in heat in the facial region (Cardone et al., 2021). The continuous recording of thermal images during an activity is known as functional infrared thermal imaging (FITI). A typical FITI study occurs in an indoor (often laboratory) setting with stable temperature and humidity and involves a camera with a clear view of a person’s face, often at a distance of 1–3 m. Before the visit, participants may be discouraged from consuming substances that may affect vasomotor control (e.g., caffeine, alcohol); in some cases, the use of facial moisturizers or makeup may also be discouraged (Cardone & Merla, 2017; Ioannou et al., 2014). Participants in a FITI procedure typically engage in a baseline or resting task, which is followed by a cognitively challenging or emotionally eliciting task. Thermal imaging cameras record sequential images of the face that are converted to temperature-coded maps. Temperature data are available for each pixel in each image across the entire span of time in which a recording was made. These time series data are typically checked for data quality (e.g., motion artifact) and may be preprocessed (e.g., smoothed to remove high-frequency noise).
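The quality-check and smoothing steps can be sketched for a single temperature time series as follows. This minimal example assumes that brief, implausibly large deviations from the local median are motion artifacts and replaces them with that median (a generic median-based despiker, swapped in for illustration rather than taken from the FITI studies cited here), then applies a centered moving average to suppress high-frequency noise. The function names, window sizes, the 0.5 °C threshold, and the data are hypothetical.

```python
from statistics import median


def despike(series, half_window, max_dev):
    """Replace samples that deviate from their local median by more than max_dev (°C).

    Returns the cleaned series and the indices that were replaced. Medians are
    computed on the original series so one spike does not distort its neighbors.
    """
    cleaned, flagged = [], []
    n = len(series)
    for i, x in enumerate(series):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        m = median(series[lo:hi])
        if abs(x - m) > max_dev:
            cleaned.append(m)
            flagged.append(i)
        else:
            cleaned.append(x)
    return cleaned, flagged


def moving_average(series, half_window):
    """Centered moving average; the window shrinks at the edges of the series."""
    n = len(series)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = series[lo:hi]
        out.append(sum(window) / len(window))
    return out


# Hypothetical per-frame temperatures with one motion-like spike at index 3.
raw = [34.1, 34.1, 34.2, 36.9, 34.2, 34.1, 34.0]
cleaned, flagged = despike(raw, half_window=2, max_dev=0.5)
smoothed = moving_average(cleaned, half_window=1)
print(flagged)  # [3]
```

Replacing rather than deleting flagged frames keeps the series aligned with the video timeline; an analyst might instead drop heavily flagged segments if artifacts are frequent.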

Regions of interest (ROIs), which include an array of pixels at specific locations on the face, are often defined. The ROIs (e.g., nose, forehead, periorbital regions) differ in their underlying vasculature, with changes in skin temperature reflecting changes in blood flow (Ioannou et al., 2014). Although ROIs can be manually constructed, computer vision methods are often used to automate the detection of ROIs (Cardone et al., 2021). The average temperature of each ROI is computed for each frame of video, yielding a time series for each ROI whose temporal changes in skin temperature reflect dynamic changes in blood flow. Ioannou and colleagues summarized the patterns of temperature changes in specific ROIs as they relate to a range of emotions (Ioannou et al., 2014). In addition to general emotional state, some FITI studies have described how patterns of temperature at multiple ROIs can be used to detect specific emotions and behaviors, including lying and deception (Moliné et al., 2018; Pavlidis et al., 2002), as well as joy and guilt in infants and young children (Ioannou et al., 2013; Nakanishi & Imai-Matsumura, 2008).
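Computing an ROI time series reduces each frame to one number: the mean temperature of the pixels inside the ROI. The sketch below assumes frames are already calibrated to degrees Celsius, stored as nested lists of pixel rows, and that the ROI is a fixed rectangle; the roi_mean and roi_time_series names and the toy frames are hypothetical.

```python
def roi_mean(frame, roi):
    """Average temperature (°C) over a rectangular ROI in one thermal frame.

    frame is a list of pixel rows; roi = (top, left, bottom, right) uses
    half-open bounds, like Python slicing.
    """
    top, left, bottom, right = roi
    pixels = [t for row in frame[top:bottom] for t in row[left:right]]
    if not pixels:
        raise ValueError("ROI contains no pixels")
    return sum(pixels) / len(pixels)


def roi_time_series(frames, roi):
    """One ROI-average temperature per frame, i.e., the ROI's time series."""
    return [roi_mean(frame, roi) for frame in frames]


# Two hypothetical 3 x 3 frames; the "nose" ROI is the bottom-right 2 x 2 block.
frames = [
    [[30.0, 30.0, 30.0], [30.0, 34.0, 34.0], [30.0, 34.0, 34.0]],
    [[30.0, 30.0, 30.0], [30.0, 33.5, 33.5], [30.0, 33.5, 33.5]],
]
nose_roi = (1, 1, 3, 3)
nose_series = roi_time_series(frames, nose_roi)
print(nose_series)  # [34.0, 33.5]
```

A fixed rectangle only works if the head stays still. Automated pipelines of the kind described above would instead locate the ROI anew in each frame, which in this sketch would mean passing one ROI box per frame.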

A small number of studies have used FITI methods to index cognitive effort. These studies typically involve a laboratory setup wherein a thermal camera is positioned to continuously record temperature data over a participant's face while they perform multiple tasks (see Figure 1 for an example). Dynamic changes in skin temperature at specific ROIs are understood to reflect changes in blood flow that are under the control of the autonomic nervous system and to correlate with mental effort (Marinescu et al., 2018; Or & Duffy, 2007). Most studies have focused on the nose region. Changes in nose temperatures have been observed when participants engage in cognitively demanding tasks. The emphasis is on within-person changes in nose temperature (i.e., changes in temperature from a baseline condition to task engagement), not on between-person differences in absolute nose temperature. Many studies have reported that (at least a subset of) participants exhibit reductions in nose temperatures when engaging in cognitively challenging tasks. Reductions in nose temperature reflect activation of the sympathetic nervous system in response to a challenge, which includes decreased blood flow to the periphery and increased blood flow to the brain (Or & Duffy, 2007). That is, cognitive effort has an associated "physiological signature," including changes in nose temperature.
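The emphasis on within-person change can be expressed as a simple difference score: each participant's mean nose temperature during the task minus that same participant's mean during baseline. In this sketch, the function name and the two participants' temperatures are hypothetical; a real analysis would compute these means from preprocessed ROI time series.

```python
from statistics import mean


def nose_temperature_change(baseline, task):
    """Within-person change: mean task minus mean baseline nose temperature (°C).

    Negative values indicate cooling of the nose during the task, the pattern
    most studies in this literature associate with cognitive effort.
    """
    return mean(task) - mean(baseline)


# Hypothetical (baseline, task) nose temperatures for two participants.
participants = {
    "p01": ([34.6, 34.5, 34.6, 34.5], [34.1, 33.9, 33.8, 33.8]),
    "p02": ([32.0, 32.1, 32.0, 32.1], [32.0, 32.1, 32.1, 32.0]),
}
for pid, (baseline, task) in participants.items():
    delta = nose_temperature_change(baseline, task)
    print(pid, round(delta, 2))
```

Participant p02 has a lower absolute nose temperature than p01, but each person is compared only with their own baseline, so that between-person difference plays no role: p01 shows the cooling pattern described above (about -0.65 °C), while p02 shows essentially no change.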

Figure 1. Exemplar depiction of a functional infrared thermal imaging setup

Source: Image generated by DALL-E 3, September 4, 2024, from the prompt “Generate an image of a student working on a laptop while being filmed with a thermal imaging camera. Limit the image to a single person. Require that the thermal image be visible on the camera screen.”

Table 1 summarizes key features of FITI studies that have broadly considered mental effort and the nose as an ROI. Notably, all studies involved a small number of adult participants and were exclusively conducted in controlled settings. Of the 12 studies in Table 1, 10 reported decreases in nose temperature when participants engaged in cognitively challenging activities, such as a Stroop task, Trier Social Stress Test, or mental arithmetic task. In one of these studies, the effect varied by task (Moliné et al., 2018): participants in a realistic job interview showed a decrease in nose temperature (average reduction of 2.4°C relative to the anticipatory phase), whereas participants in a simulated interview showed an increase (average increase of 0.9°C relative to the anticipatory phase). Of the two studies that did not observe the expected temperature changes, one reported an increase or no change in nose temperature, depending on the verbal memory task (Cardone et al., 2022). The other study failed to find a consistent thermal pattern (Stemberger et al., 2010).

Table 1. Selected FITI studies of cognitive effort or mental workload

Study	ROIs	Experimental paradigms (N)	Nose temperature changes related to cognitive task	Other ROIs related to cognitive task	General takeaway: mental workload and nose temperature
Abdelrahman Y, Velloso E, Dingler T, Schmidt A, Vetere F. Cognitive heat: exploring the usage of thermal imaging to unobtrusively estimate cognitive load. Proc ACM Interact Mob Wearable Ubiquitous Technol 2017;1(3):1–20. https://doi.org/10.1145/3130898	Nose and forehead	Reading task with four levels of difficulty (N = 12) Stroop test with four levels of difficulty (N = 24)	For both tasks: Increasing task difficulty associated with decreasing nose temperature.	For both tasks: Increasing task difficulty associated with increasing forehead temperature.	Decrease in nose temperature
Cardone D, Perpetuini D, Filippini C, Mancini L, Nocco S, Tritto M, et al. Classification of drivers' mental workload levels: comparison of machine learning methods based on ECG and infrared thermal signals. Sensors 2022;22(19):7300. https://doi.org/10.3390/s22197300	Nose and glabella	Simulated driving while completing digit span test (DST) and Rey auditory verbal learning test (RAVLT) (N = 26)	For DST: No differences in nose temperature between task conditions (forward and backward span). For RAVLT: Nose temperature increased between immediate recall and delayed recall and between immediate recall and recovery.	For both tasks: No differences in mean temperature of glabella between task conditions.	Task-dependent increase or no change in nose temperature
Engert V, Merla A, Grant JA, Cardone D, Tusche A, Singer T. Exploring the use of thermal infrared imaging in human stress research. PLoS One 2014;9(3):e90782. https://doi.org/10.1371/journal.pone.0090782	Nose, corrugator, forehead, periorbital, perioral, and chin regions	Trier Social Stress Test (N = 15)	Relative to baseline, nose temperature decreased during anticipation and increased during recovery.	Corrugator and chin temperature decreased during anticipation; perioral temperature increased during recovery.	Decrease in nose temperature
Gioia F, Pascali MA, Greco A, Colantonio S, Scilingo EP. Discriminating stress from cognitive load using contactless imaging devices. In: 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2021 Oct 31-Nov 4; virtual. New York (NY): IEEE; 2021. p. 608–11. https://doi.org/10.1109/EMBC46164.2021.9630860	Nose, nasal septum, forehead and chin (C,L,R each), cheeks, maxillary, & periorbital (L&R each)	Stroop test and mental arithmetic task (N = 17)	For both tasks, nose temperature increased during rest and decreased during stress.	For both tasks: no differences in forehead temperature. Temperature change observed in various other facial ROIs during mental workload task (e.g., L forehead, L maxillary).	Decrease in nose temperature
Gioia F, Greco A, Callara AL, Scilingo EP. Toward a contactless stress classification using thermal imaging. Sensors 2022;22(3):976. https://doi.org/10.3390/s22030976	Nose, nasal septum, forehead and chin (C,L,R each), cheeks, maxillary, & periorbital (L&R each)	Stroop test (N = 25)	Relative to rest, nose temperature decreased during task.	Mean nasal septum and R periorbital temperatures decreased during Stroop relative to rest. L cheek temperature increased during Stroop relative to rest.	Decrease in nose temperature
Gioia F, Nardelli M, Scilingo EP, Greco A. Autonomic regulation of facial temperature during stress: a cross-mapping analysis. Sensors (Basel) 2023;23(14):6403. https://doi.org/10.3390/s23146403	Nose, forehead, cheeks	Stroop test (N = 30)	Relative to rest, nose temperature decreased during task.	No significant differences in forehead and cheek temperature between Stroop and rest.	Decrease in nose temperature
Kang C, Babski-Reeves K. Detecting mental workload fluctuation during learning of a novel task using thermography. In: Proceedings of the Human Factors and Ergonomics Society 52nd Annual Meeting; 2008 Sep 22-26; New York, NY. Human Factors and Ergonomics Society; 2008. p. 1527–31. https://doi.org/10.1037/e578262012-047	Nose and forehead (maximum temperature)	Alphanumeric task, testing task learning (N = 20)	Relative to baseline, nose temperature decreased during the first math block, then increased across subsequent blocks.	N/A	Decrease in nose temperature
Moliné A, Dominguez E, Salazar-López E, Gálvez-García G, Fernández-Gómez J, De la Fuente J, et al. The mental nose and the Pinocchio effect: thermography, planning, anxiety, and lies. J Invest Psychol Offender Profiling 2018;15(2):234–48. https://doi.org/10.1002/jip.1505	Nose, entire forehead, eye region, mouth region, and cheeks	Zoo Map EF Task (N = 20) Simulated and “real” interview (TSST Anticipatory and Speech phases) (N = 20)	For Zoo Map: Relative to baseline, nose temperature increased during cognitive task. For Interview Task: Relative to anticipatory phase, nose temperature decreased during speech phase for participants in the “real” interview but increased for participants in the simulated interview.	For Zoo Map: Relative to baseline, forehead temperature increased during cognitive task. For Interview Task: Relative to anticipatory phase, forehead temperature increased during speech phase for participants in the simulated interview.	Task-dependent increase or decrease in nose temperature
Or CKL, Duffy VG. Development of a facial skin temperature-based methodology for non-intrusive mental workload measurement. Occup Ergon 2007;7(2):83–94. https://doi.org/10.3233/OER-2007-7202	Nose and forehead	Simulated driving experiments (city, highway, each with and without mental arithmetic task), simulated vs. real driving experiment (N = 33)	For simulated driving experiments: Relative to baseline, nose temperature decreased after experiments. Nose temperature declined as task difficulty increased.	No changes in forehead temperature.	Decrease in nose temperature
Pinti P, Cardone D, Merla A. Simultaneous fNIRS and thermal infrared imaging during cognitive task reveal autonomic correlates of prefrontal cortex activity. Sci Rep 2015;5(1):17471. https://doi.org/10.1038/srep17471	Nose	Arithmetic subtractions (N = 18)	During task, nose temperature decreased with a delay of 10 s from the first stimulus, then returned to initial temperature during recovery phase.	N/A	Decrease in nose temperature
Stemberger J, Allison RS, Schnell T. Thermal imaging as a way to classify cognitive workload. In: Canadian Conference on Computer and Robot Vision; 2010 May 31-Jun 2; Ontario, Canada. New York (NY): IEEE; 2010. p. 231–8. https://doi.org/10.1109/CRV.2010.37	Nose, forehead, eyes, cheeks (L&R), chin	Cognitive stress test, with three levels of difficulty (N = 12)	No consistent temperature change pattern for the nose as a function of workload.	No consistent pattern in facial temperature as a function of workload.	No consistent pattern
Veltman HJ, Vos WW. Facial temperature as a measure of mental workload. In: International Symposium on Aviation Psychology; 2005. p. 777–81	17 total ROIs; nose tip, nose bridge, L&R nose sides of primary interest	Continuous Memory Task with two levels of difficulty (N = 8)	Nose and L&R nose-side temperatures decreased during tasks and increased during rest; for nose and L nose side, temperature decreased more during the higher difficulty task.	No significant changes in forehead temperature. Other significant effects were small and not systematic (not reported).	Decrease in nose temperature

Notes: FITI = functional infrared thermal imaging; ROI = region of interest; C = center; L = left; R = right; N/A = not applicable; all study participants were adults 25–50 years of age.

Features of tasks and participant perceptions of task difficulty were often invoked to help explain individual differences in facial temperature within studies. Multiple studies also collected subjective measures of mental workload or physiological measures (e.g., galvanic skin response). In general, subjective ratings and physiological measures helped to validate the experimental paradigms and/or served as criterion measures for changes in facial temperature. Although most studies have focused on changes in nose temperature across an entire task, changes in temperature occur on the order of seconds (Abdelrahman et al., 2017). Hence, thermal imaging methods could conceivably be used to track item-by-item changes in effort.
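If temperature changes unfold over seconds, an item-level index could be computed by averaging the ROI signal within each item's time window and comparing it with a baseline mean. The sketch below assumes a uniformly sampled series, known item start and end times, and a single lag applied to every window. The lag is motivated by the delay of about 10 s reported by Pinti et al. (2015) in Table 1, but its value here, the function name, and all data are hypothetical.

```python
from statistics import mean


def item_level_change(series, fs, baseline_mean, items, lag_s=0.0):
    """Mean ROI temperature change (°C) relative to baseline for each test item.

    series: ROI temperatures sampled at fs frames per second.
    items: list of (start_s, end_s) times, in seconds, for each item.
    lag_s: hypothetical delay added to every window to allow for the slow
        vascular response before skin temperature changes.
    Windows are truncated at the end of the recording; an item whose
    shifted window lies entirely past the end gets None.
    """
    changes = []
    for start_s, end_s in items:
        lo = int(round((start_s + lag_s) * fs))
        hi = int(round((end_s + lag_s) * fs))
        window = series[lo:hi]
        changes.append(mean(window) - baseline_mean if window else None)
    return changes


# Hypothetical 1-Hz nose-ROI series; the fourth item falls after the recording ends.
series = [34.0, 34.0, 33.8, 33.6, 33.6, 33.9, 34.0, 34.0]
items = [(0, 2), (2, 4), (4, 6), (8, 10)]
changes = item_level_change(series, fs=1, baseline_mean=34.0, items=items, lag_s=1)
print(changes)
```

Short windows contain few samples, so item-level estimates will be noisier than task-level averages, and lagged windows for adjacent items can blend responses to neighboring items.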

Thermal Imaging—Opportunities for Educational and Psychological Assessment?

Thermal imaging methods represent an untested but promising approach for objectively measuring student effort during cognitive and academic testing. Thermal imaging methods are unobtrusive and should be relatively easy to incorporate into individualized assessments. Moreover, many school settings share features similar to laboratory settings (e.g., stable temperature, humidity, lighting). To the extent that thermal imaging methods can yield objective indicators of student effort during academic or cognitive testing, this could improve the quality of decisions based on these data. There may also be opportunities to use thermal imaging metrics to improve test development, similar to the current use of response time metrics (Wise, 2017). Finally, although we have emphasized the use of thermal imaging metrics in conjunction with test performance, the ability to objectively measure effort is interesting on its own. For example, thermal imaging metrics may be relevant in studies that test the extent to which certain childhood diseases impact children's cognitive function and academic performance via cognitive fatigue (Kyriklaki et al., 2019; Milner et al., 2020).

At least three challenges complicate the current incorporation of thermal imaging methods into educational and psychological assessments. First, we need automated (versus manual) methods for facial and ROI detection that are appropriate for use with children from diverse backgrounds. It is unclear how well existing facial and ROI detection algorithms, which were developed using adult faces, will work with children of varied ages (differing face sizes) and backgrounds (skin tone). Second, thermal cameras range from a few hundred to a few thousand dollars each. Establishing the reliability and validity of lower-cost cameras for detecting small changes in facial temperature will address potential financial barriers to this work. Third, students are tested in multiple formats (e.g., paper/pencil versus computerized; individual versus group administration). Thermal imaging methods will be most easily implemented in the context of individualized, computer-based assessments, which facilitate linking individual changes in face temperature to task performance.

Conclusion

The prospect that thermal imaging methods could be used to objectively measure the amount of effort students put forth during testing is provocative and has potentially widespread implications. Although thermal imaging methods are not a new technology, their application to educational and psychological assessments is. These methods have the potential to overcome specific limitations of measuring student effort using subjective self-reports or wearable sensors. Thermal imaging methods may open new possibilities for both student assessment (e.g., determining when to discontinue testing) and personalized learning protocols (e.g., dynamically adjusting the difficulty level of content presented in educational apps). We hope that this brief helps to motivate more widespread interest in using thermal imaging methods in educational and psychological assessment.

Data Availability Statement

In this publication, we do not report on, analyze, or generate any data.

RTI Press Associate Editor: Jonathan Stern