Data have proliferated in seemingly every area of life. “Big data” and the algorithms that make sense of them have revolutionized fields like medicine and business (Mayer-Schönberger & Cukier, 2013) and have led to the rise of data analytics and data science, which use visualization, algorithms, and other data analysis techniques to extract insights from data and drive decisionmaking (Loukides, 2010). Postsecondary education, too, is experiencing its own data deluge. Online applications now capture fine-grained teaching and learning data on a large scale (Picciano, 2012), and large quantities of institutional data have been collected in response to increased external pressure for accountability in higher education (Campbell et al., 2007).

Key Findings
  • Data analytics and data science can address challenges to student success.
  • Postsecondary institutions have already creatively used analytics to address the problem of college completion through innovations such as academic early warning systems and adaptive learning technologies.
  • A diverse array of data on postsecondary education exists both within and outside of institutions and can be used in analytics to provide a richer view of student success and improve equity.
  • Increased data collection and analysis open up the challenges of data linkage across units and the risk of ethical and privacy violations, which deserve more attention.

Student success is a key area where these data can be put to use. Student success can be defined broadly to include students’ personal development and goals; however, efforts to foster student success have generally focused on students’ academic performance (Defining Student Success Data Initiative, 2018). College completion in particular has gained significant attention since it was made a focus of the Obama administration’s higher education policy. However, although completion rates have improved, concerns about equity have not been adequately addressed (Teranishi & Bezbatchenko, 2015). Other issues also affect students’ success. College costs and debt burdens are both rising (Goldrick-Rab, 2016), as is anxiety about the value of a postsecondary credential in the labor market (Carnevale et al., 2017). And even with the abundance of available data, providing relevant, timely information to the students and families who need it remains a challenge (Bleemer & Zafar, 2014). With their ability to deliver insights quickly from large troves of data, data analytics and data science have the potential to address some of the most pressing challenges to student success.

In this brief, I discuss the potential of new data sources in postsecondary education for data analytics and data science focused on student success, with the aim of sharing this knowledge with stakeholders in postsecondary education and with data scientists. Postsecondary stakeholders could benefit from understanding the new possibilities that data science may provide for addressing student success issues, and data scientists could benefit from knowing about the opportunities and analytic questions in postsecondary education. I first describe how institutions have applied data analytics to student academic data to address problems related to completion. I then discuss the range of data that exist both within and outside institutions and provide a taxonomy of how these data may be used. I argue that incorporating a broader array of data into analytics could provide richer answers to questions about how to improve student success. Finally, I conclude with a discussion of the practical and ethical issues that accompany increased data collection and analysis.

Academic Analytics for College Completion

In recent years, more and more postsecondary institutions have begun using analytics to aid in institutional operations; this growing field is termed “academic analytics” (Baepler & Murdoch, 2010). Although the goals of academic analytics have been diverse, serving postsecondary institutions’ business and accountability goals in admissions, fundraising, and student affairs (Ekowo & Palmer, 2016), a major area of application has been in student success, specifically college completion. I discuss examples of these applications below.

Georgia State University has been recognized as a leading institution on this front. Using predictive analytics, among other innovations, it raised its 6-year graduation rate from 32 percent in 2003 to 55 percent in 2018 and reduced graduation rate gaps between disadvantaged and non-disadvantaged students (Renick, 2018). At the core of this strategy is a predictive analytics system tracking more than 800 risk factors of dropout (McMurtrie, 2018; Renick, 2018). A team of advisers monitors this system to provide targeted help to students (McMurtrie, 2018; Renick, 2018). Informed by insights from this predictive analytics system, the university has also clarified student academic pathways and made changes to the structure of course requirements.
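To make the idea of tracking risk factors concrete, the following is a minimal sketch of a rule-based early warning check. It is not Georgia State University’s system, which is predictive and far larger; the rule names, field names, and thresholds here are all hypothetical and chosen only for illustration.

```python
def risk_flags(student, rules):
    """Return the names of the rules that a student's record triggers."""
    return [name for name, check in rules.items() if check(student)]


# Hypothetical rules: names, fields, and thresholds are illustrative
# assumptions, not taken from any real institution's system.
RULES = {
    "low_gpa": lambda s: s["term_gpa"] < 2.0,
    "few_credits": lambda s: s["credits_attempted"] < 12,
    "failed_gateway_course": lambda s: s["gateway_course_grade"] in {"D", "F"},
}

# A made-up student record used only to exercise the rules.
student = {
    "term_gpa": 1.8,
    "credits_attempted": 15,
    "gateway_course_grade": "F",
}

flags = risk_flags(student, RULES)
print(flags)
```

In a setup like this, an adviser-facing report would list the triggered flags for each student; a production system would more plausibly replace the hand-written rules with a statistical model fit to historical outcomes.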

Other institutions also have developed tools to facilitate students’ academic progress. At Austin Peay State University in Tennessee, a tool called Degree Compass provides “Netflix-style” course recommendations for students that account for their interests, program requirements, and past performance to help them tackle the challenge of choosing courses from a wide array of options (Denley, 2012). Tools like Course Signals at Purdue and ECoach at the University of Michigan help students manage their performance within courses by providing personalized advice and early warnings of poor performance (Arnold, 2010; Calvert Mason, 2015). Universities have often partnered with external vendors to build these tools. The “Check My Activity” tool at the University of Maryland, Baltimore County, was developed in collaboration with Blackboard, a course management system, and allows students to compare their level of activity in Blackboard against an anonymous summary of their course peers’ performance (Fritz, 2011).
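The peer comparison described above can be sketched in a few lines. This is an illustration of the general idea only, not the actual “Check My Activity” implementation; the function name, the use of activity counts as the measure, and the sample numbers are assumptions.

```python
from statistics import median


def activity_summary(own_hits, peer_hits):
    """Compare one student's activity count with an anonymous peer summary.

    Only aggregate statistics about peers are returned, never the
    individual peer values.
    """
    if not peer_hits:
        raise ValueError("peer_hits must not be empty")
    below = sum(1 for hits in peer_hits if hits < own_hits)
    return {
        "own_hits": own_hits,
        "peer_median": median(peer_hits),
        "share_of_peers_below": below / len(peer_hits),
    }


# Made-up activity counts for one student and four course peers.
summary = activity_summary(12, [5, 8, 20, 30])
print(summary)
```

Returning only the median and a share keeps individual peers unidentifiable in the summary, although with very small classes even aggregates can reveal information about individuals.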

Alongside work in academic analytics, work in the field of learning analytics has focused on using data-driven tools and models to analyze online learning behavior and provide personalized solutions to students’ learning challenges (Ferguson, 2012). This work has led to the development of adaptive learning technologies, which use machine learning to adapt online learning experiences in real time to each student’s individual skills and understanding of the material (Dziuban et al., 2018). Institutions have begun to implement these technologies: for example, the University of Central Florida and Colorado Technical University partnered with Realizeit, an online learning vendor, to pilot an adaptive learning platform in select courses. Early evidence shows improvement in student outcomes, such as greater engagement with course concepts (Dziuban et al., 2017).

Alternative Data Sources and Their Applications for Student Success

The above examples demonstrate that higher education institutions have made strides in harnessing student academic data and classroom and learning data to promote student success and completion. However, barriers to completion are often nonacademic: many students face challenges such as food and housing insecurity (Goldrick-Rab et al., 2019), as well as employment and family demands that compete for their time and hinder their ability to obtain a credential (Wilbur & Roscigno, 2016). Linking students’ nonacademic records to their academic records could provide insight into the ways these challenges systematically affect academic progress, and the linked records could also serve as early warning indicators. Institutions could also evaluate the effectiveness of the student services they offer using linked student services and academic data.
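As a rough sketch of what linking student services data to academic records could look like, the example below joins two small record sets on a shared student identifier. The field names (`student_id`, `credits_earned`, `service`) and the sample records are hypothetical; real linkages would also have to handle mismatched identifiers and access controls.

```python
def link_records(academic, services):
    """Left-join a count of service visits onto academic records by student ID."""
    visits = {}
    for row in services:
        sid = row["student_id"]
        visits[sid] = visits.get(sid, 0) + 1
    # Every academic record is kept; students with no visits get a count of 0.
    return [
        {**record, "service_visits": visits.get(record["student_id"], 0)}
        for record in academic
    ]


# Made-up records for illustration only.
academic = [
    {"student_id": "s1", "credits_earned": 12},
    {"student_id": "s2", "credits_earned": 6},
]
services = [
    {"student_id": "s2", "service": "food_pantry"},
    {"student_id": "s2", "service": "emergency_aid"},
]

linked = link_records(academic, services)
print(linked)
```

Keeping every academic record, including students who never used a service, matters for evaluation: dropping them would make it impossible to compare outcomes between service users and non-users.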

Digital tools also could offer solutions directly to students; for example, such tools could connect students to available housing or sources of food, or help students navigate services the institution already provides. Many schools offer emergency aid, for instance, but students are often unaware of such programs (Kruger et al., 2016). Researchers have begun to make creative use of new data sources within institutions to study aspects of students’ experiences outside of class that can affect completion. Studies of the digital traces of campus social life, such as key card swipes into campus buildings, have examined their potential to predict student engagement and integration, which are ultimately crucial to retention and completion (Bowman et al., 2019).

To make sense of the growing diversity of data sources, Table 1 displays the types of data sources available within postsecondary institutions, as well as student success issues that can be addressed by each data type. Classroom and learning data pertain to higher education’s core function of educating students and measuring learning. Student academic records measure students’ progress through their academic programs. Student nonacademic records pertain to students themselves and to the nonacademic aspects of students’ lives. Institutional unit data relate to the operations of an institution’s academic programs and business units.

Table 1. Data within postsecondary institutions and student success issues they can address
Data Type | Data Sources | Student Success Issues
Classroom and learning data | Online learning platforms, learning management systems, instructors’ records | Student learning; curricular improvement; classroom environment and peer effects
Student academic records | Institutions, transcripts, National Student Clearinghouse | Courses that are barriers to completion; academic pathways; personalized advising
Student nonacademic records | Institutions, financial aid organizations | Student social life and integration; connection to resources for basic needs; usage and effectiveness of student services
Institutional unit data | Academic programs and business units within institutions | Composition of student body within units; differences in student outcomes across units

In addition, a vast amount of postsecondary education data are being collected outside of individual institutions, and these data, and the tools and analyses that make sense of them, can also be brought to bear on student success. Diverse sources of data can provide a broader, more holistic understanding of what student success entails and the factors that shape it: student success could encompass not only the completion of a degree or certificate program, but also skills or knowledge acquisition, finding a job, and socioeconomic mobility.

Data linking academic experiences to workforce and economic data can be brought to bear on these issues. State governments have begun to build State Longitudinal Data Systems that link unit-level K–12, postsecondary, and workforce data, as well as tools and dashboards to explore these troves of data (Carnevale et al., 2017). Information from these systems connecting students’ college experiences to their labor market outcomes would be invaluable in informing job-seeking students, as well as researchers and policymakers, about which courses, majors, and other experiences can lead to certain occupations or industries. Meanwhile, the economic aspects of postsecondary education have well-documented effects on student outcomes (Goldrick-Rab, 2016). Yet despite the abundance of data displayed in online net price calculators (Higher Education Opportunity Act of 2008, 2008) and college search websites for prospective students (O’Shaughnessy, 2010; Schonberg, 2018), students still lack the clear, targeted information that would enable them to navigate this complex financial landscape (Bleemer & Zafar, 2014). These comprehensive data on costs, tuition, and financial aid could be made more interpretable and personalized, and could also be integrated into institutional systems to better inform students of the personal financial implications of their degree plans.

Table 2 below displays the postsecondary data topics, data sources, and student success issues that can be addressed by data sources across and outside of postsecondary institutions. Institution-level data provide information about a given postsecondary institution and its performance relative to other institutions. Data on linkages to K–12 and to workforce provide information on students’ transitions between education levels and the workforce, and how postsecondary education relates to K–12 education and workforce needs. Governance data pertain to the activities of leaders and governments that affect postsecondary education.

Table 2. Data across and outside of postsecondary institutions and student success issues they can address
Data Type | Data Sources | Student Success Issues
Institution-level data | College Scorecard, Integrated Postsecondary Education Data System (IPEDS) | Differences between institutions in graduation and retention rates; differences between institutions in tuition, financial aid, and debt
Data on linkages to K–12 and to workforce | State Longitudinal Data Systems, employers | High school preparation for college success; student employment and earnings; credentials or experiences related to careers
Governance data | Governments, professional associations (e.g., State Higher Education Executive Officers Association [SHEEO]) | Budget appropriations for educational institutions; laws and policies across education systems

Finally, higher education must continue to address the issue of equity, both within and between postsecondary institutions. Inequalities in student outcomes by race and class persist, as do resource differentials between institutions (Teranishi & Bezbatchenko, 2015), and institution-level data and institutional unit data on student outcomes can illuminate the patterns and causes of differences at higher levels of analysis. Institutional Mobility Report Cards, for instance, use tax records linked to institution-level data from the Department of Education’s College Scorecard to characterize an institution’s contribution to promoting intergenerational income mobility for its students (Chetty et al., 2017). Governance data, such as data on federal funding for higher education institutions, have received relatively little attention for their implications for inequality, but deserve more focus (Goldrick-Rab, 2016). Along similar lines, work also should focus on ensuring that institutions with fewer resources can still implement and reap the benefits of robust data analytics for their students.

Challenges

“Big data,” analytics, and data science show immense promise for working toward student success in postsecondary education. Work should continue to develop and expand the application of these approaches to understanding and addressing both academic and nonacademic issues in students’ lives, as well as to alleviating inequities. However, large-scale data collection, linkage, and analysis also open practical and ethical challenges.

One important challenge lies in building the capacity and infrastructure to create data linkages across institutional units or organizations. Data necessary for an analysis may be scattered across separate entities that do not communicate with each other, contributing to disorganized, inconsistent, or incomplete data (Daniel, 2015). Higher education institutions should continue to build and expand the infrastructure and capacity necessary to collect and analyze data, working within information systems and institutional cultures to align data collection standards, build knowledge, and foster the willingness to adopt new methods (Cardoza & Gold, 2018). Toward this end, the Gates Foundation and the Institute for Higher Education Policy (IHEP) have spearheaded an effort to map and standardize data reporting on college students’ entire postsecondary trajectory (Engle, 2016). Along similar lines, a closer dialogue across sectors, for example between higher education administrators, researchers, policymakers, and data scientists, could develop a shared understanding that would encourage progress and innovation in using data to help address the most pressing problems hindering student success.

Large-scale data collection and analysis also raise the issues of data privacy and ethical data collection and use. Even as institutions collect reams of data on their students, students themselves often are unaware of this fact and have not been given the opportunity to provide informed consent or opt out (Gilliard, 2018; Rubel & Jones, 2016). It is imperative that administrators justify the data they collect by weighing the potential benefits to students against the potential harms, actively communicate these considerations to students, and allow students some real measure of control over what happens to their information (Rubel & Jones, 2016). The risks of large-scale student data collection can be substantial: as more and more data are collected on individuals, individuals are increasingly vulnerable to the negative consequences of data breaches caused by improper data collection, storage, and analysis practices. These effects are magnified when multiple data sources are linked, potentially making individuals easier to identify (Ekowo & Palmer, 2016). In addition, biased algorithms can unfairly profile individuals on the basis of their characteristics (Ekowo & Palmer, 2016). While work has called for more serious consideration of the legal and ethical implications of large-scale student data collection and linkages (Ekowo & Palmer, 2016; Gilliard, 2018; Rubel & Jones, 2016; Sun, 2014), more institutions and stakeholders need to take heed and develop guidelines and policies to ensure transparency, consent, and fairness. Above all, stakeholders in postsecondary education must not lose sight of ethical considerations as they explore the potential of data analytics for improving student success.
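The point that linked data can make individuals easier to identify can be illustrated with a simple group-size check, in the spirit of k-anonymity (a technique not named in the sources cited here). The sketch below counts how many records share each combination of attributes; the attribute names and records are hypothetical.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for the given attributes. A result of 1 means at least
    one person is unique on those attributes."""
    if not records:
        raise ValueError("records must not be empty")
    counts = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return min(counts.values())


# Made-up records for illustration only.
records = [
    {"major": "Physics", "birth_year": 2001},
    {"major": "Physics", "birth_year": 2002},
    {"major": "History", "birth_year": 2001},
    {"major": "History", "birth_year": 2001},
]

k_single_source = smallest_group(records, ["major"])
k_linked = smallest_group(records, ["major", "birth_year"])
print(k_single_source, k_linked)
```

In this toy data, every student shares a major with at least one other student, but once birth year is linked in, some students become unique. A check of this kind before releasing or sharing linked data is one possible safeguard, not a complete privacy guarantee.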


Acknowledgments

This project was supported by a Professional Development Award through RTI International’s Professional Development Program. I am indebted to Robin Henke and other members of the Center for Postsecondary Education at RTI International for comments on a presentation of the material in this brief, as well as to Gayle Bieler of RTI International’s Center for Data Science.