Exploring the Future of Science Education with AI Technologies: Opportunities and Challenges (Special Issue of Education Sciences)

https://www.mdpi.com/journal/education/special_issues/22489J7123

Guest Editors: Libby Gerard and Marcia C. Linn

Tools using generative and supervised AI are creating new ways to harness information and improve science learning in pre-college classrooms. Learning sciences research is informing advances in how we design and study such tools to benefit diverse students and teachers. Yet richer evidence from classroom studies is needed to understand both the potential and the pitfalls of AI in education. There is reason for caution: AI tools can amplify the biases of dominant narratives, discourage teacher and student agency, and consume limited school resources. What have we learned about ways to use contemporary AI tools in science education?

This Special Issue of Education Sciences provides a platform for education researchers, and particularly research–practice partnerships, to report on current classroom research on the use of AI in science education. The issue includes empirical investigations of powerful AI tools that have implications for K-12 classrooms. The audience for this Special Issue includes educators, administrators, and policymakers.

The Special Issue captures the opportunities and challenges in using AI to strengthen teaching and learning in science. Topics include:

  • Research showing how contemporary AI tools deployed in classrooms can deepen student understanding;
  • Studies exploring data privacy and assessment accuracy for different AI methods in science instruction;
  • Investigations of promising AI supports for teacher practice in science classrooms;
  • Studies of assessments using AI that capture student understanding;
  • Studies that use AI to capture students’ experiences and intuitions to personalize science instruction;
  • Syntheses of research using AI to strengthen science instruction.

Papers (Open Access)

A Comparison of Responsive and General Guidance to Promote Learning in an Online Science Dialog

by Libby Gerard, Marcia C. Linn, and Marlen Holtmann
Educ. Sci. 2024, 14(12), 1383; https://doi.org/10.3390/educsci14121383

Students benefit from dialogs about their explanations of complex scientific phenomena, yet middle school science teachers cannot realistically provide all the guidance students need. We study ways to extend generative teacher–student dialogs to more students by using AI tools. We compare Responsive web-based dialogs to General web-based dialogs by evaluating the ideas students add and the quality of their revised explanations. We designed the General guidance to motivate and encourage students to revise their explanations, similar to how an experienced classroom teacher might instruct the class. We designed the Responsive guidance to emulate a student–teacher dialog, based on studies of experienced teachers guiding individual students. The analyses comparing the Responsive and General conditions are based on random assignment of a total sample of 507 pre-college students, taught by five teachers in four schools. A significantly higher proportion of students added new accurate ideas during the dialog in the Responsive condition than in the General condition. This research shows that using NLP to identify ideas and assign guidance helps students broaden and refine their ideas. Responsive guidance, inspired by how experienced teachers guide individual students, is more valuable than General guidance.
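The abstract does not include the underlying pipeline, but the loop it describes (detect which ideas a student's explanation contains, then select guidance keyed to what is missing) can be sketched in a few lines. In the minimal sketch below, a toy keyword rubric stands in for the study's trained NLP models, and all idea labels, keywords, and guidance messages are invented for illustration.

    # Illustrative sketch only: the study uses trained NLP models to detect ideas;
    # here a toy keyword rubric stands in for the classifier. All labels, keywords,
    # and guidance messages below are hypothetical.

    IDEA_KEYWORDS = {
        "light_source": ["sun", "sunlight", "light"],
        "energy_transfer": ["energy", "transfer", "absorb"],
        "chemical_change": ["glucose", "molecule", "chemical"],
    }

    # Responsive guidance targets a specific missing idea; General guidance is
    # the same motivational message for everyone.
    RESPONSIVE_GUIDANCE = {
        "light_source": "Where does the energy for this process come from?",
        "energy_transfer": "You mention light. What happens to that energy inside the plant?",
        "chemical_change": "What new substances form during the process, and from what?",
    }
    GENERAL_GUIDANCE = "Good start! Reread your explanation and add ideas about how the process works."

    def detect_ideas(explanation: str) -> set[str]:
        """Return the idea labels whose keywords appear in the text."""
        text = explanation.lower()
        return {idea for idea, words in IDEA_KEYWORDS.items()
                if any(w in text for w in words)}

    def choose_guidance(explanation: str, condition: str) -> str:
        """Responsive: prompt for the first missing idea; General: one-size-fits-all."""
        if condition == "general":
            return GENERAL_GUIDANCE
        present = detect_ideas(explanation)
        for idea, prompt in RESPONSIVE_GUIDANCE.items():
            if idea not in present:
                return prompt
        return "Your explanation covers the key ideas. Can you link them together?"

    if __name__ == "__main__":
        student_text = "The plant uses sunlight to grow."
        print(detect_ideas(student_text))                    # {'light_source'}
        print(choose_guidance(student_text, "responsive"))   # asks about energy transfer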

Generative AI for Culturally Responsive Science Assessment: A Conceptual Framework

by Matthew Nyaaba, Xiaoming Zhai, and Morgan Z. Faison
Educ. Sci. 2024, 14(12), 1325; https://doi.org/10.3390/educsci14121325

In diverse classrooms, one of the challenges educators face is creating assessments that reflect the different cultural backgrounds of every student. This study presents a novel approach to automatically generating culturally and contextually specific science assessment items for K-12 education using generative AI (GenAI). We first developed a GenAI Culturally Responsive Science Assessment (GenAI-CRSciA) framework that connects CRSciA, specifically key cultural tenets such as Indigenous language, Indigenous knowledge, ethnicity/race, and religion, with the capabilities of GenAI. Using the CRSciA framework, along with interactive guided dynamic prompt strategies, we developed the CRSciA-Generator tool within the OpenAI platform. The CRSciA-Generator allows users to automatically generate assessment items customized to their students’ cultural and contextual needs. We then conducted a pilot demonstration comparing item generation by the CRSciA-Generator with base GPT-4o using standard prompts. Both tools were tasked with generating CRSciAs aligned with the Next Generation Science Standard on predator–prey relationships for use with students from Ghana, the USA, and China. The results showed that the CRSciA-Generator produced assessment items more tailored to the culture and context of each group, with examples such as traditional stories of lions and antelopes in Ghana, Native American views on wolves in the USA, and Taoist or Buddhist teachings on the Amur tiger in China, than the items base GPT-4o generated from standard prompts. However, because the pilot demonstration focused on nationality, the CRSciA-Generator assessment items treated the countries as culturally homogeneous, overlooking subcultural diversity within them. We therefore recommend that educators provide detailed background information about their students when using the CRSciA-Generator. We further recommend future studies involving expert reviews to assess the cultural and contextual validity of the assessment items generated by the CRSciA-Generator.
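As a rough sketch of the guided, context-specific prompting the abstract describes, the code below folds the framework's cultural tenets into a single generation request. The template wording, field names, NGSS code, and example values are assumptions made for illustration rather than the CRSciA-Generator's actual prompts; the call assumes the openai Python client (v1+) with an API key in the environment.

    # Hypothetical prompt assembly in the spirit of the CRSciA framework; the
    # real CRSciA-Generator uses interactive guided dynamic prompts on the
    # OpenAI platform.
    from openai import OpenAI

    def build_crscia_prompt(standard: str, topic: str, context: dict) -> str:
        """Fold cultural tenets (language, knowledge, religion) into one request."""
        return (
            f"Write a science assessment item aligned with {standard} on {topic}.\n"
            f"Tailor it to students in {context['country']}:\n"
            f"- Indigenous language: {context['language']}\n"
            f"- Indigenous knowledge or local examples: {context['knowledge']}\n"
            f"- Religious or cultural considerations: {context['religion']}\n"
            "Do not treat the country as culturally homogeneous; state any "
            "assumptions about subculture."
        )

    if __name__ == "__main__":
        prompt = build_crscia_prompt(
            standard="NGSS MS-LS2-2",  # illustrative standard code
            topic="predator-prey relationships",
            context={
                "country": "Ghana",
                "language": "Twi",  # example value only
                "knowledge": "traditional stories of lions and antelopes",
                "religion": "local traditional beliefs",
            },
        )
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        reply = client.chat.completions.create(
            model="gpt-4o", messages=[{"role": "user", "content": prompt}]
        )
        print(reply.choices[0].message.content)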

Using Artificial Intelligence to Support Peer-to-Peer Discussions in Science Classrooms

by Kelly Billings, Hsin-Yi Chang, Jonathan M. Lim-Breitbart, and Marcia C. Linn
Educ. Sci. 2024, 14(12), 1411; https://doi.org/10.3390/educsci14121411

In successful peer discussions, students respond to each other and benefit from supports that focus the discussion on one another’s ideas. We explore using artificial intelligence (AI) to form groups and guide peer discussion for grade 7 students. We use natural language processing (NLP) to identify student ideas in science explanations. The identified ideas, along with Knowledge Integration (KI) pedagogy, informed the design of a question bank to support students during the discussion. We compare groups formed by maximizing the variety of ideas among participants to randomly formed groups. We embedded the chat tool in an earth science unit and tested it in two classrooms at the same school. We report on the accuracy of the NLP idea detection, the impact of maximized versus random grouping, and the role of the question bank in focusing the discussion on student ideas. We found that the similarity of student ideas limited the value of maximizing idea variety and that the question bank facilitated students’ use of knowledge integration processes.
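The abstract does not specify the grouping algorithm, so the sketch below illustrates the general idea with a simple greedy heuristic: assuming each student's explanation has already been reduced (for example, by the NLP idea detection) to a set of idea labels, seat each student in the group whose pooled ideas they extend the most. The student names, idea labels, and heuristic are all illustrative assumptions, not the paper's method.

    # Greedy sketch of diversity-maximizing grouping; inputs are hypothetical
    # idea labels that an NLP model might assign to student explanations.

    def form_diverse_groups(ideas_by_student: dict[str, set[str]], group_size: int):
        """Seat each student in the open group whose idea union they extend most."""
        # Place idea-rich students first so they seed different groups.
        students = sorted(ideas_by_student,
                          key=lambda s: len(ideas_by_student[s]), reverse=True)
        groups: list[list[str]] = []
        for student in students:
            best_group, best_gain = None, -1
            for group in groups:
                if len(group) >= group_size:
                    continue
                pooled = set().union(*(ideas_by_student[m] for m in group))
                gain = len(ideas_by_student[student] - pooled)  # new ideas added
                if gain > best_gain:
                    best_group, best_gain = group, gain
            if best_group is None:
                groups.append([student])  # no open group: start a new one
            else:
                best_group.append(student)
        return groups

    if __name__ == "__main__":
        ideas = {
            "ana": {"plate_motion", "convection"},
            "ben": {"convection"},
            "chi": {"plate_motion", "subduction"},
            "dev": {"erosion"},
        }
        print(form_diverse_groups(ideas, group_size=2))
        # [['ana', 'chi'], ['ben', 'dev']]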

Integrating Youth Perspectives into the Design of AI-Supported Collaborative Learning Environments

by Megan Humburg, Dalila Dragnić-Cindrić, Cindy E. Hmelo-Silver, Krista Glazewski, James C. Lester, and Joshua A. Danish
Educ. Sci. 2024, 14(11), 1197; https://doi.org/10.3390/educsci14111197

This study highlights how middle schoolers discuss the benefits and drawbacks of AI-driven conversational agents in learning. Using thematic analysis of focus groups, we identified five themes in students’ views of AI applications in education. Students recognized the benefits of AI in making learning more engaging and providing personalized, adaptable scaffolding. They emphasized that AI use in education needs to be safe and equitable. Students identified the potential of AI in supporting teachers and noted that AI educational agents fall short when compared to emotionally and intellectually complex humans. Overall, we argue that even without technical expertise, middle schoolers can articulate deep, multifaceted understandings of the possibilities and pitfalls of AI in education. Centering student voices in AI design can also provide learners with much-desired agency over their future learning experiences.

Applying Natural Language Processing Adaptive Dialogs to Promote Knowledge Integration During Instruction

by Weiying Li
Educ. Sci. 2025, 15(2), 207; https://doi.org/10.3390/educsci15020207

We explored the value of adding NLP adaptive dialogs to a web-based inquiry unit on photosynthesis and cellular respiration designed following the Knowledge Integration (KI) framework. The unit was taught by one science teacher in seventh-grade middle school classrooms with 162 students. We measured students’ integrated understanding at three time points across instruction using KI scores. Students received significantly higher KI scores after the dialog and with instruction. Students who engaged fully with the dialogs at all three time points received higher KI scores than those whose engagement was inconsistent. Examining the idea progression of the fully engaged students, we found significant improvements in KI scores for revised explanations after the dialog at all three time points, with a significant dialog-by-instruction interaction facilitating a shift toward more KI links. Two rounds of guidance in the dialog elicited more ideas. Students were more likely to add mechanistic ideas about photosynthesis reactants and cellular respiration after the dialog, especially during and after instruction. Case analyses highlight how the adaptive dialogs helped one student refine and integrate scientific mechanisms across the three time points. These findings demonstrate the potential of combining NLP adaptive dialogs with instruction to foster deeper scientific reasoning.

Investigating Teachers’ Use of an AI-Enabled System and Their Perceptions of AI Integration in Science Classrooms: A Case Study

by Lehong Shi, Ai-Chu (Elisha) Ding, and Ikseon Choi
Educ. Sci. 2024, 14(11), 1187; https://doi.org/10.3390/educsci14111187

Recent research indicates the significant potential of artificial intelligence (AI) to enhance teachers’ instructional practices in areas such as lesson planning, personalized intervention and feedback, and performance assessment. To fully realize this potential, it is crucial to understand how teachers innovatively apply and critically evaluate AI applications in their teaching. However, little research has investigated how teachers use the various features of an AI-enabled system, or how their perceptions of AI integration shape effective AI integration practices. Employing an exploratory case study design, we investigated how six science teachers utilized an AI-enabled inquiry intelligent tutoring system (Inq-ITS) in their teaching and examined their perceptions of AI integration. Classroom observations and teacher interview data were collected. Two teachers with a pedagogical orientation of teacher-guided scientific inquiry mainly engaged with the system’s virtual tutor and teacher report summary features. Conversely, four teachers with a pedagogical orientation of AI-guided scientific inquiry relied on the AI system to guide student learning, interacting intensively with its features, particularly real-time teacher alerts and teacher inquiry practice support. Regardless of these differences, all teachers recognized the potential benefits of pedagogical change and encountered various challenges. The analysis also revealed that teachers held distinct perceptions of the role of Inq-ITS in their teaching: teachers with a teacher-guided orientation perceived Inq-ITS as a supporting tool that enhanced traditional teaching methods, whereas those with an AI-guided orientation viewed it as akin to a teaching assistant and pedagogical collaborator. The findings underscore the importance of helping teachers realize the pedagogical affordances of AI through their use of AI functionalities, and of considering teachers’ diverse perceptions of AI integration when promoting the integration of AI into teaching practices.

Sequence Analysis-Enhanced AI: Transforming Interactive E-Book Data into Educational Insights for Teachers

by Yaroslav Opanasenko, Emanuele Bardone, Margus Pedaste, and Leo Aleksander Siiman
Educ. Sci. 2025, 15(1), 28; https://doi.org/10.3390/educsci15010028

This study explores the potential of large language models as interfaces for conducting sequence analysis on log data from interactive E-Books. As prior studies show, qualitative methods alone are not sufficient to comprehensively study how learners interact with interactive E-Books. The quantitative methods of educational data mining (EDM) are considered among the most promising approaches for studying learner interactions with E-Books. Recently, sequence analysis has shown potential for identifying typical patterns of interaction in log data collected from the Estonian interactive E-Book platform Opiq, revealing the types of sessions of students in different grades and clusters of students based on how much content they studied and which interaction types they preferred. The main goal of the present study is to understand how teachers can use insights from CustomGPT to deepen their understanding of students’ interaction strategies in digital learning environments (DLEs) such as Opiq, and to identify areas for further development of such tools. We describe the process of developing a chatbot that translates teachers’ queries into sequence analysis results, and we gathered teacher feedback that allowed us both to evaluate current design solutions for making sequence analysis results accessible and to identify directions for development. Participants provided explicit feedback on CustomGPT, appreciating its potential for group and individual analysis while suggesting improvements in visualization clarity, legend design, descriptive explanations, and personalized tips to better meet their needs. Potential areas of development are described, such as integrating personalized learning statistics, enhancing visualizations and reports for individual progress, and mitigating AI hallucinations by expanding training data.
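As a toy illustration of the sequence-analysis step described above, the sketch below collapses raw log events into per-session action sequences and counts the most frequent action-to-action transitions. The event names and log format are invented; Opiq's actual logs, and the study's sequence-analysis and clustering methods, are considerably richer.

    # Toy sequence analysis over invented E-Book log events (Python 3.10+ for
    # itertools.pairwise). Real EDM pipelines add clustering and visualization.
    from collections import Counter
    from itertools import pairwise

    log = [  # (session_id, timestamp, action) -- all values hypothetical
        ("s1", 1, "open_chapter"), ("s1", 2, "watch_video"), ("s1", 3, "answer_task"),
        ("s2", 1, "open_chapter"), ("s2", 2, "answer_task"), ("s2", 3, "answer_task"),
        ("s3", 1, "open_chapter"), ("s3", 2, "watch_video"), ("s3", 3, "answer_task"),
    ]

    # Group events into time-ordered action sequences, one per session.
    sequences: dict[str, list[str]] = {}
    for session, _, action in sorted(log, key=lambda e: (e[0], e[1])):
        sequences.setdefault(session, []).append(action)

    # Count first-order transitions (action -> next action) across sessions.
    transitions = Counter(
        bigram for seq in sequences.values() for bigram in pairwise(seq)
    )
    for (a, b), n in transitions.most_common(3):
        print(f"{a} -> {b}: {n}")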

Large Language Model and Traditional Machine Learning Scoring of Evolutionary Explanations: Benefits and Drawbacks

by Yunlong Pan and Ross H. Nehm
Educ. Sci. 2025, 15(6), 676; https://doi.org/10.3390/educsci15060676

Few studies have compared Large Language Models (LLMs) with traditional Machine Learning (ML)-based automated scoring methods in terms of accuracy, ethics, and economics. Using a corpus of 1000 expert-scored and interview-validated scientific explanations derived from the ACORNS instrument, this study employed three LLMs and the ML-based scoring engine EvoGrader. We measured scoring reliability (percentage agreement, kappa, precision, recall, F1) and processing time, and explored contextual factors such as ethics and cost. Results showed that, with very basic prompt engineering, ChatGPT-4o achieved the highest performance among the LLMs. Proprietary LLMs outperformed open-weight LLMs for most concepts. GPT-4o achieved robust but less accurate scoring than EvoGrader (~500 additional scoring errors). LLM limitations included ethical concerns over data ownership and questions about reliability and replicability over time. EvoGrader offered superior accuracy, reliability, and replicability, but its development required a large, high-quality, human-scored corpus, domain expertise, and restricted assessment items. These findings highlight the diversity of considerations that should inform choices between LLM and ML scoring in science education. Despite impressive LLM advances, ML approaches may remain valuable in some contexts, particularly those prioritizing precision, reliability, replicability, privacy, and controlled implementation.
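For readers unfamiliar with the reliability metrics listed in the abstract, the sketch below computes each one with scikit-learn on fabricated binary concept scores (1 = concept present); the numbers exist only to show the calculation, not to reproduce the study's results.

    # Reliability metrics from the abstract, computed with scikit-learn on
    # made-up human vs. machine binary scores for one concept.
    from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                                 f1_score, precision_score, recall_score)

    human   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # expert scores (ground truth)
    machine = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]  # LLM or EvoGrader scores

    print("percent agreement:", accuracy_score(human, machine))   # 0.8
    print("Cohen's kappa:    ", cohen_kappa_score(human, machine))
    print("precision:        ", precision_score(human, machine))  # TP/(TP+FP)
    print("recall:           ", recall_score(human, machine))     # TP/(TP+FN)
    print("F1:               ", f1_score(human, machine))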