Abstract
Many researchers have investigated the effectiveness of teaching methods across different disciplines. In a previous study, we found that a two-part learning sequence consisting of self-study followed by discussion was more effective than lecture-based sequences in humanities education. To examine whether similar findings hold in science education, we compared that sequence with traditional lecture-based teaching. Results indicate that self-study combined with discussion yields higher test scores, demonstrating its potential as a beneficial teaching strategy in science education. In Experiment 1, undergraduate students from various disciplines studied introductory biology under five different conditions: self-study and discussion, lecture and discussion, lecture and review, self-study and review, and self-study and review with additional learning materials. Results indicated that the self-study and discussion condition yielded higher test scores than both the lecture and discussion condition and the three review groups. There were no significant differences in scores among the three review groups. These results demonstrate that discussion is an effective learning method and that self-study enhances the benefits of discussion. In Experiment 2, which involved science and engineering majors, we replicated the design of Experiment 1 using physics for the two discussion conditions, in which discussion followed either self-study or a lecture. We also performed in-depth analyses of student interactions during the discussions. Of the two discussion groups, the self-study and discussion group exhibited more active and constructive engagement in sharing unknowns, leading to enhanced learning. These findings suggest that integrating self-study with discussion can significantly enhance learning outcomes in science education. This approach not only improves test performance but also fosters deeper student thinking. Thus, educators might consider adopting this method to create more interactive and effective learning environments in science courses.
1 Introduction
Many educators have explored the effectiveness of teaching methods in different disciplines. With the growing impact of science and technology, efforts to improve science teaching methods are increasing (e.g., Taber & Akpan, 2017). Part of this includes exploring how science education differs from humanities education. Philosopher Foley (2018) argues that science pursues non-indexical (or general), non-perspectival (or impartial), descriptive, and collective insights, whereas the humanities pursue localized, contextualized, prescriptive, and individual insights. Moreover, science instruction often involves asking questions, conducting investigations, and analyzing data (Hutner & Sampson, 2015; NRC, 2012, 2013). The Next Generation Science Standards (NRC, 2013) likewise urge science teachers to create a need for learning and make students’ thinking visible. They recommend engaging students in activities before delving into content and involving them in both science practice and negotiating meaning. Consequently, effective learning methods in humanities may not be readily accepted in sciences.
While the sciences and humanities are often perceived as fundamentally different in their objectives and practices, this distinction does not necessarily preclude the applicability of effective teaching strategies across both domains. Except for engaging in actual scientific activity (NRC, 2013), these practices are relevant not only to science education but also to humanities education, to varying degrees. In this context, this study aims to determine whether the combination of self-study and discussion, proven effective in the humanities (Lim & Park, 2023), can likewise enhance learning in science education.
Lim and Park (2023) examined a two-part learning sequence comprising self-study and discussion in the humanities. They defined self-study as individual learning to understand or master the learning material, such as reading and re-reading, rehearsing, self-explaining, taking notes, and drawing a concept map. Lim and Park (2023) chose an unfamiliar topic for undergraduate students, specifically the criminal procedure code. They found that combining self-study and discussion improved learning outcomes more than combining watching a lecture and discussion. They also found that the learning outcome of these two combinations was better than that of the combination of watching a lecture and reviewing. These results show that self-study and discussion can be a viable alternative to lectures. Based on these findings, we thus aim to examine if self-study and discussion are effective in science education.
1.1 Theoretical Background
Self-study offers significant benefits, particularly when combined with discussions. It allows students to learn at their own pace, focusing more on challenging concepts, which leads to a deeper understanding (Lim et al., 2023). Self-study promotes active learning through note-taking, summarizing, and self-testing, enhancing retention and comprehension (e.g., Chi, 2009). Additionally, it helps identify knowledge gaps that can be addressed during discussions, making them more productive. Discussions, as highlighted by Murphy et al. (2018), enhance comprehension and critical thinking. They allow students to articulate their understanding, question assumptions, and engage in collaborative problem-solving. Therefore, because these two activities are distinct yet complementary, combining self-study with discussions enables students to build a foundational understanding independently and then refine it through peer interaction (cf. Lim et al., 2023).
Unfortunately, at present, neither self-study nor discussion is widely used in STEM classrooms (Deslauriers et al., 2019; Freeman et al., 2014; Stains et al., 2018). Stains et al. (2018), for example, examined learning environments in the STEM field. They gathered data from over 2,000 STEM classes taught by over 500 STEM faculty. Then, they analyzed 13 student behaviors (e.g., listening, answering questions) and 12 instructor behaviors (e.g., lecturing, posing questions) every 2 min for each class. Results showed that lecturing was the most common behavior observed in instructors, accounting for an average of 74.9% of the total classroom activities. Students spent 87.1% of the time primarily listening to the instructor, and only 18% of the observations involved a student-centered classroom style. Moreover, didactic and transmissive lecture-based instruction has clear limitations: lecturers try to pass on too much information, and students cannot maintain their attention (e.g., Poh et al., 2010). Consequently, students do not acquire very much knowledge (e.g., Hake, 1998; Hrepic et al., 2007; Snyder, 1971; Wieman & Perkins, 2005), nor are they encouraged to think broadly or deeply (e.g., Bonawitz et al., 2011; Trigwell & Prosser, 2020). More fundamentally, lectures are based on the belief that a person’s knowledge can be fully transmitted to another person (Schmidt et al., 2015).
However, most researchers take a different view of knowledge acquisition. They believe that students need to use their prior knowledge first to interpret new information in their own words and then struggle to construct knowledge by writing their understandings or discussing with other students during learning. These researchers urge replacing lectures with more active learning methods or at least using lectures in combination with other activities (e.g., Chi, 2009; Chi & Wylie, 2014; Deslauriers et al., 2019; Freeman et al., 2014).
In fact, there is evidence that active learning improves the learning effect compared to lectures. Freeman et al. (2014) conducted a meta-analysis of 225 studies comparing the academic performances of traditional lecture–centered classes and active learning classes in the field of STEM. Active learning in the study varied in method and intensity, but all involved some form of student engagement during class time. In contrast, traditional passive learning was defined by continuous exposition by the instructor, where student activity was largely limited to note-taking and occasional, unprompted questions. Academic performance was measured based on (1) equivalent test scores, concept inventories, and other assessments and (2) the rate of receiving D and F grades, as well as withdrawing from the course (DFW rate). The results showed that active learning leads to higher scores and lower DFW rates compared to traditional lectures. Furthermore, the findings indicated that in STEM classrooms, instructors primarily focused on a teaching-by-telling approach.
Along the same lines, Deslauriers et al. (2019) conducted an experiment to compare students’ actual learning and their perception of learning under two different instructional methods: active learning and passive lectures. The study was conducted in large-enrollment introductory college physics courses at the university. The active learning sessions involved students working in small groups to solve problems, with the instructor providing guidance, while the passive lectures involved traditional chalkboard teaching. The experiment was repeated twice, once in the fall semester and once in the spring, to ensure consistency and reliability of the results. Their findings showed that posttest scores were higher in the active learning classroom, whereas students preferred the passive learning classroom. Therefore, based on the two studies mentioned above, it is evident that active learning should be implemented in STEM classrooms rather than lecture-centered learning.
Similarly, dialogic teaching interventions have been proven effective in enhancing reasoning and problem-solving skills, with students in dialogic programs showing significant progress in various subjects (Alexander, 2018; Mercer & Sams, 2006). European Commission–funded projects also support that dialogic learning environments improve academic outcomes by fostering both individual and collaborative knowledge construction (Flecha, 2015; Howe & Abedin, 2013). In the same vein, Chi and Wylie (2014) demonstrated that academic achievement followed a distinct pattern, being lowest in the Passive mode, then progressively increasing through the Active and Constructive modes, and reaching its highest in the Interactive mode. For instance, students merely listening to a lecture (Passive) tend to retain less information compared to those who listen and then engage in activities like discussing the material (Active), generating their own explanations (Constructive), or engaging in group discussions where they explain concepts to each other and build on each other's ideas (Interactive). But, in the Interactive mode, since each learner engages in constructive learning repeatedly through conversation, students should prepare for their dialogic interactions (Chi & Wylie, 2014; Lam & Kapur, 2018).
However, as learning has become a means to obtaining a degree or to various experiences rather than being valued for learning itself (e.g., Fischman & Gardner, 2022), students’ absolute study time has decreased (e.g., Babcock & Marks, 2011). Students often enter class without any preparation and perceive themselves as consumers of education (e.g., Bunce et al., 2017). Even for the few students who do come to class prepared, the importance of preparation is lost on them since the contents that they have studied on their own are delivered in the lecture during class. In addition, since evaluation often covers what is learned, students devote themselves to memorization of lecture material and ultimately do not utilize self-directed learning methods.
In this context, the purpose of this study is to determine if the combined method of self-study and discussion, effective in the humanities, can also enhance learning outcomes in STEM disciplines such as biology and physics. It explores whether the unique characteristics of science education require distinct methods compared to the humanities. By examining this approach in STEM, we seek to understand if self-study followed by discussion can improve comprehension and retention in science subjects. This research will investigate if integrating self-study and discussion can address the limitations of traditional lecture–based instruction, potentially supporting a shift toward more interactive and student-centered learning in STEM classrooms.
2 Experiment 1
In Experiment 1, we addressed the question of whether the learning outcomes of Lim and Park’s (2023) method in the natural sciences are different from those in the social sciences. Following their framework, we examined the effect of different individual preparation activities and discussions on learning: lecture and discussion (LD group) versus self-study and discussion (SD group). Furthermore, in order to observe whether discussions actually improve learning, test scores after discussions were compared with those after review (the LR and SR groups). The present study also added an exploratory group (i.e., the SRA group) that was provided with additional instructional materials, since students provided with more information are predicted to be more successful (cf. Glogger et al., 2012; Winne & Perry, 2000). Moreover, STEM instructors believe that learning science requires mastering many concepts and that mathematics is used extensively compared to the humanities (e.g., Golub, 2019; Kuhn, 1962). All in all, students were assigned to five conditions: discussion after watching a video lecture (LD), discussion after self-study (SD), reviewing after watching a video lecture (LR), reviewing after self-study (SR), and reviewing after self-study with additional instructional materials (SRA).
In line with the results from Lim and Park (2023), we predicted that the combination of self-study and discussion would produce learning outcomes significantly higher than those of watching a lecture and discussion, both of which would be superior to those of reviewing. We hypothesized that the order of learning outcomes would be LR, SR, SRA, LD, and SD, in ascending order.
2.1 Methods
2.1.1 Participants
Undergraduate students at a university in Seoul participated in this experiment for course credits (ages 19–24 years). All students were Asian, and the university was located in a mid-high socioeconomic area of the city. The study was conducted early in the second semester of the academic year. A total of 137 students, including 54 women, participated in the experiment. Participants came from a range of majors, including Computer Science, Business, Engineering, Biology, and Psychology. For detailed information on the distribution of students across all majors, refer to Table 1. The sample size for this experiment was determined based on the guidelines and precedent set by Lim and Park (2023) in their study on the effectiveness of self-study and discussion. This approach ensures statistical power and the ability to detect significant differences between groups, aligning with standard practices in educational research to ensure robustness and generalizability.
Students were randomly assigned to the five groups: LR (n = 27), SR (n = 30), SRA (n = 18), LD (n = 30), and SD (n = 32).
2.1.2 Instructional Video
The video lecture featured a single speaker delivering the content directly to the camera without any interactive elements or audience participation, which is referred to as monologue style. The instructor in the video is an expert in the field, ensuring that the information is accurate and up to date. Since it was not edited, students watched the lecture exactly as it was originally filmed. We used a video lecture from a course in introductory biology, specifically on covalent bonds, non-covalent bonds, the octet rule, etc. These concepts are not only foundational to biology but are also essential in other fields of science, including chemistry, physics, and materials science. The understanding of these bonds and rules is critical for a wide range of scientific disciplines, making the topic broadly applicable regardless of one’s specific field of study.
The length of the video was 20 min, and the lecture was delivered by a male instructor. In the two lecture conditions, students watched the video lecture on their personal screens. While watching the lecture, students were not allowed to stop or rewind the video or manipulate it any other way.
2.1.3 Written Material
The written instructional material contained information covered in the lecture. However, it was not a transcription of the video. Two domain experts reviewed the instructional video and written material to ensure that both addressed the same key concepts (see Appendix A). All the students were provided with the same written material at the start of the experiment, except the SRA group, which received additional instructional material. The number of key concepts covered in the material for the SRA group was the same, but additional examples and explanations were added.
2.1.4 Test Items
The test items are shown in Appendix A. The test questions were made up of items testing deep knowledge. Based on prior work by Lam and Muldner (2017), we defined deep knowledge based on several criteria that typically include the following aspects: the ability to analyze, evaluate, and create. Deep knowledge outcomes are typically contrasted with shallow (or surface) knowledge, where students may only memorize information without fully understanding or being able to use it in different contexts. Students are required to possess factual definitions and descriptions of the concepts while also demonstrating the ability to apply or transfer them. The final test consisted of ten questions worth a total of 33 points, and it required students to transfer knowledge they have acquired, involving novel situations that were not present in the instructional material.
Although there was an answer sheet, since the questions required explaining the concepts learned, it was possible to get the answer right even if it was not worded exactly the same. To exclude the subjectivity of scoring, we calculated intra-class correlations (ICCs) between the three raters. Forty-two tests were randomly selected and scored by the first author and two raters using an answer sheet (30% of the final test answers). ICC (3, k) was as high as 0.90 (p < 0.001), and thus, the remaining test responses were graded by the first author.
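The reliability check described above can be illustrated with a minimal sketch that computes ICC (3, k), assuming the usual two-way mixed-effects, consistency, average-of-k-raters definition. The function name `icc3k` and the toy scores are hypothetical; they are not the study's data or scoring script.

```python
def icc3k(scores):
    """ICC(3, k) for a table of scores: one row per test, one column per rater.

    Uses the two-way ANOVA decomposition:
    ICC(3, k) = (MS_rows - MS_error) / MS_rows.
    """
    n = len(scores)        # number of rated tests
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


# Hypothetical scores for three tests graded by three raters.
toy_scores = [[10, 12, 11], [20, 19, 21], [30, 31, 29]]
print(round(icc3k(toy_scores), 3))
```

Because ICC (3, k) measures consistency rather than absolute agreement, raters who differ only by a constant offset still obtain a value of 1.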
2.1.5 Design and Procedure
The procedure description is shown in Fig. 1. Students were briefed by the researcher, who explained that experimental results would be anonymous and would not affect their course grades. After students signed the consent form, they were randomly assigned to the five conditions and participated in the experiment in separate locations. (1) The LR group individually watched the 20-min video lecture. After a 5-min break, students reviewed on their own for 15 min using the written material. (2) The SR group studied the written material on their own for 20 min, without lecture. After a 5-min break, they continued with another 15 min of reviewing, again individually, without any interactive elements or additional materials. (3) The SRA group underwent the same procedure as the SR group but received longer learning material with additional explanations and examples. (4) The LD group watched a video lecture individually for 20 min along with the written material. After a 5-min break, they held 15-min discussions in groups of three or four (the total number of groups was nine). (5) The SD group, after individually studying the written material for 20 min, took a 5-min break and then discussed for 15 min in groups of three or four (the total number of groups was ten).
The experimenter did not intervene while the students were learning (e.g., during review, self-study, or discussion). In other words, the experimenter did not give further instructions or engage in any intervention during the experiment. After the learning phase, the researcher administered a test for 20 min. Students who were done before 20 min were asked to go over their answers until the time was over.
2.2 Results
2.2.1 Learning Outcome
The mean and standard deviation of the test scores are presented in Table 2. A series of analyses of variance (ANOVA) was conducted to examine the differences between the test scores of the groups.
Significant differences were found among the groups in total mean values, F (4, 132) = 23.73, p < 0.001, partial eta-squared = 0.42. Subsequent comparisons revealed that the total mean of the SD group was significantly higher than that of the other four groups, ps < 0.05. In addition, the LD group received significantly higher scores than the LR group, p < 0.001; the SR group, p = 0.001; and the SRA group, p < 0.001. However, there was no difference among the scores of the LR, SR, and SRA groups, p = 1.000.
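For readers who want to check the reported effect size, partial eta-squared can be recovered from an F statistic and its degrees of freedom. The sketch below uses the standard identity; the function name is hypothetical, and the only inputs are the values reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared from F: (F * df1) / (F * df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Omnibus test reported for Experiment 1: F (4, 132) = 23.73
print(round(partial_eta_squared(23.73, 4, 132), 2))  # 0.42
```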
3 Experiment 2
Experiment 2 addressed the question of whether learning differs between topics that require mathematical knowledge and those that do not. Specifically, it was designed to examine whether the learning process differs between biology and physics. In addition to the fact that physics was covered rather than biology, Experiment 2 differed from Experiment 1 as follows. First, we only included students majoring in STEM fields since differences in prior knowledge could affect the results. Second, only two discussion groups, i.e., the SD and LD groups, were compared. Third, since combinations of different learning activities can affect transfer as well as rote memory (as shown by Chi & Wylie, 2014; Schwartz & Bransford, 1998), the final test consisted of verbatim items that measure shallow learning and transfer items measuring deep learning.
We subsequently analyzed the content of the discussions to examine the process of learning. We predicted that there would be significant differences in the quality of students’ utterances if the SD group performed better than the LD group.
3.1 Methods
3.1.1 Participants
Second- and third-year undergraduate students who did not participate in Experiment 1 were recruited (N = 73; 51 men and 22 women) at the same university. All participants were Korean. They were all majoring in science and engineering disciplines. Out of the 73 students who participated, the majority were majoring in Chemistry, Physics, and various Engineering fields. Table 3 shows detailed information. Students were randomly divided into two groups: the LD group (n = 37, a total of 12 groups) and the SD group (n = 36, a total of 11 groups).
3.1.2 Materials
The study used the following materials related to the photoelectric effect and the Compton effect: (1) an eight-page instructional written text, (2) a 30-min instructional video lecture, and (3) a pretest and posttest.
Instructional Video and Written Material
The 30-min video lecture was of monologue style. We used one video lecture on physics, specifically dealing with the photoelectric effect and the Compton effect. A male instructor delivered the lecture conveying information in the written material, explaining how to apply and use the key concepts. For example, the instructor would solve a sample problem and apply the presented formula to explain it. While watching the video, students could not manipulate it, such as stopping or rewinding.
The written instructional material is presented in Appendix B. The eight-page written text was designed to provide a general overview of the photoelectric effect and the Compton effect to help prepare students for the discussion afterwards. The material was not a transcription of the video lecture. All the students were provided with the written material at the start of the experiment.
Pretest and Posttests
To measure learning gain, we created two twin tests with 10 items each. Each test consisted of (1) four shallow knowledge questions comprised of multiple-choice questions with five options and (2) six deep knowledge questions made up of three multiple-choice questions and three descriptive-type questions assessing the transfer of learning. The multiple-choice questions included an “I have no idea” option. The transfer questions included items that require comprehension of the photoelectric effect and the Compton effect but without explicitly mentioning these concepts. For example, a question on the Compton effect asked “X-rays of 10 [pm] wavelength is scattered at 45 degrees from the target. Find the maximum kinetic energy of the reflected electron” (see Appendix B for more examples).
The intra-class correlation was used to check the interrater reliability. Three raters scored 20% of the data, randomly selected from each of the two tests (pre- and posttest). Since ICC (3, k) was high enough for both (pretest = 0.94, p < 0.001; posttest = 0.91, p < 0.001), one rater scored the remaining responses.
We combined twin tests into one by assigning the items to odd and even numbers. In a separate pilot experiment, 25 participants watched the same lecture for 30 min, reviewed for 15 min, and then took a 20-question test. After the test, the items were divided again into odd and even number groups, and no differences were observed in the means. The correlation calculated through the Spearman-Brown prophecy formula was 0.82. Thus, we judged the two tests equivalent, and we used the 10 odd-numbered items as a pretest and even-numbered ones as a posttest.
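The split-half reliability step can be sketched as follows. The Spearman-Brown prophecy formula estimates the reliability of a full-length test from the correlation between its two halves (here, odd and even items). The half-test correlation used below is a hypothetical value for illustration; the text reports only the resulting coefficient.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Predicted reliability when a test is lengthened by `length_factor`.

    With length_factor = 2 this is the split-half correction:
    r_full = 2 * r_half / (1 + r_half).
    """
    return (length_factor * r_half) / (1.0 + (length_factor - 1.0) * r_half)


# Hypothetical correlation between odd-item and even-item scores.
r_odd_even = 0.5
print(round(spearman_brown(r_odd_even), 3))
```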
3.1.3 Design and Procedure
Procedures were the same as in Experiment 1, except for the following two changes. One was that only the SD and LD groups were included. The other was that all the discussions were audio-recorded. Students were randomly assigned to two groups. The two groups studied in different locations. (1) In the LD condition, students were provided with written material, watched a video lecture for 30 min, took a 5-min break, and engaged in a discussion for 15 min. (2) The SD group was also given the written material and studied by themselves for 30 min. They then took a 5-min break and participated in a 15-min discussion in groups of three or four students (~ 50 min). The discussion was student-led, and the experimenter did not provide any directions. After the study session, both groups took a 20-min test (~ 70 min).
3.2 Results
3.2.1 Learning Outcome
Shallow Learning
We checked students’ prior knowledge using the pretest and found no significant differences across the two conditions, F (1, 71) = 1.49, p = 0.227, partial eta-squared = 0.02. We used analysis of covariance (ANCOVA) with the pretest score as a covariate to measure gains from pre- to post-learning. ANCOVA with the pretest score as the covariate and the posttest score as the dependent variable revealed a significant difference between the two groups, F (1, 69) = 25.13, p < 0.001, partial eta-squared = 0.27. The planned comparison showed that the posttest scores of the SD group on shallow knowledge items were significantly higher than those of the LD group, t (71) = 5.05, p < 0.001, d = 1.19.
Deep Learning
The pre- and posttest results are summarized in Table 4, based on the 73 students who completed all learning activities and the posttest transfer items. We used ANCOVA to analyze the differences in outcome across conditions. The two groups did not differ significantly in the level of deep learning shown in the pretest scores, F (1, 71) = 0.05, p = 0.821, partial eta-squared = 0.00. However, as with the shallow items, there was a significant difference in the level of deep learning shown in the posttest scores, F (1, 69) = 8.73, p < 0.05, partial eta-squared = 0.11. We found that the posttest scores of the SD group were significantly higher than those of the LD group, t (71) = 3.55, p = 0.001, d = 0.83.
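As a rough consistency check on the reported effect sizes, Cohen's d can be approximated from an independent-samples t statistic and the two group sizes. The sketch assumes the t values come from a two-group comparison with n = 37 (LD) and n = 36 (SD); the function name is hypothetical, and small discrepancies from the reported values can arise from rounding or from the exact formula used.

```python
import math


def cohens_d_from_t(t_value, n1, n2):
    """Approximate Cohen's d for two independent groups: d = t * sqrt(1/n1 + 1/n2)."""
    return t_value * math.sqrt(1.0 / n1 + 1.0 / n2)


# Reported t statistics: shallow items t (71) = 5.05, deep items t (71) = 3.55
for label, t_value in [("shallow", 5.05), ("deep", 3.55)]:
    print(label, round(cohens_d_from_t(t_value, 37, 36), 2))
```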
The interaction of the two groups of students was significantly different even though the instructor did not lead the interactions. What caused this difference between the two conditions? To answer this question, we analyzed and compared students’ conversations.
3.2.2 Students’ Conversations Within Small Groups
In this set of analyses, we explored whether there were differences in the way the students interacted with one another. We analyzed the students’ conversation data to capture their engagement behavior. The present study referred to Chi et al. (2017) when analyzing the verbal content of the discussion. The audio-recorded discussions were first transcribed for the analyses, then segmented into statements, and coded. To identify students’ substantive contributions, we segmented their verbal utterances at a phrasal level according to Chi et al. (2008). We used the following definition of a substantive comment: meaningful contribution to an ongoing activity such as problem-solving. Segments related to topics learned either during self-study or during watching a lecture were considered substantive. However, simply reading or repeating statements in the material or video and making off-topic comments or meta-talk such as “umm” or “yeah” were not classified as substantive. We wanted to analyze ideas and thoughts that were related to the topic of interest, i.e., the photoelectric effect and the Compton effect, but not explicitly from the instructional material. In this context, conversational starters related to the material were also considered substantive comments. The following consecutive student utterances (from script no.7) are examples of either substantive or non-substantive comments:
[1] Then, what did you think about electromagnetic waves?
[2] Is this part related to the electromagnetic wave theory?
[3] (reading a text) The text shows that when the intensity of light decreases, it takes time to reach a large vibration, or amplitude.
Segment [1] is related to the material on hand and is a starting point of a new conversation topic, which is considered substantive. Segment [2] is also considered a substantive comment because it is a phrase that specifies what was said in segment [1]. However, segment [3] is not considered substantive as the student is simply reading what is written in the material.
Students’ conversations were further analyzed to explore how interactive the students were within the groups. Transcripts were segmented into episodes. An episode is a multiturn conversation on the same topic or concepts within the instructional material. An episode was considered an interaction episode if more than two students in a group provided at least one substantive comment. Appendix C includes an example of an interaction episode. Student 1 provided substantive comments in lines 1 and 4, while Student 2 and Student 3 also provided substantive comments in lines 2 and 3 in response. For example, Episode 3 on the Compton effect was identified as an interaction episode.
A turn is defined as a change in speakers (e.g., Traum & Heeman, 1996). Chi et al. (2017) defined a co-constructive turn as a change in speakers that contains substantive contributions from both speakers. In the present study, we defined a co-constructive turn as a change in speakers that contains substantive comments from two or more students in a group. For example, in Episode 7 of Appendix C, Student 6 made a substantive comment at the beginning of Turn 1, and Student 4 made a substantive comment afterwards. Therefore, these two turns in which the two students provided substantive comments are identified as one co-constructive turn. However, Student 5 did not provide a substantive comment immediately afterwards. Thus, Episode 7 contains one co-constructive turn in total. We predicted that the SD group would have more substantive comments, interactive episodes, and co-constructive turns than the LD group in their discussions.
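One way to operationalize this counting rule is sketched below, assuming each utterance has already been coded as substantive or not. The `Utterance` type and the function name are hypothetical, and the rule implemented (count every speaker change in which both adjacent utterances are substantive) is only one reading of the definition, not the authors' coding procedure.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    speaker: str
    substantive: bool  # True if coded as a substantive comment


def count_co_constructive_turns(episode: List[Utterance]) -> int:
    """Count speaker changes where both adjacent utterances are substantive."""
    count = 0
    for prev, curr in zip(episode, episode[1:]):
        if prev.speaker != curr.speaker and prev.substantive and curr.substantive:
            count += 1
    return count


# Pattern described for Episode 7: Student 6 and Student 4 contribute
# substantive comments; Student 5's following comment is not substantive.
episode_7 = [
    Utterance("Student 6", True),
    Utterance("Student 4", True),
    Utterance("Student 5", False),
]
print(count_co_constructive_turns(episode_7))  # 1
```

Under this reading, three consecutive substantive comments from three different students would count as two co-constructive turns.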
3.2.3 Analyses of Discussions
One of the researchers and two graduate students coded 20% of randomly selected transcripts to analyze the dialogues. Since interrater reliability was sufficiently high (ICC (3, k) = 0.92, p < 0.001), the remaining transcripts were coded by one of the researchers. Table 5 shows the mean values and standard deviations for the number of substantive comments, interaction episodes, and co-constructive turns in the two conditions. We also examined how many utterances the students gave in each group and the proportion of substantive comments among the utterances.
The SD group made more utterances than did the LD group (58.36 vs. 40.83). The SD group also generated more substantive comments than did the LD group (49.45 vs. 24.17). Furthermore, the SD group had a higher proportion of substantive contributions than the LD group (79.45% vs. 55.88%). There was no significant difference between the two groups in the number of interaction episodes (3.73 vs. 3.33). This may be due to the limited number of concepts in the instructional material. This result suggests that further analysis of co-constructive interactions is necessary to identify significant differences between the two discussion groups.
Therefore, we analyzed the number of substantive comments in an interaction episode. The results showed that discussions from the SD group were richer in that episodes contained more substantive comments than those from the LD group (14.60 vs. 7.97). We also confirmed that discussions in the SD group contained more back-and-forth substantive comments (i.e., co-constructive turns) within each episode than those in the LD group (18.91 vs. 11.17).
Therefore, students in the SD group generated more substantive contributions than those in the LD group, and this pattern appears to correspond to differences in their achievement. Since generating substantive contributions is associated with greater learning (Chi et al., 2008), these results may explain the differences in student learning outcomes. The next analysis attempts to go through this possibility step by step. We now present exploratory analyses on our sample of audio-recorded dialogues to shed light on the results.
3.2.4 From Substantive Comments to Learning Outcome
The above results suggest that substantive comments may have a direct impact on student learning outcomes (in particular, transfer). To substantiate this interpretation, we ran an exploratory regression analysis. We set in-group members’ average scores of transfer items as the dependent variable and the number of substantive comments, preparation activity, and substantive comments × preparation activity as the explanatory variables. Since individual preparation activity (self-study or watching a lecture) is a categorical variable, we transformed it into a binary dummy variable (0, 1), where 0 corresponds to watching a lecture and 1 corresponds to self-study. The results are presented in Table 6.
The overall model was significant, F(3, 19) = 11.43, p < 0.001 (R-squared = 0.64). Neither the number of substantive comments nor preparation activity was significant by itself (p = 0.193 and p = 0.103, respectively). However, the effect of the number of substantive comments × preparation activity was significant, p = 0.012. These results imply that substantive contributions might have influenced students’ scores of transfer items in the LD and SD groups. In line with the findings from previous research (Lim & Park, 2023; Muldner et al., 2014), our results also show that the interaction term is modest yet significant, indicating that substantive comments have a stronger positive relationship with student learning outcomes in the context of self-study than in the context of watching a lecture.
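The structure of the regression model described above, with a dummy-coded preparation activity and an interaction term, can be sketched as follows. This is a minimal illustration of how the predictors are coded and how the interaction changes the slope for substantive comments in each condition. The function names and the coefficient values are hypothetical placeholders; they are not the estimates reported in Table 6.

```python
from typing import List, Sequence


def design_row(n_comments: float, self_study: bool) -> List[float]:
    """Predictors for one group: intercept, comments, prep dummy, interaction."""
    prep = 1.0 if self_study else 0.0  # 0 = watching a lecture, 1 = self-study
    return [1.0, float(n_comments), prep, float(n_comments) * prep]


def predicted_score(coefs: Sequence[float], n_comments: float, self_study: bool) -> float:
    """Predicted group-mean transfer score under the interaction model."""
    return sum(b * x for b, x in zip(coefs, design_row(n_comments, self_study)))


def comment_slope(coefs: Sequence[float], self_study: bool) -> float:
    """Change in predicted score per additional substantive comment."""
    _, b_comments, _, b_interaction = coefs
    return b_comments + (b_interaction if self_study else 0.0)


# Placeholder coefficients for illustration only; NOT the estimates in Table 6.
coefs = (2.0, 0.01, 0.5, 0.03)
print(comment_slope(coefs, self_study=False))  # slope in the lecture condition
print(comment_slope(coefs, self_study=True))   # slope in the self-study condition
```

With this coding, the coefficient on substantive comments is the slope in the lecture condition, and the interaction coefficient is the additional slope in the self-study condition. A positive interaction therefore corresponds to a stronger relationship between substantive comments and transfer scores after self-study.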
4 Discussion
This research underscored the importance of exploring diverse instructional methods to enhance academic achievement across disciplines. Specifically, this study examined whether the learning methods successfully applied in the humanities by Lim and Park (2023) can also be applied to the sciences. In Experiment 1, undergraduate students were placed in five different groups and studied topics in introductory biology. We compared the average test scores of the five groups: self-study and discussion (SD), lecture and discussion (LD), and three different review groups: lecture and review (LR), self-study and review (SR), and self-study and review with additional material (SRA). The results showed that the SD group had the highest scores, followed by the LD group and other review groups in descending order. There was no significant difference among the three review groups (Table 2). Based on previous studies by Chi (2009), Chi and Wylie (2014), and Lim and Park (2023), it was expected that the discussion groups would outperform the review groups, underscoring the efficacy of discussions as a learning activity. Interestingly, the SRA group, which received additional material, also scored lower than the discussion groups. This outcome can be attributed to several factors. Firstly, cognitive overload may have occurred due to the excessive amount of information, hindering effective processing and learning (Glogger et al., 2012). Secondly, the additional materials may have emphasized quantity over quality, making it difficult for students to identify and focus on key concepts. Lastly, the complexity of the additional materials might have posed challenges, preventing students from fully understanding and applying the concepts.
Experiment 2 extended our investigation to topics in physics, which require mathematical knowledge. Only the two discussion groups (LD and SD) were compared, and students’ prior knowledge was controlled as a covariate. Consistent with the results of Experiment 1, the SD group scored higher than the LD group. Moreover, the SD group was also superior to the LD group in terms of learning gains (Table 4). Predicting that there would be differences in overt behaviors of the students when discussing with their peers after self-study and after watching a lecture, we analyzed the dialogue of their discussions. The results showed that the SD group was not only more active but also more constructive and productive during the discussion than the LD group. Specifically, the SD group shared significantly more substantive comments, not only because they produced more utterances but also because a higher proportion of those utterances were substantive. From the significant difference in the number of co-constructive turns, we can infer that as more substantive comments are shared between the students within an interaction episode, the interaction becomes more constructive. This result replicates the findings of Lim and Park (2023), who demonstrated that the combination of self-study and discussions leads to more effective learning outcomes compared to traditional lecture–based methods, underscoring the enhanced productivity and engagement facilitated by self-study prior to discussions. That is, we can apply the speculative argument by Lim and Park (2023) to the present results: “the students in the self-study group are more likely to mark parts they did not understand. The marked parts may not be helpful for immediate learning, but they can be drawn out during discussions. The subsequent discussion would be more productive and lead to better learning outcomes.” Thus, the results highlight that engaging in discussions can be more beneficial for learning than merely increasing the amount of information for review, even in the context of science.
Discussions are very effective at fostering learning, but they are difficult to scale because it is not practical to provide an instructor or teacher for every discussion group. Thus, finding alternative interventions that are beneficial for learning but also scalable has been a long-standing goal in the learning sciences. Our findings indicate that the self-study and discussion combination has higher overall utility than the lecture and review and lecture and discussion combinations.
Our findings also support the preparation for future collaboration paradigm, which suggests that pre-discussion activities can significantly enhance the learning outcomes of subsequent discussions (e.g., Lam & Kapur, 2018; Lam & Muldner, 2017). This paradigm posits that preliminary activity prepares students to engage more deeply in discussions, as they come with a foundational understanding and specific points of confusion or interest to explore further during the collaborative phase. As suggested by Lam and Kapur (2018), the assumption that generative preparation enhances students’ ability to learn from subsequent collaborative activities can be extended to our findings. The self-study group likely benefited from the opportunity to engage in self-directed learning with new and challenging materials, which facilitated more effective collaboration and deeper learning outcomes. This aligns with the principles of self-regulated learning (SRL), a central component of self-study, which involves goal-setting, monitoring progress, and adjusting learning strategies. These processes are critical for successful discussion (Zimmerman & Schunk, 2011). Thus, through self-study, students develop the ability to understand their strengths and weaknesses for subsequent discussion, making them more effective team members.
While the present study provides significant insights into self-study and discussion combinations for science education, several limitations need to be acknowledged. First, the simplistic dichotomy between monologic (teacher-centered) and dialogic (learner-centered) approaches cannot capture the nuanced and sophisticated nature of effective teaching and learning in the twenty-first century. Effective pedagogy in modern education requires a context-based and situated integration of both approaches, tailored to the specific learning environment and objectives (Lave & Wenger, 1991; Vygotsky, 1978). This integrated approach supports the co-construction of knowledge on both the intermental (social interaction) and intramental (individual cognitive processes) planes, facilitating a deeper and more comprehensive learning experience (Rogoff, 1990). Nonetheless, this study aimed only to show that student-centered approaches, particularly the self-study and discussion combination, can be effectively applied not only in the humanities but also in the sciences.
Another limitation of the study is the lack of a comprehensive qualitative analysis in Experiment 1, which could have provided deeper insights into the underlying mechanisms driving the observed effects of the self-study and discussion combination. While Experiment 2 incorporated dialogue analysis to explore the reasons behind the superior performance of self-study and discussion, a similar analysis in Experiment 1 could have strengthened the overall findings. Future research should consider employing both qualitative and quantitative methods across all experiments to provide a more holistic understanding of the learning processes involved.
In addition, the sample size in our study was relatively small, which may limit the generalizability of our findings. In particular, the rationale for selecting the SRA group was not sufficiently detailed or justified, potentially undermining the credibility of our study design and methodology. This lack of clear rationale, combined with the small sample size, might have introduced biases that influenced the interpretation of our experimental outcomes.
Moreover, while this study focused on the comparison between self-study and lecture-based learning followed by discussion, there are other potential combinations of teaching methods that were not explored. Future studies could investigate the effects of integrating problem-solving activities, writing tasks, or group projects with self-study and discussion to identify the most effective combinations for different types of learning objectives. This line of inquiry could help in developing more versatile and adaptive teaching strategies that cater to a wide range of learners and educational goals.
Lastly, the present study was conducted exclusively with university students under specific conditions that may not fully represent real-world scenarios, thereby limiting the applicability of our findings across different contexts. Follow-up research is therefore in progress, including an examination of what makes discussions following self-study better than discussions following lectures. One possible line of inquiry is to compare the quantity and quality of the questions students ask after self-study and after watching a lecture. We also plan to extend the study to a wider variety of topics and different age groups. This is important because, regardless of the psychological mechanisms or underlying factors contributing to better performance, it can provide a way to increase learning effectiveness: STEM courses at the university level first teach concepts through lectures and then have students learn them through review, rather than having students study on their own and then resolve what they do not know through discussions in class. If the same results were obtained for more topics, the use of the existing lecture-centered teaching method should be reduced. If discussion after lecture is better for some topics and discussion after self-study is better for others, the characteristics of each topic can be identified and the more effective method applied in class. In short, extensive follow-up research on various topics in STEM can provide a basis for diversifying teaching methods.
Although the self-study and discussion combination and the lecture and discussion combination were compared in this study, it is also possible to add other activities such as writing or problem-solving. Adding these activities may deepen students’ understanding and provide opportunities to consolidate what they have learned. It would be worth exploring whether there is a golden ratio for these various learning activities and how the ratio varies depending on the subject. We hope that this study will spark researchers’ interest in these exciting explorations.
5 Conclusion
This study provides robust evidence that integrating self-study with discussion is a superior instructional strategy in science education compared to traditional lecture–based methods. Students who engaged in self-study followed by discussion consistently achieved higher test scores in biology than those who participated in traditional lecture–based methods or review sessions. Another experiment further confirmed these findings in a physics context, where STEM majors in the self-study and discussion group outperformed their peers in both shallow and deep learning assessments. Moreover, the analysis of student interactions during discussions revealed that those in the self-study group not only contributed more substantive comments but also engaged in more constructive and productive exchanges. These interactions, characterized by co-constructive turns and deeper engagement with the material, suggest that self-study effectively prepares students for richer, more meaningful discussions, ultimately leading to better comprehension and retention of scientific concepts. These findings underscore the potential of self-study and discussion as a highly effective instructional strategy in science education. Therefore, educators are encouraged to integrate this approach into their curricula to foster a more interactive and student-centered learning environment. Future research should aim to explore the broader applicability of this method across various scientific disciplines and educational contexts, as well as investigate the underlying mechanisms that contribute to its effectiveness.
6 Supplementary Information
Data Availability
Data are available from the corresponding author upon reasonable request.
References
Alexander, R. J. (2018). Developing dialogue: Process, trial, outcomes. University of Cambridge Faculty of Education.
Babcock, P., & Marks, M. (2011). The falling time cost of college: Evidence from half a century of time use data. Review of Economics and Statistics, 93(2), 468–478.
Bonawitz, E., Shafto, P., Gweon, H., Goodman, N. D., Spelke, E., & Schulz, L. (2011). The double-edged sword of pedagogy: Instruction limits spontaneous exploration and discovery. Cognition, 120(3), 322–330.
Bunce, L., Baird, A., & Jones, S. E. (2017). The student-as-consumer approach in higher education and its effects on academic performance. Studies in Higher Education, 42(11), 1958–1978.
Chi, M. T. (2009). Active-constructive-interactive: A conceptual framework for differentiating learning activities. Topics in Cognitive Science, 1(1), 73–105.
Chi, M. T., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4), 219–243.
Chi, M. T., Roy, M., & Hausmann, R. G. (2008). Observing tutorial dialogues collaboratively: Insights about human tutoring effectiveness from vicarious learning. Cognitive Science, 32(2), 301–341.
Chi, M. T., Kang, S., & Yaghmourian, D. L. (2017). Why students learn more from dialogue-than monologue-videos: Analyses of peer interactions. Journal of the Learning Sciences, 26(1), 10–50.
Deslauriers, L., McCarty, L. S., Miller, K., Callaghan, K., & Kestin, G. (2019). Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences, 116(39), 19251–19257.
Fischman, W., & Gardner, H. (2022). The real world of college: What higher education is and what it can be. MIT Press.
Flecha, R. (2015). Successful educational actions for inclusion and social cohesion in Europe. Springer.
Foley, R. (2018). The geography of insight: The sciences, the humanities, how they differ, why they matter. Oxford University Press.
Freeman, S., Eddy, S. L., McDonough, M., Smith, M. K., Okoroafor, N., Jordt, H., & Wenderoth, M. P. (2014). Active learning increases student performance in science, engineering, and mathematics. Proceedings of the National Academy of Sciences, 111(23), 8410–8415.
Glogger, I., Schwonke, R., Holzäpfel, L., Nückles, M., & Renkl, A. (2012). Learning strategies assessed by journal writing: Prediction of learning outcomes by quantity, quality, and combinations of learning strategies. Journal of Educational Psychology, 104(2), 452.
Golub, K. (2019). Open science in the humanities, or: Open humanities? Publications, 7(3), 1–10.
Hake, R. R. (1998). Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses. American Journal of Physics, 66(1), 64–74.
Howe, C., & Abedin, M. (2013). Classroom dialogue: A systematic review across four decades of research. Cambridge Journal of Education, 43(3), 325–356.
Hrepic, Z., Zollman, D. A., & Sanjay Rebello, N. (2007). Comparing students’ and experts’ understanding of the content of a lecture. Journal of Science Education and Technology, 16, 213–224.
Hutner, T. L., & Sampson, V. (2015). New ways of teaching and observing science class. Phi Delta Kappan, 96(8), 52–56.
Kuhn, T. S. (1962). The structure of scientific revolutions. University of Chicago Press.
Lam, R., & Kapur, M. (2018). Preparation for future collaboration: Cognitively preparing for learning from collaboration. The Journal of Experimental Education, 86(4), 546–559.
Lam, R., & Muldner, K. (2017). Manipulating cognitive engagement in preparation-to-collaborate tasks and the effects on learning. Learning and Instruction, 52, 90–101.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge University Press.
Lim, J., & Park, J. (2023). Self-study enhances the learning effect of discussions. Journal of the Learning Sciences, 32(3), 455–476.
Lim, J., Shin, Y., Lee, S., Chun, M. S., Park, J., & Ihm, J. (2023). Improving learning effects of student-led and teacher-led discussion contingent on prediscussion activity. The Journal of Experimental Education, 92(4), 626–643.
Mercer, N., & Sams, C. (2006). Teaching children how to use language to solve maths problems. Language and Education, 20(6), 507–528.
Muldner, K., Lam, R., & Chi, M. T. (2014). Comparing learning from observing and from human tutoring. Journal of Educational Psychology, 106(1), 69.
Murphy, P. K., Greene, J. A., Firetto, C. M., Hendrick, B. D., Li, M., Montalbano, C., & Wei, L. (2018). Quality talk: Developing students’ discourse to promote high-level comprehension. American Educational Research Journal, 55(5), 1113–1160.
National Research Council. (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. National Academies Press.
National Research Council. (2013). Next generation science standards: For states, by states.
Poh, M. Z., Swenson, N. C., & Picard, R. W. (2010). A wearable sensor for unobtrusive, long-term assessment of electrodermal activity. IEEE Transactions on Biomedical Engineering, 57(5), 1243–1252.
Rogoff, B. (1990). Apprenticeship in thinking: Cognitive development in social context. Oxford University Press.
Schmidt, H. G., Wagener, S. L., Smeets, G. A., Keemink, L. M., & van Der Molen, H. T. (2015). On the use and misuse of lectures in higher education. Health Professions Education, 1(1), 12–18.
Schwartz, D. L., & Bransford, J. D. (1998). A time for telling. Cognition and Instruction, 16(4), 475–522.
Snyder, B. R. (1971). The hidden curriculum (1st ed.). Knopf.
Stains, M., Harshman, J., Barker, M. K., Chasteen, S. V., Cole, R., DeChenne-Peters, S. E., ... & Young, A. M. (2018). Anatomy of STEM teaching in North American universities. Science, 359(6383), 1468–1470.
Taber, K. S., & Akpan, B. (Eds.). (2017). Science education: An international course companion. Springer.
Traum, D. R., & Heeman, P. A. (1996). Utterance units in spoken dialogue. In Workshop on dialogue processing in spoken language systems (pp. 125–140). Springer Berlin Heidelberg.
Trigwell, K., & Prosser, M. (2020). Exploring University Teaching and Learning. Springer International Publishing.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.
Wieman, C., & Perkins, K. (2005). Transforming physics education. Physics Today, 58(11), 36–41.
Winne, P. H., & Perry, N. E. (2000). Measuring self-regulated learning. Handbook of self-regulation, 531–566.
Zimmerman, B. J., & Schunk, D. H. (2011). Self-regulated learning and performance: An introduction and an overview. In B. J. Zimmerman & D. H. Schunk (Eds.), Handbook of self-regulation of learning and performance (pp. 1–12). Routledge.
Funding
Open Access funding enabled and organized by Seoul National University.
Ethics declarations
Ethics Approval
All experiments were reviewed and approved by the university’s Institutional Review Board (IRB, No. 2112/003–001).
Conflict of Interest
The authors declare that they have no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lim, J., Yang, J.W., Song, M.H. et al. Self-Study and Discussion Promote Students’ Science Learning. Sci & Educ (2024). https://doi.org/10.1007/s11191-024-00562-8
DOI: https://doi.org/10.1007/s11191-024-00562-8