Mastery Learning in Introductory Programming


Mastery learning in CS1

The course IT1001 Information Technology, Introduction is a CS1 course that introduces programming in Python for first-year STEM teacher students at NTNU. The course uses self-paced mastery learning inspired by Keller's Personalized System of Instruction (PSI): the curriculum is divided into modules that students must take in sequence, but each student can decide on their personal ambition level. Automated module tests are used both formatively and summatively, with a high degree of test transparency. In parallel with the tests, students carry out an individual programming project that is developed iteratively and incrementally.

The course runs every autumn, and the first offering was in 2023.

Paper presented at CDIO 2024




Motivation for creating a new course

We had three main reasons for creating a dedicated course in IT Fundamentals specifically for the Teacher Education in Science program:

  • Better class environment: STEM teacher education did not have a course of its own in the first semester; students were spread across different courses depending on their science specialization, often in large classes together with students from other study programs.
     
  • Improved relevance: The new course would primarily serve as an introductory programming course in Python, but it would also include learning outcomes that involve reflecting on the pedagogical use of programming, which would be particularly relevant for students training to become teachers.
     
  • Testing a different course design based on mastery learning: It was advantageous to have a smaller class for such a pilot (STEM Teacher education has approximately 50 students per year) rather than dealing with hundreds or thousands of students. For STEM teacher students, it was also especially relevant to be exposed to various teaching methods during their studies.

Sindre, G., Hansen, G., Korpås, G. S., Kirknes, A., Skøien, J. X., & Magnussen, J. L. (2023). Mastery learning in introductory programming: Plan for a new course. Learning about Learning, 10(1).


Mastery learning was introduced in the 1960s by Benjamin Bloom ("Learning for Mastery") and Fred Keller ("Personalized System of Instruction" – PSI). Keller tried this method while teaching psychology at the University of Brasília in the 1960s. Key ideas:

  • Mastery of course material. The syllabus is divided into learning modules, and students must master the first module before moving on to the second, and so on. Mastery is typically demonstrated through a test with a high passing threshold (e.g., a score above 90%).
     
  • Choosing one’s own pace ("self-pacing"). Some students learn faster, others slower, and each can choose a suitable pace from module to module. The main idea is that instead of forcing all students into the same pace, where some achieve mastery and others do not, everyone should be able to achieve mastery at their own pace, though this may require more time.
     
  • Do not introduce new material through full-class lectures! Since students are at different levels, full-class lectures are less effective. If lectures are used at all, it should be for motivation and demonstration, not for learning new material.

Interest in mastery learning grew in America in the 1970s, with positive results in empirical studies (Kulik et al., 1979; Kulik et al., 1990), but interest later declined, partly because of the resource-intensive nature of testing. Modern IT tools can help automate parts of the work related to conducting and grading tests, which brings hope for a resurgence in mastery learning (Eyre, 2007).


Bloom, B. S. (1968). Learning for mastery. Instruction and Curriculum. Regional Education Laboratory for the Carolinas and Virginia, Topical Papers and Reprints, Number 1. Evaluation Comment, 1(2), n2.

Keller, F. S. (1968). Good-bye, teacher... Journal of Applied Behavior Analysis, 1(1), 79.

Keller, F. S. (1967). Engineering personalized instruction in the classroom. Revista Interamericana de Psicologia/Interamerican Journal of Psychology, 1(3).

Kulik, J. A., Kulik, C. L. C., & Cohen, P. A. (1979). A meta-analysis of outcome studies of Keller's personalized system of instruction. American Psychologist, 34(4), 307.

Kulik, C. L. C., Kulik, J. A., & Bangert-Drowns, R. L. (1990). Effectiveness of mastery learning programs: A meta-analysis. Review of Educational Research, 60(2), 265-299.

Eyre, H. L. (2007). Keller's Personalized System of Instruction: Was it a fleeting fancy or is there a revival on the horizon? The Behavior Analyst Today, 8(3), 317.

Mastery learning originally emerged in quite different fields, such as psychology, but programming is a subject where it can be highly beneficial. A few fundamental concepts are followed by gradually more advanced concepts that build on them, and the advanced concepts are difficult to understand if the foundational ones have not been mastered. Examples of very basic concepts are variables and operators. If these are not understood, it becomes hard to move on to, for example, control structures such as if-statements and loops, since variables and operators are typically involved there as well. Robins (2010) calls this phenomenon "learning edge momentum": a student with a good grasp of the basic concepts has an advantage moving forward, while a student who has fallen slightly behind will tend to fall further and further behind.

Mastery learning has therefore seen considerable use in introductory programming teaching in recent years, both in Norway (Purao et al., 2016) and abroad (e.g., Garner et al., 2019). Our approach draws inspiration from these examples but differs in certain respects.


Purao, S., Sein, M., Nilsen, H., & Larsen, E. Å. (2016). Setting the pace: Experiments with Keller's PSI. IEEE Transactions on Education, 60(2), 97-104.

Garner, J., Denny, P., & Luxton-Reilly, A. (2019, January). Mastery learning in computer science education. In Proceedings of the Twenty-First Australasian Computing Education Conference (pp. 37-46).

The course is divided into 9 modules, named I, H, G, F, E, D, C, B, A, because the grade is determined directly by how many modules the student has completed, as shown in the figure.

Figure showing the different modules.
Figure: Guttorm Sindre/NTNU and Gabrielle Hansen/NTNU

To pass the course, the student must complete the first 5 of the 9 modules, while all 9 modules must be completed to achieve the highest grade. Completing a module involves:

  • Passing an auto-graded test for the module (typically with a passing threshold of 90%).
     
  • Submitting a project. The project is an individual programming project; it can be the same program throughout the semester, with the student extending it incrementally for each letter level.

The learning outcomes for the course can be viewed here.

In this setup, knowledge objectives K1, K2, K3, and partially F1 are mostly covered by the automated tests, while objectives F2, F3, and G1 are mainly covered by the project.


Summative tests for each module were held every Friday throughout the semester. Central to the teaching plan were also mandatory seminars. In the fall of 2023, these were held as a double period every Thursday.

The tests are conducted in the exam tool Inspera and are 100% auto-graded. Unfortunately, Inspera has no task format in which code writing can be auto-graded. The tasks therefore partly involve understanding program code (figuring out what it will do, or whether it is correct), partly completing code where something is missing (filling in, or choosing, what should go in the blanks of partially completed code), and partly ordering lines of code that have been shuffled, a task type known internationally in programming education as Parsons problems (Du et al., 2020). Code completion and code ordering tasks have generally been shown to correlate well with code writing tasks (Cheng & Harrington, 2017).
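
To make the task formats concrete, here is a constructed code-completion item in the same spirit (our own illustration, not an actual task from the course):

# Constructed code-completion item (illustrative only, not taken from the
# course). In the actual task format, the condition on the marked line would
# be shown as a blank, and the student fills in or chooses the expression
# that makes the program print the larger of the two numbers.
a = 7
b = 12
if b > a:   # this condition would be blanked out in the task
    print(b)
else:
    print(a)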

A distinctive aspect of our setup was the high degree of transparency around the tests. Students had access to formative practice tests containing the same types of tasks as the summative tests, and could use them as an active tool for self-feedback: take a practice test; if you score only 60% and know you need 90% on the summative test, which sub-tasks are you losing points on, and which programming concepts do you need to understand better? Watch learning videos on precisely those topics, try a new practice test, and so on. To prevent such a transparent system from leading to mere memorization of answers, we created many different variations of each sub-task.

Below, we show two specific examples of tasks from the G-test, the third test in the series, whose main theme was logical conditions and if-statements. Each task existed in about 20 variations, from which Inspera would randomly select one for each student's practice or summative test. The first figure shows a variant of task G-7, which tests code understanding, specifically the ability to work out the result of a logical expression involving the operators and, or, and not.

Figure of an example task from G-7.
Figure: Guttorm Sindre/NTNU and Gabrielle Hansen/NTNU
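
A constructed expression in the same spirit as G-7 might look like this (our illustration; the operands and operators in the actual task variants differ):

# Constructed G-7-style task (illustrative, not the exact variant in the
# figure). The student must determine what the program prints.
x = 4
y = 10
# not (x > 5)       -> not False     -> True
# y >= 10 or x == 0 -> True or False -> True
print(not (x > 5) and (y >= 10 or x == 0))   # prints True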

The next figure shows a variant of task G-8, which requires considerably more algorithmic thinking: the student must piece together a working program by arranging the available code snippets in the correct order and with the correct indentation, since indentation is semantically significant in Python. The learning outcome this task targets is whether students have understood if-statements, including the difference between a nested if-statement and two independent if-statements, and whether they can connect the else part to the appropriate if.
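
To make that distinction concrete, here is a constructed example in the same spirit (our illustration, not the actual G-8 task): in the nested version, the inner test is only reached when the outer condition holds, and the indentation decides which if the else belongs to.

# Constructed example of the distinction G-8 tests (not the actual task).
temperature = 25
raining = False

# Nested if: the inner test only runs when the outer condition holds,
# and the else binds to the inner if because of its indentation.
if temperature > 20:
    if raining:
        print("Warm but wet")
    else:
        print("Warm and dry")   # printed

# Two independent if-statements: both conditions are always checked.
if temperature > 20:
    print("Warm")               # printed
if raining:
    print("Bring an umbrella")  # not printed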


Du, Y., Luxton-Reilly, A., & Denny, P. (2020, February). A review of research on Parsons problems. In Proceedings of the Twenty-Second Australasian Computing Education Conference (pp. 195-202).

Cheng, N., & Harrington, B. (2017, March). The Code Mangler: Evaluating coding ability without writing any code. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education (pp. 123-128).

The project was individual; each student had to write their own program. We also considered making the project a group assignment, but judged that this could have more disadvantages than advantages, given that students might have different paces and ambition levels. Each student was free to decide what their program should be about, with one important guideline: the program had to be something that could potentially be used in teaching the student's subject specialization in the STEM teacher program, for grades 8-13 in school.

STEM teacher students choose different specializations from the first semester. There are five possible tracks: Mathematics + Physics, Mathematics + Informatics, Mathematics + Chemistry, Mathematics + Biology, and Chemistry + Biology. For example, a student specializing in Chemistry + Biology could choose from seven different subjects in which the Python program could be used: Science for grade 10, Science for upper secondary school year 1, Science for upper secondary school year 3, Chemistry 1 for upper secondary school, Chemistry 2, Biology 1, and Biology 2. Within these subjects, the student was entirely free to choose the academic topic for the program and the type of learning program it should be (e.g., quiz program, game, tutorial, simulation of natural phenomena, etc.).

Let's say our student decides to create something for Biology 2 in upper secondary school. They could look at a textbook for inspiration, consult the curriculum from Udir (the Norwegian Directorate for Education and Training), speak with former teachers of the subject, talk to classmates, or use other sources. If we take Udir's curriculum for Biology 2 as a starting point, it contains 11 bullet points with learning objectives, and each of these could be a potential topic for a program. For some, a simulator might fit well, such as point 3 on population ecology; for others, a program that collects and presents information could be suitable (e.g., data from fieldwork); and for yet others, a quiz program might be best.
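
As a sketch of what a simulator on population ecology could look like at this level (our illustration; the choice of model and all numbers are invented):

# Minimal sketch of a population-ecology simulator of the kind a student
# might build (illustrative; the logistic model and all values are invented).
# Each year the population grows in proportion to how far it is from the
# carrying capacity of the habitat.
population = 50      # starting population
capacity = 1000      # carrying capacity
growth_rate = 0.4    # per-year growth rate

for year in range(1, 11):
    population += growth_rate * population * (1 - population / capacity)
    print(f"Year {year}: about {round(population)} individuals")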

Unlike the tests that train students to solve many small and generic tasks, the project provides the opportunity to create something more creative that is contextually relevant to the student's future profession as a teacher.

To avoid being overwhelmed by grading, we did not have strict qualitative requirements for the project. The requirements for each submission were only that:

  • the program must run without errors
     
  • it must display understandable information on the screen
     
  • it must make purposeful use of the key concepts covered by the tests up to and including the current module

For the G-project, this meant that the code had to include logical conditions and if-statements, which were central themes for that module. The requirements for each module were set up as checklists that the student first reviewed and ticked off themselves, and which the academic staff then looked over.
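
As an illustration of what could satisfy the checklist at the G level, a fragment of a hypothetical quiz project might look like this (our sketch, not an actual student submission):

# Hypothetical fragment of a G-level quiz project (our sketch, not a real
# student submission): it runs without errors, prints understandable output,
# and makes purposeful use of logical conditions and if-statements.
answer = input("Which gas do plants absorb during photosynthesis? ")
cleaned = answer.strip().lower()

if cleaned == "co2" or cleaned == "carbon dioxide":
    print("Correct!")
elif cleaned == "o2" or cleaned == "oxygen":
    print("Not quite - plants release oxygen; they absorb CO2.")
else:
    print("Wrong, the answer is CO2 (carbon dioxide).")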

The seminars were held in an interactive learning space with group tables, each equipped with a medium-sized wall screen. Since students were at different levels, there was no point in presenting material to the whole class. The only seminar resembling a lecture was the very first one, where the instructor explained and motivated the course design and teaching approach; the aim was not to teach programming. For the rest of the semester, seminars typically started with a five-minute motivational talk from the instructor, after which the students worked in groups.

Some students used the seminar to prepare for the next test. For instance, 5-6 students at the same level, all preparing for the G-test the next day, would bring a G practice test up on the screen and discuss it task by task. Some tasks were quick, while others (e.g., G-8) required substantial discussion before the group agreed on the correct answer and why. Once they submitted their answers, they received automatic feedback and could see whether they were correct; if there was time left, they could try another G practice test. Other students might be at a different level, preparing for the F-test, while still others preferred to spend seminar time on their programming project; these students were grouped according to their project progress.

Since the seminars had an 80% mandatory attendance policy, everyone attended unless sick, which was an important factor in helping students get to know each other in the class.


Gabrielle Hansen
Researcher
gabrielle.hansen@ntnu.no
Guttorm Sindre
Professor
guttorm.sindre@ntnu.no
+47-73594479
+4794430245
