Medical Students Reach Sustained Topic Mastery in a Median of 10 Encounters on Ora AI's World-First Spaced-Repetition QBank, Validated Across 1.46M Per-Student Topic Records.

Ora AI Research Team. Topic Mastery Trajectories at Scale.

Mean topic-encounter accuracy rose from 52.5% on first encounter to 59.4% by encounter 10 on Ora's clinical-vignette qbank substrate. The analysis starts from 1,459,436 per-user x topic stats rows, reconstructs 1.21 million anonymized response-topic events, and applies the trajectory model to 8,606 user x topic trajectories meeting the analytic encounter threshold. Among trajectories reaching the default sustained-mastery criterion, the median time-to-mastery was 10 encounters. To our knowledge, this is the first published per-user x topic mastery-trajectory characterization in medical education at this scale.

The data are drawn from Ora's production database. This is a descriptive empirical characterization, not a causal estimate of any specific scheduler design. It is a companion to Q-2, which characterizes retention on the same qbank substrate.

+6.9 pp: accuracy gain, encounter 1 to 10
1.46M: per-user x topic stats rows
8,606: analytic user x topic trajectories
10: median encounters to sustained mastery

Topic mastery across encounters

N = 8,606 trajectories, cluster-bootstrap CI

Aggregate accuracy curve

Mean accuracy at topic-encounter milestones, aggregated across the analytic user x topic sample.

[Chart: mean accuracy at encounters 1, 5, 10, and 20.]

95% cluster-bootstrap CI: encounter 1 = 51.4-53.5%; encounter 10 = 58.3-60.3%. Accuracy at encounter 20 remains above the first-encounter level, but the curve flattens.

Growth by topic family

Coarse topic-root families all show positive encounter 1 to 10 growth; finer named topic clusters remain out of the public chart until cell sizes support disclosure.

Systems: n = 4,023
Specialties: n = 2,598
Subjects: n = 1,889

Root-category aggregation avoids per-topic sparsity while preserving the directional test of mastery growth.

Interpretation

The finding is not that individual learners improve monotonically on every topic. They do not: topic sequences are noisy, scheduler-selected, and heterogeneous. The contribution is the aggregate empirical characterization: across the analytic sample, repeated topic encounters move accuracy upward in the direction predicted by mastery-learning and knowledge-tracing theory, with a stronger signal in tutor mode than in timed mode.

What this adds

Mastery learning, item-response theory, Bayesian Knowledge Tracing, and Learning Factors Analysis already provide the theoretical vocabulary for skill acquisition and learner modeling.¹²³⁴ What has been sparse in medical education is large-scale per-user x per-topic empirical trajectory data on clinical-knowledge questions. Ora's qbank substrate makes that characterization observable: every submitted clinical vignette response can be attributed to the topic graph, ordered within a learner-topic sequence, and summarized without exposing users, schools, or vignette text.

Method

Substrate

Tables. Vignette response, variant, and topic-attribution tables in Ora's production database.
Scale. 201,205 submitted responses; 1.21M response-topic events after topic attribution.
Privacy. No user IDs, schools, item text, or response IDs in public artifacts.

Trajectory

Unit. User x depth-2 topic cluster with at least 10 encounters.
Attribution. Concept-level vignette joins; duplicate leaf topics collapsed to one depth-2 cluster per response.
Uncertainty. 1,000-iteration cluster bootstrap over user x topic trajectories.
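
The trajectory unit and the uncertainty step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's pipeline: the event tuple layout `(user, topic, timestamp, correct)`, the function names, and the percentile interval are all hypothetical choices. Only the 10-encounter threshold, the 1,000 iterations, and resampling at the level of whole user x topic trajectories come from the brief.

```python
import random
from collections import defaultdict

MIN_ENCOUNTERS = 10  # analytic threshold stated in the brief


def build_trajectories(events, min_encounters=MIN_ENCOUNTERS):
    """Group (user, topic, timestamp, correct) events into time-ordered 0/1 lists.

    The tuple layout is an assumption made for this sketch.
    """
    grouped = defaultdict(list)
    for user, topic, timestamp, correct in events:
        grouped[(user, topic)].append((timestamp, int(correct)))
    trajectories = {}
    for key, rows in grouped.items():
        if len(rows) >= min_encounters:
            rows.sort(key=lambda row: row[0])
            trajectories[key] = [correct for _, correct in rows]
    return trajectories


def accuracy_at(trajectories, encounter):
    """Mean correctness at a 1-indexed encounter, over trajectories that reach it."""
    values = [t[encounter - 1] for t in trajectories if len(t) >= encounter]
    if not values:
        raise ValueError(f"no trajectory reaches encounter {encounter}")
    return sum(values) / len(values)


def cluster_bootstrap_ci(trajectories, statistic, iterations=1000, alpha=0.05, seed=0):
    """Percentile CI: resample whole trajectories with replacement, recompute statistic."""
    rng = random.Random(seed)
    pool = list(trajectories)
    estimates = sorted(
        statistic(rng.choices(pool, k=len(pool))) for _ in range(iterations)
    )
    lower = estimates[round((alpha / 2) * iterations)]
    upper = estimates[round((1 - alpha / 2) * iterations) - 1]
    return lower, upper
```

For example, the encounter 1 to 10 gain could be passed as `statistic=lambda s: accuracy_at(s, 10) - accuracy_at(s, 1)`, with `s` being `list(build_trajectories(events).values())`. Resampling whole trajectories keeps each learner-topic sequence intact, so the encounter 1 and encounter 10 values in a resample always come from the same trajectories.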

Checks

Pilot. Stratified 2,000-trajectory pilot matched the full-sample curve.
Sensitivity. First-attempt-only, tutor/timed mode, time-window, and attribution checks run.
Mastery. Default criterion: 80% rolling-window accuracy, window 5, sustained for 3 windows.
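
One way to read the default mastery criterion is sketched below. The brief gives the parameters (80% accuracy, window 5, sustained for 3 windows) but not the windowing details, so this sketch assumes overlapping windows that advance one encounter at a time and reports the encounter at which the third consecutive qualifying window ends. The function names are hypothetical.

```python
from statistics import median


def encounters_to_mastery(outcomes, threshold=0.8, window=5, sustain=3):
    """Return the 1-indexed encounter at which mastery is first sustained, else None.

    Assumption: windows overlap and slide by one encounter; mastery is reached
    when `sustain` consecutive windows each have accuracy >= `threshold`.
    """
    streak = 0
    for end in range(window, len(outcomes) + 1):
        accuracy = sum(outcomes[end - window:end]) / window
        streak = streak + 1 if accuracy >= threshold else 0
        if streak == sustain:
            return end
    return None


def median_time_to_mastery(trajectories):
    """Median over trajectories that reach the criterion; None if none do."""
    times = [encounters_to_mastery(t) for t in trajectories]
    reached = [t for t in times if t is not None]
    return median(reached) if reached else None
```

Under this reading the earliest possible mastery point is encounter 7 (a 5-encounter window plus two further one-step windows). Trajectories that never meet the criterion are left out of the median, matching the brief's phrasing "among trajectories reaching the default sustained-mastery criterion".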

Limitations

This is descriptive, not causal: the empirical curve reflects both learner progress and scheduler-driven encounter selection. The analytic threshold excludes low-encounter user x topic starts, so the trajectory claim applies to the engaged analytic sample, not every topic exposure. Multiple-choice accuracy is a recognition measure, not the production-task mastery measure used in much of the classical literature. Multi-topic attribution matters because one vignette can map to several topic clusters; fractional attribution preserved the same directional pattern, but a strict single-cluster approximation was too small for the public headline. Named depth-2 topic clusters did not meet the public min-N rule, so this brief reports aggregate and root-category results only.
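
The difference between collapsed and fractional attribution can be illustrated with a small sketch. The brief does not define its fractional weighting, so the equal 1/k split across the k depth-2 clusters of a response is an assumption, as are the slash-separated topic paths and the function names.

```python
from collections import defaultdict


def depth2_cluster(topic_path):
    """Truncate a hypothetical 'root/cluster/leaf' path to its depth-2 cluster."""
    return "/".join(topic_path.split("/")[:2])


def attribute(response_topics, fractional=False):
    """Map one response's leaf topics to {depth-2 cluster: weight}.

    Duplicate leaves under the same cluster collapse to one entry. With
    fractional=True the response's unit weight is split equally (assumed rule).
    """
    clusters = sorted({depth2_cluster(path) for path in response_topics})
    if not clusters:
        return {}
    weight = 1.0 / len(clusters) if fractional else 1.0
    return {cluster: weight for cluster in clusters}


def weighted_accuracy(responses, fractional=False):
    """Per-cluster accuracy from (leaf_topics, correct) pairs."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for topics, correct in responses:
        for cluster, weight in attribute(topics, fractional).items():
            numerator[cluster] += weight * int(correct)
            denominator[cluster] += weight
    return {cluster: numerator[cluster] / denominator[cluster] for cluster in denominator}
```

With full-weight attribution a multi-topic vignette counts once in every cluster it touches; with the fractional split it contributes less to each, which is why the two schemes can produce different cluster-level accuracies from the same responses.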

References

1. Bloom BS. The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring. Educational Researcher. 1984;13(6):4-16. doi:10.3102/0013189X013006004
2. Kulik C-LC, Kulik JA, Bangert-Drowns RL. Effectiveness of Mastery Learning Programs: A Meta-Analysis. Review of Educational Research. 1990;60(2):265-299. doi:10.3102/00346543060002265
3. Corbett AT, Anderson JR. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction. 1994;4:253-278. doi:10.1007/BF01099821
4. Cen H, Koedinger KR, Junker B. Learning Factors Analysis: A general method for cognitive model evaluation and improvement. Lecture Notes in Computer Science. 2006;4053:164-175. doi:10.1007/11774303_17
5. Larsen DP, Butler AC, Roediger HL III. Test-enhanced learning in medical education. Medical Education. 2008;42(10):959-966. doi:10.1111/j.1365-2923.2008.03124.x