Medical Students Reach Sustained Topic Mastery in a Median of 10 Encounters on Ora AI's World-First Spaced-Repetition QBank, Validated Across 1.46M Per-Student Topic Records.

Ora AI Research Team. Topic Mastery Trajectories at Scale.

Mean topic-encounter accuracy rose from 52.5% on the first encounter to 59.4% by encounter 10 on Ora's clinical-vignette qbank substrate. The analysis starts from 1,459,436 per-user × topic stats rows, reconstructs 1.21 million anonymized response-topic events, and applies the trajectory model to the 8,606 user × topic trajectories that meet the analytic threshold of at least 10 encounters. Among trajectories reaching the default sustained-mastery criterion, the median time-to-mastery was 10 encounters. To our knowledge, this is the first published per-user × topic mastery-trajectory characterization in medical education at this scale.

Drawn from Ora's production database. Descriptive empirical characterization, not a causal estimate of any specific scheduler design. Companion to Q-2, which characterizes retention on the same qbank substrate.
  • +6.9 pp: accuracy gain from encounter 1 to encounter 10
  • 1.46M: per-user × topic stats rows
  • 8,606: analytic user × topic trajectories
  • 10: median encounters to sustained mastery
Figure: Topic mastery across encounters (N = 8,606 trajectories; cluster-bootstrap CI). Aggregate accuracy curve: mean accuracy at topic-encounter milestones 1, 5, 10, and 20, aggregated across the analytic user × topic sample.

95% cluster-bootstrap CIs: encounter 1, 51.4-53.5%; encounter 10, 58.3-60.3%. Mean accuracy at encounter 20 remains above the first-encounter level, but the curve flattens.
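A cluster bootstrap resamples whole trajectories rather than individual responses, so the correlation among one learner's answers on one topic is carried into every draw. The sketch below illustrates that idea, assuming each trajectory is a list of 0/1 correctness flags in encounter order. The function names are hypothetical, only the 1,000-iteration count comes from the Method section, and the percentile interval is one common bootstrap variant, not necessarily the one behind the published intervals.

```python
import random
from statistics import mean


def accuracy_at(trajectories, k):
    """Mean accuracy at encounter k (1-indexed) over trajectories that reach k."""
    return mean(t[k - 1] for t in trajectories if len(t) >= k)


def cluster_bootstrap_ci(trajectories, k, iterations=1000, alpha=0.05, seed=0):
    """Percentile CI for accuracy_at(), resampling whole trajectories as clusters."""
    rng = random.Random(seed)
    n = len(trajectories)
    draws = []
    for _ in range(iterations):
        # Draw n trajectories with replacement; each drawn trajectory brings all
        # of its encounters, preserving within-trajectory correlation.
        sample = [trajectories[rng.randrange(n)] for _ in range(n)]
        draws.append(accuracy_at(sample, k))
    draws.sort()
    lower = draws[round(alpha / 2 * iterations)]
    upper = draws[round((1 - alpha / 2) * iterations) - 1]
    return lower, upper


# Hypothetical toy data: 30 trajectories of 12 alternating 0/1 flags.
toy = [[(i + j) % 2 for j in range(12)] for i in range(30)]
print(accuracy_at(toy, 1), cluster_bootstrap_ci(toy, 1))
```

Trajectories shorter than `k` do not contribute at that encounter, which matters for milestones such as encounter 20 that lie beyond the 10-encounter analytic threshold.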

Growth by topic family
Coarse topic-root families all show positive accuracy growth from encounter 1 to encounter 10; finer, named topic clusters are withheld from the public chart until their cell sizes support disclosure.
  • Systems: n = 4,023
  • Specialties: n = 2,598
  • Subjects: n = 1,889

Root-category aggregation avoids per-topic sparsity while preserving the directional test of mastery growth.
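Root-category aggregation can be sketched as grouping trajectories by family and differencing mean accuracy at encounters 1 and 10. This is an illustrative sketch, assuming `(family, trajectory)` pairs with 0/1 correctness flags. The `min_n` parameter stands in for the public min-N disclosure rule; the brief does not state the actual threshold, so the default of 100 is a hypothetical placeholder.

```python
from collections import defaultdict
from statistics import mean


def growth_by_family(rows, first=1, last=10, min_n=100):
    """Accuracy growth (percentage points) from encounter `first` to `last` per family.

    rows: iterable of (family, trajectory) pairs; trajectory = 0/1 flags in order.
    min_n: hypothetical stand-in for the public min-N disclosure rule.
    """
    by_family = defaultdict(list)
    for family, trajectory in rows:
        if len(trajectory) >= last:  # only trajectories observed at both milestones
            by_family[family].append(trajectory)
    report = {}
    for family, trajs in by_family.items():
        if len(trajs) < min_n:
            continue  # suppress cells too small to disclose
        start = mean(t[first - 1] for t in trajs)
        end = mean(t[last - 1] for t in trajs)
        report[family] = {"n": len(trajs), "growth_pp": round(100 * (end - start), 1)}
    return report
```

Aggregating at the root level keeps each cell large enough to estimate the direction of growth even when individual named clusters are sparse.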

Interpretation

The finding is not that individual learners improve monotonically on every topic. They do not: topic sequences are noisy, scheduler-selected, and heterogeneous. The contribution is the aggregate empirical characterization: across the analytic sample, repeated topic encounters move accuracy upward in the direction predicted by mastery-learning and knowledge-tracing theory, with a stronger signal in tutor mode than in timed mode.

What this adds

Mastery learning, item-response theory, Bayesian Knowledge Tracing, and Learning Factors Analysis already provide the theoretical vocabulary for skill acquisition and learner modeling [1-4]. What has been sparse in medical education is large-scale per-user × per-topic empirical trajectory data on clinical-knowledge questions. Ora's qbank substrate makes that characterization observable: every submitted clinical vignette response can be attributed to the topic graph, ordered within a learner-topic sequence, and summarized without exposing users, schools, or vignette text.
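The attribute, order, and summarize steps can be sketched as a single grouping pass. The event layout `(user, response_id, timestamp, depth2_cluster, correct)` is hypothetical, since Ora's schema is not published, and the sketch assumes response IDs are globally unique. The collapse of duplicate leaf topics to one depth-2 cluster per response and the 10-encounter threshold follow the Method section.

```python
from collections import defaultdict


def build_trajectories(events, min_encounters=10):
    """Group response-topic events into ordered user x cluster trajectories.

    events: iterable of (user, response_id, timestamp, depth2_cluster, correct).
    A response tagged with several leaf topics under the same depth-2 cluster
    yields duplicate events; only the first is kept.
    """
    seen = set()
    grouped = defaultdict(list)
    for user, response_id, timestamp, cluster, correct in events:
        if (response_id, cluster) in seen:
            continue  # duplicate leaf topic collapsed into its depth-2 cluster
        seen.add((response_id, cluster))
        grouped[(user, cluster)].append((timestamp, int(correct)))
    trajectories = {}
    for key, encounters in grouped.items():
        encounters.sort(key=lambda e: e[0])  # order within the learner-topic sequence
        if len(encounters) >= min_encounters:  # analytic encounter threshold
            trajectories[key] = [correct for _, correct in encounters]
    return trajectories
```

Only the user × cluster key and the 0/1 sequence survive this step, which is consistent with publishing aggregates without user IDs or item text.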

Method

Substrate
  • Tables. Vignette response, variant, and topic-attribution tables in Ora's production database.
  • Scale. 201,205 submitted responses; 1.21M response-topic events after topic attribution.
  • Privacy. No user IDs, schools, item text, or response IDs in public artifacts.
Trajectory
  • Unit. User × depth-2 topic cluster with at least 10 encounters.
  • Attribution. Concept-level vignette joins; duplicate leaf topics collapsed to one depth-2 cluster per response.
  • Uncertainty. 1,000-iteration cluster bootstrap over user × topic trajectories.
Checks
  • Pilot. Stratified 2,000-trajectory pilot matched the full-sample curve.
  • Sensitivity. First-attempt-only, tutor/timed mode, time-window, and attribution checks run.
  • Mastery. Default criterion: 80% rolling-window accuracy, window 5, sustained for 3 windows.
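One way to operationalize the default mastery criterion is sketched below. The brief does not say how windows advance or whether the three windows must be consecutive, so this sketch assumes windows slide one encounter at a time and that mastery is first reached at the last encounter of the third consecutive window at or above 80% accuracy. Under those assumptions the earliest possible time-to-mastery is 7 encounters. The median is taken only over trajectories that reach the criterion, matching how the headline figure is defined.

```python
from statistics import median


def encounters_to_mastery(trajectory, threshold=0.8, window=5, sustain=3):
    """1-indexed encounter at which mastery is first sustained, else None.

    Assumes windows slide by one encounter and that `sustain` consecutive
    windows must each reach `threshold` accuracy.
    """
    run = 0
    for end in range(window, len(trajectory) + 1):
        accuracy = sum(trajectory[end - window:end]) / window
        run = run + 1 if accuracy >= threshold else 0  # a miss resets the streak
        if run == sustain:
            return end
    return None


def median_time_to_mastery(trajectories, **criterion):
    """Median time-to-mastery among trajectories that reach the criterion."""
    times = (encounters_to_mastery(t, **criterion) for t in trajectories)
    reached = [x for x in times if x is not None]
    return median(reached) if reached else None
```

Passing keyword overrides such as `window=7` or `threshold=0.9` to `median_time_to_mastery` is one way to probe how sensitive the median is to the criterion.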
Limitations

This is descriptive, not causal: the empirical curve reflects both learner progress and scheduler-driven encounter selection. The analytic threshold excludes user × topic sequences with fewer than 10 encounters, so the trajectory claim applies to the engaged analytic sample, not to every topic exposure. Multiple-choice accuracy is a recognition measure, not the production-task mastery measure used in much of the classical literature. Multi-topic attribution matters because one vignette can map to several topic clusters; fractional attribution preserved the same directional pattern, but a strict single-cluster approximation left too few trajectories to support the public headline. Named depth-2 topic clusters did not meet the public minimum-cell-size (min-N) disclosure rule, so this brief reports aggregate and root-category results only.
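The contrast between fractional and strict single-cluster attribution can be illustrated with a small weighting function. This is a sketch of one plausible fractional scheme (weight 1/k for a response attributed to k clusters), not a description of Ora's implementation, and the event tuple layout is hypothetical.

```python
from collections import defaultdict


def accuracy_by_encounter(events, scheme="fractional"):
    """Mean accuracy per encounter index under two attribution schemes.

    events: iterable of (encounter_index, correct, n_clusters), where n_clusters
    is the number of depth-2 clusters the underlying response was attributed to.
    "fractional": weight each event 1 / n_clusters, so a multi-cluster
        response contributes a total weight of one across its clusters.
    "single": keep only responses attributed to exactly one cluster.
    """
    if scheme not in ("fractional", "single"):
        raise ValueError(f"unknown scheme: {scheme!r}")
    num = defaultdict(float)
    den = defaultdict(float)
    for k, correct, n_clusters in events:
        if scheme == "single" and n_clusters != 1:
            continue  # strict approximation drops multi-cluster responses
        weight = 1.0 / n_clusters
        num[k] += weight * correct
        den[k] += weight
    return {k: num[k] / den[k] for k in sorted(den)}
```

For example, with events `[(1, 1, 1), (1, 0, 2), (1, 0, 2)]`, a plain unweighted mean gives 1/3, the fractional scheme gives 0.5, and the single-cluster scheme keeps only one of the three events. Discarding every multi-cluster response in this way is what shrinks the sample under the strict approximation.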

References

  1. Bloom BS. The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring. Educational Researcher. 1984;13(6):4-16. doi:10.3102/0013189X013006004
  2. Kulik C-LC, Kulik JA, Bangert-Drowns RL. Effectiveness of Mastery Learning Programs: A Meta-Analysis. Review of Educational Research. 1990;60(2):265-299. doi:10.3102/00346543060002265
  3. Corbett AT, Anderson JR. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction. 1994;4:253-278. doi:10.1007/BF01099821
  4. Cen H, Koedinger KR, Junker B. Learning Factors Analysis: A general method for cognitive model evaluation and improvement. Lecture Notes in Computer Science. 2006;4053:164-175. doi:10.1007/11774303_17
  5. Larsen DP, Butler AC, Roediger HL III. Test-enhanced learning in medical education. Medical Education. 2008;42(10):959-966. doi:10.1111/j.1365-2923.2008.03124.x