Ora AI’s Active-Recall Videos Lift QBank Accuracy Up to 4.4 Points on 80,000 Same-Topic Responses.
Ora AI Research Team. A first-attempt QBank-accuracy lift associated with completing topically linked Ora videos.
Across an analytic sample of 80,238 first-attempt QBank responses, responses preceded by a completed Ora video on the vignette’s topic were answered correctly 63.7% of the time, versus 61.1% without one: a headline +2.66-point lift. The lift grows with dose, to +4.41 pts after three or more same-topic videos, and with elapsed time, to +4.05 pts at 30+ days, consistent with durable retention. It survives a within-student paired analysis (+1.64 pts, 95% CI 0.02–3.26), which rules out the simplest confound, “better students watch more videos.” Ora’s videos carry 1,420 mid-video active-recall questions across 97% of the catalog, a design grounded in the interpolated-testing literature.
- +2.66 pts accuracy lift on same-topic items
- +4.41 pts dose gap (0 vs 3+ same-topic videos)
- +4.05 pts at 30+ days after the video watch
- 1,420 active-recall questions across 355 of 366 videos
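Figure: accuracy lift by dose (number of same-topic videos completed). Monotonic across all four dose levels. 0 vs 3+ gap = +4.41 pts. First-attempt only.
Figure: accuracy lift by elapsed time since the video watch. The lift grows with elapsed time, peaking at +4.05 pts at 30+ days. The same-day reversal is consistent with selection toward weak topics.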
The paired analysis is restricted to the 117 students who met the ≥10-response threshold in both arms. For each student, the accuracy gap between same-topic items answered after a completed video and items answered without one was computed, and these gaps were averaged. Within-student mean lift = +1.64 pts (95% CI: 0.02–3.26, p ≈ 0.049); 58.1% of these students scored higher on same-topic items where they had completed a video first. In other words, the same student tends to do better on topics where they previously watched a same-topic video than on topics where they did not. The effect direction holds; the magnitude shrinks under the stricter design (+1.64 vs +2.66 pts).
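The paired computation described above can be sketched in Python. This is an illustration under stated assumptions, not Ora's analysis code: the function name `within_student_lift`, the `(student_id, exposed, correct)` row layout, and the returned field names are hypothetical. Only the per-arm threshold of 10 responses and the per-student averaging come from the text.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

MIN_RESPONSES = 10  # per-arm eligibility threshold stated in the brief


def within_student_lift(rows):
    """Paired within-student lift, in percentage points.

    rows: iterable of (student_id, exposed, correct) tuples, one per
    first-attempt response (hypothetical layout, assumed for this sketch).
    """
    # tallies[student][exposed] = [n_correct, n_total]
    tallies = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for student, exposed, correct in rows:
        cell = tallies[student][bool(exposed)]
        cell[0] += int(bool(correct))
        cell[1] += 1

    # One accuracy gap per student who has enough responses in BOTH arms.
    diffs = []
    for arms in tallies.values():
        (c_exp, n_exp), (c_un, n_un) = arms[True], arms[False]
        if n_exp >= MIN_RESPONSES and n_un >= MIN_RESPONSES:
            diffs.append(100.0 * (c_exp / n_exp - c_un / n_un))

    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two eligible students")

    m = mean(diffs)
    se = stdev(diffs) / sqrt(n)  # standard error of the mean gap
    return {
        "n_students": n,
        "mean_lift_pts": m,
        "std_error_pts": se,
        "t_stat": m / se if se > 0 else float("nan"),  # paired t, df = n - 1
        "share_positive": sum(d > 0 for d in diffs) / n,
    }
```

A 95% confidence interval follows as the mean lift plus or minus the two-sided critical t value for n − 1 degrees of freedom (about 1.98 for the 117 students reported here) times the standard error.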
What the brief says (and does not say)
This brief reports an associational pattern, not a causal effect: students self-select into videos based on their topic-by-topic strengths and Ora’s scheduler. The within-student paired analysis rules out the most obvious confounder (better students watch more videos and score higher in general), but residual within-student confounding remains: a student may watch a video on a given topic because they have more time or attention available that day, and that same availability may carry over into the subsequent vignette. The same-day reversal (−4.98 pts vs no-video baseline) is consistent with students preferentially attempting QBank items on topics they just watched because they know they are weak there; it is reported transparently rather than excluded.
A calibration check straddling the interactive-question layer rollout found the lift was +4.5 pts before and +4.5 pts after, essentially unchanged. We therefore cannot attribute additional lift to the interactive-question layer specifically with the current data; the layer is reported here as the operationalized design (grounded in the testing-effect literature), not as the measured source of the lift.
Method, intervention, and substrate
- Unit. First-attempt vignette response, deduped: each response counted once even when its vignette links to multiple lectures.
- Exposure. At least one completed video watch on a video that shares a parent lecture with the vignette, with completion before the response.
- Within-student check. Per-user accuracy difference between arms, restricted to users with ≥10 responses in each; paired-t inference.
- Interactive recall questions. Four-option multiple-choice prompts that auto-pause playback at predetermined timestamps.
- Coverage. 1,420 questions across 355 of 366 videos (97%); median 4 per video.
- Library. 290 Osmosis (CC BY-SA 4.0); 58 Anatomy (Ora-produced, VOKA visuals); 18 Ora original.
- Video activity. Voluntary video-watch events from the analytic sample; completion timestamped before the linked vignette response.
- Vignette responses. Analytic sample of 80,238 first-attempt responses on vignettes with at least one lecture link.
- Link graph. 19,953 video↔lecture edges joined to 140,054 vignette↔lecture edges over 7,279 lectures spanning both modalities.
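The exposure definition and lecture-mediated join listed above can be sketched as follows. This is a minimal illustration, not Ora's pipeline: the function names, the tuple layouts, and the use of plain comparable timestamps are assumptions made for the example, and the dose levels 0, 1, 2, 3+ are inferred from the "four dose levels" and "0 vs 3+" wording. The input is assumed to already contain first-attempt responses only.

```python
from collections import defaultdict


def build_topic_index(video_lecture_edges, vignette_lecture_edges):
    """Map each lecture-linked vignette to the videos sharing a parent lecture."""
    videos_by_lecture = defaultdict(set)
    for video_id, lecture_id in video_lecture_edges:
        videos_by_lecture[lecture_id].add(video_id)

    videos_by_vignette = defaultdict(set)
    for vignette_id, lecture_id in vignette_lecture_edges:
        # Set union deduplicates videos reached through several lectures.
        videos_by_vignette[vignette_id] |= videos_by_lecture.get(lecture_id, set())
    return videos_by_vignette


def accuracy_by_dose(responses, completions, video_lecture_edges, vignette_lecture_edges):
    """Return {dose: accuracy %}, where dose is capped at 3 (meaning 3+).

    responses:   (student, vignette, answered_at, correct), first attempts only
    completions: (student, video, completed_at)
    Both layouts are hypothetical, assumed for this sketch.
    """
    videos_by_vignette = build_topic_index(video_lecture_edges, vignette_lecture_edges)

    completions_by_student = defaultdict(list)
    for student, video, completed_at in completions:
        completions_by_student[student].append((video, completed_at))

    tally = defaultdict(lambda: [0, 0])  # dose -> [n_correct, n_total]
    for student, vignette, answered_at, correct in responses:
        if vignette not in videos_by_vignette:
            continue  # analytic sample: vignettes with at least one lecture link
        linked = videos_by_vignette[vignette]
        # Exposure: distinct same-topic videos completed strictly before the response.
        watched = {
            video
            for video, completed_at in completions_by_student.get(student, [])
            if video in linked and completed_at < answered_at
        }
        dose = min(len(watched), 3)
        tally[dose][0] += int(bool(correct))
        tally[dose][1] += 1  # each response is counted exactly once

    return {dose: 100.0 * c / n for dose, (c, n) in sorted(tally.items())}
```

Because each response contributes to exactly one dose bucket, a vignette linked to several lectures is still counted once, matching the deduplication rule in the Unit bullet.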
Observational, not randomized; selection bias is the primary threat to interpretation, and the within-student analysis mitigates but does not eliminate it. The link graph is lecture-based (videos and vignettes joined through their shared parent lecture), which is coarser than a direct video↔vignette link; direct cross-modal linking is on the roadmap, so this lecture-mediated signal is the best currently available rather than the cleanest possible one. The same-day reversal (−4.98 pts vs no-video baseline) is read as student topic-selection behavior rather than a harmful video effect, although these data cannot distinguish the two. The calibration check straddling the interactive-question layer rollout finds no additional lift attributable to that layer specifically; the layer’s mechanism evidence is the cited testing-effect literature, not these data. Per-question response capture is on the instrumentation roadmap and will enable a sharper analysis in a future iteration.
References
- Szpunar KK, Khan NY, Schacter DL. Interpolated memory tests reduce mind wandering and improve learning of online lectures. Proc Natl Acad Sci USA. 2013;110(16):6313–6317. doi:10.1073/pnas.1221764110
- Schacter DL, Szpunar KK. Enhancing attention and memory during video-recorded lectures. Scholarship of Teaching and Learning in Psychology. 2015;1(1):60–71. doi:10.1037/stl0000011
- Roediger HL III, Karpicke JD. The power of testing memory: basic research and implications for educational practice. Perspectives on Psychological Science. 2006;1(3):181–210. doi:10.1111/j.1745-6916.2006.00012.x
- Adesope OO, Trevisan DA, Sundararajan N. Rethinking the use of tests: a meta-analysis of practice testing. Review of Educational Research. 2017;87(3):659–701. doi:10.3102/0034654316689306
- Osmosis × Wiki Project Med Foundation. Videos from Osmosis on Wikimedia Commons, released under Creative Commons Attribution-ShareAlike 4.0 International. commons.wikimedia.org/wiki/Category:Videos_from_Osmosis