Ora AI Scaffolds Medical Students using 3.2 Million Context-Dependent Hyper-Specific Micro-Explanation Atoms.

Ora AI Research Team. Corpus-structural characterization.

Ora's study content is built on a concept-keyed explanation graph: 3,235,292 short, context-specific explanation atoms covering 338,114 distinct medical concepts across the platform's vignettes, flashcards, and library articles. Each atom explains one concept for one surface, so coherence lives at the concept layer rather than in shared text: a concept like macrophages gets a tailored explanation wherever it appears. Across the corpus, 84,165 concepts (24.9%) are explained on two or more of the three modalities and 36,034 (10.7%) on all three. To our knowledge this is the first published corpus-level characterization of a modular explanation-atom architecture at multi-million-atom scale in medical education.

Data are drawn from Ora's production content database. This is an aggregate corpus-structural analysis of the full population of atoms and links; no user data is involved. Cross-modal reuse is measured at the concept level (shared concept vocabulary); atoms are single-surface by design (see Method).

3.24M context-specific explanation atoms

338,114 distinct medical concepts covered

24.9% concepts taught across ≥2 modalities

10.7% concepts taught across all three

How widely concepts travel across the graph

338,114 concepts · full corpus

Concepts by number of content modalities

Each concept counted once, by how many of the three modalities carry an atom for it. Concepts in ≥2 modalities are the coherence layer.

Categories: 1 modality (instance-specific), 2 modalities, 3 modalities (fully cross-modal).

84,165 concepts (24.9%) span ≥2 modalities; 36,034 (10.7%) span all three. Counts are taken across the 337,702 concepts with at least one surface link.
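The single-modality and exactly-two-modality counts are not stated in the text, but they follow from the reported totals if every one of the 337,702 surface-linked concepts falls into exactly one bucket. A minimal Python sketch of that arithmetic; the variable names are illustrative, and the derived counts are an inference from the reported figures rather than separately published numbers:

```python
# Reported figures from the text above.
linked_concepts = 337_702   # concepts with at least one surface link
two_or_more = 84_165        # concepts spanning >=2 modalities
all_three = 36_034          # concepts spanning all three modalities

# Derived buckets (assumption: every linked concept spans 1, 2, or 3 modalities).
exactly_two = two_or_more - all_three
exactly_one = linked_concepts - two_or_more

buckets = {1: exactly_one, 2: exactly_two, 3: all_three}
shares = {k: round(100 * v / linked_concepts, 1) for k, v in buckets.items()}

for k in (1, 2, 3):
    label = "modality" if k == 1 else "modalities"
    print(f"{k} {label}: {buckets[k]:,} concepts ({shares[k]}%)")
```

Under that assumption, roughly three quarters of linked concepts sit on a single modality, which is the long tail described later in the text.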

Cross-modal concepts skew foundational

Share of concepts carrying a foundational-tier atom, by how many modalities the concept spans.

1 modality: 31% · 2 modalities: 51% · 3 modalities: 78%

Foundational-tier atoms are 30.6% of the corpus. The monotonic rise is the signature of a two-tier graph.
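The comparison behind this chart can be sketched as a grouped share: bucket concepts by how many modalities they span, then compute the fraction in each bucket that carries at least one foundational-tier atom. The record layout, function names, and toy rows below are hypothetical, chosen only to illustrate the calculation; they are not corpus data.

```python
from collections import defaultdict

def foundational_share_by_span(concepts):
    """Map modality count -> share of concepts with a foundational-tier atom.

    `concepts` is an iterable of (modality_count, has_foundational) pairs;
    this layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    foundational = defaultdict(int)
    for span, has_foundational in concepts:
        totals[span] += 1
        if has_foundational:
            foundational[span] += 1
    return {span: foundational[span] / totals[span] for span in sorted(totals)}

def is_monotonic_rise(shares):
    """True if the share strictly increases with modality count."""
    values = [shares[k] for k in sorted(shares)]
    return all(a < b for a, b in zip(values, values[1:]))

# Toy rows, not corpus figures.
toy = [(1, True), (1, False), (1, False), (1, False),
       (2, True), (2, False),
       (3, True), (3, True), (3, True), (3, False)]
shares = foundational_share_by_span(toy)
print(shares, is_monotonic_rise(shares))
```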

Atoms per content item

Median explanation atoms per vignette, article section, and flashcard. Coverage is near-complete on every surface.

Vignette: IQR 37–53

Article: IQR 19–37

Flashcard: IQR 11–17

Medians across surface-linked items; coverage 98% / 99% / 93%. Tight interquartile ranges indicate uniform, dense coverage.
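As a rough illustration of how per-surface figures like these could be computed, the sketch below summarizes a list of atoms-per-item counts with Python's standard `statistics` module. The function names and toy counts are hypothetical, the quartile method (`inclusive`) is a choice made here rather than one documented in the source, and "coverage" is assumed to mean the share of items with at least one linked atom.

```python
from statistics import median, quantiles

def summarize(counts):
    """Return (median, q1, q3) for a list of atoms-per-item counts."""
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    return median(counts), q1, q3

def coverage(counts):
    """Share of items with at least one linked atom (assumed definition)."""
    return sum(1 for c in counts if c > 0) / len(counts)

# Toy counts for one surface, not corpus data.
linked_items = [10, 12, 14, 16, 18]
med, q1, q3 = summarize(linked_items)
print(f"median {med}, IQR {q1}-{q3}")

# Coverage is computed over all items, including those with no atoms.
print(coverage([0, 3, 5, 7]))
```

Medians here are taken over surface-linked items only, matching the note above, while coverage uses every item on the surface.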

The architecture in one line

Ora's explanation graph reuses knowledge at the concept layer, not the row: every atom is written for one surface, while a shared vocabulary of 338,114 concepts ties the surfaces together. The more modalities a concept spans, the more likely it is foundational (31% → 51% → 78%), producing a two-tier graph: a foundational core recurring consistently across vignettes, flashcards, and articles, plus a long tail of instance-specific atoms on single surfaces.

The atom-graph architecture

An explanation atom is Ora's smallest content unit: a titled medical concept paired with a short explanation (~200 characters) and a set of associated terms. Atoms are not generic glossary entries pasted everywhere; each is written for the surface it sits on. The concept macrophages, for instance, is rendered three ways depending on context:

On a vignette

Macrophages

Large tissue-resident phagocytes derived from monocytes; they engulf pathogens and debris and act as professional antigen-presenting cells.

On a flashcard

Macrophages

In the liver, resident macrophages are Kupffer cells; during hepatitis they activate to phagocytose debris and apoptotic (Councilman) bodies.

In an article

Macrophages

Phagocytic cells that arrive later to clear up cellular debris.

Same concept, three context-tailored explanations. This is a deliberate design choice, not an absence of reuse: reusing one maximal explanation everywhere would force a single phrasing onto contexts that need different emphasis. Instead the graph keeps each explanation focused (consistent with cognitive-load-managed instructional design² and the knowledge-component framing from learning-engineering research¹) and enforces coherence through the shared concept vocabulary rather than shared text. It is one organizational strategy among several: textbooks and review articles optimize for editorial voice, depth, and narrative arc; the atom graph optimizes for cross-modal coherence and citation-level traceability, the substrate Ora's content and AI features point back to.

Method

Corpus

Population. All 3,235,292 atoms and ~3.2M surface links in the production content database; full population, no sampling.
Atom. A titled concept + short explanation + term list, in one of two scope tiers.
Surfaces. Vignettes, flashcards, and library-article sections; each atom links to exactly one by design, though about 0.3% link to none (see Limitations).
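One way to picture the atom record described above is as a small immutable data structure. This is a sketch under stated assumptions, not Ora's schema: the class and field names are hypothetical, and the second tier name (`INSTANCE_SPECIFIC`) is inferred from the text's "instance-specific" wording. The surface is optional because the Limitations section notes that a small share of atoms link to no surface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class Surface(Enum):
    VIGNETTE = "vignette"
    FLASHCARD = "flashcard"
    ARTICLE_SECTION = "article_section"

class Tier(Enum):
    FOUNDATIONAL = "foundational"
    INSTANCE_SPECIFIC = "instance_specific"  # assumed name for the second tier

@dataclass(frozen=True)
class Atom:
    title: str                          # the concept, e.g. "Macrophages"
    explanation: str                    # short text, roughly 200 characters
    terms: Tuple[str, ...]              # associated terms
    tier: Tier                          # one of two scope tiers
    surface: Optional[Surface] = None   # at most one surface per atom

# Example built from the vignette rendering quoted in the article;
# the term list and tier assignment are illustrative guesses.
atom = Atom(
    title="Macrophages",
    explanation=("Large tissue-resident phagocytes derived from monocytes; "
                 "they engulf pathogens and debris and act as professional "
                 "antigen-presenting cells."),
    terms=("phagocyte", "monocyte", "antigen presentation"),
    tier=Tier.FOUNDATIONAL,
    surface=Surface.VIGNETTE,
)
print(atom.title, atom.surface.value, len(atom.explanation))
```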

Measurement

Concept. The normalized atom title; 338,114 distinct concepts across the corpus.
Modality presence. A concept is present in a modality if any of its atoms link there.
Cross-modal. A concept present in ≥2 modalities (atom-weighted alternative in the log).
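The three measurement definitions above can be sketched in a few lines. The normalization rule shown (trim, lowercase, collapse whitespace) is an assumption, since the source does not specify how titles are normalized; the function names and toy atoms are likewise hypothetical.

```python
import re
from collections import defaultdict

def normalize_title(title: str) -> str:
    """Hypothetical normalization: trim, lowercase, collapse whitespace."""
    return re.sub(r"\s+", " ", title.strip().lower())

def modality_presence(atoms):
    """Map each concept to the set of modalities where any of its atoms link."""
    presence = defaultdict(set)
    for title, modality in atoms:
        presence[normalize_title(title)].add(modality)
    return dict(presence)

def cross_modal_share(presence, min_modalities=2):
    """Share of concepts present in at least `min_modalities` modalities."""
    hits = sum(1 for mods in presence.values() if len(mods) >= min_modalities)
    return hits / len(presence)

# Toy (title, modality) pairs, not corpus data.
toy_atoms = [
    ("Macrophages", "vignette"),
    ("macrophages ", "flashcard"),
    ("Macrophages", "article"),
    ("Kupffer cells", "flashcard"),
    ("Councilman bodies", "flashcard"),
    ("Councilman  bodies", "vignette"),
]
presence = modality_presence(toy_atoms)
print(cross_modal_share(presence))      # share spanning >=2 modalities
print(cross_modal_share(presence, 3))   # share spanning all three
```

Because identity rests on the normalized title alone, "Kupffer cells" and "Macrophages" remain separate concepts here even where they describe the same cells.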

Scope & comparators

Descriptive. Full-population characterization; no inferential test, no outcome or causation claim.
Coverage. Corpus spans all 18 organ-system topics; concepts are clean medical terms, not labels.
No product benchmark. Adjacent products (UpToDate, DynaMed, AMBOSS) publish no structural details; this is a standalone baseline.

Limitations

Concept identity is approximated by the normalized atom title, so genuine synonyms phrased differently count as separate concepts, making the 24.9% cross-modal share a conservative floor. The share depends on how “present in a modality” is defined: requiring at least three atoms per modality lowers it to ~9%; the headline uses a one-atom threshold. About 0.3% of atoms link to no surface and are excluded from coverage figures. The analysis characterizes content structure only; it makes no claim that atom density or cross-modal coverage improves learning outcomes (which would require a controlled design) and does not evaluate atom accuracy.
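The threshold sensitivity noted above can be illustrated with a small sweep: raising the number of atoms required for a concept to count as "present" in a modality can only shrink the set of cross-modal concepts. The sketch below is hypothetical in its names and toy data, and it assumes the denominator stays fixed at all concepts regardless of threshold, which the source does not state.

```python
from collections import Counter

def cross_modal_share(atom_links, min_atoms=1):
    """Share of concepts with >= `min_atoms` atoms in at least two modalities.

    `atom_links` is a list of (concept, modality) pairs, one per atom.
    The denominator is every concept in the list (an assumption).
    """
    per_pair = Counter(atom_links)             # (concept, modality) -> atom count
    concepts = {concept for concept, _ in per_pair}
    spans = Counter(concept for (concept, _), n in per_pair.items()
                    if n >= min_atoms)         # concept -> qualifying modalities
    return sum(1 for c in concepts if spans[c] >= 2) / len(concepts)

# Toy links, not corpus data.
links = ([("a", "vignette")] * 3 + [("a", "flashcard")] * 3
         + [("b", "vignette"), ("b", "article")]
         + [("c", "flashcard")] * 2
         + [("d", "article")])

for threshold in (1, 2, 3):
    print(threshold, cross_modal_share(links, min_atoms=threshold))
```

In the toy data, concept "b" counts as cross-modal at the one-atom threshold but drops out once two atoms per modality are required, mirroring the direction of the effect reported for the corpus.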

References

1. Koedinger KR, Corbett AT, Perfetti C. The Knowledge-Learning-Instruction framework: bridging the science-practice chasm to enhance robust student learning. Cognitive Science. 2012;36(5):757–798. doi:10.1111/j.1551-6709.2012.01245.x
2. Sweller J, van Merriënboer JJG, Paas FGWC. Cognitive architecture and instructional design. Educational Psychology Review. 1998;10(3):251–296. doi:10.1023/A:1022193728205
3. Wiley D, Bliss TJ, McEwen M. Open educational resources: a review of the literature. In: Handbook of Research on Educational Communications and Technology. Springer; 2014:781–789. doi:10.1007/978-1-4614-3185-5_63
4. Mayer RE. Multimedia Learning. 3rd ed. Cambridge University Press; 2020.
5. Liaison Committee on Medical Education. Functions and Structure of a Medical School, 2024–25. lcme.org/publications