In 2005, Nature compared Wikipedia with Encyclopaedia Britannica. The comparison became a historical threshold. Wikipedia could be read beside one of the great reference works of print culture, and encyclopedic authority moved into a public, editable, networked form.
After the Encyclopedia begins from that threshold and follows what happens when the same comparison is extended across language editions.
The project takes the forty-two scientific subjects from the Nature comparison and follows them through Wikipedia. Each subject is anchored through Wikidata, so the comparison begins from a shared cross-lingual object. From there, the article field starts to move.
One version expands the historical frame. Another compresses the subject into a technical note. Another follows medical consequences, political memory, local examples, biography, taxonomy, dates, citations or surrounding concepts. The interface keeps the subject aligned. The article body may tell another story.
Each article is treated as a local version of public knowledge, made from text, links, omissions, emphasis and surrounding entities. The project measures how far a subject remains the same when it is maintained across different languages, scripts, communities and editorial histories.
method
The corpus starts with the forty-two scientific subjects reviewed by Nature in 2005. Each subject is connected to its current Wikidata item, giving the audit a stable cross-lingual anchor before the language editions are compared.
Each local article is analysed as text and as an entity environment. The text layer shows how the subject is described. The QID layer shows which linked objects travel with it.
fixed topic set
Use the forty-two scientific subjects from the original Nature comparison as the starting corpus.
wikidata anchor
Connect each subject to its current Wikidata item, so every language edition points back to one cross-lingual object.
local editions
Collect the available Wikipedia versions and treat every language edition as the article actually received by its reader.
text layer
Extract and clean the article body, then place each version in a comparable multilingual embedding space.
language baseline
Calculate a corpus baseline for each language, separating general language position from topic-specific movement.
entity layer
Resolve article links into Wikidata QIDs and measure which entities pull each version away from the topic centre.
topic displacement
Measure how far each local article moves from the shared topic environment after language-level correction.
case audit
Read the strongest outliers manually, separating meaningful drift from sparse evidence, identifiers, dates and homonym noise.
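The text-side steps above (text layer, language baseline, topic displacement) can be sketched in a few lines. This is a minimal illustration, not the project's pipeline. It assumes each article already has an embedding vector, that the language baseline is the mean vector of all articles in that language, that the topic home is the mean of the corrected vectors across languages, and that distance is cosine distance. The function names and all four choices are hypothetical.

```python
from math import sqrt


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def subtract(a, b):
    """Component-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def topic_displacement(embeddings):
    """embeddings maps (topic, lang) -> vector.

    Returns a dict mapping (topic, lang) -> distance from the topic home,
    measured after subtracting the language baseline (assumed: language mean).
    """
    # Language baseline: mean of every article vector in that language.
    by_lang = {}
    for (_topic, lang), vec in embeddings.items():
        by_lang.setdefault(lang, []).append(vec)
    baseline = {lang: mean_vector(vecs) for lang, vecs in by_lang.items()}

    # Language-level correction: remove the baseline from each article.
    corrected = {
        key: subtract(vec, baseline[key[1]]) for key, vec in embeddings.items()
    }

    # Topic home: mean of the corrected vectors across languages.
    by_topic = {}
    for (topic, _lang), vec in corrected.items():
        by_topic.setdefault(topic, []).append(vec)
    home = {topic: mean_vector(vecs) for topic, vecs in by_topic.items()}

    return {key: cosine_distance(vec, home[key[0]]) for key, vec in corrected.items()}
```

With this correction, two editions that differ only by a constant language offset land on the same corrected vector, so their displacement is near zero. An edition that frames the subject differently keeps a non-zero distance and becomes a candidate for the case audit.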
measurement logic
topic identity = shared Wikidata anchor
article object = local text + local linked entity field
topic displacement = local article environment − topic home environment
case strength = distance + evidence volume + interpretable drivers − noise risk
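The case strength line can be read as a simple additive score. The sketch below assumes each component is already scaled to the range 0 to 1 and that all four carry equal weight. Neither assumption comes from the audit itself, and the names `CaseEvidence`, `case_strength` and `rank_cases` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseEvidence:
    """One candidate case; every field is assumed to be scaled to 0..1."""
    distance: float               # topic displacement of the local article
    evidence_volume: float        # how much text and how many links support it
    interpretable_drivers: float  # share of drift carried by readable entities
    noise_risk: float             # identifiers, dates, homonyms, sparse data


def case_strength(case: CaseEvidence) -> float:
    """distance + evidence volume + interpretable drivers - noise risk."""
    return (
        case.distance
        + case.evidence_volume
        + case.interpretable_drivers
        - case.noise_risk
    )


def rank_cases(cases):
    """cases maps a label to CaseEvidence; returns labels, strongest first."""
    return sorted(cases, key=lambda label: case_strength(cases[label]), reverse=True)
```

A large distance alone does not make a strong case under this reading: an outlier with thin evidence and high noise risk ranks below a moderate outlier that is well supported and easy to interpret.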
interpretation rule
A shared anchor gives the comparison a formal starting point. The audit then asks what the reader actually receives in each language edition: which claims, links, examples, emphases and omissions make up the local article.
The current QID audit covers 806 article versions across 20 language editions. The broader project remains organised around the full multilingual reach of Wikipedia and the forty-two Nature subjects.
current finding
The first audit shows that several subjects remain formally attached to the same Wikidata topic while developing sharply different linked-entity environments across languages.
The difference appears as domain drift, historical framing, compression, local emphasis, homonym contamination or metadata noise.
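A difference in linked-entity environments can be made concrete with plain set operations. The sketch below assumes each edition of one topic has already been reduced to a set of QIDs, that the topic home is the set of QIDs linked from at least half of the editions, and that distance is Jaccard distance. The threshold, the distance measure and the function names are illustrative assumptions.

```python
from collections import Counter


def topic_home(qids_by_lang, min_share=0.5):
    """QIDs linked from at least min_share of the editions (assumed rule)."""
    counts = Counter(q for qids in qids_by_lang.values() for q in set(qids))
    threshold = min_share * len(qids_by_lang)
    return {q for q, n in counts.items() if n >= threshold}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def entity_drift(qids_by_lang, min_share=0.5):
    """Per edition: distance from the topic home and the entities behind it."""
    home = topic_home(qids_by_lang, min_share)
    report = {}
    for lang, qids in qids_by_lang.items():
        local = set(qids)
        report[lang] = {
            "distance": jaccard_distance(local, home),
            "local_only": sorted(local - home),  # entities pulling the edition away
            "missing": sorted(home - local),     # shared entities the edition omits
        }
    return report
```

The `local_only` list is what a manual reader would inspect first: it shows whether the drift comes from a recognisable domain or historical frame, or from identifiers, dates and homonyms.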
script layer
The audit treats writing systems as part of the knowledge problem. Latin, Cyrillic, Arabic, Hebrew, Devanagari, Chinese, Japanese, Korean and Thai scripts organise article surfaces differently.
The comparison therefore reads article bodies and linked entities together, instead of forcing every edition through English labels first.
evidence rule
A claim counts through the edition where it appears. A Dutch reader receives the Dutch article, an Arabic reader receives the Arabic article, and a Chinese reader receives the Chinese article.
The method follows the local evidence field before translating it into a shared comparison.
output
- cross-lingual discrepancy ledger
- editorial essay and publication pitch
- large-scale data visualisations
- research archive
- installation wall