05 / 2026

After the Encyclopedia

A subject can keep its name while losing its shape.

starting point: Nature, 2005

subjects: 42

current audit: 806 articles

language set: 20 editions

linked entities: 132,365 rows


In 2005, Nature compared Wikipedia with Encyclopaedia Britannica. The comparison became a historical threshold. Wikipedia could be read beside one of the great reference works of print culture, and encyclopedic authority moved into a public, editable, networked form.

After the Encyclopedia begins from that threshold and follows what happens when the problem grows across language editions.

The project takes the forty-two scientific subjects from the Nature comparison and follows them through Wikipedia. Each subject is anchored through Wikidata, so the comparison begins from a shared cross-lingual object. From there, the article field starts to move.

One version expands the historical frame. Another compresses the subject into a technical note. Another follows medical consequences, political memory, local examples, biography, taxonomy, dates, citations or surrounding concepts. The interface keeps the subject aligned. The article body may tell another story.

Each article is treated as a local version of public knowledge, made from text, links, omissions, emphasis and surrounding entities. The project measures how far a subject remains the same when it is maintained across different languages, scripts, communities and editorial histories.

method

The corpus starts with the forty-two scientific subjects reviewed by Nature in 2005. Each subject is connected to its current Wikidata item, giving the audit a stable cross-lingual anchor before the language editions are compared.

Each local article is analysed as text and as an entity environment. The text layer shows how the subject is described. The QID layer shows which linked objects travel with it.
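The entity environment can be pictured with a small sketch. It is an illustration under assumptions, not the project's code: the QID strings are placeholders and not audit rows, and `topic_home` and `local_drivers` are hypothetical names. Each edition is reduced to the set of QIDs it links, the home environment is taken to be the share of editions that link each QID, and the rarest QIDs in one edition are listed as candidate drivers of its drift.

```python
from collections import Counter


def topic_home(editions: dict[str, set[str]]) -> dict[str, float]:
    """Share of editions linking each QID (assumed 'home environment')."""
    counts = Counter(qid for qids in editions.values() for qid in qids)
    total = len(editions)
    return {qid: count / total for qid, count in counts.items()}


def local_drivers(edition: str, editions: dict[str, set[str]], top: int = 3) -> list[str]:
    """QIDs of one edition that are least shared with the other editions."""
    home = topic_home(editions)
    # Sort by share first, then by QID so the order is deterministic.
    return sorted(editions[edition], key=lambda qid: (home[qid], qid))[:top]


# Placeholder data: language codes are real, QID strings are invented.
editions = {
    "en": {"Qa", "Qb", "Qc"},
    "nl": {"Qa", "Qb", "Qd"},
    "ar": {"Qa", "Qe", "Qf"},
}

print(local_drivers("ar", editions, top=2))
```

In this toy case the Arabic edition shares only one entity with the others, so its two unshared links surface first. A real audit would still need the manual reading described in the method, since rare links can be identifiers, dates or homonyms as easily as meaningful drift.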

fixed topic set

Use the forty-two scientific subjects from the original Nature comparison as the starting corpus.

wikidata anchor

Connect each subject to its current Wikidata item, so every language edition points back to one cross-lingual object.

local editions

Collect the available Wikipedia versions and treat every language edition as the article actually received by its reader.

text layer

Extract and clean the article body, then place each version in a comparable multilingual embedding space.

language baseline

Calculate a corpus baseline for each language, separating general language position from topic-specific movement.

entity layer

Resolve article links into Wikidata QIDs and measure which entities pull each version away from the topic centre.

topic displacement

Measure how far each local article moves from the shared topic environment after language-level correction.

case audit

Read the strongest outliers manually, separating meaningful drift from sparse evidence, identifiers, dates and homonym noise.
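The language baseline and topic displacement steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the vectors are toy values standing in for multilingual embeddings, Euclidean distance is an assumed choice of metric, the baseline is assumed to be a per-language mean, and all function names are hypothetical.

```python
import math

Vector = list[float]
Key = tuple[str, str]  # (language, subject)


def mean(vectors: list[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def displacement(articles: dict[Key, Vector]) -> dict[Key, float]:
    """Distance of each article from its topic centre after language correction."""
    # 1. Language baseline: mean vector of every article in that language.
    by_language: dict[str, list[Vector]] = {}
    for (language, _), vector in articles.items():
        by_language.setdefault(language, []).append(vector)
    baseline = {lang: mean(vectors) for lang, vectors in by_language.items()}

    # 2. Subtract the baseline so general language position drops out.
    corrected = {
        key: [x - b for x, b in zip(vector, baseline[key[0]])]
        for key, vector in articles.items()
    }

    # 3. Topic centre: mean of the corrected vectors for each subject.
    by_subject: dict[str, list[Vector]] = {}
    for (_, subject), vector in corrected.items():
        by_subject.setdefault(subject, []).append(vector)
    centre = {subj: mean(vectors) for subj, vectors in by_subject.items()}

    # 4. Displacement: distance from the corrected article to its topic centre.
    return {key: math.dist(vector, centre[key[1]]) for key, vector in corrected.items()}


# Toy data: language "B" is language "A" shifted by a constant offset.
articles = {
    ("A", "s1"): [1.0, 0.0], ("A", "s2"): [0.0, 1.0],
    ("B", "s1"): [3.0, 2.0], ("B", "s2"): [2.0, 3.0],
}
scores = displacement(articles)
```

In the toy data the two languages differ only by a constant shift, so every displacement comes out as zero once the baseline is removed. That is the point of the correction: what remains after it should be movement specific to the topic.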

measurement logic

topic identity = shared Wikidata anchor

article object = local text + local linked entity field

topic displacement = local article environment − topic home environment

case strength = distance + evidence volume + interpretable drivers − noise risk
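Read literally, the last line is an additive score. The sketch below is one possible reading, not the audit's actual scoring: it assumes each component has already been scaled to the range 0 to 1, it assumes equal weights by default, and `Case` and `case_strength` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Case:
    distance: float               # topic displacement, assumed scaled to 0..1
    evidence_volume: float        # amount of linked-entity evidence, 0..1
    interpretable_drivers: float  # share of drivers a reader can explain, 0..1
    noise_risk: float             # identifiers, dates, homonyms, 0..1


def case_strength(case: Case, weights: tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """distance + evidence volume + interpretable drivers - noise risk."""
    w_distance, w_evidence, w_drivers, w_noise = weights
    return (
        w_distance * case.distance
        + w_evidence * case.evidence_volume
        + w_drivers * case.interpretable_drivers
        - w_noise * case.noise_risk
    )


# Two invented cases with the same distance but different noise risk.
clean = Case(distance=0.8, evidence_volume=0.5, interpretable_drivers=0.5, noise_risk=0.25)
noisy = Case(distance=0.8, evidence_volume=0.5, interpretable_drivers=0.5, noise_risk=0.75)
ranked = sorted([noisy, clean], key=case_strength, reverse=True)
```

Two cases at the same distance separate once noise risk is counted, which is why a large displacement alone does not make a strong case.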

interpretation rule

A shared anchor gives the comparison a formal starting point. The audit then asks what the reader actually receives in each language edition: which claims, links, examples, emphases and omissions make up the local article.

The current QID audit covers 806 article versions across 20 language editions. The broader project remains organised around the full multilingual reach of Wikipedia and the forty-two Nature subjects.

current finding

The first audit shows that several subjects remain formally attached to the same Wikidata topic while developing sharply different linked-entity environments across languages.

The difference appears as domain drift, historical framing, compression, local emphasis, homonym contamination or metadata noise.

script layer

The audit treats writing systems as part of the knowledge problem. Latin, Cyrillic, Arabic, Hebrew, Devanagari, Chinese, Japanese, Korean and Thai scripts organise article surfaces differently.

The comparison therefore reads article bodies and linked entities together, instead of forcing every edition through English labels first.

evidence rule

A claim counts through the edition where it appears. A Dutch reader receives the Dutch article, an Arabic reader receives the Arabic article, and a Chinese reader receives the Chinese article.

The method follows the local evidence field before translating it into a shared comparison.

output

cross-lingual discrepancy ledger
editorial essay and publication pitch
large-scale data visualisations
research archive
installation wall

current QID audit / language × subject

article objects as evidence strips

black = linked entity volume
signal = high displacement case
void = structurally thin evidence

Each strip is one article version in the current QID audit. Length is evidence volume, not proof by itself.

42 subjects / 806 articles / 132,365 QID rows

agent orange

The shared anchor points to the herbicide used during the Vietnam War. The article field changes by language.

One version behaves as chemistry and toxicology. Another becomes war history, public health, environmental residue or homonym noise.

case 01 / topic identity drift

neural network

The machine-learning topic can be pulled back toward biology.

In one article environment, artificial intelligence travels with neuron, brain, cortex, synapse and nervous system.

case 02 / metaphor as structure

mendeleev

The old accuracy dispute returns as a question of article identity.

Mendeleev appears as biography, chemistry history, periodic-table node, Russian science or an environment of elements.

case 03 / after 2005

punctuated equilibrium

The strongest early outlier is pulled toward equilibrium as a general field.

Evolutionary theory begins to travel with stability, balance, Nash equilibrium, chemical equilibrium and system dynamics.

case 04 / term drift

west nile

The virus and the disease field require a careful identity audit.

The data shows how taxonomy, symptom, vector, geography and public-health framing can split a subject into neighbouring objects.

case 05 / anchor discipline

publication

The next step is a public discrepancy ledger: cases, distances, drivers, evidence volume and noise risk.

The essay can then move from the 2005 reliability debate toward topic identity at multilingual scale.

case 06 / publication track

Wikipedia gives shared subjects a common interface. The articles reveal how many public realities that interface has to hold.
