
LostMa

navigating the currents of culture

Publications

Preprint

On the transmission of texts: written cultures as complex systems

2025-06-03
Jean-Baptiste Camps, Julien Randon-Furling, Ulysse Godreau

Our knowledge of past cultures relies considerably on written material. For centuries, texts have been copied, altered, then transmitted or lost; eventually, from surviving documents, philologists attempt to reconstruct text phylogenies (“stemmata”) and past written cultures. Nonetheless, fundamental questions on the extent of losses, the representativeness of surviving artefacts, and the dynamics of text genealogies have remained open since the earliest days of philology. To address these, we radically rethink the study of text transmission through a complexity science approach, integrating stochastic modelling, computer simulations, and data analysis, in a parsimonious mindset akin to statistical physics and evolutionary biology. Thus, we design models that are simple and general, while accounting for diachrony and other key aspects of the dynamical process underlying text phylogenies, such as the extinction of entire branches or trees. On the well-known case study of Medieval French chivalric literature, we find that up to 60% of texts and 99% of manuscripts were lost (consistent with recent synchronic “biodiversity” analyses). We also settle a hundred-year-old controversy on the bifidity of stemmata. Further, our null model suggests that pure chance (“drift”) is not the only mechanism at play, and we provide a theoretical and empirical framework for future investigation.

Article

Saved in translation? Diversity shared in French and Dutch medieval literature

2026-02-19
Mike Kestemont, Folgert Karsdorp, Jean-Baptiste Camps, Remco Sleiderink, Anne Chao

Over the past millennium, each of the three centuries of most rapid demographic growth in the West coincided with the diffusion of a new communications technology. This paper examines the hypothesis of Harold Innis (1894–1952) that there is two-way feedback between such innovations and economic growth. First, detailed historical evidence is studied. Second, Innis’s ideas are translated into a formal growth model. Finally, the model is simulated and its predictions compared with historical data. The results suggest a technological explanation for the long cycles of the period 1000–1975 and for the puzzling productivity growth slowdown in industrialized countries after 1975.

Conference Proceedings

Transmission and Survival of Iberian Patristic Texts (3rd–5th Centuries)

2025-12-11
Émilie Guidi, Théo Moins, Jean-Baptiste Camps

This paper analyses the textual transmission of the Church Fathers from the Iberian Peninsula. The corpus is characterised by formal (prose, verse) and generic (sermons, letters, chronicles, epics) heterogeneity. Our computational analyses reveal two contrasting transmission dynamics: prose texts are more numerous but are transmitted by fewer witnesses, sometimes only via their inclusion in medieval collections. Poetic texts, fewer in number, have generated a higher number of witnesses, likely due to their integration in large literary projects. We model these dynamics using two approaches: probabilistic unseen species models, which estimate an upper bound on text and witness survival rates and indicate low corpus diversity and evenness; and stochastic birth-death models, which explore cultural evolutionary patterns in text and witness populations. Results suggest a text survival rate below 67% (potentially closer to 20%) and a manuscript survival rate below 10% (possibly under 1%). Notably, these estimates diverge from prior findings for Medieval French literature, where unseen species and birth-death models yielded similar results. This discrepancy suggests that diachrony (specifically, the broader chronological range of the patristic corpus) plays a key role in shaping transmission outcomes. Our findings also highlight limitations of the birth-death model, particularly in accounting for highly successful texts and for temporal variations in production/destruction rates.
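To illustrate how an unseen species model can bound a survival rate from above, here is a minimal sketch using the Chao1 estimator. The abstract does not say which estimator the authors used, so Chao1 is an assumption, and the function name `chao1` and the toy witness counts are hypothetical. Chao1 gives a lower bound on the number of texts that once existed, so dividing the observed number of texts by it gives an upper bound on the survival rate.

```python
from collections import Counter


def chao1(witness_counts):
    """Chao1 lower-bound estimate of the total number of texts (seen + unseen).

    witness_counts: number of surviving witnesses for each surviving text.
    """
    counts = [c for c in witness_counts if c > 0]
    s_obs = len(counts)            # texts observed at least once
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]      # texts known from exactly 1 or 2 witnesses
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Fallback when no text has exactly two witnesses
    return s_obs + f1 * (f1 - 1) / 2


# Toy data (invented for illustration, not taken from the paper)
toy_counts = [1, 1, 1, 1, 2, 2, 3, 5, 8]
estimated_total = chao1(toy_counts)
survival_upper_bound = len(toy_counts) / estimated_total
print(f"estimated texts: {estimated_total:.1f}, "
      f"survival rate at most {survival_upper_bound:.0%}")
```

The estimate depends only on how many texts survive in exactly one or two witnesses, which is why corpora dominated by singletons yield low survival bounds.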

Conference Proceedings

Why Do Older Books Survive (Sometimes)? Modelling the Time Distribution of Manuscripts with a Birth-Death Approach

2025-12-11
Ulysse Godreau, Théo Moins, Kelly Christensen, Jean-Baptiste Camps

Understanding the survival of ancient manuscripts dating back to different periods and centuries is crucial for gathering insights into historical textual traditions and, more generally, cultural history. Previous studies have modelled the transmission of texts, particularly in manuscript form, as a birth-death process, in which the existing manuscript witnesses of a given text are simultaneously being copied and destroyed at given rates. However, these models have not fully accounted for key properties observed in real historical written traditions, such as the temporal distribution of surviving manuscripts and the heavy-tailed distribution of surviving witnesses by text. In this study, we refine the birth-death process to better explain these dynamics. We investigate the role of extrinsic historical factors in the transmission of texts by using variable copy and destruction rates that reflect, for example, fluctuations in the book market or disruptions like wars. Additionally, we look into the effect of intrinsic features of the books themselves and the uses for which they were designed, whether to be stored on a library shelf or to be intensively used and copied. We test those refinements against empirical data collected for medieval traditions. Preliminary results indicate that these enhancements allow us to establish variations in production and destruction rates that align with known macro-level historical dynamics. This revised approach helps explain why we do in fact preserve (some of) the older manuscripts, and not only their most recent descendants, offering a more comprehensive understanding of manuscript survival.
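The baseline birth-death process described above can be sketched as a small discrete-time simulation. This is an illustrative simplification with constant rates, not the authors' refined model: the function name `simulate_tradition`, the per-step probabilities, and the choice of discrete time steps are all assumptions.

```python
import random


def simulate_tradition(copy_prob, loss_prob, steps, rng):
    """Simulate one text tradition; return creation steps of surviving manuscripts.

    At every step each living manuscript is copied with probability copy_prob
    and, independently, destroyed with probability loss_prob.
    """
    alive = [0]  # creation step of each living manuscript; one original at step 0
    for t in range(1, steps + 1):
        next_alive = []
        for born in alive:
            if rng.random() < copy_prob:
                next_alive.append(t)       # a new copy is made at step t
            if rng.random() >= loss_prob:
                next_alive.append(born)    # the exemplar survives this step
        alive = next_alive
        if not alive:
            break                          # the whole tradition is extinct
    return alive


rng = random.Random(42)
# Hypothetical near-critical rates, chosen only to keep populations small
runs = [simulate_tradition(0.05, 0.05, 100, rng) for _ in range(1000)]
surviving = [r for r in runs if r]
print(f"{len(surviving)} of {len(runs)} traditions survive")
if surviving:
    oldest = min(min(r) for r in surviving)
    print(f"oldest surviving manuscript was created at step {oldest}")
```

Recording creation steps rather than simple counts makes it possible to inspect the time distribution of survivors, which is the quantity the abstract says constant-rate models fail to reproduce.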

Communication

One tree to Yule them all? Reflexions on intertextuality and text transmission

2025-07-17
Jean-Baptiste Camps, Kelly Christensen, Ulysse Godreau, Théo Moins

The production of narrative fictions often balances between innovation, where new characters and stories are created, and derivation, where new works are based on existing ones. Medieval chivalric narratives are a typical example, in which many texts are situated inside existing cycles (e.g. the Grail cycle) and build off one another. This work will report on an ongoing experiment in modelling medieval manuscript transmission. Distributions of witnesses per text exhibit a Pareto-like character, for which simple birth-and-death models cannot account. We show that they are superseded by models including speciation events (such as the Yule model). An Approximate Bayesian Computation (ABC) algorithm is then used to estimate parameters for Medieval Western European literature based on a corpus of 30,000 texts produced between 1100 and 1500.
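To show why adding speciation-like events can produce a Pareto-like distribution of witnesses per text, here is a minimal sketch of a Yule-Simon-style growth process. It is a stand-in for illustration, not the authors' model: it has no manuscript losses and no ABC fitting, and the function name `simulate_yule_simon` and the parameter value are hypothetical.

```python
import random
from collections import Counter


def simulate_yule_simon(new_text_prob, n_events, rng):
    """Return witnesses-per-text counts, largest first.

    At each event, with probability new_text_prob a new text appears with one
    witness (a speciation-like event); otherwise a uniformly chosen existing
    witness is copied, so texts with many witnesses attract more copies.
    """
    owners = [0]   # owners[i] = index of the text carried by witness i
    n_texts = 1
    for _ in range(n_events):
        if rng.random() < new_text_prob:
            owners.append(n_texts)              # first witness of a new text
            n_texts += 1
        else:
            owners.append(rng.choice(owners))   # copy an existing witness
    return sorted(Counter(owners).values(), reverse=True)


rng = random.Random(7)
sizes = simulate_yule_simon(0.2, 5000, rng)
singletons = sum(1 for s in sizes if s == 1)
print(f"{len(sizes)} texts, largest tradition has {sizes[0]} witnesses, "
      f"{singletons} texts have a single witness")
```

A few very large traditions alongside many single-witness texts is the heavy-tailed pattern the abstract refers to.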

One tree to Yule them all? Reflexions on intertextuality and text transmission

In a multilingual database of literary works, users will want to find a story's various versions. Therefore, we must conceptualize the threshold between narrative content (story) and its expression in language. While specially designed for evolving narrative traditions, our solution is grounded in the Functional Requirements for Bibliographic Records model.

Greening your database of literary works: How to avoid reinventing vocabularies, in favor of sustainable, reusable models

More than a mere continuation of current philological lines of questioning by other means, computational philology can, in certain contexts, bring about shifts or transformations in the research paradigms of the textual sciences. Without leaving the evolutionary paradigm that goes back to the origins of the discipline, philology can thus adopt methodological approaches that combine hypothesis testing, models, and data analysis. The massive growth of data made possible by artificial intelligence opens up new analytical perspectives, conducive to macrostructural, comparative, or long-term studies.

“Bifidité” et évolution: philologie computationnelle des textes en langue d’oïl
Communication

Style in Eight Syllables: Metric Annotation and Stylometry of Chrétien de Troyes and Contemporaries

2025-06-05
Jean-Baptiste Camps, Florian Cafiero, Philippe Chaumet-Riffaud, Damien Conceicao, Ulysse Godreau, Émilie Guidi, Alexandre Lionnet, Théo Moins, Pierre-Alexandre Nistor, Benedetta Salvati

Authorship attribution for medieval texts such as those of Chrétien de Troyes poses unique challenges due to textual transmission, language variation, and limited reference corpora. In this context, it might be useful to draw as much stylistic information as possible from the texts, beyond the most common features such as function words. This paper presents an ongoing project to include metrical annotation (with a focus on prosody) in the stylometric analysis of Medieval French, to enhance support vector machine (SVM)-based authorship attribution. The case at hand focuses on the attribution of the works of Chrétien de Troyes and his contemporaries.

Style in Eight Syllables: Metric Annotation and Stylometry of Chrétien de Troyes and Contemporaries

This study focuses on the problem of multilingual medieval text alignment, which presents specific challenges due to the absence of modern punctuation in the texts and the non-standard forms of medieval languages. In order to align several witnesses from the multilingual tradition of the prose Lancelot, we first develop an automatic text segmenter based on BERT and then align the produced segments using Bertalign. This alignment is then used to produce stemmatological hypotheses, using phylogenetic methods. The aligned sequences are clustered independently by two human annotators and a clustering algorithm (DBSCAN), and the resulting variant tables are submitted to maximum parsimony analysis in order to produce trees. The trees are then compared and discussed in light of philological knowledge. Results tend to show that automatically clustered sequences can provide results comparable to those of human annotation.

Textual Transmission without Borders: Multiple Multilingual Alignment and Stemmatology of the “Lancelot en prose” (Medieval French, Castilian, Italian)
Communication

The Authorship of the Works of Chrétien de Troyes: A Stylometric Examination

2024-06-06
Jean-Baptiste Camps, Benedetta Salvati, Gonzalo Freijedo Aduna, Donghan Bian, Gaëtan Drouet, Eglantine Gaglione, Emilie Guidi, Carolina Macedo, Yaelle Zribi, Florian Cafiero

Numerous controversies have emerged about the exact delimitation of the works written by the hand of Chrétien de Troyes, the founder of the French roman and the one who introduced the theme of the Holy Grail into medieval literature. In this paper, we collect a corpus made of previous digital editions and of texts OCRed and post-corrected by ourselves, and analyse it through authorship attribution techniques. Our results seem to broadly confirm expected results, such as the way Chrétien’s Lancelot was finished by Godefroy de Laigny. But our analyses also add new insights. Our rolling SVM suggests the possibility that the poorly esteemed and very disputed Guillaume d’Angleterre was written by multiple authors, indicating a possible completion of passages previously written by Chrétien. They also highlight the fact that Perceval’s ending could have been completed the same way the Lancelot was, by someone who is not Chrétien. Further analyses will be needed to confirm these new findings.

Conference Proceedings

Make Love or War? Monitoring the Thematic Evolution of Medieval French Narratives

2023-12-08
Jean-Baptiste Camps, Nicolas Baumard, Pierre-Carl Langlais, Olivier Morin, Thibault Clérice, Jade Norindr

In this paper, we test a famous conjecture in literary history put forward by Seignobos and de Rougemont, according to which the French central medieval period (12th–13th centuries) is characterized by an important increase in the cultural importance of love. To do that, we focus on the large and culturally important body of manuscripts containing medieval French long narrative fictions, in particular epics (chansons de geste, of the Matter of France) and romances (chiefly romans on the Matters of Britain and of Rome), both in verse and in prose, from the 12th to the 15th century. We introduce the largest available corpus of these texts, the Corpus of Medieval French Epics and Romances, composed of digitised manuscripts drawn from Gallica, and processed through layout analysis and handwritten text recognition. We then use semantic representations based on embeddings to monitor the place given to love and violence in this corpus through time. We observe that themes (such as the relation between love and death) and emblematic works well identified by literary history do indeed play a central part in the representation of love in the corpus, but our modelling also points to the characteristic nature of more overlooked works. Variation in time seems to show that there is indeed a phase of expansion of love in these fictions, in the 13th and early 14th century, followed by a period of contraction that seems to correlate with the Crisis of the Late Middle Ages.

Conference Proceedings

Lost Manuscripts and Extinct Texts: A Dynamic Model of Cultural Transmission

2022-10-26
Jean-Baptiste Camps, Julien Randon-Furling

How did written works evolve, disappear or survive down through the ages? In this paper, we propose a unified, formal framework for two fundamental questions in the study of the transmission of texts: how much was lost or preserved from all works of the past, and why do their genealogies (their "phylogenetic trees") present the very peculiar shapes that we observe or, more precisely, reconstruct? We argue here that these questions are similar to those encountered in evolutionary biology, and can be described in terms of "genetic" drift and "natural" selection. Through agent-based models, we show that the properties observed by philologists since the 1800s can be simulated and compared with data gathered for ancient and medieval texts across Europe, in order to obtain plausible estimates of the number of works and manuscripts that existed and were lost.
