
LostMa

navigating the currents of culture

Releases

Code

Heurist API

This Python package provides an API wrapper for Heurist as well as a command-line interface (CLI) that Extracts, Transforms, and Loads (ETL) data from a Heurist database server into a local DuckDB database file.

The Python package is published on the Python Package Index (PyPI). Documentation is available online.
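The ETL flow described above can be pictured as three small functions. This is a minimal sketch, not the package's code. The record fields (`id`, `record_type`, `title`) and the functions `extract`, `transform`, and `load` are hypothetical. `extract` returns hard-coded placeholder records instead of calling a Heurist server, and `sqlite3` from the Python standard library stands in for DuckDB so the example runs without extra dependencies.

```python
from __future__ import annotations

import sqlite3
from collections import defaultdict


def extract() -> list[dict]:
    """Extract: hypothetical stand-in for fetching records from a Heurist server."""
    return [
        {"id": 1, "record_type": "Work", "title": "Example work A"},
        {"id": 2, "record_type": "Manuscript", "title": "Example manuscript"},
        {"id": 3, "record_type": "Work", "title": "Example work B"},
    ]


def transform(records: list[dict]) -> dict[str, list[tuple[int, str]]]:
    """Transform: group records into one list of rows per record type."""
    tables: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for rec in records:
        tables[rec["record_type"].lower()].append((rec["id"], rec["title"]))
    return dict(tables)


def load(tables: dict[str, list[tuple[int, str]]], conn: sqlite3.Connection) -> None:
    """Load: create one table per record type and insert its rows."""
    for name, rows in tables.items():
        # Table names come from the transform step above, not from user input.
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" (id INTEGER PRIMARY KEY, title TEXT)')
        conn.executemany(f'INSERT INTO "{name}" VALUES (?, ?)', rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # the real CLI writes to a DuckDB file instead
    load(transform(extract()), conn)
    print(conn.execute('SELECT COUNT(*) FROM "work"').fetchone()[0])  # prints 2
```

Keeping the three stages separate means each one can be checked on its own, and the load step only depends on a database connection object.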


Scrapers for Cultural Heritage sites

Jonas

Scrape metadata about manuscripts and works from the Jonas website and its Répertoire des textes et livres français et occitans (850-1550) database, which is managed by the Institut de Recherche et d'Histoire des Textes (IRHT).

Provide the scraper with the URL of a manuscript (jonas.irht.cnrs.fr/manuscrit/) or a work (jonas.irht.cnrs.fr/oeuvre/), and it returns relational tables describing that manuscript or work, depending on the URL, together with its related witnesses.
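Because the page type is encoded in the URL path, a scraper can decide what kind of record it is dealing with before fetching anything. The sketch below shows one way to make that decision. The function `classify_jonas_url` is hypothetical, and treating the path segment after `manuscrit/` or `oeuvre/` as the record identifier is an assumption about the URL layout.

```python
from __future__ import annotations

from urllib.parse import urlparse

JONAS_HOST = "jonas.irht.cnrs.fr"
# Path prefixes named in the description above, mapped to record kinds.
KINDS = {"manuscrit": "manuscript", "oeuvre": "work"}


def classify_jonas_url(url: str) -> tuple[str, str]:
    """Return (kind, identifier) for a Jonas manuscript or work URL (hypothetical helper)."""
    # Accept URLs written without a scheme, e.g. "jonas.irht.cnrs.fr/oeuvre/...".
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.hostname != JONAS_HOST:
        raise ValueError(f"not a Jonas URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    # Assumption: the identifier is the segment right after the prefix.
    if len(parts) < 2 or parts[0] not in KINDS:
        raise ValueError(f"expected a /manuscrit/ or /oeuvre/ URL: {url}")
    return KINDS[parts[0]], parts[1]


if __name__ == "__main__":
    print(classify_jonas_url("https://jonas.irht.cnrs.fr/oeuvre/4500"))  # ('work', '4500')
```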

Catalogue collectif de France, Archives et Manuscrits

Scrape bibliographic metadata from notices in the Catalogue collectif de France (CCfr) and/or the Bibliothèque nationale de France's Archives et Manuscrits catalogue.

Both scrapers are installed with the same Python package and require the URL of the notice to be scraped.
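Since both scrapers take a notice URL, a single entry point could route each URL to the right scraper by hostname. This is a hypothetical sketch: `which_catalogue` is not part of the package, and the hostnames below are assumed to be the ones the two catalogues use for their notices, so they should be checked against real notice URLs.

```python
from urllib.parse import urlparse

# Assumed hostnames for the two catalogues; verify against actual notice URLs.
CATALOGUE_HOSTS = {
    "ccfr.bnf.fr": "CCfr",
    "archivesetmanuscrits.bnf.fr": "Archives et Manuscrits",
}


def which_catalogue(notice_url: str) -> str:
    """Name the catalogue a notice URL belongs to (hypothetical helper)."""
    host = urlparse(notice_url).hostname or ""
    try:
        return CATALOGUE_HOSTS[host]
    except KeyError:
        raise ValueError(f"unsupported catalogue host: {host!r}") from None
```

Routing on the hostname rather than on the path keeps the decision independent of how each catalogue structures its notice identifiers.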

Archives et Manuscrits (search)

A CLI that runs the advanced search feature of the Bibliothèque nationale de France's Archives et Manuscrits website.

Given a department (e.g. Arsenal) and a shelfmark (cote), the tool finds the document's notice in the Archives et Manuscrits catalogue. It is particularly useful in combination with the Archives et Manuscrits scraper, which can then take the discovered notice URL as its input.
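The search-then-scrape workflow amounts to composing two steps. In this sketch, `find_and_scrape` and the `search` and `scrape` callables are hypothetical placeholders for the search CLI and the Archives et Manuscrits scraper; their real names, signatures, and outputs may differ.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical step signatures, not the tools' actual interfaces.
SearchFn = Callable[[str, str], Optional[str]]  # (department, shelfmark) -> notice URL or None
ScrapeFn = Callable[[str], dict]                # notice URL -> scraped metadata


def find_and_scrape(department: str, shelfmark: str,
                    search: SearchFn, scrape: ScrapeFn) -> dict | None:
    """Look up a notice by department and shelfmark, then scrape it if one is found."""
    url = search(department, shelfmark)
    if url is None:
        return None
    return scrape(url)


if __name__ == "__main__":
    # Stub steps standing in for the real tools; the shelfmark and URL are placeholders.
    notices = {("Arsenal", "Ms-0000"): "https://example.org/notice/0000"}
    result = find_and_scrape(
        "Arsenal", "Ms-0000",
        search=lambda dept, cote: notices.get((dept, cote)),
        scrape=lambda url: {"notice_url": url},
    )
    print(result)  # {'notice_url': 'https://example.org/notice/0000'}
```

Passing the two steps in as arguments keeps the composition free of network access, so it can be exercised with stubs like the ones above.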

Datasets

ML Models