Releases
Code
Heurist API
This Python package provides an API wrapper for Heurist as well as a command-line interface (CLI) that Extracts, Transforms, and Loads (ETL) data from a Heurist database server into a local DuckDB database file.
The Python package is published on the Python Package Index (PyPI). Documentation is available here.
Scrapers for Cultural Heritage sites
Jonas
Scrape metadata about manuscripts and works on the website Jonas and its Répertoire des textes et livres français et occitans (850-1550) database, which is managed by the Institut de Recherche et d'Histoire des Textes (IRHT).
Provide the scraper with the URL of a manuscript (jonas.irht.cnrs.fr/manuscrit/) or work (jonas.irht.cnrs.fr/oeuvre/) and receive relational tables describing the manuscript or work, depending on the URL, and the witnesses related to it.
https://github.com/LostMa-ERC/JonasScraper
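As a rough illustration of how the two URL patterns above can be told apart, here is a minimal sketch that uses only the Python standard library. The function name `classify_jonas_url` and the numeric record ID in the usage lines are hypothetical and are not taken from the JonasScraper package, whose actual interface may differ.

```python
from urllib.parse import urlparse

JONAS_HOST = "jonas.irht.cnrs.fr"


def classify_jonas_url(url: str) -> str:
    """Return 'manuscript' or 'work' for a Jonas record URL.

    Hypothetical helper for illustration; not part of JonasScraper.
    """
    # Accept URLs written without a scheme, as they appear in the text above.
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.hostname != JONAS_HOST:
        raise ValueError(f"Not a Jonas URL: {url}")
    # The first path segment distinguishes the two record types.
    if parsed.path.startswith("/manuscrit/"):
        return "manuscript"
    if parsed.path.startswith("/oeuvre/"):
        return "work"
    raise ValueError(f"Neither a manuscript nor a work URL: {url}")


# Usage with a made-up record ID (123 is a placeholder).
kind_a = classify_jonas_url("https://jonas.irht.cnrs.fr/manuscrit/123")
kind_b = classify_jonas_url("jonas.irht.cnrs.fr/oeuvre/123")
```

A scraper could use a check like this to decide which set of relational tables to build before fetching the page.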
Catalogue collectif de France, Archives et Manuscrits
Scrape bibliographic metadata from notices in the Catalogue collectif de France (CCfr) and/or the Bibliothèque nationale de France's Archives et Manuscrits catalogue.
Both scrapers are installed with the same Python package and require the URL of the notice to be scraped.
https://github.com/LostMa-ERC/french-catalogue-scraper
Archives et Manuscrits (search)
A CLI that runs the advanced search feature of the Bibliothèque nationale de France's Archives et Manuscrits website.
Using the department (e.g. Arsenal) and the shelfmark (cote), find the notice for the document in the Archives et Manuscrits catalogue. This tool is particularly useful when combined with the Archives et Manuscrits scraper, which takes the discovered notice URL as its input.
https://github.com/LostMa-ERC/search-archives-manuscrits
Datasets
ML Models