Python API Client

Before using the heurist module, review the instructions on how to install the Python package.

Demos

Explore Jupyter notebooks in the GitHub repository's folder demos/.

Coding with the ETL workflow

You can integrate a Heurist database's tables into your Python application with just 3 lines of code! (But 1 line will be really long, so we break it up for readability.)

1. Create a DuckDB connection

Create a connection to a DuckDB database.

import duckdb

# Line 1 : DuckDB connection
conn = duckdb.connect()

To preserve the data that heurist extracts, transforms, and loads into the database, give the connect() method a path to a file, i.e. duckdb.connect("heurist.db"). If no argument is provided, duckdb.connect() creates an in-memory database connection.

2. Open a `HeuristAPIConnection`

To optimise your connection to the Heurist database, we will log in only one time and keep that connection live as a Python context.

Use your login credentials to create a HeuristAPIConnection context, which returns a client.

from heurist.api.connection import HeuristAPIConnection

# Line 2 (broken for readability) : API Connection
with HeuristAPIConnection(
    db = HEURIST_DATABASE,
    login = HEURIST_LOGIN,
    password = HEURIST_PASSWORD
) as client:
    _ # ... see below code block

3. Run the ETL process

from heurist.workflows import extract_transform_load

# Line 2 (broken for readability) : API Connection
with HeuristAPIConnection(
    # ... see above code block
) as client:
    # Line 3 : ETL process
    extract_transform_load(client = client, duckdb_connection = conn)

To load all your custom records, meaning all those of types in your "My record types" group, run the extract_transform_load() function with all its defaults, simply providing your (1) API connection and (2) DuckDB connection.

To explore all of the extract_transform_load function's parameters, see the source code.

Summary

See all 3 lines of code in action below:

import duckdb
from heurist.api.connection import HeuristAPIConnection
from heurist.workflows import extract_transform_load

# Line 1 : DuckDB connection
conn = duckdb.connect()

# Line 2 (broken for readbility) : API connection
with HeuristAPIConnection(
    db = HEURIST_DATABASE,
    login = HEURIST_LOGIN,
    password = HEURIST_PASSWORD
) as client:
    # Line 3 : ETL process
    extract_transform_load(client = client, duckdb_connection = conn)

Multiple record type groups

If you want to recover records of types in multiple record type groups, list all of them in the record_group_names parameter.

extract_transform_load(
    client = client,
    duckdb_connection = conn,
    record_group_names = ("My record types", "Place, features")
)

Check your data's validation

Running the extract_transform_load function causes log files to be generated.

Read the ./validation.log file to review all the records in the Heurist database that did not pass the data validation and were not loaded into the DuckDB database. For more information, see the section on logs.

DuckDB Python as a pivot format

Having loaded the Heurist records into a DuckDB database, you can begin taking advantage of DuckDB's Python client.

Heurist DuckDB -> to something else

Pandas dataframe

DuckDB can convert all of its relations (tables, query results, etc.) into pandas dataframes with df(). Because many data science techniques and methods are used to pandas, this is a very useful way to interact with the Heurist data and one of the reasons the heurist package uses DuckDB as a pivot format.

Convert a Heurist table, which has been loaded into the DuckDB database, into a pandas dataframe.

>>> conn.table("Witness").df()
      H-ID  type_id  ...  review_status TRM-ID             review_note
0    47500      105  ...                  9697                  Check.
1    47897      105  ...                  9697                  Check.
2    47756      105  ...                  9697                  Check.
3    47978      105  ...                  9697                  Check.
4    47552      105  ...                  9697                  Check.
..     ...      ...  ...                   ...                     ...
168  48051      105  ...                  9697  Kienhorst zegt in BNM?
169  47458      105  ...                  9697                    Ref.
170  47454      105  ...                  9697                    Ref.
171  47433      105  ...                  9697             Check, ref.
172  47420      105  ...                  9697           Check, ghost.

[173 rows x 36 columns]

Or convert SQL query results into a pandas dataframe.

>>> conn.sql('SELECT * FROM Witness w JOIN TextTable t ON w."is_manifestation_of H-ID" = t."H-ID"').df()
      H-ID  type_id  ...  review_status TRM-ID_1 review_note_1
0    47792      105  ...                  9697.0        Check.
1    48339      105  ...                  9697.0         Check
2    47906      105  ...                  9697.0        Check.
3    48356      105  ...                  9697.0         Check
4    47912      105  ...                  9697.0        Check.
..     ...      ...  ...                     ...           ...
168  47742      105  ...                  9697.0        Check.
169  47738      105  ...                  9697.0        Check.
170  47725      105  ...                  9697.0        Check.
171  47734      105  ...                  9697.0        Check.
172  47637      105  ...                  9697.0        Check.

[173 rows x 89 columns]

Parquet, CSV files

DuckDB's Python client also lets you export relations into modern file formats commonly used in data science, notably CSV and parquet.

# Execute some SQL query on your database,
# storing the result in a relation object
rel = conn.sql("SELECT * FROM TextTable LIMIT 10")

# Write the result to a csv file
rel.write_csv("my_results.csv")

# Write the result to a parquet file
rel.write_parquet("my_results.parquet")

Something else -> Heurist DuckDB

DuckDB's Python client can also ingest data from other sources and Python objects. This is useful if you want to enrich / complement your Heurist data with information not available in your Heurist database.

Read about DuckDB's Python data ingestion.

Other data files

You can create new tables or views in your database by reading datasets from external files.

conn.execute("CREATE TABLE OtherDataset AS SELECT * FROM read_csv('other_dataset.csv')")

conn.execute("CREATE TABLE OtherDataset AS SELECT * FROM read_parquet('other_dataset.parquet')")

conn.execute("CREATE TABLE OtherDataset AS SELECT * FROM read_json('other_dataset.json')")

Then, from within your database connection, you can join your Heurist data with the external dataset.

sql = """
SELECT *
FROM MyHeuristRecord h
LEFT JOIN OtherDataset o
    ON h.url = o.url
"""
rel = conn.sql(sql)

Other dataframes

In your DuckDB database connection, you can also select dataframes you've created with pandas and join them with your Heurist data.

import pandas as pd

new_dataframe = pd.DataFrame.from_dict({
    "insee_2025_population_total_all": [76_452, 1_663_974],
    "insee_2025_population_total_women": [38_173, 868_376],
    "name": ["Lozère", "Hautes-de-Seine"]
})
rel = conn.sql("""
SELECT *
FROM new_dataframe df
JOIN MyHeuristRecord h
    ON df.name = h.department
""")