basedb

Classes:

  • HeuristDatabase

    Base class for loading the original Heurist database structure.

HeuristDatabase

HeuristDatabase(
    hml_xml: bytes,
    conn: DuckDBPyConnection | None = None,
    db: str = ":memory:",
)

Base class for loading the original Heurist database structure.

Create a DuckDB database connection and populate the database with the five base tables that make up the Heurist database structure.

Parameters:

  • hml_xml

    (bytes) –

    Heurist database structure exported in XML format.

  • conn

    (DuckDBPyConnection | None, default: None ) –

    A DuckDB database connection. Defaults to None.

  • db

    (str, default: ':memory:' ) –

    Path to the DuckDB database. Defaults to ":memory:".

Methods:

  • create

    Create an empty table in the DuckDB database connection based on a Pydantic model.

  • delete_existing_table

    If the table already exists in the DuckDB database, drop it.

  • describe_record_schema

    Join the tables 'dty' (detail), 'rst' (record structure), and 'rty' (record type) to describe a specific record type.

  • trim_xml_bytes

    Remove any extra whitespace from potentially malformed XML.

create

create(name: str, model: BaseXmlModel) -> None

Create an empty table in the DuckDB database connection based on a Pydantic model.

Examples:

>>> # Set up the database class and parse a table model.
>>> from heurist.mock_data import DB_STRUCTURE_XML
>>> db = HeuristDatabase(hml_xml=DB_STRUCTURE_XML)
>>> model = db.hml.RecTypeGroups.rtg
>>>
>>> # Create a table for the Record Type Group (rtg) table model.
>>> db.create(name="rtg", model=model)
>>> shape = db.conn.table("rtg").fetchall()
>>> # The Record Type Group (rtg) table should have 11 columns.
>>> len(shape)
11

Parameters:

  • name

    (str) –

    Name of the table to create.

  • model

    (BaseXmlModel) –

    A Pydantic XML model.
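
As a rough illustration of how an empty table could be derived from a model's fields, the sketch below builds a `CREATE TABLE` statement from type annotations. It is not the library's implementation: a standard-library dataclass stands in for the Pydantic XML model, and `RecTypeGroupSketch`, `SQL_TYPES`, and `create_table_sql` are hypothetical names chosen for this example. The three columns are illustrative and do not reproduce the full `rtg` model.

```python
from dataclasses import dataclass, fields

# Assumed mapping from Python annotations to DuckDB column types.
SQL_TYPES = {int: "INTEGER", str: "VARCHAR", bool: "BOOLEAN", float: "DOUBLE"}


@dataclass
class RecTypeGroupSketch:
    """Hypothetical stand-in for a table model; not the real rtg model."""

    rtg_ID: int
    rtg_Name: str
    rtg_Order: int


def create_table_sql(name: str, model: type) -> str:
    """Build a CREATE TABLE statement with one column per model field."""
    columns = ", ".join(
        # Unknown annotations fall back to VARCHAR in this sketch.
        f'"{f.name}" {SQL_TYPES.get(f.type, "VARCHAR")}'
        for f in fields(model)
    )
    return f'CREATE TABLE "{name}" ({columns})'


ddl = create_table_sql("rtg", RecTypeGroupSketch)
print(ddl)
```

The resulting statement could then be passed to a DuckDB connection's `execute` method, which would create the table without inserting any rows.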

delete_existing_table

delete_existing_table(table_name: str) -> None

If the table already exists in the DuckDB database, drop it.

Parameters:

  • table_name

    (str) –

    Name of the table to potentially drop.

describe_record_schema

describe_record_schema(rty_ID: int) -> DuckDBPyRelation

Join the tables 'dty' (detail), 'rst' (record structure), and 'rty' (record type) to collect all the relevant information for a specific record type. Each detail is also given the label and description of its associated section / separator, if any.

Parameters:

  • rty_ID

    (int) –

    ID of the targeted record type.

Returns:

  • DuckDBPyRelation ( DuckDBPyRelation ) –

    A DuckDB Python relation that can be queried or converted.

trim_xml_bytes classmethod

trim_xml_bytes(xml: bytes) -> bytes

Remove any extra whitespace from potentially malformed XML.

Parameters:

  • xml

    (bytes) –

    Heurist database structure exported in XML format.

Returns:

  • bytes ( bytes ) –

    Validated Heurist database structure in XML format.
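
One reason such trimming matters is that whitespace before the XML declaration makes a document unparseable. The sketch below shows this with the standard library, assuming the trimming amounts to stripping leading and trailing whitespace; the real method may do more. `trim_xml_bytes_sketch` is a hypothetical standalone function and the sample XML is made up for the example.

```python
import xml.etree.ElementTree as ET


def trim_xml_bytes_sketch(xml: bytes) -> bytes:
    """Strip leading and trailing whitespace from raw XML bytes."""
    return xml.strip()


# Made-up export with stray whitespace before the XML declaration.
raw = (
    b"\n  <?xml version='1.0' encoding='UTF-8'?>\n"
    b"<hml_structure><RecTypeGroups/></hml_structure>\n\n"
)

# The untrimmed bytes are rejected because the declaration is not first.
try:
    ET.fromstring(raw)
    raw_parses = True
except ET.ParseError:
    raw_parses = False

trimmed = trim_xml_bytes_sketch(raw)
root = ET.fromstring(trimmed)
print(raw_parses, root.tag)
```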