basedb
Classes:
- HeuristDatabase – Base class for loading the original Heurist database structure.
HeuristDatabase
Base class for loading the original Heurist database structure.
Create a DuckDB database connection and populate the database with the five base tables that make up the Heurist database structure.
Parameters:
- hml_xml (bytes) – Heurist database structure exported in XML format.
- conn (DuckDBPyConnection | None, default: None) – A DuckDB database connection. Defaults to None.
- db (str, default: ':memory:') – Path to the DuckDB database. Defaults to ":memory:".
Methods:
- create – Create an empty table in the DuckDB database connection based on a Pydantic model.
- delete_existing_table – If the table already exists in the DuckDB database, drop it.
- describe_record_schema – Join the tables 'dty' (detail), 'rst' (record structure), and 'rty' (record type) to describe a specific record type.
- trim_xml_bytes – Remove any extra whitespace from potentially malformed XML.
create
create(name: str, model: BaseXmlModel) -> None
Create an empty table in the DuckDB database connection based on a Pydantic model.
Examples:
>>> # Set up the database class and parse a table model.
>>> from heurist.mock_data import DB_STRUCTURE_XML
>>> db = HeuristDatabase(hml_xml=DB_STRUCTURE_XML)
>>> model = db.hml.RecTypeGroups.rtg
>>>
>>> # Create a table for the Record Type Group (rtg) table model.
>>> db.create(name="rtg", model=model)
>>> shape = db.conn.table("rtg").fetchall()
>>> # The Record Type Group (rtg) table should have 11 columns.
>>> len(shape)
11
Parameters:
- name (str) – Name of the table to create.
- model (BaseXmlModel) – A Pydantic XML model.
delete_existing_table
delete_existing_table(table_name: str) -> None
If the table already exists in the DuckDB database, drop it.
Parameters:
- table_name (str) – Name of the table to potentially drop.
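The drop-if-present behaviour described above can be sketched with a single `DROP TABLE IF EXISTS` statement. This is an illustration only, not the library's implementation: the standard-library `sqlite3` module stands in for DuckDB so the snippet runs without extra dependencies, and the helper names `drop_if_exists` and `table_exists` are hypothetical.

```python
import sqlite3


def drop_if_exists(conn: sqlite3.Connection, table_name: str) -> None:
    """Drop `table_name` if it exists; do nothing otherwise (hypothetical helper)."""
    # Identifiers cannot be bound as SQL parameters, so quote the name instead.
    quoted = '"' + table_name.replace('"', '""') + '"'
    conn.execute(f"DROP TABLE IF EXISTS {quoted}")


def table_exists(conn: sqlite3.Connection, table_name: str) -> bool:
    """Check SQLite's catalogue for a table (sqlite3-specific, for the demo only)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rtg (rtg_ID INTEGER)")

drop_if_exists(conn, "rtg")  # the table existed and is now dropped
drop_if_exists(conn, "rtg")  # no such table any more, and no error is raised
```

Because the statement is a no-op when the table is absent, a caller can run it before every table creation without first checking the catalogue.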
describe_record_schema
describe_record_schema(rty_ID: int) -> DuckDBPyRelation
Join the tables 'dty' (detail), 'rst' (record structure), and 'rty' (record type) to get all the relevant information for a specific record type. The label and description of the section/separator associated with each detail, if any, are also added.
Parameters:
- rty_ID (int) – ID of the targeted record type.
Returns:
- DuckDBPyRelation – A DuckDB Python relation that can be queried or converted.
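To make the shape of this join concrete, here is a minimal sketch over toy tables. It is not the library's query: the column names (`rst_RecTypeID`, `rst_DetailTypeID`, `rst_DisplayName`, `rst_DisplayOrder`, `dty_Type`, and so on) and the sample rows are assumptions for illustration, the standard-library `sqlite3` module stands in for DuckDB, the section/separator enrichment is left out, and the hypothetical function `describe_schema_sketch` returns a list of tuples rather than a `DuckDBPyRelation`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy versions of the three tables; all columns and rows are assumed for the demo.
conn.executescript(
    """
    CREATE TABLE rty (rty_ID INTEGER, rty_Name TEXT);
    CREATE TABLE dty (dty_ID INTEGER, dty_Name TEXT, dty_Type TEXT);
    CREATE TABLE rst (rst_RecTypeID INTEGER, rst_DetailTypeID INTEGER,
                      rst_DisplayName TEXT, rst_DisplayOrder INTEGER);
    INSERT INTO rty VALUES (10, 'Person'), (11, 'Place');
    INSERT INTO dty VALUES (1, 'Name', 'freetext'), (2, 'Birth date', 'date');
    INSERT INTO rst VALUES (10, 1, 'Full name', 1), (10, 2, 'Born', 2),
                           (11, 1, 'Place name', 1);
    """
)


def describe_schema_sketch(conn: sqlite3.Connection, rty_id: int) -> list:
    """Return one row per detail attached to the given record type."""
    query = """
        SELECT rty.rty_Name, rst.rst_DisplayName, dty.dty_Name, dty.dty_Type
        FROM rst
        JOIN rty ON rst.rst_RecTypeID = rty.rty_ID
        JOIN dty ON rst.rst_DetailTypeID = dty.dty_ID
        WHERE rty.rty_ID = ?
        ORDER BY rst.rst_DisplayOrder
    """
    return conn.execute(query, (rty_id,)).fetchall()


rows = describe_schema_sketch(conn, 10)
for row in rows:
    print(row)
```

The record structure table acts as the link between record types and detail types, which is why the query starts from `rst` and joins outward to the other two tables.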
trim_xml_bytes
classmethod
trim_xml_bytes(xml: bytes) -> bytes
Remove any extra whitespace from potentially malformed XML.
Parameters:
- xml (bytes) – Heurist database structure exported in XML format.
Returns:
- bytes – Validated Heurist database structure in XML format.
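One reason trimming matters is that an XML declaration must sit at the very start of the document, so leading whitespace alone is enough to make a parser reject the export. The sketch below illustrates that under stated assumptions: it treats "extra whitespace" as leading and trailing whitespace only, which may differ from what the library does, and the function name `trim_xml_sketch` and the sample payload are hypothetical.

```python
import xml.etree.ElementTree as ET


def trim_xml_sketch(xml: bytes) -> bytes:
    """Strip surrounding whitespace and check the result is well-formed XML."""
    trimmed = xml.strip()
    # Raises xml.etree.ElementTree.ParseError if the trimmed bytes do not parse.
    ET.fromstring(trimmed)
    return trimmed


# Hypothetical export with blank lines before the XML declaration.
raw = (
    b"\n\n   <?xml version='1.0' encoding='UTF-8'?>\n"
    b"<structure><RecTypeGroups/></structure>\n  "
)

try:
    ET.fromstring(raw)
    parsed_untrimmed = True
except ET.ParseError:
    # The declaration is not at the start of the input, so parsing fails.
    parsed_untrimmed = False

clean = trim_xml_sketch(raw)
```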