basedb
Classes:
- HeuristDatabase – Base class for loading the original Heurist database structure.
HeuristDatabase
Base class for loading the original Heurist database structure.
Create a DuckDB database connection and populate the database with the five base tables that make up the Heurist database structure.
Parameters:
- hml_xml (bytes) – Heurist database structure exported in XML format.
- conn (DuckDBPyConnection | None, default: None) – A DuckDB database connection. Defaults to None.
- db (str, default: ':memory:') – Path to the DuckDB database. Defaults to ":memory:".
Methods:
- create – Create an empty table in the DuckDB database connection based on a Pydantic model.
- delete_existing_table – If the table already exists in the DuckDB database, drop it.
- describe_record_schema – Join the tables 'dty' (detail), 'rst' (record structure), 'rty' (record type) to get all the relevant information for a specific record type.
- trim_xml_bytes – Remove any extra whitespace from potentially malformed XML.
create
create(name: str, model: BaseXmlModel) -> None
Create an empty table in the DuckDB database connection based on a Pydantic model.
Examples:
>>> # Set up the database class and parse a table model.
>>> from mock_data import DB_STRUCTURE_XML
>>> db = HeuristDatabase(hml_xml=DB_STRUCTURE_XML)
>>> model = db.hml.RecTypeGroups.rtg
>>>
>>> # Create a table for the Record Type Group (rtg) table model.
>>> db.create(name="rtg", model=model)
>>> shape = db.conn.table("rtg").fetchall()
>>> # The Record Type Group (rtg) table should have 11 columns.
>>> len(shape)
11
Parameters:
- name (str) – Name of the table to create.
- model (BaseXmlModel) – A Pydantic XML model.
delete_existing_table
delete_existing_table(table_name: str) -> None
If the table already exists in the DuckDB database, drop it.
Parameters:
- table_name (str) – Name of the table to potentially drop.
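The drop-if-present behaviour described above can be sketched as follows. This is a minimal illustration and not the library's implementation: it uses Python's built-in sqlite3 module as a stand-in so that it runs without extra dependencies, and the helpers `drop_if_exists` and `table_names` are hypothetical names. The `DROP TABLE IF EXISTS` statement itself is also valid SQL in DuckDB.

```python
import sqlite3


def drop_if_exists(conn: sqlite3.Connection, table_name: str) -> None:
    """Drop table_name if it is present; do nothing otherwise (hypothetical helper)."""
    # Identifiers cannot be bound as SQL parameters, so quote the name instead.
    quoted = '"' + table_name.replace('"', '""') + '"'
    conn.execute(f"DROP TABLE IF EXISTS {quoted}")


def table_names(conn: sqlite3.Connection) -> list[str]:
    """List user tables. This catalog query is specific to SQLite."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return [row[0] for row in rows]


# sqlite3 stands in for DuckDB here (assumption made for portability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rtg (rtg_ID INTEGER)")

drop_if_exists(conn, "rtg")  # the table exists, so it is dropped
drop_if_exists(conn, "rtg")  # the table is gone, so this call is a no-op
remaining = table_names(conn)
```

Because the statement is idempotent, a caller can run it before every table creation without first checking whether the table is there.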
describe_record_schema
describe_record_schema(rty_ID: int) -> DuckDBPyRelation
Join the tables 'dty' (detail), 'rst' (record structure), 'rty' (record type) to get all the relevant information for a specific record type, plus add the label and description of the section / separator associated with each detail (if any).
Parameters:
- rty_ID (int) – ID of the targeted record type.
Returns:
- DuckDBPyRelation – A DuckDB Python relation that can be queried or converted.
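To make the relationship between the three tables concrete, here is a rough sketch of such a join. Everything in it is an assumption for illustration: the tables are reduced to a handful of columns, the sample rows are invented, the function name `describe_record_schema_sketch` is hypothetical, and Python's built-in sqlite3 stands in for DuckDB so that the snippet runs without extra dependencies. The section / separator labels mentioned above are left out.

```python
import sqlite3

# sqlite3 stands in for DuckDB; table columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rty (rty_ID INTEGER, rty_Name TEXT);
    CREATE TABLE dty (dty_ID INTEGER, dty_Name TEXT, dty_Type TEXT);
    CREATE TABLE rst (
        rst_RecTypeID INTEGER,
        rst_DetailTypeID INTEGER,
        rst_DisplayName TEXT
    );

    INSERT INTO rty VALUES (10, 'Person'), (11, 'Place');
    INSERT INTO dty VALUES (1, 'Name', 'freetext'), (2, 'Birth date', 'date');
    INSERT INTO rst VALUES
        (10, 1, 'Full name'),
        (10, 2, 'Date of birth'),
        (11, 1, 'Place name');
    """
)


def describe_record_schema_sketch(conn: sqlite3.Connection, rty_id: int) -> list[tuple]:
    """Return one row per detail attached to the given record type."""
    query = """
        SELECT rty.rty_Name, rst.rst_DisplayName, dty.dty_Type
        FROM rst
        JOIN rty ON rst.rst_RecTypeID = rty.rty_ID
        JOIN dty ON rst.rst_DetailTypeID = dty.dty_ID
        WHERE rty.rty_ID = ?
        ORDER BY dty.dty_ID
    """
    return conn.execute(query, (rty_id,)).fetchall()


rows = describe_record_schema_sketch(conn, 10)
```

In this toy layout the record structure table acts as the link between a record type and its details, which is why the join starts from 'rst'.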
trim_xml_bytes
classmethod
trim_xml_bytes(xml: bytes) -> bytes
Remove any extra whitespace from potentially malformed XML.
Parameters:
- xml (bytes) – Heurist database structure exported in XML format.
Returns:
- bytes – Validated Heurist database structure in XML format.
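One reason this kind of trimming matters is that whitespace before the XML declaration makes a document unparseable. The sketch below shows that effect with the standard library. It is an assumption about the kind of cleanup involved, not the method's actual code: `trim_xml_bytes_sketch` is a hypothetical stand-in, and the sample document is a made-up placeholder.

```python
import xml.etree.ElementTree as ET


def trim_xml_bytes_sketch(xml: bytes) -> bytes:
    """Strip leading and trailing whitespace (hypothetical stand-in)."""
    return xml.strip()


# Placeholder document with stray whitespace before the XML declaration.
raw = (
    b"\n   <?xml version='1.0' encoding='UTF-8'?>\n"
    b"<hml_structure><rtg/></hml_structure>\n\n"
)

# The untrimmed bytes are rejected by the parser.
try:
    ET.fromstring(raw)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# After trimming, the same bytes parse normally.
root = ET.fromstring(trim_xml_bytes_sketch(raw))
```

Here `parsed_raw` ends up False, while the trimmed bytes yield a root element that can be handed on to a model parser.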