Download record groups
Require date fields to have metadata
heurist download -f NEW_DATABASE.db --require-compound-dates
Heurist offers a rich way of registering compound date information, including date ranges, uncertain dates, as well as details about a fuzzy date's certainty and probability distribution. However, Heurist also allows users to directly type a date estimate, i.e. a year (1448
), in the record's field.
If you want to confirm that all your records' dates have compound dates, with comparable metadata, use the heurist download
command with the --require-compound-dates
flag. This flag imposes an extra step of data validation that causes records without compound dates to be reported in the validation.log
file (see the Log section) and not included in the DuckDB database produced at the end of the workflow.
Example of an invalid date field in the log
A user can enter a year directly in a date field, without going through Heurist's compound date widget or, as in the case of CSV import, indicating a date range. When using the --require-compound-dates
flag, this record would fail validation and be reported in the log.
2025-02-27 12:19:03,378 validation WARNING
Record: text Record ID: 47644
DTY: 1285 The date field was not entered as a compound Heurist date object.
Entered value = 1448
If you want to impose this strict date data validation for your analysis, go back to Heurist and change the reported record's date.
Understanding Heurist date metadata
How Heurist's API describes dates
To better understand how the heurist
ETL package processes Heurist date data, look at how Heurist's API transmits the data stored in the database.
Simple date detail
Simple date detail from Heurist's API:
{
"dty_ID": 1285,
"value": 1448,
"fieldName": "date_of_creation",
"fieldType": "date",
"conceptID": ""
}
The value of a simple date detail from Heurist's API is a year or date string (i.e. 1448
). If the --require-compound-dates
flag is used in the heurist download
command, a simple date detail will raise a warning and cause the record to be invalid.
Compound date detail
Compound date detail from Heurist's API:
{
"dty_ID": 1285,
"value": {
"start": {
"earliest": "1460"
},
"end": {
"latest": "1469"
},
"estMinDate": 1460,
"estMaxDate": 1469.1231
},
"fieldName": "date_of_creation",
"fieldType": "date",
"conceptID": ""
},
The value of a compound date detail from Heurist's API is a map of metadata, including the data's earliest (estMinDate
) and latest (estMaxDate
) dates.
How the heurist
package processes compound dates
For every 1 date field, the heurist
ETL process creates 2 columns, which aim to (i) transform the data into an efficient format and (ii) preserve the original information returned from Heurist's API.
Input examples from Heurist API
Let's look at an example with a date field named date_of_creation
and 3 records.
Record 1: date_of_creation
1180 - 1200
{
"start": {
"earliest": "1180"
},
"end": {
"latest": "1200"
},
"estMinDate": 1180,
"estMaxDate": 1200.1231
}
Record 2: date_of_creation
in 1448 (implied, simple date)
{
"dty_ID": 1285,
"value": 1448,
"fieldName": "date_of_creation",
"fieldType": "date",
"conceptID": ""
}
Record 3: date_of_creation
circa 1188
{
"timestamp": {
"in": "1188",
"type": "s",
"circa": true
},
"comment": "1188",
"estMinDate": 1188,
"estMaxDate": 1188
}
Date column
The estimated minimum and maximum dates are extracted from Heurist's compound date metadata, transformed into Python datetime
objects, arranged in an ordered list of the earliest and latest dates in the data field.
Record | Compound | Meaning | estMinDate from API |
estMinDate from API |
created date_of_creation column |
---|---|---|---|---|---|
1 | yes | 1180 - 1200 | 1180 |
1200.1231 |
[1180-01-01 00:00:00, 1200-12-31 00:00:00] |
2 | no | in 1448 | [1448-01-01 00:00:00, NULL] |
||
3 | yes | circa 1188 | 1188 |
1188 |
[1188-01-01 00:00:00, 1188-01-01 00:00:00] |
Map column
In addition to the parsed date_of_creation
column, the heurist
ETL pipeline also preserves the response from Heurist's API in a supplemental column with the suffix _TEMPORAL
if it is of a compound date.
Record | created date_of_creation column |
created date_of_creation_TEMPORAL column |
---|---|---|
1 | [1180-01-01 00:00:00, 1200-12-31 00:00:00] |
{'start': {'earliest': '1180'}, 'end': {'latest': '1200'}, 'estMinDate': 1180, 'estMaxDate': 1200.1231} |
2 | [1448-01-01 00:00:00, NULL] |
|
3 | [1188-01-01 00:00:00, 1188-01-01 00:00:00] |
{'timestamp': {'in': '1188', 'type': 's', 'circa': True}, 'comment': '1188', 'estMinDate': 1188, 'estMaxDate': 1188} |