Loading document databases#
See also
- Fundamentals
Overview of ArchiTXT’s internal data representation.
ArchiTXT supports loading document databases (such as JSON, XML, YAML, TOML, CSV, and Excel) through the architxt.database.loader.documents module.
These documents are converted into a data Tree that conforms to the metamodel.
The document-to-tree conversion process involves three steps:
- Read: Detect and parse the input file into a native Python nested structure composed of dict and list objects.
- Parse: Convert the Python structure into a Tree composed of COLL, GROUP, and ENT nodes.
- Transform: Optionally extract the relationships implied by nested groups, transforming the tree to align it with the target metamodel.
Splitting the conversion into three steps makes it possible to parse not only the supported data formats but also arbitrary Python data structures. The resulting raw trees are not considered valid on their own, but they can be combined with syntax trees before applying more advanced structuring algorithms.
Parsing nested data structures#
Parsing is performed by read_tree(). This function traverses nested Python structures and constructs a corresponding Tree based on the following rules:
- A dict becomes a GROUP node, where each key/value pair is parsed into a subtree.
- A list becomes a COLL node, where each element is parsed into a subtree.
- A scalar value (e.g., str, int, float, bool) becomes an ENT node wrapping the value.
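The three rules above can be sketched in plain Python. This is only an illustration, not ArchiTXT's implementation: the `Node` class and the `to_raw_tree` function are hypothetical stand-ins for `Tree` and `read_tree()`, and the way list elements are named here is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not ArchiTXT's Tree class."""

    kind: str  # "COLL", "GROUP", or "ENT"
    name: str
    children: list["Node"] = field(default_factory=list)
    value: Any = None  # only set on ENT nodes


def to_raw_tree(data: Any, name: str = "root") -> Node:
    """Apply the dict/list/scalar rules recursively."""
    if isinstance(data, dict):
        # dict -> GROUP, one subtree per key/value pair
        return Node("GROUP", name, [to_raw_tree(v, k) for k, v in data.items()])
    if isinstance(data, list):
        # list -> COLL, one subtree per element
        # (assumption: elements simply reuse the collection's name)
        return Node("COLL", name, [to_raw_tree(item, name) for item in data])
    # scalar -> ENT wrapping the value
    return Node("ENT", name, value=data)


document = [{"userId": 1, "profile": {"firstName": "John"}}]
tree = to_raw_tree(document, "users")
```

Here `tree` is a COLL node whose single child is a GROUP holding one ENT (`userId`) and one nested GROUP (`profile`).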
Example
Consider the following JSON document:
[
  {
    "userId": 1,
    "username": "johndoe",
    "profile": {
      "firstName": "John",
      "lastName": "Doe",
      "birthDate": "1990-01-01"
    }
  }
]
This input is converted into the following tree structure:
---
config:
  theme: neutral
---
graph TD
    users["COLL users"]
    users --> user["GROUP user"]
    user --> userId["ENT userId"] --> userIdVal["1"]
    user --> username["ENT username"] --> usernameVal["johndoe"]
    user --> profile["GROUP profile"]
    profile --> firstName["ENT firstName"] --> firstNameVal["John"]
    profile --> lastName["ENT lastName"] --> lastNameVal["Doe"]
    profile --> birthDate["ENT birthDate"] --> birthDateVal["1990-01-01"]
Transforming Raw Trees#
Warning
The transformation described here is specifically designed for tree-like data. Applying it to arbitrary or improperly structured trees may result in invalid or incoherent outputs.
Once a raw tree is constructed, it can be transformed into a flattened structure aligned with the metamodel using parse_document_tree().
This transformation:
- Converts nested GROUP nodes into REL nodes, establishing explicit relationships between parent and child subtrees.
- Duplicates the parent node for each nested group while retaining only its direct ENT children as part of the GROUP.
- If the root of the raw tree is a COLL, the transformation produces a forest, constructing one tree per collection element.
Example
Given the raw tree from the previous example, the transformation produces the following structure that conforms to the ArchiTXT metamodel:
---
config:
  theme: neutral
---
graph TD
    root["ROOT"]
    root --> coll["COLL user<->profile"]
    coll --> rel["REL user<->profile"]
    rel --> user["GROUP user"]
    user --> userId["ENT userId"] --> userIdVal["1"]
    user --> username["ENT username"] --> usernameVal["johndoe"]
    rel --> profile["GROUP profile"]
    profile --> firstName["ENT firstName"] --> firstNameVal["John"]
    profile --> lastName["ENT lastName"] --> lastNameVal["Doe"]
    profile --> birthDate["ENT birthDate"] --> birthDateVal["1990-01-01"]
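The flattening idea can be approximated on plain Python data. The sketch below is not `parse_document_tree()` itself: `split_group` and `transform` are hypothetical helpers, relations are represented as ordinary dicts rather than REL nodes, and nested lists inside a record are ignored for brevity.

```python
from typing import Any


def scalars(record: dict[str, Any]) -> dict[str, Any]:
    """Keep only the direct scalar fields (the ENT children of a GROUP)."""
    return {k: v for k, v in record.items() if not isinstance(v, (dict, list))}


def split_group(name: str, record: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn each nested dict into a REL-like entry pairing parent and child."""
    relations = []
    for key, value in record.items():
        if isinstance(value, dict):
            relations.append({
                "rel": f"{name}<->{key}",
                # the parent is duplicated for every nested group,
                # keeping only its direct scalar fields
                "parent": (name, scalars(record)),
                "child": (key, scalars(value)),
            })
            # deeper nesting yields further relations
            relations.extend(split_group(key, value))
    return relations


def transform(name: str, data: Any) -> list[list[dict[str, Any]]]:
    """A list root yields a forest: one result per collection element."""
    items = data if isinstance(data, list) else [data]
    return [split_group(name, item) for item in items]


raw = [{
    "userId": 1,
    "username": "johndoe",
    "profile": {"firstName": "John", "lastName": "Doe", "birthDate": "1990-01-01"},
}]
forest = transform("user", raw)
```

For the example document, `forest` contains one element holding a single `user<->profile` relation, mirroring the REL node in the diagram above.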
Supported File Formats#
ArchiTXT supports a wide range of document formats through pluggable parsers. Each format is handled by a specific backend parser.
Important
Parsers are applied in order; if none succeed, a ValueError is raised.
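The try-in-order behaviour can be illustrated with standard-library parsers. This is a minimal sketch, not ArchiTXT's loader: `read_document`, `PARSERS`, and the helper functions are hypothetical names, only JSON and XML are covered, and the XML conversion is deliberately simplistic (repeated tags overwrite each other).

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Callable


def parse_json(text: str) -> Any:
    data = json.loads(text)
    if not isinstance(data, (dict, list)):
        raise ValueError("expected a JSON object or array")
    return data


def element_to_python(element: ET.Element) -> Any:
    """Convert an XML element into nested dicts (leaf elements become text)."""
    children = list(element)
    if not children:
        return element.text
    return {child.tag: element_to_python(child) for child in children}


def parse_xml(text: str) -> Any:
    root = ET.fromstring(text)
    return {root.tag: element_to_python(root)}


# Order matters: the first parser that succeeds wins.
PARSERS: list[tuple[str, Callable[[str], Any]]] = [
    ("json", parse_json),
    ("xml", parse_xml),
]


def read_document(text: str) -> Any:
    errors = []
    for name, parser in PARSERS:
        try:
            return parser(text)
        except Exception as exc:  # any failure means "try the next parser"
            errors.append(f"{name}: {exc}")
    raise ValueError("no parser could read the input (" + "; ".join(errors) + ")")
```

Collecting the individual error messages before raising keeps the final `ValueError` informative when every parser rejects the input.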