architxt.database.loader.documents

Functions

parse_document_tree(tree)

Parse a document tree and yield processed subtrees based on collection grouping.

parse_file(file)

Parse a document database file like XML, JSON, or CSV.

read_document(file, *[, raw_read, root_name])

Read the file as a data tree.

read_document_file(file)

Read and parse a document file like XML, JSON, or CSV.

read_tree(data, *[, root_name])

Recursively convert a nested document structure into a tree.

traverse_tree(tree)

Recursively traverse and transform a nested tree into a valid metamodel structure.

architxt.database.loader.documents.parse_document_tree(tree)

Parse a document tree and yield processed subtrees based on collection grouping.

  • If the root node is not a collection, the entire tree is processed and a single result is yielded.

  • If the root node is a collection, each child subtree is individually processed and yielded.

TODO: Enhance tree decomposition for nested collections.

If no collection exists at the root level, consider splitting at the closest collection and duplicating the path to the root for each collection element.

Parameters:

tree (Tree) – The nested tree to be parsed.

Yields:

Trees representing the database.

Return type:

Generator[Tree, None, None]
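The two cases above can be illustrated with a minimal sketch. It does not use the library's Tree class: `Node`, `process`, and `split_on_root_collection` are hypothetical stand-ins, and the per-subtree processing is reduced to an identity function.

```python
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not the library's Tree class."""

    label: str
    kind: str  # assumed categories: "collection", "group", or "entity"
    children: list["Node"] = field(default_factory=list)


def process(subtree: Node) -> Node:
    """Placeholder for the per-subtree transformation (identity in this sketch)."""
    return subtree


def split_on_root_collection(tree: Node) -> Iterator[Node]:
    """Yield one processed tree per root-level collection element, else the whole tree."""
    if tree.kind == "collection":
        # Root is a collection: each child becomes its own result.
        for child in tree.children:
            yield process(child)
    else:
        # Root is not a collection: the entire tree is a single result.
        yield process(tree)


people = Node("ROOT", "collection", [Node("person", "group"), Node("person", "group")])
single = Node("ROOT", "group", [Node("name", "entity")])

collection_results = list(split_on_root_collection(people))
single_results = list(split_on_root_collection(single))
```

Because the function is a generator, nothing is processed until the caller iterates over it.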

architxt.database.loader.documents.parse_file(file)

Parse a document database file like XML, JSON, or CSV.

Parameters:

file (Union[BytesIO, BinaryIO]) – A file-like object opened for reading.

Return type:

Union[dict[str, Any], list[Any]]

Returns:

The parsed content of the file as a Python nested object.

Raises:

ValueError – If none of the available parsers can process the input file.
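One plausible way to obtain this behavior is to try each parser in turn and raise only when all of them fail. The sketch below assumes that strategy and uses only the standard library; `parse_with_fallback` and the three `_parse_*` helpers are hypothetical names, not the library's implementation, and the XML helper keeps only the root's direct children.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Any, BinaryIO, Union


def _parse_json(raw: bytes) -> Any:
    data = json.loads(raw)
    if not isinstance(data, (dict, list)):
        raise ValueError("top-level JSON value is not an object or array")
    return data


def _parse_xml(raw: bytes) -> dict[str, Any]:
    root = ET.fromstring(raw)
    # Simplification: only the root tag and its direct children are kept.
    return {root.tag: {child.tag: child.text for child in root}}


def _parse_csv(raw: bytes) -> list[dict[str, str]]:
    rows = list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
    if not rows:
        raise ValueError("no CSV rows found")
    return rows


# Order matters: the strictest formats are tried first (an assumption of this sketch).
PARSERS = (_parse_json, _parse_xml, _parse_csv)


def parse_with_fallback(file: BinaryIO) -> Union[dict[str, Any], list[Any]]:
    """Return the first successful parse, or raise ValueError if every parser fails."""
    raw = file.read()
    for parser in PARSERS:
        try:
            return parser(raw)
        except (ValueError, ET.ParseError, csv.Error):
            continue  # This parser rejected the input; try the next one.
    raise ValueError("no available parser could process the input file")


json_result = parse_with_fallback(io.BytesIO(b'{"a": 1}'))
xml_result = parse_with_fallback(io.BytesIO(b"<r><a>1</a></r>"))
csv_result = parse_with_fallback(io.BytesIO(b"name,age\nAda,36\n"))
```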

architxt.database.loader.documents.read_document(file, *, raw_read=False, root_name='ROOT')

Read the file as a data tree.

XML files are parsed according to the conventions described at https://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html

Parameters:
  • file (Union[str, Path, BytesIO, BinaryIO]) – The document file to read.

  • raw_read (bool) – If enabled, the tree corresponds to the document without any transformation applied.

  • root_name (str) – The root node name.

Return type:

Generator[Tree, None, None]

Returns:

A generator of trees representing the database.
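The XML-to-JSON convention in the article linked above maps attributes to `@`-prefixed keys, element text to `#text` when it sits next to attributes or child elements, and repeated sibling elements to a list. The following is a minimal sketch of that mapping using the standard library. `element_to_value` is a hypothetical helper rather than the parser the library uses, and it ignores text that follows child elements.

```python
import xml.etree.ElementTree as ET
from typing import Any


def element_to_value(elem: ET.Element) -> Any:
    """Convert one XML element to a JSON-like Python value."""
    children = list(elem)
    text = (elem.text or "").strip()

    # A plain element becomes its text, or None when it is empty.
    if not children and not elem.attrib:
        return text or None

    # Attributes become "@name" keys; text beside them becomes "#text".
    value: dict[str, Any] = {f"@{name}": attr for name, attr in elem.attrib.items()}
    if text:
        value["#text"] = text

    for child in children:
        converted = element_to_value(child)
        if child.tag not in value:
            value[child.tag] = converted
        elif isinstance(value[child.tag], list):
            value[child.tag].append(converted)
        else:
            # Second occurrence of a tag: promote the entry to a list.
            value[child.tag] = [value[child.tag], converted]
    return value


root = ET.fromstring('<e name="value"><a>x</a><a>y</a><b/></e>')
converted_document = {root.tag: element_to_value(root)}
```

Here `converted_document` is `{"e": {"@name": "value", "a": ["x", "y"], "b": None}}`.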

architxt.database.loader.documents.read_document_file(file)

Read and parse a document file like XML, JSON, or CSV.

Parameters:

file (Union[str, Path, BytesIO, BinaryIO]) – The document database file to read.

Return type:

Union[dict[str, Any], list[Any]]

Returns:

The parsed contents of the file.

Raises:

architxt.database.loader.documents.read_tree(data, *, root_name='ROOT')

Recursively convert a nested document structure into a tree.

  • Dictionaries are treated as groups.

  • Lists are treated as collections.

  • Leaf elements are treated as entities.

If a list contains only a single collection, the function flattens the output by returning that collection directly instead of nesting it under another collection node.

Parameters:
  • data (Union[dict[str, Any], list[Any]]) – The input data structure to be converted into a Tree.

  • root_name (str) – The label for the current node.

Return type:

Tree

Returns:

A nested tree structure corresponding to the input data.
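The mapping rules and the flattening rule above can be sketched as a short recursive function. `Node` and `to_tree` are hypothetical stand-ins for the library's Tree class and `read_tree`. Labelling list items with the name of their parent is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not the library's Tree class."""

    label: str
    kind: str  # "group", "collection", or "entity"
    children: list["Node"] = field(default_factory=list)
    value: Any = None


def to_tree(data: Any, root_name: str = "ROOT") -> Node:
    """Convert nested dicts and lists into a Node tree."""
    if isinstance(data, dict):
        # Dictionaries become groups, one child per key.
        return Node(root_name, "group", [to_tree(v, str(k)) for k, v in data.items()])
    if isinstance(data, list):
        children = [to_tree(item, root_name) for item in data]
        # Flattening rule: a list wrapping a single collection returns that collection.
        if len(children) == 1 and children[0].kind == "collection":
            return children[0]
        return Node(root_name, "collection", children)
    # Anything else is a leaf and becomes an entity.
    return Node(root_name, "entity", value=data)


person = to_tree({"name": "Ada", "tags": ["x", "y"]})
flattened = to_tree([[1, 2]])
```

In this sketch, `flattened` is the inner two-element collection rather than a collection wrapping another collection.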

architxt.database.loader.documents.traverse_tree(tree)

Recursively traverse and transform a nested tree into a valid metamodel structure.

The function extracts entity nodes and groups them under a single group node. It then establishes relations between this group and any nested subgroups.

Parameters:

tree (Tree) – The tree to traverse and transform.

Return type:

tuple[Tree, Tree]

Returns:

A tuple containing:

  • The group to anchor to for the parent relationship.

  • The transformed tree, with subgroups converted to relations.
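A heavily simplified sketch of this idea follows. It is not the library's algorithm: `Node`, `flatten_group`, the `"relation"` and `"root"` kinds, and the `parent<->child` relation labels are all hypothetical, and collections are treated like ordinary subgroups.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not the library's Tree class."""

    label: str
    kind: str  # "group", "entity", "relation", or "root" (assumed categories)
    children: list["Node"] = field(default_factory=list)


def flatten_group(tree: Node) -> tuple[Node, Node]:
    """Return (anchor group, transformed tree) for a nested group."""
    # Entities directly under this node form a single group.
    group = Node(tree.label, "group", [c for c in tree.children if c.kind == "entity"])

    relations: list[Node] = []
    for child in tree.children:
        if child.kind == "entity":
            continue
        child_group, child_tree = flatten_group(child)
        # Link this group to the subgroup's anchor instead of nesting it.
        relations.append(Node(f"{tree.label}<->{child.label}", "relation", [group, child_group]))
        # Carry over relations found deeper in the subtree.
        relations.extend(c for c in child_tree.children if c.kind == "relation")

    if not relations:
        return group, group
    return group, Node(tree.label, "root", relations)


order = Node("order", "group", [
    Node("id", "entity"),
    Node("customer", "group", [
        Node("name", "entity"),
        Node("address", "group", [Node("city", "entity")]),
    ]),
])

anchor, transformed = flatten_group(order)
```

The first element of the result is what a caller one level up would link to, which is why the function returns the anchor group separately from the transformed tree.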