architxt.schema


Classes

`Group`(name, entities)
`Relation`(name, left, right[, orientation])
`RelationOrientation`(*values)	Specifies the direction of a relationship between two groups.
`Schema`(productions, groups, relations)

class architxt.schema.Group(name, entities)

Bases: object

entities: set[str]

name: str

class architxt.schema.Relation(name, left, right, orientation=RelationOrientation.BOTH)

Bases: object

left: str

name: str

orientation: RelationOrientation

right: str

class architxt.schema.RelationOrientation(*values)

Bases: Enum

Specifies the direction of a relationship between two groups.

This enum is used to indicate the source or cardinality orientation of a relationship.

BOTH = 3

Type: int

The relationship is bidirectional or many-to-many, with no single source.

LEFT = 1

Type: int

The source of the relationship is the left group.

RIGHT = 2

Type: int

The source of the relationship is the right group.
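
The orientation semantics above can be illustrated with a small stand-in that does not depend on architxt. This is only a sketch: the `Orientation` enum mirrors the documented member values, and `source_group` is a hypothetical helper, not part of the library.

```python
from enum import Enum
from typing import Optional


class Orientation(Enum):
    """Stand-in mirroring the documented RelationOrientation values."""

    LEFT = 1
    RIGHT = 2
    BOTH = 3


def source_group(left: str, right: str, orientation: Orientation) -> Optional[str]:
    """Return the source group name, or None when there is no single source.

    Hypothetical helper for illustration only.
    """
    if orientation is Orientation.LEFT:
        return left
    if orientation is Orientation.RIGHT:
        return right
    return None  # BOTH: bidirectional or many-to-many


print(source_group("A", "M", Orientation.LEFT))
print(source_group("M", "B", Orientation.RIGHT))
print(source_group("A", "B", Orientation.BOTH))
```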

class architxt.schema.Schema(productions, groups, relations)

Bases: CFG

classmethod from_description(*, groups=None, relations=None, collections=True)

Create a Schema from a description of groups, relations, and collections.

Parameters:

groups (Optional[set[Group]]) – The set of groups, each pairing a group name with its set of entities.
relations (Optional[set[Relation]]) – The set of relations, each linking a left group and a right group by name.
collections (bool) – Whether to generate collection productions.

Return type:

Schema

Returns:

A Schema object.

classmethod from_forest(forest, *, keep_unlabelled=True, merge_lhs=True)

Create a Schema from a given forest of trees.

Parameters:

forest (Iterable[Tree]) – The input forest from which to derive the schema.
keep_unlabelled (bool) – Whether to keep uncategorized nodes in the schema.
merge_lhs (bool) – Whether to merge nodes in the schema.

Return type:

Schema

Returns:

A CFG-based schema representation.

as_cfg()

Convert the schema to a CFG representation.

Return type: str
Returns: The schema as a list of production rules, each terminated by a semicolon.

extract_datasets(forest)

Extract datasets from a forest for each group defined in the schema.

Parameters: forest (Collection[Tree]) – The input forest to extract datasets from.
Return type: dict[str, DataFrame]
Returns: A mapping from group names to datasets.

extract_valid_trees(forest)

Filter the provided forest and return a valid instance of it according to the schema.

It removes any subtree whose label is not valid in the schema and removes redundant collections.

Parameters: forest (Iterable[Tree]) – The input forest to be cleaned.
Yields: Valid trees according to the schema.
Return type: Generator[Tree, None, None]

find_collapsible_groups()

Identify all groups eligible for collapsing into attributed relationships.

A group M is collapsible if it participates exactly twice in a 1-n relation on the ‘one’ side, i.e. we want to collapse patterns like:

A --(n-1)--> M <--(1-n)-- B

Into a direct n-n edge:

A --[attributed edge]-- B

Return type: set[str]
Returns: A set of groups that can be turned into attributed edges.

>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='A', right='M', orientation=RelationOrientation.LEFT),
...     Relation(name='R2', left='M', right='B', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
{'M'}

>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='M', right='B', orientation=RelationOrientation.RIGHT),
...     Relation(name='R2', left='M', right='C', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
{'M'}

>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='A', right='M', orientation=RelationOrientation.BOTH),
...     Relation(name='R2', left='M', right='B', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
set()

>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='A', right='M', orientation=RelationOrientation.LEFT),
...     Relation(name='R2', left='M', right='B', orientation=RelationOrientation.RIGHT),
...     Relation(name='R3', left='M', right='C', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
set()
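
The rule behind these results can be sketched without architxt. The `Orientation` and `Rel` types below are stand-ins for `RelationOrientation` and `Relation`, and `collapsible_groups` is a hypothetical reimplementation, not the library's code. It assumes that "participates exactly twice" means the group appears in exactly two relations overall and is the non-source (the 'one') side of both.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Orientation(Enum):
    LEFT = 1   # source is the left group
    RIGHT = 2  # source is the right group
    BOTH = 3   # no single source


@dataclass(frozen=True)
class Rel:
    """Stand-in for a relation between two named groups."""
    name: str
    left: str
    right: str
    orientation: Orientation = Orientation.BOTH


def target_of(rel: Rel) -> Optional[str]:
    """Return the non-source group of a 1-n relation, or None for BOTH."""
    if rel.orientation is Orientation.LEFT:
        return rel.right
    if rel.orientation is Orientation.RIGHT:
        return rel.left
    return None


def collapsible_groups(relations: Iterable[Rel]) -> set:
    """Groups appearing in exactly two relations, as the target of both (assumed rule)."""
    relations = list(relations)
    names = {g for r in relations for g in (r.left, r.right)}
    result = set()
    for group in names:
        involved = [r for r in relations if group in (r.left, r.right)]
        if len(involved) == 2 and all(target_of(r) == group for r in involved):
            result.add(group)
    return result


# Mirrors the first example above: A --> M <-- B
print(collapsible_groups([
    Rel("R1", "A", "M", Orientation.LEFT),
    Rel("R2", "M", "B", Orientation.RIGHT),
]))
```

Under this assumed rule, a `BOTH` relation or a third relation touching `M` disqualifies it, which matches the last two examples above.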

verify()

Verify the schema against the meta-grammar.

Return type: bool
Returns: True if the schema is valid, False otherwise.

property entities

The set of entities in the schema.

property group_balance_score

Get the balance score of attributes across groups.

The balance metric (B) is one minus the coefficient of variation of the per-group attribute counts, so it indicates how evenly attributes are spread across groups. A higher value means attributes are distributed more evenly, while a lower value suggests that some groups may be too large (wide) or too small (fragmented).

\[B = 1 - \frac{\sigma(A)}{\mu(A)}\]

Where:

\(A\): The attribute counts of all groups.
\(\mu(A)\): The mean number of attributes per group.
\(\sigma(A)\): The standard deviation of attribute counts across groups.

Return type: float
Returns: Balance metric (B), a measure of attribute dispersion.

- \(B \approx 1\): Attributes are evenly distributed.
- \(B \approx 0\): Significant imbalance; some groups are much larger or smaller than others.
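
The formula can be checked by hand with a few lines of standard-library Python. This is a minimal sketch, not the library's implementation: `balance_score` is a hypothetical function, and it assumes the population standard deviation, which the description above does not specify.

```python
from statistics import mean, pstdev


def balance_score(attribute_counts):
    """Compute B = 1 - sigma(A) / mu(A) for a list of per-group attribute counts."""
    mu = mean(attribute_counts)
    sigma = pstdev(attribute_counts)  # assumption: population standard deviation
    return 1 - sigma / mu


print(balance_score([3, 3, 3]))  # three groups of equal width
print(balance_score([1, 5]))     # one narrow group and one wide group
```

With counts `[1, 5]`, the mean is 3 and the population standard deviation is 2, so B = 1 - 2/3, or about 0.33.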

property group_overlap

Get the group overlap ratio as a combined Jaccard index.

The group overlap ratio is computed as the mean of the Jaccard indices of all pairs of groups.

Return type: float
Returns: The group overlap ratio, a value between 0 and 1. A higher value indicates more overlap between groups.
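
As an illustration of the mean pairwise Jaccard index, here is a short sketch over plain sets of entity names. The functions `jaccard` and `group_overlap` are hypothetical and independent of architxt. Returning 0 when there are fewer than two groups is an assumption.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_overlap(groups):
    """Mean Jaccard index over every unordered pair of groups."""
    pairs = list(combinations(groups, 2))
    if not pairs:
        return 0.0  # assumption: fewer than two groups means no overlap
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Three groups described by their entity sets; only the first two share an entity.
groups = [{"name", "age"}, {"age", "city"}, {"dose"}]
print(group_overlap(groups))
```

Here the three pairs have Jaccard indices 1/3, 0, and 0, so the overlap ratio is 1/9.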

property groups

The set of groups in the schema.

Return type: set[Group]

property relations

The set of relations in the schema.

Return type: set[Relation]
