architxt.schema#

Classes

Group(name, entities)

Relation(name, left, right[, orientation])

RelationOrientation(*values)

Specifies the direction of a relationship between two groups.

Schema(productions, groups, relations)

class architxt.schema.Group(name, entities)[source]#

Bases: object

entities#

Type:    set[str]

name#

Type:    str

class architxt.schema.Relation(name, left, right, orientation=RelationOrientation.BOTH)[source]#

Bases: object

left#

Type:    str

name#

Type:    str

orientation#

Type:    RelationOrientation

right#

Type:    str

class architxt.schema.RelationOrientation(*values)[source]#

Bases: Enum

Specifies the direction of a relationship between two groups.

This enum is used to indicate the source or cardinality orientation of a relationship.

BOTH = 3#

Type:    int

The relationship is bidirectional or many-to-many, with no single source.

LEFT = 1#

Type:    int

The source of the relationship is the left group.

RIGHT = 2#

Type:    int

The source of the relationship is the right group.
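
The data model documented above can be mirrored with a few stand-in classes. This is a minimal sketch, not the library's source: the frozen dataclasses, the `frozenset` used so that groups are hashable, and the helper `source_group` are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class RelationOrientation(Enum):
    """Stand-in enum with the values listed in the reference above."""

    LEFT = 1   # the source is the left group
    RIGHT = 2  # the source is the right group
    BOTH = 3   # bidirectional or many-to-many, no single source


@dataclass(frozen=True)
class Group:
    """Stand-in for a named group of entities (frozenset is an assumption)."""

    name: str
    entities: frozenset[str]


@dataclass(frozen=True)
class Relation:
    """Stand-in for a relation between two groups, referenced by name."""

    name: str
    left: str
    right: str
    orientation: RelationOrientation = RelationOrientation.BOTH


def source_group(relation: Relation) -> str | None:
    """Hypothetical helper: name of the source group, or None for BOTH."""
    if relation.orientation is RelationOrientation.LEFT:
        return relation.left
    if relation.orientation is RelationOrientation.RIGHT:
        return relation.right
    return None
```

With these stand-ins, a relation built without an explicit orientation defaults to `BOTH`, so `source_group` returns `None` for it.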

class architxt.schema.Schema(productions, groups, relations)[source]#

Bases: CFG

classmethod from_description(*, groups=None, relations=None, collections=True)[source]#

Create a Schema from a description of groups, relations, and collections.

Parameters:
  • groups (Optional[set[Group]]) – A set of groups, each pairing a group name with its set of entities.

  • relations (Optional[set[Relation]]) – A set of relations, each linking two groups by name.

  • collections (bool) – Whether to generate collection productions.

Return type:

Schema

Returns:

A Schema object.

classmethod from_forest(forest, *, keep_unlabelled=True, merge_lhs=True)[source]#

Create a Schema from a given forest of trees.

Parameters:
  • forest (Iterable[Tree]) – The input forest from which to derive the schema.

  • keep_unlabelled (bool) – Whether to keep uncategorized nodes in the schema.

  • merge_lhs (bool) – Whether to merge nodes in the schema.

Return type:

Schema

Returns:

A CFG-based schema representation.

as_cfg()[source]#

Convert the schema to a CFG representation.

Return type:

str

Returns:

The schema as a string of production rules, each terminated by a semicolon.

extract_datasets(forest)[source]#

Extract datasets from a forest for each group defined in the schema.

Parameters:

forest (Collection[Tree]) – The input forest to extract datasets from.

Return type:

dict[str, DataFrame]

Returns:

A mapping from group names to datasets.

extract_valid_trees(forest)[source]#

Filter and return a valid instance (according to the schema) of the provided forest.

It removes any subtree whose label is not valid in the schema and drops redundant collections.

Parameters:

forest (Iterable[Tree]) – The input forest to be cleaned.

Yield:

Valid trees according to the schema.

Return type:

Generator[Tree, None, None]

find_collapsible_groups()[source]#

Identify all groups eligible for collapsing into attributed relationships.

A group M is collapsible if it participates exactly twice in a 1-n relation, both times on the ‘one’ side. The goal is to collapse patterns like:

A –(n-1)–> M <–(1-n)– B

Into a direct n-n edge:

A –[attributed edge]– B

Return type:

set[str]

Returns:

A set of groups that can be turned into attributed edges.

>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='A', right='M', orientation=RelationOrientation.LEFT),
...     Relation(name='R2', left='M', right='B', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
{'M'}
>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='M', right='B', orientation=RelationOrientation.RIGHT),
...     Relation(name='R2', left='M', right='C', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
{'M'}
>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='A', right='M', orientation=RelationOrientation.BOTH),
...     Relation(name='R2', left='M', right='B', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
set()
>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='A', right='M', orientation=RelationOrientation.LEFT),
...     Relation(name='R2', left='M', right='B', orientation=RelationOrientation.RIGHT),
...     Relation(name='R2', left='M', right='C', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
set()
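
The collapsibility rule can be sketched in isolation. This is a minimal illustration that does not use the library: relations are plain `(name, left, right, orientation)` tuples, orientations are the strings `"LEFT"`, `"RIGHT"` and `"BOTH"`, and the function name `find_collapsible` is hypothetical. It assumes that the 'one' side of an oriented relation is the group that is not the source, a reading inferred from the examples above; the actual implementation may apply further conditions.

```python
from collections import Counter


def find_collapsible(relations):
    """Return the groups that sit on the 'one' side of exactly two 1-n relations.

    `relations` is an iterable of (name, left, right, orientation) tuples,
    where orientation is "LEFT", "RIGHT" or "BOTH" (stand-in representation).
    """
    one_side_count = Counter()
    for _name, left, right, orientation in relations:
        if orientation == "LEFT":
            # Source is the left group, so the right group is the 'one' side.
            one_side_count[right] += 1
        elif orientation == "RIGHT":
            # Source is the right group, so the left group is the 'one' side.
            one_side_count[left] += 1
        # "BOTH" relations have no single source and are not counted.
    return {group for group, count in one_side_count.items() if count == 2}
```

Under these assumptions the sketch gives the same results as the four examples above: `{'M'}` for the first two and an empty set for the last two.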

verify()[source]#

Verify the schema against the meta-grammar.

Return type:

bool

Returns:

True if the schema is valid, False otherwise.

property entities#

The set of entities in the schema.

property group_balance_score#

Get the balance score of attributes across groups.

The balance metric (B) measures the dispersion of attributes (coefficient of variation), indicating if the schema is well-balanced. A higher balance metric indicates that attributes are distributed more evenly across groups, while a lower balance metric suggests that some groups may be too large (wide) or too small (fragmented).

\[B = 1 - \frac{\sigma(A)}{\mu(A)}\]

Where:
  • \(A\): The set of attribute counts for all groups.

  • \(\mu(A)\): The mean number of attributes per group.

  • \(\sigma(A)\): The standard deviation of attribute counts across groups.

Return type:

float

Returns:

Balance metric (B), a measure of attribute dispersion:
  • \(B \approx 1\): Attributes are evenly distributed.

  • \(B \approx 0\): Significant imbalance; some groups are much larger or smaller than others.
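
The formula can be checked by hand with a few lines of Python. This is a sketch under stated assumptions, not the library's code: the function name `balance_score` is hypothetical, the population standard deviation (`statistics.pstdev`) is assumed because the reference does not say which variant is used, and returning 0.0 for empty or all-zero input is an arbitrary choice.

```python
from statistics import mean, pstdev


def balance_score(attribute_counts):
    """Compute B = 1 - sigma(A) / mu(A) for a list of per-group attribute counts."""
    if not attribute_counts:
        return 0.0  # arbitrary choice in this sketch: no groups, no balance
    mu = mean(attribute_counts)
    if mu == 0:
        return 0.0  # arbitrary choice in this sketch: avoids dividing by zero
    # Population standard deviation is an assumption (see the lead-in).
    return 1 - pstdev(attribute_counts) / mu
```

For three groups with three attributes each, the deviation is zero and the score is 1. For counts `[1, 5]`, the mean is 3 and the population deviation is 2, giving 1 - 2/3. The raw formula drops below 0 whenever the deviation exceeds the mean; whether the library clamps the value is not stated in this reference.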

property group_overlap#

Get the group overlap ratio as a combined Jaccard index.

The group overlap ratio is computed as the mean of all pairwise Jaccard indices for each pair of groups.

Return type:

float

Returns:

The group overlap ratio as a float value between 0 and 1. A higher value indicates a higher degree of overlap between groups.
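
The mean of pairwise Jaccard indices can be sketched as follows. This illustration works on plain sets of entity names rather than the library's Group objects; the function names `jaccard` and `group_overlap` and the choice of returning 0.0 when there are fewer than two groups are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_overlap(groups):
    """Mean Jaccard index over every unordered pair of entity sets."""
    pairs = list(combinations(groups, 2))
    if not pairs:
        return 0.0  # assumption: no pair to compare means no overlap
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

For the entity sets `{a, b}`, `{b, c}` and `{c, d}`, the pairwise indices are 1/3, 0 and 1/3, so the mean is 2/9.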

property groups#

The set of groups in the schema.

Return type:

set[Group]

property relations#

The set of relations in the schema.

Return type:

set[Relation]