architxt.schema
Classes
- class architxt.schema.Relation(name, left, right, orientation=RelationOrientation.BOTH)
Bases:
object
- orientation
Type:
RelationOrientation
- class architxt.schema.RelationOrientation(*values)
Bases:
Enum
Specifies the direction of a relationship between two groups.
This enum is used to indicate the source or cardinality orientation of a relationship.
- class architxt.schema.Schema(productions, groups, relations)
Bases:
CFG
- classmethod from_description(*, groups=None, relations=None, collections=True)
Create a Schema from a description of groups, relations, and collections.
- Return type:
Schema
- Returns:
A Schema object.
- classmethod from_forest(forest, *, keep_unlabelled=True, merge_lhs=True)
Create a Schema from a given forest of trees.
- as_cfg()
Convert the schema to a CFG representation.
- Return type:
- Returns:
The schema as a list of production rules, each terminated by a semicolon.
- extract_datasets(forest)
Extract datasets from a forest for each group defined in the schema.
- Parameters:
forest (Collection[Tree]) – The input forest to extract datasets from.
- Return type:
- Returns:
A mapping from group names to datasets.
- extract_valid_trees(forest)
Filter and return a valid instance (according to the schema) of the provided forest.
It removes any subtree whose label is not valid under the schema and drops redundant collections.
- find_collapsible_groups()
Identify all groups eligible for collapsing into attributed relationships.
A group M is collapsible if it participates exactly twice in a 1-n relation on the ‘one’ side, i.e. we want to collapse patterns like:
A --(n-1)--> M <--(1-n)-- B
Into a direct n-n edge:
A --[attributed edge]-- B
>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='A', right='M', orientation=RelationOrientation.LEFT),
...     Relation(name='R2', left='M', right='B', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
{'M'}

>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='M', right='B', orientation=RelationOrientation.RIGHT),
...     Relation(name='R2', left='M', right='C', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
{'M'}

>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='A', right='M', orientation=RelationOrientation.BOTH),
...     Relation(name='R2', left='M', right='B', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
set()

>>> schema = Schema.from_description(relations={
...     Relation(name='R1', left='A', right='M', orientation=RelationOrientation.LEFT),
...     Relation(name='R2', left='M', right='B', orientation=RelationOrientation.RIGHT),
...     Relation(name='R2', left='M', right='C', orientation=RelationOrientation.RIGHT),
... })
>>> schema.find_collapsible_groups()
set()
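The rule illustrated by the four examples can be sketched independently of the library. The snippet below is not architxt's implementation: `Orientation`, `Rel`, `one_side`, and `collapsible_groups` are hypothetical stand-ins, and the reading of the orientations (RIGHT places the left group on the 'one' side, LEFT places the right group there) is an assumption inferred from the examples. It also assumes that a collapsible group takes part in no relation other than its two 1-n relations, which is consistent with the examples but not confirmed by them.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, Optional, Set


class Orientation(Enum):
    """Hypothetical stand-in for RelationOrientation."""
    LEFT = auto()
    RIGHT = auto()
    BOTH = auto()


@dataclass(frozen=True)
class Rel:
    """Hypothetical stand-in for Relation (hashable so it can live in a set)."""
    name: str
    left: str
    right: str
    orientation: Orientation = Orientation.BOTH


def one_side(rel: Rel) -> Optional[str]:
    """Return the group on the 'one' side of a 1-n relation, if there is one.

    Assumption inferred from the examples: RIGHT puts the left group on the
    'one' side, LEFT puts the right group there, and BOTH has no 'one' side.
    """
    if rel.orientation is Orientation.RIGHT:
        return rel.left
    if rel.orientation is Orientation.LEFT:
        return rel.right
    return None


def collapsible_groups(relations: Iterable[Rel]) -> Set[str]:
    """Groups that appear in exactly two relations, both times on the 'one' side."""
    total = Counter()   # every participation, whatever the orientation
    as_one = Counter()  # participations on the 'one' side only
    for rel in relations:
        total.update({rel.left, rel.right})
        side = one_side(rel)
        if side is not None:
            as_one[side] += 1
    return {group for group in total if total[group] == 2 and as_one[group] == 2}


# First example from above: A --(n-1)--> M <--(1-n)-- B
print(collapsible_groups({
    Rel("R1", "A", "M", Orientation.LEFT),
    Rel("R2", "M", "B", Orientation.RIGHT),
}))  # {'M'}
```

Counting total participations separately from 'one'-side participations is what rejects the third example (one relation is BOTH) and the fourth (three relations touch M).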
- verify()
Verify the schema against the meta-grammar.
- Return type:
bool
- Returns:
True if the schema is valid, False otherwise.
- property entities
The set of entities in the schema.
- property group_balance_score
Get the balance score of attributes across groups.
The balance metric (B) is derived from the dispersion of attribute counts across groups (their coefficient of variation) and indicates whether the schema is well balanced. A higher value means that attributes are distributed more evenly across groups; a lower value suggests that some groups are too large (wide) or too small (fragmented).
\[B = 1 - \frac{\sigma(A)}{\mu(A)}\]
- Where:
\(A\): The set of attribute counts for all groups.
\(\mu(A)\): The mean number of attributes per group.
\(\sigma(A)\): The standard deviation of attribute counts across groups.
- Return type:
- Returns:
Balance metric (B), a measure of attribute dispersion.
- \(B \approx 1\): Attributes are evenly distributed.
- \(B \approx 0\): Significant imbalance; some groups are much larger or smaller than others.
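As a worked illustration of the formula, here is a minimal sketch that computes B from a plain list of per-group attribute counts. The function name `balance_score` is hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption: the text does not say whether the library uses the population or the sample form. Rejecting an empty list or a zero mean is likewise a choice made for this sketch.

```python
from statistics import fmean, pstdev
from typing import Sequence


def balance_score(attribute_counts: Sequence[int]) -> float:
    """Compute B = 1 - sigma(A) / mu(A) for a list of attribute counts.

    Assumption: sigma is the population standard deviation.
    """
    if not attribute_counts:
        raise ValueError("at least one group is required")
    mean = fmean(attribute_counts)
    if mean == 0:
        # The coefficient of variation is undefined when the mean is zero.
        raise ValueError("mean attribute count must be non-zero")
    return 1.0 - pstdev(attribute_counts) / mean


print(balance_score([3, 3, 3]))  # 1.0: every group has the same size
print(balance_score([1, 5]))     # about 0.33: mean 3, deviation 2
```

With counts `[1, 5]` the mean is 3 and the population standard deviation is 2, so B = 1 - 2/3.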
- property group_overlap
Get the group overlap ratio as a combined Jaccard index.
The group overlap ratio is computed as the mean of all pairwise Jaccard indices for each pair of groups.
- Return type:
float
- Returns:
The group overlap ratio as a float value between 0 and 1. A higher value indicates a higher degree of overlap between groups.
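The mean of pairwise Jaccard indices can be sketched as follows. This is an illustration, not the library's code: `jaccard` and `mean_pairwise_jaccard` are hypothetical names, each group is assumed to be represented by the set of its attribute names, and returning 0.0 for two empty sets or for fewer than two groups is a convention chosen for this sketch.

```python
from itertools import combinations
from typing import AbstractSet, Hashable, Sequence


def jaccard(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """Jaccard index |a & b| / |a | b| of two sets."""
    union = a | b
    if not union:
        return 0.0  # sketch convention for two empty sets
    return len(a & b) / len(union)


def mean_pairwise_jaccard(groups: Sequence[AbstractSet[Hashable]]) -> float:
    """Average the Jaccard index over every unordered pair of groups."""
    pairs = list(combinations(groups, 2))
    if not pairs:
        return 0.0  # sketch convention: no pair means no overlap
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


groups = [{"name", "age"}, {"age", "city"}, {"price"}]
# Pairwise indices are 1/3, 0 and 0, so the mean is 1/9.
print(mean_pairwise_jaccard(groups))
```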