architxt.schema
Classes
- class architxt.schema.Schema(start, productions, calculate_leftcorners=True)
Bases: CFG
- classmethod from_description(*, groups=None, rels=None, collections=True)
Create a Schema from a description of groups, relations, and collections.
- Parameters:
- Return type:
- Returns:
A Schema object.
- classmethod from_forest(forest, *, keep_unlabelled=True, merge_lhs=True)
Create a Schema from a given forest of trees.
- Parameters:
- Return type:
- Returns:
A CFG-based schema representation.
- as_cfg()
Convert the schema to a CFG representation.
- Return type:
- Returns:
The schema as a list of production rules, each terminated by a semicolon.
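For illustration only, the sketch below shows one way to render productions in that shape. It assumes productions are plain `(lhs, rhs)` tuples of symbol names rather than the library's own production objects, and that rules are separated by newlines; `format_rules` is a hypothetical helper, not part of architxt, and the exact text produced by `as_cfg()` may differ.

```python
def format_rules(productions: list[tuple[str, tuple[str, ...]]]) -> str:
    """Render productions as 'LHS -> RHS;' lines (hypothetical helper)."""
    # One rule per line, right-hand-side symbols separated by spaces,
    # each rule terminated by a semicolon.
    return "\n".join(f"{lhs} -> {' '.join(rhs)};" for lhs, rhs in productions)


print(format_rules([("S", ("A", "B")), ("A", ("x",))]))
```

Here the call prints the two rules `S -> A B;` and `A -> x;` on separate lines.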
- as_cypher()
Convert the schema to a Cypher representation.
It only defines indexes and constraints, as property graph databases do not have a fixed schema.
TODO: Implement this method.
- Return type:
- Returns:
The schema as a Cypher creation script defining constraints and indexes.
- as_sql()
Convert the schema to an SQL representation.
TODO: Implement this method.
- Return type:
- Returns:
The schema as an SQL creation script.
- extract_datasets(forest)
Extract datasets from a forest for each group defined in the schema.
- Parameters:
forest (Collection[Tree]) – The input forest to extract datasets from.
- Return type:
- Returns:
A mapping from group names to datasets.
- extract_valid_trees(forest)
Filter the provided forest and return an instance of it that is valid according to the schema.
It removes any subtrees whose labels are not valid in the schema and gets rid of redundant collections.
- Parameters:
forest (Collection[Tree]) – The input forest to be cleaned.
- Return type:
- Returns:
A list of valid trees according to the schema.
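A minimal sketch of the label-based pruning step described above. It assumes a hypothetical `Node` type (a label plus a list of child nodes or leaf strings) standing in for `Tree`, and a precomputed set of valid labels; `prune` is illustrative and not the library's implementation. It drops every subtree rooted at an invalid label and does not cover the removal of redundant collections.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parse tree: a label and its children."""
    label: str
    children: list[Node | str] = field(default_factory=list)


def prune(node: Node, valid_labels: set[str]) -> Node | None:
    """Return a copy of `node` without subtrees whose label is not valid."""
    if node.label not in valid_labels:
        return None  # the whole subtree rooted here is dropped
    kept: list[Node | str] = []
    for child in node.children:
        if isinstance(child, str):
            kept.append(child)  # leaf tokens are kept unchanged
        else:
            pruned = prune(child, valid_labels)
            if pruned is not None:
                kept.append(pruned)
    return Node(node.label, kept)


tree = Node("ROOT", [Node("GROUP", [Node("ENT", ["x"])]), Node("NOISE", ["y"])])
print(prune(tree, {"ROOT", "GROUP", "ENT"}))
```

In this example the `NOISE` subtree is removed while the `ROOT -> GROUP -> ENT` path and its leaf are kept.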
- verify()
Verify the schema against the meta-grammar.
- Return type:
- Returns:
True if the schema is valid, False otherwise.
- property entities
The set of entities in the schema.
- property group_balance_score
Get the balance score of attributes across groups.
The balance metric \(B\) is derived from the coefficient of variation \(\sigma(A)/\mu(A)\) of the attribute counts and indicates whether the schema is well balanced. A higher value means attributes are distributed more evenly across groups; a lower value suggests that some groups are too large (wide) or too small (fragmented).
\[B = 1 - \frac{\sigma(A)}{\mu(A)}\]
- Where:
\(A\): The set of attribute counts for all groups.
\(\mu(A)\): The mean number of attributes per group.
\(\sigma(A)\): The standard deviation of attribute counts across groups.
- Returns: Balance metric \(B\), a measure of attribute dispersion.
\(B \approx 1\): Attributes are evenly distributed.
\(B \approx 0\): Significant imbalance; some groups are much larger or smaller than others.
- Return type:
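The formula can be checked numerically with the sketch below. `balance_score` is a hypothetical helper that takes the per-group attribute counts \(A\) directly; it assumes the population standard deviation, which the description above does not specify.

```python
from statistics import mean, pstdev


def balance_score(attribute_counts: list[int]) -> float:
    """B = 1 - sigma(A) / mu(A) over per-group attribute counts A (hypothetical helper)."""
    if not attribute_counts:
        raise ValueError("at least one group is required")
    mu = mean(attribute_counts)
    sigma = pstdev(attribute_counts)  # population standard deviation (assumption)
    return 1 - sigma / mu


print(balance_score([3, 3, 3]))            # 1.0
print(round(balance_score([2, 4]), 3))     # 0.667
```

Equal counts give \(B = 1\); for counts 2 and 4, \(\mu = 3\) and \(\sigma = 1\), so \(B = 1 - 1/3\). With this formula \(B\) becomes negative whenever \(\sigma(A) > \mu(A)\), for example with counts 1, 1, 1 and 10.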
- property group_overlap
Get the group overlap ratio as a combined Jaccard index.
The group overlap ratio is the mean of the Jaccard indices computed over all pairs of groups.
- Return type:
- Returns:
The group overlap ratio as a float value between 0 and 1. A higher value indicates a higher degree of overlap between groups.
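A minimal sketch of this computation, assuming each group is given as a set of entity names in a hypothetical `dict` from group name to entity set. The helpers `jaccard` and `group_overlap` are illustrative, and the fallback values for an empty union and for fewer than two groups are assumptions, not documented behavior.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |a ∩ b| / |a ∪ b|; two empty sets count as 0.0 (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_overlap(groups: dict[str, set[str]]) -> float:
    """Mean pairwise Jaccard index over all pairs of groups (hypothetical helper)."""
    pairs = list(combinations(groups.values(), 2))
    if not pairs:
        return 0.0  # fewer than two groups: nothing to compare (assumption)
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


print(group_overlap({"A": {"x", "y"}, "B": {"y", "z"}, "C": {"w"}}))
```

For groups `{x, y}`, `{y, z}` and `{w}`, only the first pair overlaps (Jaccard index 1/3), so the ratio is (1/3 + 0 + 0) / 3 = 1/9, roughly 0.111.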
- property groups
The set of groups in the schema.
- property relations
The set of relations in the schema.