architxt.simplification.tree_rewriting


Functions

apply_operations(edit_ops, forest, *, ...[, ...])

Apply a sequence of edit operations to a forest, potentially simplifying its structure.

create_group(subtree, group_name)

Create a group node from a subtree and insert it into its parent node.

find_groups(equiv_subtrees, min_support)

Find and create groups based on the given set of equivalent subtrees.

rewrite(forest, *[, tau, decay, epoch, ...])

Rewrite a forest by applying edit operations iteratively.

architxt.simplification.tree_rewriting.apply_operations(edit_ops, forest, *, equiv_subtrees, early_exit=True, executor, batch_size)

Apply a sequence of edit operations to a forest, potentially simplifying its structure.

Each operation in edit_ops is applied to the forest in the provided order. If early_exit is enabled, the function stops as soon as an operation successfully simplifies at least one tree. Otherwise, all operations are applied.

Parameters:
  • edit_ops (Sequence[Operation | tuple[str, Operation]]) – A sequence of operations to apply to the forest. Each operation can either be a callable or a tuple (name, callable) where name is a string identifier for the operation.

  • forest (Forest) – The input forest (a collection of trees) on which operations are applied.

  • equiv_subtrees (TREE_CLUSTER) – The set of equivalent subtrees.

  • early_exit (bool) – A boolean flag indicating whether to stop after the first successful operation. If False, all operations are applied.

  • executor (ProcessPoolExecutor) – A pool executor to parallelize the processing of the forest.

  • batch_size (int) – The number of trees to process in each batch.

Return type:

int | None

Returns:

The index of the operation that successfully simplified a tree, or None if no operation succeeded.

architxt.simplification.tree_rewriting.create_group(subtree, group_name)

Create a group node from a subtree and insert it into its parent node.

Parameters:
  • subtree (Tree) – The subtree to convert into a group.

  • group_name (str) – The name to use for the group.

Return type:

None

architxt.simplification.tree_rewriting.find_groups(equiv_subtrees, min_support)

Find and create groups based on the given set of equivalent subtrees.

Parameters:
  • equiv_subtrees (dict[str, Sequence[Tree]]) – The set of equivalent subtrees.

  • min_support (int) – Minimum support of groups.

Return type:

bool

Returns:

A boolean indicating if groups were created.
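One plausible reading of `min_support` is "the minimum number of equivalent subtrees a cluster must contain before a group is created for it". The sketch below illustrates only that filtering step under this assumption. It does not call architxt: `select_supported_clusters` is a hypothetical helper, and subtrees are represented as plain strings rather than `Tree` objects.

```python
from collections.abc import Sequence


def select_supported_clusters(
    equiv_subtrees: dict[str, Sequence[str]],
    min_support: int,
) -> dict[str, Sequence[str]]:
    """Keep only the clusters whose member count reaches min_support."""
    # Assumption: "support" means the number of subtrees in a cluster.
    return {
        name: members
        for name, members in equiv_subtrees.items()
        if len(members) >= min_support
    }


# Toy clusters keyed by a cluster name; values stand in for subtrees.
clusters = {
    "person": ["(name age)", "(name age)", "(name)"],
    "address": ["(street city)"],
}
supported = select_supported_clusters(clusters, min_support=2)

# Mirrors the documented bool return: were any groups eligible for creation?
groups_created = bool(supported)
```

Here only the `person` cluster passes the threshold, so a group would be created for it and not for `address`.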

architxt.simplification.tree_rewriting.rewrite(forest, *, tau=0.7, decay=DECAY, epoch=100, min_support=None, metric=DEFAULT_METRIC, edit_ops=DEFAULT_OPERATIONS, debug=False, max_workers=None, commit=True, simplify_names=True)

Rewrite a forest by applying edit operations iteratively.

Parameters:
  • forest (Forest) – The forest to rewrite.

  • tau (float) – Threshold for subtree similarity when clustering.

  • decay (float) – The similarity decay factor. The higher the value, the more the weight of context decreases with distance.

  • epoch (int) – Maximum number of rewriting steps.

  • min_support (int | None) – Minimum support of groups.

  • metric (METRIC_FUNC) – The metric function used to compute similarity between subtrees.

  • edit_ops (Sequence[type[Operation]]) – The list of operations to perform on the forest.

  • debug (bool) – Whether to enable debug logging.

  • max_workers (int | None) – Number of parallel worker processes to use.

  • commit (bool | int) – Whether and how often to commit automatically. If already in a transaction, no commit is applied.
    - If False, no commits are made; the function relies on the current transaction.
    - If True (default), commits in batch.
    - If an integer, commits every N trees.
    To avoid memory issues, we recommend using incremental commits with large iterables. When using TreeBucket, workers always commit in internal transactions (to avoid serialisation), and the commit parameter only controls the batch size for these commits.

  • simplify_names (bool) – Whether to simplify the group and relation names after the rewrite.

Return type:

Metrics

Returns:

A Metrics object encapsulating the results and metrics calculated for the rewrite process.

Modules