architxt.simplification.tree_rewriting

Functions

apply_operations(edit_ops, forest, *, ...[, ...])

Apply a sequence of edit operations to a forest, potentially simplifying its structure.

create_group(subtree, group_index)

Create a group node from a subtree and insert it into its parent node.

find_groups(equiv_subtrees, min_support)

Find and create groups based on the given set of equivalent subtrees.

rewrite(forest, *[, tau, epoch, ...])

Rewrite a forest by applying edit operations iteratively.

architxt.simplification.tree_rewriting.apply_operations(edit_ops, forest, *, equiv_subtrees, early_exit=True, executor)

Apply a sequence of edit operations to a forest, potentially simplifying its structure.

Each operation in edit_ops is applied to the forest in the provided order. If early_exit is enabled, the function stops as soon as an operation successfully simplifies at least one tree. Otherwise, all operations are applied.

Parameters:
  • edit_ops (Sequence[Union[Operation, tuple[str, Operation]]]) – A sequence of operations to apply to the forest. Each entry is either a callable or a tuple (name, callable), where name is a string identifier for the operation.

  • forest (Collection[Tree]) – The input forest (a collection of trees) on which operations are applied.

  • equiv_subtrees (set[tuple[Tree, …]]) – The set of equivalent subtrees.

  • early_exit (bool) – A boolean flag indicating whether to stop after the first successful operation. If False, all operations are applied.

  • executor (ProcessPoolExecutor) – A pool executor to parallelize the processing of the forest.

Return type:

tuple[Collection[Tree], Optional[int]]

Returns:

A tuple composed of:
  • The updated forest after applying the operations.

  • The index of the operation that successfully simplified a tree, or None if no operation succeeded.
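
The early-exit behaviour described above can be illustrated with a small, self-contained sketch. It does not use architxt itself: trees are plain tuples of strings, each operation is a hypothetical callable returning (new_tree, changed), and the names apply_ops_sketch, drop_duplicates and drop_empty are invented for illustration. The equiv_subtrees and executor parameters are left out, and which index is reported when early_exit is False is an assumption (the first successful operation).

```python
from collections.abc import Callable, Sequence
from typing import Optional, Union

# Hypothetical stand-ins, NOT architxt's types: a tree is a tuple of strings
# and an operation returns the (possibly new) tree plus a "changed" flag.
Tree = tuple
Op = Callable[[Tree], tuple[Tree, bool]]


def apply_ops_sketch(
    edit_ops: Sequence[Union[Op, tuple[str, Op]]],
    forest: list[Tree],
    *,
    early_exit: bool = True,
) -> tuple[list[Tree], Optional[int]]:
    """Apply operations in order and report the index of a successful one."""
    success_index: Optional[int] = None
    for index, entry in enumerate(edit_ops):
        # Each entry is either a bare callable or a (name, callable) pair.
        op = entry[1] if isinstance(entry, tuple) else entry
        results = [op(tree) for tree in forest]
        forest = [tree for tree, _ in results]
        if any(changed for _, changed in results):
            # Assumption: remember the first operation that changed a tree.
            if success_index is None:
                success_index = index
            if early_exit:
                break
    return forest, success_index


def drop_duplicates(tree: Tree) -> tuple[Tree, bool]:
    new = tuple(dict.fromkeys(tree))  # keeps first occurrences, in order
    return new, new != tree


def drop_empty(tree: Tree) -> tuple[Tree, bool]:
    new = tuple(label for label in tree if label)
    return new, new != tree


ops = [("dedupe", drop_duplicates), drop_empty]
forest = [("a", "a", "b"), ("c", "")]

stopped = apply_ops_sketch(ops, forest)                       # stops after "dedupe"
exhaustive = apply_ops_sketch(ops, forest, early_exit=False)  # runs both operations
untouched = apply_ops_sketch([drop_empty], [("a",)])          # nothing to simplify
```

With early_exit left at True, the second operation never runs, so the empty label in the second tree survives; with early_exit=False both operations are applied. When no operation changes any tree, the returned index is None.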

architxt.simplification.tree_rewriting.create_group(subtree, group_index)

Create a group node from a subtree and insert it into its parent node.

Parameters:
  • subtree (Tree) – The subtree to convert into a group.

  • group_index (int) – The index to use for naming the group.

Return type:

None

architxt.simplification.tree_rewriting.find_groups(equiv_subtrees, min_support)

Find and create groups based on the given set of equivalent subtrees.

Parameters:
  • equiv_subtrees (set[tuple[Tree, …]]) – The set of equivalent subtrees.

  • min_support (int) – Minimum support of groups.

Return type:

bool

Returns:

A boolean indicating if groups were created.
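
As a rough illustration of how a minimum support could act as a filter, the sketch below keeps only equivalence classes that contain at least min_support members. This is an assumed reading of "support" (the size of an equivalence class), not a description of the library's implementation. Strings stand in for subtrees, and find_groups_sketch and the GROUP_n naming are hypothetical; unlike find_groups, the sketch returns the selected classes rather than modifying trees in place.

```python
def find_groups_sketch(
    equiv_subtrees: set[tuple[str, ...]],
    min_support: int,
) -> dict[str, tuple[str, ...]]:
    """Select equivalence classes whose size reaches min_support."""
    groups: dict[str, tuple[str, ...]] = {}
    # Sort largest-first (then lexicographically) so numbering is deterministic.
    for cls in sorted(equiv_subtrees, key=lambda c: (-len(c), c)):
        if len(cls) >= min_support:
            # Hypothetical naming scheme based on a running group index.
            groups[f"GROUP_{len(groups)}"] = cls
    return groups


classes = {("a1", "a2", "a3"), ("b1", "b2"), ("c1",)}

selected = find_groups_sketch(classes, min_support=2)
created = bool(selected)  # analogous to the boolean that find_groups returns
none_selected = find_groups_sketch(classes, min_support=4)
```

Under this reading, raising min_support makes grouping more conservative: with a support of 4, none of the three classes qualifies and no group would be created.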

architxt.simplification.tree_rewriting.rewrite(forest, *, tau=0.7, epoch=100, min_support=None, metric=DEFAULT_METRIC, edit_ops=DEFAULT_OPERATIONS, debug=False, max_workers=None)

Rewrite a forest by applying edit operations iteratively.

Parameters:
  • forest (Collection[Tree]) – The forest to rewrite.

  • tau (float) – Threshold for subtree similarity when clustering.

  • epoch (int) – Maximum number of rewriting steps.

  • min_support (Optional[int]) – Minimum support of groups.

  • metric (Callable[[Collection[str], Collection[str]], float]) – The metric function used to compute similarity between subtrees.

  • edit_ops (Sequence[type[Operation]]) – The list of operations to perform on the forest.

  • debug (bool) – Whether to enable debug logging.

  • max_workers (Optional[int]) – Number of parallel worker processes to use.

Return type:

Collection[Tree]

Returns:

The rewritten forest.
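
To make the metric and tau parameters more concrete, here is a minimal sketch of a function with the documented shape: two collections of strings in, a float out. Jaccard similarity is used only because it is a familiar example; nothing here claims it is what DEFAULT_METRIC computes. Comparing the score against tau with >= is also an assumption about how the threshold is applied.

```python
from collections.abc import Collection


def jaccard(x: Collection[str], y: Collection[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(x), set(y)
    if not a and not b:
        return 1.0  # convention chosen here: two empty collections are identical
    return len(a & b) / len(a | b)


TAU = 0.7  # same value as the documented default for tau

left = ["name", "age", "city"]
right = ["name", "age", "city", "zip"]

score = jaccard(left, right)  # 3 shared labels out of 4 distinct labels
similar = score >= TAU        # assumed comparison against the threshold
```

A callable of this shape could then be supplied through the keyword arguments shown in the signature, for example rewrite(forest, metric=jaccard, tau=0.7).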
