architxt.simplification.tree_rewriting
Functions
| apply_operations | Apply a sequence of edit operations to a forest, potentially simplifying its structure. |
| create_group | Create a group node from a subtree and insert it into its parent node. |
| find_groups | Find and create groups based on the given set of equivalent subtrees. |
| rewrite | Rewrite a forest by applying edit operations iteratively. |
- architxt.simplification.tree_rewriting.apply_operations(edit_ops, forest, *, equiv_subtrees, early_exit=True, executor, batch_size)
Apply a sequence of edit operations to a forest, potentially simplifying its structure.
Each operation in edit_ops is applied to the forest in the provided order. If early_exit is enabled, the function stops as soon as an operation successfully simplifies at least one tree. Otherwise, all operations are applied.
- Parameters:
edit_ops (Sequence[Operation | tuple[str, Operation]]) – A sequence of operations to apply to the forest. Each operation can either be a callable or a tuple (name, callable) where name is a string identifier for the operation.
forest (Forest) – The input forest (a collection of trees) on which operations are applied.
equiv_subtrees (TREE_CLUSTER) – The set of equivalent subtrees.
early_exit (bool) – A boolean flag indicating whether to stop after the first successful operation. If False, all operations are applied.
executor (ProcessPoolExecutor) – A pool executor to parallelize the processing of the forest.
batch_size (int) – The number of trees to process in each batch.
- Return type:
int | None
- Returns:
The index of the operation that successfully simplified a tree, or None if no operation succeeded.
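The ordering and early-exit behaviour described above can be illustrated with a small, library-independent sketch. It does not use architxt: trees are plain nested lists, `drop_empty` and `unwrap_single` are hypothetical toy operations, and `apply_operations_sketch` is an illustrative stand-in that leaves out `equiv_subtrees`, `executor` and `batch_size`. Returning the index of the first successful operation when `early_exit=False` is an assumption of this sketch, not something the description above states.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

# Hypothetical stand-ins: a tree is a nested list, and an operation
# returns the (possibly new) tree plus a flag telling whether it changed.
Tree = list
Operation = Callable[[Tree], Tuple[Tree, bool]]
NamedOrBare = Union[Operation, Tuple[str, Operation]]


def drop_empty(tree: Tree) -> Tuple[Tree, bool]:
    """Toy operation: remove empty child lists."""
    kept = [child for child in tree if child != []]
    return kept, len(kept) != len(tree)


def unwrap_single(tree: Tree) -> Tuple[Tree, bool]:
    """Toy operation: replace a node by its only child when that child is a node."""
    if len(tree) == 1 and isinstance(tree[0], list):
        return tree[0], True
    return tree, False


def apply_operations_sketch(
    edit_ops: Sequence[NamedOrBare],
    forest: List[Tree],
    *,
    early_exit: bool = True,
) -> Optional[int]:
    """Apply operations in order and report the first one that changed a tree."""
    first_success = None
    for index, op in enumerate(edit_ops):
        # Accept both bare callables and (name, callable) pairs.
        func = op[1] if isinstance(op, tuple) else op
        changed = False
        for position, tree in enumerate(forest):
            forest[position], tree_changed = func(tree)
            changed = changed or tree_changed
        if changed and first_success is None:
            first_success = index
            if early_exit:
                break  # stop at the first operation that simplified a tree
    return first_success


ops = [("drop_empty", drop_empty), unwrap_single]
forest = [[["a", "b"]], ["c", []]]
print(apply_operations_sketch(ops, forest), forest)
```

With `early_exit=True` the second operation never runs once the first one has changed a tree; passing `early_exit=False` lets every operation run over the forest.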
- architxt.simplification.tree_rewriting.create_group(subtree, group_name)
Create a group node from a subtree and insert it into its parent node.
- architxt.simplification.tree_rewriting.find_groups(equiv_subtrees, min_support)
Find and create groups based on the given set of equivalent subtrees.
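The role of `min_support` is not spelled out here. One plausible reading, shown below purely as an assumption, is that a class of equivalent subtrees only becomes a group when it contains at least `min_support` subtrees. The sketch does not use architxt: the dictionary layout (cluster name mapped to a list of subtree labels) and the helper `frequent_clusters` are hypothetical.

```python
from typing import Dict, List


def frequent_clusters(
    equiv_subtrees: Dict[str, List[str]],
    min_support: int,
) -> Dict[str, List[str]]:
    """Keep only the clusters that contain at least `min_support` subtrees.

    Assumption: "support" means the number of subtrees in a cluster.
    """
    return {
        name: members
        for name, members in equiv_subtrees.items()
        if len(members) >= min_support
    }


# Hypothetical clusters: names and members are illustrative only.
clusters = {"person": ["t1", "t2", "t3"], "place": ["t4"]}
print(frequent_clusters(clusters, min_support=2))
```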
- architxt.simplification.tree_rewriting.rewrite(forest, *, tau=0.7, epoch=100, min_support=None, metric=DEFAULT_METRIC, edit_ops=DEFAULT_OPERATIONS, debug=False, max_workers=None, commit=BATCH_SIZE)
Rewrite a forest by applying edit operations iteratively.
- Parameters:
forest (Forest) – The forest to rewrite.
tau (float) – Threshold for subtree similarity when clustering.
epoch (int) – Maximum number of rewriting steps.
min_support (int | None) – Minimum support of groups.
metric (METRIC_FUNC) – The metric function used to compute similarity between subtrees.
edit_ops (Sequence[type[Operation]]) – The list of operations to perform on the forest.
debug (bool) – Whether to enable debug logging.
max_workers (int | None) – Number of parallel worker processes to use.
commit (bool | int) – When working with a TreeBucket, changes can be committed automatically:
  - If False, no commits are made. Use this for small forests that you want to commit manually later.
  - If True, a single commit is made after the entire forest has been processed, in one transaction.
  - If an integer N, a commit is made after every N trees.
  Batch commits are recommended for large forests to avoid memory issues.
- Return type:
- Returns:
The rewritten forest.
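The three `commit` modes can be sketched without architxt by computing after how many processed trees a commit would occur. The helper `commit_points` is hypothetical, and the final commit for a trailing partial batch is an assumption of this sketch rather than documented behaviour. Because `bool` is a subclass of `int` in Python, the boolean cases have to be tested before the integer case.

```python
from typing import List, Union


def commit_points(n_trees: int, commit: Union[bool, int]) -> List[int]:
    """Return the tree counts after which a commit would happen."""
    if commit is False or n_trees == 0:
        return []  # no automatic commit; the caller commits manually
    if commit is True:
        return [n_trees]  # one transaction covering the whole forest
    if commit <= 0:
        raise ValueError("batch size must be a positive integer")
    # Integer case: commit after every `commit` trees.
    points = list(range(commit, n_trees + 1, commit))
    # Assumption: a trailing partial batch is committed as well.
    if not points or points[-1] != n_trees:
        points.append(n_trees)
    return points


for mode in (False, True, 4):
    print(mode, commit_points(10, mode))
```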
Modules