architxt.bucket.zodb

Classes

ZODBTreeBucket([storage_path, bucket_name, ...])

A persistent, scalable container for Tree objects backed by ZODB and RelStorage using SQLite.

class architxt.bucket.zodb.ZODBTreeBucket(storage_path=None, bucket_name='architxt', read_only=False)

Bases: TreeBucket

A persistent, scalable container for Tree objects backed by ZODB and RelStorage using SQLite.

This container uses ZODB’s OOBTree internally with Tree OIDs (UUIDs) as keys. The OIDs are stored as raw bytes to optimize storage space. This also enables fast key comparisons as UUID objects do not need to be created during lookups.

Note

UUIDs are stored as bytes rather than integers, because ZODB only supports integers up to 64 bits, while UUIDs require 128 bits.
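
The size argument in the note can be checked with the standard library alone. The snippet below is a minimal sketch, not the bucket's actual conversion code: the fixed UUID value and the variable names are illustrative assumptions.

```python
import uuid

# Illustrative, fixed UUID so the result is reproducible (assumption, not an architxt value).
oid = uuid.UUID("12345678-1234-5678-1234-567812345678")

# The integer form needs far more than 64 bits, so it cannot be used as a 64-bit integer key.
needs_more_than_64_bits = oid.int.bit_length() > 64

# The raw byte form is always exactly 16 bytes and can serve directly as a mapping key.
key = oid.bytes

# A UUID object is rebuilt only when a caller actually needs one.
restored = uuid.UUID(bytes=key)
```

Because the byte form is big-endian and of fixed length, comparing two keys as bytes gives the same ordering as comparing the corresponding integers.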

If no storage path is specified, the container creates a temporary database that is automatically deleted when the bucket is closed.

>>> import transaction
>>> from architxt.bucket.zodb import ZODBTreeBucket
>>> from architxt.tree import Tree
>>> bucket = ZODBTreeBucket()
>>> tree = Tree.fromstring('(S (NP Alice) (VP (VB like) (NNS apples)))')
>>> bucket.add(tree)
>>> len(bucket)
1
>>> tree.label = 'ROOT'
>>> transaction.commit()  # Persist changes made to the tree
>>> tree.label
'ROOT'
>>> tree.label = 'S'
>>> transaction.abort()  # Cancel changes made to the tree
>>> tree.label
'ROOT'
>>> bucket.discard(tree)
>>> len(bucket)
0
>>> bucket.close()

add(tree)

Add a single Tree to the bucket.

Return type:

None

async async_update(trees, batch_size=BATCH_SIZE, _memory_threshold_mb=3_000)

Asynchronously add multiple Tree objects to the bucket.

This method mirrors the behavior of update() but supports asynchronous iteration. Internally, it delegates each chunk to a background thread.
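
The general pattern of grouping an asynchronous stream into chunks and handing each chunk to a worker thread can be sketched with asyncio alone. This is an assumption-laden illustration, not the method's implementation: `batched_async`, `add_in_background`, `produce`, and the `sink` callable are hypothetical names, and strings stand in for Tree objects.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator, Callable


async def batched_async(items: AsyncIterable[str], batch_size: int) -> AsyncIterator[list[str]]:
    """Group an async stream into lists of at most batch_size items."""
    batch: list[str] = []
    async for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, chunk
        yield batch


async def add_in_background(
    items: AsyncIterable[str],
    batch_size: int,
    sink: Callable[[list[str]], None],
) -> None:
    """Run the blocking sink in a worker thread, one chunk at a time."""
    async for batch in batched_async(items, batch_size):
        await asyncio.to_thread(sink, batch)


async def produce(count: int) -> AsyncIterator[str]:
    """Hypothetical producer; strings stand in for Tree objects."""
    for index in range(count):
        yield f"tree-{index}"


received: list[list[str]] = []
asyncio.run(add_in_background(produce(5), 2, received.append))
```

Awaiting each chunk before reading the next keeps at most one chunk in flight, which bounds memory use while the event loop stays free for other tasks.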

Parameters:
  • trees (AsyncIterable[Tree]) – Trees to add to the bucket.

  • batch_size (int) – The number of trees to be added at once.

  • _memory_threshold_mb (int) – Threshold of available system memory (in MB) below which garbage collection is triggered.

Return type:

None

clear()

Remove all Tree objects from the bucket.

Return type:

None

close()

Close the database connection and release associated resources.

This will:

  • Abort any uncommitted transaction.

  • Close the active database connection.

  • Clean up temporary storage if one was created.

Return type:

None

commit()

Persist any in-memory changes to Tree objects in the bucket.

Return type:

None

discard(tree)

Remove a Tree from the bucket if it exists.

Return type:

None

oids()

Yield the object IDs (OIDs) of all trees stored in the bucket.

Return type:

Generator[UUID, None, None]

transaction()

Return a context manager for managing a transaction.

Upon exiting the context, the transaction is automatically committed. If an exception occurs within the context, the transaction is rolled back.
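
The commit-on-success, roll-back-on-exception behavior can be illustrated with a small stand-alone context manager. This is a sketch of the pattern only: `DemoStore` and the module-level `transaction` function are hypothetical stand-ins, not the TransactionManager returned by this method.

```python
from contextlib import contextmanager


class DemoStore:
    """Hypothetical stand-in for a transactional store (not the architxt API)."""

    def __init__(self):
        self.committed = {}
        self.pending = {}

    def commit(self):
        # Make staged changes durable.
        self.committed.update(self.pending)
        self.pending.clear()

    def abort(self):
        # Drop staged changes.
        self.pending.clear()


@contextmanager
def transaction(store):
    """Commit when the block exits normally; roll back if it raises."""
    try:
        yield store
    except BaseException:
        store.abort()
        raise
    else:
        store.commit()


store = DemoStore()

with transaction(store) as txn:
    txn.pending["a"] = 1  # committed on normal exit

try:
    with transaction(store) as txn:
        txn.pending["b"] = 2
        raise ValueError("simulated failure")  # triggers the rollback path
except ValueError:
    pass
```

After both blocks run, only the change from the first block is in `store.committed`. With the bucket itself, the same shape would presumably be a `with bucket.transaction():` block around the modifications.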

Return type:

TransactionManager

update(trees, batch_size=BATCH_SIZE, _memory_threshold_mb=3_000)

Add multiple Tree objects to the bucket, managing memory via chunked transactions.

Trees are added in batches to reduce memory footprint. When available system memory falls below the threshold, the connection cache is minimized and garbage collection is triggered.

Warning

If an error occurs, only the chunk being processed at that moment is rolled back. Chunks committed before it remain in the database, which may leave it in a partially updated state.
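
The partial-update risk can be reproduced with a small stand-alone model of chunked commits. The code below is a sketch under stated assumptions: `chunked`, `update_in_chunks`, and `stream` are hypothetical helpers, a plain list stands in for the database, and `None` simulates an item that fails to be added.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def update_in_chunks(committed, items, batch_size):
    """Stage each chunk, then commit it; a failure discards only the staged chunk."""
    for chunk in chunked(items, batch_size):
        staged = []
        for item in chunk:
            if item is None:  # simulated failure while adding an item
                raise ValueError("invalid item")
            staged.append(item)
        committed.extend(staged)  # "commit" of the current chunk


def stream():
    """Hypothetical input; strings stand in for Tree objects."""
    yield from ["t0", "t1", "t2", "t3", "t4", None, "t6"]


committed = []
try:
    update_in_chunks(committed, stream(), batch_size=2)
except ValueError:
    pass
```

Here the first two chunks are kept, while `t4` is lost together with the failing chunk and `t6` is never reached. Callers that need all-or-nothing semantics would have to track what was added and remove it on failure.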

Parameters:
  • trees (Iterable[Tree]) – Trees to add to the bucket.

  • batch_size (int) – The number of trees to be added at once.

  • _memory_threshold_mb (int) – Threshold of available system memory (in MB) below which garbage collection is triggered.

Return type:

None