Exploring a Textual Corpus with ArchiTXT

This tutorial provides a step-by-step guide on how to use ArchiTXT to efficiently process and analyze textual corpora.

ArchiTXT allows loading a corpus as a set of syntax trees, where each tree is enriched by incorporating named entities. These enriched trees form a forest, which can then be automatically structured into a valid database instance for further analysis.

By following this tutorial, you’ll learn how to:

  • Load a corpus

  • Parse textual data with Berkeley Neural Parser (Benepar)

  • Extract structured data using ArchiTXT

Downloading the MACCROBAT Corpus

The MACCROBAT corpus is a collection of 200 annotated medical documents, specifically clinical case reports, extracted from PubMed Central. The annotations focus on key medical concepts such as diseases, treatments, medications, and symptoms, making it a valuable resource for biomedical text analysis.
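
To give a feel for what such an annotation looks like, here is a minimal sketch of reading a single entity annotation. It assumes the corpus stores annotations as brat-style standoff lines (an identifier, a tab, "Type start end", a tab, then the covered text). The sample line, the Entity type, and the parse_entity_line helper are made up for illustration and are not taken from the corpus or from ArchiTXT.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    ident: str   # annotation identifier, e.g. "T1"
    label: str   # entity type, e.g. "Sign_symptom"
    start: int   # character offset where the mention starts
    end: int     # character offset where the mention ends
    text: str    # the covered text


def parse_entity_line(line: str) -> Entity:
    """Parse one contiguous-span entity line (assumed brat-style layout)."""
    ident, span, text = line.rstrip("\n").split("\t", 2)
    label, start, end = span.split(" ")
    return Entity(ident, label, int(start), int(end), text)


# Hypothetical sample line, not copied from the corpus.
sample = "T1\tSign_symptom 21 28\tdyspnea"
entity = parse_entity_line(sample)
print(entity.label, entity.text)
```

This sketch only handles contiguous spans; discontinuous spans and relation or event lines would need extra handling.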

The MACCROBAT corpus is available for download at Figshare.

Let’s download the corpus.

import io
import urllib.request
import zipfile

with urllib.request.urlopen('https://figshare.com/ndownloader/articles/9764942/versions/2') as response:
    archive_file = io.BytesIO(response.read())

with zipfile.ZipFile(archive_file) as archive:
    archive.extract('MACCROBAT2020.zip')
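
Before parsing, it can be useful to check what an archive actually contains. The following is a minimal sketch that counts files by extension. It builds a tiny in-memory archive so that it runs on its own; the file names and contents are hypothetical, and the count_by_extension helper is not part of ArchiTXT.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(archive: zipfile.ZipFile) -> Counter:
    """Count the files of an open archive by extension, skipping directories."""
    return Counter(
        PurePosixPath(name).suffix
        for name in archive.namelist()
        if not name.endswith("/")
    )


# Build a small stand-in archive in memory (hypothetical content).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("doc1.txt", "The patient reported dyspnea.")
    demo.writestr("doc1.ann", "T1\tSign_symptom 21 28\tdyspnea\n")
    demo.writestr("doc2.txt", "No fever was noted.")

with zipfile.ZipFile(buffer) as demo:
    counts = count_by_extension(demo)
print(counts)
```

To inspect the real download, the same helper can be called on zipfile.ZipFile('MACCROBAT2020.zip').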

Installing and Configuring NLP Models

ArchiTXT can parse sentences using either Benepar with SpaCy or a CoreNLP server. In this tutorial, we will use the Benepar parser with the default SpaCy model, en_core_web_sm, but you can use another model, such as one from SciSpaCy, a collection of models designed for biomedical text processing by AllenAI.

To download the default SpaCy model, run:

!spacy download en_core_web_sm

We also need to download the Benepar model for English:

import benepar

benepar.download('benepar_en3')
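
Constituency parsers such as Benepar produce phrase-structure trees, which are commonly written as bracketed strings. As a rough illustration of that structure, here is a minimal, dependency-free sketch that reads a bracketed string into nested tuples and computes the tree height. The sentence, the parse_bracketed and height helpers, and the height convention (a word has height 0) are assumptions made for this example; they do not reflect how ArchiTXT or Benepar represent trees internally.

```python
import re


def parse_bracketed(text):
    """Parse a bracketed constituency string into (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def height(node):
    """Height of a node: 0 for a word, 1 + tallest child otherwise."""
    if isinstance(node, str):
        return 0
    _, children = node
    return 1 + max((height(child) for child in children), default=0)


# Hand-written example parse, not actual Benepar output.
tree = parse_bracketed(
    "(S (NP (DT The) (NN patient)) (VP (VBD reported) (NP (NN dyspnea))))"
)
print(tree[0], height(tree))
```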

Parsing the Corpus with ArchiTXT

Before processing the corpus, we need to configure the BeneparParser, specifying which SpaCy model to use for each language.

import warnings

from architxt.nlp.parser.benepar import BeneparParser

# Initialize the parser
parser = BeneparParser(
    spacy_models={
        'English': 'en_core_web_sm',
    }
)

# Suppress all warnings, including those raised for unsupported annotations
warnings.filterwarnings("ignore")

Named entity resolution maps each named entity to an entry in a knowledge base, which helps standardize the entities and build a database instance. To enable it, we need to specify which knowledge base to use. For this tutorial, we will use the UMLS (Unified Medical Language System) resolver.
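
To illustrate why resolution helps standardization, here is a toy sketch in which several surface forms map to one identifier. The alias table, the "KB:" identifiers, and the resolve helper are all hypothetical; an actual resolver such as the UMLS one works against a large knowledge base and typically relies on approximate matching rather than an exact lookup.

```python
from typing import Optional

# Hypothetical alias table: surface form -> made-up knowledge base identifier.
ALIASES = {
    "dyspnea": "KB:0001",
    "shortness of breath": "KB:0001",
    "breathlessness": "KB:0001",
    "fever": "KB:0002",
    "pyrexia": "KB:0002",
}


def resolve(mention: str) -> Optional[str]:
    """Return the identifier for a mention, or None if it is unknown."""
    return ALIASES.get(mention.strip().lower())


for mention in ["Dyspnea", "shortness of breath", "cough"]:
    print(mention, "->", resolve(mention))
```

Once mentions share an identifier, rows that refer to the same concept under different names can be compared or joined.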

Let’s parse a sample of the corpus. To verify that everything works as expected, we will inspect the tallest enriched tree using the architxt.tree.Tree.pretty_print method.

from architxt.nlp import raw_load_corpus

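# `async for` at the top level works in a notebook; in a script, wrap this
# in an async function and run it with asyncio.run().
forest = [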
    tree
    async for tree in raw_load_corpus(
        ['MACCROBAT2020.zip'],
        ['English'],
        parser=parser,
        resolver_name='umls',
        sample=400,
    )
]

# Display the tallest tree
max(forest, key=lambda tree: tree.height).pretty_print()

                                                                                                                   ROOT                                                                                                                                                                                                                                                                                                                                                                                                                                                              
            ┌───────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐                                                                                                                                                                                                                                                                           
            │                                                                                                                                                                                                                                                                              UNDEF_7071c89ad515407b97d3e61457                                                                                                                                                                                                                                                          
            │                                                                                                                                                                                                                                                                                           359c84                                                                                                                                                                                                                                                                       
            │                                                                                                       ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐                                                                                                                                                         
            │                                                                                                       │                                                                                                                                                                                                                                                                                        UNDEF_16d67964b5e244b080e1eaada1                                                                                                                                        
            │                                                                                                       │                                                                                                                                                                                                                                                                                                     f0f6e6                                                                                                                                                     
            │                                                                                                       │                                                                                                                                                                                                                                     ┌─────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────┐                                                  
            │                                                                                                       │                                                                                                                                                                                                                      UNDEF_101e7d97346c47298039737db0                                                                                                                                                        │                                                 
            │                                                                                                       │                                                                                                                                                                                                                                   ed167a                                                                                                                                                                     │                                                 
            │                                                                                                       │                                                                                                                                                                                     ┌───────────────────────────────────────────────┴────────────────────────────────┐                                                                                                                                       │                                                  
            │                                                                                                       │                                                                                                                                                                                     │                                                                 UNDEF_0b5b49fb057e419394df3b3065                                                                                                                       │                                                 
            │                                                                                                       │                                                                                                                                                                                     │                                                                              dc0229                                                                                                                                    │                                                 
            │                                                                                                       │                                                                                                                                                                                     │                       ┌────────────────────────────────────────────────────────┴────────────────────────────────┐                                                                                                      │                                                  
            │                                                                                        UNDEF_48fc1d29e97a4a97b04c6cb2ab                                                                                                                                                                     │                       │                                                                          UNDEF_9ae2d001195045628326288833                                                                       UNDEF_626c95c921b9438ab30f36c04c                                 
            │                                                                                                     e8e4b1                                                                                                                                                                                  │                       │                                                                                       768e2e                                                                                                 f85bb1                                              
            │                                            ┌──────────────────────────────────────────────────────────┴────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────┐                                                        │                       │                                                        ┌────────────────────────────────┴───────────────────────────┐                                         ┌────────────────────────────────┴────────────────────────────────┐                 
            │                             UNDEF_1e3acac5a24a4355b2ac9693c3                                                            UNDEF_4908010995254db6b26e6848c4                                                            UNDEF_c70091af84354ac091cb118d74                                        │                       │                                         UNDEF_af80d13b148f4883b84382dc89                                            │                          UNDEF_0003a612d13d45c382acc30280                                                 │                
            │                                          7165f2                                                                                      f44d58                                                                                      c3b163                                                     │                       │                                                      81f132                                                         │                                       75830d                                                              │                
            │                    ┌───────────────────────┴─────────────────────────────┐                            ┌────────────────────────────────┴─────────────────────────────┐                            ┌────────────────────────────────┴────────────────────────────┐                           │                       │                       ┌────────────────────────────────┴────────────────────────────────┐                           │                 ┌───────────────────────┴────────────────────────────────┐                                │                 
ENT::DIAGNOSTIC_PROCEDURE  ENT::LAB_VALUE                                  ENT::DIAGNOSTIC_PROCEDURE          ENT::LAB_VALUE                                           ENT::DIAGNOSTIC_PROCEDURE          ENT::SEVERITY                                             ENT::DISEASE_DISORDER     ENT::DIAGNOSTIC_PROCEDURE     ENT::LAB_VALUE          ENT::LAB_VALUE                                              ENT::DIAGNOSTIC_PROCEDURE         ENT::LAB_VALUE    ENT::LAB_VALUE                                     ENT::DIAGNOSTIC_PROCEDURE        ENT::BIOLOGICAL_STRUCTURE    
            │                    │                                                     │                            │                                                              │                            │                                                             │                           │                       │                       │                                                                 │                           │                 │                                                        │                                │                 
radiographic%20examination     normal                                         cardiac%20dimensions                normal                                                     wall%20motion       mild%20severity%20of%20illnes...                                  diastolic%20dysfunction    late%20diastolic%20murmur      less%20than              prolonged                                                        deceleration           two%20hundred%20fifty    reduced                                      early%20diastolic%20annular%2... mitral%20valve%20lateral%20an...
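
The printed tree mixes several kinds of labels, and its leaves are percent-encoded (for example, %20 stands for a space). Here is a small sketch for reading such output. The describe helper is hypothetical, and the meaning attached to each prefix is inferred from the output above rather than taken from the ArchiTXT API.

```python
from urllib.parse import unquote


def describe(label: str) -> str:
    """Classify a node label by its prefix (interpretation is an assumption)."""
    if label.startswith("ENT::"):
        return "entity"
    if label.startswith("GROUP::"):
        return "group"
    if label.startswith("UNDEF_"):
        return "unlabelled"
    return "other"


# Leaf value copied from the tree above.
leaf = "radiographic%20examination"
print(unquote(leaf))

for label in ["ROOT", "ENT::LAB_VALUE", "UNDEF_7071c89ad515407b97d3e61457359c84"]:
    print(label, "->", describe(label))
```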

ArchiTXT can then automatically structure parsed text into a database-friendly format. Let’s start with a simple rewrite!

from copy import deepcopy

from architxt.simplification.simple_rewrite import simple_rewrite

forest_copy = deepcopy(forest)
simple_rewrite(forest_copy)
# Display the tallest tree
max(forest_copy, key=lambda tree: tree.height).pretty_print()
                                                    ROOT                                                                
                                                     │                                                                   
                                                  GROUP::1                                                              
        ┌──────────────────┬─────────────────────────┼───────────────────────────────┬─────────────────────────┐         
 ENT::COREFERENCE ENT::CLINICAL_EVENT          ENT::FREQUENCY            ENT::DETAILED_DESCRIPTION     ENT::SIGN_SYMPTOM
        │                  │                         │                               │                         │         
symptoms%20aspect         rest        2%20or%203%20times%20per%20week up%20to%2030%20minutes%20at%2...      dyspnea
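
Judging from the output above, the rewrite replaces the intermediate nodes with a single group that holds the entities of the tree. The following is a conceptual approximation of that idea on plain nested tuples. It is inferred from the printed result only and is not ArchiTXT's implementation; the entity_leaves and flatten_forest helpers, the tuple representation, and the group numbering rule are assumptions.

```python
def entity_leaves(node):
    """Collect all ENT:: nodes of a (label, children) tree, left to right."""
    label, children = node
    if label.startswith("ENT::"):
        return [node]
    found = []
    for child in children:
        if not isinstance(child, str):
            found.extend(entity_leaves(child))
    return found


def flatten_forest(forest):
    """Put the entities of each tree under one group per set of entity labels."""
    group_ids = {}
    rewritten = []
    for tree in forest:
        entities = entity_leaves(tree)
        key = frozenset(label for label, _ in entities)
        # Trees with the same set of entity labels share a group number.
        group_id = group_ids.setdefault(key, len(group_ids) + 1)
        rewritten.append(("ROOT", [(f"GROUP::{group_id}", entities)]))
    return rewritten


# Hypothetical toy trees, not taken from the corpus.
tree_a = ("ROOT", [("UNDEF_1", [("ENT::DATE", ["2019"]),
                                ("UNDEF_2", [("ENT::MEDICATION", ["aspirin"])])])])
tree_b = ("ROOT", [("ENT::MEDICATION", ["ibuprofen"]), ("ENT::DATE", ["2020"])])
tree_c = ("ROOT", [("ENT::SIGN_SYMPTOM", ["dyspnea"])])

rewritten = flatten_forest([tree_a, tree_b, tree_c])
for tree in rewritten:
    print(tree)
```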

Now that we have a structured instance, we can extract its schema. The schema provides a formal representation of the extracted data.
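
The schema is printed below as a context-free grammar in which each production names a group and lists its entity types. As a minimal sketch of how such text can be read, the snippet below turns productions into a dictionary, which can loosely be viewed as table names mapped to their columns. The parse_productions helper is hypothetical and not part of ArchiTXT; the two sample productions are copied from the output shown below.

```python
def parse_productions(cfg_text: str) -> dict[str, list[str]]:
    """Map each left-hand side to the list of symbols on its right-hand side."""
    rules = {}
    for rule in cfg_text.split(";"):
        rule = rule.strip()
        if not rule:
            continue
        head, _, body = rule.partition("->")
        rules[head.strip()] = body.split()
    return rules


sample = """
GROUP::4 -> ENT::BIOLOGICAL_STRUCTURE ENT::SIGN_SYMPTOM;
GROUP::23 -> ENT::DATE ENT::MEDICATION;
"""

rules = parse_productions(sample)
# Keep only group productions, read here as "table -> columns".
tables = {head: cols for head, cols in rules.items() if head.startswith("GROUP::")}
for name, columns in tables.items():
    print(name, columns)
```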

from architxt.schema import Schema

schema = Schema.from_forest(forest_copy, keep_unlabelled=False)
print(schema.as_cfg())
ROOT -> GROUP::1 GROUP::10 GROUP::100 GROUP::101 GROUP::102 GROUP::103 GROUP::104 GROUP::105 GROUP::106 GROUP::107 GROUP::108 GROUP::109 GROUP::11 GROUP::110 GROUP::111 GROUP::112 GROUP::113 GROUP::114 GROUP::115 GROUP::116 GROUP::117 GROUP::118 GROUP::119 GROUP::12 GROUP::120 GROUP::121 GROUP::122 GROUP::123 GROUP::124 GROUP::125 GROUP::126 GROUP::127 GROUP::128 GROUP::129 GROUP::13 GROUP::130 GROUP::131 GROUP::132 GROUP::133 GROUP::134 GROUP::135 GROUP::136 GROUP::137 GROUP::138 GROUP::139 GROUP::14 GROUP::140 GROUP::141 GROUP::142 GROUP::143 GROUP::144 GROUP::145 GROUP::146 GROUP::147 GROUP::148 GROUP::149 GROUP::15 GROUP::150 GROUP::151 GROUP::152 GROUP::153 GROUP::154 GROUP::155 GROUP::156 GROUP::157 GROUP::158 GROUP::159 GROUP::16 GROUP::160 GROUP::161 GROUP::162 GROUP::163 GROUP::164 GROUP::165 GROUP::166 GROUP::167 GROUP::168 GROUP::169 GROUP::17 GROUP::170 GROUP::171 GROUP::172 GROUP::173 GROUP::174 GROUP::175 GROUP::176 GROUP::177 GROUP::178 GROUP::179 GROUP::18 GROUP::180 GROUP::181 GROUP::182 GROUP::183 GROUP::184 GROUP::185 GROUP::186 GROUP::187 GROUP::188 GROUP::189 GROUP::19 GROUP::190 GROUP::191 GROUP::192 GROUP::193 GROUP::194 GROUP::195 GROUP::196 GROUP::197 GROUP::198 GROUP::199 GROUP::2 GROUP::20 GROUP::200 GROUP::201 GROUP::202 GROUP::203 GROUP::204 GROUP::205 GROUP::206 GROUP::207 GROUP::208 GROUP::209 GROUP::21 GROUP::210 GROUP::211 GROUP::212 GROUP::213 GROUP::214 GROUP::215 GROUP::216 GROUP::217 GROUP::218 GROUP::219 GROUP::22 GROUP::220 GROUP::221 GROUP::222 GROUP::223 GROUP::224 GROUP::23 GROUP::24 GROUP::25 GROUP::26 GROUP::27 GROUP::28 GROUP::29 GROUP::3 GROUP::30 GROUP::31 GROUP::32 GROUP::33 GROUP::34 GROUP::35 GROUP::36 GROUP::37 GROUP::38 GROUP::39 GROUP::4 GROUP::40 GROUP::41 GROUP::42 GROUP::43 GROUP::44 GROUP::45 GROUP::46 GROUP::47 GROUP::48 GROUP::49 GROUP::5 GROUP::50 GROUP::51 GROUP::52 GROUP::53 GROUP::54 GROUP::55 GROUP::56 GROUP::57 GROUP::58 GROUP::59 GROUP::6 GROUP::60 GROUP::61 GROUP::62 GROUP::63 GROUP::64 GROUP::65 
GROUP::66 GROUP::67 GROUP::68 GROUP::69 GROUP::7 GROUP::70 GROUP::71 GROUP::72 GROUP::73 GROUP::74 GROUP::75 GROUP::76 GROUP::77 GROUP::78 GROUP::79 GROUP::8 GROUP::80 GROUP::81 GROUP::82 GROUP::83 GROUP::84 GROUP::85 GROUP::86 GROUP::87 GROUP::88 GROUP::89 GROUP::9 GROUP::90 GROUP::91 GROUP::92 GROUP::93 GROUP::94 GROUP::95 GROUP::96 GROUP::97 GROUP::98 GROUP::99;
GROUP::1 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::FREQUENCY ENT::SIGN_SYMPTOM;
GROUP::2 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::3 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::4 -> ENT::BIOLOGICAL_STRUCTURE ENT::SIGN_SYMPTOM;
GROUP::5 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::6 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::THERAPEUTIC_PROCEDURE;
GROUP::7 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::HISTORY ENT::LAB_VALUE;
GROUP::8 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::9 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE;
GROUP::10 -> ENT::BIOLOGICAL_ATTRIBUTE ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::TEXTURE;
GROUP::11 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::12 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY;
GROUP::13 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE;
GROUP::14 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::15 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::16 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE;
GROUP::17 -> ENT::BIOLOGICAL_ATTRIBUTE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION;
GROUP::18 -> ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::19 -> ENT::BIOLOGICAL_ATTRIBUTE ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION;
GROUP::20 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE;
GROUP::21 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::22 -> ENT::COREFERENCE ENT::DATE ENT::OUTCOME ENT::SIGN_SYMPTOM;
GROUP::23 -> ENT::DATE ENT::MEDICATION;
GROUP::24 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::25 -> ENT::COREFERENCE ENT::DATE ENT::DOSAGE ENT::MEDICATION;
GROUP::26 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::27 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::28 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::29 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::30 -> ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::31 -> ENT::CLINICAL_EVENT ENT::DURATION ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::32 -> ENT::AGE ENT::DOSAGE ENT::MEDICATION ENT::SEX ENT::THERAPEUTIC_PROCEDURE;
GROUP::33 -> ENT::DISEASE_DISORDER ENT::DOSAGE ENT::MEDICATION;
GROUP::34 -> ENT::COREFERENCE ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE;
GROUP::35 -> ENT::ADMINISTRATION ENT::DOSAGE ENT::MEDICATION;
GROUP::36 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::37 -> ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE;
GROUP::38 -> ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::39 -> ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::40 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::41 -> ENT::DETAILED_DESCRIPTION ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::42 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::OUTCOME;
GROUP::43 -> ENT::ACTIVITY ENT::AGE ENT::NONBIOLOGICAL_LOCATION ENT::PERSONAL_BACKGROUND ENT::SEX;
GROUP::44 -> ENT::ACTIVITY ENT::DURATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::45 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::46 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::47 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::48 -> ENT::ADMINISTRATION ENT::DOSAGE ENT::FREQUENCY ENT::MEDICATION;
GROUP::49 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::50 -> ENT::ACTIVITY ENT::DATE ENT::DETAILED_DESCRIPTION ENT::NONBIOLOGICAL_LOCATION ENT::OCCUPATION ENT::OTHER_ENTITY;
GROUP::51 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::52 -> ENT::COREFERENCE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::53 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::54 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::MEDICATION;
GROUP::55 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::56 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE;
GROUP::57 -> ENT::DATE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::58 -> ENT::CLINICAL_EVENT ENT::DATE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::59 -> ENT::DISEASE_DISORDER ENT::DOSAGE ENT::FREQUENCY ENT::MEDICATION;
GROUP::60 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::61 -> ENT::DISEASE_DISORDER;
GROUP::62 -> ENT::DATE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::63 -> ENT::ADMINISTRATION ENT::DIAGNOSTIC_PROCEDURE ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::LAB_VALUE ENT::MEDICATION;
GROUP::64 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE;
GROUP::65 -> ENT::CLINICAL_EVENT ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::66 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::67 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::68 -> ENT::AGE ENT::DATE ENT::DISEASE_DISORDER ENT::SEX;
GROUP::69 -> ENT::DIAGNOSTIC_PROCEDURE ENT::HISTORY ENT::LAB_VALUE;
GROUP::70 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::71 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE;
GROUP::72 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::73 -> ENT::ADMINISTRATION ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::74 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::75 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY;
GROUP::76 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::77 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::78 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::MEDICATION;
GROUP::79 -> ENT::ADMINISTRATION ENT::CLINICAL_EVENT ENT::DATE ENT::MEDICATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::80 -> ENT::COLOR ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::81 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER;
GROUP::82 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::83 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::84 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DURATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::85 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DISEASE_DISORDER ENT::DURATION ENT::HISTORY ENT::PERSONAL_BACKGROUND ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::86 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION;
GROUP::87 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::88 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY;
GROUP::89 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::90 -> ENT::ACTIVITY ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::TIME;
GROUP::91 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::SIGN_SYMPTOM;
GROUP::92 -> ENT::COREFERENCE ENT::THERAPEUTIC_PROCEDURE;
GROUP::93 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::MEDICATION;
GROUP::94 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::95 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE;
GROUP::96 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::97 -> ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::98 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::99 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::100 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::101 -> ENT::DATE ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::102 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE ENT::TIME;
GROUP::103 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::104 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::105 -> ENT::DISEASE_DISORDER ENT::SEVERITY ENT::THERAPEUTIC_PROCEDURE;
GROUP::106 -> ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::OUTCOME ENT::SIGN_SYMPTOM;
GROUP::107 -> ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::108 -> ENT::DIAGNOSTIC_PROCEDURE;
GROUP::109 -> ENT::CLINICAL_EVENT ENT::DATE ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::110 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::111 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY;
GROUP::112 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::113 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::114 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::115 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::116 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER;
GROUP::117 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::THERAPEUTIC_PROCEDURE;
GROUP::118 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::119 -> ENT::ADMINISTRATION ENT::COREFERENCE ENT::DOSAGE ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::120 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::121 -> ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::122 -> ENT::COREFERENCE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::LAB_VALUE;
GROUP::123 -> ENT::BIOLOGICAL_STRUCTURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::124 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::HISTORY ENT::LAB_VALUE;
GROUP::125 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::126 -> ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::127 -> ENT::CLINICAL_EVENT ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::128 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::129 -> ENT::ADMINISTRATION ENT::COREFERENCE ENT::DATE ENT::DOSAGE ENT::FREQUENCY ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::130 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION;
GROUP::131 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::132 -> ENT::ADMINISTRATION ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::MEDICATION;
GROUP::133 -> ENT::COREFERENCE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::134 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::135 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::136 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DATE ENT::FAMILY_HISTORY ENT::SEX ENT::THERAPEUTIC_PROCEDURE;
GROUP::137 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::138 -> ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DURATION;
GROUP::139 -> ENT::CLINICAL_EVENT ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::140 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::SEVERITY;
GROUP::141 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::142 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::143 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::144 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DISTANCE;
GROUP::145 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::146 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::147 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::148 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::149 -> ENT::BIOLOGICAL_STRUCTURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::150 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::QUANTITATIVE_CONCEPT;
GROUP::151 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SEX ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::152 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::HISTORY ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::153 -> ENT::COREFERENCE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::154 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::155 -> ENT::DIAGNOSTIC_PROCEDURE ENT::THERAPEUTIC_PROCEDURE;
GROUP::156 -> ENT::THERAPEUTIC_PROCEDURE;
GROUP::157 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY;
GROUP::158 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::SIGN_SYMPTOM;
GROUP::159 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::160 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::161 -> ENT::SIGN_SYMPTOM;
GROUP::162 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::LAB_VALUE ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::163 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE;
GROUP::164 -> ENT::COREFERENCE ENT::SIGN_SYMPTOM;
GROUP::165 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE ENT::TIME;
GROUP::166 -> ENT::DETAILED_DESCRIPTION ENT::HISTORY ENT::SIGN_SYMPTOM;
GROUP::167 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::168 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::169 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::170 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX;
GROUP::171 -> ENT::BIOLOGICAL_STRUCTURE ENT::COLOR ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::172 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER;
GROUP::173 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DOSAGE ENT::MEDICATION;
GROUP::174 -> ENT::CLINICAL_EVENT ENT::DATE ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::175 -> ENT::DOSAGE ENT::FREQUENCY ENT::MEDICATION;
GROUP::176 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::177 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::178 -> ENT::CLINICAL_EVENT ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::179 -> ENT::AGE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::180 -> ENT::HISTORY ENT::PERSONAL_BACKGROUND ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::181 -> ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::TEXTURE;
GROUP::182 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::HISTORY ENT::MEDICATION;
GROUP::183 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY;
GROUP::184 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY;
GROUP::185 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::186 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::THERAPEUTIC_PROCEDURE;
GROUP::187 -> ENT::AGE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::MEDICATION ENT::SEX;
GROUP::188 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::189 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE;
GROUP::190 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DATE ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::TIME;
GROUP::191 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::192 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::193 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::194 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::195 -> ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::MEDICATION ENT::OTHER_EVENT;
GROUP::196 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::197 -> ENT::CLINICAL_EVENT ENT::DATE ENT::FAMILY_HISTORY ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::SUBJECT;
GROUP::198 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::199 -> ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::LAB_VALUE ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::200 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::201 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DOSAGE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::202 -> ENT::DATE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::203 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::204 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::205 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE;
GROUP::206 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::207 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::208 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::209 -> ENT::BIOLOGICAL_STRUCTURE ENT::THERAPEUTIC_PROCEDURE ENT::VOLUME;
GROUP::210 -> ENT::DURATION ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::211 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::212 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::213 -> ENT::ADMINISTRATION ENT::DOSAGE ENT::DURATION ENT::MEDICATION;
GROUP::214 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::215 -> ENT::CLINICAL_EVENT ENT::DATE ENT::MEDICATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::216 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::217 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION ENT::SUBJECT;
GROUP::218 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::219 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION;
GROUP::220 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::MEDICATION ENT::SEX ENT::THERAPEUTIC_PROCEDURE;
GROUP::221 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::222 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::223 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::224 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;

We’ve successfully built a basic database schema from our corpus, but there’s significant potential for improvement. Let’s explore how we can enhance it using the ArchiTXT simplification algorithm!
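To see where that room for improvement comes from, it can help to measure how much the generated groups overlap. The sketch below uses plain Python, not the ArchiTXT API: it parses a handful of productions copied from the schema above and reports how often each entity type is reused and which groups are strictly contained in another. The helper `parse_productions` and the variable names are illustrative assumptions.

```python
from collections import Counter

# A few productions copied verbatim from the schema printed above.
SCHEMA_EXCERPT = """
GROUP::27 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::28 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::40 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::51 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::108 -> ENT::DIAGNOSTIC_PROCEDURE;
"""


def parse_productions(text: str) -> dict[str, frozenset[str]]:
    """Map each left-hand side to the set of symbols on its right-hand side."""
    productions = {}
    for line in text.splitlines():
        line = line.strip().rstrip(';')
        if '->' not in line:
            continue  # skip blank lines
        lhs, rhs = line.split('->', 1)
        productions[lhs.strip()] = frozenset(rhs.split())
    return productions


groups = parse_productions(SCHEMA_EXCERPT)

# Number of groups in which each entity type appears.
entity_usage = Counter(entity for attrs in groups.values() for entity in attrs)

# Pairs (small, large) where every attribute of `small` also appears in `large`.
contained = sorted(
    (small, large)
    for small, a in groups.items()
    for large, b in groups.items()
    if a < b  # strict subset
)

print(entity_usage.most_common(2))
print(contained)
```

Even in this small excerpt, most groups repeat the same `ENT::DIAGNOSTIC_PROCEDURE` and `ENT::LAB_VALUE` pair, which is the kind of redundancy a simplification step can be expected to reduce.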

First, let’s visualize the distribution of equivalence classes in the forest.

from architxt.similarity import equiv_cluster

clusters = equiv_cluster(forest, tau=0.8)
import plotly.express as px

# For each equivalence class, count the member trees of height 2
counts = [sum(tree.height == 2 for tree in klass) for klass in clusters.values()]

fig = px.bar(counts)
fig.update_layout(xaxis_title='Equivalence class', yaxis_title='Count', showlegend=False)
fig.show()
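To build some intuition for the `tau` threshold, here is a toy sketch of threshold-based grouping. It is not ArchiTXT's implementation: the Jaccard similarity, the greedy strategy, and the names `jaccard`, `threshold_clusters`, and `toy` are all assumptions made for illustration, and ArchiTXT's own similarity measure and clustering may differ.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def threshold_clusters(items: dict[str, set[str]], tau: float) -> list[list[str]]:
    """Greedy grouping: an item joins the first cluster whose first member
    is at least `tau`-similar to it, otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for name, labels in items.items():
        for cluster in clusters:
            if jaccard(labels, items[cluster[0]]) >= tau:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical toy data: sets of entity labels standing in for subtrees.
toy = {
    'a': {'DOSAGE', 'MEDICATION'},
    'b': {'DOSAGE', 'MEDICATION'},
    'c': {'DOSAGE', 'FREQUENCY', 'MEDICATION'},
    'd': {'DIAGNOSTIC_PROCEDURE', 'LAB_VALUE'},
}

strict = threshold_clusters(toy, tau=0.8)  # 'c' is only 2/3 similar to 'a'
loose = threshold_clusters(toy, tau=0.6)   # 'c' now joins the cluster of 'a'

print(strict)
print(loose)
```

In this toy setting, a higher threshold yields more, smaller classes, while a lower one merges near-duplicates.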

It’s now time to use ArchiTXT to automatically structure the data.

from architxt.simplification.tree_rewriting import rewrite

rewrite(forest, epoch=10, min_support=5, tau=0.8)
# Display the tallest tree in the forest
max(forest, key=lambda tree: tree.height).pretty_print()
                                                                                                           ROOT                                                                                                                                                                                                                                                                                                                                                                                                                                              
            ┌───────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐                                                                                                                                                                                                                                                                   
            │                                                                                                                                                                                                                                                              UNDEF_7071c89ad515407b97d3e61457                                                                                                                                                                                                                                                  
            │                                                                                                                                                                                                                                                                           359c84                                                                                                                                                                                                                                                               
            │                                                                                               ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐                                                                                                                                                 
            │                                                                                               │                                                                                                                                                                                                                                                                                UNDEF_16d67964b5e244b080e1eaada1                                                                                                                                
            │                                                                                               │                                                                                                                                                                                                                                                                                             f0f6e6                                                                                                                                             
            │                                                                                               │                                                                                                                                                                                                                             ┌─────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────┐                                                  
            │                                                                                               │                                                                                                                                                                                                              UNDEF_101e7d97346c47298039737db0                                                                                                                                                │                                                 
            │                                                                                               │                                                                                                                                                                                                                           ed167a                                                                                                                                                             │                                                 
            │                                                                                               │                                                                                                                                                                             ┌───────────────────────────────────────────────┴────────────────────────────────┐                                                                                                                               │                                                  
            │                                                                                               │                                                                                                                                                                             │                                                                 UNDEF_0b5b49fb057e419394df3b3065                                                                                                               │                                                 
            │                                                                                               │                                                                                                                                                                             │                                                                              dc0229                                                                                                                            │                                                 
            │                                                                                               │                                                                                                                                                                             │                       ┌────────────────────────────────────────────────────────┴────────────────────────────────┐                                                                                              │                                                  
            │                                                                                UNDEF_48fc1d29e97a4a97b04c6cb2ab                                                                                                                                                             │                       │                                                                          UNDEF_9ae2d001195045628326288833                                                               UNDEF_626c95c921b9438ab30f36c04c                                 
            │                                                                                             e8e4b1                                                                                                                                                                          │                       │                                                                                       768e2e                                                                                         f85bb1                                              
            │                                        ┌──────────────────────────────────────────────────────┴────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────┐                                                        │                       │                                                        ┌────────────────────────────────┴───────────────────────────┐                                     ┌────────────────────────────┴────────────────────────────────┐                 
            │                             GROUP::0_0_0_0_0_0_0_0_0                                                            GROUP::0_0_0_0_0_0_0_0_0                                                            UNDEF_c70091af84354ac091cb118d74                                        │                       │                                             GROUP::0_0_0_0_0_0_0_0_0                                                │                          GROUP::0_0_0_0_0_0_0_0_0                                                 │                
            │                                        │                                                                                   │                                                                                     c3b163                                                     │                       │                                                        │                                                            │                                     │                                                             │                
            │                    ┌───────────────────┴─────────────────────────┐                            ┌────────────────────────────┴─────────────────────────┐                            ┌────────────────────────────────┴────────────────────────────┐                           │                       │                       ┌────────────────────────────────┴────────────────────────────────┐                           │                 ┌───────────────────┴────────────────────────────┐                                │                 
ENT::DIAGNOSTIC_PROCEDURE  ENT::LAB_VALUE                          ENT::DIAGNOSTIC_PROCEDURE          ENT::LAB_VALUE                                   ENT::DIAGNOSTIC_PROCEDURE          ENT::SEVERITY                                             ENT::DISEASE_DISORDER     ENT::DIAGNOSTIC_PROCEDURE     ENT::LAB_VALUE          ENT::LAB_VALUE                                              ENT::DIAGNOSTIC_PROCEDURE         ENT::LAB_VALUE    ENT::LAB_VALUE                             ENT::DIAGNOSTIC_PROCEDURE        ENT::BIOLOGICAL_STRUCTURE    
            │                    │                                             │                            │                                                      │                            │                                                             │                           │                       │                       │                                                                 │                           │                 │                                                │                                │                 
radiographic%20examination     normal                                 cardiac%20dimensions                normal                                             wall%20motion       mild%20severity%20of%20illnes...                                  diastolic%20dysfunction    late%20diastolic%20murmur      less%20than              prolonged                                                        deceleration           two%20hundred%20fifty    reduced                              early%20diastolic%20annular%2... mitral%20valve%20lateral%20an...

We now have a more granular structure. Let’s take a closer look at the schema.

schema = Schema.from_forest(forest, keep_unlabelled=False)
print(schema.as_cfg())
ROOT -> COLL::0_0_0_0_0_0_0_0_0 GROUP::0_0_0_0_0_0_0_0_0 GROUP::2_2_2_2_2_2_2_2_2 REL::0_0_0_0_0_0_0_0_0<->2_2_2_2_2_2_2_2_2;
COLL::0_0_0_0_0_0_0_0_0 -> GROUP::0_0_0_0_0_0_0_0_0;
REL::0_0_0_0_0_0_0_0_0<->2_2_2_2_2_2_2_2_2 -> GROUP::0_0_0_0_0_0_0_0_0 GROUP::2_2_2_2_2_2_2_2_2;
GROUP::0_0_0_0_0_0_0_0_0 -> ENT::ACTIVITY ENT::ADMINISTRATION ENT::AREA ENT::BIOLOGICAL_ATTRIBUTE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DISTANCE ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::HISTORY ENT::LAB_VALUE ENT::MEDICATION ENT::NONBIOLOGICAL_LOCATION ENT::OUTCOME ENT::SEVERITY ENT::SHAPE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE ENT::TIME;
GROUP::2_2_2_2_2_2_2_2_2 -> ENT::AGE ENT::SEX;

The schema is now much smaller, and the groups are more meaningful.
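To inspect the grammar programmatically rather than by eye, one option is to parse the printed text into a dictionary. The sketch below is not part of ArchiTXT: `parse_cfg` is a hypothetical helper, and it assumes the format shown above, with one `LHS -> symbols;` production per line. The sample text is an abbreviated copy of the output, with the first group shortened to three entities.

```python
def parse_cfg(cfg_text: str) -> dict[str, list[str]]:
    """Map each left-hand symbol to the list of symbols it produces.

    Hypothetical helper: assumes one 'LHS -> A B C;' production per line.
    """
    productions: dict[str, list[str]] = {}
    for line in cfg_text.splitlines():
        line = line.strip().rstrip(';')
        if not line:
            continue
        # Split on the first ' -> ' only. Relation names contain '<->',
        # but without surrounding spaces, so they are left intact.
        lhs, rhs = line.split(' -> ', 1)
        productions[lhs] = rhs.split()
    return productions


# Abbreviated copy of the schema printed above.
cfg_text = """\
ROOT -> COLL::0_0_0_0_0_0_0_0_0 GROUP::0_0_0_0_0_0_0_0_0 GROUP::2_2_2_2_2_2_2_2_2 REL::0_0_0_0_0_0_0_0_0<->2_2_2_2_2_2_2_2_2;
COLL::0_0_0_0_0_0_0_0_0 -> GROUP::0_0_0_0_0_0_0_0_0;
REL::0_0_0_0_0_0_0_0_0<->2_2_2_2_2_2_2_2_2 -> GROUP::0_0_0_0_0_0_0_0_0 GROUP::2_2_2_2_2_2_2_2_2;
GROUP::0_0_0_0_0_0_0_0_0 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY;
GROUP::2_2_2_2_2_2_2_2_2 -> ENT::AGE ENT::SEX;
"""

productions = parse_cfg(cfg_text)

# Report how many entity types each group holds.
for symbol, children in productions.items():
    if symbol.startswith('GROUP::'):
        print(f'{symbol}: {len(children)} entity types')
```

In a live session, the same helper could be given `schema.as_cfg()` directly, provided the output keeps this line-per-production layout.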

However, not all extracted trees provide valuable insights. We could filter the structured instance to keep only the valid trees using schema.extract_valid_trees(new_forest).

Let’s explore the different semantic groups, which represent common patterns across the corpus.

all_datasets = schema.extract_datasets(forest)
group, dataset = max(all_datasets.items(), key=lambda x: len(x[1]))

print(f'Group: {group}')

dataset
Group: 0_0_0_0_0_0_0_0_0
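The cell above keeps only the largest group. To compare all groups at once, a size ranking can help. The following is a minimal sketch that assumes, as the `max(...)` call implies, that `schema.extract_datasets` returns a mapping from group names to objects supporting `len()`. The `rank_by_size` helper and the toy rows are hypothetical stand-ins, not ArchiTXT output.

```python
def rank_by_size(datasets):
    """Return (group name, row count) pairs, largest dataset first."""
    sizes = ((name, len(rows)) for name, rows in datasets.items())
    return sorted(sizes, key=lambda item: item[1], reverse=True)


# Toy stand-in for the mapping returned by schema.extract_datasets(forest).
# The rows are illustrative placeholders only.
toy_datasets = {
    '2_2_2_2_2_2_2_2_2': [
        {'AGE': 'placeholder age', 'SEX': 'placeholder sex'},
    ],
    '0_0_0_0_0_0_0_0_0': [
        {'DIAGNOSTIC_PROCEDURE': 'radiographic examination', 'LAB_VALUE': 'normal'},
        {'DIAGNOSTIC_PROCEDURE': 'cardiac dimensions', 'LAB_VALUE': 'normal'},
    ],
}

ranking = rank_by_size(toy_datasets)
for name, size in ranking:
    print(f'Group {name}: {size} rows')
```

The first entry of the ranking is the group that `max(..., key=lambda x: len(x[1]))` would select. When several groups tie for the largest size, both approaches keep the one that appears first in the mapping.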