Exploring a Textual Corpus with ArchiTXT
This tutorial provides a step-by-step guide on how to use ArchiTXT to efficiently process and analyze textual corpora.
ArchiTXT allows loading a corpus as a set of syntax trees, where each tree is enriched by incorporating named entities. These enriched trees form a forest, which can then be automatically structured into a valid database instance for further analysis.
By following this tutorial, you’ll learn how to:
Load a corpus
Parse textual data with the Berkeley Neural Parser (Benepar)
Extract structured data using ArchiTXT
Downloading the MACCROBAT Corpus
The MACCROBAT corpus is a collection of 200 annotated medical documents, specifically clinical case reports, extracted from PubMed Central. The annotations focus on key medical concepts such as diseases, treatments, medications, and symptoms, making it a valuable resource for biomedical text analysis.
The MACCROBAT corpus is available for download at Figshare.
Let’s download the corpus.
import io
import urllib.request
import zipfile

# Download the Figshare archive into memory
with urllib.request.urlopen('https://figshare.com/ndownloader/articles/9764942/versions/2') as response:
    archive_file = io.BytesIO(response.read())

# Extract the inner corpus archive into the current directory
with zipfile.ZipFile(archive_file) as archive:
    archive.extract('MACCROBAT2020.zip')
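Before parsing, it can be useful to check what an archive actually contains. The sketch below is not part of ArchiTXT: `list_members` is a hypothetical helper built on the standard `zipfile` module, and the in-memory archive with its two file names is stand-in data rather than the real corpus.

```python
import io
import zipfile


def list_members(archive_file, suffix=""):
    """Return the sorted names in a ZIP archive that end with `suffix`."""
    with zipfile.ZipFile(archive_file) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(suffix))


# Build a tiny in-memory archive to demonstrate the helper (stand-in data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("doc1.txt", "A clinical case report.")
    demo.writestr("doc1.ann", "stand-in annotation content")

print(list_members(buffer, suffix=".txt"))
```

The same helper should accept a path on disk, for example `list_members('MACCROBAT2020.zip')` once the download step has run.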
Installing and Configuring NLP Models
ArchiTXT can parse sentences using either Benepar with SpaCy or a CoreNLP server. In this tutorial, we will use the SpaCy parser with the default model, but you can use any other model, such as one from SciSpaCy, a collection of models for biomedical text processing developed by AllenAI.
To download the default SpaCy model, run:
!spacy download en_core_web_sm
We also need to download the Benepar model for English:
import benepar
benepar.download('benepar_en3')
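A quick way to confirm that the installation step worked is to check whether the packages can be imported. The sketch below assumes that the SpaCy model is installed as a regular Python package (which is how `spacy download` normally installs it); `is_installed` is a hypothetical helper, and it only checks the `benepar` library itself, not the downloaded `benepar_en3` model files.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported in this environment."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the packages this tutorial relies on.
for name in ("en_core_web_sm", "benepar"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```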
Parsing the Corpus with ArchiTXT
Before processing the corpus, we need to configure the BeneparParser, specifying which SpaCy model to use for each language.
import warnings

from architxt.nlp.parser.benepar import BeneparParser

# Initialize the parser
parser = BeneparParser(
    spacy_models={
        'English': 'en_core_web_sm',
    }
)

# Suppress warnings for unsupported annotations
warnings.filterwarnings("ignore")
Named Entity Resolution (NER) standardizes named entities by linking them to entries in a knowledge base, which helps build a consistent database instance. To enable it, we need to specify which knowledge base to use. For this tutorial, we will use the UMLS (Unified Medical Language System) resolver.
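To make the idea of resolution concrete, here is a toy sketch that maps different surface forms to one identifier. It is only an illustration under simplifying assumptions: the knowledge base, the `CONCEPT:` identifiers, and the `normalize` and `resolve` helpers are all made up, and the exact-match lookup is much simpler than what a real UMLS resolver does.

```python
def normalize(mention):
    """Lowercase a mention and collapse whitespace so variants compare equal."""
    return " ".join(mention.lower().split())


# Hypothetical knowledge base: normalized alias -> made-up concept identifier.
TOY_KB = {
    "dyspnea": "CONCEPT:001",
    "shortness of breath": "CONCEPT:001",
    "diastolic dysfunction": "CONCEPT:002",
}


def resolve(mention, kb=TOY_KB):
    """Return the concept identifier for a mention, or None if it is unknown."""
    return kb.get(normalize(mention))


print(resolve("Shortness  of Breath"))  # resolves to the same concept as "dyspnea"
print(resolve("wall motion"))           # not present in the toy knowledge base
```

Once two mentions resolve to the same identifier, they can be stored as the same value in a database, which is the benefit the paragraph above describes.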
Let’s parse a sample of the corpus. To verify that everything works as expected, we will inspect the tallest enriched tree using the architxt.tree.Tree.pretty_print method.
from architxt.nlp import raw_load_corpus

forest = [
    tree
    async for tree in raw_load_corpus(
        ['MACCROBAT2020.zip'],
        ['English'],
        parser=parser,
        resolver_name='umls',
        sample=400,
    )
]

# Look at the tallest tree
max(forest, key=lambda tree: tree.height).pretty_print()
ROOT
┌───────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ UNDEF_7071c89ad515407b97d3e61457
│ 359c84
│ ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ │ UNDEF_16d67964b5e244b080e1eaada1
│ │ f0f6e6
│ │ ┌─────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ │ UNDEF_101e7d97346c47298039737db0 │
│ │ ed167a │
│ │ ┌───────────────────────────────────────────────┴────────────────────────────────┐ │
│ │ │ UNDEF_0b5b49fb057e419394df3b3065 │
│ │ │ dc0229 │
│ │ │ ┌────────────────────────────────────────────────────────┴────────────────────────────────┐ │
│ UNDEF_48fc1d29e97a4a97b04c6cb2ab │ │ UNDEF_9ae2d001195045628326288833 UNDEF_626c95c921b9438ab30f36c04c
│ e8e4b1 │ │ 768e2e f85bb1
│ ┌──────────────────────────────────────────────────────────┴────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────┐ │ │ ┌────────────────────────────────┴───────────────────────────┐ ┌────────────────────────────────┴────────────────────────────────┐
│ UNDEF_1e3acac5a24a4355b2ac9693c3 UNDEF_4908010995254db6b26e6848c4 UNDEF_c70091af84354ac091cb118d74 │ │ UNDEF_af80d13b148f4883b84382dc89 │ UNDEF_0003a612d13d45c382acc30280 │
│ 7165f2 f44d58 c3b163 │ │ 81f132 │ 75830d │
│ ┌───────────────────────┴─────────────────────────────┐ ┌────────────────────────────────┴─────────────────────────────┐ ┌────────────────────────────────┴────────────────────────────┐ │ │ ┌────────────────────────────────┴────────────────────────────────┐ │ ┌───────────────────────┴────────────────────────────────┐ │
ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::DISEASE_DISORDER ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::LAB_VALUE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::LAB_VALUE ENT::DIAGNOSTIC_PROCEDURE ENT::BIOLOGICAL_STRUCTURE
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │
radiographic%20examination normal cardiac%20dimensions normal wall%20motion mild%20severity%20of%20illnes... diastolic%20dysfunction late%20diastolic%20murmur less%20than prolonged deceleration two%20hundred%20fifty reduced early%20diastolic%20annular%2... mitral%20valve%20lateral%20an...
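The loading code in this section uses `async for` at the top level, which works in a notebook or IPython session but is a syntax error in a plain Python script. A minimal sketch of the script-friendly pattern follows. It uses only the standard `asyncio` module; `fake_load_corpus` is a hypothetical stand-in for an asynchronous loader, not the ArchiTXT function.

```python
import asyncio


async def fake_load_corpus(sentences):
    """Hypothetical stand-in for an asynchronous corpus loader."""
    for sentence in sentences:
        await asyncio.sleep(0)  # yield control, as real I/O would
        yield sentence.upper()


async def collect(async_iterable):
    """Gather every item of an asynchronous iterable into a list."""
    return [item async for item in async_iterable]


# In a script, asyncio.run drives the coroutine to completion.
forest_like = asyncio.run(collect(fake_load_corpus(["first tree", "second tree"])))
print(forest_like)
```

Wrapping the comprehension in a coroutine and handing it to `asyncio.run` is the usual way to reuse notebook-style asynchronous code outside a notebook.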
ArchiTXT can then automatically structure parsed text into a database-friendly format. Let’s start with a simple rewrite!
from copy import deepcopy
from architxt.simplification.simple_rewrite import simple_rewrite
forest_copy = deepcopy(forest)
simple_rewrite(forest_copy)
# Look at the tallest tree
max(forest_copy, key=lambda tree: tree.height).pretty_print()
ROOT
│
GROUP::1
┌──────────────────┬─────────────────────────┼───────────────────────────────┬─────────────────────────┐
ENT::COREFERENCE ENT::CLINICAL_EVENT ENT::FREQUENCY ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM
│ │ │ │ │
symptoms%20aspect rest 2%20or%203%20times%20per%20week up%20to%2030%20minutes%20at%2... dyspnea
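The leaf values in the printed trees appear to be percent-encoded (for example, `%20` in place of a space). Assuming this is standard URL encoding, the standard library can decode them for display. The leaf strings below are copied from the output above.

```python
from urllib.parse import unquote

# Leaf values as they appear in the printed tree.
leaves = ["symptoms%20aspect", "2%20or%203%20times%20per%20week", "dyspnea"]

# Decode percent-escapes back into readable text.
decoded = [unquote(leaf) for leaf in leaves]
print(decoded)
```

Labels that the printer has truncated with an ellipsis may end in an incomplete escape sequence, so it is safer to decode the values stored in the tree than the printed text.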
Now that we have a structured instance, we can extract its schema. The schema provides a formal representation of the extracted data.
from architxt.schema import Schema
schema = Schema.from_forest(forest_copy, keep_unlabelled=False)
print(schema.as_cfg())
ROOT -> GROUP::1 GROUP::10 GROUP::100 GROUP::101 GROUP::102 GROUP::103 GROUP::104 GROUP::105 GROUP::106 GROUP::107 GROUP::108 GROUP::109 GROUP::11 GROUP::110 GROUP::111 GROUP::112 GROUP::113 GROUP::114 GROUP::115 GROUP::116 GROUP::117 GROUP::118 GROUP::119 GROUP::12 GROUP::120 GROUP::121 GROUP::122 GROUP::123 GROUP::124 GROUP::125 GROUP::126 GROUP::127 GROUP::128 GROUP::129 GROUP::13 GROUP::130 GROUP::131 GROUP::132 GROUP::133 GROUP::134 GROUP::135 GROUP::136 GROUP::137 GROUP::138 GROUP::139 GROUP::14 GROUP::140 GROUP::141 GROUP::142 GROUP::143 GROUP::144 GROUP::145 GROUP::146 GROUP::147 GROUP::148 GROUP::149 GROUP::15 GROUP::150 GROUP::151 GROUP::152 GROUP::153 GROUP::154 GROUP::155 GROUP::156 GROUP::157 GROUP::158 GROUP::159 GROUP::16 GROUP::160 GROUP::161 GROUP::162 GROUP::163 GROUP::164 GROUP::165 GROUP::166 GROUP::167 GROUP::168 GROUP::169 GROUP::17 GROUP::170 GROUP::171 GROUP::172 GROUP::173 GROUP::174 GROUP::175 GROUP::176 GROUP::177 GROUP::178 GROUP::179 GROUP::18 GROUP::180 GROUP::181 GROUP::182 GROUP::183 GROUP::184 GROUP::185 GROUP::186 GROUP::187 GROUP::188 GROUP::189 GROUP::19 GROUP::190 GROUP::191 GROUP::192 GROUP::193 GROUP::194 GROUP::195 GROUP::196 GROUP::197 GROUP::198 GROUP::199 GROUP::2 GROUP::20 GROUP::200 GROUP::201 GROUP::202 GROUP::203 GROUP::204 GROUP::205 GROUP::206 GROUP::207 GROUP::208 GROUP::209 GROUP::21 GROUP::210 GROUP::211 GROUP::212 GROUP::213 GROUP::214 GROUP::215 GROUP::216 GROUP::217 GROUP::218 GROUP::219 GROUP::22 GROUP::220 GROUP::221 GROUP::222 GROUP::223 GROUP::224 GROUP::23 GROUP::24 GROUP::25 GROUP::26 GROUP::27 GROUP::28 GROUP::29 GROUP::3 GROUP::30 GROUP::31 GROUP::32 GROUP::33 GROUP::34 GROUP::35 GROUP::36 GROUP::37 GROUP::38 GROUP::39 GROUP::4 GROUP::40 GROUP::41 GROUP::42 GROUP::43 GROUP::44 GROUP::45 GROUP::46 GROUP::47 GROUP::48 GROUP::49 GROUP::5 GROUP::50 GROUP::51 GROUP::52 GROUP::53 GROUP::54 GROUP::55 GROUP::56 GROUP::57 GROUP::58 GROUP::59 GROUP::6 GROUP::60 GROUP::61 GROUP::62 GROUP::63 GROUP::64 GROUP::65 
GROUP::66 GROUP::67 GROUP::68 GROUP::69 GROUP::7 GROUP::70 GROUP::71 GROUP::72 GROUP::73 GROUP::74 GROUP::75 GROUP::76 GROUP::77 GROUP::78 GROUP::79 GROUP::8 GROUP::80 GROUP::81 GROUP::82 GROUP::83 GROUP::84 GROUP::85 GROUP::86 GROUP::87 GROUP::88 GROUP::89 GROUP::9 GROUP::90 GROUP::91 GROUP::92 GROUP::93 GROUP::94 GROUP::95 GROUP::96 GROUP::97 GROUP::98 GROUP::99;
GROUP::1 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::FREQUENCY ENT::SIGN_SYMPTOM;
GROUP::2 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::3 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::4 -> ENT::BIOLOGICAL_STRUCTURE ENT::SIGN_SYMPTOM;
GROUP::5 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::6 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::THERAPEUTIC_PROCEDURE;
GROUP::7 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::HISTORY ENT::LAB_VALUE;
GROUP::8 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::9 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE;
GROUP::10 -> ENT::BIOLOGICAL_ATTRIBUTE ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::TEXTURE;
GROUP::11 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::12 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY;
GROUP::13 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE;
GROUP::14 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::15 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::16 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE;
GROUP::17 -> ENT::BIOLOGICAL_ATTRIBUTE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION;
GROUP::18 -> ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::19 -> ENT::BIOLOGICAL_ATTRIBUTE ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION;
GROUP::20 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE;
GROUP::21 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::22 -> ENT::COREFERENCE ENT::DATE ENT::OUTCOME ENT::SIGN_SYMPTOM;
GROUP::23 -> ENT::DATE ENT::MEDICATION;
GROUP::24 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::25 -> ENT::COREFERENCE ENT::DATE ENT::DOSAGE ENT::MEDICATION;
GROUP::26 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::27 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::28 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::29 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::30 -> ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::31 -> ENT::CLINICAL_EVENT ENT::DURATION ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::32 -> ENT::AGE ENT::DOSAGE ENT::MEDICATION ENT::SEX ENT::THERAPEUTIC_PROCEDURE;
GROUP::33 -> ENT::DISEASE_DISORDER ENT::DOSAGE ENT::MEDICATION;
GROUP::34 -> ENT::COREFERENCE ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE;
GROUP::35 -> ENT::ADMINISTRATION ENT::DOSAGE ENT::MEDICATION;
GROUP::36 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::37 -> ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE;
GROUP::38 -> ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::39 -> ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::40 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::41 -> ENT::DETAILED_DESCRIPTION ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::42 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::OUTCOME;
GROUP::43 -> ENT::ACTIVITY ENT::AGE ENT::NONBIOLOGICAL_LOCATION ENT::PERSONAL_BACKGROUND ENT::SEX;
GROUP::44 -> ENT::ACTIVITY ENT::DURATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::45 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::46 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::47 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::48 -> ENT::ADMINISTRATION ENT::DOSAGE ENT::FREQUENCY ENT::MEDICATION;
GROUP::49 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::50 -> ENT::ACTIVITY ENT::DATE ENT::DETAILED_DESCRIPTION ENT::NONBIOLOGICAL_LOCATION ENT::OCCUPATION ENT::OTHER_ENTITY;
GROUP::51 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::52 -> ENT::COREFERENCE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::53 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::54 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::MEDICATION;
GROUP::55 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::56 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE;
GROUP::57 -> ENT::DATE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::58 -> ENT::CLINICAL_EVENT ENT::DATE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::59 -> ENT::DISEASE_DISORDER ENT::DOSAGE ENT::FREQUENCY ENT::MEDICATION;
GROUP::60 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::61 -> ENT::DISEASE_DISORDER;
GROUP::62 -> ENT::DATE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::63 -> ENT::ADMINISTRATION ENT::DIAGNOSTIC_PROCEDURE ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::LAB_VALUE ENT::MEDICATION;
GROUP::64 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE;
GROUP::65 -> ENT::CLINICAL_EVENT ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::66 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::67 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::68 -> ENT::AGE ENT::DATE ENT::DISEASE_DISORDER ENT::SEX;
GROUP::69 -> ENT::DIAGNOSTIC_PROCEDURE ENT::HISTORY ENT::LAB_VALUE;
GROUP::70 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::71 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE;
GROUP::72 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::73 -> ENT::ADMINISTRATION ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::74 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::75 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY;
GROUP::76 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::77 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::78 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::MEDICATION;
GROUP::79 -> ENT::ADMINISTRATION ENT::CLINICAL_EVENT ENT::DATE ENT::MEDICATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::80 -> ENT::COLOR ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::81 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER;
GROUP::82 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::83 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::84 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DURATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::85 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DISEASE_DISORDER ENT::DURATION ENT::HISTORY ENT::PERSONAL_BACKGROUND ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::86 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION;
GROUP::87 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::88 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY;
GROUP::89 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::90 -> ENT::ACTIVITY ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::TIME;
GROUP::91 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::SIGN_SYMPTOM;
GROUP::92 -> ENT::COREFERENCE ENT::THERAPEUTIC_PROCEDURE;
GROUP::93 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::MEDICATION;
GROUP::94 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::95 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE;
GROUP::96 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::97 -> ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::98 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::99 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::100 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::101 -> ENT::DATE ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::102 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE ENT::TIME;
GROUP::103 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::104 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::105 -> ENT::DISEASE_DISORDER ENT::SEVERITY ENT::THERAPEUTIC_PROCEDURE;
GROUP::106 -> ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::OUTCOME ENT::SIGN_SYMPTOM;
GROUP::107 -> ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::108 -> ENT::DIAGNOSTIC_PROCEDURE;
GROUP::109 -> ENT::CLINICAL_EVENT ENT::DATE ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::110 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::111 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY;
GROUP::112 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::113 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::114 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::115 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::116 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER;
GROUP::117 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::THERAPEUTIC_PROCEDURE;
GROUP::118 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::119 -> ENT::ADMINISTRATION ENT::COREFERENCE ENT::DOSAGE ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::120 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::121 -> ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::122 -> ENT::COREFERENCE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::LAB_VALUE;
GROUP::123 -> ENT::BIOLOGICAL_STRUCTURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::124 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::HISTORY ENT::LAB_VALUE;
GROUP::125 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::126 -> ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::127 -> ENT::CLINICAL_EVENT ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::128 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::129 -> ENT::ADMINISTRATION ENT::COREFERENCE ENT::DATE ENT::DOSAGE ENT::FREQUENCY ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::130 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION;
GROUP::131 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::132 -> ENT::ADMINISTRATION ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::MEDICATION;
GROUP::133 -> ENT::COREFERENCE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::134 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::135 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::136 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DATE ENT::FAMILY_HISTORY ENT::SEX ENT::THERAPEUTIC_PROCEDURE;
GROUP::137 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::138 -> ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DURATION;
GROUP::139 -> ENT::CLINICAL_EVENT ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::140 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::SEVERITY;
GROUP::141 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::142 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::143 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::144 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DISTANCE;
GROUP::145 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::146 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::147 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::148 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::149 -> ENT::BIOLOGICAL_STRUCTURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::150 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::QUANTITATIVE_CONCEPT;
GROUP::151 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SEX ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::152 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::HISTORY ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::153 -> ENT::COREFERENCE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::154 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::155 -> ENT::DIAGNOSTIC_PROCEDURE ENT::THERAPEUTIC_PROCEDURE;
GROUP::156 -> ENT::THERAPEUTIC_PROCEDURE;
GROUP::157 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY;
GROUP::158 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::SIGN_SYMPTOM;
GROUP::159 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::160 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::161 -> ENT::SIGN_SYMPTOM;
GROUP::162 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::LAB_VALUE ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::163 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE;
GROUP::164 -> ENT::COREFERENCE ENT::SIGN_SYMPTOM;
GROUP::165 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE ENT::TIME;
GROUP::166 -> ENT::DETAILED_DESCRIPTION ENT::HISTORY ENT::SIGN_SYMPTOM;
GROUP::167 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::168 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::169 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::170 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX;
GROUP::171 -> ENT::BIOLOGICAL_STRUCTURE ENT::COLOR ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::172 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER;
GROUP::173 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DOSAGE ENT::MEDICATION;
GROUP::174 -> ENT::CLINICAL_EVENT ENT::DATE ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::175 -> ENT::DOSAGE ENT::FREQUENCY ENT::MEDICATION;
GROUP::176 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::177 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::178 -> ENT::CLINICAL_EVENT ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::179 -> ENT::AGE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::180 -> ENT::HISTORY ENT::PERSONAL_BACKGROUND ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::181 -> ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::TEXTURE;
GROUP::182 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::HISTORY ENT::MEDICATION;
GROUP::183 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY;
GROUP::184 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY;
GROUP::185 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::186 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::THERAPEUTIC_PROCEDURE;
GROUP::187 -> ENT::AGE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::MEDICATION ENT::SEX;
GROUP::188 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::189 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE;
GROUP::190 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DATE ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::TIME;
GROUP::191 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::192 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::193 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::194 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::195 -> ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::MEDICATION ENT::OTHER_EVENT;
GROUP::196 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::197 -> ENT::CLINICAL_EVENT ENT::DATE ENT::FAMILY_HISTORY ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::SUBJECT;
GROUP::198 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::199 -> ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::LAB_VALUE ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::200 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::201 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DOSAGE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::202 -> ENT::DATE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::203 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::204 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::205 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE;
GROUP::206 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::207 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::208 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::209 -> ENT::BIOLOGICAL_STRUCTURE ENT::THERAPEUTIC_PROCEDURE ENT::VOLUME;
GROUP::210 -> ENT::DURATION ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::211 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::212 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::213 -> ENT::ADMINISTRATION ENT::DOSAGE ENT::DURATION ENT::MEDICATION;
GROUP::214 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::215 -> ENT::CLINICAL_EVENT ENT::DATE ENT::MEDICATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::216 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::217 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION ENT::SUBJECT;
GROUP::218 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::219 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION;
GROUP::220 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::MEDICATION ENT::SEX ENT::THERAPEUTIC_PROCEDURE;
GROUP::221 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::222 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::223 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::224 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
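With more than two hundred productions, the grammar printed above is hard to scan by eye. The following is a minimal sketch, assuming the text returned by schema.as_cfg() keeps the `LHS -> SYMBOL SYMBOL ...;` layout shown in the output, that tallies how many entities each group holds and how often each entity recurs. The helper name summarize_cfg and the short sample string are illustrative only and are not part of ArchiTXT.

```python
from collections import Counter


def summarize_cfg(cfg_text):
    """Return (entities per group, entity frequency) for a CFG dump.

    Hypothetical helper: it only relies on the 'LHS -> RHS;' text layout.
    """
    group_sizes = {}
    entity_counts = Counter()
    for line in cfg_text.splitlines():
        line = line.strip().rstrip(';')
        # Split on ' -> ' with spaces so that the '<->' inside relation
        # names is never mistaken for the production arrow.
        if ' -> ' not in line:
            continue
        lhs, rhs = line.split(' -> ', 1)
        if not lhs.startswith('GROUP::'):
            continue  # only group productions list entities directly
        entities = [sym for sym in rhs.split() if sym.startswith('ENT::')]
        group_sizes[lhs] = len(entities)
        entity_counts.update(entities)
    return group_sizes, entity_counts


# Three productions copied from the output above, used as sample input.
sample = """
GROUP::116 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER;
GROUP::126 -> ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::137 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
"""
group_sizes, entity_counts = summarize_cfg(sample)
print(group_sizes)
print(entity_counts.most_common(1))
```

In the notebook, the same function could be applied to the full string, for example summarize_cfg(schema.as_cfg()), to see which entities dominate the schema.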
We’ve successfully built a basic database schema from our corpus, but there’s significant potential for improvement. Let’s explore how we can enhance it using the ArchiTXT simplification algorithm!
First, let’s visualize the distribution of equivalence classes within the forest.
from architxt.similarity import equiv_cluster
clusters = equiv_cluster(forest, tau=0.8)
import plotly.express as px
fig = px.bar(sum(tree.height == 2 for tree in klass) for klass in clusters.values())
fig.update_layout(xaxis_title='Equivalent Class', yaxis_title='Count', showlegend=False)
fig.show()
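To build intuition for the tau threshold, here is a toy sketch that is not ArchiTXT's implementation. It treats each group as a set of entity labels, measures similarity with the Jaccard index, and greedily places a group in the first cluster whose representative is at least tau similar. Both the similarity measure and the greedy strategy are assumptions chosen for illustration, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Share of labels common to both sets (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_clusters(groups, tau):
    """Put each group in the first cluster whose representative is >= tau similar."""
    clusters = []  # each cluster is a list of names; its first member is the representative
    for name, labels in groups.items():
        for cluster in clusters:
            if jaccard(groups[cluster[0]], labels) >= tau:
                cluster.append(name)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append([name])
    return clusters


# Label sets copied from three groups of the schema printed earlier.
groups = {
    'GROUP::126': frozenset({'LAB_VALUE', 'SIGN_SYMPTOM'}),
    'GROUP::133': frozenset({'COREFERENCE', 'LAB_VALUE', 'SIGN_SYMPTOM'}),
    'GROUP::175': frozenset({'DOSAGE', 'FREQUENCY', 'MEDICATION'}),
}
strict = greedy_clusters(groups, tau=0.8)
loose = greedy_clusters(groups, tau=0.6)
print(strict)
print(loose)
```

In this toy model, GROUP::126 and GROUP::133 share two of three labels (similarity of about 0.67), so they stay apart at tau=0.8 and merge at tau=0.6. A lower threshold therefore yields fewer, broader classes.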
It’s now time to use ArchiTXT to automatically structure the data.
from architxt.simplification.tree_rewriting import rewrite
rewrite(forest, epoch=10, min_support=5, tau=0.8)
# Look at the highest tree
max(forest, key=lambda tree: tree.height).pretty_print()
ROOT
[Tree diagram, flattened: internal nodes listed row by row, leaves listed left to right]
Internal nodes, top to bottom:
UNDEF_7071c89ad515407b97d3e61457359c84
UNDEF_16d67964b5e244b080e1eaada1f0f6e6
UNDEF_101e7d97346c47298039737db0ed167a
UNDEF_0b5b49fb057e419394df3b3065dc0229
UNDEF_48fc1d29e97a4a97b04c6cb2abe8e4b1, UNDEF_9ae2d001195045628326288833768e2e, UNDEF_626c95c921b9438ab30f36c04cf85bb1
GROUP::0_0_0_0_0_0_0_0_0 (four occurrences), UNDEF_c70091af84354ac091cb118d74c3b163
Leaves, left to right (entity: value):
ENT::DIAGNOSTIC_PROCEDURE: radiographic%20examination
ENT::LAB_VALUE: normal
ENT::DIAGNOSTIC_PROCEDURE: cardiac%20dimensions
ENT::LAB_VALUE: normal
ENT::DIAGNOSTIC_PROCEDURE: wall%20motion
ENT::SEVERITY: mild%20severity%20of%20illnes...
ENT::DISEASE_DISORDER: diastolic%20dysfunction
ENT::DIAGNOSTIC_PROCEDURE: late%20diastolic%20murmur
ENT::LAB_VALUE: less%20than
ENT::LAB_VALUE: prolonged
ENT::DIAGNOSTIC_PROCEDURE: deceleration
ENT::LAB_VALUE: two%20hundred%20fifty
ENT::LAB_VALUE: reduced
ENT::DIAGNOSTIC_PROCEDURE: early%20diastolic%20annular%2...
ENT::BIOLOGICAL_STRUCTURE: mitral%20valve%20lateral%20an...
We now have a more granular structure. Let’s take a closer look at the schema.
schema = Schema.from_forest(forest, keep_unlabelled=False)
print(schema.as_cfg())
ROOT -> COLL::0_0_0_0_0_0_0_0_0 GROUP::0_0_0_0_0_0_0_0_0 GROUP::2_2_2_2_2_2_2_2_2 REL::0_0_0_0_0_0_0_0_0<->2_2_2_2_2_2_2_2_2;
COLL::0_0_0_0_0_0_0_0_0 -> GROUP::0_0_0_0_0_0_0_0_0;
REL::0_0_0_0_0_0_0_0_0<->2_2_2_2_2_2_2_2_2 -> GROUP::0_0_0_0_0_0_0_0_0 GROUP::2_2_2_2_2_2_2_2_2;
GROUP::0_0_0_0_0_0_0_0_0 -> ENT::ACTIVITY ENT::ADMINISTRATION ENT::AREA ENT::BIOLOGICAL_ATTRIBUTE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DISTANCE ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::HISTORY ENT::LAB_VALUE ENT::MEDICATION ENT::NONBIOLOGICAL_LOCATION ENT::OUTCOME ENT::SEVERITY ENT::SHAPE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE ENT::TIME;
GROUP::2_2_2_2_2_2_2_2_2 -> ENT::AGE ENT::SEX;
The schema is now much smaller, and the groups are more meaningful.
However, not all extracted trees provide valuable insights, so we could filter the structured instance to keep only the valid trees using schema.extract_valid_trees(new_forest).
Let’s explore the different semantic groups.
Groups represent common patterns across the corpus.
all_datasets = schema.extract_datasets(forest)
group, dataset = max(all_datasets.items(), key=lambda x: len(x[1]))
print(f'Group: {group}')
dataset
Group: 0_0_0_0_0_0_0_0_0