Exploring a Textual Corpus with ArchiTXT
This tutorial provides a step-by-step guide on how to use ArchiTXT to efficiently process and analyze textual corpora.
ArchiTXT allows loading a corpus as a set of syntax trees, where each tree is enriched by incorporating named entities. These enriched trees form a forest, which can then be automatically structured into a valid database instance for further analysis.
By following this tutorial, you’ll learn how to:
Load a corpus
Parse textual data with Berkeley Neural Parser (Benepar)
Extract structured data using ArchiTXT
Downloading the MACCROBAT Corpus
The MACCROBAT corpus is a collection of 200 annotated medical documents, specifically clinical case reports, extracted from PubMed Central. The annotations focus on key medical concepts such as diseases, treatments, medications, and symptoms, making it a valuable resource for biomedical text analysis.
The MACCROBAT corpus is available for download at Figshare.
Let’s download the corpus.
import io
import urllib.request
import zipfile

# Download the Figshare archive into memory
with urllib.request.urlopen('https://figshare.com/ndownloader/articles/9764942/versions/2') as response:
    archive_file = io.BytesIO(response.read())

# Extract the corpus archive into the current directory
with zipfile.ZipFile(archive_file) as archive:
    archive.extract('MACCROBAT2020.zip')
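Before parsing, it can help to confirm what the download contains. The sketch below counts archive members by file extension using only the standard library. The helper name count_extensions is hypothetical, and the demo archive with one .txt and one .ann member is made up for illustration; the actual layout of MACCROBAT2020.zip is not described above.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(archive_source) -> Counter:
    """Count archive members by file extension (hypothetical helper)."""
    with zipfile.ZipFile(archive_source) as archive:
        return Counter(
            PurePosixPath(info.filename).suffix or '(none)'
            for info in archive.infolist()
            if not info.is_dir()
        )


# Self-contained demo on an in-memory archive with placeholder content
demo = io.BytesIO()
with zipfile.ZipFile(demo, 'w') as archive:
    archive.writestr('doc1.txt', 'placeholder text')
    archive.writestr('doc1.ann', 'placeholder annotation')
demo.seek(0)

print(count_extensions(demo))
```

For the tutorial's data, the same function could be called with the path 'MACCROBAT2020.zip' once the extraction step has run.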
Installing and Configuring NLP Models
ArchiTXT can parse sentences using either Benepar with SpaCy or a CoreNLP server. In this tutorial, we will use the SpaCy parser with the default model, but you can use any compatible model, such as one from SciSpaCy, a collection of models for biomedical text processing developed by AllenAI.
To download the default SpaCy model, run:
!spacy download en_core_web_sm
We also need to download the Benepar model for English:
import benepar
benepar.download('benepar_en3')
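As a quick sanity check after installation, the sketch below reports whether the relevant Python packages can be found in the current environment. It relies only on importlib from the standard library; the helper name is_installed is hypothetical. It covers packages only (SpaCy models are installed as packages), so it does not verify that the benepar_en3 model data was downloaded.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located in this environment."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the packages used in this tutorial
for name in ('spacy', 'benepar', 'en_core_web_sm'):
    status = 'found' if is_installed(name) else 'missing'
    print(f'{name}: {status}')
```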
Parsing the Corpus with ArchiTXT
Before processing the corpus, we need to configure the BeneparParser, specifying which SpaCy model to use for each language.
import warnings

from architxt.nlp.parser.benepar import BeneparParser

# Initialize the parser
parser = BeneparParser(
    spacy_models={
        'English': 'en_core_web_sm',
    }
)

# Suppress warnings for unsupported annotations
warnings.filterwarnings("ignore")
Named entity resolution standardizes named entities against a knowledge base, which helps build a consistent database instance. To enable it, we need to specify which knowledge base to use. For this tutorial, we will use the UMLS (Unified Medical Language System) resolver.
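To make the idea of resolution concrete, the sketch below maps several surface forms to one canonical name using a toy in-memory dictionary. It is neither the UMLS resolver nor ArchiTXT's API: the TOY_KB entries and the resolve function are hypothetical and only illustrate why resolved entities are easier to store consistently.

```python
# Toy knowledge base: normalized surface form -> canonical name (illustrative entries only)
TOY_KB = {
    'kidney, right': 'right kidney',
    'rt kidney': 'right kidney',
    'vur': 'vesico-ureteral reflux',
}


def resolve(mention: str, kb: dict[str, str] = TOY_KB) -> str:
    """Return the canonical name for a mention, or the normalized mention if unknown."""
    key = ' '.join(mention.lower().split())  # lowercase and collapse whitespace
    return kb.get(key, key)


mentions = ['Rt  Kidney', 'kidney, right', 'VUR', 'hydronephrosis']
# Four mentions collapse to three distinct values after resolution
print(sorted({resolve(m) for m in mentions}))
```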
Let’s parse a sample of the corpus. To verify that everything works as expected, we will inspect the enriched tree with the greatest height using the architxt.tree.Tree.pretty_print method.
from architxt.nlp import raw_load_corpus

forest = [
    tree
    async for tree in raw_load_corpus(
        ['MACCROBAT2020.zip'],
        ['English'],
        parser=parser,
        resolver_name='umls',
        sample=400,
    )
]

# Look at the highest tree
max(forest, key=lambda tree: tree.height).pretty_print()
[Output: tree rooted at ROOT. The internal nodes are labelled UNDEF_<id> (for example UNDEF_beb2e008c9d3453ea8ecd7ae09fdbe13), and the 32 entity leaves, from left to right, are:]
ENT::AGE: 1%20year%20old
ENT::SEX: woman
ENT::DIAGNOSTIC_PROCEDURE: vacterl%20association
ENT::DETAILED_DESCRIPTION: absent
ENT::BIOLOGICAL_STRUCTURE: cervical%20atlas
ENT::DETAILED_DESCRIPTION: supernumerary
ENT::BIOLOGICAL_STRUCTURE: bone%20structure%20of%20lumba...
ENT::DETAILED_DESCRIPTION: hypoplastic
ENT::BIOLOGICAL_STRUCTURE: abies%20religiosa
ENT::BIOLOGICAL_STRUCTURE: bone%20structure%20of%20coccyx
ENT::DETAILED_DESCRIPTION: fatty%20acids
ENT::BIOLOGICAL_STRUCTURE: end-stage
ENT::DETAILED_DESCRIPTION: tethered
ENT::BIOLOGICAL_STRUCTURE: spinal%20cord
ENT::LAB_VALUE: three
ENT::DETAILED_DESCRIPTION: fused
ENT::BIOLOGICAL_STRUCTURE: bone%20structure%20of%20rib
ENT::DISEASE_DISORDER: anorectal%20malformations
ENT::BIOLOGICAL_STRUCTURE: cloaca%20chamber
ENT::DETAILED_DESCRIPTION: common%20%28qualifier%20value%29
ENT::BIOLOGICAL_STRUCTURE: urogenital%20sinus
ENT::DETAILED_DESCRIPTION: duplex
ENT::BIOLOGICAL_STRUCTURE: vagina
ENT::DETAILED_DESCRIPTION: midline%20septum
ENT::DETAILED_DESCRIPTION: type%20c
ENT::DISEASE_DISORDER: pathologic%20fistula
ENT::BIOLOGICAL_STRUCTURE: right%20kidney
ENT::DISEASE_DISORDER: agenesis
ENT::SEVERITY: moderate%20%28severity%20modi...
ENT::DETAILED_DESCRIPTION: left
ENT::DISEASE_DISORDER: hydronephrosis
ENT::DISEASE_DISORDER: vesico-ureteral%20reflux
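The selection step above picks the tree with the greatest height. The sketch below shows the same idea on plain nested tuples, so it runs without ArchiTXT. The (label, children) representation, the height function, and the example trees are assumptions made for illustration, and the height convention used here (a leaf string has height 0) may differ from the one ArchiTXT's Tree uses.

```python
# A node is (label, [children]); a leaf value is a plain string.
def height(node) -> int:
    """Leaves have height 0; an inner node is one level above its tallest child."""
    if isinstance(node, str):
        return 0
    _label, children = node
    return 1 + max((height(child) for child in children), default=0)


# Two toy trees: the second one nests its entities under an extra UNDEF node
toy_forest = [
    ('ROOT', [('ENT::AGE', ['1 year old'])]),
    ('ROOT', [('UNDEF', [('ENT::SEX', ['woman']), ('ENT::DISEASE_DISORDER', ['hydronephrosis'])])]),
]

tallest = max(toy_forest, key=height)
print(height(tallest))  # 3
```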
ArchiTXT can then automatically structure parsed text into a database-friendly format. Let’s start with a simple rewrite!
from copy import deepcopy
from architxt.simplification.simple_rewrite import simple_rewrite
forest_copy = deepcopy(forest)
simple_rewrite(forest_copy)
# Look at the highest tree
max(forest_copy, key=lambda tree: tree.height).pretty_print()
[Output: tree rooted at ROOT with a single child, GROUP::1, whose six entity leaves, from left to right, are:]
ENT::DIAGNOSTIC_PROCEDURE: structure%20of%20pupil%20of%2...
ENT::DISTANCE: 4%20mm
ENT::LAB_VALUE: reactive%20to%20light
ENT::DETAILED_DESCRIPTION: relative%20%28related%20perso...
ENT::BIOLOGICAL_STRUCTURE: afferent%20pupillary%20defect
ENT::SIGN_SYMPTOM: defect
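The leaf values in the printed trees contain escapes such as %20 and %28. Assuming these are standard URL percent-escapes, they can be decoded for display with urllib.parse.unquote from the standard library, as sketched below. Values that end in "..." were shortened in the printed output, so decoding cannot recover their full text.

```python
from urllib.parse import unquote

# Values copied from the printed trees above
encoded_values = [
    '4%20mm',
    'reactive%20to%20light',
    'common%20%28qualifier%20value%29',
]

decoded_values = [unquote(value) for value in encoded_values]
for value in decoded_values:
    print(value)
```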
Now that we have a structured instance, we can extract its schema. The schema provides a formal representation of the extracted data.
from architxt.schema import Schema
schema = Schema.from_forest(forest_copy, keep_unlabelled=False)
print(schema.as_cfg())
ROOT -> GROUP::1 GROUP::10 GROUP::100 GROUP::101 GROUP::102 GROUP::103 GROUP::104 GROUP::105 GROUP::106 GROUP::107 GROUP::108 GROUP::109 GROUP::11 GROUP::110 GROUP::111 GROUP::112 GROUP::113 GROUP::114 GROUP::115 GROUP::116 GROUP::117 GROUP::118 GROUP::119 GROUP::12 GROUP::120 GROUP::121 GROUP::122 GROUP::123 GROUP::124 GROUP::125 GROUP::126 GROUP::127 GROUP::128 GROUP::129 GROUP::13 GROUP::130 GROUP::131 GROUP::132 GROUP::133 GROUP::134 GROUP::135 GROUP::136 GROUP::137 GROUP::138 GROUP::139 GROUP::14 GROUP::140 GROUP::141 GROUP::142 GROUP::143 GROUP::144 GROUP::145 GROUP::146 GROUP::147 GROUP::148 GROUP::149 GROUP::15 GROUP::150 GROUP::151 GROUP::152 GROUP::153 GROUP::154 GROUP::155 GROUP::156 GROUP::157 GROUP::158 GROUP::159 GROUP::16 GROUP::160 GROUP::161 GROUP::162 GROUP::163 GROUP::164 GROUP::165 GROUP::166 GROUP::167 GROUP::168 GROUP::169 GROUP::17 GROUP::170 GROUP::171 GROUP::172 GROUP::173 GROUP::174 GROUP::175 GROUP::176 GROUP::177 GROUP::178 GROUP::179 GROUP::18 GROUP::180 GROUP::181 GROUP::182 GROUP::183 GROUP::184 GROUP::185 GROUP::186 GROUP::187 GROUP::188 GROUP::189 GROUP::19 GROUP::190 GROUP::191 GROUP::192 GROUP::193 GROUP::194 GROUP::195 GROUP::196 GROUP::197 GROUP::198 GROUP::199 GROUP::2 GROUP::20 GROUP::200 GROUP::201 GROUP::202 GROUP::203 GROUP::204 GROUP::205 GROUP::206 GROUP::207 GROUP::208 GROUP::209 GROUP::21 GROUP::210 GROUP::211 GROUP::212 GROUP::213 GROUP::214 GROUP::215 GROUP::216 GROUP::217 GROUP::218 GROUP::22 GROUP::23 GROUP::24 GROUP::25 GROUP::26 GROUP::27 GROUP::28 GROUP::29 GROUP::3 GROUP::30 GROUP::31 GROUP::32 GROUP::33 GROUP::34 GROUP::35 GROUP::36 GROUP::37 GROUP::38 GROUP::39 GROUP::4 GROUP::40 GROUP::41 GROUP::42 GROUP::43 GROUP::44 GROUP::45 GROUP::46 GROUP::47 GROUP::48 GROUP::49 GROUP::5 GROUP::50 GROUP::51 GROUP::52 GROUP::53 GROUP::54 GROUP::55 GROUP::56 GROUP::57 GROUP::58 GROUP::59 GROUP::6 GROUP::60 GROUP::61 GROUP::62 GROUP::63 GROUP::64 GROUP::65 GROUP::66 GROUP::67 GROUP::68 GROUP::69 GROUP::7 GROUP::70 GROUP::71 
GROUP::72 GROUP::73 GROUP::74 GROUP::75 GROUP::76 GROUP::77 GROUP::78 GROUP::79 GROUP::8 GROUP::80 GROUP::81 GROUP::82 GROUP::83 GROUP::84 GROUP::85 GROUP::86 GROUP::87 GROUP::88 GROUP::89 GROUP::9 GROUP::90 GROUP::91 GROUP::92 GROUP::93 GROUP::94 GROUP::95 GROUP::96 GROUP::97 GROUP::98 GROUP::99;
GROUP::1 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::2 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::3 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE;
GROUP::4 -> ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::5 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::6 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::OTHER_ENTITY ENT::QUALITATIVE_CONCEPT ENT::THERAPEUTIC_PROCEDURE;
GROUP::7 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::8 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::9 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::10 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX;
GROUP::11 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::12 -> ENT::BIOLOGICAL_STRUCTURE ENT::THERAPEUTIC_PROCEDURE;
GROUP::13 -> ENT::ADMINISTRATION ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::FREQUENCY ENT::MEDICATION ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::14 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::15 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::16 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::17 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::18 -> ENT::COREFERENCE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::19 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::20 -> ENT::ACTIVITY ENT::MEDICATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::21 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE;
GROUP::22 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX;
GROUP::23 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::24 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::25 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY;
GROUP::26 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::27 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::28 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::QUANTITATIVE_CONCEPT ENT::SIGN_SYMPTOM;
GROUP::29 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::30 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::31 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::32 -> ENT::BIOLOGICAL_STRUCTURE ENT::SIGN_SYMPTOM;
GROUP::33 -> ENT::ADMINISTRATION ENT::DIAGNOSTIC_PROCEDURE ENT::DOSAGE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::34 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::35 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX;
GROUP::36 -> ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::37 -> ENT::AGE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::38 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::39 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::40 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::41 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE;
GROUP::42 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::HISTORY ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::43 -> ENT::DURATION ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::44 -> ENT::DETAILED_DESCRIPTION ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::45 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE;
GROUP::46 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::47 -> ENT::CLINICAL_EVENT ENT::MEDICATION ENT::OTHER_EVENT;
GROUP::48 -> ENT::CLINICAL_EVENT ENT::DATE ENT::LAB_VALUE ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::49 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::50 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DURATION ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::51 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::52 -> ENT::ACTIVITY ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::HISTORY;
GROUP::53 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE;
GROUP::54 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE;
GROUP::55 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::FREQUENCY;
GROUP::56 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::57 -> ENT::ACTIVITY ENT::AGE ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::58 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::NONBIOLOGICAL_LOCATION ENT::PERSONAL_BACKGROUND ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::59 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::60 -> ENT::BIOLOGICAL_ATTRIBUTE ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::61 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::LAB_VALUE ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::62 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::63 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY;
GROUP::64 -> ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE;
GROUP::65 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SHAPE ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::66 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY ENT::SEX;
GROUP::67 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::68 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::HISTORY ENT::SIGN_SYMPTOM;
GROUP::69 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::70 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE ENT::TIME;
GROUP::71 -> ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::HISTORY ENT::SIGN_SYMPTOM;
GROUP::72 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::COLOR ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::73 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::74 -> ENT::BIOLOGICAL_ATTRIBUTE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION;
GROUP::75 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::76 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DURATION ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::77 -> ENT::DISEASE_DISORDER ENT::HISTORY ENT::SIGN_SYMPTOM;
GROUP::78 -> ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::MEDICATION;
GROUP::79 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VOLUME;
GROUP::80 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::81 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::82 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::83 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::NONBIOLOGICAL_LOCATION ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::84 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SUBJECT;
GROUP::85 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::86 -> ENT::ADMINISTRATION ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DOSAGE ENT::MEDICATION;
GROUP::87 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::OTHER_EVENT ENT::SEVERITY;
GROUP::88 -> ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::89 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION;
GROUP::90 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE;
GROUP::91 -> ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::TEXTURE;
GROUP::92 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::THERAPEUTIC_PROCEDURE;
GROUP::93 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION;
GROUP::94 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::95 -> ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::96 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::97 -> ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::TIME;
GROUP::98 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DISEASE_DISORDER ENT::THERAPEUTIC_PROCEDURE;
GROUP::99 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::100 -> ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::101 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::102 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::MEDICATION ENT::SEVERITY;
GROUP::103 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION;
GROUP::104 -> ENT::MEDICATION;
GROUP::105 -> ENT::DATE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY;
GROUP::106 -> ENT::COLOR ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::107 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::NONBIOLOGICAL_LOCATION ENT::PERSONAL_BACKGROUND ENT::QUALITATIVE_CONCEPT ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::108 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::TEXTURE;
GROUP::109 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::110 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::SIGN_SYMPTOM;
GROUP::111 -> ENT::CLINICAL_EVENT ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::112 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::113 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::114 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::115 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::116 -> ENT::CLINICAL_EVENT ENT::DOSAGE ENT::MEDICATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::117 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::118 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::119 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::120 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::121 -> ENT::DATE ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::122 -> ENT::DETAILED_DESCRIPTION ENT::FAMILY_HISTORY ENT::SUBJECT;
GROUP::123 -> ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::THERAPEUTIC_PROCEDURE;
GROUP::124 -> ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::125 -> ENT::BIOLOGICAL_STRUCTURE ENT::COLOR ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::126 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DURATION ENT::MEDICATION;
GROUP::127 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::PERSONAL_BACKGROUND ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::128 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION;
GROUP::129 -> ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::MEDICATION ENT::OTHER_EVENT;
GROUP::130 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::131 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SHAPE;
GROUP::132 -> ENT::HISTORY ENT::SIGN_SYMPTOM;
GROUP::133 -> ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::MEDICATION;
GROUP::134 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::135 -> ENT::DATE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::136 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DURATION ENT::HISTORY ENT::NONBIOLOGICAL_LOCATION ENT::OCCUPATION ENT::PERSONAL_BACKGROUND ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::137 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::138 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION ENT::SUBJECT;
GROUP::139 -> ENT::DATE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::140 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::THERAPEUTIC_PROCEDURE;
GROUP::141 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::142 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::143 -> ENT::ACTIVITY ENT::AGE ENT::NONBIOLOGICAL_LOCATION ENT::PERSONAL_BACKGROUND ENT::SEX;
GROUP::144 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::145 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::146 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DOSAGE ENT::DURATION ENT::LAB_VALUE ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::147 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::148 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::149 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::150 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DISEASE_DISORDER ENT::HISTORY ENT::NONBIOLOGICAL_LOCATION ENT::PERSONAL_BACKGROUND ENT::SEX;
GROUP::151 -> ENT::DATE ENT::DISEASE_DISORDER;
GROUP::152 -> ENT::DETAILED_DESCRIPTION ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::153 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::154 -> ENT::FAMILY_HISTORY ENT::SUBJECT;
GROUP::155 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::QUANTITATIVE_CONCEPT ENT::SIGN_SYMPTOM;
GROUP::156 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::157 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::158 -> ENT::CLINICAL_EVENT ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::159 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::160 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE;
GROUP::161 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::THERAPEUTIC_PROCEDURE;
GROUP::162 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::163 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::164 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TEXTURE ENT::THERAPEUTIC_PROCEDURE;
GROUP::165 -> ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::166 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::167 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::168 -> ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TIME;
GROUP::169 -> ENT::BIOLOGICAL_STRUCTURE ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::170 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION;
GROUP::171 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::172 -> ENT::DURATION ENT::HISTORY ENT::SIGN_SYMPTOM;
GROUP::173 -> ENT::DATE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::THERAPEUTIC_PROCEDURE;
GROUP::174 -> ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::175 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DISEASE_DISORDER ENT::DURATION ENT::FREQUENCY ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::176 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::177 -> ENT::ADMINISTRATION ENT::DOSAGE ENT::MEDICATION;
GROUP::178 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY;
GROUP::179 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::180 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::OUTCOME ENT::SIGN_SYMPTOM;
GROUP::181 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::MEDICATION;
GROUP::182 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::183 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
GROUP::184 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::LAB_VALUE;
GROUP::185 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::186 -> ENT::BIOLOGICAL_STRUCTURE ENT::DURATION ENT::QUANTITATIVE_CONCEPT ENT::SIGN_SYMPTOM;
GROUP::187 -> ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::LAB_VALUE ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::188 -> ENT::COREFERENCE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::189 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::HISTORY ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::190 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::THERAPEUTIC_PROCEDURE;
GROUP::191 -> ENT::BIOLOGICAL_STRUCTURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::192 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::THERAPEUTIC_PROCEDURE;
GROUP::193 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::194 -> ENT::ADMINISTRATION ENT::DATE ENT::DOSAGE ENT::MEDICATION;
GROUP::195 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::196 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE;
GROUP::197 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::198 -> ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::THERAPEUTIC_PROCEDURE;
GROUP::199 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::200 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::201 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::202 -> ENT::DOSAGE ENT::MEDICATION ENT::THERAPEUTIC_PROCEDURE;
GROUP::203 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::FREQUENCY ENT::MEDICATION ENT::SIGN_SYMPTOM;
GROUP::204 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::SEVERITY ENT::THERAPEUTIC_PROCEDURE;
GROUP::205 -> ENT::DOSAGE ENT::DURATION ENT::MEDICATION;
GROUP::206 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::LAB_VALUE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SUBJECT;
GROUP::207 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DOSAGE ENT::DURATION ENT::MEDICATION ENT::PERSONAL_BACKGROUND ENT::SEX;
GROUP::208 -> ENT::DISEASE_DISORDER ENT::DURATION ENT::NONBIOLOGICAL_LOCATION ENT::OUTCOME ENT::SEVERITY ENT::THERAPEUTIC_PROCEDURE;
GROUP::209 -> ENT::ACTIVITY ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE;
GROUP::210 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::211 -> ENT::ADMINISTRATION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::LAB_VALUE ENT::MEDICATION;
GROUP::212 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::213 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION;
GROUP::214 -> ENT::BIOLOGICAL_STRUCTURE ENT::COREFERENCE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE;
GROUP::215 -> ENT::BIOLOGICAL_STRUCTURE ENT::COLOR ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::216 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::QUANTITATIVE_CONCEPT;
GROUP::217 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::QUANTITATIVE_CONCEPT ENT::THERAPEUTIC_PROCEDURE;
GROUP::218 -> ENT::DIAGNOSTIC_PROCEDURE ENT::LAB_VALUE ENT::MEDICATION;
We’ve successfully built a basic database schema from our corpus, but there’s significant potential for improvement. Let’s explore how we can enhance it using the ArchiTXT simplification algorithm!
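To get a feel for why this first schema leaves room for improvement, it can help to measure how much the generated groups overlap. The sketch below copies four productions from the output above and compares them with a plain Jaccard index. This is only an illustration: the Jaccard index is used here as a simple stand-in and is not claimed to be the similarity measure ArchiTXT uses internally.

```python
from itertools import combinations

# Four group productions copied from the schema printed above.
groups = {
    "GROUP::177": {"ADMINISTRATION", "DOSAGE", "MEDICATION"},
    "GROUP::194": {"ADMINISTRATION", "DATE", "DOSAGE", "MEDICATION"},
    "GROUP::202": {"DOSAGE", "MEDICATION", "THERAPEUTIC_PROCEDURE"},
    "GROUP::205": {"DOSAGE", "DURATION", "MEDICATION"},
}


def jaccard(a, b):
    """Share of attributes two groups have in common (illustrative measure only)."""
    return len(a & b) / len(a | b)


# Pairwise overlap between every pair of groups.
overlaps = {
    (g1, g2): jaccard(groups[g1], groups[g2])
    for g1, g2 in combinations(groups, 2)
}

# Attributes present in all four groups.
shared = set.intersection(*groups.values())

for (g1, g2), score in sorted(overlaps.items(), key=lambda item: -item[1]):
    print(f"{g1} / {g2}: {score:.2f}")
print("Shared by all:", sorted(shared))
```

All four groups contain both a dosage and a medication, which suggests they describe variations of the same underlying concept rather than four distinct ones.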
First, let’s visualize the distribution of equivalence classes in the forest.
import plotly.express as px
from architxt.similarity import equiv_cluster
# Group the forest into equivalence classes using the threshold tau
clusters = equiv_cluster(forest, tau=0.95)
# Plot the number of elements in each equivalence class
fig = px.bar(x=list(clusters.keys()), y=[len(elems) for elems in clusters.values()])
fig.update_layout(xaxis_title='Equivalence Class', yaxis_title='Count', showlegend=False)
fig.show()
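With many equivalence classes, the bar chart can be hard to read. A small helper that ranks the classes by size and counts how many classes share each size gives a quick textual summary. The sketch below assumes only what the plotting code already relies on, namely that `clusters` is a mapping from a class name to a collection supporting `len()`. The helper name `summarize_cluster_sizes` and the example data are hypothetical.

```python
from collections import Counter


def summarize_cluster_sizes(clusters):
    """Return (name, size) pairs sorted largest first, plus a histogram of sizes."""
    ranked = sorted(
        ((name, len(elems)) for name, elems in clusters.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    histogram = Counter(size for _, size in ranked)
    return ranked, histogram


# Stand-in data with the same shape as the mapping used in the plot above.
example_clusters = {
    "a": ["t1", "t2", "t3"],
    "b": ["t4"],
    "c": ["t5", "t6"],
    "d": ["t7"],
}

ranked, histogram = summarize_cluster_sizes(example_clusters)
print(ranked[:3])      # the three largest classes
print(histogram[1])    # number of classes containing a single element
```

In the notebook, the same helper could be called as `summarize_cluster_sizes(clusters)`.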
It’s now time to use ArchiTXT to automatically structure the data.
from architxt.simplification.tree_rewriting import rewrite
rewrite(forest, epoch=10, min_support=5, tau=0.95)
# Display the tallest tree in the forest
max(forest, key=lambda tree: tree.height).pretty_print()
ROOT
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────┐
│ UNDEF_beb2e008c9d3453ea8ecd7ae09
│ fdbe13
│ ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────┐
│ │ UNDEF_0c1924f22a7f4e4bbea0d9c4f1
│ │ 5b12f5
│ │ ┌────────────────────────────────────────────────────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ │ │ │ │ │ UNDEF_87d78dfc508c4dfeadbedef033
│ │ │ │ │ │ 5cb1b8
│ │ │ │ │ │ ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────┬───────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────┐
│ │ │ │ │ │ UNDEF_99b38550dc884944a610facde4 │ │ │
│ │ │ │ │ │ c1d87b │ │ │
│ │ │ │ │ │ ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ │ │ │ UNDEF_bd12d50438a74752b3172e4c91 │ │ │
│ │ │ │ │ │ │ c47491 │ │ │
│ │ │ │ │ │ │ ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────┐ │ │ │
│ │ │ │ │ │ │ │ UNDEF_a69f788a53ba475cac55ffa277 │ │ UNDEF_bbdfaa386c40482b9b01940e91
│ │ │ │ │ │ │ │ 0c4d26 │ │ 6edf2f
│ │ │ │ │ │ │ │ ┌─────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────┐ │ │ ┌──────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────┐
│ │ │ │ │ UNDEF_5a8942f7168642f79bb9029b01 │ │ │ UNDEF_32aa3ccdbc664b88a115e3881e │ │ │ UNDEF_e4ade9e4af98488aa3cad31d19
│ │ │ │ │ c5537c │ │ │ ca251d │ │ │ dc1b4f
│ │ │ │ │ ┌──────────────────────────────────────────────────────┴──────────────────────────────────────────────────────┐ │ │ │ ┌────────────────────────────┴─────────────────────────────┐ │ │ │ ┌────────────────────────────────┴──────────────────────────────────────────────────┐
GROUP::34_35_35_35_35_35_35_35_3 │ UNDEF_c225ba41f87b40348578937858 UNDEF_611b8ee7f54b4d25a13ded8663 UNDEF_9338e61f064c4107bca665036a GROUP::3_3_3_3_3_3_3_3_3 GROUP::3_3_3_3_3_3_3_3_3 UNDEF_5e0973fea03c4dbe9f1d1cd831 GROUP::3_3_3_3_3_3_3_3_3 GROUP::3_3_3_3_3_3_3_3_3 GROUP::3_3_3_3_3_3_3_3_3 │ │ │ GROUP::3_3_3_3_3_3_3_3_3 UNDEF_0bd6172b7d39425c9564016825 │
5 │ 54c8c8 cce992 97fb37 │ │ ea87e1 │ │ │ │ │ │ │ b7ef3e │
┌───────────────────────┴────────────────────┐ │ ┌────────────────────────────┴─────────────────────────────┐ ┌────────────────────────────┴────────────────────────────────┐ ┌────────────────────────────┼───────────────────────────────┐ ┌────────────────────────┴─────────────────────────┐ ┌────────────────────────┴─────────────────────────┐ ┌───────────────────────┼────────────────────────────────┐ ┌────────────────────────────┴────────────────────────────┐ ┌────────────────────────────┴────────────────────────────┐ ┌────────────────────────────┴────────────────────────────┐ │ │ │ ┌────────────────────────┴───────────────────────┐ ┌────────────────────────────────┼────────────────────────────────────────────────────────────┐ │
ENT::AGE ENT::SEX ENT::DIAGNOSTIC_PROCEDURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::LAB_VALUE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DISEASE_DISORDER
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
1%20year%20old woman vacterl%20association absent cervical%20atlas supernumerary bone%20structure%20of%20lumba... hypoplastic abies%20religiosa bone%20structure%20of%20coccyx fatty%20acids end-stage tethered spinal%20cord three fused bone%20structure%20of%20rib anorectal%20malformations cloaca%20chamber common%20%28qualifier%20value%29 urogenital%20sinus duplex vagina midline%20septum type%20c pathologic%20fistula right%20kidney agenesis moderate%20%28severity%20modi... left hydronephrosis vesico-ureteral%20reflux
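The leaf values in the printed tree appear to be percent-encoded (`%20` for a space, `%28` and `%29` for parentheses). If you want to read them as plain text, Python's standard library can decode them. This is a minimal sketch using a few leaves copied from the output above; whether ArchiTXT offers its own decoding helper is not covered here.

```python
from urllib.parse import unquote

# Leaf values copied from the tree printed above.
encoded_leaves = [
    "1%20year%20old",
    "vacterl%20association",
    "common%20%28qualifier%20value%29",
]

# Decode percent-escapes back into readable text.
decoded_leaves = [unquote(leaf) for leaf in encoded_leaves]
print(decoded_leaves)
```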
We now have a more granular structure. Let’s take a closer look at the schema.
schema = Schema.from_forest(forest, keep_unlabelled=False)
print(schema.as_cfg())
ROOT -> COLL::3_3_3_3_3_3_3_3_3 GROUP::32_33_33_33_33_33_33_33_33 GROUP::34_35_35_35_35_35_35_35_35 GROUP::3_3_3_3_3_3_3_3_3 GROUP::56_57_57_57_57_57_57_57_57 GROUP::69_19_19_19_19_19_19_19_19 GROUP::7_7_7_7_7_7_7_7_7 REL::3_3_3_3_3_3_3_3_3<->7_7_7_7_7_7_7_7_7;
COLL::3_3_3_3_3_3_3_3_3 -> GROUP::3_3_3_3_3_3_3_3_3;
REL::3_3_3_3_3_3_3_3_3<->7_7_7_7_7_7_7_7_7 -> GROUP::3_3_3_3_3_3_3_3_3 GROUP::7_7_7_7_7_7_7_7_7;
GROUP::3_3_3_3_3_3_3_3_3 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::COREFERENCE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DISTANCE ENT::DURATION ENT::FAMILY_HISTORY ENT::HISTORY ENT::LAB_VALUE ENT::MEDICATION ENT::NONBIOLOGICAL_LOCATION ENT::PERSONAL_BACKGROUND ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::SUBJECT ENT::THERAPEUTIC_PROCEDURE ENT::TIME;
GROUP::7_7_7_7_7_7_7_7_7 -> ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::69_19_19_19_19_19_19_19_19 -> ENT::DOSAGE ENT::MEDICATION;
GROUP::32_33_33_33_33_33_33_33_33 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE;
GROUP::34_35_35_35_35_35_35_35_35 -> ENT::AGE ENT::SEX;
GROUP::56_57_57_57_57_57_57_57_57 -> ENT::LAB_VALUE ENT::SIGN_SYMPTOM;
The schema is now much smaller, and the groups are more meaningful.
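To check this kind of claim programmatically, the printed grammar can be parsed into a dictionary and the groups counted. The sketch below works on the text form shown above, using a few productions copied from the output. The helper name `parse_cfg` is hypothetical, and applying it to a real schema assumes that `str(schema.as_cfg())` yields the same text that `print` displays.

```python
def parse_cfg(cfg_text):
    """Map each left-hand symbol to the list of symbols on its right-hand side."""
    productions = {}
    for line in cfg_text.splitlines():
        line = line.strip().rstrip(";")
        if not line:
            continue
        # Split on the first " -> " only: relation names contain "<->" without spaces.
        head, _, body = line.partition(" -> ")
        productions[head] = body.split()
    return productions


# A few productions copied from the simplified schema above.
cfg_text = """
GROUP::7_7_7_7_7_7_7_7_7 -> ENT::DETAILED_DESCRIPTION ENT::THERAPEUTIC_PROCEDURE;
GROUP::69_19_19_19_19_19_19_19_19 -> ENT::DOSAGE ENT::MEDICATION;
GROUP::34_35_35_35_35_35_35_35_35 -> ENT::AGE ENT::SEX;
REL::3_3_3_3_3_3_3_3_3<->7_7_7_7_7_7_7_7_7 -> GROUP::3_3_3_3_3_3_3_3_3 GROUP::7_7_7_7_7_7_7_7_7;
"""

productions = parse_cfg(cfg_text)

# Keep only the group definitions and report their attributes.
group_attrs = {
    head: body for head, body in productions.items() if head.startswith("GROUP::")
}
for head, attrs in group_attrs.items():
    print(head, "->", ", ".join(attrs))
print("Number of groups:", len(group_attrs))
```

Running the same helper on the schema text before and after simplification gives a direct comparison of the number of groups.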
However, not all extracted trees provide valuable insights, so we could filter the structured instance to keep only the valid trees using schema.extract_valid_trees(new_forest).
Let’s explore the different semantic groups.
Groups represent common patterns across the corpus.
all_datasets = schema.extract_datasets(forest)
group, dataset = max(all_datasets.items(), key=lambda x: len(x[1]))
print(f'Group: {group}')
dataset
Group: 3_3_3_3_3_3_3_3_3