Exploring a Textual Corpus with ArchiTXT
This tutorial provides a step-by-step guide on how to use ArchiTXT to efficiently process and analyze textual corpora.
ArchiTXT allows loading a corpus as a set of syntax trees, where each tree is enriched by incorporating named entities. These enriched trees form a forest, which can then be automatically structured into a valid database instance for further analysis.
By following this tutorial, you’ll learn how to:
Load a corpus
Parse textual data with Berkeley Neural Parser (Benepar)
Extract structured data using ArchiTXT
Downloading the MACCROBAT Corpus
The MACCROBAT corpus is a collection of 200 annotated medical documents, specifically clinical case reports, extracted from PubMed Central. The annotations focus on key medical concepts such as diseases, treatments, medications, and symptoms, making it a valuable resource for biomedical text analysis.
The MACCROBAT corpus is available for download from Figshare or Kaggle.
Let’s download the corpus.
import urllib.request
urllib.request.urlretrieve(
    'https://www.kaggle.com/api/v1/datasets/download/okolojeremiah/maccrobat',
    filename='MACCROBAT.zip',
)
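Before parsing, it can be worth checking that the download really is a zip archive and seeing what it contains. The helper below is a minimal sketch using only the standard library; the name summarize_archive is ours and is not part of ArchiTXT. It assumes nothing about the corpus layout beyond the file being a zip.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(path):
    """Count archive members by file extension (directories are skipped)."""
    # zipfile.ZipFile raises zipfile.BadZipFile if the file is not a zip archive.
    with zipfile.ZipFile(path) as archive:
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    return Counter(PurePosixPath(name).suffix or '<none>' for name in names)


# Usage, once the download above has finished:
# print(summarize_archive('MACCROBAT.zip'))
```

If the call raises zipfile.BadZipFile, the download most likely did not return the archive and should be retried.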
Installing and Configuring NLP Models
ArchiTXT can parse sentences using either Benepar with SpaCy or a CoreNLP server. In this tutorial, we will use the SpaCy parser with the default model, but you can use any other model, such as one from SciSpaCy, a collection of models for biomedical text processing developed by AllenAI.
To download the default SpaCy model, run:
!spacy download en_core_web_sm
We also need to download the Benepar model for English:
import benepar
benepar.download('benepar_en3')
Parsing the Corpus with ArchiTXT
Before processing the corpus, we need to configure the BeneparParser, specifying which SpaCy model to use for each language.
import warnings
from architxt.nlp.parser.benepar import BeneparParser
# Initialize the parser
parser = BeneparParser(
    spacy_models={
        'English': 'en_core_web_sm',
    }
)
# Suppress all warnings, including those raised for unsupported annotations
warnings.filterwarnings("ignore")
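Calling warnings.filterwarnings("ignore") silences every warning for the rest of the session. If that is broader than you want, the standard library also allows scoping the suppression to a block. The sketch below uses a hypothetical noisy_step function as a stand-in for any call that emits warnings; it is not an ArchiTXT function.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a call that emits a warning."""
    warnings.warn('unsupported annotation', UserWarning)
    return 'parsed'


# Warnings are silenced only inside this block;
# the previous filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    result = noisy_step()

# Outside the block the warning is emitted again; here we record it
# instead of printing it, to show that it is no longer suppressed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    noisy_step()

print(result, len(caught))
```

Scoping the filter keeps unrelated warnings visible while the noisy call stays quiet.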
Named entity resolution helps standardize the named entities and build a database instance. To enable it, we need to specify which knowledge base to use. For this tutorial, we will use the UMLS (Unified Medical Language System) resolver.
from architxt.nlp.contrib.scispacy import ScispacyResolver
resolver = ScispacyResolver(kb_name='umls')
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/linkers/2023-04-23/umls/tfidf_vectors_sparse.npz not found in cache, downloading to /tmp/tmpvloeznh2
Finished download, copying /tmp/tmpvloeznh2 to cache at /home/runner/.scispacy/datasets/2b79923846fb52e62d686f2db846392575c8eb5b732d9d26cd3ca9378c622d40.87bd52d0f0ee055c1e455ef54ba45149d188552f07991b765da256a1b512ca0b.tfidf_vectors_sparse.npz
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/linkers/2023-04-23/umls/nmslib_index.bin not found in cache, downloading to /tmp/tmpiqbji5zm
Finished download, copying /tmp/tmpiqbji5zm to cache at /home/runner/.scispacy/datasets/7e8e091ec80370b87b1652f461eae9d926e543a403a69c1f0968f71157322c25.6d801a1e14867953e36258b0e19a23723ae84b0abd2a723bdd3574c3e0c873b4.nmslib_index.bin
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/linkers/2023-04-23/umls/tfidf_vectorizer.joblib not found in cache, downloading to /tmp/tmpli7qyuxj
Finished download, copying /tmp/tmpli7qyuxj to cache at /home/runner/.scispacy/datasets/37bc06bb7ce30de7251db5f5cbac788998e33b3984410caed2d0083187e01d38.f0994c1b61cc70d0eb96dea4947dddcb37460fb5ae60975013711228c8fe3fba.tfidf_vectorizer.joblib
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/linkers/2023-04-23/umls/concept_aliases.json not found in cache, downloading to /tmp/tmp9jifb_y4
Finished download, copying /tmp/tmp9jifb_y4 to cache at /home/runner/.scispacy/datasets/6238f505f56aca33290aab44097f67dd1b88880e3be6d6dcce65e56e9255b7d4.d7f77b1629001b40f1b1bc951f3a890ff2d516fb8fbae3111b236b31b33d6dcf.concept_aliases.json
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/kbs/2023-04-23/umls_2022_ab_cat0129.jsonl not found in cache, downloading to /tmp/tmpk1txihdm
Finished download, copying /tmp/tmpk1txihdm to cache at /home/runner/.scispacy/datasets/d5e593bc2d8adeee7754be423cd64f5d331ebf26272074a2575616be55697632.0660f30a60ad00fffd8bbf084a18eb3f462fd192ac5563bf50940fc32a850a3c.umls_2022_ab_cat0129.jsonl
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/umls_semantic_type_tree.tsv not found in cache, downloading to /tmp/tmplj2q4gzl
Finished download, copying /tmp/tmplj2q4gzl to cache at /home/runner/.scispacy/datasets/21a1012c532c3a431d60895c509f5b4d45b0f8966c4178b892190a302b21836f.330707f4efe774134872b9f77f0e3208c1d30f50800b3b39a6b8ec21d9adf1b7.umls_semantic_type_tree.tsv
Let’s parse a sample of the corpus. To verify that everything is functioning as expected, we will inspect the tallest enriched tree (the one with the greatest height) using the architxt.tree.Tree.pretty_print method.
from architxt.nlp import raw_load_corpus
forest = [
    tree
    async for tree in raw_load_corpus(
        ['MACCROBAT.zip'],
        ['English'],
        cache=False,
        parser=parser,
        resolver=resolver,
        sample=600,
        entities_filter={
            'OTHER_ENTITY',
            'OTHER_EVENT',
            'COREFERENCE',
        },
        entities_mapping={
            'QUANTITATIVE_CONCEPT': 'VALUE',
            'QUALITATIVE_CONCEPT': 'VALUE',
            'LAB_VALUE': 'VALUE',
            'THERAPEUTIC_PROCEDURE': 'TREATMENT',
            'MEDICATION': 'TREATMENT',
            'OUTCOME': 'SIGN_SYMPTOM',
        },
    )
]
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
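To make the role of entities_filter and entities_mapping concrete, here is a small stand-alone sketch of the behaviour their names suggest: labels in the filter set are dropped, and the remaining labels are renamed through the mapping. This is an assumption based on the parameter names and on the labels that appear in the output of this tutorial; it is not ArchiTXT's implementation, and the order of the two steps is our guess.

```python
def normalize_labels(labels, entities_filter, entities_mapping):
    """Drop filtered labels, then rename the rest through the mapping."""
    return [
        entities_mapping.get(label, label)
        for label in labels
        if label not in entities_filter
    ]


labels = ['MEDICATION', 'COREFERENCE', 'LAB_VALUE', 'AGE', 'OUTCOME']
normalized = normalize_labels(
    labels,
    entities_filter={'OTHER_ENTITY', 'OTHER_EVENT', 'COREFERENCE'},
    entities_mapping={
        'LAB_VALUE': 'VALUE',
        'MEDICATION': 'TREATMENT',
        'OUTCOME': 'SIGN_SYMPTOM',
    },
)
print(normalized)  # → ['TREATMENT', 'VALUE', 'AGE', 'SIGN_SYMPTOM']
```

Merging fine-grained labels such as MEDICATION and THERAPEUTIC_PROCEDURE into one TREATMENT label reduces the number of distinct entity types that the structuring step has to handle.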
# Look at the tallest tree
max(forest, key=lambda tree: tree.height).pretty_print()
ROOT
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────┐
│ UNDEF_7dc07e2d46a34e42be6a565763
│ d2baed
│ ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────┐
│ │ UNDEF_cda4dd9f16724d539ea3854bbe
│ │ ffa00c
│ │ ┌─────────────────────────────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ │ │ │ │ │ UNDEF_80355115ba0e4ed786978b4b9f
│ │ │ │ │ │ 2b6bad
│ │ │ │ │ │ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────┬───────────────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ │ │ │ │ │ UNDEF_1736ff31b3b94c08873f095f2a │ │ │
│ │ │ │ │ │ 74de09 │ │ │
│ │ │ │ │ │ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────┐ │ │ │
│ │ │ │ │ │ │ UNDEF_3b19fc13563f4472881cbe76ed │ │ │
│ │ │ │ │ │ │ a77033 │ │ │
│ │ │ │ │ │ │ ┌───────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ │ │ │ │ UNDEF_05b6c8e0f9ec4eef8e348db38a │ │ UNDEF_dc2f05a9c6694bf9b36fdef4ea
│ │ │ │ │ │ │ │ 7583c3 │ │ 5c0466
│ │ │ │ │ │ │ │ ┌────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────┐ │ │ ┌──────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────┐
│ │ │ │ │ UNDEF_de7eab9836f647cfb1031dc113 │ │ │ UNDEF_786ca999a81143f5b2cb9da6ef │ │ │ UNDEF_c85a549f5dbb47bca314a7fae8
│ │ │ │ │ 8b5aa2 │ │ │ 7395aa │ │ │ fe1cc6
│ │ │ │ │ ┌──────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────┐ │ │ │ ┌──────────────────────────────────────────────────────────┴─────────────────────────────────────┐ │ │ │ ┌────────────────────────────────┴──────────────────────────────────────────┐
UNDEF_51d264bbeda044fb8f263a3c9b │ UNDEF_02b827cc611f4e2eb938d93d53 UNDEF_426704075d594683ab6495ddea UNDEF_014e0de7e4c949a0b315be0680 UNDEF_debd4ef811204e3d96e1a517cd UNDEF_be25a1dd6de04bc5afa56339b9 UNDEF_a8f48256f3ec47c1be8deb1752 UNDEF_bd8cd9a83f4b451383fcfecff0 UNDEF_1776da8717b64c88ade85d95e1 UNDEF_2d6dfd6247fb4a10bb73ce39b4 │ │ │ UNDEF_9ab1e5e517fb4b01ba5703488c UNDEF_b0b993b2ee76450ea9a84e5192 │
e416e9 │ 202924 3afc8c c7da79 3e946c dff1c6 390f74 cfdce1 a5d4b8 9b4a28 │ │ │ 7fb8b7 278061 │
┌────────────────────┴────────────────────────┐ │ ┌────────────────────────────┴─────────────────────────────┐ ┌────────────────────────────┴─────────────────────────────┐ ┌────────────────────────────┼─────────────────────────────────┐ ┌────────────────────────────┴─────────────────────────────┐ ┌────────────────────────────┴─────────────────────────────┐ ┌─────────────────────┼─────────────────────────────┐ ┌──────────────────────────┴─────────────────────────────┐ ┌────────────────────────────────┴─────────────────────────────┐ ┌────────────────────────────────┴─────────────────────────────┐ │ │ │ ┌────────────────────────────┴───────────────────────────┐ ┌────────────────────────────────┼────────────────────────────────┐ │
ENT::AGE ENT::SEX ENT::DIAGNOSTIC_PROCEDURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::VALUE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DISEASE_DISORDER
┌───┬─────┼────────────────────┬──────────────────┐ │ ┌────────────────┴──────────────────┐ │ ┌─────────────────────────────┴────────────────┐ │ ┌─────────────────────────────┴─────────────────┐ │ │ ┌──────────────┴───────────────┐ │ ┌─────────────────────────────┴────────────────────────────┐ │ ┌─────────────────────────────┴──────────────┐ │ │ │ ┌───────────────────────────┴──────────────────────────┐ │ │ ┌─────────────────────────────┴───────────────┐ │ │ ┌────────────────┴────────────────────────────┐ ┌───────────────┴──────────────┐ ┌────────────┴──────────────┐ ┌───────────────┴────────────────────────────┐ │ │ │ │ ┌──────────────────┴─────────────┐
 3 - year - old girl VACTERL association absent C1 vertebra supernumerary lumbar vertebrae hypoplastic sacrum / coccyx fatty filum terminale tethered spinal cord three fused ribs anorectal malformation cloaca common urogenital sinus duplex vagina midline septum type C TE fistula right renal agenesis moderate left hydronephrosis vesicoureteral reflux
Let’s look at the distribution of the entities in this sample.
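A label distribution of this kind boils down to walking each tree and tallying the ENT:: labels. The sketch below shows the idea on a toy forest built from nested (label, children) tuples. This representation is ours, chosen to keep the example self-contained; walking a real ArchiTXT tree would rely on the library's own traversal API, which is not shown here.

```python
from collections import Counter


def entity_labels(node):
    """Yield every label starting with 'ENT::' in a (label, children) tree."""
    label, children = node
    if label.startswith('ENT::'):
        yield label
    for child in children:
        if isinstance(child, tuple):  # leaves are plain strings (tokens)
            yield from entity_labels(child)


# Toy forest: two tiny trees with tokens taken from the outputs in this tutorial
toy_forest = [
    ('ROOT', [
        ('ENT::AGE', ['3', '-', 'year', '-', 'old']),
        ('ENT::SEX', ['girl']),
        ('UNDEF', [
            ('ENT::DETAILED_DESCRIPTION', ['absent']),
            ('ENT::BIOLOGICAL_STRUCTURE', ['C1', 'vertebra']),
        ]),
    ]),
    ('ROOT', [
        ('ENT::AGE', ['28', '-', 'year', '-', 'old']),
        ('ENT::SEX', ['man']),
    ]),
]

counts = Counter(label for tree in toy_forest for label in entity_labels(tree))
for label, count in counts.most_common():
    print(f'{label}: {count}')
```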
ArchiTXT can then automatically structure parsed text into a database-friendly format. Let’s start with a simple rewrite!
from copy import deepcopy
from architxt.simplification.simple_rewrite import simple_rewrite
forest_copy = deepcopy(forest)
simple_rewrite(forest_copy)
# Look at the tallest tree
max(forest_copy, key=lambda tree: tree.height).pretty_print()
ROOT
│
GROUP::1
┌─────────────────────────────┬──────────┴─────────────┬────────────┬──────────────┬────────────────────┐
ENT::AGE ENT::HISTORY ENT::SIGN_SYMPTOM ENT::SEX ENT::CLINICAL_EVENT ENT::DURATION
┌───┬─────┼──────┬───┐ ┌───────────┴──────────┐ │ │ │ ┌────────┼────────┐
 28 - year - old previously healthy healthy man presented 6 - week
Now that we have a structured instance, we can extract its schema. The schema provides a formal representation of the extracted data.
from architxt.schema import Schema
schema = Schema.from_forest(forest_copy, keep_unlabelled=False)
print(schema.as_cfg())
ROOT -> GROUP::1 GROUP::10 GROUP::100 GROUP::101 GROUP::102 GROUP::103 GROUP::104 GROUP::105 GROUP::106 GROUP::107 GROUP::108 GROUP::109 GROUP::11 GROUP::110 GROUP::111 GROUP::112 GROUP::113 GROUP::114 GROUP::115 GROUP::116 GROUP::117 GROUP::118 GROUP::119 GROUP::12 GROUP::120 GROUP::121 GROUP::122 GROUP::123 GROUP::124 GROUP::125 GROUP::126 GROUP::127 GROUP::128 GROUP::129 GROUP::13 GROUP::130 GROUP::131 GROUP::132 GROUP::133 GROUP::134 GROUP::135 GROUP::136 GROUP::137 GROUP::138 GROUP::139 GROUP::14 GROUP::140 GROUP::141 GROUP::142 GROUP::143 GROUP::144 GROUP::145 GROUP::146 GROUP::147 GROUP::148 GROUP::149 GROUP::15 GROUP::150 GROUP::151 GROUP::152 GROUP::153 GROUP::154 GROUP::155 GROUP::156 GROUP::157 GROUP::158 GROUP::159 GROUP::16 GROUP::160 GROUP::161 GROUP::162 GROUP::163 GROUP::164 GROUP::165 GROUP::166 GROUP::167 GROUP::168 GROUP::169 GROUP::17 GROUP::170 GROUP::171 GROUP::172 GROUP::173 GROUP::174 GROUP::175 GROUP::176 GROUP::177 GROUP::178 GROUP::179 GROUP::18 GROUP::180 GROUP::181 GROUP::182 GROUP::183 GROUP::184 GROUP::185 GROUP::186 GROUP::187 GROUP::188 GROUP::189 GROUP::19 GROUP::190 GROUP::191 GROUP::192 GROUP::193 GROUP::194 GROUP::195 GROUP::196 GROUP::197 GROUP::198 GROUP::199 GROUP::2 GROUP::20 GROUP::200 GROUP::201 GROUP::202 GROUP::203 GROUP::204 GROUP::205 GROUP::206 GROUP::207 GROUP::208 GROUP::209 GROUP::21 GROUP::210 GROUP::211 GROUP::212 GROUP::213 GROUP::214 GROUP::215 GROUP::216 GROUP::217 GROUP::218 GROUP::219 GROUP::22 GROUP::220 GROUP::221 GROUP::222 GROUP::223 GROUP::224 GROUP::225 GROUP::226 GROUP::227 GROUP::228 GROUP::229 GROUP::23 GROUP::230 GROUP::231 GROUP::232 GROUP::233 GROUP::234 GROUP::235 GROUP::236 GROUP::237 GROUP::238 GROUP::239 GROUP::24 GROUP::240 GROUP::241 GROUP::242 GROUP::243 GROUP::244 GROUP::245 GROUP::246 GROUP::247 GROUP::248 GROUP::249 GROUP::25 GROUP::250 GROUP::251 GROUP::252 GROUP::253 GROUP::254 GROUP::255 GROUP::256 GROUP::257 GROUP::258 GROUP::259 GROUP::26 GROUP::260 GROUP::261 GROUP::262 GROUP::263 
GROUP::264 GROUP::265 GROUP::266 GROUP::267 GROUP::268 GROUP::269 GROUP::27 GROUP::270 GROUP::271 GROUP::272 GROUP::273 GROUP::28 GROUP::29 GROUP::3 GROUP::30 GROUP::31 GROUP::32 GROUP::33 GROUP::34 GROUP::35 GROUP::36 GROUP::37 GROUP::38 GROUP::39 GROUP::4 GROUP::40 GROUP::41 GROUP::42 GROUP::43 GROUP::44 GROUP::45 GROUP::46 GROUP::47 GROUP::48 GROUP::49 GROUP::5 GROUP::50 GROUP::51 GROUP::52 GROUP::53 GROUP::54 GROUP::55 GROUP::56 GROUP::57 GROUP::58 GROUP::59 GROUP::6 GROUP::60 GROUP::61 GROUP::62 GROUP::63 GROUP::64 GROUP::65 GROUP::66 GROUP::67 GROUP::68 GROUP::69 GROUP::7 GROUP::70 GROUP::71 GROUP::72 GROUP::73 GROUP::74 GROUP::75 GROUP::76 GROUP::77 GROUP::78 GROUP::79 GROUP::8 GROUP::80 GROUP::81 GROUP::82 GROUP::83 GROUP::84 GROUP::85 GROUP::86 GROUP::87 GROUP::88 GROUP::89 GROUP::9 GROUP::90 GROUP::91 GROUP::92 GROUP::93 GROUP::94 GROUP::95 GROUP::96 GROUP::97 GROUP::98 GROUP::99;
GROUP::1 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DURATION ENT::HISTORY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::2 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::FREQUENCY ENT::SIGN_SYMPTOM;
GROUP::3 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::4 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::5 -> ENT::BIOLOGICAL_STRUCTURE ENT::SIGN_SYMPTOM;
GROUP::6 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::7 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::TREATMENT;
GROUP::8 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::VALUE;
GROUP::9 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::10 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::HISTORY ENT::VALUE;
GROUP::11 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::12 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE;
GROUP::13 -> ENT::BIOLOGICAL_ATTRIBUTE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::TEXTURE;
GROUP::14 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::VALUE;
GROUP::15 -> ENT::BIOLOGICAL_STRUCTURE;
GROUP::16 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::17 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::18 -> ENT::COLOR ENT::DETAILED_DESCRIPTION ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::19 -> ENT::BIOLOGICAL_ATTRIBUTE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION;
GROUP::20 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::VALUE;
GROUP::21 -> ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::22 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::VALUE;
GROUP::23 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::24 -> ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::25 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DISEASE_DISORDER ENT::DURATION ENT::HISTORY ENT::PERSONAL_BACKGROUND ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::26 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::27 -> ENT::DATE ENT::TREATMENT;
GROUP::28 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::29 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::SIGN_SYMPTOM;
GROUP::30 -> ENT::DATE ENT::DOSAGE ENT::TREATMENT;
GROUP::31 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::32 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::33 -> ENT::DIAGNOSTIC_PROCEDURE ENT::VALUE;
GROUP::34 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::VALUE;
GROUP::35 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::36 -> ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::37 -> ENT::CLINICAL_EVENT ENT::DURATION ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::38 -> ENT::AGE ENT::DOSAGE ENT::SEX ENT::TREATMENT;
GROUP::39 -> ENT::DISEASE_DISORDER ENT::DOSAGE ENT::TREATMENT;
GROUP::40 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::TREATMENT;
GROUP::41 -> ENT::ADMINISTRATION ENT::DOSAGE ENT::TREATMENT;
GROUP::42 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::43 -> ENT::DETAILED_DESCRIPTION ENT::TREATMENT;
GROUP::44 -> ENT::DISEASE_DISORDER ENT::TREATMENT;
GROUP::45 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::VALUE;
GROUP::46 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::47 -> ENT::ACTIVITY ENT::AGE ENT::NONBIOLOGICAL_LOCATION ENT::PERSONAL_BACKGROUND ENT::SEX;
GROUP::48 -> ENT::ACTIVITY ENT::DURATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::49 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::50 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::51 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::52 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::53 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::54 -> ENT::ADMINISTRATION ENT::DOSAGE ENT::FREQUENCY ENT::TREATMENT;
GROUP::55 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::56 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::57 -> ENT::ACTIVITY ENT::DATE ENT::DETAILED_DESCRIPTION ENT::NONBIOLOGICAL_LOCATION ENT::OCCUPATION;
GROUP::58 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::59 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::TREATMENT;
GROUP::60 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::61 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::TREATMENT;
GROUP::62 -> ENT::DISEASE_DISORDER ENT::DOSAGE ENT::FREQUENCY ENT::TREATMENT;
GROUP::63 -> ENT::DATE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::64 -> ENT::CLINICAL_EVENT ENT::DATE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::65 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::VALUE;
GROUP::66 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::67 -> ENT::DISEASE_DISORDER;
GROUP::68 -> ENT::ADMINISTRATION ENT::DATE ENT::DOSAGE ENT::FREQUENCY ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::69 -> ENT::ADMINISTRATION ENT::DIAGNOSTIC_PROCEDURE ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::TREATMENT ENT::VALUE;
GROUP::70 -> ENT::DATE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::71 -> ENT::ADMINISTRATION ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::TREATMENT;
GROUP::72 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::TREATMENT ENT::VALUE;
GROUP::73 -> ENT::CLINICAL_EVENT ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::74 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::75 -> ENT::AGE ENT::DATE ENT::DISEASE_DISORDER ENT::SEX;
GROUP::76 -> ENT::DIAGNOSTIC_PROCEDURE ENT::HISTORY ENT::VALUE;
GROUP::77 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::78 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::VALUE;
GROUP::79 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::80 -> ENT::ADMINISTRATION ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::TREATMENT ENT::VALUE;
GROUP::81 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::82 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::VALUE;
GROUP::83 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::84 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::85 -> ENT::ADMINISTRATION ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::TREATMENT;
GROUP::86 -> ENT::CLINICAL_EVENT ENT::TREATMENT;
GROUP::87 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER;
GROUP::88 -> ENT::CLINICAL_EVENT ENT::DURATION ENT::NONBIOLOGICAL_LOCATION;
GROUP::89 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::NONBIOLOGICAL_LOCATION ENT::VALUE;
GROUP::90 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::VALUE;
GROUP::91 -> ENT::ACTIVITY ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::TIME;
GROUP::92 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::TREATMENT;
GROUP::93 -> ENT::DETAILED_DESCRIPTION ENT::TREATMENT ENT::VALUE;
GROUP::94 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::HISTORY ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::95 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::96 -> ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::97 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::98 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::99 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::VALUE;
GROUP::100 -> ENT::DATE ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::101 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::TIME ENT::TREATMENT;
GROUP::102 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::TREATMENT;
GROUP::103 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::104 -> ENT::DISEASE_DISORDER ENT::SEVERITY ENT::TREATMENT;
GROUP::105 -> ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::106 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::107 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::108 -> ENT::DIAGNOSTIC_PROCEDURE;
GROUP::109 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::110 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::111 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::TREATMENT;
GROUP::112 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SEVERITY ENT::VALUE;
GROUP::113 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::114 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::115 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::116 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER;
GROUP::117 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DOSAGE ENT::TREATMENT;
GROUP::118 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::TREATMENT;
GROUP::119 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::120 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::121 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::122 -> ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::123 -> ENT::AGE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::124 -> ENT::BIOLOGICAL_STRUCTURE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::125 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::HISTORY ENT::VALUE;
GROUP::126 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::HISTORY ENT::TREATMENT;
GROUP::127 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY;
GROUP::128 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::VALUE;
GROUP::129 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::130 -> ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::131 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::132 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION;
GROUP::133 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::134 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::SIGN_SYMPTOM ENT::TIME ENT::VALUE;
GROUP::135 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::136 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::137 -> ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DURATION;
GROUP::138 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DATE ENT::FAMILY_HISTORY ENT::SEX ENT::TREATMENT;
GROUP::139 -> ENT::CLINICAL_EVENT ENT::DATE ENT::FAMILY_HISTORY ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::SUBJECT ENT::VALUE;
GROUP::140 -> ENT::DIAGNOSTIC_PROCEDURE ENT::TIME ENT::VALUE;
GROUP::141 -> ENT::CLINICAL_EVENT ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::142 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::143 -> ENT::BIOLOGICAL_STRUCTURE ENT::SEVERITY;
GROUP::144 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::145 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::TEXTURE ENT::VALUE;
GROUP::146 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISTANCE;
GROUP::147 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::148 -> ENT::BIOLOGICAL_STRUCTURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::149 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::150 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::151 -> ENT::SIGN_SYMPTOM;
GROUP::152 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::VALUE;
GROUP::153 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::154 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::155 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE;
GROUP::156 -> ENT::TREATMENT;
GROUP::157 -> ENT::BIOLOGICAL_STRUCTURE ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::158 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SEX ENT::TREATMENT;
GROUP::159 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TIME ENT::TREATMENT ENT::VALUE;
GROUP::160 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX;
GROUP::161 -> ENT::DETAILED_DESCRIPTION ENT::HISTORY ENT::SIGN_SYMPTOM;
GROUP::162 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::163 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::164 -> ENT::BIOLOGICAL_STRUCTURE ENT::COLOR ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::165 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER;
GROUP::166 -> ENT::CLINICAL_EVENT ENT::DATE ENT::TREATMENT ENT::VALUE;
GROUP::167 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::VALUE;
GROUP::168 -> ENT::DOSAGE ENT::FREQUENCY ENT::TREATMENT;
GROUP::169 -> ENT::CLINICAL_EVENT ENT::DATE ENT::SIGN_SYMPTOM;
GROUP::170 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::HISTORY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::171 -> ENT::HISTORY ENT::PERSONAL_BACKGROUND ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::172 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::TEXTURE;
GROUP::173 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM;
GROUP::174 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::VALUE;
GROUP::175 -> ENT::AGE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SEX ENT::TREATMENT;
GROUP::176 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE;
GROUP::177 -> ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::TREATMENT ENT::VALUE;
GROUP::178 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::TREATMENT;
GROUP::179 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::180 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::SIGN_SYMPTOM ENT::TIME ENT::TREATMENT ENT::VALUE;
GROUP::181 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::SUBJECT ENT::TREATMENT ENT::VALUE;
GROUP::182 -> ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::183 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER;
GROUP::184 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::185 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DOSAGE ENT::TREATMENT ENT::VALUE;
GROUP::186 -> ENT::DATE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::187 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::188 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::189 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::190 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::191 -> ENT::BIOLOGICAL_STRUCTURE ENT::TREATMENT ENT::VOLUME;
GROUP::192 -> ENT::DURATION ENT::TREATMENT;
GROUP::193 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM;
GROUP::194 -> ENT::ADMINISTRATION ENT::DOSAGE ENT::DURATION ENT::TREATMENT;
GROUP::195 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::196 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION ENT::SUBJECT;
GROUP::197 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::198 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION;
GROUP::199 -> ENT::ADMINISTRATION ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DOSAGE ENT::TREATMENT;
GROUP::200 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VALUE ENT::VOLUME;
GROUP::201 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::202 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::SEX ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::203 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::204 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::SIGN_SYMPTOM;
GROUP::205 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::206 -> ENT::CLINICAL_EVENT ENT::DATE ENT::TREATMENT;
GROUP::207 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DURATION ENT::HISTORY ENT::SEX;
GROUP::208 -> ENT::HISTORY ENT::VALUE;
GROUP::209 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::FREQUENCY ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::210 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::211 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VOLUME;
GROUP::212 -> ENT::BIOLOGICAL_STRUCTURE ENT::COLOR ENT::DETAILED_DESCRIPTION ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::213 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::VALUE;
GROUP::214 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SEX ENT::VALUE;
GROUP::215 -> ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SUBJECT;
GROUP::216 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::TEXTURE ENT::VALUE;
GROUP::217 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION ENT::VALUE;
GROUP::218 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SUBJECT;
GROUP::219 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::220 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::VALUE;
GROUP::221 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DISEASE_DISORDER ENT::HISTORY ENT::SEX ENT::VALUE;
GROUP::222 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::223 -> ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SHAPE;
GROUP::224 -> ENT::CLINICAL_EVENT ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::TREATMENT ENT::VALUE;
GROUP::225 -> ENT::DISEASE_DISORDER ENT::DURATION ENT::SIGN_SYMPTOM ENT::SUBJECT ENT::VALUE;
GROUP::226 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SHAPE ENT::TEXTURE;
GROUP::227 -> ENT::DISEASE_DISORDER ENT::SEVERITY ENT::VALUE;
GROUP::228 -> ENT::BIOLOGICAL_STRUCTURE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::229 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::230 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::231 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::232 -> ENT::BIOLOGICAL_STRUCTURE ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::233 -> ENT::DIAGNOSTIC_PROCEDURE ENT::SHAPE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::234 -> ENT::AGE ENT::CLINICAL_EVENT ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::235 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::HISTORY ENT::NONBIOLOGICAL_LOCATION;
GROUP::236 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::237 -> ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::TREATMENT;
GROUP::238 -> ENT::CLINICAL_EVENT ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::239 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SHAPE;
GROUP::240 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION;
GROUP::241 -> ENT::CLINICAL_EVENT ENT::DISEASE_DISORDER;
GROUP::242 -> ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE;
GROUP::243 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX;
GROUP::244 -> ENT::ACTIVITY ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DURATION ENT::HISTORY ENT::SIGN_SYMPTOM;
GROUP::245 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::246 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::247 -> ENT::BIOLOGICAL_STRUCTURE ENT::DISTANCE ENT::TREATMENT;
GROUP::248 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISTANCE ENT::TREATMENT;
GROUP::249 -> ENT::BIOLOGICAL_STRUCTURE ENT::TREATMENT;
GROUP::250 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::TREATMENT ENT::VALUE;
GROUP::251 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::252 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::253 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DURATION ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::254 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM;
GROUP::255 -> ENT::DATE ENT::DISEASE_DISORDER;
GROUP::256 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM;
GROUP::257 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::258 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::259 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::VALUE;
GROUP::260 -> ENT::CLINICAL_EVENT ENT::DATE ENT::NONBIOLOGICAL_LOCATION;
GROUP::261 -> ENT::DATE ENT::DIAGNOSTIC_PROCEDURE;
GROUP::262 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::263 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::TREATMENT;
GROUP::264 -> ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM;
GROUP::265 -> ENT::CLINICAL_EVENT ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::VALUE;
GROUP::266 -> ENT::DETAILED_DESCRIPTION ENT::SIGN_SYMPTOM ENT::TEXTURE;
GROUP::267 -> ENT::ADMINISTRATION ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::NONBIOLOGICAL_LOCATION ENT::TREATMENT;
GROUP::268 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::TREATMENT ENT::VALUE;
GROUP::269 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::FREQUENCY ENT::HISTORY ENT::PERSONAL_BACKGROUND ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::270 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SHAPE ENT::SIGN_SYMPTOM;
GROUP::271 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::TREATMENT ENT::VALUE;
GROUP::272 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::273 -> ENT::ADMINISTRATION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::TREATMENT ENT::VALUE;
We’ve successfully built a basic database schema from our corpus, but there’s significant potential for improvement. Let’s explore how we can enhance it using the ArchiTXT simplification algorithm!
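A listing of this size is hard to assess by eye, so it can help to summarize it with a few numbers before simplifying. The following is a minimal sketch, not part of ArchiTXT: `parse_cfg` and `summarize_groups` are hypothetical helpers that work only on the printed text, assuming each production ends with `;` and uses ` -> ` as the separator, as in the listing above.

```python
from collections import Counter


def parse_cfg(cfg_text: str) -> dict[str, list[str]]:
    """Map each left-hand symbol to the list of symbols on its right-hand side."""
    productions: dict[str, list[str]] = {}
    for rule in cfg_text.split(';'):
        # Split on ' -> ' (with spaces) so that relation names such as
        # 'REL::A<->B' are not mistaken for the production arrow.
        if ' -> ' not in rule:
            continue  # blank fragment, e.g. the text after the last ';'
        lhs, rhs = rule.split(' -> ', 1)
        productions[lhs.strip()] = rhs.split()
    return productions


def summarize_groups(productions: dict[str, list[str]]) -> dict[str, object]:
    """Count the groups, their mean number of attributes, and the most used attributes."""
    groups = {lhs: rhs for lhs, rhs in productions.items() if lhs.startswith('GROUP::')}
    sizes = [len(rhs) for rhs in groups.values()]
    usage = Counter(attr for rhs in groups.values() for attr in rhs)
    return {
        'group_count': len(groups),
        'mean_attributes': sum(sizes) / len(sizes) if sizes else 0.0,
        'most_common_attributes': usage.most_common(3),
    }


# Three productions copied from the listing above
SAMPLE_CFG = """
GROUP::143 -> ENT::BIOLOGICAL_STRUCTURE ENT::SEVERITY;
GROUP::148 -> ENT::BIOLOGICAL_STRUCTURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::151 -> ENT::SIGN_SYMPTOM;
"""

summary = summarize_groups(parse_cfg(SAMPLE_CFG))
print(summary['group_count'], summary['mean_attributes'])
```

In the tutorial itself, the same helpers could be applied to the string returned by `schema.as_cfg()`. Running them before and after simplification gives a rough measure of how much the schema was consolidated.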
First, let’s visualize the distribution of equivalence classes inside the forest.
from architxt.similarity import TreeClusterer
clusterer = TreeClusterer(tau=0.8, decay=4)
clusters = clusterer.fit_predict(forest)
Let’s visualize the clustering result as a bar chart to better understand how groups are distributed across equivalence classes.
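One way to draw such a chart is sketched below. This is an illustration under assumptions, not the tutorial’s original plotting code: the return type of `fit_predict` is not shown here, so the sketch assumes the result has first been converted into a mapping from an equivalence-class name to its member subtrees. The helpers `class_sizes` and `plot_class_sizes` and the `toy_classes` data are hypothetical.

```python
from collections.abc import Mapping, Sized


def class_sizes(classes: Mapping[str, Sized]) -> list[tuple[str, int]]:
    """Return (class name, member count) pairs, largest class first."""
    pairs = ((name, len(members)) for name, members in classes.items())
    # Sort by descending size, then by name so ties are ordered deterministically
    return sorted(pairs, key=lambda item: (-item[1], item[0]))


def plot_class_sizes(classes: Mapping[str, Sized], top: int = 20):
    """Draw a bar chart of the `top` largest equivalence classes."""
    # Imported lazily so that class_sizes can be used without matplotlib installed
    import matplotlib.pyplot as plt

    sizes = class_sizes(classes)[:top]
    if not sizes:
        raise ValueError('no equivalence classes to plot')
    names, counts = zip(*sizes)

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(names, counts)
    ax.set_xlabel('Equivalence class')
    ax.set_ylabel('Number of subtrees')
    ax.tick_params(axis='x', labelrotation=90)
    fig.tight_layout()
    return fig


# Hypothetical data standing in for the real clustering result
toy_classes = {
    'class_a': ['t1', 't2', 't3'],
    'class_b': ['t4'],
    'class_c': ['t5', 't6'],
}
sizes = class_sizes(toy_classes)
print(sizes)
```

A few large classes next to a long tail of singletons would suggest that many subtrees are near-duplicates and that the simplification step has material to merge.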
It’s now time to use ArchiTXT to automatically structure the data.
from architxt.simplification.tree_rewriting import rewrite
rewrite(forest, epoch=30, min_support=5, tau=0.8, decay=4)
# Display the tallest tree in the forest
max(forest, key=lambda tree: tree.height).pretty_print()
ROOT
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────┐
│ UNDEF_7dc07e2d46a34e42be6a565763
│ d2baed
│ ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────┐
│ │ UNDEF_cda4dd9f16724d539ea3854bbe
│ │ ffa00c
│ │ ┌─────────────────────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ │ │ │ │ │ UNDEF_80355115ba0e4ed786978b4b9f
│ │ │ │ │ │ 2b6bad
│ │ │ │ │ │ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────┬───────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ │ │ │ │ │ UNDEF_1736ff31b3b94c08873f095f2a │ │ │
│ │ │ │ │ │ 74de09 │ │ │
│ │ │ │ │ │ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────┐ │ │ │
│ │ │ │ │ │ │ UNDEF_3b19fc13563f4472881cbe76ed │ │ │
│ │ │ │ │ │ │ a77033 │ │ │
│ │ │ │ │ │ │ ┌───────────────────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ │ │ │ │ UNDEF_05b6c8e0f9ec4eef8e348db38a │ │ UNDEF_dc2f05a9c6694bf9b36fdef4ea
│ │ │ │ │ │ │ │ 7583c3 │ │ 5c0466
│ │ │ │ │ │ │ │ ┌────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────┐ │ │ ┌─────────────────────────────────────────────────┴──────────────────────────────────────────────────────────┐
│ │ │ │ │ UNDEF_de7eab9836f647cfb1031dc113 │ │ │ UNDEF_786ca999a81143f5b2cb9da6ef │ │ │ UNDEF_c85a549f5dbb47bca314a7fae8
│ │ │ │ │ 8b5aa2 │ │ │ 7395aa │ │ │ fe1cc6
│ │ │ │ │ ┌──────────────────────────────────────────────────────┴──────────────────────────────────────────────────────┐ │ │ │ ┌──────────────────────────────────────────────────────┴─────────────────────────────────────┐ │ │ │ ┌────────────────────────────┴──────────────────────────────────────────┐
GROUP::UndefinedGroup_34 │ GROUP::UndefinedGroup_27 GROUP::UndefinedGroup_8 UNDEF_014e0de7e4c949a0b315be0680 GROUP::UndefinedGroup_27 GROUP::UndefinedGroup_42 GROUP::UndefinedGroup_2 GROUP::UndefinedGroup_16 GROUP::UndefinedGroup_30 GROUP::UndefinedGroup_57 │ │ │ GROUP::UndefinedGroup_1 GROUP::UndefinedGroup_13 │
│ │ │ │ c7da79 │ │ │ │ │ │ │ │ │ │ │ │
┌────────────────┴────────────────────┐ │ ┌────────────────────────┴─────────────────────────┐ ┌────────────────────────┴────────────────────────┐ ┌────────────────────────────┼─────────────────────────────────┐ ┌────────────────────────┴─────────────────────────┐ ┌────────────────────────┴─────────────────────────┐ ┌──────────────────┼─────────────────────────┐ ┌──────────────────────────┴─────────────────────────────┐ ┌────────────────────────────────┴─────────────────────────────┐ ┌────────────────────────────┴─────────────────────────┐ │ │ │ ┌────────────────────────┴──────────────────────┐ ┌─────────────────────────────┼────────────────────────────┐ │
ENT::AGE ENT::SEX ENT::DIAGNOSTIC_PROCEDURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::VALUE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::BIOLOGICAL_STRUCTURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DISEASE_DISORDER
┌───┬─────┼────────────────┬──────────────┐ │ ┌────────────────┴──────────────────┐ │ ┌─────────────────────────┴────────────────┐ │ ┌────────────────────────┴─────────────────┐ │ │ ┌──────────────┴───────────────┐ │ ┌─────────────────────────┴────────────────────────────┐ │ ┌─────────────────────────┴──────────────┐ │ │ │ ┌───────────────────────────┴──────────────────────────┐ │ │ ┌─────────────────────────────┴───────────────┐ │ │ ┌────────────────┴────────────────────────────┐ ┌───────────────┴──────────────┐ ┌────────────┴──────────────┐ ┌───────────────┴────────────────────────┐ │ │ │ │ ┌──────────────────┴─────────────┐
3 - year - old girl VACTERL association absent C1 vertebra supernumerary lumbar vertebrae hypoplastic sacrum / coccyx fatty filum terminale tethered spinal cord three fused ribs anorectal malformation cloaca common urogenital sinus duplex vagina midline septum type C TE fistula right renal agenesis moderate left hydronephrosis vesicoureteral reflux
We now have a more granular structure. Let’s take a closer look at the schema.
schema = Schema.from_forest(forest, keep_unlabelled=False)
print(schema.as_cfg())
ROOT -> COLL::UndefinedGroup COLL::UndefinedGroup_36 COLL::UndefinedGroup_44 GROUP::UndefinedGroup GROUP::UndefinedGroup_1 GROUP::UndefinedGroup_10 GROUP::UndefinedGroup_11 GROUP::UndefinedGroup_12 GROUP::UndefinedGroup_13 GROUP::UndefinedGroup_14 GROUP::UndefinedGroup_15 GROUP::UndefinedGroup_16 GROUP::UndefinedGroup_17 GROUP::UndefinedGroup_18 GROUP::UndefinedGroup_19 GROUP::UndefinedGroup_2 GROUP::UndefinedGroup_20 GROUP::UndefinedGroup_21 GROUP::UndefinedGroup_22 GROUP::UndefinedGroup_23 GROUP::UndefinedGroup_24 GROUP::UndefinedGroup_25 GROUP::UndefinedGroup_26 GROUP::UndefinedGroup_27 GROUP::UndefinedGroup_28 GROUP::UndefinedGroup_29 GROUP::UndefinedGroup_3 GROUP::UndefinedGroup_30 GROUP::UndefinedGroup_31 GROUP::UndefinedGroup_32 GROUP::UndefinedGroup_33 GROUP::UndefinedGroup_34 GROUP::UndefinedGroup_35 GROUP::UndefinedGroup_36 GROUP::UndefinedGroup_37 GROUP::UndefinedGroup_38 GROUP::UndefinedGroup_39 GROUP::UndefinedGroup_4 GROUP::UndefinedGroup_40 GROUP::UndefinedGroup_41 GROUP::UndefinedGroup_42 GROUP::UndefinedGroup_43 GROUP::UndefinedGroup_44 GROUP::UndefinedGroup_45 GROUP::UndefinedGroup_46 GROUP::UndefinedGroup_47 GROUP::UndefinedGroup_48 GROUP::UndefinedGroup_49 GROUP::UndefinedGroup_5 GROUP::UndefinedGroup_50 GROUP::UndefinedGroup_51 GROUP::UndefinedGroup_52 GROUP::UndefinedGroup_53 GROUP::UndefinedGroup_54 GROUP::UndefinedGroup_55 GROUP::UndefinedGroup_56 GROUP::UndefinedGroup_57 GROUP::UndefinedGroup_58 GROUP::UndefinedGroup_59 GROUP::UndefinedGroup_6 GROUP::UndefinedGroup_60 GROUP::UndefinedGroup_61 GROUP::UndefinedGroup_62 GROUP::UndefinedGroup_63 GROUP::UndefinedGroup_64 GROUP::UndefinedGroup_65 GROUP::UndefinedGroup_66 GROUP::UndefinedGroup_67 GROUP::UndefinedGroup_68 GROUP::UndefinedGroup_69 GROUP::UndefinedGroup_7 GROUP::UndefinedGroup_70 GROUP::UndefinedGroup_71 GROUP::UndefinedGroup_72 GROUP::UndefinedGroup_73 GROUP::UndefinedGroup_8 GROUP::UndefinedGroup_9 REL::UndefinedGroup_11<->UndefinedGroup_15 
REL::UndefinedGroup_14<->UndefinedGroup_63 REL::UndefinedGroup_19<->UndefinedGroup_3 REL::UndefinedGroup_20<->UndefinedGroup_21 REL::UndefinedGroup_21<->UndefinedGroup_55 REL::UndefinedGroup_22<->UndefinedGroup_30 REL::UndefinedGroup_24<->UndefinedGroup_25 REL::UndefinedGroup_24<->UndefinedGroup_46 REL::UndefinedGroup_24<->UndefinedGroup_59 REL::UndefinedGroup_25<->UndefinedGroup_41 REL::UndefinedGroup_26<->UndefinedGroup_32 REL::UndefinedGroup_30<->UndefinedGroup_34 REL::UndefinedGroup_30<->UndefinedGroup_52 REL::UndefinedGroup_33<->UndefinedGroup_41 REL::UndefinedGroup_37<->UndefinedGroup_42 REL::UndefinedGroup_39<->UndefinedGroup_51 REL::UndefinedGroup_3<->UndefinedGroup_44 REL::UndefinedGroup_3<->UndefinedGroup_45 REL::UndefinedGroup_41<->UndefinedGroup_49 REL::UndefinedGroup_43<->UndefinedGroup_57 REL::UndefinedGroup_52<->UndefinedGroup_53 REL::UndefinedGroup_52<->UndefinedGroup_66 REL::UndefinedGroup_53<->UndefinedGroup_9;
COLL::UndefinedGroup -> GROUP::UndefinedGroup;
COLL::UndefinedGroup_44 -> REL::UndefinedGroup_3<->UndefinedGroup_44;
COLL::UndefinedGroup_36 -> GROUP::UndefinedGroup_36;
REL::UndefinedGroup_20<->UndefinedGroup_21 -> GROUP::UndefinedGroup_20 GROUP::UndefinedGroup_21;
REL::UndefinedGroup_24<->UndefinedGroup_25 -> GROUP::UndefinedGroup_24 GROUP::UndefinedGroup_25;
REL::UndefinedGroup_3<->UndefinedGroup_45 -> GROUP::UndefinedGroup_3 GROUP::UndefinedGroup_45;
REL::UndefinedGroup_24<->UndefinedGroup_46 -> GROUP::UndefinedGroup_24 GROUP::UndefinedGroup_46;
REL::UndefinedGroup_30<->UndefinedGroup_34 -> GROUP::UndefinedGroup_30 GROUP::UndefinedGroup_34;
REL::UndefinedGroup_22<->UndefinedGroup_30 -> GROUP::UndefinedGroup_22 GROUP::UndefinedGroup_30;
REL::UndefinedGroup_19<->UndefinedGroup_3 -> GROUP::UndefinedGroup_19 GROUP::UndefinedGroup_3;
REL::UndefinedGroup_21<->UndefinedGroup_55 -> GROUP::UndefinedGroup_21 GROUP::UndefinedGroup_55;
REL::UndefinedGroup_37<->UndefinedGroup_42 -> GROUP::UndefinedGroup_37 GROUP::UndefinedGroup_42;
REL::UndefinedGroup_41<->UndefinedGroup_49 -> GROUP::UndefinedGroup_41 GROUP::UndefinedGroup_49;
REL::UndefinedGroup_25<->UndefinedGroup_41 -> GROUP::UndefinedGroup_25 GROUP::UndefinedGroup_41;
REL::UndefinedGroup_33<->UndefinedGroup_41 -> GROUP::UndefinedGroup_33 GROUP::UndefinedGroup_41;
REL::UndefinedGroup_14<->UndefinedGroup_63 -> GROUP::UndefinedGroup_14 GROUP::UndefinedGroup_63;
REL::UndefinedGroup_52<->UndefinedGroup_66 -> GROUP::UndefinedGroup_52 GROUP::UndefinedGroup_66;
REL::UndefinedGroup_53<->UndefinedGroup_9 -> GROUP::UndefinedGroup_53 GROUP::UndefinedGroup_9;
REL::UndefinedGroup_11<->UndefinedGroup_15 -> GROUP::UndefinedGroup_11 GROUP::UndefinedGroup_15;
REL::UndefinedGroup_39<->UndefinedGroup_51 -> GROUP::UndefinedGroup_39 GROUP::UndefinedGroup_51;
REL::UndefinedGroup_24<->UndefinedGroup_59 -> GROUP::UndefinedGroup_24 GROUP::UndefinedGroup_59;
REL::UndefinedGroup_43<->UndefinedGroup_57 -> GROUP::UndefinedGroup_43 GROUP::UndefinedGroup_57;
REL::UndefinedGroup_30<->UndefinedGroup_52 -> GROUP::UndefinedGroup_30 GROUP::UndefinedGroup_52;
REL::UndefinedGroup_3<->UndefinedGroup_44 -> GROUP::UndefinedGroup_3 GROUP::UndefinedGroup_44;
REL::UndefinedGroup_26<->UndefinedGroup_32 -> GROUP::UndefinedGroup_26 GROUP::UndefinedGroup_32;
REL::UndefinedGroup_52<->UndefinedGroup_53 -> GROUP::UndefinedGroup_52 GROUP::UndefinedGroup_53;
GROUP::UndefinedGroup -> ENT::AGE ENT::HISTORY ENT::SEX ENT::SIGN_SYMPTOM;
GROUP::UndefinedGroup_1 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_2 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_3 -> ENT::BIOLOGICAL_STRUCTURE ENT::COLOR ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TEXTURE ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_4 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_5 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::DISTANCE ENT::DOSAGE ENT::FREQUENCY ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_6 -> ENT::ACTIVITY ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_7 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_8 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SEX ENT::VALUE;
GROUP::UndefinedGroup_9 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::UndefinedGroup_10 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_11 -> ENT::AGE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::FREQUENCY ENT::NONBIOLOGICAL_LOCATION ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_12 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISTANCE ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_13 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_14 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DISTANCE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_15 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::DURATION ENT::SHAPE ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_16 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM ENT::TEXTURE ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_17 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_18 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEX ENT::SHAPE ENT::SIGN_SYMPTOM ENT::TEXTURE ENT::TREATMENT ENT::VALUE ENT::VOLUME;
GROUP::UndefinedGroup_19 -> ENT::ADMINISTRATION ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SEX ENT::SHAPE ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::UndefinedGroup_20 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_21 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_ATTRIBUTE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_22 -> ENT::ADMINISTRATION ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DISTANCE ENT::DOSAGE ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_23 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DOSAGE ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_24 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_25 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SHAPE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_26 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::PERSONAL_BACKGROUND ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::UndefinedGroup_27 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISTANCE ENT::SEX ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_28 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::COLOR ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DOSAGE ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_29 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SEX ENT::SIGN_SYMPTOM ENT::TIME ENT::VALUE;
GROUP::UndefinedGroup_30 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DISTANCE ENT::DURATION ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_31 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_32 -> ENT::ACTIVITY ENT::ADMINISTRATION ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DISTANCE ENT::DOSAGE ENT::NONBIOLOGICAL_LOCATION ENT::SEVERITY ENT::SEX ENT::SHAPE ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::UndefinedGroup_33 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::HISTORY ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM ENT::TEXTURE ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_34 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DISTANCE ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_35 -> ENT::ACTIVITY ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_36 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_37 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SHAPE ENT::SIGN_SYMPTOM ENT::TEXTURE ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_38 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_39 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEVERITY ENT::VALUE;
GROUP::UndefinedGroup_40 -> ENT::AGE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_41 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_42 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_43 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::HISTORY ENT::SIGN_SYMPTOM ENT::TIME ENT::TREATMENT;
GROUP::UndefinedGroup_44 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_45 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DISTANCE ENT::DOSAGE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_46 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::UndefinedGroup_47 -> ENT::AGE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SEX ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_48 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::TREATMENT;
GROUP::UndefinedGroup_49 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_50 -> ENT::ACTIVITY ENT::AGE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DURATION ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_51 -> ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_52 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_53 -> ENT::ADMINISTRATION ENT::AREA ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_54 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISTANCE ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE ENT::VOLUME;
GROUP::UndefinedGroup_55 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DOSAGE ENT::DURATION ENT::FREQUENCY ENT::SEX ENT::SIGN_SYMPTOM ENT::TEXTURE ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_56 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_57 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::CLINICAL_EVENT ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::NONBIOLOGICAL_LOCATION ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_58 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_59 -> ENT::BIOLOGICAL_STRUCTURE ENT::DATE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_60 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::FREQUENCY ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_61 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_62 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::SEX ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_63 -> ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DURATION ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_64 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_65 -> ENT::ADMINISTRATION ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DISEASE_DISORDER ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::UndefinedGroup_66 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SEVERITY ENT::SIGN_SYMPTOM;
GROUP::UndefinedGroup_67 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::VALUE;
GROUP::UndefinedGroup_68 -> ENT::AGE ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DISTANCE ENT::NONBIOLOGICAL_LOCATION ENT::SEVERITY ENT::SEX ENT::SIGN_SYMPTOM ENT::SUBJECT ENT::TEXTURE ENT::TREATMENT;
GROUP::UndefinedGroup_69 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_70 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DOSAGE ENT::SEVERITY ENT::SHAPE ENT::SIGN_SYMPTOM ENT::TREATMENT;
GROUP::UndefinedGroup_71 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::DOSAGE ENT::FREQUENCY ENT::NONBIOLOGICAL_LOCATION ENT::SEVERITY ENT::SIGN_SYMPTOM ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_72 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::DISEASE_DISORDER ENT::SIGN_SYMPTOM ENT::SUBJECT ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_73 -> ENT::DISEASE_DISORDER ENT::VALUE;
The schema is now much smaller, and the groups are more meaningful.
However, not every extracted tree carries useful information, so we could filter the structured instance to keep only the valid trees using schema.extract_valid_trees(new_forest).
Let’s explore the different semantic groups. Each group represents a pattern that recurs across the corpus.
all_datasets = schema.extract_datasets(forest)
group, dataset = max(all_datasets.items(), key=lambda x: len(x[1]))
print(f'Group: {group}')
dataset
Group: UndefinedGroup_3
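To see how the groups in the schema listing overlap, the printed productions can be parsed as plain text and the entities counted. The sketch below is only an illustration: `parse_schema` and `schema_text` are hypothetical names rather than part of the ArchiTXT API, and it assumes each production keeps the `GROUP::name -> ENT::a ENT::b;` form shown above.

```python
from collections import Counter


def parse_schema(text: str) -> dict[str, list[str]]:
    """Parse lines such as 'GROUP::G -> ENT::A ENT::B;' into {'G': ['A', 'B']}."""
    groups: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip().rstrip(';')
        if not line.startswith('GROUP::') or '->' not in line:
            continue  # skip blank lines and anything that is not a group production
        head, body = line.split('->', 1)
        name = head.strip().removeprefix('GROUP::')
        groups[name] = [token.removeprefix('ENT::') for token in body.split()]
    return groups


# Three productions copied from the schema listing above
schema_text = """
GROUP::UndefinedGroup_48 -> ENT::BIOLOGICAL_STRUCTURE ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::TREATMENT;
GROUP::UndefinedGroup_49 -> ENT::DETAILED_DESCRIPTION ENT::DIAGNOSTIC_PROCEDURE ENT::TREATMENT ENT::VALUE;
GROUP::UndefinedGroup_73 -> ENT::DISEASE_DISORDER ENT::VALUE;
"""

groups = parse_schema(schema_text)

# Count in how many groups each entity appears
entity_counts = Counter(entity for entities in groups.values() for entity in entities)
for entity, count in entity_counts.most_common():
    print(f'{entity}: used in {count} group(s)')
```

Entities that appear in nearly every group, such as DETAILED_DESCRIPTION in the full listing, are weak discriminators, so a count like this can hint at which groups are worth inspecting first.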
Export as a property graph#
Now that the corpus has been structured into a database instance, we can export the result as a property graph.
ArchiTXT can export structured data, such as a single tree or a whole forest, directly to a property graph.
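from architxt.database.export import export_cypher
from neo4j import GraphDatabase
# `uri` must hold the Bolt URI of a running Neo4j instance; adapt the credentials to your setup
driver = GraphDatabase.driver(uri, auth=('neo4j', 'password'))
# Write the forest to Neo4j through an open session
with driver.session() as session:
    export_cypher(forest, session=session)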
Let’s explore the generated graph database.
from yfiles_jupyter_graphs_for_neo4j import Neo4jGraphWidget
g = Neo4jGraphWidget(driver)
g.show_cypher("""
MATCH (n)
OPTIONAL MATCH path = (n)-[*..4]-()
RETURN n, path
LIMIT 50
""")
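Besides the visual widget, a quick sanity check on the export is to count the nodes per label. The following is a minimal sketch, not part of ArchiTXT: `count_nodes_by_label` and `LABEL_COUNT_QUERY` are hypothetical names, and the labels actually present depend on what the export created. It relies only on the session's `run()` method returning records that can be indexed by column name.

```python
LABEL_COUNT_QUERY = """
MATCH (n)
UNWIND labels(n) AS label
RETURN label, count(*) AS total
ORDER BY total DESC
"""


def count_nodes_by_label(session) -> dict[str, int]:
    """Return {label: node count} for any object exposing a neo4j-style run()."""
    return {record['label']: record['total'] for record in session.run(LABEL_COUNT_QUERY)}


# Usage, assuming `driver` is the Neo4j driver created for the export:
# with driver.session() as session:
#     for label, total in count_nodes_by_label(session).items():
#         print(f'{label}: {total}')
```

Keeping the query in a module-level constant makes it easy to reuse in the graph widget or in the Neo4j browser.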