Integrating multiple databases#

In this example, we use ArchiTXT to integrate two databases (Northwind and Chinook).

Load databases#

Northwind#

The Northwind sample database was provided with Microsoft Access as a tutorial schema for managing small business customers, orders, inventory, purchasing, suppliers, shipping, and employees. Northwind is an excellent tutorial schema for a small-business ERP, with customers, orders, inventory, purchasing, suppliers, shipping, employees, and single-entry accounting.

            ---
    config:
        theme: neutral
    ---
    erDiagram
        CustomerCustomerDemo }o--|| CustomerDemographics : "has"
        CustomerCustomerDemo }o--|| Customers : "relates to"

        Employees }|--o| Employees : "reports to"
        Employees ||--o{ EmployeeTerritories : "assigned to"

        Orders }o--|| Shippers : "ships via"
        Orders }o--|| Customers : "placed by"
        Orders }o--|| Employees : "sold by"

        "Order Details" }o--|| Orders : "contain"
        "Order Details" }o--|| Products : "refer to"

        Products }o--|| Categories : "categorized in"
        Products }o--|| Suppliers : "supplied by"

        Territories }|--|| Regions : "located in"
        EmployeeTerritories }o--|| Territories : "assigned to"

        Categories {
            int     CategoryID      PK
            string  CategoryName
            string  Description
            blob    Picture
        }

        CustomerCustomerDemo {
            string  CustomerID       PK, FK
            string  CustomerTypeID   PK, FK
        }

        CustomerDemographics {
            string  CustomerTypeID   PK
            string  CustomerDesc
        }

        Customers {
            string  CustomerID       PK
            string  CompanyName
            string  ContactName
            string  ContactTitle
            string  Address
            string  City
            string  Region
            string  PostalCode
            string  Country
            string  Phone
            string  Fax
        }

        Employees {
            int     EmployeeID       PK
            string  LastName
            string  FirstName
            string  Title
            string  TitleOfCourtesy
            date    BirthDate
            date    HireDate
            string  Address
            string  City
            string  Region
            string  PostalCode
            string  Country
            string  HomePhone
            string  Extension
            blob    Photo
            string  Notes
            int     ReportsTo        FK
            string  PhotoPath
        }

        EmployeeTerritories {
            int     EmployeeID       PK, FK
            int     TerritoryID      PK, FK
        }

        "Order Details" {
            int     OrderID          PK, FK
            int     ProductID        PK, FK
            float   UnitPrice
            int     Quantity
            real    Discount
        }

        Orders {
            int     OrderID          PK
            string  CustomerID       FK
            int     EmployeeID       FK
            datetime OrderDate
            datetime RequiredDate
            datetime ShippedDate
            int     ShipVia          FK
            numeric Freight
            string  ShipName
            string  ShipAddress
            string  ShipCity
            string  ShipRegion
            string  ShipPostalCode
            string  ShipCountry
        }

        Products {
            int     ProductID        PK
            string  ProductName
            int     SupplierID       FK
            int     CategoryID       FK
            int     QuantityPerUnit
            float   UnitPrice
            int     UnitsInStock
            int     UnitsOnOrder
            int     ReorderLevel
            string  Discontinued
        }

        Regions {
            int     RegionID         PK
            string  RegionDescription
        }

        Shippers {
            int     ShipperID        PK
            string  CompanyName
            string  Phone
        }

        Suppliers {
            int     SupplierID       PK
            string  CompanyName
            string  ContactName
            string  ContactTitle
            string  Address
            string  City
            string  Region
            string  PostalCode
            string  Country
            string  Phone
            string  Fax
            string  HomePage
        }

        Territories {
            string  TerritoryID      PK
            string  TerritoryDescription
            int     RegionID         FK
        }
    
Hide code cell source
import urllib.request
from pathlib import Path

db_file = Path('northwind.db')

if not db_file.exists():
    urllib.request.urlretrieve(
        'https://github.com/jpwhite3/northwind-SQLite3/raw/refs/heads/main/dist/northwind.db',
        filename=db_file,
    )
from architxt.database.loader import read_sql
from sqlalchemy import create_engine

with create_engine('sqlite:///northwind.db').connect() as connection:
    northwind_forest = list(read_sql(connection, sample=20))

Chinook#

The Chinook Sample Database is a fictional digital media store database used for learning and testing SQL queries. It is modeled after real-world online music stores like iTunes and includes data related to artists, albums, tracks, customers, invoices, employees, and genres.

            ---
    config:
        theme: neutral
    ---
    erDiagram
        Artist ||--|{ Album : "hasAuthor"

        Employee ||--|{ Customer : "supports"
        Employee ||--|{ Employee : "reportsTo"

        Customer ||--|{ Invoice : "makeOrder"
        Invoice ||--|{ InvoiceLine : "contains"
        Track ||--|{ InvoiceLine : "soldIn"

        Playlist ||--|{ PlaylistTrack : "includes"
        Track ||--|{ PlaylistTrack : "appearsIn"

        Album ||--|{ Track : "has"
        Genre ||--|{ Track : "hasGenre"
        MediaType ||--|{ Track : "encodedAs"

        Album {
            int     AlbumId   PK
            string  Title
            int     ArtistId  FK
        }
        Artist {
            int     ArtistId  PK
            string  Name
        }
        Customer {
            int     CustomerId    PK
            string  FirstName
            string  LastName
            string  Company
            string  Address
            string  City
            string  State
            string  Country
            string  PostalCode
            string  Phone
            string  Fax
            string  Email
            int     SupportRepId  FK
        }
        Employee {
            int     EmployeeId  PK
            string  LastName
            string  FirstName
            string  Title
            int     ReportsTo   FK
            date    BirthDate
            date    HireDate
            string  Address
            string  City
            string  State
            string  Country
            string  PostalCode
            string  Phone
            string  Fax
            string  Email
        }
        Genre {
            int     GenreId     PK
            string  Name
        }
        Invoice {
            int     InvoiceId           PK
            int     CustomerId          FK
            date    InvoiceDate
            string  BillingAddress
            string  BillingCity
            string  BillingState
            string  BillingCountry
            string  BillingPostalCode
            float   Total
        }
        InvoiceLine {
            int     InvoiceLineId   PK
            int     InvoiceId       FK
            int     TrackId         FK
            float   UnitPrice
            int     Quantity
        }
        MediaType {
            int     MediaTypeId PK
            string  Name
        }
        Playlist {
            int     PlaylistId  PK
            string  Name
        }
        PlaylistTrack {
            int PlaylistId  PK, FK
            int TrackId     PK, FK
        }
        Track {
            int     TrackId     PK
            string  Name
            int     AlbumId     FK
            int     MediaTypeId FK
            int     GenreId     FK
            string  Composer
            int     Milliseconds
            int     Bytes
            float   UnitPrice
        }
    
Hide code cell source
db_file = Path('chinook.db')

if not db_file.exists():
    urllib.request.urlretrieve(
        'https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite',
        filename=db_file,
    )
with create_engine('sqlite:///chinook.db').connect() as connection:
    chinook_forest = list(read_sql(connection, sample=10))

Integration#

We combine the two forests representing the databases and inspect the schema.

from architxt.schema import Schema

forest = northwind_forest + chinook_forest
schema = Schema.from_forest(forest)
print(schema.as_cfg())
ROOT -> GROUP::Album GROUP::Artist GROUP::Categories GROUP::Customer GROUP::Customers GROUP::Employee GROUP::Employees GROUP::Genre GROUP::Invoice GROUP::InvoiceLine GROUP::MediaType GROUP::Order_Details GROUP::Orders GROUP::Playlist GROUP::Products GROUP::Regions GROUP::Shippers GROUP::Suppliers GROUP::Territories GROUP::Track REL::Album<->Artist REL::Customer<->Employee REL::Employee<->Employee REL::EmployeeTerritories REL::Employees<->Employees REL::Invoice<->Customer REL::InvoiceLine<->Invoice REL::InvoiceLine<->Track REL::Order_Details<->Orders REL::Order_Details<->Products REL::Orders<->Customers REL::Orders<->Employees REL::Orders<->Shippers REL::PlaylistTrack REL::Products<->Categories REL::Products<->Suppliers REL::Territories<->Regions REL::Track<->Album REL::Track<->Genre REL::Track<->MediaType;
REL::Employees<->Employees -> GROUP::Employees GROUP::Employees;
REL::EmployeeTerritories -> GROUP::Employees GROUP::Territories;
REL::Territories<->Regions -> GROUP::Regions GROUP::Territories;
REL::Order_Details<->Orders -> GROUP::Order_Details GROUP::Orders;
REL::Orders<->Customers -> GROUP::Customers GROUP::Orders;
REL::Orders<->Employees -> GROUP::Employees GROUP::Orders;
REL::Orders<->Shippers -> GROUP::Orders GROUP::Shippers;
REL::Order_Details<->Products -> GROUP::Order_Details GROUP::Products;
REL::Products<->Categories -> GROUP::Categories GROUP::Products;
REL::Products<->Suppliers -> GROUP::Products GROUP::Suppliers;
REL::PlaylistTrack -> GROUP::Playlist GROUP::Track;
REL::Track<->Album -> GROUP::Album GROUP::Track;
REL::Album<->Artist -> GROUP::Album GROUP::Artist;
REL::Track<->Genre -> GROUP::Genre GROUP::Track;
REL::Track<->MediaType -> GROUP::MediaType GROUP::Track;
REL::InvoiceLine<->Invoice -> GROUP::Invoice GROUP::InvoiceLine;
REL::Invoice<->Customer -> GROUP::Customer GROUP::Invoice;
REL::Customer<->Employee -> GROUP::Customer GROUP::Employee;
REL::Employee<->Employee -> GROUP::Employee GROUP::Employee;
REL::InvoiceLine<->Track -> GROUP::InvoiceLine GROUP::Track;
GROUP::Employees -> ENT::Address ENT::BirthDate ENT::City ENT::Country ENT::EmployeeID ENT::Extension ENT::FirstName ENT::HireDate ENT::HomePhone ENT::LastName ENT::Notes ENT::Photo ENT::PhotoPath ENT::PostalCode ENT::Region ENT::Title ENT::TitleOfCourtesy;
GROUP::Territories -> ENT::TerritoryDescription ENT::TerritoryID;
GROUP::Regions -> ENT::RegionDescription ENT::RegionID;
GROUP::Order_Details -> ENT::Discount ENT::Quantity ENT::UnitPrice;
GROUP::Orders -> ENT::Freight ENT::OrderDate ENT::OrderID ENT::RequiredDate ENT::ShipAddress ENT::ShipCity ENT::ShipCountry ENT::ShipName ENT::ShipPostalCode ENT::ShipRegion ENT::ShippedDate;
GROUP::Customers -> ENT::Address ENT::City ENT::CompanyName ENT::ContactName ENT::ContactTitle ENT::Country ENT::CustomerID ENT::Fax ENT::Phone ENT::PostalCode ENT::Region;
GROUP::Shippers -> ENT::CompanyName ENT::Phone ENT::ShipperID;
GROUP::Products -> ENT::Discontinued ENT::ProductID ENT::ProductName ENT::QuantityPerUnit ENT::ReorderLevel ENT::UnitPrice ENT::UnitsInStock ENT::UnitsOnOrder;
GROUP::Categories -> ENT::CategoryID ENT::CategoryName ENT::Description ENT::Picture;
GROUP::Suppliers -> ENT::Address ENT::City ENT::CompanyName ENT::ContactName ENT::ContactTitle ENT::Country ENT::Fax ENT::HomePage ENT::Phone ENT::PostalCode ENT::Region ENT::SupplierID;
GROUP::Playlist -> ENT::Name ENT::PlaylistId;
GROUP::Track -> ENT::Bytes ENT::Composer ENT::Milliseconds ENT::Name ENT::TrackId ENT::UnitPrice;
GROUP::Album -> ENT::AlbumId ENT::Title;
GROUP::Artist -> ENT::ArtistId ENT::Name;
GROUP::Genre -> ENT::GenreId ENT::Name;
GROUP::MediaType -> ENT::MediaTypeId ENT::Name;
GROUP::InvoiceLine -> ENT::InvoiceLineId ENT::Quantity ENT::UnitPrice;
GROUP::Invoice -> ENT::BillingAddress ENT::BillingCity ENT::BillingCountry ENT::BillingPostalCode ENT::InvoiceDate ENT::InvoiceId ENT::Total;
GROUP::Customer -> ENT::Address ENT::City ENT::Company ENT::Country ENT::CustomerId ENT::Email ENT::Fax ENT::FirstName ENT::LastName ENT::Phone ENT::PostalCode ENT::State;
GROUP::Employee -> ENT::Address ENT::BirthDate ENT::City ENT::Country ENT::Email ENT::EmployeeId ENT::Fax ENT::FirstName ENT::HireDate ENT::LastName ENT::Phone ENT::PostalCode ENT::State ENT::Title;

Explore the schema#

The schema is complex, and some groups like Orders and Invoice may be merged to simplify the instance. We first need to rename semantically equivalent properties to make them uniform across both databases.

We’ll use a mapping dictionary to rename entities and standardize them.

from architxt.tree import NodeLabel, NodeType, has_type

entity_mapping = {
    # Track / Product
    'TrackId': 'ProductID',
    # Invoice / Order
    'InvoiceId': 'OrderID',
    'InvoiceDate': 'OrderDate',
    'ShipAddress': 'BillingAddress',
    'ShipCity': 'BillingCity',
    'ShipCountry': 'BillingCountry',
    'ShipPostalCode': 'BillingPostalCode',
    'BillingState': 'State',
    'ShipRegion': 'Region',
    # Employees / Customers
    'CompanyName': 'Company',
    'HomePhone': 'Phone',
    'CustomerId': 'CustomerID',
    'EmployeeId': 'EmployeeID',
}

for tree in forest:
    for subtree in tree.subtrees():
        if has_type(subtree, NodeType.ENT) and subtree.label.name in entity_mapping:
            subtree.label = NodeLabel(NodeType.ENT, entity_mapping[subtree.label.name])

schema = Schema.from_forest(forest)
print(schema.as_cfg())
ROOT -> GROUP::Album GROUP::Artist GROUP::Categories GROUP::Customer GROUP::Customers GROUP::Employee GROUP::Employees GROUP::Genre GROUP::Invoice GROUP::InvoiceLine GROUP::MediaType GROUP::Order_Details GROUP::Orders GROUP::Playlist GROUP::Products GROUP::Regions GROUP::Shippers GROUP::Suppliers GROUP::Territories GROUP::Track REL::Album<->Artist REL::Customer<->Employee REL::Employee<->Employee REL::EmployeeTerritories REL::Employees<->Employees REL::Invoice<->Customer REL::InvoiceLine<->Invoice REL::InvoiceLine<->Track REL::Order_Details<->Orders REL::Order_Details<->Products REL::Orders<->Customers REL::Orders<->Employees REL::Orders<->Shippers REL::PlaylistTrack REL::Products<->Categories REL::Products<->Suppliers REL::Territories<->Regions REL::Track<->Album REL::Track<->Genre REL::Track<->MediaType;
REL::Employees<->Employees -> GROUP::Employees GROUP::Employees;
REL::EmployeeTerritories -> GROUP::Employees GROUP::Territories;
REL::Territories<->Regions -> GROUP::Regions GROUP::Territories;
REL::Order_Details<->Orders -> GROUP::Order_Details GROUP::Orders;
REL::Orders<->Customers -> GROUP::Customers GROUP::Orders;
REL::Orders<->Employees -> GROUP::Employees GROUP::Orders;
REL::Orders<->Shippers -> GROUP::Orders GROUP::Shippers;
REL::Order_Details<->Products -> GROUP::Order_Details GROUP::Products;
REL::Products<->Categories -> GROUP::Categories GROUP::Products;
REL::Products<->Suppliers -> GROUP::Products GROUP::Suppliers;
REL::PlaylistTrack -> GROUP::Playlist GROUP::Track;
REL::Track<->Album -> GROUP::Album GROUP::Track;
REL::Album<->Artist -> GROUP::Album GROUP::Artist;
REL::Track<->Genre -> GROUP::Genre GROUP::Track;
REL::Track<->MediaType -> GROUP::MediaType GROUP::Track;
REL::InvoiceLine<->Invoice -> GROUP::Invoice GROUP::InvoiceLine;
REL::Invoice<->Customer -> GROUP::Customer GROUP::Invoice;
REL::Customer<->Employee -> GROUP::Customer GROUP::Employee;
REL::Employee<->Employee -> GROUP::Employee GROUP::Employee;
REL::InvoiceLine<->Track -> GROUP::InvoiceLine GROUP::Track;
GROUP::Employees -> ENT::Address ENT::BirthDate ENT::City ENT::Country ENT::EmployeeID ENT::Extension ENT::FirstName ENT::HireDate ENT::LastName ENT::Notes ENT::Phone ENT::Photo ENT::PhotoPath ENT::PostalCode ENT::Region ENT::Title ENT::TitleOfCourtesy;
GROUP::Territories -> ENT::TerritoryDescription ENT::TerritoryID;
GROUP::Regions -> ENT::RegionDescription ENT::RegionID;
GROUP::Order_Details -> ENT::Discount ENT::Quantity ENT::UnitPrice;
GROUP::Orders -> ENT::BillingAddress ENT::BillingCity ENT::BillingCountry ENT::BillingPostalCode ENT::Freight ENT::OrderDate ENT::OrderID ENT::Region ENT::RequiredDate ENT::ShipName ENT::ShippedDate;
GROUP::Customers -> ENT::Address ENT::City ENT::Company ENT::ContactName ENT::ContactTitle ENT::Country ENT::CustomerID ENT::Fax ENT::Phone ENT::PostalCode ENT::Region;
GROUP::Shippers -> ENT::Company ENT::Phone ENT::ShipperID;
GROUP::Products -> ENT::Discontinued ENT::ProductID ENT::ProductName ENT::QuantityPerUnit ENT::ReorderLevel ENT::UnitPrice ENT::UnitsInStock ENT::UnitsOnOrder;
GROUP::Categories -> ENT::CategoryID ENT::CategoryName ENT::Description ENT::Picture;
GROUP::Suppliers -> ENT::Address ENT::City ENT::Company ENT::ContactName ENT::ContactTitle ENT::Country ENT::Fax ENT::HomePage ENT::Phone ENT::PostalCode ENT::Region ENT::SupplierID;
GROUP::Playlist -> ENT::Name ENT::PlaylistId;
GROUP::Track -> ENT::Bytes ENT::Composer ENT::Milliseconds ENT::Name ENT::ProductID ENT::UnitPrice;
GROUP::Album -> ENT::AlbumId ENT::Title;
GROUP::Artist -> ENT::ArtistId ENT::Name;
GROUP::Genre -> ENT::GenreId ENT::Name;
GROUP::MediaType -> ENT::MediaTypeId ENT::Name;
GROUP::InvoiceLine -> ENT::InvoiceLineId ENT::Quantity ENT::UnitPrice;
GROUP::Invoice -> ENT::BillingAddress ENT::BillingCity ENT::BillingCountry ENT::BillingPostalCode ENT::OrderDate ENT::OrderID ENT::Total;
GROUP::Customer -> ENT::Address ENT::City ENT::Company ENT::Country ENT::CustomerID ENT::Email ENT::Fax ENT::FirstName ENT::LastName ENT::Phone ENT::PostalCode ENT::State;
GROUP::Employee -> ENT::Address ENT::BirthDate ENT::City ENT::Country ENT::Email ENT::EmployeeID ENT::Fax ENT::FirstName ENT::HireDate ENT::LastName ENT::Phone ENT::PostalCode ENT::State ENT::Title;

Now that we’ve standardized the property names, we can visualize the initial distribution of groups in the schema.

To do this, let’s count how many instances of each group exist in the schema.

Hide code cell source
from collections import Counter

import pandas as pd
from architxt.tree import NodeType, has_type

group_counter = Counter(
    subtree.label.name for tree in forest for subtree in tree.subtrees() if has_type(subtree, NodeType.GROUP)
)

df_initial = pd.DataFrame.from_dict(group_counter, orient='index', columns=['Count'])
df_initial = df_initial.sort_values('Count', ascending=False)
df_initial.loc['Total'] = group_counter.total()
df_initial.index.name = 'Group'

df_initial
Loading ITables v2.4.2 from the internet... (need help?)

Next, we can use ArchiTXT’s tree clustering algorithm to identify which groups can be merged.

from architxt.similarity import equiv_cluster

clusters = equiv_cluster(forest, tau=0.85)

Let’s visualize the clustering result in a new table.

new_groups = {
    cluster_name: Counter(subtree.label.name for subtree in cluster if has_type(subtree, NodeType.GROUP))
    for cluster_name, cluster in clusters.items()
}

df_new = pd.DataFrame.from_dict(new_groups, orient='index').fillna(0).astype(int)
df_new['Total'] = df_new.sum(axis=1)
df_new = df_new.sort_values('Total', ascending=False)
df_new.index.name = 'Class'

df_new
Loading ITables v2.4.2 from the internet... (need help?)

Let’s visualize the clustering result as a bar chart to better understand the distribution of groups across equivalent classes.

Hide code cell source
import plotly.express as px

fig = px.bar(
    df_new.drop(columns=['Total']),
    labels={'value': 'Count', 'variable': 'Group', 'index': 'Class'},
    title='Groups repartition per Equivalent Class',
)

fig.update_layout(barmode='stack', xaxis_title='Equivalent Class', yaxis_title='Count', xaxis_tickangle=-45)
fig.show()

Simplification#

Now that we saw potential groupings, we can use ArchiTXT to perform simplification and rewrite the forest into a condensed version.

from architxt.simplification.tree_rewriting import rewrite

rewrite(forest, tau=0.85, min_support=10, epoch=10, edit_ops=[])
2025/06/23 17:25:05 WARNING mlflow.tracing.processor.mlflow: Creating a trace within the default experiment with id '0'. It is strongly recommended to not use the default experiment to log traces due to ambiguous search results and probable performance issues over time due to directory table listing performance degradation with high volumes of directories within a specific path. To avoid performance and disambiguation issues, set the experiment for your environment using `mlflow.set_experiment()` API.
2025/06/23 17:25:05 WARNING mlflow.tracing.processor.mlflow: Creating a trace within the default experiment with id '0'. It is strongly recommended to not use the default experiment to log traces due to ambiguous search results and probable performance issues over time due to directory table listing performance degradation with high volumes of directories within a specific path. To avoid performance and disambiguation issues, set the experiment for your environment using `mlflow.set_experiment()` API.
<architxt.metrics.Metrics at 0x7fb3e4972540>
new_schema = Schema.from_forest(forest)
print(new_schema.as_cfg())
ROOT -> COLL::Employees_0 GROUP::Album_10 GROUP::Artist_11 GROUP::Employees_0 GROUP::Genre_12 GROUP::InvoiceLine_14 GROUP::MediaType_13 GROUP::Order_Details_3 GROUP::Orders_4 GROUP::Playlist_8 GROUP::Products_7 GROUP::Regions_2 GROUP::Shippers_6 GROUP::Suppliers_5 GROUP::Territories_1 GROUP::Track_9 REL::Album_10<->Artist_11 REL::Album_10<->Track_9 REL::Employees_0<->Employees_0 REL::Employees_0<->Orders_4 REL::Employees_0<->Territories_1 REL::Genre_12<->Track_9 REL::InvoiceLine_14<->Orders_4 REL::InvoiceLine_14<->Track_9 REL::MediaType_13<->Track_9 REL::Order_Details_3<->Orders_4 REL::Order_Details_3<->Products_7 REL::Orders_4<->Shippers_6 REL::Orders_4<->Suppliers_5 REL::Playlist_8<->Track_9 REL::Products_7<->Products_7 REL::Products_7<->Suppliers_5 REL::Regions_2<->Territories_1;
COLL::Employees_0 -> GROUP::Employees_0;
REL::Employees_0<->Employees_0 -> GROUP::Employees_0 GROUP::Employees_0;
REL::Employees_0<->Territories_1 -> GROUP::Employees_0 GROUP::Territories_1;
REL::Regions_2<->Territories_1 -> GROUP::Regions_2 GROUP::Territories_1;
REL::Order_Details_3<->Orders_4 -> GROUP::Order_Details_3 GROUP::Orders_4;
REL::Orders_4<->Suppliers_5 -> GROUP::Orders_4 GROUP::Suppliers_5;
REL::Employees_0<->Orders_4 -> GROUP::Employees_0 GROUP::Orders_4;
REL::Orders_4<->Shippers_6 -> GROUP::Orders_4 GROUP::Shippers_6;
REL::Order_Details_3<->Products_7 -> GROUP::Order_Details_3 GROUP::Products_7;
REL::Products_7<->Products_7 -> GROUP::Products_7 GROUP::Products_7;
REL::Products_7<->Suppliers_5 -> GROUP::Products_7 GROUP::Suppliers_5;
REL::Playlist_8<->Track_9 -> GROUP::Playlist_8 GROUP::Track_9;
REL::Album_10<->Track_9 -> GROUP::Album_10 GROUP::Track_9;
REL::Album_10<->Artist_11 -> GROUP::Album_10 GROUP::Artist_11;
REL::Genre_12<->Track_9 -> GROUP::Genre_12 GROUP::Track_9;
REL::MediaType_13<->Track_9 -> GROUP::MediaType_13 GROUP::Track_9;
REL::InvoiceLine_14<->Orders_4 -> GROUP::InvoiceLine_14 GROUP::Orders_4;
REL::InvoiceLine_14<->Track_9 -> GROUP::InvoiceLine_14 GROUP::Track_9;
GROUP::Employees_0 -> ENT::Address ENT::BirthDate ENT::City ENT::Company ENT::Country ENT::CustomerID ENT::Email ENT::EmployeeID ENT::Extension ENT::Fax ENT::FirstName ENT::HireDate ENT::LastName ENT::Notes ENT::Phone ENT::Photo ENT::PhotoPath ENT::PostalCode ENT::Region ENT::State ENT::Title ENT::TitleOfCourtesy;
GROUP::Territories_1 -> ENT::TerritoryDescription ENT::TerritoryID;
GROUP::Regions_2 -> ENT::RegionDescription ENT::RegionID;
GROUP::Order_Details_3 -> ENT::Discount ENT::Quantity ENT::UnitPrice;
GROUP::Orders_4 -> ENT::BillingAddress ENT::BillingCity ENT::BillingCountry ENT::BillingPostalCode ENT::Freight ENT::OrderDate ENT::OrderID ENT::Region ENT::RequiredDate ENT::ShipName ENT::ShippedDate ENT::Total;
GROUP::Suppliers_5 -> ENT::Address ENT::City ENT::Company ENT::ContactName ENT::ContactTitle ENT::Country ENT::CustomerID ENT::Fax ENT::HomePage ENT::Phone ENT::PostalCode ENT::Region ENT::SupplierID;
GROUP::Shippers_6 -> ENT::Company ENT::Phone ENT::ShipperID;
GROUP::Products_7 -> ENT::CategoryID ENT::CategoryName ENT::Description ENT::Discontinued ENT::Picture ENT::ProductID ENT::ProductName ENT::QuantityPerUnit ENT::ReorderLevel ENT::UnitPrice ENT::UnitsInStock ENT::UnitsOnOrder;
GROUP::Playlist_8 -> ENT::Name ENT::PlaylistId;
GROUP::Track_9 -> ENT::Bytes ENT::Composer ENT::Milliseconds ENT::Name ENT::ProductID ENT::UnitPrice;
GROUP::Album_10 -> ENT::AlbumId ENT::Title;
GROUP::Artist_11 -> ENT::ArtistId ENT::Name;
GROUP::Genre_12 -> ENT::GenreId ENT::Name;
GROUP::MediaType_13 -> ENT::MediaTypeId ENT::Name;
GROUP::InvoiceLine_14 -> ENT::InvoiceLineId ENT::Quantity ENT::UnitPrice;
all_datasets = new_schema.extract_datasets(forest)
group, dataset = max(all_datasets.items(), key=lambda x: len(x[1]))

print(f'Group: {group}')

dataset
Group: Employees_0
Loading ITables v2.4.2 from the internet... (need help?)