Integrating multiple databases#
In this example, we use ArchiTXT to integrate two databases (Northwind and Chinook).
Load databases#
Northwind#
The Northwind sample database was provided with Microsoft Access as a tutorial schema for managing small business customers, orders, inventory, purchasing, suppliers, shipping, and employees. Northwind is an excellent tutorial schema for a small-business ERP, with customers, orders, inventory, purchasing, suppliers, shipping, employees, and single-entry accounting.
--- config: theme: neutral --- erDiagram CustomerCustomerDemo }o--|| CustomerDemographics : "has" CustomerCustomerDemo }o--|| Customers : "relates to" Employees }|--o| Employees : "reports to" Employees ||--o{ EmployeeTerritories : "assigned to" Orders }o--|| Shippers : "ships via" Orders }o--|| Customers : "placed by" Orders }o--|| Employees : "sold by" "Order Details" }o--|| Orders : "contain" "Order Details" }o--|| Products : "refer to" Products }o--|| Categories : "categorized in" Products }o--|| Suppliers : "supplied by" Territories }|--|| Regions : "located in" EmployeeTerritories }o--|| Territories : "assigned to" Categories { int CategoryID PK string CategoryName string Description blob Picture } CustomerCustomerDemo { string CustomerID PK, FK string CustomerTypeID PK, FK } CustomerDemographics { string CustomerTypeID PK string CustomerDesc } Customers { string CustomerID PK string CompanyName string ContactName string ContactTitle string Address string City string Region string PostalCode string Country string Phone string Fax } Employees { int EmployeeID PK string LastName string FirstName string Title string TitleOfCourtesy date BirthDate date HireDate string Address string City string Region string PostalCode string Country string HomePhone string Extension blob Photo string Notes int ReportsTo FK string PhotoPath } EmployeeTerritories { int EmployeeID PK, FK int TerritoryID PK, FK } "Order Details" { int OrderID PK, FK int ProductID PK, FK float UnitPrice int Quantity real Discount } Orders { int OrderID PK string CustomerID FK int EmployeeID FK datetime OrderDate datetime RequiredDate datetime ShippedDate int ShipVia FK numeric Freight string ShipName string ShipAddress string ShipCity string ShipRegion string ShipPostalCode string ShipCountry } Products { int ProductID PK string ProductName int SupplierID FK int CategoryID FK int QuantityPerUnit float UnitPrice int UnitsInStock int UnitsOnOrder int ReorderLevel string Discontinued } Regions { int RegionID PK string RegionDescription } Shippers { int ShipperID PK string CompanyName string Phone } Suppliers { int SupplierID PK string CompanyName string ContactName string ContactTitle string Address string City string Region string PostalCode string Country string Phone string Fax string HomePage } Territories { string TerritoryID PK string TerritoryDescription int RegionID FK }
Show code cell source
import urllib.request
from pathlib import Path
db_file = Path('northwind.db')
if not db_file.exists():
urllib.request.urlretrieve(
'https://github.com/jpwhite3/northwind-SQLite3/raw/refs/heads/main/dist/northwind.db',
filename=db_file,
)
from architxt.database.loader import read_sql
from sqlalchemy import create_engine
with create_engine('sqlite:///northwind.db').connect() as connection:
northwind_forest = list(read_sql(connection, sample=20))
Chinook#
The Chinook Sample Database is a fictional digital media store database used for learning and testing SQL queries. It is modeled after real-world online music stores like iTunes and includes data related to artists, albums, tracks, customers, invoices, employees, and genres.
--- config: theme: neutral --- erDiagram Artist ||--|{ Album : "hasAuthor" Employee ||--|{ Customer : "supports" Employee ||--|{ Employee : "reportsTo" Customer ||--|{ Invoice : "makeOrder" Invoice ||--|{ InvoiceLine : "contains" Track ||--|{ InvoiceLine : "soldIn" Playlist ||--|{ PlaylistTrack : "includes" Track ||--|{ PlaylistTrack : "appearsIn" Album ||--|{ Track : "has" Genre ||--|{ Track : "hasGenre" MediaType ||--|{ Track : "encodedAs" Album { int AlbumId PK string Title int ArtistId FK } Artist { int ArtistId PK string Name } Customer { int CustomerId PK string FirstName string LastName string Company string Address string City string State string Country string PostalCode string Phone string Fax string Email int SupportRepId FK } Employee { int EmployeeId PK string LastName string FirstName string Title int ReportsTo FK date BirthDate date HireDate string Address string City string State string Country string PostalCode string Phone string Fax string Email } Genre { int GenreId PK string Name } Invoice { int InvoiceId PK int CustomerId FK date InvoiceDate string BillingAddress string BillingCity string BillingState string BillingCountry string BillingPostalCode float Total } InvoiceLine { int InvoiceLineId PK int InvoiceId FK int TrackId FK float UnitPrice int Quantity } MediaType { int MediaTypeId PK string Name } Playlist { int PlaylistId PK string Name } PlaylistTrack { int PlaylistId PK, FK int TrackId PK, FK } Track { int TrackId PK string Name int AlbumId FK int MediaTypeId FK int GenreId FK string Composer int Milliseconds int Bytes float UnitPrice }
Show code cell source
db_file = Path('chinook.db')
if not db_file.exists():
urllib.request.urlretrieve(
'https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite',
filename=db_file,
)
with create_engine('sqlite:///chinook.db').connect() as connection:
chinook_forest = list(read_sql(connection, sample=10))
Integration#
We combine the two forests representing the databases and inspect the schema.
from architxt.schema import Schema
forest = northwind_forest + chinook_forest
schema = Schema.from_forest(forest)
print(schema.as_cfg())
ROOT -> GROUP::Album GROUP::Artist GROUP::Categories GROUP::Customer GROUP::Customers GROUP::Employee GROUP::Employees GROUP::Genre GROUP::Invoice GROUP::InvoiceLine GROUP::MediaType GROUP::Order_Details GROUP::Orders GROUP::Playlist GROUP::Products GROUP::Regions GROUP::Shippers GROUP::Suppliers GROUP::Territories GROUP::Track REL::Album<->Artist REL::Customer<->Employee REL::Employee<->Employee REL::EmployeeTerritories REL::Employees<->Employees REL::Invoice<->Customer REL::InvoiceLine<->Invoice REL::InvoiceLine<->Track REL::Order_Details<->Orders REL::Order_Details<->Products REL::Orders<->Customers REL::Orders<->Employees REL::Orders<->Shippers REL::PlaylistTrack REL::Products<->Categories REL::Products<->Suppliers REL::Territories<->Regions REL::Track<->Album REL::Track<->Genre REL::Track<->MediaType;
REL::Employees<->Employees -> GROUP::Employees GROUP::Employees;
REL::EmployeeTerritories -> GROUP::Employees GROUP::Territories;
REL::Territories<->Regions -> GROUP::Regions GROUP::Territories;
REL::Order_Details<->Orders -> GROUP::Order_Details GROUP::Orders;
REL::Orders<->Customers -> GROUP::Customers GROUP::Orders;
REL::Orders<->Employees -> GROUP::Employees GROUP::Orders;
REL::Orders<->Shippers -> GROUP::Orders GROUP::Shippers;
REL::Order_Details<->Products -> GROUP::Order_Details GROUP::Products;
REL::Products<->Categories -> GROUP::Categories GROUP::Products;
REL::Products<->Suppliers -> GROUP::Products GROUP::Suppliers;
REL::PlaylistTrack -> GROUP::Playlist GROUP::Track;
REL::Track<->Album -> GROUP::Album GROUP::Track;
REL::Album<->Artist -> GROUP::Album GROUP::Artist;
REL::Track<->Genre -> GROUP::Genre GROUP::Track;
REL::Track<->MediaType -> GROUP::MediaType GROUP::Track;
REL::InvoiceLine<->Invoice -> GROUP::Invoice GROUP::InvoiceLine;
REL::Invoice<->Customer -> GROUP::Customer GROUP::Invoice;
REL::Customer<->Employee -> GROUP::Customer GROUP::Employee;
REL::Employee<->Employee -> GROUP::Employee GROUP::Employee;
REL::InvoiceLine<->Track -> GROUP::InvoiceLine GROUP::Track;
GROUP::Employees -> ENT::Address ENT::BirthDate ENT::City ENT::Country ENT::EmployeeID ENT::Extension ENT::FirstName ENT::HireDate ENT::HomePhone ENT::LastName ENT::Notes ENT::Photo ENT::PhotoPath ENT::PostalCode ENT::Region ENT::Title ENT::TitleOfCourtesy;
GROUP::Territories -> ENT::TerritoryDescription ENT::TerritoryID;
GROUP::Regions -> ENT::RegionDescription ENT::RegionID;
GROUP::Order_Details -> ENT::Discount ENT::Quantity ENT::UnitPrice;
GROUP::Orders -> ENT::Freight ENT::OrderDate ENT::OrderID ENT::RequiredDate ENT::ShipAddress ENT::ShipCity ENT::ShipCountry ENT::ShipName ENT::ShipPostalCode ENT::ShipRegion ENT::ShippedDate;
GROUP::Customers -> ENT::Address ENT::City ENT::CompanyName ENT::ContactName ENT::ContactTitle ENT::Country ENT::CustomerID ENT::Fax ENT::Phone ENT::PostalCode ENT::Region;
GROUP::Shippers -> ENT::CompanyName ENT::Phone ENT::ShipperID;
GROUP::Products -> ENT::Discontinued ENT::ProductID ENT::ProductName ENT::QuantityPerUnit ENT::ReorderLevel ENT::UnitPrice ENT::UnitsInStock ENT::UnitsOnOrder;
GROUP::Categories -> ENT::CategoryID ENT::CategoryName ENT::Description ENT::Picture;
GROUP::Suppliers -> ENT::Address ENT::City ENT::CompanyName ENT::ContactName ENT::ContactTitle ENT::Country ENT::Fax ENT::HomePage ENT::Phone ENT::PostalCode ENT::Region ENT::SupplierID;
GROUP::Playlist -> ENT::Name ENT::PlaylistId;
GROUP::Track -> ENT::Bytes ENT::Composer ENT::Milliseconds ENT::Name ENT::TrackId ENT::UnitPrice;
GROUP::Album -> ENT::AlbumId ENT::Title;
GROUP::Artist -> ENT::ArtistId ENT::Name;
GROUP::Genre -> ENT::GenreId ENT::Name;
GROUP::MediaType -> ENT::MediaTypeId ENT::Name;
GROUP::InvoiceLine -> ENT::InvoiceLineId ENT::Quantity ENT::UnitPrice;
GROUP::Invoice -> ENT::BillingAddress ENT::BillingCity ENT::BillingCountry ENT::BillingPostalCode ENT::InvoiceDate ENT::InvoiceId ENT::Total;
GROUP::Customer -> ENT::Address ENT::City ENT::Company ENT::Country ENT::CustomerId ENT::Email ENT::Fax ENT::FirstName ENT::LastName ENT::Phone ENT::PostalCode ENT::State;
GROUP::Employee -> ENT::Address ENT::BirthDate ENT::City ENT::Country ENT::Email ENT::EmployeeId ENT::Fax ENT::FirstName ENT::HireDate ENT::LastName ENT::Phone ENT::PostalCode ENT::State ENT::Title;
Explore the schema#
The schema is complex, and some groups like Orders
and Invoice
may be merged to simplify the instance.
We first need to rename semantically equivalent properties to make them uniform across both databases.
We’ll use a mapping dictionary to rename entities and standardize them.
from architxt.tree import NodeLabel, NodeType, has_type
entity_mapping = {
# Track / Product
'TrackId': 'ProductID',
# Invoice / Order
'InvoiceId': 'OrderID',
'InvoiceDate': 'OrderDate',
'ShipAddress': 'BillingAddress',
'ShipCity': 'BillingCity',
'ShipCountry': 'BillingCountry',
'ShipPostalCode': 'BillingPostalCode',
'BillingState': 'State',
'ShipRegion': 'Region',
# Employees / Customers
'CompanyName': 'Company',
'HomePhone': 'Phone',
'CustomerId': 'CustomerID',
'EmployeeId': 'EmployeeID',
}
for tree in forest:
for subtree in tree.subtrees():
if has_type(subtree, NodeType.ENT) and subtree.label.name in entity_mapping:
subtree.label = NodeLabel(NodeType.ENT, entity_mapping[subtree.label.name])
schema = Schema.from_forest(forest)
print(schema.as_cfg())
ROOT -> GROUP::Album GROUP::Artist GROUP::Categories GROUP::Customer GROUP::Customers GROUP::Employee GROUP::Employees GROUP::Genre GROUP::Invoice GROUP::InvoiceLine GROUP::MediaType GROUP::Order_Details GROUP::Orders GROUP::Playlist GROUP::Products GROUP::Regions GROUP::Shippers GROUP::Suppliers GROUP::Territories GROUP::Track REL::Album<->Artist REL::Customer<->Employee REL::Employee<->Employee REL::EmployeeTerritories REL::Employees<->Employees REL::Invoice<->Customer REL::InvoiceLine<->Invoice REL::InvoiceLine<->Track REL::Order_Details<->Orders REL::Order_Details<->Products REL::Orders<->Customers REL::Orders<->Employees REL::Orders<->Shippers REL::PlaylistTrack REL::Products<->Categories REL::Products<->Suppliers REL::Territories<->Regions REL::Track<->Album REL::Track<->Genre REL::Track<->MediaType;
REL::Employees<->Employees -> GROUP::Employees GROUP::Employees;
REL::EmployeeTerritories -> GROUP::Employees GROUP::Territories;
REL::Territories<->Regions -> GROUP::Regions GROUP::Territories;
REL::Order_Details<->Orders -> GROUP::Order_Details GROUP::Orders;
REL::Orders<->Customers -> GROUP::Customers GROUP::Orders;
REL::Orders<->Employees -> GROUP::Employees GROUP::Orders;
REL::Orders<->Shippers -> GROUP::Orders GROUP::Shippers;
REL::Order_Details<->Products -> GROUP::Order_Details GROUP::Products;
REL::Products<->Categories -> GROUP::Categories GROUP::Products;
REL::Products<->Suppliers -> GROUP::Products GROUP::Suppliers;
REL::PlaylistTrack -> GROUP::Playlist GROUP::Track;
REL::Track<->Album -> GROUP::Album GROUP::Track;
REL::Album<->Artist -> GROUP::Album GROUP::Artist;
REL::Track<->Genre -> GROUP::Genre GROUP::Track;
REL::Track<->MediaType -> GROUP::MediaType GROUP::Track;
REL::InvoiceLine<->Invoice -> GROUP::Invoice GROUP::InvoiceLine;
REL::Invoice<->Customer -> GROUP::Customer GROUP::Invoice;
REL::Customer<->Employee -> GROUP::Customer GROUP::Employee;
REL::Employee<->Employee -> GROUP::Employee GROUP::Employee;
REL::InvoiceLine<->Track -> GROUP::InvoiceLine GROUP::Track;
GROUP::Employees -> ENT::Address ENT::BirthDate ENT::City ENT::Country ENT::EmployeeID ENT::Extension ENT::FirstName ENT::HireDate ENT::LastName ENT::Notes ENT::Phone ENT::Photo ENT::PhotoPath ENT::PostalCode ENT::Region ENT::Title ENT::TitleOfCourtesy;
GROUP::Territories -> ENT::TerritoryDescription ENT::TerritoryID;
GROUP::Regions -> ENT::RegionDescription ENT::RegionID;
GROUP::Order_Details -> ENT::Discount ENT::Quantity ENT::UnitPrice;
GROUP::Orders -> ENT::BillingAddress ENT::BillingCity ENT::BillingCountry ENT::BillingPostalCode ENT::Freight ENT::OrderDate ENT::OrderID ENT::Region ENT::RequiredDate ENT::ShipName ENT::ShippedDate;
GROUP::Customers -> ENT::Address ENT::City ENT::Company ENT::ContactName ENT::ContactTitle ENT::Country ENT::CustomerID ENT::Fax ENT::Phone ENT::PostalCode ENT::Region;
GROUP::Shippers -> ENT::Company ENT::Phone ENT::ShipperID;
GROUP::Products -> ENT::Discontinued ENT::ProductID ENT::ProductName ENT::QuantityPerUnit ENT::ReorderLevel ENT::UnitPrice ENT::UnitsInStock ENT::UnitsOnOrder;
GROUP::Categories -> ENT::CategoryID ENT::CategoryName ENT::Description ENT::Picture;
GROUP::Suppliers -> ENT::Address ENT::City ENT::Company ENT::ContactName ENT::ContactTitle ENT::Country ENT::Fax ENT::HomePage ENT::Phone ENT::PostalCode ENT::Region ENT::SupplierID;
GROUP::Playlist -> ENT::Name ENT::PlaylistId;
GROUP::Track -> ENT::Bytes ENT::Composer ENT::Milliseconds ENT::Name ENT::ProductID ENT::UnitPrice;
GROUP::Album -> ENT::AlbumId ENT::Title;
GROUP::Artist -> ENT::ArtistId ENT::Name;
GROUP::Genre -> ENT::GenreId ENT::Name;
GROUP::MediaType -> ENT::MediaTypeId ENT::Name;
GROUP::InvoiceLine -> ENT::InvoiceLineId ENT::Quantity ENT::UnitPrice;
GROUP::Invoice -> ENT::BillingAddress ENT::BillingCity ENT::BillingCountry ENT::BillingPostalCode ENT::OrderDate ENT::OrderID ENT::Total;
GROUP::Customer -> ENT::Address ENT::City ENT::Company ENT::Country ENT::CustomerID ENT::Email ENT::Fax ENT::FirstName ENT::LastName ENT::Phone ENT::PostalCode ENT::State;
GROUP::Employee -> ENT::Address ENT::BirthDate ENT::City ENT::Country ENT::Email ENT::EmployeeID ENT::Fax ENT::FirstName ENT::HireDate ENT::LastName ENT::Phone ENT::PostalCode ENT::State ENT::Title;
Now that we’ve standardized the property names, we can visualize the initial distribution of groups in the schema.
To do this, let’s count how many instances of each group exist in the schema.
Show code cell source
from collections import Counter
import pandas as pd
from architxt.tree import NodeType, has_type
group_counter = Counter(
subtree.label.name for tree in forest for subtree in tree.subtrees() if has_type(subtree, NodeType.GROUP)
)
df_initial = pd.DataFrame.from_dict(group_counter, orient='index', columns=['Count'])
df_initial = df_initial.sort_values('Count', ascending=False)
df_initial.loc['Total'] = group_counter.total()
df_initial.index.name = 'Group'
df_initial
Loading ITables v2.4.2 from the internet... (need help?) |
Next, we can use ArchiTXT’s tree clustering algorithm to identify which groups can be merged.
from architxt.similarity import equiv_cluster
clusters = equiv_cluster(forest, tau=0.85)
Let’s visualize the clustering result in a new table.
new_groups = {
cluster_name: Counter(subtree.label.name for subtree in cluster if has_type(subtree, NodeType.GROUP))
for cluster_name, cluster in clusters.items()
}
df_new = pd.DataFrame.from_dict(new_groups, orient='index').fillna(0).astype(int)
df_new['Total'] = df_new.sum(axis=1)
df_new = df_new.sort_values('Total', ascending=False)
df_new.index.name = 'Class'
df_new
Loading ITables v2.4.2 from the internet... (need help?) |
Let’s visualize the clustering result as a bar chart to better understand the distribution of groups across equivalent classes.
Show code cell source
import plotly.express as px
fig = px.bar(
df_new.drop(columns=['Total']),
labels={'value': 'Count', 'variable': 'Group', 'index': 'Class'},
title='Groups repartition per Equivalent Class',
)
fig.update_layout(barmode='stack', xaxis_title='Equivalent Class', yaxis_title='Count', xaxis_tickangle=-45)
fig.show()
Simplification#
Now that we saw potential groupings, we can use ArchiTXT to perform simplification and rewrite the forest into a condensed version.
from architxt.simplification.tree_rewriting import rewrite
rewrite(forest, tau=0.85, min_support=10, epoch=10, edit_ops=[])
2025/06/23 17:25:05 WARNING mlflow.tracing.processor.mlflow: Creating a trace within the default experiment with id '0'. It is strongly recommended to not use the default experiment to log traces due to ambiguous search results and probable performance issues over time due to directory table listing performance degradation with high volumes of directories within a specific path. To avoid performance and disambiguation issues, set the experiment for your environment using `mlflow.set_experiment()` API.
2025/06/23 17:25:05 WARNING mlflow.tracing.processor.mlflow: Creating a trace within the default experiment with id '0'. It is strongly recommended to not use the default experiment to log traces due to ambiguous search results and probable performance issues over time due to directory table listing performance degradation with high volumes of directories within a specific path. To avoid performance and disambiguation issues, set the experiment for your environment using `mlflow.set_experiment()` API.
<architxt.metrics.Metrics at 0x7fb3e4972540>
new_schema = Schema.from_forest(forest)
print(new_schema.as_cfg())
ROOT -> COLL::Employees_0 GROUP::Album_10 GROUP::Artist_11 GROUP::Employees_0 GROUP::Genre_12 GROUP::InvoiceLine_14 GROUP::MediaType_13 GROUP::Order_Details_3 GROUP::Orders_4 GROUP::Playlist_8 GROUP::Products_7 GROUP::Regions_2 GROUP::Shippers_6 GROUP::Suppliers_5 GROUP::Territories_1 GROUP::Track_9 REL::Album_10<->Artist_11 REL::Album_10<->Track_9 REL::Employees_0<->Employees_0 REL::Employees_0<->Orders_4 REL::Employees_0<->Territories_1 REL::Genre_12<->Track_9 REL::InvoiceLine_14<->Orders_4 REL::InvoiceLine_14<->Track_9 REL::MediaType_13<->Track_9 REL::Order_Details_3<->Orders_4 REL::Order_Details_3<->Products_7 REL::Orders_4<->Shippers_6 REL::Orders_4<->Suppliers_5 REL::Playlist_8<->Track_9 REL::Products_7<->Products_7 REL::Products_7<->Suppliers_5 REL::Regions_2<->Territories_1;
COLL::Employees_0 -> GROUP::Employees_0;
REL::Employees_0<->Employees_0 -> GROUP::Employees_0 GROUP::Employees_0;
REL::Employees_0<->Territories_1 -> GROUP::Employees_0 GROUP::Territories_1;
REL::Regions_2<->Territories_1 -> GROUP::Regions_2 GROUP::Territories_1;
REL::Order_Details_3<->Orders_4 -> GROUP::Order_Details_3 GROUP::Orders_4;
REL::Orders_4<->Suppliers_5 -> GROUP::Orders_4 GROUP::Suppliers_5;
REL::Employees_0<->Orders_4 -> GROUP::Employees_0 GROUP::Orders_4;
REL::Orders_4<->Shippers_6 -> GROUP::Orders_4 GROUP::Shippers_6;
REL::Order_Details_3<->Products_7 -> GROUP::Order_Details_3 GROUP::Products_7;
REL::Products_7<->Products_7 -> GROUP::Products_7 GROUP::Products_7;
REL::Products_7<->Suppliers_5 -> GROUP::Products_7 GROUP::Suppliers_5;
REL::Playlist_8<->Track_9 -> GROUP::Playlist_8 GROUP::Track_9;
REL::Album_10<->Track_9 -> GROUP::Album_10 GROUP::Track_9;
REL::Album_10<->Artist_11 -> GROUP::Album_10 GROUP::Artist_11;
REL::Genre_12<->Track_9 -> GROUP::Genre_12 GROUP::Track_9;
REL::MediaType_13<->Track_9 -> GROUP::MediaType_13 GROUP::Track_9;
REL::InvoiceLine_14<->Orders_4 -> GROUP::InvoiceLine_14 GROUP::Orders_4;
REL::InvoiceLine_14<->Track_9 -> GROUP::InvoiceLine_14 GROUP::Track_9;
GROUP::Employees_0 -> ENT::Address ENT::BirthDate ENT::City ENT::Company ENT::Country ENT::CustomerID ENT::Email ENT::EmployeeID ENT::Extension ENT::Fax ENT::FirstName ENT::HireDate ENT::LastName ENT::Notes ENT::Phone ENT::Photo ENT::PhotoPath ENT::PostalCode ENT::Region ENT::State ENT::Title ENT::TitleOfCourtesy;
GROUP::Territories_1 -> ENT::TerritoryDescription ENT::TerritoryID;
GROUP::Regions_2 -> ENT::RegionDescription ENT::RegionID;
GROUP::Order_Details_3 -> ENT::Discount ENT::Quantity ENT::UnitPrice;
GROUP::Orders_4 -> ENT::BillingAddress ENT::BillingCity ENT::BillingCountry ENT::BillingPostalCode ENT::Freight ENT::OrderDate ENT::OrderID ENT::Region ENT::RequiredDate ENT::ShipName ENT::ShippedDate ENT::Total;
GROUP::Suppliers_5 -> ENT::Address ENT::City ENT::Company ENT::ContactName ENT::ContactTitle ENT::Country ENT::CustomerID ENT::Fax ENT::HomePage ENT::Phone ENT::PostalCode ENT::Region ENT::SupplierID;
GROUP::Shippers_6 -> ENT::Company ENT::Phone ENT::ShipperID;
GROUP::Products_7 -> ENT::CategoryID ENT::CategoryName ENT::Description ENT::Discontinued ENT::Picture ENT::ProductID ENT::ProductName ENT::QuantityPerUnit ENT::ReorderLevel ENT::UnitPrice ENT::UnitsInStock ENT::UnitsOnOrder;
GROUP::Playlist_8 -> ENT::Name ENT::PlaylistId;
GROUP::Track_9 -> ENT::Bytes ENT::Composer ENT::Milliseconds ENT::Name ENT::ProductID ENT::UnitPrice;
GROUP::Album_10 -> ENT::AlbumId ENT::Title;
GROUP::Artist_11 -> ENT::ArtistId ENT::Name;
GROUP::Genre_12 -> ENT::GenreId ENT::Name;
GROUP::MediaType_13 -> ENT::MediaTypeId ENT::Name;
GROUP::InvoiceLine_14 -> ENT::InvoiceLineId ENT::Quantity ENT::UnitPrice;
all_datasets = new_schema.extract_datasets(forest)
group, dataset = max(all_datasets.items(), key=lambda x: len(x[1]))
print(f'Group: {group}')
dataset
Group: Employees_0
Loading ITables v2.4.2 from the internet... (need help?) |