architxt.cli.loader
Functions
- load_corpus: Automatically structure a corpus as a database instance and print the database schema as a CFG.
- load_database: Extract the database schema and relations to a tree format.
- load_document: Read and parse a document file into a structured tree.
- architxt.cli.loader.load_corpus(corpus_path=typer.Argument(..., exists=True, readable=True, help='Path to the input corpus.'), *, language=typer.Option(['French'], help='Language of the input corpus.'), corenlp_url=typer.Option('http://localhost:9000', help='URL of the CoreNLP server.'), tau=typer.Option(0.7, help='The similarity threshold.', min=0, max=1), epoch=typer.Option(100, help='Number of iteration for tree rewriting.', min=1), min_support=typer.Option(20, help='Minimum support for tree patterns.', min=1), gen_instances=typer.Option(0, help='Number of synthetic instances to generate.', min=0), sample=typer.Option(None, help='Number of sentences to sample from the corpus.', min=1), workers=typer.Option(None, help='Number of parallel worker processes to use. Defaults to the number of available CPU cores.', min=1), resolver=typer.Option(None, help='The entity resolver to use when loading the corpus.', click_type=click.Choice(['umls', 'mesh', 'rxnorm', 'go', 'hpo'], case_sensitive=False)), output=typer.Option(None, help='Path to save the result.'), cache=typer.Option(True, help='Enable caching of the analyzed corpus to prevent re-parsing.'), shuffle=typer.Option(False, help='Shuffle the corpus data before processing to introduce randomness.'), debug=typer.Option(False, help='Enable debug mode for more verbose output.'), metrics=typer.Option(False, help='Show metrics of the simplification.'), log=typer.Option(False, help='Enable logging to MLFlow.'))
Automatically structure a corpus as a database instance and print the database schema as a CFG.
- Return type:
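
Because `load_corpus` is declared with Typer, each keyword parameter in the signature is exposed as a command-line option. The sketch below illustrates how the Python parameter names would map to flags, assuming Typer's default naming applies (underscores become dashes, boolean options get a `--no-` counterpart, list options are passed by repeating the flag). `to_flags` is a hypothetical helper written for this illustration, not part of architxt, and `corpus.txt` is a placeholder path. The name of the command under which `load_corpus` is registered is not shown on this page, so only the arguments are built.

```python
def to_flags(options: dict) -> list[str]:
    """Turn Python parameter names into Typer-style command-line flags.

    Hypothetical helper for illustration only; it assumes Typer's default
    option naming and is not part of architxt.
    """
    args: list[str] = []
    for name, value in options.items():
        dashed = name.replace("_", "-")
        if isinstance(value, bool):
            # Boolean options are exposed as a --name / --no-name pair.
            args.append(f"--{dashed}" if value else f"--no-{dashed}")
        elif isinstance(value, (list, tuple)):
            # List options are passed by repeating the flag once per item.
            for item in value:
                args += [f"--{dashed}", str(item)]
        elif value is not None:
            # None means "leave the option at its default", so it is skipped.
            args += [f"--{dashed}", str(value)]
    return args


options = {
    "language": ["French"],
    "tau": 0.7,           # similarity threshold, must lie in [0, 1]
    "min_support": 20,    # must be at least 1
    "cache": False,
    "metrics": True,
    "sample": None,
}

# The positional corpus path comes first; "corpus.txt" is a placeholder.
argv = ["corpus.txt", *to_flags(options)]
print(argv)
```

The resulting list can be appended to whatever command the installed package exposes for this function.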
- architxt.cli.loader.load_database(db_connection=typer.Argument(..., help='Database connection string.'), *, simplify_association=typer.Option(True, help='Simplify association tables.'), sample=typer.Option(None, help='Number of sentences to sample from the corpus.', min=1), output=typer.Option(None, help='Path to save the result.'))
Extract the database schema and relations to a tree format.
- Return type:
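
The `simplify_association` option refers to association tables, that is, tables whose only purpose is to link rows of other tables. How architxt detects or simplifies them is not described on this page; the sketch below only illustrates the concept with one common heuristic (every column of the table belongs to a foreign key). It uses SQLite because it ships with Python, and the `student`, `course`, and `enrollment` tables as well as the `is_association_table` helper are hypothetical names chosen for the example.

```python
import sqlite3

# In-memory database with two entity tables and one table linking them.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(id),
        course_id INTEGER REFERENCES course(id),
        PRIMARY KEY (student_id, course_id)
    );
    """
)


def is_association_table(conn: sqlite3.Connection, table: str) -> bool:
    """Heuristic (an assumption, not architxt's rule): all columns are foreign keys."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    fk_columns = {row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
    return bool(columns) and columns == fk_columns


# Table names come from the database catalog itself, not from user input.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
association_tables = [t for t in tables if is_association_table(conn, t)]
print(association_tables)
```

Here only `enrollment` qualifies, since `student` and `course` each carry a column that is not a foreign key.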
- architxt.cli.loader.load_document(file=typer.Argument(..., exists=True, readable=True, help='The document file to read.'), *, raw=typer.Option(False, help='Enable row reading, skipping any transformation to convert it to the metamodel.'), root_name=typer.Option('ROOT', help='The root node name.'), sample=typer.Option(None, help='Number of sentences to sample from the corpus.', min=1), output=typer.Option(None, help='Path to save the result.'))
Read and parse a document file into a structured tree.
- Return type: