architxt.nlp.parser
Functions
- enrich_tree(tree, sentence, entities, relations) – Enriches a syntactic tree (tree) by inserting entities and relationships, and removing unused subtrees.
- fix_all_coord(tree) – Fix all coordination structures in a tree.
- fix_conj(tree, pos) – Fix conjunction structures in a tree at the specified position pos.
- fix_coord(tree, pos) – Fix the coordination structure in a tree at the specified position pos.
- ins_ent(tree, tree_ent) – Insert a tree entity into the appropriate position within a parented tree.
- is_conflicting_entity(entity, entity_span, computed_spans, tree) – Check for conflicts with other entities (overlapping or duplicate spans).
- process_tree(sentence, tree, *, resolver=None)
- resolve_tree(tree, resolver) – Resolve entities in a tree using the provided entity resolver.
- unnest_ent(tree, pos) – Un-nest an entity in a tree at the specified position pos.
Classes
- class architxt.nlp.parser.Parser
Bases: ABC
- async parse(sentence, *, language, resolver=None)
Parse an annotated sentence into an enriched syntax tree.
This method takes an annotated sentence, parses it into a syntax tree, and enriches the tree by fixing coordination structures, adding extra information (entities and relations), and applying reductions. An external entity resolver can optionally be used to unify entities and relations.
- Parameters:
sentence (AnnotatedSentence) – The annotated sentence to parse.
language (str) – The language to use for parsing.
resolver (Optional[EntityResolver]) – An optional entity resolver used to resolve entities within the parsed trees. If None, no entity resolution is performed.
- Return type:
Optional[Tree]
- Returns:
An enriched tree object.
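- Example:
# Parser is abstract (raw_parse is an abstractmethod); use a concrete subclass in practice.
with Parser(corenlp_url="http://localhost:9000") as parser:
    tree = await parser.parse(sentence, language='English')  # parse is a coroutine
    print(tree)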
- async parse_batch(sentences, language, resolver=None, batch_size=128)
Parse a batch of annotated sentences into enriched syntax trees.
This function processes an iterable (or asynchronous iterable) of sentences, parses each sentence into a syntax tree, enriches the tree by resolving coordination structures, and applies further enhancements like entity and relation enrichment. Optionally, an external entity resolver can be used to unify entities and relations across sentences.
- Parameters:
sentences (Union[Iterable[AnnotatedSentence], AsyncIterable[AnnotatedSentence]]) – An iterable or asynchronous iterable of AnnotatedSentence objects to be parsed.
language (str) – The language to use for parsing.
resolver (Optional[EntityResolver]) – An optional entity resolver used to resolve entities within the parsed trees. If None, no entity resolution is performed.
batch_size (int) – The maximum number of parsing tasks that can run concurrently; at most batch_size elements are read from the input iterable at a time.
- Yields:
A tuple of the original AnnotatedSentence and its enriched Tree. Each sentence is parsed independently, and results are yielded as they become available.
- Example:
with Parser(corenlp_url="http://localhost:9000") as parser:
    async for sentence, tree in parser.parse_batch(sentences, language="English"):
        print(sentence)
        print(tree)
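The batch_size contract described above (at most batch_size parses in flight, results yielded as soon as they are ready, sync or async input) can be sketched with plain asyncio. This is a minimal sketch under stated assumptions, not ArchiTXT's implementation: bounded_map and fake_parse are hypothetical names, and fake_parse merely stands in for an awaitable per-sentence parse such as Parser.parse.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator, Awaitable, Callable, Iterable
from typing import TypeVar, Union

T = TypeVar("T")
R = TypeVar("R")


async def _as_async(items: Union[Iterable[T], AsyncIterable[T]]) -> AsyncIterator[T]:
    """Accept either a plain or an asynchronous iterable."""
    if isinstance(items, AsyncIterable):
        async for item in items:
            yield item
    else:
        for item in items:
            yield item


async def _paired(item: T, func: Callable[[T], Awaitable[R]]) -> tuple[T, R]:
    """Await func(item) and return the input alongside its result."""
    return item, await func(item)


async def bounded_map(items, func, batch_size: int = 128):
    """Hypothetical helper: yield (item, result) pairs as tasks finish, holding at most batch_size tasks."""
    pending = set()
    async for item in _as_async(items):
        pending.add(asyncio.create_task(_paired(item, func)))
        if len(pending) >= batch_size:
            # Stop pulling input until at least one task has finished.
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                yield task.result()
    # Input exhausted: drain the remaining tasks.
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            yield task.result()


in_flight = 0
peak = 0


async def fake_parse(sentence: str) -> str:
    """Hypothetical stand-in for a per-sentence parse; records peak concurrency."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return f"(S {sentence})"


async def main() -> list:
    sentences = ["Alice likes apples", "Bob likes pears", "Carol likes plums"]
    return [pair async for pair in bounded_map(sentences, fake_parse, batch_size=2)]


results = asyncio.run(main())
```

Because results are yielded in completion order, each result is paired with its input, mirroring the (AnnotatedSentence, Tree) tuples that parse_batch yields.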
- abstractmethod raw_parse(sentences, *, language, batch_size=64)
Parse sentences into syntax trees using a CoreNLP server.
- Returns:
The parse trees of the sentences.
- Example:
with Parser(corenlp_url="http://localhost:9000") as parser:
    for tree in parser.raw_parse(sentences, language='English'):
        print(tree)
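Because Parser derives from ABC and declares raw_parse as an abstractmethod, Parser itself cannot be instantiated, so the examples in this section must, in practice, be run with a concrete subclass. The sketch below illustrates that contract with a hypothetical stand-in hierarchy (BaseParser, FlatParser) instead of importing architxt; FlatParser yields flat bracketed strings, not real CoreNLP parses.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable, Iterator


class BaseParser(ABC):
    """Hypothetical stand-in mirroring the documented raw_parse contract."""

    @abstractmethod
    def raw_parse(self, sentences: Iterable[str], *, language: str, batch_size: int = 64) -> Iterator[str]:
        """Turn raw sentences into bracketed parse strings."""


class FlatParser(BaseParser):
    """Toy subclass: one flat (S ...) bracket per sentence, not a real constituency parse."""

    def raw_parse(self, sentences, *, language, batch_size=64):
        # language and batch_size are accepted for signature compatibility but ignored here.
        for sentence in sentences:
            yield f"(S {sentence})"


try:
    BaseParser()
    instantiated = True
except TypeError:
    # ABCMeta refuses to instantiate a class that still has abstract methods.
    instantiated = False

trees = list(FlatParser().raw_parse(["Alice likes apples"], language="English"))
print(instantiated, trees)
```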
- architxt.nlp.parser.enrich_tree(tree, sentence, entities, relations)
Enriches a syntactic tree (tree) by inserting entities and relationships, and removing unused subtrees.
The function inserts the entities into the tree, un-nests entities as needed, and finally deletes any subtrees that are not part of the enriched structure. The relations are accepted but currently not used.
- Parameters:
tree (Tree) – A tree representing the syntactic tree to enrich.
sentence (str) – The original sentence from which the tree is derived.
entities (list[Entity]) – A list of Entity objects to be inserted into the tree.
relations (list[Relation]) – A list of Relation objects representing the relationships between entities (currently not used).
>>> t = Tree.fromstring("(S (NP Alice) (VP (VB likes) (NP (NNS apples) (CCONJ and) (NNS oranges))))")
>>> e1 = Entity(name="person", start=0, end=5, id="E1")
>>> e2 = Entity(name="fruit", start=12, end=18, id="E2")
>>> e3 = Entity(name="fruit", start=23, end=30, id="E3")
>>> enrich_tree(t, "Alice likes apples and oranges", [e1, e2, e3], [])
>>> print(t.pformat(margin=255))
(S (ENT::person Alice) (VP (NP (ENT::fruit apples) (ENT::fruit oranges))))

>>> t = Tree.fromstring("(S (NP XXX) (NP YYY))")
>>> e1 = Entity(name="nested1", start=0, end=3, id="E1")
>>> e2 = Entity(name="nested2", start=4, end=7, id="E2")
>>> e3 = Entity(name="overlap", start=0, end=7, id="E3")
>>> enrich_tree(t, "XXX YYY", [e1, e2, e3], [])
>>> print(t.pformat(margin=255))
(S (REL (ENT::overlap XXX YYY) (nested (ENT::nested1 XXX) (ENT::nested2 YYY))))
- Return type:
None
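The entities passed to enrich_tree carry character offsets (start, end) into sentence, while the tree is indexed by leaves. The sketch below shows one way to bridge the two, assuming the tree's leaves appear in order as substrings of sentence; token_offsets and span_to_leaves are hypothetical helpers, and this is not necessarily how enrich_tree performs the mapping.

```python
def token_offsets(sentence: str, tokens: list[str]) -> list[tuple[int, int]]:
    """Return the (start, end) character span of each token, scanning left to right."""
    offsets = []
    cursor = 0
    for token in tokens:
        start = sentence.index(token, cursor)  # raises ValueError if the token is missing
        end = start + len(token)
        offsets.append((start, end))
        cursor = end
    return offsets


def span_to_leaves(start: int, end: int, offsets: list[tuple[int, int]]) -> list[int]:
    """Indices of the tokens overlapping the half-open character span [start, end)."""
    return [i for i, (s, e) in enumerate(offsets) if s < end and e > start]


# Offsets taken from the enrich_tree examples above.
sentence = "Alice likes apples and oranges"
offsets = token_offsets(sentence, sentence.split())
person = span_to_leaves(0, 5, offsets)  # [0] -> "Alice"
fruits = [span_to_leaves(12, 18, offsets), span_to_leaves(23, 30, offsets)]  # [[2], [4]]
overlap = span_to_leaves(0, 7, token_offsets("XXX YYY", ["XXX", "YYY"]))  # [0, 1]
```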
- architxt.nlp.parser.fix_all_coord(tree)
Fix all coordination structures in a tree.
This function iteratively applies fix_coord and fix_conj to the tree until no further modifications can be made. It ensures that the tree adheres to the correct syntactic structure for coordination and conjunctions.
- Parameters:
tree (Tree) – The tree in which coordination structures will be fixed.
>>> t = Tree.fromstring("(S (NP Alice) (VP (VB eats) (NP (NNS apples) (COORD (CCONJ and) (NP (NNS oranges))))))")
>>> fix_all_coord(t)
>>> print(t.pformat(margin=255))
(S (NP Alice) (VP (VB eats) (CONJ (NP (NNS apples)) (NP (NNS oranges)))))

>>> t2 = Tree.fromstring("(S (NP Alice) (VP (VB eats) (NP (NNS apples) (COORD (CCONJ and) (NP (NNS oranges) (COORD (CCONJ and) (NP (NNS bananas))))))))")
>>> fix_all_coord(t2)
>>> print(t2.pformat(margin=255))
(S (NP Alice) (VP (VB eats) (CONJ (NP (NNS apples)) (NP (NNS oranges)) (NP (NNS bananas)))))
- Return type:
None
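The fixpoint strategy of fix_all_coord can be made concrete on a toy tree type: rewrite COORD attachments into CONJ nodes (the fix_coord step), flatten CONJ nodes nested directly in CONJ nodes (the fix_conj step), and repeat until a full pass changes nothing. This is a minimal sketch that reproduces the two examples above, not ArchiTXT's code: Node, parse_brackets, coord_rule, conj_rule and fix_all are hypothetical stand-ins for nltk trees and the documented functions, and the rules only cover the patterns shown in those examples.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a labeled tree node; leaves are plain strings."""
    label: str
    children: list = field(default_factory=list)

    def __str__(self) -> str:
        return f"({self.label} {' '.join(str(c) for c in self.children)})"


def parse_brackets(text: str) -> Node:
    """Parse a bracketed string such as "(S (NP Alice))" into Node objects."""
    stack = []
    for tok in text.replace("(", " ( ").replace(")", " ) ").split():
        if tok == "(":
            stack.append(Node(""))  # the label is set by the next token
        elif tok == ")":
            node = stack.pop()
            if not stack:
                return node
            stack[-1].children.append(node)
        elif not stack[-1].label:
            stack[-1].label = tok
        else:
            stack[-1].children.append(tok)
    raise ValueError("unbalanced brackets")


def is_node(x, label=None) -> bool:
    return isinstance(x, Node) and (label is None or x.label == label)


def coord_rule(tree: Node, pos: int) -> bool:
    """(X a... (COORD (CCONJ c) B)) -> (CONJ (X a...) B), dropping the conjunction."""
    node = tree.children[pos]
    if not is_node(node) or not node.children or not is_node(node.children[-1], "COORD"):
        return False
    conjuncts = [c for c in node.children[-1].children if not is_node(c, "CCONJ")]
    tree.children[pos] = Node("CONJ", [Node(node.label, node.children[:-1]), *conjuncts])
    return True


def conj_rule(tree: Node, pos: int) -> bool:
    """Splice CONJ nodes nested directly inside a CONJ node into their parent."""
    node = tree.children[pos]
    if not is_node(node, "CONJ") or not any(is_node(c, "CONJ") for c in node.children):
        return False
    flat = []
    for child in node.children:
        flat.extend(child.children if is_node(child, "CONJ") else [child])
    tree.children[pos] = Node("CONJ", flat)
    return True


def fix_all(tree: Node) -> None:
    """Apply both rules at every position until a full pass changes nothing."""
    changed = True
    while changed:
        changed = False
        stack = [tree]
        while stack:
            parent = stack.pop()
            for pos in range(len(parent.children)):
                if coord_rule(parent, pos) or conj_rule(parent, pos):
                    changed = True
                if is_node(parent.children[pos]):
                    stack.append(parent.children[pos])


simple = parse_brackets("(S (NP Alice) (VP (VB eats) (NP (NNS apples) (COORD (CCONJ and) (NP (NNS oranges))))))")
fix_all(simple)
nested = parse_brackets("(S (NP Alice) (VP (VB eats) (NP (NNS apples) (COORD (CCONJ and) (NP (NNS oranges) (COORD (CCONJ and) (NP (NNS bananas))))))))")
fix_all(nested)
print(simple)
print(nested)
```

On the nested input, the first pass produces a CONJ inside a CONJ, which only the second pass flattens; this is why a single application of each rule is not enough.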
- architxt.nlp.parser.fix_conj(tree, pos)
Fix conjunction structures in a tree at the specified position pos.
If the node at pos is labeled ‘CONJ’, the function flattens any nested conjunctions by replacing the node with a new tree that combines its children.
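- Parameters:
tree – The tree containing the node to fix.
pos – The position of that node within tree (in the examples, the index of a child of tree).
- Return type:
bool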
- Returns:
True if the conjunction structure was modified, False otherwise.
>>> t = Tree.fromstring("(S (NP Alice) (VP (VB eats) (CONJ (NP (NNS apples)) (NP (NNS oranges)))))")
>>> fix_conj(t[1], 1)
False
>>> t = Tree.fromstring("(S (NP Alice) (VP (VB eats) (CONJ (NP (NNS apples)) (CONJ (NP (NNS oranges)) (NP (NNS bananas))))))")
>>> fix_conj(t[1], 1)
True
>>> print(t.pformat(margin=255))
(S (NP Alice) (VP (VB eats) (CONJ (NP (NNS apples)) (NP (NNS oranges)) (NP (NNS bananas)))))
- architxt.nlp.parser.fix_coord(tree, pos)
Fix the coordination structure in a tree at the specified position pos.
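As the example below shows, it rewrites a constituent that ends in a COORD subtree, such as (NP (NNS apples) (COORD (CCONJ and) (NP (NNS oranges)))), into a CONJ node whose children are the individual conjuncts, (CONJ (NP (NNS apples)) (NP (NNS oranges))), dropping the coordinating conjunction.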
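- Parameters:
tree – The tree containing the node to fix.
pos – The position of that node within tree (in the example, the index of a child of tree).
- Return type:
bool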
- Returns:
True if the coordination was successfully fixed, False otherwise.
>>> t = Tree.fromstring("(S (NP Alice) (VP (VB eats) (NP (NNS apples) (COORD (CCONJ and) (NP (NNS oranges))))))")
>>> fix_coord(t[1], 1)
True
>>> print(t.pformat(margin=255))
(S (NP Alice) (VP (VB eats) (CONJ (NP (NNS apples)) (NP (NNS oranges)))))
- architxt.nlp.parser.ins_ent(tree, tree_ent)
Insert a tree entity into the appropriate position within a parented tree.
The function modifies the tree structure to insert an entity at the correct level based on its positions and root position.
- Parameters:
tree (Tree) – A tree representing the syntactic tree.
tree_ent (TreeEntity) – A TreeEntity containing the entity name and its positions in the tree.
- Return type:
Tree
- Returns:
The updated subtree where the entity was inserted.
>>> t = Tree.fromstring("(S (NP Alice) (VP (VB like) (NP (NNS apples))))")
>>> tree_ent1 = TreeEntity(name="person", positions=[(0, 0)])
>>> tree_ent2 = TreeEntity(name="fruit", positions=[(1, 1, 0, 0)])
>>> ent_tree = ins_ent(t, tree_ent1)
>>> print(t.pformat(margin=255))
(S (ENT::person Alice) (VP (VB like) (NP (NNS apples))))
>>> ent_tree = ins_ent(t, tree_ent2)
>>> print(t.pformat(margin=255))
(S (ENT::person Alice) (VP (VB like) (ENT::fruit apples)))

>>> t = Tree.fromstring("(S (NP Alice) (VP (VB like) (NP (NNS apples))))")
>>> t_ent = TreeEntity(name="xxx", positions=[(1, 0, 0), (1, 1, 0, 0)])
>>> ent_tree = ins_ent(t, t_ent)
>>> print(t.pformat(margin=255))
(S (NP Alice) (ENT::xxx like apples))

>>> t = Tree.fromstring("(S (NP Alice) (VP (VB like) (NP (NNS apples))))")
>>> t_ent = TreeEntity(name="xxx", positions=[(0, 0), (1, 1, 0, 0)])
>>> ent_tree = ins_ent(t, t_ent)
>>> print(t.pformat(margin=255))
(S (ENT::xxx Alice apples) (VP (VB like)))

>>> t = Tree.fromstring("(S (NP Alice) (VP (VB like) (NP (NNS apples))))")
>>> t_ent = TreeEntity(name="xxx", positions=[(0, 0), (1, 0, 0), (1, 1, 0, 0)])
>>> ent_tree = ins_ent(t, t_ent)
>>> print(t.pformat(margin=255))
(S (ENT::xxx Alice like apples))
>>> t_ent = TreeEntity(name="yyy", positions=[(0, 2)])
>>> ent_tree = ins_ent(t, t_ent)
>>> print(t.pformat(margin=255))
(S (ENT::xxx Alice like (ENT::yyy apples)))

>>> t = Tree.fromstring("(S x y z)")
>>> t_ent = TreeEntity(name="XY", positions=[(0,), (1,)])
>>> ent_tree = ins_ent(t, t_ent)
>>> print(t.pformat(margin=255))
(S (ENT::XY x y) z)
>>> t_ent = TreeEntity(name="YZ", positions=[(0, 1), (1,)])
>>> ent_tree = ins_ent(t, t_ent)
>>> print(t.pformat(margin=255))
(S (ENT::XY x y) (ENT::YZ y z))
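In the multi-leaf ins_ent examples above, the entity ends up at the lowest common ancestor of its leaves: it replaces the VP at position (1,) for [(1, 0, 0), (1, 1, 0, 0)] and is attached under the root S when the leaves span both subtrees. The sketch below computes that ancestor as the longest common prefix of the leaf positions, the usual way to find a lowest common ancestor with nltk-style tree positions. common_prefix is a hypothetical helper; it does not reproduce how ins_ent handles single-leaf entities (which, in the examples, replace the phrase node above the leaf, such as the NP at (1, 1) for apples) or how it rebuilds the tree.

```python
def common_prefix(positions: list[tuple[int, ...]]) -> tuple[int, ...]:
    """Longest common prefix of leaf positions, i.e. their lowest common ancestor."""
    prefix = []
    for parts in zip(*positions):
        if any(p != parts[0] for p in parts):
            break
        prefix.append(parts[0])
    return tuple(prefix)


# Leaf positions taken from the ins_ent examples above.
print(common_prefix([(1, 0, 0), (1, 1, 0, 0)]))  # (1,): the VP covering "like apples"
print(common_prefix([(0, 0), (1, 1, 0, 0)]))     # (): the root S
print(common_prefix([(0,), (1,)]))               # (): the root S of "(S x y z)"
```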
- architxt.nlp.parser.is_conflicting_entity(entity, entity_span, computed_spans, tree)
Check for conflicts with other entities (overlapping or duplicate spans).
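The summary above does not define "overlapping". The sketch below assumes half-open character spans and treats two spans as conflicting when they are identical or when they cross (each starts inside the other without containing it), while allowing proper nesting, as the nested-entity example for enrich_tree suggests. spans_conflict is a hypothetical helper; the documented function also receives the entity and the tree, which this sketch ignores.

```python
def spans_conflict(span: tuple[int, int], others: list[tuple[int, int]]) -> bool:
    """True if span duplicates or crosses any span in others (half-open [start, end) offsets)."""
    start, end = span
    for o_start, o_end in others:
        if (start, end) == (o_start, o_end):
            return True  # duplicate span
        if start < o_start < end < o_end or o_start < start < o_end < end:
            return True  # partial overlap: neither span contains the other
    return False  # disjoint or properly nested


print(spans_conflict((0, 7), [(0, 3), (4, 7)]))  # False: "XXX YYY" nests both entities
print(spans_conflict((0, 5), [(3, 8)]))          # True: the spans cross
```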
- async architxt.nlp.parser.process_tree(sentence, tree, *, resolver=None)
- Return type:
Optional[Tree]
- async architxt.nlp.parser.resolve_tree(tree, resolver)
Resolve entities in a tree using the provided entity resolver.
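The EntityResolver interface is not documented on this page. Assuming, hypothetically, that resolution maps an entity's surface text to a canonical form through an awaitable call, the sketch below resolves each distinct mention once, concurrently, with asyncio.gather, then maps the results back in order. resolve_mentions, alias_resolver and the ALIASES table are hypothetical names, not part of ArchiTXT.

```python
import asyncio
from collections.abc import Awaitable, Callable

ALIASES = {"nyc": "New York City", "new york": "New York City"}  # hypothetical alias table


async def alias_resolver(mention: str) -> str:
    """Hypothetical resolver: look the mention up in an alias table, else keep it unchanged."""
    await asyncio.sleep(0)  # stands in for a remote lookup
    return ALIASES.get(mention.lower(), mention)


async def resolve_mentions(mentions: list[str], resolver: Callable[[str], Awaitable[str]]) -> list[str]:
    """Resolve each distinct mention once, concurrently, then map results back in order."""
    unique = list(dict.fromkeys(mentions))  # deduplicate, keeping first-seen order
    resolved = await asyncio.gather(*(resolver(m) for m in unique))
    mapping = dict(zip(unique, resolved))
    return [mapping[m] for m in mentions]


resolved = asyncio.run(resolve_mentions(["NYC", "Paris", "New York", "NYC"], alias_resolver))
print(resolved)  # ['New York City', 'Paris', 'New York City', 'New York City']
```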
- architxt.nlp.parser.unnest_ent(tree, pos)
Un-nest an entity in a tree at the specified position pos.
If the node at pos is labeled as an entity (ENT), the function converts the nested structure into a flat structure, creating a relationship (REL) between the entity and its nested entities.
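- Parameters:
tree – The tree containing the entity node.
pos – The position of the entity node within tree (in the examples, the index of a child of tree).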
>>> t = Tree.fromstring('(S (ENT::person Alice (ENT::person Bob) (ENT::person Charlie)))')
>>> unnest_ent(t[0], 0)
>>> print(t.pformat(margin=255))
(S (ENT::person Alice (ENT::person Bob) (ENT::person Charlie)))
>>> unnest_ent(t, 0)
>>> print(t.pformat(margin=255))
(S (REL (ENT::person Alice Bob Charlie) (nested (ENT::person Bob) (ENT::person Charlie))))
- Return type:
None
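The un-nesting behaviour shown in the example above can be reproduced on a toy tree type. This is a minimal sketch, not ArchiTXT's code: Node, is_entity, leaves and unnest are hypothetical stand-ins for nltk trees and unnest_ent, and only direct ENT children of the entity are treated as nested entities.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a labeled tree node; leaves are plain strings."""
    label: str
    children: list = field(default_factory=list)

    def __str__(self) -> str:
        return f"({self.label} {' '.join(str(c) for c in self.children)})"


def is_entity(x) -> bool:
    return isinstance(x, Node) and x.label.startswith("ENT::")


def leaves(node: Node) -> list[str]:
    """All string leaves under node, left to right."""
    out = []
    for child in node.children:
        out.extend(leaves(child) if isinstance(child, Node) else [child])
    return out


def unnest(tree: Node, pos: int) -> None:
    """Rewrite an entity that contains entities as (REL (ENT flat-leaves) (nested inner-entities...))."""
    node = tree.children[pos]
    if not is_entity(node):
        return  # e.g. a plain leaf such as "Alice": nothing to do
    inner = [c for c in node.children if is_entity(c)]
    if not inner:
        return
    flat = Node(node.label, leaves(node))
    tree.children[pos] = Node("REL", [flat, Node("nested", inner)])


t = Node("S", [Node("ENT::person", ["Alice", Node("ENT::person", ["Bob"]), Node("ENT::person", ["Charlie"])])])
unnest(t.children[0], 0)  # position 0 holds the leaf "Alice", so the tree is unchanged
before = str(t)
unnest(t, 0)
print(before)
print(t)
```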
Modules