architxt.nlp.brat

Dataset loader for BRAT (BRAT Rapid Annotation Tool) format.

Functions

convert_brat_entities(entities, *[, ...])

Convert a list of BratEntity objects into Entity objects, while filtering out certain types of tags.

convert_brat_example(example, *[, ...])

Convert a Brat example into annotated sentences, filtering and mapping entities and relations as specified.

convert_brat_relations(relations, *[, ...])

Convert a list of BratRelation objects into Relation objects while filtering out certain types of relations.

load_brat_dataset(path, *[, ...])

architxt.nlp.brat.convert_brat_entities(entities, *, allow_list=None, mapping=None)

Convert a list of BratEntity objects into Entity objects, while filtering out certain types of tags.

Parameters:
  • entities (Iterable[Entity]) – An iterable of BratEntity objects to convert.

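  • allow_list (Optional[set[str]]) – A set of entity types to exclude from the output. Despite the parameter name, listed types are dropped, not kept: in the example for this function, allow_list={"MOMENT"} removes the MOMENT entity. If None, no filtering is applied.

  • mapping (Optional[dict[str, str]]) – A dictionary mapping entity type names to replacement names. Types absent from the mapping keep their own name; in the example for this function, the unmapped type person is emitted as PERSON. If None, no mapping is applied.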

Return type:

Generator[Entity, None, None]

Returns:

A generator yielding Entity objects.

>>> from pybrat.parser import Entity, Relation, Span
>>> from architxt.nlp.brat import convert_brat_entities
>>> ents = [
...     Entity(spans=[Span(start=0, end=5)], type="person", mention="E1"),
...     Entity(spans=[Span(start=10, end=15)], type="FREQ", mention="E2"),
...     Entity(spans=[Span(start=20, end=25)], type="MOMENT", mention="E3")
... ]
>>> ents = list(convert_brat_entities(ents, allow_list={"MOMENT"}, mapping={"FREQ": "FREQUENCE"}))
>>> len(ents)
2
>>> print(ents[0].name)
PERSON
>>> print(ents[1].name)
FREQUENCE
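
The filter-then-rename behaviour shown in the example above can be approximated without pybrat or architxt. The following is a minimal sketch, not the library's implementation: Tag and filter_and_map are hypothetical names, and the order of operations (upper-case first, then exclude, then map) is an assumption, since the example does not reveal it.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for an annotation that carries a type label."""

    type: str


def filter_and_map(
    tags: Iterable[Tag],
    *,
    excluded: set[str] | None = None,
    mapping: dict[str, str] | None = None,
) -> Iterator[str]:
    """Yield the output name of every tag whose type is not excluded."""
    excluded = excluded or set()
    mapping = mapping or {}
    for tag in tags:
        name = tag.type.upper()  # assumption: names are upper-cased before filtering
        if name in excluded:
            continue  # excluded types never reach the output
        yield mapping.get(name, name)  # unmapped names pass through unchanged


tags = [Tag("person"), Tag("FREQ"), Tag("MOMENT")]
names = list(filter_and_map(tags, excluded={"MOMENT"}, mapping={"FREQ": "FREQUENCE"}))
print(names)  # ['PERSON', 'FREQUENCE']
```

Like the documented functions, the sketch is a generator, so nothing is filtered or renamed until the result is iterated.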

architxt.nlp.brat.convert_brat_example(example, *, entities_filter=None, relations_filter=None, entities_mapping=None, relations_mapping=None)

Convert a Brat example into annotated sentences, filtering and mapping entities and relations as specified.

Parameters:
  • example (Example) – An Example object containing the .txt and .ann file data.

  • entities_filter (Optional[set[str]]) – A set of entity types to exclude from the output. If None, no filtering is applied.

  • relations_filter (Optional[set[str]]) – A set of relation types to exclude from the output. If None, no filtering is applied.

  • entities_mapping (Optional[dict[str, str]]) – A dictionary mapping entity names to new values. If None, no mapping is applied.

  • relations_mapping (Optional[dict[str, str]]) – A dictionary mapping relation names to new values. If None, no mapping is applied.

Return type:

Generator[AnnotatedSentence, None, None]

Returns:

A generator yielding AnnotatedSentence objects for each sentence in the text.
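
For readers unfamiliar with the .ann data an Example carries, here is a minimal sketch of the BRAT standoff layout: tab-separated lines where T ids are text-bound entities and R ids are binary relations. It is an illustration only and is not how pybrat or architxt read files; parse_ann and the sample annotations are hypothetical, and other annotation kinds (events, attributes, notes) are ignored.

```python
from __future__ import annotations

# Hypothetical sample in BRAT standoff format: ID <TAB> body [<TAB> covered text]
ANN = (
    "T1\tperson 0 5\tAlice\n"
    "T2\tFREQ 10 15\tdaily\n"
    "R1\tpart-of Arg1:T1 Arg2:T2\n"
)


def parse_ann(text: str) -> tuple[dict[str, tuple], list[tuple[str, str, str]]]:
    """Split standoff lines into an entity table and a list of relations."""
    entities: dict[str, tuple] = {}
    relations: list[tuple[str, str, str]] = []
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        ident, body, *rest = line.split("\t")
        if ident.startswith("T"):
            # Body is "<type> <start> <end>"; discontinuous spans are ';'-separated.
            label, spans = body.split(" ", 1)
            offsets = [tuple(int(n) for n in span.split()) for span in spans.split(";")]
            entities[ident] = (label, offsets, rest[0] if rest else "")
        elif ident.startswith("R"):
            # Body is "<type> Arg1:<id> Arg2:<id>".
            label, arg1, arg2 = body.split()
            relations.append((label, arg1.split(":", 1)[1], arg2.split(":", 1)[1]))
    return entities, relations


entities, relations = parse_ann(ANN)
print(entities["T1"])  # ('person', [(0, 5)], 'Alice')
print(relations)       # [('part-of', 'T1', 'T2')]
```

The relation arguments refer to entity ids, which is why entities and relations are filtered and mapped by separate parameters.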

architxt.nlp.brat.convert_brat_relations(relations, *, allow_list=None, mapping=None)

Convert a list of BratRelation objects into Relation objects while filtering out certain types of relations.

Parameters:
  • relations (Iterable[Relation]) – An iterable of BratRelation objects to convert.

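  • allow_list (Optional[set[str]]) – A set of relation types to exclude from the output. Despite the parameter name, listed types are dropped, not kept: in the example for this function, allow_list={"TEMPORALITY"} removes the TEMPORALITY relation. If None, no filtering is applied.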

  • mapping (Optional[dict[str, str]]) – A dictionary mapping relation names to new values. If None, no mapping is applied.

Return type:

Generator[Relation, None, None]

Returns:

A generator yielding Relation objects.

>>> from pybrat.parser import Entity, Relation, Span
>>> from architxt.nlp.brat import convert_brat_relations
>>> rels = [
...     Relation(arg1=Entity(spans=[Span(start=0, end=5)], type='X', mention='E1'), arg2=Entity(spans=[Span(start=10, end=15)], type='Y', mention='E2'), type="part-of"),
...     Relation(arg1=Entity(spans=[Span(start=20, end=25)], type='X', mention='E3'), arg2=Entity(spans=[Span(start=30, end=35)], type='Z', mention='E4'), type="TEMPORALITY")
... ]
>>> rels = list(convert_brat_relations(rels, allow_list={"TEMPORALITY"}))
>>> len(rels)
1
>>> print(rels[0].name)
PART-OF

architxt.nlp.brat.load_brat_dataset(path, *, entities_filter=None, relations_filter=None, entities_mapping=None, relations_mapping=None)

Return type:

Generator[AnnotatedSentence, None, None]