Parsers

This page is for users who want to squeeze the most bizarre possibilities out of PyBEL. Most users will not need this reference.

PyBEL makes extensive use of the PyParsing module. The code is organized into different modules to reflect the different facets of the BEL language. These parsers support BEL 2.0 and have some backwards compatibility for rewriting BEL v1.0 statements as BEL v2.0. Biologists and bioinformaticians using this software will likely never need to read this page, but a developer seeking to extend the language will be interested to see the inner workings of these parsers.

See: https://github.com/OpenBEL/language/blob/master/version_2.0/MIGRATE_BEL1_BEL2.md

Metadata Parser

class pybel.parser.parse_metadata.MetadataParser(manager, namespace_dict=None, annotation_dict=None, namespace_regex=None, annotation_regex=None, default_namespace=None, allow_redefinition=False)[source]

A parser for the document and definitions section of a BEL document.

See also

BEL 1.0 Specification for the DEFINE keyword

Parameters:
  • manager (pybel.manager.Manager) – A cache manager
  • namespace_dict (dict[str,dict[str,str]]) – A dictionary of pre-loaded, enumerated namespaces from {namespace keyword: {name: encoding}}
  • annotation_dict (dict[str,set[str]]) – A dictionary of pre-loaded, enumerated annotations from {annotation keyword: set of valid values}
  • namespace_regex (dict[str,str]) – A dictionary of pre-loaded, regular expression namespaces from {namespace keyword: regex string}
  • annotation_regex (dict[str,str]) – A dictionary of pre-loaded, regular expression annotations from {annotation keyword: regex string}
  • default_namespace (set[str]) – A set of strings that can be used without a namespace
manager = None

This metadata parser’s internal definition cache manager

namespace_dict = None

A dictionary of cached {namespace keyword: {name: encoding}}

annotation_dict = None

A dictionary of cached {annotation keyword: set of values}

namespace_regex = None

A dictionary of {namespace keyword: regular expression string}

default_namespace = None

A set of names that can be used without a namespace

annotation_regex = None

A dictionary of {annotation keyword: regular expression string}

uncachable_namespaces = None

A set of namespace URLs that can’t be cached

document_metadata = None

A dictionary containing the document metadata

namespace_url_dict = None

A dictionary from {namespace keyword: BEL namespace URL}

namespace_owl_dict = None

A dictionary from {namespace keyword: OWL namespace URL}

annotation_url_dict = None

A dictionary from {annotation keyword: BEL annotation URL}

annotation_owl_dict = None

A dictionary from {annotation keyword: OWL annotation URL}

annotation_lists = None

A set of annotation keywords that are defined ad-hoc in the BEL script

handle_document(line, position, tokens)[source]

Handles statements like SET DOCUMENT X = "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
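
As an illustration of the statement shape handled here, the following is a minimal sketch that splits a SET DOCUMENT line into a key and a value and stores the pair in a dictionary. It uses the standard library's re module instead of PyParsing, and the names SET_DOCUMENT and parse_set_document are hypothetical; none of this is PyBEL's actual code.

```python
import re

# Hypothetical pattern for lines of the form: SET DOCUMENT <key> = "<value>"
SET_DOCUMENT = re.compile(
    r'^SET\s+DOCUMENT\s+(?P<key>\w+)\s*=\s*"(?P<value>[^"]*)"\s*$'
)


def parse_set_document(line):
    """Return (key, value) for a SET DOCUMENT line, or None if it does not match."""
    match = SET_DOCUMENT.match(line)
    if match is None:
        return None
    return match.group('key'), match.group('value')


# Collect the parsed pair into a plain dictionary of document metadata
document_metadata = {}
result = parse_set_document('SET DOCUMENT Name = "My Example"')
if result is not None:
    key, value = result
    document_metadata[key] = value
```
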
raise_for_redefined_namespace(line, position, namespace)[source]

Raises an exception if a namespace is already defined

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • namespace (str) – The namespace being parsed
Raises:

RedefinedNamespaceError
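
The redefinition check can be pictured as a membership test over every place a namespace keyword may already be registered. The sketch below assumes the keyword is looked up in the URL and regular expression dictionaries; the exception class is a local stand-in with the same name, and the helper raise_for_redefined is hypothetical.

```python
class RedefinedNamespaceError(Exception):
    """Local stand-in for the exception named above; not imported from PyBEL."""


def raise_for_redefined(namespace, *definition_dicts):
    """Raise if the keyword already appears in any of the given dictionaries."""
    if any(namespace in definitions for definitions in definition_dicts):
        raise RedefinedNamespaceError(namespace)


# Illustrative state: one namespace defined by URL, none by regular expression
namespace_url_dict = {'HGNC': 'https://example.org/hgnc.belns'}
namespace_regex = {}

# A new keyword passes silently; repeating 'HGNC' here would raise
raise_for_redefined('CHEBI', namespace_url_dict, namespace_regex)
```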

handle_namespace_url(line, position, tokens)[source]

Handles statements like DEFINE NAMESPACE X AS URL "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

RedefinedNamespaceError

handle_namespace_owl(line, position, tokens)[source]

Handles statements like DEFINE NAMESPACE X AS OWL "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

RedefinedNamespaceError

handle_namespace_pattern(line, position, tokens)[source]

Handles statements like DEFINE NAMESPACE X AS PATTERN "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

RedefinedNamespaceError

raise_for_redefined_annotation(line, position, annotation)[source]

Raises an exception if the given annotation is already defined

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • annotation (str) – The annotation being parsed
Raises:

RedefinedAnnotationError

handle_annotation_owl(line, position, tokens)[source]

Handles statements like DEFINE ANNOTATION X AS OWL "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

RedefinedAnnotationError

handle_annotations_url(line, position, tokens)[source]

Handles statements like DEFINE ANNOTATION X AS URL "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

RedefinedAnnotationError

handle_annotation_list(line, position, tokens)[source]

Handles statements like DEFINE ANNOTATION X AS LIST {"Y","Z", ...}

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

RedefinedAnnotationError
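
To make the list form concrete, here is a minimal sketch that extracts the keyword and the set of quoted values from such a statement. It relies only on the re module, and the names DEFINE_LIST, QUOTED and parse_annotation_list are hypothetical; the real handler receives tokens already produced by PyParsing.

```python
import re

# Hypothetical patterns for: DEFINE ANNOTATION <keyword> AS LIST {"a", "b", ...}
DEFINE_LIST = re.compile(
    r'^DEFINE\s+ANNOTATION\s+(?P<keyword>\w+)\s+AS\s+LIST\s*\{(?P<values>.*)\}\s*$'
)
QUOTED = re.compile(r'"([^"]*)"')


def parse_annotation_list(line):
    """Return (keyword, set of values) for an ad-hoc annotation list definition."""
    match = DEFINE_LIST.match(line)
    if match is None:
        raise ValueError('not an annotation list definition: {!r}'.format(line))
    return match.group('keyword'), set(QUOTED.findall(match.group('values')))


# Store the result in the {annotation keyword: set of valid values} shape
annotation_dict = {}
keyword, values = parse_annotation_list(
    'DEFINE ANNOTATION Confidence AS LIST {"High", "Medium", "Low"}'
)
annotation_dict[keyword] = values
```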

handle_annotation_pattern(line, position, tokens)[source]

Handles statements like DEFINE ANNOTATION X AS PATTERN "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

RedefinedAnnotationError

has_enumerated_annotation(annotation)[source]

Checks if this annotation is defined by an enumeration

Parameters:annotation (str) – The keyword of an annotation
Return type:bool
has_regex_annotation(annotation)[source]

Checks if this annotation is defined by a regular expression

Parameters:annotation (str) – The keyword of an annotation
Return type:bool
has_annotation(annotation)[source]

Checks if this annotation is defined

Parameters:annotation (str) – The keyword of an annotation
Return type:bool
has_enumerated_namespace(namespace)[source]

Checks if this namespace is defined by an enumeration

Parameters:namespace (str) – The keyword of a namespace
Return type:bool
has_regex_namespace(namespace)[source]

Checks if this namespace is defined by a regular expression

Parameters:namespace (str) – The keyword of a namespace
Return type:bool
has_namespace(namespace)[source]

Checks if this namespace is defined

Parameters:namespace (str) – The keyword of a namespace
Return type:bool
raise_for_version(line, position, version)[source]

Checks that a version string is valid for BEL documents, meaning it’s either in the YYYYMMDD or semantic version format

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • version (str) – A version string
Raises:

VersionFormatWarning
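
The two accepted formats can be checked roughly as follows. This is a sketch under assumptions: the semantic version pattern is a simplified one, the function is_valid_version is hypothetical, and rejecting impossible calendar dates is a choice made here that the real check may or may not share.

```python
import re
from datetime import datetime

# Simplified semantic version: MAJOR.MINOR.PATCH with optional pre-release/build parts
SEMANTIC_VERSION = re.compile(
    r'^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$'
)


def is_valid_version(version):
    """Return True for a semantic version or a YYYYMMDD date string."""
    if SEMANTIC_VERSION.match(version):
        return True
    if len(version) != 8 or not version.isdigit():
        return False
    try:
        # Also rejects impossible dates such as February 30th
        datetime.strptime(version, '%Y%m%d')
    except ValueError:
        return False
    return True
```
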

Control Parser

class pybel.parser.parse_control.ControlParser(annotation_dict=None, annotation_regex=None, citation_clearing=True)[source]

A parser for BEL control statements

See also

BEL 1.0 specification on control records

Parameters:
  • annotation_dict (dict[str,set[str]]) – A dictionary of {annotation: set of valid values} for parsing
  • annotation_regex (dict[str,str]) – A dictionary of {annotation: regular expression string}
  • citation_clearing (bool) – Should SET Citation statements clear evidence and all annotations?
annotation_dict

A dictionary of annotations to their set of values

Return type:dict[str,set[str]]
annotation_regex

A dictionary of annotations defined by regular expressions {annotation keyword: string regular expression}

Return type:dict[str,str]
annotation_regex_compiled

A dictionary of annotations defined by regular expressions {annotation keyword: compiled regular expression}

Return type:dict[str,re]
raise_for_undefined_annotation(line, position, annotation)[source]

Raises if an annotation is not defined

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • annotation (str) – The annotation to check
Raises:

UndefinedAnnotationWarning

raise_for_invalid_annotation_value(line, position, key, value)[source]

Raises if the value is not valid for the given annotation

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • key (str) – The annotation to check
  • value (str) – The entry in the annotation to check
Raises:

IllegalAnnotationValueWarning or MissingAnnotationRegexWarning
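
A minimal sketch of this kind of value check is shown below. The two exception classes are local stand-ins, and which exception belongs to which branch is an assumption inferred from their names. Whether the real parser requires the regular expression to match the whole value is also not stated here; this sketch uses fullmatch.

```python
import re


class IllegalAnnotationValueWarning(Exception):
    """Local stand-in; assumed to signal a value missing from an enumeration."""


class MissingAnnotationRegexWarning(Exception):
    """Local stand-in; assumed to signal a value that fails the regular expression."""


# Illustrative definitions in the shapes described above
annotation_dict = {'Species': {'9606', '10090'}}
annotation_regex_compiled = {'Dose': re.compile(r'[0-9]+ ?mg')}


def raise_for_invalid_annotation_value(key, value):
    """Raise if the value is not allowed for the annotation keyword."""
    if key in annotation_dict:
        if value not in annotation_dict[key]:
            raise IllegalAnnotationValueWarning(key, value)
    elif key in annotation_regex_compiled:
        if annotation_regex_compiled[key].fullmatch(value) is None:
            raise MissingAnnotationRegexWarning(key, value)
```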

raise_for_missing_citation(line, position)[source]

Raises if there is no citation present in the parser

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
Raises:

MissingCitationException

handle_annotation_key(line, position, tokens)[source]

Called on every annotation key before parsing to validate that it is defined either by an enumeration or by a regular expression

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

MissingCitationException or UndefinedAnnotationWarning

handle_unset_statement_group(line, position, tokens)[source]

Unsets the statement group, or raises an exception if it is not set.

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

MissingAnnotationKeyWarning

handle_unset_citation(line, position, tokens)[source]

Unsets the citation, or raises an exception if it is not set

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

MissingAnnotationKeyWarning

handle_unset_evidence(line, position, tokens)[source]

Unsets the evidence, or throws an exception if it is not already set. The value for tokens[EVIDENCE] corresponds to which alternate of SupportingText or Evidence was used in the BEL script.

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

MissingAnnotationKeyWarning

validate_unset_command(line, position, key)[source]

Raises an exception when trying to UNSET X if X is not already set.

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • key (str) – The annotation to check
Raises:

MissingAnnotationKeyWarning

handle_unset_command(line, position, tokens)[source]

Handles UNSET X or raises an exception if it is not already set.

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

MissingAnnotationKeyWarning

handle_unset_list(line, position, tokens)[source]

Handles UNSET {A, B, ...} or raises an exception if any of them is not present. Consider that all unsets are in peril if just one of them is wrong!

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

MissingAnnotationKeyWarning
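
One way to read the warning above is that a single bad key invalidates the whole UNSET list. The sketch below takes that reading and validates every key before removing any. The exception class is a local stand-in, unset_list is a hypothetical helper, and the validate-then-remove order is an assumption rather than a statement about PyBEL's code.

```python
class MissingAnnotationKeyWarning(Exception):
    """Local stand-in for the exception named above; not imported from PyBEL."""


def unset_list(annotations, keys):
    """Remove all keys from annotations, or none of them if any key is missing."""
    # First pass: validate every key so a failure leaves the state untouched
    for key in keys:
        if key not in annotations:
            raise MissingAnnotationKeyWarning(key)
    # Second pass: remove each distinct key
    for key in set(keys):
        del annotations[key]


annotations = {'Species': '9606', 'Tissue': 'liver'}
unset_list(annotations, ['Species'])
```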

handle_unset_all(line, position, tokens)[source]

Handles UNSET_ALL

get_annotations()[source]

Gets the current annotations

Returns:The currently stored BEL annotations
Return type:dict
clear_citation()[source]

Clears the citation. Additionally, if citation clearing is enabled, clears the evidence and annotations.
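
The effect of the citation_clearing flag can be sketched with a small state holder. The class ControlState and its attribute layout are hypothetical and only mirror the behaviour described above.

```python
class ControlState:
    """Hypothetical holder for the citation, evidence and annotations."""

    def __init__(self, citation_clearing=True):
        self.citation_clearing = citation_clearing
        self.citation = {}
        self.evidence = None
        self.annotations = {}

    def clear_citation(self):
        """Clear the citation and, if enabled, the evidence and annotations too."""
        self.citation.clear()
        if self.citation_clearing:
            self.evidence = None
            self.annotations.clear()


state = ControlState(citation_clearing=False)
state.citation = {'type': 'PubMed', 'reference': '12345'}
state.evidence = 'Some supporting text'
state.annotations = {'Species': '9606'}
state.clear_citation()  # evidence and annotations survive with clearing disabled
```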

clear()[source]

Clears the statement_group, citation, evidence, and annotations

Identifier Parser

class pybel.parser.parse_identifier.IdentifierParser(namespace_dict=None, namespace_regex=None, default_namespace=None, allow_naked_names=False)[source]

A parser for identifiers in the form of namespace:name. It can be made more lenient by giving it a default namespace or by enabling the use of naked names

Parameters:
  • namespace_dict (dict[str,dict[str,str]]) – A dictionary of {namespace: {name: encoding}}
  • namespace_regex (dict[str,str]) – A dictionary of {namespace: regular expression string} to compile
  • default_namespace (set[str]) – A set of strings that can be used without a namespace
  • allow_naked_names (bool) – If true, turn off naked namespace failures
namespace_dict

A dictionary of {namespace: {name: encodings}}

Return type:dict[str,dict[str,str]]
namespace_regex

A dictionary of {namespace keyword: regular expression string}

Return type:dict[str,str]
namespace_regex_compiled

A dictionary of {namespace keyword: compiled regular expression}

Return type:dict[str,re]
has_enumerated_namespace(namespace)[source]

Checks that the namespace has been defined by an enumeration

has_regex_namespace(namespace)[source]

Checks that the namespace has been defined by a regular expression

has_namespace(namespace)[source]

Checks that the namespace has either been defined by an enumeration or a regular expression

has_enumerated_namespace_name(namespace, name)[source]

Checks that the namespace is defined by an enumeration and that the name is a member

has_regex_namespace_name(namespace, name)[source]

Checks that the namespace is defined as a regular expression and the name matches it
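
The two name checks can be sketched with plain dictionaries in the shapes described above. The data is illustrative, the functions are free-standing rather than methods, and matching the whole name with fullmatch is an assumption of this sketch.

```python
import re

# Illustrative definitions: one enumerated namespace, one regular expression namespace
namespace_dict = {'HGNC': {'AKT1': 'GRP', 'EGFR': 'GRP'}}
namespace_regex_compiled = {'dbSNP': re.compile(r'rs[0-9]+')}


def has_enumerated_namespace_name(namespace, name):
    """True if the namespace is enumerated and the name is one of its members."""
    return namespace in namespace_dict and name in namespace_dict[namespace]


def has_regex_namespace_name(namespace, name):
    """True if the namespace is defined by a regular expression that the name matches."""
    pattern = namespace_regex_compiled.get(namespace)
    return pattern is not None and pattern.fullmatch(name) is not None
```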

BEL Parser

class pybel.parser.parse_bel.BelParser(graph, namespace_dict=None, annotation_dict=None, namespace_regex=None, annotation_regex=None, allow_naked_names=False, allow_nested=False, allow_unqualified_translocations=False, citation_clearing=True, no_identifier_validation=False, autostreamline=True)[source]

Build a parser backed by a given dictionary of namespaces

Parameters:
pmod = None

2.2.1

variant = None

2.2.2

fragment = None

2.2.3

location = None

2.2.4

psub = None

DEPRECATED: 2.2.X Amino Acid Substitutions

gsub = None

DEPRECATED: 2.2.X Sequence Variations

trunc = None

DEPRECATED: Truncated proteins

gmod = None

PyBEL BEL Specification variant

fusion = None

2.6.1

general_abundance = None

2.1.1

gene = None

2.1.4

mirna = None

2.1.5

protein = None

2.1.6

rna = None

2.1.7

complex_singleton = None

2.1.2

composite_abundance = None

2.1.3

molecular_activity = None

2.4.1

biological_process = None

2.3.1

pathology = None

2.3.2

activity = None

2.3.3

translocation = None

2.5.1

degradation = None

2.5.2

reactants = None

2.5.3

increases_tag = None

3.1.1

directly_increases_tag = None

3.1.2

decreases_tag = None

3.1.3

directly_decreases_tag = None

3.1.4

analogous_tag = None

3.5.1

causes_no_change_tag = None

3.1.6

regulates_tag = None

3.1.7

negative_correlation_tag = None

3.2.1

positive_correlation_tag = None

3.2.2

association_tag = None

3.2.3

orthologous_tag = None

3.3.1

is_a_tag = None

3.4.5

equivalent_tag = None

PyBEL Variants

rate_limit_tag = None

3.1.5

subprocess_of_tag = None

3.4.6

transcribed_tag = None

3.3.2

translated_tag = None

3.3.3

has_member_tag = None

3.4.1

abundance_list = None

3.4.2

biomarker_tag = None

3.5.2

prognostic_biomarker_tag = None

3.5.3

causal_relation_tags = None

3.1 Causal Relationships - nested. Not enabled by default.

namespace_dict

The dictionary of {namespace: {name: encoding}} stored in the internal identifier parser

Return type:dict[str,dict[str,str]]
namespace_regex

The dictionary of {namespace keyword: compiled regular expression} stored in the internal identifier parser

Return type:dict[str,re]
annotation_dict

A dictionary of annotations to their set of values

Return type:dict[str,set[str]]
annotation_regex

A dictionary of annotations defined by regular expressions {annotation keyword: string regular expression}

Return type:dict[str,str]
allow_naked_names

Should naked names be parsed, or should errors be thrown?

Return type:bool
get_annotations()[source]

Get current annotations in this parser

Return type:dict
clear()[source]

Clears the graph and all control parser data (current citation, annotations, and statement group)

handle_nested_relation(line, position, tokens)[source]

Handles nested statements. If allow_nested is False, raises a warning.

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

NestedRelationWarning

check_function_semantics(line, position, tokens)[source]

Raises an exception if the function used on the tokens is wrong

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

InvalidFunctionSemantic

handle_term(line, position, tokens)[source]

Handles BEL terms (the subject and object of BEL relations)

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
handle_has_members(line, position, tokens)[source]

Handles list relations like p(X) hasMembers list(p(Y), p(Z), ...)

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
handle_has_components(line, position, tokens)[source]

Handles list relations like p(X) hasComponents list(p(Y), p(Z), ...)

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
handle_relation(line, position, tokens)[source]

Handles BEL relations

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
handle_unqualified_relation(line, position, tokens)[source]

Handles unqualified relations

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
handle_label_relation(line, position, tokens)[source]

Handles statements like p(X) label "Label for X"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
Raises:

RelabelWarning

ensure_node(tokens)[source]

Turns parsed tokens into a canonical node name and makes sure it’s in the graph

Parameters:tokens (pyparsing.ParseResult) – Tokens from PyParsing
Returns:A pair of the PyBEL node tuple and the PyBEL node data dictionary
Return type:tuple[tuple, dict]