Parsers

This page is for users who want to squeeze the most bizarre possibilities out of PyBEL. Most users will not need this reference.

PyBEL makes extensive use of the PyParsing module. The code is organized into different modules to reflect the different facets of the BEL language. These parsers support BEL 2.0 and have some backwards compatibility for rewriting BEL 1.0 statements as BEL 2.0. The biologist or bioinformatician using this software will likely never need to read this page, but a developer seeking to extend the language will be interested in the inner workings of these parsers.

See: https://github.com/OpenBEL/language/blob/master/version_2.0/MIGRATE_BEL1_BEL2.md

Metadata Parser

class pybel.parser.parse_metadata.MetadataParser(manager, namespace_dict=None, annotation_dict=None, namespace_regex=None, annotations_regex=None, default_namespace=None, allow_redefinition=False)[source]

A parser for the document and definitions section of a BEL document.

See also

BEL 1.0 Specification for the DEFINE keyword

Parameters:
  • manager (pybel.manager.Manager) – A cache manager
  • namespace_dict (dict[str,set[str]]) – A dictionary of pre-loaded, enumerated namespaces from {namespace keyword: set of valid values}
  • annotation_dict (dict[str,set[str]]) – A dictionary of pre-loaded, enumerated annotations from {annotation keyword: set of valid values}
  • namespace_regex (dict[str,str]) – A dictionary of pre-loaded, regular expression namespaces from {namespace keyword: regex string}
  • annotations_regex (dict[str,str]) – A dictionary of pre-loaded, regular expression annotations from {annotation keyword: regex string}
  • default_namespace (set[str]) – A set of strings that can be used without a namespace
manager = None

This metadata parser’s internal definition cache manager

namespace_dict = None

A dictionary of cached {namespace keyword: set of values}

annotations_dict = None

A dictionary of cached {annotation keyword: set of values}

namespace_regex = None

A dictionary of {namespace keyword: regular expression string}

namespace_regex_compiled = None

A dictionary of {namespace keyword: compiled regular expression}

default_namespace = None

A set of names that can be used without a namespace

annotations_regex = None

A dictionary of {annotation keyword: regular expression string}

annotations_regex_compiled = None

A dictionary of {annotation keyword: compiled regular expression}

document_metadata = None

A dictionary containing the document metadata

namespace_url_dict = None

A dictionary from {namespace keyword: BEL namespace URL}

namespace_owl_dict = None

A dictionary from {namespace keyword: OWL namespace URL}

annotation_url_dict = None

A dictionary from {annotation keyword: BEL annotation URL}

annotations_owl_dict = None

A dictionary from {annotation keyword: OWL annotation URL}

annotation_lists = None

A set of annotation keywords that are defined ad-hoc in the BEL script

handle_document(line, position, tokens)[source]

Handles statements like SET DOCUMENT X = "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
raise_for_redefined_namespace(line, position, namespace)[source]

Raises an exception if a namespace is already defined

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • namespace (str) – The namespace being parsed
handle_namespace_url(line, position, tokens)[source]

Handles statements like DEFINE NAMESPACE X AS URL "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
handle_namespace_owl(line, position, tokens)[source]

Handles statements like DEFINE NAMESPACE X AS OWL "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
handle_namespace_pattern(line, position, tokens)[source]

Handles statements like DEFINE NAMESPACE X AS PATTERN "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
raise_for_redefined_annotation(line, position, annotation)[source]

Raises an exception if the given annotation is already defined

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • annotation (str) – The annotation being parsed
handle_annotation_owl(line, position, tokens)[source]

Handles statements like DEFINE ANNOTATION X AS OWL "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
handle_annotations_url(line, position, tokens)[source]

Handles statements like DEFINE ANNOTATION X AS URL "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
handle_annotation_list(line, position, tokens)[source]

Handles statements like DEFINE ANNOTATION X AS LIST {"Y","Z", ...}

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
handle_annotation_pattern(line, position, tokens)[source]

Handles statements like DEFINE ANNOTATION X AS PATTERN "Y"

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • tokens (pyparsing.ParseResult) – The tokens from PyParsing
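
The interplay between the definition handlers and the redefinition check can be illustrated with a minimal sketch. It is not PyBEL's implementation: the real grammar is built with PyParsing, whereas this sketch uses the standard library's re module, and the names AnnotationDefinitions, RedefinedAnnotationError, LIST_RE and PATTERN_RE are hypothetical. Only the attribute names mirror the ones documented above.

```python
import re

# Hypothetical line patterns; PyBEL's actual grammar is defined with PyParsing.
LIST_RE = re.compile(r'^DEFINE\s+ANNOTATION\s+(\w+)\s+AS\s+LIST\s+\{(.*)\}\s*$')
PATTERN_RE = re.compile(r'^DEFINE\s+ANNOTATION\s+(\w+)\s+AS\s+PATTERN\s+"(.*)"\s*$')


class RedefinedAnnotationError(Exception):
    """Raised when an annotation keyword is defined twice (hypothetical name)."""


class AnnotationDefinitions:
    """Toy stand-in for the annotation bookkeeping described above."""

    def __init__(self):
        self.annotations_dict = {}            # keyword -> set of valid values
        self.annotations_regex = {}           # keyword -> regex string
        self.annotations_regex_compiled = {}  # keyword -> compiled regex
        self.annotation_lists = set()         # keywords defined ad hoc as lists

    def has_annotation(self, keyword):
        """True if the keyword is defined by an enumeration or by a regex."""
        return keyword in self.annotations_dict or keyword in self.annotations_regex

    def define(self, line):
        """Handle one DEFINE ANNOTATION ... AS LIST/PATTERN line."""
        list_match = LIST_RE.match(line)
        pattern_match = PATTERN_RE.match(line)
        if list_match is None and pattern_match is None:
            raise ValueError('not a supported definition: {!r}'.format(line))

        keyword = (list_match or pattern_match).group(1)
        if self.has_annotation(keyword):
            raise RedefinedAnnotationError(keyword)

        if list_match:
            # Collect every double-quoted value between the braces.
            values = set(re.findall(r'"([^"]*)"', list_match.group(2)))
            self.annotations_dict[keyword] = values
            self.annotation_lists.add(keyword)
        else:
            regex = pattern_match.group(2)
            self.annotations_regex[keyword] = regex
            self.annotations_regex_compiled[keyword] = re.compile(regex)
        return keyword
```

Keeping both the regex string and its compiled form mirrors the paired annotations_regex and annotations_regex_compiled attributes listed above.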
has_enumerated_annotation(annotation)[source]

Checks if this annotation is defined by an enumeration

Parameters:annotation (str) – The keyword of an annotation
has_regex_annotation(annotation)[source]

Checks if this annotation is defined by a regular expression

Parameters:annotation (str) – The keyword of an annotation
has_annotation(annotation)[source]

Checks if this annotation is defined

Parameters:annotation (str) – The keyword of an annotation
has_enumerated_namespace(namespace)[source]

Checks if this namespace is defined by an enumeration

Parameters:namespace (str) – The keyword of a namespace
has_regex_namespace(namespace)[source]

Checks if this namespace is defined by a regular expression

Parameters:namespace (str) – The keyword of a namespace
has_namespace(namespace)[source]

Checks if this namespace is defined

Parameters:namespace (str) – The keyword of a namespace
check_version(line, position, version)[source]

Checks that a version string is valid for BEL documents, meaning it’s either in the YYYYMMDD or semantic version format

Parameters:
  • line (str) – The line being parsed
  • position (int) – The position in the line being parsed
  • version (str) – A version string
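
As a rough illustration of the two accepted version formats, the check could be sketched as follows. This is an assumption-laden sketch, not PyBEL's code: the function name is_valid_version and the pattern SEMVER_RE are hypothetical, the semantic-version pattern is deliberately simplified, and validating that YYYYMMDD is a real calendar date is an extra assumption that the real check_version may not make.

```python
import re
from datetime import datetime

# Simplified semantic-version pattern: MAJOR.MINOR.PATCH with optional
# pre-release and build suffixes (hypothetical, not PyBEL's pattern).
SEMVER_RE = re.compile(
    r'\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?'
)


def is_valid_version(version):
    """Return True for a semantic version or a YYYYMMDD date string."""
    if SEMVER_RE.fullmatch(version):
        return True
    if re.fullmatch(r'\d{8}', version):
        try:
            # Assumption: the eight digits must also form a real calendar date.
            datetime.strptime(version, '%Y%m%d')
        except ValueError:
            return False
        return True
    return False
```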

Control Parser

class pybel.parser.parse_control.ControlParser(annotation_dicts=None, annotation_regex=None, citation_clearing=True)[source]

A parser for BEL control statements

See also

BEL 1.0 specification on control records

Parameters:
  • annotation_dicts (dict[str,set[str]]) – A dictionary of {annotation: set of valid values} for parsing
  • annotation_regex (dict[str,str]) – A dictionary of {annotation: regular expression string}
  • citation_clearing (bool) – Should SET Citation statements clear evidence and all annotations?
handle_annotation_key(line, position, tokens)[source]

Called on every annotation key before parsing to validate that it is defined either by an enumeration or by a regular expression

get_annotations()[source]
Returns:The currently stored BEL annotations
Return type:dict
clear_citation()[source]

Clears the citation. Additionally, if citation clearing is enabled, clears the evidence and annotations.

clear()[source]

Clears the statement_group, citation, evidence, and annotations
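
The difference between clear_citation() and clear() can be sketched with a small state holder. This is only a model of the behavior described above, not the ControlParser itself: the class name ControlState is hypothetical and the attribute types (a dict for the citation, a string or None for the evidence) are assumptions.

```python
class ControlState:
    """Toy model of the state tracked while parsing control statements."""

    def __init__(self, citation_clearing=True):
        self.citation_clearing = citation_clearing
        self.statement_group = None
        self.citation = {}
        self.evidence = None
        self.annotations = {}

    def get_annotations(self):
        """Return a copy of the currently stored annotations."""
        return dict(self.annotations)

    def clear_citation(self):
        """Clear the citation; with citation clearing on, also clear
        the evidence and the annotations."""
        self.citation = {}
        if self.citation_clearing:
            self.evidence = None
            self.annotations = {}

    def clear(self):
        """Clear the statement group, citation, evidence, and annotations."""
        self.statement_group = None
        self.citation = {}
        self.evidence = None
        self.annotations = {}
```

With citation_clearing=False, annotations set under one citation carry over to the next citation until they are explicitly cleared.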

Identifier Parser

class pybel.parser.parse_identifier.IdentifierParser(namespace_dict=None, namespace_regex=None, default_namespace=None, allow_naked_names=False)[source]

A parser for identifiers in the form of namespace:name. Can be made more lenient by giving it a default namespace or by allowing naked names

Parameters:
  • namespace_dict (dict[str,set[str]]) – A dictionary of {namespace: set of names}
  • namespace_regex (dict[str,str]) – A dictionary of {namespace: regular expression string} to compile
  • default_namespace (set[str]) – A set of strings that can be used without a namespace
  • allow_naked_names (bool) – If true, names given without a namespace (naked names) do not cause a failure
namespace_dict = None

A dictionary of cached {namespace keyword: set of values}

namespace_regex = None

A dictionary of {namespace keyword: regular expression string}

namespace_regex_compiled = None

A dictionary of {namespace keyword: compiled regular expression}

has_enumerated_namespace(namespace)[source]

Checks that the namespace has been defined by an enumeration

has_regex_namespace(namespace)[source]

Checks that the namespace has been defined by a regular expression

has_namespace(namespace)[source]

Checks that the namespace has either been defined by an enumeration or a regular expression

has_enumerated_namespace_name(namespace, name)[source]

Checks that the namespace is defined by an enumeration and that the name is a member

has_regex_namespace_name(namespace, name)[source]

Checks that the namespace is defined as a regular expression and the name matches it
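
The five checks above can be mirrored with plain dictionaries, which may help when reasoning about why an identifier is rejected. This is a sketch under assumptions rather than the IdentifierParser: the class name NamespaceChecker is hypothetical, and requiring the regular expression to match the whole name (fullmatch) is an assumption about the matching mode.

```python
import re


class NamespaceChecker:
    """Toy model of enumerated and regex-defined namespace lookups."""

    def __init__(self, namespace_dict=None, namespace_regex=None):
        self.namespace_dict = namespace_dict or {}
        self.namespace_regex = namespace_regex or {}
        # Compile each regex string once, as the compiled attribute suggests.
        self.namespace_regex_compiled = {
            keyword: re.compile(pattern)
            for keyword, pattern in self.namespace_regex.items()
        }

    def has_enumerated_namespace(self, namespace):
        return namespace in self.namespace_dict

    def has_regex_namespace(self, namespace):
        return namespace in self.namespace_regex_compiled

    def has_namespace(self, namespace):
        return (self.has_enumerated_namespace(namespace)
                or self.has_regex_namespace(namespace))

    def has_enumerated_namespace_name(self, namespace, name):
        return (self.has_enumerated_namespace(namespace)
                and name in self.namespace_dict[namespace])

    def has_regex_namespace_name(self, namespace, name):
        # Assumption: the whole name must match, not just a prefix.
        return (self.has_regex_namespace(namespace)
                and self.namespace_regex_compiled[namespace].fullmatch(name) is not None)
```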

BEL Parser

class pybel.parser.parse_bel.BelParser(graph, namespace_dict=None, annotation_dict=None, namespace_regex=None, annotation_regex=None, allow_naked_names=False, allow_nested=False, allow_unqualified_translocations=False, citation_clearing=True, no_identifier_validation=False, autostreamline=True)[source]

Build a parser backed by a given dictionary of namespaces

Parameters:
pmod = None

2.2.1

variant = None

2.2.2

fragment = None

2.2.3

location = None

2.2.4

psub = None

DEPRECATED: 2.2.X Amino Acid Substitutions

gsub = None

DEPRECATED: 2.2.X Sequence Variations

trunc = None

DEPRECATED: Truncated proteins

gmod = None

PyBEL BEL Specification variant

fusion = None

2.6.1

general_abundance = None

2.1.1

gene = None

2.1.4

mirna = None

2.1.5

protein = None

2.1.6

rna = None

2.1.7

complex_singleton = None

2.1.2

composite_abundance = None

2.1.3

molecular_activity = None

2.4.1

biological_process = None

2.3.1

pathology = None

2.3.2

activity = None

2.3.3

translocation = None

2.5.1

degradation = None

2.5.2

reactants = None

2.5.3

increases_tag = None

3.1.1

directly_increases_tag = None

3.1.2

decreases_tag = None

3.1.3

directly_decreases_tag = None

3.1.4

analogous_tag = None

3.5.1

causes_no_change_tag = None

3.1.6

regulates_tag = None

3.1.7

negative_correlation_tag = None

3.2.1

positive_correlation_tag = None

3.2.2

association_tag = None

3.2.3

orthologous_tag = None

3.3.1

is_a_tag = None

3.4.5

equivalent_tag = None

PyBEL Variant

rate_limit_tag = None

3.1.5

subprocess_of_tag = None

3.4.6

transcribed_tag = None

3.3.2

translated_tag = None

3.3.3

has_member_tag = None

3.4.1

abundance_list = None

3.4.2

biomarker_tag = None

3.5.2

prognostic_biomarker_tag = None

3.5.3

causal_relation_tags = None

3.1 Causal Relationships - nested. Explicitly not supported because of ambiguity

get_annotations()[source]

Get current annotations in this parser

Return type:dict
clear()[source]

Clears the graph and all control parser data (current citation, annotations, and statement group)

ensure_node(tokens)[source]

Turns parsed tokens into a canonical node name and makes sure it's in the graph

Parameters:tokens (pyparsing.ParseResult) – Tokens from PyParsing
Returns:A pair of the PyBEL node tuple and the PyBEL node data dictionary
Return type:tuple[tuple, dict]