Data Model

The pybel.struct module houses functions for handling the main data structure in PyBEL.

Because BEL expresses how biological entities interact within many different contexts, with descriptive annotations, PyBEL represents data as a directed multi-graph by subclassing networkx.MultiDiGraph. Each node is an instance of a subclass of pybel.dsl.BaseEntity, and each edge has a stable key and an associated data dictionary for storing relevant contextual information.

The graph contains metadata for the PyBEL version, the BEL script metadata, the namespace definitions, the annotation definitions, and the warnings produced in analysis. As with any networkx graph, these attributes can be accessed through the graph property, as in my_graph.graph['my key']. Convenience properties for these attributes are outlined in the documentation for pybel.BELGraph.

This graph representation makes it much easier to answer complicated questions programmatically with Python code. Because Neo4j uses the same data structure, the data can be exported directly with pybel.to_neo4j(). Neo4j supports the Cypher query language, so the same queries can be written in an elegant and simple way.

Constants

These documents refer to many aspects of the data model using constants, which can be found in the top-level module pybel.constants.

Terms describing abundances, annotations, and other internal data are designated in pybel.constants with full-caps, such as pybel.constants.FUNCTION and pybel.constants.PROTEIN.

For normal usage, we suggest referring to values in dictionaries by these constants, in case the hard-coded strings behind these constants change.
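
As a minimal sketch of this advice, the snippet below reads a node dictionary through named constants instead of string literals. The string values are stand-ins assumed for illustration so that the snippet runs without PyBEL installed; in real code, import FUNCTION, NAMESPACE, NAME, and PROTEIN from pybel.constants. The helper is_protein is a hypothetical name, not part of PyBEL.

```python
# Stand-in values (assumptions for illustration only). In real code use:
#     from pybel.constants import FUNCTION, NAMESPACE, NAME, PROTEIN
FUNCTION = 'function'
NAMESPACE = 'namespace'
NAME = 'name'
PROTEIN = 'Protein'

node = {FUNCTION: PROTEIN, NAMESPACE: 'HGNC', NAME: 'GSK3B'}


def is_protein(data):
    """Check the node's function through the constants, not literal strings."""
    return data.get(FUNCTION) == PROTEIN


print(is_protein(node), node[NAMESPACE], node[NAME])
```

If the strings behind the constants ever change, code written this way keeps working, whereas a lookup such as node['function'] would silently break.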

Function Nomenclature

The following table shows PyBEL’s internal mapping from BEL functions to its own constants. This can be accessed programmatically via pybel.parser.language.abundance_labels.

=================================  ==========================  ============================
BEL Function                       PyBEL Constant              PyBEL DSL
=================================  ==========================  ============================
a(), abundance()                   pybel.constants.ABUNDANCE   pybel.dsl.Abundance
g(), geneAbundance()               pybel.constants.GENE        pybel.dsl.Gene
r(), rnaAbundance()                pybel.constants.RNA         pybel.dsl.Rna
m(), microRNAAbundance()           pybel.constants.MIRNA       pybel.dsl.MicroRna
p(), proteinAbundance()            pybel.constants.PROTEIN     pybel.dsl.Protein
bp(), biologicalProcess()          pybel.constants.BIOPROCESS  pybel.dsl.BiologicalProcess
path(), pathology()                pybel.constants.PATHOLOGY   pybel.dsl.Pathology
complex(), complexAbundance()      pybel.constants.COMPLEX     pybel.dsl.ComplexAbundance
composite(), compositeAbundance()  pybel.constants.COMPOSITE   pybel.dsl.CompositeAbundance
rxn(), reaction()                  pybel.constants.REACTION    pybel.dsl.Reaction
=================================  ==========================  ============================

Graph

class pybel.BELGraph(name=None, version=None, description=None, authors=None, contact=None, license=None, copyright=None, disclaimer=None, path=None)[source]

An extension to networkx.MultiDiGraph to represent BEL.

Initialize a BEL graph with its associated metadata.

__add__(other)[source]

Copy this graph and join another graph into the copy using pybel.struct.left_full_join().

Parameters

other (BELGraph) – Another BEL graph

Return type

BELGraph

Example usage:

>>> import pybel
>>> g = pybel.from_bel_script('...')
>>> h = pybel.from_bel_script('...')
>>> k = g + h
__iadd__(other)[source]

Join another graph into this one, in-place, using pybel.struct.left_full_join().

Parameters

other (BELGraph) – Another BEL graph

Return type

BELGraph

Example usage:

>>> import pybel
>>> g = pybel.from_bel_script('...')
>>> h = pybel.from_bel_script('...')
>>> g += h
__and__(other)[source]

Create a deep copy of this graph and left outer join another graph into it.

Uses pybel.struct.left_outer_join().

Parameters

other (BELGraph) – Another BEL graph

Return type

BELGraph

Example usage:

>>> import pybel
>>> g = pybel.from_bel_script('...')
>>> h = pybel.from_bel_script('...')
>>> k = g & h
__iand__(other)[source]

Join another graph into this one, in-place, using pybel.struct.left_outer_join().

Parameters

other (BELGraph) – Another BEL graph

Return type

BELGraph

Example usage:

>>> import pybel
>>> g = pybel.from_bel_script('...')
>>> h = pybel.from_bel_script('...')
>>> g &= h
property path

The graph’s path, if it was derived from a BEL document.

Return type

Optional[str]

property name

The graph’s name.

Hint

Can be set with the SET DOCUMENT Name = "..." entry in the source BEL script.

Return type

Optional[str]

property version

The graph’s version.

Hint

Can be set with the SET DOCUMENT Version = "..." entry in the source BEL script.

Return type

Optional[str]

property description

The graph’s description.

Hint

Can be set with the SET DOCUMENT Description = "..." entry in the source BEL document.

Return type

Optional[str]

property authors

The graph’s authors.

Hint

Can be set with the SET DOCUMENT Authors = "..." entry in the source BEL document.

Return type

Optional[str]

property contact

The graph’s contact information.

Hint

Can be set with the SET DOCUMENT ContactInfo = "..." entry in the source BEL document.

Return type

Optional[str]

property license

The graph’s license.

Hint

Can be set with the SET DOCUMENT Licenses = "..." entry in the source BEL document.

Return type

Optional[str]

property copyright

The graph’s copyright.

Hint

Can be set with the SET DOCUMENT Copyright = "..." entry in the source BEL document.

Return type

Optional[str]

property disclaimer

The graph’s disclaimer.

Hint

Can be set with the SET DOCUMENT Disclaimer = "..." entry in the source BEL document.

Return type

Optional[str]

property namespace_url

The mapping from the keywords used in this graph to their respective BEL namespace URLs.

Hint

Can be appended with the DEFINE NAMESPACE [key] AS URL "[value]" entries in the definitions section of the source BEL document.

Return type

Dict[str, str]

property defined_namespace_keywords

The set of all keywords defined as namespaces in this graph.

Return type

Set[str]

property namespace_pattern

The mapping from the namespace keywords used to create this graph to their regex patterns.

Hint

Can be appended with the DEFINE NAMESPACE [key] AS PATTERN "[value]" entries in the definitions section of the source BEL document.

Return type

Dict[str, str]

property annotation_url

The mapping from the annotation keywords used to create this graph to the URLs of the BELANNO files.

Hint

Can be appended with the DEFINE ANNOTATION [key] AS URL "[value]" entries in the definitions section of the source BEL document.

Return type

Dict[str, str]

property annotation_pattern

The mapping from the annotation keywords used to create this graph to their regex patterns as strings.

Hint

Can be appended with the DEFINE ANNOTATION [key] AS PATTERN "[value]" entries in the definitions section of the source BEL document.

Return type

Dict[str, str]

property annotation_list

The mapping from the keywords of locally defined annotations to their respective sets of values.

Hint

Can be appended with the DEFINE ANNOTATION [key] AS LIST {"[value]", ...} entries in the definitions section of the source BEL document.

Return type

Dict[str, Set[str]]

property defined_annotation_keywords

Get the set of all keywords defined as annotations in this graph.

Return type

Set[str]

property pybel_version

The version of PyBEL with which this graph was produced as a string.

Return type

str

property warnings

A list of warnings associated with this graph.

Return type

List[Tuple[Optional[str], BELParserWarning, Mapping]]

number_of_warnings()[source]

Return the number of warnings.

Return type

int

number_of_citations()[source]

Return the number of citations contained within the graph.

Return type

int

number_of_authors()[source]

Return the number of authors contained within the graph.

Return type

int

add_unqualified_edge(u, v, relation)[source]

Add a unique edge that has no annotations.

Parameters
  • u (BaseEntity) – The source node

  • v (BaseEntity) – The target node

  • relation (str) – A relationship label from pybel.constants

Return type

str

Returns

The key for this edge (a unique hash)

add_transcription(gene, rna)[source]

Add a transcription relation from a gene to an RNA or miRNA node.

Parameters
  • gene (Gene) – A gene node

  • rna (Union[Rna, MicroRna]) – An RNA or microRNA node

Return type

str

add_translation(rna, protein)[source]

Add a translation relation from an RNA to a protein.

Parameters
  • rna (Rna) – An RNA node

  • protein (Protein) – A protein node

Return type

str

add_equivalence(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str

Add two equivalence relations for the nodes.

add_orthology(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str

Add two orthology relations for the nodes such that u orthologousTo v and v orthologousTo u.

add_is_a(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'isA') → str

Add an isA relationship such that u isA v.

add_part_of(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'partOf') → str

Add a partOf relationship such that u partOf v.

add_has_variant(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'hasVariant') → str

Add a hasVariant relationship such that u hasVariant v.

add_has_reactant(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'hasReactant') → str

Add a hasReactant relationship such that u hasReactant v.

add_has_product(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'hasProduct') → str

Add a hasProduct relationship such that u hasProduct v.

add_qualified_edge(u, v, *, relation, evidence, citation, annotations=None, subject_modifier=None, object_modifier=None, **attr)[source]

Add a qualified edge.

Qualified edges have a relation, evidence, citation, and optional annotations, subject modifications, and object modifications.

Parameters
  • u – The source node

  • v – The target node

  • relation (str) – The type of relation this edge represents

  • evidence (str) – The evidence string from an article

  • citation (Union[str, Tuple[str, str], CitationDict]) – The citation data dictionary for this evidence. If a string is given, assumes it’s a PubMed identifier and auto-fills the citation type.

  • annotations (Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None]) – The annotations data dictionary

  • subject_modifier (Optional[Mapping]) – The modifiers (like activity) on the subject node. See data model documentation.

  • object_modifier (Optional[Mapping]) – The modifiers (like activity) on the object node. See data model documentation.

Return type

str

Returns

The hash of the edge

add_binds(u, v, *, evidence, citation, annotations=None, **attr)[source]

Add a “binding” relationship between the two entities such that u => complex(u, v).

Return type

str

add_increases(u, v, *, relation: str = 'increases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str

Wrap add_qualified_edge() for the pybel.constants.INCREASES relation.

add_directly_increases(u, v, *, relation: str = 'directlyIncreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str

Add a pybel.constants.DIRECTLY_INCREASES with add_qualified_edge().

add_decreases(u, v, *, relation: str = 'decreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str

Add a pybel.constants.DECREASES relationship with add_qualified_edge().

add_directly_decreases(u, v, *, relation: str = 'directlyDecreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str

Add a pybel.constants.DIRECTLY_DECREASES relationship with add_qualified_edge().

add_association(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str

Add a pybel.constants.ASSOCIATION relationship with add_qualified_edge().

add_regulates(u, v, *, relation: str = 'regulates', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str

Add a pybel.constants.REGULATES relationship with add_qualified_edge().

add_correlation(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str

Add a pybel.constants.CORRELATION relationship with add_qualified_edge().

add_no_correlation(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str

Add a pybel.constants.NO_CORRELATION relationship with add_qualified_edge().

add_positive_correlation(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str

Add a pybel.constants.POSITIVE_CORRELATION relationship with add_qualified_edge().

add_negative_correlation(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str

Add a pybel.constants.NEGATIVE_CORRELATION relationship with add_qualified_edge().

add_causes_no_change(u, v, *, relation: str = 'causesNoChange', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str

Add a pybel.constants.CAUSES_NO_CHANGE relationship with add_qualified_edge().

add_inhibits(u, v, *, relation: str = 'decreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = {'modifier': 'Activity'}, **attr) → str

Add an “inhibits” relationship.

A more specific version of add_decreases() that automatically populates the object modifier with an activity.

add_activates(u, v, *, relation: str = 'increases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = {'modifier': 'Activity'}, **attr) → str

Add an “activates” relationship.

A more specific version of add_increases() that automatically populates the object modifier with an activity.

add_node_from_data(node)[source]

Add an entity to the graph.

Return type

None

has_edge_citation(u, v, key)[source]

Check if the given edge has a citation.

Return type

bool

has_edge_evidence(u, v, key)[source]

Check if the given edge has evidence.

Return type

bool

get_edge_citation(u, v, key)[source]

Get the citation for a given edge.

Return type

Optional[CitationDict]

get_edge_evidence(u, v, key)[source]

Get the evidence for a given edge.

Return type

Optional[str]

get_edge_annotations(u, v, key)[source]

Get the annotations for a given edge.

Return type

Optional[Mapping[str, Mapping[str, bool]]]

static node_to_bel(n)[source]

Serialize a node as BEL.

Return type

str

static edge_to_bel(u, v, edge_data, sep=None, use_identifiers=False)[source]

Serialize a pair of nodes and related edge data as a BEL relation.

Return type

str

iter_equivalent_nodes(node)[source]

Iterate over nodes that are equivalent to the given node, including the original.

Return type

Iterable[BaseEntity]

get_equivalent_nodes(node)[source]

Get a set of equivalent nodes to this node, excluding the given node.

Return type

Set[BaseEntity]

node_has_namespace(node, namespace)[source]

Check if the node has the given namespace.

This also should look in the equivalent nodes.

Return type

bool

summary_dict()[source]

Return a dictionary that summarizes the graph.

Return type

Mapping[str, float]

summary_str()[source]

Return a string that summarizes the graph.

Return type

str

summarize(file=None)[source]

Print a summary of the graph.

Return type

None

serialize(*, fmt='nodelink', file=None, **kwargs)[source]

Serialize the graph to an object or file if given.

For additional I/O, see the pybel.io module.

Nodes

Nodes (or entities) in a pybel.BELGraph represent physical entities’ abundances. Most contain information about the identifier for the entity using a namespace/name pair. The PyBEL parser converts BEL terms to an internal representation using an internal domain-specific language (DSL) that allows for writing BEL directly in Python.

For example, after the BEL term p(HGNC:GSK3B) is parsed, it is instantiated as a Python object using the DSL function corresponding to the p() function in BEL, pybel.dsl.Protein, like:

from pybel.dsl import Protein
gsk3b_protein = Protein(namespace='HGNC', name='GSK3B')

pybel.dsl.Protein, like the others mentioned before, inherits from pybel.dsl.BaseEntity, which itself inherits from dict. Therefore, the resulting object can be used like a dict that looks like:

from pybel.constants import *

{
    FUNCTION: PROTEIN,
    NAMESPACE: 'HGNC',
    NAME: 'GSK3B',
}

Alternatively, it can be used in more exciting ways, outlined later in the documentation for pybel.dsl.

Variants

The addition of a variant tag results in an entry called ‘variants’ in the data dictionary associated with a given node. This entry is a list with dictionaries describing each of the variants. All variants have the entry ‘kind’ to identify whether it is a post-translational modification (PTM), gene modification, fragment, or HGVS variant.

Warning

The canonical ordering for the elements of the VARIANTS list corresponds to the sorted order of their corresponding node tuples using pybel.parser.canonicalize.sort_dict_list(). Rather than directly modifying the BELGraph’s structure, use pybel.BELGraph.add_node_from_data(), which takes care of automatically canonicalizing this dictionary.

HGVS Variants

For example, the BEL term p(HGNC:GSK3B, var(p.Gly123Arg)) is translated to the following internal DSL:

from pybel.dsl import Protein, Hgvs
gsk3b_variant = Protein(namespace='HGNC', name='GSK3B', variants=Hgvs('p.Gly123Arg'))

Further, the shorthand for protein substitutions, pybel.dsl.ProteinSubstitution, can be used to produce the same result, as it inherits from pybel.dsl.Hgvs:

from pybel.dsl import Protein, ProteinSubstitution
gsk3b_variant = Protein(namespace='HGNC', name='GSK3B', variants=ProteinSubstitution('Gly', 123, 'Arg'))

Either way, the resulting object can be used like a dict that looks like:

from pybel.constants import *

{
    FUNCTION: PROTEIN,
    NAMESPACE: 'HGNC',
    NAME: 'GSK3B',
    VARIANTS: [
        {
            KIND: HGVS,
            IDENTIFIER: 'p.Gly123Arg',
        },
    ],
}

See also

  • BEL 2.0 specification on variants

  • HGVS conventions

  • PyBEL module pybel.parser.modifiers.get_hgvs_language

Gene Substitutions

Gene substitutions are legacy statements defined in BEL 1.0. BEL 2.0 recommends using HGVS strings. Luckily, the information contained in a BEL 1.0 encoding, such as g(HGNC:APP,sub(G,275341,C)) can be automatically translated to the appropriate HGVS g(HGNC:APP, var(c.275341G>C)), assuming that all substitutions are using the reference coding gene sequence for numbering and not the genomic reference. The previous statements both produce the underlying data:

from pybel.constants import *

{
    FUNCTION: GENE,
    NAMESPACE: 'HGNC',
    NAME: 'APP',
    VARIANTS: [
        {
            KIND: HGVS,
            IDENTIFIER: 'c.275341G>C',
        },
    ],
}
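
The BEL 1.0 to HGVS translation described in this section amounts to string formatting. The following is a minimal sketch, assuming coding-sequence numbering as stated in the text; the function name gene_substitution_to_hgvs is hypothetical and is not PyBEL's parser API.

```python
def gene_substitution_to_hgvs(reference, position, variant):
    """Format a BEL 1.0 gene substitution, sub(ref, pos, var), as an HGVS string.

    Assumes the position is numbered on the reference coding sequence.
    """
    return 'c.{}{}>{}'.format(position, reference, variant)


# g(HGNC:APP, sub(G, 275341, C)) -> g(HGNC:APP, var(c.275341G>C))
hgvs_identifier = gene_substitution_to_hgvs('G', 275341, 'C')
print(hgvs_identifier)
```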

See also

  • BEL 2.0 specification on gene substitutions

  • PyBEL module pybel.parser.modifiers.get_gene_substitution_language

Gene Modifications

PyBEL introduces the gene modification tag, gmod(), to allow for the encoding of epigenetic modifications. Its syntax follows the same style as the pmod() tags for proteins, and can include the following values:

  • M

  • Me

  • methylation

  • A

  • Ac

  • acetylation

For example, the node g(HGNC:GSK3B, gmod(M)) is represented with the following:

from pybel.constants import *

{
    FUNCTION: GENE,
    NAMESPACE: 'HGNC',
    NAME: 'GSK3B',
    VARIANTS: [
        {
            KIND: GMOD,
            IDENTIFIER: {
                NAMESPACE: BEL_DEFAULT_NAMESPACE,
                NAME: 'Me',
            },
        },
    ],
}

The addition of this function does not preclude the use of all other standard functions in BEL; however, other compilers probably won’t support these standards. If you agree that this is useful, please contribute to discussion in the OpenBEL community.

See also

  • PyBEL module pybel.parser.modifiers.get_gene_modification_language()

Protein Substitutions

Protein substitutions are legacy statements defined in BEL 1.0. BEL 2.0 recommends using HGVS strings. Luckily, the information contained in a BEL 1.0 encoding, such as p(HGNC:APP,sub(R,275,H)) can be automatically translated to the appropriate HGVS p(HGNC:APP, var(p.Arg275His)), assuming that all substitutions are using the reference protein sequence for numbering and not the genomic reference. The previous statements both produce the underlying data:

from pybel.constants import *

{
    FUNCTION: PROTEIN,
    NAMESPACE: 'HGNC',
    NAME: 'APP',
    VARIANTS: [
        {
            KIND: HGVS,
            IDENTIFIER: 'p.Arg275His',
        },
    ],
}
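
A sketch of how a legacy protein substitution could be rewritten as an HGVS string is shown here. The one-letter to three-letter amino acid table is standard, but the function name protein_substitution_to_hgvs is hypothetical and the code is not taken from PyBEL's parser.

```python
# Standard one-letter to three-letter amino acid codes.
AMINO_ACIDS = {
    'A': 'Ala', 'R': 'Arg', 'N': 'Asn', 'D': 'Asp', 'C': 'Cys',
    'Q': 'Gln', 'E': 'Glu', 'G': 'Gly', 'H': 'His', 'I': 'Ile',
    'L': 'Leu', 'K': 'Lys', 'M': 'Met', 'F': 'Phe', 'P': 'Pro',
    'S': 'Ser', 'T': 'Thr', 'W': 'Trp', 'Y': 'Tyr', 'V': 'Val',
}


def protein_substitution_to_hgvs(reference, position, variant):
    """Format sub(ref, pos, var) on a protein as an HGVS protein string.

    One-letter codes are expanded; three-letter codes pass through unchanged.
    """
    ref = AMINO_ACIDS.get(reference, reference)
    var = AMINO_ACIDS.get(variant, variant)
    return 'p.{}{}{}'.format(ref, position, var)


# p(HGNC:APP, sub(R, 275, H)) -> p(HGNC:APP, var(p.Arg275His))
print(protein_substitution_to_hgvs('R', 275, 'H'))
```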

See also

  • BEL 2.0 specification on protein substitutions

  • PyBEL module pybel.parser.modifiers.get_protein_substitution_language

Protein Modifications

The addition of a post-translational modification (PTM) tag results in an entry called ‘variants’ in the data dictionary associated with a given node. This entry is a list with dictionaries describing each of the variants. All variants have the entry ‘kind’ to identify whether it is a PTM, gene modification, fragment, or HGVS variant. The ‘kind’ value for PTM is ‘pmod’.

Each PMOD contains an identifier, which is a dictionary with the namespace and name, and can optionally include the position (‘pos’) and/or amino acid code (‘code’).

For example, the node p(HGNC:GSK3B, pmod(P, S, 9)) is represented with the following:

from pybel.constants import *

{
    FUNCTION: PROTEIN,
    NAMESPACE: 'HGNC',
    NAME: 'GSK3B',
    VARIANTS: [
        {
            KIND: PMOD,
            IDENTIFIER: {
                NAMESPACE: BEL_DEFAULT_NAMESPACE,
                NAME: 'Ph',
            },
            PMOD_CODE: 'Ser',
            PMOD_POSITION: 9,
        },
    ],
}

As an additional example, in p(HGNC:MAPK1, pmod(Ph, Thr, 202), pmod(Ph, Tyr, 204)), MAPK1 is phosphorylated twice to become active. This results in the following:

from pybel.constants import *

{
    FUNCTION: PROTEIN,
    NAMESPACE: 'HGNC',
    NAME: 'MAPK1',
    VARIANTS: [
        {
            KIND: PMOD,
            IDENTIFIER: {
                NAMESPACE: BEL_DEFAULT_NAMESPACE,
                NAME: 'Ph',
            },
            PMOD_CODE: 'Thr',
            PMOD_POSITION: 202,
        },
        {
            KIND: PMOD,
            IDENTIFIER: {
                NAMESPACE: BEL_DEFAULT_NAMESPACE,
                NAME: 'Ph',
            },
            PMOD_CODE: 'Tyr',
            PMOD_POSITION: 204,
        },
    ],
}

See also

  • BEL 2.0 specification on protein modifications

  • PyBEL module pybel.parser.modifiers.get_protein_modification_language

Protein Truncations

Truncations in the legacy BEL 1.0 specification are automatically translated to BEL 2.0 with HGVS nomenclature. p(HGNC:AKT1, trunc(40)) becomes p(HGNC:AKT1, var(p.40*)) and is represented with the following dictionary:

from pybel.constants import *

{
    FUNCTION: PROTEIN,
    NAMESPACE: 'HGNC',
    NAME: 'AKT1',
    VARIANTS: [
        {
            KIND: HGVS,
            IDENTIFIER: 'p.40*',
        },
    ],
}

Unfortunately, the HGVS nomenclature requires the encoding of the terminal amino acid which is exchanged for a stop codon, and this information is not required by BEL 1.0. For this example, the proper encoding of the truncation at position 40 also includes the information that the 40th amino acid in AKT1 is Cys. Its BEL encoding should be p(HGNC:AKT1, var(p.Cys40*)). Temporary support has been added to compile these statements, but it’s recommended that they be upgraded by reexamining the supporting text or looking up the amino acid sequence.
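
The difference between the temporary encoding and the proper one can be sketched as follows. The helper truncation_to_hgvs is a hypothetical name used only for illustration, and the optional amino_acid argument stands for the residue that a curator would look up from the supporting text or the sequence.

```python
def truncation_to_hgvs(position, amino_acid=None):
    """Format a BEL 1.0 trunc(position) as an HGVS protein string.

    Without the replaced residue, only the incomplete form 'p.<pos>*' can be
    produced. With its three-letter code, the full form 'p.<Aaa><pos>*' results.
    """
    return 'p.{}{}*'.format(amino_acid or '', position)


print(truncation_to_hgvs(40))         # incomplete form derived from BEL 1.0 alone
print(truncation_to_hgvs(40, 'Cys'))  # upgraded form once the residue is known
```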

See also

  • BEL 2.0 specification on truncations

  • PyBEL module pybel.parser.modifiers.get_truncation_language

Protein Fragments

The addition of a fragment results in an entry called pybel.constants.VARIANTS in the data dictionary associated with a given node. This entry is a list with dictionaries describing each of the variants. All variants have the entry pybel.constants.KIND to identify whether it is a PTM, gene modification, fragment, or HGVS variant. The pybel.constants.KIND value for a fragment is pybel.constants.FRAGMENT.

Each fragment records its range with pybel.constants.FRAGMENT_START and pybel.constants.FRAGMENT_STOP or, when the range is unknown, with pybel.constants.FRAGMENT_MISSING.

For example, the node p(HGNC:GSK3B, frag(45_129)) is represented with the following:

from pybel.constants import *

{
    FUNCTION: PROTEIN,
    NAMESPACE: 'HGNC',
    NAME: 'GSK3B',
    VARIANTS: [
        {
            KIND: FRAGMENT,
            FRAGMENT_START: 45,
            FRAGMENT_STOP: 129,
        },
    ],
}

Additionally, nodes can have an asterisk (*) or question mark (?) representing unbound or unknown fragments, respectively.

A fragment may also be unknown, such as in the node p(HGNC:GSK3B, frag(?)). This is represented with the key pybel.constants.FRAGMENT_MISSING and the value of ‘?’ like:

from pybel.constants import *

{
    FUNCTION: PROTEIN,
    NAMESPACE: 'HGNC',
    NAME: 'GSK3B',
    VARIANTS: [
        {
            KIND: FRAGMENT,
            FRAGMENT_MISSING: '?',
        },
    ],
}
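
The two fragment shapes shown in this section can be produced from the text inside frag(...) with a small helper. This is only a sketch: the string values of the constants are stand-ins assumed for illustration (import the real ones from pybel.constants), and fragment_entry is a hypothetical function, not PyBEL's parser.

```python
# Stand-in values (assumptions); the real constants live in pybel.constants.
KIND = 'kind'
FRAGMENT = 'frag'
FRAGMENT_START = 'start'
FRAGMENT_STOP = 'stop'
FRAGMENT_MISSING = 'missing'


def _bound(text):
    """Convert a numeric bound to int; keep '?' and '*' as strings."""
    return int(text) if text.isdigit() else text


def fragment_entry(fragment_range):
    """Build a variant entry from the range text, e.g. '45_129' or '?'."""
    entry = {KIND: FRAGMENT}
    if fragment_range == '?':
        entry[FRAGMENT_MISSING] = '?'
        return entry
    start, stop = fragment_range.split('_')
    entry[FRAGMENT_START] = _bound(start)
    entry[FRAGMENT_STOP] = _bound(stop)
    return entry


known = fragment_entry('45_129')
unknown = fragment_entry('?')
print(known)
print(unknown)
```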

Fusions

Gene, RNA, miRNA, and protein fusions are all represented with the same underlying data structure. Below, it is shown with uppercase letters referring to constants from pybel.constants. For example, g(HGNC:BCR, fus(HGNC:JAK2, 1875, 2626)) is represented as:

from pybel.constants import *

{
    FUNCTION: GENE,
    FUSION: {
        PARTNER_5P: {NAMESPACE: 'HGNC', NAME: 'BCR'},
        PARTNER_3P: {NAMESPACE: 'HGNC', NAME: 'JAK2'},
        RANGE_5P: {
            FUSION_REFERENCE: 'c',
            FUSION_START: '?',
            FUSION_STOP: 1875,
        },
        RANGE_3P: {
            FUSION_REFERENCE: 'c',
            FUSION_START: 2626,
            FUSION_STOP: '?',
        },
    },
}
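
To make the mapping from the two breakpoints to the two ranges explicit, here is a sketch that assembles the same structure from the parts of the legacy statement. All string values are stand-ins assumed for illustration (the real constants are in pybel.constants), and legacy_gene_fusion is a hypothetical helper rather than a PyBEL function.

```python
# Stand-in values (assumptions); import the real ones from pybel.constants.
FUNCTION, NAMESPACE, NAME, GENE = 'function', 'namespace', 'name', 'Gene'
FUSION = 'fusion'
PARTNER_5P, PARTNER_3P = 'partner_5p', 'partner_3p'
RANGE_5P, RANGE_3P = 'range_5p', 'range_3p'
FUSION_REFERENCE, FUSION_START, FUSION_STOP = 'reference', 'left', 'right'


def _entity(curie):
    """Split 'HGNC:BCR' into a namespace/name dictionary."""
    namespace, name = curie.split(':', 1)
    return {NAMESPACE: namespace, NAME: name}


def legacy_gene_fusion(partner_5p, partner_3p, breakpoint_5p, breakpoint_3p):
    """Build the fusion dictionary for g(5p, fus(3p, breakpoint_5p, breakpoint_3p))."""
    return {
        FUNCTION: GENE,
        FUSION: {
            PARTNER_5P: _entity(partner_5p),
            PARTNER_3P: _entity(partner_3p),
            # The 5' partner contributes an unknown start up to its breakpoint.
            RANGE_5P: {FUSION_REFERENCE: 'c', FUSION_START: '?', FUSION_STOP: breakpoint_5p},
            # The 3' partner contributes from its breakpoint to an unknown stop.
            RANGE_3P: {FUSION_REFERENCE: 'c', FUSION_START: breakpoint_3p, FUSION_STOP: '?'},
        },
    }


bcr_jak2 = legacy_gene_fusion('HGNC:BCR', 'HGNC:JAK2', 1875, 2626)
print(bcr_jak2[FUSION][RANGE_5P], bcr_jak2[FUSION][RANGE_3P])
```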

See also

  • BEL 2.0 specification on fusions (2.6.1)

  • PyBEL module pybel.parser.modifiers.get_fusion_language

  • PyBEL module pybel.parser.modifiers.get_legacy_fusion_language

Unqualified Edges

Unqualified edges are automatically inferred by PyBEL and do not contain citations or supporting evidence.

Variant and Modifications’ Parent Relations

All variants, modifications, fragments, and truncations are connected to their parent entity with an edge having the relationship hasParent.

For p(HGNC:GSK3B, var(p.Gly123Arg)), the following edge is inferred:

p(HGNC:GSK3B, var(p.Gly123Arg)) hasParent p(HGNC:GSK3B)

All variants have this relationship to their reference node. BEL does not specify relationships between variants, such as the case when a given phosphorylation is necessary to make another one. Such knowledge could still be encoded directly, since PyBEL does not restrict users from manually asserting unqualified edges.
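
Conceptually, the parent of a variant node is the same dictionary without its variants entry. The sketch below illustrates that idea on plain dictionaries. The constant values are stand-ins assumed for illustration, and infer_parent_edge is a hypothetical helper; PyBEL performs this inference internally and its implementation may differ.

```python
# Stand-in values (assumptions); import the real ones from pybel.constants.
FUNCTION, NAMESPACE, NAME, PROTEIN = 'function', 'namespace', 'name', 'Protein'
VARIANTS, KIND, IDENTIFIER, HGVS = 'variants', 'kind', 'identifier', 'hgvs'


def infer_parent_edge(node):
    """Return (variant, relation, parent), or None if the node has no variants."""
    if not node.get(VARIANTS):
        return None
    parent = {key: value for key, value in node.items() if key != VARIANTS}
    return node, 'hasParent', parent


variant_node = {
    FUNCTION: PROTEIN,
    NAMESPACE: 'HGNC',
    NAME: 'GSK3B',
    VARIANTS: [{KIND: HGVS, IDENTIFIER: 'p.Gly123Arg'}],
}
child, relation, parent = infer_parent_edge(variant_node)
print(relation, parent)
```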

List Abundances

Complexes and composites are abundances defined by lists. As of version 0.9.0, they contain a list of the data dictionaries that describe their members. For example, complex(p(HGNC:FOS), p(HGNC:JUN)) becomes:

from pybel.constants import *

{
    FUNCTION: COMPLEX,
    MEMBERS: [
        {
            FUNCTION: PROTEIN,
            NAMESPACE: 'HGNC',
            NAME: 'FOS',
        }, {
            FUNCTION: PROTEIN,
            NAMESPACE: 'HGNC',
            NAME: 'JUN',
        }
    ]
}

The following edges are also inferred:

complex(p(HGNC:FOS), p(HGNC:JUN)) hasMember p(HGNC:FOS)
complex(p(HGNC:FOS), p(HGNC:JUN)) hasMember p(HGNC:JUN)

See also

BEL 2.0 specification on complex abundances

Similarly, composite(a(CHEBI:malonate), p(HGNC:JUN)) becomes:

from pybel.constants import *

{
    FUNCTION: COMPOSITE,
    MEMBERS: [
        {
            FUNCTION: ABUNDANCE,
            NAMESPACE: 'CHEBI',
            NAME: 'malonate',
        }, {
            FUNCTION: PROTEIN,
            NAMESPACE: 'HGNC',
            NAME: 'JUN',
        }
    ]
}

The following edges are inferred:

composite(a(CHEBI:malonate), p(HGNC:JUN)) hasComponent a(CHEBI:malonate)
composite(a(CHEBI:malonate), p(HGNC:JUN)) hasComponent p(HGNC:JUN)
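
The inferred edges for list abundances follow directly from the member list. As a rough sketch on plain dictionaries (constant values are stand-ins assumed for illustration, and infer_member_edges is a hypothetical helper, not PyBEL's own inference code):

```python
# Stand-in values (assumptions); import the real ones from pybel.constants.
FUNCTION, NAMESPACE, NAME, MEMBERS = 'function', 'namespace', 'name', 'members'
COMPLEX, COMPOSITE, PROTEIN = 'Complex', 'Composite', 'Protein'

# Relation labels as used in this section.
RELATIONS = {COMPLEX: 'hasMember', COMPOSITE: 'hasComponent'}


def infer_member_edges(node):
    """List one (relation, namespace, name) triple per member of the node."""
    relation = RELATIONS[node[FUNCTION]]
    return [
        (relation, member[NAMESPACE], member[NAME])
        for member in node[MEMBERS]
    ]


fos_jun = {
    FUNCTION: COMPLEX,
    MEMBERS: [
        {FUNCTION: PROTEIN, NAMESPACE: 'HGNC', NAME: 'FOS'},
        {FUNCTION: PROTEIN, NAMESPACE: 'HGNC', NAME: 'JUN'},
    ],
}
member_edges = infer_member_edges(fos_jun)
print(member_edges)
```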

Warning

The canonical ordering for the elements of the pybel.constants.MEMBERS list corresponds to the sorted order of their corresponding node tuples using pybel.parser.canonicalize.sort_dict_list(). Rather than directly modifying the BELGraph’s structure, use BELGraph.add_node_from_data(), which takes care of automatically canonicalizing this dictionary.

See also

BEL 2.0 specification on composite abundances

Reactions

Using a reaction causes many nodes and edges to be created. The following example illustrates what is added to the network for:

rxn(reactants(a(CHEBI:"(3S)-3-hydroxy-3-methylglutaryl-CoA"), a(CHEBI:"NADPH"), \
    a(CHEBI:"hydron")), products(a(CHEBI:"mevalonate"), a(CHEBI:"NADP(+)")))

As of version 0.9.0, the reactants’ and products’ data dictionaries are included as sub-lists keyed REACTANTS and PRODUCTS. It becomes:

from pybel.constants import *

{
    FUNCTION: REACTION,
    REACTANTS: [
        {
            FUNCTION: ABUNDANCE,
            NAMESPACE: 'CHEBI',
            NAME: '(3S)-3-hydroxy-3-methylglutaryl-CoA'
        }, {
            FUNCTION: ABUNDANCE,
            NAMESPACE: 'CHEBI',
            NAME: 'NADPH',
        }, {
            FUNCTION: ABUNDANCE,
            NAMESPACE: 'CHEBI',
            NAME: 'hydron',
        }
    ],
    PRODUCTS: [
        {
            FUNCTION: ABUNDANCE,
            NAMESPACE: 'CHEBI',
            NAME: 'mevalonate',
        }, {
            FUNCTION: ABUNDANCE,
            NAMESPACE: 'CHEBI',
            NAME: 'NADP(+)',
        }
    ]
}

Warning

The canonical ordering for the elements of the REACTANTS and PRODUCTS lists corresponds to the sorted order of their corresponding node tuples using pybel.parser.canonicalize.sort_dict_list(). Rather than directly modifying the BELGraph’s structure, use BELGraph.add_node_from_data(), which takes care of automatically canonicalizing this dictionary.

The following edges are inferred, where X represents the previous reaction, for brevity:

X hasReactant a(CHEBI:"(3S)-3-hydroxy-3-methylglutaryl-CoA")
X hasReactant a(CHEBI:"NADPH")
X hasReactant a(CHEBI:"hydron")
X hasProduct a(CHEBI:"mevalonate")
X hasProduct a(CHEBI:"NADP(+)")
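
The inference of these edges can be sketched as a walk over the reaction's data dictionary. This is a minimal illustration, not PyBEL's own code: the string keys are assumed stand-ins for the pybel.constants values, and infer_reaction_edges is a hypothetical helper name.

```python
# Stand-in keys; real code should import these from pybel.constants.
NAMESPACE, NAME = 'namespace', 'name'
REACTANTS, PRODUCTS = 'reactants', 'products'


def infer_reaction_edges(reaction):
    """List (relation, namespace, name) triples implied by a reaction dictionary."""
    edges = []
    for key, relation in ((REACTANTS, 'hasReactant'), (PRODUCTS, 'hasProduct')):
        for member in reaction.get(key, []):
            edges.append((relation, member[NAMESPACE], member[NAME]))
    return edges


reaction = {
    REACTANTS: [
        {NAMESPACE: 'CHEBI', NAME: '(3S)-3-hydroxy-3-methylglutaryl-CoA'},
        {NAMESPACE: 'CHEBI', NAME: 'NADPH'},
        {NAMESPACE: 'CHEBI', NAME: 'hydron'},
    ],
    PRODUCTS: [
        {NAMESPACE: 'CHEBI', NAME: 'mevalonate'},
        {NAMESPACE: 'CHEBI', NAME: 'NADP(+)'},
    ],
}

inferred = infer_reaction_edges(reaction)
for relation, namespace, name in inferred:
    print('X {} a({}:"{}")'.format(relation, namespace, name))
```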

See also

BEL 2.0 specification on reactions

Edges

Design Choices

In the OpenBEL Framework, modifiers such as activities (kinaseActivity, etc.) and transformations (translocations, degradations, etc.) were represented as their own nodes. In PyBEL, these modifiers are represented as a property of the edge. In reality, an edge like sec(p(HGNC:A)) -> activity(p(HGNC:B), ma(kinaseActivity)) represents a connection between HGNC:A and HGNC:B. Each of these modifiers explains the context of the relationship between these physical entities. Further, querying a network where these modifiers are part of a relationship is much more straightforward. For example, finding all proteins that are upregulated by the kinase activity of another protein can now be done directly by filtering for edges with a subject modifier whose modification is molecular activity and whose effect is kinase activity. Having fewer nodes also makes a network much easier to display and interpret visually. The information about the modifiers on the subject and object can be displayed as a color-coded source and terminus of the connecting edge.

The compiler in the OpenBEL Framework created nodes for molecular activities like kin(p(HGNC:YFG)) and induced an edge like p(HGNC:YFG) actsIn kin(p(HGNC:YFG)). For transformations, a statement like tloc(p(HGNC:YFG), GOCC:intracellular, GOCC:"cell membrane") also induced tloc(p(HGNC:YFG), GOCC:intracellular, GOCC:"cell membrane") translocates p(HGNC:YFG).

In PyBEL, we recognize that these modifications are actually annotations to the type of relationship between the subject’s entity and the object’s entity. p(HGNC:ABC) -> tloc(p(HGNC:YFG), GOCC:intracellular, GOCC:"cell membrane") is about the relationship between p(HGNC:ABC) and p(HGNC:YFG), while the information about the translocation qualifies that the object is undergoing an event, and is not just the abundance. This is a point of confusion arising from the use of proteinAbundance as a keyword, and is perhaps why many people prefer to use just the keyword p.
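
The kinase-activity query described above can be sketched as a filter over edge data dictionaries. The example below works on a plain list of (source, target, data) triples so that it runs without PyBEL. With a real graph the same triples would presumably come from iterating graph.edges(data=True), as with any networkx.MultiDiGraph. The string values are assumed stand-ins for the pybel.constants values, and the helper name is hypothetical.

```python
# Stand-in keys and values; real code should import these from pybel.constants.
SUBJECT, RELATION = 'subject', 'relation'
MODIFIER, EFFECT, NAME = 'modifier', 'effect', 'name'
ACTIVITY = 'Activity'
INCREASES, DIRECTLY_INCREASES = 'increases', 'directlyIncreases'


def upregulated_by_kinase_activity(edges):
    """Yield (source, target) pairs where the source's kinase activity increases the target."""
    for source, target, data in edges:
        subject = data.get(SUBJECT, {})
        if (
            data.get(RELATION) in {INCREASES, DIRECTLY_INCREASES}
            and subject.get(MODIFIER) == ACTIVITY
            and subject.get(EFFECT, {}).get(NAME) == 'kin'
        ):
            yield source, target


edges = [
    # kin(p(HGNC:A)) -> p(HGNC:B): matches the query
    ('p(HGNC:A)', 'p(HGNC:B)', {
        SUBJECT: {MODIFIER: ACTIVITY, EFFECT: {NAME: 'kin'}},
        RELATION: INCREASES,
    }),
    # p(HGNC:C) -> p(HGNC:B): no subject modifier
    ('p(HGNC:C)', 'p(HGNC:B)', {RELATION: INCREASES}),
    # act(p(HGNC:D)) => p(HGNC:E): an activity, but without a molecular activity
    ('p(HGNC:D)', 'p(HGNC:E)', {
        SUBJECT: {MODIFIER: ACTIVITY},
        RELATION: DIRECTLY_INCREASES,
    }),
]

hits = list(upregulated_by_kinase_activity(edges))
print(hits)
```

Because the modifier lives on the edge, the filter never has to traverse intermediate activity nodes.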

Example Edge Data Structure

Because this data is associated with an edge, the node data for the subject and object are not included explicitly. However, information about the activities, modifiers, and transformations on the subject and object are included. Below is the “skeleton” for the edge data model in PyBEL:

from pybel.constants import *

{
    SUBJECT: {
        # ... modifications to the subject node. Only present if non-empty.
    },
    RELATION: POSITIVE_CORRELATION,
    OBJECT: {
        # ... modifications to the object node. Only present if non-empty.
    },
    EVIDENCE: ...,
    CITATION: {
        CITATION_TYPE: CITATION_TYPE_PUBMED,
        CITATION_REFERENCE: ...,
        CITATION_DATE: 'YYYY-MM-DD',
        CITATION_AUTHORS: 'Jon Snow|John Doe',
    },
    ANNOTATIONS: {
        'Disease': {
            'Colorectal Cancer': True,
        },
        # ... additional annotations as tuple[str,dict[str,bool]] pairs
    },
}

Each edge must contain the RELATION, EVIDENCE, and CITATION entries. The CITATION must minimally contain CITATION_TYPE and CITATION_REFERENCE since these can be used to look up additional metadata.
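
These requirements can be checked with a few lines of Python. The sketch below is a hypothetical helper, not part of PyBEL. The string keys are assumed stand-ins for the pybel.constants values, and the citation reference is a placeholder.

```python
# Stand-in keys; real code should import these from pybel.constants.
RELATION, EVIDENCE, CITATION = 'relation', 'evidence', 'citation'
CITATION_TYPE, CITATION_REFERENCE = 'type', 'reference'


def missing_edge_entries(data):
    """Return the names of required entries that are absent from an edge dictionary."""
    missing = [key for key in (RELATION, EVIDENCE, CITATION) if key not in data]
    if CITATION in data:
        # Only inspect the citation's own entries when a citation is present.
        missing.extend(
            '{}.{}'.format(CITATION, key)
            for key in (CITATION_TYPE, CITATION_REFERENCE)
            if key not in data[CITATION]
        )
    return missing


complete = {
    RELATION: 'positiveCorrelation',
    EVIDENCE: 'Example evidence text',
    CITATION: {CITATION_TYPE: 'PubMed', CITATION_REFERENCE: 'placeholder-id'},
}
incomplete = {
    RELATION: 'positiveCorrelation',
    CITATION: {CITATION_TYPE: 'PubMed'},
}

print(missing_edge_entries(complete))
print(missing_edge_entries(incomplete))
```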

Note

Since version 0.10.2, annotations always appear as dictionaries, even if only one value is present.

Activities

Modifiers are added to this structure as well. Under this schema, p(HGNC:GSK3B, pmod(P, S, 9)) pos act(p(HGNC:GSK3B), ma(kin)) becomes:

from pybel.constants import *

{
    RELATION: POSITIVE_CORRELATION,
    OBJECT: {
        MODIFIER: ACTIVITY,
        EFFECT: {
            NAME: 'kin',
            NAMESPACE: BEL_DEFAULT_NAMESPACE,
        }
    },
    CITATION: { ... },
    EVIDENCE: ...,
    ANNOTATIONS: { ... },
}

Activities without molecular activity annotations do not contain a pybel.constants.EFFECT entry. Under this schema, p(HGNC:GSK3B, pmod(P, S, 9)) pos act(p(HGNC:GSK3B)) becomes:

from pybel.constants import *

{
    RELATION: POSITIVE_CORRELATION,
    OBJECT: {
        MODIFIER: ACTIVITY
    },
    CITATION: { ... },
    EVIDENCE: ...,
    ANNOTATIONS: { ... },
}
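
Code that reads activities therefore has to tolerate a missing EFFECT entry. A minimal sketch of such a reader follows. The string values are assumed stand-ins for the pybel.constants values, and object_activity is a hypothetical helper name.

```python
# Stand-in keys and values; real code should import these from pybel.constants.
OBJECT, MODIFIER, EFFECT, NAME = 'object', 'modifier', 'effect', 'name'
ACTIVITY = 'Activity'


def object_activity(data):
    """Return (is_activity, molecular_activity_name_or_None) for an edge's object."""
    obj = data.get(OBJECT, {})
    if obj.get(MODIFIER) != ACTIVITY:
        return False, None
    effect = obj.get(EFFECT)  # absent when no ma() annotation was given
    return True, (effect[NAME] if effect else None)


with_ma = {OBJECT: {MODIFIER: ACTIVITY, EFFECT: {NAME: 'kin'}}}
without_ma = {OBJECT: {MODIFIER: ACTIVITY}}
unmodified = {}

for edge_data in (with_ma, without_ma, unmodified):
    print(object_activity(edge_data))
```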

Locations

Location data is also added to the edge, under the entry for the node (subject or object) on which it was annotated. p(HGNC:GSK3B, pmod(P, S, 9), loc(GO:lysosome)) pos act(p(HGNC:GSK3B), ma(kin)) becomes:

from pybel.constants import *

{
    SUBJECT: {
        LOCATION: {
            NAMESPACE: 'GO',
            NAME: 'lysosome',
        }
    },
    RELATION: POSITIVE_CORRELATION,
    OBJECT: {
        MODIFIER: ACTIVITY,
        EFFECT: {
            NAMESPACE: BEL_DEFAULT_NAMESPACE,
            NAME: 'kin',
        }
    },
    EVIDENCE: ...,
    CITATION: { ... },
}

The addition of the location() element in BEL 2.0 allows for the unambiguous expression of the differences between the process of hypothetical HGNC:A moving from one place to another and the existence of hypothetical HGNC:A in a specific location having different effects. In BEL 1.0, this action had its own node, but this introduced unnecessary complexity to the network and made querying more difficult. This calls for thoughtful consideration of the following two statements:

  • tloc(p(HGNC:A), fromLoc(GO:intracellular), toLoc(GO:"cell membrane")) -> p(HGNC:B)

  • p(HGNC:A, location(GO:"cell membrane")) -> p(HGNC:B)
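
In the edge data, the difference between these two statements shows up in the subject entry: the first carries a translocation modifier, the second only a location. The sketch below is a hypothetical classifier over hand-written dictionaries. The string values are assumed stand-ins for the pybel.constants values.

```python
# Stand-in keys and values; real code should import these from pybel.constants.
SUBJECT, RELATION = 'subject', 'relation'
MODIFIER, EFFECT, LOCATION = 'modifier', 'effect', 'location'
FROM_LOC, TO_LOC = 'fromLoc', 'toLoc'
NAMESPACE, NAME = 'namespace', 'name'
TRANSLOCATION, INCREASES = 'Translocation', 'increases'


def classify_subject(data):
    """Say whether an edge's subject is moving, merely located somewhere, or neither."""
    subject = data.get(SUBJECT, {})
    if subject.get(MODIFIER) == TRANSLOCATION:
        return 'translocation'
    if LOCATION in subject:
        return 'location'
    return 'unmodified'


# tloc(p(HGNC:A), fromLoc(GO:intracellular), toLoc(GO:"cell membrane")) -> p(HGNC:B)
moving = {
    SUBJECT: {
        MODIFIER: TRANSLOCATION,
        EFFECT: {
            FROM_LOC: {NAMESPACE: 'GO', NAME: 'intracellular'},
            TO_LOC: {NAMESPACE: 'GO', NAME: 'cell membrane'},
        },
    },
    RELATION: INCREASES,
}

# p(HGNC:A, location(GO:"cell membrane")) -> p(HGNC:B)
located = {
    SUBJECT: {LOCATION: {NAMESPACE: 'GO', NAME: 'cell membrane'}},
    RELATION: INCREASES,
}

print(classify_subject(moving), classify_subject(located))
```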

Translocations

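Translocations have their own syntax: the EFFECT entry holds the FROM_LOC and TO_LOC of the move. Because sec() is BEL shorthand for a translocation from the intracellular compartment to the extracellular space, p(HGNC:YFG1) -> sec(p(HGNC:YFG2)) becomes: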

from pybel.constants import *

{
    RELATION: INCREASES,
    OBJECT: {
        MODIFIER: TRANSLOCATION,
        EFFECT: {
            FROM_LOC: {
                NAMESPACE: 'GO',
                NAME: 'intracellular',
            },
            TO_LOC: {
                NAMESPACE: 'GO',
                NAME: 'extracellular space',
            }
        }
    },
    CITATION: { ... },
    EVIDENCE: ...,
    ANNOTATIONS: { ... },
}
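
Going the other way, the FROM_LOC and TO_LOC entries carry enough information to write the translocation arguments back out. The following is a minimal sketch under stated assumptions: the string values stand in for the pybel.constants values, both helper names are hypothetical, and names are always quoted for simplicity.

```python
# Stand-in keys and values; real code should import these from pybel.constants.
MODIFIER, EFFECT = 'modifier', 'effect'
FROM_LOC, TO_LOC = 'fromLoc', 'toLoc'
NAMESPACE, NAME = 'namespace', 'name'
TRANSLOCATION = 'Translocation'


def format_location(loc):
    """Render a location dictionary as NAMESPACE:"NAME"."""
    return '{}:"{}"'.format(loc[NAMESPACE], loc[NAME])


def translocation_arguments(modifier):
    """Render the fromLoc()/toLoc() arguments of a translocation modifier."""
    if modifier.get(MODIFIER) != TRANSLOCATION:
        raise ValueError('not a translocation modifier')
    effect = modifier[EFFECT]
    return 'fromLoc({}), toLoc({})'.format(
        format_location(effect[FROM_LOC]),
        format_location(effect[TO_LOC]),
    )


secretion = {
    MODIFIER: TRANSLOCATION,
    EFFECT: {
        FROM_LOC: {NAMESPACE: 'GO', NAME: 'intracellular'},
        TO_LOC: {NAMESPACE: 'GO', NAME: 'extracellular space'},
    },
}

rendered = translocation_arguments(secretion)
print(rendered)
```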

See also

BEL 2.0 specification on translocations

Degradations

Degradations are simpler, because there’s no pybel.constants.EFFECT entry. p(HGNC:YFG1) -> deg(p(HGNC:YFG2)) becomes:

from pybel.constants import *

{
    RELATION: INCREASES,
    OBJECT: {
        MODIFIER: DEGRADATION,
    },
    CITATION: { ... },
    EVIDENCE: ...,
    ANNOTATIONS: { ... },
}

See also

BEL 2.0 specification on degradations