Data Model
The pybel.struct module houses functions for handling the main data structure in PyBEL.
Because BEL expresses how biological entities interact within many different contexts, with descriptive annotations, PyBEL represents data as a directed multi-graph by sub-classing networkx.MultiDiGraph. Each node is an instance of a subclass of pybel.dsl.BaseEntity, and each edge has a stable key and an associated data dictionary for storing relevant contextual information.
The graph contains metadata for the PyBEL version, the BEL script metadata, the namespace definitions, the annotation definitions, and the warnings produced in analysis. As with any networkx graph, these graph-level attributes can be accessed through the graph property, as in my_graph.graph['my key'].
Convenient property definitions for these attributes are outlined in the documentation for pybel.BELGraph.
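To make the idea of a directed multi-graph with keyed edges and graph-level metadata concrete, here is a minimal standard-library sketch. It is not PyBEL's or networkx's implementation; the class TinyMultiDiGraph, the edge keys, and the node labels are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class TinyMultiDiGraph:
    """Toy directed multigraph: several keyed edges may join the same ordered node pair."""

    def __init__(self, **metadata):
        # Graph-level attributes live in a plain dict, mirroring the ``graph`` property.
        self.graph = dict(metadata)
        # (source, target) -> {edge key: edge data dictionary}
        self.edges = defaultdict(dict)

    def add_edge(self, u, v, key, **data):
        """Store an edge data dictionary under a stable key and return that key."""
        self.edges[(u, v)][key] = data
        return key


g = TinyMultiDiGraph(name="example", version="1.0.0")
g.add_edge("p(X)", "p(Y)", key="e1", relation="increases")
g.add_edge("p(X)", "p(Y)", key="e2", relation="association")

print(g.graph["name"])
print(sorted(g.edges[("p(X)", "p(Y)")]))
```

Two edges between the same pair of nodes stay distinct because each is stored under its own key.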
This allows for much easier programmatic access to answer more complicated questions, which can be written in Python code. Because the data structure is the same in Neo4j, the data can be directly exported with pybel.to_neo4j().
Neo4j supports the Cypher query language, so the same queries can be written in an elegant and simple way.
Constants
These documents refer to many aspects of the data model using constants, which can be found in the top-level module pybel.constants.
Terms describing abundances, annotations, and other internal data are designated in pybel.constants with full-caps, such as pybel.constants.FUNCTION and pybel.constants.PROTEIN.
For normal usage, we suggest referring to values in dictionaries by these constants, in case the hard-coded strings behind these constants change.
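The sketch below illustrates this advice. The constant names mirror those in pybel.constants, but they are redefined locally and their string values are assumptions made only so that the example runs without PyBEL installed.

```python
# Hypothetical stand-ins for names exported by pybel.constants.
# The string values are assumptions for this sketch, not guaranteed to match PyBEL.
FUNCTION = "function"
NAMESPACE = "namespace"
NAME = "name"
PROTEIN = "Protein"

node = {FUNCTION: PROTEIN, NAMESPACE: "hgnc", NAME: "GSK3B"}

# Look values up through the constants rather than through literal strings,
# so the code keeps working if the underlying strings ever change.
is_protein = node[FUNCTION] == PROTEIN
label = f"{node[NAMESPACE]}:{node[NAME]}"

print(is_protein, label)
```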
Function Nomenclature
The following table shows PyBEL’s internal mapping from BEL functions to its own constants. This can be accessed programmatically via pybel.parser.language.abundance_labels.
BEL Function | PyBEL Constant | PyBEL DSL
---|---|---
Graph
- class pybel.BELGraph(name=None, version=None, description=None, authors=None, contact=None, license=None, copyright=None, disclaimer=None, path=None)[source]
An extension to networkx.MultiDiGraph to represent BEL.
Initialize a BEL graph with its associated metadata.
- Parameters
version (Optional[str]) – The graph’s version. Recommended to use semantic versioning or YYYYMMDD format.
- __add__(other)[source]
Copy this graph and join another graph into the copy using pybel.struct.left_full_join().
- Parameters
other (BELGraph) – Another BEL graph
Example usage:
>>> from pybel.examples import ras_tloc_graph, braf_graph
>>> k = ras_tloc_graph + braf_graph
- Return type
- __iadd__(other)[source]
Join another graph into this one, in-place, using pybel.struct.left_full_join().
- Parameters
other (BELGraph) – Another BEL graph
Example usage:
>>> from pybel.examples import ras_tloc_graph, braf_graph
>>> ras_tloc_graph += braf_graph
- Return type
- __and__(other)[source]
Create a deep copy of this graph and left outer join another graph into it.
Uses pybel.struct.left_outer_join().
- Parameters
other (BELGraph) – Another BEL graph
Example usage:
>>> from pybel.examples import ras_tloc_graph, braf_graph
>>> k = ras_tloc_graph & braf_graph
- Return type
- __iand__(other)[source]
Join another graph into this one, in-place, using pybel.struct.left_outer_join().
- Parameters
other (BELGraph) – Another BEL graph
Example usage:
>>> from pybel.examples import ras_tloc_graph, braf_graph
>>> ras_tloc_graph &= braf_graph
- Return type
- transitivities: Set[Tuple[str, str]]
A set of pairs of hashes of edges over which there is transitivity. For example, the nested statement (P(X) -> P(Y)) -> P(Z) will have a pair for (hash(P(X) -> P(Y)), hash(P(Y) -> P(Z))).
- parent
A reference to the parent graph
- property count: pybel.struct.graph.CountDispatch
A dispatch to count functions.
Can be used like this:
>>> from pybel.examples import sialic_acid_graph
>>> sialic_acid_graph.count.functions()
Counter({'Protein': 7, 'Complex': 1, 'Abundance': 1})
- Return type
- property summarize: pybel.struct.graph.SummarizeDispatch
A dispatch to summarize the graph.
- Return type
- property expand: pybel.struct.graph.ExpandDispatch
A dispatch to expand the graph w.r.t. its parent.
- Return type
- property induce: pybel.struct.graph.InduceDispatch
A dispatch to mutate the graph.
- Return type
- property plot: pybel.struct.graph.PlotDispatch
A dispatch to plot the graph using matplotlib and seaborn.
- Return type
- property name: Optional[str]
The graph’s name.
Hint
Can be set with the SET DOCUMENT Name = "..." entry in the source BEL script.
- property version: Optional[str]
The graph’s version.
Hint
Can be set with the SET DOCUMENT Version = "..." entry in the source BEL script.
- property description: Optional[str]
The graph’s description.
Hint
Can be set with the SET DOCUMENT Description = "..." entry in the source BEL document.
- property authors: Optional[str]
The graph’s authors.
Hint
Can be set with the SET DOCUMENT Authors = "..." entry in the source BEL document.
- property contact: Optional[str]
The graph’s contact information.
Hint
Can be set with the SET DOCUMENT ContactInfo = "..." entry in the source BEL document.
- property license: Optional[str]
The graph’s license.
Hint
Can be set with the SET DOCUMENT Licenses = "..." entry in the source BEL document.
- property copyright: Optional[str]
The graph’s copyright.
Hint
Can be set with the SET DOCUMENT Copyright = "..." entry in the source BEL document.
- property disclaimer: Optional[str]
The graph’s disclaimer.
Hint
Can be set with the SET DOCUMENT Disclaimer = "..." entry in the source BEL document.
- property namespace_url: Dict[str, str]
The mapping from the keywords used in this graph to their respective BEL namespace URLs.
Hint
Can be appended with the DEFINE NAMESPACE [key] AS URL "[value]" entries in the definitions section of the source BEL document.
- property defined_namespace_keywords: Set[str]
The set of all keywords defined as namespaces in this graph.
- property namespace_pattern: Dict[str, str]
The mapping from the namespace keywords used to create this graph to their regex patterns.
Hint
Can be appended with the DEFINE NAMESPACE [key] AS PATTERN "[value]" entries in the definitions section of the source BEL document.
- property annotation_url: Dict[str, str]
The mapping from the annotation keywords used to create this graph to the URLs of the BELANNO files.
Hint
Can be appended with the DEFINE ANNOTATION [key] AS URL "[value]" entries in the definitions section of the source BEL document.
- property annotation_pattern: Dict[str, str]
The mapping from the annotation keywords used to create this graph to their regex patterns as strings.
Hint
Can be appended with the DEFINE ANNOTATION [key] AS PATTERN "[value]" entries in the definitions section of the source BEL document.
- property annotation_list: Dict[str, Set[str]]
The mapping from the keywords of locally defined annotations to their respective sets of values.
Hint
Can be appended with the DEFINE ANNOTATION [key] AS LIST {"[value]", ...} entries in the definitions section of the source BEL document.
- property defined_annotation_keywords: Set[str]
Get the set of all keywords defined as annotations in this graph.
- property pybel_version: str
The version of PyBEL with which this graph was produced as a string.
- Return type
- property warnings: List[Tuple[Optional[str], pybel.exceptions.BELParserWarning, Mapping]]
A list of warnings associated with this graph.
- number_of_citations()[source]
Return the number of citations contained within the graph.
- Return type
- add_unqualified_edge(source, target, relation)[source]
Add a unique edge that has no annotations.
- Parameters
source (BaseEntity) – The source node
target (BaseEntity) – The target node
relation (str) – A relationship label from pybel.constants
- Return type
- Returns
The key for this edge (a unique hash)
- add_transcription(gene, rna)[source]
Add a transcription relation from a gene to an RNA or miRNA node.
- add_equivalence(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) str
Add two equivalence relations for the nodes.
- add_orthology(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) str
Add two orthology relations for the nodes such that u orthologousTo v and v orthologousTo u.
- add_is_a(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'isA') str
Add an isA relationship such that u isA v.
- add_part_of(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'partOf') str
Add a partOf relationship such that u partOf v.
- add_has_variant(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'hasVariant') str
Add a hasVariant relationship such that u hasVariant v.
- add_has_reactant(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'hasReactant') str
Add a hasReactant relationship such that u hasReactant v.
- add_has_product(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'hasProduct') str
Add a hasProduct relationship such that u hasProduct v.
- add_qualified_edge(source, target, *, relation, evidence, citation, annotations=None, source_modifier=None, target_modifier=None, **attr)[source]
Add a qualified edge.
Qualified edges have a relation, evidence, citation, and optional annotations, subject modifications, and object modifications.
- Parameters
source (BaseEntity) – The source node
target (BaseEntity) – The target node
relation (str) – The type of relation this edge represents
evidence (str) – The evidence string from an article
citation (Union[str, Tuple[str, str], CitationDict]) – The citation data dictionary for this evidence. If a string is given, assumes it’s a PubMed identifier and auto-fills the citation type.
annotations (Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[Entity]], None]) – The annotations data dictionary
source_modifier (Optional[Mapping[str, Any]]) – The modifiers (like activity) on the subject node. See data model documentation.
target_modifier (Optional[Mapping[str, Any]]) – The modifiers (like activity) on the object node. See data model documentation.
- Return type
- Returns
The hash of the edge
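The citation parameter accepts a bare string, a pair of strings, or a dictionary. The following sketch shows one way such inputs could be normalized into a single dictionary shape. The helper normalize_citation, the key names "namespace" and "identifier", the "pubmed" label, and the reading of the tuple as (type, identifier) are all assumptions for illustration; they are not confirmed details of PyBEL's internals.

```python
from typing import Mapping, Tuple, Union

Citation = Union[str, Tuple[str, str], Mapping[str, str]]


def normalize_citation(citation: Citation) -> dict:
    """Return a citation dictionary (hypothetical keys: 'namespace', 'identifier')."""
    if isinstance(citation, str):
        # A bare string is treated as a PubMed identifier, as described above.
        return {"namespace": "pubmed", "identifier": citation}
    if isinstance(citation, tuple):
        # Assumption: the pair is (citation type, identifier).
        namespace, identifier = citation
        return {"namespace": namespace, "identifier": identifier}
    # Already a mapping: copy it so the caller's object is not shared.
    return dict(citation)


# "12345" is a placeholder, not a real reference.
from_string = normalize_citation("12345")
from_tuple = normalize_citation(("doi", "10.0000/placeholder"))
print(from_string, from_tuple)
```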
- add_binds(source, target, *, evidence, citation, annotations=None, **attr)[source]
Add a “binding” relationship between the two entities such that u => complex(u, v).
- Return type
- add_increases(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'increases', evidence: str, citation: Union[str, Tuple[str, str], pybel.language.CitationDict], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping[str, Any]] = None, target_modifier: Optional[Mapping[str, Any]] = None, **attr) str
Wrap add_qualified_edge() for the pybel.constants.INCREASES relation.
- add_directly_increases(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'directlyIncreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.language.CitationDict], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping[str, Any]] = None, target_modifier: Optional[Mapping[str, Any]] = None, **attr) str
Add a pybel.constants.DIRECTLY_INCREASES relationship with add_qualified_edge().
- add_decreases(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'decreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.language.CitationDict], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping[str, Any]] = None, target_modifier: Optional[Mapping[str, Any]] = None, **attr) str
Add a pybel.constants.DECREASES relationship with add_qualified_edge().
- add_directly_decreases(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'directlyDecreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.language.CitationDict], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping[str, Any]] = None, target_modifier: Optional[Mapping[str, Any]] = None, **attr) str
Add a pybel.constants.DIRECTLY_DECREASES relationship with add_qualified_edge().
- add_association(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) str
Add a pybel.constants.ASSOCIATION relationship with add_qualified_edge().
- add_regulates(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'regulates', evidence: str, citation: Union[str, Tuple[str, str], pybel.language.CitationDict], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping[str, Any]] = None, target_modifier: Optional[Mapping[str, Any]] = None, **attr) str
Add a pybel.constants.REGULATES relationship with add_qualified_edge().
- add_directly_regulates(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'directlyRegulates', evidence: str, citation: Union[str, Tuple[str, str], pybel.language.CitationDict], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping[str, Any]] = None, target_modifier: Optional[Mapping[str, Any]] = None, **attr) str
Add a pybel.constants.DIRECTLY_REGULATES relationship with add_qualified_edge().
- add_correlation(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) str
Add a pybel.constants.CORRELATION relationship with add_qualified_edge().
- add_no_correlation(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) str
Add a pybel.constants.NO_CORRELATION relationship with add_qualified_edge().
- add_positive_correlation(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) str
Add a pybel.constants.POSITIVE_CORRELATION relationship with add_qualified_edge().
- add_negative_correlation(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) str
Add a pybel.constants.NEGATIVE_CORRELATION relationship with add_qualified_edge().
- add_causes_no_change(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'causesNoChange', evidence: str, citation: Union[str, Tuple[str, str], pybel.language.CitationDict], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping[str, Any]] = None, target_modifier: Optional[Mapping[str, Any]] = None, **attr) str
Add a pybel.constants.CAUSES_NO_CHANGE relationship with add_qualified_edge().
- add_inhibits(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'decreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.language.CitationDict], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping[str, Any]] = None, target_modifier: Optional[Mapping[str, Any]] = {'effect': {'identifier': '0003674', 'name': 'molecular function', 'namespace': 'go'}, 'modifier': 'Activity'}, **attr) str
Add an “inhibits” relationship.
A more specific version of add_decreases() that automatically populates the object modifier with an activity.
- add_activates(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'increases', evidence: str, citation: Union[str, Tuple[str, str], pybel.language.CitationDict], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping[str, Any]] = None, target_modifier: Optional[Mapping[str, Any]] = {'effect': {'identifier': '0003674', 'name': 'molecular function', 'namespace': 'go'}, 'modifier': 'Activity'}, **attr) str
Add an “activates” relationship.
A more specific version of add_increases() that automatically populates the object modifier with an activity.
- add_phosphorylates(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.CentralDogma, code: Optional[str] = None, position: Optional[int] = None, *, evidence: str, citation: Union[str, Mapping[str, str]], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping] = None, target_modifier: Optional[Mapping] = None, **attr)
Add an increase of modified object with phosphorylation.
- add_directly_phosphorylates(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.CentralDogma, code: Optional[str] = None, position: Optional[int] = None, *, evidence: str, citation: Union[str, Mapping[str, str]], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping] = None, target_modifier: Optional[Mapping] = None, **attr)
Add a direct increase of modified object with phosphorylation.
- add_dephosphorylates(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.CentralDogma, code: Optional[str] = None, position: Optional[int] = None, *, evidence: str, citation: Union[str, Mapping[str, str]], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping] = None, target_modifier: Optional[Mapping] = None, **attr)
Add a decrease of modified object with phosphorylation.
- add_directly_dephosphorylates(source: pybel.dsl.node_classes.BaseEntity, target: pybel.dsl.node_classes.CentralDogma, code: Optional[str] = None, position: Optional[int] = None, *, evidence: str, citation: Union[str, Mapping[str, str]], annotations: Optional[Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, List[pybel.language.Entity]]]] = None, source_modifier: Optional[Mapping] = None, target_modifier: Optional[Mapping] = None, **attr)
Add a direct decrease of modified object with phosphorylation.
- static edge_to_bel(u, v, edge_data, sep=None, use_identifiers=True)[source]
Serialize a pair of nodes and related edge data as a BEL relation.
- Return type
- iter_equivalent_nodes(node)[source]
Iterate over nodes that are equivalent to the given node, including the original.
- Return type
- get_equivalent_nodes(node)[source]
Get a set of equivalent nodes to this node, excluding the given node.
- Return type
- node_has_namespace(node, namespace)[source]
Check if the node has the given namespace.
This also should look in the equivalent nodes.
- Return type
Dispatches
Dispatches are classes that enable easy access to summary, mutation, and other functions that consume graphs directly through the pybel.BELGraph interface.
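The dispatch pattern itself can be sketched with the standard library alone. In the toy example below, a property on the graph returns a small helper object bound to that graph, so calls read as graph.count.functions(). ToyGraph and ToyCountDispatch are hypothetical names, and the "function" key is an assumed stand-in for pybel.constants.FUNCTION; this is not PyBEL's code.

```python
from collections import Counter


class ToyCountDispatch:
    """Groups counting helpers that all operate on one graph."""

    def __init__(self, graph):
        self.graph = graph

    def functions(self) -> Counter:
        """Count how often each node function occurs in the bound graph."""
        return Counter(node["function"] for node in self.graph.nodes)


class ToyGraph:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    @property
    def count(self) -> ToyCountDispatch:
        # A fresh dispatch bound to this graph is returned on each access.
        return ToyCountDispatch(self)


toy = ToyGraph([{"function": "Protein"}, {"function": "Protein"}, {"function": "Complex"}])
counts = toy.count.functions()
print(counts)
```

Grouping helpers behind a property keeps the graph class's own namespace small while still allowing chained, discoverable calls.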
- class pybel.struct.graph.CountDispatch(graph)[source]
A dispatch for count functions that can be found at pybel.BELGraph.count.
- functions()[source]
Count the functions in a graph.
>>> from pybel.examples import sialic_acid_graph
>>> sialic_acid_graph.count.functions()
Counter({'Protein': 7, 'Complex': 1, 'Abundance': 1})
- Return type
- namespaces()[source]
Return a counter of namespaces’ occurrences in nodes in the graph.
- Return type
- pathologies()[source]
Return a counter of pathologies’ occurrences in edges in the graph.
- Return type
- annotations()[source]
Return a counter of annotations’ occurrences in edges in the graph.
- Return type
- error_types()[source]
Return a counter of error types’ occurrences in BEL script underlying the graph.
- Return type
- modifications()[source]
Return a counter of relation modifications’ occurrences (activity, translocation, etc.) in the graph.
- Return type
- class pybel.struct.graph.InduceDispatch(graph)[source]
A dispatch for induction functions that can be found at pybel.BELGraph.induce.
- class pybel.struct.graph.SummarizeDispatch(graph)[source]
A dispatch for summary printing functions that can be found at pybel.BELGraph.summarize.
- class pybel.struct.graph.ExpandDispatch(graph)[source]
A dispatch for expansion functions that can be found at pybel.BELGraph.expand.
- property parent: pybel.struct.graph.BELGraph
Get the parent BEL graph.
- Return type
- neighborhood(node)[source]
Expand around the neighborhood of a given node.
>>> from pybel.examples import braf_graph
>>> from pybel.dsl import Protein
>>> thpo = Protein(namespace='HGNC', name='THPO', identifier='11795')
>>> braf = Protein(namespace='HGNC', name='BRAF', identifier='1097')
>>> raf1 = Protein(namespace='HGNC', name='RAF1', identifier='9829')
>>> elk1 = Protein(namespace='HGNC', name='ELK1', identifier='3321')
>>> subgraph_1 = braf_graph.induce.paths([braf, elk1])
>>> assert thpo not in subgraph_1 and raf1 not in subgraph_1
>>> subgraph_2 = subgraph_1.expand.neighborhood(braf)
>>> assert thpo in subgraph_2 and raf1 not in subgraph_2
- Return type
- class pybel.struct.graph.PlotDispatch(graph)[source]
A dispatch for plotting functions that can be found at pybel.BELGraph.plot.
Nodes
Nodes (or entities) in a pybel.BELGraph represent physical entities’ abundances. Most contain information about the identifier for the entity using a namespace/name pair. The PyBEL parser converts BEL terms to an internal representation using an internal domain specific language (DSL) that allows for writing BEL directly in Python.
For example, after the BEL term p(hgnc:GSK3B) is parsed, it is instantiated as a Python object using the DSL function corresponding to the p() function in BEL, pybel.dsl.Protein, like:
from pybel.dsl import Protein
gsk3b_protein = Protein(namespace='hgnc', name='GSK3B')
pybel.dsl.Protein, like the others mentioned before, inherits from pybel.dsl.BaseEntity, which itself inherits from dict. Therefore, the resulting object can be used like a dict that looks like:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'hgnc',
NAME: 'GSK3B',
}
Alternatively, it can be used in more exciting ways, outlined later in the documentation for pybel.dsl.
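The "DSL object that is also a dict" design can be illustrated without PyBEL. The classes ToyEntity and ToyProtein below are hypothetical stand-ins for pybel.dsl.BaseEntity and pybel.dsl.Protein, and the constant values are assumed; the sketch only shows why such an object compares equal to the plain dictionary above.

```python
# Assumed stand-ins for pybel.constants; the string values are illustrative only.
FUNCTION, NAMESPACE, NAME = "function", "namespace", "name"
PROTEIN = "Protein"


class ToyEntity(dict):
    """A dict subclass whose subclasses fix the value stored under FUNCTION."""

    function = None

    def __init__(self, namespace: str, name: str):
        super().__init__({FUNCTION: self.function, NAMESPACE: namespace, NAME: name})


class ToyProtein(ToyEntity):
    function = PROTEIN


gsk3b = ToyProtein(namespace="hgnc", name="GSK3B")

# Because the object is a dict, ordinary dictionary access and comparison work.
print(gsk3b[NAME])
print(gsk3b == {FUNCTION: PROTEIN, NAMESPACE: "hgnc", NAME: "GSK3B"})
```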
Variants
The addition of a variant tag results in an entry called ‘variants’ in the data dictionary associated with a given node. This entry is a list with dictionaries describing each of the variants. All variants have the entry ‘kind’ to identify whether it is a post-translational modification (PTM), gene modification, fragment, or HGVS variant.
Warning
The canonical ordering for the elements of the VARIANTS list corresponds to the sorted order of their corresponding node tuples using pybel.parser.canonicalize.sort_dict_list(). Rather than directly modifying the BELGraph’s structure, use pybel.BELGraph.add_node_from_data(), which takes care of automatically canonicalizing this dictionary.
HGVS Variants.
For example, the BEL term p(HGNC:GSK3B, var(p.Gly123Arg)) is translated to the following internal DSL:
from pybel.dsl import Protein, Hgvs
gsk3b_variant = Protein(namespace='HGNC', name='GSK3B', variants=Hgvs('p.Gly123Arg'))
Further, the shorthand for protein substitutions, pybel.dsl.ProteinSubstitution, can be used to produce the same result, as it inherits from pybel.dsl.Hgvs:
from pybel.dsl import Protein, ProteinSubstitution
gsk3b_variant = Protein(namespace='HGNC', name='GSK3B', variants=ProteinSubstitution('Gly', 123, 'Arg'))
Either way, the resulting object can be used like a dict that looks like:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
VARIANTS: [
{
KIND: HGVS,
IDENTIFIER: 'p.Gly123Arg',
},
],
}
See also
BEL 2.0 specification on variants
HGVS conventions
PyBEL module
pybel.parser.modifiers.get_hgvs_language
Gene Substitutions
Gene substitutions are legacy statements defined in BEL 1.0. BEL 2.0 recommends using HGVS strings. Luckily, the information contained in a BEL 1.0 encoding, such as g(HGNC:APP,sub(G,275341,C)), can be automatically translated to the appropriate HGVS g(HGNC:APP, var(c.275341G>C)), assuming that all substitutions are using the reference coding gene sequence for numbering and not the genomic reference.
The previous statements both produce the underlying data:
from pybel.constants import *
{
FUNCTION: GENE,
NAMESPACE: 'HGNC',
NAME: 'APP',
VARIANTS: [
{
KIND: HGVS,
IDENTIFIER: 'c.275341G>C',
},
],
}
See also
BEL 2.0 specification on gene substitutions
PyBEL module
pybel.parser.modifiers.get_gene_substitution_language
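The translation from the legacy sub() arguments to an HGVS string is a simple rearrangement. The helper below is a hypothetical sketch of that step, not the PyBEL parser; like the text above, it assumes coding-sequence ("c.") numbering.

```python
def gene_substitution_to_hgvs(reference: str, position: int, variant: str) -> str:
    """Build an HGVS coding-DNA substitution from legacy sub(reference, position, variant)."""
    return f"c.{position}{reference}>{variant}"


# sub(G,275341,C) from the example above
hgvs = gene_substitution_to_hgvs("G", 275341, "C")
print(hgvs)
```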
Gene Modifications
PyBEL introduces the gene modification tag, gmod(), to allow for the encoding of epigenetic modifications. Its syntax follows the same style as the pmod() tags for proteins, and can include the following values:
- M, Me, methylation
- A, Ac, acetylation
For example, the node g(HGNC:GSK3B, gmod(M)) is represented with the following:
from pybel.constants import *
{
FUNCTION: GENE,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
VARIANTS: [
{
KIND: GMOD,
IDENTIFIER: {
NAMESPACE: BEL_DEFAULT_NAMESPACE,
NAME: 'Me',
},
},
],
}
The addition of this function does not preclude the use of all other standard functions in BEL; however, other compilers probably won’t support these standards. If you agree that this is useful, please contribute to discussion in the OpenBEL community.
See also
PyBEL module
pybel.parser.modifiers.get_gene_modification_language()
Protein Substitutions
Protein substitutions are legacy statements defined in BEL 1.0. BEL 2.0 recommends using HGVS strings. Luckily, the information contained in a BEL 1.0 encoding, such as p(HGNC:APP,sub(R,275,H)), can be automatically translated to the appropriate HGVS p(HGNC:APP, var(p.Arg275His)), assuming that all substitutions are using the reference protein sequence for numbering and not the genomic reference.
The previous statements both produce the underlying data:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'APP',
VARIANTS: [
{
KIND: HGVS,
IDENTIFIER: 'p.Arg275His',
},
],
}
See also
BEL 2.0 specification on protein substitutions
PyBEL module
pybel.parser.modifiers.get_protein_substitution_language
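For proteins, the translation also expands one-letter amino acid codes to three-letter codes. The following is a hypothetical sketch of that conversion using the standard amino acid abbreviations; the function name is illustrative and this is not the PyBEL parser.

```python
# Standard one-letter to three-letter amino acid codes.
AMINO_ACIDS = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def protein_substitution_to_hgvs(reference: str, position: int, variant: str) -> str:
    """Build an HGVS protein substitution from legacy sub(reference, position, variant)."""
    # Codes that are not single letters (for example 'Gly') pass through unchanged.
    ref = AMINO_ACIDS.get(reference, reference)
    var = AMINO_ACIDS.get(variant, variant)
    return f"p.{ref}{position}{var}"


# sub(R,275,H) from the example above
print(protein_substitution_to_hgvs("R", 275, "H"))
```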
Protein Modifications
The addition of a post-translational modification (PTM) tag results in an entry called ‘variants’ in the data dictionary associated with a given node. This entry is a list with dictionaries describing each of the variants. All variants have the entry ‘kind’ to identify whether it is a PTM, gene modification, fragment, or HGVS variant. The ‘kind’ value for PTM is ‘pmod’.
Each PMOD contains an identifier, which is a dictionary with the namespace and name, and can optionally include the position (‘pos’) and/or amino acid code (‘code’).
For example, the node p(HGNC:GSK3B, pmod(P, S, 9)) is represented with the following:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
VARIANTS: [
{
KIND: PMOD,
IDENTIFIER: {
NAMESPACE: BEL_DEFAULT_NAMESPACE,
NAME: 'Ph',
},
PMOD_CODE: 'Ser',
PMOD_POSITION: 9,
},
],
}
As an additional example, in p(HGNC:MAPK1, pmod(Ph, Thr, 202), pmod(Ph, Tyr, 204)), MAPK1 is phosphorylated twice to become active. This results in the following:
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'MAPK1',
VARIANTS: [
{
KIND: PMOD,
IDENTIFIER: {
NAMESPACE: BEL_DEFAULT_NAMESPACE,
NAME: 'Ph',
},
PMOD_CODE: 'Thr',
PMOD_POSITION: 202
},
{
KIND: PMOD,
IDENTIFIER: {
NAMESPACE: BEL_DEFAULT_NAMESPACE,
NAME: 'Ph',
},
PMOD_CODE: 'Tyr',
PMOD_POSITION: 204
}
]
}
See also
BEL 2.0 specification on protein modifications
PyBEL module
pybel.parser.modifiers.get_protein_modification_language
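Because the modifications are stored as a list of dictionaries, they can be queried with ordinary Python. The sketch below collects the residue and position of every modification of a given kind. The constants are redefined locally using the key strings quoted in the text above ('variants', 'kind', 'pmod', 'code', 'pos'); the values for IDENTIFIER and NAME, and the helper modification_sites, are assumptions for illustration.

```python
# Local stand-ins for pybel.constants; IDENTIFIER and NAME values are assumed.
VARIANTS, KIND, PMOD = "variants", "kind", "pmod"
PMOD_CODE, PMOD_POSITION = "code", "pos"
IDENTIFIER, NAME = "identifier", "name"

mapk1 = {
    VARIANTS: [
        {KIND: PMOD, IDENTIFIER: {NAME: "Ph"}, PMOD_CODE: "Thr", PMOD_POSITION: 202},
        {KIND: PMOD, IDENTIFIER: {NAME: "Ph"}, PMOD_CODE: "Tyr", PMOD_POSITION: 204},
    ],
}


def modification_sites(node: dict, name: str) -> list:
    """Return (amino acid code, position) for each PMOD variant with the given name."""
    return [
        (variant.get(PMOD_CODE), variant.get(PMOD_POSITION))
        for variant in node.get(VARIANTS, [])
        if variant[KIND] == PMOD and variant[IDENTIFIER][NAME] == name
    ]


sites = modification_sites(mapk1, "Ph")
print(sites)
```

Using .get() for the code and position reflects that both entries are optional.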
Protein Truncations
Truncations.
Truncations in the legacy BEL 1.0 specification are automatically translated to BEL 2.0 with HGVS nomenclature. p(HGNC:AKT1, trunc(40)) becomes p(HGNC:AKT1, var(p.40*)) and is represented with the following dictionary:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'AKT1',
VARIANTS: [
{
KIND: HGVS,
IDENTIFIER: 'p.40*',
},
],
}
Unfortunately, the HGVS nomenclature requires the encoding of the terminal amino acid which is exchanged for a stop codon, and this information is not required by BEL 1.0. For this example, the proper encoding of the truncation at position 40 also includes the information that the 40th amino acid in AKT1 is Cys. Its BEL encoding should be p(HGNC:AKT1, var(p.Cys40*)). Temporary support has been added to compile these statements, but it’s recommended that they be upgraded by reexamining the supporting text or looking up the amino acid sequence.
See also
BEL 2.0 specification on truncations
PyBEL module
pybel.parser.modifiers.get_truncation_language
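The two encodings discussed above differ only in whether the replaced residue is known. A hypothetical helper (not part of PyBEL) that produces either form could look like this:

```python
from typing import Optional


def truncation_to_hgvs(position: int, residue: Optional[str] = None) -> str:
    """Build an HGVS truncation; include the replaced residue when it is known."""
    return f"p.{residue or ''}{position}*"


print(truncation_to_hgvs(40))         # legacy form, residue unknown
print(truncation_to_hgvs(40, "Cys"))  # preferred form once the residue is looked up
```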
Protein Fragments
Fragments.
The addition of a fragment results in an entry called pybel.constants.VARIANTS in the data dictionary associated with a given node. This entry is a list with dictionaries describing each of the variants. All variants have the entry pybel.constants.KIND to identify whether it is a PTM, gene modification, fragment, or HGVS variant. The pybel.constants.KIND value for a fragment is pybel.constants.FRAGMENT.
Each fragment with a known range stores its start and stop positions under the keys FRAGMENT_START and FRAGMENT_STOP.
For example, the node p(HGNC:GSK3B, frag(45_129)) is represented with the following:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
VARIANTS: [
{
KIND: FRAGMENT,
FRAGMENT_START: 45,
FRAGMENT_STOP: 129,
},
],
}
Additionally, nodes can have an asterisk (*) or question mark (?) representing unbound or unknown fragments, respectively.
A fragment may also be unknown, such as in the node p(HGNC:GSK3B, frag(?)). This is represented with the key pybel.constants.FRAGMENT_MISSING and the value of ‘?’ like:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
VARIANTS: [
{
KIND: FRAGMENT,
FRAGMENT_MISSING: '?',
},
],
}
See also
BEL 2.0 specification on proteolytic fragments (2.2.3)
PyBEL module
pybel.parser.modifiers.get_fragment_language
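The two fragment shapes shown above can be produced from the text inside frag(...) with a few lines of Python. This is a hypothetical sketch, not the PyBEL parser: the constant values are assumed, and keeping a '?' or '*' endpoint as a string is a choice made here for illustration.

```python
# Assumed stand-ins for pybel.constants; the string values are illustrative only.
KIND, FRAGMENT = "kind", "frag"
FRAGMENT_START, FRAGMENT_STOP, FRAGMENT_MISSING = "start", "stop", "missing"


def _as_position(text: str):
    """Keep the '?' and '*' markers as strings; convert everything else to int."""
    return text if text in ("?", "*") else int(text)


def fragment_from_range(text: str) -> dict:
    """Turn the contents of frag(...) into a variant dictionary."""
    if text == "?":
        return {KIND: FRAGMENT, FRAGMENT_MISSING: "?"}
    start, stop = text.split("_")
    return {
        KIND: FRAGMENT,
        FRAGMENT_START: _as_position(start),
        FRAGMENT_STOP: _as_position(stop),
    }


print(fragment_from_range("45_129"))
print(fragment_from_range("?"))
```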
Fusions
Gene, RNA, miRNA, and protein fusions are all represented with the same underlying data structure. Below it is shown with uppercase letters referring to constants from pybel.constants. For example, g(HGNC:BCR, fus(HGNC:JAK2, 1875, 2626)) is represented as:
from pybel.constants import *
{
FUNCTION: GENE,
FUSION: {
PARTNER_5P: {NAMESPACE: 'HGNC', NAME: 'BCR'},
PARTNER_3P: {NAMESPACE: 'HGNC', NAME: 'JAK2'},
RANGE_5P: {
FUSION_REFERENCE: 'c',
FUSION_START: '?',
FUSION_STOP: 1875,
},
RANGE_3P: {
FUSION_REFERENCE: 'c',
FUSION_START: 2626,
FUSION_STOP: '?',
},
},
}
See also
BEL 2.0 specification on fusions (2.6.1)
PyBEL module
pybel.parser.modifiers.get_fusion_language
PyBEL module
pybel.parser.modifiers.get_legacy_fusion_language
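In the example above, the two legacy breakpoints become the stop of the 5' range and the start of the 3' range, with the unknown ends marked '?'. A hypothetical helper that builds just those two ranges is sketched below; the constant values are assumptions, and the default reference 'c' follows the example rather than any confirmed parser rule.

```python
# Assumed stand-ins for pybel.constants; the string values are illustrative only.
RANGE_5P, RANGE_3P = "range_5p", "range_3p"
FUSION_REFERENCE, FUSION_START, FUSION_STOP = "reference", "start", "stop"


def legacy_fusion_ranges(breakpoint_5p: int, breakpoint_3p: int, reference: str = "c") -> dict:
    """Build the two range dictionaries for a legacy fus(partner, 5' break, 3' break)."""
    return {
        # The 5' partner contributes everything up to its breakpoint.
        RANGE_5P: {FUSION_REFERENCE: reference, FUSION_START: "?", FUSION_STOP: breakpoint_5p},
        # The 3' partner contributes everything from its breakpoint onward.
        RANGE_3P: {FUSION_REFERENCE: reference, FUSION_START: breakpoint_3p, FUSION_STOP: "?"},
    }


ranges = legacy_fusion_ranges(1875, 2626)
print(ranges)
```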
Unqualified Edges
Unqualified edges are automatically inferred by PyBEL and do not contain citations or supporting evidence.
Variant and Modifications’ Parent Relations
All variants, modifications, fragments, and truncations are connected to their parent entity with an edge having the relationship hasParent.
For p(hgnc:GSK3B, var(p.Gly123Arg)), the following edge is inferred:
p(hgnc:GSK3B, var(p.Gly123Arg)) hasParent p(hgnc:GSK3B)
All variants have this relationship to their reference node. BEL does not specify relationships between variants, such as when one phosphorylation is required before another can occur. Such knowledge can still be encoded directly, since PyBEL does not restrict users from manually asserting unqualified edges.
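One way to picture the inferred parent is as the same data dictionary with its variants removed. The sketch below assumes that reading; parent_of is a hypothetical helper, the key strings are stand-ins for the pybel.constants values, and the variant entry is an opaque placeholder rather than PyBEL's real variant structure.

```python
# Stand-in key for this sketch only; import the real one from pybel.constants.
VARIANTS = 'variants'


def parent_of(node):
    """Return the reference node for a variant node, or None if it has no variants."""
    if VARIANTS not in node:
        return None
    # Copy every entry except the variants list.
    return {key: value for key, value in node.items() if key != VARIANTS}


variant_node = {
    'function': 'Protein',
    'namespace': 'hgnc',
    'name': 'GSK3B',
    VARIANTS: [{'placeholder': 'var(p.Gly123Arg)'}],  # stands in for the variant data
}

reference_node = parent_of(variant_node)
hasParent_edge = (variant_node, 'hasParent', reference_node)
```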
List Abundances
Complexes and composites are defined by lists. As of version 0.9.0, they contain a list of the data dictionaries
that describe their members. For example, complex(p(hgnc:FOS), p(hgnc:JUN))
becomes:
from pybel.constants import *
{
FUNCTION: COMPLEX,
MEMBERS: [
{
FUNCTION: PROTEIN,
NAMESPACE: 'hgnc',
NAME: 'FOS',
}, {
FUNCTION: PROTEIN,
NAMESPACE: 'hgnc',
NAME: 'JUN',
}
]
}
The following edges are also inferred:
complex(p(hgnc:FOS), p(hgnc:JUN)) hasMember p(hgnc:FOS)
complex(p(hgnc:FOS), p(hgnc:JUN)) hasMember p(hgnc:JUN)
See also
BEL 2.0+ Tutorial on complex abundances
Similarly, composite(a(CHEBI:malonate), p(hgnc:JUN))
becomes:
from pybel.constants import *
{
FUNCTION: COMPOSITE,
MEMBERS: [
{
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: 'malonate',
}, {
FUNCTION: PROTEIN,
NAMESPACE: 'hgnc',
NAME: 'JUN',
}
]
}
The following edges are inferred:
composite(a(CHEBI:malonate), p(hgnc:JUN)) hasComponent a(CHEBI:malonate)
composite(a(CHEBI:malonate), p(hgnc:JUN)) hasComponent p(hgnc:JUN)
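The inferred edges for both kinds of list abundance follow directly from the members list. The sketch below illustrates that, assuming stand-in strings for the constants; inferred_member_edges is a hypothetical helper and not the routine PyBEL uses internally.

```python
# Stand-in values for this sketch only; import the real ones from pybel.constants.
FUNCTION, MEMBERS = 'function', 'members'
NAMESPACE, NAME = 'namespace', 'name'
COMPLEX, COMPOSITE, PROTEIN = 'Complex', 'Composite', 'Protein'

# Relation names as written in the edges listed above.
MEMBER_RELATIONS = {COMPLEX: 'hasMember', COMPOSITE: 'hasComponent'}


def inferred_member_edges(node):
    """List (list abundance, relation, member) triples for a complex or composite."""
    relation = MEMBER_RELATIONS[node[FUNCTION]]
    return [(node, relation, member) for member in node[MEMBERS]]


fos = {FUNCTION: PROTEIN, NAMESPACE: 'hgnc', NAME: 'FOS'}
jun = {FUNCTION: PROTEIN, NAMESPACE: 'hgnc', NAME: 'JUN'}
ap1 = {FUNCTION: COMPLEX, MEMBERS: [fos, jun]}

for _, relation, member in inferred_member_edges(ap1):
    print(relation, member[NAME])
# hasMember FOS
# hasMember JUN
```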
Warning
The canonical ordering for the elements of the pybel.constants.MEMBERS
list corresponds to the sorted
order of their corresponding node tuples using pybel.parser.canonicalize.sort_dict_list()
. Rather than
directly modifying the BELGraph’s structure, use BELGraph.add_node_from_data()
, which takes care of
automatically canonicalizing this dictionary.
See also
BEL 2.0+ Tutorial on composite abundances
Reactions
The usage of a reaction causes many nodes and edges to be created. The following example will illustrate what is added to the network for
rxn(reactants(a(CHEBI:"(3S)-3-hydroxy-3-methylglutaryl-CoA"), a(CHEBI:"NADPH"), \
a(CHEBI:"hydron")), products(a(CHEBI:"mevalonate"), a(CHEBI:"NADP(+)")))
As of version 0.9.0, the reactants’ and products’ data dictionaries are included as sub-lists keyed REACTANTS
and
PRODUCTS
. It becomes:
from pybel.constants import *
{
FUNCTION: REACTION,
REACTANTS: [
{
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: '(3S)-3-hydroxy-3-methylglutaryl-CoA'
}, {
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: 'NADPH',
}, {
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: 'hydron',
}
],
PRODUCTS: [
{
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: 'mevalonate',
}, {
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: 'NADP(+)',
}
]
}
Warning
The canonical ordering for the elements of the REACTANTS
and PRODUCTS
lists corresponds to the sorted
order of their corresponding node tuples using pybel.parser.canonicalize.sort_dict_list()
. Rather than
directly modifying the BELGraph’s structure, use BELGraph.add_node_from_data()
, which takes care of
automatically canonicalizing this dictionary.
The following edges are inferred, where X
represents the previous reaction, for brevity:
X hasReactant a(CHEBI:"(3S)-3-hydroxy-3-methylglutaryl-CoA")
X hasReactant a(CHEBI:"NADPH")
X hasReactant a(CHEBI:"hydron")
X hasProduct a(CHEBI:"mevalonate")
X hasProduct a(CHEBI:"NADP(+)")
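The same idea applies to reactions: each entry in the two sub-lists yields one inferred edge. The sketch below is illustrative only, with stand-in key strings and a hypothetical helper name, and it reports member names rather than full node dictionaries to keep the output short.

```python
# Stand-in keys for this sketch only; import the real ones from pybel.constants.
REACTANTS, PRODUCTS, NAME = 'reactants', 'products', 'name'


def inferred_reaction_edges(reaction):
    """List (relation, member name) pairs implied by a reaction dictionary."""
    edges = [('hasReactant', member[NAME]) for member in reaction[REACTANTS]]
    edges += [('hasProduct', member[NAME]) for member in reaction[PRODUCTS]]
    return edges


reaction = {
    REACTANTS: [{NAME: 'NADPH'}, {NAME: 'hydron'}],
    PRODUCTS: [{NAME: 'NADP(+)'}],
}

print(inferred_reaction_edges(reaction))
# [('hasReactant', 'NADPH'), ('hasReactant', 'hydron'), ('hasProduct', 'NADP(+)')]
```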
See also
BEL 2.0+ tutorial on reactions
Edges
Design Choices
In the OpenBEL Framework, modifiers such as activities (kinaseActivity, etc.) and transformations (translocations,
degradations, etc.) were represented as their own nodes. In PyBEL, these modifiers are represented as a property
of the edge. In reality, an edge like sec(p(hgnc:A)) -> activity(p(hgnc:B), ma(kinaseActivity))
represents
a connection between hgnc:A
and hgnc:B
. Each of these modifiers explains the context of the relationship
between these physical entities. Further, querying a network where these modifiers are part of a relationship
is much more straightforward. For example, finding all proteins that are upregulated by the kinase activity of another
protein now can be directly queried by filtering all edges for those with a subject modifier whose modification is
molecular activity, and whose effect is kinase activity. Having fewer nodes also allows for a much easier display
and visual interpretation of a network. The information about the modifier on the subject and activity can be displayed
as a color coded source and terminus of the connecting edge.
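The query described above can be sketched as a plain filter over edge data. The example assumes edges are available as (source, target, data) triples, which is the shape that networkx yields from graph.edges(data=True); the key strings and relation names are stand-ins for the pybel.constants values, and the function name is hypothetical.

```python
# Stand-in values for this sketch only; import the real ones from pybel.constants.
SUBJECT, RELATION, MODIFIER, EFFECT, NAME = 'subject', 'relation', 'modifier', 'effect', 'name'
ACTIVITY = 'Activity'
UPREGULATING = {'increases', 'directlyIncreases'}


def upregulated_by_kinase_activity(edges):
    """Yield (source, target) pairs where the source's kinase activity upregulates the target."""
    for source, target, data in edges:
        subject = data.get(SUBJECT, {})  # absent when the subject has no modifier
        if (
            data.get(RELATION) in UPREGULATING
            and subject.get(MODIFIER) == ACTIVITY
            and subject.get(EFFECT, {}).get(NAME) == 'kin'
        ):
            yield source, target


edges = [
    ('p(hgnc:A)', 'p(hgnc:B)',
     {SUBJECT: {MODIFIER: ACTIVITY, EFFECT: {NAME: 'kin'}}, RELATION: 'increases'}),
    ('p(hgnc:C)', 'p(hgnc:B)', {RELATION: 'increases'}),  # no subject modifier
]

print(list(upregulated_by_kinase_activity(edges)))  # [('p(hgnc:A)', 'p(hgnc:B)')]
```

Because the modifier lives on the edge, no intermediate activity node has to be traversed to answer the question.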
The compiler in the OpenBEL Framework created nodes for molecular activities like kin(p(hgnc:YFG))
and induced an
edge like p(hgnc:YFG) actsIn kin(p(hgnc:YFG))
. For transformations, a statement like
tloc(p(hgnc:YFG), GO:intracellular, GO:"cell membrane")
also induced
tloc(p(hgnc:YFG), GO:intracellular, GO:"cell membrane") translocates p(hgnc:YFG)
.
In PyBEL, we recognize that these modifications are actually annotations to the type of relationship between the
subject’s entity and the object’s entity. p(hgnc:ABC) -> tloc(p(hgnc:YFG), GO:intracellular, GO:"cell membrane")
is about the relationship between p(hgnc:ABC)
and p(hgnc:YFG)
, while
the information about the translocation qualifies that the object is undergoing an event, and not just the abundance.
This confusion arises from the use of proteinAbundance
as a keyword, and is perhaps why many people prefer to use
just the keyword p.
Example Edge Data Structure
Because this data is associated with an edge, the node data for the subject and object are not included explicitly. However, information about the activities, modifiers, and transformations on the subject and object are included. Below is the “skeleton” for the edge data model in PyBEL:
from pybel.constants import *
{
SUBJECT: {
# ... modifications to the subject node. Only present if non-empty.
},
RELATION: POSITIVE_CORRELATION,
OBJECT: {
# ... modifications to the object node. Only present if non-empty.
},
EVIDENCE: ...,
CITATION: {
CITATION_TYPE: CITATION_TYPE_PUBMED,
CITATION_REFERENCE: ...,
CITATION_DATE: 'YYYY-MM-DD',
CITATION_AUTHORS: 'Jon Snow|John Doe',
},
ANNOTATIONS: {
'Disease': {
'Colorectal Cancer': True,
},
# ... additional annotations as tuple[str,dict[str,bool]] pairs
},
}
Each edge must contain the RELATION
, EVIDENCE
, and CITATION
entries. The CITATION
must minimally contain CITATION_TYPE
and CITATION_REFERENCE
since these can be used to look up additional
metadata.
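These requirements can be checked mechanically. The sketch below is one possible check, not a PyBEL function; the key strings are stand-ins for the pybel.constants values, and it is meant for the qualified edges described in this section, since unqualified edges carry no citation or evidence.

```python
# Stand-in keys for this sketch only; import the real ones from pybel.constants.
RELATION, EVIDENCE, CITATION = 'relation', 'evidence', 'citation'
CITATION_TYPE, CITATION_REFERENCE = 'citation_type', 'citation_reference'


def missing_edge_entries(data):
    """Return the required keys that are absent from a qualified edge's data."""
    missing = [key for key in (RELATION, EVIDENCE, CITATION) if key not in data]
    if CITATION in data:
        # The citation itself needs at least a type and a reference.
        citation = data[CITATION]
        missing.extend(
            key for key in (CITATION_TYPE, CITATION_REFERENCE) if key not in citation
        )
    return missing


incomplete = {
    RELATION: 'positiveCorrelation',
    EVIDENCE: 'Example evidence text',
    CITATION: {CITATION_TYPE: 'PubMed'},
}

print(missing_edge_entries(incomplete))  # ['citation_reference']
```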
Note
Since version 0.10.2, annotations now always appear as dictionaries, even if only one value is present.
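A practical consequence is that annotation lookups need no special case for single values. The sketch below assumes the nested dictionary shape shown in the skeleton above; has_annotation is a hypothetical helper and the key string is a stand-in for pybel.constants.ANNOTATIONS.

```python
# Stand-in key for this sketch only; import the real one from pybel.constants.
ANNOTATIONS = 'annotations'


def has_annotation(data, key, value):
    """Check whether an edge is annotated with the given key/value pair."""
    return data.get(ANNOTATIONS, {}).get(key, {}).get(value, False)


edge_data = {ANNOTATIONS: {'Disease': {'Colorectal Cancer': True}}}

print(has_annotation(edge_data, 'Disease', 'Colorectal Cancer'))  # True
print(has_annotation(edge_data, 'Disease', 'Melanoma'))           # False
```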
Activities
Modifiers are added to this structure as well. Under this schema,
p(hgnc:GSK3B, pmod(P, S, 9)) pos act(p(hgnc:GSK3B), ma(kin))
becomes:
from pybel.constants import *
{
RELATION: POSITIVE_CORRELATION,
OBJECT: {
MODIFIER: ACTIVITY,
EFFECT: {
NAME: 'kin',
NAMESPACE: BEL_DEFAULT_NAMESPACE,
}
},
CITATION: { ... },
EVIDENCE: ...,
ANNOTATIONS: { ... },
}
Activities without molecular activity annotations do not contain a pybel.constants.EFFECT
entry. Under this
schema, p(hgnc:GSK3B, pmod(P, S, 9)) pos act(p(hgnc:GSK3B))
becomes:
from pybel.constants import *
{
RELATION: POSITIVE_CORRELATION,
OBJECT: {
MODIFIER: ACTIVITY
},
CITATION: { ... },
EVIDENCE: ...,
ANNOTATIONS: { ... },
}
Locations
Location data is also added to the edge data for the node (subject or object) for which it was
annotated. p(HGNC:GSK3B, pmod(P, S, 9), loc(GO:lysozome)) pos act(p(HGNC:GSK3B), ma(kin))
becomes:
from pybel.constants import *
{
SUBJECT: {
LOCATION: {
NAMESPACE: 'GO',
NAME: 'lysozome',
}
},
RELATION: POSITIVE_CORRELATION,
OBJECT: {
MODIFIER: ACTIVITY,
EFFECT: {
NAMESPACE: BEL_DEFAULT_NAMESPACE,
NAME: 'kin',
}
},
EVIDENCE: ...,
CITATION: { ... },
}
The addition of the location()
element in BEL 2.0 allows for the unambiguous expression of the differences
between the process of hypothetical HGNC:A
moving from one place to another and the existence of
hypothetical HGNC:A
in a specific location having different effects. In BEL 1.0, this action had its own node,
but this introduced unnecessary complexity to the network and made querying more difficult.
This calls for thoughtful consideration of the following two statements:
tloc(p(HGNC:A), fromLoc(GO:intracellular), toLoc(GO:"cell membrane")) -> p(HGNC:B)
p(HGNC:A, location(GO:"cell membrane")) -> p(HGNC:B)
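In edge data, the two statements differ only in what is stored for the subject: the first carries a translocation modifier, the second a location. The sketch below makes that distinction explicit. It assumes the subject entry mirrors the object entries shown elsewhere in this section; the key strings are stand-ins for the pybel.constants values and describe_subject is a hypothetical helper.

```python
# Stand-in values for this sketch only; import the real ones from pybel.constants.
SUBJECT, MODIFIER, EFFECT, LOCATION = 'subject', 'modifier', 'effect', 'location'
FROM_LOC, TO_LOC = 'fromLoc', 'toLoc'
NAMESPACE, NAME = 'namespace', 'name'
TRANSLOCATION = 'Translocation'


def describe_subject(data):
    """Classify the subject of an edge as 'moving', 'located', or 'plain'."""
    subject = data.get(SUBJECT, {})
    if subject.get(MODIFIER) == TRANSLOCATION:
        return 'moving'   # the act of changing location is what matters
    if LOCATION in subject:
        return 'located'  # the entity's presence in a location is what matters
    return 'plain'


membrane = {NAMESPACE: 'GO', NAME: 'cell membrane'}
tloc_edge = {
    SUBJECT: {
        MODIFIER: TRANSLOCATION,
        EFFECT: {FROM_LOC: {NAMESPACE: 'GO', NAME: 'intracellular'}, TO_LOC: membrane},
    },
}
loc_edge = {SUBJECT: {LOCATION: membrane}}

print(describe_subject(tloc_edge), describe_subject(loc_edge))  # moving located
```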
See also
BEL 2.0 specification on cellular location (2.2.4)
PyBEL module
pybel.parser.modifiers.get_location_language
Translocations
Translocations have their own unique syntax. p(hgnc:YFG1) -> sec(p(hgnc:YFG2))
becomes:
from pybel.constants import *
{
RELATION: INCREASES,
OBJECT: {
MODIFIER: TRANSLOCATION,
EFFECT: {
FROM_LOC: {
NAMESPACE: 'GO',
NAME: 'intracellular',
},
TO_LOC: {
NAMESPACE: 'GO',
NAME: 'extracellular space',
}
}
},
CITATION: { ... },
EVIDENCE: ...,
ANNOTATIONS: { ... },
}
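The sec() shorthand carries no explicit locations, so the from and to entries above are filled in from its definition. The sketch below shows one way to expand the shorthands. The surf() target follows the BEL 2.0 definition of cell surface expression and is not shown in this document; the key strings and the helper name are assumptions for this example.

```python
# Stand-in keys for this sketch only; import the real ones from pybel.constants.
FROM_LOC, TO_LOC = 'fromLoc', 'toLoc'
NAMESPACE, NAME = 'namespace', 'name'

# Destination of each shorthand; both start from GO:intracellular.
SHORTHAND_TARGETS = {
    'sec': 'extracellular space',
    'surf': 'cell surface',
}


def translocation_effect(shorthand):
    """Build the translocation effect dictionary for sec() or surf() (hypothetical helper)."""
    return {
        FROM_LOC: {NAMESPACE: 'GO', NAME: 'intracellular'},
        TO_LOC: {NAMESPACE: 'GO', NAME: SHORTHAND_TARGETS[shorthand]},
    }


print(translocation_effect('sec')[TO_LOC][NAME])  # extracellular space
```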
See also
BEL 2.0+ tutorial on translocations
Degradations
Degradations are simpler, because there’s no pybel.constants.EFFECT
entry.
p(hgnc:YFG1) -> deg(p(hgnc:YFG2))
becomes:
from pybel.constants import *
{
RELATION: INCREASES,
OBJECT: {
MODIFIER: DEGRADATION,
},
CITATION: { ... },
EVIDENCE: ...,
ANNOTATIONS: { ... },
}
Warning
Degradations only provide syntactic sugar and will be automatically upgraded in a future version of PyBEL such that:
deg(X) -> Y is upgraded to X -| Y
deg(X) -| Y is upgraded to X -> Y
deg(X) => Y is upgraded to X =| Y
deg(X) cnc Y is upgraded to X cnc Y
X -> deg(Y) is upgraded to X -| Y
X => deg(Y) is upgraded to X =| Y
X cnc deg(Y) is upgraded to X cnc Y
X -| deg(Y) is undefined
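The rewrite rules in this warning amount to a small lookup table. The sketch below encodes only the cases listed here, using the BEL relation symbols as written; it is an illustration of the rules, not the upgrade routine PyBEL will ship, and the names are hypothetical.

```python
# Keyed by (which side carries deg(), relation symbol); values are the upgraded relation.
DEGRADATION_UPGRADES = {
    ('subject', '->'): '-|',
    ('subject', '-|'): '->',
    ('subject', '=>'): '=|',
    ('subject', 'cnc'): 'cnc',
    ('object', '->'): '-|',
    ('object', '=>'): '=|',
    ('object', 'cnc'): 'cnc',
}


def upgrade_degradation(side, relation):
    """Return the relation to use once deg() is removed from the given side.

    Raises ValueError for combinations the list does not define,
    such as X -| deg(Y).
    """
    try:
        return DEGRADATION_UPGRADES[side, relation]
    except KeyError:
        raise ValueError('no upgrade defined for deg() on {} with {}'.format(side, relation))


print(upgrade_degradation('subject', '->'))  # -|
print(upgrade_degradation('object', '=>'))   # =|
```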