PyBEL 0.14.9 Documentation¶
Biological Expression Language (BEL) is a domain-specific language that enables the expression of complex molecular relationships and their context in a machine-readable form. Its simple grammar and expressive power have led to its successful use in the to describe complex disease networks with several thousands of relationships.
PyBEL is a pure Python software package that parses BEL documents, validates their semantics, and facilitates data interchange between common formats and database systems like JSON, CSV, Excel, SQL, CX, and Neo4J. Its companion package, PyBEL-Tools, contains a library of functions for analysis of biological networks. For result-oriented guides, see the PyBEL-Notebooks repository.
Installation is as easy as getting the code from PyPI with
python3 -m pip install pybel
. See the installation documentation.
For citation information, see the references page.
PyBEL is tested on Python 3.5+ on Mac OS and Linux using Travis CI as well as on Windows using AppVeyor.
See also
Overview¶
Background on Systems Biology Modeling¶
Biological Expression Language (BEL)¶
Biological Expression Language (BEL) is a domain specific language that enables the expression of complex molecular relationships and their context in a machine-readable form. Its simple grammar and expressive power have led to its successful use to describe complex disease networks with several thousands of relationships. For a detailed explanation, see the BEL 1.0 and 2.0 specifications.
OpenBEL Links¶
OpenBEL on Google Groups
OpenBEL Wiki
OpenBEL on GitHub
Chat on Gitter
Design Considerations¶
Missing Namespaces and Improper Names¶
The use of openly shared controlled vocabularies (namespaces) within BEL facilitates the exchange and consistency of
information. Finding the correct namespace:name
pair is often a difficult part of the curation process.
Outdated Namespaces¶
OpenBEL provides a variety of namespaces covering each of the BEL function types. These namespaces are generated by code found at https://github.com/OpenBEL/resource-generator and distributed at http://resources.openbel.org/belframework/.
This code has not been maintained to reflect the changes in the underlying resources, so this repository has been forked and updated at https://github.com/pybel/resource-generator to reflect the most recent versions of the underlying namespaces. The files are now distributed using the Fraunhofer SCAI Artifactory server.
Generating New Namespaces¶
In some cases, it is appropriate to design a new namespace, using the custom namespace specification provided by the OpenBEL Framework. Packages for generating namespace, annotation, and knowledge resources have been grouped in the Bio2BEL organization on GitHub.
Synonym Issues¶
Due to the huge number of terms across many namespaces, it’s difficult for curators to know the domain-specific synonyms that obscure the controlled/preferred term. However, the issue of synonym resolution and semantic searching has already been generally solved by the use of ontologies. Besides just a controlled vocabulary, they also a hierarchical model of knowledge, synonyms with cross-references to databases and other ontologies, and other information semantic reasoning. Ontologies in the biomedical domain can be found at OBO and EMBL-EBI OLS.
Additionally, as a tool for curators, the EMBL Ontology Lookup Service (OLS) allows for semantic searching. Simple queries for the terms ‘mitochondrial dysfunction’ and ‘amyloid beta-peptides’ immediately returned results from relevant ontologies, and ended a long debate over how to represent these objects within BEL. EMBL-EBI also provides a programmatic API to the OLS service, for searching terms (http://www.ebi.ac.uk/ols/api/search?q=folic%20acid) and suggesting resolutions (http://www.ebi.ac.uk/ols/api/suggest?q=folic+acid)
Implementation¶
PyBEL is implemented using the PyParsing module. It provides flexibility and incredible speed in parsing compared to regular expression implementation. It also allows for the addition of parsing action hooks, which allow the graph to be checked semantically at compile-time.
It uses SQLite to provide a consistent and lightweight caching system for external data, such as namespaces, annotations, ontologies, and SQLAlchemy to provide a cross-platform interface. The same data management system is used to store graphs for high-performance querying.
Extensions to BEL¶
The PyBEL compiler is fully compliant with both BEL v1.0 and v2.0 and automatically upgrades legacy statements. Additionally, PyBEL includes several additions to the BEL specification to enable expression of important concepts in molecular biology that were previously missing and to facilitate integrating new data types. A short example is the inclusion of protein oxidation in the default BEL namespace for protein modifications. Other, more elaborate additions are outlined below.
Syntax for Epigenetics¶
PyBEL introduces the gene modification function, gmod(), as a syntax for encoding epigenetic modifications. Its usage mirrors the pmod() function for proteins and includes arguments for methylation.
For example, the methylation of NDUFB6 was found to be negatively correlated with its expression in a study of insulin resistance and Type II diabetes. This can now be expressed in BEL such as in the following statement:
g(HGNC:NDUFB6, gmod(Me)) negativeCorrelation r(HGNC:NDUFB6)
References:
Note
This syntax is currently under consideration as BEP-0006.
Definition of Namespaces as Regular Expressions¶
BEL imposes the constraint that each identifier must be qualified with an enumerated namespace to enable semantic interoperability and data integration. However, enumerating a namespace with potentially billions of names, such as dbSNP, poses a computational issue. PyBEL introduces syntax for defining namespaces with a consistent pattern using a regular expression to overcome this issue. For these namespaces, semantic validation can be perform in post-processing against the underlying database. The dbSNP namespace can be defined with a syntax familiar to BEL annotation definitions with regular expressions as follows:
DEFINE NAMESPACE dbSNP AS PATTERN "rs[0-9]+"
Note
This syntax was proposed with BEP-0005 and has been officially accepted as part of the BEL 2.1 specification.
Definition of Resources using OWL¶
Previous versions of PyBEL until 0.11.2 had an alternative namespace definition. Now it is recommended to either
generate namespace files with reproducible build scripts following the Bio2BEL framework, or to directly add them to
the database with the Bio2BEL bio2bel.manager.namespace_manager.NamespaceManagerMixin
extension.
Things to Consider¶
Do All Statements Need Supporting Text?¶
Yes! All statements must be minimally qualified with a citation and evidence (now called SupportingText in BEL 2.0) to maintain provenance. Statements without evidence can’t be traced to their source or evaluated independently from the curator, so they are excluded.
Multiple Annotations¶
All single annotations are considered as single element sets. When multiple annotations are present, all are unioned and attached to a given edge.
SET Citation = {"PubMed","Example Article","12345"}
SET ExampleAnnotation1 = {"Example Value 11", "Example Value 12"}
SET ExampleAnnotation2 = {"Example Value 21", "Example Value 22"}
p(HGNC:YFG1) -> p(HGNC:YFG2)
Namespace and Annotation Name Choices¶
*.belns
and *.belanno
configuration files include an entry called “Keyword” in their respective
[Namespace] and [AnnotationDefinition] sections. To maintain understandability between BEL documents, PyBEL
warns when the names given in *.bel
documents do not match their respective resources. For now, capitalization
is not considered, but in the future, PyBEL will also warn when capitalization is not properly stylized, like forgetting
the lowercase ‘h’ in “ChEMBL”.
Why Not Nested Statements?¶
BEL has different relationships for modeling direct and indirect causal relations.
Direct¶
A => B
means that A directly increases B through a physical process.A =| B
means that A directly decreases B through a physical process.
Indirect¶
The relationship between two entities can be coded in BEL, even if the process is not well understood.
A -> B
means that A indirectly increases B. There are hidden elements in X that mediate this interaction through a pathway direct interactionsA (=> or =|) X_1 (=> or =|) ... X_n (=> or =|) B
, or through a set of multiple pathways that constitute a network.A -| B
means that A indirectly decreases B. Like forA -> B
, this process involves hidden components with varying activities.
Increasing Nested Relationships¶
BEL also allows object of a relationship to be another statement.
A => (B => C)
means that A increases the process by which B increases C. The example in the BEL Specp(HGNC:GATA1) => (act(p(HGNC:ZBTB16)) => r(HGNC:MPL))
represents GATA1 directly increasing the process by which ZBTB16 directly increases MPL. Before, directly increasing was used to specify physical contact, so it’s reasonable to conclude thatp(HGNC:GATA1) => act(p(HGNC:ZBTB16))
. The specification cites examples when B is an activity that only is affected in the context of A and C. This complicated enough that it is both impractical to standardize during curation, and impractical to represent in a network.A -> (B => C)
can be interpreted by assuming that A indirectly increases B, and because of monotonicity, conclude thatA -> C
as well.A => (B -> C)
is more difficult to interpret, because it does not describe which part of processB -> C
is affected by A or how. Is it thatA => B
, andB => C
, so we concludeA -> C
, or does it mean something else? Perhaps A impacts a different portion of the hidden process inB -> C
. These statements are ambiguous enough that they should be written as justA => B
, andB -> C
. If there is no literature evidence for the statementA -> C
, then it is not the job of the curator to make this inference. Identifying statements of this might be the goal of a bioinformatics analysis of the BEL network after compilation.A -> (B -> C)
introduces even more ambiguity, and it should not be used.A => (B =| C)
states A increases the process by which B decreases C. One interpretation of this statement might be thatA => B
andB =| C
. An analysis could inferA -| C
. Statements in the form ofA -> (B =| C)
can also be resolved this way, but with added ambiguity.
Decreasing Nested Relationships¶
While we could agree on usage for the previous examples, the decrease of a nested statement introduces an unreasonable amount of ambiguity.
A =| (B => C)
could mean A decreases B, and B also increases C. Does this mean A decreases C, or does it mean that C is still increased, but just not as much? Which of these statements takes precedence? Or do their effects cancel? The same can be said aboutA -| (B => C)
, and with added ambiguity for indirect increasesA -| (B -> C)
A =| (B =| C)
could mean that A decreases B and B decreases C. We could conclude that A increases C, or could we again run into the problem of not knowing the precedence? The same is true for the indirect versions.
Recommendations for Use in PyBEL¶
After considering the ambiguity of nested statements to be a great risk to clarity, and PyBEL disables the usage of nested statements by default. See the Input and Output section for different parser settings. At Fraunhofer SCAI, curators resolved these statements to single statements to improve the precision and readability of our BEL documents.
While most statements in the form A rel1 (B rel2 C)
can be reasonably expanded to A rel1 B
and
B rel2 C
, the few that cannot are the difficult-to-interpret cases that we need to be careful about in our
curation and later analyses.
Why Not RDF?¶
Current bel2rdf serialization tools build URLs with the OpenBEL Framework domain as a namespace, rather than respect the original namespaces of original entities. This does not follow the best practices of the semantic web, where URL’s representing an object point to a real page with additional information. For example, UniProt does an exemplary job of this. Ultimately, using non-standard URLs makes harmonizing and data integration difficult.
Additionally, the RDF format does not easily allow for the annotation of edges. A simple statement in BEL that one protein up-regulates another can be easily represented in a triple in RDF, but when the annotations and citation from the BEL document need to be included, this forces RDF serialization to use approaches like representing the statement itself as a node. RDF was not intended to represent this type of information, but more properly for locating resources (hence its name). Furthermore, many blank nodes are introduced throughout the process. This makes RDF incredibly difficult to understand or work with. Later, writing queries in SPARQL becomes very difficult because the data format is complicated and the language is limited. For example, it would be incredibly complicated to write a query in SPARQL to get the objects of statements from publications by a certain author.
Installation¶
The latest stable code can be installed from PyPI with:
$ python3 -m pip install pybel
The most recent code can be installed from the source on GitHub with:
$ python3 -m pip install git+https://github.com/pybel/pybel.git
For developers, the repository can be cloned from GitHub and installed in editable mode with:
$ git clone https://github.com/pybel/pybel.git
$ cd pybel
$ python3 -m pip install -e .
Extras¶
The setup.py
makes use of the extras_require
argument of setuptools.setup()
in order to make some heavy
packages that support special features of PyBEL optional to install, in order to make the installation more lean by
default. A single extra can be installed from PyPI like python3 -m pip install pybel[neo4j]
or multiple can
be installed using a list like python3 -m pip install pybel[neo4j,indra]
. Likewise, for developer
installation, extras can be installed in editable mode with python3 -m pip install -e .[neo4j]
or multiple can
be installed using a list like python3 -m pip install -e .[neo4j,indra]
. The available extras are:
neo4j¶
This extension installs the py2neo
package to support upload and download to Neo4j databases.
See also
indra¶
This extra installs support for indra
, the integrated network dynamical reasoner and assembler. Because it also
represents biology in BEL-like statements, many statements from PyBEL can be converted to INDRA, and visa-versa. This
package also enables the import of BioPAX, SBML, and SBGN into BEL.
See also
pybel.from_indra_pickle()
pybel.to_indra()
jupyter¶
This extra installs support for visualizing BEL graphs in Jupyter notebooks.
See also
pybel.io.jupyter.to_html()
Caveats¶
PyBEL extends the
networkx
for its core data structure. Many of the graphical aspects ofnetworkx
depend onmatplotlib
, which is an optional dependency.If
HTMLlib5
is installed, the test that’s supposed to fail on a web page being missing actually tries to parse it as RDFa, and doesn’t fail. Disregard this.
Upgrading¶
During the current development cycle, programmatic access to the definition and graph caches might become unstable. If you have any problems working with the database, try removing it with one of the following commands:
Running
pybel manage drop
(unix)Running
python3 -m pybel manage drop
(windows)Removing the folder
~/.pybel
PyBEL will build a new database and populate it on the next run.
Data Model¶
The pybel.struct
module houses functions for handling the main data structure in PyBEL.
Because BEL expresses how biological entities interact within many
different contexts, with descriptive annotations, PyBEL represents data as a directed multi-graph by sub-classing the
networkx.MultiDiGraph
. Each node is an instance of a subclass of the pybel.dsl.BaseEntity
and each
edge has a stable key and associated data dictionary for storing relevant contextual information.
The graph contains metadata for the PyBEL version, the BEL script metadata, the namespace definitions, the
annotation definitions, and the warnings produced in analysis. Like any networkx
graph, all attributes of
a given object can be accessed through the graph
property, like in: my_graph.graph['my key']
.
Convenient property definitions are given for these attributes that are outlined in the documentation for
pybel.BELGraph
.
This allows for much easier programmatic access to answer more complicated questions, which can be written with python
code. Because the data structure is the same in Neo4J, the data can be directly exported with pybel.to_neo4j()
.
Neo4J supports the Cypher querying language so that the same queries can be written in an elegant and simple way.
Constants¶
These documents refer to many aspects of the data model using constants, which can be found in the top-level module
pybel.constants
.
Terms describing abundances, annotations, and other internal data are designated in pybel.constants
with full-caps, such as pybel.constants.FUNCTION
and pybel.constants.PROTEIN
.
For normal usage, we suggest referring to values in dictionaries by these constants, in case the hard-coded strings behind these constants change.
Function Nomenclature¶
The following table shows PyBEL’s internal mapping from BEL functions to its own constants. This can be accessed
programatically via pybel.parser.language.abundance_labels
.
BEL Function |
PyBEL Constant |
PyBEL DSL |
---|---|---|
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
Graph¶
-
class
pybel.
BELGraph
(name=None, version=None, description=None, authors=None, contact=None, license=None, copyright=None, disclaimer=None, path=None)[source]¶ An extension to
networkx.MultiDiGraph
to represent BEL.Initialize a BEL graph with its associated metadata.
- Parameters
version (
Optional
[str
]) – The graph’s version. Recommended to use semantic versioning orYYYYMMDD
format.
-
__add__
(other)[source]¶ Copy this graph and join it with another graph with it using
pybel.struct.left_full_join()
.Example usage:
>>> import pybel >>> g = pybel.from_bel_script('...') >>> h = pybel.from_bel_script('...') >>> k = g + h
-
__iadd__
(other)[source]¶ Join another graph into this one, in-place, using
pybel.struct.left_full_join()
.Example usage:
>>> import pybel >>> g = pybel.from_bel_script('...') >>> h = pybel.from_bel_script('...') >>> g += h
-
__and__
(other)[source]¶ Create a deep copy of this graph and left outer joins another graph.
Uses
pybel.struct.left_outer_join()
.Example usage:
>>> import pybel >>> g = pybel.from_bel_script('...') >>> h = pybel.from_bel_script('...') >>> k = g & h
-
__iand__
(other)[source]¶ Join another graph into this one, in-place, using
pybel.struct.left_outer_join()
.Example usage:
>>> import pybel >>> g = pybel.from_bel_script('...') >>> h = pybel.from_bel_script('...') >>> g &= h
-
property
name
¶ The graph’s name.
Hint
Can be set with the
SET DOCUMENT Name = "..."
entry in the source BEL script.
-
property
version
¶ The graph’s version.
Hint
Can be set with the
SET DOCUMENT Version = "..."
entry in the source BEL script.
-
property
description
¶ The graph’s description.
Hint
Can be set with the
SET DOCUMENT Description = "..."
entry in the source BEL document.
The graph’s authors.
Hint
Can be set with the
SET DOCUMENT Authors = "..."
entry in the source BEL document.
-
property
contact
¶ The graph’s contact information.
Hint
Can be set with the
SET DOCUMENT ContactInfo = "..."
entry in the source BEL document.
-
property
license
¶ The graph’s license.
Hint
Can be set with the
SET DOCUMENT Licenses = "..."
entry in the source BEL document
-
property
copyright
¶ The graph’s copyright.
Hint
Can be set with the
SET DOCUMENT Copyright = "..."
entry in the source BEL document
-
property
disclaimer
¶ The graph’s disclaimer.
Hint
Can be set with the
SET DOCUMENT Disclaimer = "..."
entry in the source BEL document.
-
property
namespace_url
¶ The mapping from the keywords used in this graph to their respective BEL namespace URLs.
Hint
Can be appended with the
DEFINE NAMESPACE [key] AS URL "[value]"
entries in the definitions section of the source BEL document.
-
property
defined_namespace_keywords
¶ The set of all keywords defined as namespaces in this graph.
-
property
namespace_pattern
¶ The mapping from the namespace keywords used to create this graph to their regex patterns.
Hint
Can be appended with the
DEFINE NAMESPACE [key] AS PATTERN "[value]"
entries in the definitions section of the source BEL document.
-
property
annotation_url
¶ The mapping from the annotation keywords used to create this graph to the URLs of the BELANNO files.
Hint
Can be appended with the
DEFINE ANNOTATION [key] AS URL "[value]"
entries in the definitions section of the source BEL document.
-
property
annotation_pattern
¶ The mapping from the annotation keywords used to create this graph to their regex patterns as strings.
Hint
Can be appended with the
DEFINE ANNOTATION [key] AS PATTERN "[value]"
entries in the definitions section of the source BEL document.
-
property
annotation_list
¶ The mapping from the keywords of locally defined annotations to their respective sets of values.
Hint
Can be appended with the
DEFINE ANNOTATION [key] AS LIST {"[value]", ...}
entries in the definitions section of the source BEL document.
-
property
defined_annotation_keywords
¶ Get the set of all keywords defined as annotations in this graph.
-
property
pybel_version
¶ The version of PyBEL with which this graph was produced as a string.
- Return type
-
property
warnings
¶ A list of warnings associated with this graph.
-
number_of_citations
()[source]¶ Return the number of citations contained within the graph.
- Return type
Return the number of authors contained within the graph.
- Return type
-
add_unqualified_edge
(u, v, relation)[source]¶ Add a unique edge that has no annotations.
- Parameters
u (
BaseEntity
) – The source nodev (
BaseEntity
) – The target noderelation (
str
) – A relationship label frompybel.constants
- Return type
- Returns
The key for this edge (a unique hash)
-
add_transcription
(gene, rna)[source]¶ Add a transcription relation from a gene to an RNA or miRNA node.
-
add_translation
(rna, protein)[source]¶ Add a translation relation from a RNA to a protein.
- Parameters
rna (
Rna
) – An RNA nodeprotein (
Protein
) – A protein node
- Return type
-
add_equivalence
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str¶ Add two equivalence relations for the nodes.
-
add_orthology
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str¶ Add two orthology relations for the nodes such that
u orthologousTo v
andv orthologousTo u
.
-
add_is_a
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'isA') → str¶ Add an
isA
relationship such thatu isA v
.
-
add_part_of
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'partOf') → str¶ Add a
partOf
relationship such thatu partOf v
.
-
add_has_variant
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'hasVariant') → str¶ Add a
hasVariant
relationship such thatu hasVariant v
.
-
add_has_reactant
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'hasReactant') → str¶ Add a
hasReactant
relationship such thatu hasReactant v
.
-
add_has_product
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *, relation: str = 'hasProduct') → str¶ Add a
hasProduct
relationship such thatu hasProduct v
.
-
add_qualified_edge
(u, v, *, relation, evidence, citation, annotations=None, subject_modifier=None, object_modifier=None, **attr)[source]¶ Add a qualified edge.
Qualified edges have a relation, evidence, citation, and optional annotations, subject modifications, and object modifications.
- Parameters
u – The source node
v – The target node
relation (
str
) – The type of relation this edge representsevidence (
str
) – The evidence string from an articlecitation (
Union
[str
,Tuple
[str
,str
],CitationDict
]) – The citation data dictionary for this evidence. If a string is given, assumes it’s a PubMed identifier and auto-fills the citation type.annotations (
Union
[Mapping
[str
,str
],Mapping
[str
,Set
[str
]],Mapping
[str
,Mapping
[str
,bool
]],None
]) – The annotations data dictionarysubject_modifier (
Optional
[Mapping
]) – The modifiers (like activity) on the subject node. See data model documentation.object_modifier (
Optional
[Mapping
]) – The modifiers (like activity) on the object node. See data model documentation.
- Return type
- Returns
The hash of the edge
-
add_binds
(u, v, *, evidence, citation, annotations=None, **attr)[source]¶ Add a “binding” relationship between the two entities such that
u => complex(u, v)
.- Return type
-
add_increases
(u, v, *, relation: str = 'increases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str¶ Wrap
add_qualified_edge()
for thepybel.constants.INCREASES
relation.
-
add_directly_increases
(u, v, *, relation: str = 'directlyIncreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str¶ Add a
pybel.constants.DIRECTLY_INCREASES
withadd_qualified_edge()
.
-
add_decreases
(u, v, *, relation: str = 'decreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str¶ Add a
pybel.constants.DECREASES
relationship withadd_qualified_edge()
.
-
add_directly_decreases
(u, v, *, relation: str = 'directlyDecreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str¶ Add a
pybel.constants.DIRECTLY_DECREASES
relationship withadd_qualified_edge()
.
-
add_association
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str¶ Add a
pybel.constants.ASSOCIATION
relationship withadd_qualified_edge()
.
-
add_regulates
(u, v, *, relation: str = 'regulates', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str¶ Add a
pybel.constants.REGULATES
relationship withadd_qualified_edge()
.
-
add_correlation
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str¶ Add a
pybel.constants.CORRELATION
relationship withadd_qualified_edge()
.
-
add_no_correlation
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str¶ Add a
pybel.constants.NO_CORRELATION
relationship withadd_qualified_edge()
.
-
add_positive_correlation
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str¶ Add a
pybel.constants.POSITIVE_CORRELATION
relationship withadd_qualified_edge()
.
-
add_negative_correlation
(u: pybel.dsl.node_classes.BaseEntity, v: pybel.dsl.node_classes.BaseEntity, *args, **kwargs) → str¶ Add a
pybel.constants.NEGATIVE_CORRELATION
relationship withadd_qualified_edge()
.
-
add_causes_no_change
(u, v, *, relation: str = 'causesNoChange', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr) → str¶ Add a
pybel.constants.CAUSES_NO_CHANGE
relationship withadd_qualified_edge()
.
-
add_inhibits
(u, v, *, relation: str = 'decreases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = {'modifier': 'Activity'}, **attr) → str¶ Add an “inhibits” relationship.
A more specific version of
add_decreases()
that automatically populates the object modifier with an activity.
-
add_activates
(u, v, *, relation: str = 'increases', evidence: str, citation: Union[str, Tuple[str, str], pybel.utils.CitationDict], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = {'modifier': 'Activity'}, **attr) → str¶ Add an “activates” relationship.
A more specific version of
add_increases()
that automatically populates the object modifier with an activity.
-
add_phosphorylates
(u, v, code: Optional[str] = None, position: Optional[int] = None, *, evidence: str, citation: Union[str, Mapping[str, str]], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr)¶ Add an increase of modified object with phosphorylation.
-
add_directly_phosphorylates
(u, v, code: Optional[str] = None, position: Optional[int] = None, *, evidence: str, citation: Union[str, Mapping[str, str]], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr)¶ Add a direct increase of modified object with phosphorylation.
-
add_dephosphorylates
(u, v, code: Optional[str] = None, position: Optional[int] = None, *, evidence: str, citation: Union[str, Mapping[str, str]], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr)¶ Add a decrease of modified object with phosphorylation.
-
add_directly_dephosphorylates
(u, v, code: Optional[str] = None, position: Optional[int] = None, *, evidence: str, citation: Union[str, Mapping[str, str]], annotations: Union[Mapping[str, str], Mapping[str, Set[str]], Mapping[str, Mapping[str, bool]], None] = None, subject_modifier: Optional[Mapping] = None, object_modifier: Optional[Mapping] = None, **attr)¶ Add a direct decrease of modified object with phosphorylation.
-
get_edge_citation
(u, v, key)[source]¶ Get the citation for a given edge.
- Return type
Optional
[CitationDict
]
-
static
edge_to_bel
(u, v, edge_data, sep=None, use_identifiers=False)[source]¶ Serialize a pair of nodes and related edge data as a BEL relation.
- Return type
-
iter_equivalent_nodes
(node)[source]¶ Iterate over nodes that are equivalent to the given node, including the original.
- Return type
Iterable
[BaseEntity
]
-
get_equivalent_nodes
(node)[source]¶ Get a set of equivalent nodes to this node, excluding the given node.
- Return type
Set
[BaseEntity
]
-
node_has_namespace
(node, namespace)[source]¶ Check if the node have the given namespace.
This also should look in the equivalent nodes.
- Return type
Nodes¶
Nodes (or entities) in a pybel.BELGraph
represent physical entities’ abundances. Most contain information
about the identifier for the entity using a namespace/name pair. The PyBEL parser converts BEL terms to an internal
representation using an internal domain specific language (DSL) that allows for writing BEL directly in Python.
For example, after the BEL term p(HGNC:GSK3B)
is parsed, it is instantiated as a Python object using the
DSL function corresponding to the p()
function in BEL, pybel.dsl.Protein
, like:
from pybel.dsl import Protein
gsk3b_protein = Protein(namespace='HGNC', name='GSK3B')
pybel.dsl.Protein
, like the others mentioned before, inherit from pybel.dsl.BaseEntity
, which itself
inherits from dict
. Therefore, the resulting object can be used like a dict that looks like:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
}
Alternatively, it can be used in more exciting ways, outlined later in the documentation for pybel.dsl
.
Variants¶
The addition of a variant tag results in an entry called ‘variants’ in the data dictionary associated with a given node. This entry is a list with dictionaries describing each of the variants. All variants have the entry ‘kind’ to identify whether it is a post-translational modification (PTM), gene modification, fragment, or HGVS variant.
Warning
The canonical ordering for the elements of the VARIANTS
list correspond to the sorted
order of their corresponding node tuples using pybel.parser.canonicalize.sort_dict_list()
. Rather than
directly modifying the BELGraph’s structure, use pybel.BELGraph.add_node_from_data()
, which takes care of
automatically canonicalizing this dictionary.
HGVS Variants.
For example, the BEL term p(HGNC:GSK3B, var(p.Gly123Arg))
is translated to the following internal DSL:
from pybel.dsl import Protein, Hgvs
gsk3b_variant = Protien(namespace='HGNC', name='GSK3B', variants=Hgvs('p.Gly123Arg'))
Further, the shorthand for protein substitutions, pybel.dsl.ProteinSubstitution
, can be used to produce the
same result, as it inherits from pybel.dsl.Hgvs
:
from pybel.dsl import Protein, ProteinSubstitution
gsk3b_variant = Protien(namespace='HGNC', name='GSK3B', variants=ProteinSubstitution('Gly', 123, 'Arg'))
Either way, the resulting object can be used like a dict that looks like:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
VARIANTS: [
{
KIND: HGVS,
IDENTIFIER: 'p.Gly123Arg',
},
],
}
See also
BEL 2.0 specification on variants
HGVS conventions
PyBEL module
pybel.parser.modifiers.get_hgvs_language
Gene Substitutions¶
Gene Substitutions.
Gene substitutions are legacy statements defined in BEL 1.0. BEL 2.0 recommends using HGVS strings. Luckily,
the information contained in a BEL 1.0 encoding, such as g(HGNC:APP,sub(G,275341,C))
can be
automatically translated to the appropriate HGVS g(HGNC:APP, var(c.275341G>C))
, assuming that all
substitutions are using the reference coding gene sequence for numbering and not the genomic reference.
The previous statements both produce the underlying data:
from pybel.constants import *
{
FUNCTION: GENE,
NAMESPACE: 'HGNC',
NAME: 'APP',
VARIANTS: [
{
KIND: HGVS,
IDENTIFIER: 'c.275341G>C',
},
],
}
See also
BEL 2.0 specification on gene substitutions
PyBEL module
pybel.parser.modifiers.get_gene_substitution_language
Gene Modifications¶
Gene Modifications.
PyBEL introduces the gene modification tag, gmod(), to allow for the encoding of epigenetic modifications. Its syntax follows the same style s the pmod() tags for proteins, and can include the following values:
M
Me
methylation
A
Ac
acetylation
For example, the node g(HGNC:GSK3B, gmod(M))
is represented with the following:
from pybel.constants import *
{
FUNCTION: GENE,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
VARIANTS: [
{
KIND: GMOD,
IDENTIFIER: {
NAMESPACE: BEL_DEFAULT_NAMESPACE,
NAME: 'Me',
},
},
],
}
The addition of this function does not preclude the use of all other standard functions in BEL; however, other compilers probably won’t support these standards. If you agree that this is useful, please contribute to discussion in the OpenBEL community.
See also
PyBEL module
pybel.parser.modifiers.get_gene_modification_language()
Protein Substitutions¶
Protein Substitutions.
Protein substitutions are legacy statements defined in BEL 1.0. BEL 2.0 recommends using HGVS strings. Luckily,
the information contained in a BEL 1.0 encoding, such as p(HGNC:APP,sub(R,275,H))
can be
automatically translated to the appropriate HGVS p(HGNC:APP, var(p.Arg275His))
, assuming that all
substitutions are using the reference protein sequence for numbering and not the genomic reference.
The previous statements both produce the underlying data:
from pybel.constants import *
{
FUNCTION: GENE,
NAMESPACE: 'HGNC',
NAME: 'APP',
VARIANTS: [
{
KIND: HGVS,
IDENTIFIER: 'p.Arg275His',
},
],
}
See also
BEL 2.0 specification on protein substitutions
PyBEL module
pybel.parser.modifiers.get_protein_substitution_language
Protein Modifications¶
Protein Modifications.
The addition of a post-translational modification (PTM) tag results in an entry called ‘variants’ in the data dictionary associated with a given node. This entry is a list with dictionaries describing each of the variants. All variants have the entry ‘kind’ to identify whether it is a PTM, gene modification, fragment, or HGVS variant. The ‘kind’ value for PTM is ‘pmod’.
Each PMOD contains an identifier, which is a dictionary with the namespace and name, and can optionally include the position (‘pos’) and/or amino acid code (‘code’).
For example, the node p(HGNC:GSK3B, pmod(P, S, 9))
is represented with the following:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
VARIANTS: [
{
KIND: PMOD,
IDENTIFIER: {
NAMESPACE: BEL_DEFAULT_NAMESPACE
NAME: 'Ph',
},
PMOD_CODE: 'Ser',
PMOD_POSITION: 9,
},
],
}
As an additional example, in p(HGNC:MAPK1, pmod(Ph, Thr, 202), pmod(Ph, Tyr, 204))
, MAPK is phosphorylated
twice to become active. This results in the following:
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'MAPK1',
VARIANTS: [
{
KIND: PMOD,
IDENTIFIER: {
NAMESPACE: BEL_DEFAULT_NAMESPACE
NAME: 'Ph',
},
PMOD_CODE: 'Thr',
PMOD_POSITION: 202
},
{
KIND: PMOD,
IDENTIFIER: {
NAMESPACE: BEL_DEFAULT_NAMESPACE
NAME: 'Ph',
},
PMOD_CODE: 'Tyr',
PMOD_POSITION: 204
}
]
}
See also
BEL 2.0 specification on protein modifications
PyBEL module
pybel.parser.modifiers.get_protein_modification_language
Protein Truncations¶
Truncations.
Truncations in the legacy BEL 1.0 specification are automatically translated to BEL 2.0 with HGVS nomenclature.
p(HGNC:AKT1, trunc(40))
becomes p(HGNC:AKT1, var(p.40*))
and is represented with the following
dictionary:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'AKT1',
VARIANTS: [
{
KIND: HGVS,
IDENTIFIER: 'p.40*',
},
],
}
Unfortunately, the HGVS nomenclature requires the encoding of the terminal amino acid which is exchanged
for a stop codon, and this information is not required by BEL 1.0. For this example, the proper encoding
of the truncation at position also includes the information that the 40th amino acid in the AKT1 is Cys. Its
BEL encoding should be p(HGNC:AKT1, var(p.Cys40*))
. Temporary support has been added to
compile these statements, but it’s recommended they are upgraded by reexamining the supporting text, or
looking up the amino acid sequence.
See also
BEL 2.0 specification on truncations
PyBEL module
pybel.parser.modifiers.get_truncation_language
Protein Fragments¶
Fragments.
The addition of a fragment results in an entry called pybel.constants.VARIANTS
in the data dictionary associated with a given node. This entry is a list with dictionaries
describing each of the variants. All variants have the entry pybel.constants.KIND
to identify whether it is
a PTM, gene modification, fragment, or HGVS variant. The pybel.constants.KIND
value for a fragment is
pybel.constants.FRAGMENT
.
Each fragment contains an identifier, which is a dictionary with the namespace and name, and can optionally include the position (‘pos’) and/or amino acid code (‘code’).
For example, the node p(HGNC:GSK3B, frag(45_129))
is represented with the following:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
VARIANTS: [
{
KIND: FRAGMENT,
FRAGMENT_START: 45,
FRAGMENT_STOP: 129,
},
],
}
Additionally, nodes can have an asterick (*) or question mark (?) representing unbound or unknown fragments, respectively.
A fragment may also be unknown, such as in the node p(HGNC:GSK3B, frag(?))
. This
is represented with the key pybel.constants.FRAGMENT_MISSING
and the value of ‘?’ like:
from pybel.constants import *
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'GSK3B',
VARIANTS: [
{
KIND: FRAGMENT,
FRAGMENT_MISSING: '?',
},
],
}
See also
BEL 2.0 specification on proteolytic fragments (2.2.3)
PyBEL module
pybel.parser.modifiers.get_fragment_language
Fusions¶
Fusions.
Gene, RNA, miRNA, and protein fusions are all represented with the same underlying data structure. Below
it is shown with uppercase letters referring to constants from pybel.constants
and. For example,
g(HGNC:BCR, fus(HGNC:JAK2, 1875, 2626))
is represented as:
from pybel.constants import *
{
FUNCTION: GENE,
FUSION: {
PARTNER_5P: {NAMESPACE: 'HGNC', NAME: 'BCR'},
PARTNER_3P: {NAMESPACE: 'HGNC', NAME: 'JAK2'},
RANGE_5P: {
FUSION_REFERENCE: 'c',
FUSION_START: '?',
FUSION_STOP: 1875,
},
RANGE_3P: {
FUSION_REFERENCE: 'c',
FUSION_START: 2626,
FUSION_STOP: '?',
},
},
}
See also
BEL 2.0 specification on fusions (2.6.1)
PyBEL module
pybel.parser.modifiers.get_fusion_language
PyBEL module
pybel.parser.modifiers.get_legacy_fusion_language
Unqualified Edges¶
Unqualified edges are automatically inferred by PyBEL and do not contain citations or supporting evidence.
Variant and Modifications’ Parent Relations¶
All variants, modifications, fragments, and truncations are connected to their parent entity with an edge having
the relationship hasParent
.
For p(HGNC:GSK3B, var(p.Gly123Arg))
, the following edge is inferred:
p(HGNC:GSK3B, var(p.Gly123Arg)) hasParent p(HGNC:GSK3B)
All variants have this relationship to their reference node. BEL does not specify relationships between variants, such as the case when a given phosphorylation is necessary to make another one. This knowledge could be encoded directly like BEL, since PyBEL does not restrict users from manually asserting unqualified edges.
List Abundances¶
Complexes and composites that are defined by lists. As of version 0.9.0, they contain a list of the data dictionaries
that describe their members. For example complex(p(HGNC:FOS), p(HGNC:JUN))
becomes:
from pybel.constants import *
{
FUNCTION: COMPLEX,
MEMBERS: [
{
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'FOS',
}, {
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'JUN',
}
]
}
The following edges are also inferred:
complex(p(HGNC:FOS), p(HGNC:JUN)) hasMember p(HGNC:FOS)
complex(p(HGNC:FOS), p(HGNC:JUN)) hasMember p(HGNC:JUN)
See also
BEL 2.0 specification on complex abundances
Similarly, composite(a(CHEBI:malonate), p(HGNC:JUN))
becomes:
from pybel.constants import *
{
FUNCTION: COMPOSITE,
MEMBERS: [
{
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: 'malonate',
}, {
FUNCTION: PROTEIN,
NAMESPACE: 'HGNC',
NAME: 'JUN',
}
]
}
The following edges are inferred:
composite(a(CHEBI:malonate), p(HGNC:JUN)) hasComponent a(CHEBI:malonate)
composite(a(CHEBI:malonate), p(HGNC:JUN)) hasComponent p(HGNC:JUN)
Warning
The canonical ordering for the elements of the pybel.constantsMEMBERS
list correspond to the sorted
order of their corresponding node tuples using pybel.parser.canonicalize.sort_dict_list()
. Rather than
directly modifying the BELGraph’s structure, use BELGraph.add_node_from_data()
, which takes care of
automatically canonicalizing this dictionary.
See also
BEL 2.0 specification on composite abundances
Reactions¶
The usage of a reaction causes many nodes and edges to be created. The following example will illustrate what is added to the network for
rxn(reactants(a(CHEBI:"(3S)-3-hydroxy-3-methylglutaryl-CoA"), a(CHEBI:"NADPH"), \
a(CHEBI:"hydron")), products(a(CHEBI:"mevalonate"), a(CHEBI:"NADP(+)")))
As of version 0.9.0, the reactants’ and products’ data dictionaries are included as sub-lists keyed REACTANTS
and
PRODUCTS
. It becomes:
from pybel.constants import *
{
FUNCTION: REACTION
REACTANTS: [
{
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: '(3S)-3-hydroxy-3-methylglutaryl-CoA'
}, {
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: 'NADPH',
}, {
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: 'hydron',
}
],
PRODUCTS: [
{
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: 'mevalonate',
}, {
FUNCTION: ABUNDANCE,
NAMESPACE: 'CHEBI',
NAME: 'NADP(+)',
}
]
}
Warning
The canonical ordering for the elements of the REACTANTS
and PRODUCTS
lists correspond to the sorted
order of their corresponding node tuples using pybel.parser.canonicalize.sort_dict_list()
. Rather than
directly modifying the BELGraph’s structure, use BELGraph.add_node_from_data()
, which takes care of
automatically canonicalizing this dictionary.
The following edges are inferred, where X
represents the previous reaction, for brevity:
X hasReactant a(CHEBI:"(3S)-3-hydroxy-3-methylglutaryl-CoA")
X hasReactant a(CHEBI:"NADPH")
X hasReactant a(CHEBI:"hydron")
X hasProduct a(CHEBI:"mevalonate")
X hasProduct a(CHEBI:"NADP(+)"))
See also
BEL 2.0 specification on reactions
Edges¶
Design Choices¶
In the OpenBEL Framework, modifiers such as activities (kinaseActivity, etc.) and transformations (translocations,
degradations, etc.) were represented as their own nodes. In PyBEL, these modifiers are represented as a property
of the edge. In reality, an edge like sec(p(HGNC:A)) -> activity(p(HGNC:B), ma(kinaseActivity))
represents
a connection between HGNC:A
and HGNC:B
. Each of these modifiers explains the context of the relationship
between these physical entities. Further, querying a network where these modifiers are part of a relationship
is much more straightforward. For example, finding all proteins that are upregulated by the kinase activity of another
protein now can be directly queried by filtering all edges for those with a subject modifier whose modification is
molecular activity, and whose effect is kinase activity. Having fewer nodes also allows for a much easier display
and visual interpretation of a network. The information about the modifier on the subject and activity can be displayed
as a color coded source and terminus of the connecting edge.
The compiler in OpenBEL framework created nodes for molecular activities like kin(p(HGNC:YFG))
and induced an
edge like p(HGNC:YFG) actsIn kin(p(HGNC:YFG))
. For transformations, a statement like
tloc(p(HGNC:YFG), GOCC:intracellular, GOCC:"cell membrane")
also induced
tloc(p(HGNC:YFG), GOCC:intracellular, GOCC:"cell membrane") translocates p(HGNC:YFG)
.
In PyBEL, we recognize that these modifications are actually annotations to the type of relationship between the
subject’s entity and the object’s entity. p(HGNC:ABC) -> tloc(p(HGNC:YFG), GOCC:intracellular, GOCC:"cell membrane")
is about the relationship between p(HGNC:ABC)
and p(HGNC:YFG)
, while
the information about the translocation qualifies that the object is undergoing an event, and not just the abundance.
This is a confusion with the use of proteinAbundance
as a keyword, and perhaps is why many people prefer to use
just the keyword p
Example Edge Data Structure¶
Because this data is associated with an edge, the node data for the subject and object are not included explicitly. However, information about the activities, modifiers, and transformations on the subject and object are included. Below is the “skeleton” for the edge data model in PyBEL:
from pybel.constants import *
{
SUBJECT: {
# ... modifications to the subject node. Only present if non-empty.
},
RELATION: POSITIVE_CORRELATION,
OBJECT: {
# ... modifications to the object node. Only present if non-empty.
},
EVIDENCE: ...,
CITATION : {
CITATION_TYPE: CITATION_TYPE_PUBMED,
CITATION_REFERENCE: ...,
CITATION_DATE: 'YYYY-MM-DD',
CITATION_AUTHORS: 'Jon Snow|John Doe',
},
ANNOTATIONS: {
'Disease': {
'Colorectal Cancer': True,
},
# ... additional annotations as tuple[str,dict[str,bool]] pairs
},
}
Each edge must contain the RELATION
, EVIDENCE
, and CITATION
entries. The CITATION
must minimally contain CITATION_TYPE
and CITATION_REFERENCE
since these can be used to look up additional
metadata.
Note
Since version 0.10.2, annotations now always appear as dictionaries, even if only one value is present.
Activities¶
Modifiers are added to this structure as well. Under this schema,
p(HGNC:GSK3B, pmod(P, S, 9)) pos act(p(HGNC:GSK3B), ma(kin))
becomes:
from pybel.constants import *
{
RELATION: POSITIVE_CORRELATION,
OBJECT: {
MODIFIER: ACTIVITY,
EFFECT: {
NAME: 'kin',
NAMESPACE: BEL_DEFAULT_NAMESPACE,
}
},
CITATION: { ... },
EVIDENCE: ...,
ANNOTATIONS: { ... },
}
Activities without molecular activity annotations do not contain an pybel.constants.EFFECT
entry: Under this
schema, p(HGNC:GSK3B, pmod(P, S, 9)) pos act(p(HGNC:GSK3B))
becomes:
from pybel.constants import *
{
RELATION: POSITIVE_CORRELATION,
OBJECT: {
MODIFIER: ACTIVITY
},
CITATION: { ... },
EVIDENCE: ...,
ANNOTATIONS: { ... },
}
Locations¶
Locations.
Location data also is added into the information in the edge for the node (subject or object) for which it was
annotated. p(HGNC:GSK3B, pmod(P, S, 9), loc(GO:lysozome)) pos act(p(HGNC:GSK3B), ma(kin))
becomes:
from pybel.constants import *
{
SUBJECT: {
LOCATION: {
NAMESPACE: 'GO',
NAME: 'lysozome',
}
},
RELATION: POSITIVE_CORRELATION,
OBJECT: {
MODIFIER: ACTIVITY,
EFFECT: {
NAMESPACE: BEL_DEFAULT_NAMESPACE
NAME: 'kin',
}
},
EVIDENCE: ...,
CITATION: { ... },
}
The addition of the location()
element in BEL 2.0 allows for the unambiguous expression of the differences
between the process of hypothetical HGNC:A
moving from one place to another and the existence of
hypothetical HGNC:A
in a specific location having different effects. In BEL 1.0, this action had its own node,
but this introduced unnecessary complexity to the network and made querying more difficult.
This calls for thoughtful consideration of the following two statements:
tloc(p(HGNC:A), fromLoc(GO:intracellular), toLoc(GO:"cell membrane")) -> p(HGNC:B)
p(HGNC:A, location(GO:"cell membrane")) -> p(HGNC:B)
See also
BEL 2.0 specification on cellular location (2.2.4)
PyBEL module
pybel.parser.modifiers.get_location_language
Translocations¶
Translocations have their own unique syntax. p(HGNC:YFG1) -> sec(p(HGNC:YFG2))
becomes:
from pybel.constants import *
{
RELATION: INCREASES,
OBJECT: {
MODIFIER: TRANSLOCATION,
EFFECT: {
FROM_LOC: {
NAMESPACE: 'GO',
NAME: 'intracellular',
},
TO_LOC: {
NAMESPACE: 'GO',
NAME: 'extracellular space',
}
}
},
CITATION: { ... },
EVIDENCE: ...,
ANNOTATIONS: { ... },
}
See also
BEL 2.0 specification on translocations
Degradations¶
Degradations are more simple, because there’s no :pybel.constants.EFFECT
entry.
p(HGNC:YFG1) -> deg(p(HGNC:YFG2))
becomes:
from pybel.constants import *
{
RELATION: INCREASES,
OBJECT: {
MODIFIER: DEGRADATION,
},
CITATION: { ... },
EVIDENCE: ...,
ANNOTATIONS: { ... },
}
See also
BEL 2.0 specification on degradations
Example Networks¶
This directory contains example networks, precompiled as BEL graphs that are appropriate to use in examples.
An example describing EGF’s effect on cellular processes.
SET Citation = {"PubMed","Clin Cancer Res 2003 Jul 9(7) 2416-25","12855613"}
SET Evidence = "This induction was not seen either when LNCaP cells were treated with flutamide or conditioned medium were pretreated with antibody to the epidermal growth factor (EGF)"
SET Species = 9606
tscript(p(HGNC:AR)) increases p(HGNC:EGF)
UNSET ALL
SET Citation = {"PubMed","Int J Cancer 1998 Jul 3 77(1) 138-45","9639405"}
SET Evidence = "DU-145 cells treated with 5000 U/ml of IFNgamma and IFN alpha, both reduced EGF production with IFN gamma reduction more significant."
SET Species = 9606
p(HGNC:IFNA1) decreases p(HGNC:EGF)
p(HGNC:IFNG) decreases p(HGNC:EGF)
UNSET ALL
SET Citation = {"PubMed","DNA Cell Biol 2000 May 19(5) 253-63","10855792"}
SET Evidence = "Although found predominantly in the cytoplasm and, less abundantly, in the nucleus, VCP can be translocated from the nucleus after stimulation with epidermal growth factor."
SET Species = 9606
p(HGNC:EGF) increases tloc(p(HGNC:VCP), GO:nucleus, GO:cytoplasm)
UNSET ALL
SET Citation = {"PubMed","J Clin Oncol 2003 Feb 1 21(3) 447-52","12560433"}
SET Evidence = "Valosin-containing protein (VCP; also known as p97) has been shown to be associated with antiapoptotic function and metastasis via activation of the nuclear factor-kappaB signaling pathway."
SET Species = 9606
cat(p(HGNC:VCP)) increases tscript(complex(p(HGNC:NFKB1), p(HGNC:NFKB2), p(HGNC:REL), p(HGNC:RELA), p(HGNC:RELB)))
tscript(complex(p(HGNC:NFKB1), p(HGNC:NFKB2), p(HGNC:REL), p(HGNC:RELA), p(HGNC:RELB))) decreases bp(MESHPP:Apoptosis)
UNSET ALL
-
pybel.examples.
egf_graph
¶
Curation of the article “Genetics ignite focus on microglial inflammation in Alzheimer’s disease”.
SET Citation = {"PubMed", "26438529"}
SET Evidence = "Sialic acid binding activates CD33, resulting in phosphorylation of the CD33
immunoreceptor tyrosine-based inhibitory motif (ITIM) domains and activation of the SHP-1 and
SHP-2 tyrosine phosphatases [66, 67]."
complex(p(HGNC:CD33),a(CHEBI:"sialic acid")) -> p(HGNC:CD33, pmod(P))
act(p(HGNC:CD33, pmod(P))) => act(p(HGNC:PTPN6), ma(phos))
act(p(HGNC:CD33, pmod(P))) => act(p(HGNC:PTPN11), ma(phos))
UNSET {Evidence, Species}
SET Evidence = "These phosphatases act on multiple substrates, including Syk, to inhibit immune
activation [68, 69]. Hence, CD33 activation leads to increased SHP-1 and SHP-2 activity that antagonizes Syk,
inhibiting ITAM-signaling proteins, possibly including TREM2/DAP12 (Fig. 1, [70, 71])."
SET Species = 9606
act(p(HGNC:PTPN6)) =| act(p(HGNC:SYK))
act(p(HGNC:PTPN11)) =| act(p(HGNC:SYK))
act(p(HGNC:SYK)) -> act(p(HGNC:TREM2))
act(p(HGNC:SYK)) -> act(p(HGNC:TYROBP))
UNSET ALL
-
pybel.examples.
sialic_acid_graph
¶
An example describing a single evidence about BRAF.
SET Citation = {"PubMed", "11283246"}
SET Evidence = "Expression of both dominant negative forms, RasN17 and Rap1N17, in UT7-Mpl cells decreased
thrombopoietin-mediated Elk1-dependent transcription. This suggests that both Ras and Rap1 contribute to
thrombopoietin-induced ELK1 transcription."
SET Species = 9606
p(HGNC:THPO) increases kin(p(HGNC:BRAF))
p(HGNC:THPO) increases kin(p(HGNC:RAF1))
kin(p(HGNC:BRAF)) increases tscript(p(HGNC:ELK1))
UNSET ALL
-
pybel.examples.
braf_example_graph
¶
An example describing statins.
-
pybel.examples.
statin_graph
¶
An example describing a translocation.
SET Citation = {"PubMed", "16170185"}
SET Evidence = "These modifications render Ras functional and capable of localizing to the lipid-rich inner surface of the cell membrane. The first and most critical modification, farnesylation, which is principally catalyzed by protein FTase, adds a 15-carbon hydrobobic farnesyl isoprenyl tail to the carboxyl terminus of Ras."
SET TextLocation = Review
cat(complex(p(HGNC:FNTA),p(HGNC:FNTB))) directlyIncreases p(SFAM:"RAS Family",pmod(F))
p(SFAM:"RAS Family",pmod(F)) directlyIncreases tloc(p(SFAM:"RAS Family"),MESHCS:"Intracellular Space",MESHCS:"Cell Membrane")
-
pybel.examples.
ras_tloc_graph
¶
Summary¶
Summary functions for BEL graphs.
-
pybel.struct.summary.
get_syntax_errors
(graph)[source]¶ List the syntax errors encountered during compilation of a BEL script.
-
pybel.struct.summary.
count_error_types
(graph)[source]¶ Count the occurrence of each type of error in a graph.
-
pybel.struct.summary.
count_naked_names
(graph)[source]¶ Count the frequency of each naked name (names without namespaces).
-
pybel.struct.summary.
calculate_incorrect_name_dict
(graph)[source]¶ Get missing names grouped by namespace.
-
pybel.struct.summary.
calculate_error_by_annotation
(graph, annotation)[source]¶ Group error names by a given annotation.
-
pybel.struct.summary.
get_functions
(graph)[source]¶ Get the set of all functions used in this graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A set of functions
-
pybel.struct.summary.
count_functions
(graph)[source]¶ Count the frequency of each function present in a graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A Counter from {function: frequency}
-
pybel.struct.summary.
count_namespaces
(graph)[source]¶ Count the frequency of each namespace across all nodes (that have namespaces).
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Returns
A Counter from {namespace: frequency}
- Return type
-
pybel.struct.summary.
get_namespaces
(graph)[source]¶ Get the set of all namespaces used in this graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A set of namespaces
-
pybel.struct.summary.
count_names_by_namespace
(graph, namespace)[source]¶ Get the set of all of the names in a given namespace that are in the graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
namespace (
str
) – A namespace keyword
- Return type
- Returns
A counter from {name: frequency}
- Raises
IndexError – if the namespace is not defined in the graph.
-
pybel.struct.summary.
get_names_by_namespace
(graph, namespace)[source]¶ Get the set of all of the names in a given namespace that are in the graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
namespace (
str
) – A namespace keyword
- Return type
- Returns
A set of names belonging to the given namespace that are in the given graph
- Raises
IndexError – if the namespace is not defined in the graph.
-
pybel.struct.summary.
get_unused_namespaces
(graph)[source]¶ Get the set of all namespaces that are defined in a graph, but are never used.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A set of namespaces that are included but not used
-
pybel.struct.summary.
count_variants
(graph)[source]¶ Count how many of each type of variant a graph has.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
-
pybel.struct.summary.
count_pathologies
(graph)[source]¶ Count the number of edges in which each pathology is incident.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
Counter
[BaseEntity
]
-
pybel.struct.summary.
get_top_pathologies
(graph, n=15)[source]¶ Get the top highest relationship-having edges in the graph by BEL.
-
pybel.struct.summary.
iterate_pubmed_identifiers
(graph)[source]¶ Iterate over all PubMed identifiers in a graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
An iterator over the PubMed identifiers in the graph
-
pybel.struct.summary.
get_pubmed_identifiers
(graph)[source]¶ Get the set of all PubMed identifiers cited in the construction of a graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A set of all PubMed identifiers cited in the construction of this graph
-
pybel.struct.summary.
iter_annotation_value_pairs
(graph)[source]¶ Iterate over the key/value pairs, with duplicates, for each annotation used in a BEL graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
-
pybel.struct.summary.
iter_annotation_values
(graph, annotation)[source]¶ Iterate over all of the values for an annotation used in the graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
annotation (str) – The annotation to grab
- Return type
-
pybel.struct.summary.
get_annotation_values_by_annotation
(graph)[source]¶ Get the set of values for each annotation used in a BEL graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A dictionary of {annotation key: set of annotation values}
-
pybel.struct.summary.
get_annotation_values
(graph, annotation)[source]¶ Get all values for the given annotation.
- Parameters
graph (pybel.BELGraph) – A BEL graph
annotation (
str
) – The annotation to summarize
- Return type
- Returns
A set of all annotation values
-
pybel.struct.summary.
count_relations
(graph)[source]¶ Return a histogram over all relationships in a graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A Counter from {relation type: frequency}
-
pybel.struct.summary.
get_annotations
(graph)[source]¶ Get the set of annotations used in the graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A set of annotation keys
-
pybel.struct.summary.
count_annotations
(graph)[source]¶ Count how many times each annotation is used in the graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A Counter from {annotation key: frequency}
-
pybel.struct.summary.
get_unused_annotations
(graph)[source]¶ Get the set of all annotations that are defined in a graph, but are never used.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A set of annotations
-
pybel.struct.summary.
get_unused_list_annotation_values
(graph)[source]¶ Get all of the unused values for list annotations.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
- Returns
A dictionary of {str annotation: set of str values that aren’t used}
Operations¶
This page outlines operations that can be done to BEL graphs.
-
pybel.struct.
left_full_join
(g, h)[source]¶ Add all nodes and edges from
h
tog
, in-place forg
.- Parameters
g (pybel.BELGraph) – A BEL graph
h (pybel.BELGraph) – A BEL graph
Example usage:
>>> import pybel >>> g = pybel.from_bel_script('...') >>> h = pybel.from_bel_script('...') >>> left_full_join(g, h)
- Return type
None
-
pybel.struct.
left_outer_join
(g, h)[source]¶ Only add components from the
h
that are touchingg
.Algorithm:
Identify all weakly connected components in
h
Add those that have an intersection with the
g
Example usage:
>>> import pybel >>> g = pybel.from_bel_script('...') >>> h = pybel.from_bel_script('...') >>> left_outer_join(g, h)
- Return type
None
-
pybel.struct.
union
(graphs, use_tqdm=False)[source]¶ Take the union over a collection of graphs into a new graph.
Assumes iterator is longer than 2, but not infinite.
- Parameters
- Returns
A merged graph
- Return type
Example usage:
>>> import pybel >>> g = pybel.from_bel_script('...') >>> h = pybel.from_bel_script('...') >>> k = pybel.from_bel_script('...') >>> merged = union([g, h, k])
Filters¶
This module contains functions for filtering node and edge iterables.
It relies heavily on the concepts of functional programming and the concept of predicates.
Transformations¶
This module contains functions that mutate or make transformations on a network.
Grouping¶
Functions for grouping BEL graphs into sub-graphs.
-
pybel.struct.grouping.
get_subgraphs_by_annotation
(graph, annotation, sentinel=None)[source]¶ Stratify the given graph into sub-graphs based on the values for edges’ annotations.
- Parameters
graph (pybel.BELGraph) – A BEL graph
annotation (str) – The annotation to group by
sentinel (Optional[str]) – The value to stick unannotated edges into. If none, does not keep undefined.
- Return type
Pipeline¶
-
class
pybel.
Pipeline
(protocol=None)[source]¶ Build and runs analytical pipelines on BEL graphs.
Example usage:
>>> from pybel import BELGraph >>> from pybel.struct.pipeline import Pipeline >>> from pybel.struct.mutation import enrich_protein_and_rna_origins, prune_protein_rna_origins >>> graph = BELGraph() >>> example = Pipeline() >>> example.append(enrich_protein_and_rna_origins) >>> example.append(prune_protein_rna_origins) >>> result = example.run(graph)
Initialize the pipeline with an optional pre-defined protocol.
- Parameters
protocol (
Optional
[Iterable
[Dict
]]) – An iterable of dictionaries describing how to transform a network
-
static
from_functions
(functions)[source]¶ Build a pipeline from a list of functions.
- Parameters
functions (iter[((pybel.BELGraph) -> pybel.BELGraph) or ((pybel.BELGraph) -> None) or str]) – A list of functions or names of functions
Example with function:
>>> from pybel.struct.pipeline import Pipeline >>> from pybel.struct.mutation import remove_associations >>> pipeline = Pipeline.from_functions([remove_associations])
Equivalent example with function names:
>>> from pybel.struct.pipeline import Pipeline >>> pipeline = Pipeline.from_functions(['remove_associations'])
Lookup by name is possible for built in functions, and those that have been registered correctly using one of the four decorators:
pybel.struct.pipeline.transformation()
,pybel.struct.pipeline.in_place_transformation()
,pybel.struct.pipeline.uni_transformation()
,pybel.struct.pipeline.uni_in_place_transformation()
,
- Return type
Pipeline
-
append
(name, *args, **kwargs)[source]¶ Add a function (either as a reference, or by name) and arguments to the pipeline.
- Parameters
name (str or (pybel.BELGraph -> pybel.BELGraph)) – The name of the function
args – The positional arguments to call in the function
kwargs – The keyword arguments to call in the function
- Return type
Pipeline
- Returns
This pipeline for fluid query building
- Raises
MissingPipelineFunctionError – If the function is not registered
-
extend
(protocol)[source]¶ Add another pipeline to the end of the current pipeline.
- Parameters
protocol (
Union
[Iterable
[Dict
],Pipeline
]) – An iterable of dictionaries (or another Pipeline)- Return type
Pipeline
- Returns
This pipeline for fluid query building
Example: >>> p1 = Pipeline.from_functions([‘enrich_protein_and_rna_origins’]) >>> p2 = Pipeline.from_functions([‘remove_pathologies’]) >>> p1.extend(p2)
-
run
(graph, universe=None)[source]¶ Run the contained protocol on a seed graph.
- Parameters
graph (pybel.BELGraph) – The seed BEL graph
universe (pybel.BELGraph) – Allows just-in-time setting of the universe in case it wasn’t set before. Defaults to the given network.
- Returns
The new graph is returned if not applied in-place
- Return type
-
static
load
(file)[source]¶ Load a protocol from JSON contained in file.
- Return type
Pipeline
- Returns
The pipeline represented by the JSON in the file
- Raises
MissingPipelineFunctionError – If any functions are not registered
-
static
loads
(s)[source]¶ Load a protocol from a JSON string.
- Parameters
s (
str
) – A JSON string- Return type
Pipeline
- Returns
The pipeline represented by the JSON in the file
- Raises
MissingPipelineFunctionError – If any functions are not registered
Transformation Decorators¶
This module contains the functions for decorating transformation functions.
A transformation function takes in a pybel.BELGraph
and either returns None (in-place) or a new
pybel.BELGraph
(out-of-place).
-
pybel.struct.pipeline.decorators.
in_place_transformation
(func)¶ A decorator for functions that modify BEL graphs in-place
-
pybel.struct.pipeline.decorators.
uni_in_place_transformation
(func)¶ A decorator for functions that require a “universe” graph and modify BEL graphs in-place
-
pybel.struct.pipeline.decorators.
uni_transformation
(func)¶ A decorator for functions that require a “universe” graph and create new BEL graphs from old BEL graphs
-
pybel.struct.pipeline.decorators.
transformation
(func)¶ A decorator for functions that create new BEL graphs from old BEL graphs
-
pybel.struct.pipeline.decorators.
get_transformation
(name)[source]¶ Get a transformation function and error if its name is not registered.
- Parameters
name (
str
) – The name of a function to look up- Returns
A transformation function
- Raises
MissingPipelineFunctionError – If the given function name is not registered
Exceptions¶
Exceptions for the pybel.struct.pipeline
module.
-
exception
pybel.struct.pipeline.exc.
MissingPipelineFunctionError
[source]¶ Raised when trying to run the pipeline with a function that isn’t registered.
-
exception
pybel.struct.pipeline.exc.
MetaValueError
[source]¶ Raised when getting an invalid meta value.
-
exception
pybel.struct.pipeline.exc.
MissingUniverseError
[source]¶ Raised when running a universe function without a universe being present.
Input and Output¶
Input and output functions for BEL graphs.
PyBEL provides multiple lossless interchange options for BEL. Lossy output formats are also included for convenient export to other programs. Notably, a de facto interchange using Resource Description Framework (RDF) to match the ability of other existing software is excluded due the immaturity of the BEL to RDF mapping.
-
pybel.
load
(path, **kwargs)[source]¶ Read a BEL graph.
- Parameters
path (
str
) – The path to a BEL graph in any of the formats with extensions described belowkwargs – The keyword arguments are passed to the importer function
- Return type
BELGraph
- Returns
A BEL graph.
This is the universal loader, which means any file path can be given and PyBEL will look up the appropriate load function. Allowed extensions are:
bel
bel.nodelink.json
bel.cx.json
bel.jgif.json
The previous extensions also support gzipping. Other allowed extensions that don’t support gzip are:
bel.pickle / bel.gpickle / bel.pkl
indra.json
-
pybel.
dump
(graph, path, **kwargs)[source]¶ Write a BEL graph.
- Parameters
graph (
BELGraph
) – A BEL graphpath (
str
) – The path to which the BEL graph is written.kwargs – The keyword arguments are passed to the exporter function
This is the universal loader, which means any file path can be given and PyBEL will look up the appropriate writer function. Allowed extensions are:
bel
bel.nodelink.json
bel.unodelink.json
bel.cx.json
bel.jgif.json
bel.graphdati.json
The previous extensions also support gzipping. Other allowed extensions that don’t support gzip are:
bel.pickle / bel.gpickle / bel.pkl
indra.json
tsv
gsea
- Return type
None
Import¶
Parsing Modes¶
The PyBEL parser has several modes that can be enabled and disabled. They are described below.
Allow Naked Names¶
By default, this is set to False
. The parser does not allow identifiers that are not qualified with
namespaces (naked names), like in p(YFG)
. A proper namespace, like p(HGNC:YFG)
must be used. By
setting this to True
, the parser becomes permissive to naked names. In general, this is bad practice and this
feature will be removed in the future.
Allow Nested¶
By default, this is set to False
. The parser does not allow nested statements is disabled. See overview.
By setting this to True
the parser will accept nested statements one level deep.
Citation Clearing¶
By default, this is set to True
. While the BEL specification clearly states how the language should be used as
a state machine, many BEL documents do not conform to the strict SET
/UNSET
rules. To guard against
annotations accidentally carried from one set of statements to the next, the parser has two modes. By default, in
citation clearing mode, when a SET CITATION
command is reached, it will clear all other annotations (except
the STATEMENT_GROUP
, which has higher priority). This behavior can be disabled by setting this to False
to re-enable strict parsing.
Reference¶
-
pybel.
from_bel_script
(path, **kwargs)[source]¶ Load a BEL graph from a file resource. This function is a thin wrapper around
from_lines()
.The remaining keyword arguments are passed to
pybel.io.line_utils.parse_lines()
, which populates aBELGraph
.- Return type
BELGraph
-
pybel.
from_bel_script_url
(url, **kwargs)[source]¶ Load a BEL graph from a URL resource.
- Parameters
url (
str
) – A valid URL pointing to a BEL document
The remaining keyword arguments are passed to
pybel.io.line_utils.parse_lines()
.- Return type
BELGraph
Transport¶
All transport pairs are reflective and data-preserving.
Bytes¶
Conversion functions for BEL graphs with bytes and Python pickles.
-
pybel.
from_bytes
(bytes_graph, check_version=True)[source]¶ Read a graph from bytes (the result of pickling the graph).
-
pybel.
to_bytes
(graph, protocol=4)[source]¶ Convert a graph to bytes with pickle.
Note that the pickle module has some incompatibilities between Python 2 and 3. To export a universally importable pickle, choose 0, 1, or 2.
- Parameters
graph (
BELGraph
) – A BEL networkprotocol (
int
) – Pickling protocol to use. Defaults toHIGHEST_PROTOCOL
.
- Return type
Node-Link JSON¶
Conversion functions for BEL graphs with node-link JSON.
-
pybel.
from_nodelink
(graph_json_dict, check_version=True)[source]¶ Build a graph from node-link JSON Object.
- Return type
BELGraph
-
pybel.
from_nodelink_jsons
(graph_json_str, check_version=True)[source]¶ Read a BEL graph from a node-link JSON string.
- Return type
BELGraph
-
pybel.
to_nodelink_jsons
(graph, **kwargs)[source]¶ Dump this graph as a node-link JSON object to a string.
- Return type
-
pybel.
from_nodelink_file
(path, check_version=True)[source]¶ Build a graph from the node-link JSON contained in the given file.
-
pybel.
to_nodelink_file
(graph, path, **kwargs)[source]¶ Write this graph as node-link JSON to a file.
Cyberinfrastructure Exchange¶
This module wraps conversion between pybel.BELGraph
and the Cyberinfrastructure Exchange (CX) JSON.
CX is an aspect-oriented network interchange format encoded in JSON with a format inspired by the JSON-LD encoding of Resource Description Framework (RDF). It is primarily used by the Network Data Exchange (NDEx) and more recent versions of Cytoscape.
See also
The NDEx Data Model Specification
CX Support for Cytoscape.js on the Cytoscape App Store
-
pybel.
from_cx_jsons
(graph_json_str)[source]¶ Read a BEL graph from a CX JSON string.
- Return type
BELGraph
-
pybel.
to_cx_jsons
(graph, **kwargs)[source]¶ Dump this graph as a CX JSON object to a string.
- Return type
-
pybel.
to_cx_file
(graph, path, indent=2, **kwargs)[source]¶ Write a BEL graph to a JSON file in CX format.
- Parameters
The example below shows how to output a BEL graph as CX to an open file.
from pybel.examples import sialic_acid_graph from pybel import to_cx_file with open('graph.bel.cx.json', 'w') as file: to_cx_file(sialic_acid_graph, file)
The example below shows how to output a BEL graph as CX to a file at a given path.
from pybel.examples import sialic_acid_graph from pybel import to_cx_file to_cx_file(sialic_acid_graph, 'graph.bel.cx.json')
If you have a big graph, you might consider storing it as a gzipped JGIF file by using
to_cx_gz()
.- Return type
None
JSON Graph Interchange Format¶
Conversion functions for BEL graphs with JGIF JSON.
The JSON Graph Interchange Format (JGIF) is specified similarly to the Node-Link JSON. Interchange with this format provides compatibilty with other software and repositories, such as the Causal Biological Network Database.
-
pybel.
from_jgif
(graph_jgif_dict, parser_kwargs=None)[source]¶ Build a BEL graph from a JGIF JSON object.
-
pybel.
to_jgif
(graph)[source]¶ Build a JGIF dictionary from a BEL graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Returns
A JGIF dictionary
- Return type
Warning
Untested! This format is not general purpose and is therefore time is not heavily invested. If you want to use Cytoscape.js, we suggest using
pybel.to_cx()
instead.The example below shows how to output a BEL graph as a JGIF dictionary.
import os from pybel.examples import sialic_acid_graph graph_jgif_json = pybel.to_jgif(sialic_acid_graph)
If you want to write the graph directly to a file as JGIF, see func:to_jgif_file.
-
pybel.
from_jgif_jsons
(graph_json_str)[source]¶ Read a BEL graph from a JGIF JSON string.
- Return type
BELGraph
-
pybel.
to_jgif_jsons
(graph, **kwargs)[source]¶ Dump this graph as a JGIF JSON object to a string.
- Return type
-
pybel.
to_jgif_file
(graph, file, **kwargs)[source]¶ Write JGIF to a file.
The example below shows how to output a BEL graph as JGIF to an open file.
from pybel.examples import sialic_acid_graph from pybel import to_jgif_file with open('graph.bel.jgif.json', 'w') as file: to_jgif_file(sialic_acid_graph, file)
The example below shows how to output a BEL graph as JGIF to a file at a given path.
from pybel.examples import sialic_acid_graph from pybel import to_jgif_file to_jgif_file(sialic_acid_graph, 'graph.bel.jgif.json')
If you have a big graph, you might consider storing it as a gzipped JGIF file by using
to_jgif_gz()
.- Return type
None
-
pybel.
to_jgif_gz
(graph, path, **kwargs)[source]¶ Write a graph as JGIF JSON to a gzip file.
- Return type
None
-
pybel.
from_cbn_jgif
(graph_jgif_dict)[source]¶ Build a BEL graph from CBN JGIF.
Map the JGIF used by the Causal Biological Network Database to standard namespace and annotations, then builds a BEL graph using
pybel.from_jgif()
.- Parameters
graph_jgif_dict (dict) – The JSON object representing the graph in JGIF format
- Return type
Example: .. code-block:: python
import requests from pybel import from_cbn_jgif apoptosis_url = ‘http://causalbionet.com/Networks/GetJSONGraphFile?networkId=810385422’ graph_jgif_dict = requests.get(apoptosis_url).json() graph = from_cbn_jgif(graph_jgif_dict)
Warning
Handling the annotations is not yet supported, since the CBN documents do not refer to the resources used to create them. This may be added in the future, but the annotations must be stripped from the graph before uploading to the network store using
pybel.struct.mutation.strip_annotations()
.
GraphDati¶
Conversion functions for BEL graphs with GraphDati.
Note that these are not exact I/O - you can’t currently use them as a round trip because the input functions expect the GraphDati format that’s output by BioDati.
-
pybel.
to_graphdati
(graph, *, use_identifiers=True, skip_unqualified=True, use_tqdm=False, metadata_extras=None)[source]¶ Export a GraphDati list using the nanopub.
- Parameters
graph – A BEL graph
use_identifiers (
bool
) – use OBO-style identifiersuse_tqdm (
bool
) – Show a progress bar while generating nanopubsskip_unqualified (
bool
) – Should unqualified edges be output as nanopubs? Defaults to false.metadata_extras (
Optional
[Mapping
[str
,Any
]]) – Extra information to pass into the metadata part of nanopubs
- Return type
-
pybel.
from_graphdati
(j, use_tqdm=True)[source]¶ Convert data from the “normal” network format.
Warning
BioDati crashes when requesting the
full
network format, so this isn’t yet explicitly supported- Return type
BELGraph
-
pybel.
to_graphdati_file
(graph, path, use_identifiers=True, **kwargs)[source]¶ Write this graph as GraphDati JSON to a file.
-
pybel.
to_graphdati_gz
(graph, path, **kwargs)[source]¶ Write a graph as GraphDati JSON to a gzip file.
- Return type
None
-
pybel.
from_graphdati_gz
(path)[source]¶ Read a graph as GraphDati JSON from a gzip file.
- Return type
BELGraph
-
pybel.
to_graphdati_jsons
(graph, **kwargs)[source]¶ Dump this graph as a GraphDati JSON object to a string.
- Parameters
graph (
BELGraph
) – A BEL graph- Return type
-
pybel.
from_graphdati_jsons
(s)[source]¶ Load a graph from a GraphDati JSON string.
- Parameters
graph – A BEL graph
- Return type
BELGraph
INDRA¶
Conversion functions for BEL graphs with INDRA.
After assembling a model with INDRA, a list of
indra.statements.Statement
can be converted to a pybel.BELGraph
with
indra.assemblers.pybel.PybelAssembler
.
from indra.assemblers.pybel import PybelAssembler
import pybel
stmts = [
# A list of INDRA statements
]
pba = PybelAssembler(
stmts,
name='Graph Name',
version='0.0.1',
description='Graph Description'
)
graph = pba.make_model()
# Write to BEL file
pybel.to_bel_path(belgraph, 'simple_pybel.bel')
Warning
These functions are hard to unit test because they rely on a whole set of java dependencies and will likely not be for a while.
-
pybel.
from_indra_statements
(stmts, name=None, version=None, description=None, authors=None, contact=None, license=None, copyright=None, disclaimer=None)[source]¶ Import a model from
indra
.- Parameters
stmts (List[indra.statements.Statement]) – A list of statements
version (
Optional
[str
]) – The graph’s version. Recommended to use semantic versioning orYYYYMMDD
format.
- Return type
-
pybel.
from_indra_statements_json
(stmts_json, **kwargs)[source]¶ Get a BEL graph from INDRA statements JSON.
- Return type
Other kwargs are passed to
from_indra_statements()
.
-
pybel.
from_indra_statements_json_file
(file, **kwargs)[source]¶ Get a BEL graph from INDRA statements JSON file.
- Return type
Other kwargs are passed to
from_indra_statements()
.
-
pybel.
to_indra_statements
(graph)[source]¶ Export this graph as a list of INDRA statements using the
indra.sources.pybel.PybelProcessor
.- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
list[indra.statements.Statement]
-
pybel.
to_indra_statements_json
(graph)[source]¶ Export this graph as INDRA JSON list.
- Parameters
graph (pybel.BELGraph) – A BEL graph
- Return type
-
pybel.
to_indra_statements_json_file
(graph, path, indent=2, **kwargs)[source]¶ Export this graph as INDRA statement JSON.
- Parameters
graph (pybel.BELGraph) – A BEL graph
Other kwargs are passed to
json.dump()
.
-
pybel.
from_biopax
(path, **kwargs)[source]¶ Import a model encoded in Pathway Commons BioPAX via
indra
.- Parameters
path (
str
) – Path to a BioPAX OWL file- Return type
Other kwargs are passed to
from_indra_statements()
.Warning
Not compatible with all BioPAX! See INDRA documentation.
Visualization¶
Jupyter¶
Support for displaying BEL graphs in Jupyter notebooks.
-
pybel.io.jupyter.
to_jupyter
(graph, width=1000, height=650, color_map=None)[source]¶ Display a BEL graph inline in a Jupyter notebook.
To use successfully, make run as the last statement in a cell inside a Jupyter notebook.
- Parameters
graph (
BELGraph
) – A BEL graphwidth (
int
) – The width of the visualization window to renderheight (
int
) – The height of the visualization window to rendercolor_map (
Optional
[Mapping
[str
,str
]]) – A dictionary from PyBEL internal node functions to CSS color strings like #FFEE00. Defaults todefault_color_map
- Returns
An IPython notebook Javascript object
- Return type
IPython.display.Javascript
Analytical Services¶
PyNPA¶
Exporter for PyNPA.
See also
-
pybel.
to_npa_directory
(graph, directory, **kwargs)[source]¶ Write the BEL file to two files in the directory for
pynpa
.- Return type
None
-
pybel.
to_npa_dfs
(graph, cartesian_expansion=False, nomenclature_method_first_layer=None, nomenclature_method_second_layer=None, direct_tf_only=False)[source]¶ Export the BEL graph as two lists of triples for the
pynpa
.- Parameters
graph (
BELGraph
) – A BEL graphcartesian_expansion (
bool
) – If true, applies cartesian expansion on both reactions (reactants x products) as well as list abundances usinglist_abundance_cartesian_expansion()
andreaction_cartesian_expansion()
nomenclature_method_first_layer (
Optional
[str
]) – Either “curie”, “name” or “inodes. Defaults to “curie”.nomenclature_method_second_layer (
Optional
[str
]) – Either “curie”, “name” or “inodes. Defaults to “curie”.
Pick out all transcription factor relationships. Protein X is a transcription factor for gene Y IFF
complex(p(X), g(Y)) -> r(Y)
Get all other interactions between any gene/rna/protein that are directed causal for the PPI layer
- Return type
Tuple
[DataFrame
,DataFrame
]
HiPathia¶
Convert a BEL graph to HiPathia inputs.
Input¶
SIF File¶
Text file with three columns separated by tabs.
Each row represents an interaction in the pathway. First column is the source node, third column the target node, and the second is the type of relation between them.
Only activation and inhibition interactions are allowed.
The name of the nodes in this file will be stored as the IDs of the nodes.
The nodes IDs should have the following structure: N (dash) pathway ID (dash) node ID.
HiPathia distinguish between two types of nodes: simple and complex.
Simple nodes:
Simple nodes may include many genes, but only one is needed to perform the function of the node. This could correspond to a protein family of enzymes that all have the same function - only one of them needs to be present for the action to take place. Simple nodes are defined within
Node IDs from simple nodes do not include any space, i.e. N-hsa04370-11.
Complex nodes:
Complex nodes include different simple nodes and represent protein complexes. Each simple node within the complex represents one protein in the complex. This node requires the presence of all their simple nodes to perform its function.
Node IDs from complex nodes are the juxtaposition of the included simple node IDs, separated by spaces, i.e. N-hsa04370-10 26.
ATT File¶
Text file with twelve (12) columns separated by tabulars. Each row represents a node (either simple or complex).
The columns included are:
ID
: Node ID as explained above.label
: Name to be shown in the picture of the pathway en HGNC. Generally, the gene name of the first included EntrezID gene is used as label. For complex nodes, we juxtapose the gene names of the first genes of each simple node included (see genesList column below).X
: The X-coordinate of the position of the node in the pathway.Y
: The Y-coordinate of the position of the node in the pathway.color
: The default color of the node.shape
: The shape of the node. “rectangle” should be used for genes and “circle” for metabolites.type
: The type of the node, either “gene” for genes or “compound” for metabolites. For complex nodes, the type of each of their included simple nodes is juxtaposed separated by commas, i.e. gene,gene.label.cex
: Amount by which plotting label should be scaled relative to the default.label.color
: Default color of the node.width
: Default width of the node.height
: Default height of the node.genesList
: List of genes included in each node, with EntrezID:
Simple nodes: EntrezIDs of the genes included, separated by commas (“,”) and no spaces, i.e. 56848,8877 for node N-hsa04370-11.
Complex nodes: GenesList of the simple nodes included, separated by a slash (“/”) and no spaces, and in the same order as in the node ID. For example, node N-hsa04370-10 26 includes two simple nodes: 10 and 26. Its genesList column is 5335,5336,/,9047, meaning that the genes included in node 10 are 5335 and 5336, and the gene included in node 26 is 9047.
-
pybel.
to_hipathia
(graph, directory)[source]¶ Export HiPathia artifacts for the graph.
- Return type
None
-
pybel.
to_hipathia_dfs
(graph)[source]¶ Get the ATT and SIF dataframes.
Identify nodes: 1. Identify all proteins 2. Identify all protein families 3. Identify all complexes with just a protein or a protein family in them
Identify interactions between any of those things that are causal
Profit!
- Return type
Tuple
[DataFrame
,DataFrame
]
SPIA¶
An exporter for signaling pathway impact analysis (SPIA) described by [Tarca2009].
- Tarca2009
Tarca, A. L., et al (2009). A novel signaling pathway impact analysis. Bioinformatics, 25(1), 75–82.
Machine Learning¶
Export functions for Machine Learning.
While BEL is a fantastic medium for storing metadata and high granularity information on edges, machine learning algorithms can not consume BEL graphs directly. This module provides functions that make inferences and interpretations of BEL graphs in order to interface with machine learning platforms. One example where we’ve done this is BioKEEN, which uses this module to convert BEL graphs into a format for knowledge graph embeddings.
-
pybel.
to_tsv
(graph, path, *, use_tqdm=False, sep='\\t', raise_on_none=False)[source]¶ Write the graph as a TSV.
Web Services¶
BEL Commons¶
Transport functions for BEL Commons.
BEL Commons is a free, open-source platform for hosting BEL content. Because it was originally developed and published in an academic capacity at Fraunhofer SCAI, a public instance can be found at https://bel-commons-dev.scai.fraunhofer.de. However, this instance is only supported out of posterity and will not be updated. If you would like to host your own instance of BEL Commons, there are instructions on its GitHub page.
-
pybel.
from_bel_commons
(network_id, host=None)[source]¶ Retrieve a public network from BEL Commons.
In the future, this function may be extended to support authentication.
- Parameters
network_id (
int
) – The BEL Commons network identifierhost (
Optional
[str
]) – The location of the BEL Commons server. Alternatively, looks up in PyBEL config withPYBEL_REMOTE_HOST
or the environment asPYBEL_REMOTE_HOST
Defaults topybel.constants.DEFAULT_SERVICE_URL
- Return type
BELGraph
-
pybel.
to_bel_commons
(graph, host=None, user=None, password=None, public=False)[source]¶ Send a graph to the receiver service and returns the
requests
response object.- Parameters
graph (
BELGraph
) – A BEL graphhost (
Optional
[str
]) – The location of the BEL Commons server. Alternatively, looks up in PyBEL config withPYBEL_REMOTE_HOST
or the environment asPYBEL_REMOTE_HOST
Defaults topybel.constants.DEFAULT_SERVICE_URL
user (
Optional
[str
]) – Username for BEL Commons. Alternatively, looks up in PyBEL config withPYBEL_REMOTE_USER
or the environment asPYBEL_REMOTE_USER
password (
Optional
[str
]) – Password for BEL Commons. Alternatively, looks up in PyBEL config withPYBEL_REMOTE_PASSWORD
or the environment asPYBEL_REMOTE_PASSWORD
- Return type
Response
- Returns
The response object from
requests
BioDati¶
Transport functions for BioDati.
BioDati is a paid, closed-source platform for hosting BEL content. However, they do have a demo instance running at https://studio.demo.biodati.com with which the examples in this module will be described.
As noted in the transport functions for BioDati, you should change the URLs to point to your own instance of BioDati. If you’re looking for an open source storage system for hosting your own BEL content, you may consider BEL Commons, with the caveat that it is currently maintained in an academic capacity. Disclosure: BEL Commons is developed by the developers of PyBEL.
-
pybel.
to_biodati
(graph, *, username='demo@biodati.com', password='demo', base_url='https://nanopubstore.demo.biodati.com', chunksize=None, use_tqdm=True, collections=None, overwrite=False, validate=True, email=False)[source]¶ Post this graph to a BioDati server.
- Parameters
graph (
BELGraph
) – A BEL graphusername (
str
) – The email address to log in to BioDati. Defaults to “demo@biodati.com” for the demo serverpassword (
str
) – The password to log in to BioDati. Defaults to “demo” for the demo serverbase_url (
str
) – The BioDati nanopub store base url. Defaults to “https://nanopubstore.demo.biodati.com” for the demo server’s nanopub storechunksize (
Optional
[int
]) – The number of nanopubs to post at a time. By default, does all.use_tqdm (
bool
) – Should tqdm be used when iterating?collections (
Optional
[Iterable
[str
]]) – Tags to add to the nanopubs for lookup on BioDatioverwrite (
bool
) – Set the BioDati upload “overwrite” settingvalidate (
bool
) – Set the BioDati upload “validate” settingemail (
Union
[bool
,str
]) – Who should get emailed with results about the upload? If true, emails to user used for login. If string, emails to that user. If false, no email.
- Return type
Response
- Returns
The response from the BioDati server (last response if using chunking)
Warning
BioDati does not support large uploads (yet?).
Warning
The default public BioDati server has been put here. You should switch it to yours. It will look like
https://nanopubstore.<YOUR NAME>.biodati.com
.
-
pybel.
from_biodati
(network_id, username='demo@biodati.com', password='demo', base_url='https://networkstore.demo.biodati.com')[source]¶ Get a graph from a BioDati network store based on its network identifier.
- Parameters
network_id (
str
) – The internal identifier of the network you want to download.username (
str
) – The email address to log in to BioDati. Defaults to “demo@biodati.com” for the demo serverpassword (
str
) – The password to log in to BioDati. Defaults to “demo” for the demo serverbase_url (
str
) – The BioDati network store base url. Defaults to “https://networkstore.demo.biodati.com” for the demo server’s network store
Example usage:
from pybel import from_biodati network_id = '01E46GDFQAGK5W8EFS9S9WMH12' # COVID-19 graph example from Wendy Zimmermann graph = from_biodati( network_id=network_id, username='demo@biodati.com', password='demo', base_url='https://networkstore.demo.biodati.com', ) graph.summarize()
Warning
The default public BioDati server has been put here. You should switch it to yours. It will look like
https://networkstore.<YOUR NAME>.biodati.com
.- Return type
BELGraph
Fraunhofer OrientDB¶
Transport functions for Franhofer’s OrientDB.
Fraunhofer hosts
an instance of OrientDB that contains BEL in a schema similar to
pybel.io.umbrella_nodelink
. However, they include custom relations that do not come
from a controlled vocabulary, and have not made the schema, ETL scripts, or documentation available.
Unlike BioDati and BEL Commons, the Fraunhofer OrientDB does not allow for uploads, so only
a single function pybel.from_fraunhofer_orientdb()
is provided by PyBEL.
-
pybel.
from_fraunhofer_orientdb
(database='covid', user='covid_user', password='covid', query=None)[source]¶ Get a BEL graph from the Fraunhofer OrientDB.
- Parameters
database (
str
) – The OrientDB database to connect touser (
str
) – The user to connect to OrientDBpassword (
str
) – The password to connect to OrientDBquery (
Optional
[str
]) – The query to run. Defaults to the URL encoded version ofselect from E
, whereE
is all edges in the OrientDB edge database. Likely does not need to be changed, except in the case of selecting specific subsets of edges. Make sure you URL encode it properly, because OrientDB’s RESTful API puts it in the URL’s path.
By default, this function connects to the
covid
database, that corresponds to the COVID-19 Knowledge Graph 0. If other databases in the Fraunhofer OrientDB are published and demo username/password combinations are given, the following table will be updated.Database
Username
Password
covid
covid_user
covid
The
covid
database can be downloaded and converted to a BEL graph like this:import pybel graph = pybel.from_fraunhofer_orientdb( database='covid', user='covid_user', password='covid', ) graph.summarize()
However, because the source BEL scripts for the COVID-19 Knowledge Graph are available on GitHub and the authors pre-enabled it for PyBEL, it can be downloaded with
pip install git+https://github.com/covid19kg/covid19kg.git
and used with the following python code:import covid19kg graph = covid19kg.get_graph() graph.summarize()
Warning
It was initially planned to handle some of the non-standard relationships listed in the Fraunhofer OrientDB’s schema in their OrientDB Studio instance, but none of them actually appear in the only network that is accessible. If this changes, please leave an issue at https://github.com/pybel/pybel/issues so it can be addressed.
- 0
Domingo-Fernández, D., et al. (2020). COVID-19 Knowledge Graph: a computable, multi-modal, cause-and-effect knowledge model of COVID-19 pathophysiology. bioRxiv 2020.04.14.040667.
- Return type
BELGraph
Databases¶
SQL Databases¶
Conversion functions for BEL graphs with a SQL database.
-
pybel.
from_database
(name, version=None, manager=None)[source]¶ Load a BEL graph from a database.
If name and version are given, finds it exactly with
pybel.manager.Manager.get_network_by_name_version()
. If just the name is given, finds most recent withpybel.manager.Manager.get_network_by_name_version()
Neo4j¶
Output functions for BEL graphs to Neo4j.
-
pybel.
to_neo4j
(graph, neo_connection, use_tqdm=False)[source]¶ Upload a BEL graph to a Neo4j graph database using
py2neo
.- Parameters
graph (pybel.BELGraph) – A BEL Graph
neo_connection (str or py2neo.Graph) – A
py2neo
connection object. Refer to the py2neo documentation for how to build this object.
Example Usage:
>>> import py2neo >>> import pybel >>> from pybel.examples import sialic_acid_graph >>> neo_graph = py2neo.Graph("http://localhost:7474/db/data/") # use your own connection settings >>> pybel.to_neo4j(sialic_acid_graph, neo_graph)
Lossy Export¶
Umbrella Node-Link JSON¶
The Umbrella Node-Link JSON format is similar to node-link but uses full BEL terms as nodes.
Given a BEL statement describing that X
phosphorylates Y
like act(p(X)) -> p(Y, pmod(Ph))
,
PyBEL usually stores the act()
information about X
as part of the relationship. In Umbrella mode,
this stays as part of the node.
Note that this generates additional nodes in the network for each of the “modified” versions of
the node. For example, act(p(X))
will be represented as individual node instead of
p(X)
, as in the standard node-link JSON exporter.
A user might want to use this exporter in the following scenarios:
Represent transitivity in activities like in
p(X, pmod(Ph)) -> act(p(X)) -> p(Y, pmod(Ph)) -> act(p(Y))
with four nodes that are more ammenable to simulatons (e.g., boolean networks, petri nets).Visualizing networks that in similar way to the legacy BEL Cytoscape plugin from the BEL Framework (warning: now defunct) using tools like Cytoscape.
GraphML¶
Conversion functions for BEL graphs with GraphML.
-
pybel.
to_graphml
(graph, path, schema=None)[source]¶ Write a graph to a GraphML XML file using
networkx.write_graphml()
.- Parameters
The .graphml file extension is suggested so Cytoscape can recognize it. By default, this function exports using the PyBEL schema of including modifier information into the edges. As an alternative, this function can also distinguish between
- Return type
None
Miscellaneous¶
This module contains IO functions for outputting BEL graphs to lossy formats, such as GraphML and CSV.
-
pybel.
to_csv
(graph, path, sep=None)[source]¶ Write the graph as a tab-separated edge list.
The resulting file will contain the following columns:
Source BEL term
Relation
Target BEL term
Edge data dictionary
See the Data Models section of the documentation for which data are stored in the edge data dictionary, such as queryable information about transforms on the subject and object and their associated metadata.
- Return type
None
-
pybel.
to_sif
(graph, path, sep=None)[source]¶ Write the graph as a tab-separated SIF file.
The resulting file will contain the following columns:
Source BEL term
Relation
Target BEL term
This format is simple and can be used readily with many applications, but is lossy in that it does not include relation metadata.
- Return type
None
-
pybel.
to_gsea
(graph, path)[source]¶ Write the genes/gene products to a GRP file for use with GSEA gene set enrichment analysis.
See also
GSEA publication
- Return type
None
Manager¶
Manager API¶
The BaseManager takes care of building and maintaining the connection to the database via SQLAlchemy.
-
class
pybel.manager.
BaseManager
(engine, session)[source]¶ A wrapper around a SQLAlchemy engine and session.
Instantiate a manager from an engine and session.
-
base
¶ alias of
sqlalchemy.ext.declarative.api.Base
-
create_all
(checkfirst=True)[source]¶ Create the PyBEL cache’s database and tables.
- Parameters
checkfirst (
bool
) – Check if the database exists before trying to re-make it- Return type
None
-
The Manager collates multiple groups of functions for interacting with the database. For sake of code clarity, they are separated across multiple classes that are documented below.
-
class
pybel.manager.
Manager
(connection=None, engine=None, session=None, **kwargs)[source]¶ Bases:
pybel.manager.cache_manager._Manager
A manager for the PyBEL database.
Create a connection to database and a persistent session using SQLAlchemy.
A custom default can be set as an environment variable with the name
pybel.constants.PYBEL_CONNECTION
, using an RFC-1738 string. For example, a MySQL string can be given with the following form:mysql+pymysql://<username>:<password>@<host>/<dbname>?charset=utf8[&<options>]
A SQLite connection string can be given in the form:
sqlite:///~/Desktop/cache.db
Further options and examples can be found on the SQLAlchemy documentation on engine configuration.
- Parameters
connection (
Optional
[str
]) – An RFC-1738 database connection string. IfNone
, tries to load from the environment variablePYBEL_CONNECTION
then from the config file~/.config/pybel/config.json
whose value forPYBEL_CONNECTION
defaults topybel.constants.DEFAULT_CACHE_LOCATION
.engine – Optional engine to use. Must be specified with a session and no connection.
session – Optional session to use. Must be specified with an engine and no connection.
echo (bool) – Turn on echoing sql
autoflush (Optional[bool]) – Defaults to True if not specified in kwargs or configuration.
autocommit (Optional[bool]) – Defaults to False if not specified in kwargs or configuration.
expire_on_commit (Optional[bool]) – Defaults to False if not specified in kwargs or configuration.
scopefunc – Scoped function to pass to
sqlalchemy.orm.scoped_session()
From the Flask-SQLAlchemy documentation:
An extra key
'scopefunc'
can be set on theoptions
dict to specify a custom scope function. If it’s not provided, Flask’s app context stack identity is used. This will ensure that sessions are created and removed with the request/response cycle, and should be fine in most cases.Allowed Usages:
Instantiation with connection string as positional argument
>>> my_connection = 'sqlite:///~/Desktop/cache.db' >>> manager = Manager(my_connection)
Instantiation with connection string as positional argument with keyword arguments
>>> my_connection = 'sqlite:///~/Desktop/cache.db' >>> manager = Manager(my_connection, echo=True)
Instantiation with connection string as keyword argument
>>> my_connection = 'sqlite:///~/Desktop/cache.db' >>> manager = Manager(connection=my_connection)
Instantiation with connection string as keyword argument with keyword arguments
>>> my_connection = 'sqlite:///~/Desktop/cache.db' >>> manager = Manager(connection=my_connection, echo=True)
Instantiation with user-supplied engine and session objects as keyword arguments
>>> my_engine, my_session = ... # magical creation! See SQLAlchemy documentation >>> manager = Manager(engine=my_engine, session=my_session)
Manager Components¶
-
class
pybel.manager.
NetworkManager
(engine, session)[source]¶ Groups functions for inserting and querying networks in the database’s network store.
Instantiate a manager from an engine and session.
-
has_name_version
(name, version)[source]¶ Check if there exists a network with the name/version combination in the database.
- Return type
-
drop_network
(network)[source]¶ Drop a network, while also cleaning up any edges that are no longer part of any network.
- Return type
None
-
query_singleton_edges_from_network
(network)[source]¶ Return a query selecting all edge ids that only belong to the given network.
- Return type
-
get_network_by_name_version
(name, version)[source]¶ Load the network with the given name and version if it exists.
-
get_graph_by_name_version
(name, version)[source]¶ Load the BEL graph with the given name, or allows for specification of version.
- Return type
Optional
[BELGraph
]
-
get_networks_by_name
(name)[source]¶ Get all networks with the given name. Useful for getting all versions of a given network.
-
get_most_recent_network_by_name
(name)[source]¶ Get the most recently created network with the given name.
-
get_graph_by_most_recent
(name)[source]¶ Get the most recently created network with the given name as a
pybel.BELGraph
.- Return type
Optional
[BELGraph
]
-
get_network_by_id
(network_id)[source]¶ Get a network from the database by its identifier.
- Return type
-
get_graph_by_id
(network_id)[source]¶ Get a network from the database by its identifier and converts it to a BEL graph.
- Return type
BELGraph
-
get_networks_by_ids
(network_ids)[source]¶ Get a list of networks with the given identifiers.
Note: order is not necessarily preserved.
-
-
class
pybel.manager.
QueryManager
(engine, session)[source]¶ An extension to the Manager to make queries over the database.
Instantiate a manager from an engine and session.
-
query_nodes
(bel=None, type=None, namespace=None, name=None)[source]¶ Query nodes in the database.
- Parameters
- Return type
-
get_edges_with_annotation
(annotation, value)[source]¶ Search edges with the given annotation/value pair.
-
query_edges
(bel=None, source_function=None, source=None, target_function=None, target=None, relation=None)[source]¶ Return a query over the edges in the database.
Usually this means that you should call
list()
or.all()
on this result.- Parameters
bel (
Optional
[str
]) – BEL statement that represents the desired edge.source_function (
Optional
[str
]) – Filter source nodes with the given BEL functionsource (
Union
[None
,str
,Node
]) – BEL term of source node e.g.p(HGNC:APP)
orNode
object.target_function (
Optional
[str
]) – Filter target nodes with the given BEL functiontarget (
Union
[None
,str
,Node
]) – BEL term of target node e.g.p(HGNC:APP)
orNode
object.relation (
Optional
[str
]) – The relation that should be present between source and target node.
-
query_citations
(db=None, db_id=None, name=None, author=None, date=None, evidence_text=None)[source]¶ Query citations in the database.
-
query_edges_by_pubmed_identifiers
(pubmed_identifiers)[source]¶ Get all edges annotated to the documents identified by the given PubMed identifiers.
-
Models¶
This module contains the SQLAlchemy database models that support the definition cache and graph cache.
-
class
pybel.manager.models.
Base
(**kwargs)¶ The most base type
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
class
pybel.manager.models.
Namespace
(**kwargs)[source]¶ Represents a BEL Namespace.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
uploaded
¶ The date of upload
-
keyword
¶ Keyword that is used in a BEL file to identify a specific namespace
-
pattern
¶ Contains regex pattern for value identification.
-
miriam_id
¶ MIRIAM resource identifier matching the regular expression
^MIR:001\d{5}$
-
version
¶ Version of the namespace
-
url
¶ BELNS Resource location as URL
-
name
¶ Name of the given namespace
-
domain
¶ Domain for which this namespace is valid
-
species
¶ Taxonomy identifiers for which this namespace is valid
-
description
¶ Optional short description of the namespace
-
created
¶ DateTime of the creation of the namespace definition file
-
query_url
¶ URL that can be used to query the namespace (externally from PyBEL)
The author of the namespace
-
license
¶ License information
-
contact
¶ Contact information
-
-
class
pybel.manager.models.
NamespaceEntry
(**kwargs)[source]¶ Represents a name within a BEL namespace.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
name
¶ Name that is defined in the corresponding namespace definition file
-
identifier
¶ The database accession number
-
encoding
¶ The biological entity types for which this name is valid
-
-
class
pybel.manager.models.
Network
(**kwargs)[source]¶ Represents a collection of edges, specified by a BEL Script.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
name
¶ Name of the given Network (from the BEL file)
-
version
¶ Release version of the given Network (from the BEL file)
Authors of the underlying BEL file
-
contact
¶ Contact email from the underlying BEL file
-
description
¶ Descriptive text from the underlying BEL file
-
copyright
¶ Copyright information
-
disclaimer
¶ Disclaimer information
-
licenses
¶ License information
-
blob
¶ A pickled version of this network
-
classmethod
name_contains
(name_query)[source]¶ Build a filter for networks whose names contain the query.
-
classmethod
description_contains
(description_query)[source]¶ Build a filter for networks whose descriptions contain the query.
-
classmethod
id_in
(network_ids)[source]¶ Build a filter for networks whose identifiers appear in the given sequence.
-
store_bel
(graph)[source]¶ Insert a BEL graph.
- Parameters
graph (pybel.BELGraph) – A BEL Graph
-
-
class
pybel.manager.models.
Modification
(**kwargs)[source]¶ The modifications that are present in the network are stored in this table.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
type
¶ Type of the stored modification e.g. Fusion, gmod, pmod, etc
-
variantString
¶ HGVS string if sequence modification
-
residue
¶ Three letter amino acid code if PMOD
-
position
¶ Position of PMOD or GMOD
-
-
class
pybel.manager.models.
Node
(**kwargs)[source]¶ Represents a BEL Term.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
type
¶ The type of the represented biological entity e.g. Protein or Gene
-
is_variant
¶ Identifies weather or not the given node is a variant
-
has_fusion
¶ Identifies weather or not the given node is a fusion
-
bel
¶ Canonical BEL term that represents the given node
-
data
¶ PyBEL BaseEntity as JSON
-
-
class
pybel.manager.models.
Author
(**kwargs)[source]¶ Contains all author names.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
class
pybel.manager.models.
Citation
(**kwargs)[source]¶ The information about the citations that are used to prove a specific relation are stored in this table.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
db
¶ Type of the stored publication e.g. PubMed
-
db_id
¶ Reference identifier of the publication e.g. PubMed_ID
-
title
¶ Title of the publication
-
journal
¶ Journal name
-
volume
¶ Volume of the journal
-
issue
¶ Issue within the volume
-
pages
¶ Pages of the publication
-
date
¶ Publication date
-
first_id
¶ First author
-
last_id
¶ Last author
-
property
is_enriched
¶ Return if this citation has been enriched for name, title, and other metadata.
- Return type
-
-
class
pybel.manager.models.
Evidence
(**kwargs)[source]¶ This table contains the evidence text that proves a specific relationship and refers the source that is cited.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
text
¶ Supporting text from a given publication
-
-
class
pybel.manager.models.
Property
(**kwargs)[source]¶ The property table contains additional information that is used to describe the context of a relation.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
is_subject
¶ Identifies which participant of the edge if affected by the given property
-
modifier
¶ The modifier: one of activity, degradation, location, or translocation
-
relative_key
¶ Relative key of effect e.g. to_tloc or from_tloc
-
property
side
¶ Return either
pybel.constants.SUBJECT
orpybel.constants.OBJECT
.- Return type
-
-
class
pybel.manager.models.
Edge
(**kwargs)[source]¶ Relationships between BEL nodes and their properties, annotations, and provenance.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in
kwargs
.Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
-
bel
¶ Valid BEL statement that represents the given edge
-
md5
¶ The hash of the source, target, and associated metadata
-
data
¶ The stringified JSON representing this edge
-
to_json
(include_id=False)[source]¶ Create a dictionary of one BEL Edge that can be used to create an edge in a
BELGraph
.
-
insert_into_graph
(graph)[source]¶ Insert this edge into a BEL graph.
- Parameters
graph (pybel.BELGraph) – A BEL graph
-
Cookbook¶
An extensive set of examples can be found on the PyBEL Notebooks repository on GitHub. These notebooks contain basic usage and also make numerous references to the analytical package PyBEL Tools
Configuration¶
The default connection string can be set as an environment variable in your ~/.bashrc
. If you’re using MySQL or
MariaDB, it could look like this:
$ export PYBEL_CONNECTION="mysql+pymysql://user:password@server_name/database_name?charset=utf8"
Prepare a Cytoscape Network¶
Load, compile, and export to Cytoscape format:
$ pybel convert --path ~/Desktop/example.bel --graphml ~/Desktop/example.graphml
In Cytoscape, open with Import > Network > From File
.
Command Line Interface¶
Note
The command line wrapper might not work on Windows. Use python3 -m pybel
if it has issues.
PyBEL automatically installs the command pybel
. This command can be used to easily compile BEL documents
and convert to other formats. See pybel --help
for usage details. This command makes logs of all conversions
and warnings to the directory ~/.pybel/
.
pybel¶
PyBEL CLI on /home/docs/checkouts/readthedocs.org/user_builds/pybel/envs/v0.14.9/bin/python
pybel [OPTIONS] COMMAND [ARGS]...
Options
-
--version
¶
Show the version and exit.
-
-c
,
--connection
<connection>
¶ Database connection string. [default: sqlite:////home/docs/.pybel/pybel_0.14.0_cache.db]
compile¶
Compile a BEL script to a graph.
pybel compile [OPTIONS] PATH
Options
-
--allow-naked-names
¶
Enable lenient parsing for naked names
-
--disallow-nested
¶
Disable lenient parsing for nested statements
-
--disallow-unqualified-translocations
¶
Disallow unqualified translocations
-
--no-identifier-validation
¶
Turn off identifier validation
-
--no-citation-clearing
¶
Turn off citation clearing
-
-r
,
--required-annotations
<required_annotations>
¶ Specify multiple required annotations
-
--upgrade-urls
¶
-
--skip-tqdm
¶
-
-v
,
--verbose
¶
Arguments
-
PATH
¶
Required argument
insert¶
Insert a graph to the database.
pybel insert [OPTIONS] path
Arguments
-
path
¶
Required argument
machine¶
Get content from the INDRA machine and upload to BEL Commons.
pybel machine [OPTIONS] [AGENTS]...
Options
-
--local
¶
Upload to local database.
-
--host
<host>
¶ URL of BEL Commons. Defaults to https://bel-commons.scai.fraunhofer.de
Arguments
-
AGENTS
¶
Optional argument(s)
manage¶
Manage the database.
pybel manage [OPTIONS] COMMAND [ARGS]...
drop¶
Drop the database.
pybel manage drop [OPTIONS]
Options
-
--yes
¶
Confirm the action without prompting.
namespaces¶
Manage namespaces.
pybel manage namespaces [OPTIONS] COMMAND [ARGS]...
drop¶
Drop a namespace by URL.
pybel manage namespaces drop [OPTIONS] URL
Arguments
-
URL
¶
Required argument
networks¶
Manage networks.
pybel manage networks [OPTIONS] COMMAND [ARGS]...
neo¶
Upload to neo4j.
pybel neo [OPTIONS] path
Options
-
--connection
<connection>
¶ Connection string for neo4j upload.
-
--password
<password>
¶
Arguments
-
path
¶
Required argument
post¶
Upload a graph to BEL Commons.
pybel post [OPTIONS] path
Options
-
--host
<host>
¶ URL of BEL Commons. Defaults to https://bel-commons.scai.fraunhofer.de
Arguments
-
path
¶
Required argument
serialize¶
Serialize a graph to various formats.
pybel serialize [OPTIONS] path
Options
-
--tsv
<tsv>
¶ Path to output a TSV file.
-
--edgelist
<edgelist>
¶ Path to output a edgelist file.
-
--sif
<sif>
¶ Path to output an SIF file.
-
--gsea
<gsea>
¶ Path to output a GRP file for gene set enrichment analysis.
-
--graphml
<graphml>
¶ Path to output a GraphML file. Use .graphml for Cytoscape.
-
--nodelink
<nodelink>
¶ Path to output a node-link JSON file.
-
--bel
<bel>
¶ Output canonical BEL.
Arguments
-
path
¶
Required argument
Constants¶
Constants for PyBEL.
This module maintains the strings used throughout the PyBEL codebase to promote consistency.
-
pybel.constants.
get_cache_connection
()[source]¶ Get the preferred RFC-1738 database connection string.
Check the environment variable
PYBEL_CONNECTION
Check the
PYBEL_CONNECTION
key in the config file~/.config/pybel/config.json
. Optionally, this config file might be in a different place if the environment variablePYBEL_CONFIG_DIRECTORY
has been set.Return a default connection string using a SQLite database in the
~/.pybel
. Optionally, this directory might be in a different place if the environment variablePYBEL_RESOURCE_DIRECTORY
has been set.
- Return type
-
pybel.constants.
BEL_DEFAULT_NAMESPACE
= 'bel'¶ The default namespace given to entities in the BEL language
-
pybel.constants.
CITATION_TYPES
= {'Book': None, 'DOI': 'doi', 'Journal': None, 'Online Resource': None, 'Other': None, 'PubMed': 'pmid', 'PubMed Central': 'pmc', 'URL': None}¶ The valid citation types .. seealso:: https://wiki.openbel.org/display/BELNA/Citation
-
pybel.constants.
NAMESPACE_DOMAIN_TYPES
= {'BiologicalProcess', 'Chemical', 'Gene and Gene Products', 'Other'}¶ The valid namespace types .. seealso:: https://wiki.openbel.org/display/BELNA/Custom+Namespaces
-
pybel.constants.
CITATION_DB
= 'db'¶ Represents the key for the citation type in a citation dictionary
-
pybel.constants.
CITATION_IDENTIFIER
= 'db_id'¶ Represents the key for the citation reference in a citation dictionary
-
pybel.constants.
CITATION_DB_NAME
= 'db_name'¶ Represents the key for the optional PyBEL citation title entry in a citation dictionary
-
pybel.constants.
CITATION_DATE
= 'date'¶ Represents the key for the citation date in a citation dictionary
-
pybel.constants.
CITATION_AUTHORS
= 'authors'¶ Represents the key for the citation authors in a citation dictionary
-
pybel.constants.
CITATION_JOURNAL
= 'db_name'¶ Represents the key for the citation comment in a citation dictionary
-
pybel.constants.
CITATION_VOLUME
= 'volume'¶ Represents the key for the optional PyBEL citation volume entry in a citation dictionary
-
pybel.constants.
CITATION_ISSUE
= 'issue'¶ Represents the key for the optional PyBEL citation issue entry in a citation dictionary
-
pybel.constants.
CITATION_PAGES
= 'pages'¶ Represents the key for the optional PyBEL citation pages entry in a citation dictionary
-
pybel.constants.
CITATION_FIRST_AUTHOR
= 'first'¶ Represents the key for the optional PyBEL citation first author entry in a citation dictionary
-
pybel.constants.
CITATION_LAST_AUTHOR
= 'last'¶ Represents the key for the optional PyBEL citation last author entry in a citation dictionary
-
pybel.constants.
FUNCTION
= 'function'¶ The node data key specifying the node’s function (e.g.
GENE
,MIRNA
,BIOPROCESS
, etc.)
-
pybel.constants.
CONCEPT
= 'concept'¶ The key specifying a concept
-
pybel.constants.
NAMESPACE
= 'namespace'¶ The key specifying an identifier dictionary’s namespace. Used for nodes, activities, and transformations.
-
pybel.constants.
NAME
= 'name'¶ The key specifying an identifier dictionary’s name. Used for nodes, activities, and transformations.
-
pybel.constants.
IDENTIFIER
= 'identifier'¶ The key specifying an identifier dictionary
-
pybel.constants.
LABEL
= 'label'¶ The key specifying an optional label for the node
-
pybel.constants.
DESCRIPTION
= 'description'¶ The key specifying an optional description for the node
-
pybel.constants.
XREFS
= 'xref'¶ The key specifying xrefs
-
pybel.constants.
MEMBERS
= 'members'¶ They key representing the nodes that are a member of a composite or complex
-
pybel.constants.
REACTANTS
= 'reactants'¶ The key representing the nodes appearing in the reactant side of a biochemical reaction
-
pybel.constants.
PRODUCTS
= 'products'¶ The key representing the nodes appearing in the product side of a biochemical reaction
-
pybel.constants.
PARTNER_3P
= 'partner_3p'¶ The key specifying the identifier dictionary of the fusion’s 3-Prime partner
-
pybel.constants.
PARTNER_5P
= 'partner_5p'¶ The key specifying the identifier dictionary of the fusion’s 5-Prime partner
-
pybel.constants.
RANGE_3P
= 'range_3p'¶ The key specifying the range dictionary of the fusion’s 3-Prime partner
-
pybel.constants.
RANGE_5P
= 'range_5p'¶ The key specifying the range dictionary of the fusion’s 5-Prime partner
-
pybel.constants.
VARIANTS
= 'variants'¶ The key specifying the node has a list of associated variants
-
pybel.constants.
KIND
= 'kind'¶ The key representing what kind of variation is being represented
-
pybel.constants.
PYBEL_NODE_DATA_KEYS
= {'function', 'fusion', 'identifier', 'members', 'name', 'namespace', 'products', 'reactants', 'variants'}¶ The group of all BEL-provided keys for node data dictionaries, used for hashing.
-
pybel.constants.
DIRTY
= 'dirty'¶ Used as a namespace when none is given when lenient parsing mode is turned on. Not recommended!
-
pybel.constants.
ABUNDANCE
= 'Abundance'¶ Represents the BEL abundance, abundance()
-
pybel.constants.
GENE
= 'Gene'¶ Represents the BEL abundance, geneAbundance() .. seealso:: http://openbel.org/language/version_2.0/bel_specification_version_2.0.html#Xabundancea
-
pybel.constants.
RNA
= 'RNA'¶ Represents the BEL abundance, rnaAbundance()
-
pybel.constants.
MIRNA
= 'miRNA'¶ Represents the BEL abundance, microRNAAbundance()
-
pybel.constants.
PROTEIN
= 'Protein'¶ Represents the BEL abundance, proteinAbundance()
-
pybel.constants.
BIOPROCESS
= 'BiologicalProcess'¶ Represents the BEL function, biologicalProcess()
-
pybel.constants.
PATHOLOGY
= 'Pathology'¶ Represents the BEL function, pathology()
-
pybel.constants.
POPULATION
= 'Population'¶ Represents the BEL function, populationAbundance()
-
pybel.constants.
COMPOSITE
= 'Composite'¶ Represents the BEL abundance, compositeAbundance()
-
pybel.constants.
COMPLEX
= 'Complex'¶ Represents the BEL abundance, complexAbundance()
-
pybel.constants.
REACTION
= 'Reaction'¶ Represents the BEL transformation, reaction()
-
pybel.constants.
PYBEL_NODE_FUNCTIONS
= {'Abundance', 'BiologicalProcess', 'Complex', 'Composite', 'Gene', 'Pathology', 'Population', 'Protein', 'RNA', 'Reaction', 'miRNA'}¶ A set of all of the valid PyBEL node functions
-
pybel.constants.
rev_abundance_labels
= {'Abundance': 'a', 'BiologicalProcess': 'bp', 'Complex': 'complex', 'Composite': 'composite', 'Gene': 'g', 'Pathology': 'path', 'Population': 'pop', 'Protein': 'p', 'RNA': 'r', 'miRNA': 'm'}¶ The mapping from PyBEL node functions to BEL strings
-
pybel.constants.
RELATION
= 'relation'¶ The key for an internal edge data dictionary for the relation string
-
pybel.constants.
CITATION
= 'citation'¶ The key for an internal edge data dictionary for the citation dictionary
-
pybel.constants.
EVIDENCE
= 'evidence'¶ The key for an internal edge data dictionary for the evidence string
-
pybel.constants.
ANNOTATIONS
= 'annotations'¶ The key for an internal edge data dictionary for the annotations dictionary
-
pybel.constants.
SUBJECT
= 'subject'¶ The key for an internal edge data dictionary for the subject modifier dictionary
-
pybel.constants.
OBJECT
= 'object'¶ The key for an internal edge data dictionary for the object modifier dictionary
-
pybel.constants.
LINE
= 'line'¶ The key or an internal edge data dictionary for the line number
-
pybel.constants.
HASH
= 'hash'¶ The key representing the hash of the other
-
pybel.constants.
PYBEL_EDGE_DATA_KEYS
= {'annotations', 'citation', 'evidence', 'object', 'relation', 'subject'}¶ The group of all BEL-provided keys for edge data dictionaries, used for hashing.
-
pybel.constants.
PYBEL_EDGE_METADATA_KEYS
= {'hash', 'line'}¶ The group of all PyBEL-specific keys for edge data dictionaries, not used for hashing.
-
pybel.constants.
PYBEL_EDGE_ALL_KEYS
= {'annotations', 'citation', 'evidence', 'hash', 'line', 'object', 'relation', 'subject'}¶ The group of all PyBEL annotated keys for edge data dictionaries
-
pybel.constants.
HAS_REACTANT
= 'hasReactant'¶ A BEL relationship
-
pybel.constants.
HAS_PRODUCT
= 'hasProduct'¶ A BEL relationship
-
pybel.constants.
HAS_VARIANT
= 'hasVariant'¶ A BEL relationship
-
pybel.constants.
TRANSCRIBED_TO
= 'transcribedTo'¶
-
pybel.constants.
TRANSLATED_TO
= 'translatedTo'¶
-
pybel.constants.
INCREASES
= 'increases'¶ A BEL relationship
-
pybel.constants.
DIRECTLY_INCREASES
= 'directlyIncreases'¶ A BEL relationship
-
pybel.constants.
DECREASES
= 'decreases'¶ A BEL relationship
-
pybel.constants.
DIRECTLY_DECREASES
= 'directlyDecreases'¶ A BEL relationship
-
pybel.constants.
CAUSES_NO_CHANGE
= 'causesNoChange'¶ A BEL relationship
-
pybel.constants.
REGULATES
= 'regulates'¶ A BEL relationship
-
pybel.constants.
BINDS
= 'binds'¶ A BEL relationship
-
pybel.constants.
CORRELATION
= 'correlation'¶ A BEL relationship
-
pybel.constants.
NO_CORRELATION
= 'noCorrelation'¶ A BEL relationship
-
pybel.constants.
NEGATIVE_CORRELATION
= 'negativeCorrelation'¶ A BEL relationship
-
pybel.constants.
POSITIVE_CORRELATION
= 'positiveCorrelation'¶ A BEL relationship
-
pybel.constants.
ASSOCIATION
= 'association'¶ A BEL relationship
-
pybel.constants.
ORTHOLOGOUS
= 'orthologous'¶ A BEL relationship
-
pybel.constants.
ANALOGOUS_TO
= 'analogousTo'¶ A BEL relationship
-
pybel.constants.
IS_A
= 'isA'¶ A BEL relationship
-
pybel.constants.
RATE_LIMITING_STEP_OF
= 'rateLimitingStepOf'¶ A BEL relationship
-
pybel.constants.
SUBPROCESS_OF
= 'subProcessOf'¶ A BEL relationship
-
pybel.constants.
BIOMARKER_FOR
= 'biomarkerFor'¶ A BEL relationship
-
pybel.constants.
PROGONSTIC_BIOMARKER_FOR
= 'prognosticBiomarkerFor'¶ A BEL relationship
-
pybel.constants.
EQUIVALENT_TO
= 'equivalentTo'¶ A BEL relationship, added by PyBEL
-
pybel.constants.
PART_OF
= 'partOf'¶ A BEL relationship, added by PyBEL
-
pybel.constants.
CAUSAL_INCREASE_RELATIONS
= {'directlyIncreases', 'increases'}¶ A set of all causal relationships that have an increasing effect
-
pybel.constants.
CAUSAL_DECREASE_RELATIONS
= {'decreases', 'directlyDecreases'}¶ A set of all causal relationships that have a decreasing effect
-
pybel.constants.
DIRECT_CAUSAL_RELATIONS
= {'directlyDecreases', 'directlyIncreases'}¶ A set of direct causal relations
-
pybel.constants.
INDIRECT_CAUSAL_RELATIONS
= {'decreases', 'increases', 'regulates'}¶ A set of direct causal relations
-
pybel.constants.
CAUSAL_POLAR_RELATIONS
= {'decreases', 'directlyDecreases', 'directlyIncreases', 'increases'}¶ A set of causal relationships that are polar
-
pybel.constants.
CAUSAL_RELATIONS
= {'decreases', 'directlyDecreases', 'directlyIncreases', 'increases', 'regulates'}¶ A set of all causal relationships
-
pybel.constants.
CORRELATIVE_RELATIONS
= {'correlation', 'negativeCorrelation', 'noCorrelation', 'positiveCorrelation'}¶ A set of all correlative relationships
-
pybel.constants.
POLAR_RELATIONS
= {'decreases', 'directlyDecreases', 'directlyIncreases', 'increases', 'negativeCorrelation', 'positiveCorrelation'}¶ A set of polar relations
-
pybel.constants.
TWO_WAY_RELATIONS
= {'analogousTo', 'association', 'binds', 'correlation', 'equivalentTo', 'negativeCorrelation', 'noCorrelation', 'orthologous', 'positiveCorrelation'}¶ A set of all relationships that are inherently directionless, and are therefore added to the graph twice
-
pybel.constants.
UNQUALIFIED_EDGES
= {'equivalentTo', 'hasProduct', 'hasReactant', 'hasVariant', 'isA', 'orthologous', 'partOf', 'transcribedTo', 'translatedTo'}¶ A list of relationship types that don’t require annotations or evidence
-
pybel.constants.
GRAPH_METADATA
= 'document_metadata'¶ The key for the document metadata dictionary. Can be accessed by
graph.graph[GRAPH_METADATA]
, or by using the property built in to thepybel.BELGraph
,pybel.BELGraph.document()
-
pybel.constants.
METADATA_NAME
= 'name'¶ The key for the document name. Can be accessed by
graph.document[METADATA_NAME]
or by using the property built into thepybel.BELGraph
class,pybel.BELGraph.name()
-
pybel.constants.
METADATA_VERSION
= 'version'¶ The key for the document version. Can be accessed by
graph.document[METADATA_VERSION]
-
pybel.constants.
METADATA_DESCRIPTION
= 'description'¶ The key for the document description. Can be accessed by
graph.document[METADATA_DESCRIPTION]
-
pybel.constants.
METADATA_AUTHORS
= 'authors'¶ The key for the document authors. Can be accessed by
graph.document[METADATA_NAME]
-
pybel.constants.
METADATA_CONTACT
= 'contact'¶ The key for the document contact email. Can be accessed by
graph.document[METADATA_CONTACT]
-
pybel.constants.
METADATA_LICENSES
= 'licenses'¶ The key for the document licenses. Can be accessed by
graph.document[METADATA_LICENSES]
-
pybel.constants.
METADATA_COPYRIGHT
= 'copyright'¶ The key for the document copyright information. Can be accessed by
graph.document[METADATA_COPYRIGHT]
-
pybel.constants.
METADATA_DISCLAIMER
= 'disclaimer'¶ The key for the document disclaimer. Can be accessed by
graph.document[METADATA_DISCLAIMER]
-
pybel.constants.
METADATA_PROJECT
= 'project'¶ The key for the document project. Can be accessed by
graph.document[METADATA_PROJECT]
-
pybel.constants.
DOCUMENT_KEYS
= {'Authors': 'authors', 'ContactInfo': 'contact', 'Copyright': 'copyright', 'Description': 'description', 'Disclaimer': 'disclaimer', 'Licenses': 'licenses', 'Name': 'name', 'Project': 'project', 'Version': 'version'}¶ Provides a mapping from BEL language keywords to internal PyBEL strings
-
pybel.constants.
METADATA_INSERT_KEYS
= {'authors', 'contact', 'copyright', 'description', 'disclaimer', 'licenses', 'name', 'version'}¶ The keys to use when inserting a graph to the cache
-
pybel.constants.
INVERSE_DOCUMENT_KEYS
= {'authors': 'Authors', 'contact': 'ContactInfo', 'copyright': 'Copyright', 'description': 'Description', 'disclaimer': 'Disclaimer', 'licenses': 'Licenses', 'name': 'Name', 'project': 'Project', 'version': 'Version'}¶ Provides a mapping from internal PyBEL strings to BEL language keywords. Is the inverse of
DOCUMENT_KEYS
-
pybel.constants.
REQUIRED_METADATA
= {'authors', 'contact', 'description', 'name', 'version'}¶ A set representing the required metadata during BEL document parsing
-
pybel.constants.
FRAGMENT_START
= 'start'¶ The key for the starting position of a fragment range
-
pybel.constants.
FRAGMENT_STOP
= 'stop'¶ The key for the stopping position of a fragment range
-
pybel.constants.
FRAGMENT_MISSING
= 'missing'¶ The key signifying that there is neither a start nor stop position defined
-
pybel.constants.
FRAGMENT_DESCRIPTION
= 'description'¶ The key for any additional descriptive data about a fragment
-
pybel.constants.
GMOD_ORDER
= ['kind', 'identifier']¶ The order for serializing gene modification data
-
pybel.constants.
GSUB_REFERENCE
= 'reference'¶ The key for the reference nucleotide in a gene substitution. Only used during parsing since this is converted to HGVS.
-
pybel.constants.
GSUB_POSITION
= 'position'¶ The key for the position of a gene substitution. Only used during parsing since this is converted to HGVS
-
pybel.constants.
GSUB_VARIANT
= 'variant'¶ The key for the effect of a gene substitution. Only used during parsing since this is converted to HGVS
-
pybel.constants.
PMOD_CODE
= 'code'¶ The key for the protein modification code.
-
pybel.constants.
PMOD_POSITION
= 'pos'¶ The key for the protein modification position.
-
pybel.constants.
PMOD_ORDER
= ['kind', 'identifier', 'code', 'pos']¶ The order for serializing information about a protein modification
-
pybel.constants.
PSUB_REFERENCE
= 'reference'¶ The key for the reference amino acid in a protein substitution. Only used during parsing since this is concerted to HGVS
-
pybel.constants.
PSUB_POSITION
= 'position'¶ The key for the position of a protein substitution. Only used during parsing since this is converted to HGVS.
-
pybel.constants.
PSUB_VARIANT
= 'variant'¶ The key for the variant of a protein substitution.Only used during parsing since this is converted to HGVS.
-
pybel.constants.
TRUNCATION_POSITION
= 'position'¶ The key for the position at which a protein is truncated
-
pybel.constants.
belns_encodings
= {'A': {'Abundance', 'Complex', 'Gene', 'Protein', 'RNA', 'miRNA'}, 'B': {'BiologicalProcess', 'Pathology'}, 'C': {'Complex'}, 'G': {'Gene'}, 'M': {'miRNA'}, 'O': {'Pathology'}, 'P': {'Protein'}, 'R': {'RNA', 'miRNA'}}¶ The mapping from BEL namespace codes to PyBEL internal abundance constants ..seealso:: https://wiki.openbel.org/display/BELNA/Assignment+of+Encoding+%28Allowed+Functions%29+for+BEL+Namespaces
-
pybel.constants.
DEFAULT_SERVICE_URL
= 'https://bel-commons.scai.fraunhofer.de'¶ The default location of PyBEL Web
Language constants for BEL.
This module contains mappings between PyBEL’s internal constants and BEL language keywords.
-
class
pybel.language.
Entity
(*, namespace, name=None, identifier=None)[source]¶ Represents a named entity with a namespace and name/identifier.
Create a dictionary representing a reference to an entity.
- Parameters
-
pybel.language.
activity_labels
= {'cat': 'cat', 'catalyticActivity': 'cat', 'chap': 'chap', 'chaperoneActivity': 'chap', 'gap': 'gap', 'gef': 'gef', 'gtp': 'gtp', 'gtpBoundActivity': 'gtp', 'gtpaseActivatingProteinActivity': 'gap', 'guanineNucleotideExchangeFactorActivity': 'gef', 'kin': 'kin', 'kinaseActivity': 'kin', 'molecularActivity': 'molecularActivity', 'pep': 'pep', 'peptidaseActivity': 'pep', 'phos': 'phos', 'phosphataseActivity': 'phos', 'ribo': 'ribo', 'ribosylationActivity': 'ribo', 'tport': 'tport', 'transcriptionalActivity': 'tscript', 'transportActivity': 'tport', 'tscript': 'tscript'}¶ A dictionary of activity labels used in the ma() function in activity(p(X), ma(Y))
-
pybel.language.
activity_mapping
= {'cat': {'identifier': '0003824', 'name': 'catalytic activity', 'namespace': 'go'}, 'chap': {'identifier': '0044183', 'name': 'protein binding involved in protein folding', 'namespace': 'go'}, 'gap': {'identifier': '0032794', 'name': 'GTPase activating protein binding', 'namespace': 'go'}, 'gef': {'identifier': '0005085', 'name': 'guanyl-nucleotide exchange factor activity', 'namespace': 'go'}, 'gtp': {'identifier': '0005525', 'name': 'GTP binding', 'namespace': 'go'}, 'kin': {'identifier': '0016301', 'name': 'kinase activity', 'namespace': 'go'}, 'molecularActivity': {'identifier': '0003674', 'name': 'molecular_function', 'namespace': 'go'}, 'pep': {'identifier': '0008233', 'name': 'peptidase activity', 'namespace': 'go'}, 'phos': {'identifier': '0016791', 'name': 'phosphatase activity', 'namespace': 'go'}, 'ribo': {'identifier': '0003956', 'name': 'NAD(P)+-protein-arginine ADP-ribosyltransferase activity', 'namespace': 'go'}, 'tport': {'identifier': '0005215', 'name': 'transporter activity', 'namespace': 'go'}, 'tscript': {'identifier': '0001071', 'name': 'nucleic acid binding transcription factor activity', 'namespace': 'go'}}¶ Maps the default BEL molecular activities to Gene Ontology Molecular Functions
-
pybel.language.
compartment_mapping
= {'cell surface': {'identifier': '0009986', 'name': 'cell surface', 'namespace': 'go'}, 'cytoplasm': {'identifier': '0005737', 'name': 'cytoplasm', 'namespace': 'go'}, 'extracellular space': {'identifier': '0005615', 'name': 'extracellular space', 'namespace': 'go'}, 'intracellular': {'identifier': '0005622', 'name': 'intracellular', 'namespace': 'go'}, 'nucleus': {'identifier': '0005634', 'name': 'nucleus', 'namespace': 'go'}}¶ Maps the default BEL cellular components to Gene Ontology Cellular Components
-
pybel.language.
abundance_labels
= {'a': 'Abundance', 'abundance': 'Abundance', 'biologicalProcess': 'BiologicalProcess', 'bp': 'BiologicalProcess', 'complex': 'Complex', 'complexAbundance': 'Complex', 'composite': 'Composite', 'compositeAbundance': 'Composite', 'g': 'Gene', 'geneAbundance': 'Gene', 'm': 'miRNA', 'microRNAAbundance': 'miRNA', 'p': 'Protein', 'path': 'Pathology', 'pathology': 'Pathology', 'proteinAbundance': 'Protein', 'r': 'RNA', 'rnaAbundance': 'RNA'}¶ Provides a mapping from BEL terms to PyBEL internal constants
-
pybel.language.
abundance_sbo_mapping
= {'BiologicalProcess': {'identifier': '0000375', 'name': 'process', 'namespace': 'sbo'}, 'Complex': {'identifier': '0000297', 'name': 'protein complex', 'namespace': 'sbo'}, 'Gene': {'identifier': '0000243', 'name': 'gene', 'namespace': 'sbo'}, 'Pathology': {'identifier': '0000358', 'name': 'phenotype', 'namespace': 'sbo'}, 'RNA': {'identifier': '0000278', 'name': 'messenger RNA', 'namespace': 'sbo'}, 'miRNA': {'identifier': '0000316', 'name': 'microRNA', 'namespace': 'sbo'}}¶ Maps the BEL abundance types to the Systems Biology Ontology
-
pybel.language.
pmod_namespace
= {'ADP-ribosylation': 'ADPRib', 'ADPRib': 'ADPRib', 'Ac': 'Ac', 'Farn': 'Farn', 'Gerger': 'Gerger', 'Glyco': 'Glyco', 'Hy': 'Hy', 'ISG': 'ISG', 'ISG15-protein conjugation': 'ISG', 'ISGylation': 'ISG', 'Lysine 48-linked polyubiquitination': 'UbK48', 'Lysine 63-linked polyubiquitination': 'UbK63', 'Me': 'Me', 'Me1': 'Me1', 'Me2': 'Me2', 'Me3': 'Me3', 'Myr': 'Myr', 'N-linked glycosylation': 'NGlyco', 'NGlyco': 'NGlyco', 'NO': 'NO', 'Nedd': 'Nedd', 'Nitrosylation': 'NO', 'O-linked glycosylation': 'OGlyco', 'OGlyco': 'OGlyco', 'Ox': 'Ox', 'Palm': 'Palm', 'Ph': 'Ph', 'SUMOylation': 'Sumo', 'Sulf': 'Sulf', 'Sumo': 'Sumo', 'Ub': 'Ub', 'UbK48': 'UbK48', 'UbK63': 'UbK63', 'UbMono': 'UbMono', 'UbPoly': 'UbPoly', 'acetylation': 'Ac', 'adenosine diphosphoribosyl': 'ADPRib', 'di-methylation': 'Me2', 'dimethylation': 'Me2', 'farnesylation': 'Farn', 'geranylgeranylation': 'Gerger', 'glycosylation': 'Glyco', 'hydroxylation': 'Hy', 'methylation': 'Me', 'mono-methylation': 'Me1', 'monomethylation': 'Me1', 'monoubiquitination': 'UbMono', 'myristoylation': 'Myr', 'neddylation': 'Nedd', 'oxidation': 'Ox', 'palmitoylation': 'Palm', 'phosphorylation': 'Ph', 'polyubiquitination': 'UbPoly', 'sulfation': 'Sulf', 'sulfonation': 'sulfonation', 'sulfur addition': 'Sulf', 'sulphation': 'Sulf', 'sulphonation': 'sulfonation', 'sulphur addition': 'Sulf', 'tri-methylation': 'Me3', 'trimethylation': 'Me3', 'ubiquitination': 'Ub', 'ubiquitinylation': 'Ub', 'ubiquitylation': 'Ub'}¶ A dictionary of default protein modifications to their preferred value
-
pybel.language.
pmod_mappings
= {'ADPRib': {'synonyms': ['ADPRib', 'ADP-ribosylation', 'ADPRib', 'ADP-rybosylation', 'adenosine diphosphoribosyl'], 'xrefs': [{'namespace': 'go', 'name': 'protein ADP-ribosylation', 'identifier': '0006471'}, {'namespace': 'mod', 'name': 'adenosine diphosphoribosyl (ADP-ribosyl) modified residue', 'identifier': '00752'}, {'namespace': 'mop', 'name': 'adenosinediphosphoribosylation', 'identifier': '0000220'}]}, 'Ac': {'synonyms': ['Ac', 'acetylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein acetylation', 'identifier': '0006473'}, {'namespace': 'mod', 'name': 'acetylated residue', 'identifier': '00394'}, {'namespace': 'mop', 'name': 'acetylation', 'identifier': '0000030'}, {'namespace': 'sbo', 'name': 'acetylation', 'identifier': '0000215'}]}, 'Farn': {'synonyms': ['Farn', 'farnesylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein farnesylation', 'identifier': '0018343'}, {'namespace': 'mod', 'name': 'farnesylated residue', 'identifier': '00437'}, {'namespace': 'mop', 'name': 'farnesylation', 'identifier': '0000429'}]}, 'Gerger': {'synonyms': ['Gerger', 'geranylgeranylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein geranylgeranylation', 'identifier': '0018344'}, {'namespace': 'mod', 'name': 'geranylgeranylated residue ', 'identifier': '00441'}, {'namespace': 'mop', 'name': 'geranylgeranylation', 'identifier': '0000431'}]}, 'Glyco': {'synonyms': ['Glyco', 'glycosylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein glycosylation', 'identifier': '0006486'}, {'namespace': 'mod', 'name': 'glycosylated residue', 'identifier': '00693'}, {'namespace': 'mop', 'name': 'glycosylation', 'identifier': '0000162'}]}, 'Hy': {'synonyms': ['Hyhydroxylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein hydroxylation', 'identifier': '0018126'}, {'namespace': 'mod', 'name': 'hydroxylated residue', 'identifier': '00677'}, {'namespace': 'mop', 'name': 'hydroxylation', 'identifier': '0000673'}]}, 'ISG': {'activities': [{'namespace': 'go', 'name': 'ISG15 transferase activity', 'identifier': '0042296'}], 'synonyms': ['ISG', 'ISGylation', 'ISG15-protein conjugation'], 'xrefs': [{'namespace': 'go', 'name': 'ISG15-protein conjugation', 'identifier': '0032020'}]}, 'Me': {'synonyms': ['Me', 'methylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein methylation', 'identifier': '0006479'}, {'namespace': 'mod', 'name': 'methylated residue', 'identifier': '00427'}]}, 'Me1': {'is_a': ['Me'], 'synonyms': ['Me1', 'monomethylation', 'mono-methylation'], 'xrefs': [{'namespace': 'mod', 'name': 'monomethylated residue', 'identifier': '00599'}]}, 'Me2': {'is_a': ['Me'], 'synonyms': ['Me2', 'dimethylation', 'di-methylation'], 'xrefs': [{'namespace': 'mod', 'name': 'dimethylated residue', 'identifier': '00429'}]}, 'Me3': {'is_a': ['Me'], 'synonyms': ['Me3', 'trimethylation', 'tri-methylation'], 'xrefs': [{'namespace': 'mod', 'name': 'trimethylated residue', 'identifier': '00430'}]}, 'Myr': {'synonyms': ['Myr', 'myristoylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein myristoylation', 'identifier': '0018377'}, {'namespace': 'mod', 'name': 'myristoylated residue', 'identifier': '00438'}]}, 'NGlyco': {'is_a': ['Glyco'], 'synonyms': ['NGlyco', 'N-linked glycosylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein N-linked glycosylation', 'identifier': '0006487'}, {'namespace': 'mod', 'name': 'N-glycosylated residue', 'identifier': '00006'}, {'namespace': 'mop', 'name': 'N-glycosylation', 'identifier': '0002162'}]}, 'NO': {'synonyms': ['NO', 'Nitrosylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein nitrosylation', 'identifier': '0017014'}]}, 'Nedd': {'synonyms': ['Nedd', 'neddylation', 'RUB1-protein conjugation'], 'xrefs': [{'namespace': 'go', 'name': 'protein neddylation', 'identifier': '0045116'}, {'namespace': 'mod', 'name': 'neddylated lysine', 'identifier': '01150'}]}, 'OGlyco': {'is_a': ['Glyco'], 'synonyms': ['OGlyco', 'O-linked glycosylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein O-linked glycosylation', 'identifier': '0006493'}, {'namespace': 'mod', 'name': 'O-glycosylated residue', 'identifier': '00396'}, {'namespace': 'mop', 'name': 'O-glycosylation', 'identifier': '0003162'}]}, 'Ox': {'synonyms': ['Ox', 'oxidation'], 'xrefs': [{'namespace': 'go', 'name': 'protein oxidation', 'identifier': '0018158'}]}, 'Palm': {'synonyms': ['Palm', 'palmitoylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein palmitoylation', 'identifier': '0018345'}, {'namespace': 'mod', 'name': 'palmitoylated residue', 'identifier': '00440'}]}, 'Ph': {'synonyms': ['Ph', 'phosphorylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein phosphorylation', 'identifier': '0006468'}, {'namespace': 'mod', 'identifier': '00696'}]}, 'Sulf': {'synonyms': ['Sulf', 'sulfation', 'sulphation', 'sulfur addition', 'sulphur addition', 'sulfonation', 'sulphonation'], 'target': [{'namespace': 'chebi', 'name': 'sulfo group', 'identifier': '29922'}], 'xrefs': [{'namespace': 'go', 'name': 'protein sulfation', 'identifier': '0006477'}, {'namespace': 'mod', 'name': 'sulfated residue', 'identifier': '00695'}, {'namespace': 'mop', 'name': 'sulfonation', 'identifier': '0000559'}]}, 'Sumo': {'activities': [{'namespace': 'go', 'name': 'SUMO transferase activity', 'identifier': '0019789'}], 'synonyms': ['Sumo', 'SUMOylation', 'Sumoylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein sumoylation', 'identifier': '0016925'}, {'namespace': 'mod', 'name': 'sumoylated lysine', 'identifier': '01149'}]}, 'Ub': {'synonyms': ['Ub', 'ubiquitination', 'ubiquitinylation', 'ubiquitylation'], 'xrefs': [{'namespace': 'go', 'name': 'protein ubiquitination', 'identifier': '0016567'}, {'namespace': 'mod', 'name': 'ubiquitinylated lysine', 'identifier': '01148'}, {'namespace': 'sbo', 'name': 'ubiquitination', 'identifier': '0000224'}]}, 'UbK48': {'synonyms': ['UbK48', 'Lysine 48-linked polyubiquitination'], 'xrefs': [{'namespace': 'go', 'name': 'protein K48-linked ubiquitination', 'identifier': '0070936'}]}, 'UbK63': {'synonyms': ['UbK63', 'Lysine 63-linked polyubiquitination'], 'xrefs': [{'namespace': 'go', 'name': 'protein K63-linked ubiquitination', 'identifier': '0070534'}]}, 'UbMono': {'synonyms': ['UbMono', 'monoubiquitination'], 'xrefs': [{'namespace': 'go', 'name': 'protein monoubiquitination', 'identifier': '0006513'}]}, 'UbPoly': {'synonyms': ['UbPoly', 'polyubiquitination'], 'xrefs': [{'namespace': 'go', 'name': 'protein polyubiquitination', 'identifier': '0000209'}]}}¶ Use Gene Ontology children of go_0006464: “cellular protein modification process”
-
pybel.language.
pmod_legacy_labels
= {'A': 'Ac', 'F': 'Farn', 'G': 'Glyco', 'H': 'Hy', 'M': 'Me', 'O': 'Ox', 'P': 'Ph', 'R': 'ADPRib', 'S': 'Sumo', 'U': 'Ub'}¶ A dictionary of legacy (BEL 1.0) default namespace protein modifications to their BEL 2.0 preferred value
-
pybel.language.
gmod_namespace
= {'ADPRib': 'ADPRib', 'M': 'Me', 'Me': 'Me', 'methylation': 'Me'}¶ A dictionary of default gene modifications. This is a PyBEL variant to the BEL specification.
-
pybel.language.
gmod_mappings
= {'ADPRib': {'synonyms': ['ADPRib'], 'xrefs': [{'namespace': 'go', 'name': 'DNA ADP-ribosylation', 'identifier': '0030592'}]}, 'Me': {'synonyms': ['Me', 'M', 'methylation'], 'xrefs': [{'namespace': 'go', 'name': 'DNA methylation', 'identifier': '0006306'}]}}¶ Use Gene Ontology children of go:0006304 ! “DNA modification”
Parsers¶
This page is for users who want to squeeze the most bizarre possibilities out of PyBEL. Most users will not need this reference.
PyBEL makes extensive use of the PyParsing module. The code is organized to different modules to reflect the different faces ot the BEL language. These parsers support BEL 2.0 and have some backwards compatibility for rewriting BEL v1.0 statements as BEL v2.0. The biologist and bioinformatician using this software will likely never need to read this page, but a developer seeking to extend the language will be interested to see the inner workings of these parsers.
See: https://github.com/OpenBEL/language/blob/master/version_2.0/MIGRATE_BEL1_BEL2.md
BEL Parser¶
-
class
pybel.parser.parse_bel.
BELParser
(graph, namespace_to_term_to_encoding=None, namespace_to_pattern=None, annotation_to_term=None, annotation_to_pattern=None, annotation_to_local=None, allow_naked_names=False, disallow_nested=False, disallow_unqualified_translocations=False, citation_clearing=True, skip_validation=False, autostreamline=True, required_annotations=None)[source]¶ Build a parser backed by a given dictionary of namespaces.
Build a BEL parser.
- Parameters
graph (pybel.BELGraph) – The BEL Graph to use to store the network
namespace_to_term_to_encoding (
Optional
[Mapping
[str
,Mapping
[Tuple
[Optional
[str
],str
],str
]]]) – A dictionary of {namespace: {name: encoding}}. Delegated topybel.parser.parse_identifier.IdentifierParser
namespace_to_pattern (
Optional
[Mapping
[str
,Pattern
]]) – A dictionary of {namespace: regular expression strings}. Delegated topybel.parser.parse_identifier.IdentifierParser
annotation_to_term (
Optional
[Mapping
[str
,Set
[str
]]]) – A dictionary of {annotation: set of values}. Delegated topybel.parser.ControlParser
annotation_to_pattern (
Optional
[Mapping
[str
,Pattern
]]) – A dictionary of {annotation: regular expression strings}. Delegated topybel.parser.ControlParser
annotation_to_local (
Optional
[Mapping
[str
,Set
[str
]]]) – A dictionary of {annotation: set of values}. Delegated topybel.parser.ControlParser
allow_naked_names (
bool
) – If true, turn off naked namespace failures. Delegated topybel.parser.parse_identifier.IdentifierParser
disallow_nested (
bool
) – If true, turn on nested statement failures. Delegated topybel.parser.parse_identifier.IdentifierParser
disallow_unqualified_translocations (
bool
) – If true, allow translocations without TO and FROM clauses.citation_clearing (
bool
) – ShouldSET Citation
statements clear evidence and all annotations? Delegated topybel.parser.ControlParser
autostreamline (
bool
) – Should the parser be streamlined on instantiation?required_annotations (
Optional
[List
[str
]]) – Optional list of required annotations
-
gmod
= None¶ PyBEL BEL Specification variant
-
clear
()[source]¶ Clear the graph and all control parser data (current citation, annotations, and statement group).
-
handle_nested_relation
(line, position, tokens)[source]¶ Handle nested statements.
If
self.disallow_nested
is True, raises aNestedRelationWarning
.- Raises
NestedRelationWarning
-
check_function_semantics
(line, position, tokens)[source]¶ Raise an exception if the function used on the tokens is wrong.
- Raises
InvalidFunctionSemantic
- Return type
ParseResults
-
handle_term
(_, __, tokens)[source]¶ Handle BEL terms (the subject and object of BEL relations).
- Return type
ParseResults
-
handle_has_members
(_, __, tokens)[source]¶ Handle list relations like
p(X) hasMembers list(p(Y), p(Z), ...)
.- Return type
ParseResults
-
handle_has_components
(_, __, tokens)[source]¶ Handle list relations like
p(X) hasComponents list(p(Y), p(Z), ...)
.- Return type
ParseResults
-
handle_unqualified_relation
(_, __, tokens)[source]¶ Handle unqualified relations.
- Return type
ParseResults
-
handle_inverse_unqualified_relation
(_, __, tokens)[source]¶ Handle unqualified relations that should go reverse.
- Return type
ParseResults
-
pybel.io.line_utils.
parse_lines
(graph, lines, manager=None, disallow_nested=False, citation_clearing=True, use_tqdm=False, tqdm_kwargs=None, no_identifier_validation=False, disallow_unqualified_translocations=False, allow_redefinition=False, allow_definition_failures=False, allow_naked_names=False, required_annotations=None, upgrade_urls=False)[source]¶ Parse an iterable of lines into this graph.
Delegates to
parse_document()
,parse_definitions()
, andparse_statements()
.- Parameters
graph (
BELGraph
) – A BEL graphlines (
Iterable
[str
]) – An iterable over lines of BEL scriptmanager (
Optional
[Manager
]) – A PyBEL database managerdisallow_nested (
bool
) – If true, turns on nested statement failurescitation_clearing (
bool
) – ShouldSET Citation
statements clear evidence and all annotations? Delegated topybel.parser.ControlParser
use_tqdm (
bool
) – Usetqdm
to show a progress bar?tqdm_kwargs (
Optional
[Mapping
[str
,Any
]]) – Keywords to pass totqdm
disallow_unqualified_translocations (
bool
) – If true, allow translocations without TO and FROM clauses.required_annotations (
Optional
[List
[str
]]) – Annotations that are required for all statementsupgrade_urls (
bool
) – Automatically upgrade old namespace URLs. Defaults to false.
Warning
These options allow concessions for parsing BEL that is either WRONG or UNSCIENTIFIC. Use them at risk to reproducibility and validity of your results.
- Parameters
no_identifier_validation (
bool
) – If true, turns off namespace validationallow_naked_names (
bool
) – If true, turns off naked namespace failuresallow_redefinition (
bool
) – If true, doesn’t fail on second definition of same name or annotationallow_definition_failures (
bool
) – If true, allows parsing to continue if a terminology file download/parse fails
- Return type
None
Metadata Parser¶
-
class
pybel.parser.parse_metadata.
MetadataParser
(manager, namespace_to_term_to_encoding=None, namespace_to_pattern=None, annotation_to_term=None, annotation_to_pattern=None, annotation_to_local=None, default_namespace=None, allow_redefinition=False, skip_validation=False, upgrade_urls=False)[source]¶ A parser for the document and definitions section of a BEL document.
See also
BEL 1.0 Specification for the DEFINE keyword
Build a metadata parser.
- Parameters
manager – A cache manager
namespace_to_term_to_encoding (
Optional
[Mapping
[str
,Mapping
[Tuple
[Optional
[str
],str
],str
]]]) – An enumerated namespace mapping from {namespace keyword: {(identifier, name): encoding}}namespace_to_pattern (
Optional
[Mapping
[str
,Pattern
]]) – A regular expression namespace mapping from {namespace keyword: regex string}annotation_to_term (
Optional
[Mapping
[str
,Set
[str
]]]) – Enumerated annotation mapping from {annotation keyword: set of valid values}annotation_to_pattern (
Optional
[Mapping
[str
,Pattern
]]) – Regular expression annotation mapping from {annotation keyword: regex string}default_namespace (
Optional
[Set
[str
]]) – A set of strings that can be used without a namespaceskip_validation (
bool
) – If true, don’t download and cache namespaces/annotations
-
manager
= None¶ This metadata parser’s internal definition cache manager
-
namespace_to_term_to_encoding
= None¶ A dictionary of cached {namespace keyword: {(identifier, name): encoding}}
-
uncachable_namespaces
= None¶ A set of namespaces’s URLs that can’t be cached
-
namespace_to_pattern
= None¶ A dictionary of {namespace keyword: regular expression string}
-
default_namespace
= None¶ A set of names that can be used without a namespace
-
annotation_to_term
= None¶ A dictionary of cached {annotation keyword: set of values}
-
annotation_to_pattern
= None¶ A dictionary of {annotation keyword: regular expression string}
-
annotation_to_local
= None¶ A dictionary of cached {annotation keyword: set of values}
-
document_metadata
= None¶ A dictionary containing the document metadata
-
namespace_url_dict
= None¶ A dictionary from {namespace keyword: BEL namespace URL}
-
annotation_url_dict
= None¶ A dictionary from {annotation keyword: BEL annotation URL}
-
handle_document
(line, position, tokens)[source]¶ Handle statements like
SET DOCUMENT X = "Y"
.- Raises
InvalidMetadataException
- Raises
VersionFormatWarning
- Return type
ParseResults
-
raise_for_redefined_namespace
(line, position, namespace)[source]¶ Raise an exception if a namespace is already defined.
- Raises
RedefinedNamespaceError
- Return type
None
-
handle_namespace_url
(line, position, tokens)[source]¶ Handle statements like
DEFINE NAMESPACE X AS URL "Y"
.- Raises
RedefinedNamespaceError
- Raises
pybel.resources.exc.ResourceError
- Return type
ParseResults
-
handle_namespace_pattern
(line, position, tokens)[source]¶ Handle statements like
DEFINE NAMESPACE X AS PATTERN "Y"
.- Raises
RedefinedNamespaceError
- Return type
ParseResults
-
raise_for_redefined_annotation
(line, position, annotation)[source]¶ Raise an exception if the given annotation is already defined.
- Raises
RedefinedAnnotationError
- Return type
None
-
handle_annotations_url
(line, position, tokens)[source]¶ Handle statements like
DEFINE ANNOTATION X AS URL "Y"
.- Raises
RedefinedAnnotationError
- Return type
ParseResults
-
handle_annotation_list
(line, position, tokens)[source]¶ Handle statements like
DEFINE ANNOTATION X AS LIST {"Y","Z", ...}
.- Raises
RedefinedAnnotationError
- Return type
ParseResults
-
handle_annotation_pattern
(line, position, tokens)[source]¶ Handle statements like
DEFINE ANNOTATION X AS PATTERN "Y"
.- Raises
RedefinedAnnotationError
- Return type
ParseResults
-
has_enumerated_annotation
(annotation)[source]¶ Check if this annotation is defined by an enumeration.
- Return type
-
has_regex_annotation
(annotation)[source]¶ Check if this annotation is defined by a regular expression.
- Return type
-
has_local_annotation
(annotation)[source]¶ Check if this annotation is defined by an locally.
- Return type
-
has_enumerated_namespace
(namespace)[source]¶ Check if this namespace is defined by an enumeration.
- Return type
-
has_regex_namespace
(namespace)[source]¶ Check if this namespace is defined by a regular expression.
- Return type
Control Parser¶
-
class
pybel.parser.parse_control.
ControlParser
(annotation_to_term=None, annotation_to_pattern=None, annotation_to_local=None, citation_clearing=True, required_annotations=None)[source]¶ A parser for BEL control statements.
See also
BEL 1.0 specification on control records
Initialize the control statement parser.
- Parameters
annotation_to_term (
Optional
[Mapping
[str
,Set
[str
]]]) – A dictionary of {annotation: set of valid values} defined with URL for parsingannotation_to_pattern (
Optional
[Mapping
[str
,Pattern
]]) – A dictionary of {annotation: regular expression string}annotation_to_local (
Optional
[Mapping
[str
,Set
[str
]]]) – A dictionary of {annotation: set of valid values} for parsing defined with LISTcitation_clearing (
bool
) – ShouldSET Citation
statements clear evidence and all annotations?required_annotations (
Optional
[List
[str
]]) – Annotations that are required
-
has_enumerated_annotation
(annotation)[source]¶ Check if the annotation is defined as an enumeration.
- Return type
-
has_regex_annotation
(annotation)[source]¶ Check if the annotation is defined as a regular expression.
- Return type
-
raise_for_undefined_annotation
(line, position, annotation)[source]¶ Raise an exception if the annotation is not defined.
- Raises
UndefinedAnnotationWarning
- Return type
None
-
raise_for_invalid_annotation_value
(line, position, key, value)[source]¶ Raise an exception if the annotation is not defined.
- Raises
IllegalAnnotationValueWarning or MissingAnnotationRegexWarning
- Return type
None
-
raise_for_missing_citation
(line, position)[source]¶ Raise an exception if there is no citation present in the parser.
- Raises
MissingCitationException
- Return type
None
-
handle_annotation_key
(line, position, tokens)[source]¶ Handle an annotation key before parsing to validate that it’s either enumerated or as a regex.
- Raise
MissingCitationException or UndefinedAnnotationWarning
- Return type
ParseResults
-
handle_set_statement_group
(_, __, tokens)[source]¶ Handle a
SET STATEMENT_GROUP = "X"
statement.- Return type
ParseResults
-
handle_set_citation
(line, position, tokens)[source]¶ Handle a
SET Citation = {"X", "Y", "Z", ...}
statement.- Return type
ParseResults
-
handle_set_evidence
(_, __, tokens)[source]¶ Handle a
SET Evidence = ""
statement.- Return type
ParseResults
-
handle_set_command
(line, position, tokens)[source]¶ Handle a
SET X = "Y"
statement.- Return type
ParseResults
-
handle_set_command_list
(line, position, tokens)[source]¶ Handle a
SET X = {"Y", "Z", ...}
statement.- Return type
ParseResults
-
handle_unset_statement_group
(line, position, tokens)[source]¶ Unset the statement group, or raises an exception if it is not set.
- Raises
MissingAnnotationKeyWarning
- Return type
ParseResults
-
handle_unset_citation
(line, position, tokens)[source]¶ Unset the citation, or raise an exception if it is not set.
- Raises
MissingAnnotationKeyWarning
- Return type
ParseResults
-
handle_unset_evidence
(line, position, tokens)[source]¶ Unset the evidence, or throws an exception if it is not already set.
The value for
tokens[EVIDENCE]
corresponds to which alternate of SupportingText or Evidence was used in the BEL script.- Raises
MissingAnnotationKeyWarning
- Return type
ParseResults
-
validate_unset_command
(line, position, annotation)[source]¶ Raise an exception when trying to
UNSET X
ifX
is not already set.- Raises
MissingAnnotationKeyWarning
- Return type
None
-
handle_unset_command
(line, position, tokens)[source]¶ Handle an
UNSET X
statement or raises an exception if it is not already set.- Raises
MissingAnnotationKeyWarning
- Return type
ParseResults
-
handle_unset_list
(line, position, tokens)[source]¶ Handle
UNSET {A, B, ...}
or raises an exception of any of them are not present.Consider that all unsets are in peril if just one of them is wrong!
- Raises
MissingAnnotationKeyWarning
- Return type
ParseResults
Concept Parser¶
-
class
pybel.parser.parse_concept.
ConceptParser
(namespace_to_term_to_encoding=None, namespace_to_pattern=None, default_namespace=None, allow_naked_names=False, skip_validation=False)[source]¶ A parser for concepts in the form of
namespace:name
ornamespace:identifier!name
.Can be made more lenient when given a default namespace or enabling the use of naked names.
Initialize the concept parser.
- Parameters
namespace_to_term_to_encoding (
Optional
[Mapping
[str
,Mapping
[Tuple
[Optional
[str
],str
],str
]]]) – A dictionary of {namespace: {(identifier, name): encoding}}namespace_to_pattern (
Optional
[Mapping
[str
,Pattern
]]) – A dictionary of {namespace: regular expression string} to compiledefault_namespace (
Optional
[Set
[str
]]) – A set of strings that can be used without a namespaceallow_naked_names (
bool
) – If true, turn off naked namespace failures
-
has_enumerated_namespace
(namespace)[source]¶ Check that the namespace has been defined by an enumeration.
- Return type
-
has_regex_namespace
(namespace)[source]¶ Check that the namespace has been defined by a regular expression.
- Return type
-
raise_for_missing_namespace
(line, position, namespace, name)[source]¶ Raise an exception if the namespace is not defined.
- Return type
None
-
raise_for_missing_name
(line, position, namespace, name)[source]¶ Raise an exception if the namespace is not defined or if it does not validate the given name.
- Return type
None
-
handle_identifier_fqualified
(line, position, tokens)[source]¶ Handle parsing a qualified OBO-style identifier.
- Return type
ParseResults
-
handle_identifier_qualified
(line, position, tokens)[source]¶ Handle parsing a qualified identifier.
- Return type
ParseResults
-
handle_namespace_default
(line, position, tokens)[source]¶ Handle parsing an identifier for the default namespace.
- Return type
ParseResults
Sub-Parsers¶
Parsers for modifications to abundances.
Internal Domain Specific Language¶
PyBEL implements an internal domain-specific language (DSL).
This enables you to write BEL using Python scripts. Even better, you can programatically generate BEL using Python. See the Bio2BEL paper and repository for many examples.
Internally, the BEL parser converts BEL script into the BEL DSL
then adds it to a BEL graph object. When you iterate through
the pybel.BELGraph
, the nodes are instances of subclasses
of pybel.dsl.BaseEntity
.
Primitives¶
-
class
pybel.dsl.
Entity
(*, namespace, name=None, identifier=None)[source]¶ Represents a named entity with a namespace and name/identifier.
Create a dictionary representing a reference to an entity.
-
class
pybel.dsl.
BaseEntity
[source]¶ This is the superclass for all BEL terms.
A BEL term has three properties:
It has a type. Subclasses of this function should set the class variable
function
.It can be converted to BEL. Note, this is an abstract class, so all sub-classes must implement this functionality in
as_bel()
.It can be hashed, based on the BEL conversion
-
class
pybel.dsl.
BaseAbundance
(namespace, name=None, identifier=None, xrefs=None)[source]¶ The superclass for all named BEL terms.
A named BEL term has:
A type (taken care of by being a subclass of
BaseEntity
)A named
Entity
. Though this doesn’t directly inherit fromEntity
, it creates one internally using the namespace, identifier, and name. Ideally, both the identifier and name are given. If one is missing, it can be looked up withpybel.grounding.ground()
An optional list of xrefs, corresponding to the whole entity, not just the namespace/name. For example, the BEL term
p(HGNC:APP, frag(672_713)
could xref CHEBI:64647.
Build an abundance from a function, namespace, and a name and/or identifier.
-
class
pybel.dsl.
ListAbundance
(members)[source]¶ The superclass for all BEL terms defined by lists, as opposed to by names like in
BaseAbundance
.Build a list abundance node.
Named Entities¶
-
class
pybel.dsl.
Abundance
(namespace, name=None, identifier=None, xrefs=None)[source]¶ Builds an abundance node.
>>> from pybel.dsl import Abundance >>> Abundance(namespace='CHEBI', name='water')
Build an abundance from a function, namespace, and a name and/or identifier.
-
class
pybel.dsl.
BiologicalProcess
(namespace, name=None, identifier=None, xrefs=None)[source]¶ Builds a biological process node.
>>> from pybel.dsl import BiologicalProcess >>> BiologicalProcess(namespace='GO', name='apoptosis')
Build an abundance from a function, namespace, and a name and/or identifier.
-
class
pybel.dsl.
Pathology
(namespace, name=None, identifier=None, xrefs=None)[source]¶ Build a pathology node.
>>> from pybel.dsl import Pathology >>> Pathology(namespace='DO', name='Alzheimer Disease')
Build an abundance from a function, namespace, and a name and/or identifier.
Central Dogma¶
-
class
pybel.dsl.
CentralDogma
(namespace, name=None, identifier=None, xrefs=None, variants=None)[source]¶ The base class for “central dogma” abundances (i.e., genes, miRNAs, RNAs, and proteins).
Build a node for a gene, RNA, miRNA, or protein.
- Parameters
namespace (
str
) – The name of the database used to identify this entityname (
Optional
[str
]) – The database’s preferred name or label for this entityidentifier (
Optional
[str
]) – The database’s identifier for this entityxrefs (
Optional
[List
[Entity
]]) – Alternative database cross referencesvariants (
Union
[None
,Variant
,Iterable
[Variant
]]) – An optional variant or list of variants
-
class
pybel.dsl.
Gene
(namespace, name=None, identifier=None, xrefs=None, variants=None)[source]¶ Builds a gene node.
Build a node for a gene, RNA, miRNA, or protein.
- Parameters
namespace (
str
) – The name of the database used to identify this entityname (
Optional
[str
]) – The database’s preferred name or label for this entityidentifier (
Optional
[str
]) – The database’s identifier for this entityxrefs (
Optional
[List
[Entity
]]) – Alternative database cross referencesvariants (
Union
[None
,Variant
,Iterable
[Variant
]]) – An optional variant or list of variants
-
class
pybel.dsl.
Transcribable
(namespace, name=None, identifier=None, xrefs=None, variants=None)[source]¶ A base class for RNA and micro-RNA to share getting of their corresponding genes.
Build a node for a gene, RNA, miRNA, or protein.
- Parameters
namespace (
str
) – The name of the database used to identify this entityname (
Optional
[str
]) – The database’s preferred name or label for this entityidentifier (
Optional
[str
]) – The database’s identifier for this entityxrefs (
Optional
[List
[Entity
]]) – Alternative database cross referencesvariants (
Union
[None
,Variant
,Iterable
[Variant
]]) – An optional variant or list of variants
-
class
pybel.dsl.
Rna
(namespace, name=None, identifier=None, xrefs=None, variants=None)[source]¶ Builds an RNA node.
Example: AKT1 protein coding gene’s RNA:
>>> from pybel.dsl import Rna >>> Rna(namespace='HGNC', name='AKT1', identifier='391')
Non-coding RNAs can also be encoded such as U85:
>>> from pybel.dsl import Rna >>> Rna(namespace='SNORNABASE', identifier='SR0000073')
Build a node for a gene, RNA, miRNA, or protein.
- Parameters
namespace (
str
) – The name of the database used to identify this entityname (
Optional
[str
]) – The database’s preferred name or label for this entityidentifier (
Optional
[str
]) – The database’s identifier for this entityxrefs (
Optional
[List
[Entity
]]) – Alternative database cross referencesvariants (
Union
[None
,Variant
,Iterable
[Variant
]]) – An optional variant or list of variants
-
class
pybel.dsl.
MicroRna
(namespace, name=None, identifier=None, xrefs=None, variants=None)[source]¶ Represents an micro-RNA.
Human miRNA’s are listed on HUGO’s MicroRNAs (MIR) gene family.
MIR1-1 from HGNC:
>>> from pybel.dsl import MicroRna >>> MicroRna(namespace='HGNC', name='MIR1-1', identifier='31499')
MIR1-1 from miRBase:
>>> from pybel.dsl import MicroRna >>> MicroRna(namespace='MIRBASE', identifier='MI0000651')
MIR1-1 from Entrez Gene
>>> from pybel.dsl import MicroRna >>> MicroRna(namespace='ENTREZ', identifier='406904')
Build a node for a gene, RNA, miRNA, or protein.
- Parameters
namespace (
str
) – The name of the database used to identify this entityname (
Optional
[str
]) – The database’s preferred name or label for this entityidentifier (
Optional
[str
]) – The database’s identifier for this entityxrefs (
Optional
[List
[Entity
]]) – Alternative database cross referencesvariants (
Union
[None
,Variant
,Iterable
[Variant
]]) – An optional variant or list of variants
-
class
pybel.dsl.
Protein
(namespace, name=None, identifier=None, xrefs=None, variants=None)[source]¶ Builds a protein node.
Example: AKT
>>> from pybel.dsl import Protein >>> Protein(namespace='HGNC', name='AKT1')
Example: AKT with optionally included HGNC database identifier
>>> from pybel.dsl import Protein >>> Protein(namespace='HGNC', name='AKT1', identifier='391')
Example: AKT with phosphorylation
>>> from pybel.dsl import Protein, ProteinModification >>> Protein(namespace='HGNC', name='AKT', variants=[ProteinModification('Ph', code='Thr', position=308)])
Build a node for a gene, RNA, miRNA, or protein.
- Parameters
namespace (
str
) – The name of the database used to identify this entityname (
Optional
[str
]) – The database’s preferred name or label for this entityidentifier (
Optional
[str
]) – The database’s identifier for this entityxrefs (
Optional
[List
[Entity
]]) – Alternative database cross referencesvariants (
Union
[None
,Variant
,Iterable
[Variant
]]) – An optional variant or list of variants
Variants¶
-
class
pybel.dsl.
Variant
(kind)[source]¶ The superclass for variant dictionaries.
Build the variant data dictionary.
- Parameters
kind (
str
) – The kind of variant
-
class
pybel.dsl.
ProteinModification
(name, code=None, position=None, namespace=None, identifier=None, xrefs=None)[source]¶ Build a protein modification variant dictionary.
Build a protein modification variant data dictionary.
- Parameters
name (
str
) – The name of the modificationcode (
Optional
[str
]) – The three letter amino acid code for the affected residue. Capital first letter.position (
Optional
[int
]) – The position of the affected residuenamespace (
Optional
[str
]) – The namespace to which the name of this modification belongsidentifier (
Optional
[str
]) – The identifier of the name of the modification
Either the name or the identifier must be used. If the namespace is omitted, it is assumed that a name is specified from the BEL default namespace.
Example from BEL default namespace:
>>> from pybel.dsl import ProteinModification >>> ProteinModification('Ph', code='Thr', position=308)
Example from custom namespace:
>>> from pybel.dsl import ProteinModification >>> ProteinModification(name='protein phosphorylation', namespace='GO', code='Thr', position=308)
Example from custom namespace additionally qualified with identifier:
>>> from pybel.dsl import ProteinModification >>> ProteinModification(name='protein phosphorylation', namespace='GO', >>> identifier='0006468', code='Thr', position=308)
-
class
pybel.dsl.
GeneModification
(name, namespace=None, identifier=None, xrefs=None)[source]¶ Build a gene modification variant dictionary.
Either the name or the identifier must be used. If the namespace is omitted, it is assumed that a name is specified from the BEL default namespace.
Example from BEL default namespace:
>>> from pybel.dsl import GeneModification >>> GeneModification(name='Me')
Example from custom namespace:
>>> from pybel.dsl import GeneModification >>> GeneModification(name='DNA methylation', namespace='GO', identifier='0006306',)
Build a variant that has a reference.
- Parameters
Either the name or the identifier must be used. If the namespace is omitted, it is assumed that a name is specified from the BEL default namespace.
-
class
pybel.dsl.
Hgvs
(variant)[source]¶ Builds a HGVS variant dictionary.
Build an HGVS variant data dictionary.
- Parameters
variant (
str
) – The HGVS variant string
>>> from pybel.dsl import Protein, Hgvs >>> Protein(namespace='HGNC', name='AKT1', variants=[Hgvs('p.Ala127Tyr')])
-
class
pybel.dsl.
HgvsReference
[source]¶ Represents the “reference” variant in HGVS.
Build an HGVS variant data dictionary.
- Parameters
variant – The HGVS variant string
>>> from pybel.dsl import Protein, Hgvs >>> Protein(namespace='HGNC', name='AKT1', variants=[Hgvs('p.Ala127Tyr')])
-
class
pybel.dsl.
HgvsUnspecified
[source]¶ Represents an unspecified variant in HGVS.
Build an HGVS variant data dictionary.
- Parameters
variant – The HGVS variant string
>>> from pybel.dsl import Protein, Hgvs >>> Protein(namespace='HGNC', name='AKT1', variants=[Hgvs('p.Ala127Tyr')])
-
class
pybel.dsl.
ProteinSubstitution
(from_aa, position, to_aa)[source]¶ A protein substitution variant.
Build an HGVS variant data dictionary for the given protein substitution.
- Parameters
>>> from pybel.dsl import Protein, ProteinSubstitution >>> Protein(namespace='HGNC', name='AKT1', variants=[ProteinSubstitution('Ala', 127, 'Tyr')])
-
class
pybel.dsl.
Fragment
(start=None, stop=None, description=None)[source]¶ Represent the information about a protein fragment.
Build a protein fragment data dictionary.
- Parameters
Example of specified fragment:
>>> from pybel.dsl import Protein, Fragment >>> Protein(name='APP', namespace='HGNC', variants=[Fragment(start=672, stop=713)])
Example of unspecified fragment:
>>> from pybel.dsl import Protein, Fragment >>> Protein(name='APP', namespace='HGNC', variants=[Fragment()])
Fusions¶
-
class
pybel.dsl.
FusionBase
(partner_5p, partner_3p, range_5p=None, range_3p=None)[source]¶ The superclass for building fusion node data dictionaries.
Build a fusion node.
-
class
pybel.dsl.
GeneFusion
(partner_5p, partner_3p, range_5p=None, range_3p=None)[source]¶ Builds a gene fusion node.
Example, using fusion ranges with the ‘c’ qualifier
>>> from pybel.dsl import GeneFusion, Gene >>> GeneFusion( >>> ... partner_5p=Gene(namespace='HGNC', name='TMPRSS2'), >>> ... range_5p=EnumeratedFusionRange('c', 1, 79), >>> ... partner_3p=Gene(namespace='HGNC', name='ERG'), >>> ... range_3p=EnumeratedFusionRange('c', 312, 5034) >>> )
Example with missing fusion ranges:
>>> from pybel.dsl import GeneFusion, Gene >>> GeneFusion( >>> ... partner_5p=Gene(namespace='HGNC', name='TMPRSS2'), >>> ... partner_3p=Gene(namespace='HGNC', name='ERG'), >>> )
Build a fusion node.
-
class
pybel.dsl.
RnaFusion
(partner_5p, partner_3p, range_5p=None, range_3p=None)[source]¶ Builds an RNA fusion node.
Example, with fusion ranges using the ‘r’ qualifier:
>>> from pybel.dsl import RnaFusion, Rna >>> RnaFusion( >>> ... partner_5p=Rna(namespace='HGNC', name='TMPRSS2'), >>> ... range_5p=EnumeratedFusionRange('r', 1, 79), >>> ... partner_3p=Rna(namespace='HGNC', name='ERG'), >>> ... range_3p=EnumeratedFusionRange('r', 312, 5034) >>> )
Example with missing fusion ranges:
>>> from pybel.dsl import RnaFusion, Rna >>> RnaFusion( >>> ... partner_5p=Rna(namespace='HGNC', name='TMPRSS2'), >>> ... partner_3p=Rna(namespace='HGNC', name='ERG'), >>> )
Build a fusion node.
-
class
pybel.dsl.
ProteinFusion
(partner_5p, partner_3p, range_5p=None, range_3p=None)[source]¶ Builds a protein fusion node.
Build a fusion node.
Fusion Ranges¶
List Abundances¶
-
class
pybel.dsl.
ComplexAbundance
(members, namespace=None, name=None, identifier=None, xrefs=None)[source]¶ Build a complex abundance node with the optional ability to specify a name.
Build a complex list node.
- Parameters
members (
Iterable
[BaseAbundance
]) – A list of PyBEL node data dictionariesnamespace (
Optional
[str
]) – The namespace from which the name originatesidentifier (
Optional
[str
]) – The identifier in the namespace in which the name originatesxrefs (
Optional
[List
[Entity
]]) – Alternate identifiers for the entity if it is named
-
class
pybel.dsl.
CompositeAbundance
(members)[source]¶ Build a composite abundance node.
This node is effectively the “AND” inside BEL, which can help represent when two things need to be true at the same time. For example, in COVID 19, if both the NF-KB and IL6-STAT complex are present, then acute respiratory distress syndrome happens.
>>> from pybel.dsl import CompositeAbundance, ComplexAbundance, Protein, NamedComplexAbundance >>> CompositeAbundance([ ... NamedComplexAbundance('fplx', 'nfkb'), ... ComplexAbundance([ ... Protein('hgnc', identifier='6018', name='IL6'), ... Protein('hgnc', identifier='11364', name='STAT3'), ... ]), ... ])
Build a list abundance node.
Logging Messages¶
Errors¶
This module contains base exceptions that are shared through the package.
Parse Exceptions¶
Exceptions for the BEL parser.
A message for “General Parser Failure” is displayed when a problem was caused due to an unforeseen error. The line number and original statement are printed for the user to debug.
-
exception
pybel.parser.exc.
BELParserWarning
(line_number, line, position, *args)[source]¶ The base PyBEL parser exception, which holds the line and position where a parsing problem occurred.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
BELSyntaxError
(line_number, line, position, *args)[source]¶ For general syntax errors.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
InconsistentDefinitionError
(line_number, line, position, definition)[source]¶ Base PyBEL error for redefinition.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
RedefinedNamespaceError
(line_number, line, position, definition)[source]¶ Raised when a namespace is redefined.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
RedefinedAnnotationError
(line_number, line, position, definition)[source]¶ Raised when an annotation is redefined.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
NameWarning
(line_number, line, position, name, *args)[source]¶ The base class for errors related to nomenclature.
Build a warning wrapping a given name.
-
exception
pybel.parser.exc.
NakedNameWarning
(line_number, line, position, name, *args)[source]¶ Raised when there is an identifier without a namespace. Enable lenient mode to suppress.
Build a warning wrapping a given name.
-
exception
pybel.parser.exc.
MissingDefaultNameWarning
(line_number, line, position, name, *args)[source]¶ Raised if reference to value not in default namespace.
Build a warning wrapping a given name.
-
exception
pybel.parser.exc.
NamespaceIdentifierWarning
(line_number, line, position, namespace, name)[source]¶ The base class for warnings related to namespace:name identifiers.
Initialize the namespace identifier warning.
-
exception
pybel.parser.exc.
UndefinedNamespaceWarning
(line_number, line, position, namespace, name)[source]¶ Raised if reference made to undefined namespace.
Initialize the namespace identifier warning.
-
exception
pybel.parser.exc.
MissingNamespaceNameWarning
(line_number, line, position, namespace, name)[source]¶ Raised if reference to value not in namespace.
Initialize the namespace identifier warning.
-
exception
pybel.parser.exc.
MissingNamespaceRegexWarning
(line_number, line, position, namespace, name)[source]¶ Raised if reference not matching regex.
Initialize the namespace identifier warning.
-
exception
pybel.parser.exc.
AnnotationWarning
(line_number, line, position, annotation, *args)[source]¶ Base exception for annotation warnings.
Build an AnnotationWarning.
-
exception
pybel.parser.exc.
UndefinedAnnotationWarning
(line_number, line, position, annotation, *args)[source]¶ Raised when an undefined annotation is used.
Build an AnnotationWarning.
-
exception
pybel.parser.exc.
MissingAnnotationKeyWarning
(line_number, line, position, annotation, *args)[source]¶ Raised when trying to unset an annotation that is not set.
Build an AnnotationWarning.
-
exception
pybel.parser.exc.
AnnotationIdentifierWarning
(line_number, line, position, annotation, value)[source]¶ Base exception for annotation:value pairs.
Build an AnnotationWarning.
-
exception
pybel.parser.exc.
IllegalAnnotationValueWarning
(line_number, line, position, annotation, value)[source]¶ Raised when an annotation has a value that does not belong to the original set of valid annotation values.
Build an AnnotationWarning.
-
exception
pybel.parser.exc.
MissingAnnotationRegexWarning
(line_number, line, position, annotation, value)[source]¶ Raised if annotation doesn’t match regex.
Build an AnnotationWarning.
-
exception
pybel.parser.exc.
VersionFormatWarning
(line_number, line, position, version_string)[source]¶ Raised if the version string doesn’t adhere to semantic versioning or
YYYYMMDD
format.Initialize the BEL parser warning.
- Parameters
line_number – The line number on which this warning occurred
line – The content of the line
position – The position within the line where the warning occurred
-
exception
pybel.parser.exc.
MetadataException
(line_number, line, position, *args)[source]¶ Base exception for issues with document metadata.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
MalformedMetadataException
(line_number, line, position, *args)[source]¶ Raised when an invalid metadata line is encountered.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
InvalidMetadataException
(line_number, line, position, key, value)[source]¶ Raised when an incorrect document metadata key is used.
Hint
Valid document metadata keys are:
Authors
ContactInfo
Copyright
Description
Disclaimer
Licenses
Name
Version
See also
BEL specification on the properties section
Initialize the BEL parser warning.
- Parameters
line_number – The line number on which this warning occurred
line – The content of the line
position – The position within the line where the warning occurred
-
exception
pybel.parser.exc.
MissingMetadataException
(line_number, line, position, key)[source]¶ Raised when a BEL Script is missing critical metadata.
Initialize the BEL parser warning.
- Parameters
line_number – The line number on which this warning occurred
line – The content of the line
position – The position within the line where the warning occurred
-
exception
pybel.parser.exc.
InvalidCitationLengthException
(line_number, line, position, *args)[source]¶ Base exception raised when the format for a citation is wrong.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
CitationTooShortException
(line_number, line, position, *args)[source]¶ Raised when a citation does not have the minimum of {type, name, reference}.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
CitationTooLongException
(line_number, line, position, *args)[source]¶ Raised when a citation has more than the allowed entries, {type, name, reference, date, authors, comments}.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
MissingCitationException
(line_number, line, position, *args)[source]¶ Raised when trying to parse a BEL statement, but no citation is currently set.
This might be due to a previous error in the formatting of a citation.
Though it’s not a best practice, some BEL curators set other annotations before the citation. If this is the case in your BEL document, and you’re absolutely sure that all
UNSET
statements are correctly written, you can usecitation_clearing=True
as a keyword argument in any of the IO functions inpybel.from_lines()
,pybel.from_url()
, orpybel.from_path()
.Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
MissingSupportWarning
(line_number, line, position, *args)[source]¶ Raised when trying to parse a BEL statement, but no evidence is currently set.
All BEL statements must be qualified with evidence.
If your data is serialized from a database and provenance information is not readily accessible, consider referencing the publication for the database, or a url pointing to the data from either a programmatically or human-readable endpoint.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
MissingAnnotationWarning
(line_number, line, position, required_annotations)[source]¶ Raised when trying to parse a BEL statement and a required annotation is not present.
Initialize the BEL parser warning.
- Parameters
line_number – The line number on which this warning occurred
line – The content of the line
position – The position within the line where the warning occurred
-
exception
pybel.parser.exc.
InvalidCitationType
(line_number, line, position, citation_type)[source]¶ Raised when a citation is set with an incorrect type.
Hint
Valid citation types include:
Book
PubMed
Journal
Online Resource
URL
DOI
Other
See also
OpenBEL wiki on citations
Initialize the BEL parser warning.
- Parameters
line_number – The line number on which this warning occurred
line – The content of the line
position – The position within the line where the warning occurred
-
exception
pybel.parser.exc.
InvalidPubMedIdentifierWarning
(line_number, line, position, reference)[source]¶ Raised when a citation is set whose type is
PubMed
but whose database identifier is not a valid integer.Initialize the BEL parser warning.
- Parameters
line_number – The line number on which this warning occurred
line – The content of the line
position – The position within the line where the warning occurred
-
exception
pybel.parser.exc.
MalformedTranslocationWarning
(line_number, line, position, tokens)[source]¶ Raised when there is a translocation statement without location information.
Initialize the BEL parser warning.
- Parameters
line_number – The line number on which this warning occurred
line – The content of the line
position – The position within the line where the warning occurred
-
exception
pybel.parser.exc.
PlaceholderAminoAcidWarning
(line_number, line, position, code)[source]¶ Raised when an invalid amino acid code is given.
One example might be the usage of X, which is a colloquial signifier for a truncation in a given position. Text mining efforts for knowledge extraction make this mistake often. X might also signify a placeholder amino acid.
Initialize the BEL parser warning.
- Parameters
line_number – The line number on which this warning occurred
line – The content of the line
position – The position within the line where the warning occurred
-
exception
pybel.parser.exc.
NestedRelationWarning
(line_number, line, position, *args)[source]¶ Raised when encountering a nested statement.
See our the docs for an explanation of why we explicitly do not support nested statements.
Initialize the BEL parser warning.
-
exception
pybel.parser.exc.
InvalidEntity
(line_number, line, position, namespace, name)[source]¶ Raised when using a non-entity name for a name.
Initialize the BEL parser warning.
- Parameters
line_number – The line number on which this warning occurred
line – The content of the line
position – The position within the line where the warning occurred
-
exception
pybel.parser.exc.
InvalidFunctionSemantic
(line_number, line, position, func, namespace, name, allowed_functions)[source]¶ Raised when an invalid function is used for a given node.
For example, an HGNC symbol for a protein-coding gene YFG cannot be referenced as an miRNA with
m(HGNC:YFG)
Initialize the BEL parser warning.
- Parameters
line_number – The line number on which this warning occurred
line – The content of the line
position – The position within the line where the warning occurred
References¶
If you find PyBEL useful for your work, please consider citing 1:
- 1
Hoyt, C. T., et al. (2017). PyBEL: a Computational Framework for Biological Expression Language. Bioinformatics, 34(December), 1–2.
Software using PyBEL¶
Roadmap¶
This project road map documents not only the PyBEL repository, but the PyBEL Tools and BEL Commons repositories as well as the Bio2BEL project.
PyBEL¶
- Performance improvements
Parallelization of parsing
On-the-fly validation with OLS or MIRIAM
Bio2BEL¶
- Generation of new namespaces, equivalencies, and hierarchical knowledge (isA and partOf relations)
FlyBase
InterPro (Done)
UniProt (Done)
Human Phenotype Ontology
Uber Anatomy Ontology
HGNC Gene Families (Done)
Enyzme Classification (Done)
- Integration of knowledge sources
ChEMBL
Comparative Toxicogenomics Database (Done)
BRENDA
MetaCyc
Protein complex definitions
Data2BEL¶
Integration of analytical pipelines to convert data to BEL
LD Block Analysis
Gene Co-expression Analysis
Differential Gene Expression Analysis
PyBEL Tools¶
- Biological Grammar
Network motif identification
Stability analysis
- Prior knowledge comparision
Molecular activity annotation
SNP Impact
- Implementation of standard BEL Algorithms
RCR
NPA
SST
- Development of new algorithms
Heat diffusion algorithms
Cart Before the Horse
Metapath analysis
Reasoning and inference rules
Subgraph Expansion application in NeuroMMSigDB
Chemical Enrichment in NeuroMMSigDB
BEL Commons¶
Integration with BELIEF
Integration with NeuroMMSigDB (Done)
Import and export from NDEx
Current Issues¶
Speed¶
Speed is still an issue, because documents above 100K lines still take a couple minutes to run. This issue is exacerbated by (optionally) logging output to the console, which can make it more than 3x or 4x as slow.
Namespaces¶
The default namespaces from OpenBEL do not follow a standard file format. They are similar to INI config files, but do not use consistent delimiters. Also, many of the namespaces don’t respect that the delimiter should not be used in the namespace names. There are also lots of names with strange characters, which may have been caused by copying from a data source that had specfic escape characters without proper care.
Testing¶
Testing was very difficult because the example documents on the OpenBEL website had many semantic errors, such as using names and annotation values that were not defined within their respective namespace and annotation definition files. They also contained syntax errors like naked names, which are not only syntatically incorrect, but lead to bad science; and improper usage of activities, like illegally nesting an activity within a composite statement.
Technology¶
This page is meant to describe the development stack for PyBEL, and should be a useful introduction for contributors.
Versioning¶
PyBEL is versioned on GitHub so changes in its code can be tracked over time and to make use of the variety of software development plugins. Code is produced following the Git Flow philosophy, which means that new features are coded in branches off of the development branch and merged after they are triaged. Finally, develop is merged into master for releases. If there are bugs in releases that need to be fixed quickly, “hot fix” branches from master can be made, then merged back to master and develop after fixing the problem.
Testing in PyBEL¶
PyBEL is written with extensive unit testing and integration testing. Whenever possible, test- driven development is practiced. This means that new ideas for functions and features are encoded as blank classes/functions and directly writing tests for the desired output. After tests have been written that define how the code should work, the implementation can be written.
Test-driven development requires us to think about design before making quick and dirty implementations. This results in better code. Additionally, thorough testing suites make it possible to catch when changes break existing functionality.
Tests are written with the standard unittest
library.
Unit Testing¶
Unit tests check that the functionality of the different parts of PyBEL work independently.
An example unit test can be found in tests.test_parse_bel.TestAbundance.test_short_abundance
. It ensures that
the parser is able to handle a given string describing the abundance of a chemical/other entity in BEL. It tests that
the parser produces the correct output, that the BEL statement is converted to the correct internal representation. In
this example, this is a tuple describing the abundance of oxygen atoms. Finally, it tests that this representation
is added as a node in the underlying BEL graph with the appropriate attributes added.
Integration Testing¶
Integration tests are more high level, and ensure that the software accomplishes more complicated goals by using many components. An example integration test is found in tests.test_import.TestImport.test_from_fileURL. This test ensures that a BEL script can be read and results in a NetworkX object that contains all of the information described in the script
Tox¶
While IDEs like PyCharm provide excellent testing tools, they are not programmatic.
Tox is python package that provides
a CLI interface to run automated testing procedures (as well as other build functions, that aren’t important to explain
here). In PyBEL, it is used to run the unit tests in the tests
folder with the pytest
harness. It also
runs check-manifest
, builds the documentation with sphinx
, and computes the code coverage of the tests.
The entire procedure is defined in tox.ini
. Tox also allows test to be done on many different versions of
Python.
Continuous Integration¶
Continuous integration is a philosophy of automatically testing code as it changes. PyBEL makes use of the Travis CI
server to perform testing because of its tight integration with GitHub. Travis automatically installs git hooks
inside GitHub so it knows when a new commit is made. Upon each commit, Travis downloads the newest commit from GitHub
and runs the tests configured in the .travis.yml
file in the top level of the PyBEL repository. This file
effectively instructs the Travis CI server to run Tox. It also allows for the modification of the environment variables.
This is used in PyBEL to test many different versions of python.
Code Coverage¶
After building, Travis sends code coverage results to codecov.io. This site helps visualize untested code and track the improvement of testing coverage over time. It also integrates with GitHub to show which feature branches are inadequately tested. In development of PyBEL, inadequately tested code is not allowed to be merged into develop.
Versioning¶
PyBEL uses semantic versioning. In general, the project’s version string will has a suffix -dev
like in
0.3.4-dev
throughout the development cycle. After code is merged from feature branches to develop and it is
time to deploy, this suffix is removed and develop branch is merged into master.
The version string appears in multiple places throughout the project, so BumpVersion is used to automate the updating of these version strings. See .bumpversion.cfg for more information.
Deployment¶
PyBEL is also distributed through PyPI (pronounced Py-Pee-Eye).
Travis CI has a wonderful integration with PyPI, so any time a tag is made on the master branch (and also assuming the
tests pass), a new distribution is packed and sent to PyPI. Refer to the “deploy” section at the bottom of the
.travis.yml
file for more information, or the Travis CI PyPI deployment documentation.
As a side note, Travis CI has an encryption tool so the password for the PyPI account can be displayed publicly
on GitHub. Travis decrypts it before performing the upload to PyPI.
Steps¶
bumpversion release
on development branchPush to git
After tests pass, merge develop in to master
After tests pass, create a tag on GitHub with the same name as the version number (on master)
Travis will automatically deploy to PyPI after tests pass. After checking deployment has been successful, switch to develop and
bumpversion patch