Summary

These scripts are designed to assist in the analysis of errors within BEL documents and provide some suggestions for fixes.

pybel_tools.summary.count_relations(graph)

Return a histogram over all relationships in a graph.

Parameters

graph (pybel.BELGraph) – A BEL graph

Returns

A Counter from {relation type: frequency}

Return type

collections.Counter

pybel_tools.summary.get_edge_relations(graph)

Build a dictionary of {node pair: set of edge types}.

Return type

Mapping[Tuple[BaseEntity, BaseEntity], Set[str]]

pybel_tools.summary.count_unique_relations(graph)

Return a histogram of the different types of relations present in a graph.

Note: this operation only counts each type of edge once for each pair of nodes.

Return type

Counter
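The difference between count_relations() and count_unique_relations() can be illustrated with a minimal sketch. It assumes a toy list of (source, target, edge data) triples in place of a real BEL graph; the names ending in _sketch and the "relation" key are hypothetical and not part of pybel_tools.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a BEL multigraph: (source, target, edge data) triples.
# A and B are connected by two parallel "increases" edges (e.g. from two citations).
Edge = Tuple[str, str, Dict[str, str]]
EDGES: List[Edge] = [
    ("A", "B", {"relation": "increases"}),
    ("A", "B", {"relation": "increases"}),
    ("A", "B", {"relation": "association"}),
    ("B", "C", {"relation": "decreases"}),
]


def count_relations_sketch(edges: List[Edge]) -> Counter:
    """Count the relation of every edge, including parallel duplicates."""
    return Counter(data["relation"] for _, _, data in edges)


def edge_relations_sketch(edges: List[Edge]) -> Dict[Tuple[str, str], Set[str]]:
    """Collect the set of relation types for each (source, target) pair."""
    pairs: Dict[Tuple[str, str], Set[str]] = {}
    for u, v, data in edges:
        pairs.setdefault((u, v), set()).add(data["relation"])
    return pairs


def count_unique_relations_sketch(edges: List[Edge]) -> Counter:
    """Count each relation type at most once per node pair."""
    return Counter(
        relation
        for relations in edge_relations_sketch(edges).values()
        for relation in relations
    )


print(count_relations_sketch(EDGES)["increases"])         # 2
print(count_unique_relations_sketch(EDGES)["increases"])  # 1
```

Building the unique count on top of the per-pair relation sets mirrors the relationship between get_edge_relations() and count_unique_relations() as described above.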

pybel_tools.summary.count_annotations(graph)

Count how many times each annotation is used in the graph.

Parameters

graph (pybel.BELGraph) – A BEL graph

Returns

A Counter from {annotation key: frequency}

Return type

collections.Counter

pybel_tools.summary.get_annotations(graph)

Get the set of annotations used in the graph.

Parameters

graph (pybel.BELGraph) – A BEL graph

Returns

A set of annotation keys

Return type

set[str]

pybel_tools.summary.get_annotations_containing_keyword(graph, keyword)

Get the annotation/value pairs whose values contain the given keyword as a substring.

Parameters
  • graph (BELGraph) – A BEL graph

  • keyword (str) – Search for annotations whose values have this as a substring

Return type

List[Mapping[str, str]]

pybel_tools.summary.count_annotation_values(graph, annotation)

Count in how many edges each value of the given annotation appears in a graph.

Parameters
  • graph (BELGraph) – A BEL graph

  • annotation (str) – The annotation to count

Return type

Counter

Returns

A Counter from {annotation value: frequency}

pybel_tools.summary.count_annotation_values_filtered(graph, annotation, source_predicate=None, target_predicate=None)

Count in how many edges each value of the given annotation appears in a graph, considering only edges whose source and target nodes pass the given predicates.

See pybel_tools.utils.keep_node() for a basic filter.

Parameters
  • graph (BELGraph) – A BEL graph

  • annotation (str) – The annotation to count

  • source_predicate (Optional[Callable[[BELGraph, BaseEntity], bool]]) – A predicate (graph, node) -> bool for keeping source nodes

  • target_predicate (Optional[Callable[[BELGraph, BaseEntity], bool]]) – A predicate (graph, node) -> bool for keeping target nodes

Return type

Counter

Returns

A Counter from {annotation value: frequency}
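A minimal sketch of the filtering logic, assuming a toy edge list where each edge carries a hypothetical {annotation key: set of values} dictionary. Unlike the documented (graph, node) -> bool predicates, the sketch's predicates take only the node so that it stays self-contained; count_annotation_values_filtered_sketch is a hypothetical name.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical edge format: (source, target, {annotation key: set of values}).
Edge = Tuple[str, str, Dict[str, Set[str]]]
NodePredicate = Callable[[str], bool]

EDGES: List[Edge] = [
    ("APP", "AB42", {"Subgraph": {"Amyloid"}}),
    ("AB42", "MAPT", {"Subgraph": {"Amyloid", "Tau"}}),
    ("MAPT", "NFT", {"Subgraph": {"Tau"}}),
    ("GSK3B", "MAPT", {}),  # edge without the annotation
]


def count_annotation_values_filtered_sketch(
    edges: List[Edge],
    annotation: str,
    source_predicate: Optional[NodePredicate] = None,
    target_predicate: Optional[NodePredicate] = None,
) -> Counter:
    """Count edges per annotation value, skipping edges whose endpoints fail a predicate."""
    counter: Counter = Counter()
    for u, v, annotations in edges:
        if source_predicate is not None and not source_predicate(u):
            continue
        if target_predicate is not None and not target_predicate(v):
            continue
        # Each value is counted once per edge that carries it.
        counter.update(annotations.get(annotation, set()))
    return counter


# Without predicates, every edge is considered.
print(count_annotation_values_filtered_sketch(EDGES, "Subgraph"))
# Only edges pointing at MAPT are considered.
print(count_annotation_values_filtered_sketch(EDGES, "Subgraph", target_predicate=lambda n: n == "MAPT"))
```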

pybel_tools.summary.pair_is_consistent(graph, u, v)

Check whether the edges between the given nodes are consistent, meaning they all have the same relation.

Return type

Optional[str]

Returns

None if the edges aren’t consistent, otherwise the relation type

pybel_tools.summary.get_consistent_edges(graph)

Yield pairs of (source node, target node) for which all of their edges have the same type of relation.

Return type

Iterable[Tuple[BaseEntity, BaseEntity]]

Returns

An iterator over (source, target) node pairs whose edges all have the same relation
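A minimal sketch of the consistency check, assuming edges are given as hypothetical (source, target, relation) triples rather than a BELGraph. In this sketch each consistent pair is yielded once, a choice made for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # hypothetical (source, target, relation) triple
Pair = Tuple[str, str]

EDGES: List[Edge] = [
    ("A", "B", "increases"),
    ("A", "B", "increases"),
    ("A", "C", "increases"),
    ("A", "C", "decreases"),
]


def relations_by_pair(edges: Iterable[Edge]) -> Dict[Pair, Set[str]]:
    """Group the distinct relations between each (source, target) pair."""
    pairs: Dict[Pair, Set[str]] = {}
    for u, v, relation in edges:
        pairs.setdefault((u, v), set()).add(relation)
    return pairs


def pair_is_consistent_sketch(edges: Iterable[Edge], u: str, v: str) -> Optional[str]:
    """Return the single shared relation, or None if the edges disagree."""
    relations = relations_by_pair(edges).get((u, v), set())
    if len(relations) != 1:
        return None
    return next(iter(relations))


def get_consistent_edges_sketch(edges: Iterable[Edge]) -> Iterator[Pair]:
    """Yield each pair whose edges all share exactly one relation."""
    for pair, relations in relations_by_pair(edges).items():
        if len(relations) == 1:
            yield pair


print(pair_is_consistent_sketch(EDGES, "A", "B"))  # increases
print(list(get_consistent_edges_sketch(EDGES)))    # [('A', 'B')]
```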

pybel_tools.summary.get_contradictory_pairs(graph)

Iterate over contradictory node pairs in the graph, based on their causal relationships.

Return type

Iterable[Tuple[BaseEntity, BaseEntity]]

Returns

An iterator over (source, target) node pairs that have contradictory causal edges
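What makes a pair contradictory is not spelled out here. The sketch below assumes one plausible definition, namely that the same ordered pair is connected by both an increasing and a decreasing causal relation; the relation groupings and the (source, target, relation) edge format are illustrative assumptions, not the library's definition.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Assumed grouping of BEL causal relations, for illustration only.
INCREASING = {"increases", "directlyIncreases"}
DECREASING = {"decreases", "directlyDecreases"}

Edge = Tuple[str, str, str]  # hypothetical (source, target, relation) triple


def get_contradictory_pairs_sketch(edges: Iterable[Edge]) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs that have both increasing and decreasing edges."""
    pairs: Dict[Tuple[str, str], Set[str]] = {}
    for u, v, relation in edges:
        pairs.setdefault((u, v), set()).add(relation)
    for pair, relations in pairs.items():
        if relations & INCREASING and relations & DECREASING:
            yield pair


EDGES = [
    ("TNF", "IL6", "increases"),
    ("TNF", "IL6", "directlyDecreases"),
    ("IL6", "STAT3", "increases"),
    ("STAT3", "MYC", "association"),
]
print(list(get_contradictory_pairs_sketch(EDGES)))  # [('TNF', 'IL6')]
```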

pybel_tools.summary.count_pathologies(graph)

Count the number of edges in which each pathology is incident.

Parameters

graph (pybel.BELGraph) – A BEL graph

Return type

Counter

pybel_tools.summary.get_unused_annotations(graph)

Get the set of all annotations that are defined in a graph, but are never used.

Parameters

graph (pybel.BELGraph) – A BEL graph

Returns

A set of annotations

Return type

set[str]

pybel_tools.summary.get_unused_list_annotation_values(graph)

Get all of the unused values for list annotations.

Parameters

graph (pybel.BELGraph) – A BEL graph

Returns

A dictionary of {str annotation: set of str values that aren’t used}

Return type

dict[str,set[str]]
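Both lookups reduce to set differences between what is defined and what is used. A minimal sketch, assuming the annotation definitions and usages have already been extracted into plain Python collections; DEFINED and USED are hypothetical inputs, not pybel structures.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical inputs: annotation definitions (key -> allowed values) and the
# (key, value) pairs actually attached to edges in the graph.
DEFINED: Dict[str, Set[str]] = {
    "Species": {"9606", "10090", "10116"},
    "CellLine": {"HeLa", "HEK293"},
}
USED = [("Species", "9606"), ("Species", "10090"), ("Species", "9606")]


def get_unused_annotations_sketch(defined: Dict[str, Set[str]], used: Iterable[Tuple[str, str]]) -> Set[str]:
    """Annotation keys that are defined but never attached to an edge."""
    return set(defined) - {key for key, _ in used}


def get_unused_list_annotation_values_sketch(
    defined: Dict[str, Set[str]], used: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """For each list annotation, the defined values that never appear on an edge."""
    used_values: Dict[str, Set[str]] = {}
    for key, value in used:
        used_values.setdefault(key, set()).add(value)
    unused: Dict[str, Set[str]] = {}
    for key, values in defined.items():
        leftover = values - used_values.get(key, set())
        if leftover:  # this sketch omits annotations whose values are all used
            unused[key] = leftover
    return unused


print(get_unused_annotations_sketch(DEFINED, USED))  # {'CellLine'}
```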

pybel_tools.summary.count_error_types(graph)

Count the occurrence of each type of error in a graph.

Return type

Counter

Returns

A Counter of {error type: frequency}

pybel_tools.summary.count_naked_names(graph)

Count the frequency of each naked name (names without namespaces).

Return type

Counter

Returns

A Counter from {name: frequency}

pybel_tools.summary.get_naked_names(graph)

Get the set of naked names in the graph.

Return type

Set[str]

pybel_tools.summary.get_incorrect_names_by_namespace(graph, namespace)

Return the set of all incorrect names from the given namespace in the graph.

Return type

Set[str]

Returns

The set of all incorrect names from the given namespace in the graph

pybel_tools.summary.get_incorrect_names(graph)

Return a dictionary from each namespace to the set of all incorrect names from that namespace in the graph.

Return type

Mapping[str, Set[str]]

Returns

A dictionary of {namespace: set of incorrect names}

pybel_tools.summary.get_undefined_namespaces(graph)

Get all namespaces that are used in the BEL graph but aren’t actually defined.

Return type

Set[str]

pybel_tools.summary.get_undefined_namespace_names(graph, namespace)

Get the names used in the graph from the given namespace, which wasn’t actually defined.

Return type

Set[str]

Returns

The set of all names from the undefined namespace

pybel_tools.summary.calculate_incorrect_name_dict(graph)

Group all of the incorrect identifiers in a dict of {namespace: list of erroneous names}.

Return type

Mapping[str, List[str]]

Returns

A dictionary of {namespace: list of erroneous names}

pybel_tools.summary.calculate_error_by_annotation(graph, annotation)

Group the graph by the given annotation and build a list of errors for each annotation value.

Return type

Mapping[str, List[str]]

Returns

A dictionary of {annotation value: list of errors}

pybel_tools.summary.group_errors(graph)

Group the errors together for analysis of the most frequent error.

Return type

Mapping[str, List[int]]

Returns

A dictionary of {error string: list of line numbers}
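A minimal sketch of the grouping, assuming the parser's warnings are available as hypothetical (line number, message) pairs; pybel's actual warning objects may be structured differently.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical parser warnings as (line number, error message) pairs.
WARNINGS: List[Tuple[int, str]] = [
    (12, '"APPP" is not in the HGNC namespace'),
    (40, 'Naked name: "TAU"'),
    (41, '"APPP" is not in the HGNC namespace'),
]


def group_errors_sketch(warnings: List[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Map each distinct error message to the line numbers where it occurred."""
    groups: DefaultDict[str, List[int]] = defaultdict(list)
    for line_number, message in warnings:
        groups[message].append(line_number)
    return dict(groups)


groups = group_errors_sketch(WARNINGS)
# The most frequent error is the message with the longest list of line numbers.
most_frequent = max(groups, key=lambda message: len(groups[message]))
print(most_frequent, groups[most_frequent])
```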

pybel_tools.summary.get_names_including_errors(graph)

Take the names from the graph in each namespace and the erroneous names from the same namespace, and return their union for each namespace.

Return type

Mapping[str, Set[str]]

Returns

A dictionary of {namespace: set of all correct and incorrect names}

pybel_tools.summary.get_names_including_errors_by_namespace(graph, namespace)

Take the names from the graph in a given namespace (pybel.struct.summary.get_names_by_namespace()) and the erroneous names from the same namespace (get_incorrect_names_by_namespace()), and return them together as a unioned set.

Return type

Set[str]

Returns

The set of all correct and incorrect names from the given namespace in the graph
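The per-namespace union can be sketched as follows, assuming the correct and incorrect names have already been collected into hypothetical {namespace: set of names} dictionaries; the sketch does not call pybel.

```python
from typing import Dict, Set

# Hypothetical inputs: names that validated and names that failed, per namespace.
CORRECT: Dict[str, Set[str]] = {"HGNC": {"APP", "MAPT"}, "CHEBI": {"water"}}
INCORRECT: Dict[str, Set[str]] = {"HGNC": {"APPP"}}


def get_names_including_errors_sketch(
    correct: Dict[str, Set[str]], incorrect: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Union the correct and incorrect names within each namespace."""
    return {
        namespace: correct.get(namespace, set()) | incorrect.get(namespace, set())
        for namespace in set(correct) | set(incorrect)
    }


names = get_names_including_errors_sketch(CORRECT, INCORRECT)
print(sorted(names["HGNC"]))  # ['APP', 'APPP', 'MAPT']
```

Restricting the result to a single key, as in names["HGNC"], corresponds to the by-namespace variant.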

pybel_tools.summary.get_undefined_annotations(graph)

Get all annotations that aren’t actually defined.

Return type

Set[str]

Returns

The set of all undefined annotations

pybel_tools.summary.get_namespaces_with_incorrect_names(graph)

Return the set of all namespaces with incorrect names in the graph.

Return type

Set[str]

pybel_tools.summary.get_most_common_errors(graph, n=20)

Get the n most common errors in a graph.

pybel_tools.summary.plot_summary_axes(graph, lax, rax, logx=True)

Plot the graph summary statistics on the given axes.

Afterwards, you should run plt.tight_layout(), and you must run plt.show() to view the plot.

Shows:

  1. Count of nodes, grouped by function type

  2. Count of edges, grouped by relation type

Parameters
  • graph (pybel.BELGraph) – A BEL graph

  • lax – An axis object from matplotlib

  • rax – An axis object from matplotlib

  • logx (bool) – Whether to use a logarithmic scale on the x-axes. Defaults to True

Example usage:

>>> import os
>>> import matplotlib.pyplot as plt
>>> from pybel import from_pickle
>>> from pybel_tools.summary import plot_summary_axes
>>> graph = from_pickle(os.path.expanduser('~/dev/bms/aetionomy/parkinsons.gpickle'))
>>> fig, axes = plt.subplots(1, 2, figsize=(10, 4))
>>> plot_summary_axes(graph, axes[0], axes[1])
>>> plt.tight_layout()
>>> plt.show()

pybel_tools.summary.plot_summary(graph, plt, logx=True, **kwargs)

Plot the graph summary statistics. This function is a thin wrapper around plot_summary_axes() that builds the figure and axes itself, given matplotlib’s pyplot module as an argument. Afterwards, you need to run plt.show().

plt is given as an argument to avoid needing matplotlib as a dependency for this function.

Shows:

  1. Count of nodes, grouped by function type

  2. Count of edges, grouped by relation type

Parameters
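  • graph (pybel.BELGraph) – A BEL graph

  • plt – The matplotlib.pyplot module

  • logx (bool) – Whether to use a logarithmic scale on the x-axes. Defaults to True

  • kwargs – Keyword arguments to pass to plt.subplots()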

Example usage:

>>> import os
>>> import matplotlib.pyplot as plt
>>> from pybel import from_pickle
>>> from pybel_tools.summary import plot_summary
>>> graph = from_pickle(os.path.expanduser('~/dev/bms/aetionomy/parkinsons.gpickle'))
>>> plot_summary(graph, plt, figsize=(10, 4))
>>> plt.show()

pybel_tools.summary.is_causal_relation(edge_data)

Check if the given relation is causal.

Return type

bool

pybel_tools.summary.get_causal_out_edges(graph, nbunch)

Get the out-edges from the given node that are causal.

Return type

Set[Tuple[BaseEntity, BaseEntity]]

Returns

A set of (source, target) pairs where the source is the given node

pybel_tools.summary.get_causal_in_edges(graph, nbunch)

Get the in-edges to the given node that are causal.

Return type

Set[Tuple[BaseEntity, BaseEntity]]

Returns

A set of (source, target) pairs where the target is the given node

pybel_tools.summary.is_causal_source(graph, node)

Return true if the node is a causal source.

  • Doesn’t have any causal in edge(s)

  • Does have causal out edge(s)

Return type

bool

pybel_tools.summary.is_causal_central(graph, node)

Return true if the node is neither a causal sink nor a causal source.

  • Does have causal in edge(s)

  • Does have causal out edge(s)

Return type

bool

pybel_tools.summary.is_causal_sink(graph, node)

Return true if the node is a causal sink.

  • Does have causal in edge(s)

  • Doesn’t have any causal out edge(s)

Return type

bool
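The three predicates partition nodes by whether they have causal in-edges and causal out-edges. A minimal sketch over a hypothetical (source, target, relation) edge list; which relations count as causal (CAUSAL) is an assumption here, and is_causal_relation() may recognize a different set.

```python
from typing import Dict, Iterable, Tuple

# Assumed set of causal relations, for illustration only.
CAUSAL = {"increases", "directlyIncreases", "decreases", "directlyDecreases"}

Edge = Tuple[str, str, str]  # hypothetical (source, target, relation) triple

EDGES = [
    ("gefitinib", "EGFR", "decreases"),
    ("EGFR", "MAPK1", "increases"),
    ("MAPK1", "proliferation", "increases"),
    ("MAPK1", "ELK1", "association"),  # not causal, so ignored
]


def classify_causal_roles(edges: Iterable[Edge]) -> Dict[str, str]:
    """Label each node as a causal source, central node, sink, or none."""
    edges = list(edges)
    has_in, has_out = set(), set()
    for u, v, relation in edges:
        if relation in CAUSAL:
            has_out.add(u)
            has_in.add(v)
    roles: Dict[str, str] = {}
    for node in {n for u, v, _ in edges for n in (u, v)}:
        if node in has_out and node not in has_in:
            roles[node] = "source"
        elif node in has_out and node in has_in:
            roles[node] = "central"
        elif node in has_in:
            roles[node] = "sink"
        else:
            roles[node] = "none"  # no causal edges at all
    return roles


print(classify_causal_roles(EDGES)["gefitinib"])  # source
```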

pybel_tools.summary.get_causal_source_nodes(graph, func)

Return a set of all nodes of the given function that have a causal in-degree of 0.

This likely means that the node is an external perturbagen and is not known to have any causal origin from within the biological system. These nodes are useful to identify because they generally don’t provide any mechanistic insight.

Return type

Set[BaseEntity]

pybel_tools.summary.get_causal_central_nodes(graph, func)

Return a set of all nodes of the given function that have both a causal in-degree > 0 and a causal out-degree > 0.

This means that they are an integral part of a pathway, since they are both produced and consumed.

Return type

Set[BaseEntity]

pybel_tools.summary.get_causal_sink_nodes(graph, func)

Return a set of all nodes of the given function (e.g., ABUNDANCE) that have a causal out-degree of 0.

This likely means that the knowledge assembly is incomplete, or there is a curation error.

Return type

Set[BaseEntity]

pybel_tools.summary.get_degradations(graph)

Get all nodes that are degraded.

Return type

Set[BaseEntity]

pybel_tools.summary.get_activities(graph)

Get all nodes that have molecular activities.

Return type

Set[BaseEntity]

pybel_tools.summary.get_translocated(graph)

Get all nodes that are translocated.

Return type

Set[BaseEntity]

pybel_tools.summary.count_top_centrality(graph, number=30)

Get a dictionary of the most central nodes in the graph (at most number of them), mapped to their centrality.

Return type

Mapping[BaseEntity, int]

pybel_tools.summary.get_modifications_count(graph)

Get a modifications count dictionary.

Return type

Mapping[str, int]

pybel_tools.summary.count_subgraph_sizes(graph, annotation='Subgraph')

Count the number of nodes in each subgraph induced by an annotation.

Parameters

annotation (str) – The annotation to group by and compare. Defaults to ‘Subgraph’

Return type

Counter[int]

Returns

A dictionary from {annotation value: number of nodes}
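A minimal sketch, assuming that the subgraph for an annotation value consists of the nodes incident to edges carrying that value; this reading of "induced by an annotation" is an assumption, and the (source, target, {annotation: set of values}) edge format is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical edges: (source, target, {annotation key: set of values}).
Edge = Tuple[str, str, Dict[str, Set[str]]]
EDGES: List[Edge] = [
    ("APP", "AB42", {"Subgraph": {"Amyloid"}}),
    ("AB42", "MAPT", {"Subgraph": {"Amyloid", "Tau"}}),
    ("MAPT", "NFT", {"Subgraph": {"Tau"}}),
    ("GSK3B", "MAPT", {"Subgraph": {"Tau"}}),
]


def count_subgraph_sizes_sketch(edges: List[Edge], annotation: str = "Subgraph") -> Counter:
    """Count the distinct nodes touched by edges carrying each annotation value."""
    nodes_by_value: Dict[str, Set[str]] = {}
    for u, v, annotations in edges:
        for value in annotations.get(annotation, set()):
            nodes_by_value.setdefault(value, set()).update((u, v))
    return Counter({value: len(nodes) for value, nodes in nodes_by_value.items()})


# Tau touches 4 distinct nodes, Amyloid touches 3.
print(count_subgraph_sizes_sketch(EDGES))
```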

pybel_tools.summary.calculate_subgraph_edge_overlap(graph, annotation='Subgraph')

Calculate the edge overlap between the subgraphs induced by the given annotation.

Two measures of overlap are given:

  1. The overlapping edges (intersection)

  2. The fractional overlap (Tanimoto similarity)

Parameters
  • graph (BELGraph) – A BEL graph

  • annotation (str) – The annotation to group by and compare. Defaults to ‘Subgraph’

Return type

Tuple[Mapping[str, Set[Tuple[BaseEntity, BaseEntity]]], Mapping[str, Mapping[str, Set[Tuple[BaseEntity, BaseEntity]]]], Mapping[str, Mapping[str, Set[Tuple[BaseEntity, BaseEntity]]]], Mapping[str, Mapping[str, float]]]

Returns

A 4-tuple of {subgraph: set of edges}, {subgraph 1: {subgraph 2: set of intersecting edges}}, {subgraph 1: {subgraph 2: set of unioned edges}}, and {subgraph 1: {subgraph 2: Tanimoto similarity}}
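The Tanimoto similarity of two edge sets A and B is |A ∩ B| / |A ∪ B|. A minimal sketch that builds four dictionaries shaped like the documented return value from a hypothetical {subgraph: set of edges} mapping; the subgraph names and edges are made up, and scoring two empty subgraphs as 0.0 is a choice made for this sketch.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # hypothetical (source, target) edge

SUBGRAPH_EDGES: Dict[str, Set[Pair]] = {
    "Amyloid": {("APP", "AB42"), ("AB42", "MAPT")},
    "Tau": {("AB42", "MAPT"), ("MAPT", "NFT"), ("GSK3B", "MAPT")},
    "Inflammation": {("IL6", "STAT3")},
}


def tanimoto(a: Set[Pair], b: Set[Pair]) -> float:
    """|A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_overlap_sketch(subgraph_edges: Dict[str, Set[Pair]]):
    """Build nested {subgraph 1: {subgraph 2: ...}} dictionaries for every pair."""
    intersections: Dict[str, Dict[str, Set[Pair]]] = {}
    unions: Dict[str, Dict[str, Set[Pair]]] = {}
    similarities: Dict[str, Dict[str, float]] = {}
    for first, second in combinations(sorted(subgraph_edges), 2):
        a, b = subgraph_edges[first], subgraph_edges[second]
        # Fill both directions so either subgraph can be used as the outer key.
        for x, y in ((first, second), (second, first)):
            intersections.setdefault(x, {})[y] = a & b
            unions.setdefault(x, {})[y] = a | b
            similarities.setdefault(x, {})[y] = tanimoto(a, b)
    return subgraph_edges, intersections, unions, similarities


_, intersection, _, similarity = edge_overlap_sketch(SUBGRAPH_EDGES)
print(similarity["Amyloid"]["Tau"])  # 0.25
```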

pybel_tools.summary.summarize_subgraph_edge_overlap(graph, annotation='Subgraph')

Return a similarity matrix between all subgraphs (or other given annotation).

Parameters

annotation (str) – The annotation to group by and compare. Defaults to "Subgraph"

Returns

A similarity matrix in a dict of dicts

Return type

dict

pybel_tools.summary.rank_subgraph_by_node_filter(graph, node_predicates, annotation='Subgraph', reverse=True)

Rank subgraphs by how many of their nodes match the given filter.

A use case for this function would be to identify which subgraphs contain the most differentially expressed genes.

>>> from pybel import from_pickle
>>> from pybel.constants import GENE
>>> from pybel_tools.integration import overlay_type_data
>>> from pybel_tools.summary import rank_subgraph_by_node_filter
>>> import os
>>> import pandas as pd
>>> graph = from_pickle(os.path.expanduser('~/dev/bms/aetionomy/alzheimers.gpickle'))
>>> df = pd.read_csv('~/dev/bananas/data/alzheimers_dgxp.csv', usecols=['Gene', 'log2fc'])
>>> data = dict(zip(df['Gene'], df['log2fc']))
>>> overlay_type_data(graph, data, 'log2fc', GENE, 'HGNC', impute=0.0)
>>> results = rank_subgraph_by_node_filter(graph, lambda g, n: 1.3 < abs(g.nodes[n]['log2fc']))

Return type

List[Tuple[str, int]]
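The ranking itself amounts to counting matching nodes per subgraph and sorting. A minimal sketch, assuming a hypothetical {subgraph: set of nodes} mapping and a single node-only predicate in place of the documented (graph, node) predicates; LOG2FC stands in for data already overlaid on the nodes.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical node data, e.g. log2 fold changes already overlaid on the nodes.
LOG2FC: Dict[str, float] = {"APP": 2.1, "AB42": 0.2, "MAPT": -1.8, "IL6": 0.1}

# Hypothetical {subgraph: set of nodes}.
SUBGRAPH_NODES: Dict[str, Set[str]] = {
    "Amyloid": {"APP", "AB42"},
    "Tau": {"APP", "AB42", "MAPT"},
    "Inflammation": {"IL6"},
}


def rank_subgraphs_sketch(
    subgraph_nodes: Dict[str, Set[str]],
    node_predicate: Callable[[str], bool],
    reverse: bool = True,
) -> List[Tuple[str, int]]:
    """Sort subgraphs by how many of their nodes pass the predicate."""
    counts = {
        name: sum(1 for node in nodes if node_predicate(node))
        for name, nodes in subgraph_nodes.items()
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=reverse)


ranking = rank_subgraphs_sketch(SUBGRAPH_NODES, lambda node: abs(LOG2FC[node]) > 1.3)
print(ranking)  # [('Tau', 2), ('Amyloid', 1), ('Inflammation', 0)]
```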

pybel_tools.summary.summarize_subgraph_node_overlap(graph, node_predicates=None, annotation='Subgraph')

Calculate the Tanimoto similarity between subgraphs, based on their nodes that pass the given filter.

This provides an alternative, more node-centric view on subgraph similarity.

pybel_tools.summary.count_pmids(graph)

Count the frequency of PubMed documents in a graph.

Return type

Counter

Returns

A Counter from {(pmid, name): frequency}

pybel_tools.summary.get_pmid_by_keyword(keyword, graph=None, pubmed_identifiers=None)

Get the set of PubMed identifiers beginning with the given keyword string.

Parameters
  • keyword (str) – The beginning of a PubMed identifier

  • graph (Optional[BELGraph]) – A BEL graph

  • pubmed_identifiers (Optional[Set[str]]) – A set of pre-cached PubMed identifiers

Return type

Set[str]

Returns

A set of PubMed identifiers starting with the given string

pybel_tools.summary.count_citations(graph, **annotations)

Count the citations in a graph, filtered by the given annotations.

Parameters
  • graph (BELGraph) – A BEL graph

  • annotations (dict) – The annotation filters to use

Return type

Counter

Returns

A counter from {(citation type, citation reference): frequency}

pybel_tools.summary.count_citations_by_annotation(graph, annotation)

Group the citation counters by subgraphs induced by the annotation.

Parameters
  • graph (BELGraph) – A BEL graph

  • annotation (str) – The annotation to use to group the graph

Return type

Mapping[str, Counter[str]]

Returns

A dictionary of Counters {subgraph name: Counter from {citation: frequency}}

pybel_tools.summary.count_authors(graph)

Count the number of edges in which each author appears.

Return type

Counter[str]

pybel_tools.summary.count_author_publications(graph)

Count the number of publications of each author in the given graph.

Return type

Counter[str]

pybel_tools.summary.get_authors(graph)

Get the set of all authors in the given graph.

Return type

Set[str]

pybel_tools.summary.get_authors_by_keyword(keyword, graph=None, authors=None)

Get authors for whom the search term is a substring.

Parameters
  • graph (pybel.BELGraph) – A BEL graph

  • keyword (str) – The keyword to search the author strings for

  • authors (set[str]) – An optional set of pre-cached authors calculated from the graph

Return type

Set[str]

Returns

A set of authors with the keyword as a substring
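The two keyword searches in this module differ: author names match anywhere in the string, while PubMed identifiers must begin with the keyword. A minimal sketch over hypothetical pre-cached sets, playing the role of the authors and pubmed_identifiers parameters; whether the library ignores case is not documented here, so the sketch is case-sensitive.

```python
from typing import Iterable, Set

# Hypothetical pre-cached values, as could be passed via authors= or pubmed_identifiers=.
AUTHORS = {"Hoyt CT", "Domingo-Fernández D", "Hofmann-Apitius M"}
PMIDS = {"12345", "12399", "99123"}


def authors_by_keyword_sketch(keyword: str, authors: Iterable[str]) -> Set[str]:
    """Substring match (case-sensitive in this sketch)."""
    return {author for author in authors if keyword in author}


def pmids_by_keyword_sketch(keyword: str, pmids: Iterable[str]) -> Set[str]:
    """Prefix match: the identifier must start with the keyword."""
    return {pmid for pmid in pmids if pmid.startswith(keyword)}


print(sorted(authors_by_keyword_sketch("Ho", AUTHORS)))  # ['Hofmann-Apitius M', 'Hoyt CT']
print(sorted(pmids_by_keyword_sketch("123", PMIDS)))     # ['12345', '12399']
```

Note that a substring search for "123" would also match "99123", which the prefix search excludes.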

pybel_tools.summary.count_authors_by_annotation(graph, annotation='Subgraph')

Group the author counters by sub-graphs induced by the annotation.

Parameters
  • graph (BELGraph) – A BEL graph

  • annotation (str) – The annotation to use to group the graph

Return type

Mapping[str, Counter[str]]

Returns

A dictionary of Counters {subgraph name: Counter from {author: frequency}}

pybel_tools.summary.get_evidences_by_pmid(graph, pmids)

Get a dictionary from the given PubMed identifiers to the sets of all evidence strings associated with each in the graph.

Parameters
  • graph (BELGraph) – A BEL graph

  • pmids (Union[str, Iterable[str]]) – A PubMed identifier or an iterable of PubMed identifiers, as strings. An iterable is consumed and converted to a set.

Returns

A dictionary of {pmid: set of all evidence strings}

Return type

dict

pybel_tools.summary.count_citation_years(graph)

Count the number of citations from each year.

Return type

Counter[int]

pybel_tools.summary.create_timeline(year_counter)

Complete the Counter timeline.

Parameters

year_counter (Counter) – A Counter from {year: frequency}

Return type

List[Tuple[int, int]]

Returns

The complete timeline, as a list of (year, frequency) pairs
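Completing a timeline can be read as filling the gaps between cited years with zero counts, so that every year in the range appears. A minimal sketch under that reading; the until_year parameter is a hypothetical addition, and the exact range the library covers (for example, whether it extends to the current year) is not stated here.

```python
from collections import Counter
from typing import List, Optional, Tuple


def create_timeline_sketch(
    year_counter: Counter, until_year: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Fill in every year from the earliest cited year up to until_year with its count."""
    if not year_counter:
        return []
    last_year = max(year_counter) if until_year is None else until_year
    # Years with no citations get an explicit count of 0.
    return [(year, year_counter.get(year, 0)) for year in range(min(year_counter), last_year + 1)]


print(create_timeline_sketch(Counter({2009: 3, 2011: 1})))
# [(2009, 3), (2010, 0), (2011, 1)]
```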

pybel_tools.summary.get_citation_years(graph)

Create a citation timeline from the graph.

Return type

List[Tuple[int, int]]

pybel_tools.summary.count_confidences(graph)

Count the confidences in the graph.

Return type

Counter[str]