I/O and Interoperability
This notebook introduces the main data-export and backend-conversion paths.
The AnnNet graph is the source of truth; external tables and backend graphs are projections of it.
In [1]:
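import sys
from pathlib import Path

# Locate the repository root: the nearest directory that contains the `annnet` package.
repo_root = Path.cwd()
if not (repo_root / 'annnet').exists():
    for parent in repo_root.parents:
        if (parent / 'annnet').exists():
            repo_root = parent
            break

# Make the package importable when the notebook runs from a subdirectory.
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))

import annnet as an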
In [2]:
G = an.AnnNet(directed=True)
G.add_vertices_bulk(
    [
        ('EGFR', {'kind': 'protein'}),
        ('GRB2', {'kind': 'protein'}),
        ('SOS1', {'kind': 'protein'}),
        ('RAS', {'kind': 'protein'}),
    ]
)
# Two directed binary edges with a confidence attribute.
G.add_edge('EGFR', 'GRB2', edge_id='e1', confidence=0.99)
G.add_edge('GRB2', 'SOS1', edge_id='e2', confidence=0.95)
# A list-valued src creates one undirected hyperedge over three vertices.
G.add_edge(src=['SOS1', 'RAS', 'EGFR'], edge_id='h1', directed=False, process='complex')
Out[2]:
'h1'
Export to explicit tables
to_dataframes(...) is the easiest way to make the graph explicit as separate tables.
In [3]:
tables = an.to_dataframes(G)
print(sorted(tables))
print('nodes table:')
print(tables['nodes'])
print('edges table:')
print(tables['edges'])
print('hyperedges table:')
print(tables['hyperedges'])
['edges', 'hyperedges', 'nodes', 'slice_weights', 'slices']
nodes table:
shape: (4, 2)
┌───────────┬─────────┐
│ vertex_id ┆ kind    │
│ ---       ┆ ---     │
│ str       ┆ str     │
╞═══════════╪═════════╡
│ EGFR      ┆ protein │
│ GRB2      ┆ protein │
│ SOS1      ┆ protein │
│ RAS       ┆ protein │
└───────────┴─────────┘
edges table:
shape: (2, 8)
┌─────────┬────────┬────────┬────────┬──────────┬───────────┬────────────┬─────────┐
│ edge_id ┆ source ┆ target ┆ weight ┆ directed ┆ edge_type ┆ confidence ┆ process │
│ ---     ┆ ---    ┆ ---    ┆ ---    ┆ ---      ┆ ---       ┆ ---        ┆ ---     │
│ str     ┆ str    ┆ str    ┆ f64    ┆ bool     ┆ str       ┆ f64        ┆ null    │
╞═════════╪════════╪════════╪════════╪══════════╪═══════════╪════════════╪═════════╡
│ e1      ┆ EGFR   ┆ GRB2   ┆ 1.0    ┆ true     ┆ binary    ┆ 0.99       ┆ null    │
│ e2      ┆ GRB2   ┆ SOS1   ┆ 1.0    ┆ true     ┆ binary    ┆ 0.95       ┆ null    │
└─────────┴────────┴────────┴────────┴──────────┴───────────┴────────────┴─────────┘
hyperedges table:
shape: (1, 8)
┌─────────┬──────────┬────────┬──────┬──────┬─────────────────────────┬────────────┬─────────┐
│ edge_id ┆ directed ┆ weight ┆ head ┆ tail ┆ members                 ┆ confidence ┆ process │
│ ---     ┆ ---      ┆ ---    ┆ ---  ┆ ---  ┆ ---                     ┆ ---        ┆ ---     │
│ str     ┆ bool     ┆ f64    ┆ null ┆ null ┆ list[str]               ┆ null       ┆ str     │
╞═════════╪══════════╪════════╪══════╪══════╪═════════════════════════╪════════════╪═════════╡
│ h1      ┆ false    ┆ 1.0    ┆ null ┆ null ┆ ["SOS1", "EGFR", "RAS"] ┆ null       ┆ complex │
└─────────┴──────────┴────────┴──────┴──────┴─────────────────────────┴────────────┴─────────┘
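Because binary edges and hyperedges land in separate tables, downstream code often wants a single long-form view with one row per edge endpoint or hyperedge member. The sketch below shows that reshaping in plain Python. The rows are copied by hand from the printed output above so the example runs without AnnNet installed; `incidence_rows` and the `role` labels are hypothetical names introduced for illustration, not part of the AnnNet API.

```python
# Rows copied by hand from the printed tables; plain dicts stand in for
# DataFrame rows so this sketch has no third-party dependencies.
edges = [
    {"edge_id": "e1", "source": "EGFR", "target": "GRB2"},
    {"edge_id": "e2", "source": "GRB2", "target": "SOS1"},
]
hyperedges = [
    {"edge_id": "h1", "members": ["SOS1", "EGFR", "RAS"]},
]


def incidence_rows(edges, hyperedges):
    """Return one (edge_id, vertex_id, role) tuple per endpoint or member.

    Hypothetical helper: it only illustrates the reshaping step.
    """
    rows = []
    for edge in edges:
        rows.append((edge["edge_id"], edge["source"], "source"))
        rows.append((edge["edge_id"], edge["target"], "target"))
    for hyperedge in hyperedges:
        # Unroll the list-valued `members` column into one row per vertex.
        for vertex in hyperedge["members"]:
            rows.append((hyperedge["edge_id"], vertex, "member"))
    return rows


incidence = incidence_rows(edges, hyperedges)
for row in incidence:
    print(row)
```

Two binary edges contribute two rows each and the three-member hyperedge contributes three, so the long-form view has seven rows.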
Backend conversion and algorithm interoperability
The lazy backend interoperability accessors live on the graph object itself: G.nx, G.ig, and G.gt.
Call G.nx.backend() when you want the concrete projected backend graph object. Call G.nx.<function>(G, ...) when you want AnnNet to convert G, substitute the NetworkX projection for the graph argument, dispatch the NetworkX function, and return its result.
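The convert, substitute, dispatch sequence described above can be pictured with a small toy accessor. This is a minimal sketch of the general pattern, not AnnNet's implementation: `ToyGraph`, `ToyBackend`, `ToyAccessor`, and the cached-projection behaviour are all hypothetical and exist only to make the control flow concrete.

```python
import functools


class ToyBackend:
    """Hypothetical backend 'library': functions take an adjacency dict first."""

    @staticmethod
    def out_degree(adj, node):
        return len(adj[node])


class ToyAccessor:
    """Hypothetical accessor that projects the graph lazily and dispatches calls."""

    def __init__(self, graph):
        self._graph = graph
        self._cache = None

    def backend(self):
        # Build the projection on first use only (an assumption of this sketch).
        if self._cache is None:
            self._cache = self._graph.to_adjacency()
        return self._cache

    def __getattr__(self, name):
        # Only reached for names not defined on the accessor itself.
        func = getattr(ToyBackend, name)

        @functools.wraps(func)
        def dispatch(*args, **kwargs):
            # Swap the source graph for its projection wherever it is passed.
            args = tuple(self.backend() if a is self._graph else a for a in args)
            return func(*args, **kwargs)

        return dispatch


class ToyGraph:
    """Hypothetical stand-in for the source-of-truth graph."""

    def __init__(self, edges):
        self.edges = list(edges)
        self.conversions = 0  # how many times the projection was built
        self.toy = ToyAccessor(self)

    def to_adjacency(self):
        self.conversions += 1
        adj = {}
        for u, v in self.edges:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, [])
        return adj


g = ToyGraph([("EGFR", "GRB2"), ("GRB2", "SOS1")])
degree = g.toy.out_degree(g, "GRB2")  # g is replaced by its adjacency dict
print(degree)
```

The caller keeps passing the source graph, and the accessor decides when to build the projection, which is why the tutorial cells below pass `G` itself to the NetworkX function.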
In [4]:
nx_graph = G.nx.backend()
print(type(nx_graph).__name__)
print('networkx nodes / edges:', nx_graph.number_of_nodes(), nx_graph.number_of_edges())
MultiDiGraph
networkx nodes / edges: 4 8
In [5]:
# Direct NetworkX interoperability: pass the AnnNet graph as the graph argument.
# The accessor converts G to a NetworkX graph, dispatches the function, and returns the result.
path_length = G.nx.shortest_path_length(G, source='EGFR', target='RAS')
print('EGFR -> RAS shortest path length:', path_length)
EGFR -> RAS shortest path length: 1
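The projection reports 8 edges even though the graph holds only three edge records, and EGFR reaches RAS in a single hop even though no binary edge connects them. One reading consistent with both numbers is that the undirected hyperedge h1 is expanded into a directed edge for every ordered pair of its members. That expansion rule is an assumption inferred from the output, not something this notebook states; the plain-Python sketch below only checks that the arithmetic works out under it.

```python
from collections import deque
from itertools import permutations

binary_edges = [("EGFR", "GRB2"), ("GRB2", "SOS1")]  # e1, e2 (directed)
hyperedge_members = ["SOS1", "RAS", "EGFR"]          # h1 (undirected)

# Assumption: each ordered pair of distinct members becomes one directed edge.
projected = binary_edges + list(permutations(hyperedge_members, 2))


def shortest_path_length(edges, source, target):
    """Breadth-first hop count over directed edges; None if unreachable."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


n_edges = len(projected)
hops = shortest_path_length(projected, "EGFR", "RAS")
hops_without_h1 = shortest_path_length(binary_edges, "EGFR", "RAS")
print(n_edges, hops, hops_without_h1)  # 8 1 None
```

Under this assumption the two binary edges plus six pairwise edges give the reported total of 8, and RAS is reachable only through the hyperedge.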
Native round-trip
The native .annnet format is the high-fidelity persistence format. Use it when AnnNet is the system of record.
In [6]:
from pathlib import Path
out = Path('tmp_tutorial_graph.annnet')
G.write(out, overwrite=True)
G2 = an.AnnNet.read(out)
print('round-trip shape:', G2.shape)
print('round-trip vertices:', G2.vertices())
round-trip shape: (4, 3)
round-trip vertices: ['EGFR', 'GRB2', 'SOS1', 'RAS']