{ "cells": [ { "cell_type": "markdown", "id": "f2d7b3c9", "metadata": {}, "source": [ "# Prior Knowledge and Graphs\n", "\n", "A central idea of the CORNETO framework is to use *prior knowledge* in the form of *networks* or *graphs* to build specialized network inference methods. By prior knowledge, we mean any information that is available about the problem at hand, such as protein-protein interaction networks, genome-scale metabolic networks, or even causal connections between random variables. \n", "\n", "CORNETO provides a `Graph` class to construct prior knowledge graphs. This class is very flexible and can be used to build different types of graphs and hypergraphs, including undirected, directed, and mixed graphs, as well as graphs with multiple edge types and self-loops. It offers basic functionality to store vertices and edges with specific attributes, and basic graph operations.\n", "\n", "In this tutorial, we will see how the `Graph` class is used in CORNETO to encode graph problems or prior knowledge in general.\n", "\n", "\n", "```{note}\n", "*CORNETO is a library to design graph-based optimization problems, not a graph library.*\n", "\n", "The `Graph` class within CORNETO is designed as a base class for constructing general-purpose network methods on top of graphs. For users who require conventional, solver-free graph algorithms, such as Dijkstra's algorithm for shortest paths, the [NetworkX](https://networkx.org/) library provides a comprehensive suite of tools for graph operations. Additionally, for ease of interoperability, the `to_networkx` method converts a CORNETO graph to a NetworkX graph. 
This dual approach ensures flexibility and depth for diverse network analysis needs.\n", "```\n" ] }, { "cell_type": "code", "execution_count": 40, "id": "a1684e94", "metadata": {}, "outputs": [ { "data": { "text/html": [], "text/plain": [] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", "
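\n", "<table>\n", "<tr><td>Installed version:</td><td>v1.0.0.dev0 (up to date)</td></tr>\n", "<tr><td>Available backends:</td><td>CVXPY v1.5.1, PICOS v2.4.17</td></tr>\n", "<tr><td>Default backend (corneto.opt):</td><td>CVXPY</td></tr>\n", "<tr><td>Installed solvers:</td><td>CLARABEL, CVXOPT, ECOS, ECOS_BB, GLPK, GLPK_MI, GUROBI, OSQP, SCIP, SCIPY, SCS</td></tr>\n", "<tr><td>Graphviz version:</td><td>v0.20.3</td></tr>\n", "<tr><td>Repository:</td><td>https://github.com/saezlab/corneto</td></tr>\n", "</table>\n", "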
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import corneto as cn\n", "\n", "cn.info()" ] }, { "cell_type": "markdown", "id": "524197ea", "metadata": {}, "source": [ "## Manually creating a graph" ] }, { "cell_type": "code", "execution_count": 41, "id": "4512c595", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "1\n", "\n", "1\n", "\n", "\n", "\n", "2\n", "\n", "2\n", "\n", "\n", "\n", "1->2\n", "\n", "\n", "\n", "\n", "\n", "3\n", "\n", "3\n", "\n", "\n", "\n", "1->3\n", "\n", "\n", "\n", "\n", "\n", "2->3\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G = cn.Graph()\n", "G.add_edge(1, 2)\n", "G.add_edge(2, 3)\n", "G.add_edge(1, 3)\n", "G.plot()" ] }, { "cell_type": "markdown", "id": "bdcacdb8", "metadata": {}, "source": [ "By default, edges are directed. Undirected edges can be mixed with directed edges. The method `add_edge` returns the index of the new edge. Here is an example:" ] }, { "cell_type": "code", "execution_count": 42, "id": "ad64bcb7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n" ] }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "1\n", "\n", "1\n", "\n", "\n", "\n", "2\n", "\n", "2\n", "\n", "\n", "\n", "1->2\n", "\n", "\n", "\n", "\n", "\n", "3\n", "\n", "3\n", "\n", "\n", "\n", "1->3\n", "\n", "\n", "\n", "\n", "\n", "2->3\n", "\n", "\n", "\n", "\n", "\n", "4\n", "\n", "4\n", "\n", "\n", "\n", "3->4\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idx = G.add_edge(3, 4, cn.EdgeType.UNDIRECTED)\n", "print(idx)\n", "G.plot()" ] }, { "cell_type": "markdown", "id": "2350f34c", "metadata": {}, "source": [ "Parallel edges are also supported. 
You can add multiple edges, directed or undirected, between the same pair of vertices:" ] }, { "cell_type": "code", "execution_count": 43, "id": "d87919e0", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "[Graph plot: vertices 1, 2, 3, 4; edges 1->2, 1->3, 2->3 and three parallel edges 3->4]" ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G.add_edge(3, 4, cn.EdgeType.UNDIRECTED)\n", "G.add_edge(3, 4)\n", "G.plot()" ] }, { "cell_type": "markdown", "id": "f924788c", "metadata": {}, "source": [ "Vertices and edges are stored in the order in which they were added. When an edge (u, v) is added to the graph, each endpoint is added to the vertex set only if it is not already present." 
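, "\n", "\n", "
As a minimal sketch of the ordering rule above, written in plain Python and not taken from CORNETO's implementation, the following toy code keeps vertices and edges in insertion order and registers a vertex only the first time it appears. The names `vertices`, `edges`, and `add_edge_sketch` are hypothetical and exist only for this illustration.

```python
# Toy model of insertion-ordered vertices and edges (assumption: this is
# an illustration only, not how CORNETO stores graphs internally).
vertices = {}  # dict keys are unique and keep insertion order
edges = []     # list of (source, target) pairs; parallel edges allowed


def add_edge_sketch(u, v):
    # Register each endpoint only if it has not been seen before.
    vertices.setdefault(u, None)
    vertices.setdefault(v, None)
    # Always append the edge, so repeated pairs become parallel edges.
    edges.append((u, v))
    # Return the index of the new edge.
    return len(edges) - 1


for u, v in [(1, 2), (2, 3), (1, 3), (3, 4), (3, 4)]:
    add_edge_sketch(u, v)

print(tuple(vertices))
print(edges)
```

The dictionary provides both uniqueness and insertion order for vertices, while the list keeps every edge, including repeated pairs, as a separate entry.
"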
] }, { "cell_type": "code", "execution_count": 44, "id": "8768fab0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, 2, 3, 4)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G.V" ] }, { "cell_type": "code", "execution_count": 45, "id": "8fd6ecf9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((frozenset({1}), frozenset({2})),\n", " (frozenset({2}), frozenset({3})),\n", " (frozenset({1}), frozenset({3})),\n", " (frozenset({3}), frozenset({4})),\n", " (frozenset({3}), frozenset({4})),\n", " (frozenset({3}), frozenset({4})))" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G.E" ] }, { "cell_type": "markdown", "id": "3b793e80", "metadata": {}, "source": [ "## Edge attributes" ] }, { "cell_type": "code", "execution_count": 46, "id": "cfc44e12", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'__edge_type': 'directed',\n", " 'weight': 0.5,\n", " 'label': 'E(3->4)',\n", " '__source_attr': {3: {'__value': {}}},\n", " '__target_attr': {4: {'__value': {}}}}" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idx = G.add_edge(3, 4, weight=0.5, label=\"E(3->4)\")\n", "attr = G.get_attr_edge(idx)\n", "attr" ] }, { "cell_type": "code", "execution_count": 47, "id": "990951f4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "attr.weight" ] }, { "cell_type": "code", "execution_count": 48, "id": "fe14480b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'__edge_type': 'directed',\n", " 'weight': 0.5,\n", " 'label': 'E(3->4)',\n", " '__source_attr': {3: {'__value': {}}},\n", " '__target_attr': {4: {'__value': {}}}}" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G.get_attr_edge(idx)" ] }, { "cell_type": "code", "execution_count": 49, "id": 
"215eebcf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[6]" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(G.get_edges_by_attr(\"label\", \"E(3->4)\"))" ] }, { "cell_type": "code", "execution_count": 50, "id": "58cc1998", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(frozenset({3}), frozenset({4}))" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "edge = G.get_edge(idx)\n", "edge" ] }, { "cell_type": "markdown", "id": "80ecbfd2", "metadata": {}, "source": [ "Graphs have some special attributes, starting by `__`, for example `__source_attr` and `__target_attr`. These are automatically added to store attributes between the edge and the vertices. The recommended way to access the special attributes are through the `get_attr` method and the `corneto.Attr` attributes, for example:" ] }, { "cell_type": "code", "execution_count": 51, "id": "281142cf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{3: {'__value': {}}}" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "attr = G.get_attr_edge(idx)\n", "attr.get_attr(cn.Attr.SOURCE_ATTR)" ] }, { "cell_type": "code", "execution_count": 52, "id": "9c1df45f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{3: {'__value': {}}}" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "attr.__source_attr" ] }, { "cell_type": "markdown", "id": "9b45cf9d", "metadata": {}, "source": [ "```{note}\n", "The API for handling attributes on graphs is still under development and will change in future versions. It is used mostly internally to transform different representations of prior knowledge.\n", "```" ] }, { "cell_type": "markdown", "id": "47d62960", "metadata": {}, "source": [ "## Importing graphs\n", "\n", "CORNETO implements also few adapters to import prior knowledge from other sources as corneto Graphs. 
For example, when working with signaling networks, a common format is the `SIF` file, which stores vertices and edges as triples of the form `Source Interaction Target`. For instance, `A -1 B` indicates that vertex A (e.g., a protein) inhibits vertex B. When importing a SIF file, CORNETO creates a graph whose edges carry the attribute `interaction`, which stores the type of interaction between the vertices. The method `from_sif_tuples` takes a list of such triples and generates the graph:" ] }, { "cell_type": "code", "execution_count": 53, "id": "d5b8730c", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "[Graph plot: vertices A, B, C; edges A->B, A->C]" ], "text/plain": [ "" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sif_graph = cn.Graph.from_sif_tuples([(\"A\", 1, \"B\"), (\"A\", -1, \"C\")])\n", "sif_graph.plot()" ] }, { "cell_type": "code", "execution_count": 54, "id": "558c4a7e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'__edge_type': 'directed',\n", " 'interaction': 1,\n", " '__source_attr': {'A': {'__value': {}}},\n", " '__target_attr': {'B': {'__value': {}}}},\n", " {'__edge_type': 'directed',\n", " 'interaction': -1,\n", " '__source_attr': {'A': {'__value': {}}},\n", " '__target_attr': {'C': {'__value': {}}}}]" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sif_graph.get_attr_edges()" ] }, { "cell_type": "code", "execution_count": 55, "id": "c3310b77", "metadata": {}, "outputs": [], "source": [ "# cn.Graph.from_sif_file()" ] }, { "cell_type": "markdown", "id": "6c0ac166", "metadata": {}, "source": [ "## Saving and reading\n", "\n", "The method `save` allows you..." 
] }, { "cell_type": "code", "execution_count": 56, "id": "d29ddc6e", "metadata": {}, "outputs": [], "source": [ "G.save(\"my_graph\")" ] }, { "cell_type": "code", "execution_count": 57, "id": "035d04d4", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "[Graph plot: vertices 1, 2, 3, 4; edges 1->2, 1->3, 2->3 and four parallel edges 3->4]" ], "text/plain": [ "" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G_c = cn.Graph.load(\"my_graph.pkl.xz\")\n", "G_c.plot()" ] }, { "cell_type": "markdown", "id": "ef4a003b", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "0661cc86", "metadata": {}, "source": [ "## Hypergraphs\n", "\n", "Graphs in CORNETO also support hyperedges, which connect sets of vertices, something that `networkx` does not support. This is useful for modeling more complex prior knowledge, such as metabolic networks, where edges are reactions connecting multiple vertices (reactants and products), e.g., `A + B -> C + D`, `D -> E + F`." 
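, "\n", "\n", "
To make the hyperedge idea concrete, here is a minimal sketch in plain Python that turns reaction strings into (sources, targets) pairs of frozensets, the same shape that `G.E` reports for hyperedges. The helper `parse_reaction` and the reaction string syntax, including writing an empty side as `-> A` or `C ->`, are assumptions made for this illustration and are not part of CORNETO.

```python
def parse_reaction(reaction):
    # Split 'A + B -> C + D' into its left and right sides.
    left, right = reaction.split('->')
    # Each side becomes a frozenset of vertex names; an empty side
    # yields an empty frozenset.
    sources = frozenset(s.strip() for s in left.split('+') if s.strip())
    targets = frozenset(t.strip() for t in right.split('+') if t.strip())
    return sources, targets


# Hypothetical reaction list used only for this example.
reactions = ['A + B -> C + D', 'D -> E + F', '-> A', 'C ->']
hyperedges = [parse_reaction(r) for r in reactions]

for sources, targets in hyperedges:
    print(sorted(sources), '->', sorted(targets))
```

Each pair could then be handed to `add_edge` as two tuples of vertices, in the same way as the hand-written calls in this section.
"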
] }, { "cell_type": "code", "execution_count": 62, "id": "20ebd58d", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "[Hypergraph plot: vertices A, B, C, D, E, F; hyperedge {A, B} -> {C, D}; hyperedge D -> {E, F}; source edges into A and B; sink edge out of C]" ], "text/plain": [ "" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G = cn.Graph()\n", "# Reaction A + B -> C + D as a single hyperedge\n", "G.add_edge((\"A\", \"B\"), (\"C\", \"D\"))\n", "# Reaction D -> E + F\n", "G.add_edge(\"D\", (\"E\", \"F\"))\n", "# Edges with an empty source set (into A and B) or an empty target set (out of C)\n", "G.add_edge((), \"A\")\n", "G.add_edge((), \"B\")\n", "G.add_edge(\"C\", ())\n", "G.plot()" ] }, { "cell_type": "code", "execution_count": 63, "id": "9646595e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('A', 'B', 'C', 'D', 'F', 'E')" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G.V" ] }, { "cell_type": "code", "execution_count": 64, "id": "1aada7ad", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((frozenset({'A', 'B'}), frozenset({'C', 'D'})),\n", " (frozenset({'D'}), frozenset({'E', 'F'})),\n", " (frozenset(), frozenset({'A'})),\n", " (frozenset(), frozenset({'B'})),\n", " (frozenset({'C'}), frozenset()))" ] 
}, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "G.E" ] }, { "cell_type": "markdown", "id": "dcbd2a10", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }