{
"cells": [
{
"cell_type": "markdown",
"id": "f2d7b3c9",
"metadata": {},
"source": [
"# Acyclic Flows\n",
"\n",
"Many problems in network inference rely on obtaining a directed acyclic graph (DAG) that encodes a specific function. For example, CARNIVAL, a method for inferring signaling networks by propagating changes in activity, identifies a DAG that represents the causal relationships between signaling molecules based on perturbation data.\n",
"\n",
"This approach is not only interesting in biological problems but also in more general ones. For instance, in Machine Learning, many problems related to representation learning require learning DAG structures. Techniques such as learning Bayesian networks and structural equation models in causal discovery are built upon this concept. In such scenarios, DAGs serve as probabilistic graphical models, encoding a set of random variables and their conditional dependencies. Identifying these structures from data requires learning DAGs, which involves searching over combinatorial structures that represent the data accurately.\n",
"\n",
"To simplify the modeling of these problems, CORNETO introduces the Acyclic Flow building block.\n",
"Acyclic Flow is an special case of a Flow problem where the flow is constrained to be acyclic. This means that the flow cannot go back to a node that has already been visited. This is implemented by augmenting the original flow problem with binary variables for the selection of edges and continuous variables to determine a topological order for the graph's vertices.\n",
"\n",
"In this tutorial, we will explore how to model such problems using CORNETO's Acyclic Flows. We will begin by constructing a graph with cycles. Then, we will find the largest DAG contained in the graph by transforming the problem into a network flow problem.\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a1684e94",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import numpy as np\n",
"\n",
"import corneto as cn\n",
"\n",
"cn.info()"
]
},
{
"cell_type": "markdown",
"id": "6cab54bd",
"metadata": {},
"source": [
"For illustration purposes, let's generate a directed graph which contains cycles"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4512c595",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"G_ex = cn.Graph()\n",
"G_ex.add_edges(\n",
" [\n",
" (\"v1\", \"v2\"),\n",
" (\"v1\", \"v7\"),\n",
" (\"v3\", \"v1\"),\n",
" (\"v2\", \"v4\"),\n",
" (\"v4\", \"v6\"),\n",
" (\"v4\", \"v7\"),\n",
" (\"v5\", \"v6\"),\n",
" (\"v6\", \"v7\"),\n",
" (\"v7\", \"v2\"),\n",
" (\"v2\", \"v5\"),\n",
" (\"v3\", \"v6\"),\n",
" (\"v5\", \"v1\"),\n",
" ]\n",
")\n",
"G_ex.plot()"
]
},
{
"cell_type": "markdown",
"id": "f4b0eab5",
"metadata": {},
"source": [
"We are going to use Acyclic flows to create a problem in which any graph with cycles is by definition infeasible. Thus, this problem will search across the space of valid acyclic graphs that can be extracted from the graph. \n",
"\n",
"## Largest DAG from a given source and sink vertex\n",
"\n",
"To illustrate a practical optimization problem based on acyclic flows, we will extract the largest DAG from a directed graph, from a source/root vertex to a target/terminal vertex. To do this, we need to convert first this graph into a flow problem. Note that since the graph is directed, solutions depend on the selected root node and the direction of edges.\n",
"\n",
"Let's say we are interested in finding an acyclic sub-graph from `v1` to `v7`. We need to add two extra edges, one to \"inject\" flow and one to \"remove\" flow from the graph. We are not interested in the particular values of the flow, these are just mathematical artifacts to convert the problem into a network flow that can be efficiently handled by CORNETO:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "dac9e12d",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"G = G_ex.copy()\n",
"G.add_edge((), \"v1\")\n",
"G.add_edge(\"v7\", ())\n",
"G.plot()"
]
},
{
"cell_type": "markdown",
"id": "9f47657a",
"metadata": {},
"source": [
"As you can see, two directed edges were added to the graph, one pointing to `v1` indicating that this is a source of injecting flow, and one going out from `v7`. Now we can solve an optimization problem in which we can extract some graph that contains at least both nodes. For example: \n",
"> What is the largest direct acyclic graph from V1 to V7?"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "cb9618a1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'_flow': Variable((14,)),\n",
" '_flow_ineg': Variable((14,), boolean=True),\n",
" '_flow_ipos': Variable((14,), boolean=True),\n",
" '_dag_layer_pos': Variable((7,)),\n",
" 'positive_flow': Variable((14,), boolean=True),\n",
" 'negative_flow': Variable((14,), boolean=True),\n",
" 'with_flow': Expression(AFFINE, NONNEGATIVE, (14,)),\n",
" 'flow': Variable((14,))}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We create an acyclic flow problem from the graph and we inspect the variables\n",
"P = cn.K.AcyclicFlow(G)\n",
"P.expr"
]
},
{
"cell_type": "markdown",
"id": "d4424fa3",
"metadata": {},
"source": [
"With `P.expr` we can inspect the variables and expressions registered by the problem. In general, variables starting with `_` are internal variables used to model the problem. Variables not starting with `_` are aliases of variables or expressions. Here we see that we have `flow` and `with_flow`. These are aliases for variable `_flow` and `with_flow = _flow_ipos + _flow_ineg`. The `with_flow` expression is a binary expression indicating if edge `i` has a nonzero flow (positive or negative), so if `P.expr.with_flow[i]` is 0, then `P.expr.flow[i]` is 0. If `P.expr.with_flow[i]` is 1, then `P.expr.flow[i]` greater than `tolerance` or lower than `-tolerance`. The `tolerance` parameter is a small number that indicates the minimum flow that we consider different from zero. This is necessary because of numerical issues in the solver. By default, `tolerance` is set to `1e-4`. It can be changed by passing the `tolerance` parameter to the `AcyclicFlow` constructor, however, changing the default value is not recommended unless you experience numerical issues.\n",
"\n",
"\n",
"Now, we will use the `with_flow` expression to indicate that we want to maximize the number of ones (the number of edges with flow different from zero). In this way, we will find the largest sub-graph which can carry an acyclic flow:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "611907b5",
"metadata": {},
"outputs": [],
"source": [
"# Largest acyclic sub-graph\n",
"# TODO: Not intuitive to use _flow_ipos. Better to have a GraphProblem\n",
"# and we do sum(GP.edges), where edges links to _flow_ipos + _flow_ineg.\n",
"P.add_objectives(sum(P.expr.with_flow), weights=-1)\n",
"P.solve();"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "cb0ff26b",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"G.edge_subgraph(P.expr.with_flow.value > 0.5).plot(orphan_edges=False)"
]
},
{
"cell_type": "markdown",
"id": "4450e7fc",
"metadata": {},
"source": [
"The acyclic flow creates a special variable needed to guarantee that there is a valid topological sort of the vertices with no loops. After solving the problem, we directly get a topological sort of the vertices for free:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "0b843804",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"v2 3\n",
"v1 0\n",
"v7 6\n",
"v3 0\n",
"v4 4\n",
"v6 5\n",
"v5 4\n"
]
}
],
"source": [
"for vertex, position in zip(G.V, P.symbols[\"_dag_layer_pos\"].value):\n",
" print(vertex, int(position))"
]
},
{
"cell_type": "markdown",
"id": "1ef7ff01",
"metadata": {},
"source": [
"## Largest DAG from any source to any sink\n",
"\n",
"We have used acyclic network flows to find the largest acyclic graph starting and ending in two given vertices. However, there is a more general problem: given a graph, find the largest acyclic sub-graph contained in the graph. \n",
"\n",
"We can use the same acyclic flow building block to define the problem. The difference is that now we need to add one extra edge per vertex. This edge will inject or remove flow from the graph, but we will let the solver decide which edges are selected for this. By extending this graph with those additional edges, we are reducing the problem of finding the largest acyclic sub-graph to an equivalent network flow problem."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "027d392b",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"G = G_ex.copy()\n",
"flow_edges = []\n",
"for i in range(G.nv):\n",
" flow_edges.append(G.add_edge(f\"v{i+1}\", ()))\n",
"G.plot()"
]
},
{
"cell_type": "markdown",
"id": "59d3c668",
"metadata": {},
"source": [
"Note that the edges we have added point from the origin vertex outward. By default, these edges only remove flow from the network. However, for a feasible solution, we need the ability to also inject flow into the network. We could simply add another edge pointing in the opposite direction.\n",
"\n",
"However, to illustrate the generality and flexibility of CORNETO, we will take a different approach. We will add a negative lower bound to those edges, which will automatically make the edges bi-directional. Depending on the sign of the flow, the edge can either inject or remove flow from the network. This is a very useful trick that simplifies the modeling of many problems."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "d2df07cd",
"metadata": {},
"outputs": [],
"source": [
"lb = np.zeros(G.ne)\n",
"ub = 10 * np.ones(G.ne)\n",
"\n",
"# We will add a negative lower bound for\n",
"# the dummy edges that we added before\n",
"lb[flow_edges] = -10"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "d659cb1b",
"metadata": {},
"outputs": [],
"source": [
"P = cn.K.AcyclicFlow(G, lb=lb, ub=ub)\n",
"P.add_objectives(sum(P.expr.positive_flow), weights=-1)\n",
"P.solve();"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "0e913b68",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"G.edge_subgraph(P.expr.with_flow.value > 0.5).plot(orphan_edges=False)"
]
},
{
"cell_type": "markdown",
"id": "588bb2b7",
"metadata": {},
"source": [
"```{note}\n",
"Acyclic Flows on hypergraphs are not supported. Creating a `AcyclicFlow` problem on a graph with at least one hyperedge will raise an error. Acyclicity in hypergraphs can be indirectly achieved by using normal Network Flows and minimizing the number selected edges with flow in the network.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "f2d6630b",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}