Acyclic Flows#

Many network inference and optimization problems require obtaining a Directed Acyclic Graph (DAG) to accurately model specific functions or dependencies. DAGs are fundamental structures widely applicable across diverse fields, such as Bayesian networks, causal inference, machine learning architectures, and structural equation modeling.

In Bayesian networks, DAGs represent probabilistic graphical models where nodes correspond to random variables and edges encode conditional dependencies. Similarly, in causal inference and representation learning, identifying DAG structures is crucial for capturing causal relationships or underlying data representations accurately. Learning these DAG structures typically involves combinatorial search algorithms designed to discover the topology that best fits observational or experimental data.

To simplify the process of modeling and solving these problems, CORNETO introduces the concept of Acyclic Flows. An Acyclic Flow is a specialized version of a network flow problem with additional constraints that enforce acyclicity. Specifically, this approach ensures that flows do not revisit previously visited nodes, effectively preventing cycles in the resulting graph. This is achieved by augmenting the original flow formulation with binary variables for selecting edges and continuous variables that define a valid topological ordering of nodes.

In this tutorial, we will demonstrate how to effectively model problems using CORNETO’s Acyclic Flows. We will start by constructing an initial graph containing cycles, and then transform this into a network flow formulation to identify the largest possible DAG within this graph. Through this process, you will learn how to leverage CORNETO to solve practical problems requiring acyclic structures.

import numpy as np

import corneto as cn

cn.info()
Installed version:v1.0.0.dev5 (latest stable: v1.0.0-alpha)
Available backends:CVXPY v1.6.5, PICOS v2.6.1
Default backend (corneto.opt):CVXPY
Installed solvers:CVXOPT, GLPK, GLPK_MI, HIGHS, SCIP, SCIPY
Graphviz version:v0.20.3
Installed path:/home/runner/work/corneto/corneto/corneto
Repository:https://github.com/saezlab/corneto

For illustration purposes, let’s generate a directed graph which contains cycles

G_ex = cn.Graph()
G_ex.add_edges(
    [
        ("v1", "v2"),
        ("v1", "v7"),
        ("v3", "v1"),
        ("v2", "v4"),
        ("v4", "v6"),
        ("v4", "v7"),
        ("v5", "v6"),
        ("v6", "v7"),
        ("v7", "v2"),
        ("v2", "v5"),
        ("v3", "v6"),
        ("v5", "v1"),
    ]
)
G_ex.plot()
../../_images/79096d825f4e07c84fd00056411adff82b1de9c9f2e741d14970d5ecd6906fad.svg

We are going to use Acyclic flows to create a problem in which any graph with cycles is by definition infeasible. Thus, this problem will search across the space of valid acyclic graphs that can be extracted from the graph.

Largest DAG from a given source and sink vertex#

To illustrate a practical optimization problem based on acyclic flows, we will extract the largest DAG from a directed graph, from a source/root vertex to a target/terminal vertex. To do this, we need to convert first this graph into a flow problem. Note that since the graph is directed, solutions depend on the selected root node and the direction of edges.

Let’s say we are interested in finding an acyclic sub-graph from v1 to v7. We need to add two extra edges, one to “inject” flow and one to “remove” flow from the graph. We are not interested in the particular values of the flow, these are just mathematical artifacts to convert the problem into a network flow that can be efficiently handled by CORNETO:

G = G_ex.copy()
G.add_edge((), "v1")
G.add_edge("v7", ())
G.plot()
../../_images/20f06e00661b6508ef64b492bb4d21be4e8ecfa392ce223fa4181e4f39a1fc33.svg

As you can see, two directed edges were added to the graph, one pointing to v1 indicating that this is a source of injecting flow, and one going out from v7. Now we can solve an optimization problem in which we can extract some graph that contains at least both nodes. For example:

What is the largest direct acyclic graph from V1 to V7?

# We create an acyclic flow problem from the graph and we inspect the variables
P = cn.opt.AcyclicFlow(G)
P.expr
{'_flow': _flow: Variable((14,), _flow),
 '_flow_ipos': _flow_ipos: Variable((14,), _flow_ipos, boolean=True),
 '_flow_ineg': _flow_ineg: Variable((14,), _flow_ineg, boolean=True),
 '_dag_layer_pos': _dag_layer_pos: Variable((7,), _dag_layer_pos),
 'positive_flow': _flow_ipos: Variable((14,), _flow_ipos, boolean=True),
 'negative_flow': _flow_ineg: Variable((14,), _flow_ineg, boolean=True),
 'with_flow': Expression(AFFINE, NONNEGATIVE, (14,)),
 'flow': _flow: Variable((14,), _flow)}

With P.expr we can inspect the variables and expressions registered by the problem. In general, variables starting with _ are internal variables used to model the problem. Variables not starting with _ are aliases of variables or expressions. Here we see that we have flow and with_flow. These are aliases for variable _flow and with_flow = _flow_ipos + _flow_ineg. The with_flow expression is a binary expression indicating if edge i has a nonzero flow (positive or negative), so if P.expr.with_flow[i] is 0, then P.expr.flow[i] is 0. If P.expr.with_flow[i] is 1, then P.expr.flow[i] greater than tolerance or lower than -tolerance. The tolerance parameter is a small number that indicates the minimum flow that we consider different from zero. This is necessary because of numerical issues in the solver. By default, tolerance is set to 1e-4. It can be changed by passing the tolerance parameter to the AcyclicFlow constructor, however, changing the default value is not recommended unless you experience numerical issues.

Now, we will use the with_flow expression to indicate that we want to maximize the number of ones (the number of edges with flow different from zero). In this way, we will find the largest sub-graph which can carry an acyclic flow:

# Largest acyclic sub-graph
# TODO: Not intuitive to use _flow_ipos. Better to have a GraphProblem
# and we do sum(GP.edges), where edges links to _flow_ipos + _flow_ineg.
P.add_objectives(sum(P.expr.with_flow), weights=-1)
P.solve();
G.edge_subgraph(P.expr.with_flow.value > 0.5).plot(orphan_edges=False)
../../_images/25b9dd1b8643be06b4945aaf990ef3bc5c57b9248ccb7f5f311d18208066872f.svg

The acyclic flow creates a special variable needed to guarantee that there is a valid topological sort of the vertices with no loops. After solving the problem, we directly get a topological sort of the vertices for free:

for vertex, position in zip(G.V, P.symbols["_dag_layer_pos"].value):
    print(vertex, int(position))
v1 0
v2 1
v7 4
v3 0
v4 2
v6 3
v5 2

Largest DAG from any source to any sink#

We have used acyclic network flows to find the largest acyclic graph starting and ending in two given vertices. However, there is a more general problem: given a graph, find the largest acyclic sub-graph contained in the graph.

We can use the same acyclic flow building block to define the problem. The difference is that now we need to add one extra edge per vertex. This edge will inject or remove flow from the graph, but we will let the solver decide which edges are selected for this. By extending this graph with those additional edges, we are reducing the problem of finding the largest acyclic sub-graph to an equivalent network flow problem.

G = G_ex.copy()
flow_edges = []
for i in range(G.nv):
    flow_edges.append(G.add_edge(f"v{i+1}", ()))
G.plot()
../../_images/82b5090ffefedc420f0299f8a11fbf96379e5abf4416e3e1550a6d12bf3303a1.svg

Note that the edges we have added point from the origin vertex outward. By default, these edges only remove flow from the network. However, for a feasible solution, we need the ability to also inject flow into the network. We could simply add another edge pointing in the opposite direction.

However, to illustrate the generality and flexibility of CORNETO, we will take a different approach. We will add a negative lower bound to those edges, which will automatically make the edges bi-directional. Depending on the sign of the flow, the edge can either inject or remove flow from the network. This is a very useful trick that simplifies the modeling of many problems.

lb = np.zeros(G.ne)
ub = 10 * np.ones(G.ne)

# We will add a negative lower bound for
# the dummy edges that we added before
lb[flow_edges] = -10
P = cn.opt.AcyclicFlow(G, lb=lb, ub=ub)
P.add_objectives(sum(P.expr.positive_flow), weights=-1)
P.solve();
G.edge_subgraph(P.expr.with_flow.value > 0.5).plot(orphan_edges=False)
../../_images/3946aaed4c95ab0482678be25e4885fca6904577d2df6ec8467879823a1b4b4a.svg

Note

Acyclic Flows on hypergraphs are not supported. Creating a AcyclicFlow problem on a graph with at least one hyperedge will raise an error. Acyclicity in hypergraphs can be indirectly achieved by using normal Network Flows and minimizing the number selected edges with flow in the network.