Acyclic Flows#
Many network inference and optimization problems require obtaining a Directed Acyclic Graph (DAG) to accurately model specific functions or dependencies. DAGs are fundamental structures widely applicable across diverse fields, such as Bayesian networks, causal inference, machine learning architectures, and structural equation modeling.
In Bayesian networks, DAGs represent probabilistic graphical models where nodes correspond to random variables and edges encode conditional dependencies. Similarly, in causal inference and representation learning, identifying DAG structures is crucial for capturing causal relationships or underlying data representations accurately. Learning these DAG structures typically involves combinatorial search algorithms designed to discover the topology that best fits observational or experimental data.
To simplify the process of modeling and solving these problems, CORNETO introduces the concept of Acyclic Flows. An Acyclic Flow is a specialized version of a network flow problem with additional constraints that enforce acyclicity. Specifically, this approach ensures that flows do not revisit previously visited nodes, effectively preventing cycles in the resulting graph. This is achieved by augmenting the original flow formulation with binary variables for selecting edges and continuous variables that define a valid topological ordering of nodes.
In this tutorial, we will demonstrate how to effectively model problems using CORNETO’s Acyclic Flows. We will start by constructing an initial graph containing cycles, and then transform this into a network flow formulation to identify the largest possible DAG within this graph. Through this process, you will learn how to leverage CORNETO to solve practical problems requiring acyclic structures.
import numpy as np
import corneto as cn
cn.info()
|
|
For illustration purposes, let’s generate a directed graph which contains cycles
G_ex = cn.Graph()
G_ex.add_edges(
[
("v1", "v2"),
("v1", "v7"),
("v3", "v1"),
("v2", "v4"),
("v4", "v6"),
("v4", "v7"),
("v5", "v6"),
("v6", "v7"),
("v7", "v2"),
("v2", "v5"),
("v3", "v6"),
("v5", "v1"),
]
)
G_ex.plot()
We are going to use Acyclic flows to create a problem in which any graph with cycles is by definition infeasible. Thus, this problem will search across the space of valid acyclic graphs that can be extracted from the graph.
Largest DAG from a given source and sink vertex#
To illustrate a practical optimization problem based on acyclic flows, we will extract the largest DAG from a directed graph, from a source/root vertex to a target/terminal vertex. To do this, we need to convert first this graph into a flow problem. Note that since the graph is directed, solutions depend on the selected root node and the direction of edges.
Let’s say we are interested in finding an acyclic sub-graph from v1
to v7
. We need to add two extra edges, one to “inject” flow and one to “remove” flow from the graph. We are not interested in the particular values of the flow, these are just mathematical artifacts to convert the problem into a network flow that can be efficiently handled by CORNETO:
G = G_ex.copy()
G.add_edge((), "v1")
G.add_edge("v7", ())
G.plot()
As you can see, two directed edges were added to the graph, one pointing to v1
indicating that this is a source of injecting flow, and one going out from v7
. Now we can solve an optimization problem in which we can extract some graph that contains at least both nodes. For example:
What is the largest direct acyclic graph from V1 to V7?
# We create an acyclic flow problem from the graph and we inspect the variables
P = cn.opt.AcyclicFlow(G)
P.expr
{'_flow': _flow: Variable((14,), _flow),
'_flow_ipos': _flow_ipos: Variable((14,), _flow_ipos, boolean=True),
'_flow_ineg': _flow_ineg: Variable((14,), _flow_ineg, boolean=True),
'_dag_layer_pos': _dag_layer_pos: Variable((7,), _dag_layer_pos),
'positive_flow': _flow_ipos: Variable((14,), _flow_ipos, boolean=True),
'negative_flow': _flow_ineg: Variable((14,), _flow_ineg, boolean=True),
'with_flow': Expression(AFFINE, NONNEGATIVE, (14,)),
'flow': _flow: Variable((14,), _flow)}
With P.expr
we can inspect the variables and expressions registered by the problem. In general, variables starting with _
are internal variables used to model the problem. Variables not starting with _
are aliases of variables or expressions. Here we see that we have flow
and with_flow
. These are aliases for variable _flow
and with_flow = _flow_ipos + _flow_ineg
. The with_flow
expression is a binary expression indicating if edge i
has a nonzero flow (positive or negative), so if P.expr.with_flow[i]
is 0, then P.expr.flow[i]
is 0. If P.expr.with_flow[i]
is 1, then P.expr.flow[i]
greater than tolerance
or lower than -tolerance
. The tolerance
parameter is a small number that indicates the minimum flow that we consider different from zero. This is necessary because of numerical issues in the solver. By default, tolerance
is set to 1e-4
. It can be changed by passing the tolerance
parameter to the AcyclicFlow
constructor, however, changing the default value is not recommended unless you experience numerical issues.
Now, we will use the with_flow
expression to indicate that we want to maximize the number of ones (the number of edges with flow different from zero). In this way, we will find the largest sub-graph which can carry an acyclic flow:
# Largest acyclic sub-graph
# TODO: Not intuitive to use _flow_ipos. Better to have a GraphProblem
# and we do sum(GP.edges), where edges links to _flow_ipos + _flow_ineg.
P.add_objectives(sum(P.expr.with_flow), weights=-1)
P.solve();
G.edge_subgraph(P.expr.with_flow.value > 0.5).plot(orphan_edges=False)
The acyclic flow creates a special variable needed to guarantee that there is a valid topological sort of the vertices with no loops. After solving the problem, we directly get a topological sort of the vertices for free:
for vertex, position in zip(G.V, P.symbols["_dag_layer_pos"].value):
print(vertex, int(position))
v1 0
v2 1
v7 4
v3 0
v4 2
v6 3
v5 2
Largest DAG from any source to any sink#
We have used acyclic network flows to find the largest acyclic graph starting and ending in two given vertices. However, there is a more general problem: given a graph, find the largest acyclic sub-graph contained in the graph.
We can use the same acyclic flow building block to define the problem. The difference is that now we need to add one extra edge per vertex. This edge will inject or remove flow from the graph, but we will let the solver decide which edges are selected for this. By extending this graph with those additional edges, we are reducing the problem of finding the largest acyclic sub-graph to an equivalent network flow problem.
G = G_ex.copy()
flow_edges = []
for i in range(G.nv):
flow_edges.append(G.add_edge(f"v{i+1}", ()))
G.plot()
Note that the edges we have added point from the origin vertex outward. By default, these edges only remove flow from the network. However, for a feasible solution, we need the ability to also inject flow into the network. We could simply add another edge pointing in the opposite direction.
However, to illustrate the generality and flexibility of CORNETO, we will take a different approach. We will add a negative lower bound to those edges, which will automatically make the edges bi-directional. Depending on the sign of the flow, the edge can either inject or remove flow from the network. This is a very useful trick that simplifies the modeling of many problems.
lb = np.zeros(G.ne)
ub = 10 * np.ones(G.ne)
# We will add a negative lower bound for
# the dummy edges that we added before
lb[flow_edges] = -10
P = cn.opt.AcyclicFlow(G, lb=lb, ub=ub)
P.add_objectives(sum(P.expr.positive_flow), weights=-1)
P.solve();
G.edge_subgraph(P.expr.with_flow.value > 0.5).plot(orphan_edges=False)
Note
Acyclic Flows on hypergraphs are not supported. Creating a AcyclicFlow
problem on a graph with at least one hyperedge will raise an error. Acyclicity in hypergraphs can be indirectly achieved by using normal Network Flows and minimizing the number selected edges with flow in the network.