Section: New Results
Convex Relaxations for Learning Bounded Treewidth Decomposable Graphs
Participants: K. S. Sesh Kumar [correspondent], Francis Bach.
In [24] we consider the problem of learning the structure of undirected graphical models with bounded treewidth, within the maximum likelihood framework. This is an NP-hard problem, and most approaches rely on local search techniques. In this paper, we pose it as a combinatorial optimization problem, which is then relaxed to a convex optimization problem that involves searching over the forest and hyperforest polytopes with special structures, independently. A supergradient method is used to solve the dual problem, with a runtime complexity of $O(k^{3} n^{k+2} \log n)$ for each iteration, where $n$ is the number of variables and $k$ is a bound on the treewidth. We compare our approach to state-of-the-art methods on synthetic datasets and classical benchmarks, showing the gains of the novel convex approach.
Graphical models provide a versatile set of tools for probabilistic modeling of large collections of interdependent variables. They are defined by graphs that encode the conditional independences among the random variables, together with potential functions or conditional probability distributions that encode the specific local interactions leading to globally well-defined probability distributions [42], [93], [67].
In many domains such as computer vision, natural language processing or bioinformatics, the structure of the graph follows naturally from the constraints of the problem at hand. In other situations, it might be desirable to estimate this structure from a set of observations. Doing so allows (a) a statistical fit of rich probability distributions that can be considered for further use, and (b) the discovery of structural relationships between different variables. In the former case, distributions with tractable inference are often desirable, i.e., distributions for which the runtime complexity of inference does not scale exponentially in the number of variables in the model. The simplest constraint that ensures tractability is to impose tree-structured graphs [52]. However, these distributions are not rich enough, and following earlier work [73], [39], [75], [48], [59], [89], we consider models with treewidth bounded, not simply by one (i.e., trees), but by a small constant $k$.
Beyond the possibility of fitting tractable distributions (for which probabilistic inference has linear complexity in the number of variables), learning bounded-treewidth graphical models is key to designing approximate inference algorithms for graphs with higher treewidth. Indeed, as shown by [82], [93], [68], approximating general distributions by tractable distributions is a common tool in variational inference. However, in practice, the complexity of variational distributions is often limited to trees (i.e., $k=1$), since these are the only ones with exact polynomial-time structure learning algorithms. The convex relaxation we designed extends the applicability of variational inference by allowing a finer trade-off between runtime complexity and approximation quality.
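To make the tree case ($k=1$) concrete, the following is a minimal sketch of the classical Chow-Liu procedure, in which the maximum likelihood tree is a maximum-weight spanning tree under empirical mutual information weights. It is an illustration only, not the method of [24]; the function names `mutual_information` and `chow_liu_tree`, the use of Prim's algorithm, and the assumption of discrete samples stored as equal-length tuples are all choices made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(samples)
    joint = Counter((row[i], row[j]) for row in samples)
    marg_i = Counter(row[i] for row in samples)
    marg_j = Counter(row[j] for row in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log(p(a, b) / (p(a) p(b))), written with raw counts.
        mi += (count / n) * math.log(count * n / (marg_i[a] * marg_j[b]))
    return mi


def chow_liu_tree(samples):
    """Edges of a maximum mutual-information spanning tree (Prim-style).

    Assumes `samples` is a non-empty list of equal-length tuples of
    discrete values (a hypothetical input format for this sketch).
    """
    d = len(samples[0])
    weight = {
        (i, j): mutual_information(samples, i, j)
        for i, j in combinations(range(d), 2)
    }
    in_tree = {0}
    edges = []
    while len(in_tree) < d:
        # Heaviest edge with exactly one endpoint already in the tree.
        best = max(
            ((i, j) for (i, j) in weight if (i in in_tree) != (j in in_tree)),
            key=lambda e: weight[e],
        )
        edges.append(best)
        in_tree.update(best)
    return edges
```

For treewidth bounds $k > 1$ no such exact polynomial-time procedure is available, which is the gap the convex relaxation is meant to address.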
We make the following contributions:

We provide a novel convex relaxation for learning bounded-treewidth decomposable graphical models from data in polynomial time. This is achieved by posing the problem as a combinatorial optimization problem, which is relaxed to a convex optimization problem that involves the graphic and hypergraphic matroids.

We show how a supergradient ascent method may be used to solve the dual optimization problem, using greedy algorithms as inner loops on the two matroids. Each iteration has a runtime complexity of $O(k^{3} n^{k+2} \log n)$, where $n$ is the number of variables. We also show how to round the obtained fractional solution.

We compare our approach to state-of-the-art methods on synthetic datasets and classical benchmarks, showing the gains of the novel convex approach.
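As an illustration of the kind of greedy inner loop mentioned in the second contribution, the sketch below maximizes a linear objective over the independent sets of a graphic matroid, i.e., it finds a maximum-weight forest with a Kruskal-style pass. This is the standard textbook greedy algorithm, shown under our own assumptions; it is not taken from [24], and the hypergraphic matroid step, the dual updates and the rounding are not shown. The name `max_weight_forest` and the `(weight, u, v)` edge format are hypothetical.

```python
def max_weight_forest(num_vertices, weighted_edges):
    """Greedy (Kruskal-style) maximum-weight forest.

    weighted_edges: iterable of (weight, u, v) triples with u and v in
    range(num_vertices). Returns the selected triples in the order chosen.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    # Scan edges from heaviest to lightest.
    for w, u, v in sorted(weighted_edges, reverse=True):
        if w <= 0:
            break  # Remaining edges cannot increase the objective.
        ru, rv = find(u), find(v)
        if ru != rv:  # Adding (u, v) keeps the edge set acyclic.
            parent[ru] = rv
            forest.append((w, u, v))
    return forest
```

Because forests are the independent sets of a matroid, this greedy choice is optimal for any linear weight vector, which is what makes it usable as an inner step in a supergradient scheme whose weights change at every iteration.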