Section: New Results
Maximizing submodular functions using probabilistic graphical models
Participants: K. S. Sesh Kumar [correspondent], Francis Bach.
In [34] we consider the problem of maximizing submodular functions; while this problem is known to be NP-hard, several numerically efficient local search techniques with approximation guarantees are available. In this paper, we propose a novel convex relaxation which is based on the relationship between submodular functions, entropies and probabilistic graphical models. In a graphical model, the entropy of the joint distribution decomposes as a sum of marginal entropies of subsets of variables; moreover, for any distribution, the entropy of the closest distribution factorizing in the graphical model provides an upper bound on the entropy. For directed graphical models, this last property turns out to be a direct consequence of the submodularity of the entropy function, and allows the generalization of graphical-model-based upper bounds to any submodular function. These upper bounds may then be jointly maximized with respect to a set, while minimized with respect to the graph, leading to a convex variational inference scheme for maximizing submodular functions, based on outer approximations of the marginal polytope and maximum likelihood bounded-treewidth structures. By considering graphs of increasing treewidth, we may then explore the trade-off between computational complexity and tightness of the relaxation. We also present extensions to constrained problems and to maximizing the difference of submodular functions, which include all possible set functions.
Optimizing submodular functions has been an active area of research with applications in graph-cut-based image segmentation [44], sensor placement [69], or document summarization [70]. A set function $F$ is a function defined on the power set $2^{V}$ of a certain set $V$. It is submodular if and only if for all $A,B\subseteq V$, $F(A)+F(B)\geqslant F(A\cap B)+F(A\cup B)$. Equivalently, these functions satisfy the diminishing returns property, i.e., the marginal cost of an element in the context of a smaller set is at least as large as its cost in the context of a larger set. Classical examples of such functions are entropy, mutual information, cut functions, and covering functions; see further examples in [58], [38].
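To make the definition concrete, the following minimal sketch checks the inequality $F(A)+F(B)\geqslant F(A\cap B)+F(A\cup B)$ by brute force on a small ground set. It is purely illustrative and not taken from [34]: the helper names (`powerset`, `is_submodular`), the toy edge list `EDGES`, and the counterexample `squared_size` are assumptions chosen for the example, and the exhaustive check is only feasible for very small $V$.

```python
from itertools import combinations


def powerset(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(F, ground, tol=1e-12):
    """Brute-force test of F(A) + F(B) >= F(A & B) + F(A | B) for all A, B."""
    subsets = list(powerset(ground))
    return all(
        F(A) + F(B) >= F(A & B) + F(A | B) - tol
        for A in subsets
        for B in subsets
    )


# Hypothetical toy graph on 4 vertices (an assumption for illustration).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]


def cut(A):
    """Cut function: number of edges with exactly one endpoint in A."""
    return sum(1 for u, v in EDGES if (u in A) != (v in A))


def squared_size(A):
    """|A|^2 is supermodular, so it should fail the submodularity test."""
    return len(A) ** 2


V = frozenset(range(4))
print("cut is submodular:", is_submodular(cut, V))
print("|A|^2 is submodular:", is_submodular(squared_size, V))
```

The cut function passes the test, as expected from the classical examples listed above, while $|A|^2$ fails it (take $A$ and $B$ to be two distinct singletons).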
Submodular functions form an interesting class of discrete functions because minimizing a submodular function can be done in polynomial time [58], while maximization, although NP-hard, admits constant-factor approximation algorithms [76]. In this paper, our ultimate goal is to provide the first (to the best of our knowledge) generic convex relaxation of submodular function maximization, with a hierarchy of complexities related to known combinatorial hierarchies such as the Sherali-Adams hierarchy [83]. Beyond the graphical model tools that we are going to develop, having convex relaxations may be interesting for several reasons: (1) they can lead to better solutions, (2) they provide online bounds that may be used within branch-and-bound optimization, and (3) they ease the use of such combinatorial optimization problems within structured prediction frameworks [91].
We make the following contributions:

For any directed acyclic graph $G$ and any submodular function $F$, we define a bound $F_{G}(A)$ and study its properties (monotonicity, tightness); we then specialize it to decomposable graphs.

We propose an algorithm to maximize submodular functions by maximizing the bound $F_{G}(A)$ with respect to $A$ while minimizing with respect to the graph $G$, leading to a convex variational method based on outer approximation of the marginal polytope [93] and inner approximation of the hypertree polytope.

We propose extensions to constrained problems and maximizing the difference of submodular functions, which include all possible set functions.
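
The first contribution can be illustrated with a small sketch. The exact definition of $F_{G}$ is not given in this section, so the code below assumes one natural construction that follows from diminishing returns: for a directed acyclic graph $G$ in which each element $i$ has parent set $\pi_i$, take $F_{G}(A)=\sum_{i\in V}\left[F(A\cap(\{i\}\cup\pi_i))-F(A\cap\pi_i)\right]$, with $F(\varnothing)=0$. This may differ from the definition used in [34]. The coverage function `coverage`, the dictionary `COVER`, and the three example graphs are hypothetical choices made for the illustration.

```python
from itertools import combinations


def powerset(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def graph_bound(F, parents, A):
    """Assumed bound: sum_i F(A & ({i} | pa_i)) - F(A & pa_i), with F(empty) = 0.

    `parents` maps each element i to its parents in a DAG whose topological
    order is the natural order of the elements.
    """
    total = 0
    for i, pa in parents.items():
        pa = frozenset(pa)
        total += F(A & (pa | {i})) - F(A & pa)
    return total


# Hypothetical coverage function: element i covers the items in COVER[i].
COVER = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}, 3: {4, 1}}


def coverage(A):
    """Number of items covered by the elements of A (submodular, F(empty) = 0)."""
    covered = set()
    for i in A:
        covered |= COVER[i]
    return len(covered)


V = frozenset(COVER)
empty_graph = {i: [] for i in V}                  # no edges: modular bound
chain_graph = {0: [], 1: [0], 2: [1], 3: [2]}     # chain 0 -> 1 -> 2 -> 3
full_graph = {i: list(range(i)) for i in V}       # complete DAG: exact

for A in powerset(V):
    exact = coverage(A)
    loose = graph_bound(coverage, empty_graph, A)
    tighter = graph_bound(coverage, chain_graph, A)
    tight = graph_bound(coverage, full_graph, A)
    # Adding edges can only tighten the bound, and the complete DAG is exact.
    assert exact == tight <= tighter <= loose

print("F(V) =", coverage(V))
print("chain bound at V =", graph_bound(coverage, chain_graph, V))
print("empty-graph bound at V =", graph_bound(coverage, empty_graph, V))
```

Under this assumed definition, richer graphs give tighter bounds at a higher evaluation cost, which is the trade-off between computational complexity and tightness that the section describes. The sketch only evaluates the bound for fixed graphs; it does not implement the joint optimization over $A$ and $G$.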