Section: Research Program
Adaptive Graph Construction
In most applications, edge weights are computed through a complex data modeling process and convey crucial information for classifying nodes, making it possible to infer information about each data sample even when exploiting only the graph topology. In fact, a widespread approach to many classification problems is to represent the data as an undirected weighted graph in which edge weights quantify the similarity between data points. This way of encoding input data has been applied in several domains, including classification of genomic data [42], face recognition [31], and text categorization [36].
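As an illustration of such a similarity graph, the sketch below connects each point to its k nearest neighbours and weights edges with a Gaussian (RBF) kernel on Euclidean distance. The kernel, the k-NN sparsification and the function name `gaussian_knn_graph` are assumptions chosen for illustration. They are not taken from the cited works.

```python
import math

def gaussian_knn_graph(points, k, sigma):
    """Symmetric weighted k-NN graph with Gaussian (RBF) edge weights.

    points: list of equal-length numeric tuples; k: neighbours kept per node;
    sigma: kernel bandwidth. Returns a dense adjacency matrix (list of lists).
    Hypothetical helper: kernel, k-NN sparsification and name are assumptions.
    """
    n = len(points)
    # Pairwise squared Euclidean distances.
    d2 = [[sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
           for j in range(n)] for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The k nearest other points (the node itself is excluded).
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d2[i][j])[:k]
        for j in nearest:
            # Close points (small distance) get weights near 1.
            w = math.exp(-d2[i][j] / (2 * sigma ** 2))
            # Keep the edge if either endpoint selects the other, so W stays symmetric.
            W[i][j] = W[j][i] = w
    return W

W = gaussian_knn_graph([(0, 0), (0, 1), (5, 5), (5, 6)], k=1, sigma=1.0)
```

In this toy input the two well-separated pairs of points form two disconnected components. `k` controls sparsity and `sigma` controls how quickly similarity decays with distance.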
In some cases, the full adjacency matrix is generated by employing suitable similarity functions chosen through a deep understanding of the problem structure. For example, for the TF-IDF representation of documents, the affinity between pairs of samples is often estimated through the cosine measure or the
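For the TF-IDF case, a full adjacency matrix built from the cosine measure can be sketched as follows. Documents are assumed to be already tokenised. The weighting used here, raw term count times log(N/df), is only one common TF-IDF variant, and all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict term -> weight) per tokenised document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed variant: raw count times log(N / df).
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cosine_adjacency(docs):
    """Full (dense) adjacency matrix with cosine affinities and no self-loops."""
    vecs = tfidf_vectors(docs)
    n = len(vecs)
    return [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)]
            for i in range(n)]

docs = [["graph", "learning", "node"],
        ["graph", "learning", "edge"],
        ["football", "match", "goal"]]
A = cosine_adjacency(docs)
```

TF-IDF weights are non-negative, so every affinity lies in [0, 1]. Documents that share no weighted term get affinity 0.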
In this project we will address the problem of adaptive graph construction along several directions. The first concerns how to choose the best similarity measure given the target learning task. This question is related to metric and similarity learning ([25], [26]), which has not been considered in the context of graph-based learning. In the context of structured prediction, we will develop approaches where output structures are organized in graphs whose similarity is given by top-
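One simple, task-driven way to choose among candidate similarity measures is to score each candidate graph by how well a graph-based classifier performs on the labeled nodes. The sketch below does this with leave-one-out accuracy of a weighted neighbour vote. It is only a crude model-selection stand-in: it searches a finite set of candidates rather than learning a metric as in [25], [26], and the helper names and toy data are hypothetical.

```python
import math

def gaussian_weights(points, sigma):
    """Dense Gaussian-kernel similarity matrix with a zero diagonal (hypothetical helper)."""
    n = len(points)
    return [[0.0 if i == j else
             math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                      / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def loo_accuracy(W, labels):
    """Leave-one-out accuracy of a weighted-vote classifier on graph W."""
    correct = 0
    for i, y in enumerate(labels):
        scores = {}
        # Each other node votes for its own label with its edge weight to i.
        for j, w in enumerate(W[i]):
            if j != i:
                scores[labels[j]] = scores.get(labels[j], 0.0) + w
        if max(scores, key=scores.get) == y:
            correct += 1
    return correct / len(labels)

def select_similarity(points, labels, candidates):
    """Return (name, accuracy) of the candidate graph builder that scores best."""
    results = {name: loo_accuracy(build(points), labels)
               for name, build in candidates.items()}
    best = max(results, key=results.get)
    return best, results[best]

points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0), (5.1, 0.0)]
labels = [0, 0, 0, 1, 1]
candidates = {
    "gaussian, sigma=0.5": lambda p: gaussian_weights(p, 0.5),
    "gaussian, sigma=100": lambda p: gaussian_weights(p, 100.0),
}
best, acc = select_similarity(points, labels, candidates)
```

With the very wide kernel, every node looks almost equally similar to every other node, so the majority class wins the vote for the minority points. The narrow kernel respects the cluster structure and is selected.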
A different way we envision adaptive graph construction is in the context of semi-supervised learning. Partial supervision can take various forms, and an interesting and original setting is motivated by two applications currently under study: detection of brain anomalies from connectome data and poll recommendation in marketing. In both applications, partial knowledge of the information diffusion process can be observed while the network is unknown or only partially known. One objective is to construct (or complete) the network structure from some local diffusion information. The problem can be formalized as graph construction from partially observed diffusion processes, which has been studied very recently in [38]. In our case, the originality comes either from the existence of different sources of observations or from the large impact of node contents on the network.
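To make the setting concrete, the sketch below infers directed edges from observed diffusion episodes. Each episode is assumed to be recorded as infection times per node. Whenever `u` is infected shortly before `v`, the edge `u -> v` receives evidence that decays with the delay, and edges whose accumulated score reaches a threshold are kept. Already known edges are retained, which completes a partially known network. This is a simple co-occurrence heuristic chosen for illustration. It is not the method of [38], and the parameters `tau`, `window` and `threshold` are hypothetical.

```python
import math
from collections import defaultdict

def infer_edges(cascades, known_edges=(), tau=1.0, window=3.0, threshold=1.0):
    """Score directed edges u -> v from partially observed diffusion episodes.

    cascades: list of dicts mapping node -> infection time in one episode.
    known_edges: edges of the partially known network, always retained.
    Returns {(u, v): score}. Illustrative heuristic; parameters are assumptions.
    """
    scores = defaultdict(float)
    for times in cascades:
        for u, tu in times.items():
            for v, tv in times.items():
                delay = tv - tu
                # Only earlier -> later infections within the window count as evidence.
                if 0 < delay <= window:
                    scores[(u, v)] += math.exp(-delay / tau)
    inferred = {e: s for e, s in scores.items() if s >= threshold}
    # Completing a partially known network: known edges are kept whatever their score.
    for e in known_edges:
        inferred.setdefault(e, scores.get(e, 0.0))
    return inferred

cascades = [{"a": 0, "b": 1, "c": 2},
            {"a": 0, "b": 1},
            {"a": 0, "b": 0.5, "c": 10}]
edges = infer_edges(cascades, known_edges=[("c", "a")])
```

Here `a -> b` is supported by all three episodes and is kept. The weaker `b -> c` and `a -> c` evidence falls below the threshold. `c -> a` survives only because it was given as known.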
We will study how to combine graphs defined by networked data and graphs built from flat data to solve a given task. This is of major importance for information networks because, as said above, we will have to deal with multiple relations between entities (texts, spans of texts, ...) and also use both textual and vectorial data.
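One possible baseline for combining graphs over the same node set is a convex combination of their normalised adjacency matrices. The sketch below applies the symmetric normalisation D^{-1/2} W D^{-1/2} to each graph so that graphs on different weight scales become comparable, then mixes them with fixed weights `alphas`. The normalisation, the fixed mixing weights and the function names are assumptions. In practice the weights would likely have to be tuned for the task.

```python
import math

def sym_normalise(W):
    """Return D^{-1/2} W D^{-1/2}; isolated nodes keep all-zero rows."""
    deg = [sum(row) for row in W]
    inv = [1 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv[i] * w * inv[j] for j, w in enumerate(row)]
            for i, row in enumerate(W)]

def combine_graphs(graphs, alphas):
    """Convex combination of normalised adjacency matrices (hypothetical helper)."""
    if abs(sum(alphas) - 1.0) > 1e-9 or any(a < 0 for a in alphas):
        raise ValueError("alphas must be non-negative and sum to 1")
    normed = [sym_normalise(G) for G in graphs]
    n = len(graphs[0])
    return [[sum(a * G[i][j] for a, G in zip(alphas, normed)) for j in range(n)]
            for i in range(n)]

# Observed network: node 2 has no links. Feature-similarity graph: all nodes similar.
A_net = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
S_feat = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
C = combine_graphs([A_net, S_feat], [0.5, 0.5])
```

In the combined graph, node 2 is linked to the others through the similarity graph built from its features. This illustrates why graphs from flat data can complement sparse or incomplete networked data.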