Section: Research Program
Adaptive Graph Construction
In most applications, edge weights are computed through a complex data-modeling process and convey crucially important information for classifying nodes, which makes it possible to infer information related to each data sample even exploiting the graph topology solely. In fact, a widespread approach to the solution of several classification problems is representing the data through an undirected weighted graph in which edge weights quantify the similarity between data points. This technique for coding input data has been applied to several domains, including classification of genomic data ( [29] ), face recognition ( [17] ), and text categorization ( [22] ).
In some cases, the full adjacency matrix is generated by employing
suitable similarity functions chosen through a deep understanding of
the problem structure. For example TF-IDF representation of documents,
the affinity between pairs of samples is often estimated through the
cosine measure or the
In this project we will address the problem of adaptive graph
construction towards several directions. One is the question of
choosing the best similarity measure given the objective learning
task. This question is related to the question of similarity learning
( [13] ) which has not been considered in the
context of graph based learning. In the context of structured
prediction, we will develop approaches where output structures are
organized in graphs whose similarity is given by top-
A different way we envision adaptative graph construction is in the context of semi-supervised learning. Partial supervision can take various forms and an interesting and original setting is governed by two currently studied applications: detection of brain anomaly from connectome data and polls recommendation in marketing. Indeed, for these two applications, a partial knowledge of the information diffusion process can be observed while the network is unknown or only partially known. An objective is to construct (or complete) the network structure from some local diffusion information. The problem can be formalized as a graph construction problem from partially observed diffusion processes. It has been studied very recently in [24] . In our case, the originality comes either from the existence of different sources of observations or from the large impact of node contents in the network.
We will study how to combine graphs defined by networked data and graphs built from flat data to solve a given task. This is of major importance for information networks because, as said above, we will have to deal with multiple relations between entities (texts, spans of texts, ...) and also use textual data and vectorial data. We have started to work on combining graphs in a semi supervised setting for node classification problems along the PhD thesis of T. Ricatte. Future work include combination geared by semi-supervision on link prediction tasks. This can be studied in an active learning setting. But one important issue is to design scalable approaches, thus to exploit locality given by the network. Doing this we address another objective to build non uniformly parameterized combinations.