<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0 plus SVG 1.1//EN" "http://www.w3.org/2002/04/xhtml-math-svg/xhtml-math-svg.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
    <title>Project-Team: THOTH</title>
    <link rel="stylesheet" href="../static/css/raweb.css" type="text/css"/>
    <meta name="description" content="Overall Objectives - Overall Objectives"/>
    <meta name="dc.title" content="Overall Objectives - Overall Objectives"/>
    <meta name="dc.subject" content=""/>
    <meta name="dc.publisher" content="INRIA"/>
    <meta name="dc.date" content="(SCHEME=ISO8601) 2017-01"/>
    <meta name="dc.type" content="Report"/>
    <meta name="dc.language" content="(SCHEME=ISO639-1) en"/>
    <meta name="projet" content="THOTH"/>
    <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
      <!--MathJax-->
    </script>
  </head>
  <body>
    <div class="tdmdiv">
      <div class="logo">
        <a href="http://www.inria.fr">
          <img style="align:bottom; border:none" src="../static/img/icons/logo_INRIA-coul.jpg" alt="Inria"/>
        </a>
      </div>
      <div class="TdmEntry">
        <div class="tdmentete">
          <a href="uid0.html">Project-Team Thoth</a>
        </div>
        <span>
          <a href="uid1.html">Personnel</a>
        </span>
      </div>
      <div class="tdmActPage">
        <a href="./uid3.html">Overall Objectives</a>
      </div>
      <div class="TdmEntry">Research Program<ul><li><a href="uid8.html&#10;&#9;&#9;  ">Designing and learning structured models</a></li><li><a href="uid12.html&#10;&#9;&#9;  ">Learning of visual models from minimal supervision</a></li><li><a href="uid17.html&#10;&#9;&#9;  ">Large-scale learning and optimization</a></li><li><a href="uid21.html&#10;&#9;&#9;  ">Datasets and evaluation</a></li></ul></div>
      <div class="TdmEntry">Application Domains<ul><li><a href="uid26.html&#10;&#9;&#9;  ">Visual applications</a></li><li><a href="uid32.html&#10;&#9;&#9;  ">Pluri-disciplinary research</a></li></ul></div>
      <div class="TdmEntry">
        <a href="./uid37.html">Highlights of the Year</a>
      </div>
      <div class="TdmEntry">New Software and Platforms<ul><li><a href="uid45.html&#10;&#9;&#9;  ">ACT-detector</a></li><li><a href="uid50.html&#10;&#9;&#9;  ">Joint object-action learning</a></li><li><a href="uid55.html&#10;&#9;&#9;  ">BlitzNet</a></li><li><a href="uid60.html&#10;&#9;&#9;  ">LCR-Net</a></li><li><a href="uid65.html&#10;&#9;&#9;  ">CKN-seq</a></li><li><a href="uid71.html&#10;&#9;&#9;  ">CKN-TensorFlow</a></li><li><a href="uid75.html&#10;&#9;&#9;  ">Stochs</a></li><li><a href="uid80.html&#10;&#9;&#9;  ">MODL</a></li><li><a href="uid85.html&#10;&#9;&#9;  ">Loter</a></li><li><a href="uid91.html&#10;&#9;&#9;  ">SPAMS</a></li><li><a href="uid95.html&#10;&#9;&#9;  ">MP-Net</a></li><li><a href="uid100.html&#10;&#9;&#9;  ">LVO</a></li><li><a href="uid105.html&#10;&#9;&#9;  ">SURREAL</a></li></ul></div>
      <div class="TdmEntry">New Results<ul><li><a href="uid111.html&#10;&#9;&#9;  ">Visual recognition in images</a></li><li><a href="uid135.html&#10;&#9;&#9;  ">Visual recognition in videos</a></li><li><a href="uid149.html&#10;&#9;&#9;  ">Large-scale statistical learning</a></li><li><a href="uid156.html&#10;&#9;&#9;  ">Machine learning and pluri-disciplinary research</a></li></ul></div>
      <div class="TdmEntry">Bilateral Contracts and Grants with Industry<ul><li><a href="uid165.html&#10;&#9;&#9;  ">MSR-Inria joint lab: scientific image and video mining</a></li><li><a href="uid166.html&#10;&#9;&#9;  ">MSR-Inria joint lab: structured large-scale machine learning</a></li><li><a href="uid167.html&#10;&#9;&#9;  ">Amazon</a></li><li><a href="uid168.html&#10;&#9;&#9;  ">Intel</a></li><li><a href="uid169.html&#10;&#9;&#9;  ">Facebook</a></li><li><a href="uid170.html&#10;&#9;&#9;  ">Xerox Research Center Europe</a></li><li><a href="uid171.html&#10;&#9;&#9;  ">Naver</a></li></ul></div>
      <div class="TdmEntry">Partnerships and Cooperations<ul><li><a href="uid173.html&#10;&#9;&#9;  ">Regional Initiatives</a></li><li><a href="uid175.html&#10;&#9;&#9;  ">National Initiatives</a></li><li><a href="uid178.html&#10;&#9;&#9;  ">European Initiatives</a></li><li><a href="uid182.html&#10;&#9;&#9;  ">International Initiatives</a></li><li><a href="uid198.html&#10;&#9;&#9;  ">International Research Visitors</a></li></ul></div>
      <div class="TdmEntry">Dissemination<ul><li><a href="uid203.html&#10;&#9;&#9;  ">Promoting Scientific Activities</a></li><li><a href="uid276.html&#10;&#9;&#9;  ">Teaching - Supervision - Juries</a></li><li><a href="uid313.html&#10;&#9;&#9;  ">Popularization</a></li></ul></div>
      <div class="TdmEntry">
        <div>Bibliography</div>
      </div>
      <div class="TdmEntry">
        <ul>
          <li>
            <a id="tdmbibentyear" href="bibliography.html">Publications of the year</a>
          </li>
        </ul>
      </div>
    </div>
    <div id="main">
      <div class="mainentete">
        <div id="head_agauche">
          <small><a href="http://www.inria.fr">
	    
	    Inria
	  </a> | <a href="../index.html">
	    
	    Raweb 
	    2017</a> | <a href="http://www.inria.fr/en/teams/thoth">Presentation of the Project-Team THOTH</a> | <a href="http://thoth.inrialpes.fr/">THOTH Web Site
	  </a></small>
        </div>
        <div id="head_adroite">
          <table class="qrcode">
            <tr>
              <td>
                <a href="thoth.xml">
                  <img style="align:bottom; border:none" alt="XML" src="../static/img/icons/xml_motif.png"/>
                </a>
              </td>
              <td>
                <a href="thoth.pdf">
                  <img style="align:bottom; border:none" alt="PDF" src="IMG/qrcode-thoth-pdf.png"/>
                </a>
              </td>
              <td>
                <a href="../thoth/thoth.epub">
                  <img style="align:bottom; border:none" alt="e-pub" src="IMG/qrcode-thoth-epub.png"/>
                </a>
              </td>
            </tr>
            <tr>
              <td>XML</td>
              <td>PDF</td>
              <td>e-Pub</td>
            </tr>
          </table>
        </div>
      </div>
      <!--END of module body-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid1.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid8.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
      <div id="textepage">
        <!--START2 of module body-->
        <h2>Section: Overall Objectives</h2>
        <h3 class="titre3">Overall Objectives</h3>
        <p>By 2018, nearly 80% of Internet traffic is expected to be
video, and it would take an individual over 5 million years to watch
the amount of video that will cross global IP networks each month by
then. There is thus a pressing, and in fact growing, demand to
annotate and index this visual content for home and professional
users alike. The available text and speech-transcript metadata is
typically not sufficient by itself to answer most queries, and visual
data must come into play. On the other hand, it is not feasible to
learn the models of visual content required to answer these queries
by manually and precisely annotating every relevant concept, object,
scene, or action category in a representative sample of everyday
conditions, if only because it may be difficult, or even impossible,
to decide a priori what the relevant categories and the proper level
of granularity are. This suggests reverting to the original metadata
as a source of annotation, despite the fact that the information it
provides is typically sparse (e.g., the location and overall topic of
newscasts in a video archive) and noisy (e.g., a movie script may
tell us that two people kiss in some scene, but not when, and the
kiss may occur off-screen or may not have survived the final cut). At
the same time, this weak form of “embedded annotation” is rich and
diverse, and mining the corresponding visual data from the web, TV,
or film archives guarantees that it is representative of the many
different scene settings depicted in situations typical of on-line
content. Leveraging this largely untapped source of information,
rather than attempting to hand-label all possibly relevant visual
data, is therefore key to the future use of on-line imagery.</p>
        <p>Today's object recognition and scene understanding technology operates
in a very different setting; it mostly relies on fully supervised
classification engines, and visual models are essentially (piecewise)
rigid templates learned from hand-labeled images. The sheer scale of
on-line data and the nature of the embedded annotation call for a
departure from this fully supervised scenario. The main idea of the
Thoth project-team is to develop a new framework for learning
the structure and parameters of visual models by actively exploring
large digital image and video sources (off-line archives as well as
growing on-line content, with millions of images and thousands of hours of video), and
exploiting the weak supervisory signal provided by the accompanying
metadata. This huge volume of visual training data will allow us to
learn complex non-linear models with a large number of parameters,
such as deep convolutional networks and higher-order graphical
models. This is an ambitious goal, given the sheer volume and
intrinsic variability of the visual data available on-line, and the
lack of a universally accepted formalism for modeling it. Yet, the
potential payoff is a breakthrough in visual object recognition and
scene understanding capabilities. Further, recent advances at a
smaller scale suggest that this is realistic. For example, it is
already possible to determine the identity of multiple people from
news images and their captions, or to learn human action models from
video scripts. There has also been recent progress in adapting
supervised machine learning technology to large-scale settings, where
the training data is very large and potentially infinite, and some of
it may not be labeled. Methods that adapt the structure of visual
models to the data are also emerging, and the growing computational
power and storage capacity of modern computers are enabling factors
that should of course not be neglected.</p>
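        <p>As an illustration only, and not Thoth's actual pipeline, the sketch below shows
the kind of learning problem this setting implies: a convolutional network trained for
multi-label prediction from noisy tags mined from captions or scripts, rather than from
precise hand-made annotations. The data loader, tag vocabulary, and optimizer are
hypothetical placeholders.</p>
        <pre>
# Illustrative sketch (an assumption, not Thoth's method): learning a visual
# model from weak, noisy labels such as tags mined from captions or scripts.
# Assumes a data loader yielding (images, noisy_tags) batches; names are hypothetical.
import torch.nn as nn
import torchvision.models as models

def weak_label_model(num_tags):
    # ImageNet-pretrained backbone with a multi-label head over the noisy tag vocabulary.
    model = models.resnet18(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, num_tags)
    return model

def train_step(model, optimizer, images, noisy_tags):
    # Each tag is an independent binary target: a tag mined from a caption may be
    # wrong, or refer to something off-screen, so the supervision is weak and noisy.
    logits = model(images)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, noisy_tags)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
</pre>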
        <p>One of the main objectives of Thoth is to transform massive visual data
into trustworthy knowledge libraries. To that end, it addresses several challenges:</p>
        <ul>
          <li>
            <p class="notaparagraph"><a name="uid4"> </a>designing and learning structured models capable of representing complex visual
information.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid5"> </a>learning visual models from minimal supervision or unstructured meta-data.</p>
          </li>
          <li>
            <p class="notaparagraph"><a name="uid6"> </a>large-scale learning and optimization.</p>
          </li>
        </ul>
      </div>
      <!--END of module body-->
      <br/>
      <div class="bottomNavigation">
        <div class="tail_aucentre">
          <a href="./uid1.html" accesskey="P"><img style="align:bottom; border:none" alt="previous" src="../static/img/icons/previous_motif.jpg"/> Previous | </a>
          <a href="./uid0.html" accesskey="U"><img style="align:bottom; border:none" alt="up" src="../static/img/icons/up_motif.jpg"/>  Home</a>
          <a href="./uid8.html" accesskey="N"> | Next <img style="align:bottom; border:none" alt="next" src="../static/img/icons/next_motif.jpg"/></a>
        </div>
        <br/>
      </div>
    </div>
  </body>
</html>
