Section: Overall Objectives


Project WAM aims to make it easier to develop and use rich multimedia content and applications on the web.

Many web sites specialize in a single type of content, such as Picasa and Flickr for photographs, YouTube and Dailymotion for videos, or iTunes and Deezer for music. Other sites offer web pages that contain text, pictures, video and audio simultaneously (newspaper sites, for instance). Different types of content thus coexist on the web, even on the same web page, but this does not really make a multimedia web or multimedia pages. The web has demonstrated how links, relations, connections and interactions between pieces of information can enhance the raw content of each piece. We are not there yet with multimedia content. Integrating and connecting heterogeneous content on the web still has to be explored.

That is why we pay particular attention to documents and applications that tightly integrate different types of media objects, be they discrete (text, images, equations) or continuous (video, audio, animations). Continuous content adds a time dimension to documents that mix various sorts of content. This extra dimension raises new issues. It has to be combined with other, more traditional points of view on documents, such as their layout and style (spatial dimension) and their organization, often represented as a hierarchical structure (logical dimension).

In the context of the web, multimedia resources are distributed and can be assembled in various ways to make different documents and to be processed by multiple applications, running on all sorts of computers, devices and networks. For this reason, they have to be represented in platform-neutral formats.

This approach to web multimedia content and applications raises a number of issues. We have chosen to address four categories of problems:

Multimedia Models and Formats

For a long time, most multimedia web pages have isolated continuous content behind the fences of add-ons or plug-ins, thus preventing real interaction between these contents and the rest of their host page or the whole web. In addition, the many interactive features that are available with discrete content have no equivalent within plug-ins, where users are limited to the same level of control they have with a VCR.

New models are required to represent the many dimensions of multimedia documents. Ideally, such models should keep the aspects of traditional documents that have proven useful, and extend them with the specificities of the web environment and continuous contents. The key issue here is to allow all these aspects to be present simultaneously for representing a single document. This would allow document models to be rich and versatile enough to offer many possibilities to a broad range of applications handling multimedia contents.

To be used in real applications, such multimedia document models have to be instantiated in actual formats and languages. As documents have to be part of the web, these formats must be compatible with existing web formats. They could be extensions of existing formats, or new languages that share as many features as possible with the existing ones. The goal is not to create a separate web for multimedia content, but to seamlessly extend the web as we know it.

XML Processing

XML was created for representing documents and data on the web in a secure and rigorous way. XML is now the ground on which web formats are built. If we want to propose new formats for the web, they have to be based on XML, and we need to make sure new applications will be able to take advantage of these formats. It is therefore crucial to better understand how XML structures can be handled, and which theoretical tools may help to develop an effective framework for processing them.

This is of course an ambitious and long-term goal that requires intermediate steps. The first specialized languages for handling XML structures were transformation-oriented (XSLT, XDuce, CDuce, etc.). Typically, programs written in these languages read an XML structure and produce another XML structure as their output, after performing some transformations. Query languages can also be considered as behaving that way. The transformation paradigm is therefore an interesting intermediate step towards general XML processing. Indeed, a number of applications can be built as transformations: document formatting (XSLT was initially developed as part of the XSL formatting language), filtering, merging, conversion, re-purposing, data query, etc.
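The transformation paradigm described above can be sketched in a few lines. The example below is only an illustration written in Python with the standard xml.etree.ElementTree module, not in one of the languages named above. The input vocabulary (album, photo, src, caption) and the function name album_to_list are hypothetical and chosen for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document: element and attribute names are invented
# for this illustration and do not come from any real format.
SOURCE = """<album title="Holidays">
  <photo src="beach.jpg" caption="Beach"/>
  <photo src="hill.jpg" caption="Hill"/>
</album>"""


def album_to_list(xml_text: str) -> str:
    """Read one XML structure and return another one (an XHTML-like list)."""
    album = ET.fromstring(xml_text)
    ul = ET.Element("ul", {"class": "album"})
    # Each <photo> of the input becomes one <li><img/>caption</li> in the output.
    for photo in album.findall("photo"):
        li = ET.SubElement(ul, "li")
        img = ET.SubElement(li, "img", {"src": photo.get("src", "")})
        img.tail = photo.get("caption", "")
    return ET.tostring(ul, encoding="unicode")


result = album_to_list(SOURCE)
print(result)
```

The program has the shape shared by the applications listed above: an XML tree comes in, a different XML tree goes out, and the logic in between decides which parts of the input are kept, renamed or rearranged.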

A major component in an XML transformation language is the part that allows a programmer to select in the input structure the data of interest for a given transformation. We have therefore focused on this part of XML processing languages, and we have in particular studied the XPath language, which is used in a variety of other languages for XML (XSLT, XQuery, XML Schema). We have also studied CSS Selectors, which play a similar role in the CSS language for style sheets. The main goal of this work is to find the theoretical tools and formalisms that are needed for static analysis of XPath expressions, in order to help programmers develop better and more reliable code for XML data and documents.
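To make the selection step concrete, here is a minimal sketch in Python. It relies on the standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The sample document and the helper select_sources are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page mixing media objects; the vocabulary is invented here.
DOC = ET.fromstring("""<page>
  <section>
    <video src="a.webm" lang="en"/>
    <audio src="b.ogg"/>
  </section>
  <section>
    <video src="c.webm"/>
  </section>
</page>""")


def select_sources(root, path):
    """Return the src attribute of every node selected by the path expression."""
    return [node.get("src") for node in root.findall(path)]


# All video elements, at any depth.
all_videos = select_sources(DOC, ".//video")
# Only the videos carrying a given attribute value.
english = select_sources(DOC, ".//video[@lang='en']")
# A path that selects nothing in this document: no audio has a video child.
no_match = select_sources(DOC, ".//audio/video")

print(all_videos, english, no_match)
```

Evaluating a path on one document, as done here, only shows that the last expression is empty for this particular input. Static analysis addresses the harder question of whether an expression can ever select a node, for instance over all documents allowed by a given schema, without running it on any of them.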

This work on XML has been recently extended to RDF and its query language SPARQL, in order to extend to the semantic web the results achieved for the web of documents and data.

Multimedia Authoring

Before they can be processed, multimedia XML documents have to be created. A significant part of web documents are generated by programs from other documents and data (see XML processing above), but another part is created by human authors using authoring tools. For multimedia formats to be really used, it is important that authoring tools be available.

Our work in the area of multimedia authoring tools aims at developing editing techniques for creating rich multimedia documents that take advantage of the many new dimensions of multimedia formats. The challenge is to keep these tools simple enough for average web users. Methods used for static, textual documents do not work for dynamic, multimedia web resources. New approaches have to be developed and evaluated experimentally.

Research in this area is strongly connected with software development projects, with the goal of creating real tools that can be deployed on the web and that real users can use.

Augmented Environments

For the previous three objectives we have chosen Augmented Reality as an application domain that helps us focus our work in accordance with application requirements.

To recreate or augment our perception of the real world, all modalities may be involved. For visual perception, the media that come to mind are text, graphics, photographs, video (live or recorded). But augmented reality is not restricted to the visual space. The auditory space also contributes to re-creating or extending the user environment. Moreover, the visual and auditory spaces are connected: events happening in one space often have consequences in the other, and all this is synchronized.

The geographical space is important in augmented environments. The location of the users in the real or virtual world plays a key role, as do the moves they make. This involves mobility, navigation, and specific kinds of information, such as maps or points of interest (PoIs). A number of information resources required to build augmented environments are available on the web. Applications therefore not only have to capture a lot of information about the local environment of their user (mainly through various sensors), but also need to access additional information on the web.

All these features of augmented environments are very demanding for the other activities in the team. They require all kinds of multimedia information, which has to be combined. This information has to be processed efficiently and safely, often in real time, and a significant part of it also has to be created by human users.