Qgar is a research project-team of Loria.
Our group works on the conversion of weakly structured information—such as the image of a paper document or a PDF file—into "enriched" information, structured in such a way that it can be directly handled within information systems. Our research belongs to the document analysis field, and more precisely to the graphics recognition community. However, this community is well aware that recognition alone is not sufficient and will not lead to fully automated back-conversion. This is mainly due to the difficulty of fully taking into account the semantics (or domain knowledge) of the information being processed.
Our group aims at using graphics recognition methods to index and structure weakly structured graphical information, contained in graphics-rich documents such as technical documentation. In such cases, text-based retrieval (based on annotations or textual references, for instance) must be complemented with the handling of purely graphical information, such as symbols or drawing parts. We are interested in exploring the capacity of graphics recognition methods to compute useful features for indexing and information retrieval in technical documentation.
Our scientific foundations are in the domain of image analysis and pattern recognition. For many years, the main contributions of our group were in the area of algorithms and methods for image analysis and segmentation, with a specific thrust on images of graphics-intensive documents. In the last years, while keeping a regular activity in this domain, we have moved our main effort towards pattern recognition methods, especially for symbol recognition and spotting. But, of course, recognition tasks also require the prior extraction of features, using image processing and segmentation methods.
There are a number of algorithmic problems in the conversion from pixels to features. Our group has designed several algorithms and methods for binarization, vectorization and text-graphics segmentation. However, designing new methods, or variants of older methods, is not enough. We must also be able to characterize and evaluate the performances of the methods we use, to study their robustness and reliability, and to have stable implementations for them (hence the focus on software, see § ).
Vectorization consists in converting a binary image into a series of graphical primitives (vectors and circular arcs), which are a good representation of the original drafted figure. We have worked on this problem for many years and have proposed several algorithms. As most existing methods have two major drawbacks, over-segmentation and poor geometric precision, especially at the junctions between vectors, the focus of our work on vectorization has been to overcome these limitations.
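The flavour of this pixels-to-vectors step can be illustrated with the classical Ramer-Douglas-Peucker polygonal approximation, which reduces a pixel chain to a few line segments. This is a generic textbook sketch, not one of the project-team's published methods, and it ignores circular arcs.

```python
import math

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker polygonal approximation of a pixel chain:
    recursively keep the point farthest from the chord joining the
    endpoints whenever its distance exceeds epsilon."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        # Perpendicular distance of the interior point to the chord.
        d = abs(dy * (px - x1) - dx * (py - y1)) / norm
        if d > dmax:
            dmax, index = d, i
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right

# An L-shaped pixel chain collapses to its three corner points.
chain = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
approx = rdp(chain, 0.5)
```

The single parameter epsilon directly controls the trade-off between over-segmentation and geometric precision, the two weaknesses discussed above.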
Too often, feature extraction methods rely on a number of parameters which must be fine-tuned from one application to another. We aim at defining robust and stable methods, with as few parameters as possible, by studying the relationship between the values of the parameters, the data to be processed, and the results yielded. We are more particularly concerned with two segmentation problems:
Binarization can be defined as the partitioning of an image into two classes: shape and background. It is a fundamental step in document image analysis systems. Many methods are available, but few studies have looked specifically at the problems raised by documents containing both text and graphics. We are exploring various approaches, especially methods based on fuzzy logic, which do not require any manually set threshold parameter.
Text-graphics segmentation consists in separating image parts considered to be of a textual nature from other parts assumed to be graphics. Most existing methods are based on the analysis of connected components. We have improved existing methods to achieve higher-quality segmentation, including steps to retrieve the parts of the text which touch the graphics.
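The connected-component principle underlying such methods can be sketched as follows. The component labelling is standard, but the size-based text/graphics decision rule and its threshold are deliberately simplistic illustrations, not our actual segmentation method.

```python
from collections import deque

def connected_components(image):
    """4-connected component extraction from a binary image
    (list of lists, 1 = foreground); one pixel list per component."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if image[y][x] and not seen[y][x]:
                comp, queue = [], deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(comp)
    return components

def split_text_graphics(image, max_text_area=6):
    """Crude split: small components are labelled text, large ones graphics."""
    text, graphics = [], []
    for comp in connected_components(image):
        (text if len(comp) <= max_text_area else graphics).append(comp)
    return text, graphics

img = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
text, graphics = split_text_graphics(img)
```

Real methods replace the single area threshold with richer criteria (aspect ratio, alignment, stroke density) and, as noted above, need extra steps for text that touches the graphics.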
Performance evaluation is a major concern in document analysis, and more generally in image processing, pattern recognition and computer vision. There are different approaches to the problem. One of them considers the method to be evaluated as a completely separate module, which is fed with synthetic or real data; evaluation then consists in comparing the result of the module with some ground truth. Another approach consists in evaluating the performances of a segmentation method through the observed quality of the recognition steps using the features yielded by the segmentation. This is sometimes called goal-oriented performance evaluation.
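For the first, ground-truth-based approach, a minimal comparison routine might look as follows. The intersection-over-union criterion and the greedy one-to-one matching are common illustrative choices, not a metric prescribed by any particular campaign.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(detections, ground_truth, threshold=0.5):
    """Greedy one-to-one matching of detections against ground truth."""
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:
        best = max(unmatched, key=lambda g: iou(det, g), default=None)
        if best is not None and iou(det, best) >= threshold:
            unmatched.remove(best)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
det = [(1, 1, 10, 10), (50, 50, 60, 60)]
p, r = precision_recall(det, gt)  # one true positive out of two detections
```

As the text notes, matters get much harder when the output is not a simple detected/missed label, as in the evaluation of vectorization.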
We are actively involved in the organization of performance evaluation campaigns for symbol recognition, at national and international levels. Our project-team is in particular the leader of the Épeires project (cf. § ), affiliated with the Techno-Vision program. This project aims at providing a complete environment for such performance evaluations, for our own needs—as organizers—as well as for the needs of any team working on symbol recognition or using recognition methods and algorithms.
Of course, a number of open scientific questions are related to the performance evaluation of graphics recognition methods. This includes intricate problems such as defining simple and unbiased metrics and matching procedures between the ground truth and the output of recognition methods, when the answer is more complex than a simple "recognized" or "not recognized" label (a good example is the evaluation of a vectorization method). Another potential problem is the generation of large sets of training or benchmarking data by using image degradation models—these models have to be realistic and as interchangeable as possible with real data.
Symbol recognition consists in localizing and identifying the symbols present in a graphical document. Symbols are natural features to use in indexing and retrieval applications. Whereas symbol recognition has long-standing and solid foundations, and has proved to be mature for character recognition, for instance, complex symbols with large variations are still a major problem. Existing recognition methods suffer from a number of weaknesses which make them difficult to use in such a context and which we aim at dealing with:
In querying and browsing applications, it is often impossible to work with a database of reference symbols known in advance. It is more often the case that the user delineates an arbitrary symbol in a drawing and queries for similar symbols in the set of available documents. We therefore need to work on methods able to recognize, or at least spot, symbols "on the fly", without prior learning or precompilation of models.
We are interested in dealing with cases where prior segmentation of the image is difficult or even impossible, i.e. in designing segmentation-free recognition methods, or methods which perform segmentation while they recognize.
There are a number of efficient methods to recognize a symbol among 10 or 20 different models, even if the symbol to be recognized is distorted by noise or by other touching graphics, for example. However, these methods do not scale well to the recognition of a number of symbols an order of magnitude larger. There are both computational complexity issues and open questions about the discrimination power of the methods chosen for recognition.
Signatures are often used for indexing and retrieval purposes, but most work has concentrated on text-based or image-based signatures. We think that there is also room for graphics-specific signatures, to achieve an efficient localization and recognition of symbols, and we currently work in two directions:
Quick and robust symbol localization through symbol-based signatures: we propose to combine a feature descriptor method with a structural representation of symbols. We define a robust structural representation based on key points, which allows a quick localization of candidate symbols within documents. Each candidate is then recognized using an adaptation of the Radon transform, which preserves the main geometric transformations usually required for the recognition of symbols. We process directly the grey level document image, in order to improve the recognition steps.
Vector-based signatures: it is not always necessary to work directly on the raw image data, as vectorization can yield a set of vector data. In many cases, we actually retrieve vector data from CAD files or similar electronic representations. It is therefore interesting to also use signatures computed directly on these vector data.
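As an illustration of the first direction, a minimal Radon-style signature can be computed as projection profiles of the foreground pixels, one profile per orientation. This is only a generic sketch (the numbers of angles and bins, and the spatial extent, are arbitrary); it is not the adapted transform referred to above.

```python
import math

def radon_signature(pixels, angles=8, bins=16, extent=16.0):
    """Projection profiles of a set of foreground pixel coordinates:
    for each angle, a histogram of the pixels' signed distances to the
    line through the origin with that orientation."""
    profiles = []
    for k in range(angles):
        theta = math.pi * k / angles
        c, s = math.cos(theta), math.sin(theta)
        hist = [0] * bins
        for (x, y) in pixels:
            rho = x * c + y * s           # projection onto the normal
            b = int((rho + extent) / (2 * extent) * bins)
            hist[min(max(b, 0), bins - 1)] += 1
        profiles.append(hist)
    return profiles

square = [(x, y) for x in range(4) for y in range(4)]
sig = radon_signature(square)
# Every profile redistributes the same mass: the 16 foreground pixels.
```

A rotation of the shape cyclically shifts the set of profiles, which is one reason Radon-style descriptors behave well under the geometric transformations mentioned above.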
Ultimately, our idea is to be able to design features which can provide signatures for efficient symbol spotting and the identification of broad hypotheses. In many cases, this will probably be enough for good retrieval performance. If finer recognition is needed, the preliminary use of such signatures can eliminate a number of symbol hypotheses and help segment out the candidate region. More classical recognition methods can then be applied more efficiently.
In the presence of a large number of symbols, both signatures and structural recognition methods may not be sufficient to discriminate. They could then be used as pre-classification steps, followed by recognition through usual classification methods within the family identified by the signature.
Our main application domain is the processing and analysis of documents—i.e. information produced by humans for communicating with other humans—which convey a huge amount of information in very "poor" formats: paper documents, or low-level, poorly structured digital formats such as PostScript, PDF or DXF.
We are more specifically interested in graphics-rich documents, typically technical documentation containing text, but also a lot of graphics. The usual text-based indexing and retrieval methods are still of interest, but we also need additional ways of accessing the information conveyed by the documents: recurring symbols, connections between textual descriptions and drawing parts, etc. Within this general application area, we deal with two major kinds of document analysis applications:
Specific documentation referring to a well-known framework of technical knowledge: a good knowledge of the kind of information we want to extract from the documents is usually available. In the case of symbol recognition, we have models of the symbols to be recognized. The electrical wiring diagrams in aircraft, such as those we are dealing with in the European FRESH project (cf. § ), are a typical example.
Open documentation, where we make few or even no strong assumptions about the kind of information we will have to deal with: this is typically the case with applications for browsing large sets of heterogeneous documents, with the user providing "on the fly" information about the symbols or structures he is looking for. We are currently working with France Télécom R&D on this topic (cf. § ).
When starting a new application, a research team working in a multi-disciplinary domain like graphics recognition must be able to reuse all or part of the software implemented for previous work, as well as the collected experience. For several years, the Qgar project-team has devoted much effort to the construction of a software environment including:
QgarLib, a library of C++ classes implementing basic graphics analysis and recognition methods,
QgarApps, an applicative layer, including high-level applications (binarization, edge detection, text-graphics separation, thick-thin separation, vectorization, etc.),
QgarGUI, a graphical interface to design and run applications, providing data manipulation and display capabilities.
Application management is plugin-based. Each executable binary file is paired with an XML description file which is parsed when the user interface is launched: the application is then dynamically integrated into the menus of the interface, and the dialog boxes to access the documentation and run the application are dynamically generated. The description thus allows any application to be easily coupled with a remote system using an approach of the same kind. Conversely, as the integration (or removal) of an application does not imply any modification of the user interface, it is easy to install remote applications, provided by partners for testing purposes, for example. This is particularly useful for comparing different methods performing the same task, in the context of performance evaluation, a topic which is part of our current research work, as previously mentioned.
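The mechanism can be sketched as follows. The element and attribute names in this descriptor are invented for illustration; the actual QgarApps description schema is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor: the real QgarApps schema may differ.
DESCRIPTOR = """\
<application name="binarize">
  <documentation>Global binarization of a greyscale image.</documentation>
  <parameter name="in"  type="image" required="true"/>
  <parameter name="out" type="image" required="true"/>
  <parameter name="threshold" type="integer" required="false" default="128"/>
</application>
"""

def load_descriptor(xml_text):
    """Parse an application descriptor into a plain dictionary, from which
    a GUI could generate menu entries and dialog boxes dynamically."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "doc": root.findtext("documentation", default="").strip(),
        "parameters": [
            {
                "name": p.get("name"),
                "type": p.get("type"),
                "required": p.get("required") == "true",
                "default": p.get("default"),
            }
            for p in root.findall("parameter")
        ],
    }

app = load_descriptor(DESCRIPTOR)
```

Because the interface only ever sees such descriptors, adding or removing an application never requires touching the interface code, which is the property exploited above for plugging in partners' methods.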
The whole system is written in C++ and includes about 100,000 lines of code, including unit-testing procedures. Particular attention has been paid to the support of "standard" formats (PBM+, DXF, SVG), high-quality documentation, configuration facilities (using autoconf/automake), and support of the Unix/Linux and Windows operating systems.
The Qgar system is registered with the French agency for software protection (APP) and may be freely downloaded from its web site ( http://www.qgar.org). From January to September 2005, the source code was downloaded 36 times a month on average, and the documentation was browsed more than 10 times a day, excluding robots (we estimate that 90% of the visits come from indexing robots). The software has also been used within the context of several cooperation projects.
Until now, a great part of the work on the Qgar software has been devoted to its architecture. As the system reached a stable and mature state at the end of 2004, we currently focus on its content, that is to say the QgarLib library and the QgarApps applications: reengineering of the high-level tools, which had never been revised until now, and, above all, integration of the new methods and tools resulting from ongoing work.
The QgarLib library has been completely reorganized, in order to get rid of recurrent technical problems, like inclusion loops, once and for all. An important effort has also been made to conform to standard C++ notation and data layout.
The set of unit-testing procedures has been completed, so that reliable tests are available for all the basic tools and the correctness of the library can be guaranteed in case of any minor or major modification of a module.
Some high-level methods, like Trier and Taxt's binarization, for example, have been refactored to increase their efficiency.
An auxiliary library, organized like QgarLib and based on the same programming and documentation standards, has been set up. It stores the draft versions of new methods and applications designed by the members of the project-team, to make their reengineering easier before they are integrated into the Qgar software.
All this led to the release of a new version of the software, Qgar 2.2, on November 2, 2005.
In the context of the Épeires performance evaluation project (cf. § ), we are currently building a complete information system, able to manage all the data related to the performance evaluation of symbol recognition methods. It includes the management of the data themselves, but also of their classification, of the automatic degradation processes, of the participants' profiles, of the available tests, and of the storage of results. We are also developing a collaborative ground-truth management software, used to create, review and validate the labelling of test images with respect to the reference symbols. The architecture of this software is based on a client/server model, developed in Java and PHP and connected to the information system. The software is hosted on gforge.inria.fr.
We have also developed a web site for the project, where all resources will be freely available to the scientific community. Users will be able to manage their methods, to generate evaluation tests for specific purposes or to download existing ones, to send their results, and to analyze them. The same functionalities will obviously be provided to the organizers (the Épeires consortium), as well as more administrative ones.
This raster-to-vector conversion software (cf. § ) won the first prize in the 2005 International Arc Detection Contest. It was initiated during a former collaboration with FS2i under a Cifre contract. Discussions are ongoing with FS2i about its further exploitation.
In the area of document image segmentation, we have proposed a new algorithm for binarization—i.e. separating foreground from background—of scanned document images. The method is based on hierarchical decomposition. The document is roughly binarized and the resulting image is then converted into a quadtree. For each area of the image, corresponding to a branch of the quadtree, a local threshold is computed and applied to all the pixels belonging to the region under consideration.
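A much-simplified sketch of this hierarchical idea is given below. The splitting criterion (grey-level spread) and the local threshold (mid-range of the leaf) are illustrative stand-ins, not the published method.

```python
def binarize_quadtree(img, y0, x0, y1, x1, out, min_size=2, spread=60):
    """Quadtree binarization sketch: if a region's grey levels are spread
    out, split it into four quadrants; otherwise threshold the whole
    region at a locally computed value."""
    region = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    lo, hi = min(region), max(region)
    h, w = y1 - y0, x1 - x0
    if hi - lo > spread and h > min_size and w > min_size:
        my, mx = y0 + h // 2, x0 + w // 2
        for (a, b, c, d) in ((y0, x0, my, mx), (y0, mx, my, x1),
                             (my, x0, y1, mx), (my, mx, y1, x1)):
            binarize_quadtree(img, a, b, c, d, out, min_size, spread)
    else:
        t = (lo + hi) / 2          # local threshold for this leaf
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = 1 if img[y][x] < t else 0   # 1 = ink

# Dark text (20) on a background whose brightness drifts from 200 to 120:
# a single global threshold would be fragile, local ones are not.
img = [[200, 200, 20, 200], [200, 20, 20, 200],
       [120, 120, 20, 120], [120, 20, 20, 120]]
out = [[0] * 4 for _ in range(4)]
binarize_quadtree(img, 0, 0, 4, 4, out)
```

The point of the hierarchy is visible even at this toy scale: each leaf gets a threshold adapted to its own background level.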
A new method of raster-to-vector conversion was designed during the PhD thesis of Xavier Hilaire and was extended during his post-doc period. This method consists in segmenting the skeleton using a technique based on random sampling, and then in simplifying the result. It is robust, with a best-case noise bound of 50% reached for indefinitely long primitives. The accurate estimation of the parameters of the recognized vectors is enabled by explicitly computing their feasibility domains. The method won the first prize at the 2005 International Contest on Arc Detection.
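The random-sampling idea at the heart of this skeleton segmentation can be illustrated with plain RANSAC line fitting; the published method's actual sampling scheme and primitive models (including arcs) are richer than this sketch.

```python
import random

def ransac_line(points, iterations=200, tol=0.8, seed=0):
    """RANSAC sketch: repeatedly pick two points, fit the line through
    them, and keep the line supported by the most inliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # Implicit line a*x + b*y + c = 0 through the two samples.
        a, b = y2 - y1, x1 - x2
        c = -(a * x1 + b * y1)
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c) / norm <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Ten collinear skeleton pixels plus three spurious ones.
pts = [(i, 2 * i) for i in range(10)] + [(0, 9), (5, 1), (9, 3)]
support = ransac_line(pts)
```

Because the support of the correct primitive dominates, a few spurious skeleton pixels do not deflect the fit, which is the robustness property the paragraph refers to.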
In most recognition problems which have been satisfactorily solved, the data are matched with a set of known models, provided that it has been possible to perform some learning or model description beforehand. We are interested in what we have called "on-the-fly symbol recognition". We might also call it "dynamic" or "unsupervised" recognition: the user is given the possibility to delineate a region of interest in a document, calling it a symbol, and to simply ask the system to localize other instances of this symbol. We have named this general problem "symbol spotting", as it has similarities with other problems in document image analysis, such as word spotting, table spotting, etc.
This problem is probably more related to content-based image retrieval problems, with the difference that we assume that the user looks for graphical constructions which can be categorized as symbols, and hence that features, signatures, matching and classification methods used in symbol recognition are probably appropriate also in this context.
We currently have two ongoing PhD theses where various aspects of this problem are addressed. In the work of Daniel Zuwala, we are studying a method to spot potential symbols without using any a priori knowledge. We use a structural representation of the document, which is divided into key points. Candidate symbols are located thanks to a density measure, and photometric information is used to discard or accept symbols with respect to a model query. In the work of Jan Rendek, in cooperation with France Télécom R&D, we are exploring the combination of feature-based classification and relevance feedback in the context of freehand recognition without any known a priori model.
Within the context of user-guided on-the-fly spotting and recognition, we obtained first promising results using a combination of nearest-neighbour grouping and user feedback. We have shown that user feedback drastically enhances the convergence of the system towards a high recognition rate. Even with very few user-labelled symbols (e.g. 10), precision increases by 30% at a recall of 50%. These results were obtained on a reduced set of examples, using Zernike moment descriptors only. Current work focuses on extending the method to other kinds of symbol description, and on better user guidance to increase the information return, by developing statistical analyses of pertinence measures and by increasing the size of the data sets. Part of this work is currently being conducted in collaboration with the Computer Vision Center at the Universitat Autònoma de Barcelona (cf. § ).
Ultimately, we aim at providing mature graphics recognition methods which can become part of multimodal indexing and retrieval environments allowing the user to browse large multimedia and document databases, on criteria such as text, graphics, images, video, etc., and combinations of different criteria (text/graphics associations, for instance). The cooperations we are starting with the Texmex and Lear project-teams (cf. § ) are steps in this direction.
We have defined a new method for combining shape descriptors, based on a study of the behavior of a learning set. Each descriptor is applied to several clusters of objects or symbols. For each cluster and each descriptor, an appropriate mapping is directly carried out from the learning database. Then, existing conflicts are assessed and integrated into a map. Such a combination of descriptors improves recognition from real data. We have also proposed a new approach to automatically extract an appropriate subset of shape descriptors dedicated to a given application. A model based on the Choquet integral and on Shapley values has been designed to extract a subset of descriptors associated with each cluster and related to the classes of shapes. Experimental studies using real databases have demonstrated the usefulness of such an approach.
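The discrete Choquet integral at the core of such a model can be computed as follows. The fuzzy measure and the scores are made-up illustrations (an additive measure is chosen so the result can be checked against a weighted mean), and the Shapley-value-based selection is not shown.

```python
from itertools import combinations

def choquet(values, measure):
    """Discrete Choquet integral of `values` (criterion -> score) with
    respect to a fuzzy measure `measure` (frozenset of criteria -> weight,
    with measure of the empty set = 0 and of the full set = 1)."""
    items = sorted(values.items(), key=lambda kv: kv[1])  # ascending scores
    total, previous = 0.0, 0.0
    remaining = set(values)
    for criterion, score in items:
        total += (score - previous) * measure[frozenset(remaining)]
        previous = score
        remaining.remove(criterion)
    return total

# Additive measure -> the Choquet integral reduces to a weighted mean.
weights = {"shape": 0.5, "texture": 0.3, "context": 0.2}
measure = {frozenset(s): sum(weights[c] for c in s)
           for k in range(4)
           for s in combinations(weights, k)}
scores = {"shape": 0.9, "texture": 0.4, "context": 0.6}
value = choquet(scores, measure)  # 0.5*0.9 + 0.3*0.4 + 0.2*0.6 = 0.69
```

The interest of the Choquet integral over a plain weighted mean is precisely that the measure need not be additive: it can reward or penalize coalitions of descriptors, which is what makes the conflict assessment above expressible.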
We are also studying a new combination rule for classifiers, based on feature vectors. The method is inspired by boosting algorithms, which combine weak classifiers to obtain a new, robust one. This new point of view allows us to do away with the constraining independence hypothesis assumed by existing combination rules.
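The boosting-inspired principle, weighting each classifier's vote by its observed error rate, can be sketched as follows; the weighting formula is the standard AdaBoost one, used here purely as an illustration rather than the new rule under study.

```python
import math

def classifier_weight(error, eps=1e-9):
    """AdaBoost-style weight: low-error classifiers dominate the vote."""
    error = min(max(error, eps), 1 - eps)
    return 0.5 * math.log((1 - error) / error)

def combine(predictions, errors):
    """Weighted vote of binary classifiers (labels in {-1, +1})."""
    score = sum(classifier_weight(e) * p for p, e in zip(predictions, errors))
    return 1 if score >= 0 else -1

# Two weak classifiers (30% error) are outvoted by a strong one (5% error).
errors = [0.3, 0.3, 0.05]
decision = combine([1, 1, -1], errors)  # the strong classifier wins the vote
```

Note that the weights depend only on each classifier's individual error, so nothing in the scheme assumes the classifiers are independent.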
Performance evaluation of recognition methods has gained increasing interest in recent years. A major contribution is the organization of two international contests on symbol recognition methods. Since the end of 2004, our project-team has also been the leader of the Épeires project, affiliated with the Techno-Vision campaign. The aim of this project is to build a complete environment for the performance evaluation of symbol recognition and localization methods. Our project-team is currently developing this environment and is working with the other Épeires partners to define the metrics and protocols required for the evaluation campaigns. This also includes the development of an open source collaborative ground-truth management software for dataset labelling purposes.
We have a long-lasting partnership with France Télécom R&D on various issues within document image analysis. From 2001 to 2003, we were members of the Rntl DocMining project led by FT R&D. Since the end of 2004, we have had a new partnership with FT R&D on the topic of on-the-fly symbol recognition (§ ). In addition to the contract itself, FT R&D pays the salary of Jan Rendek, a PhD student, under a Cifre contract.
Algo'tech Informatique is a French company based in Biarritz, which develops CAD solutions for electrical design. They have a vectorization and document analysis system for electrical wiring drawings. We collaborate with them within the European Strep project Fresh. The partners of the project are: Algo'tech Informatique (France), Estia (France), Euro Inter (France), Eads Sogerma Drawing (France), Ceit (Spain), Rector (Poland), Tekever (Portugal), and Zenon (Greece).
The associated partners of this research project are the universities of La Rochelle (L3i laboratory), Rouen (Psi laboratory), Tours (LI laboratory), Lyon (Liris), and Rennes (Irisa), and the Qgar project-team.
Techno-Vision is a French national program with the purpose of funding projects related to the performance evaluation of vision algorithms. We are the leaders of the Épeires project. The funded partners are the universities of La Rochelle (L3i laboratory), Rouen (Psi laboratory), and Tours (LI laboratory), and the Algo'Tech company. The non-funded partners are the City University of Hong Kong, the Dag team of the Computer Vision Center of Barcelona, and the One laboratory of France Télécom R&D.
In 2005, we continued our long-lasting scientific cooperation with the Computer Vision Center at Universitat Autònoma de Barcelona. This included joint PhD supervisions; student, regular researcher and post-doc exchanges; collaboration in the Techno-Vision Épeires project; and the Inria associated team SymbolRec.
Jan Rendek, a PhD student of the Qgar project-team, spent two months in Barcelona, from mid-October to mid-December, under the supervision of Josep Lladós and Bart Lamiroy, in collaboration with Marçal Rusiñol. The main goal of his stay was to overcome some of the segmentation shortcomings of the on-the-fly symbol recognition method he is currently developing for his PhD thesis. This method is based on the Cvc symbol spotting technique, which will be coupled with nearest-neighbour-based relevance feedback, in order to reduce the number of false positives.
During the same period, Daniel Zuwala, another PhD student of the Qgar project-team, also spent two months in Barcelona, under the supervision of Ernest Valveny and Salvatore Tabbone. The main goal of his stay was to improve the spotting method he is developing for his PhD thesis, so that a large variety of symbols can be spotted, even in the context of highly connected documents.
Joan Mas Romeu spent November at Loria, in the Qgar project-team, under the supervision of Bart Lamiroy. His work mainly concerned the integration of a user feedback loop in the construction of syntactic model descriptions. A grammar which correctly identifies a priori unmodelled objects is automatically inferred from a known set of graphical primitives and binary relationships, and from positive as well as negative examples iteratively provided by the user.
Oriol Ramos Terrades also spent November in the Qgar project-team, under the supervision of Salvatore Tabbone. He studied several ways of using boosting algorithms for combining and selecting feature vectors within a supervised classification framework. He obtained promising results which should be published soon.
In 2004, we initiated a cooperation with Liu Wenyin's group at the City University of Hong Kong on the performance evaluation of symbol recognition methods and on the definition of a large database of reference symbols for future international contests. This cooperation includes a collaboration within the Techno-Vision Épeires project (§ ), and is expected to lead to a joint PhD supervision on symbol recognition.
Xavier Hilaire is continuing his cooperation with Prof. B. John Oommen, from Ottawa University, Canada, on the definition of a new scheme to globally match a given vector representation of an image against its ground truth, for the purpose of performance evaluation. The definition of the scheme revealed an interesting problem, not yet investigated in the literature, and conjectured to be NP-hard. A heuristic-based solution has been implemented and tested, and has proved to be powerful. We wish to carry on this cooperation, since the necessary formal proofs must now be established in order to publish the method.
We initiated contacts with Ifi (Institut de la Francophonie pour l'Informatique) in Hanoi, Vietnam, through the research stay of a Master's student, Nguyen Thi Oanh, who is now starting a PhD under joint supervision.
Karl Tombre is editor in chief of the International Journal on Document Analysis and Recognition (Ijdar), and a member of the editorial boards of Elcvia, Machine Graphics & Vision, and Arima.
Karl Tombre is president of the French association for pattern recognition and image processing (Afrif) and first vice-president of the International Association for Pattern Recognition (Iapr).
Bart Lamiroy is an elected member of the administration council of Inpl, and is the elected representative of Inpl at the Ciril administration council.
Bart Lamiroy is a member of the Comité de suivi de l'espace transfert of the Loria laboratory. This committee follows and evaluates spin-offs and start-ups created by Loria members.
Gérald Masini is responsible for the commission of computing facility users (Comin) of Loria.
Karl Tombre heads the Department for Computer Science of the Iaem Doctoral School, common to the four universities in Lorraine.
He is an elected member of the studies council (Cevu) of Inpl.
At Loria, we have worked with Isabelle Debled-Rennesson, from the Adage team, on defining a fast polygonal approximation of digital curves.
Within its activities towards the development of a framework allowing a more semantic level of image querying, the Qgar project-team has established informal exchanges with the TexMex project-team at Irisa-Inria Rennes. More precisely, we have started investigating the possibilities of combining the TexMex text analysis and mining know-how and the Qgar document image analysis expertise, in order to design automated processes which can analyze the text-graphics relationship on as high a semantic level as possible. Some promising first joint contacts have been established with potential industrial partners.
We also regularly work with the Imadoc group at Irisa, especially within the Madonne project (§ ) on heritage documents. At the end of 2005, we also initiated contacts with the Lear project-team at Inria Rhône-Alpes, on a pending joint industrial collaboration on indexing and retrieving information from large multimedia databases.
Most members of the Qgar project-team are University faculty members and as such have a statutory teaching service in their respective universities. In addition, several of them have major organizational and administrative responsibilities. The faculty members have teaching positions at various places:
Suzanne Collin, at Esial (engineering school, master of engineering level).
Philippe Dosch, at Université Nancy 2, at bachelor level.
He is the director of studies for the bachelor degree Administration of open source systems, networks and applications. He is also a member of the recruitment committee in computer science (Commission de Spécialistes, 27e section) at Université Nancy 2.
Bart Lamiroy, at École des Mines de Nancy/Institut National Polytechnique de Lorraine (engineering school, master of engineering level).
He is responsible for one option at the Department of Computer Science, and is the technical coordinator of the Ipiso specialized degree. He is also a member of the recruitment committee in computer science (Commission de Spécialistes, 27e section) at Institut National Polytechnique de Lorraine.
Salvatore Tabbone, at Université Nancy 2, at bachelor and master level.
He is head of the Department of Computer Science at Université Nancy 2.
Karl Tombre, at École des Mines de Nancy/Institut National Polytechnique de Lorraine (engineering school, master of engineering level).
He is head of the Department of Computer Science at École des Mines de Nancy and also heads the recruitment committee in computer science (Commission de Spécialistes, 27e section) at Institut National Polytechnique de Lorraine.
Laurent Wendling, at Esial (engineering school, master of engineering level).
He is director of studies at Esial.
Philippe Dosch was a member of the organization committee for the symbol recognition contest at Grec'05 (Hong Kong).
Salvatore Tabbone was/is a member of the program committees of Acm-Sac'05 (Santa Fe, New Mexico), Orasis'05 (Clermont-Ferrand, France), Acidca-Icmi'05 (Tozeur, Tunisia), Iccr'05 (India), Icpr'06 (Hong Kong), Cifed'06 (Fribourg, Switzerland), and Acm-Sac'06 (Dijon, France).
Karl Tombre was/is a member of the program committees of Icdar'05 (Seoul, Korea), Grec'05 (Hong Kong), Mva'05 (Tsukuba, Japan), Vis'05 (Amsterdam, The Netherlands), IbPRIA'05 (Estoril, Portugal), Cores'05 (Rydzyna Castle, Poland), X Ciarp (Havana, Cuba), Icpr'06 (Hong Kong), Das'06 (Nelson, New Zealand), Sspr'06 (Hong Kong), Rfia'06 (Tours, France), Cifed'06 (Fribourg, Switzerland), Cari'06 (Cotonou, Bénin), Icdar'07 (Curitiba, Brazil), and Mva'07 (Tokyo, Japan).
Laurent Wendling was a member of the program committees of Orasis'05 (Clermont-Ferrand, France) and Acidca-Icmi'05 (Tozeur, Tunisia).