The goals of the DOLPHIN project are the following:

**Analysis of the structure of a MOP:** The analysis of the structure of the Pareto front by means of different approaches (statistical indicators, metamodeling, etc.) allows the design of efficient and robust hybrid optimization techniques. In general, the current theory does not allow the complete analysis of optimization algorithms. Several questions remain unanswered: i) why is a given method efficient? ii) why are certain instances difficult to solve? Some work is needed to guide the user in the design of efficient methods.

The NFL (No Free Lunch) theorem shows that any two optimization methods have the same performance when averaged over the set of all optimization problems under a uniform distribution. It is therefore crucial to make some hypotheses about the studied problem. This may be done in two steps:

analyzing the target problem to identify its landscape properties,

including this knowledge in the proposed optimization method.

Our aim in this project is to address these questions for the multi-objective case. Another point considered is the performance evaluation of multi-objective optimization methods. We are also working on approximation algorithms with performance guarantees and on the convergence properties of stochastic algorithms.

**Cooperation of optimization methods (metaheuristics and/or exact methods):**

The hybridization of optimization methods allows the cooperation of different complementary methods. For instance, the cooperation between a metaheuristic and an exact method makes it possible to exploit the intensification ability of the exact method, which finds the best solution(s) in a sub-space, and the diversification ability of the metaheuristic, which reduces the search space to explore.

In this context, different types of cooperation may be proposed. Those approaches are under study in the project and we are applying them to different generic MOPs (flow-shop scheduling problem, vehicle routing problem, covering tour problem, access network design, and the association rule problem in data mining).

**Parallel optimization methods:** Parallel and distributed computing may be considered as a tool to speed up the search for solving large MOPs and/or to improve the robustness of a given method. Following this objective, we design and implement parallel metaheuristics (evolutionary algorithms, tabu search) and parallel exact methods (branch-and-bound, branch-and-cut) for solving different large MOPs. Moreover, the joint use of parallelism and cooperation allows the improvement of the quality of the obtained solutions.

**Framework for parallel and distributed hybrid metaheuristics:** Our team contributes to the development of an open source framework for metaheuristics, named ParadisEO (PARAllel and DIStributed Evolving Objects). Our contribution in this project is the extension of the EO (Evolving Objects) framework.

In this project, our goal is the efficient design and implementation of this framework on different types of parallel and distributed hardware platforms: clusters of workstations (COW), networks of workstations (NOW) and GRID computing platforms, using the different suited programming environments (MPI, Condor, Globus, PThreads). The coupling with well-known frameworks for exact methods (such as COIN) will also be considered. The exact methods for MOPs developed in this project will be integrated into those software frameworks.

The experimentation of our framework by users and applications outside the DOLPHIN project is also planned, in order to validate the design and implementation choices of ParadisEO.

**Validation:** The designed approaches are validated on generic and real-life MOPs, such as:

Scheduling problems: Flow-shop scheduling problem;

Routing problems: Vehicle routing problem (VRP) and covering tour problem (CTP);

Mobile telecommunications: Design of mobile telecommunications networks (contract with France Telecom R&D) and design of access networks (contract with Mobinets);

Genomics: Association rule discovery (data mining task) for mining genomic data, protein identification, docking and conformational sampling of molecules.

Engineering design problems: Design of polymers.

Some benchmarks and their associated optimal or best known Pareto fronts have been defined and made available on the Web. We are also developing an open source software, named GUIMOO, dedicated to the visualization and performance analysis of Pareto fronts.

The analysis of the structures (landscapes) of MOPs and the performance assessment of resolution methods are significant topics in the design of optimization methods. The effectiveness of metaheuristics depends on the properties of the landscape (roughness, convexity, etc.). The notion of landscape was first described in the study of species evolution; it has since been used to analyze combinatorial optimization problems.

The landscape is defined by a neighborhood operator and can be represented by a graph G = (V, E). The vertices represent the solutions of the problem, and an edge (e_1, e_2) exists if the solution e_2 can be obtained by an application of the neighborhood operator to the solution e_1. Then, considering this graph as the ground floor, we elevate each solution to an altitude equal to its cost. We obtain a surface, or landscape, made of peaks, valleys, plateaus, cliffs, etc. The difficulty lies in obtaining a realistic view of this landscape.
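As a concrete illustration, such a landscape graph can be sketched for a small permutation problem with a swap neighborhood. The neighborhood operator, the toy cost function and all names below are illustrative choices, not part of the project:

```python
from itertools import permutations, combinations

def swap_neighbors(perm):
    """All permutations reachable by swapping two positions (the neighborhood operator)."""
    perm = list(perm)
    for i, j in combinations(range(len(perm)), 2):
        n = perm[:]
        n[i], n[j] = n[j], n[i]
        yield tuple(n)

def landscape_graph(n, cost):
    """Vertices = all permutations of size n; an edge (e1, e2) exists iff e2 is a
    swap-neighbor of e1. Each vertex is then 'elevated' to an altitude equal to its cost."""
    V = list(permutations(range(n)))
    E = {(e1, e2) for e1 in V for e2 in swap_neighbors(e1)}
    height = {v: cost(v) for v in V}
    return V, E, height

# Toy cost: number of misplaced elements (a hypothetical objective, for illustration).
V, E, height = landscape_graph(3, lambda p: sum(i != v for i, v in enumerate(p)))
print(len(V))               # 6 solutions
print(len(E))               # 18 directed edges (each of the 6 vertices has 3 neighbors)
print(min(height.values())) # 0: the identity permutation is the global optimum
```

Even on this tiny instance, the `height` map makes notions such as peaks, plateaus and basins concrete: a local optimum is simply a vertex lower than all of its neighbors in the graph.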

Like others, we believe that the main point of interest in the domain of combinatorial optimization is not the design of the best algorithm for a large number of problems, but the search for the method best adapted to an instance, or a set of instances, of a given problem. Therefore, we are convinced that no ideal metaheuristic, designed as a black box, can exist.

Indeed, the first studies carried out in our research group on the landscapes of different mono-objective combinatorial optimization problems (traveling salesman problem, quadratic assignment problem) have shown not only that different problems correspond to different structures, but also that different instances of the same problem correspond to different structures.

For instance, we have carried out a statistical study of the landscapes of the quadratic assignment problem. Some indicators characterizing the landscape of an instance have been proposed, and a taxonomy of the instances comprising three classes has been deduced. Hence it is not enough to adapt the method to the problem under study: it is necessary to specialize it according to the type of instance treated.

So, in its studies of mono-objective problems, the OPAC research group has introduced into the resolution methods some information about the problem to be solved. The landscapes of some combinatorial problems have been studied in order to investigate the intrinsic nature of their instances. The resulting information has been inserted into an optimization strategy and allowed the design of efficient and robust hybrid methods. The extension of these studies to multi-objective problems is a part of the DOLPHIN project.

The DOLPHIN project is also interested in the performance assessment of multi-objective optimization methods. Nowadays, statistical techniques developed for mono-objective problems are adapted to the multi-objective case. Nevertheless, specific tools are necessary in many cases. For example, comparing two algorithms is relatively easy in the mono-objective case: we compare the quality of the best solution obtained in a fixed time, or the time needed to obtain a solution of a certain quality. The same idea cannot be immediately transposed to the case where the output of an algorithm is a set of solutions having several quality measures, rather than a single solution.

Various indicators have been proposed in the literature for evaluating the performance of multi-objective optimization methods, but no indicator seems to outperform the others. The OPAC research group has proposed two indicators: the *contribution* and the *entropy*. The contribution evaluates the supply, in terms of Pareto-optimal solutions, of one front compared to another. The entropy gives an idea of the diversity of the solutions found. These two metrics are used to compare the different metaheuristics developed in the research group, for example in the resolution of the bi-objective flow-shop problem, and also to show the contribution of the various mechanisms introduced in these metaheuristics.

These metrics and others (generational distance, spacing, etc.) are integrated in the open source software GUIMOO developed within the framework of the DOLPHIN project. This software is dedicated to the visualization of landscapes (2D and 3D) for multi-objective optimization and to performance analysis through dedicated metrics.
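To make the idea of a contribution-style indicator concrete, here is a simplified, dominance-based sketch. It is an illustrative stand-in, not the exact definition published by the OPAC group; the function names and toy fronts are assumptions:

```python
def dominates(a, b):
    """Pareto dominance for minimization: a dominates b if it is no worse on
    every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Non-dominated subset of a set of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def contribution(front_a, front_b):
    """Fraction of the merged non-dominated front supplied by front A
    (a simplified, illustrative version of a contribution indicator)."""
    merged = pareto_front(list(set(front_a) | set(front_b)))
    from_a = [p for p in merged if p in front_a]
    return len(from_a) / len(merged)

A = [(1, 5), (3, 3), (5, 1)]
B = [(2, 4), (4, 4)]
# (4, 4) is dominated by (3, 3); the merged front is {(1,5), (3,3), (5,1), (2,4)}.
print(contribution(A, B))  # 0.75: A supplies 3 of the 4 merged non-dominated points
```

A symmetric call `contribution(B, A)` quantifies the supply of the other algorithm, which is the kind of pairwise comparison such indicators are designed for.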

One of the main issues in the DOLPHIN project is the study of the landscape of multi-objective problems and the performance assessment of multi-objective optimization methods to design efficient and robust resolution methods:

*Landscape study:* The goal here is to extend the study of landscapes from mono-objective combinatorial optimization problems to multi-objective problems, in order to determine the structure of the Pareto frontier and to integrate this knowledge about the problem structure into the design of resolution methods.

This study has been initiated for the bi-objective flow-shop problem. We have studied the convexity of the obtained frontiers in order to show the interest of our Pareto approach compared to an aggregation approach, which can only obtain the Pareto solutions situated on the convex hull of the Pareto front (the supported solutions).
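The limitation of the aggregation approach can be illustrated on a toy bi-objective front (a hypothetical example, not project data): a non-supported Pareto solution is never optimal for any weighted-sum aggregation.

```python
def weighted_sum_optimum(points, w):
    """Best point under the aggregation w*f1 + (1-w)*f2 (both objectives minimized)."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])

# Hypothetical front: (3, 4) is Pareto-optimal but lies above the line joining
# (1, 5) and (5, 1), i.e. it is a non-supported solution.
front = [(1, 5), (3, 4), (5, 1)]

# Sweep the aggregation weight over [0, 1] and collect every optimum found.
supported = {weighted_sum_optimum(front, w / 100) for w in range(101)}
print(supported)  # {(1, 5), (5, 1)} — the non-supported point (3, 4) is never found
```

A Pareto-based method, by contrast, can return (3, 4) as well, which is exactly the advantage claimed above for the Pareto approach over aggregation.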

Our preliminary study of the landscape of the bi-objective flow-shop problem shows that the supported solutions are very close to each other. This observation led us to improve an exact method initially proposed for bi-objective problems. Furthermore, a new exact method able to deal with any number of objectives has been designed.

*Performance assessment:* The goal here is to extend GUIMOO in order to provide efficient visual and metric tools for assessing the performance of multi-objective resolution methods.

The success of metaheuristics is based on their ability to find efficient solutions in a reasonable time. But with very large and/or multi-objective problems, the efficiency of metaheuristics may be compromised. Hence, in this context, it is necessary to integrate metaheuristics into more general schemes in order to develop even more efficient methods, for instance through strategies such as cooperation and parallelization.

The DOLPHIN project deals with *a posteriori* multi-objective optimization, where the set of Pareto solutions (solutions of best compromise) has to be generated in order to give the decision maker the opportunity to choose the solution that interests him/her.

Population-based methods, such as evolutionary algorithms, are well suited to multi-objective problems, as they work with a set of solutions. To be convinced, one may refer to the list of references on Evolutionary Multi-objective Optimization maintained by Carlos A. Coello Coello.

In order to assess the performance of the proposed mechanisms, we always proceed in two steps: first, experiments are carried out on academic problems, for which some best known results exist; second, we use real industrial problems to cope with large and complex MOPs. The lack of references in terms of optimal or best known Pareto sets is a major problem. Therefore, the results obtained in this project and the test data sets will be available at the URL http://www.lifl.fr/OPAC under Benchmarks.

In order to combine the advantages of the different metaheuristics, an interesting idea is to hybridize them. Indeed, the hybridization of metaheuristics allows the cooperation of methods having complementary behaviors. The efficiency and the robustness of such methods depend on the balance between the exploration of the whole search space and the exploitation of interesting areas.

Hybrid metaheuristics have received considerable interest in recent years in the field of combinatorial optimization. A wide variety of hybrid approaches have been proposed in the literature and give very good results on numerous single-objective optimization problems, whether academic (traveling salesman problem, quadratic assignment problem, scheduling problems, etc.) or real-world problems. This efficiency is generally due to combinations of single-solution methods (iterated local search, simulated annealing, tabu search, etc.) with population-based methods (genetic algorithms, ant colony algorithms, scatter search, etc.). A taxonomy of hybridization mechanisms has been proposed that decomposes them into four classes:

*LRH class - Low-level Relay Hybrid*: This class groups algorithms in which a given metaheuristic is embedded into a single-solution metaheuristic. Few examples from the literature
belong to this class.

*LTH class - Low-level Teamwork Hybrid*: In this class, a metaheuristic is embedded into a population-based metaheuristic in order to exploit strengths of single-solution and
population-based metaheuristics.

*HRH class - High-level Relay Hybrid*: Here, self-contained metaheuristics are executed in sequence. For instance, a population-based metaheuristic is executed to locate interesting regions, and then a local search is performed to exploit these regions.

*HTH class - High-level Teamwork Hybrid*: This scheme involves several self-contained algorithms performing a search in parallel and cooperating. An example is the island model, based on GAs, where the population is partitioned into small subpopulations and a GA is executed per subpopulation. Some individuals can migrate between subpopulations.
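The HRH scheme, for instance, can be sketched in a few lines: a crude steady-state GA first locates a promising region, then a hill-climbing local search finishes the job. The OneMax-style objective and all parameters below are toy assumptions, not a method from the project:

```python
import random

def fitness(bits):
    """Toy objective: number of ones (to be maximized)."""
    return sum(bits)

def local_search(bits):
    """Hill climbing: flip each bit once, keeping improving moves (the relay's second phase)."""
    bits = bits[:]
    for i in range(len(bits)):
        flipped = bits[:]
        flipped[i] ^= 1
        if fitness(flipped) > fitness(bits):
            bits = flipped
    return bits

def hrh_hybrid(n=20, pop_size=10, generations=15, seed=0):
    """HRH: a population-based phase locates a promising region, then a local
    search exploits it."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        a, b = rng.sample(pop, 2)
        cut = rng.randrange(1, n)
        child = a[:cut] + b[cut:]             # one-point crossover
        i = rng.randrange(n); child[i] ^= 1   # bit-flip mutation
        worst = min(range(pop_size), key=lambda k: fitness(pop[k]))
        pop[worst] = child                    # steady-state replacement
    best = max(pop, key=fitness)
    return local_search(best)                 # relay: the local phase finishes the job

print(fitness(hrh_hybrid()))  # 20, the optimum of this toy problem
```

Embedding the local search inside the GA's variation step instead would turn this HRH sketch into an LTH one, which is the essence of the taxonomy above.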

Let us notice that, while hybrid methods have been studied in the mono-objective case, their application in the multi-objective context is not yet widespread. The objective of the DOLPHIN project is to integrate the specificities of multi-objective optimization into the definition of hybrid models.

Until now, only a few exact methods have been proposed to solve multi-objective problems. They are based either on a branch-and-bound approach, on the A* algorithm, or on dynamic programming. However, those methods are limited to two objectives and, most of the time, cannot be applied to the complete problem, which is often a large-scale MOP. Therefore, sub-search spaces have to be defined in order to be able to use exact methods. Hence, in the same manner as the hybridization of metaheuristics, the cooperation of metaheuristics and exact methods is also a main issue in this project. Indeed, it combines the exploration capacity of metaheuristics with the intensification ability of exact methods, which are able to find optimal solution(s) in a restricted search space. Sub-search spaces have to be defined along the search. Such strategies can be found in the literature, but they are only applied to mono-objective academic problems.

We have extended the previous taxonomy of hybrid metaheuristics to the cooperation between exact methods and metaheuristics. Using this taxonomy, we are investigating cooperative multi-objective methods. In this context, several types of cooperation may be considered, according to the way the metaheuristic and the exact method cooperate. For instance, a metaheuristic can use an exact method for intensification, or an exact method can use a metaheuristic to reduce the search space.

Moreover, a part of the DOLPHIN project deals with studying exact methods in the multi-objective context in order: i) to be able to solve small-size problems and to validate the proposed heuristic approaches; ii) to have more efficient/dedicated exact methods that can be hybridized with metaheuristics. In this context, the use of parallelism will push back the limits of exact methods, which will be able to explore larger search spaces.

Based on previous work on multi-objective optimization, it appears essential to integrate knowledge about the problem structure into metaheuristics; this knowledge can be gained during the search. Regarding the hybridization and cooperation aspects, the objectives of the DOLPHIN project are to deepen those studies as follows:

*Design of metaheuristics for multi-objective optimization:* To improve metaheuristics, it becomes essential to integrate knowledge about the problem structure, which may be gathered during execution. This would make it possible to adapt operators, whether specific to multi-objective optimization or not. The goal here is to design self-adaptive methods that are able to react to the problem structure.

*Design of cooperative metaheuristics:* Previous studies show the interest of hybridization for global optimization and the importance of studying the problem structure when designing efficient methods. It is now necessary to generalize the hybridization of metaheuristics and to propose adaptive hybrid models that may evolve during the search while selecting the appropriate metaheuristic. Multi-objective aspects have to be introduced in order to cope with the specificities of multi-objective optimization.

*Design of cooperative schemes between exact methods and metaheuristics:* Once the study of possible cooperation schemes is completed, we will have to test and compare them in the multi-objective context.

*Design of parallel metaheuristics:* Our previous work on parallel metaheuristics allows us to speed up the resolution of large-scale problems. It would also be interesting to study the robustness of the different parallel models (in particular in the multi-objective case) and to propose rules that determine, for a given problem, which kind of parallelism to use. Of course, these goals are not disjoint, and it will be interesting to use hybrid metaheuristics and exact methods simultaneously. Moreover, those advanced mechanisms may require parallel and distributed computing in order to let cooperating methods evolve simultaneously and to speed up the resolution of large-scale problems.

*Validation:* In order to validate the obtained results, we always proceed in two phases: validation on academic problems, for which some best known results exist, and application to real (industrial) problems to cope with problem-size constraints.

Moreover, those advanced mechanisms are to be used in order to integrate the distributed multi-objective aspects in the ParadisEO Platform (see the paragraph on software platform).

Parallel and distributed computing may be considered as a tool to speed up the search for solving large MOPs and to improve the robustness of a given method. Moreover, the joint use of parallelism and cooperation allows improvements in the quality of the obtained Pareto sets. Following this objective, we will design and implement parallel models for metaheuristics (evolutionary algorithms, tabu search) and for exact methods (branch-and-bound algorithm, branch-and-cut algorithm) for solving different large MOPs.

One of the goals of the DOLPHIN project is to integrate the developed parallel models into software frameworks. Several frameworks for parallel and distributed metaheuristics have been proposed in the literature. Most of them focus either on evolutionary algorithms or on local search methods; only a few are dedicated to the design of both families of methods. On the other hand, existing optimization frameworks either do not provide parallelism at all or supply at most one parallel model. In this project, a new framework for parallel hybrid metaheuristics is proposed, named *Parallel and Distributed Evolving Objects (ParadisEO)*, based on EO. The framework provides in a transparent way the hybridization mechanisms presented in the previous section, and the parallel models described in the next section. Concerning the parallel exact methods developed for MOPs, we will integrate them into well-known frameworks such as COIN.

According to the family of metaheuristics addressed, we may distinguish two categories of parallel models: parallel models managing a single solution, and parallel models handling a population of solutions. The major single-solution parallel models are the *parallel neighborhood exploration model* and the *multi-start model*.

*The parallel neighborhood exploration model* is basically a ``low-level'' model that splits the neighborhood into partitions that are explored and evaluated in parallel. This model is particularly interesting when the evaluation of each solution is costly and/or when the size of the neighborhood is large. It has been successfully applied to the mobile network design problem (see the Application section).
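A minimal sketch of this model, assuming a swap neighborhood over permutations and a thread pool as the parallel substrate (the objective function and all names are illustrative, not project code):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def evaluate(perm):
    """Placeholder for a costly objective: total displacement of each element."""
    return sum(abs(i - v) for i, v in enumerate(perm))

def best_in_partition(neighbors):
    """Evaluate one partition of the neighborhood and return its best move."""
    return min(neighbors, key=evaluate)

def parallel_neighborhood_step(perm, workers=4):
    """One step of the parallel neighborhood exploration model: the swap
    neighborhood is split into `workers` partitions evaluated concurrently."""
    moves = []
    for i, j in combinations(range(len(perm)), 2):
        n = list(perm); n[i], n[j] = n[j], n[i]
        moves.append(tuple(n))
    parts = [moves[k::workers] for k in range(workers)]  # round-robin partitioning
    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = list(pool.map(best_in_partition, parts))
    return min(candidates, key=evaluate)                 # reduce the partial winners

start = (3, 2, 1, 0)                                # worst ordering for this toy objective
print(evaluate(start))                              # 8
print(evaluate(parallel_neighborhood_step(start)))  # 2: the best swap-neighbor
```

With a genuinely expensive `evaluate` (e.g. a network simulation, as in the mobile design application), each partition would amortize the parallel overhead, which is exactly the regime the text identifies.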

*The multi-start model* consists in executing several local searches (that may be heterogeneous) in parallel, without any information exchange. This model raises the following question in particular: is executing k local searches during a time t equivalent to executing a single local search during a time k × t? To answer this question, we tested a multi-start tabu search on the quadratic assignment problem. The experiments have shown that the answer is often landscape-dependent. For example, the multi-start model may be well suited for landscapes with multiple basins.

Parallel models that handle a population of solutions are mainly: the *island model*, the *central model*, and *the distributed evaluation of a single solution*. Let us notice that the last model may also be used with single-solution metaheuristics.

In *the island model*, the population is split into several sub-populations distributed among different processors. Each processor is responsible for the evolution of one sub-population: it executes all the steps of the metaheuristic, from selection to replacement. After a given number of generations (synchronous communication), or when a convergence threshold is reached (asynchronous communication), the migration process is activated. Then, solutions are exchanged between sub-populations, and received solutions are integrated into the local sub-population.

*The central (Master/Worker) model* preserves the sequential semantics of the original algorithm. The master centralizes the population and manages the selection and replacement steps. It sends sub-populations to the workers, which execute the recombination and evaluation steps and return the newly evaluated solutions to the master. This approach is efficient when the generation and evaluation of new solutions is costly.

*The distributed evaluation model* consists in evaluating each solution in parallel. This model should be used when, for example, the evaluation of a solution requires access to very large databases (data mining applications) that may be distributed over several processors. It may also be useful in a multi-objective context, where several objectives have to be computed simultaneously for a single solution.
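The multi-objective use of the distributed evaluation model can be sketched as follows: each objective of a single solution is computed by a separate worker and the objective vector is then assembled. The toy objectives below are placeholders for genuinely costly computations:

```python
from concurrent.futures import ThreadPoolExecutor

def makespan(schedule):
    """First objective (toy placeholder for a costly simulation)."""
    return max(schedule)

def total_tardiness(schedule):
    """Second objective (toy placeholder; in practice each objective may need
    its own data source, e.g. a distributed database)."""
    due = 5
    return sum(max(0, t - due) for t in schedule)

def evaluate_distributed(solution, objectives):
    """Distributed evaluation model: each objective of a single solution is
    computed by a separate worker, then the objective vector is assembled."""
    with ThreadPoolExecutor(max_workers=len(objectives)) as pool:
        futures = [pool.submit(f, solution) for f in objectives]
        return tuple(f.result() for f in futures)

print(evaluate_distributed([3, 7, 6], [makespan, total_tardiness]))  # (7, 3)
```

In a real deployment the thread pool would be replaced by processes or remote workers (e.g. via MPI), but the structure of the model, one worker per objective and a final gather, stays the same.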

As these models have now been identified, our objective is to study them in the multi-objective context in order to use them advisedly. Moreover, these models may be merged to combine different levels of parallelism and to obtain more efficient methods.

Our objectives regarding these issues are the following:

*Design of parallel models for metaheuristics and exact methods for MOPs*: We will develop parallel cooperative metaheuristics (evolutionary algorithms and local search methods such as tabu search) for solving different large MOPs. Moreover, we are designing a new exact method, named PPM (Parallel Partition Method), based on branch-and-bound and branch-and-cut algorithms. Finally, some parallel cooperation schemes between metaheuristics and exact algorithms have to be used to solve MOPs in an efficient manner.

*Integration of the parallel models into software frameworks*: The parallel models for metaheuristics will be integrated into the ParadisEO software framework. The proposed multi-objective exact methods must first be integrated into standard frameworks for exact methods, such as COIN and BOB++. A *coupling* with ParadisEO is then needed to provide hybridization between metaheuristics and exact methods.

*Efficient deployment of the parallel models on different parallel and distributed architectures, including GRIDs*: The designed algorithms and frameworks will be efficiently deployed on non-dedicated networks of workstations, dedicated clusters of workstations and SMP (Symmetric Multi-Processor) machines. For GRID computing platforms, peer-to-peer (P2P) middleware (XtremWeb-Condor) will be used to implement our frameworks. For this purpose, the different optimization algorithms may be revisited for their efficient deployment.

In this project, some well-known optimization problems are revisited in terms of multi-objective modeling and resolution:

Flow-shop scheduling problem: The flow-shop problem is one of the best-known scheduling problems. However, most works in the literature use a mono-objective model. In general, the minimized objective is the total completion time (makespan). Many other criteria may be used to schedule tasks on different machines: maximum tardiness, total tardiness, mean job flowtime, number of delayed jobs, maximum job flowtime, etc. In the DOLPHIN project, a bi-criteria model, which consists in minimizing the makespan and the total tardiness, is studied. A tri-criteria flow-shop problem, which additionally minimizes the maximum tardiness, is also studied. This will make it possible to develop and test multi-objective (and not only bi-objective) exact methods.
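The bi-criteria evaluation of a permutation flow-shop schedule can be sketched as follows (the 3-job, 2-machine instance is hypothetical, chosen only for illustration):

```python
def flow_shop_objectives(jobs_order, proc, due):
    """Bi-criteria evaluation of a permutation flow-shop schedule.
    proc[j][m] = processing time of job j on machine m, due[j] = due date of job j.
    Returns (makespan, total tardiness), both to be minimized."""
    n_machines = len(proc[0])
    completion = [0] * n_machines   # current completion time on each machine
    tardiness = 0
    for j in jobs_order:
        for m in range(n_machines):
            ready = completion[m - 1] if m > 0 else 0  # job must leave machine m-1 first
            completion[m] = max(completion[m], ready) + proc[j][m]
        tardiness += max(0, completion[-1] - due[j])
    return completion[-1], tardiness

# Hypothetical instance: 3 jobs on 2 machines.
proc = [[3, 2], [2, 4], [4, 1]]
due = [6, 7, 9]
print(flow_shop_objectives([0, 1, 2], proc, due))  # (10, 3)
print(flow_shop_objectives([2, 1, 0], proc, due))  # (12, 9)
```

Enumerating the objective vectors of all permutations of such a toy instance is precisely what yields the Pareto front against which bi-objective methods can be validated.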

Routing problems: The vehicle routing problem (VRP) is a well-known problem that has been studied since the late 1950s. It has many practical applications in industry (e.g. transportation, logistics). Existing studies of the VRP are almost all concerned with minimizing the total distance only. The model studied in the DOLPHIN project introduces a second objective, whose purpose is to balance the length of the tours. This new criterion is expressed as the minimization of the difference between the length of the longest tour and the length of the shortest tour. As far as we know, this model is among the pioneering works in the literature.
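The two objectives of this bi-objective VRP model can be computed as follows (the 5-node instance and the distance matrix are hypothetical, for illustration only):

```python
def vrp_objectives(tours, dist):
    """Bi-objective VRP evaluation: total distance, and the balance criterion
    expressed as (longest tour - shortest tour); both are minimized.
    dist is a symmetric matrix; node 0 is the depot; each tour starts and ends at 0."""
    lengths = []
    for tour in tours:
        path = [0] + tour + [0]
        lengths.append(sum(dist[a][b] for a, b in zip(path, path[1:])))
    return sum(lengths), max(lengths) - min(lengths)

# Hypothetical 5-node instance (node 0 = depot).
dist = [
    [0, 2, 4, 3, 5],
    [2, 0, 1, 6, 7],
    [4, 1, 0, 2, 3],
    [3, 6, 2, 0, 1],
    [5, 7, 3, 1, 0],
]
print(vrp_objectives([[1, 2], [3, 4]], dist))  # (16, 2): tours of length 7 and 9
```

A solution with a slightly larger total distance but a smaller length gap would be incomparable to this one in the Pareto sense, which is exactly the trade-off the second objective introduces.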

The second routing problem is a generalization of the covering tour problem (CTP). In the DOLPHIN project, this problem is solved as a bi-objective problem where a set of constraints is modeled as an objective. The two objectives are: i) minimization of the length of the tour; ii) minimization of the largest distance between a node to be covered and a visited node. As far as we know, this study is among the first works to tackle a classic mono-objective routing problem by relaxing constraints and building a more general MOP.

The third routing problem under study comes from an industrial application. The model is close to the classical vehicle routing problem, but additional constraints linked to the practices of logistics companies increase the difficulty of the problem.

For all studied problems, standard benchmarks have been extended to the multi-objective case. The benchmarks and the obtained results (optimal Pareto front, best known Pareto front) are available on the Web pages associated with the project and from the MCDM (International Society on Multiple Criteria Decision Making) web site. This is an important way to encourage comparison experiments in the research community.

With the extraordinary success of mobile telecommunication systems, service providers have been making huge investments in network infrastructure. Mobile network design is thus of utmost importance and a major issue in mobile telecommunication systems. The design of large cellular networks is a complex task with a great impact on the quality of service and the cost of the network. With the continuous and rapid growth of communication traffic, large-scale planning becomes more and more difficult, and automatic or interactive optimization algorithms and tools would be very helpful. Advances in this area will certainly lead to important improvements in service quality and deployment cost.

In this project, solutions to planning problems, in terms of modeling and resolution, are developed in a multi-criteria context combining financial criteria (cost of the network), technical criteria (coverage, availability), and marketing criteria (quality of service). Two complementary design problems are considered:

Radio mobile network design: This work is carried out in collaboration with France Telecom R&D. The engineering of radio mobile telecommunication networks involves two major problems: the design of the radio network, and frequency assignment. The design consists in positioning base stations (BS) on potential sites in order to fulfill some objectives and constraints. Frequency planning assigns the frequencies used by the BS, with frequency-reuse criteria. In this project, we address the first problem. Network design is an NP-hard combinatorial optimization problem. The BS problem deals with finding a set of sites for antennas from a set of pre-defined candidate sites, determining the type and the number of antennas, and setting up the configuration of the different parameters of the antennas (tilt, azimuth, power, etc.). A new formulation of the problem as a multi-objective constrained combinatorial optimization problem is considered. The model deals with specific objectives and constraints due to the engineering of cellular radio networks. Reducing costs without sacrificing the quality of service is a central concern. Most of the models proposed in the literature optimize a single objective (coverage, cost, a linear aggregation of objectives, etc.).

Access network design: This work is carried out in collaboration with Mobinets. The problem consists in minimizing the cost of the access network while maximizing its availability. Operators can only be competitive and economical if they have an optimized access network. Since transmission costs are becoming high compared to equipment costs, and traffic demands are increasing with the introduction of new services, it is vital for operators to find cost-optimized transmission network solutions at higher bit rates. Many constraints dealing with technologies and service providers have to be satisfied. All important deployed technologies (e.g. GSM, UMTS) will be considered.

Bioinformatics research is a great challenge for our society, and numerous research entities from different specialties (biology, medicine, information technology) collaborate on specific themes.

Regarding the genomics application, we collaborate with academic and industrial partners (IBL: Biology Institute of Lille; IPL: Pasteur Institute of Lille; the IT-Omics firm) to study genetic factors that may explain multi-factorial diseases such as diabetes, obesity or cardiovascular diseases. The originality is to look not for a single factor, but for one or several combinations of factors (possibly of different natures: genetic factors, environmental factors, etc.) among a very large set of potential factors (several thousands). The scientific goal is to formulate hypotheses describing associations that may have an influence on the diseases under study. These hypotheses then have to be verified by biologists through additional experiments.

The genomics application of the DOLPHIN project deals with post-genomics, where a very large amount of data, obtained thanks to advanced technologies, has to be analyzed. Hence, one of the goals of the project is to develop analysis methods in order to discover knowledge in data coming from biological experiments. Our originality is to first identify knowledge discovery problems in the genomics application and then to transform these data mining problems into optimization problems, where one or several objective functions have to be optimized. It is then possible to apply optimization methods to these modeled problems.

An analysis of some genomics problems led us to model them as an association-rule mining problem (a classical data-mining task). It consists in discovering several associations of factors among a very large set of potential factors; hence the combinatorics of the problem (the number of potential solutions) is huge. Moreover, a landscape study, which highlights a flat and rugged landscape, indicates that exploratory methods are required to deal with such problems. Finally, a study of the criteria commonly used for association-rule mining shows that many criteria exist and that no universal criterion emerges. Hence we propose to model the association-rule mining problem as a multi-objective one.
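To make the multi-objective modeling concrete, here is a minimal Python sketch (all names and the choice of measures are illustrative, not the project's code) that evaluates rules on two classical criteria, support and confidence, and keeps only the Pareto-optimal rules:

```python
# A rule is a pair (antecedent, consequent) of item sets; a transaction is a
# set of items. Both measures are to be maximized.

def measures(rule, transactions):
    """Return (support, confidence) of the rule over the transaction list."""
    a, b = rule
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)
    n_ab = sum(1 for t in transactions if (a | b) <= t)
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    return (support, confidence)

def dominates(u, v):
    """Pareto dominance (maximization): u is at least as good everywhere
    and strictly better somewhere."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))

def pareto_front(rules, transactions):
    """Keep the rules whose measure vector is not dominated by any other."""
    scored = [(r, measures(r, transactions)) for r in rules]
    return [r for r, m in scored
            if not any(dominates(m2, m) for _, m2 in scored if m2 != m)]
```

In practice the rule space is far too large to enumerate, which is exactly why exploratory (evolutionary) methods are used; this sketch only shows the evaluation side of the model.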

For all these reasons, we chose to base our resolution approach on evolutionary algorithms. We are now working on solving these problems from a multi-objective point of view, and on hybridizing evolutionary algorithms with exact methods able to solve sub-problems (of small size). Another important point is that the evaluation functions for such applications may be very time-consuming, because they require complex statistical computations. Therefore, a parallel implementation is required to allow a large exploration of the search space without a prohibitive computation time.

Another application of these models is carried out in proteomics, the global analysis of proteins. Proteomics is very important for understanding the biological mechanisms in living cells, but also how different factors can influence them. Its main goal is to identify experimental proteins. In this domain, we collaborate with the team of C. Rollando (Research Director at CNRS), head of the proteomics platform of the Genopole of Lille.

Our objective is to automatically discover proteins and new protein variants from experimental spectra. The identification of protein variants and new proteins is a complex problem. It cannot be reduced to a simple scoring of an experimental protein against protein databases; additional processes are needed to explore the huge space of potential solutions. For protein variants, many modifications are possible: insertion, deletion or substitution of an amino acid, as well as post-translational modifications. It is therefore not practically feasible to generate all combinations of modifications for a given protein size (exponential complexity). The new-protein identification problem is similar, since we cannot generate all possible proteins (with their modifications) in order to find the experimental one. For both problems, an optimization method is necessary.

PARADISEO is a white-box, object-oriented, generic framework dedicated to the flexible design of metaheuristics, including parallel and hybrid metaheuristics. It is based on a clear conceptual separation between the solution methods and the problems they are intended to solve, in order to maximize design and code reuse. We have extended the framework with PARADISEO-MOEO (Multi-Objective Evolving Objects) to solve multi-objective optimization problems, introducing several new mechanisms. PARADISEO-MOEO provides a wide range of reusable features and techniques related to Pareto-based multi-objective optimization, such as performance metrics, elitism, fitness sharing, selection and replacement strategies, as well as the most common fitness assignment schemes (the ranking strategies used in MOGA, NSGA, NSGA-II, SPEA, SPEA2, IBEA, and more). The fine-grained components of PARADISEO-MOEO provide high genericity, flexibility, adaptability and extensibility. Moreover, a genuine conceptual effort has been made to provide a set of classes that ease and speed up the incremental development of efficient programs with minimal programming effort.
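As an illustration of one of the fitness-assignment schemes the framework covers, here is a hedged Python sketch of NSGA-II-style non-dominated ranking (ParadisEO-MOEO itself is a C++ library; none of these names belong to its API):

```python
# Rank 0 is the Pareto front of the population; rank 1 is the front once
# rank-0 points are removed, and so on. Minimization convention throughout.

def dominates(u, v):
    """u Pareto-dominates v (minimization)."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))

def nondominated_ranks(points):
    """Return {index: rank} by iteratively peeling off non-dominated fronts."""
    remaining = list(range(len(points)))
    rank, ranks = 0, {}
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        for i in front:
            ranks[i] = rank
        remaining = [i for i in remaining if i not in front]
        rank += 1
    return ranks
```

In a framework like ParadisEO-MOEO, such a ranking would be one interchangeable fitness-assignment component among several (MOGA-style ranks, SPEA2 strength, indicator-based fitness, etc.).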

See web pages http://paradiseo.gforge.inria.fr/

The PARADISEO framework has been coupled with Globus GT4 to tackle optimization problems on grids. The coupling of ParadisEO with Globus consisted in two major steps: design and implementation, and deployment on the grid, in particular Grid'5000. The first step consisted in the gridification of the parallel and hybrid models provided in ParadisEO, i.e. their adaptation to the properties of grids (large scale, heterogeneous and dynamic nature of the resources, and multi-administrative domains). The MPICH-G2 communication library has been used. The second step consisted in building a system image for Globus 4 including MPICH-G2. This image allows building a virtual Globus grid able to deploy and execute the parallel hybrid metaheuristics provided by ParadisEO.

See web pages http://paradiseo.gforge.inria.fr/

ASCQ_ME is a proteomics optimization program for the identification of proteins from mass spectrometry (MS) raw data, directly from spectra, without mass-list extraction. The ASCQ_ME application is available for on-line interrogation at https://www.genopole-lille.fr/logiciel/ascq_me/.

Furthermore, its score has been greatly improved thanks to comparisons with the other existing tools. It is based on the percentage of peptides having a "good" spectral correlation with the experimental MS spectrum (redundant peptides due to variable modifications are counted only once). For each identification request, it is possible to know the ranking of each protein, the score of each corresponding peptide, and how the spectra are correlated (the experimental and simulated spectra, as well as the correlation spectrum).
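The scoring idea described above can be sketched as follows. This is a hypothetical illustration, not ASCQ_ME's actual code, and the correlation threshold is an assumption:

```python
# Score = fraction of *distinct* peptides whose best spectral correlation
# with the experimental MS spectrum exceeds a threshold. Peptides that
# reappear under variable modifications are collapsed beforehand (here,
# simply by keying the dict on the peptide sequence).

def protein_score(peptide_correlations, threshold=0.7):
    """peptide_correlations: {peptide sequence: best correlation value}.
    The threshold value is an illustrative assumption."""
    if not peptide_correlations:
        return 0.0
    good = sum(1 for c in peptide_correlations.values() if c >= threshold)
    return good / len(peptide_correlations)
```

For example, a candidate protein whose three distinct peptides correlate at 0.9, 0.4 and 0.8 would score 2/3 under this sketch.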

The SSO software (Sequence Shape Order) is inspired by de novo peptide sequencing methods to perform de novo protein sequencing. It consists of the three following steps:

Sequence: from MS/MS spectra, partial peptide sequences of amino acids are generated by traditional de novo peptide sequencing.

Shape: from an MS spectrum, an adaptive genetic algorithm (AGA) based on spectral correlation (inspired by the ASCQ_ME scoring) is used to propose potential complete peptide sequences.

Order: using an MS spectrum obtained from a digestion with another enzyme, the peptides are ordered to obtain complete protein sequences that correspond well to the experimental protein.

The approach is a true de novo method (no use of databases), but it clearly depends on the quality of the experimental data. Tests are in progress with three already known proteins. It is a C++ application combining the PARADISEO-EO, MO and PEO (parallel version) platforms.

Our exact optimization method PPM (Parallel Partitioning Method) for bi-objective optimization problems has been extended to general multi-objective combinatorial optimization problems.

This method is able to efficiently solve multi-objective problems (with more than two objectives) and is inherently parallel. Indeed, it is based on splitting the search space into several areas leading to elementary exact searches, and it determines the whole Pareto front in three stages:

Bounding the search space. The nadir point has to be computed. Note that with more than two objectives, finding this point is not trivial. Dynamic programming techniques have been used for this task.

Partitioning the search space into well-balanced partitions. The problem with k objectives is partitioned according to one objective. Then, recursively, the problems with k-1, k-2, ... objectives are partitioned. An ε-constraint technique is used to split the search space according to one objective.

Searching for the other efficient solutions in each partition. Each partition forms a box. A new search method is proposed for problems that cannot easily be bounded, in order to avoid too many repeated searches.

The parallel design of the algorithm increases its performance. It has been applied to a three-objective flow-shop problem and gave encouraging results.
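The partitioning stage above can be illustrated with a toy Python sketch. The function names and the uniform slicing policy are assumptions for illustration, not PPM's actual implementation:

```python
# Stage 2 of the method: an epsilon-constraint bounds one objective to a
# slice [lo, hi), turning the k-objective problem into independent
# subproblems, each amenable to an elementary exact search (in parallel).

def epsilon_partitions(z_min, z_max, n_parts):
    """Split the range of one objective into n_parts contiguous slices."""
    width = (z_max - z_min) / n_parts
    return [(z_min + i * width, z_min + (i + 1) * width)
            for i in range(n_parts)]

def solve_in_slices(solutions, k, n_parts):
    """Toy illustration: dispatch known feasible points into slices of
    objective k. In PPM each slice would instead be explored by an exact
    search, and the slices are chosen to be well balanced."""
    zs = [s[k] for s in solutions]
    parts = epsilon_partitions(min(zs), max(zs) + 1e-9, n_parts)
    boxes = [[] for _ in parts]
    for s in solutions:
        for i, (lo, hi) in enumerate(parts):
            if lo <= s[k] < hi:
                boxes[i].append(s)
                break
    return boxes
```

Because the slices are disjoint, the elementary searches are independent, which is what makes the method inherently parallel.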

We have studied the convergence of generic stochastic search algorithms toward the Pareto set of continuous multi-objective optimization problems. The focus was on obtaining a finite approximation that captures the entire solution set in a suitable sense. For this, we used the concept of ε-dominance. We observed that, under mild assumptions on the process generating new candidate solutions, the limit approximation set is determined entirely by the archiving strategy. We investigated two different archiving strategies which lead to different limit behaviors of the algorithms, yielding bounds on the obtained approximation quality as well as on the cardinality of the resulting Pareto set approximation. Further, we demonstrated the potential of hybridizing a given stochastic search algorithm with a particular local search strategy (multi-objective continuation methods) by showing that the concept of ε-dominance can be integrated into this approach in a suitable way.
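The archiving idea can be sketched as follows, assuming additive ε-dominance and minimization. This is a minimal illustration of the concept, not one of the two archiving strategies analyzed in the study:

```python
# A new point enters the archive only if no archived point eps-dominates it;
# on entry it evicts the archived points it (ordinarily) dominates. This
# bounds the archive size while keeping an eps-approximation of the
# non-dominated points seen so far.

def eps_dominates(u, v, eps):
    """Additive epsilon-dominance for minimization: u - eps <= v componentwise."""
    return all(x - eps <= y for x, y in zip(u, v))

def dominates(u, v):
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))

def archive_update(archive, p, eps):
    """Return the archive after offering candidate point p."""
    if any(eps_dominates(a, p, eps) for a in archive):
        return archive                       # p is already eps-covered
    return [a for a in archive if not dominates(p, a)] + [p]
```

Points that improve on the archive by less than ε are rejected, which is why the limit approximation set, and its cardinality, are controlled by the archiver rather than by the generation process.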

Approximability is studied in the context of large instances for multi-objective combinatorial optimization problems.

In the multi-objective combinatorial case, computational complexity has not received much attention, although it is closely related to the degree of approximability of problems. To overcome this drawback, we identify the most important classes of multi-objective combinatorial optimization problems for which complexity results exist. We then introduce a unified formalism to derive a general technique for extending mono-objective results to multi-objective cases. Afterwards, we use this approach to provide new complexity results and approximability bounds for the bi-objective permutation flow-shop problem.

The study of approximability is completed with solutions offered by *a priori* analysis. For this purpose, landscape analysis is performed using hyper-ellipses, which are further deformed in order to fit the set of best compromise solutions, namely the Pareto set.

The proposed technique starts by providing an enclosing surface for the set of feasible solutions. It uses a direct least-squares fit of ellipses (for the bi-objective case) together with DACE (Design and Analysis of Computer Experiments) models. Using covariance matrix adaptation, the model is further deformed and a region of interest is kept. The advantage of this method is the small amount of time needed to compute the parameters of the enclosing surface (less than 15 seconds).

The initial surface can be used as a tool to guide the search, as a bound, or as a reference (via quality metrics) to assess a given approximation set. Its efficiency has been tested on the bi-objective permutation flow-shop problem. The method also holds for problems with a positive correlation between the objective functions and a normal distribution in the objective space for all individual criteria.
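As a simple stand-in for the ellipse-fitting machinery, the following Python sketch encloses a 2-D point cloud in an ellipse derived from its mean and covariance. All names are illustrative; the actual method uses a direct least-squares ellipse fit combined with DACE models:

```python
import math

def covariance_ellipse(points, scale=3.0):
    """Return (center, axis lengths, angle) of an ellipse aligned with the
    principal axes of a 2-D point cloud; `scale` stretches it so that it
    roughly encloses all points. Purely illustrative."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    d = math.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + d, tr / 2 - d
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), (scale * math.sqrt(l1), scale * math.sqrt(max(l2, 0.0))), angle
```

Like the enclosing surface in the text, such an ellipse is cheap to compute from sampled objective vectors, and can then serve as a bound or a reference surface before any expensive search is launched.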

The importance of multi-objective optimization is now globally established. Furthermore, a great part of real-world problems are subject to uncertainties due to, *e.g.*, noisy or approximated fitness function(s), varying parameters or dynamic environments. Moreover, although evolutionary algorithms are commonly used to solve multi-objective problems on the one hand and stochastic problems on the other hand, very few approaches combine these two aspects. For instance, flow-shop scheduling problems are generally studied in a single-objective, deterministic way, whereas they are by nature multi-objective and subject to a wide range of uncertainties; these two features had never been investigated together. To tackle the optimization of stochastic multi-objective problems, three indicator-based algorithms, able to handle any type of probability distribution, have been proposed. The first method, called IBEA_{1}, preserves the deterministic approach by computing the fitness of a solution from a single evaluation. The second method, IBEA_{avg}, is based on average objective values. Finally, the IBEA_{stoch} method estimates the quality of a solution in a probabilistic way. The latter, previously investigated on continuous problems, had never been applied to the combinatorial case nor to the stochastic models proposed here. These methods were all experimented on a bi-objective flow-shop scheduling problem with stochastic processing times using a proactive approach. According to the experimental protocol we formulated, we concluded that IBEA_{avg} was overall more efficient than IBEA_{1} and IBEA_{stoch} for our problem.
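The difference between the first two evaluation schemes can be sketched as follows. The noisy objective and the sample count are invented for illustration, and IBEA_{stoch} (which reasons on the empirical distribution rather than a point value) is omitted:

```python
import random

def noisy_makespan(solution, rng):
    """Stand-in for a stochastic processing-time evaluation: the true value
    (here just the sum of the entries) is observed with Gaussian noise."""
    return sum(solution) + rng.gauss(0.0, 1.0)

def fitness_single(solution, rng):
    """IBEA_1-style: trust a single noisy evaluation."""
    return noisy_makespan(solution, rng)

def fitness_avg(solution, rng, n_samples=30):
    """IBEA_avg-style: average several noisy evaluations, trading extra
    evaluation cost for a lower-variance fitness estimate."""
    return sum(noisy_makespan(solution, rng) for _ in range(n_samples)) / n_samples
```

The averaged scheme pays n_samples evaluations per solution, which matters when each evaluation is expensive, but its estimate concentrates around the true expected objective as the sample count grows.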

A grid-based approach for the Branch-and-Bound algorithm has been proposed. The main questions to deal with are dynamic load balancing, fault tolerance, and the detection of termination. Indeed, the irregular nature of the tree explored by the algorithm and the volatile, large-scale characteristics of the grid involve a large number of load-balancing and checkpointing operations. These operations would normally incur an exorbitant communication and storage cost for the dynamically generated work units (collections of nodes). Therefore, the proposed approach is based on a new encoding of the explored tree and of the work units that optimizes the dynamic distribution and checkpointing processes on the grid.
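The benefit of a compact work-unit encoding can be illustrated with a toy sketch: if the leaves of the explored tree are numbered consecutively, a work unit reduces to an integer interval, so distributing and checkpointing exchange pairs of integers rather than collections of nodes. The splitting policy below is an assumption for illustration, not the project's actual encoding:

```python
# A work unit is an interval (lo, hi) over a consecutive numbering of the
# tree's leaves; hi is exclusive. Load balancing then amounts to splitting
# intervals, and a checkpoint is just the list of unexplored intervals.

def split_unit(unit, n_parts):
    """Split an interval work unit into near-equal subintervals."""
    lo, hi = unit
    size = hi - lo
    cuts = [lo + (size * i) // n_parts for i in range(n_parts + 1)]
    return [(cuts[i], cuts[i + 1]) for i in range(n_parts)
            if cuts[i] < cuts[i + 1]]

def checkpoint(units):
    """Serialize the pending work as a sorted list of intervals."""
    return sorted(units)
```

Compared with shipping explicit node collections, the per-unit cost of both communication and storage becomes constant, which is what makes frequent load balancing and checkpointing affordable on a volatile grid.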

The algorithm has been applied to the flow-shop scheduling problem. Using our algorithm, a 50-job, 20-machine instance has been optimally solved for the first time, within 25 days. The method not only improved the best known solution for the problem instance but also proved the optimality of the provided solution. The experiments were performed on a grid simultaneously combining processors from Grid'5000 and different educational networks of the University of Lille 1 (Polytech'Lille, IEEA, IUT "A"). The number of processors averaged approximately 500, and peaked at 1245 machines.

The design of grid-aware metaheuristics often involves the cost of a painful apprenticeship of parallelization techniques and complex grid computing technologies. In order to free developers who are unfamiliar with those advanced features from such a burden, combinatorial optimization frameworks must integrate up-to-date parallelization techniques and allow their transparent exploitation and deployment on computational grids. We have recently proposed a framework called *ParadisEO* dedicated to the reusable design of parallel metaheuristics for dedicated parallel hardware platforms. We have extended it (*ParadisEO-G*) to allow the design and deployment of these parallel metaheuristics on computational grids. This extension consisted in two steps. Firstly, the design and algorithmics associated with the parallel models provided and encapsulated in the framework have been revisited, in order to take into account the major characteristics of grids: multi-administrative domains, large scale, volatility and heterogeneity. Secondly, ParadisEO has been coupled with the Globus grid middleware. ParadisEO-G has been validated on real-world problems such as mobile network design, protein identification, and protein folding and docking.

ASCQ_ME has been validated and compared with other peptide mass fingerprinting (PMF) identification engines through two series of tests. The first one was an accuracy test with known proteins and different amounts of product. The second test consisted in plasma protein identification. These tests have shown that ASCQ_ME gives results that are equivalent but also complementary to those of the most widely used identification engines. All the tests were made with a limited version of ASCQ_ME corresponding to its on-line version, so everyone can easily reproduce them and form their own opinion.

(2006-2008): The cooperation with SOGEP, the logistics and delivery subsidiary of REDCATS (PINAULT PRINTEMPS REDOUTE), consists in solving a logistics and transportation problem. The objective is the design and implementation of a decision-aid framework for solving complex vehicle routing problems with various constraints.

(2004-2006): The objective is to model and solve the access network design problem (GSM and UMTS technologies). The problem has been formulated as a multi-constrained spanning tree problem, and heuristic approaches have been proposed. Cooperative exact approaches using mathematical programming (branch-and-cut-and-price) and constraint programming (CP(graph)) are now under study in order to improve the quality of the obtained results. After a first study of the use of Dantzig-Wolfe decomposition with constraint programming for access network design problems, the main step is now to develop the cooperative part of the mono-objective method before experimenting with new multi-objective models taking quality-of-service measurements into account.

ARCIR project "Puces Nano 3D" (2004-2006), supported by the Region. This project aims at the creation of new types of microarrays. It is a collaboration with IEMN (Institut d'Electronique, Microélectronique et Nanotechnologies de Lille), IBL (Institut de Biologie de Lille) and IPL (Institut de Biopuces de Lille).

LOVAD Project ''Logistique pour la vente à distance'' (2000-2006) of the CPER (Regional Contract) TACT operation.

COLIVAD project (Pilotage Optimal des processus de Livraison en Vente à Distance) (2006-2008): This project is part of the "Pôle de compétitivité" *Industries du commerce*. It deals with solving a logistics and transportation problem.

Decrypton project (Conformational sampling and docking on Grids and application to neuromuscular disease) (2006-2008): collaboration with INSERM and IBL (Lille Institute of Biology).

ANR DOCK (Docking on Grids) (2006-2009) : collaboration with IBL (Institut de Biologie de Lille) and CEA (Grenoble).

ANR CHOC (Challenges on Combinatorial Optimization on Grids) (2006-2009) : collaboration with Prism (Univ. of Versailles), MOAIS (INRIA Rhones-Alpes), GILCO (Grenoble).

PPF (Bioinformatics) (2006-2009): This national program within the University of Lille (USTL) deals with solving bioinformatics and computational biology problems using combinatorial optimization techniques.

ACI ``Masse de données'' Project GGM ``Geno-Medical Grid'' (2004-2007), in collaboration with LIRIS (Lyon) and IRIT (Toulouse) laboratories. Our concern in this project is the design and implementation of parallel multi-objective optimization techniques to extract association rules from large and distributed genomic and medical data.

ACI GRID'5000 Grant (2004-2007). This project deals with the deployment of a national grid platform. Our laboratory will host a cluster of more than 500 processors. The DOLPHIN project coordinates this action for Lille.

ACI "Nanosciences" (2004-2006), project "Interactionpolypeptide3D". This project, in collaboration with IEMN (Institut d'Electronique, Microélectronique et Nanotechnologies de Lille) and IBL (Institut de Biologie de Lille), aims to study interactions between polypeptides on a nanostructured surface. Our concern in this project is data management (storage and analysis).

INRIA 3+3 Méditerranée project PERFORM (2006-2009), involving the University of Malaga (Spain), the University of Constantine (Algeria), and the University of Tunis (Tunisia). This project deals with multi-objective optimization.

University of Constantine (2004-2008): CMEP program with the University of Constantine (Algeria) on "Metaheuristics for optimization of hard problems".

COST European project GRAAL (2004-2007) on designing and experimenting multi-objective formulations for telecommunication problems.

NEGST (NExt Grid Systems and Techniques) program between CNRS (France) and Japan on optimization on Grids.

The project hosted the following visitors during 2006:

Jose Manuel Nieto (Malaga, Spain)

Mohammed Elachir Menai (Constantine, Algeria)

Mohamed Batouche (Constantine, Algeria)

Khaled Mellouli (HEC, Tunis, Tunisia)

Essegir Hajer (HEC, Tunis, Tunisia)

Grégoire Dooms (University of Louvain-la-Neuve, Belgium)

Thomas Stutzle (ULB, Bruxelles, Belgium)

Co-founder and chair of the group META (Metaheuristics: Theory and Applications, http://www.lifl.fr/~talbi/META). This group is associated with the ROADEF (French Operations Research Society) and the CNRS research groups GDR ALP and MACS.

Chair of the group PM2O (Multi-objective Mathematical Programming, http://www.lifl.fr/PM2O). This group is associated with the ROADEF (French Operations Research Society) and the CNRS research group GDR I3.

Direction of the CIB (Bioinformatics Center) of the Genopole of Lille.

Secretary of the ROADEF society.

Scientific Committee of the Genopole of Lille.

EURO-PAREO (European working group on Parallel Processing in Operations Research).

EURO-EU/ME (European working group on Metaheuristics).

EURO-ESICUP (European Working Group on Cutting and Packing).

ECCO (European Chapter on Combinatorial Optimization).

JET national group on evolutionary computation.

ERCIM (European Research Consortium for Informatics and Mathematics) working group on Soft Computing.

Book on "Parallel Combinatorial Optimization" (Wiley & Sons, USA, 2006 - ISBN-0-471-72101-8).

E-G. Talbi et al. "Artificial Evolution", LNCS N.3871, Springer, 2006.

E-G. Talbi and A. Zomaya, special issue on "Grids for bioinformatics and computational biology" in the journal JPDC (Journal of Parallel and Distributed Computing), 2006.

E. Alba, A. Nebro and E-G. Talbi, special issue of the "Journal of Heuristics" on "Latest advances in metaheuristics for multi-objective optimization", 2006.

E. Alba, E-G. Talbi and A. Zomaya, special issue of the journal "Computer Communications" on "Nature inspired distributed computing in communication", 2006.

NIDISC Workshop organization (International Workshop on Nature Inspired Distributed Computing) organized jointly with ACM/IEEE IPDPS (International Parallel and Distributed Processing Symposium): NIDISC'06 (Rhode Island, USA).

EGC'2006 Conference co-organization (Extraction et Gestion des Connaissances), Jan 2006, Lille, France.

ROADEF'2006 Conference organization (Major National Conference in Operations Research), Feb. 2006, Lille, France.

Flowshop Contest: the Scheduling Challenge, Nov 2006. http://www.lifl.fr/~talbi/challenge/index.html

META'2006 Conference : Creation, program and organization chairs of the META conference (Metaheuristics : Theory and Applications).

Workshop EGPDC'2006 (Distributed knowledge discovery and Management) organized in conjunction with EGC'2006 Conference, Jan 2006, Lille, France.

Organization of sessions in META'2006, ROADEF'2006, ISMP'2006, LT'2006, INCOM'2006, etc.

Review of journal papers:

Parallel Computing

IEEE Transactions on Systems Man and Cybernetics

Annals of Operations Research

Calculateurs Parallèles

IEEE Transactions on Parallel and Distributed Systems

Journal of Supercomputing

IEEE Transactions on Evolutionary Computation

Parallel and Distributed Computing Practices

Genetic Programming and Evolvable Machines

Journal of Heuristics

European Journal of Operational Research

Journal of Computational Optimization and Applications

Information Processing Letters

Extraction de connaissances et apprentissage

European Physical Journal B

4OR

Journal of Mathematical Modelling and Algorithms

Bioinformatics

International Journal of Production Economics

Computers and Operation Research

Discrete Applied Mathematics

...

Review of different projects :

JEI (Jeune Equipe Innovante) of the DRRT (Délégation Régionale à la Recherche et à la Technologie) of the French research ministry (2006).

International Conferences on Evolutionary Computation:

CEC (Congress on Evolutionary Computation): CEC'06 (Singapore).

GECCO (Genetic and Evolutionary Computation Conference): GECCO'2006.

EvoCOP (European Conference on Evolutionary Computation in Combinatorial Optimization): EvoCOP'2006

EvoBIO (European Workshop on Evolutionary Computation and Bioinformatics): EvoBio'2006

HM'2006 International Workshop on Hybrid Metaheuristics, Spain, Sept 2006

HiPCoMB "IEEE Workshop on High Performance Computing in Medicine and Biology ", Vienna, Austria, Apr 2006.

MOSIM'2007, Rabat, Morocco.

SIAM International Conference on Data Mining, Florida, USA, Apr 2006.

Workshop DEXA GLOBE'06 "Grid and peer-to-peer computing impacts on large scale heterogeneous distributed database systems", Krakow, Poland, 2006.

PPSN International Conference on Parallel Problem Solving from Nature, PPSN IX, Reykjavik, Iceland, July 2006.

IEEE GrC International Conference on Granular Computing, Atlanta, USA, May 2006.

BIOMA'2006 International Conference on Bioinspired Optimization Methods and their Applications, Ljubljana, Slovenia, Oct 2006.

LT'2006 (Logistique et Transport), Hammamet, Tunisia, May 2006.

GADA (First International Symposium on Grid Computing, High-Performance and Distributed Applications), Montpellier, France, Oct-Nov 2006.

IEEE SSSM'2006 "International Conference on Service Systems and Service Management", Troyes, France, Oct 2006.

IEEE SPCA'06 "International Symposium on Pervasive Computing and Applications", Xinjiang, China, Aug 2006.

DM-WSN'2006, First IEEE International Workshop on Data Mining and Wireless Sensor Networks, in conjunction with the IEEE Int. Conf. on Data Mining ICDM'06, Dec 2006, Hong Kong, China.

META'2006 Conference : (Metaheuristics : Theory and Applications), Hammamet, Nov 2006.

ICTAI 06, The 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Nov. 13-15 2006, Washington DC, USA

CIBCB 2007 - IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, April 1-5 2007, Honolulu, Hawaii, USA

Pr Talbi was a referee for the following theses:

Feb 2006, PhD of P. Delisle, ``Parallélisation de la métaheuristique d'optimisation par colonies de fourmis sur architectures à mémoire partagée'', Université de Reims Champagne-Ardenne. Jury: T. Crainic, V-D. Cung, M. Gravel, M. Krajecki, C. Solnon, E-G. Talbi.

May 2006, PhD of Gabriel Luque, ``Solving real-world problems using parallel distributed metaheuristics'', University of Malaga, Spain.

July 2006, PhD of T. Berradia, ``Contribution au problème du plus court chemin pour le transport des matières dangereuses'', INSA Rouen. Jury: A. Benabdelhafid, P. Borne, A. El Midouni, S. Hayat, H. Heinich, D. Jolly, J. Mouzna, E-G. Talbi.

Sept 2006, PhD of C. Wilbault, ``Heuristiques hybrides pour la résolution de problèmes en variables 0-1 mixtes'', University of Valenciennes and Hainaut-Cambrésis. Jury: A. Fréville, M. Hifi, S. Hanafi, F. Semet, E-G. Talbi, J. Teghem.

Nov 2006, PhD of A. Goeffon, ``Nouvelles heuristiques de voisinage et mémétiques pour le problème maximum de parcimonie'', University of Angers. Jury: J-J. Chabrier, P. Galinier, J-K. Hao, J-M. Richer, E-G. Talbi.

Pr Dhaenens was a referee for the following theses:

Dec 2006, PhD of M. Defrance, "Algorithmes pour l'analyse de régions régulatrices dans le génome d'eucaryotes supérieurs", Université de Lille I. Jury : R. Blossey, C. Dhaenens, B. Jacq, T. Lecroq, J. Sherman, H. Touzet.

Dec 2006, PhD of A. Meena, "Allocation, Assignation et Ordonnancement pour les systèmes sur puces multiprocesseurs", Université de Lille I. Jury : P. Boulet, K. de Bosschere, C. Dhaenens, Y. Robert, Y. Sorel, R. Woods.

Pr Melab was a referee for the following thesis:

Dec 2006, PhD of R. Rouvoy, "Design of Technical Service from Model-Driven Software Engineering to Middleware Construction", Université de Lille I. Jury : N. Melab, J-M. Geib, P. Merle.

Postgraduate "Parallel combinatorial optimization", University of Malaga, Spain (E-G. Talbi).

Postgraduate "GRID computing", University of Luxembourg, Luxembourg (E-G. Talbi).

Postgraduate "Data mining algorithms", University of Tunis, Tunisia (E-G. Talbi).

Postgraduate "Multi-objective optimization", University of Sfax, Tunisia (C. Dhaenens).

Postgraduate "Parallel computing", University of Bejaia, Algeria (E-G. Talbi).

Postgraduate: ``Optimization methods'' (E-G. Talbi, L. Jourdan).

Postgraduate ``GRID computing'', (N. Melab).

Undergraduate (USTL): ``Operations Research'' (L. Jourdan).

Undergraduate (Polytech'Lille): ``Operations Research'' (C. Dhaenens, E-G. Talbi).

Undergraduate (Polytech'Lille): ``Graphs and combinatorics'' (C. Dhaenens).

Undergraduate (Polytech'Lille): ``Data mining'' (L. Jourdan, C. Dhaenens).

Undergraduate (Polytech'Lille): ``Advanced Optimization'' (L. Jourdan).

Undergraduate (Polytech'Lille): ``Production Management'' (C. Dhaenens).

Undergraduate (IEEA): ``Distributed Systems'' (N. Melab).

Undergraduate (IEEA): ``Operations Research'' (N. Melab).