The objectives of the DOLPHIN project are organized around the following themes:

**Analysis of the structure of a MOP:** The statistical analysis of the structure of the Pareto front, by means of different indicators, allows the design of efficient and robust hybrid optimization techniques. In general, current theory does not allow a complete analysis of optimization algorithms. Several questions remain unanswered: i) why is a given method efficient? ii) why are certain instances difficult to solve? Some work is needed to guide the user in the design of efficient methods.

The NFL (No Free Lunch) theorem shows that any two optimization methods have the same performance when averaged over the whole set of possible optimization problems. It is therefore crucial to make some hypotheses about the studied problem. This may be done in two steps:

analyzing the target problem to identify its landscape properties,

including this knowledge in the proposed optimization method.

Our interest in this project is to address all these questions for the multi-objective case. Another point considered is the performance evaluation of multi-objective optimization methods.

**Cooperation of optimization methods (metaheuristics and/or exact methods):**

The hybridization of optimization methods allows the cooperation of different, complementary methods. For instance, the cooperation between a metaheuristic and an exact method makes it possible to exploit the intensification process of the exact method, which finds the best solution(s) in a sub-space, and the diversification process of the metaheuristic, which reduces the search space to explore.

In this context, different types of cooperation may be proposed. Those approaches are under study in the project and we are applying them to different generic MOPs (flow-shop scheduling problem, vehicle routing problem, covering tour problem, and the association rule problem in data mining).

**Parallel optimization methods:** Parallel and distributed computing may be considered as a tool to speed up the search for solving large MOPs and to improve the robustness of a given method. Following this objective, we design and implement parallel metaheuristics (evolutionary algorithms, tabu search) and parallel exact methods (branch-and-bound, branch-and-cut) for solving different large MOPs. Moreover, the joint use of parallelism and cooperation improves the quality of the obtained solutions.

Our experience in the domain of parallel optimization dates back to the advent of transputer networks (1988), when the mapping of the parallel program and the routing of messages were explicitly designed by the programmer.

**Framework for parallel and distributed hybrid metaheuristics:** Our team contributes to the development of an open-source framework for metaheuristics, named ParadisEO (PARAllel and DIStributed Evolving Objects). Our contribution to this project is the extension of the EO (Evolving Objects) framework.

In this project, our goal is the efficient design and implementation of this framework on different types of parallel and distributed hardware platforms: clusters of workstations (COW), networks of workstations (NOW), and grid computing platforms, using suitable programming environments (MPI, Condor, XtremWeb). Coupling with well-known frameworks for exact methods (such as COIN and BOB++) will also be considered. The exact methods for MOPs developed in this project will be integrated into these software frameworks.

Particular attention is paid to the experimentation of our framework by different users and on applications outside the DOLPHIN project, in order to validate the design and implementation of ParadisEO.

**Validation:** The designed approaches are validated on generic and real-life MOPs, such as:

Scheduling problems: Flow-shop scheduling problem;

Routing problems: Vehicle routing problem (VRP) and covering tour problem (CTP);

Mobile telecommunications: Design of mobile telecommunications networks (contract with France Telecom R&D) and design of access networks (contract with Mobinets);

Genomics: Association rule discovery (data mining task) for mining genomic data (contract with GenFit and IT-OMICS).

Some benchmarks and their associated optimal Pareto fronts, or the best known Pareto fronts, will be available on the Web. We are also developing an open-source software named GUIMOO.

The analysis of the structures (landscapes) of MOPs and the performance assessment of resolution methods are significant topics in the design of optimization methods. The effectiveness of metaheuristics depends on the properties of the landscape (roughness, convexity, etc.). The notion of landscape was first introduced in the study of species evolution, and has since been used to analyze combinatorial optimization problems.

The landscape is defined by a neighborhood operator and can be represented by a graph G = (V, E). The vertices represent the solutions of the problem, and an edge (e_1, e_2) exists if the solution e_2 can be obtained by applying the neighborhood operator to the solution e_1. Then, considering this graph as the ground floor, we elevate each solution to an altitude equal to its cost. We obtain a surface, or landscape, made of peaks, valleys, plateaus, cliffs, etc. The problem lies in the difficulty of obtaining a realistic view of this landscape.
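The construction above can be sketched in code. This is a minimal illustration on a toy permutation problem with a pairwise-swap neighborhood; the cost function and all names are hypothetical, not taken from the project's actual studies.

```python
# Sketch: building the landscape graph G = (V, E) of a toy permutation
# problem. The neighborhood operator is a pairwise swap; the "altitude"
# of each solution is its cost. Purely illustrative.
from itertools import permutations

def cost(perm):
    # Toy cost: sum of |position - value| (any objective would do here).
    return sum(abs(i - v) for i, v in enumerate(perm))

def swap_neighbors(perm):
    # All solutions reachable by swapping two positions.
    n = len(perm)
    for i in range(n):
        for j in range(i + 1, n):
            neighbor = list(perm)
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            yield tuple(neighbor)

# Vertices: all solutions; edges: pairs linked by the swap operator.
V = list(permutations(range(4)))
E = {s: list(swap_neighbors(s)) for s in V}
altitude = {s: cost(s) for s in V}

# Local optima = solutions with no strictly better (lower-cost) neighbor.
local_optima = [s for s in V
                if all(altitude[s] <= altitude[n] for n in E[s])]
print(len(V), len(local_optima))
```

Indicators such as the number of local optima or the correlation of altitudes along edges of this graph are what landscape analysis measures in practice.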

Like others, we believe that the main point of interest in combinatorial optimization is not the design of the best algorithm for a large number of problems, but rather the search for the method best adapted to an instance, or a set of instances, of a given problem. Therefore, we conjecture that no ideal metaheuristic, designed as a black box, may exist.

Indeed, the first studies carried out in our research group on the analysis of landscapes of different mono-objective combinatorial optimization problems (traveling salesman problem, quadratic assignment problem) have shown not only that different problems correspond to different structures, but also that different instances of the same problem correspond to different structures.

For instance, we have carried out a statistical study of the landscapes of the quadratic assignment problem. Some indicators that characterize the landscape of an instance have been proposed, and a taxonomy of instances comprising three classes has been deduced. Hence it is not enough to adapt the method to the problem under study; it is necessary to specialize it according to the type of instance treated.

In its studies of mono-objective problems, the OPAC research group has introduced into the resolution methods some information about the problem to be solved. The landscapes of some combinatorial problems have been studied in order to investigate the intrinsic nature of their instances. The resulting information has been inserted into an optimization strategy and has allowed the design of efficient and robust hybrid methods. The extension of these studies to multi-objective problems is part of the DOLPHIN project.

The DOLPHIN project is also interested in the performance assessment of multi-objective optimization methods. Nowadays, statistical techniques developed for mono-objective problems are used, with adaptation, for the multi-objective case. Nevertheless, specific tools are necessary in many cases. For example, the comparison of two different algorithms is relatively easy in the mono-objective case: we compare the quality of the best solution obtained in a fixed time, or the time needed to obtain a solution of a given quality. The same idea cannot be directly transposed to the case where the output of the algorithms is a set of solutions with several quality measures, rather than a single solution.

Various indicators have been proposed in the literature for evaluating the performance of multi-objective optimization methods, but no indicator seems to outperform the others. The OPAC research group has proposed two indicators: the *contribution* and the *entropy*. The contribution evaluates the supply, in terms of Pareto-optimal solutions, of one front compared to another. The entropy gives an idea of the diversity of the solutions found. These two metrics are used to compare the different metaheuristics within the research group, for example in the resolution of the bi-objective flow-shop problem, and also to show the contribution of the various mechanisms introduced in these metaheuristics.
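A contribution-style indicator can be sketched as follows. This is a simplified reading of the idea (the share of nondominated points of the merged fronts coming from each front), not the exact published definition of the OPAC indicator; the fronts used are illustrative.

```python
# Illustrative sketch of a contribution-style indicator: the share of the
# nondominated points of PO1 ∪ PO2 that come from PO1 (minimization case).
# Simplified version for illustration, not the exact published formula.

def dominates(a, b):
    # Pareto dominance: a is no worse on every objective, better on one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points):
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

def contribution(front_a, front_b):
    merged = nondominated(front_a + front_b)
    from_a = sum(1 for p in merged if p in front_a)
    return from_a / len(merged)

A = [(1, 5), (2, 3), (4, 2)]
B = [(1, 6), (3, 3), (5, 1)]
print(contribution(A, B))  # → 0.75: A supplies 3 of the 4 merged nondominated points
```

With disjoint fronts, the two contributions sum to 1, which makes the indicator convenient for pairwise comparisons of algorithms.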

These metrics and others (generational distance, spacing, etc.) are integrated in the open-source software GUIMOO, developed within the framework of the DOLPHIN project. This software is dedicated to the visualization of landscapes (2D and 3D) for multi-objective optimization and to performance analysis using dedicated metrics.

One of the main issues in the DOLPHIN project is the landscape study of multi-objective problems and the performance assessment of multi-objective optimization methods, with the aim of designing efficient and robust resolution methods:

*Landscape study:* The goal here is to extend the study of landscapes from mono-objective combinatorial optimization problems to multi-objective problems, in order to determine the structure of the Pareto frontier and to integrate this knowledge about the problem structure into the design of resolution methods.

This study has been initiated for the bi-objective flow-shop problem. We have studied the convexity of the obtained frontiers in order to show the advantage of our Pareto approach over an aggregation approach, which can only obtain the Pareto solutions located on the convex hull of the Pareto front (the supported solutions).
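The limitation of the aggregation approach can be demonstrated on a tiny example. Below, a bi-objective front contains a point above the convex hull of the two extreme points (an unsupported solution); no weight vector of a weighted sum ever selects it. The numeric values are illustrative, not flow-shop data.

```python
# Sketch showing why an aggregation (weighted-sum) approach only reaches
# supported solutions: a Pareto point lying above the convex hull of the
# front is never the minimizer of w1*f1 + w2*f2 for any weights.
# Values are illustrative.

front = [(0.0, 4.0), (2.0, 2.5), (4.0, 0.0)]  # (2.0, 2.5) is unsupported

def weighted_sum_winner(front, w1):
    w2 = 1.0 - w1
    return min(front, key=lambda p: w1 * p[0] + w2 * p[1])

# Sweep the whole weight range: the unsupported point never wins.
winners = {weighted_sum_winner(front, w1 / 100) for w1 in range(101)}
print(winners)
```

A Pareto-based approach, by contrast, would retain (2.0, 2.5), since it is dominated by neither extreme point.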

Our preliminary study of the landscape of the bi-objective flow-shop problem shows that the supported solutions are very close to each other. This observation led us to improve an exact method initially proposed for bi-objective problems. Furthermore, a new exact method able to deal with any number of objectives has been designed.

*Performance assessment:* The goal here is to extend GUIMOO in order to provide efficient visual and metric tools for assessing the performance of multi-objective resolution methods.

The success of metaheuristics is based on their ability to find efficient solutions in a reasonable time. But with very large and/or multi-objective problems, the efficiency of metaheuristics may be compromised. Hence, in this context, it is necessary to integrate metaheuristics into more general schemes in order to develop even more efficient methods, for instance through strategies such as cooperation and parallelization.

The DOLPHIN project deals with *``a posteriori''* multi-objective optimization, where the set of Pareto solutions (solutions of best compromise) has to be generated in order to give the decision maker the opportunity to choose the solution of interest.

Population-based methods, such as evolutionary algorithms, are well suited to multi-objective problems as they work with a set of solutions. To be convinced, one may refer to the list of references on Evolutionary Multi-objective Optimization maintained by Carlos A. Coello Coello.

In order to assess the performance of the proposed mechanisms, we always proceed in two steps: first, experiments are carried out on academic problems for which some best known results exist; second, we use real industrial problems to cope with large and complex MOPs. The lack of references in terms of optimal or best known Pareto sets is a major problem. Therefore, the results obtained in this project and the test data sets will be available at http://www.lifl.fr/OPAC, in the Benchmarks section.

To take advantage of the strengths of the different metaheuristics, an interesting idea is to combine them. Indeed, the hybridization of metaheuristics allows the cooperation of methods with complementary behaviors. The efficiency and robustness of such methods depend on the balance between the exploration of the whole search space and the exploitation of interesting areas.

Hybrid metaheuristics have received considerable interest in recent years in the field of combinatorial optimization. A wide variety of hybrid approaches have been proposed in the literature and give very good results on numerous single-objective optimization problems, either academic (traveling salesman problem, quadratic assignment problem, scheduling problems, etc.) or real-world problems. This efficiency is due to combinations of single-solution-based methods (iterated local search, simulated annealing, tabu search, etc.) with population-based methods (genetic algorithms, ant colony optimization, scatter search, etc.). A taxonomy of hybridization mechanisms has been proposed in the literature; it decomposes these mechanisms into four classes:

*LRH class - Low-level Relay Hybrid*: This class groups algorithms in which a given metaheuristic is embedded into a single-solution metaheuristic. Few examples from the literature belong to this class.

*LTH class - Low-level Teamwork Hybrid*: In this class, a metaheuristic is embedded into a population-based metaheuristic in order to exploit strengths of single-solution and population-based metaheuristics.

*HRH class - High-level Relay Hybrid*: Here, self contained metaheuristics are executed in a sequence. For instance, a population-based metaheuristic is executed to locate interesting regions and then a local search is performed to exploit these regions.

*HTH class - High-level Teamwork Hybrid*: This scheme involves several self-contained algorithms performing a search in parallel and cooperating. An example is the island model, based on GAs, where the population is partitioned into small subpopulations and a GA is executed per subpopulation. Some individuals can migrate between subpopulations.
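The HRH class above can be illustrated with a minimal relay of an evolutionary phase followed by a local search. The problem (OneMax on bitstrings) and all parameter values are hypothetical placeholders, not methods from the project.

```python
# Minimal HRH (High-level Relay Hybrid) sketch on a toy bitstring problem:
# a crude evolutionary phase locates a promising solution, then a local
# search (bit-flip hill climbing) exploits it. Purely illustrative.
import random

random.seed(0)
N = 20
fitness = lambda s: sum(s)  # OneMax: maximize the number of 1-bits

def mutate(s, rate=0.05):
    return [1 - b if random.random() < rate else b for b in s]

def evolutionary_phase(pop_size=20, generations=30):
    pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        pop = [mutate(random.choice(parents)) for _ in range(pop_size)]
    return max(pop, key=fitness)

def hill_climb(s):
    # Exploit: flip single bits while an improving neighbor exists.
    improved = True
    while improved:
        improved = False
        for i in range(N):
            neighbor = s[:]
            neighbor[i] = 1 - neighbor[i]
            if fitness(neighbor) > fitness(s):
                s, improved = neighbor, True
    return s

best = hill_climb(evolutionary_phase())  # relay: EA first, then local search
print(fitness(best))
```

Swapping the relay for an embedding (running the hill climber on each offspring inside the EA loop) would turn this HRH sketch into an LTH one.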

Let us note that, while hybrid methods have been studied in the mono-criterion case, their application in the multi-objective context is not yet widespread. The objective of the DOLPHIN project is to integrate the specificities of multi-objective optimization into the definition of hybrid models.

Until now, only a few exact methods have been proposed to solve multi-objective problems. They are based either on a branch-and-bound approach, on the A* algorithm, or on dynamic programming. However, these methods are limited to two objectives and, most of the time, cannot be applied to the complete problem, which is often a large-scale MOP. Therefore, sub search spaces have to be defined in order to be able to use exact methods. Hence, in the same manner as the hybridization of metaheuristics, the cooperation of metaheuristics and exact methods is also a main issue in this project. Indeed, it makes it possible to use the exploration capacity of metaheuristics as well as the intensification ability of exact methods, which are able to find optimal solution(s) in a restricted search space. Sub search spaces will have to be defined along the search. Such strategies can be found in the literature, but they have only been applied to mono-objective academic problems.

We have extended the previous taxonomy for hybrid metaheuristics to the cooperation between exact methods and metaheuristics. Using this taxonomy, we are investigating cooperative multi-objective methods. In this context, several types of cooperation may be considered, according to the way the metaheuristic and the exact method cooperate. For instance, a metaheuristic can use an exact method for intensification, or an exact method can use a metaheuristic to reduce the search space.
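One such cooperation scheme can be sketched on a toy problem: a metaheuristic result fixes part of the solution, and an exact method exhaustively searches the remaining sub search space. The 0/1 knapsack instance and the fixed prefix below are hypothetical, used only to show the division of labor.

```python
# Sketch of one cooperation scheme: the metaheuristic fixes part of the
# solution, then an exact method (exhaustive enumeration here) finds the
# best completion in the reduced sub search space. Toy 0/1 knapsack data.
from itertools import product

values  = [6, 5, 8, 9, 6, 7, 3]
weights = [2, 3, 6, 7, 5, 9, 4]
CAP = 15

def total(sol):
    w = sum(wi for wi, x in zip(weights, sol) if x)
    v = sum(vi for vi, x in zip(values, sol) if x)
    return v if w <= CAP else -1  # infeasible solutions score -1

# Suppose the metaheuristic has confidently fixed the first three items.
fixed = (1, 1, 0)

# Exact intensification: enumerate every completion of the free variables.
best = max((fixed + tail for tail in product((0, 1), repeat=4)), key=total)
print(best, total(best))
```

In a realistic setting the enumeration would be replaced by a branch-and-bound restricted to the sub-space, but the division of roles (diversification vs. intensification) is the same.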

Moreover, part of the DOLPHIN project deals with studying exact methods in the multi-objective context in order: i) to be able to solve small-size problems and to validate the proposed heuristic approaches; ii) to obtain more efficient/dedicated exact methods that can be hybridized with metaheuristics. In this context, the use of parallelism will push back the limits of exact methods, which will be able to explore larger search spaces.

Based on previous works on multi-objective optimization, it appears that, to improve metaheuristics, it becomes essential to integrate knowledge about the problem structure into the search. Regarding the structural, hybridization, and cooperation aspects, the objectives of the DOLPHIN project are to deepen these studies as follows:

*Design of auto-adaptive methods:* To improve metaheuristics, it becomes essential to integrate knowledge about the problem structure, which may be gained during the execution. This allows the adaptation of operators, which may or may not be specific to multi-objective optimization. The goal here is to design auto-adaptive methods that are able to react to the problem structure.

*Design of cooperative metaheuristics:* Previous studies show the interest of hybridization for global optimization and the importance of studying the problem structure for the design of efficient methods. It is now necessary to generalize metaheuristic hybridization and to propose adaptive hybrid models that may evolve during the search while selecting the appropriate metaheuristic. Multi-objective aspects have to be introduced in order to cope with the specificities of multi-objective optimization.

*Design of cooperative schemes between exact methods and metaheuristics:* Once the study of possible cooperation schemes is completed, we will have to test and compare them in the multi-objective context.

*Design and use of parallel models:* Works on parallel metaheuristics allow us to speed up the resolution of large-scale problems. It would also be interesting to study the robustness of the different parallel models (in particular in the multi-objective case) and to propose rules that determine, for a given problem, which kind of parallelism to use.

Of course, these goals are not disjoint, and it will be interesting to simultaneously use hybrid metaheuristics and exact methods. Moreover, these advanced mechanisms may require the use of parallel and distributed computing in order to easily make cooperating methods evolve simultaneously and to speed up the resolution of large-scale problems.

As before, we proceed in two phases: validation on academic problems, for which some best known results exist, and application to real (industrial) problems to cope with problem-size constraints.

We also integrate parallel and distributed multi-objective aspects in the ParadisEO platform (see the paragraph on the software platform).

Parallel and distributed computing may be considered as a tool to speed up the search for solving large MOPs and to improve the robustness of a given method. Moreover, the joint use of parallelism and cooperation improves the quality of the obtained Pareto sets. Following this objective, we will design and implement parallel models for metaheuristics (evolutionary algorithms, tabu search) and for exact methods (branch-and-bound, branch-and-cut) for solving different large MOPs.

One of the goals of the DOLPHIN project is to integrate the developed parallel models into software frameworks. Several frameworks for parallel and distributed metaheuristics have been proposed in the literature. Most of them focus only on evolutionary algorithms or on local search methods; only a few frameworks are dedicated to the design of both families of methods. On the other hand, existing optimization frameworks either do not provide parallelism at all or supply at most one parallel model. In this project, a new framework for parallel hybrid metaheuristics is proposed, named *Parallel and Distributed Evolving Objects (ParadisEO)*, based on EO. The framework provides, in a transparent way, the hybridization mechanisms presented in the previous section and the parallel models described in the next section. The parallel exact methods developed for MOPs will be integrated into well-known frameworks such as COIN.

According to the family of metaheuristics addressed, we may distinguish two categories of parallel models: parallel models managing a single solution, and parallel models that handle a population of solutions. The major single-solution-based parallel models are the *parallel neighborhood exploration model* and the *multi-start model*.

*The parallel neighborhood exploration model* is basically a ``low-level'' model that splits the neighborhood into partitions that are explored and evaluated in parallel. This model is particularly interesting when the evaluation of each solution is costly and/or when the size of the neighborhood is large. It has been successfully applied to the mobile network design problem (see the Application section).
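The model can be sketched as follows on a toy permutation problem. Threads stand in for the parallel workers purely for illustration; a real deployment with costly evaluations would use processes or MPI ranks, and the cost function here is hypothetical.

```python
# Sketch of the parallel neighborhood exploration model: the swap
# neighborhood is split into partitions evaluated concurrently, and the
# best neighbor over all partitions is kept. Illustrative code.
from concurrent.futures import ThreadPoolExecutor

def cost(perm):
    return sum(abs(i - v) for i, v in enumerate(perm))

def swap(perm, i, j):
    s = list(perm)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def best_in_partition(args):
    # Each worker evaluates only its own slice of the neighborhood.
    perm, moves = args
    return min((swap(perm, i, j) for i, j in moves), key=cost)

current = (4, 3, 2, 1, 0)
moves = [(i, j) for i in range(5) for j in range(i + 1, 5)]
partitions = [moves[k::4] for k in range(4)]  # 4 parallel partitions

with ThreadPoolExecutor(max_workers=4) as pool:
    candidates = list(pool.map(best_in_partition,
                               [(current, part) for part in partitions]))

best_neighbor = min(candidates, key=cost)
print(cost(current), cost(best_neighbor))
```

The reduction at the end (taking the minimum over the partition winners) is what keeps the parallel exploration equivalent to a sequential best-improvement step.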

*The multi-start model* consists in executing several local searches in parallel (possibly heterogeneous ones), without any information exchange. This model raises a particular question: is executing k local searches for a time t equivalent to executing a single local search for a time k × t? To answer this question, we tested a multi-start tabu search on the quadratic assignment problem. The experiments have shown that the answer is often landscape-dependent. For example, the multi-start model may be well suited to landscapes with multiple basins.

On the other hand, the main parallel models that handle a population of solutions are: the *island model*, the *central model*, and the *distributed evaluation of a single solution*. Note that the last model may also be used with single-solution metaheuristics.

In *the island model*, the population is split into several sub-populations distributed among different processors. Each processor is responsible for the evolution of one sub-population, executing all the steps of the metaheuristic, from selection to replacement. After a given number of generations (synchronous communication), or when a convergence threshold is reached (asynchronous communication), the migration process is activated. Solutions are then exchanged between sub-populations, and the received solutions are integrated into the local sub-population.

*The central (master/worker) model* preserves the sequentiality of the original algorithm. The master centralizes the population and manages the selection and replacement steps. It sends sub-populations to the workers, which execute the recombination and evaluation steps and return the newly evaluated solutions to the master. This approach is efficient when the generation and evaluation of new solutions is costly.

*The distributed evaluation model* consists in the parallel evaluation of each solution. This model is appropriate when, for example, the evaluation of a solution requires access to very large databases (as in data mining applications) that may be distributed over several processors. It may also be useful in a multi-objective context, where several objectives have to be computed simultaneously for a single solution.

As these models have now been identified, our objective is to study them in the multi-objective context in order to use them advisedly. Moreover, these models may be merged to combine different levels of parallelism and to obtain more efficient methods.

In recent years, loosely and closely coupled clusters of workstations have become a real architectural alternative to traditional supercomputing environments. As a matter of fact, workstations are increasingly powerful while becoming less and less expensive. In addition, interconnection networks have undergone a great technological evolution that makes them faster (Myrinet, GigaEthernet, etc.). Our research group has developed a parallel adaptive programming environment named *MARS (Multi-user Adaptive Resource Scheduler)*. MARS provides programmers with an API.

Our objectives regarding these issues are the following:

*Design of parallel models for metaheuristics and exact methods for MOPs*: We will develop parallel cooperative metaheuristics (evolutionary algorithms and local search methods such as tabu search) for solving different large MOPs. Moreover, we are designing a new exact method, named PPM (Parallel Partition Method), based on branch-and-bound and branch-and-cut algorithms. Finally, some parallel cooperation schemes between metaheuristics and exact algorithms will be used to solve MOPs in an efficient manner.

*Integration of the parallel models into software frameworks*: The parallel models for metaheuristics will be integrated in the ParadisEO software framework. The proposed multi-objective exact methods must first be integrated into standard frameworks for exact methods such as COIN and BOB++. A *coupling* with ParadisEO is then needed to provide hybridization between metaheuristics and exact methods.

*Efficient deployment of the parallel models on different parallel and distributed architectures, including grids*: The designed algorithms and frameworks will be efficiently deployed on non-dedicated networks of workstations, dedicated clusters of workstations, and SMP (Symmetric Multi-Processor) machines. For grid computing platforms, peer-to-peer (P2P) middleware (XtremWeb, Condor) will be used to implement our frameworks. For this purpose, the different optimization algorithms may need to be revisited for their efficient deployment.

In this project, some well-known optimization problems are revisited in terms of multi-objective modeling and resolution:

Flow-shop scheduling problem: The flow-shop problem is one of the best-known scheduling problems. However, most works in the literature use a mono-objective model, in which the minimized objective is generally the total completion time (makespan). Many other criteria may be used to schedule tasks on different machines: maximum tardiness, total tardiness, mean job flowtime, number of delayed jobs, maximum job flowtime, etc. In the DOLPHIN project, a bi-criteria model that consists in minimizing the makespan and the total tardiness is studied.
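The two criteria of this bi-objective model can be computed as follows for a permutation flow shop. The processing times and due dates below are illustrative, not benchmark data.

```python
# Sketch of the two flow-shop criteria: makespan (C_max) and total
# tardiness, evaluated for a given job permutation on a permutation
# flow shop. Data values are illustrative.

def objectives(sequence, proc, due):
    """proc[j][m] = processing time of job j on machine m."""
    machines = len(proc[0])
    completion = [0.0] * machines  # completion time of the last job per machine
    tardiness = 0.0
    for j in sequence:
        for m in range(machines):
            # A job starts on machine m when both the machine is free and
            # the job has finished on machine m-1.
            start = max(completion[m], completion[m - 1] if m else 0.0)
            completion[m] = start + proc[j][m]
        tardiness += max(0.0, completion[-1] - due[j])
    return completion[-1], tardiness  # (makespan, total tardiness)

proc = [[3, 2], [1, 4], [2, 2]]   # 3 jobs, 2 machines
due  = [6, 7, 9]
print(objectives([0, 1, 2], proc, due))  # → (11.0, 4.0)
```

Different permutations trade the two objectives off against each other, which is exactly what makes the bi-criteria Pareto front non-trivial.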

Routing problems: The vehicle routing problem (VRP) is a well-known problem that has been studied since the late 1950s. It has many practical applications in industrial areas such as transport and logistics. Existing studies of the VRP are almost all concerned with the minimization of the total distance only. The model studied in the DOLPHIN project introduces a second objective, whose purpose is to balance the lengths of the tours. This new criterion is expressed as the minimization of the difference between the length of the longest tour and the length of the shortest tour. As far as we know, this model is one of the pioneering works in the literature.
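The pair of objectives of this bi-objective VRP model can be sketched as follows. The depot position, customer coordinates, and routes are illustrative placeholders.

```python
# Sketch of the two bi-objective VRP criteria: total distance, and tour
# balance defined as the difference between the longest and the shortest
# tour length. Coordinates and routes are illustrative.
import math

depot = (0.0, 0.0)
customers = {1: (1.0, 0.0), 2: (2.0, 0.0), 3: (0.0, 3.0), 4: (0.0, 5.0)}

def tour_length(route):
    # Each tour starts and ends at the depot.
    stops = [depot] + [customers[c] for c in route] + [depot]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))

def objectives(routes):
    lengths = [tour_length(r) for r in routes]
    return sum(lengths), max(lengths) - min(lengths)

print(objectives([[1, 2], [3, 4]]))  # → (14.0, 6.0)
```

A solution that reassigns customers between the two tours may slightly increase the total distance while sharply reducing the imbalance, which is the trade-off this model captures.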

The second routing problem is a generalization of the covering tour problem (CTP). In the DOLPHIN project, this problem is solved as a bi-objective problem in which a set of constraints is modeled as an objective. The two objectives are: i) minimization of the length of the tour; ii) minimization of the largest distance between a node to be covered and a visited node. As far as we know, this study is among the first works that tackle a classic mono-objective routing problem by relaxing constraints and building a more general MOP.

For all the studied problems, standard benchmarks have been extended to the multi-objective case. The benchmarks and the obtained results (optimal Pareto front, best known Pareto front) are available on the Web pages associated with the project and on the MCDM (International Society on Multiple Criteria Decision Making) web site. This is an important issue for encouraging comparison experiments within the research community.

With the extraordinary success of mobile telecommunication systems, service providers have been making huge investments in network infrastructures. Mobile network design is thus of utmost importance and is a major issue in mobile telecommunication systems. The design of large cellular networks is a complex task with a great impact on the quality of service and the cost of the network. With the continuous and rapid growth of communication traffic, large-scale planning becomes more and more difficult, and automatic or interactive optimization algorithms and tools would be very helpful. Advances in this area will certainly lead to important improvements in service quality and deployment cost.

This need is now even more acute with the advent of third-generation systems, such as Universal Mobile Telecommunication System (UMTS), because of the increased complexity of the system and the number of parameters that must be considered:

large deployment zones (many hundreds of km² for an urban network);

high density (many urban networks with more than 1000 antennas);

multi-period planning (important long-term planning investments);

introduction of many cooperating systems with different technologies (GSM, DCS, UMTS).

In this project, solutions to planning problems, in terms of modeling and resolution, are developed in a multi-criteria context combining financial criteria (cost of the network), technical criteria (coverage, availability), and marketing criteria (quality of service). Two complementary design problems are considered:

Radio mobile network design: This work is carried out in collaboration with France Telecom R&D. The engineering of radio mobile telecommunication networks involves two major problems: the design of the radio network and the frequency assignment. The design consists in positioning base stations (BS) on potential sites in order to fulfill certain objectives and constraints, while frequency planning assigns frequencies to the BS under reuse criteria. In this project, we address the first problem. Network design is an NP-hard combinatorial optimization problem. The BS problem consists in finding a set of sites for antennas from a set of pre-defined candidate sites, determining the type and number of antennas, and setting up the configuration of the different antenna parameters (tilt, azimuth, power, etc.). A new formulation of the problem as a multi-objective constrained combinatorial optimization problem is considered. The model deals with specific objectives and constraints arising from the engineering of cellular radio networks. Reducing costs without sacrificing the quality of service is a key concern. Most of the models proposed in the literature optimize a single objective (coverage, cost, a linear aggregation of objectives, etc.).

Access network design: This work is carried out in collaboration with Mobinets. The problem consists in minimizing the cost of the access network while maximizing its availability. Operators can only be competitive and economical if they have an optimized access network. Since transmission costs are becoming high compared to equipment costs, and traffic demands are increasing with the introduction of new services, it is vital for operators to find cost-optimized transmission network solutions at higher bit rates. Many constraints related to technologies and service providers have to be satisfied. All the important deployed technologies (e.g., GSM, UMTS) will be considered.

Bioinformatics research is a great challenge for our society, and numerous research entities from different specialties (biology, medicine, information technology) are joining forces to collaborate on specific themes.

Regarding the genomics application, we collaborate with academic and industrial partners (IBL: Biology Institute of Lille; IPL: Pasteur Institute of Lille; the IT-Omics firm) to study genetic factors that may explain multi-factorial diseases such as diabetes, obesity, or cardiovascular diseases. The originality is to look not for a single factor, but for one or several combinations of factors (possibly of different natures: genetic factors, environmental factors, etc.) among a very large set of potential factors (several thousands). The scientific goal is to formulate hypotheses describing associations that may have an influence on the diseases under study. These hypotheses will then have to be verified by biologists through additional experiments.

The genomic application of the DOLPHIN project deals with post-genomics, where very large amounts of data, obtained thanks to advanced technologies, have to be analyzed. Hence, one of the goals of the project is to develop analysis methods in order to extract information from data coming from biological experiments. Our originality is to first extract knowledge discovery problems from the genomic application and then to transform these knowledge discovery (data mining) problems into optimization problems where one or several objective functions have to be optimized. Optimization methods can then be applied to these reformulated problems.

An analysis of some genomic problems led us to model them as an association rule mining problem (a classical data mining task). The main characteristic is that we look for several associations of factors among a very large set of potential factors. Hence the combinatorics (the number of potential solutions) associated with the problem is huge. Moreover, the landscape study, which reveals a flat and rugged landscape, indicates that exploratory methods are required to deal with such problems. Finally, a study of the criteria commonly used for association rule mining shows that many criteria exist and that no universal criterion emerges. Hence we propose to model the association rule mining problem as a multi-objective one.

For all these reasons, we chose to base our resolution approach on evolutionary algorithms. We are now working on solving these problems from a multi-objective point of view, and on hybridizing evolutionary algorithms with exact methods able to solve sub-problems (small-size problems). Another important point is that the evaluation functions of such applications may be very time consuming, because they require complex statistical computations. Therefore a parallel implementation is required in order to allow a large exploration of the search space without degrading the computational time.

In recent years, scientists have adopted the term proteomics, which designates the global analysis of proteins. The study of proteins is indeed very important to understand the biological mechanisms at work in living cells, but also how different factors can influence them. The main goal of proteomics is to identify experimental proteins. In this domain, we collaborate with the team of C. Rollando (Research Director at CNRS), head of the proteomics platform of the Genopole of Lille.

Our objective is to automatically discover proteins and new protein variants from experimental spectra. The identification of protein variants and new proteins is a complex problem. Indeed, it cannot be reduced to a simple scoring of an experimental protein against protein databases: additional processes are needed to explore the huge space of potential solutions. Regarding protein variants, there are many modifications that a protein can undergo: insertion, deletion or substitution of an amino acid, as well as post-translational modifications. So it is not realistic to generate all possible combinations of modifications for a given protein size (exponential complexity). The new protein identification problem is similar, since we cannot generate all possible proteins (with their modifications) in order to find the experimental one. For both problems, an optimization method is necessary.

PARallel and DIStributed Evolving Objects (PARADISEO) is an extension of the template-based, ANSI-C++ compliant evolutionary computation library EO. Thanks to the collaboration with Malaga (Spain), we have added Scatter Search to PARADISEO. We have also integrated Multi-Objective Evolving Objects, embedding some features related to Pareto optimization, in particular SPEA 1 and 2, and we have validated the software on the flow-shop problem.

See the web page http://www.lifl.fr/OPAC

ASC_ME is an original algorithm for peptide mass fingerprinting directly from the mass spectrum, without mass list extraction. First, each protein referenced in the FASTA database is digested according to enzyme specificity. At this stage, a formal proof of the completeness of the digestion algorithm will be given. The isotopic cluster of each peptide is then calculated by Fast Fourier Transform, using the algorithm introduced by A.L. Rockwood. The predicted spectrum is then matched against the experimental spectrum. The simplest scoring, based on spectrum multiplication, gave better results than the classical peptide mass fingerprinting engines. The average time for performing all the steps (digestion, spectrum simulation, scoring) is a few seconds per protein, so a non-redundant human FASTA base is scanned in one day.
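The isotopic-cluster computation and the multiplication scoring can be illustrated with a minimal sketch (not the ASC_ME code: the isotope abundances below are standard natural abundances truncated for readability, and plain polynomial convolution stands in for the FFT, which only accelerates this step):

```python
# Isotope abundances indexed by nominal mass shift (+0, +1, +2 Da).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99632, 0.00368],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9493, 0.0076, 0.0429],
}

def convolve(a, b):
    """Distribution (polynomial) convolution; the step an FFT would speed up."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def isotopic_cluster(formula, max_peaks=6):
    """Relative abundances of the +0, +1, +2, ... Da peaks of a molecule.

    `formula` maps element symbol to atom count, e.g. {"C": 6, "H": 12, "O": 6}.
    The cluster is the repeated convolution of per-atom abundance vectors.
    """
    dist = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            dist = convolve(dist, ISOTOPES[element])
    return dist[:max_peaks]

def multiplication_score(predicted, experimental):
    """Point-wise product score between two aligned intensity vectors,
    in the spirit of the simple spectrum-multiplication scoring above."""
    return sum(p * e for p, e in zip(predicted, experimental))
```

For glucose (`{"C": 6, "H": 12, "O": 6}`) the monoisotopic peak dominates and the +1 and +2 peaks decay quickly, as expected for a small molecule.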

See the web page http://www.lifl.fr/OPAC.

We have proposed a new exact method, called the Parallel Partitioning Method (PPM), able to efficiently solve bi-objective problems. This method is based on splitting the search space into several areas, leading to elementary exact searches, and determines the whole Pareto front in three stages. First, the two extreme efficient solutions are computed in order to limit the search space. In the second stage, well-spread solutions are searched for, thanks to an ε-constraint approach, in order to divide the search space into well-balanced partitions. The third stage consists in finding the other efficient solutions in each partition. This method has a parallel behavior which may increase its performance. It has been applied to a bi-objective flow-shop and gave interesting results.
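The ε-constraint step at the heart of the second stage can be illustrated on a toy enumerable problem (an illustrative sketch, not the PPM implementation: solutions, objectives and partition counts are made up for the example):

```python
def epsilon_constraint(solutions, f1, f2, eps):
    """Best-f1 solution among those with f2 <= eps (None if infeasible)."""
    feasible = [s for s in solutions if f2(s) <= eps]
    return min(feasible, key=f1) if feasible else None

def pareto_front(solutions, f1, f2, n_partitions=4):
    """Mimic PPM stage 2: sweep eps between the two extreme solutions,
    collecting one well-spread efficient solution per eps level."""
    lo = min(f2(s) for s in solutions)   # f2 of the f2-optimal extreme
    hi = f2(min(solutions, key=f1))      # f2 of the f1-optimal extreme
    front = set()
    for k in range(n_partitions + 1):
        eps = lo + (hi - lo) * k / n_partitions
        s = epsilon_constraint(solutions, f1, f2, eps)
        if s is not None:
            front.add(s)
    return sorted(front, key=f1)
```

On a small point set this sweep returns well-spread efficient solutions; the efficient points it skips are exactly those that PPM's third stage would recover inside each partition.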

There are many methods, but none is specially effective for the identification of protein variants and new proteins. We have developed a completely new evaluation function, called FFT (for Fast Fourier Transform), based on an algorithm developed by A.L. Rockwood to compute isotopic distributions. We have developed an optimized version of this algorithm in order to build simulated spectra, used to score experimental MS spectra against simulated spectra. It is the first time that a scoring using MS data works directly on the experimental spectrum, without extracting the mono-isotopic list from the experimental spectrum.

This score has been validated by searching for known proteins in a database. Typically, given experimental data corresponding to a well-known protein, the function found the correct protein by parsing a protein database in FASTA format (the UniProt database http://www.expasy.uniprot.org/database/download.shtml, UniProtKB/Swiss-Prot in FASTA format).

This score will now be used in optimization methods to discover new proteins and new variants.

Association rule mining is a very general data mining model. It makes it possible to find relationships between the items of a database. The main quality measures used to identify association rules are the support and the confidence: the first one identifies frequent rules whereas the second one identifies true rules. In the case of genomic data, biologists are not really interested in frequent rules, and other criteria have to be taken into account. Hence, we made a statistical analysis of about 20 criteria that have been proposed in the literature to evaluate association rules. This analysis was carried out on several databases (coming from genomics and other application fields). It showed that some criteria are strongly correlated whereas others are complementary. Based on this analysis, we propose a complete model for the association rule problem using five complementary criteria: support, confidence, conviction, interest and surprise. This model has been validated, as it has been used by a multi-objective evolutionary algorithm that managed to find interesting rules.
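For reference, the five criteria can be written down concretely. The sketch below uses the standard textbook definitions of support, confidence, conviction and interest (lift); the exact definition of surprise used in the project is not recalled here, so a common variant (leverage) is taken as an assumption:

```python
def rule_measures(transactions, antecedent, consequent):
    """Quality measures of the rule antecedent -> consequent over item sets."""
    n = len(transactions)
    p_a = sum(1 for t in transactions if antecedent <= t) / n
    p_b = sum(1 for t in transactions if consequent <= t) / n
    p_ab = sum(1 for t in transactions if (antecedent | consequent) <= t) / n
    confidence = p_ab / p_a if p_a else 0.0
    # conviction = P(A)P(not B) / P(A and not B); infinite for exact rules
    p_a_notb = p_a - p_ab
    conviction = (p_a * (1 - p_b) / p_a_notb) if p_a_notb else float("inf")
    interest = p_ab / (p_a * p_b) if p_a * p_b else 0.0   # a.k.a. lift
    surprise = p_ab - p_a * p_b   # leverage variant, taken as an assumption
    return {"support": p_ab, "confidence": confidence,
            "conviction": conviction, "interest": interest,
            "surprise": surprise}
```

On a toy set of five transactions over items g1, g2, g3, the rule {g1} -> {g2} illustrates how complementary the criteria are: a rule can have high support and confidence yet interest below 1, meaning the factors co-occur less often than independence would predict.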

Within the framework of the doctorate thesis of Mohand Mezmaz, a grid-based approach for the Branch-and-Bound algorithm has been proposed. The approach deals with dynamic load balancing and checkpointing-based fault tolerance. Indeed, the irregular nature of the tree explored by the algorithm and the volatile, large-scale characteristics of the grid involve a large number of load balancing and checkpointing operations. These operations imply an exorbitant communication and storage cost associated with the work units (collections of nodes) dynamically generated. Therefore, the proposed approach is based on special codings of the explored tree and of the work units, which optimize the dynamic distribution and checkpointing processes on the grid.
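The compactness such codings buy can be sketched as follows (an assumption-level illustration, not the exact coding of the thesis): since each permutation of n jobs has a lexicographic rank in [0, n!), a work unit covering a set of unexplored schedules can be checkpointed as a pair of integers [lo, hi) rather than as a collection of nodes, and split for load balancing by cutting the interval.

```python
from math import factorial

def unrank(rank, n):
    """Permutation of 0..n-1 with the given lexicographic rank (Lehmer code)."""
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        f = factorial(i - 1)
        perm.append(items.pop(rank // f))  # choose the (rank // f)-th remaining item
        rank %= f
    return perm

def split(unit):
    """Cut a work unit [lo, hi) in two halves for dynamic load balancing."""
    lo, hi = unit
    mid = (lo + hi) // 2
    return (lo, mid), (mid, hi)
```

Two integers thus describe an arbitrarily large chunk of the search tree, which is what keeps the communication and storage cost of checkpointing bounded.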

The algorithm has been applied to the flow-shop scheduling problem, one of the hardest challenge problems in combinatorial optimization. As previously presented, the problem roughly consists in finding a schedule of a set of jobs on a set of machines that minimizes the total execution time (makespan). The jobs must be scheduled in the same order on all machines, and each machine cannot be simultaneously assigned to two jobs. Using our algorithm, a 50-job, 20-machine instance has been optimally solved for the first time. The method not only improved the best known solution for this instance, it also proved the optimality of the solution found. The experiments were performed on a grid including processors simultaneously from Grid5000 and from different educational networks of Université de Lille1 (Polytech'Lille, IEEA, IUT "A"). The number of processors averaged approximately 500 and peaked at 1245 machines. The optimal solution was found on December 4th, with a total wall-clock time of seven weeks.
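As an illustration of the objective function involved, the makespan of a permutation flow-shop schedule follows from the classical recurrence (a textbook sketch, not the grid B&B code; `p[j][m]` is the processing time of job j on machine m):

```python
def makespan(p, order):
    """Makespan of a permutation flow-shop schedule.

    Jobs are processed in the same `order` on every machine; a job starts on
    machine m once it has left machine m-1 and machine m has finished the
    previous job.
    """
    n_machines = len(p[0])
    # completion[m] = completion time of the last scheduled job on machine m
    completion = [0] * n_machines
    for j in order:
        for m in range(n_machines):
            start = max(completion[m], completion[m - 1] if m else 0)
            completion[m] = start + p[j][m]
    return completion[-1]
```

Even on a 2-job, 2-machine toy instance the two possible orders give different makespans, which is exactly the degree of freedom the branch-and-bound explores over the 50! orderings of the solved instance.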

A first version of the algorithm has been published in Springer Verlag LNCS 3470, the proceedings of the European Grid Conference. The improved algorithm has been presented in the HDR report of Nouredine Melab.

The design of grid-aware metaheuristics often involves the cost of a painful apprenticeship in parallelization techniques and complex grid computing technologies. In order to free developers who are unfamiliar with those advanced features from such a burden, combinatorial optimization frameworks must integrate up-to-date parallelization techniques and allow their transparent exploitation and deployment on computational grids. We recently proposed a framework called *ParadisEO* dedicated to the reusable design of parallel metaheuristics, initially targeting only dedicated parallel hardware platforms. During the last year (2004-2005) of the doctorate thesis of Sebastien Cahon, ParadisEO was extended (*ParadisEO-CMW*) to allow the design and deployment of these parallel metaheuristics on computational grids. The extension consisted of two steps. Firstly, the design and algorithmics associated with the parallel models provided and encapsulated in the framework were revisited, in order to take into account the major characteristics of grids: multiple administrative domains, large scale, volatility and heterogeneity. Secondly, ParadisEO was coupled with the Condor-MW grid middleware. ParadisEO-CMW has been validated on two real-world problems: mobile network design (contract with France Telecom R&D) and the building of predictive models in NIR spectrum analysis (collaboration with the LASIR laboratory of Université de Lille1).

This work has been published in the proceedings of the *IEEE/ACM CCGRID'2005* international conference. An extended version of the paper has been accepted with minor revisions for publication in the *Journal of Parallel and Distributed Computing* and is currently under final decision. The ParadisEO-CMW free software is available at: http://www.lifl.fr/~cahon/paradisEO.

(2003-2005): The cooperation with France Telecom has been reinforced with this action. The objective here is the design and implementation of a framework for solving multi-objective optimization problems in the mobile telecommunications area.

(2004-2006): In this recent contract, the objective is to model and solve the access network design problem (GSM and UMTS technologies).

MOST Project ``Methodologies for Optimization in Transport and Telecommunications Systems'' (2000-2006) of the CPER (Regional Contract) TACT operation. The OPAC / DOLPHIN team leads the action ``Multi-objective optimization''.

ACI ``Masse de données'' (Large dataset) Project GGM ``Geno-Medical Grid'' (2004-2007), in collaboration with LIRIS (Lyon) and IRIT (Toulouse) laboratories. Our concern in this project is the design and implementation of parallel multi-objective optimization techniques to extract association rules from large and distributed genomic and medical data.

ACI GRID'5000 Grant (2004-2007). This project deals with the deployment of a national grid platform. Our laboratory will host a cluster of more than 250 PCs. The DOLPHIN project coordinated this action for Lille.

Arcir project "Puces Nano 3D" (2004-2006), supported by the Region. This project aims at the creation of new types of microarrays. It is a collaboration with IEMN (Institut d'Electronique, Microélectronique et Nanotechnologies de Lille), IBL (Institut de Biologie de Lille) and IPL (Institut de Biopuces de Lille).

Member of the European Network of Excellence EvoNet: The goal of this network is to coordinate research on evolutionary computation between academic research laboratories and industry. Our team is mainly associated with the actions EvoTel (Evolutionary computation for Telecommunications) and EvoBio (Evolutionary computation for Bioinformatics).

University of Malaga (Spain, 2003-2005): A collaboration has been initiated on optimization frameworks on Grids.

University of Luxembourg (2004-2007): A common project has been initiated on robust optimization for cutting problems.

University of Constantine (2005-2007): CMEP program with the University of Constantine (Algeria) on "Metaheuristics for optimization of hard problems"

European project GRAAL (2004-2007) on algorithms for optimization in telecommunications.

The project had some visitors during the year 2005:

Enrique Torres Alba (Malaga, Spain)

Francesco Luna (Malaga, Spain)

Gabriel Luque (Malaga, Spain)

Mohammed El Bachir Menai (Constantine, Algeria)

Mohamed Batouche (Constantine, Algeria)

Mohamed-Khireddine Kholladi (Constantine, Algeria)

Khaled Mellouli (HEC, Tunis, Tunisia)

Co-founder and co-chair of the group META (Metaheuristics: Theory and Applications, http://www.lifl.fr/~talbi/META). This group is associated with the ROADEF (French Operations Research Society) and the CNRS research groups GDR ALP and MACS.

Chair of the group PM2O (Multi-objective Mathematical Programming, http://www.lifl.fr/PM2O). This group is associated to the ROADEF (French Operations Research Society), and the CNRS research group GDR I3.

Direction of the CIB (Bioinformatics Center) of the Genopole of Lille.

Scientific Committee of the Genopole of Lille.

EURO-PAREO (European working group on Parallel Processing in Operations Research).

EURO-EU/ME (European working group on Metaheuristics).

ECCO (European Chapter on Combinatorial Optimization).

JET national group on evolutionary computation.

ERCIM (European Research Consortium for Informatics and Mathematics) working group on Soft Computing.

Special issue of the international journal "International Journal on Foundations of Computer Science" on parallel computing for complex problem solving (2005).

Book on "Parallel Combinatorial Optimization" (Wiley & Sons, USA, 2005).

E. Alba, E-G. Talbi, A. Zomaya and F. Ercal, in IJFCS "International Journal of Foundations of Computer Science", special issue of the international journal on parallel computing for complex problem solving, 2005.

P. Siarry, E-G. Talbi, "Metaheuristics for complex optimization problems", in JESA "Journal Européen des Systèmes Automatisés", 2005.

EA'2005 (International Conference on Artificial Evolution)

ROADEF'2006 (Major National Conference in Operations Research in France)

NIDISC Workshop (International Workshop on Nature Inspired Distributed Computing) organized jointly with ACM/IEEE IPDPS (International Parallel and Distributed Processing Symposium): NIDISC'05(Denver, USA), NIDISC'06 (Rhode Island, USA).

International Conference EA'2005 (Artificial Evolution), Oct 2005, Lille, France.

EGC'2006 (Extraction and Gestion des Connaissance), Jan 2006, Lille, France.

ROADEF'2006 (Major National Conference in Operations Research in France), Feb. 2006, Lille, France.

Flowshop Contest: the Scheduling Challenge, Oct 2005. http://www.lifl.fr/~talbi/challenge/index.html

Review of journal papers:

Parallel Computing

IEEE Transactions on Systems Man and Cybernetics

Annals of Operations Research

Calculateurs Parallèles

IEEE Transactions on Parallel and Distributed Systems

Journal of Supercomputing

IEEE Transactions on Evolutionary Computation

Parallel and Distributed Computing Practices

Genetic Programming and Evolvable Machines

Journal of Heuristics

European Journal of Operational Research

Journal of Computational Optimization and Applications

Information Processing Letters

Extraction de connaissances et apprentissage

European Physical Journal B

4OR

Journal of Mathematical Modelling and Algorithms

Bioinformatics

International Journal of Production Economics

Computers and Operation Research

...

Review of different projects :

JEI (Jeune Equipe Innovante) of the DRRT (Délégation Régionale à la Recherche et à la Technologie) of the French research ministry (2005)

Expert for ANR "Calcul intensif et Grilles de calcul" (2005)

Expert for evaluation of LARODEC laboratory (Tunis, Tunisia, 2005)

Expert for Australian Research Council (ARC, Australia, 2005)

International Conferences on Evolutionary Computation:

CEC (Congress on Evolutionary Computation): CEC'05 (Edinburgh).

GECCO (Genetic and Evolutionary Computation Conference): GECCO'2005.

EvoCOP (European Conference on Evolutionary Computation in Combinatorial Optimization): EvoCOP'2005

EvoBIO (European Workshop on Evolutionary Computation and Bioinformatics): EvoBio'2005

MIC (Metaheuristics International Conference): MIC 2005, Vienna, Austria, Aug 2005

HM'2005, Second International Workshop on Hybrid Metaheuristics, Barcelona, Spain, Aug 2005

EMO "International Conference on Evolutionary Multi-criterion Optimization": EMO'05, Guanajuato, Mexico, Mar 2005

SIAM (International Conference on Data Mining): SIAM 2005, Newport Beach, USA, Apr 2005.

DEXA Workshop GLOBE'05 "Grid and peer-to-peer computing impacts on large scale heterogeneous distributed database systems", Copenhagen, Denmark, Aug 2005

IFIP NPC International Conference on Network and Parallel Computing: NPC'2005, Beijing, Nov 2005.

First Conference on Operations Research Practice in Africa, ORPA-1, Ouagadougou, Burkina Faso, April 2005

HiPCoMB'2005 "First IEEE Workshop on High Performance Computing in Medicine and Biology", Fukuoka, Japan, July 2005

ICCIB'2005 " International Conference on Biologically Inspired Computing and Computers in Biology", Alberta, Canada, May 2005

Prof. Talbi was a referee for the following theses:

June 2005, PhD of M. D. Hernandez at the University of Geneva and the Swiss Institute of Bioinformatics, Geneva, Switzerland, "Stratégies d'optimisation combinatoire pour le problème de l'alignement local multiple sans indels et application aux séquences protéiques". Jury: A. Danchin, E-G. Talbi, R. Gras, Ron D. Appe, B. Chopard.

July 2005, PhD of L. Baduel at the University of Nice-Sophia Antipolis, "Typed groups for the Grid". Jury: J. Montagnat, H. Bal, A. Schiper, E-G. Talbi, E. Cecchet, D. Caromel, F. Baude.

Dr Dhaenens was a referee for the following thesis:

June 2005, PhD of B. Eteve at the University of Tours, "Des problèmes d'ordonnancement multicritères de type juste-à-temps : modélisation et résolution". Jury: M-C Portmann, B. Perz, E. Salanville, C. Dhaenens, V. T'Kindt, J-C. Billaut.

Postgraduate: ``Optimization methods'' (E-G. Talbi).

Postgraduate: ``Data mining for bioinformatics'' (E-G. Talbi, L. Jourdan).

Undergraduate (USTL): ``Operations Research'' (L. Jourdan).

Undergraduate (Polytech'Lille): ``Operations Research'' (C. Dhaenens, E-G. Talbi).

Undergraduate (Polytech'Lille): ``Graphs and combinatorics'' (C. Dhaenens, F. Seynhaeve, E-G. Talbi).

Undergraduate (Polytech'Lille): ``Data mining'' (E-G. Talbi, C. Dhaenens).

Undergraduate (Polytech'Lille): ``Advanced Optimization'' (E-G. Talbi, F. Seynhaeve).

Undergraduate (Polytech'Lille): ``Production Management'' (C. Dhaenens).

Undergraduate (Polytech'Lille): ``Distributed Systems'' (N. Melab, S. Cahon, E-G. Talbi).

Undergraduate (Polytech'Lille): ``Advanced Networking'' (N. Melab).