Clime is part of CEREA (Atmospheric Environment Center, joint laboratory of École des Ponts ParisTech/EDF R&D). CEREA activities are focused on air pollution and concern the numerical modeling of atmospheric environment in order to assess the impact of transport and energy production. This explains the focus of Clime on air quality and Chemistry-Transport-Models (CTM). We especially developed data assimilation methods for coupling CTMs with observational data (real time data and/or field campaigns).

The international political and scientific context highlights the serious potential risks related to environmental problems and points to the role that models and observation systems can play in evaluating and forecasting these risks. At the political level, agreements, such as the Kyoto protocol, European directives on air quality or on major accident hazards involving dangerous substances (Seveso directive), and the French Grenelle de l'Environnement establish objectives for the mitigation of environmental risks. These objectives are supported at a scientific level by international initiatives, like the European GMES program (Global Monitoring of Environment and Security), or national programs, such as the Air Chemistry program, which give a long term structure to environmental research. These initiatives emphasize the importance of observational data and the potential of satellite acquisitions.

The complexity of the environmental phenomena as well as the operational objectives of risk mitigation necessitate an intensive interweaving between physical models, data processing, simulation, visualization and database tools.

This situation is met, for instance, in atmospheric pollution, an environmental domain whose modeling is gaining ever-increasing significance and impact, whether at local (air quality), regional (transboundary pollution) or global scale (greenhouse effect). In this domain, modeling systems are used for operational forecasts (short or long term), detailed case studies, impact studies for industrial sites, as well as coupled modeling (e.g., pollution and health, pollution and economy). These scientific subjects strongly require linking the models with all available data, whether of physical origin (e.g., model outputs), coming from raw observations (satellite acquisitions and/or information measured *in situ* by an observation network) or obtained by processing and analysis of these observations (e.g., chemical concentrations retrieved by inversion of a radiative transfer model).

Clime has been jointly created, by INRIA and École des Ponts ParisTech, for studying these questions with researchers in data assimilation, image processing, and modeling.

Clime carries out research activities in three main areas:

Data assimilation methods: inverse modeling, network design, ensemble methods, uncertainties estimation, uncertainties propagation...

Image assimilation: assimilating structures within environmental forecasting models, solving ill-posed image processing problems with data assimilation techniques, defining dynamic models from images.

Development of integrated chains for data/models/outputs (system architecture, workflow, database, visualization, ...).

This activity is currently one of the major concerns of environmental sciences. It involves setting up and using data assimilation methods, for instance variational methods (4D-Var). An emerging issue lies in the propagation of uncertainties by models, notably through ensemble forecasting methods.

Although modeling is not part of the scientific objectives of Clime, the project-team has complete access to models developed by CEREA: the models from Polyphemus (pollution forecasting from local to regional scales) and Code_Saturne (urban scale). In regard to other modeling domains, Clime accesses models through co-operation initiatives either directly (for instance, the ocean model developed at MHI, Ukraine, has been provided to the team), or indirectly (for instance, issues on image assimilation in meteorology are studied in collaboration with operational centres).

The research activities tackle scientific issues such as:

Within a family of models (differing by their physical formulations and numerical approximations), which is the optimal model for a given set of observations?

How to make a forecast (and a better forecast!) by using several models corresponding to different physical formulations? It also raises the question: how should data be assimilated in this context?

Which observational network should be set up to perform a better forecast, while taking into account additional criteria such as observation cost? What are the optimal location, type and mode of deployment of sensors? How should the trajectories of mobile sensors be operated, while the studied phenomenon is evolving in time? This issue is usually referred to as “network design”.

How to assess the quality of a forecast? How do data quality, missing data, data obtained from sub-optimal locations, affect the forecast? How to better include information on uncertainties (of data, of models) within the data assimilation system?

In geosciences, the issue of coupling data, in particular satellite acquisitions, and models is extensively studied for meteorology, oceanography, chemistry-transport models, land surface models. However, satellite images are mainly assimilated on a point-wise basis, without taking into account their spatial structures. To better understand our research orientation, a classification of image assimilation methods is proposed:

Image approach. Image assimilation allows the extraction of features from image sequences, for instance motion fields. A model of the dynamics is considered (often obtained by simplification of a physical model). An observation operator is defined to express the links between the model state and the pixel value. In the simplest case, the pixel value corresponds to one coordinate of the model state and the observation operator is a projection. However, in most cases, the operator is highly complex, implicit and non-linear. Data assimilation techniques are developed to control the initial state or the whole assimilation window. Image assimilation is also applied to learn reduced models from image data and estimate a reliable and small-size reconstruction of the dynamics. Fluminance and Clime are contributors on that topic.

Model approach. Image assimilation is used to control an environmental model and obtain improved forecasts. In order to take into account the spatial and temporal coherency of structures, specific image characteristics are considered, and dedicated norms and observation error covariances are defined. Moise and Clime are contributors on that topic.

Correcting a model. Another topic, mainly described for meteorology in the literature, concerns the location of structures. How to force the existence and to correct the location of structures in the model state using image information? Most of the operational meteorological forecasting institutes, such as MétéoFrance, UK-met, KNMI (in Netherlands), ZAMG (in Austria) and Met-No (in Norway), study this issue because operational forecasters often modify their forecasts based on comparisons between the model outputs and the structures displayed on satellite images. Research has been initiated in Clime on that problem.

An objective of Clime is to participate in the design and creation of software chains for impact assessment and environmental crisis management. Such software chains bring together static or dynamic databases, data assimilation systems, forecast models, processing methods for environmental data and images, complex visualization tools, scientific workflows, ...

Clime is currently building, in partnership with École des Ponts ParisTech and EDF R&D, such a system for air pollution modeling: Polyphemus (see web site http://

The central application domain of the project-team is atmospheric chemistry. We develop and maintain the air quality modeling system Polyphemus, which includes several numerical models (Gaussian models, Lagrangian model, two 3D Eulerian models including Polair3D) and their adjoints, and different high level methods: ensemble forecast, sequential and variational data assimilation algorithms. Advanced data assimilation, network design, inverse modeling and ensemble forecast are studied in the context of air chemistry; note that addressing these high level issues requires controlling the full software chain (models and data assimilation algorithms).

The activity on assimilation of satellite data is mainly carried out for meteorology and oceanography. This is addressed in cooperation with external partners who provide the numerical models. Concerning oceanography, the aim is to improve the forecast of ocean circulation, by assimilation of fronts and vortices displayed on the image data. Concerning meteorology, the focus is on correcting the model location of structures related to high-impact weather events (cyclones, convective storms, *etc*.) by assimilating the images.

Air quality modeling implies studying the interactions between meteorology and atmospheric chemistry in the various phases of matter, which leads to the development of highly complex models. The different usages of these models comprise operational forecasting, case studies, impact studies, *etc*., with both societal (e.g., public information on pollution forecast) and economic impacts (e.g., impact studies for dangerous industrial sites). Models lack some appropriate data, for instance better emissions, to perform an accurate forecast, and data assimilation techniques are recognized as key to improving forecast quality.

In this context, Clime is interested in various problems, the following being the crucial ones:

The development of ensemble forecast methods for estimating the quality of the prediction, in relation with the quality of the model and the observations. Sensitivity analysis with respect to the model's parameters so as to identify physical and chemical processes, whose modeling must be improved.

The development of methodologies for sequential aggregation of ensemble simulations. What ensembles should be generated for that purpose, how spatialized forecasts can be generated with aggregation, how can the different approaches be coupled with data assimilation?

The definition of second-order data assimilation methods for the design of optimal observation networks. Management of combinations of sensor types and deployment modes. Dynamic management of mobile sensors' trajectories.

How to estimate the emission rate of an accidental release of a pollutant, using observations and a dispersion model (from the near-field to the continental scale)? How to optimally predict the evolution of a plume? Hence, how to help people in charge of risk evaluation for the population?

The definition of non-Gaussian approaches for data assimilation.

The assimilation of satellite measurements of troposphere chemistry.

The activities of Clime in air quality are supported by the development of the Polyphemus air quality modeling system. This system has a modular design which makes it easier to manage high level applications such as inverse modeling, data assimilation and ensemble forecast.

The capacity to perform a high quality forecast of the state of the ocean, from the regional to the global scales, is of major interest. Such a forecast can only be obtained by systematically coupling numerical models and observations (*in situ* and satellite data). In this context, being able to assimilate image structures becomes fundamental. Examples of such structures are:

apparent motion linked to surface velocity;

trajectories, obtained either from tracking of features or from integration of the velocity field;

spatial structures, such as fronts, eddies or filaments.

Image Models for these structures are developed taking into account the underlying physical processes. Image data are assimilated within the Image Models to derive pseudo-observations of the state variables which are further assimilated within the numerical ocean forecast model.

Meteorological forecasting constitutes a major applicative challenge for Image Assimilation. Although satellite data are operationally assimilated within models, this is mainly done on an independent pixel basis: the observed radiance is linked to the state variable via a radiative transfer model, that plays the role of an observation operator. Indeed, because of their limited spatial and temporal resolutions, numerical weather forecast models fail to exploit image structures, such as precursors of high impact weather:

cyclogenesis related to the intrusion of dry stratospheric air in the troposphere (a precursor of cyclones);

convective systems (supercells) leading to heavy winter time storms;

low-level temperature inversion leading to fog and ice formation, *etc*.

To date, there is no available method for assimilating data which are characterized by a strong coherence in space and time. Meteorologists have developed qualitative Conceptual Models (CMs), for describing the high impact weathers and their signature on images, and tools to detect CMs on image data. The result of this detection is used for correcting the numerical models, for instance by modifying the initialization. The aim is therefore to develop a methodological framework allowing to assimilate the detected CMs within numerical forecast models. This is a challenging issue given the considerable impact of the related meteorological events.

Polyphemus (see the web site http:// *etc*. It is able to handle simulations from local to continental scales, with several physical models. It is divided into three main parts:

libraries that gather data processing tools (SeldonData), physical parameterizations (AtmoData) and postprocessing abilities (AtmoPy);

programs for physical preprocessing and chemistry-transport models (Polair3D, Castor, two Gaussian models, a Lagrangian model);

drivers on top of the models in order to implement advanced simulation methods such as data assimilation algorithms.

In 2010, two stable versions were released. The main changes are: addition of an interface between the Eulerian model Polair3D and the data assimilation library Verdandi, addition of an interface to the (Fortran) chemistry-transport model Chimere, wider support of parallelization. Preliminary work has been carried out for a complete overhaul of the input/output operations and of the configuration files.

The leading idea is to develop a data assimilation library intended to be generic, at least for high-dimensional systems. Data assimilation methods, developed and used by several teams at INRIA, are generic enough to be coded independently of the system to which they are applied. Therefore these methods can be put together in a library aiming at:

making it easier to apply the methods to a great number of problems,

making the developments durable and sharing them,

improving the dissemination of data assimilation work.

An object-oriented language (C++) has been chosen for the core of the library. A higher-level interface to Python is automatically built. The design raised many questions, related to high dimensional scientific computing, the limits of the object contents and their interfaces. The chosen object-oriented design is mainly based on three class hierarchies: the methods, the observation managers and the models. Several base facilities have also been included, for message exchanges between the objects, output saves, logging capabilities, computing with sparse matrices.

In 2010, Verdandi has been extended with several methods: extended Kalman filter, unscented Kalman filter and its reduced version, reduced minimax filter, Hamilton-Jacobi-Bellman solver and Monte Carlo simulations. The observations managers have been extended with aggregation capabilities. Several example models have been added. The documentation has been largely improved. The Python interface is now fully operational.

A first version (0.7) was released in September and a second one (0.8) in December.

Since the beginning, the CLIME project has been focusing on new techniques for data assimilation. As air quality is prone to non-Gaussian statistics, an expertise has first been developed on rigorous non-Gaussian approaches, often based on information-theoretical tools (maximum entropy on the mean, relative entropy, second order analysis, etc.). Another expertise is now being developed in multiscale data assimilation, and the mathematical tools required to deal with many space and time scales within data assimilation schemes.

In atmospheric and ocean data assimilation, and specifically in atmospheric chemistry, the choice of the resolution of control space is essential for the quality of the analysis. Yet, in the absence of a proper theoretical framework, this choice has been little explored.

We made significant progress on this topic following three axes:

the development of a conceptual and mathematical framework for a multiscale assimilation of observations. The formalism, probabilistic and of Bayesian nature, makes the Best Linear Unbiased Estimator (BLUE) analysis and the multiscale structure of control space consistent. This approach makes it possible to determine errors that are scale dependent, such as representativity errors.

the optimal design of a grid for the control space, thanks to adaptive multiscale structures (tilings) within the previous formalism. We showed that most of the degrees of freedom for the signal used in data assimilation can be read in an adaptive grid whose grid-cells number is significantly lower than the number of grid-cells of the finest available grid.

the analytical construction of asymptotic solutions (large grid-cell number) that makes it possible to find grids of control space (optimal for data assimilation) in almost no time.

These have been illustrated on atmospheric chemistry applications: CO2 fluxes inverse modeling, global surveillance of nuclear tests, and tracer dispersion (ETEX).
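As a reminder of what the BLUE analysis computes, the following minimal sketch applies it to a toy control space of three grid cells, with one observation that averages the first two cells. The function name `blue_analysis` and all numerical values are assumptions made for illustration; they are not taken from the studies mentioned above.

```python
import numpy as np

def blue_analysis(xb, B, y, H, R):
    """Return the BLUE analysis state and its error covariance.

    xb: background state, B: background error covariance,
    y: observations, H: linear observation operator, R: observation error covariance.
    """
    # Innovation: observation minus background, in observation space.
    d = y - H @ xb
    # Gain K = B H^T (H B H^T + R)^{-1}, obtained through a linear solve.
    S = H @ B @ H.T + R
    K = np.linalg.solve(S, H @ B).T  # valid because B and S are symmetric
    xa = xb + K @ d
    Pa = (np.eye(len(xb)) - K @ H) @ B
    return xa, Pa

# Toy example (hypothetical values): the observation is the mean of cells 0 and 1.
xb = np.ones(3)
B = np.eye(3)
H = np.array([[0.5, 0.5, 0.0]])
R = np.array([[0.5]])
y = np.array([2.0])
xa, Pa = blue_analysis(xb, B, y, H, R)
```

In this toy case the correction is shared between the two observed cells and the unobserved cell keeps its background value and variance.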

The Best Linear Unbiased Estimator (BLUE) has been widely used in atmospheric and oceanic data assimilation. However, when the errors from data (observations and background forecasts) have non-Gaussian probability density functions (pdfs), the BLUE differs from the absolute Minimum Variance Unbiased Estimator (MVUE), which minimizes the mean square a posteriori error. The non-Gaussianity of errors can be due to the inherent statistical skewness and positiveness of some physical observables (e.g., moisture, chemical species) or to the nonlinearity of the models and observation operators acting on Gaussian errors. Non-Gaussianity of assimilated data errors can be justified from a priori hypotheses or inferred from statistical diagnostics of innovations (observation minus background). Following this rationale, we compute measures of innovation non-Gaussianity, namely its skewness and kurtosis, relating them to: a) the non-Gaussianity of the individual errors themselves, b) the correlation between nonlinear functions of errors, and c) the heteroscedasticity of errors within diagnostic samples. Those relationships impose bounds for skewness and kurtosis of errors which are critically dependent on the error variances, thus leading to a necessary tuning of error variances in order to achieve consistency with innovations. We evaluate the sub-optimality of the BLUE as compared to the MVUE, in terms of excess of error variance, in the presence of non-Gaussian errors. The error pdfs are obtained by the maximum entropy method constrained by error moments up to fourth order, from which the Bayesian probability density function and the MVUE are computed. The impact is higher for skewed extreme innovations and grows on average with the skewness of data errors, especially if those skewnesses have the same sign. The method has been applied to the quality-accepted ECMWF innovations of brightness temperatures of a set of High Resolution Infrared Sounder (HIRS) channels. In this context, the MVUE led in some extreme cases to a potential reduction of 20-60% in error variance as compared to the BLUE.
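The innovation diagnostics mentioned above start from sample moments. The sketch below shows one simple way to compute the skewness and excess kurtosis of a sample of innovations; the function name `innovation_moments` and the sample values are hypothetical, and the sketch does not reproduce the bounds or the maximum entropy step of the study.

```python
import numpy as np

def innovation_moments(observations, background_equivalents):
    """Return (skewness, excess kurtosis) of the innovations.

    Innovations are observations minus the background projected
    into observation space.
    """
    d = np.asarray(observations, dtype=float) - np.asarray(background_equivalents, dtype=float)
    centered = d - d.mean()
    variance = np.mean(centered ** 2)
    skewness = np.mean(centered ** 3) / variance ** 1.5
    # Excess kurtosis is zero for a Gaussian distribution.
    excess_kurtosis = np.mean(centered ** 4) / variance ** 2 - 3.0
    return skewness, excess_kurtosis

# Hypothetical sample with one large positive innovation: positive skewness.
skew, kurt = innovation_moments([0.0, 0.0, 0.0, 4.0], [0.0, 0.0, 0.0, 0.0])
```

Values far from zero for either moment suggest that the Gaussian assumption underlying the BLUE deserves a closer look for that sample.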

This study corresponds to a review published in Monthly Weather Review that discusses recent advances in geophysical data assimilation beyond Gaussian statistical modeling, in the fields of meteorology, oceanography, as well as atmospheric chemistry. The non-Gaussian features are stressed rather than the nonlinearity of the dynamical models, although both aspects are entangled. Ideas recently proposed to deal with these non-Gaussian issues, in order to improve the state or parameter estimation, are emphasised.

The general Bayesian solution to the estimation problem and the techniques to solve it are first presented, as well as the obstacles that hinder their use in high-dimensional and complex systems. Approximations to the Bayesian solution relying on Gaussian, or on second-order moment closure, have been wholly adopted in geophysical data assimilation (e.g., Kalman filters and quadratic variational solutions). Yet, nonlinear and non-Gaussian effects remain. They essentially originate in the nonlinear models and in the non-Gaussian priors. How these effects are handled within algorithms based on Gaussian assumptions is then described. Statistical tools that can diagnose them and measure deviations from Gaussianity are recalled.

Advanced techniques that seek to handle the estimation problem beyond Gaussianity are reviewed: maximum entropy filter, Gaussian anamorphosis, non-Gaussian priors, particle filter with an ensemble Kalman filter as a proposal distribution, maximum entropy on the mean or strictly Bayesian inferences for large linear models, etc. Several ideas are illustrated with recent, or original examples that possess some features of high-dimensional systems. Many of the new approaches are well understood only in special cases and have difficulties that remain to be circumvented. Some of the suggested approaches are quite promising, and sometimes already successful for moderately large though specific geophysical applications. Hints are given as to where progress might come from.

Two studies focused on inverse modeling of sources and emissions of atmospheric pollutants. Even though the focus is on application, new methodologies are proposed.

Inverse modeling techniques are nowadays applied to improve spatially and temporally distributed emission inventories for (usually non-reactive) tracers. When applying such methods, one often encounters the so-called co-localisation problem, i.e., spurious corrections to the a priori inventory at places where emissions and observations are co-located. Several approaches are used to deal with this problem. These approaches include: coarsening the spatial resolution of the emissions; adding spatial correlations to the covariance matrices (influence radii); adding constraints on the spatial derivatives to the functional being minimized; multiplying the emission error covariance matrix by a diagonal matrix of weighting factors. We tested these methods for the commonly used case of Gaussian assumptions over the prior distribution of emissions (4D-Var), which results in the imposition of further constraints to ensure positive emissions, and also, when applicable, for non-Gaussian assumptions that naturally restrict emissions to be positive and/or that assume a multiplicative nature of errors. These methods are applied so as to improve emissions at a city scale. Intercomparison of methods shows that even though all methods solve the co-localisation problem resulting in similar general patterns, detailed patterns can greatly change according to the method used: from smooth, isotropic and short range modifications to not so smooth, non-isotropic and long range modifications. When using a Poisson assumption over the emission probability function, emissions modification patterns are similar to those found when using Gaussian assumptions, but changes are multiplicative in the way of correction factors, closer to the nature of errors in emission inventories. Also, in the same framework, we proposed and tested a fully Bayesian approach that deals more naturally with the co-localisation problem, considering both Gaussian and Poisson prior density distributions.

The aim of this research activity is the implementation of data assimilation methods, particularly inverse modeling methods, in the context of an accidental radiological release from a nuclear facility (typically a power plant) in order to give decision makers accurate real-time forecasts of the radioactive plume trajectory. This study has been conducted through a partnership with the Finnish Meteorological Institute. In particular, two Chemistry-Transport Models have been used: Polair3D, developed by CEREA, and SILAM, developed by the FMI. Highlights are:

Implementation and validation of data assimilation methods, simple enough to be used in operational context, but still efficient.

Attenuation by the data assimilation system of the bias of large errors.

Implementation and validation of statistical tools to discriminate, in the case where the release site is unknown, the power plant responsible for an accident.

We assess the efficiency of the ozone monitoring network over France (BDQA) by investigating a network reduction problem. We examine how well a subset of the BDQA network can represent the full network. The performance of a subnetwork is taken to be the root mean square error (RMSE) of the hourly ozone mean concentration estimations over the whole network given the observations from that subnetwork. Spatial interpolations (kriging) are conducted for the ozone estimation taking into account the ozone mean statistics and hourly-varying spatial correlations. The network reduction problem is solved using a simulated annealing algorithm. Significant improvements can be obtained through these optimisations. Removing optimally half of the stations leads to an estimation error of the order of the standard observational error (5 ppb). The resulting optimal subnetworks are dense in large urban agglomerations. For large rural regions, the stations are uniformly distributed.
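The structure of such a network reduction can be illustrated with a short sketch. It is only a toy version under stated assumptions: nearest-station estimation replaces kriging, the station coordinates and ozone series are synthetic, and the names `subnetwork_rmse` and `anneal_subnetwork` as well as the cooling schedule are hypothetical choices, not those of the study.

```python
import numpy as np

def subnetwork_rmse(subset, coords, series):
    """RMSE over all stations when each one is estimated from the nearest kept station."""
    kept = np.asarray(sorted(subset))
    # Distance from every station to every kept station, shape (n_stations, n_kept).
    dist = np.linalg.norm(coords[:, None, :] - coords[kept][None, :, :], axis=2)
    nearest = kept[np.argmin(dist, axis=1)]
    return float(np.sqrt(np.mean((series[:, nearest] - series) ** 2)))

def anneal_subnetwork(coords, series, n_keep, n_iter=500, t0=1.0, cooling=0.99, seed=0):
    """Select n_keep stations (n_keep < number of stations) by simulated annealing."""
    rng = np.random.default_rng(seed)
    n = coords.shape[0]
    current = set(rng.choice(n, size=n_keep, replace=False).tolist())
    cost = subnetwork_rmse(current, coords, series)
    initial_cost = cost
    best, best_cost = set(current), cost
    temp = t0
    for _ in range(n_iter):
        # Proposal: swap one kept station with one removed station.
        removed = int(rng.choice(sorted(current)))
        added = int(rng.choice(sorted(set(range(n)) - current)))
        candidate = (current - {removed}) | {added}
        cand_cost = subnetwork_rmse(candidate, coords, series)
        # Metropolis rule: accept improvements, sometimes accept degradations.
        if cand_cost < cost or rng.random() < np.exp((cost - cand_cost) / temp):
            current, cost = candidate, cand_cost
            if cost < best_cost:
                best, best_cost = set(current), cost
        temp *= cooling
    return sorted(best), best_cost, initial_cost

# Synthetic data (assumption): 20 stations, 48 hourly values with a smooth spatial phase.
demo_rng = np.random.default_rng(1)
coords = demo_rng.random((20, 2))
hours = np.arange(48)
series = 40.0 + 10.0 * np.sin(2 * np.pi * hours[:, None] / 24 + 3.0 * coords[None, :, 0])
stations, final_rmse, start_rmse = anneal_subnetwork(coords, series, n_keep=10)
```

Keeping track of the best subnetwork visited guarantees that the returned cost is never worse than that of the random initial subnetwork, whatever the acceptance history.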

In this study, the BDQA background stations are partially redistributed over France under a set of design objectives which are defined on a regular grid that covers France. Spatial interpolations are used to extrapolate simulated concentrations (of chemistry-transport models or assimilation results) to these grid nodes. Three types of criteria are considered: the geostatistical, geometrical, and physical ones. Simulated annealing is employed to select optimally the stations. Significant improvement with all the proposed criteria has been found for the optimally redistributed network against the original background BDQA network. For complex objectives, e.g. that addressing the heterogeneity of ozone field, the physical criteria are more appropriate.

Due to the great uncertainties that arise in air quality modeling, relying on a single model may not be sufficient. Therefore, ensembles of simulations are now considered in a wide range of applications, from uncertainty estimation to operational forecast.

Based on ensemble simulations, improved forecasts can be generated by means of linear combinations of the individual forecasts. A weight is associated with each model, depending on past observations and simulations (Figure ). In past years, new machine learning algorithms (sequential aggregation) were developed and used for this purpose. Most of these provide theoretical bounds on the performance (compared to the optimal constant model combination) and deliver significantly improved forecasts.
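To make the idea of sequentially learned weights concrete, here is a minimal sketch of one simple aggregation rule, the exponentiated gradient update with square loss. It is given as an illustration only: the text does not say which algorithms were used, and the function name, the learning rate and the toy data are assumptions.

```python
import numpy as np

def exponentiated_gradient(forecasts, observations, learning_rate=0.1):
    """Aggregate member forecasts sequentially with exponentiated-gradient weights.

    forecasts: array (n_steps, n_members); observations: array (n_steps,).
    Returns the aggregated forecasts and the final weights.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    observations = np.asarray(observations, dtype=float)
    n_steps, n_members = forecasts.shape
    weights = np.full(n_members, 1.0 / n_members)
    aggregated = np.empty(n_steps)
    for t in range(n_steps):
        # The forecast at step t uses weights learned from earlier steps only.
        aggregated[t] = weights @ forecasts[t]
        # Gradient of the square loss with respect to the weights.
        gradient = 2.0 * (aggregated[t] - observations[t]) * forecasts[t]
        weights = weights * np.exp(-learning_rate * gradient)
        weights /= weights.sum()
    return aggregated, weights

# Toy data (hypothetical): member 0 is perfect, member 1 has a constant bias.
member_forecasts = np.tile([1.0, 2.0], (50, 1))
obs = np.ones(50)
aggregated, final_weights = exponentiated_gradient(member_forecasts, obs)
```

The weights stay positive and sum to one, so the aggregated forecast is a convex combination of the members; other aggregation rules relax this constraint.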

In previous applications of the sequential aggregation, the observation errors were not taken into account, and the weights were learned at observed locations and applied in the whole domain without theoretical support. In order to overcome these limitations, the aggregation procedure has been coupled with classical data assimilation methods. Instead of forecasting the observation, the procedure aims at forecasting analyses (computed with data assimilation methods) and is therefore called ensemble forecast of analyses. The performance of the forecasts is similar to the previous approach, but the observation error is taken into account and the spatial patterns are better forecast.

The new approach has been transferred to INERIS and successfully applied to forecast peak ozone. It is now running operationally on the Prév'air platform (http://

Air quality forecasts are limited by strong uncertainties especially in the input data and in the physical formulation of the models. There is a need to estimate these uncertainties for the evaluation of the forecasts, the production of probabilistic forecasts, and a more accurate estimation of the error covariance matrices required by data assimilation.

Because a large part of the uncertainty in the forecast originates from uncertainties in the model formulation (primarily the physical parameterizations), a multimodel ensemble seems to be the adequate tool for uncertainty estimation. A large ensemble with 100 members was generated over year 2001 and analyzed with criteria like the Brier score. Work on the calibration of the ensemble has been carried out (Figure ). A sub-ensemble is extracted from the full ensemble so that it properly estimates the simulation uncertainties. The method proved to be robust both in space and time, so that it is possible to forecast the uncertainty of an ozone field over Europe.
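For readers unfamiliar with the Brier score, the sketch below computes it for the event "the concentration exceeds a threshold", taking the forecast probability as the fraction of ensemble members that predict the event. The function name, the threshold and the numbers are hypothetical and serve only to illustrate the definition.

```python
import numpy as np

def brier_score(ensemble, observations, threshold):
    """Brier score for the event 'value exceeds threshold'.

    ensemble: array (n_members, n_cases); observations: array (n_cases,).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    observations = np.asarray(observations, dtype=float)
    # Forecast probability: fraction of members predicting the event.
    probability = np.mean(ensemble > threshold, axis=0)
    occurred = (observations > threshold).astype(float)
    return float(np.mean((probability - occurred) ** 2))

# Four members, two cases (hypothetical concentrations).
members = [[100.0, 50.0], [120.0, 60.0], [90.0, 130.0], [110.0, 40.0]]
score = brier_score(members, [105.0, 55.0], threshold=95.0)
```

A score of 0 corresponds to perfectly sharp and correct probabilities, and lower values are better.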

Some work identified the impact of the different sources of uncertainties. Preliminary work has been carried out to quantify the representativeness errors and the observation errors, so as to better estimate the errors due to the model alone.

The first Monte Carlo simulations for aerosol simulations were launched at INERIS, using the chemistry-transport model Chimere and the Monte Carlo driver from Polyphemus. It made clear that some modeling of the correlation between the errors in the inputs (between aerosol species and size bins) was needed, and that the large biases made the uncertainty estimation difficult. Preliminary calibration of the resulting ensembles was also carried out.

For Monte Carlo simulations (without the multimodel approach), the perturbation of input fields has been improved with spatial constraints. The input fields can now be sampled based on a variance-covariance matrix so that the perturbations are dependent on space. It was shown this essentially solves the localization problem in the long-distance covariances of the output concentrations.
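Sampling from a variance-covariance matrix can be sketched as follows. The example assumes additive Gaussian perturbations and an exponential correlation function of distance; both choices, as well as the function name and parameter values, are illustrative assumptions rather than the settings used in Polyphemus.

```python
import numpy as np

def sample_correlated_perturbations(coords, std, length_scale, n_samples, seed=0):
    """Draw spatially correlated Gaussian perturbations at the given points.

    Returns the samples, shape (n_samples, n_points), and the covariance matrix.
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    n_points = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Exponential covariance: variance std**2, correlation decaying with distance.
    cov = std ** 2 * np.exp(-dist / length_scale)
    # Cholesky factor L with cov = L L^T; a small jitter guards against round-off.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n_points))
    white = rng.standard_normal((n_points, n_samples))
    return (chol @ white).T, cov

# Three points on a line (hypothetical units): two close together, one far away.
points = [[0.0, 0.0], [1.0, 0.0], [50.0, 0.0]]
samples, cov = sample_correlated_perturbations(points, std=0.2, length_scale=5.0, n_samples=20000)
correlation = np.corrcoef(samples.T)
```

Nearby points receive strongly correlated perturbations while distant points are perturbed almost independently, which is the spatial dependence the paragraph refers to.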

Sequences of images display structures evolving in time. This information is recognized as being of major interest by, for instance, meteorological forecasters. However, the satellite acquisitions are mostly assimilated in geophysical models on a point-wise basis, discarding the space-time coherence visualized by the evolution of structures. Assimilating images is thus becoming of major interest and the problem should be considered in two ways:

from the model's viewpoint, the problem is to control the location of structures using the observations,

from the image's viewpoint, a model of the dynamics and structures has to be built from the observations.

In both cases, image information is assimilated within models, raising a number of theoretical and experimental questions.

We address the issue of motion estimation on noisy images displaying large displacements of objects. “Noisy images” means that data contain either missing values or noisy measures. In order to capture the large displacements, we consider a non-linear transport equation for the image luminance function. Its resolution is an ill-posed problem. For solving it, we replace the usual Tikhonov spatial regularization techniques by a data assimilation method involving an evolution equation, that describes the dynamics of velocities. Data assimilation solves, with respect to the state vector (the velocity field) the evolution equation and the non-linear transport equation. To be generic, we use a simple dynamic model, expressing the constancy of velocity on a pixel trajectory. As this model is a rough approximation, a weak formulation of 4D-Var is considered, which includes an error on the evolution equation. Moreover, missing data and noisy measures are discarded from the solution, thanks to a confidence function which weights the contribution of these pixels. We quantified the method on synthetic data and applied it on video sequences and remote sensing acquisitions (SST and radar images) displaying wide areas of missing data.

The observation equation of a data assimilation system expresses the links between state and observation vectors. If observations are images, this link is complex and represented by an implicit and non-linear equation. A first strategy consists in a direct assimilation of images, using this implicit equation as the observation equation. The main advantage is the absence of pre-processing of the observation data, which allows inferring the observation error from the metadata given by satellite image providers.

However, an alternative strategy consists in extending the state vector by adding a component, named pseudo-image, that is comparable to the image observation. This requires defining an evolution equation for this additional quantity. The main advantage is that the observation equation becomes trivial: it measures the difference between the pseudo-image and the real observation at satellite acquisition dates.

Addressing the issue of motion estimation, the linear advection of pixel brightness by velocity is used as observation equation for the first strategy and as part of the evolution equation for the second one. We compared results of both methods on synthetic and satellite data. We showed that the second strategy significantly improves the quality of results. The first reason is that the observation equation of the second strategy allows a more efficient comparison of the state and observation vectors. The second one is that the transport of image brightness, used as evolution equation of the pseudo-image quantity for the second strategy, is controlled by the method with a chosen time step, while there was no such possibility with the first strategy.
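The second strategy can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names, the dictionary layout of the state, the first-order upwind scheme and the periodic boundaries are all assumptions made here. The pseudo-image `q` is advected by the velocity `(u, v)` with a chosen time step, and the observation operator simply selects `q`.

```python
import numpy as np

def advect_pseudo_image(q, u, v, dt, dx=1.0):
    """One explicit upwind step of dq/dt + u dq/dx + v dq/dy = 0.

    x runs along axis 1, y along axis 0; boundaries are periodic (assumption).
    """
    dqdx_minus = (q - np.roll(q, 1, axis=1)) / dx   # backward difference in x
    dqdx_plus = (np.roll(q, -1, axis=1) - q) / dx   # forward difference in x
    dqdy_minus = (q - np.roll(q, 1, axis=0)) / dx   # backward difference in y
    dqdy_plus = (np.roll(q, -1, axis=0) - q) / dx   # forward difference in y
    # Upwinding: take the difference on the side the flow comes from.
    dqdx = np.where(u > 0, dqdx_minus, dqdx_plus)
    dqdy = np.where(v > 0, dqdy_minus, dqdy_plus)
    return q - dt * (u * dqdx + v * dqdy)

def innovation(state, image):
    """Trivial observation equation: compare the pseudo-image with the image."""
    return image - state["q"]

# Hypothetical usage: a single bright pixel moved by a uniform velocity.
q0 = np.zeros((4, 6))
q0[1, 2] = 1.0
state = {"u": np.ones_like(q0), "v": np.zeros_like(q0), "q": q0}
state["q"] = advect_pseudo_image(state["q"], state["u"], state["v"], dt=1.0)
residual = innovation(state, image=np.roll(q0, 1, axis=1))
```

The time step `dt` is a free parameter of the evolution equation, which is the control mentioned above; with the first strategy the time interval is imposed by the acquisition dates.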

The objective is to infer the dynamics from a sequence of satellite images. The application concerns the estimation of surface velocity from Sea Surface Temperature (SST) satellite acquisitions. We define an *Image Model* (*IM*) describing the temporal evolution of the surface temperature and velocity fields. SST observations are then assimilated in the *IM* with 4D-Var methods. The data assimilation system has several options: evolution equation, background value, regularization term. We compared:

Two equations expressing heuristics on the dynamics: the stationary hypothesis and the shallow-water equation, that links the velocity to the water layer thickness.

Two regularization terms: one based on the gradient of the norm of velocity and on the incompressibility assumption, and the second based on the divergence and rotational of motion.

Several background conditions; different heuristics for the water thickness field and the velocity field have been tested.

It has been shown that the best results are obtained with the shallow-water equations, a regularization term that penalizes the variations of the divergence and the rotational, and null background value for the motion field.
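A regularization term of the retained kind can be sketched as follows. This is only an illustration under assumptions made here: the function name, the finite-difference discretization via `numpy.gradient`, and the equal weighting of the divergence and rotational (curl) parts are not taken from the study.

```python
import numpy as np

def div_curl_penalty(u, v, dx=1.0):
    """Sum of squared gradients of the divergence and of the curl of (u, v).

    u, v are 2D arrays; x runs along axis 1 and y along axis 0.
    """
    du_dy, du_dx = np.gradient(u, dx)
    dv_dy, dv_dx = np.gradient(v, dx)
    div = du_dx + dv_dy
    curl = dv_dx - du_dy
    gdiv_y, gdiv_x = np.gradient(div, dx)
    gcurl_y, gcurl_x = np.gradient(curl, dx)
    integrand = gdiv_x**2 + gdiv_y**2 + gcurl_x**2 + gcurl_y**2
    return float(np.sum(integrand) * dx * dx)

# Hypothetical usage: a solid-body rotation has constant curl and zero divergence.
y, x = np.mgrid[0:8, 0:8].astype(float)
penalty_rotation = div_curl_penalty(-y, x)
```

Because only the variations of the divergence and the curl are penalized, smooth vortices and uniform expansions are not damped, unlike with a penalty on the gradient of the velocity itself.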

In the context of sea surface velocity estimation, the *Shallow Water Image Model* (*SWIM*) is used to express the evolution of the temperature and dynamics and to assimilate satellite acquisitions over the Black Sea. The images are NOAA/AVHRR Sea Surface Temperature data. The validation is performed with Sea Level Anomaly (SLA) measured by the altimeters on board Envisat, GFO and Jason1.

The outputs of the SWIM model are the surface velocity and $h$, the thickness of the surface layer. The Sea Level Anomaly can be estimated from $h$ as its deviation from the value at rest. On the other hand, the altimeters are 1-dimensional instruments measuring SLA along their track. We then compare the SLA derived from $h$ and the one measured by the altimeters on the same tracks. These two curves have the same shape (see figure), which is moreover strongly correlated to the velocity vector. This validates the estimation of sea surface velocity by SWIM.
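The comparison along altimeter tracks can be sketched as follows. The function names, the nearest-pixel sampling of the track and the use of the Pearson correlation as a measure of shape agreement are assumptions made for illustration; the data below are synthetic, not results from the study.

```python
import numpy as np

def sla_from_thickness(h, h_rest):
    """Sea Level Anomaly taken as the deviation of h from its value at rest."""
    return h - h_rest

def sample_along_track(field, rows, cols):
    """Extract the values of a 2D field at the pixels crossed by a track."""
    return field[rows, cols]

def track_correlation(sla_model, sla_altimeter):
    """Pearson correlation between the two along-track SLA curves."""
    return float(np.corrcoef(sla_model, sla_altimeter)[0, 1])

# Synthetic example: a diagonal track across a 50 x 50 thickness field.
rows = np.arange(50)
cols = np.arange(50)
h_rest = 30.0
h = h_rest + 0.2 * np.sin(2 * np.pi * np.arange(50) / 50.0)[None, :] * np.ones((50, 1))
sla_model = sample_along_track(sla_from_thickness(h, h_rest), rows, cols)
sla_altimeter = 0.5 * sla_model + 0.01   # same shape, different amplitude and offset
r = track_correlation(sla_model, sla_altimeter)
```

A correlation coefficient is insensitive to amplitude and offset, which matches the statement that the two curves share the same shape rather than the same values.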

The objective is to retrieve motion from satellite data. This is usually obtained by image processing methods or data assimilation techniques. One major drawback is the huge computational cost and memory requirement, due to the size of image data. To get around this, we propose to use reduced-order models. Given an evolution model for the motion field, for instance the Euler equation, the aim is to approximate the model by projection on a reduced space.

This requires the definition of bases for images and motion fields. Two strategies are considered:

a Proper Orthogonal Decomposition (POD) is applied on the discrete satellite image sequence, producing the image eigenvector basis. An initial motion field is computed on the first two images. It is used as input by the simulation model to obtain snapshots, on which POD is again applied to get the motion basis. Image and motion bases are both learned from the image data. Images are projected on the image reduced space and represented by coefficients $b_{j}(t)$, while motion fields are projected on the motion reduced space with coefficients $a_{i}(t)$.

the reduced basis is obtained by keeping only the large scale elements of a wavelet basis. It does not depend on the input image sequence. The images and the two components of motion are projected on the subspace spanned by it, and represented again by coefficients.

We then apply a 4D-Var method and assimilate the coefficients $b_{j}(t)$ to retrieve the coefficients $a_{i}(t)$.
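The POD step of the first strategy can be illustrated with a singular value decomposition of the snapshot matrix. This is a minimal sketch: the function names, the subtraction of the temporal mean and the flattening of each image into a row vector are assumptions made here, and the snapshots are synthetic.

```python
import numpy as np

def pod_basis(snapshots, n_modes):
    """Return the temporal mean and the first n_modes POD modes.

    snapshots has shape (n_times, n_pixels); the modes are orthonormal rows.
    """
    mean = snapshots.mean(axis=0)
    _, _, vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    return mean, vt[:n_modes]

def project(field, mean, basis):
    """Coefficients of a field on the reduced basis (the b_j or a_i at one date)."""
    return basis @ (field - mean)

def reconstruct(coeffs, mean, basis):
    """Field rebuilt from its reduced coefficients."""
    return mean + coeffs @ basis

# Synthetic snapshots built from two spatial patterns.
t = np.linspace(0.0, 1.0, 20)
x = np.linspace(0.0, 1.0, 50)
snapshots = (np.outer(np.sin(2 * np.pi * t), np.sin(np.pi * x))
             + np.outer(np.cos(2 * np.pi * t), np.cos(3 * np.pi * x)))
mean, basis = pod_basis(snapshots, n_modes=2)
b = np.array([project(s, mean, basis) for s in snapshots])  # shape (20, 2)
```

With such a basis, the assimilation works on a handful of coefficients per date instead of one value per pixel, which is where the savings in computation and memory come from.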

A quantitative validation has been performed with a twin experiment and the POD strategy. Results are displayed on Figure .

In air quality modeling, the model error is supposed to take into account uncertainty on the meteorological fields (winds and vertical diffusivities), the segregation and mixing in emission areas which affect the effective kinetic rates of reactions, the boundary condition fields, all physical parameterizations (dry deposition, wet scavenging), etc. All the above sources of error have bounded energy and typically are not normally distributed or independent.

In order to take this into account in the data assimilation process, we applied the Minimax State Estimation (MSE) approach. It is well known that a bottleneck of minimax estimation algorithms, as well as of the family of Kalman-type filters, is the dimension issue. To solve it, we applied a powerful version of the minimax filter developed for the so-called differential-algebraic equations. This filter works for any linear ordinary differential equation with time-dependent coefficients on any linear manifold which can also change in time. Based on this novel approach, we derived a computationally tractable reduced version of the minimax filter. The derivation was made in a new and rigorous framework. In addition to the reduction, the new filter shows all the interesting properties inherited from the minimax setting, especially the description of the (model and observational) errors, which only need to have bounded energy. The latter is important in the context of applications because the errors are always bounded. In contrast, most high-dimensional statistical filters are designed for unbounded random errors with a specific distribution function.

The new algorithm was implemented in the data assimilation library Verdandi so that it can be applied to a wide range of numerical models. In particular, experiments are currently being conducted for air quality forecasting. The filter has been plugged into the air quality modeling system Polyphemus, which is used for daily operational forecasts.

Data assimilation algorithms based on the 4D-Var formulation look for the so-called conditional mode estimate. The latter maximizes the conditional probability density function, provided the initial condition, model error and observation noise are realizations of independent Gaussian random variables. However, this Gaussian assumption is often not satisfied for geophysical flows. Moreover, the estimation error of the conditional mode estimate is not a first-hand result of these methods. The issues above can be addressed by means of the Minimax State Estimation (MSE) approach. It makes it possible to filter out any random (with bounded correlation operator) or deterministic (with bounded energy) noise and to assess the worst-case estimation error. We applied MSE to the problem of estimating the velocity field of an uncertain 2D incompressible fluid flow, provided the flow is displayed on a given sequence of images and the model error and observation noise are bounded deterministic functions of any shape. The aim is to describe the evolution in time of the Reachability Set (RS), that is, the set of all states of the 2D Navier-Stokes Equation (NSE) compatible with the observations. Having the estimate of the RS, we defined the minimax estimate of the velocity field as a minimax center of the RS, and the worst-case estimation error is set to be the “radius” of the RS. We applied the classical Galerkin projection methods in order to approximate the NSE by a finite-dimensional system of ordinary differential equations with high state space dimension. The latter system is then reduced to a low-dimensional Differential-Algebraic Equation (DAE), based on the robust Proper Orthogonal Decomposition (POD), and the extended minimax state estimation algorithm has been applied to the resulting low-dimensional DAE. It should be noted that direct application of POD in state estimation problems is not efficient, as the state equation contains an uncertain model error. Therefore one would need to generate an extended set of snapshots sampling the uncertain parameters. The presented two-step reduction avoids the difficulty connected with sampling a model error which takes values in a functional space.
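The notions used above can be summarized by the following hedged formalization. The symbols are assumptions introduced here for illustration ($x$ the state, $M$ the model, $H$ the observation operator, $e$ and $\eta$ the model error and observation noise, $x_b$ a reference initial state, $S$, $Q$, $R$ weighting operators with $\|v\|_{Q}^{2} = v^{T} Q v$); the precise setting of the study may differ.

```latex
% State and observation equations with unknown but bounded errors
\frac{dx}{dt} = M(x) + e(t), \qquad y(t) = H\,x(t) + \eta(t)

% Bounded-energy constraint on the initial, model and observation errors
\|x(0) - x_b\|_{S}^{2} + \int_{0}^{T} \Big( \|e(t)\|_{Q}^{2} + \|\eta(t)\|_{R}^{2} \Big)\, dt \;\le\; 1

% Reachability set: all states at time t compatible with the observations y
\mathcal{R}(t) = \big\{\, x(t) \;:\; (x, e, \eta) \text{ satisfy the two relations above} \,\big\}

% Minimax estimate (minimax center) and worst-case estimation error (radius)
\hat{x}(t) = \arg\min_{z}\, \sup_{x \in \mathcal{R}(t)} \|x - z\|,
\qquad
\rho(t) = \sup_{x \in \mathcal{R}(t)} \|x - \hat{x}(t)\|
```

No probability distribution appears in this formulation: only the size of the errors is constrained, and $\rho(t)$ gives a guaranteed bound on the estimation error under that constraint.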

In the field of forest fire risk management, important challenges exist in terms of protecting people and goods. Responding to strong needs from different actors (firefighters, foresters), researchers focus their efforts on developing operational decision support tools that can forecast wildfire behavior. This requires the evaluation of model performance, but currently, simulation errors are not sufficiently qualified and quantified. As the main objective is to build a *decision support system*, robust forecast evaluations must be established. In the context of the ANR project IDEA, the evaluation of model simulations has been started with a bibliographical review, the implementation of a series of forecast scores and the definition of a series of ideal cases where some classical scores may fail (especially in taking into account the dynamics).
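The report does not list the scores that were implemented. As an assumption, the sketch below uses two common area-overlap scores (Jaccard and Sørensen) on burned-area masks, and a toy ideal case in which a score computed on the final burned area hides an error in the fire dynamics. All names and values are hypothetical.

```python
import numpy as np

def jaccard(sim_burned, obs_burned):
    """Intersection over union of simulated and observed burned areas."""
    sim = np.asarray(sim_burned, dtype=bool)
    obs = np.asarray(obs_burned, dtype=bool)
    union = np.logical_or(sim, obs).sum()
    if union == 0:
        return 1.0  # nothing burned in either map: perfect agreement
    return float(np.logical_and(sim, obs).sum() / union)

def sorensen(sim_burned, obs_burned):
    """Twice the intersection over the sum of the two burned areas."""
    sim = np.asarray(sim_burned, dtype=bool)
    obs = np.asarray(obs_burned, dtype=bool)
    total = sim.sum() + obs.sum()
    if total == 0:
        return 1.0
    return float(2 * np.logical_and(sim, obs).sum() / total)

def burned_mask(arrival_time, t):
    """Cells reached by the fire front at or before time t."""
    return np.asarray(arrival_time) <= t

# Toy ideal case: same final burned area, different propagation speed.
sim_arrival = np.array([[1.0, 2.0], [np.inf, np.inf]])
obs_arrival = np.array([[1.0, 5.0], [np.inf, np.inf]])
final_score = jaccard(burned_mask(sim_arrival, 10.0), burned_mask(obs_arrival, 10.0))
early_score = jaccard(burned_mask(sim_arrival, 3.0), burned_mask(obs_arrival, 3.0))
```

In this toy case the final score is perfect while the score at an earlier time is not, which illustrates why a score evaluated only on the final burned area can miss errors in the dynamics.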

Clime is partner with INERIS (National Institute for Environmental and Industrial Risks) in a joint cooperation devoted to air quality forecast. This includes research topics in uncertainty estimation, data assimilation and ensemble modeling.

Clime also provides support to INERIS in order to operate the Polyphemus system, for ensemble forecasting and uncertainty estimations at local and continental scale.

A research contract between CEREA/Clime and the IRSN on the topic of network design came to an end in 2010. An expert report which summarises what has been accomplished has been delivered to the IRSN.

Clime takes part in a joint Ilab with the group SETH (Numtech). The objective is to (1) transfer Clime work in data assimilation, ensemble forecasting and uncertainty estimation, with application to urban air quality, and (2) identify the specific problems encountered at urban scale in order to determine new research directions. The first study addresses the application of data assimilation at urban scale.

Clime takes part, with Numtech and AirParif, in the project EXPAIR, from the call “Futur en Seine” organized by Cap Digital and notably supported by Île-de-France. Clime is in charge of providing data assimilation methods in order to generate analyses out of ADMS simulations and AirParif ground observations.

Clime has run an R2DS project “Optimization of Monitoring Networks for Air Quality”, with a grant from Île-de-France region. The aim is to optimally reduce/design a monitoring network for pollutants (ozone in particular). It ended in July.

Clime takes part in the ANR project ATLAS ("From Applications to Theory in Learning and Adaptive Statistics"). Clime collaborates with Gilles Stoltz, co-leader of ATLAS, on the application of machine learning to air quality forecasting.

Clime takes part in the ANR project IDEA that addresses the propagation of wildland fires. Clime is in charge of the estimation of the uncertainties, based on sensitivity studies and ensemble simulations.

The three-year project Multiscale Data Assimilation in Geophysics [MSDAG] has been accepted by the ANR SYSCOMM. Four partners are in the project: CEREA (Clime project-team, Marc Bocquet, PI of the whole project), the Fluminance and Moise project-teams, and LSCE (Peter Rayner). The preparatory work has led to a document presenting an overview of state-of-the-art methodological approaches for multiscale data assimilation. The project started in January 2009.

Clime is running the project MIDAR “Inverse modelling of deposition measurements in case of a radiological release”, under the framework of the LEFE-ASSIM program of INSU. This includes a cooperation with the Institute for Safety Problems of Nuclear Power Plants (National Academy of Sciences of Ukraine).

Clime has run a LEFE-ASSIM project of INSU, on “Advanced data assimilation techniques for the monitoring of accidental release of pollutant”. This includes a cooperation with the air quality group of the Finnish Meteorological Institute. It ended in September.

Clime is a member of the ERCIM working group “Environmental Modeling”. Within this working group, Clime cooperates with FORTH-IACM on remote sensing methods and definition of ontologies for complex applications.

Following cooperations with CMM
(Chile) on establishing air quality forecast systems and
data assimilation capacities in Chile (supported by a
research project STIC-AmSud), the Chilean meteorological
office (Dirección Meteorológica de Chile) now produces
its operational air quality forecasts with Polyphemus.
The 3-day forecasts essentially cover Santiago. The
forecasts are accessible online in the form of maps, time
series and video (
http://

Marc Bocquet is co-president of the scientific committee of the INSU/LEFE action Assimilation.

Isabelle Herlin is a member of the Scientific Council of CSFRS (High Council for Strategic Education and Research). She is leading the evaluation committee of international collaborations at INRIA. She is a member of the program committee for the “INRIA-Industrie” meeting on information technologies for sustainable cities.

Data Assimilation for Geophysics (Master OACOS (ocean, atmosphere, climate and space observation), ENSTA ParisTech/École des Ponts ParisTech): 30 hours (Marc Bocquet, Vivien Mallet).

Master on nuclear energy: 9 hours (Marc Bocquet, Irène Korsakissok, Vivien Mallet, Victor Winiarek).

Algorithmics: 30 hours, ESIEE Management (Isabelle Herlin).

Introduction to chemistry-transport models (Paris VII): 4 hours (Vivien Mallet).

Air Pollution (École des Ponts ParisTech): 3 hours (Vivien Mallet).

Serge Guillas from University College London, United Kingdom: from February 17th to 19th, 2010.