Section: New Results

Optimization and compilation techniques

Generated Code Optimization

Performing a model-to-source transformation, whereby a high-level language is mapped to CUDA or OpenCL, is an attractive option. In particular, it makes it possible to harness the power of GPUs without any expertise in GPGPU programming. In this work, we add a new compilation option to UML2OpenCL, a Gaspard2 transformation chain, that detects shareable data zones. The ArrayOL tilers, which express the data parallelism of repetitive tasks, are analyzed at compile time to identify areas of shared data. Identifying these areas is crucial because it allows data to be loaded into shared memory areas, which have high throughput. Consequently, the automatically generated programs are expected to achieve performance comparable to that of well-written hand-coded programs.
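
To make the notion of a shareable data zone concrete, here is a minimal Python sketch. It assumes the usual ArrayOL tiler definition, in which element i of the pattern for repetition r is read at array index (o + P·r + F·i) mod shape, with origin o, paving matrix P and fitting matrix F. The helper names (`tile`, `shared_zone`) and the idea of a "group" of repetitions standing for one OpenCL work-group are illustrative assumptions, not the UML2OpenCL implementation.

```python
from itertools import product


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (tuple)."""
    return tuple(sum(a * b for a, b in zip(row, vector)) for row in matrix)


def tile(origin, fitting, paving, shape, rep, pattern_shape):
    """Array indices read by repetition `rep` of an ArrayOL-style tiler:
    (origin + paving.rep + fitting.i) mod shape, for every pattern index i."""
    ref = tuple(o + p for o, p in zip(origin, mat_vec(paving, rep)))
    cells = set()
    for i in product(*(range(n) for n in pattern_shape)):
        offset = mat_vec(fitting, i)
        cells.add(tuple((r + f) % s for r, f, s in zip(ref, offset, shape)))
    return cells


def shared_zone(origin, fitting, paving, shape, pattern_shape, group):
    """Return (footprint, shared): all cells read by the repetitions in
    `group`, and the cells read by more than one of them."""
    footprint, shared = set(), set()
    for rep in group:
        cells = tile(origin, fitting, paving, shape, rep, pattern_shape)
        shared |= footprint & cells
        footprint |= cells
    return footprint, shared


# Hypothetical 1-D three-point stencil over 16 elements;
# a group of 4 consecutive repetitions.
footprint, shared = shared_zone((0,), [[1]], [[1]], (16,), (3,),
                                [(r,) for r in range(4)])
print(len(footprint), sorted(shared))
```

For this stencil the group reads 12 pattern elements but only 6 distinct cells, 4 of which are read by neighboring repetitions; loading the 6-cell footprint once into fast shared (OpenCL local) memory is the kind of saving such an analysis targets.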

Methodology to generate OpenCL code from MARTE models

In order to reduce design complexity, we propose an approach to generate code for the OpenCL API, an open standard for parallel programming of heterogeneous systems. This approach is based on Model Driven Engineering (MDE) and on the Modeling and Analysis of Real-Time and Embedded Systems (MARTE) standard proposed by the Object Management Group (OMG). The aim is to provide non-specialists in parallel programming with the means to implement their applications. The approach also supports reuse and platform independence: once an application and an execution platform architecture have been designed, the same project can be reused to add functionality and/or change the target architecture. Consequently, this approach helps industry meet time-to-market constraints. The generated code, for both the host and the compute devices, consists of compilable source files that satisfy the specifications defined at design time.

Profiling into Models

To fine-tune models, we propose integrating software profiling results into higher-level specification models [56]. The aim is to optimize the models and, consequently, the generated code. The model optimization approach relies on the Gaspard2 branch dedicated to code generation for OpenCL and GPUs [58]. Based on model transformation traceability [75], we provide model designers with feedback on software execution. This feedback enables designers to tune their models to improve software performance, even if they do not have in-depth knowledge of the target platform (GPU). First, code is generated from an initial model using Gaspard2. The resulting code is then executed within an existing profiling environment. Afterwards, the profiling results are delivered directly to the designer as annotations in the model. Two types of information are propagated back to the model through traceability. The first type comes directly from the profiler, e.g. processor occupancy, and is attached to specific regions of the model, highlighting the regions that require tuning. The second type corresponds to the results of an analysis performed by an expert system that we provide; this information is delivered to designers as advice in the model annotations. The expert system generates this advice from platform features and execution results. For example, it can suggest changing the shape of a task in order to improve processor occupancy. The richer the knowledge base and inference engine of the expert system, the better the advice it can give.
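
The feedback loop can be sketched in Python as follows. The trace map, the metric name `occupancy`, the 50% threshold and the single rule are hypothetical stand-ins for Gaspard2's traceability model, the profiler output and the expert system's knowledge base; the sketch only shows how the two types of information (raw profiler data and derived advice) reach the model element a kernel was generated from.

```python
from dataclasses import dataclass, field


@dataclass
class ModelElement:
    """A model element (e.g. a repetitive task) that can carry annotations."""
    name: str
    annotations: list = field(default_factory=list)


def low_occupancy_rule(metrics, threshold=0.5):
    """Illustrative expert-system rule: flag low processor occupancy."""
    occupancy = metrics.get("occupancy")
    if occupancy is not None and occupancy < threshold:
        return (f"occupancy {occupancy:.0%} is below {threshold:.0%}; "
                "consider changing the shape of this task")
    return None


def annotate(trace, profile, rules):
    """Lift profiling results onto model elements through trace links.

    trace:   generated kernel name -> model element it was generated from
    profile: kernel name -> {metric name: value}
    rules:   functions mapping metrics to an advice string or None
    """
    for kernel, metrics in profile.items():
        element = trace.get(kernel)
        if element is None:
            continue  # no trace link: the result cannot be attached to the model
        # Type 1: raw profiler data attached to the originating element.
        for metric, value in metrics.items():
            element.annotations.append(f"{metric} = {value}")
        # Type 2: advice produced by the expert-system rules.
        for rule in rules:
            advice = rule(metrics)
            if advice:
                element.annotations.append("advice: " + advice)


# Hypothetical task and kernel names.
task = ModelElement("HorizontalFilter")
annotate({"hfilter_kernel": task}, {"hfilter_kernel": {"occupancy": 0.25}},
         [low_occupancy_rule])
print(task.annotations)
```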

Model optimization relies on the hypothesis that the high-level models are error-free. Since these models are complex, it is difficult for designers to get them right the first time. We propose a new approach that enables model designers to debug their models. For this purpose, we offer model designers quick and automatic code instrumentation. As for model optimization, we take advantage of model transformation traceability to keep the link between models and software execution and to provide execution feedback. Hence, the information produced in the running environment during software execution is moved up directly onto the models, allowing model designers to verify the behavior of their software directly on the high-level models.

Static Analysis of Polychronous Specifications with SMT Theory

As opposed to single-clocked synchronous programming paradigms, the polychronous formalism allows the specification of concurrent data-flow computations on signals such that different data flows can evolve asynchronously with respect to each other. We formulated clock analysis in Signal compilation [38] and the detection of false loops in MRICDF as decision problems in Satisfiability Modulo Theories (SMT) [30], [59]. Owing to recent interest in SMT, a number of efficient solvers are available; they offer greater expressiveness for non-Boolean constraints and allow us to distinguish false loops from realizable causalities in reasonable computation time. We demonstrated that several polychronous specifications, rejected by current compilers because they cannot tell true causal loops from false ones, can be synthesized as correct sequential embedded software.
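
The distinction between false loops and true causal loops can be illustrated with a tiny Python sketch. Each dependency on a cycle is guarded by the clock condition under which it is active, and the cycle is a true causal loop only if all guards can hold at the same instant. An SMT solver decides this symbolically and also handles non-Boolean constraints; the sketch below instead enumerates Boolean clock valuations exhaustively, which is only feasible for a handful of variables. The signal names and guards are hypothetical.

```python
from itertools import product


def cycle_is_realizable(cycle_guards, variables):
    """True if some valuation of the Boolean clock variables activates every
    dependency of the cycle at the same instant (a true causal loop)."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(guard(env) for guard in cycle_guards):
            return True
    return False  # guards are mutually exclusive: the cycle is a false loop


# x := y when c   and   y := x when not c   (Signal-like pseudo-equations)
false_loop = [lambda env: env["c"], lambda env: not env["c"]]
# x := y when c   and   y := x when (c and d)
true_loop = [lambda env: env["c"], lambda env: env["c"] and env["d"]]

print(cycle_is_realizable(false_loop, ["c"]))      # False
print(cycle_is_realizable(true_loop, ["c", "d"]))  # True
```

A compiler that rejects every syntactic cycle would refuse both specifications; the satisfiability check accepts the first, whose two dependencies are never active together.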

Programming functional and real-time aspects simultaneously

An embedded system is usually required to respect real-time constraints related to physical constraints, either those of its environment or those of the physical devices it controls. First, it is often multi-periodic, since its devices have different physical characteristics and must therefore be controlled at different rates. Second, the system must respect deadline constraints, which may correspond, for instance, to a maximum end-to-end latency requirement between observations (inputs) and the corresponding reactions (outputs). A correct implementation must respect all the real-time constraints and must also be functionally deterministic, meaning that the outputs of the system are always the same for a given sequence of inputs. Current practice often deals with these two aspects separately, while our objective is to deal with them simultaneously.

To this end, we must first introduce real-time primitives at the programming-language level. We continued our previous work on the Prelude language [19], which provides such primitives in a synchronous data-flow language. We produced a complete end-to-end framework for the design and implementation of embedded systems on a symmetric multicore: the Prelude-SchedMCore toolset [32]. We recently started a Master's research project to study how real-time aspects could be introduced into more traditional programming paradigms with the Scala language.
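
To give a flavor of such real-time primitives, the following Python sketch models strictly periodic clocks with three rate transitions, loosely inspired by Prelude's operators for oversampling (`*^`), undersampling (`/^`) and phase offset (`~>`). It is a simplification under our own assumptions: it only computes activation dates and does not reproduce Prelude's clock calculus or its handling of flow values.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Clock:
    """Simplified strictly periodic clock: activations at offset + k * period."""
    period: Fraction
    offset: Fraction = Fraction(0)

    def dates(self, n):
        """The first n activation dates."""
        return [self.offset + k * self.period for k in range(n)]


def faster(clock, k):
    """Oversampling: k times more frequent (cf. Prelude's *^)."""
    return Clock(clock.period / k, clock.offset)


def slower(clock, k):
    """Undersampling: k times less frequent (cf. Prelude's /^)."""
    return Clock(clock.period * k, clock.offset)


def shift(clock, q):
    """Phase offset by q periods, q possibly fractional (cf. Prelude's ~>)."""
    return Clock(clock.period, clock.offset + q * clock.period)


base = Clock(Fraction(10))          # hypothetical 10 ms sensor acquisition
print(slower(base, 3).dates(3))     # a 30 ms control law derived from it
print(shift(base, Fraction(1, 2)).dates(2))
```

Exact rationals are used so that repeated oversampling and fractional phases never accumulate floating-point error.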

The Prelude compiler translates a program into a set of dependent periodic tasks. In [36], [48], we proposed a new dynamic-priority scheduling policy that handles the extended precedence constraints (constraints between tasks of different periods) of such systems.
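
The Python sketch below illustrates what an extended precedence constraint looks like once the tasks are unfolded over their hyperperiod, together with one classical way of scheduling it: encoding job-level precedences into adjusted release dates and deadlines (Chetto-style adjustment) and then running preemptive EDF. This is not the new policy of [36], [48]; the task parameters and the precedence from job A[0] to job B[0] are hypothetical.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def unfold(tasks, horizon):
    """Jobs of each task (period, wcet, relative deadline) over the horizon."""
    jobs = {}
    for name, (period, wcet, deadline) in tasks.items():
        for k in range(horizon // period):
            jobs[(name, k)] = {"r": k * period, "d": k * period + deadline, "c": wcet}
    return jobs


def adjust(jobs, precedences):
    """Encode job precedences (pred, succ) into releases and deadlines.
    Iterates to a fixpoint; the precedence graph must be acyclic."""
    changed = True
    while changed:
        changed = False
        for pred, succ in precedences:
            r = max(jobs[succ]["r"], jobs[pred]["r"] + jobs[pred]["c"])
            d = min(jobs[pred]["d"], jobs[succ]["d"] - jobs[succ]["c"])
            if r != jobs[succ]["r"] or d != jobs[pred]["d"]:
                jobs[succ]["r"], jobs[pred]["d"] = r, d
                changed = True
    return jobs


def edf(jobs, horizon):
    """Preemptive EDF in unit time steps. Returns (schedule, all deadlines met)."""
    left = {j: v["c"] for j, v in jobs.items()}
    schedule = []
    for t in range(horizon):
        ready = [j for j in jobs if jobs[j]["r"] <= t and left[j] > 0]
        if not ready:
            schedule.append(None)
            continue
        job = min(ready, key=lambda j: (jobs[j]["d"], j))
        left[job] -= 1
        schedule.append(job)
        if left[job] == 0 and t + 1 > jobs[job]["d"]:
            return schedule, False  # job completed after its deadline
    return schedule, all(v == 0 for v in left.values())


# A: period 10, wcet 2, deadline 10; B: period 20, wcet 5, deadline 8.
# Extended precedence: B[0] consumes the output of A[0] (A[1] is unconstrained).
tasks = {"A": (10, 2, 10), "B": (20, 5, 8)}
H = hyperperiod([period for period, _, _ in tasks.values()])
jobs = adjust(unfold(tasks, H), [(("A", 0), ("B", 0))])
schedule, ok = edf(jobs, H)
print(ok, schedule[:7])
```

Without the adjustment, plain EDF would start B[0] (deadline 8) before A[0] (deadline 10) and violate the precedence; after adjustment, A[0]'s deadline becomes 3 and B[0] cannot be released before time 2.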

Finally, as Prelude's semantics formally defines both the functional and the temporal behavior of a system, we studied temporal formal verification in [46].

Chaining Localized Model Transformations

Usually, two transformations can only be chained if the output metamodel of the first one is included in the input metamodel of the second one. This compatibility requirement forces designers to write either fine-grained transformations tailored to a dedicated chain, or large and complex transformations. In both cases, the transformations are neither reusable nor easy to maintain.

To solve this problem, we have introduced localized transformations, which apply to a (typically very small) subset of the input metamodel of a transformation. Each localized transformation is designed and implemented to accomplish a specific transformation task, and involves and applies to only a few concepts. Unlike those of traditional transformations, the input and output metamodels of these transformations are not disjoint, so new chaining constraints have to be defined. We have thus defined new chaining constraints, based on a type analysis, that specify when two transformations can be chained in one order, in both orders, or in neither [96]. In some cases, this analysis concludes that two transformations can be chained in both orders, yet for some input models the two chainings produce different output models. We have introduced an intermediate abstraction level, independent of any transformation language, that focuses on read, modified, created and deleted metaelements. We are pursuing our investigations with this new abstraction level.
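
The read/modified/created/deleted abstraction lends itself to a simple set-based check. The Python sketch below is only an illustration under our own assumptions: the `Footprint` record, the rule that a transformation cannot follow one that deletes what it reads or modifies, and the Bernstein-style independence test are hypothetical simplifications, not the type analysis of [96]. The example mirrors the situation described above: two transformations admissible in both orders but not independent, so the order may change the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Metaelements a localized transformation reads, modifies, creates, deletes."""
    read: frozenset = frozenset()
    modified: frozenset = frozenset()
    created: frozenset = frozenset()
    deleted: frozenset = frozenset()

    def writes(self):
        return self.modified | self.created | self.deleted


def can_follow(first, second):
    """`second` may run after `first` unless `first` deletes what `second` uses."""
    return not (first.deleted & (second.read | second.modified))


def independent(t1, t2):
    """Neither transformation writes what the other reads or writes."""
    return (not (t1.writes() & (t2.read | t2.writes()))
            and not (t2.writes() & (t1.read | t1.writes())))


def analyze(t1, t2):
    """Admissible orders, and whether this simplified model guarantees
    that both orders give the same result."""
    orders = [label for label, (a, b) in (("t1;t2", (t1, t2)), ("t2;t1", (t2, t1)))
              if can_follow(a, b)]
    return orders, independent(t1, t2)


# Hypothetical transformations: t1 turns Property elements into Attribute
# elements; t2 adds an accessor Operation for each Attribute.
t1 = Footprint(read=frozenset({"Property"}), created=frozenset({"Attribute"}),
               deleted=frozenset({"Property"}))
t2 = Footprint(read=frozenset({"Attribute"}), created=frozenset({"Operation"}))
print(analyze(t1, t2))  # both orders admissible, but not independent
```

Running t2 before t1 is admissible, yet the Attribute elements that t1 creates then receive no accessor, which is exactly the kind of order-dependent result the finer abstraction level aims to expose.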