Cairn is a joint project of CNRS, the University of Rennes 1 (ENSSAT Lannion and IFSIC Rennes) and ENS Cachan-Antenne de Bretagne, located on two sites: Rennes and Lannion. The team was created on January 1st, 2008 as a “reconfiguration” of the former R2D2 research team from Irisa.
The scientific aim of Cairn is to study hardware and software architectures of Reconfigurable Systems-on-Chip (RSoC), i.e. integrated chips that include reconfigurable blocks whose hardware configuration may be changed before or even during execution.
Reconfigurable systems have been studied in computer science and electrical engineering for about twenty years, thanks to the possibilities opened up initially by Field Programmable Gate Array (FPGA) technology and more recently by reconfigurable processors. In an FPGA, a particular hardware configuration is obtained by loading a binary stream that shapes parameterizable blocks into specific hardware functions. In a reconfigurable processor, coarse-grained logic elements operate on word-size operands and employ reconfigurable operators as computing elements. They are generally tightly coupled with one or more processor cores and act as reconfigurable computing accelerators. Usually, the configuration streams are small enough to allow run-time – or dynamic – reconfiguration. In a broader sense, hardware reconfiguration may happen not only in a single chip but also in a distributed hardware system, in order to adapt this system to changing conditions. This happens, for example, in a mobile system.
Recent evolutions in technology and modern hardware systems confirm that reconfigurable chips are increasingly used, either standalone in recent applications or embedded into more general Systems-on-Chip (SoC). Rapidly changing application standards in fields such as communications and information security call for frequent modifications of the devices. Software updates alone are often not sufficient to keep devices on the market, and hardware redesigns are quite expensive. The need to continuously adapt to changing environments (e.g. cognitive radio) is another incentive to use dynamic reconfiguration at runtime. Finally, with technologies at 65 nm and below, manufacturing variability strongly influences the electrical parameters of transistors, and transient errors caused by particles or radiation will appear increasingly often during execution: error detection and correction mechanisms or autonomic self-control can benefit from reconfiguration capabilities.
Standard processors or systems-on-chip make it possible to develop flexible software on fixed hardware. Reconfigurable platforms make it possible to develop flexible software on flexible hardware.
As the density of chips increases, power efficiency has become the "Holy Grail" of chip architects: not only for portable devices but also for high-performance general-purpose processors, power (or energy) considerations are as important as the overall performance of the products. This power challenge can only be tackled by using application-specific architectures, or at least by incorporating some application-specific elements into SoCs, since ASICs (Application-Specific Integrated Circuits) are much more power-efficient than GPPs (General-Purpose Processors). The designers of SoCs thus face a very difficult challenge: trading off the flexibility of GPPs, which leads to high volumes and short design times, against the efficiency of ASICs, which helps solve the power-efficiency problem. Therefore, reconfigurable architectures are widely recognized to exhibit the best trade-off potential between power, performance, cost and flexibility, because their hardware structure can be adapted to the application needs.
However, reconfigurable systems raise several questions:
What are the basic elements of a good reconfigurable system? In the early days, they were bit-level operators, and they now tend to be word-level operators. There is, however, no agreement on the model that should be used.
How can we reconfigure such a system quickly? When to reconfigure? What is the information needed to reconfigure?
How can we program reconfigurable systems efficiently? We would like to have compilers, not hardware synthesizers and place-and-route tools.
In an application, what must be mapped to reconfigurable chips and what to conventional processors? More generally, how can we transform and optimize an algorithm to take advantage of the potential of reconfigurable chips?
The scientific goal of Cairn is to contribute to answering these questions, based on our background and past experience. To this end, Cairn intends to approach energy-efficient reconfigurable architectures from three angles: the invention of new reconfigurable platforms, the associated design and compilation tools, and the exploration of the interaction between algorithms and architectures. Power consumption and processing power are considered the main constraints in our proposed architectures, design flow and algorithm optimizations, in order to maximize the global energy efficiency of the system.
Wireless communication is our privileged field of applications. Our research includes the prototyping of parts of these applications on reconfigurable and programmable platforms. Moreover, in the framework of research and/or contractual cooperations, other application domains are considered: image indexing, video processing, cryptography and traffic filtering in high-speed networks.
Members of the CAIRN team collaborate with large companies such as STMicroelectronics, Thomson, Thales, Atmel, Xilinx and Geensys, with SMEs such as Aphycare Technologies, SmartQuantum and R-interface, and are involved in several nationally or internationally funded projects (ITEA2 Geodes; the ANR-funded Cifaer, Fosfor, SoCLib, Roma, SVP, Semim@ge, OverSoC and BioWiic; and the "Pôles de compétitivité"-funded Spring, Captiv, Transmedi@ and RPS2).
The Gecos project is an open-source, Eclipse-based compiler infrastructure developed in the group since 2004. Gecos was designed to address some of the shortcomings of existing C/C++ infrastructures such as SUIF and LLVM. Its main characteristics are highlighted below:
A complete Java-based implementation relying on sound software engineering practices (design patterns and Eclipse plug-ins), which allows for fast prototyping of complex compiler passes.
An extensible C compiler front-end, which makes C language extensions easy to implement.
Most of the classical compiler passes, including constant propagation, liveness analysis, static single assignment (SSA) form, procedure inlining, loop unrolling, etc.
A retargetable back-end for ASIP and soft-core synthesis, including code selection, flexible register allocators, a generic assembler, and VHDL and SystemC soft-core generation.
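As a concrete illustration of what one of these classical passes does, the following sketch folds operations whose operands are all known constants. It operates on a made-up three-address IR in plain Python; it is not the actual Gecos Java API, only the idea behind a constant-propagation pass.

```python
# Foldable binary operations of the toy IR.
FOLD = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}

def constant_propagation(instrs):
    """Replace operations whose inputs are all known constants by consts.
    Each instruction is (dest, op, args); args are names or literals."""
    env, out = {}, []              # env: variable name -> known constant
    for dst, op, args in instrs:
        vals = [env.get(a, a) for a in args]   # substitute known constants
        if op == "const":
            env[dst] = vals[0]
            out.append((dst, "const", vals))
        elif op in FOLD and all(isinstance(v, int) for v in vals):
            env[dst] = FOLD[op](*vals)         # fold at compile time
            out.append((dst, "const", [env[dst]]))
        else:
            out.append((dst, op, vals))        # keep, operands partially folded
    return out

prog = [("a", "const", [2]),
        ("b", "const", [3]),
        ("c", "mul", ["a", "b"]),    # folds to const 6
        ("d", "add", ["c", "x"])]    # x unknown: kept, operand folded
folded = constant_propagation(prog)
```

A real pass works on the full IR (control flow, SSA form) rather than a straight-line instruction list, but the fixed-point substitution idea is the same.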
To illustrate the commitment of the group members to this infrastructure, it is worth noting that an important development effort took place in 2008. This effort resulted in the merging of two representations (HCDG, for Hierarchical Conditional Dependency Graph, and PRDG, for Polyhedral Reduced Dependency Graph) into a new one. These models find their roots in the synchronous model (HCDG) and in the polyhedral model (PRDG), respectively.
The development of the Gecos infrastructure is expected to move ahead at an even faster pace thanks to two joint STMicroelectronics/INRIA projects, which aim at using the Gecos infrastructure as a source-to-source optimizing compiler for hardware synthesis.
The development of complex applications is traditionally divided into three steps: theoretical study of the algorithms, study of the target architecture, and implementation. When facing new emerging applications such as high-performance, low-power, low-cost mobile communication systems or smart sensor-based systems, it is mandatory to strengthen the design flow by a simultaneous study of both algorithmic and architectural issues.
The figure shows the global design flow that we propose to develop. It is organized into levels that correspond to our three research themes: application optimization (algorithmic, fixed-point), platform-instance optimization (hardware and middleware), and stepwise refinement and compilation of software tasks (transformations, configuration generation).
In the rest of this part, we briefly describe the challenges concerning new reconfigurable platforms, then the issues on compiler and synthesis tools related to these platforms, and finally the remaining challenges in algorithm-architecture interaction.
The enabling technology for building reconfigurable systems is the field-programmable gate array (FPGA), introduced to the market in the mid-1980s. Today's components feature millions of gates of programmable logic, and they are dense enough to host complete computing systems on a programmable chip. FPGAs have been the reconfigurable-computing mainstream for a number of years and achieve flexibility by supporting gate-level reconfigurability, i.e. they can be fully optimized for any application at the bit level. However, this flexibility comes at a very significant interconnection cost. To be configured, a large amount of data must be distributed via a slow serial programming process to all the processing and interconnection resources. Configurations must be stored in an external memory. These interconnection and configuration overheads lead to energy-inefficient architectures.
To increase the optimization potential of programmable processors without the FPGA penalties, functional-level reconfiguration was introduced. Reconfigurable processors are the most advanced class of reconfigurable architectures. The main concern of this class of architectures is to support flexibility while reducing reconfiguration overhead. Precursors of this class were the KressArray, RaPiD, and RaW machines, which were specifically designed for streaming algorithms. Morphosys, Remarc and Adres contain programmable ALUs with a reconfigurable interconnect. These works have led to commercial products such as the eXtreme Processing Platform (XPP) from PACT and Bresca from Silicon Hive, designed mainly for telecommunication applications.
Another strong trend is towards heterogeneous reconfigurable processors. Hybrid architectures combine standard GPP or DSP cores with arrays of field-configurable elements. These new reconfigurable architectures are entering the commercial market. Some of their benefits are the following: functionality on demand (set-top boxes for digital TV equipped with decoding hardware on demand), acceleration on demand (coprocessors that accelerate computationally demanding multimedia and communications applications), and shorter time to market (products that target ASIC platforms can be released earlier using reconfigurable hardware).
Dynamic reconfiguration allows an architecture to adapt to various incoming tasks. This requires complex management and control, which can be provided as services of a real-time operating system (RTOS): communication, memory management, task scheduling and task placement. Such an Operating System (OS) approach has many advantages: it is a complete design framework, independent of the technology and of the hardware architecture, thus helping to drastically reduce the design time of the complete platform.
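To make the task-placement service concrete, here is a minimal sketch of first-fit-decreasing placement of hardware tasks onto reconfigurable regions. This is a behavioral Python model of the idea, not an actual RTOS service, and the task names and area figures are invented.

```python
def place_tasks(tasks, region_areas):
    """First-fit-decreasing placement of hardware tasks onto regions.
    tasks: list of (name, area); region_areas: free area per region.
    Returns {name: region index}, None meaning the task must wait
    (or be executed in software instead)."""
    free = list(region_areas)
    placement = {}
    for name, area in sorted(tasks, key=lambda t: -t[1]):  # biggest first
        for i, cap in enumerate(free):
            if area <= cap:                # first region with enough room
                placement[name] = i
                free[i] -= area
                break
        else:
            placement[name] = None         # no region large enough right now
    return placement

demo = place_tasks([("fft", 40), ("fir", 20), ("viterbi", 50)], [64, 48])
```

A real placement service must also handle region shapes, fragmentation and reconfiguration latency, which this sketch deliberately ignores.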
Communication in a reconfigurable platform is also a very important research subject. The role of communication resources is to support transactions between the different components of the platform, either between macro-components of the platform – main processor, dedicated modules, dynamically reconfigurable parts of the platform – or inside the elements of the reconfigurable parts themselves. This has motivated studies on Networks-on-Chip for reconfigurable SoCs that trade off flexibility and quality of service.
In Cairn we mainly target reconfigurable systems-on-chip (RSoC), defined as a set of computing and storage resources organized around a flexible interconnection network and integrated onto a single silicon chip (or a programmable chip such as an FPGA). The architecture is specialized for an application domain, and the flexibility is provided by hardware reconfiguration and software programmability. Therefore, computing resources are heterogeneous, and we focus on the following:
Reconfigurable hardware blocks with a dynamic behavior, where reconfigurability can be achieved at the bit or at the operator level. Our research aims at defining new reconfigurable computing and storage resources. Since reconfiguration must occur as fast as possible (typically within a few cycles), the reduction of the configuration bit-stream is also a key issue.
When performance and power consumption are major constraints, it is well known that optimized specialized hardware blocks (often called IPs, for Intellectual Property) are the best (and often the only) solution. As a flexible extension of specialized IPs, we study multi-mode components for very specific sets of high-complexity algorithms, without loss of performance.
Specialized processors with a tailored instruction-set still offer a viable solution to trade off energy efficiency against flexibility. They are especially interesting in the context of recent FPGA platforms, where multiple processors can easily be embedded. We also focus on the automatic generation of an optimal customized instruction-set and of the associated data-path and interface with an embedded processor core.
The absence of compilers is one of the major limitations to the use of reconfigurable architectures in real-life applications. Therefore, the ability to compile and optimize code for reconfigurable hardware platforms from high-level specifications is the key to a real success story and is a hot topic in the research community. We continue our research efforts to offer efficient tools with close links to the architectures.
Most current programming environments for reconfigurable systems consist of separate tool flows for the software and the hardware. Processor code and configuration data for the reconfigurable processing units are handcrafted and wrapped into libraries of functions. Progress beyond current practice calls for compilers capable of generating code and configurations from a high-level general-purpose programming language. Such a compiler decides which operations go onto the reconfigurable processors. Loops or frequently executed code fragments are good candidates for reconfigurable platforms. For general-purpose code, this leads to several problems: it is difficult to extract sets of operations with matching granularity at a sufficient level of parallelism, and inner loops of general-purpose programs often contain excess code, i.e. code that must be run on a CPU, such as exceptions, function calls or system calls. Several efforts aimed at automatic code generation for reconfigurable architectures have been reported.
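A hypothetical partitioning heuristic for this decision might look as follows. The profile numbers are invented and real partitioners use much richer cost models, but the sketch captures the two criteria above: loops with excess code stay on the CPU, and the remaining candidates are ranked by benefit per unit of fabric area.

```python
def select_for_fabric(loops, area_budget):
    """Choose loops to offload to the reconfigurable fabric.
    Each loop: (name, cycles_saved, area, has_excess_code).
    Loops containing excess code (exceptions, system calls) must
    stay on the CPU and are excluded up front."""
    candidates = [l for l in loops if not l[3]]
    candidates.sort(key=lambda l: l[1] / l[2], reverse=True)  # benefit / area
    chosen, used = [], 0
    for name, saved, area, _ in candidates:
        if used + area <= area_budget:        # greedy fill of the fabric
            chosen.append(name)
            used += area
    return chosen

picked = select_for_fabric(
    [("L1", 1000, 30, False), ("L2", 900, 10, False),
     ("L3", 5000, 50, True),  ("L4", 100, 5, False)],
    area_budget=40)
```

Note that L3, despite the largest saving, is rejected because of its excess code; L2 and L1 fill the budget in density order.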
Another approach to the programming and design of reconfigurable platforms, especially for special-purpose elements, is to use techniques inspired by high-level synthesis. Here also, loops are the target of these methods: the goal is either to generate special-purpose architectures made out of arithmetic operators or to produce parallel architectures. In both cases, the output may be either efficient special-purpose hardware for computation-intensive tasks or the parameters for a reconfigurable architecture. Such approaches will eventually create a bridge between compilation techniques and hardware design.
Finally, we continue to investigate automatic floating-point to fixed-point conversion, with the objective of developing an open-source tool. Multimedia and signal processing are the main application fields for reconfigurable platforms. In general, these algorithms are specified using floating-point operations but, for efficiency reasons, they have to be implemented with fixed-point operations, either in software for DSP cores or as special-purpose hardware. Unfortunately, fixed-point conversion is very challenging and time-consuming, typically demanding 25 to 50% of the total design or implementation time.
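The core of such a conversion can be sketched as follows: a toy signed Qm.f quantizer with saturation. A real conversion tool additionally has to determine suitable word-lengths for every variable and evaluate the resulting accuracy, which is where most of the design time quoted above goes.

```python
def to_fixed(x, int_bits, frac_bits):
    """Quantize x to a signed fixed-point code (1 sign + int_bits + frac_bits).
    Rounds to the nearest code and saturates on overflow."""
    v = round(x * (1 << frac_bits))            # scale and round
    lo = -(1 << (int_bits + frac_bits))        # most negative code
    hi = (1 << (int_bits + frac_bits)) - 1     # most positive code
    return max(lo, min(hi, v))                 # saturate instead of wrapping

def to_float(v, frac_bits):
    """Interpret a fixed-point code back as a real value."""
    return v / (1 << frac_bits)

q = to_fixed(0.709, 3, 12)   # Q3.12 format, step 2^-12
```

The rounding error is bounded by half a quantization step (2^-13 here); choosing int_bits and frac_bits per variable so that such errors stay acceptable is the hard part the text describes.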
As Cairn focuses on domain-specific systems-on-chip including reconfigurable capabilities, algorithmic-level optimizations have a great impact on the efficiency of the overall system. Based on the skills and experience in “signal processing and communications” of some of Cairn's members, we conduct research on algorithmic optimization techniques under two main constraints, energy consumption and computation accuracy, and for two main application domains: fourth-generation (4G) mobile telecommunications and wireless sensor networks (WSN). These application domains are very conducive to our research activities: the high complexity of the first and the stringent power constraints of the second require the design of specific high-performance and energy-efficient SoCs. The sections below detail the application domains that we focus on.
Our research is based on realistic applications, in order both to discover the main needs created by these applications and to invent realistic and interesting solutions.
The high complexity of next-generation (4G) wireless communication systems leads to the design of real-time, high-performance specific architectures. The study of these techniques is one of the main fields of application for our research, based on our experience with WCDMA implementation for 3G.
In Wireless Sensor Networks (WSN), where each wireless node has to operate without battery replacement for a long time, energy consumption is the most important constraint. In this domain, we mainly study energy-efficient architectures and wireless cooperative techniques for WSN.
Intelligent Transportation Systems (ITS), and especially automotive systems, increasingly benefit from technology advances. While wireless transmissions allow a car to communicate with another or even with the road infrastructure, the automotive industry can also propose driver assistance and more secure vehicles thanks to improvements in computation accuracy for embedded systems.
Other important fields will also be considered: specialized hardware systems for high-speed network traffic filtering, high-speed true-random number generation for security, content-based image retrieval and video processing.
With the advent of next-generation (4G) broadband wireless communications, the combination of MIMO wireless technology with multicarrier CDMA (MC-CDMA) has been recognized as one of the most promising techniques to support high data rates and high performance. Moreover, future mobile devices will have to offer interoperability between wireless communication standards (4G, WiMAX, ...) and thus implement MIMO pre-coding, already used by the WiMAX standard. Finally, in order to maximize mobile device lifetime and guarantee quality of service to consumers, 4G systems will certainly use cooperative MIMO schemes or MIMO relays. Our research activity focuses on MIMO pre-coding and MIMO cooperative communications, with the aim of algorithmic optimization and implementation prototyping.
Sensor networks are a very dynamic domain of research due, on the one hand, to the opportunity to develop innovative applications linked to a specific environment and, on the other hand, to the challenge of designing totally autonomous communicating objects. Cross-layer optimizations lead to energy-efficient architectures and cooperative techniques dedicated to sensor-network applications.
Technology advances in embedded devices inside vehicles and in communication systems between vehicles (V2V) or with the road infrastructure (V2R) make it possible to significantly improve the security of drivers and road users.
One of our goals is to propose new low-cost and energy-efficient mobile communication solutions to ease road traffic and make it safer. With "intelligent" road signs and vehicles, i.e. equipped with an autonomous radio communication system, drivers will be able to receive at any time various information about traffic fluidity or road sign identification. In particular, cooperative MIMO techniques are used to decrease the energy consumption of the communications.
Other research related to automotive systems includes, for example, the design of provably accurate fixed-point controllers.
In multimedia applications, audio and video processing is the major challenge embedded systems have to face. It is computationally intensive, with power requirements to meet. Video or image processing has to be accelerated, both at the pixel level (image filtering, edge detection and pixel correlation) and at the block level (transforms, quantization, entropy coding and motion estimation). We investigate the potential of reconfigurable architectures for the design of efficient and flexible accelerators in the context of multimedia applications.
Besides the development of new reconfigurable architectures, the need for an efficient compilation flow is stronger than ever. Challenges come from the high parallelism of these architectures and also from new constraints such as resource heterogeneity, memory hierarchy and power. This is a hot topic in the reconfigurable-architecture community, and we continue our effort to offer efficient compilers with close links to architectures. We aim at defining a highly effective software framework for the compilation of high-level specifications into optimized code executed on a reconfigurable hardware platform. The figure shows the global framework that we are currently developing.
Our approach assumes that the application is specified as a hierarchical block diagram of communicating tasks expressing data-flow or control, where each task is expressed using languages such as C, Signal, Scilab or Matlab, and is then transformed into an internal representation by the compiler front-end. Our framework then applies high-level transformations to this internal representation.
Different internal representations are used depending on the targeted transformations or the targeted architectures.
The classical Control and Data Flow Graph (CDFG) is the main internal formalism of our framework. It is the basis for transformations such as code optimization, fixed-point transformations, instruction-set extraction and scheduling. Gateways will be provided from the CDFG to the other supported formalisms.
The Hierarchical Conditional Dependency Graph (HCDG) format, which finds its roots in the synchronous model, is also supported.
Other internal representations, such as Signal Flow Graphs (SFG) and the Polyhedral Reduced Dependence Graph (PRDG), will be used for application accuracy estimation and loop parallelization techniques, respectively.
Finally, back-end tools enable the generation of code such as VHDL for the hardwired or reconfigurable blocks, C for embedded-processor software, and SystemC for simulation purposes (e.g. fixed-point simulations). The compiler front-end, the back-end generators, the transformation toolbox, as well as the different internal representations and their respective gateways, are all based on a single framework: the Gecos framework.
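As a rough illustration of what a CDFG carries, the following minimal data structure (invented for this sketch and unrelated to the actual Gecos classes) shows the two layers of the representation: basic blocks linked by control-flow edges, each holding a dataflow graph of operations.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    inputs: list = field(default_factory=list)      # names of producer nodes

@dataclass
class BasicBlock:
    nodes: dict = field(default_factory=dict)       # name -> Node (dataflow)
    successors: list = field(default_factory=list)  # control-flow edges

def data_edges(block):
    """Enumerate producer -> consumer dataflow edges inside one block."""
    return [(src, name)
            for name, n in block.nodes.items()
            for src in n.inputs if src in block.nodes]

# a + b computed from two loads, all inside one basic block
bb = BasicBlock(nodes={"a": Node("load"), "b": Node("load"),
                       "c": Node("add", ["a", "b"])})
```

Transformations such as scheduling or instruction-set extraction are then graph algorithms over these two edge sets.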
Besides Cairn's general design workflow, and in order to promote the research undertaken by Cairn, several hardware and software prototypes are developed. Among those, five distributed software packages are presented in this report: Gecos, a flexible compilation platform; Float2Fix, an infrastructure for the automatic transformation of software code converting floating-point data types into a fixed-point representation; FWRToolbox, a Matlab open-source toolbox used to analyze and optimize the finite word-length effects of digital filters/controllers; UPaK, for compilation and synthesis targeting reconfigurable platforms; and Interconnect Explorer, a high-level power and delay estimation tool for on-chip interconnects.
Gecos is a generic compilation flow built from simple transformation tasks. In Gecos, tasks are assembled using a simple script language: variables carry data (intermediate representation, profiling data, etc.) and functions call transformations. This simple language makes it easy to create or customize compilation flows. Gecos is developed using OSGi plug-ins and the Eclipse extension framework, which ease the installation and the development of new transformation and analysis tasks. The platform is in active development, but it already contains many transformations of a standard modern compiler (C front-end, SSA transformation, code selector, register allocator, etc.). Work is currently underway to use Gecos as a bridge to other compilation or synthesis activities (UPaK, FloatToFix).
Find more information on its dedicated web page:
http://
In parallel with the definition of the fixed-point conversion methodology, a tool (FloatToFix) is under development to provide an optimized fixed-point specification from the application description. The application is described as C code using floating-point types. The tool generates C code using the fixed-point data types (ac_fixed) from Mentor Graphics. The development of the FloatToFix tool is carried out by an INRIA graduate engineer. The first version of the FloatToFix tool was based on the SUIF front-end developed at Stanford University. The tool has been refactored to separate the fixed-point conversion transformations from the SUIF front-end. To obtain a tool independent of the front-end, an XML interface has been developed. The main development efforts have been concentrated on the accuracy evaluation module. This module is made up of three main steps: noise-level modeling, transfer-function determination and quantization-noise expression generation. A first version of this module, able to handle non-recursive linear time-invariant systems, will be available soon. The roadmap for the development of FloatToFix is its inclusion into the Gecos framework.
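For the class of systems this module targets (non-recursive, linear time-invariant), output noise can be estimated analytically: each rounding point injects approximately white noise of power q²/12, which is then shaped by the path from the injection point to the output. The following sketch shows this textbook computation under those standard assumptions; the word-lengths and impulse responses are invented.

```python
def output_noise_power(sources):
    """Total output quantization-noise power of a non-recursive LTI system.
    Each source: (frac_bits, h), where h is the impulse response from the
    noise injection point to the output. Uniform rounding noise has power
    q^2 / 12 with quantization step q = 2^-frac_bits; independent sources
    add in power, each scaled by the energy of its path."""
    total = 0.0
    for frac_bits, h in sources:
        q = 2.0 ** -frac_bits
        total += (q * q / 12.0) * sum(c * c for c in h)
    return total

# one rounding feeding a 2-tap averaging path, one directly at the output
p = output_noise_power([(12, [0.5, 0.5]), (12, [1.0])])
```

An automatic tool derives the per-source transfer functions from the code itself, which is precisely the "transfer-function determination" step described above.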
The FWRToolbox is a Matlab open-source toolbox used to analyze the finite word-length effects of digital filter/controller implementations and to find “optimal” realizations (according to open-loop/closed-loop sensitivity measures, roundoff noise analysis, etc.). It can automatically generate C, Matlab or VHDL fixed-point code. Find more information on its dedicated web page:
http://
We are developing, in close collaboration with Lund University (Sweden) and Queensland University (Australia), UPaK, the Abstract Unified Pattern-Based Synthesis Kernel for hardware and software systems. The preliminary experimental results obtained with the UPaK system show that the methods it employs enable a high coverage of application graphs with small sets of patterns. Moreover, high application speed-ups are achieved, for both sequential and parallel application execution, with processor extensions implementing the selected patterns. UPaK is one of the bases for our research on compilation and synthesis for reconfigurable platforms. It is based on the HCDG representation of the Polychrony software designed at INRIA-Rennes in the Espresso project-team.
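The flavor of the pattern-coverage problem can be conveyed on a simplified linear operation trace. Real application graphs are DAGs and the actual UPaK algorithms are considerably more elaborate; the greedy matcher, the operation names and the MAC-like pattern below are all invented for illustration.

```python
def greedy_cover(ops, patterns):
    """Cover an operation trace with patterns, longest match first;
    operations no pattern matches fall back to single-op 'patterns'."""
    patterns = sorted(patterns, key=len, reverse=True)
    cover, i = [], 0
    while i < len(ops):
        for p in patterns:
            if tuple(ops[i:i + len(p)]) == p:   # pattern matches here
                cover.append(p)
                i += len(p)
                break
        else:
            cover.append((ops[i],))             # uncovered single operation
            i += 1
    return cover

trace = ["mul", "add", "mul", "add", "sub"]
cov = greedy_cover(trace, [("mul", "add")])     # a MAC-like pattern
```

Four of the five operations are covered by repeating one pattern, illustrating the observation above: a small set of patterns can cover most of an application.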
In today's SoCs, interconnects introduce delays and consume power and chip resources. A tool called Interconnect Explorer has been developed for the high-level estimation of interconnect performance; it provides fast and accurate figures for both timing and power consumption. These results allowed us to identify new key issues that have to be taken into account for future performance optimizations. The tool is based on energy and timing multi-input tables obtained from transistor-level simulations. It can be configured by setting the following parameters: technology, metal layer, bus length, bus width, frequency, and bufferization type. Interconnect Explorer provides users with results in terms of energy consumption, static power consumption, average dynamic power consumption, maximum dynamic power consumption, instantaneous dynamic power consumption, maximum frequency allowed on the bus, bus area (wires and buffers), commutation rate per bit and the percentage of appearance of each type of transition. The maximum error between the consumption results provided by Interconnect Explorer and SPICE simulation is less than 6%. Interconnect Explorer provides results almost instantaneously (less than one second of computation), whereas a SPICE simulation of the same configuration takes several hours.
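The table-driven estimation principle can be sketched with a one-parameter lookup: interpolate between characterization points produced by transistor-level simulation. The energy values below are invented, and the real tool indexes multi-input tables (technology, metal layer, width, frequency, bufferization) rather than a single length axis.

```python
def estimate_energy(table, length_mm):
    """Linearly interpolate per-transition energy (pJ) from a
    {bus length in mm: energy} characterization table."""
    pts = sorted(table.items())
    if not pts[0][0] <= length_mm <= pts[-1][0]:
        raise ValueError("length outside characterized range")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= length_mm <= x1:
            return y0 + (length_mm - x0) / (x1 - x0) * (y1 - y0)

TABLE = {1.0: 10.0, 2.0: 18.0, 4.0: 40.0}   # made-up characterization points
e = estimate_energy(TABLE, 1.5)
```

This is why such a tool answers in under a second: the expensive SPICE runs happen once, offline, when the tables are built.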
Our aim is to propose new arithmetic operators that are flexible in terms of accuracy. To optimize fixed-point implementations, architectures must offer operators that support different data word-lengths. Operator efficiency can be increased using a subword parallelism (SWP) scheme: a single SWP instruction performs the same operation on multiple sets of subwords in parallel using SWP operators. In existing SWP-capable processors, the available subword sizes are usually 8, 16 or 32 bits, chosen because SWP operators are less complex to design when subword sizes are multiples of the smallest one. However, in multimedia applications, the input data (pixels) for computations are 8, 10, 12 and sometimes 16 bits wide. These multimedia data sizes do not match existing processors' subword sizes, resulting in the under-utilization of processor resources. We designed SWP versions of some basic operators (addition, absolute value, multiplication and multiply-accumulate, MAC) which support multimedia-oriented subword sizes (8, 10, 12 and 16 bits). Subsequently, these basic operators will be used to implement more complex multimedia operators according to user requirements.
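The principle of such an SWP operator can be sketched in software as a behavioral model (not the hardware design itself), here with a hypothetical 10-bit subword for pixel data: each lane is masked so that carries never propagate across subword boundaries.

```python
def pack(vals, subword):
    """Pack unsigned subwords into one machine word, lane 0 first."""
    w = 0
    for i, v in enumerate(vals):
        w |= (v & ((1 << subword) - 1)) << (i * subword)
    return w

def swp_add(a, b, subword=10, lanes=3):
    """Lane-wise addition; each lane wraps modulo 2^subword so that a
    carry out of one subword never leaks into its neighbour."""
    mask = (1 << subword) - 1
    out = 0
    for i in range(lanes):
        sh = i * subword
        out |= ((((a >> sh) & mask) + ((b >> sh) & mask)) & mask) << sh
    return out

r = swp_add(pack([100, 200, 300], 10), pack([1, 2, 3], 10))
```

In hardware the same effect is obtained by breaking the adder's carry chain at the subword boundaries, which is exactly what makes non-power-of-two sizes like 10 and 12 bits more delicate to design.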
In a mobile society, more and more devices need to continuously adapt to changing environments; that is to say, devices will have to be flexible enough to implement different algorithms at different times. Such mode switches require more than just software-based changes: they also require the adaptation of application-specific hardware components. To address this requirement, we investigate two approaches. The first one is the design of a reconfigurable processor able to adapt its computing structure to a dedicated domain: video and image processing applications. The processor is built around a pipeline of coarse-grained reconfigurable operators exhibiting a good trade-off between performance and power consumption. Contrary to previous reconfigurable processors, flexibility is obtained not through a flexible interconnect network but through the use of configurable domain-dedicated units. This work is done in the context of the ROMA ANR project. We particularly investigate reconfigurable operator design and the compilation framework. The second approach is multi-mode architecture design, which does not incur any reconfiguration time penalty. Such architectures implement all the operators required by a pre-defined set of computations. In order to optimize area, these operators are shared between the set of algorithms, and control logic steers the data to the operators depending on the particular algorithm to be executed at a given time. The area overhead depends on how well the algorithms match: algorithms that are too different, or tight performance constraints, prevent architectures from sharing operators efficiently. The targeted domains are typically channel encoding, cryptography and multimedia. This work is done in collaboration with the IMS Lab (B. Le Gal).
For several years, the memory area in SoC architectures has increased strongly. Today, circuit designers define SoCs with an ever-increasing number of memory banks to store large amounts of data. These banks are organized into a multi-level embedded memory hierarchy to ensure high performance. However, due to the weak activity of the memory and given its share of the transistor count, memory power consumption, and especially static power consumption, represents a major part of the global SoC power.
In this context, we have defined a reconfigurable memory hierarchy model suited to specialized SoCs. The organization is based on a multi-banked architecture which ensures high-performance access to the data. Each processing element can be directly connected to one or several memory banks. These links can be local, through a multibus network, or global, through a complete crossbar module. The links can be reconfigured and the hierarchy can be tuned according to the application's needs. Each memory bank has its own address generator. These generators can produce all regular address sequences, and each can be compared to a very small and simple processor core that produces irregular sequences by executing address generation programs.
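The regular sequences produced by such an address generator can be modelled with a few parameters, for instance base, stride, count and an optional wrap span for circular buffers. This is a behavioral sketch of the idea, not the hardware, and the parameter set is illustrative.

```python
def address_sequence(base, stride, count, span=None):
    """Addresses base + k*stride for k in [0, count); if span is given,
    the offset wraps modulo span (circular-buffer addressing)."""
    addrs = []
    for k in range(count):
        off = k * stride
        addrs.append(base + (off % span if span else off))
    return addrs

lin = address_sequence(0x100, 4, 4)           # linear strided access
circ = address_sequence(0x100, 4, 4, span=8)  # wraps inside an 8-byte window
```

Irregular sequences, as the text notes, fall outside such parametric generators and are instead produced by executing a small address-generation program.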
To optimize power consumption, the Dynamic Voltage Scaling (DVS) technique has been included in the control of the memory architecture. The memories can be placed into low-power modes, according to data-access constraints, to save energy. The low-power modes are managed by a global controller which enforces the global application constraints.
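The mode-selection policy of such a controller can be sketched as follows; the mode table (powers, wake-up latencies, break-even idle times) contains illustrative numbers, not figures from our architecture.

```python
# Sketch of a low-power-mode decision: put a bank in the deepest mode whose
# wake-up latency fits the access-deadline slack and whose break-even idle
# time is covered by the predicted idle period. All numbers are illustrative.

MODES = [  # (name, power_mW, wakeup_cycles, break_even_idle_cycles)
    ("active",      10.0,  0,  0),
    ("standby",      2.0,  2,  4),
    ("power_down",   0.5, 20, 40),
]

def choose_mode(idle_cycles, slack_cycles):
    chosen = "active"
    for name, _power, wakeup, break_even in MODES[1:]:
        if wakeup <= slack_cycles and idle_cycles >= break_even:
            chosen = name  # modes listed shallow to deep: keep the deepest valid one
    return chosen
```

A long idle period with enough slack selects `power_down`, while a tight deadline falls back to `standby` or `active`.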
Our research aims at defining a platform model for the definition of dynamically reconfigurable architectures and the associated methods. The main objective is a unified and formal specification of the platform that can be efficiently exploited in retargetable compilation flows and in automated back-end generators for simulation and synthesis. The model is defined to cover different architecture styles, from FPGAs to networks of processors, through coarse-grained reconfigurable data-paths.
This method makes it easy to develop a new dynamically reconfigurable architecture based on computing resources and generic interconnection schemes, to explore its performance, and to validate the architecture by simulation at different levels of abstraction. The architecture is defined with the help of a high-level architecture description based on the MAML language developed at the University of Erlangen-Nuremberg. The first part of this work made it possible to interconnect different kinds of computing resources (configurable logic blocks, reconfigurable functional units or processors) and to produce the reconfiguration resources required for a homogeneous reconfiguration process. Different architecture paradigms (FPGAs, reconfigurable datapaths such as DART, or regular parallel processor architectures such as WPPA) can thus be quickly modeled. The second part of this work consisted in generating the configuration controller after analyzing the MAML specifications of the architecture and of the reconfiguration resources produced. This work led to the development of the Mozaic framework. The tool is able to generate a reconfigurable platform and to explore important parameters (reconfiguration cost and time, flexibility and size of the interconnect, number of resources). The proposed reconfiguration paradigm for computing and interconnect resources has been optimized for a very fast reconfiguration process, which is essential to meet the timing constraints of today's applications. The implementation of a wireless receiver has been tested on various architectures generated by our tool and has shown the efficiency of our methodology applied to reconfigurable systems.
We worked on the static, joint optimization of area and reconfiguration time for the communication networks of regular 2D reconfigurable processor array architectures. To solve these problems (a) jointly and (b) not for a single algorithm but for a whole set of algorithms, a single constraint programming approach has been applied. We first introduced an abstract model for minimizing the number of multiplexers. This model is limited and covers only unicast data transfers. We then proposed a new, optimized formulation that supports multi-cast data transfers. Moreover, we have defined new cost functions that make it possible to minimize other communication network parameters, such as area as well as parallel and sequential reconfiguration time. The correctness of our approach was illustrated by applying our methodology to a concrete architecture, namely the weakly programmable processor array (WPPA) developed at the University of Erlangen-Nuremberg. This architecture belongs to a class of computer architectures consisting of an array of processing elements with reconfigurable interconnections and limited programming capabilities.
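The flavor of the model can be illustrated on the simplest of these objectives, multiplexer sizing for a whole algorithm set. The port names and mappings below are invented, and the real formulation is a constraint program that optimizes several such costs jointly rather than this direct computation.

```python
# Toy version of the communication-network model: each algorithm maps a
# destination port to the source it must read (one source feeding several
# destinations is multi-cast). A destination's multiplexer must offer every
# source that any algorithm in the supported set requires.

def mux_requirements(algorithms):
    """Per destination port, the set of sources its multiplexer must select among."""
    need = {}
    for alg in algorithms:
        for dst, src in alg.items():
            need.setdefault(dst, set()).add(src)
    return need

alg_a = {"pe0.in": "mem0", "pe1.in": "mem0"}       # mem0 multi-cast to both PEs
alg_b = {"pe0.in": "pe1.out", "pe1.in": "mem1"}
sizes = {dst: len(srcs) for dst, srcs in mux_requirements([alg_a, alg_b]).items()}
```

Supporting both algorithms without network reconfiguration thus requires a 2-input multiplexer on each PE input in this toy case.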
To ensure the efficient execution of applications on SoC architectures, designers include heterogeneous execution resources in the same chip (e.g. processors, reconfigurable architectures, dedicated blocks). The management of the overall platform (including the hardware support and the tasks) is then handled by an operating system (OS). With the introduction of flexible/reconfigurable resources in a SoC, some OS services have to be adapted. Two services in particular are strongly impacted by the presence of reconfiguration in the system. The first is task scheduling and allocation, which has to take into account the availability of reconfigurable resources and to allocate tasks onto them. The classical temporal scheduling problem is then extended with a spatial dimension in order to manage the physical area available in the reconfigurable resource. The second impacted service is task communication management. On-line task placement makes the interconnection support difficult to predict, so a flexible and dynamic interconnect medium must be defined.
In order to evaluate the impact of reconfigurable architectures on OS services, we first defined a UML model of the complete environment in the context of the OverSoC project, in which we have proposed the model of the reconfigurable part of the system. This work led to a new collaboration with the Triskell team from IRISA, which aims at defining a meta-model of reconfigurable hardware in order to take advantage of the rise in abstraction level.
Concerning the scheduling service, we first defined an Artificial Neural Network (ANN) to ensure the spatial and temporal placement of tasks within a heterogeneous multi-processor SoC. This year, we extended this first ANN proposal to take reconfigurability into account. We have thus defined a new structure, called Reconfigurable ANN (RANN), which substantially reduces the number of neurons. This model can handle any number of tasks instantiated on the resources. A mathematical formulation of the RANN was proposed, and a simulation tool was developed. A correct schedule is obtained within a small number of iterations and with a reduced set of neurons. To complete this study, we prototyped the hardware implementation of the neural network. Our results show that the implementation is very efficient, making the RANN a good candidate for a hardware implementation of this service.
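A minimal sketch of the underlying neural-scheduling principle follows: a Hopfield-style network, which is how such ANN schedulers are commonly built. The weights and the one-task-per-resource assumption are ours for illustration and do not reproduce the actual RANN structure.

```python
# Neuron x[t][r] = 1 means "task t placed on resource r". Mutual inhibition
# inside each row (a task uses one resource) and each column (a resource hosts
# one task at a time) drives asynchronous updates to a valid placement.

def neural_placement(n_tasks, n_resources, max_iters=20):
    x = [[0] * n_resources for _ in range(n_tasks)]
    for _ in range(max_iters):
        changed = False
        for t in range(n_tasks):
            for r in range(n_resources):
                row_others = sum(x[t]) - x[t][r]
                col_others = sum(x[i][r] for i in range(n_tasks)) - x[t][r]
                net = 1 - 2 * (row_others + col_others)  # excitation bias minus inhibition
                new_state = 1 if net > 0 else 0
                if new_state != x[t][r]:
                    x[t][r], changed = new_state, True
        if not changed:   # converged: a stable state satisfying both constraints
            break
    return x
```

With these weights the network settles in a few sweeps on a placement where every task holds exactly one resource and no resource is shared, which is the kind of fixed point the hardware network converges to.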
Concerning the interconnection, we are currently working on a specific interconnection architecture. We have proposed structures that are well suited to state-of-the-art dynamically reconfigurable chips. We defined a first hierarchical interconnect infrastructure and built a VHDL implementation of this solution. Furthermore, to evaluate our architectural proposal, we have defined a demonstrator platform which allows us to illustrate the reconfiguration of this particular functionality.
Interconnects are now considered the bottleneck in system-on-chip (SoC) design since they introduce delay and power consumption. To deal with this issue, data coding for interconnect power and timing optimization has been introduced. In today's SoCs, however, these techniques are no longer efficient, either because of codec complexity or because they were evaluated under unrealistic assumptions. Based on realistic observations on interconnect delay and power estimation, the spatial switching technique has been proposed and patented. It reduces the delay and power consumption (including the extra power consumed by the codecs) of on-chip buses. The idea is to detect all cross-transitions on adjacent wires and to decide whether the adjacent wires should be exchanged. Results show the efficiency of spatial switching for different technologies and bus lengths. The power consumption reduction can reach 15% for a 5-mm bus, and more for longer buses and future CMOS technologies.
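The decision rule can be sketched as follows. The swap granularity (a single wire pair) and the handling of the control information are simplified with respect to the patented scheme.

```python
# Sketch of the spatial-switching idea: count opposite transitions on adjacent
# wires (the costly crosstalk pattern) and swap a wire pair when the swapped
# version produces fewer of them. The decision rule is illustrative.

def cross_transitions(prev, cur):
    """Number of adjacent wire pairs toggling in opposite directions."""
    n = 0
    for i in range(len(cur) - 1):
        d0, d1 = cur[i] - prev[i], cur[i + 1] - prev[i + 1]
        if d0 * d1 < 0:   # one wire rises while its neighbour falls
            n += 1
    return n

def encode(prev, cur, pair):
    """Swap wires (pair, pair+1) of `cur` if that lowers cross-transitions."""
    swapped = list(cur)
    swapped[pair], swapped[pair + 1] = swapped[pair + 1], swapped[pair]
    if cross_transitions(prev, swapped) < cross_transitions(prev, cur):
        return swapped
    return list(cur)
```

In the real scheme the swap decision must also reach the decoder (e.g. on a control wire) so the receiver can undo the exchange; that side is omitted here.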
This research work aims at developing ultra-low-power SoCs for wireless sensor nodes, as an alternative to existing approaches based on low-power microcontrollers such as the Texas Instruments MSP430. The proposed approach reduces power consumption by combining hardware specialization and power gating techniques. In particular, we exploit the fact that typical WSN applications are generally modelled as a set of small- to medium-grain tasks implemented on a low-power microcontroller using lightweight thread-like OS constructs.
Rather than implementing these tasks in software, we propose to map each of them onto its own specialized hardware structure, which we call a hardware task. Such a hardware task consists of a minimalistic (and customized) datapath controlled by a finite state machine (FSM). By customizing each hardware implementation to its corresponding task, we expect to significantly reduce the dynamic power dissipated by the whole system. Moreover, to circumvent the increase in static power caused by the possibly numerous hardware tasks implemented in the chip, we propose to combine our approach with power gating, so as to supply power to a hardware task only when it needs to be executed. Encouraging preliminary results have been obtained, and the generation of these hardware task structures directly from a C specification (using the Gecos framework) is now under way.
CAIRN participates in the SoCLib ANR project (see Section for more information) whose goal is to build an open platform for the modeling and simulation of multiprocessor systems-on-chip (MP-SoC). This year, as part of our participation in this project, we have proposed and developed a simulation model of the Altera interconnect (Avalon bus). This model and its associated wrappers now allow NIOS
We have also developed a model of the TMS320C62 DSP processor from Texas Instruments. The developed model is in fact an instruction-set simulator of the TMS320C62 processor. It has been validated within the SoCLib simulation platform at the CABA simulation level.
This year we extended the previously developed UPaK system (see section ), designed for the automatic selection of application-dependent processor extensions and for application scheduling on the resulting architectures. In the context of the project, we considered the architecture model of an ASIP processor with extended instruction sets. Extended instructions implement identified and selected computational patterns and can be executed sequentially or in parallel with the ASIP core processor instructions. This provides ways to trade execution time against hardware cost. The processor extensions are composed of heterogeneous cells and registers connected to the processor's data-path by an interconnection structure. The number of registers and the structure of the interconnections are application-dependent. Each cell implements one or more patterns selected by the UPaK system. The registers store intermediate results, which reduces data transfers between the architecture extension and the processor register file.
Our contribution is twofold. We have defined a complete design flow starting from a C specification and resulting in the generation of processor extensions, and we have developed a new scheduling model based on the pattern-matching principle and a constraint programming approach. We have also started work on the pattern generation problem and developed a prototype of the new pattern generator. It currently supports additional constraints such as critical path length and the number of inputs and outputs. The applied technique selects the maximal sub-patterns of patterns satisfying all imposed architectural constraints. We have also continued to work on the optimized synthesis of automatically identified computational patterns in order to synthesize the corresponding run-time reconfigurable cells.
This year we continued to work on modeling a run-time partially reconfigurable architecture in order to optimize the execution time and power consumption of the application. The architecture has been defined in the ROMA ANR project. It is parametric and composed of memories, a restricted number of communication switches, and run-time reconfigurable cells at the functional level. In the context of this project, a compilation flow has been defined.
Loop optimization is a well-known problem. It has been shown that the polyhedral model provides a convenient abstraction for performing program transformations. Its intuitive geometric interpretation facilitates unimodular transformations such as skewing or projection. Moreover, the polyhedral model supports operations such as intersection and difference on these sets of geometric forms.
In the context of the ROMA project, a new representation has been introduced to combine the advantages of the polyhedral model and of the HCDG graph. In our new intermediate representation, new guards were added. They are associated with polyhedral expressions that state the validity conditions of all guarded nodes. Thanks to this, we know exactly which nodes will be computed in a given polyhedral context (before and after a transformation). The extended HCDG graph is constructed after the inter-iteration data dependency analysis described in . We assume that the analyzed programs contain only FOR loops and that their bounds are defined by linear expressions of the parameters and loop indexes. This enhanced HCDG graph is currently generated from C source code using the Gecos environment.
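As a small illustration of the guard/polyhedron association, an iteration domain can be represented as affine inequalities and manipulated with set operations. The encoding below is a toy 2-D version with brute-force enumeration, not the actual Gecos/HCDG data structure.

```python
# An iteration domain as affine inequalities a1*i + a2*j + b >= 0, with
# intersection and difference expressed as plain set operations.
from itertools import product

def domain(constraints, box=8):
    """Integer points (i, j) satisfying every constraint; `box` bounds the
    enumeration, standing in for a real polyhedral library."""
    return {(i, j) for i, j in product(range(box), repeat=2)
            if all(a1 * i + a2 * j + b >= 0 for a1, a2, b in constraints)}

# for i in 0..3: for j in 0..i   ->  i>=0, 3-i>=0, j>=0, i-j>=0
triangle = domain([(1, 0, 0), (-1, 0, 3), (0, 1, 0), (1, -1, 0)])
# for i in 0..3: for j in 0..3
square = domain([(1, 0, 0), (-1, 0, 3), (0, 1, 0), (0, -1, 3)])
guarded = square - triangle   # iterations executed only when the guard j > i holds
```

A guard attached to a node is exactly such a polyhedron: the node executes for the iterations inside it, and transformations update the polyhedra rather than the code.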
The traditional approach to designing a fixed-point system is based on the worst-case principle. For a digital communication receiver, for example, the maximal performance requirements and the maximal input dynamic range are retained, and the most constrained transmission channel is considered. Nevertheless, the noise and signal levels evolve over time. Moreover, the data rate depends on the service (video, image, speech) used by the terminal, and the required performance (bit error rate) is linked to the service. These various elements show that the fixed-point specification depends on external factors (noise level, input signal dynamic range, quality of service) and can be adapted over time to reduce the average power consumption.
An approach in which the fixed-point specification is adapted dynamically according to the receiver's input SNR (Signal-to-Noise Ratio) has been proposed. This concept is called Dynamic Precision Scaling (DPS). To adapt the fixed-point specification over time, the architecture integrates flexible operators as presented in Section . The interest of our approach has been demonstrated on a WCDMA (Wide-band Code Division Multiple Access) receiver example. The WCDMA receiver is made up of two main parts: the rake receiver and the searcher. For the rake receiver, which decodes the transmitted symbols, performance is evaluated through the bit error rate (BER). By applying dynamic precision scaling, up to 40% of the consumed energy can be saved compared to an implementation based on worst-case analysis. For the searcher, performance is evaluated through the mis-detection and false-alarm probabilities; here the DPS approach reduces energy consumption by up to 25%.
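The core DPS decision can be sketched with the standard uniform-quantization noise model. The 10 dB margin and the mapping from SNR to word-length below are illustrative, not the thresholds used in the WCDMA study.

```python
# Sketch of Dynamic Precision Scaling: pick the smallest fractional word-length
# whose quantization-noise power stays a fixed margin below the channel noise
# implied by the current input SNR.
import math

def quant_noise_power(frac_bits):
    """Uniform rounding-noise power q^2 / 12 with step q = 2**-frac_bits."""
    q = 2.0 ** (-frac_bits)
    return q * q / 12.0

def wordlength_for_snr(snr_db, signal_power=1.0, margin_db=10.0, max_bits=16):
    channel_noise = signal_power / (10.0 ** (snr_db / 10.0))
    target = channel_noise / (10.0 ** (margin_db / 10.0))
    for b in range(1, max_bits + 1):
        if quant_noise_power(b) <= target:
            return b
    return max_bits
```

A low-SNR channel tolerates a short word-length while a clean channel demands more bits, which is precisely the degree of freedom DPS exploits to save energy.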
A collaboration with Imec (Interuniversitair Micro-Electronika Centrum), Belgium, started in 2008 on scenario-based fixed-point data format refinement to enable energy-scalable Software Defined Radios (SDR). The aim is to apply our analytical approach to evaluating the quantization noise power of the SSFE (Selective Spanning for Fast Enumeration) algorithm, a near-Maximum-Likelihood MIMO detector. Moreover, this algorithm includes decision operators, so another aim is to extend our analytical model to this type of operator in order to handle a complete signal processing algorithm.
To obtain the analytical expression for this application, the back-end of the accuracy evaluation module of the FloatToFix tool is used. The user provides the transfer function between each noise source and the application output, and the output noise power expression is computed automatically. The noise power value is then obtained by applying this expression to the quantization noise statistics given by the different fixed-point data formats.
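For FIR paths from the noise sources to the output, the computation reduces to a weighted sum. This sketch uses the classical q²/12 rounding-noise model and made-up impulse responses; it is not FloatToFix's actual back-end.

```python
# Output quantization-noise power as the sum, over all noise sources, of the
# source power times the noise gain of its path to the output.

def rounding_noise_power(frac_bits):
    """Classical uniform rounding-noise model: q^2 / 12 with q = 2**-frac_bits."""
    q = 2.0 ** (-frac_bits)
    return q * q / 12.0

def output_noise_power(sources):
    """sources: list of (frac_bits, impulse response of the path to the output).
    Each source contributes its power times the path's noise gain sum(h^2)."""
    return sum(rounding_noise_power(bits) * sum(h * h for h in impulse)
               for bits, impulse in sources)
```

Evaluating this closed form for candidate fixed-point formats is far cheaper than bit-true simulation, which is the point of the analytical approach.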
A framework to optimize the implementation of linear time-invariant filters or controllers on fixed-point architectures has been defined. A digital implementation leads to a numerical degradation of the controller performance, due to the quantization of the coefficients involved (parametric errors) and to the roundoff noise (numerical noise) in the computations. The application is described in an algebraic form. Previous works have been extended to carry out the operator finite word-length (FWL) optimization process. A cost function corresponding to area or power consumption has been developed. Two implementation schemes, Roundoff After Multiplication and Roundoff Before Multiplication, have been proposed. From the definition of the filter or controller (i.e. the transfer function), it is possible to choose among multiple realization structures (state-space, delta-operator, rho-operator, etc.), find the optimal one (according to one or several FWL measures), and generate the equivalent C, MATLAB or VHDL fixed-point code. The FWR Toolbox (for Matlab) was built to achieve this 'optimal' fixed-point implementation.
When the transmitter can obtain some Channel State Information (CSI) from the receiver, antenna power allocation strategies can be applied through the joint optimization of a linear precoder (at the transmitter) and decoder (at the receiver). A new exact solution maximizing the minimum Euclidean distance between received symbols has been proposed for two 16-QAM modulated symbols. This precoder shows an important enhancement of this minimum distance compared to diagonal precoders, which leads to a significant BER improvement. The new strategy selects the best precoding matrix among eight different expressions, depending on the value of the channel angle. To decrease the complexity, other sets of precoders have been proposed; the performance of the simplest one, composed of only two different precoders, remains very close to the optimal in terms of BER.
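The selection principle (not the exact max-dmin solution, which is derived analytically for 16-QAM) can be sketched by brute force on a toy QPSK example with a diagonal 2×2 channel; the candidate matrices below are invented for illustration.

```python
# Pick, among candidate precoders, the one maximizing the minimum Euclidean
# distance between received symbol vectors diag(h)·F·s.
from itertools import product

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

def min_distance(h, F):
    """Minimum squared distance between received vectors over all QPSK pairs."""
    received = []
    for s0, s1 in product(QPSK, repeat=2):
        received.append((h[0] * (F[0][0] * s0 + F[0][1] * s1),
                         h[1] * (F[1][0] * s0 + F[1][1] * s1)))
    return min(abs(a0 - b0) ** 2 + abs(a1 - b1) ** 2
               for i, (a0, a1) in enumerate(received)
               for (b0, b1) in received[i + 1:])

def best_precoder(h, candidates):
    return max(candidates, key=lambda F: min_distance(h, F))

h = (1.0, 0.2)                                  # strong and weak subchannels
F_balanced = ((0.7071, 0.0), (0.0, 0.7071))     # power split over both
F_single   = ((0.9, 0.45), (0.0, 0.0))          # both symbols on the strong one
```

On this ill-conditioned channel the non-diagonal candidate, which routes both symbols onto the strong subchannel, yields a much larger minimum distance than the diagonal power split, mirroring the gain of max-dmin precoding over diagonal precoders.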
LDPC codes are a class of error-correcting codes introduced by Gallager together with an iterative probability-based decoding algorithm. Their performance, combined with a relatively simple decoding algorithm, makes these codes very attractive for the next generations of satellite and radio digital transmission systems. LDPC codes were chosen for the DVB-S2, 802.11n, 802.16e and 802.3an standards. The major problem is the huge design space composed of many interrelated parameters, which enforces drastic design trade-offs. Another important issue is the need for flexible hardware solutions able to support all variants of a given standard.
We previously defined a generic architecture template composed of several processing modules and a set of interconnection buses for inter-module communication. Each module includes two processing units (called the bit-node and check-node processing units) and a set of memory banks. The number of modules, the number of interconnection buses, and the size and number of memory banks are standard-dependent. The LDPC decoding algorithm relies on an appropriate distribution of the block of input data over the different memory banks and on a computation schedule obtained with constraint-programming-based optimization tools. This year we concentrated on modeling our parametric architecture at the CABA level using the SoCLib platform and on its implementation on an FPGA platform. Different versions of the proposed LDPC decoder were realized.
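As an illustration of what a check-node processing unit computes, here is a software sketch of the min-sum update, a common hardware-friendly approximation of the probability-based rule (the actual decoder's arithmetic may differ):

```python
# Min-sum check-node update: the extrinsic message on each edge combines the
# sign product and the minimum magnitude of the OTHER incoming LLRs.

def check_node_update(llrs):
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        sign = 1
        for v in others:
            if v < 0:
                sign = -sign
        out.append(sign * min(abs(v) for v in others))
    return out
```

In hardware this maps to comparators and sign logic rather than multiplications, which is why min-sum variants dominate LDPC decoder implementations.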
Since the wireless nodes are physically separated in cooperative MIMO systems, imperfect time synchronization between the cooperative node clocks leads to an unsynchronized MIMO transmission. As a result, inter-symbol interference (ISI) appears and the space-time sequences from the different nodes are no longer orthogonal. On the reception side, each cooperative node has to forward its received signal through a wireless channel to the destination node for space-time signal combination, which adds noise to the final received signal. Consequently, the transmission synchronization error and the additional reception noise degrade performance and reduce the energy-efficiency advantage of cooperative MIMO systems over SISO systems. For a small synchronization error, the performance degradation is negligible and the cooperative MIMO system is rather tolerant. For a large error, however, performance decreases quickly and the degradation is significant. A new, efficient space-time combination technique based on a low-complexity algorithm has been proposed for cooperative MIMO systems in the presence of transmission synchronization errors. Its principle is to perform a multiple sampling process and to combine the signals from the different sampled sequences in order to restore the orthogonality of the transmitted space-time sequences.
The CAPTIV (Cooperative strAtegies for low Power wireless Transmissions between Infrastructures and Vehicles) project aims at using new radio communication technologies to enhance driver safety. In a cooperative network composed of vehicles and road signs equipped with autonomous radio transmitters, the communications can be optimized at different levels. It was shown that space-time codes dramatically decrease the energy consumption of communications between crossroads. In order both to elaborate the CAPTIV application program and to evaluate driver behaviour when faced with this new kind of information, a specific driving simulator was designed, based on the ECA-FAROS platform. A real prototype has already been evaluated and proves the feasibility of the CAPTIV application; it will soon be optimized using signal processing techniques. While the main goal remains driving assistance, many applications could be implemented on this platform, which will be able to deliver any kind of information (weather, parking, tourist information, advertisement, etc.).
The dynamic nature of security systems – anti-intrusion mechanisms (filtering at the packet, connection, and application levels) evolving according to protection modes and levels – is, to our knowledge, a challenge out of reach of classical technologies based on general-purpose or network processors. The security requirements of high-speed networks (from 10 to 40 Gigabit/s) impose implementing the filtering rules in appropriate hardware structures. One must be able both to manage a large variety of complex treatments and to guarantee quality of service. Today, only dedicated solutions can overcome the bottleneck of implementation complexity, at the price of an obvious lack of flexibility and evolvability.
The aim of our research is the design of specialized hardware systems for filtering network traffic at high speed. We have proposed a new high-performance hardware implementation of a string-matching engine based on a multi-character variant of the well-known Aho-Corasick algorithm. The proposed architecture is well suited to modern FPGAs and makes efficient use of FPGA logic and memory resources. It is optimized to match tens of thousands of strings, as found in intrusion prevention or intrusion detection systems. The proposed design has been validated by implementing a search engine on an Altera Stratix II FPGA component for a subset of the rules of the Snort intrusion detection system. By applying traffic parallelization and retiming techniques, it was shown that 40 Gbit/s traffic content scanning can be sustained. In comparison with other existing architectures, a significant increase in performance has been obtained.
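The automaton the engine is built on can be sketched in software. This single-character version illustrates the goto/failure-link structure; the hardware variant processes several characters per cycle and stores the tables in FPGA memory blocks.

```python
# Classic Aho-Corasick: a trie of patterns with BFS-computed failure links,
# scanning the text in a single pass whatever the number of patterns.
from collections import deque

def build(patterns):
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:                      # build the trie
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(p)
    q = deque(goto[0].values())             # BFS to fill failure links
    while q:
        s = q.popleft()
        for c, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            out[t] |= out[fail[t]]          # inherit matches ending here
    return goto, fail, out

def search(text, automaton):
    goto, fail, out = automaton
    s, hits = 0, []
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

Every input character triggers one state transition regardless of the rule-set size, which is what makes the algorithm attractive for line-rate scanning against tens of thousands of Snort-style strings.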
The objective of a random number generator (RNG) is to produce random binary numbers that are statistically independent, uniformly distributed and unpredictable. RNGs are necessary in many applications, and the number of embedded hardware architectures requiring RNGs is continuously increasing. Generally, a hybrid RNG comprising a True Random Number Generator (TRNG) and a Pseudo Random Number Generator (PRNG) is used. PRNGs are based on deterministic algorithms; they are periodic and must be initialized by a TRNG. TRNGs are based on a physical noise source (e.g. thermal noise or the jitter of free-running oscillators) and depend strongly on the quality of their implementation. Most TRNGs implemented in FPGAs or ASICs use the phase jitter produced by a free-running oscillator or a Phase-Locked Loop (PLL). In practice, jitter can be influenced by noise external to the FPGA (power supply noise, temperature) and by chip activity. This dependence is a weakness, exploitable by exposing the TRNG to hostile environmental conditions.
In cryptography, security usually rests on the randomness quality of a key generated by an RNG. Some PRNGs are recognized to produce high-quality random numbers; however, their quality depends on the randomness of the TRNG seed. PRNG randomness evaluation is usually performed using a battery of statistical tests. Several such batteries are reported in the literature, including the Diehard and NIST batteries. They are all implemented as high-level software. When a PRNG is evaluated, designers put a huge bit stream into memory and then submit it to the software tests; if the bit stream successfully passes a certain number of statistical tests, the PRNG is said to be sufficiently random. TRNG validation is more complicated, as TRNG behavior depends on the construction, on the external environment and, essentially, on a physical noise source which can differ in practice from an ideal noise. However, a methodology to evaluate physical generators has been described; this procedure is based on the TRNG construction and is the technical reference of AIS 31. TRNG weaknesses and external attacks must be countered in real time by inhibiting the TRNG output; one solution is to monitor the TRNG at switch-on and during operation using statistical tests.
During this year, the possibility of implementing the AIS 31 statistical tests in hardware has been studied, and the tests have been implemented on ASIC and FPGA targets. The hardware cost shows that the design can be used in low-cost embedded cryptographic circuits, and the test data-rate makes real-time TRNG monitoring possible. The interest of TRNG monitoring has been demonstrated on current TRNGs. However, using only the AIS 31 statistical tests to control TRNG quality is not sufficient. Consequently, we also worked on a methodology to evaluate the randomness of TRNGs based on free-running oscillators, whose physical noise source is the oscillator jitter. We have studied the possibility of an on-chip jitter measurement: the jitter is evaluated in real time in order to check that its quantity and quality match the TRNG design hypotheses. The on-chip measurement circuit has been implemented in ASIC and FPGA circuits. Finally, this year concluded with the realization of an integrated circuit prototype (OCHRE) including our architecture proposal for the RNG. The chip, in 130 nm CMOS technology, is composed of a TRNG, a PRNG and some hardware statistical tests; the tests monitor the TRNG quality in real time to validate the randomness of the PRNG seed.
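To give the flavor of such an on-line test, here is a software monobit check. The 20000-bit block and acceptance interval follow the well-known FIPS 140-1 limits, used here for illustration; they are of the same kind as, but not identical to, the AIS 31 tests implemented in hardware.

```python
# Monobit test: on a 20000-bit block the number of ones must stay close to
# 10000. A hardware version reduces to a counter plus two comparators running
# continuously on the TRNG output.

def monobit_pass(bits):
    assert len(bits) == 20000
    ones = sum(bits)
    return 9654 < ones < 10346

balanced = [i % 2 for i in range(20000)]   # exactly 10000 ones
biased   = [1] * 12000 + [0] * 8000        # 60% ones: must fail
```

Note that the plainly periodic `balanced` sequence passes, since monobit only counts ones; this is exactly why a battery of complementary tests, as in AIS 31, is required rather than any single test.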
The GEODES (Global Energy Optimisation for Distributed Embedded Systems) project will provide the design techniques, embedded software and accompanying tools needed to face the challenge of long power autonomy for feature-rich, connected embedded systems, which are becoming pervasive and whose usage is rising significantly. It approaches this challenge by considering all system levels, and notably emphasises the distributed-system view. GEODES is an ITEA2 project which involves partners from France, Austria, Italy and the Netherlands: Thales (FR, IT, NL), CyberFab (FR), CNRS (LEAT and IRISA) (FR), CETMEF/MARTEC (FR), Infineon (AU), Thomson (FR), TUV (AU), UAQ (IT), Philips (NL), Organo (AU), TI-WMC (NL).
In various application domains, emerging requirements lead to the definition of new architectures for electronic embedded systems. In the automotive context, the solutions investigated correspond to networks of processing elements distributed in the vehicle. In this context, the research activity considered in the CIFAER (Flexible Intra-Vehicle Communications and Embedded Reconfigurable Architectures) project is the definition of an innovative embedded architecture based on general-purpose processors with reconfigurable processing areas and on the use of adaptable interfaces (radio and powerline communications). Efficient software layers in the associated operating system will be investigated to enable new services such as dynamic reconfiguration and task migration for error tolerance. CIFAER involves Irisa, IETR Rennes, Ireena Nantes, Atmel and Geensys.
The Fosfor (Flexible Operating System FOr Reconfigurable platform) project aims at reconsidering the structure of the RTOS, which is generally implemented in software, centralized, and static, by proposing a distributed RTOS with a homogeneous interface from the application point of view. We propose to exploit the dynamic and partial reconfiguration of the reconfigurable SoC, as well as the static or dynamic deployment of tasks on software processing units (general-purpose processors) or hardware units (reconfigurable areas). The flexibility of the OS will be achieved through virtualization of OS services, such that the application tasks execute and communicate without prior knowledge of their assignment to software or hardware. Fosfor involves Irisa, LEAT Nice, ETIS Cergy, Xilinx and Thales.
The Semim@ge project (http://semimage.enstb.org/) aims at helping professional broadcasters prototype customized audio and video content-repurposing applications. The proposed approach provides an automated and generic workflow targeted at content repurposing and multimedia data indexing (through the automatic generation of structural and semantic metadata). Within this workflow, the CAIRN and TexMex groups are involved in video stream macro-structuration, which consists in finding similarities (identical short scenes such as advertisements) within a video stream. This macro-structuration relies on an exhaustive comparison of the descriptors associated with each single image of the video stream, which turns out to be very demanding in terms of computing power (a few months on a standard PC workstation). The contribution of the CAIRN project lies in the design and evaluation of FPGA-based hardware accelerators for the self-similarity finding stage. A retargetable architectural model has been designed and implemented on two high-performance reconfigurable accelerator platforms: the ReMIX machine designed by the IRISA Symbiose group, and a COTS reconfigurable platform by SGI (RASC-100). The implementation is functional, and speed-ups ranging from 16 to 40 have been observed on real-life data sets.
The aim of SocLib (An Open Modeling and Simulation Platform for System-on-Chip Design) is to build an open platform for the modeling and simulation of multiprocessor systems-on-chip that can be used by both universities and industrial companies. The core of the platform is a library of simulation models for virtual components (IP cores), with a guaranteed path to silicon. The main concern of the SocLib project is true interoperability between the IP cores: all SocLib components are written in SystemC and respect the VCI (Virtual Component Interface) standard communication protocol. CABA (cycle-accurate and bit-accurate) and TLMT (transaction-level model with time) simulation models are proposed. See http://
The aim of the Spring (Shelf Proof Random Integrated Number Generator) project is the design of high-performance and high-rate quasi-true random number generators. Randomness comes from the random jitter of clock generators in recent FPGAs or SoCs. The main contribution is the capacity of the system to measure the jitter in real time and to characterize the randomness quality using hardware-accelerated statistical tests. Spring involves a close collaboration with SmartQuantum, a start-up company developing systems for quantum cryptography.
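As an illustration of the kind of statistical test that such a generator must pass, here is a software sketch of the frequency (monobit) test in the style of NIST SP 800-22; in Spring this class of test is accelerated in hardware, and the simple pass threshold below is the conventional 0.01 significance level, not a project-specific value.

```python
import math

def monobit_test(bits):
    """Frequency (monobit) test, NIST SP 800-22 style: checks that
    the numbers of 0s and 1s in a bit stream are close enough to
    equal for the stream to be plausibly random.

    Returns the p-value; the stream passes when p >= 0.01."""
    n = len(bits)
    # Map each bit to +1/-1 and sum; an unbiased stream sums near 0.
    s = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(2 * n))
```

A balanced stream such as `[0, 1] * 500` yields a p-value of 1.0 (pass), while a constant stream of 1000 ones yields a p-value indistinguishable from 0 (fail).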
The TransMedi@ project addresses the issue of video transcoding, and more generally media processing, with very high performance for network infrastructures and high quality for broadcast equipment. The aim of TransMedi@ is to propose flexible reconfigurable co-processing architectures for the acceleration of video algorithms. In the context of network infrastructure, the platform has to be able to transcode several video streams in real time, from various video formats and standards, while in the broadcast context the main constraint comes from the high quality (HD) of the video. Cairn is involved in the definition of this platform and will propose innovative structures for reconfigurable coarse-grain processing and for data transfer and storage in this video-processing context. TransMedi@ involves a close collaboration with Alcatel, Envivio, Telecom Bretagne and IETR/Supelec.
The main goal of the ANR SVP (SurVeiller et Prévenir) project is to study, experiment with, and realize an ambient integrated architectural framework dedicated to the design and deployment of services in a dynamic sensor network. The proposed framework consists in designing a system architecture that meets the objective of ease of use while also taking into account and adapting to the specific characteristics of wireless sensor nodes, such as drastic resource constraints. Since we are convinced that technology alone is not enough to spread and promote advanced research, we insist on the societal aspects of the project by also taking the final user into account. The second main objective of the SVP project is to deploy real applications in situ, in order to adapt off-the-shelf technology to real-world conditions. The application consists in deploying a sensor network that records the physical activity of school children, in order to study and prevent childhood obesity.
ROMA (http://
The CAPTIV (Cooperative strAtegies for low Power wireless Transmissions between Infrastructures and Vehicles) project (http://
OveRSoC is an ANR project whose objective is to develop a global exploration methodology to evaluate and validate the interactions between an embedded RTOS and a Reconfigurable SoC (RSoC) platform. The OveRSoC project also aims at providing SoC designers with a framework for choosing the right RTOS service architecture for a particular RSoC platform.
The Cairn team currently collaborates with the following laboratories: CEA List, SATIE ENS Cachan, LEAT Nice, Lab-Sticc (Lorient, Brest), ETIS Cergy, LIP6 Paris, IETR Rennes, Ireena Nantes; and with the following INRIA project-teams: Pops, Arenaire, Ares, Compsys, Espresso, Symbiose, TexMex.
The team participates in the activities of:
GdR SOC-SIP (System On Chip - System In Package), working groups on reconfigurable architectures, embedded software for SoC, and low-power issues. See http://
GdR-PRC ISIS (Information Signal ImageS), working group on Algorithms Architectures Adequation. See http://
GdR ASR (Architectures Systèmes et Réseaux, formerly GDR ARP)
Efficient and robust signal processing: optimization of digital filter synthesis in fixed-point and floating-point arithmetic.
Energy scavenging and software power management in the human environment.
CoMap (Co-Design of Massively Parallel Embedded Processor Architectures) is a P2R collaboration with Germany, financed by the French Ministry of Foreign Affairs and its German counterpart. The CoMap project deals with the systematic mapping, evaluation, and exploration of massively parallel processor architectures designed for special-purpose applications in the world of embedded computers. CoMap involves the University of Erlangen-Nuremberg, Dresden University of Technology, the University of Bretagne Occidentale (Lester), the University of Rennes (Irisa/Cairn), and Telecom Bretagne. The investigated class of computer architectures can be described as massively parallel networked processing elements implemented on a single chip (MPSoC). The Cairn contribution is a flexible and dynamically reconfigurable interconnection network. See https://
The Cairn team members are involved in close international cooperation with the following laboratories and universities:
Imec (Belgium) on scenario-based fixed-point data format refinement to enable energy-scalable Software Defined Radios (SDR);
the University of Erlangen-Nuremberg and Dresden University of Technology (Germany) on massively parallel embedded reconfigurable architectures and on dynamic reconfiguration optimisation in the mesh fabric;
Lund University (Sweden) on the application of constraint programming in the reconfigurable data-path synthesis flow;
the Computer Vision and Robotics Group of the Institute of Informatics and Applications at the University of Girona (Spain) on parallel architectures for vision algorithms applied to underwater robots;
University of Eindhoven (Netherlands) on reconfigurable data-path synthesis;
University of Leiden (Netherlands) on parallel architecture synthesis;
Cranfield University (UK) on optimal finite-word-length and finite precision controller implementations and low-complexity controllers.
The Cairn team members are also involved in close international cooperation with the following laboratories and universities:
the LRTS laboratory of Laval University in Québec (Canada) on the topic of architectures for MIMO systems, with funds from FFQR. An “Associated Team” between LRTS and Cairn has been recognized by Inria in November 2006;
the LSSI laboratory of Québec University in Trois-Rivières (Canada), on the design of architectures for digital filters and mobile communications;
ENIT (Tunisia) on the topic of architectures for mobile communications;
the Computer Science department of Colorado State University in Fort Collins (USA) on loop parallelization;
Los Alamos National Laboratory (USA) on the design of optimised application-specific reconfigurable architectures;
University of Queensland (Australia) on reconfigurable architectures for scientific processing;
the University of California, Riverside (USA), on optimized image processing applications synthesis.
the University of Douala, the University of Yaoundé and the University of Dschang in Cameroon on models and tools for parallelization. This cooperation takes place in the scope of the SARIMA GIS for the development of research laboratories in Mathematics and Computer Science in Africa.
the Computer Science department of Colorado State University in Fort Collins on the development of high-level synthesis tools.
Daniel Massicotte (Québec University, Canada) for one month in June and July.
Sébastien Roy (Laval University, Canada) for 2 weeks in July and December.
Michel Thériault (Laval University, Canada) from September 2007 for 8 months.
Stanislaw Piestrak (Metz University, France), 2 weeks in May 2008.
David Novo (Imec, Belgium), 2 weeks in June 2008.
F. Charot and O. Sentieys are members of the steering committee of a school for graduate students on embedded system architectures and associated design tools, organized under the auspices of the CNRS.
D. Chillet is a member of the organisation committee of the Workshop on Design and Architectures for Signal and Image Processing (DASIP) and a program committee member of Majecstic.
S. Pillement is a member of the Program Committee of IEEE FPL, SPL, DTIS and ERSA.
P. Quinton is a member of the steering committee of the System Architecture MOdelling and Simulation (SAMOS) workshop and of the scientific committee of ASAP.
O. Sentieys is a member of the steering committee of the SOC-SIP Expert Group at the CNRS and of the GDR SOC-SIP. He is the chair of the IEEE Circuits and Systems (CAS) French Chapter.
O. Sentieys was a member of the French National University Council from 2000 to 2007 (Conseil National des Universités en section 61).
O. Sentieys was a member of the technical program committees of the following conferences: IEEE DDECS, IEEE ISQED, IEEE VTC, DCIS, DTIS, SBCCI, FTFC, GRETSI, SympA. He is on the editorial board of the Journal of Low Power Electronics, American Scientific Publishers.
C. Wolinski was a member of the technical program committees of the following conferences: IEEE/ACM DATE, IEEE FPL, Euromicro DSD, IEEE ISQED, SympA. He is a member of the Board of Directors of the Euromicro Society.
Georges Adouko, High-Rate Filtering for Network Security Based on Reconfigurable Components
Andrei Banciu, New Digital Design Methodology for Multi-Gigabit/s Transceivers
Antoine Courtay, High-Level Power Estimation and Architectural Optimization for On-Chip Interconnection Networks
Antoine Eiche, Real time scheduling for heterogeneous and reconfigurable architectures using neural network structures
Ludovic Devaux, Flexible interconnect infrastructure for dynamically reconfigurable architecture
Erwan Grace, Memory-Oriented Reconfigurable Embedded Architecture
Julien Lallet, Reconfigurable Processors: Towards the Definition of a Generic Platform Model
Shafqat Khan, Flexible Operators with Sub-Word Parallelism for Multimedia Applications
Kevin Martin, Extended Instruction-Set Generation for Processors Embedded in an FPGA
Quoc-Tuong Ngo, Optimization of precoding strategies for multi-user MIMO-OFDM systems
Michel Theriault, Transmit Beam-forming for Distributed Wireless Access with Centralized Signal Processing
Hai-Nam Nguyen, Dynamic Precision Scaling for Mobile Communications
Tuan Duc Nguyen, Cooperative and Relay Techniques for Energy-Efficient Wireless Sensor Networks
Cécile Palud, Reconfigurable Architecture for High-Performance Video Transcoding
Karthick Parashar, System-level Approach for Implementation and Optimization of Signal Processing Applications into Fixed-Point Architectures
Adeel Pasha, A Reconfigurable SoC with Very Low Energy Consumption Adapted for the Domain of Wireless Communicating Objects
Manh Pham, Embedded Computing Architecture with Dynamic Hardware Reconfiguration for Intelligent Automotive Systems
Erwan Raffin, Run-time Reconfigurable Systems: Compilation and Synthesis Aspects
Renaud Santoro, High-Rate True Random Number Generators with Embedded Self-Test
O. Berder gave a talk on “Effects of desynchronization of radio transmitters on the performance of cooperative MIMO systems” at the workshop “Cooperative Approaches in Wireless Sensor Networks” of the GDR ISIS, Telecom Paris, in June 2008.
T. Hilaire gave a lecture at the University of Hiroshima, Electronic Control Lab., on optimal fixed-point implementation of signal processing algorithms in June 2008.
F. Charot presented the Roma and SoCLib projects at “la semaine de l'innovation en Bretagne” in June 2008.
F. Charot presented the SoCLib project at the ICT 2008 exhibition in November 2008.
The Roma project was presented at the “Colloque STIC”, November 5-7, 2007, Parc de La Villette, Paris, France.
There is a strong teaching activity in the Cairn team, since most of the permanent members are Professors or Associate Professors.
P. Quinton is the deputy director of Ecole Normale Supérieure de Cachan, responsible for the Brittany branch of this school.
P. Scalart is the Head of the Electronics Engineering department of Enssat.
O. Sentieys is responsible for the “Embedded Systems” branch of the SISEA Master of Research (M2R).
C. Wolinski is responsible for the Computer Organization and Architecture branch of Ifsic and DIIC.
P. Quinton, L. Perraudeau, S. Pillement, D. Chillet and C. Wolinski serve on the hiring committee of the University of Rennes 1.
S. Pillement serves on the hiring committee of the University of Cergy.
O. Sentieys serves on the hiring committees of INSA Rennes and UBS Lorient.
O. Berder's main teaching activities at Enssat are signal processing, microprocessor architecture, and wireless communications. He also teaches signal processing at IUT Lannion and mobile communications at ENI Gabès, Tunisia.
D. Chillet teaches a course on advanced processor architectures in the M2R at Enssat and on low-power digital CMOS circuits at Telecom Bretagne.
E. Casseau's main teaching activities are signal processing and hardware description languages. He also teaches SoC design methodologies at the Telecom Bretagne engineering school and hardware design languages in the Master Microelectronics System Design and Technology at ENSICAEN.
S. Derrien teaches at Ifsic (Licence, Master, DIIC).
S. Pillement teaches at IUT Lannion. He also teaches a course on Network-on-Chip in the Master SIC at ENI Sousse, Tunisia.
R. Rocher teaches at IUT Lannion. P. Quinton teaches at ENS Cachan, Ifsic and M2R.
P. Scalart teaches courses on signal processing at Enssat.
O. Sentieys teaches at Enssat and in the M2R, where he gives courses on methodologies for integrated system design and on signal processing. He also teaches Digital IC: from Synthesis to Implementation in the Master Microelectronics System Design and Technology at ENSICAEN.
C. Wolinski is responsible for the following courses: CSE “Design of Embedded Systems” (DIIC), SIA “Signal, Image, Architectures” (DIIC), XAA “Advanced Architectures” (ENSC).
Enssat stands for “Ecole Nationale Supérieure des Sciences Appliquées et de Technologie” and is an “Ecole d'Ingénieurs” of the University of Rennes 1, located in Lannion.
Ifsic stands for ”Institut de Formation Supérieure en Informatique et Communication”.
DIIC stands for “Diplôme d'Ingénieur en Informatique et Communication”, an engineering degree of the University of Rennes 1, delivered in Rennes.
M2R stands for Master of Research, second year.