Section: New Results
Algorithm Architecture Interaction
Flexible hardware accelerators for biocomputing applications
Participants : Steven Derrien, Naeem Abbas, Patrice Quinton.
It is widely acknowledged that FPGA-based hardware acceleration of compute-intensive bioinformatics applications can be a viable alternative to cluster-based (or grid-based) approaches, as FPGAs offer a very attractive MIPS/watt figure of merit. One issue with this technology is that it remains somewhat difficult to use and to maintain (one is designing a circuit rather than programming a machine).
Even though C-to-hardware compilation tools exist (Catapult-C, Impulse-C, etc.), a common belief is that they do not generally offer good enough performance to justify the use of such reconfigurable technology. As a matter of fact, successful hardware implementations of bio-computing algorithms are manually designed at the RTL level and usually target a specific system, with little if any performance portability among reconfigurable platforms.
This research work, which is part of the ANR BioWic project, aims at providing a framework for the semi-automatic generation of high-performance hardware accelerators. In particular, we expect to widen the scope of common design constraints by focusing on system-level criteria that involve both the host machine and the accelerator (workload balancing, communication and data reuse optimizations, hardware utilization rate, etc.). This work builds upon the Cairn research group's expertise in automatic parallelization for application-specific hardware accelerators and targets mainstream bioinformatics applications (HMMER, ClustalW and BLAST).
Our work in 2011 extended the experimental results obtained in 2010 and led to the submission of a paper to IEEE Trans. in Parallel and Distributed Computing (the article is currently in revision). We also investigated another case study based on a more classical sequence comparison algorithm, for which we explored different styles of architectural partitioning. This work led to a paper published in the proceedings of the ARC International Symposium [43].
Range Estimation and Computation Accuracy Optimization
Participants : Daniel Menard, Karthick Parashar, Olivier Sentieys, Romuald Rocher, Pascal Scalart, Aymen Chakhari, Jean-Charles Naud, Emmanuel Casseau, Andrei Banciu.
Range Estimation
Efficient range estimation methods are required to optimize the integer part word-length. Our previous work based on the Karhunen-Loève Expansion (KLE) has been extended in [38]. The impulse response between the input and a variable is used to propagate the KLE parameters of the inputs. Range estimation has proven to be a difficult problem for non-linear operations, especially when the input data are correlated. A stochastic approach can significantly improve the results compared to classical methods such as interval and affine arithmetic. The aim is to obtain tight intervals by adapting the bounds to a desired overflow probability. An approach for the analysis of range uncertainties based on the Polynomial Chaos Expansion (PCE) has been developed. The PCE representation is obtained for every input variable and an analytical description of the variability of the output is determined. Furthermore, the correlation of the inputs is captured using the Nataf transform. The range is then computed through a probabilistic analysis of the output probability density function (PDF).
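The gap between a worst-case bound and a bound tied to an overflow probability can be illustrated with a minimal sketch. It does not implement the KLE or PCE methods described above: it swaps in plain Monte Carlo sampling, and the filter coefficients, the uniform independent input model and all function names are assumptions made for illustration only.

```python
import math
import random

def interval_bound(h, x_max):
    """Worst-case output magnitude of an FIR filter: x_max * sum(|h_k|)."""
    return x_max * sum(abs(c) for c in h)

def probabilistic_bound(h, x_max, overflow_prob, n_samples=20000, seed=0):
    """Empirical magnitude exceeded by |y| with probability ~overflow_prob.

    Assumption: inputs are i.i.d. uniform in [-x_max, x_max]. The bound is
    an empirical quantile from sampling, not a KLE/PCE result.
    """
    rng = random.Random(seed)
    mags = sorted(
        abs(sum(c * rng.uniform(-x_max, x_max) for c in h))
        for _ in range(n_samples)
    )
    idx = min(n_samples - 1, math.ceil((1.0 - overflow_prob) * n_samples) - 1)
    return mags[idx]

def integer_bits(bound):
    """Signed integer-part word-length m such that 2**(m - 1) > bound > 0."""
    return max(1, math.floor(math.log2(bound)) + 2)

if __name__ == "__main__":
    h = [1.0] * 16  # hypothetical 16-tap accumulator
    worst = interval_bound(h, 1.0)
    tight = probabilistic_bound(h, 1.0, overflow_prob=1e-3)
    print("interval bound:", worst, "bits:", integer_bits(worst))
    print("probabilistic bound:", tight, "bits:", integer_bits(tight))
```

Accepting a small overflow probability generally yields a smaller bound than the interval result, which is what allows the integer part word-length to be reduced.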
Accuracy and performance evaluation
The automation of fixed-point conversion requires generic methods to study accuracy degradation. In [51], [73], a new approach using analytical noise power propagation that takes conditional structures into account is proposed. These structures are generated from programming language statements such as if-then-else or switch. The proposed model takes into account two key points in fixed-point design: first, the alternative processing of noise depending on the condition; second, decision errors generated by quantization noise affecting the condition. This method is integrated in the fixed-point conversion process and uses path probabilities of execution alternatives obtained from profiling. This work extends existing analytical approaches for fixed-point conversion. Experiments show that our analytical method provides a fairly accurate noise power estimate compared to the real accuracy degradation. An analytical approach is also studied to determine the accuracy of systems including unsmooth operators. An unsmooth operator represents a function that is not differentiable over its whole definition interval (for example the sign operator). The classical model is no longer valid since these operators introduce errors that do not respect the Widrow assumption (their values are often higher than the signal power). An approach based on the distribution of the signal and the noise is therefore proposed. It is applied to the sphere decoding algorithm. We also focus on recursive structures where an error influences future decisions; the Decision Feedback Equalizer is thus also considered.
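The role of path probabilities can be made concrete with a simplified sketch. It assumes that the output noise power of a conditional structure is the probability-weighted sum of the noise powers of its branches, plus an additive term for decision errors; the model of [51], [73] may differ, and the function names and parameters below are hypothetical.

```python
def quantization_noise_power(frac_bits):
    """Power of uniform rounding noise with step q = 2**-frac_bits: q**2 / 12."""
    q = 2.0 ** (-frac_bits)
    return q * q / 12.0

def conditional_noise_power(branches, decision_error_prob=0.0,
                            decision_error_power=0.0):
    """Noise power at the output of an if-then-else or switch structure.

    branches: list of (path_probability, branch_noise_power) pairs, the
    probabilities being obtained from profiling (assumed to sum to 1).
    decision_error_prob: probability that quantization noise flips the
    condition (assumed known); decision_error_power: mean squared
    difference between the outputs of the wrong and the right branch.
    """
    total_p = sum(p for p, _ in branches)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("path probabilities must sum to 1")
    weighted = sum(p * power for p, power in branches)
    return weighted + decision_error_prob * decision_error_power

if __name__ == "__main__":
    # Hypothetical example: 'then' branch computed on 8 fractional bits,
    # 'else' branch on 12, taken 30% and 70% of the time respectively.
    branches = [(0.3, quantization_noise_power(8)),
                (0.7, quantization_noise_power(12))]
    print(conditional_noise_power(branches, 1e-4, 0.25))
```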
Reconfigurable Video Coding
Participants : Emmanuel Casseau, Olivier Sentieys, Arnaud Carer, Cécile Beaumin, Hervé Yviquel.
In the field of multimedia coding, standardization recommendations are always evolving. To reduce design time, the Reconfigurable Video Coding (RVC) standard allows new codec algorithms to be defined from a modular library of components. The RVC dataflow-based specification formalism expressly targets multiprocessor platforms. However, software processors cannot cope with high-performance and low-power requirements. Hence the mapping of RVC specifications onto hardware accelerators is investigated in this work, as well as the scheduling of the functional units (FU) of the specification. Dataflow programs, such as RVC applications, express explicit parallelism within an application. Although multi-core processors are now available everywhere, few applications are able to truly exploit their multiprocessing capabilities. We describe in [69] a scheduling strategy for executing a dataflow program on multi-core architectures using distributed schedulers and lock-free communications. Our goal is to design an RVC-dedicated reconfigurable architecture with various resources. Our previous results led to the definition of a reconfigurable FIFO that optimizes the cost and performance of RVC dataflow specifications by taking advantage of their dynamic behavior. We are currently working with Mickael Raulet from IETR INSA Rennes and Dr. Jani Boutellier from the University of Oulu (Finland) on the execution of an RVC decoder on a network of Transport Triggered Architecture (TTA) processors (proposed by the Tampere University of Technology, TUT). Thanks to its modular structure, TTA is a well-suited CPU design style for developing Application-Specific Processors. A TTA processor network is connected by hardware channels, so it has many similarities with an RVC network. Hervé Yviquel is expected to spend four months at TUT in 2012 to provide a functional automated flow for designing TTA-based platforms and compiling RVC applications for them.
Multi-Antenna Systems
Participants : Olivier Berder, Pascal Scalart, Quoc-Tuong Ngo.
Assuming the transmitter can obtain some Channel State Information (CSI) from the receiver, antenna power allocation strategies can be implemented through the joint optimization of a linear precoder (at the transmitter) and a linear decoder (at the receiver) according to various criteria.
A new exact solution of the maximization of the minimum Euclidean distance between received symbols has been proposed for two 16-QAM modulated symbols. This precoder significantly increases this minimum distance compared to diagonal precoders, which leads to a significant BER performance improvement. The strategy selects the best precoding matrix among eight different expressions, depending on the value of the channel angle. By selecting only two of these expressions, the precoder was then generalized to any rectangular QAM modulation [26].
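The criterion itself, the minimum Euclidean distance between received symbol vectors, can be evaluated by brute force for any candidate precoder. The sketch below is an illustration under stated assumptions, not the closed-form solution of [26]: the constellation is left unnormalized, no transmit power constraint is enforced, and the matrices and function names are hypothetical.

```python
import itertools

def qam_points(m_side):
    """Square QAM constellation with m_side levels per axis (4 gives 16-QAM)."""
    levels = [2 * k - (m_side - 1) for k in range(m_side)]
    return [complex(i, q) for i in levels for q in levels]

def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def mat_vec(a, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(a[r][c] * v[c] for c in range(len(v))) for r in range(len(a))]

def min_distance(h, f, constellation):
    """d_min = min over s != s' of ||H F (s - s')|| for two data streams."""
    hf = mat_mul(h, f)
    vectors = list(itertools.product(constellation, repeat=2))
    best = float("inf")
    for s, t in itertools.combinations(vectors, 2):
        y = mat_vec(hf, [s[0] - t[0], s[1] - t[1]])
        best = min(best, sum(abs(c) ** 2 for c in y) ** 0.5)
    return best

if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]
    weak_second_stream = [[1, 0], [0, 0.5]]  # hypothetical channel gains
    points = qam_points(4)
    print(min_distance(identity, identity, points))
    print(min_distance(weak_second_stream, identity, points))
```

Comparing the value returned for several candidate matrices F under the same channel H is a simple way to check that a precoder enlarges the minimum distance relative to a diagonal one.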
Not only the minimum Euclidean distance but also the number of neighbors providing it plays an important role in reducing the error probability when Maximum Likelihood detection is considered at the receiver. Aiming at reducing this number of neighbors, a new precoder in which the rotation parameter has no influence is proposed for the transmission of two independent data streams. The expression of the new precoding strategy is less complex and the solution space is therefore smaller [53], [74]. In [52], we proposed the general neighbor-dmin precoder for three independent data streams; simulation results confirm a significant bit-error-rate improvement of the new precoder in comparison with traditional precoding strategies.
Cooperative Strategies for Low-Energy Wireless Networks
Participants : Olivier Berder, Le Quang Vinh Tran, Olivier Sentieys.
During the last decade, many works were devoted to improving the performance of relaying techniques in ad hoc networks. One promising approach consists in allowing the relay nodes to cooperate, thus using spatial diversity to increase the capacity of the system. In wireless distributed networks where multiple antennas cannot be installed in a single wireless node, cooperative relay and cooperative Multi-Input Multi-Output (MIMO) techniques can indeed be used to exploit spatial and temporal diversity gains in order to reduce energy consumption.
Considering a system with a two-antenna source, two single-antenna relays and a single-antenna destination, a MIMO simple cooperative relay model (MSCR) and a MIMO full cooperative relay model (MFCR) are proposed and compared with the MIMO normal cooperative relay model (MNCR), where the relays forward signals consecutively to the destination. The energy efficiency of these models is investigated using a realistic power consumption model whose parameters are extracted from the characteristics of the CC2420, a widely used and commercially available wireless sensor transceiver. For each transmission range, the optimal cooperative scheme in terms of energy efficiency is identified through simulation [65], [78].
An analytical investigation of these cooperative protocols was also performed. A lower bound for the average symbol error probability (ASEP) of a full DSTC cooperative relaying system in a Rayleigh fading environment is provided. In the case where the Signal to Noise Ratio (SNR) of the relay-relay link is much greater than that of the source-relay link, an upper bound on the ASEP of this system is also derived. The study of the distance between the relays shows that the performance does not degrade much as long as this distance is lower than half of the source-destination distance. Moreover, we also show that, when the synchronization error range is lower than 0.5, the impact of the transmission synchronization error of the relay-destination link on the performance is not significant [64].
The energy efficiency of cooperative MIMO and relay techniques is also very useful for the Infrastructure to Vehicle (I2V) and Infrastructure to Infrastructure (I2I) communications in Intelligent Transport Systems (ITS) networks where the energy consumption of wireless nodes embedded on road infrastructure is constrained. Applications of cooperation between nodes to ITS networks are proposed and the performance and the energy consumption of cooperative relay and cooperative MIMO are investigated in comparison with the traditional multi-hop technique. The comparison between these cooperative techniques helps us to choose the optimal cooperative strategy in terms of energy consumption for energy constrained road infrastructure networks in ITS applications [27] .
Opportunistic Routing
Participants : Olivier Berder, Olivier Sentieys, Ruifeng Zhang, Jean-Marie Gorce [Insa Lyon, INRIA Swing] .
The cooperative approaches described above introduce an overhead in terms of information exchange, increasing the complexity of the receivers. A simpler way of exploiting spatial diversity is referred to as opportunistic routing. In this scheme, a cluster of nodes still serves as relay candidates but only a single node in the cluster forwards the packet. This work proposes a thorough analysis of opportunistic routing efficiency under different realistic radio channel conditions. The study aims at finding the best trade-off between two objectives, energy and latency minimization, under a hard reliability constraint. We derive an optimal bound, namely the Pareto front of the related optimization problem, which offers good insight into the benefits of opportunistic routing compared with classical multi-hop routing schemes [31]. We then derived a closed-form expression of the lower bound of the energy-delay trade-off and of the energy efficiency for different channel models (additive white Gaussian noise, Rayleigh fast fading and Rayleigh block fading) in a linear network. These analytical results are also verified by simulation in 2-dimensional Poisson networks. The closed-form expression provides a framework to evaluate the energy-delay performance and to optimize the parameters of the physical, MAC and routing layers from a cross-layer design viewpoint during the planning phase of a network.
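The notion of a Pareto front for the energy-delay trade-off can be sketched as follows. The code only extracts the non-dominated points from a set of candidate (energy, delay) operating points; it does not reproduce the analytical bound of [31], and the numerical values in the example are hypothetical.

```python
def pareto_front(points):
    """Return the (energy, delay) points not dominated by any other point.

    A point dominates another if it is no worse in both objectives and
    strictly better in at least one; both objectives are minimized.
    """
    front = []
    best_delay = float("inf")
    # After sorting by energy (then delay), a point is non-dominated only
    # if its delay is strictly lower than every delay seen so far.
    for energy, delay in sorted(set(points)):
        if delay < best_delay:
            front.append((energy, delay))
            best_delay = delay
    return front

if __name__ == "__main__":
    # Hypothetical operating points: (energy per bit, end-to-end delay).
    candidates = [(3, 4), (1, 5), (4, 1), (2, 3), (2, 6)]
    print(pareto_front(candidates))
```

Each candidate could correspond to one setting of the physical, MAC and routing layer parameters; the front then shows which settings are worth considering during network planning.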
Adaptive techniques for WSN power optimization
Participants : Olivier Berder, Daniel Menard, Olivier Sentieys, Mahtab Alam, Trong-Nhan Le.
Wireless sensor networks (WSNs) have become highly relevant in civil as well as military applications such as environment sensing, real-time surveillance and habitat monitoring. It is difficult to design a node that is efficient for all of these different applications. The ideal sensor node would have to dynamically adapt its behavior to various parameters such as the data traffic, the channel conditions, the amount of harvested energy, its battery level, etc. For nodes able to scavenge energy from their environment, the design of an efficient power manager able to address both hardware and software processing seems very promising.
Energy modeling is an important issue for designing and dimensioning low-power WSNs. In order to help developers optimize the energy spent by WSN nodes, a pragmatic and precise hybrid energy model is proposed. This model considers the different scenarios that occur during communication and evaluates their energy consumption based on software profiling as well as on the power profiles of the hardware components. The proposed model is a combination of analytical derivations and real-time measurements. These experiments are particularly useful to understand medium access control (MAC) layer mechanisms, such as wake-up or data collisions for the preamble sampling category, and to evaluate the energy wasted by collisions [18], [35].
An adaptive wake-up interval scheme for preamble sampling MAC protocols under variable traffic in WSNs is then proposed. The wake-up interval is updated based on the traffic status register (whose content depends on the presence of messages for a particular node). The results show that the sensor node adapts its wake-up interval, which converges to the best trade-off value for fixed and variable traffic patterns. Two optimization parameters (the length of the traffic status register and the initial wake-up interval value) are also tuned to achieve fast convergence for different traffic rates and variations.
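The principle of updating the wake-up interval from a traffic status register can be sketched as below. The update rule is a hypothetical one chosen for illustration (multiplicative adjustment driven by the imbalance between empty and busy wake-ups); it is not the rule of the proposed scheme, and the class name, parameters and default values are assumptions.

```python
from collections import deque

class WakeUpController:
    """Adapts a node's wake-up interval from a traffic status register."""

    def __init__(self, initial_interval_ms, register_length=8, step=0.1,
                 min_interval_ms=10.0, max_interval_ms=5000.0):
        self.interval_ms = float(initial_interval_ms)
        # Shift register: 1 if a wake-up found data for this node, else 0.
        self.register = deque(maxlen=register_length)
        self.step = step
        self.min_interval_ms = min_interval_ms
        self.max_interval_ms = max_interval_ms

    def update(self, data_received):
        """Record the outcome of one wake-up and return the new interval."""
        self.register.append(1 if data_received else 0)
        ones = sum(self.register)
        zeros = len(self.register) - ones
        # Mostly empty wake-ups: sleep longer. Mostly busy: wake more often.
        imbalance = (zeros - ones) / len(self.register)
        self.interval_ms *= 1.0 + self.step * imbalance
        self.interval_ms = min(self.max_interval_ms,
                               max(self.min_interval_ms, self.interval_ms))
        return self.interval_ms

if __name__ == "__main__":
    controller = WakeUpController(initial_interval_ms=100.0)
    for received in [False, False, True, False, True, True, True, True]:
        print(round(controller.update(received), 2))
```

In this sketch the register length sets how much traffic history is taken into account and the initial interval sets the starting point of the adaptation, which mirrors the two tuning parameters mentioned above.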
A wireless body area sensor network (WBASN) demands ultra-low-power and energy-efficient protocols. The MAC layer plays a pivotal role in energy management in WBASNs, and idle listening is the dominant source of energy waste in most MAC protocols. A WBASN exhibits a wide range of traffic variations depending on the physiological data emanating from the monitored patient. In this context, we proposed a novel energy-efficient traffic-aware dynamic (TAD) MAC protocol for WBASNs [36]. A comparison with other protocols is presented for three widely used radio chips, i.e. CC2420, CC1000 and AMIS52100. The results show that TAD-MAC outperforms all the other protocols under fixed and variable traffic rates.