NETQUEST is an Inria international project-team, located in the Sino-French IT Lab, LIAMA in Beijing, China, and attached to the Rocquencourt Unit.
Networks of independent entities, cooperating to handle global tasks, constitute a fascinating class of systems. They are widely found in nature, with cells exchanging information with their neighbors, or neurons communicating through synaptic connections, as well as in social organizations. Such organizations are making their way into microelectronic systems and might become ubiquitous in the near future. They are made possible by the development of ever smaller and cheaper electronic devices with increased memory capacity and computational power, together with standardization efforts for both wireless communication and data exchange formats.
One of the main barriers today to the development of networks of cooperating devices is the lack of programming abstraction. Smart devices are usually dedicated systems based on ad hoc models, which are not generic enough to support the needs of future applications (flexibility, scalability, ease of maintenance, etc.). The deployment of a sensor network, for instance, is a tedious task that requires expertise in the underlying OS and hardware. Applications implemented today on top of TinyOS, for instance, have to deal with low-level issues such as memory management.
The objectives of the Netquest project are to develop solutions for programming networks in a declarative manner, by specifying the intended functionalities without having to deal with system aspects, much as in database systems. The separation of a logical level, accessible to users and applications, from the physical layers constitutes the basic principle of Database Management Systems. It is at the origin of their technological and commercial success. This fundamental contribution of Codd to the design of the relational model of data has led to the development of universal high-level query languages that all vendors recognize, as well as to query processing techniques that optimize declarative queries into (close to) optimal execution plans.
The abstraction we propose relies on a very simple idea: model the network as a database. The network is thus essentially hidden and perceived by each node as a database, with which it interacts through declarative query languages. The communication between devices thus consists of queries and data. Each node should be equipped with a distributed query engine, evaluating all queries whether posed by the node itself or received from other nodes. This approach blurs the traditional distinction between communication and application layers. Both are handled in a uniform fashion using queries evaluated by the query engine.
We consider two levels of abstraction: (i) the global level, where the network can be programmed as a whole, and (ii) the local level, where programs specify the nodes' behavior. At the global level, a node can, for instance, fire a query asking for a route satisfying some properties, without any algorithmic specification. The network relies on the distributed query engine on each node to evaluate that query, which results in a distributed algorithm. At the local level of abstraction, on the other hand, the behavior of the nodes can be specified. Routing protocols, for instance, are programmed at the local level by specifying the exchange of messages necessary to achieve a task.
"Modeling the network as a database" provides the programming abstraction that multihop networks lack, but it also raises difficult theoretical and practical problems, to which our efforts are devoted. We concentrate on the following three research directions.
Establish theoretical foundations for network query languages: design, distributed complexity, and expressive power.
Implement distributed query engines to execute queries of the network query languages, with distributed optimization.
Validate the declarative approach through real network problems, such as networking protocols, and distributed network applications.
The year 2008 was devoted to the implementation of the Netquest system, which is being tested with real networking protocols.
The scientific foundations rely firstly on the field of databases, from its theoretical foundations to system issues such as query processing. Distributed algorithms and models of distributed computations are also of fundamental importance. Finally, we also rely on networking protocols such as those used in ad hoc networks as well as in sensor networks.
Logical formalisms, such as first-order logic (FO), fixed-point logic (FP), and monadic second-order logic (MSO), allow problems to be expressed in a declarative way. Instead of describing how to compute a solution step by step, only the desired results of the computation are specified by logical expressions. The use of declarative query languages based on logical formalisms for data management was largely exploited by Codd in the 1970s in the relational model, in which the logical and the physical levels are separated. Since then, the investigation of the theoretical foundations of query languages has been a strong focus of the database theory community.
Two important measures characterize query languages: their expressive power and their complexity. The expressive power of a query language is characterized by the set of problems that can be expressed in it. Its computational complexity is characterized by how hard it is to evaluate the queries written in it.
The expressive power and the complexity of classical logics have been intensively studied. The expressive power of FO, for instance, has been shown to be rather limited: it can only express local properties, and it lacks the power of counting and recursion. The complexity of FO has also been shown to be quite low: FO formulas can be evaluated in space logarithmic in the size of their input. The parallel complexity of FO has also been considered: FO formulas can be evaluated in constant time, independently of the size of their inputs, on Boolean circuits with gates of arbitrary fan-in, the well-known AC^0 class.
Although classical query languages have been intensively studied in the context of centralized and parallel computation, their distributed computation has attracted little attention. We first investigate the expressive power of classical query languages for describing distributed data structures, such as spanning trees, routing tables, and dominating sets, and study their distributed complexity. We then propose to introduce new primitives into classical query languages to design proper logical formalisms for multihop networking (global level abstraction), while achieving a good balance between expressive power and distributed computational complexity.
For the local level of abstraction, declarative rule languages (variants of Datalog) have been used to describe communication protocols. They revived the recursive languages developed in the 1980s for deductive databases, which are well suited to defining routes in networks, and showed that query languages allow protocols to be expressed one or two orders of magnitude more simply than in classical imperative programming languages. We continue this trend to demonstrate the potential of declarative rule languages for the local abstraction level, clarifying their semantics in asynchronous distributed computation, and further investigating their expressive power and the complexity of their distributed evaluation.
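As an illustration of why recursive rules suit route definitions, the following minimal Python sketch evaluates two Datalog-style rules to a fixpoint. The rules, the relation names `link` and `route`, and the function `routing_table` are hypothetical and chosen for this example only; they are not taken from Netlog or from any of the languages cited above.

```python
# Hypothetical Datalog-style rules (illustrative only, not Netlog syntax):
#   route(S, D, D) :- link(S, D).
#   route(S, D, N) :- link(S, N), route(N, D, _), S != D.
# A route tuple (S, D, N) reads: "from S, reach D via next hop N".

def routing_table(links):
    """Naively apply both rules until no new route tuple is derived."""
    routes = {(s, d, d) for (s, d) in links}            # first rule
    while True:
        derived = {
            (s, d, n)
            for (s, n) in links
            for (n2, d, _) in routes
            if n == n2 and s != d                       # second rule
        }
        if derived <= routes:                           # fixpoint reached
            return routes
        routes |= derived


links = {("a", "b"), ("b", "c")}
routes = routing_table(links)
```

The whole routing logic is carried by the two rules; the loop is a generic fixpoint computation that does not depend on the protocol being expressed.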
We consider a distributed computation model based on the message passing model. It relies on a communication graph, in which each node only knows its local neighboring topology and can only communicate with its one-hop neighbors. Distributed algorithms for graph problems, such as spanning trees, coloring, and dominating sets, have been widely investigated and still constitute an active area of research.
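The message passing model can be illustrated with a small simulation. The sketch below builds a breadth-first spanning tree in synchronous rounds, where every node only ever addresses its one-hop neighbors. It assumes synchronous rounds and reliable links, which are simplifications made for this example; the function name `bfs_spanning_tree` and the sample graph are hypothetical.

```python
def bfs_spanning_tree(neighbors, root):
    """Simulate synchronous rounds; return (parent map, number of rounds)."""
    parent = {root: None}
    frontier = [root]          # nodes that joined the tree in the last round
    rounds = 0
    while frontier:
        # Each frontier node sends a "join" message to its one-hop neighbors.
        inbox = {}
        for sender in frontier:
            for receiver in neighbors[sender]:
                inbox.setdefault(receiver, []).append(sender)
        # A node without a parent adopts the smallest sender it heard from.
        frontier = []
        for receiver, senders in inbox.items():
            if receiver not in parent:
                parent[receiver] = min(senders)
                frontier.append(receiver)
        if frontier:
            rounds += 1
    return parent, rounds


graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
parent, rounds = bfs_spanning_tree(graph, "a")
```

The number of rounds in which some node joins the tree equals the eccentricity of the root, so this construction is global rather than local in the sense discussed below.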
In multihop networks, the networking information is fully distributed over the entire network. Moreover, multihop networks are usually dynamic and nodes are usually constrained, with limited CPU, memory, energy, etc. The decentralized evaluation of logical formalisms over multihop networks thus requires high efficiency, scalability, and fault tolerance.
The locality of distributed algorithms is of special interest in this context. An algorithm is local if it solves a global problem in a distributed time that is either constant, that is, independent of the size of the network, or at least smaller than the diameter of the network. We consider generalizations of these definitions to communications that are bounded but not necessarily local, resulting in the class of frugal computations.
The centralized evaluation of classical logical formalisms for graphs has been intensively investigated. It is well known that FO can be evaluated in linear time over bounded-degree graphs, and that MSO can be evaluated in linear time over graphs of bounded tree-width. These results on centralized evaluation lead to efficient distributed evaluation algorithms for the query languages used at the global level of abstraction.
The distributed evaluation of declarative rule languages, used at the local level of abstraction, has been considered by several researchers. Abiteboul et al. introduced distributed Datalog (dDatalog), by adding locations to the relations and rules (but not to the tuples), and devised a distributed query-by-subquery technique (dQSQ) to evaluate dDatalog. On the other hand, Loo et al. adapted the bottom-up evaluation techniques of Datalog, e.g. semi-naive evaluation and magic set rewriting, to NDlog, another version of distributed Datalog with locations for tuples. Our purpose is to adapt both the bottom-up and top-down evaluation techniques of Datalog queries to the distributed setting, and to combine them in a fully decentralized way.
Networking usually implies the construction and maintenance of distributed data structures, such as spanning trees, shortest paths, and dominating sets. In multihop networking, some additional constraints (e.g. mobility, energy efficiency, etc.) make the construction and maintenance of these distributed data structures challenging.
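For reference, semi-naive evaluation in its classical centralized form can be sketched as follows for a reachability query. This is a minimal textbook-style illustration, not the dQSQ or NDlog algorithms mentioned above; the function name `semi_naive_reach` and the sample data are assumptions made for the example.

```python
def semi_naive_reach(links):
    """Transitive closure of `links`, joining only with the newest tuples.

    Rules: reach(X, Y) :- link(X, Y).
           reach(X, Z) :- link(X, Y), reach(Y, Z).
    """
    reach = set(links)
    delta = set(links)         # tuples derived in the previous round
    rounds = 0
    while delta:
        # Join links with delta only, instead of with all of reach.
        derived = {
            (x, z)
            for (x, y) in links
            for (y2, z) in delta
            if y == y2
        }
        delta = derived - reach     # keep only genuinely new tuples
        reach |= delta
        rounds += 1
    return reach, rounds


reach, rounds = semi_naive_reach({(1, 2), (2, 3), (3, 4)})
```

Restricting the join to `delta` avoids rederiving in every round the tuples already obtained in earlier rounds, which is the saving that semi-naive evaluation brings over the naive fixpoint.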
Typical multihop networking protocols include self-configuration and self-organization protocols, such as ASCNET and FISCO, and routing protocols, such as DSDV (Destination-Sequenced Distance Vector Routing), OLSR (Optimized Link-State Routing), AODV (Ad hoc On-Demand Distance Vector Routing), and VRR (Virtual Ring Routing). The declarative specification of these classical protocols will be compared with their implementation in imperative languages. With similar efficiency, the declarative expression of the protocols offers considerable advantages in terms of the ease of developing, maintaining, and adapting their code.
Applications of ubiquitous networks are emerging in many areas such as intelligent transportation, games, social networking, sensor networks, ambient intelligence, etc. We have worked extensively on spatial information systems in the past. Their interaction with networks is of great interest to support queries relating to the ambient space and positioning issues. Sensor networks distributed in spatial environments of different scales (e.g. building, landscape) constitute a promising application to validate the Netquest approach.
The Netquest approach provides a (global-level or local-level) programming abstraction that allows network protocol designers to program their protocols in a declarative way. The Netquest system is responsible for transforming these protocols into low-level code and executing them. The computation of these protocols in Netquest can be simulated and visualized using WSNET, which provides simulations of network environments. Compared to the implementation of protocols in imperative programming languages, the declarative specification can be two orders of magnitude shorter. More generally, Netquest offers an environment which simplifies the design of protocols by relying on the DBMS for fundamental aspects such as transactions.
In the absence of sufficient or sufficiently accurate knowledge, adaptive methods have been developed for query execution, which alternate query processing phases with query execution phases. This trend started long ago with System R, for cases where the statistics would be misleading or incomplete. It is now a topic of increasing interest with the development of applications running over data distributed over networks. Adaptation is the key challenge for ubiquitous networks. More generally, the capacity to self-assemble, grow, repair, organize, and evolve over long periods of time while maintaining essential functionalities is of fundamental importance for networks of cooperating objects. The declarative combination of networking and application layers, jointly processed by distributed query engines, offers a huge potential for pervasive adaptation, because the query engine can adapt the queries to the network (adaptive evaluation) and the network to the queries (quality of service and content-based routing). We will try to give evidence of this claim on real applications.
Heterogeneous networks, in which the node architecture, operating system, data format, etc. might vary significantly, pose additional challenges to network management and applications. Netquest offers a high-level abstraction which allows an application or a protocol to be specified independently of the underlying architecture. The Netquest system can run on any type of device, assuming nodes are not too constrained and are equipped with a (local) DBMS. The Netquest approach will be used to test network management and data-centric applications in heterogeneous networks.
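To give an idea of what an imperative implementation has to spell out, here is a minimal Python sketch of the table-update step of a DSDV-style protocol: a route with a newer destination sequence number is preferred, and among routes with equal sequence numbers the one with the smaller metric wins. It is a simplified illustration, not the Netquest implementation; the table layout, the function name `merge_advertisement`, and the sample values are assumptions.

```python
def merge_advertisement(table, neighbor, advertised):
    """Update `table` in place from the routes advertised by `neighbor`.

    Both maps have the form destination -> (next_hop, metric, seq_no).
    Returns True if at least one entry changed.
    """
    changed = False
    for dest, (_, metric, seq) in advertised.items():
        candidate = (neighbor, metric + 1, seq)   # one extra hop via neighbor
        current = table.get(dest)
        if (current is None
                or seq > current[2]                              # fresher route
                or (seq == current[2] and metric + 1 < current[1])):  # shorter
            table[dest] = candidate
            changed = True
    return changed


table = {"b": ("b", 1, 10)}
first = merge_advertisement(table, "b", {"b": ("b", 0, 10), "c": ("c", 1, 4)})
second = merge_advertisement(table, "d", {"c": ("c", 3, 4)})    # worse metric
third = merge_advertisement(table, "d", {"c": ("c", 5, 6)})     # newer seq_no
```

A declarative specification would state the same preference as a rule over route tuples and leave the iteration and bookkeeping to the query engine.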
We have implemented a system, also called Netquest, to be embedded on each node of a network, which evaluates in a distributed manner queries expressed in the Netlog language. The architecture of the system differs drastically from classical embedded systems such as TinyDB. It relies on an embedded DBMS (MySQL), which handles all the data, whether related to applications or to the network. The database also stores the protocols expressed in a declarative manner. The DBMS plays a fundamental role in the system, which relies in particular on its transaction management. The main component is the Distributed Query Engine, which evaluates the queries locally and generates subqueries for other nodes. Currently, Netlog in push mode has been implemented for network queries. Other versions of the distributed query engine will be developed in the near future. Finally, the last component is a hybrid router, which handles the routing functions, but in a rather original way. In particular, the hybrid router can handle implicit destinations, which are to be evaluated by the query engine. The use of an embedded Linux together with a DBMS changes the way distributed systems can be programmed, and simplifies the development and maintenance of the nodes through its use of standard software components.
To facilitate protocol design in the Netlog language, we have designed the Netquest protocol format and developed a graphical interface that allows protocol designers to write the declarative programs for protocols conveniently. The declarative protocol is then transformed into SQL queries and stored in the embedded DBMS.
The Netquest system has been installed as a library on the network simulator WSNET. WSNET will be used to simulate wireless networks and to test the Netquest system as well as applications running on Netquest. To visualize the simulation results of Netquest on WSNET, we have developed a visualization tool that illustrates the message passing between network nodes and the computation of the distributed query engine of the Netquest system in each node.
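The idea of keeping a protocol as SQL text inside the node's own database can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for the embedded MySQL server, and the tables `link`, `route`, and `protocol`, the SQL statements, and the function `run_stored_protocol` are hypothetical and do not reflect the actual Netquest protocol format or its translation.

```python
import sqlite3


def run_stored_protocol(links):
    """Store two rules as SQL text, then execute them until a fixpoint."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE link(src TEXT, dst TEXT, PRIMARY KEY (src, dst));
        CREATE TABLE route(src TEXT, dst TEXT, PRIMARY KEY (src, dst));
        CREATE TABLE protocol(step INTEGER PRIMARY KEY, sql TEXT);
    """)
    db.executemany("INSERT INTO link VALUES (?, ?)", links)

    # Hypothetical SQL form of two recursive route rules.
    rules = [
        "INSERT OR IGNORE INTO route SELECT src, dst FROM link",
        "INSERT OR IGNORE INTO route "
        "SELECT l.src, r.dst FROM link l JOIN route r ON l.dst = r.src",
    ]
    db.executemany("INSERT INTO protocol(sql) VALUES (?)",
                   [(rule,) for rule in rules])

    def route_count():
        return db.execute("SELECT COUNT(*) FROM route").fetchone()[0]

    while True:
        before = route_count()
        stored = db.execute(
            "SELECT sql FROM protocol ORDER BY step").fetchall()
        for (statement,) in stored:
            db.execute(statement)          # run the rule read from the DB
        if route_count() == before:        # no new tuple: fixpoint reached
            break
    return sorted(db.execute("SELECT src, dst FROM route").fetchall())


routes = run_stored_protocol([("a", "b"), ("b", "c")])
```

Because the rules are ordinary rows of the `protocol` table, changing the protocol amounts to updating data rather than redeploying code on the node.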
We have obtained results on the distributed complexity of first-order (FO) and monadic second-order (MSO) logic on graphs of bounded degree. These results relate the locality of the logic, in the sense of Gaifman, to the complexity of the distributed computation in terms of the number of messages handled on each node, which can be shown to be constant, a property that is weaker than the locality of distributed computations but resembles it. We have also obtained results on the distributed complexity of fixpoint logic (FP) on general graphs. We plan to pursue the investigation of the distributed computation of classical logics for graphs, and to propose a new logic which includes a non-deterministic construct in order to express classical distributed computing problems.
For the local level of abstraction, expressing the nodes' behavior, we have designed a new language, Netlog, which extends SQL with recursion, non-determinism, and communication primitives, based on rules à la Datalog. This language has been shown to be suitable for expressing a large collection of classical networking protocols. It thus allows a declarative specification of both networking protocols and network applications, much like the language proposed by the P2 group in Berkeley for similar purposes. This language admits two procedural semantics: push mode, corresponding to proactive protocols, and pull mode, corresponding to reactive protocols. Netlog is currently mainly used to express protocols which are stored in a declarative manner in the database of the nodes. We have adapted the semi-naive bottom-up algorithm for Datalog to evaluate Netlog and implemented it in the Netquest system.
We designed the Netquest protocol format (NPF) to express networking protocols in Netlog. We then used NPF to express routing protocols for ad hoc networks, such as DSDV (Destination-Sequenced Distance Vector), OLSR (Optimized Link-State Routing), and VRR (Virtual Ring Routing), and tested and simulated them in the Netquest system, integrated on WSNET.
We consider motion planning on directed graphs, a problem related to data movement in a network with constrained node capacity and unidirectional links, which is an abstraction of the structure of wireless sensor networks. We proposed two algorithms for deciding the feasibility of motion planning, on acyclic and on strongly connected directed graphs respectively, thus extending results by Papadimitriou et al. on undirected graphs.
Close links exist with the France Telecom R&D Beijing Lab, which started a new action on sensor networks. Our objective is to test our query engine on a testbed developed by FT. A CRC contract between INRIA and FT, associating the LIAMA with the Beijing Lab of France Telecom R&D, has been signed for the period July 2007 - June 2009.
Cooperation with Stéphane Ubeda and Fabrice Valois from the INRIA Ares project of CITI, INSA Lyon, on declarative networking in the framework of a Sino-French PRA project 2007-2008. Our aim is to investigate formally various networking protocols, such as flooding, self-configuration, self-organisation, routing, medium access, in the context of multihop networks, by using their declarative modeling.
Cooperation with Christine Collet and Christophe Bobineau from the Laboratory LSR/IMAG in Grenoble on the development of query optimization techniques in the context of networks.
Close links exist with the Institute of Software of the Chinese Academy of Sciences, ISCAS. Professor Huimin LIN, academician, is the supervisor of the two students Fang WANG and Wenwu QU.
Stéphane Grumbach is a PC member of APWeb08, WAIM'08, SITIS'08, DEXA'09, DASFAA'09, and DS2ME@ICDE'09. He is co-PC-Chair of the 9th International Conference on Mobile Data Management (MDM'08), Beijing, China, April 27-30, 2008.
Michel Bauderon is a PC member of ICGT'08, and a member of the steering committee of ICGT.