Section: Application Domains

Data Analytics for the Internet of Things

The Internet of Things (IoT) is rapidly transforming the physical world into a large-scale information system. A wave of smart "things" either seamlessly disappears into our environment (aka Pervasive Computing) or is embodied in humans (aka Wearable Computing), continuously producing valuable information about almost every living context and process. Making sense of the data streams that "things" produce and share is crucial for disruptive IoT applications. From smart devices and homes to smart roads and cities, IoT data analytics is expected to enable resource-conscious automation of our everyday life in terms of operational efficiency, security, and safety, as well as a lower energy footprint.

Multi-dimensional Usage Patterns. We initially investigated how data analytics over Machine-to-Machine (M2M) data (connectivity, performance, usage) produced by connected devices in the residential Intranet of Things could support novel home automation services that enrich the living experience in smart homes. We investigated new data mining techniques that go beyond the binary association rule mining for traditional market basket analysis considered in previous work. We designed a multidimensional pattern mining framework that collects raw data from operational home gateways, discretizes and annotates these data, produces traffic usage logs that are fed into a multidimensional association rule miner, and finally extracts the habits of home residents. Using our analysis engine, we extracted complex device co-usage patterns of 201 residential broadband users of an ISP subscribed to an n-play service. Such fine-grained device usage patterns provide valuable insights for emerging use cases, such as the adaptive usage of home devices (aka horizontal integration of things). These use cases fall within the wider area of human-cognizant Machine-to-Machine communication, which aims to predict user needs and complete tasks without users initiating the action or interfering with the service. While the concept is not new, according to Gartner, cognizant computing is the natural evolution of a world driven not by devices but by collections of applications and services that span multiple devices, in which human intervention is reduced to a minimum by analyzing past human habits. To realize this vision, we are interested in co-usage patterns featuring spatio-temporal information about the context in which devices have actually been used in homes. For example, a network extender that is currently turned off could be turned on during a certain period of the day (e.g., the evening) in which it has been observed to be heavily used along with other devices (e.g., a laptop or a tablet).
Alternatively, the frequent co-usage of particular devices in one home (e.g., an iPhone with a media player) could be used by a things recommender to advertise the same set of devices to another home (e.g., another iPhone user might be interested in a media player).
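The pipeline described above (discretize, annotate, mine multidimensional rules) can be illustrated with a minimal sketch. It assumes a hypothetical gateway log of (hour, active devices) records, hypothetical day-period boundaries, and an exhaustive rule search in place of an optimized miner; none of these names or thresholds come from the actual framework. Each record is discretized into a period item and annotated with device items, so that a single rule can mix the temporal and device dimensions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical day-period boundaries (start hour inclusive, end hour exclusive)
# used to discretize gateway timestamps; the real framework's bins may differ.
PERIODS = [(0, 6, "night"), (6, 12, "morning"), (12, 18, "afternoon"), (18, 24, "evening")]


def discretize_hour(hour):
    """Map an hour of day (0-23) to a coarse day-period label."""
    for start, end, label in PERIODS:
        if start <= hour < end:
            return label
    raise ValueError(f"hour out of range: {hour}")


def annotate(records):
    """Turn raw (hour, active_devices) records into multidimensional transactions.

    Each transaction mixes a temporal item (period=...) with device items
    (device=...), so one rule can relate usage context to device co-usage.
    """
    return [
        frozenset({f"period={discretize_hour(hour)}"} | {f"device={d}" for d in devices})
        for hour, devices in records
    ]


def mine_rules(transactions, min_support=0.3, min_confidence=0.8, max_len=3):
    """Exhaustively mine rules X -> y where y is a device item.

    Brute-force stand-in for a real multidimensional association rule miner:
    it counts every itemset of up to max_len items, keeps the frequent ones,
    and emits each rule whose confidence reaches min_confidence.
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(t), k):
                counts[frozenset(itemset)] += 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}

    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            if not consequent.startswith("device="):
                continue  # only predict device usage, not the time period
            antecedent = itemset - {consequent}
            # Every subset of a frequent itemset is frequent, so this lookup is safe.
            confidence = support / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    # Highest confidence first, then highest support; ties broken deterministically.
    return sorted(rules, key=lambda r: (-r[3], -r[2], sorted(r[0]), r[1]))


# Hypothetical toy log: (hour of day, devices observed active in that slot).
records = [
    (20, {"laptop", "extender"}),
    (21, {"laptop", "extender", "tablet"}),
    (19, {"tablet", "extender"}),
    (8, {"phone"}),
    (13, {"laptop"}),
]

for antecedent, consequent, support, confidence in mine_rules(annotate(records)):
    print(sorted(antecedent), "->", consequent, f"sup={support:.2f} conf={confidence:.2f}")
```

On this toy log the highest-ranked rule is {period=evening} → device=extender, and the rule {device=laptop, period=evening} → device=extender mirrors the network extender example. A realistic deployment would replace the exhaustive enumeration with an algorithm such as Apriori or FP-Growth to scale beyond a handful of items.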

Time Series Motifs. Furthermore, we are interested in extracting previously unknown recurring patterns (aka motifs) directly from the traffic time series reported by residential gateways. Such motifs could help ISPs reduce the cost of remotely serving and diagnosing home networks, or even assist in defining home-specific bandwidth sharing and prioritization policies. More precisely, traffic motifs enriched with detailed home device information are a valuable input for root cause diagnosis and can be contrasted with the trouble descriptions reported by users to the ISP. Moreover, most ISPs broadcast firmware and software updates to all gateways at night (some operators even on a daily basis). This may cause service outages, since some gateways exhibit active network usage during the night. A fine-grained temporal characterization of residential bandwidth consumption would enable ISPs to tailor residential gateway (RGW) firmware update policies to the least disruptive time window of each home, thus improving the overall Quality of Experience (QoE) of residential users. Finally, home network resources (bandwidth) are shared not only among residents using an increasing number of online applications (e.g., social networking, gaming, uploading/downloading) and real-time services (TV on demand, teleconferencing), but also with guests, neighbors, or even occasional passers-by. Existing methods for bandwidth sharing and traffic prioritization are static and coarse: ISPs usually allocate a fixed percentage of home bandwidth to non-residential users, while traffic prioritization in commodity gateways is at best based on the network port on which traffic is sent or received. We believe that behavioural patterns extracted from gateway traffic time series can support dynamic policies for sharing home bandwidth that take into account the online habits of residential users.
For example, in-home traffic congestion can be avoided by ranking the traffic patterns of different devices, observed especially during afternoons and weekends. These patterns reveal the bandwidth consumption behavior of different groups of residential users (adults and children use different devices during the same time slots), while comparing traffic dominance helps us distinguish between residents and guests (pattern-specific vs. globally traffic-dominant devices).
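Motif extraction from gateway traffic can be sketched with the classic brute-force definition: the top motif is the pair of non-overlapping subsequences of length m with the smallest z-normalized Euclidean distance. This is a minimal illustration under that assumption, not necessarily the technique used in this work; the function names, the default exclusion zone, and the toy hourly series are hypothetical. Z-normalization makes the search match the shape of a traffic burst rather than its absolute volume.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence so that motifs are compared by shape, not volume."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n  # flat window: no shape to compare
    return [(x - mean) / std for x in seq]


def find_motif(series, m, exclusion=None):
    """Return (i, j, distance) for the closest pair of length-m subsequences.

    Exhaustive O(n^2 * m) search under z-normalized Euclidean distance.
    Pairs whose start positions are closer than `exclusion` are skipped, so
    a window is not reported as matching a slightly shifted copy of itself.
    """
    if exclusion is None:
        exclusion = m  # default: the two occurrences must not overlap
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (None, None, math.inf)
    for i in range(len(subs)):
        for j in range(i + exclusion, len(subs)):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if dist < best[2]:
                best = (i, j, dist)
    return best


# Hypothetical hourly traffic volumes (MB) of one gateway; the shape 2, 9, 1
# (a short burst) is planted at hours 1 and 7.
traffic = [5, 2, 9, 1, 7, 3, 8, 2, 9, 1, 6, 4]
i, j, dist = find_motif(traffic, m=3)
print(f"motif at hours {i} and {j}, distance {dist:.3f}")
```

On the toy series the search returns the planted pair starting at hours 1 and 7 with distance 0. The quadratic search is only practical for short series; for long gateway traces, matrix-profile algorithms such as STOMP find the same top motif far more efficiently.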