
Journal of Information Processing Vol.24 No.2 195–202 (Mar. 2016)
[DOI: 10.2197/ipsjjip.24.195]

Invited Paper

Survey of Real-time Processing Technologies of IoT Data Streams

Keiichi Yasumoto1,a) Hirozumi Yamaguchi2 Hiroshi Shigeno3


Received: September 2, 2015, Accepted: December 1, 2015

Abstract: Recently, the Internet of Things (IoT) has been attracting attention due to its economic impact and high expectations for drastically changing our modern societies. Worldwide by 2022, over 50 billion IoT devices including sensors and actuators are predicted to be installed in machines, humans, vehicles, buildings, and environments. Demand is also huge for the real-time utilization of IoT data streams instead of the current off-line analysis/utilization of stored big data. The real-time utilization of massive IoT data streams suggests a paradigm shift to a new horizontal and distributed architecture because the existing cloud-based centralized architecture will cause large delays in providing services and waste many resources on the cloud and on networks. Content curation, which is the intelligent compilation of valuable content from IoT data streams, is another key to fully utilizing IoT technologies and promoting their penetration. In this paper, we survey the emerging technologies toward the real-time utilization of IoT data streams in terms of networking, processing, and content curation and clarify the open issues. Then we propose a new framework for IoT data streams called the Information Flow of Things (IFoT) that processes, analyzes, and curates massive IoT streams in real-time based on distributed processing among IoT devices.

Keywords: IoT, data stream, real-time processing, distributed processing, on-line learning, content curation

1 Nara Institute of Science and Technology, Ikoma, Nara 630–0192, Japan
2 Osaka University, Suita, Osaka 565–0871, Japan
3 Keio University, Yokohama, Kanagawa 223–8522, Japan
a) yasumoto@is.naist.jp

1. Introduction

Recently, Internet of Things (IoT) technology, which connects various physical things to the Internet, has been attracting considerable attention. In a white paper [1], Cisco predicted that by 2022, 50 billion things will be connected to the Internet, producing 14.4 trillion dollars in revenue. IDC predicted that 28 billion IoT devices will be installed by 2020, and the annual market revenue will reach 700 billion dollars [2]. IoT was ranked as the top expected technology in Gartner’s hype cycle 2014 [3] and continues to top the 2015 hype cycle. METI in Japan asserts that IoT will fuel a data-driven society where the digital data collected by IoT will acquire added value and benefit society [4]. IoT technology not only will hugely impact markets but also has large potential to drastically change society.

Many IoT research projects are pursuing their own respective purposes. For example, IoT-A [5] aims to establish IoT architecture, ClouT [6] is integrating cloud computing and IoT technologies, iCore [7] is establishing cognitive management frameworks, IoT6.eu [8] is applying IPv6 to IoT, and IERC [9] is integrating the results from different projects. Among the many research challenges in IoT, (1) Heterogeneity, (2) Scalability, (3) Interoperability, and (4) Security and Privacy have been identified as the most important challenges.

To tackle these challenges, various IoT platforms have been designed and implemented to interconnect IoT devices and process/merge data streams. Arkessa [10], Axeda [11], ThingSquare [12], Thingworx [13], WoTkit [14] and Xively [15] are examples of such mashup services. Most of these IoT platforms employ architectures based on a type of cloud computing called Platform as a Service (PaaS). Generated IoT data streams are placed in cloud storage as big data, and off-line analysis is applied with generous computation power and time to extract “intelligence” or patterns beneficial for business.

Since IoT data streams reflect current, real world situations, their real-time utilization is anticipated. For example, if live street view video of every place in a city can be made from videos captured in real-time by multiple mobile cameras carried by people and vehicles, tourism, economics, and security will benefit. For such a service, however, existing cloud-based approaches inevitably introduce non-negligible delays until the service is provided, and this may reduce service quality and/or rapidly increase service costs due to wasting cloud computing resources and communication bandwidth. For the actual penetration of IoT, content curation, that is, creating valuable content from IoT data streams, is important. Curating content from various IoT data streams requires a recipe that consists of the following three steps:
• which data streams to use,
• how to process and analyze them, and
• how to integrate multiple analysis results to form content.
Since the number of possible recipes will generally be enormous, intelligence is needed that automatically finds good recipes. Thus, we add the following two research challenges faced by IoT: (5) timeliness (real-time utilization of IoT data streams), and (6) intelligence (content curation from IoT data streams).

This paper is organized as follows. Section 2 provides IoT use


© 2016 Information Processing Society of Japan

cases that require timeliness and intelligence. Section 3 surveys existing technologies for IoT, and Section 4 clarifies unsolved problems. In Section 5, we propose a new IoT platform called IFoT that deals with timeliness and intelligence. Finally, we conclude the paper in Section 6.

2. Use Cases of Real-Time Utilization of IoT Data Streams and Key Challenges

In this section, we describe several use case scenarios for the real-time utilization of IoT data streams and pose key challenges.

2.1 Use Case Scenarios

• (S1) Participatory live street view
The first use case scenario is participatory live street view, shown in Fig. 1. This service allows a user to watch live street view video of any place in the city from any angle. To realize this service, based on user requests, relevant videos must be captured by the mobile cameras of people/vehicles and/or fixed cameras installed in the street, gathered, and processed to create/provide requested videos in real-time. If such a service is realized, it will benefit various purposes, such as tourist navigation (sightseeing, shopping, dining), economic activities (taxi/bus allocation, customer attraction), and security (surveillance, finding lost children).

However, implementing such a service on top of current cloud-based systems will be difficult, because such a huge number of uploads and downloads of video streams as well as their processing will exhaust both network bandwidth and cloud computation resources. Thus, the system either does not scale or requires huge costs to enhance the cloud systems. Unlike cloud-based architecture where all the streams converge to a single point, we need a new architecture that allows multiple data streams to directly flow between producers and consumers in parallel.

• (S2) Ultra-realistic live sports broadcasts based on UGC
Recently, User Generated Content (UGC), which is uploaded to video/photo sharing websites, SNSs, and BBSs, is becoming more popular. Live video streaming applications such as Meerkat [16] and Periscope [17] allow users to share live video streams captured by smartphones.

Our second use case scenario is ultra-realistic live sports broadcasting services that target such popular sports as soccer, baseball, Olympic games, and World Cup matches using UGC. This service collects information in and around stadiums including cheers, weather conditions, atmosphere and tweets (SNS messages) as well as video streams from the mobile/wearable devices of spectators and audience members and creates video content for broadcast by integrating the collected information. Such a service can create new experiences for sports fans who cannot attend live sporting events.

A key challenge to realize this service is how to create content that has value to prospective audiences by intelligently selecting, processing, and integrating data streams. This intelligent content creation task is called content curation. For content curation, we need techniques for capturing and handling various data streams in a unified way as well as efficiently identifying relevant data streams.

• (S3) City-wide real-time pedestrian flow tracking
Our third use case scenario is real-time pedestrian flow tracking in crowded city areas. It is desirable to grasp pedestrian flow between points and/or areas for various purposes, for example, providing transportation options for smooth transitions and smooth evacuation guidance during emergencies.

When a time series of the position data of tens of thousands to millions of pedestrians is uploaded to a cloud and processed/analyzed there, it will be difficult to track such pedestrian-flow changes in real-time; much cellular bandwidth and many cloud resources will also be wasted.

We believe that the required spatio-temporal granularity of people flow depends on where the information is used. That is, the granularity of pedestrian-flow information near the requesting place must be fine, but the information far from the place may be coarser. The insight from this use case is that data streams must be processed and aggregated near their generation sources to reduce their volume and propagate them to distant places in a scalable manner.

• (S4) Real-time anomaly detection for seniors living alone
In recent years, the solitary deaths of elderly people who are living alone are becoming a big social problem [18]. We must realize elderly monitoring services that detect anomalies in real-time and promptly notify caretakers and families.

Many elderly monitoring services have been provided that use electric pots, electricity/gas remote meters and sensors attached to toilets and refrigerators. These systems, however, take a long time to detect anomalies and may not be effective for lifesaving purposes. Camera-based monitoring systems can quickly detect anomalies like falls, but they violate privacy. Moreover, most existing monitoring services are constructed as cloud services. The cost of continuously using cloud communication and processing is non-negligible, and leakage risks are induced by storing private data on the cloud.
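One way to reduce both the communication cost and the amount of private data leaving the home is to summarize sensor readings near the source and upload only the summaries. The following is a minimal sketch of that idea, not a method from any of the surveyed systems. It assumes time-ordered power-meter readings given as (timestamp in seconds, watts) pairs; the names `WindowSummary` and `summarize`, the 60-second window, and the 5 W activity threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class WindowSummary:
    start: int         # window start time (seconds)
    count: int         # number of raw readings in the window
    mean_watts: float
    max_watts: float
    inactive: bool     # True if no reading reached the activity threshold


def _close(start: int, values: List[float], active_watts: float) -> WindowSummary:
    """Turn the raw readings of one window into a single summary record."""
    peak = max(values)
    return WindowSummary(start, len(values), mean(values), peak, peak < active_watts)


def summarize(
    readings: Iterable[Tuple[int, float]],
    window_s: int = 60,           # hypothetical window length
    active_watts: float = 5.0,    # hypothetical activity threshold
) -> Iterator[WindowSummary]:
    """Group time-ordered (timestamp, watts) readings into fixed windows."""
    bucket: List[float] = []
    bucket_start: Optional[int] = None
    for ts, watts in readings:
        start = ts - ts % window_s
        if bucket_start is not None and start != bucket_start:
            yield _close(bucket_start, bucket, active_watts)
            bucket = []
        bucket_start = start
        bucket.append(watts)
    if bucket and bucket_start is not None:
        yield _close(bucket_start, bucket, active_watts)


# One reading every 10 s: appliance in use for the first minute, idle afterwards.
raw = [(t, 100.0 if t < 60 else 1.0) for t in range(0, 120, 10)]
summaries = list(summarize(raw))
print(len(raw), "raw readings ->", len(summaries), "summaries")
for s in summaries:
    print(s)
```

In this sketch only the per-window summaries would be uploaded, and the `inactive` flag is a deliberately crude stand-in for an anomaly detector. Windows that contain no readings at all produce no summary, so a real deployment would also need to treat missing windows as a signal.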
Fig. 1 Live street view.

Ueda et al. [19] proposed an in-home living activity recognition method, where 11 different activities are recognized with more than 90% accuracy using indoor position sensors and power meters. However, the method is based on off-line learning; no real-time activity recognition is provided. We must realize a low-cost system as well as one that protects privacy and processes private data streams near sources and sends only aggregated information



to the cloud.

2.2 Key Challenges

The following four key challenges arise from the use case scenarios.
C1: Creation technology for IoT data streams: capturing various real world events anywhere and anytime in a unified manner.
C2: Networking technology for IoT data streams: enabling direct flow between producers and consumers in parallel.
C3: Processing technology for IoT data streams: processing and aggregating data streams near their sources.
C4: Content curation technology: intelligently selecting necessary streams and processing and integrating them into valuable content, based on the interests of prospective users.

3. Enabling Technologies

In this section, we survey the technologies that address the challenges discussed in Section 2.

3.1 Sensor as a Data Stream Generator

A variety of sensors can sense the mechanical, thermal, biological, chemical, optical, and magnetic properties of physical environments. Due to the progress of Micro-Electro-Mechanical Systems (MEMS) technologies, sensor nodes, which equip sensors with communication and processing capabilities, are becoming smaller. These sensor nodes are organized into Wireless Sensor Networks (WSNs) [20], which are deployed on farms and in factories to monitor them. Sensors are embedded not only in wearable devices such as eyeglasses and watches but also in cups [21], furniture and sporting equipment like tennis rackets [22] and basketballs [23], door locks [24], and even fish finders [25]. They are connected by the Internet to cloud servers, and smartphones are a common way to access, control, and visualize sensing data.

Participatory sensing and opportunistic sensing [26] also exploit sensor nodes, which mainly rely on mobile agents like humans and vehicles. Well-known or recent projects include EarPhone [27], GreenGPS [28], and SakuraSensor [29]. In particular, incentive mechanisms and gamification to promote user participation are recent challenging topics [30].

As seen above, a variety of sensors may generate a number of IoT data streams. Nevertheless, they are basically utilized by dedicated software or platforms due to the proprietary aspects of devices and a lack of standard platforms that enable many developers to easily obtain, analyze, and combine data streams in the context of applications and services that are provided to users. In other words, content-centric (or dependent) stream processing is required to migrate service-level processing tasks from cloud servers to distributed components in networks to mitigate the data processing cost, which is high in terms of delay, server load, and throughput.

3.2 Networking Technologies for IoT Data Streaming

A number of protocols have been proposed for academic research, industrial use, and the standardization of IoT/M2M communication. For lightweight communications where IoT sensors are restricted due to limited energy sources or processing capabilities, several protocols have been proposed, such as Message Queue Telemetry Transport (MQTT) [31] and Constrained Application Protocol (CoAP) [32]. MQTT is a lightweight publish/subscribe messaging transport protocol based on a client-server model designed for M2M and IoT applications with constrained networks. CoAP is a simplified web transfer protocol that is specialized for use with constrained nodes and networks such as in M2M applications. It provides asynchronous request/response interactions between clients and servers over UDP, which can easily interface with HTTP.

Other protocols have also been proposed, including WebSocket [33] and IPv6 over Low power Wireless Personal Area Networks (6LoWPAN) [34]. WebSocket is a full-duplex protocol over a TCP connection that typically provides bidirectional communication between web browsers and web servers. 6LoWPAN defines an IPv6 header compression format for IPv6 packet delivery in low-power wireless personal area networks, i.e., IEEE 802.15.4. These protocols are designed for peer-to-peer communication, client-server communication on the Internet, or communication in WPANs. The peer-to-peer style of communication cannot provide real-time services. Instead, it is more feasible and reasonable to connect heterogeneous IoT devices, often directly through heterogeneous access networks or local cloud servers.

This concept leads to edge-heavy computing. EdgeComputing [35] and Fog Computing [36] are based on such paradigms where data processing is executed on components in or on the edge of networks to mitigate server load. A drawback of these approaches is the investment needed to replace network components, as with Information-Centric Networks (ICNs). Edge-Centric Computing [37], which seeks a more practical solution by extending EdgeComputing and Fog Computing, delegates the processing tasks of cloud servers to other distributed systems like P2P to realize service components such as proximity, intelligence, trust and control outside the cloud.

As recent work on IoT, MINA [38] is an integrated network system that provides seamless unification of different wireless access technologies like cellular, WiFi, ZigBee, and Bluetooth and multi-hop communication technologies like MANET. It uses software defined networks (SDNs) and flexibly delivers data streams among devices. To cope with the heterogeneity of available resources in such multinetwork environments, MINA, which is composed of SDN controllers on four different layers, monitors the available resources and schedules the data streaming considering the QoS required by the services.

3.3 IoT Data Stream Processing

Unlike conventional DBMS that assume all data are stored in DBs before analysis, many IoT systems assume that data from IoT devices are streams, where data elements have temporal relationships and require real-time processing. Most are also redundant with respect to the data’s value.

Following the definition in Ref. [39], data stream processing is represented, at a high level, as a graph of FIFO queues that correspond to data streams and operators that may take multiple inputs/outputs from/to those FIFO queues. Operators are continuous stream transformers, which must contain activation, initialization, and output data rate policies. For example, the function of averaging the input data streams is a simple stream transformer.

In the context of databases, such a system that deals with data streams is often called a Data Stream Management System (DSMS). In contrast to DBMS, which issues a single query to an entire dataset, DSMS updates the result whenever new data arrive, but it can be seen as an extension of DBMS since it is a query-based system.

Complex Event Processing (CEP) [39], [40] is another well-known technique to detect events that satisfy given conditions over different streams. Historically, such single event processing as attribute-based data and interest filtering has been employed in publisher/subscriber systems, but these systems with multiple sensors must deal with more complex conditions. CEP has been exploited in many fields like distributed information systems, business process automation, control systems, network monitoring, and sensor networks. CEP supports a variety of languages (Java, Python, and R) to specify conditions, but SQL-based ones are popular in many systems. For example, Continuous Query Language (CQL) [41] was initially developed by Stanford for their STREAM [42] system, and Oracle, uCosminexus Stream Data Platform (Hitachi Co. LTD.), and others have implemented CQL in their systems. Event Processing Language (EPL) is another common language.

Several platforms exist for stream processing, including IBM Infosphere Stream [43] (which uses IBM Streams Processing Language (SPL)), SAP Sybase Event Stream Processor, StreamBase [44], SQLstream Blaze, Amazon AWS IoT, Yahoo! S4, TIBCO Business Events, Microsoft StreamInsight, Apache Storm [45], and Apache Spark [46]. These platforms are tuned for high-speed real-time processing of a massive amount of temporal data. For example, Spark has an abstraction called Resilient Distributed Dataset (RDD) that hides parallel and distributed operations over multiple streams and provides seamless access to service users. More academic issues have been discussed in scientific research. For example, CLARO [47] designed stochastic query processing when the input data are uncertain and have errors.

Machine learning and statistics are considered part of IoT data streaming frameworks. Feature selection and principal component analysis (PCA) help reduce the data dimensions while maintaining the capability of representing data characteristics. Machine learning also involves operations to create such classifiers as Support Vector Machine (SVM) or unsupervised learning (like clustering) and to train various models, including linear function, k-nearest neighbor, and logistic model by regression analysis. These operations can be applied to data streams slotted by a certain time window or revised for incremental updates of models whenever new data arrive. This is often called on-line machine learning, whose popularity is rising due to increased attention to IoT and big data. Supervised on-line machine learning measures the errors between the true values and the values estimated by the model and updates the model parameters to minimize the errors whenever data arrive. Assuming that the error functions are represented as probability distribution functions, the parameters are updated to minimize the expected values of errors. Stochastic Gradient Descent (SGD), a well-known method for this purpose, was originally designed for randomly picked data from a dataset; but by assuming the incremental feeding of data to the procedure, it can be applied to data streams with limited memory space. Similarly, many batch (or off-line) machine learning algorithms can be converted to on-line versions if we control references to training data and model updates.

Many of the IoT platforms introduced above support on-line learning schemes. For example, AWS and StreamInsight provide Amazon Machine Learning [48] and Azure Machine Learning [49] to enable data processing and analysis over data streams. However, both systems need cloud servers, which are often expensive for many types of applications and services in terms of performance. Jubatus [50] supports distributed on-line machine learning, but it does not focus on distributed stream fusion based on service-level context.

3.4 Service Composition from IoT Data Streams

Some programming models and tools have been developed and provided for creating contents (composing services) by collecting and merging various IoT data streams. Web of Things [51] is a programming model where services can be easily mashed up by associating objects with web components using web 2.0 technology. WotKit [52] is a visual programming tool for service mashups. IBM also provides a visual programming tool called Node-RED [53] where a new service can be created just by drawing lines among IoT devices, APIs, and services. Another study [54] extends Node-RED to treat distributed data streams. Mobile Fog [55] is a programming model that constructs large-scale IoT services.

Although these programming models and tools help users easily and intuitively compose services and/or create contents, users still need to manually design output layouts of the contents and specify the processing sequence of the streams until the content is derived. Therefore we need automated content creation adaptive to the availability of data streams and their dynamism.

Fujisawa et al. proposed a video curation system [56] that targets baseball games and automatically creates real-time video content with high values from multiple video streams captured by spectator cameras in different places and at different angles and zoom levels. In their study, assuming that video contents with similar camera switching patterns (i.e., which camera’s video is used in the broadcasted content and when) to the TV broadcast have high values, machine learning algorithms are constructed using the camera switching patterns of TV broadcasts as training data.

3.5 International Activities on IoT Framework Design and Standardization

Many IoT-related organizations, consortiums, and projects exist. For example, Open Interconnect Consortium (OIC), which was founded by Intel, Samsung, and others, released IoTivity [57], an open source software framework that enables seamless device-to-device connectivity to address the emerging needs of the Internet of Things (IoTivity 1.0 was released in


October 2015). AllSeen Alliance from the Linux Foundation released AllJoyn [58], an open-source IoT framework by Qualcomm. OneM2M [59], which developed technical specifications for M2M services, releases standards to create a foundation platform for IoT devices and applications. Industrial Internet Consortium (IIC) [60] focuses more on industrial applications for IoT; Intel, GE and some others are leading this consortium. Though their goals are different, they basically share a common vision for IoT where everything is connected by the Internet to support next generation applications and services that have deeply penetrated our social lives, societies, industries, and infrastructures. The IPSO Alliance [61] established the Internet Protocol as the basis for the connection of Smart Objects. The European Commission 7th Framework program (EU-FP7) sponsors the IoT European Research Cluster (IERC) [9] that addresses the large potential for IoT-based capabilities in Europe involving international partners from Europe, USA, Japan, China, and Korea.

Many platforms and related projects have also been developed. Some platforms for developing IoT applications are now available on the market and primarily focus on processing large-scale real-time streams. Research-based projects share this goal. For example, the concept of the EU-Japan funded project ClouT [6] is leveraging cloud computing as an enabler to bridge things, people, and services by the Internet. Another EU-Japan project, FESTIVAL [62], connects and unites European and Japanese IoT testbeds to provide IoT experimentation platforms for homogeneous access APIs with an Experimentation as a Service (EaaS) model for experimenters. Many other EU-funded projects exist: CASCADAS2, VITAL-IoT [63], and IOT-I & IOT-A [5].

Finally, smart city projects are closely related with IoT technologies. In the EU, many cities are now interested in making themselves smarter with respect to such infrastructure-related issues as energy, mobility, government services, and health. The following are well-known smart city projects: Santander supported by Future Internet Research (FIRE), BCN Smart City (Barcelona), Valencia Smart City, and Smart Beehive Project (Ireland). For smaller-scale networks (in comparison with smart cities) that are deployed in homes, buildings, and offices, several platforms are available. For example, HomeKit and Brillo are provided by Apple and Google, IoTivity by OIC, and AllJoyn by AllSeen.

4. Open Issues

This section reflects on the use case scenarios in Section 2 and key challenges C1-C4 and clarifies the open issues for the real-time utilization of IoT data streams.

C1 (Creation technology for IoT data streams): Scenarios S1-S4 utilize different sensors/data streams, such as those from cameras, microphones, accelerometers, ambient sensors, position sensors, power meters, vital sensors, and SNSs. Thus, C1 requires a common representation format of various IoT data streams because they are processed and combined to form new streams. The common format must also be able to represent higher-level streams. Hereafter, we use the term flows to refer to both raw data streams and higher-level streams after processing. The common format should include metadata that help devices/servers easily search for necessary flows, aggregate/summarize flows, and obtain knowledge or patterns. Existing IoT technologies do not have a common representation format for handling different types/levels of flows in a unified manner.

C2 (Networking technology for IoT data streams) and C3 (Processing technology for IoT data streams): Since scenarios S1-S4 suppose real-time stream distribution among devices and real-time processing and analysis of flows, they require an adaptive processing mechanism to meet real-time constraints by adaptively allocating computation resources and/or a granularity adjustment mechanism to satisfy the bandwidth constraint. On-line learning is another issue. Flows with high dynamism must be analyzed in real-time, and knowledge or patterns such as contexts and objects must be detected on-line so that the learned tags are attached to flows for real-time utilization. Few existing IoT platforms consider both network/computation resources and data granularity adjustments to achieve real-time distribution of flows.

Table 1 Challenges and technical issues for real-time utilization of IoT flows.

C4 (Content curation technology): Scenarios S1-S4 suppose real-time content curation from multiple flows. There are two kinds of curators: human and machine. For human curators, support for the selection of relevant flows and for understanding them is essential (C5). There are three functions for human curators: intelligent flow search that considers the curator’s value, flow visualization for understanding the content, and flow prediction for understanding temporal changes. Realizing machine curators is a very interesting but challenging issue. Predicting content values for prospective audiences is one part of the key challenges. Few existing IoT platforms have developed automated curators or functions to support human curators in managing real-time contents. Machine learning-based automatic video curation from spectator mobile cameras that targets baseball games was proposed [56], but TV broadcasts are used as training data



by assuming that they have high values. Since preparing training data that only contain good curations is difficult, a method is required for estimating the user values of a given curation.

Another big challenge is related to security and privacy issues (C6). Scenarios S1-S4 utilize privacy-sensitive data flows such as location data, vital signs, and video/audio data. For the wide penetration and the utilization of IoT flows, functions are necessary that make people feel secure and safe when distributing their flows and/or using the flows of others. Many security studies treat individual data types like location data. Security architecture for the cyber-physical-social world was proposed [64], but few treat both security and privacy issues in the IoT context such as data heterogeneity and real-time distribution.

Table 1 summarizes the main challenges, their purposes, and the remaining technical issues.

5. IFoT: Real-Time Information Flow Processing Framework

As discussed above, most existing IoT platforms do not fully support both distributed and on-site processing. Even for local services, we need to set up a cloud server and collect/process data streams in servers far from the data sources. Such architecture not only limits communication and computation capacity but also requires additional efforts for handling privacy-sensitive data, creating barriers to the real-time utilization of IoT big data.

In this section, we propose the Information Flow of Things (IFoT), a new framework for processing, analyzing, and curating IoT data streams in real-time and in a scalable manner based on distributed processing among IoT devices. In IFoT, both raw data streams and higher-level streams after processing/aggregating/merging are called information flows (or flows) and treated identically.

IFoT aims to solve the following three technical issues: (1) handling various information flows in a unified manner, (2) processing and analyzing flows in their proximity and distributing them directly between devices in real-time and in a scalable manner, and (3) intelligently integrating different flows into content

device and a flow source, captures and processes data in the real world and sends them out as a flow(s). For this purpose, it incorporates functions for creating relevant information flows (C1 in Table 1), processing/analyzing flows (C3), and handling security and privacy issues (C6). IFoT-Neuron is expected to be installed in every IFoT compatible device (called an IFoT node) as a software library or a hardware module.

IFoT-Neuron has communication capabilities with nearby IFoT nodes, connections to the Internet (optional), and flow-processing functions such as attaching tags, basic stream processing, and anonymization.

To tackle challenge C1, for different flows, we define a common metadata format that consists of data type, granularity, location information, and a set of tags for each time interval of the flows.

For challenge C2, tags (i.e., contexts, identified objects/events, etc.) are derived through a learning algorithm implemented in the IFoT-PO3-Engine and automatically attached to flows. The metadata associated with each flow facilitate efficient searches and further processing of the flows by other IFoT nodes. Moreover, for the real-time distribution of flows, it offers a dynamic granularity adjustment function that reduces the data granularity, as requested by the IFoT-PO3-Engine.

For the wide penetration of IFoT compatible devices, IFoT-Neuron should be implemented as a small, low-cost, and low-power hardware component with sufficient computation power. Toward zero-energy operation, MEMS and energy-harvesting technologies should also be employed.

5.2 IFoT-PO3-Engine

To distribute flows between sources and users without stagnation, collecting low-level (raw) flows in clouds is not a good idea because of the bandwidth waste in paths to the cloud and the imposition of large delays. Instead, it is desirable to process, analyze, and aggregate flows near their sources to reduce the required bandwidth between sources and destinations. We call this concept “Process On Our Own,” or PO3 in short.
(as a higher-level flow) and providing it in real-time. These issues For a concrete shape of this concept, we designed an IFoT-
are solved by three different layered components: IFoT-Neuron, PO3-Engine that offers functions for executing high-load tasks
IFoT-PO3-Engine, and IFoT-Curator (Fig. 2). including complex event processing and on-line learning among
IFoT nodes in a distributed and cooperative manner and ef-
5.1 IFoT-Neuron ficiently distribute resulting higher-level flows to remote IFoT
IFoT-Newron, which is an abstraction of an intelligent sensing nodes.
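As a rough illustration of why aggregating near the source reduces the bandwidth a flow needs, the following sketch averages a raw flow over fixed-size windows before it is forwarded. The function name `aggregate_flow`, the window size, and the sample values are assumptions made for this example only; they are not part of the IFoT design.

```python
from statistics import mean
from typing import Iterable, List


def aggregate_flow(samples: Iterable[float], window: int) -> List[float]:
    """Reduce a raw flow to one mean value per `window` samples.

    A trailing partial window is also averaged so that no sample is dropped.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buffer: List[float] = []
    aggregated: List[float] = []
    for value in samples:
        buffer.append(value)
        if len(buffer) == window:
            aggregated.append(mean(buffer))
            buffer.clear()
    if buffer:  # flush the last, shorter window
        aggregated.append(mean(buffer))
    return aggregated


raw = [20.0, 22.0, 21.0, 23.0, 30.0, 32.0]  # hypothetical sensor readings
coarse = aggregate_flow(raw, window=2)
print(coarse)  # 6 raw samples become 3 aggregated values
```

With a window of k samples, roughly 1/k of the original values leave the node, which is the kind of trade-off between granularity and bandwidth that the PO3 concept relies on.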
For distributed and cooperative processing, the processing time of a heavy task must be predicted. If the predicted time does not satisfy the time constraint, the task is divided into sub-tasks, which are sent to nearby IFoT nodes for execution. The division into sub-tasks and their allocation are done dynamically by considering the available bandwidth in the network and the computation power of the nearby IFoT nodes.
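The decision described above can be sketched as follows. This is only an illustration under simple assumptions: processing time is estimated as workload divided by computation power, transfer cost as data size divided by available bandwidth, and work is split in proportion to each node's effective rate. The names `Node`, `predict_time`, and `split_task` and all numbers are hypothetical; the paper does not specify a prediction or allocation algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    ops_per_sec: float    # assumed computation power of the node
    bandwidth_bps: float  # assumed available bandwidth to reach the node


def predict_time(work_ops: float, node: Node) -> float:
    """Assumed model: processing time = workload / computation power."""
    return work_ops / node.ops_per_sec


def split_task(work_ops: float, bits_per_op: float, deadline_s: float,
               local: Node, neighbors: List[Node]) -> Dict[str, float]:
    """Return the workload (in operations) assigned to each node."""
    if predict_time(work_ops, local) <= deadline_s:
        return {local.name: work_ops}  # deadline met locally, no division

    # Effective rate of a neighbor: each operation costs transfer + compute time.
    rates = {local.name: local.ops_per_sec}
    for n in neighbors:
        seconds_per_op = bits_per_op / n.bandwidth_bps + 1.0 / n.ops_per_sec
        rates[n.name] = 1.0 / seconds_per_op

    total_rate = sum(rates.values())
    # Proportional split: every node finishes its share at about the same time.
    return {name: work_ops * rate / total_rate for name, rate in rates.items()}


local = Node("local", ops_per_sec=1e6, bandwidth_bps=float("inf"))
peers = [Node("peer-a", 2e6, 1e7), Node("peer-b", 1e6, 5e6)]
plan = split_task(work_ops=4e6, bits_per_op=8.0, deadline_s=2.0,
                  local=local, neighbors=peers)
for name, share in plan.items():
    print(f"{name}: {share:.0f} ops")
```

The sketch does not re-check the deadline after splitting; a fuller version would compare `work_ops / total_rate` with the time constraint and, for example, request coarser granularity if it is still exceeded.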
Fig. 2 IFoT: challenges and approaches.

Real-time flow distribution among remote IFoT nodes is another issue to be solved. The IFoT-PO3-Engine searches for multiple routes to a destination node (including routes through cellular networks), measures or estimates delays and available bandwidth on each route, and establishes a multi-path route to deliver a flow in real-time in cooperation with the dynamic granularity
adjustment function of IFoT-Neuron.

5.3 IFoT-Curator

IFoT aims to realize the real-time utilization of information flows by providing users with content curated from multiple flows in real-time. To this end, we need to define a language to describe a curation recipe (e.g., a task graph to create content with the data’s required spatio-temporal granularity) and its execution system. When a curation recipe is submitted to an IFoT node, it is executed among nearby IFoT nodes in a distributed manner with an API provided by the IFoT-PO3-Engine and the IFoT-Neuron. Therefore, the execution system is designed and implemented as middleware with such functions as code/data migration and distributed/cooperative task processing.

IFoT-Curator aims not only to execute human-edited recipes but also to support recipe-editing work (C5 in Table 1) and the automatic creation of recipes (C4 in Table 1). For challenge C5, a human curator must be able to acquire only the subset of massive flows that matches his/her interests, visualize the dynamics of the flows, and predict their future changes to understand them better.

For automated curation, we also need a function to measure the value of content for its prospective audience and a function to predict the expected value of new content created with each possible recipe. Even though the latter is challenging, this function must be achieved.

6. Conclusion

IoT technologies offer the potential to drastically change our societies. The keys are the real-time utilization of IoT data streams and intelligent content creation (content curation) from these data streams. However, existing network and cloud computing architectures may not be able to accommodate the massive data streams generated by as many as a trillion IoT devices in real-time. Thus, a paradigm shift to a new information processing architecture that allows data streams to flow in the required form among places is essential.

In this paper, we surveyed the existing and emerging technologies toward real-time IoT data stream utilization and content curation, clarified open problems, and proposed a new framework called Information Flow of Things (IFoT) for processing IoT data streams in real-time. To realize IFoT, many challenging issues need to be solved. We hope this paper spurs prospective researchers in related fields to advance their research toward the realization of data-driven societies.

Acknowledgments This study was supported in part by JSPS Grant in Aid for Scientific Research 26220001, 25280031 and 15H02690.

References

[1] Bradley, J., Barbier, J. and Handler, D.: Embracing the Internet of Everything To Capture Your Share of $14.4 Trillion (White Paper), available from http://www.cisco.com/web/about/ac79/docs/innov/IoE_Economy.pdf (accessed 2015-11-08).
[2] IDC Market in a Minute: Internet of Things, available from http://www.idc.com/downloads/idc_market_in_a_minute_iot_infographic.pdf (accessed 2015-11-08).
[3] Gartner’s 2014 Hype Cycle for Emerging Technologies Maps the Journey to Digital Business, available from http://www.gartner.com/newsroom/id/2819918 (accessed 2015-11-08).
[4] Ministry of Economy, Trade and Industry: Changes in response to the arrival of a data-driven society using CPS, available from http://www.meti.go.jp/committee/sankoushin/shojo/johokeizai/pdf/report01_04_00.pdf
[5] IoT-A, Internet of Things – Architecture, available from http://www.iot-a.eu/public (accessed 2015-11-08).
[6] Cloud of Things for empowering the citizen clout in smart cities, available from http://clout-project.eu/ (accessed 2015-11-08).
[7] iCore Project: available from http://www.iot-icore.eu/ (accessed 2015-11-10).
[8] Researching IPV6 potential for the Internet of Things, available from http://iot6.eu/ (accessed 2015-11-08).
[9] IERC, European Research Cluster on the Internet of Things, available from http://www.internet-of-things-research.eu/ (accessed 2015-11-08).
[10] Arkessa, available from http://www.arkessa.com/
[11] Axeda, available from http://www.axeda.com/
[12] ThingSquare, available from http://www.thingsquare.com/
[13] Thingworx, available from http://www.thingworx.com/
[14] Blackstock, M. and Lea, R.: IoT mashups with the WoTKit, Proc. IEEE Internet of Things (IOT), pp.159–166 (2012).
[15] Xively, available from http://xively.com
[16] Meerkat, available from https://meerkatapp.co/ (accessed 2015-11-11).
[17] Periscope, available from https://www.periscope.tv/ (accessed 2015-11-11).
[18] Cabinet Office: Annual Report on the Aging Society: 2014 (Summary), available from http://www8.cao.go.jp/kourei/english/annualreport/2014/2014pdf_e.html (accessed 2015-11-30).
[19] Ueda, K., Suwa, H., Arakawa, Y. and Yasumoto, K.: Exploring Accuracy-Cost Tradeoff in In-Home Living Activity Recognition based on Power Consumptions and User Positions, Proc. 14th IEEE Int’l. Conf. on Ubiquitous Computing and Communications (IUCC 2015), pp.1131–1137 (2015).
[20] Yick, J., Mukherjee, B. and Ghosal, D.: Wireless sensor network survey, Computer Networks, Vol.52, pp.2292–2330 (2008).
[21] Intel shows off a light-up smart mug, because why not?, available from http://www.engadget.com/2014/01/07/intel-smart-mug-concept/ (accessed 2015-11-09).
[22] Smart Tennis Sensor for Tennis Rackets, available from http://www.sony.com/electronics/smart-devices/sse-tn1w (accessed 2015-11-09).
[23] 94FiFty, available from http://www.94fifty.com/ (accessed 2015-11-09).
[24] GOji, available from http://gojiaccess.com/ (accessed 2015-11-09).
[25] Deeper, Smart Fishfinder, available from https://buydeeper.com (accessed 2015-11-09).
[26] Higuchi, T., Yamaguchi, H. and Higashino, T.: [Invited Paper] Mobile Devices as an Infrastructure: A Survey of Opportunistic Sensing Technology, Journal of Information Processing, Vol.23, No.2, pp.94–104 (2014).
[27] Rana, R.K., Chou, C.T., Kanhere, S.S., Bulusu, N. and Hu, W.: Ear-phone: An end-to-end participatory urban noise mapping system, Proc. 9th ACM/IEEE Intl. Conf. on Information Processing in Sensor Networks (IPSN 2010), pp.105–116 (2010).
[28] Ganti, R.K., Pham, N., Ahmadi, H., Nangia, S. and Abdelzaher, T.F.: GreenGPS: A participatory sensing fuel-efficient maps application, Proc. 8th Intl. Conf. on Mobile Systems, Applications, and Services (MobiSys 2010), pp.151–164 (2010).
[29] Morishita, S., Maenaka, S., Nagata, D., Tamai, M., Yasumoto, K., Fukukura, T. and Sato, K.: SakuraSensor: Quasi-Realtime Cherry-Lined Roads Detection through Participatory Video Sensing by Cars, Proc. 2015 ACM Intl. Joint Conf. on Pervasive and Ubiquitous Computing (UbiComp 2015), pp.695–705 (2015).
[30] Arakawa, Y. and Matsuda, Y.: [Invited Paper] Gamification mechanism for enhancing a participatory urban sensing: survey and practical results, Journal of Information Processing, Vol.24, No.1, pp.31–38 (2016).
[31] OASIS Standard, MQTT version 3.1.1, available from http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.doc (2014).
[32] Shelby, Z., Hartke, K. and Bormann, C.: Request for Comment 7252, The Constrained Application Protocol (CoAP), available from http://tools.ietf.org/rfc/rfc7252.txt (2014).
[33] WebSocket, available from https://www.websocket.org/
[34] Shelby, Z. and Bormann, C.: 6LoWPAN: The Wireless Embedded Internet, John Wiley & Sons (2011).
[35] Davis, A., Parikh, J. and Weihl, W.E.: Edgecomputing: Extending enterprise applications to the edge of the internet, Proc. 13th International World Wide Web Conference on Alternate Track Papers and Posters, pp.180–187 (2004).



[36] Bonomi, F., Milito, R., Zhu, J. and Addepalli, S.: Fog Computing and Its Role in the Internet of Things, Proc. 1st Edition of the MCC Workshop on Mobile Cloud Computing (MCC’12), pp.13–16 (2012).
[37] Lopez, P.G., Montresor, A., Epema, D., Datta, A., Higashino, T., Iamnitchi, A., Barcellos, M., Felber, P. and Riviere, E.: Edge-centric Computing: Vision and Challenges, ACM SIGCOMM Computer Communication Review, Vol.45, No.5, pp.37–42 (2015).
[38] Qin, Z., Denker, G., Giannelli, C., Bellavista, P. and Venkatasubramanian, N.: A Software Defined Networking Architecture for the Internet-of-Things, Proc. IEEE Network Operations and Management Symposium (NOMS), pp.1–9 (2014).
[39] Hirzel, M., Soulé, R., Schneider, S., Gedik, B. and Grimm, R.: A catalog of stream processing optimizations, ACM Comput. Surv., Vol.46, No.4, Article 46, pp.1–34 (2014).
[40] Cugola, G. and Margara, A.: Processing flows of information: From data stream to complex event processing, ACM Comput. Surv., Vol.44, No.3, Article 15, pp.1–62 (2012).
[41] Arasu, A., Babu, S. and Widom, J.: The CQL continuous query language: Semantic foundations and query execution, The VLDB Journal, Vol.15, No.2, pp.121–142 (2006).
[42] Arasu, A., Babcock, B., Babu, S., Datar, M., Ito, K., Motwani, R., Nishizawa, I., Srivastava, U., Thomas, D., Varma, R. and Widom, J.: STREAM: The stanford stream data manager, IEEE Data Eng. Bull., Vol.26, No.1, p.665 (2003).
[43] Gedik, B. and Andrade, H.: A model-based framework for building extensible, high performance stream processing middleware and programming language for IBM InfoSphere streams, Softw. Pract. Exp., Vol.42, No.11, pp.1363–1391 (2012).
[44] StreamBase Systems, available from http://www.streambase.com (2012).
[45] Storm project, available from http://storm-project.net/ (2012). Retrieved May 2012.
[46] Apache Spark: Lightning-fast cluster computing, available from http://spark.apache.org/
[47] Tran, T.T.L., Peng, L., Diao, Y., McGregor, A. and Liu, A.: CLARO: Modeling and processing uncertain data streams, The VLDB Journal, Vol.21, pp.651–676 (2012).
[48] Amazon: Amazon Machine Learning, available from https://aws.amazon.com/machine-learning/ (accessed 2015-11-13).
[49] Microsoft: Azure Machine Learning, available from https://azure.microsoft.com/en-us/services/machine-learning/ (accessed 2015-11-13).
[50] Jubatus: Distributed Online Machine Learning Framework, available from http://jubat.us/en/ (accessed 2015-11-15).
[51] Guinard, D., Trifa, V. and Wilde, E.: A resource oriented architecture for the web of things, Proc. Internet of Things (IOT), pp.1–8 (2010).
[52] Blackstock, M. and Lea, R.: IoT mashups with the WoTKit, Proc. Internet of Things (IOT), pp.159–166 (2012).
[53] IBM: Node-RED, available from http://nodered.org/ (accessed 2015-11-10).
[54] Blackstock, M. and Lea, R.: Toward a Distributed Data Flow Platform for the Web of Things (Distributed Node-RED), Proc. 5th Intl. Workshop on Web of Things, pp.34–39 (2014).
[55] Hong, K., Lillethun, D., Ramachandran, U., Ottenwälder, B. and Koldehofe, B.: Mobile fog: A programming model for large-scale applications on the internet of things, Proc. 2nd ACM SIGCOMM Workshop on Mobile Cloud Computing, pp.15–20 (2013).
[56] Fujisawa, K., Hirabe, Y., Suwa, H., Arakawa, Y. and Yasumoto, K.: Automatic Content Curation System for Multiple Live Sport Video Streams, The 11th IEEE Int’l. Workshop on Multimedia Information Processing and Retrieval (MIPR 2015) (2015).
[57] IoTivity, available from https://www.iotivity.org/ (accessed 2015-11-13).
[58] AllJoyn Framework, available from https://allseenalliance.org/framework (accessed 2015-11-13).
[59] oneM2M, available from http://www.onem2m.org/ (accessed 2015-11-13).
[60] Industrial Internet Consortium, available from http://www.iiconsortium.org/ (accessed 2015-11-13).
[61] IPSO Alliance, available from http://www.ipso-alliance.org/ (accessed 2015-11-13).
[62] FESTIVAL: FEderated interoperable SmarT ICT services deVelopment And testing pLatforms, available from http://www.festival-project.eu/en/ (accessed 2015-11-13).
[63] Vital, The future of Smart Cities, available from http://vital-iot.eu/.
[64] Ning, H. and Liu, H.: Cyber-physical-social based security architecture for future internet of things, Advances in Internet of Things, Vol.2, No.1, p.1 (2012).

Keiichi Yasumoto received his B.E., M.E., and Ph.D. degrees in information and computer sciences from Osaka University, Osaka, Japan, in 1991, 1993 and 1996, respectively. He is currently a professor of the Graduate School of Information Science at Nara Institute of Science and Technology. His research interests include distributed systems, mobile computing, and ubiquitous computing. He is a member of ACM, IEEE, and IEICE.

Hirozumi Yamaguchi received his B.E., M.E., and Ph.D. degrees in information and computer sciences from Osaka University, Japan, in 1994, 1996, and 1998, respectively. He is currently an associate professor at Osaka University. His current research interests include design, development, modeling, and simulation of mobile and wireless networks and applications. He is a member of IEEE.

Hiroshi Shigeno received his B.S., M.E. and Ph.D. degrees in instrumentation engineering from Keio University, Japan in 1990, 1992 and 1997. Since 1998, he has been with the Department of Information and Computer Science at Keio University, where he is currently a professor. His current research interests include mobile and ubiquitous computing, network architecture, and Intelligent Transport Systems. He is a member of IEEE, ACM, and IEICE.

