A NOVEL CROWD SENSING FRAMEWORK FOR URBAN COMPUTING APPLICATIONS
Abstract
Driven by the proliferation of sensor-rich mobile devices, crowd sensing has emerged as a new paradigm for gathering information about the physical world. In crowd sensing applications, humans act as sensor carriers or even as the sensors themselves, and report what they learn about the conditions of the surrounding environment, such as traffic conditions, broken public utilities, gas prices, weather conditions and air quality. Despite the large amount of data collected from crowd sensing applications, several challenges prevent us from obtaining useful knowledge. In this thesis, I present a novel framework to tackle some of the major challenges and demonstrate how crowd sensing can benefit urban computing applications.
The first challenge of crowd sensing lies in how to aggregate user-contributed information and derive the true information (i.e., the truth). Users may provide conflicting and noisy information on the same entity, and how to discover the truth among these conflicting observations is a key question. To tackle this challenge, I propose a truth discovery method which explicitly incorporates entity correlation information. By this means, we can infer both user reliabilities and entity truths accurately in an unsupervised way. Besides the fact that multiple users may provide redundant observations on the same entity, another major challenge is the severe sparsity of crowd sensing data, i.e., a large number of entities may never receive any observations from users. Thus, I develop a method which jointly tackles the redundancy and sparsity problems in crowd sensing applications, such that we can derive an accurate and fine-grained estimate of the surrounding environment.
In addition, although we can collect a large amount of data from crowd sensing applications, each data source can only provide partial information on our surrounding environment. To further unleash the power of crowd sensed data, we perform knowledge mining in urban computing applications by fusing data from heterogeneous sources.
In the first application, we propose to infer the city-wide traffic volume with both static loop detector data and crowdsourced taxi trajectories. Neither of the two data sources alone is sufficient to estimate the traffic volume on each road segment, because they suffer from either severe data sparsity or data accuracy problems. To solve these problems, we develop a learning model that fuses the knowledge from the two domains, as well as urban context information, to accurately estimate the traffic volume at a city scale. In another application, we propose to infer individuals' trip purposes by combining the knowledge from heterogeneous data sources including trajectories, points of interest (POIs) and social media data. The proposed dynamic Bayesian network model captures three important factors: the sequential properties of trip activities, the functionality of trip end areas, and the POI popularity of those areas. By this means, we can accurately infer the purposes of daily trips.