3.1 Data Acquisition
The data was collected using a cross-platform (Android and iOS) mobile sensing application called Readisense, built on top of Sensus [38], an open-source crowdsensing tool developed at the University of Virginia since 2014. The application captures the most common data streams from smartphone embedded sensors and applications, although some sensors are specific to one operating system; “HealthKit” data, for example, is available only on iOS. The application supports both Listening Probes (i.e., the application records data whenever a change in a sensor is detected) and Polling Probes (i.e., data is collected from the sensors at regular intervals). Although Readisense offers a wide variety of sensors, we limited our study to those that capture human behavior change during the COVID-19 pandemic: GPS, Step Count, App Usage, Battery Level, Wireless Access Point, and Screen Usage. The data collection was approved by the Institutional Review Board (IRB), and 2,700 participants across the US were recruited for a four-month study examining the relationship between mobile sensing data and symptoms of infectious diseases. Only participants with at least 14 days of GPS, Bluetooth, and Activity data were considered, leaving 598 participants. The participants’ average age is 40.71 (sd = 11.52). Females make up 65.36% of the participants, with Whites accounting for 66.2%, African Americans 19.3%, Asians 7.3%, people of other races 3.3%, Hispanics 3.5%, and others 0.4%. The 598 selected participants were those who were active during the COVID-19 pandemic. Fifty-two percent of the participants are full-time employees, 20% are full-time students, and 12% work part-time. The top three states by number of participants are North Carolina, Virginia, and Ohio.
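As an illustration of the inclusion rule above (at least 14 days of GPS, Bluetooth, and Activity data), the following is a minimal sketch rather than the actual study pipeline. It assumes that "at least 14 days" means 14 distinct calendar days in each of the three streams; the record layout `(participant_id, stream, day)`, the stream labels, and the function name `eligible_participants` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, Set, Tuple

# Assumed stream labels and threshold; both are illustrative.
REQUIRED_STREAMS = ("gps", "bluetooth", "activity")
MIN_DAYS = 14

def eligible_participants(records: Iterable[Tuple[str, str, date]]) -> Set[str]:
    """Return ids of participants with >= MIN_DAYS distinct days in every required stream."""
    # days[participant][stream] is the set of calendar days with at least one record.
    days = defaultdict(lambda: defaultdict(set))
    for participant_id, stream, day in records:
        days[participant_id][stream].add(day)
    return {
        participant_id
        for participant_id, streams in days.items()
        if all(len(streams.get(s, ())) >= MIN_DAYS for s in REQUIRED_STREAMS)
    }
```

Counting distinct days per stream, rather than raw records, keeps a participant with many samples on a few days from passing the filter.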
3.5 Bayesian Change-point Analysis
We formulate the detection of changes in the time series as a Bayesian change-point detection problem. A Bayesian methodology allows easy quantification of uncertainty and integration of priors (for instance, feeding in government directives as metadata instead of assuming a uniform distribution of change-points over locations). We use Barry and Hartigan’s [7] Bayesian change-point model for our analysis. Although such a formulation supports identifying changes in mean, variance, and intercept, it is practical to choose models that are simple and easy to interpret. Furthermore, a growing body of evidence shows that large-scale behaviors in communities such as Twitter and financial markets, while driven by external events, are characterized by discrete shifts and bursts. Since we work with community-level time series of feature means, which are driven by news events and government policies, it is appropriate to identify similar discrete changes. We thus primarily analyze changes in level.
The adopted Bayesian change-point model assumes that there is an unknown partition \(\rho\) of the data into contiguous blocks, such that within each block, the mean remains the same. The model also assumes that the observations within each block are independent and normally distributed around the block mean.
Let us assume we have n data points \(X = (X_1,\ldots ,X_n)\). Let \(\rho = (U_1,\ldots ,U_n)\) denote a partition of the time series into non-overlapping, contiguous blocks, encoded as a Boolean array of change-point indicators: at each timestep i, if \(U_i\) takes the value 1, a new block begins; otherwise, we remain in the same block.
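The Boolean encoding of a partition can be made concrete with a short sketch. It assumes 0-based indexing and that the first observation always opens a block regardless of its indicator; the function name `blocks_from_indicators` is hypothetical.

```python
from typing import List, Tuple

def blocks_from_indicators(u: List[int]) -> List[Tuple[int, int]]:
    """Convert change-point indicators into half-open (start, end) index ranges.

    Assumed convention: u[i] == 1 means a new block starts at position i.
    Position 0 always starts the first block.
    """
    starts = [0] + [i for i in range(1, len(u)) if u[i] == 1]
    ends = starts[1:] + [len(u)]
    return list(zip(starts, ends))

u = [0, 0, 0, 1, 0, 1, 0, 0]
print(blocks_from_indicators(u))  # [(0, 3), (3, 5), (5, 8)]
```

Two indicators equal to 1 yield three blocks, so a partition with b blocks has b - 1 change points.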
We are interested in the posterior density \(f(\rho |X)\). By Bayes’ theorem, this can be written as
\[f(\rho | X) \propto f(X | \rho)\, f(\rho).\]
Prior cohesion density: Let p denote the probability of a change point at each location. We assume this probability to be the same at every location. If the partition has b blocks, that is, b − 1 change points among the n − 1 possible locations, then the prior cohesion density can be written as
\[f(\rho | p) = p^{b-1}(1-p)^{n-b}.\]
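For a fixed change probability, the prior cohesion density can be evaluated directly. The sketch below assumes the form \(p^{b-1}(1-p)^{n-b}\), which follows from b blocks corresponding to b − 1 change points among n − 1 possible locations; it works on the log scale to avoid underflow for long series. The function name is hypothetical.

```python
import math

def log_prior_cohesion(n: int, b: int, p: float) -> float:
    """Log of p**(b-1) * (1-p)**(n-b) for a partition of n points into b blocks."""
    if not 1 <= b <= n:
        raise ValueError("the number of blocks must satisfy 1 <= b <= n")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return (b - 1) * math.log(p) + (n - b) * math.log(1.0 - p)

# Eight observations split into three blocks, with an assumed p of 0.2.
print(math.exp(log_prior_cohesion(n=8, b=3, p=0.2)))
```

With p below 0.5, each additional block lowers the prior density, so the prior favors partitions with few change points.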
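The joint density of the observations given \(\rho\) is a product of the densities of the individual blocks in \(\rho\). Let us consider a single block and write \(X_{ij}\) for its observations \(X_{i+1},\ldots ,X_j\). If we assume that the data in this block are generated by a Gaussian with mean \(\theta\) and variance \(\sigma ^2\), and let the prior density of \(\theta\) be a Gaussian with mean \(\mu _0\) and variance \(\sigma _0^2/(j-i)\), as in Reference [7], then the density of the block is
\[f(X_{ij}) = \int f(X_{ij} | \theta)\, f(\theta)\, d\theta .\]
The above integral can be simplified to the expression below
\[f(X_{ij}) = (2\pi \sigma ^2)^{-(j-i)/2} \left(\frac{\sigma ^2}{\sigma _0^2 + \sigma ^2}\right)^{1/2} \exp (V_{ij}),\]
where
\[V_{ij} = -\frac{\sum _{l=i+1}^{j} (X_l - \hat{X}_{ij})^2}{2\sigma ^2} - \frac{(j-i)(\hat{X}_{ij} - \mu _0)^2}{2(\sigma _0^2 + \sigma ^2)}\]
and \(\hat{X}_{ij}\) is the mean of the observations in the block. However, \(f(X_{ij})\) still depends on the parameters \(\mu _0, \sigma ^2, \sigma _0^2\). Defining \(w=\frac{\sigma ^2}{\sigma _0^2 + \sigma ^2}\), we choose the following priors for the parameters:
\[f(\mu _0) = 1, \quad f(\sigma ^2) = \frac{1}{\sigma ^2}, \quad f(p) = \frac{1}{p_0} \ (0 \le p \le p_0), \quad f(w) = \frac{1}{w_0} \ (0 \le w \le w_0).\]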
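After integrating out \(\mu _0\), \(\sigma ^2\), and w, this can be simplified to the integral below, where \(w_0\) is the upper bound of the prior on w. We refer readers to Reference [7] for the full derivation.
\[f(X | \rho) \propto \int _0^{w_0} \frac{w^{(b-1)/2}}{(W + Bw)^{(n-1)/2}}\, dw,\]
where
\[W = \sum _{ij \in \rho }\ \sum _{l=i+1}^{j} (X_l - \hat{X}_{ij})^2, \qquad B = \sum _{ij \in \rho } (j-i)(\hat{X}_{ij} - \hat{X})^2\]
are the within-block and between-block sums of squares. The sums run over the blocks of \(\rho\), block ij contains the observations \(X_{i+1},\ldots ,X_j\) with mean \(\hat{X}_{ij}\), and \(\hat{X}\) is the mean of all n observations. Similarly, after integrating out the change probability p over its prior range \([0, p_0]\), the prior cohesion density can be written as
\[f(\rho) \propto \int _0^{p_0} p^{b-1}(1-p)^{n-b}\, dp.\]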
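The quantities W and B that enter the data term can be computed in a few lines. The sketch below assumes that they denote the within-block and between-block sums of squares, as in Barry and Hartigan’s formulation; the toy series and the function name `within_between` are illustrative.

```python
from typing import List, Sequence, Tuple

def within_between(x: Sequence[float], blocks: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (W, B) for half-open (start, end) blocks that tile x."""
    grand_mean = sum(x) / len(x)
    W = 0.0  # squared deviations of observations from their block mean
    B = 0.0  # size-weighted squared deviations of block means from the grand mean
    for start, end in blocks:
        block = x[start:end]
        block_mean = sum(block) / len(block)
        W += sum((v - block_mean) ** 2 for v in block)
        B += len(block) * (block_mean - grand_mean) ** 2
    return W, B

# Toy series with a level shift after the third observation.
x = [0.1, -0.2, 0.0, 5.1, 4.9, 5.0]
W_one, B_one = within_between(x, [(0, 6)])          # no change point
W_two, B_two = within_between(x, [(0, 3), (3, 6)])  # change point at the shift
```

W + B equals the total sum of squares for any partition, so placing a change point at the level shift moves almost all of the variation from W into B, which is what makes that partition more probable under the model.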
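To calculate the posterior distribution over partitions, we use Markov Chain Monte Carlo (MCMC) [17]. We define a Markov chain with the following transition rule: with probability \(p_i\), a new change point is introduced at location i, where
\[\frac{p_i}{1-p_i} = \frac{\int _0^{p_0} p^{b}(1-p)^{n-b-1}\, dp \ \int _0^{w_0} \frac{w^{b/2}}{(W_1 + B_1 w)^{(n-1)/2}}\, dw}{\int _0^{p_0} p^{b-1}(1-p)^{n-b}\, dp \ \int _0^{w_0} \frac{w^{(b-1)/2}}{(W_0 + B_0 w)^{(n-1)/2}}\, dw}.\]
Here, b is the number of blocks when there is no change point at location i, \(p_0\) and \(w_0\) are the upper bounds of the priors on p and w, and \(B_1, W_1\) and \(B_0, W_0\) refer to the expressions in Equation (12) with and without the change point at location i.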
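The transition rule can be sketched as a sampler that visits each location in turn and draws its indicator from the conditional odds. This is a minimal illustration, not the bcp implementation: it assumes the odds are the ratio of (prior integral × data integral) with and without a change point at location i, with W and B the within-block and between-block sums of squares; the bounds `p0 = w0 = 0.2`, the simple midpoint integrator, and all function names are assumptions made for the example. It is meant for short, non-constant series only.

```python
import random
from typing import Callable, List

def integrate(f: Callable[[float], float], hi: float, steps: int = 200) -> float:
    """Midpoint rule on [0, hi] after substituting v = hi * t**4,
    which places more nodes near 0, where the integrands peak."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += f(hi * t ** 4) * 4.0 * hi * t ** 3
    return total * h

def score(x: List[float], u: List[int], p0: float, w0: float) -> float:
    """Unnormalised posterior mass of the partition encoded by u (u[i] == 1 starts a block)."""
    n = len(x)
    starts = [0] + [i for i in range(1, n) if u[i] == 1]
    ends = starts[1:] + [n]
    b = len(starts)
    grand_mean = sum(x) / n
    W = B = 0.0
    for s, e in zip(starts, ends):
        m = sum(x[s:e]) / (e - s)
        W += sum((v - m) ** 2 for v in x[s:e])
        B += (e - s) * (m - grand_mean) ** 2
    prior = integrate(lambda p: p ** (b - 1) * (1 - p) ** (n - b), p0)
    data = integrate(lambda w: w ** ((b - 1) / 2) / (W + B * w) ** ((n - 1) / 2), w0)
    return prior * data

def sweep(x: List[float], u: List[int], rng: random.Random,
          p0: float = 0.2, w0: float = 0.2) -> None:
    """One pass over locations 1..n-1, resampling each indicator in place."""
    for i in range(1, len(x)):
        u[i] = 1
        s1 = score(x, u, p0, w0)  # with a change point at i
        u[i] = 0
        s0 = score(x, u, p0, w0)  # without a change point at i
        p_i = s1 / (s0 + s1)      # equivalent to p_i / (1 - p_i) = s1 / s0
        u[i] = 1 if rng.random() < p_i else 0

def change_probabilities(x: List[float], sweeps: int = 120,
                         burnin: int = 20, seed: int = 0) -> List[float]:
    """Fraction of post-burn-in sweeps in which each location starts a new block."""
    rng = random.Random(seed)
    u = [0] * len(x)
    counts = [0] * len(x)
    for it in range(sweeps):
        sweep(x, u, rng)
        if it >= burnin:
            counts = [c + ui for c, ui in zip(counts, u)]
    return [c / (sweeps - burnin) for c in counts]

# Toy series with a level shift at index 5.
series = [0.1, -0.2, 0.0, 0.05, -0.1, 5.1, 4.9, 5.0, 5.05, 4.95]
probabilities = change_probabilities(series)
```

Averaging the indicators over sweeps gives a posterior change probability per location, which is the kind of uncertainty estimate that motivates the Bayesian formulation.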
We use the bcp package [13] in R to implement our change-point analysis.