PATTERNS OF URBAN FOOT TRAFFIC DYNAMICS
GREGORY DOBLER†,1,2,3,4, JORDAN VANI4, TRANG TRAN LINH DAM4
October 4, 2019
ABSTRACT
Using publicly available traffic camera data in New York City, we quantify time-dependent patterns in aggregate pedestrian foot traffic. These patterns exhibit
repeatable diurnal behaviors that differ for weekdays and weekends but are broadly
consistent across neighborhoods in the borough of Manhattan. Weekday patterns
contain a characteristic 3-peak structure with increased foot traffic around 9:00am,
12:00-1:00pm, and 5:00pm aligned with the “9-to-5” work day in which pedestrians
are on the street during their morning commute, during lunch hour, and then during
their evening commute. Weekend days do not show a peaked structure, but rather
increase steadily until sunset. Our study period of June 28, 2017 to September 11, 2017
contains two holidays, the 4th of July and Labor Day, and their foot traffic patterns are
quantitatively similar to weekend days despite the fact that they fell on weekdays.
Projecting all days in our study period onto the weekday/weekend phase space (by
regressing against the average weekday and weekend day) we find that Friday foot
traffic can be represented as a mixture of both the 3-peak weekday structure and non-peaked weekend structure. We also show that anomalies in the foot traffic patterns can
be used for detection of events and network-level disruptions. Finally, we show that
clustering of foot traffic time series generates associations between cameras that are
spatially aligned with Manhattan neighborhood boundaries indicating that foot traffic
dynamics encode information about neighborhood character.
1. INTRODUCTION
With the increasing urbanization of global
populations, the understanding of cities as
dynamical systems is becoming increasingly
important for future planning and sustainable
growth of urban environments. And while the
impact that built infrastructures have on
resource consumption and the local to global
environment is dependent on technological
advancement of engineered systems, it is the
dynamical behavioral patterns of the
underlying urban populations that serve as the
primary driver for city functioning. These
patterns of human dynamics in cities show
characteristic macro-/micro-scale behavior [1]
and are seen across many sectors including
energy consumption [2] and transportation [3].
The macro-scale, aggregate behavior in
particular has been shown to exhibit diurnal,
weekly, monthly, and seasonal patterns that
manifest in a variety of ways including lighting
variability [4], taxi pick-ups and drop-offs [5],
social media check-ins [6], and public WiFi
pings [7]. The latter is one example of a tracer of pedestrian foot traffic and human mobility [8-10], a critical aspect of
† gdobler@udel.edu
1 Biden School of Public Policy and Administration, University of Delaware
2 Department of Physics and Astronomy, University of Delaware
3 Data Science Institute, University of Delaware
4 Center for Urban Science and Progress, New York University
understanding pedestrian use of – and flow
through – public spaces [11-17].
Human mobility in cities has been studied in a
variety of contexts that includes origin-destination mapping [18,19], transportation
infrastructure use [20], and pedestrian
behavior [21]. This pedestrian behavior, in
turn, informs key operational and quality of
life indicators in cities including public safety
[22,23] and public health [24,25], and serves as
an input for models of urban planning relating
to the lived experience of cities [26-29].
Methods for assessing pedestrian foot traffic in
cities are numerous and include the
aforementioned WiFi pings [7,30,31] as well
as Bluetooth activity [30,31], thermal and laser
sensing [32], and video-based methods [33-36]. While each of these has advantages and
disadvantages with respect to accuracy and
bias, video/imaging methods in which humans
are detected in single or sequences of images
have the most potential for reasonably accurate
counting and trajectory determination [37-39]
with a minimum number of deployed sensors.
The disadvantages of imaging-based methods
include significantly increased computational
complexity to detect a pedestrian; substantially
higher data rates; and privacy, ethical, and
legal considerations of deployed imaging
systems in public spaces [40,41] that can
potentially identify individuals via facial [42]
or gait [43] recognition⁵. Nevertheless, with appropriately trained and constructed pedestrian detectors as well as appropriately scoped privacy protections [47,48], aggregation of pedestrian detection in street-level imaging has the potential to critically inform a variety of urban disciplines including planning, transportation, and emergency management.
In this paper, we will demonstrate that a trained classifier applied to streams of images from a
⁵ Though WiFi and Bluetooth sensing that captures MAC addresses now also collects Personally Identifiable Information [44] under recently enacted privacy laws in California, USA [45] and the European Union [46].
network of traffic cameras in New York City
(NYC) can be used to measure characteristic
dynamical patterns of foot traffic in dense
urban environments, providing city-scale
information about public space usage. In §2
we describe the data and methodology used to
implement the pedestrian detector, in §3 we
present the observed aggregate patterns of
activity and several anomalous cases, and in §4
we conclude with a discussion of the broader
implications and ancillary urban science that is
accessible from these data.
2. DATA AND METHODOLOGY
2.1 NYC Department of Transportation Traffic
Cameras
The Department of Transportation in the city
of New York (NYCDOT) maintains a network
of traffic cameras that are spread throughout
the city. Although all five boroughs have at
least one traffic camera, the primary
concentration is on the island of Manhattan.
Figure 1 shows the spatial locations of the
camera network, which consists of ~700
cameras for which data is publicly available
[49]. Data from the traffic cameras is streamed
at a rate of roughly 1 frame per second with a
pixel resolution of 240x352. While this frame
rate is largely insufficient for tracking or
trajectory analysis (e.g., [50-52], but see [53]),
it does allow for the instantaneous counting of
vehicles and pedestrians within the field of
view. Several examples of traffic cameras that
are suitable for pedestrian counting as well as
ones that are not are shown in Figure 1.
To generate the results presented in §3, we
created a pipeline for data scraping and
analysis that consisted of downloading a single
image from a camera, counting pedestrians in
that image (see §2.2), discarding the image,
and moving on to the next camera. This
Figure 1 – Top left: the geographic locations of the NYC Department of Transportation public traffic
cameras. The cameras have coverage throughout all 5 boroughs but are concentrated in Manhattan.
Top right: example images showing cameras both with and without views of pedestrian walkways.
The former are the focus of analysis for this paper. Bottom row: an example image and zoomed in
views of the pedestrians in that image demonstrating that the resolution of the traffic camera feeds is
extremely low and does not contain personally identifiable information.
pipeline was run continuously with a
characteristic return time to the first camera of
~80s, implying that our sampling interval per camera is typically >1 minute and that the
counts are offset in time from one camera to
the next. The data presented in this paper was
collected and analyzed between June 28, 2017
and September 11, 2017. Using publicly
available weather data, we exclude days that
have >0.05 inches of precipitation. In total, our
sample contains 31 weekdays (two of which
are the 4th of July and Labor Day holidays) and
9 weekend days.
Details of the accuracy of the pedestrian counts are given below; however, we stress that privacy protections are built into several key aspects of
protections are built into several key aspects of
this work. First, the spatial resolution of the
publicly available imaging is extremely low
and does not allow for the extraction of
personally identifiable information (PII) [44]
including facial or gait recognition, nor is the
resolution sufficient for license plate
identification.
A zoomed-in example of
several images of pedestrians is shown in
Figure 1. Second, our pedestrian detection
model does not use full resolution feature maps
as described in [54] but rather has multiple
instances of down-sampling in the detection
pipeline. In addition, our detector does not rely
on features that contain PII or that can be used
to match one detection to another. Third, we
do not store all downloaded data while running
our pedestrian counting pipeline. Roughly
every 1 out of 1000 images is stored for
consistency and accuracy checks; the rest are
immediately deleted following application of
the pedestrian detector.
Lastly, we are
presenting results from data collected >2 years
in the past, eliminating the possibility that an
individual’s recent location is in the data set.
We note also that all of the results presented
in this paper are aggregated in time.
2.2 Human Detection in Video
The field of pedestrian detection in both still images and video sequences has a long-established history as well as considerable methodological breadth. A complete overview of that
history is beyond the scope of this text (see
[36] and references therein), but efforts to
detect humans in video frames can be broadly
separated into either hand-engineered feature extraction paired with a learned classifier or a deep convolutional neural network (CNN) model for automatic feature identification and classification. The most successful hand-engineered, feature-based models use the
Histogram of Oriented Gradients (HOG) [55]
with a robust classifier [56]; however, most
modern pedestrian detection models are based
on features learned through CNNs.
A significant advancement in the field of object detection, localization, and classification came through the development of Region-CNN (R-CNN) [57], which used hand-engineered features to identify regions of interest, merged regions according to their overlap, and finally applied a CNN classifier to the identified regions. This model
was subsequently improved in two iterations:
Fast R-CNN [58] and Faster R-CNN [59]. The
latter implements a full end-to-end neural
network for region identification and
classification, with training data that consists
of bounding-boxes and labels for objects
within images. This Faster R-CNN model was
used in the data analysis pipeline for this paper
as it was the state-of-the-art at the time that our
data processing pipeline was deployed. Since
the development of Faster R-CNN in 2015,
there have been many further developments of
models both similar [60] and dissimilar to
Faster R-CNN, including those which use a
combination of motion and still features
[39,61], parts-based detectors [62], high
dimensional features [63] and feature cascades
[64], and recent models that use Faster R-CNN
to extract high resolution features but replace
the CNN classifier with tree-based methods
[54].
2.3 Model Training and Performance
Our initial training/testing set was created from
3,918 daytime images that were pulled from 17
cameras on April 30th, 2016 and June 19th,
2016. The images were labelled by hand for
positive pedestrian examples and negative
examples using bounding boxes (BBs) with a
constant aspect ratio of 3:4. These labels were
not exhaustive (i.e., not all pedestrians were
labelled). Across the 3,918 images, 16,022
positive and 41,449 negative examples were
labelled yielding approximately a 2:5
(pos:neg) ratio. The large number of negative examples was necessary for training
in order to capture the complexity of the urban
background. A Faster R-CNN model was
trained on this data using the VGG16 network
structure, a learning rate of 0.0005, a Region
Proposal Network (RPN) batch size of 256, an
RPN positive overlap of 0.7, and a minimum
RPN size of 2x2 (smaller than typical given
our very low resolution images). The network
was trained for a total of 90,000 iterations on a
GeForce GTX 1080 Ti GPU.
Model performance was assessed using a test
set from the original labeled data by feeding
the test set through the network and comparing
centroids of labeled BBs with the BBs
produced by the model through a pairwise
elimination process of labels (points) to
detections (boxes). Negative label centroids
found outside all detection BBs were counted
as true negatives, negative label centroids
Figure 2 – A comparison between the hand-counted number of pedestrians and the output of the
detector (left panels) for several different cameras (right panels) at different times. Boosting the
precision of our models to ensure few false positives results in a systematic undercounting. However,
the figure shows that the true number of pedestrians scales linearly with the detected number for a
given camera, thus allowing us to compare relative amplitudes of the number of pedestrians as a
function of time as shown in Figure 3.
within a detection BB were counted as false
positives, positive label centroids found within
a detection BB were counted as true positives,
and positive label centroids not found within
any detection BB were counted as false
negatives. This process was performed iteratively through the labeled centroids, and detection BBs were removed once associated with a given true positive. A separate
validation set was created using 286 randomly
selected daytime and nighttime images from 3
distinct cameras. This validation set was then
exhaustively labeled for pedestrians and a
similar process of pairwise elimination was
Figure 3 – The standardized number of pedestrians detected in a given camera (rows) in Manhattan
as a function of time of day from 3:00am to 9:00pm, averaged over all weekdays (left) and weekends
(right) in our study period of June 28, 2017 to September 11, 2017. Weekday foot traffic dynamics
display a 3-peak behavior that is strongly aligned with the “9-to-5” workday in which the peaks
correspond to commuting to work, exiting buildings during the lunch hour, then commuting from
work. The weekend dynamics do not show a peaked structure, but rather a steady increase of
pedestrian counts until night time.
used to count true and false positives and
negatives. The final precision and recall for
our model was found from this validation set
to be 92% ± 1.7% (precision) and 43% ± 1.9%
(recall) where the uncertainties represent a
95% confidence interval found via 14-fold
bootstrap resampling without replacement.
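The pairwise-elimination scoring described above can be sketched as follows. This is a minimal illustration, not the authors' code: the exact iteration order and tie-breaking rules, and the toy coordinates and boxes, are our assumptions.

```python
def inside(pt, box):
    """Is point (x, y) inside box (x0, y0, x1, y1)?"""
    x, y = pt
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def precision_recall(pos, neg, detections):
    """Positive centroids inside an unmatched detection box are true
    positives (and consume that box); positives outside all boxes are
    false negatives; negative centroids inside any detection box are
    false positives."""
    boxes = list(detections)
    tp = fn = 0
    for p in pos:
        match = next((b for b in boxes if inside(p, b)), None)
        if match is not None:
            tp += 1
            boxes.remove(match)  # each box matches at most one positive
        else:
            fn += 1
    fp = sum(any(inside(n, b) for b in detections) for n in neg)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy scene: 2 of 3 pedestrians detected, one negative inside a box.
pos = [(5, 5), (20, 20), (40, 40)]
neg = [(6, 6), (60, 60)]
boxes = [(0, 0, 10, 10), (18, 18, 24, 24)]
p, r = precision_recall(pos, neg, boxes)
print(round(p, 2), round(r, 2))  # → 0.67 0.67
```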
While the precision of the model is reasonably
high, the relatively low recall indicates that, in
a given image, we are undercounting the
number of pedestrians. However, as we will
show in §3, our primary interest is in
measuring trends and dynamics and so
absolute numbers are less important than
relative amplitudes over time. In that case, it
suffices to show that the true number of
pedestrians in the field of view of an image
from a given camera scales linearly with the
detected number of pedestrians in that
camera’s image. Figure 2 demonstrates that
this is indeed the case. For the three cameras
shown, we counted the number of pedestrians
in the field of view and find that this number
scales linearly with the detected number,
though the coefficient of proportionality varies
from camera to camera due to both scene and
viewing angle variability.
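The per-camera linear-scaling argument can be illustrated with synthetic numbers (these are not the paper's data; the undercount factor of 2.2 is hypothetical):

```python
import numpy as np

# Synthetic hand counts vs. detector counts for one camera: the detector
# undercounts by a roughly constant factor, so hand ≈ slope * detected.
rng = np.random.default_rng(0)
detected = np.arange(1.0, 21.0)
hand = 2.2 * detected + rng.normal(0.0, 0.5, detected.size)

slope, intercept = np.polyfit(detected, hand, 1)
r = np.corrcoef(detected, hand)[0, 1]
print(r > 0.99)  # → True: tight linear relation despite undercounting
```

A tight linear relation is what licenses comparing relative amplitudes over time even when absolute counts are biased low.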
3. RESULTS
Throughout the rest of this paper, we will
restrict our results to the borough of Manhattan
due to both the high density of NYCDOT
cameras in that borough and the relatively
large number of pedestrians at a given camera
location. The latter ensures that we have
sufficient statistics to identify the patterns
presented below. We also bin the number of
Figure 4 – The same as Figure 3, but averaged over cameras in Manhattan with the width of the line
corresponding to the error on the mean at a given time of day. Our study period included two holidays
that fell on a weekday, the 4th of July and Labor Day. Averaging over Manhattan cameras for those
days shows that holiday foot traffic dynamics display weekend as opposed to weekday behavior.
detections to 15-minute intervals for each
camera to reduce noise prior to standardization
of the time series or averaging across days or
cameras.
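The pre-processing step described above can be sketched as follows (a minimal illustration; the function and variable names are ours, not the pipeline's):

```python
import numpy as np

def bin_and_standardize(timestamps_s, counts, bin_s=900):
    """Sum per-frame detection counts into 15-minute bins over one day,
    then standardize to zero mean and unit variance."""
    edges = np.arange(0, 24 * 3600 + bin_s, bin_s)
    idx = np.digitize(timestamps_s, edges) - 1
    binned = np.array([counts[idx == b].sum() for b in range(len(edges) - 1)])
    return (binned - binned.mean()) / binned.std()

# One synthetic day sampled every ~80 s (the pipeline's return time)
ts = np.arange(0, 24 * 3600, 80)
z = bin_and_standardize(ts, np.ones(ts.size))
print(z.shape)  # → (96,)
```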
3.1 Foot Traffic Dynamics
Figure 3 shows the diurnal variability of
pedestrian counts for each camera in
Manhattan averaged over both weekdays and
weekends during our sample period. Each row
represents a given camera location and the total
counts have been standardized after averaging
across days. Two patterns in the foot traffic
dynamics are clearly apparent: weekdays
exhibit a characteristic 3-peak structure while
weekends show a steady increase throughout
the day. Our interpretation of the 3-peak
weekday behavior is that the cameras are
detecting morning rush as pedestrians walk
through the streets to work, a “lunch hour”
bump during which pedestrians temporarily
exit buildings out into the streets, and an
afternoon rush hour as pedestrians leave work
for home. That this 3-peak structure is so characteristic of weekdays compared to weekends indicates that it is tightly related to work schedules.
To test this hypothesis, we show the average
weekday and weekend across all cameras in
Manhattan in Figure 4 as well as the average
holiday. Over our sample period, there were
two such holidays, July 4th and Labor Day,
which fell on a Tuesday and Monday
respectively in 2017. These holidays (in which
many businesses are closed) do not exhibit the
3-peak structure and align much more closely
with the no-peak weekend structure, providing
evidence that the 3-peak structure is closely
tied to workforce behavior [3,6,7].
However, not all weekdays are identical. As
we show in Figure 5, averaging over cameras
in Manhattan for each day of the week shows
that the transition from 3-peak to no-peak
patterns is gradual as the week progresses.
Monday through Wednesday show clear 3-peak dynamical behavior, and the relative
height of the middle bump becomes less
prominent in the transition towards the
weekend as the structure begins to break down
by Friday. It is important to note that the
Figure 5 – The standardized pedestrian foot traffic averaged over cameras in Manhattan during our
sample period as a function of day of the week. Red vertical lines are shown at 9:00am and 5:00pm
and the gray band covers the 12:00pm-1:00pm lunch hour. The 3-peak weekday pattern is strongest
early in the week, but the peaks decrease in amplitude towards the end of the week as the pattern
transitions to weekend behavior. Friday represents a mixture of weekday and weekend foot traffic
patterns.
majority of our sample period is during
summer days and so it may be that many
workers have curtailed Friday schedules, or
that there is increased tourist activity on
Fridays, which may lead to Fridays exhibiting
characteristically “mixed” weekday/weekend
behavior. This is underscored in Figure 6
where we have regressed each full day of our
sample against the average weekday (Friday
inclusive) and weekend day:
Di = ki K + ei E    (1)
where Di is the time series for a given day i, and K = (1/NK) ∑j Kj and E = (1/NE) ∑j Ej are the average weekday and weekend day respectively. That is, each time series has an associated projection onto the weekday/weekend phase space (ki, ei). Figure
6 shows that, as the week progresses, the time
series of urban foot traffic moves through this
phase space with Fridays representing a
mixture of weekday and weekend behavior.
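The projection of Equation 1 amounts to a two-column least-squares fit, which can be sketched as below. The templates here are synthetic stand-ins for the measured average weekday and weekend profiles, not the paper's data.

```python
import numpy as np

def weekday_weekend_coords(day_series, avg_weekday, avg_weekend):
    """Least-squares coefficients (k_i, e_i) with D_i ≈ k_i*K + e_i*E."""
    A = np.column_stack([avg_weekday, avg_weekend])
    coeffs, *_ = np.linalg.lstsq(A, day_series, rcond=None)
    return coeffs

# Synthetic templates: a 3-peak "weekday" and a steadily rising "weekend"
t = np.linspace(0, 24, 96)
K = np.exp(-(t - 9) ** 2) + np.exp(-(t - 12.5) ** 2) + np.exp(-(t - 17) ** 2)
E = np.clip(t - 6, 0, None) / 18.0

friday = 0.5 * K + 0.5 * E  # an equal weekday/weekend mixture
k, e = weekday_weekend_coords(friday, K, E)
print(round(float(k), 2), round(float(e), 2))  # → 0.5 0.5
```

A day built as an equal mixture lands at (0.5, 0.5) in the phase space, mirroring the intermediate position of Fridays in Figure 6.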
As described above, the two holidays (4th of
July and Labor Day) have high “weekend”
coefficients and low “weekday” coefficients
despite both falling on a weekday.
Interestingly, July 3rd, which fell on a
Monday, shows a mix of weekday/weekend
behavior indicating that many urban
inhabitants did not go to work on the Monday
before the 4th of July and/or that there was
increased tourist activity on that day.
Lastly, Figure 6 also demonstrates the potential
for Di to encode neighborhood character.
Through straightforward K-means clustering
of the weekday foot traffic time series (with 4
clusters that show different relative heights and
positioning of the 3 peaks), we find that
groupings of Manhattan foot traffic dynamics
characteristically trace zonal boundaries in the
borough: the Financial District (cluster 2),
Greenwich Village and the Upper East/West
Side (cluster 3), Midtown (clusters 0 and 1),
and Harlem (cluster 0). This is perhaps not too
surprising given the likelihood that the
structures shown in Figures 3-5 relate to
workforce and tourist behavior, which strongly differentiate these Manhattan neighborhoods.
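The clustering step can be sketched with a minimal k-means (Lloyd's) implementation; this is illustrative only (a library implementation such as scikit-learn's would typically be used, and the toy series below are not camera data):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd's k-means on rows of X (one standardized time series per camera)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # squared Euclidean distance from every series to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0)
                            if (labels == j).any() else centers[j]
                            for j in range(k)])
    return labels, centers

# Toy example: two cameras with one shape of series, two with a clearly
# different shape; k=2 recovers the grouping.
X = np.array([[0.0, 1.0, 0.0], [0.1, 1.0, 0.0],
              [1.0, 0.0, 1.0], [1.0, 0.1, 1.0]])
labels, _ = kmeans(X, 2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```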
3.2 Anomaly and Event Identification
The dynamical patterns of foot traffic
described above have sufficient structure to
identify anomalous outliers and events specific
to a given camera location. In Figure 7 we
show each full weekday (weekend day) in our
sample period with the average weekday
(weekend day) removed. All time series are
averaged over Manhattan cameras. This
Figure 6 – Top: a projection of each day in our study period onto weekday/weekend phase space in
which each day is regressed against the average weekday and weekend foot traffic pattern. The size
of each point is inversely proportional to the amount of rainfall on that day. The trajectory of
weekdays through phase space demonstrates that Fridays represent a mixture of weekday and weekend
behavior while holidays (the 4th of July and Labor Day) are well represented by weekend behavior.
Bottom: Unsupervised clustering of the weekday time series for each camera into 4 clusters results in
associations that spatially align with neighborhood boundaries indicating that foot traffic dynamics
encode neighborhood character.
residual shows strong anomalies on the
holidays of the 4th of July and Labor Day, as
well as a network-wide camera outage on July
27th between 2:15pm and 2:45pm. In addition,
the figure demonstrates that our detection
accuracy is affected by illumination with clear
patterns of under-detection that correlate with
sunset (gray line in the figure), though it is
important to note that the earliest time that
sunset may be affecting our determinations is
Figure 7 – The residual of each day in our sample period (averaged over cameras in Manhattan) minus
the average weekday (top) or weekend (bottom). The sum of squared residuals for each day is shown
in the right panel. The gray lines represent sunrise/sunset time and these residuals show that our
pedestrian detector is affected by lighting after sunset, however this is well after the third peak in the
weekdays shown in Figures 3-5. These residuals do show the anomalous behavior of holidays as well
as a system-wide camera outage for ~30 min on July 26th, 2017.
7:00pm (towards the end of our study period),
significantly later in the day than the observed
3-peak structure, ensuring that the third peak in
the weekday behavior is robust to variation in
sunset time.
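Scoring days by their residual against the matching template, as in Figure 7, can be sketched as follows (synthetic profiles; the variable names are ours):

```python
import numpy as np

def anomaly_scores(days, template):
    """Sum of squared residuals of each day (rows) against a template."""
    return ((days - template) ** 2).sum(axis=1)

rng = np.random.default_rng(1)
template = np.sin(np.linspace(0, np.pi, 96))           # stand-in average day
normal = template + 0.05 * rng.standard_normal((5, 96))  # five ordinary days
holiday = np.linspace(0, 1, 96)                          # weekend-like shape

scores = anomaly_scores(np.vstack([normal, holiday[None, :]]), template)
print(int(scores.argmax()))  # → 5: the holiday-shaped day stands out
```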
Beyond network-wide anomalies, for densely
populated areas, we have sufficient signal-to-noise to detect anomalous (aggregate)
behavioral dynamics at individual locations.
We show two examples in Figure 8. In the
first, the relative number of detected
pedestrians between the hours of 3:00pm and
9:00pm is larger on the 4th of July (a holiday)
than on weekend days for a camera located at
3rd Avenue and 42nd Street. We attribute this
difference to the camera's proximity to a viewpoint for the 4th of July fireworks near the East River waterfront, suggesting that this methodology can be used to detect the onset of crowd formation. The
second example is from a location near the
Staten Island Ferry station in lower Manhattan on a
weekday.
The time series shows the characteristic 3-peak behavior, but
superimposed on that are sharp spikes in the
pedestrian counts at quasi-periodic intervals.
The figure shows that these sharp features are
tightly aligned with the ferry schedule
indicating the utility of this method as a means
of indirectly estimating transportation
ridership.
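The spike/schedule alignment check can be illustrated with a synthetic series (the bin indices, counts, and timetable below are hypothetical): flag 15-minute bins whose counts far exceed the day's baseline and compare them to the ferry timetable.

```python
import numpy as np

counts = np.full(96, 5.0)                # flat baseline over 96 15-min bins
ferry_bins = np.array([32, 36, 40, 44])  # hypothetical hourly arrivals
counts[ferry_bins] += 20.0               # bursts of riders exiting the terminal

threshold = np.median(counts) + 3.0 * counts.std()
spikes = np.flatnonzero(counts > threshold)
print(np.array_equal(spikes, ferry_bins))  # → True
```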
Figure 8 – Examples of event detection in the time series of foot traffic for individual traffic cameras.
Top: The weekdays (gray), weekend days (purple), and 4th of July (red) for the camera located at 3rd
Avenue and 42 Street in Manhattan. As shown in Figure 7, the 4th of July is characteristically more
similar to weekend than weekday behavior, but for this camera we find that the amount of pedestrian
foot traffic increases significantly between the hours of 4:00pm and 8:00pm. We attribute this to the
fact that this camera is near a viewpoint for the NYC 4th of July fireworks display and so this increase
represents a gathering crowd of spectators. Bottom: A camera near the exit of the Staten Island Ferry station in lower Manhattan shows not only the 3-peak weekday pattern, but also sharp spikes
in the number of pedestrians on the street that are tightly aligned with the ferry schedule (vertical gray
lines), suggesting that this methodology can be used to indirectly estimate public transportation
ridership.
4. CONCLUSIONS
We have presented a direct measurement of
patterns in the dynamics of foot traffic in dense
urban areas via the application of object
detection methods to a network of low
resolution traffic cameras in New York City.
For a given camera, we find that the counts
returned by our detector scale linearly with the
actual number of pedestrians in the field of
view allowing us to determine scaled trends in
foot traffic activity for a given location. We
have identified distinctly different foot traffic
behavior on weekends vs weekdays when
averaging across all cameras in the borough of
Manhattan. Weekdays characteristically exhibit a 3-peak structure, the peaks of
which align with the onset of the “9-to-5”
workday, the lunch hour, and the end of the
workday. Weekend behavior typically does
not show any such peaks; rather, foot traffic
gradually increases throughout the day. Foot
traffic behavior on holidays that fall on
weekdays (the 4th of July and Labor Day in our
June 28, 2017 to September 11, 2017 study
period) traces weekend as opposed to weekday
behavior. When aggregating across Manhattan
and across the study period, we find that the 3-peak structure correlated with the work day is
strongest early in the week, but Fridays display
a mixture of weekday and weekend behavior.
This is further demonstrated by tracing the
trajectory of all study days through
weekday/weekend phase space in which each
day is projected onto dimensions representing
the characteristic weekday and weekend
temporal behavior. Finally, we have shown
that there is evidence that unsupervised
clustering of the foot traffic time series results
in clusters that are spatially distinct and trace
the boundaries of neighborhoods in
Manhattan, indicating that foot traffic
dynamics encode elements of neighborhood
character.
Deviations from the established patterns of
activity can be used to detect both system-wide
anomalies and individual events. Examples of
the former include weekday holidays that are
clear outliers in weekend/weekday phase space
as well as system-wide camera outages that
present as simultaneous non-detection of
pedestrians across all cameras. We have
shown examples of event detection that
include transitions to crowd behavior for
fireworks displays on the 4th of July and
recurrent bursts of detections near public
transportation (PT) points that temporally
align with PT schedules, corresponding to
riders entering/exiting PT access points.
It is important to note that we have taken
numerous steps to ensure the privacy of
pedestrians in public within the field of view
of the traffic cameras. First, the imaging data
that was analyzed for this work was of
sufficiently low resolution that it contained no
personally identifiable information (PII) such
as license plates or faces. Second, the analysis
was done in near-real time with data discarded
immediately after pedestrian counts were
performed so that continuous data streams
have not been stored. Third, the pedestrian
detections are completely anonymous and
there is no element of the analysis that attempts
to either match a detection across multiple
cameras or across the same camera in time
(e.g., tracking was not performed). Finally, we
are presenting the results of this study with a
two year delay.
The measurements of city-scale foot traffic
dynamics presented here have potential
impacts for a wide range of urban operations
including emergency management, situational
awareness, transportation efficiency, planning
of built environment interventions, and
connectedness of urban communities. The
scaling and deviation of these patterns across the city, and their correlations with land use and neighborhood characteristics, will be the subject of future work.
Acknowledgements
We thank Denis Khryashchev, Priya Kohker,
Alec McLean, Richard Nam, and Bilguun
Turboli for assistance generating the training
set. GD, JV, and TTLD were supported by a
Complex Systems Scholar Award from the
James S. McDonnell Foundation.
References
1. Batty, M., 2008. Fifty years of urban modeling: Macro-statics to micro-dynamics. In The dynamics of
complex urban systems (pp. 1-20). Physica-Verlag HD.
2. Huebner, G., Shipworth, D., Hamilton, I., Chalabi, Z. and Oreszczyn, T., 2016. Understanding
electricity consumption: A comparative contribution of building factors, socio-demographics,
appliances, behaviours and attitudes. Applied energy, 177, pp.692-702.
3. Faghih-Imani, A., Hampshire, R., Marla, L. and Eluru, N., 2017. An empirical analysis of bike sharing
usage and rebalancing: Evidence from Barcelona and Seville. Transportation Research Part A: Policy
and Practice, 97, pp.177-191.
4. Dobler, G., Ghandehari, M., Koonin, S.E., Nazari, R., Patrinos, A., Sharma, M.S., Tafvizi, A., Vo, H.T.
and Wurtele, J.S., 2015. Dynamics of the urban lightscape. Information Systems, 54, pp.115-126.
5. Ferreira, N., Poco, J., Vo, H.T., Freire, J. and Silva, C.T., 2013. Visual exploration of big spatio-temporal urban data: A study of New York City taxi trips. IEEE Transactions on Visualization and
Computer Graphics, 19(12), pp.2149-2158.
6. Liu, Y., Liu, X., Gao, S., Gong, L., Kang, C., Zhi, Y., Chi, G. and Shi, L., 2015. Social sensing: A new
approach to understanding our socioeconomic environments. Annals of the Association of American
Geographers, 105(3), pp.512-530.
7. Traunmueller, M.W., Johnson, N., Malik, A. and Kontokosta, C.E., 2018. Digital footprints: Using
WiFi probe and locational data to analyze human mobility trajectories in cities. Computers,
Environment and Urban Systems, 72, pp.4-12.
8. Gonzalez, M.C., Hidalgo, C.A. and Barabasi, A.L., 2008. Understanding individual human mobility
patterns. Nature, 453(7196), p.779.
9. Song, C., Qu, Z., Blumm, N. and Barabási, A.L., 2010. Limits of predictability in human mobility.
Science, 327(5968), pp.1018-1021.
10. Kang, C., Ma, X., Tong, D. and Liu, Y., 2012. Intra-urban human mobility patterns: An urban
morphology perspective. Physica A: Statistical Mechanics and its Applications, 391(4), pp.1702-1717.
11. Moudon, A.V., Hess, P.M., Snyder, M.C. and Stanilov, K., 1997. Effects of site design on pedestrian
travel in mixed-use, medium-density environments. Transportation Research Record, 1578(1), pp.48-55.
12. Khan, F.M., Jawaid, M., Chotani, H. and Luby, S., 1999. Pedestrian environment and behavior in
Karachi, Pakistan. Accident Analysis & Prevention, 31(4), pp.335-339.
13. Ewing, R., Hajrasouliha, A., Neckerman, K.M., Purciel-Hill, M. and Greene, W., 2016. Streetscape
features related to pedestrian activity. Journal of Planning Education and Research, 36(1), pp.5-15.
14. Ridel, D., Rehder, E., Lauer, M., Stiller, C. and Wolf, D., 2018, November. A literature review on the
prediction of pedestrian behavior in urban scenarios. In 2018 21st International Conference on
Intelligent Transportation Systems (ITSC) (pp. 3105-3112). IEEE.
15. Lai, Y. and Kontokosta, C.E., 2018. Quantifying place: Analyzing the drivers of pedestrian activity in
dense urban environments. Landscape and Urban Planning, 180, pp.166-178.
16. Leinberger, C.B. and Rodriguez, M., 2016. Foot Traffic Ahead: Ranking Walkable Urbanism in
America's Largest Metros 2016.
17. Schadschneider, A. and Seyfried, A., 2011. Empirical results for pedestrian dynamics and their implications for modeling. Networks & Heterogeneous Media, 6(3), pp.545-560.
18. Xia, F., Wang, J., Kong, X., Wang, Z., Li, J. and Liu, C., 2018. Exploring human mobility patterns in
urban scenarios: A trajectory data perspective. IEEE Communications Magazine, 56(3), pp.142-149.
19. Alexander, L., Jiang, S., Murga, M. and González, M.C., 2015. Origin–destination trips by purpose and time of day inferred from mobile phone data. Transportation Research Part C: Emerging Technologies, 58, pp.240-250.
20. Hasan, S., Schneider, C.M., Ukkusuri, S.V. and González, M.C., 2013. Spatiotemporal patterns of
urban human mobility. Journal of Statistical Physics, 151(1-2), pp.304-318.
21. Phithakkitnukoon, S., Horanont, T., Di Lorenzo, G., Shibasaki, R. and Ratti, C., 2010, August. Activity-aware map: Identifying human daily activity pattern using mobile phone data. In International Workshop on Human Behavior Understanding (pp. 14-25). Springer, Berlin, Heidelberg.
22. Nikolopoulou, M., Martin, K. and Dalton, B., 2016. Shaping pedestrian movement through playful interventions in security planning: what do field surveys suggest? Journal of Urban Design, 21(1), pp.84-104.
23. Schuck, A.M., 2017. Prevalence and predictors of surveillance cameras in law enforcement: the importance of stakeholders and community factors. Criminal Justice Policy Review, 28(1), pp.41-60.
24. Handy, S.L., Boarnet, M.G., Ewing, R. and Killingsworth, R.E., 2002. How the built environment affects physical activity: views from urban planning. American Journal of Preventive Medicine, 23(2), pp.64-73.
25. Heath, G.W., Brownson, R.C., Kruger, J., Miles, R., Powell, K.E. and Ramsey, L.T., 2006. The effectiveness of urban design and land use and transport policies and practices to increase physical activity: a systematic review. Journal of Physical Activity and Health, 3(s1), pp.S55-S76.
26. Mehta, V., 2007. Lively streets: Determining environmental characteristics to support social behavior. Journal of Planning Education and Research, 27(2), pp.165-187.
27. Mondada, L., 2009. Emergent focused interactions in public places: A systematic analysis of the multimodal achievement of a common interactional space. Journal of Pragmatics, 41(10), pp.1977-1997.
28. Mehta, V. and Bosson, J.K., 2018. Revisiting lively streets: Social interactions in public space. Journal
of Planning Education and Research, p.0739456X18781453.
29. Roberts, H., McEachan, R., Margary, T., Conner, M. and Kellar, I., 2018. Identifying effective behavior
change techniques in built environment interventions to increase use of green space: a systematic
review. Environment and Behavior, 50(1), pp.28-55.
30. Abedi, N., Bhaskar, A. and Chung, E., 2013. Bluetooth and Wi-Fi MAC address based crowd data
collection and monitoring: benefits, challenges and enhancement.
31. Kurkcu, A. and Ozbay, K., 2017. Estimating pedestrian densities, wait times, and flows with wi-fi and
bluetooth sensors. Transportation Research Record, 2644(1), pp.72-82.
32. Wang, X., Liono, J., Mcintosh, W. and Salim, F.D., 2017, November. Predicting the city foot traffic
with pedestrian sensor data. In Proceedings of the 14th EAI International Conference on Mobile and
Ubiquitous Systems: Computing, Networking and Services (pp. 1-10). ACM.
33. Dollar, P., Wojek, C., Schiele, B. and Perona, P., 2011. Pedestrian detection: An evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4), pp.743-761.
34. Torabi, A., Massé, G. and Bilodeau, G.A., 2012. An iterative integrated framework for thermal–visible
image registration, sensor fusion, and people tracking for video surveillance applications. Computer
Vision and Image Understanding, 116(2), pp.210-221.
35. Kristoffersen, M., Dueholm, J., Gade, R. and Moeslund, T., 2016. Pedestrian counting with occlusion
handling using stereo thermal cameras. Sensors, 16(1), p.62.
36. Zhao, Z.Q., Zheng, P., Xu, S.T. and Wu, X., 2019. Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems.
37. Biliotti, D., Antonini, G. and Thiran, J.P., 2005, January. Multi-layer hierarchical clustering of
pedestrian trajectories for automatic counting of people in video sequences. In 2005 Seventh IEEE
Workshops on Applications of Computer Vision (WACV/MOTION'05)-Volume 1 (Vol. 2, pp. 50-57).
IEEE.
38. Antonini, G. and Thiran, J.P., 2006. Counting pedestrians in video sequences using trajectory
clustering. IEEE Transactions on Circuits and Systems for Video Technology, 16(8), pp.1008-1020.
39. Xu, Y., Piao, Z. and Gao, S., 2018. Encoding crowd interaction with deep neural network for pedestrian
trajectory prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (pp. 5275-5284).
40. Adams, A.A. and Ferryman, J.M., 2015. The future of video analytics for surveillance and its ethical
implications. Security Journal, 28(3), pp.272-289.
41. Wright, D., Friedewald, M., Gutwirth, S., Langheinrich, M., Mordini, E., Bellanova, R., De Hert, P.,
Wadhwa, K. and Bigo, D., 2010. Sorting out smart surveillance. Computer Law & Security Review,
26(4), pp.343-354.
42. Denyer, S., 2018. Beijing bets on facial recognition in a big drive for total surveillance. The Washington Post, January 7.
43. Lee, L. and Grimson, W.E.L., 2002, May. Gait analysis for recognition and classification. In Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition (pp. 155-162). IEEE.
44. McCallister, E., 2010. Guide to protecting the confidentiality of personally identifiable information.
Diane Publishing.
45. California Consumer Privacy Act of 2018, 2018. Assembly Bill No. 375, Chapter 55.
46. Tikkinen-Piri, C., Rohunen, A. and Markkula, J., 2018. EU General Data Protection Regulation:
Changes and implications for personal data collecting companies. Computer Law & Security Review,
34(1), pp.134-153.
47. Chinomi, K., Nitta, N., Ito, Y. and Babaguchi, N., 2008, January. PriSurv: privacy protected video
surveillance system using adaptive visual abstraction. In International Conference on Multimedia
Modeling (pp. 144-154). Springer, Berlin, Heidelberg.
48. Chan, A.B., Liang, Z.S.J. and Vasconcelos, N., 2008, June. Privacy preserving crowd monitoring:
Counting people without people models or tracking. In 2008 IEEE Conference on Computer Vision
and Pattern Recognition (pp. 1-7). IEEE.
49. https://webcams.nyctmc.org, last accessed 2019-10-05.
50. Li, C., Chiang, A., Dobler, G., Wang, Y., Xie, K., Ozbay, K., Ghandehari, M., Zhou, J. and Wang, D.,
2016, August. Robust vehicle tracking for urban traffic videos at intersections. In 2016 13th IEEE
International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 207-213).
IEEE.
51. Li, C., Dobler, G., Feng, X. and Wang, Y., 2019. TrackNet: Simultaneous Object Detection and
Tracking and Its Application in Traffic Video Analysis. arXiv preprint arXiv:1902.01466.
52. Ciaparrone, G., Sánchez, F.L., Tabik, S., Troiano, L., Tagliaferri, R. and Herrera, F., 2019. Deep
Learning in Video Multi-Object Tracking: A Survey. arXiv preprint arXiv:1907.12740.
53. Wei, P., Shi, H., Yang, J., Qian, J., Ji, Y. and Jiang, X., 2019, September. City-scale vehicle tracking
and traffic flow estimation using low frame-rate traffic cameras. In Proceedings of the 2019 ACM
International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019
ACM International Symposium on Wearable Computers (pp. 602-610). ACM.
54. Zhang, L., Lin, L., Liang, X. and He, K., 2016, October. Is Faster R-CNN doing well for pedestrian detection? In European Conference on Computer Vision (pp. 443-457). Springer, Cham.
55. Dalal, N. and Triggs, B., 2005, June. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (Vol. 1, pp. 886-893). IEEE.
56. Ohn-Bar, E. and Trivedi, M.M., 2016, December. To boost or not to boost? On the limits of boosted trees for object detection. In 2016 23rd International Conference on Pattern Recognition (ICPR) (pp. 3350-3355). IEEE.
57. Girshick, R., Donahue, J., Darrell, T. and Malik, J., 2014. Rich feature hierarchies for accurate object
detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and
pattern recognition (pp. 580-587).
58. Girshick, R., 2015. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440-1448).
59. Ren, S., He, K., Girshick, R. and Sun, J., 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (pp. 91-99).
60. Zhao, Z.Q., Bian, H., Hu, D., Cheng, W. and Glotin, H., 2017, August. Pedestrian detection based on Fast R-CNN and batch normalization. In International Conference on Intelligent Computing (pp. 735-746). Springer, Cham.
61. Shao, J., Kang, K., Change Loy, C. and Wang, X., 2015. Deeply learned attributes for crowded scene
understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp.
4657-4666).
62. Tian, Y., Luo, P., Wang, X. and Tang, X., 2015. Deep learning strong parts for pedestrian detection. In
Proceedings of the IEEE international conference on computer vision (pp. 1904-1912).
63. Zhang, J., Tan, B., Sha, F. and He, L., 2011. Predicting pedestrian counts in crowded scenes with rich
and high-dimensional features. IEEE Transactions on Intelligent Transportation Systems, 12(4),
pp.1037-1046.
64. Cai, Z., Saberian, M. and Vasconcelos, N., 2015. Learning complexity-aware cascades for deep
pedestrian detection. In Proceedings of the IEEE International Conference on Computer Vision (pp.
3361-3369).