Environmental Modelling and Software 134 (2020) 104832

Contents lists available at ScienceDirect

Environmental Modelling and Software


journal homepage: http://www.elsevier.com/locate/envsoft

The AirSensor open-source R-package and DataViewer web application for interpreting community data collected by low-cost sensor networks

Brandon Feenstra a, b, c, *, Ashley Collier-Oxandale a, Vasileios Papapostolou a, David Cocker b, c, Andrea Polidori a

a South Coast Air Quality Management District, Air Quality Sensor Performance Evaluation Center (AQ-SPEC), Diamond Bar, CA, 91765, USA
b University of California - Riverside, Department of Chemical & Environmental Engineering, Riverside, CA, 92521, USA
c University of California - Riverside, Bourns College of Engineering, Center for Environmental Research and Technology (CE-CERT), Riverside, CA, 92521, USA

ARTICLE INFO

Keywords:
Community air monitoring
Citizen scientist
Low-cost air quality sensor
Open-source R package
Particulate matter PM2.5
Data interpretation

ABSTRACT

While large-scale low-cost sensor networks are now recording air pollutant concentrations at finer spatial and temporal scales than previously measured, the large environmental data sets generated by these sensor networks can become overwhelming when considering the scientific skills required to analyze the data and generate interpretable results. This paper summarizes the development of an open-source R package (AirSensor) and interactive web application (DataViewer) designed to address the environmental data science challenges of visualizing and understanding local air quality conditions with community networks of low-cost air quality sensors. AirSensor allows users to access historical data, add spatial metadata, and create maps and plots for viewing community monitoring data. The DataViewer application was developed to incorporate the functionality and plotting functions of the R package into a user-friendly web experience that would serve as the primary source for data communication for community-based organizations and citizen scientists.

* Corresponding author. 21865 Copley Dr. Diamond Bar, CA, 91765, USA.
E-mail address: bfeenstra@aqmd.gov (B. Feenstra).

https://doi.org/10.1016/j.envsoft.2020.104832
Accepted 6 August 2020
Available online 25 August 2020
1364-8152/© 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Software availability

The AirSensor R-package version 0.5 was developed by Mazama Science and South Coast AQMD. AirSensor is Free and Open Source Software available through the GitHub repository [https://github.com/MazamaScience/AirSensor/tree/version-0.5]. Mazama Science maintains the package as part of its ongoing relationships with federal, state and local air quality agencies. AirSensor version 0.5 was first released in 2019 under General Public License v3.0 (GPL-3.0) and runs on Windows, Unix, and Macintosh operating systems. AirSensor was written in R and program files are less than 5 Mbytes. AirSensor is designed to be used with R (≥ 3.3) and RStudio.

The DataViewer Shiny application was developed by Mazama Science and South Coast AQMD. DataViewer is Free and Open Source Software available through the GitHub repository [https://github.com/MazamaScience/AirSensorShiny]. The DataViewer was first released in 2019 under General Public License v3.0 (GPL-3.0) and runs on Windows, Unix, and Macintosh operating systems. DataViewer was written in R and program files are less than 7 Mbytes. The DataViewer requires Git, Apache, Docker, R, and R Shiny Server.

1. Introduction

A paradigm shift in air quality monitoring is occurring with citizen scientists able to develop hyper-local community monitoring networks to supplement the established regulatory monitoring networks that are designed for regional monitoring (Snyder et al., 2013). These environmental monitoring networks are increasing in complexity, size, and resolution (both spatial and temporal) due to technological advances and cost reductions for environmental monitoring hardware, connected Internet of Things (IoT) devices, and cloud computing. Citizen scientists can take an active role in monitoring air quality at the neighborhood level by installing low-cost air quality sensors (LCS) that collect and report air pollutant data. Particulate matter (PM) is an air pollutant that is categorized based on size with fine particulate matter (PM2.5) defined as particles with aerodynamic diameter less than 2.5 μm. The ability to record and visualize hyper-local data in an intuitive and informative interface will likely spawn an increase in interest and interaction with environmental data sets due to the locally relevant nature of the information. On the other hand, non-intuitive or limited user interfaces and confusing user experiences may discourage citizen scientists from

interacting with the collected data. The increasing complexity, size, and resolution of today’s environmental monitoring networks have created big data challenges leading to the emergence of a new field of study: Environmental Data Science (Gibert et al., 2018). Data science combines computer programming skills, math and statistical knowledge, and subject matter expertise (Conway, 2013). Free Open Source Software (FOSS) platforms play a vital role in the progress of research towards developing new methods for addressing environmental data science challenges. The R-environment and Python are two FOSS programming languages that are often used in environmental data science applications (Kadiyala and Kumar 2017a, 2017b). Open access to environmental data sets and related tools is foundational for environmental data science to thrive and develop. Environmental monitoring data can be considered open access when the data is available through a stable and consistent Application Programming Interface (API) that allows software and application developers to build applications to display and report that data in transparent and meaningful ways.

Environmental data scientists can access regulatory data via open APIs (e.g., AirNow API, OpenAQ API) to create custom web applications for displaying air monitoring (AM) data (AirNow, 2020; OpenAQ, 2020). These AM data viewing websites are useful and provide information to the public at varying granularity spatially and temporally. Two examples of data viewing websites include the OpenAQ map and the World’s Air Pollution: Real-time AQI (WAQI) map which both display international air quality monitoring data (OpenAQ, 2020; World Air Quality Index Project, 2020). OpenAQ uses a color scale (Fig. S1 in the Supplemental Information (SI)) that deviates from the common Air Quality Index (AQI) color scale to display air pollution concentrations. A special feature in the WAQI website is their use of calendar plots to display AM information. Data viewing websites that display modeled or interpolated air pollutant or AQI values are also available (BreezoMeter, 2020; IQAir, 2020; Plume Labs, 2020). When displaying data from both regulatory-grade instruments and LCS, the source and type of data displayed should be readily apparent. A lack of differentiating and identifying data sources may cause confusion for the end-user, especially if the LCS do not agree with nearby regulatory-grade instrumentation. With interpolated or modeled maps, often the user is not readily aware of the input parameters used to model air quality data. When viewing modeled air pollution information, the viewer should be cautious especially when data sources are not readily apparent and input parameters, whether defendable or questionable, for the data model are unknown to the end-user/viewer (Hagler et al., 2018). Broadly, the available sensor data viewing platforms are map-centric with point values or interpolated modeled data displayed with options for viewing recent time series data.

Resources for accessing and displaying data collected from networks of LCS are available, though they vary in terms of software (FOSS or proprietary), what they provide, and whether they are provided by the manufacturer, a project team, or through a citizen science model. While many sensor manufacturers have software and platforms in place for ingesting, storing, and analyzing data that is generated from their respective sensors, these are often proprietary and offered as a Software as a Service (SaaS) or Platform as a Service (PaaS) requiring accounts with monthly or annual subscription costs. In contrast to the SaaS and PaaS business model, several sensor resources are available for open-access viewing of data collected from LCS networks. These platforms include but are not limited to the HabitatMap AirCasting map, Air Quality Egg Portal, Luft Daten project map, PurpleAir Map, Smart Citizen Kit Map, and the uRADMonitor Network map (Air Quality Egg, 2020; HabitatMap, 2020; Luftdaten, 2020; PurpleAir, 2020; Smart Citizen Kit, 2020; uRADMonitor, 2020). PurpleAir provides open access to the data collected by the PurpleAir network of sensors through an API and provides open viewing and downloading of sensor data through the PurpleAir map. The Luft Daten project is a citizen science project with LCS reporting to a map and invites programmers to collaborate in this FOSS development through GitHub (OK Lab Stuttgart, 2020). When selecting a sensor in either the PurpleAir or Luftdaten GUI, the user is currently limited to viewing only the last seven days of data in a time series plot and current data on the map (accessed January 2020). To gain an understanding of the historical local AM data, the user is required to download, process, and visualize the data from these networks on their own, which may be a limiting factor to those without the environmental data science skills needed to perform such analysis. These sensor-specific online resources for viewing sensor data often do not include the regulatory AM data that may be publicly available through the AirNow or OpenAQ API and often do not indicate what, if any, quality control (QC) measures are taking place on the collected data before displaying publicly.

For community members to understand local air pollution trends, a more in-depth analysis of historical data is required. While map-centric GUIs work well for viewing real-time data, communities that monitor air quality in long-term deployments need additional plotting and viewing capabilities to access and understand their local historical AM data. A data dashboard for viewing and analyzing historical data would provide citizen scientists with a better understanding of local air pollution levels, particularly spatial and temporal air pollution trends. For those with varying levels of technical data science programming skills, several software resources are available that support individual data analysis of air quality data. If data can be organized and loaded into a software system, then a more in-depth analysis can occur, and custom visualizations can be produced. FOSS software packages have been developed in the R and Python environments specifically for accessing and visualizing freely available AM data. These include the R packages openair, PWFSLSmoke, ropenaq, and raqdm. OpenAir provides a useful package for developing visualizations from collected AM data with functions to create calendar plots, scatter plots, and time variation plots along with wind roses, pollution roses, and bivariate polar plots if wind speed and direction data is available (Carslaw and Ropkins, 2012; Carslaw and Beevers, 2013).

If we use advanced analytical tools and access AM data directly, then we can facilitate more organized, robust, systematic, and repeatable data processing, analysis, and visualization of LCS data. Furthermore, using FOSS tools allows for increased iteration and development. An example of this workflow would be the PWFSLSmoke R package and the associated PM2.5 AM web application developed as part of the AirFire tools by the U.S. Forest Service (USFS) Wildland Fire Air Quality Response Program (WFAQRP) (Callahan et al., 2019; Air Fire Tools, 2020). These tools were developed to access regulatory grade AM data via the AirNow API and display that data graphically to assist the USFS Air Resource Advisors to gather air quality data and create air quality reports during wildfire smoke events. The PWFSLSmoke R package provides functions to download, parse, and plot AM data and provides the back-end software necessary to generate plots for displaying on the front-end web application. A similar model in which an R-package is used for accessing and processing LCS data would save users time and would allow the development of custom functions for different approaches to QC and more complex historical data analysis, which are gaps we see in the current offerings. Additionally, the R-package could provide the back-end software to support a front-end web application to display historical AM data to provide communities with more useful analysis and visualizations of historical data. This web application would allow community members to answer questions about their local environment, which are not readily answered with the current offerings of real-time maps with limited historical data analysis.

The objectives of the software development associated with this project were to build an FOSS R package and data viewer web application that would address the challenges identified with the data management and visualization of LCS networks deployed within the U.S. EPA Science To Achieve Results (STAR) grant project. This paper summarizes the development of an R package and web application designed to address the environmental data science challenges created by deploying 400+ LCS in 14 different communities. We wanted an open source R package that would allow users to download sensor data, add


spatial metadata, perform data fusion with other relevant data sets, and create maps and plots for viewing data collected by AM sensors. We also wanted the package designed with functions so that minimal coding would be required to complete tasks. Understanding that many would prefer to interact with an online web application, we wanted to build an application that would provide an interactive data experience allowing users to make selections and explore the community AM data sets by generating pre-defined data visuals based on their user input selections. The South Coast Air Quality Management District (South Coast AQMD) collaborated with Mazama Science to develop the R package AirSensor and web application AirSensor DataViewer (DataViewer) to meet these software development aims.

2. Methods (Software design and characteristics)

2.1. Community engagement

In 2016, South Coast AQMD was awarded a U.S. EPA STAR grant, titled “Engage, Educate and Empower California Communities on the Use and Applications of ‘Low-cost’ Air Monitoring Sensors” under Assistance Agreement No. R836184. South Coast AQMD has engaged 14 California communities through a series of workshops to introduce the project, provide technical guidance on sensor technology and deployment (siting, installation, configuration, and registration) of air quality sensors, review deployment progress and examine community data sets, and provide software tools and resources for citizen scientists to engage with collected data sets and create informative data visualizations. Roughly 400 PurpleAir PA-II sensors (PurpleAir LLC, USA) were distributed to community members. The on-going engagement with the STAR Grant Sensor Communities (SGSC) has provided the motivation to develop software tools to enhance the community members’ ability to interact with historical data and extract meaningful information about their local environment. Participants were not engaging with the data that often (as is supported by the survey data, in which most respondents only check their air quality data “sometimes” - 36% as opposed to “often” - 17% and “everyday” - 5%). In person discussions provided useful context to help us understand this by (1) reporting that data was difficult to access and download (especially historic data), and (2) sharing what they wished to do with the data. For example, after displaying a static time of day bar chart showing the diurnal PM2.5 trends during a community workshop, one community group leader asked, “How do I generate that plot on a regular basis and share with my community members?” In one SGSC, a sensor host wanted to know the best time of day to walk their dog to reduce their exposure to particulate pollution. Additionally, multiple participants from different communities shared their difficulty downloading and analyzing the publicly accessible PA-II data especially with regards to the time/date reformatting required for plotting in Microsoft Excel. The survey responses along with the discussions with community members on the data science challenges provided the motivation to build additional software tools to address the difficulty and challenges posed by analyzing these large community AM data sets. Increasing the number of data-sharing events with effective data visualizations should provide participants with a better understanding of the principles of air quality, their local air pollution, and the proper use and application of LCS (Sandhaus et al., 2019). Table S1 in the SI provides a summary of the environmental data science challenges that are addressed in this project.

2.2. Software tools (R environment, RStudio, R packages, and Shiny)

The R environment is an integrated suite of software facilities that is designed on a simple yet effective computer programming language, R. The R environment provides tools and functions for data processing, storage, calculation, and graphical display. Since R is designed essentially on a computer programming language, users are able to add further functionality to existing packages by defining new functions and developing packages of functions (The R environment 2019). RStudio, a public benefit corporation, provides a FOSS version of an Integrated Development Environment (IDE) for R which supports code execution, debugging, and workspace management (RStudio, 2019; Allaire, 2020). Instructions for installing R and RStudio can be found on the web and in the literature (Kadiyala and Kumar 2017b). The fundamental unit of shareable code in R is a package. Packages bundle together R code, data, documentation, and tests. Packages are sharable on the Comprehensive R Archive Network (CRAN), which is the public clearing house for R packages. CRAN hosts a wide variety of FOSS packages that allow researchers to collaborate and build upon already developed R code. The development of AirSensor built upon R packages available on CRAN; most notably MazamaSpatialUtils, openair, PWFSLSmoke, and worldmet. AirSensor is designed to be used with R version ≥3.3. This paper describes version 0.5 of the AirSensor package which is available on GitHub. The latest or master branch of AirSensor is also available on GitHub. The AirSensor package can be installed using the devtools package within R using the following code:

Shiny is a FOSS R package that provides a framework for building interactive web applications. Shiny allows the user to turn R derived analysis and plots into interactive web applications without requiring HTML, CSS, or JavaScript programming. Shiny allows for the development of a web application for viewing and sharing data analytics. Since not all users would be comfortable using the R environment which does require coding, R Shiny was used to develop the DataViewer web application to provide an interactive data experience for community members that would prefer to interact with the sensor data in a web application rather than in the R programming environment.

2.3. AirSensor - R package

Rather than describing each individual function in AirSensor, the following examples will showcase the three primary data objects available through the package, how to apply quality control measures on the imported data, and how to generate plots for each of the data objects. A complete guide to AirSensor functions and operations can be found within the R-environment after the package has been loaded. Helpful R vignettes are also available within the package to provide the user with code examples for using the AirSensor functions and working with the sensor data.

2.3.1. Data access, extraction, and data objects overview

AirSensor currently accesses data generated by PurpleAir sensors by collecting real-time data from www.purpleair.com/json and historical data from a ThingSpeak Representational State Transfer (REST) API. Extracted data is enhanced with spatial metadata and transformed into efficient data objects for downstream analytics. The three primary data objects are the Purple Air Synoptic (PAS), Purple Air Timeseries (PAT), and AirSensor (sensor) data objects. Functions exist for creating or loading data objects as well as manipulating and visualizing them. An overview of the AirSensor R package data access, data objects, and functions is provided in Fig. 1. After installing or loading the package, a data archive repository can be set to access archived data. Data archives can be created for specific sensor networks (e.g., SGSC) or for a specific geographic area (e.g., Southern California) so that the R user can access and load historical data more efficiently from an archive rather than the ThingSpeak REST API. The data archives developed for the SGSC are kept current with cron jobs (cron jobs are time-based jobs that can run commands at specific time intervals) that are scheduled to run every hour to pull and add the most recent data to the archive. The data archive for the SGSC is accessible at http://smoke.mazamascience.com/data/PurpleAir and includes historical data starting from October 01,


Fig. 1. Flow Chart for data flow and functionality of the AirSensor R package.
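
As a minimal sketch of the first steps in Fig. 1, the installation and archive configuration described in Sections 2.2 and 2.3.1 might be scripted as shown here. The helper names install_airsensor() and use_sgsc_archive() are hypothetical wrappers introduced only for illustration, so that nothing is downloaded until they are called. The ref value is an assumption taken from the version-0.5 repository path given under Software availability, and setArchiveBaseUrl() is assumed to be the AirSensor function that sets the base archive.

```r
# Hypothetical wrappers; nothing is installed or contacted until they are called.

install_airsensor <- function(ref = "version-0.5") {
  # devtools is needed for installing directly from GitHub
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # 'ref' is assumed from the repository path .../AirSensor/tree/version-0.5
  devtools::install_github("MazamaScience/AirSensor", ref = ref)
}

use_sgsc_archive <- function() {
  # Assumed AirSensor call; the URL is the SGSC archive quoted in Section 2.3.1
  AirSensor::setArchiveBaseUrl("http://smoke.mazamascience.com/data/PurpleAir")
}
```

Calling install_airsensor() once and use_sgsc_archive() at the start of each session would then correspond to the "install" and "set archive" steps of the flow chart.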

2017. A base archive can be set in AirSensor by the following code:

2.3.2. Purple Air Synoptic - Data object

The Purple Air Synoptic (PAS) data object provides an instantaneous view of the measured values from a network of sensors. A PAS can be created from the JSON data available at www.purpleair.com/json or can be loaded by accessing a data archive (Fig. 1). At the time of this writing, the time resolution of the PA-II sensors is 120 s and therefore a new PAS data object would be available roughly every 120 s. The available functions for manipulating PAS data objects include pas_filter(), pas_filterArea(), and pas_filterNear(). The PAS data can be plotted on a map to display the instantaneous data collected by the sensor network with the pas_leaflet() and pas_staticMap() functions. Fig. 2 shows a PAS data object displayed on an interactive map using the pas_leaflet() function which maps sensor locations and colors the locations according to AQI. The map is interactive in that the user can select an individual sensor and view the values recorded at that location for the time the PAS object was created. If a user is interested in loading specific states or air districts, the user can apply filters when generating the PAS data object. The leaflet map can be modified with options for map tiles, parameter displayed, and what type of sensors to display (i.e., inside or outside sensors). Fig. 2 was produced by the following two lines of code:

2.3.3. Data fusion enhancements

Data fusion with other relevant data sources provides benefits for custom analytics, for performing data quality checks, and for providing information on local weather conditions. Data fusion provides the ability to tell a more complete story about local air pollution by fusing collected sensor data with other publicly available data sets. AirSensor has been integrated with the PWFSLSmoke R package for access to regulatory AM data via the AirNow API and integrated with the worldmet R package for access to the U.S. National Oceanic and Atmospheric Administration (NOAA) Integrated Surface Database for meteorological data (Callahan et al., 2019; Carslaw, 2019). These data fusion enhancements provide the ability to generate comparison plots between a LCS and the nearest regulatory-grade instrument and allow for sensor data to be joined with nearest meteorological data so that wind roses, pollution roses, and bivariate polar plots can be generated to provide insights into local air pollution trends. Data fusion enhancements are performed on both the PAT and sensor data objects.

2.3.4. Purple Air Timeseries (PAT) - Data object and quality control functions

The PAT timeseries data object provides timeseries data on a per-sensor basis. Data manipulation functions for the PAT data object include filtering, sampling, and joining. A PAT can be loaded from a data archive using the pat_load() function or can be created from the PurpleAir ThingSpeak API with the pat_createNew() function. The code


Fig. 2. Interactive leaflet map created from a PAS data object.
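
A hedged sketch of how a map like Fig. 2 could be produced from a PAS data object is given here. The wrapper show_pas_map() and its argument are hypothetical. pas_leaflet() is the mapping function named in Section 2.3.2, while pas_load() is assumed to be the archive loader for PAS objects (by analogy with pat_load()) and setArchiveBaseUrl() is assumed to set the base archive. The two lines inside the function after the archive is set are the part that corresponds to the figure.

```r
# Hypothetical wrapper so that no data are downloaded until it is called.
show_pas_map <- function(
  archive_url = "http://smoke.mazamascience.com/data/PurpleAir"
) {
  # Point the package at the SGSC archive (assumed AirSensor call)
  AirSensor::setArchiveBaseUrl(archive_url)

  # Load the most recent synoptic snapshot (assumed loader name)
  pas <- AirSensor::pas_load()

  # Interactive leaflet map of sensor locations colored by AQI
  AirSensor::pas_leaflet(pas)
}
```

Filtering the PAS object before mapping, for example with pas_filter(), would restrict the map to a state or air district as described in the text.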

example below loads a PAT data object for a sensor in Seal Beach, CA that was deployed as part of the SGSC deployments. The PAT data object includes data from January 01 to December 31, 2018. Subsequent example code and plots displaying the AirSensor functions will be performed on this PAT data object or a filtered PAT data object created from the SCSB_20 sensor. The PAT data object can be loaded into the R environment and filtered by date with the following code:

PAT data objects can be processed for time averaging, QC algorithms, and outlier detection for removal or replacement. The user can create their own framework for applying QC functions depending on their project requirements. The pat_aggregate() function returns a data frame with aggregate statistics which are helpful for building out QC algorithms. The aggregate statistics include the mean, median, standard deviation, minimum, maximum, and count for the aggregate time period chosen. Note that the PA-II sensor node is manufactured with two identical OEM (original equipment manufacturer) PM sensors (model PMS 5003, Plantower, China) that report the same types and amounts of data and for reference purposes are labeled as channel A and channel B, respectively. For the paired channel A and B PM2.5 data columns, the pat_aggregate() function also returns the t-test statistic (based on an unpaired, two-sample Student’s t-test), p-value, and degrees of freedom. Several built-in QC algorithms are available in AirSensor and are labeled as pat_qc, hourly_AB_01, and hourly_AB_02. The pat_qc function allows the user to perform a first-level QC check for values that are considered “out-of-spec” with regards to the manufacturer defined specifications for the acceptable ranges for PM2.5, temperature, and humidity. The PurpleAirQC_hourly_AB_00() function allows the user to perform an hourly average of the A and B sensor channels when sufficient sub-hourly data exists for both channels within an hour. The default min-count for sub-hourly data is set to 20 data points, which requires a data recovery above 66% for the A and B channels at the current 120-s time resolution. No further QC is applied with this function. Note that the PA-II’s time resolution has changed with firmware updates over time. As firmware updates have not been performed across the board simultaneously for all sensors in the PurpleAir network, the following dates are estimates for firmware releases and data resolution. Time resolution for data prior to February 2017 is 20 s, from February 2017 to March 2017 is 40 s, from March 2017 to May 2017 is 70 s, from May 2017 to May 2019 is 80 s, and data recorded after May 2019 is 120 s. The function PurpleAirQC_hourly_AB_01 allows the user to perform an hourly average of the A and B sensor when sufficient sub-hourly data exists and when data is considered statistically similar. Data is invalidated when (1) minimum count < 20 values, (2) when both the means of channels A and B are not statistically the same (two-sample t-test p-value < 1e−4) and the mean difference between channels A and B is greater than 10 μg/m3, and (3) when the mean difference between A and B is greater than 20 μg/m3 for PM2.5 values less than 100 μg/m3. These conditions assume that the air entering the channel A and B sensors is the same and therefore the means of the two channel measurements should be statistically similar. When measurements from these sensor channels agree, the user can have


higher confidence in the LCS air quality measurements and the subsequent data averaging of the two OEM sensors into one value. The two-sample t-test is a statistical technique to determine whether the difference between two means is significant. The default settings of these QC checks can be modified to adjust the QC check to individual project requirements. Additionally, new QC functions can be created and AirSensor users are encouraged to create their own custom QC functions and submit these functions to be added to the AirSensor package through GitHub. The PurpleAirQC_validationPlot() function creates a series of timeseries plots for channel A and B, the difference between channel A and B, t-test p-value, min count, and the hourly averaged final output (Fig. 3).

The AirSensor pat_outlier() function provides an outlier detection function that allows the user to apply a rolling Hampel filter to identify points that may be outliers, and if desired, replace those identified outliers with a rolling median value. The Hampel Filter is an outlier detection technique that uses the Median Absolute Deviation (MAD). For each data point, a median and standard deviation are calculated using neighborhood values within a sample window size. If a single data point lies more than a specified number of MAD-based standard deviations (threshold minimum) from the median value for the sample window, then the data point is flagged as an outlier. The default values for the pat_outlier() function set the sample window = 23 and the threshold minimum = 8. Adjusting the default parameters on the function for identifying outliers would adjust the number of points detected as outliers. Fig. 4 provides an example of the pat_outlier function with the potential outliers identified as red asterisks. In this example, a date filter was applied to the pat_example previously generated to only include the June 27 to July 8, 2018 time period that would be impacted by a special event: 4th of July fireworks. The outlier detection function appears to identify many of the one-off high values as outliers but does not consider the elevated PM2.5 concentrations due to the fireworks to be outliers. This function allows AirSensor data users to quickly implement an outlier detection technique and visualize the results of their outlier detection function. Fig. 4 was produced by the following R-code:

Data visualization functions for the PAT include plotting raw data time series, interactive time series, multiplot time series (A, B, Temp, RH), comparison plot for channel A vs. B, and a comparison plot with regard to the nearest regulatory PM2.5 monitor. The channel A and B PM2.5 timeseries data can be compared using the pat_internalFit() function as shown in Fig. 5. For SCSB_20, the A and B sensors agree with each other with an R2 > 0.98, slope of 1.05, and an intercept of −0.8. Since the two sensors perform similarly for the 2018 time frame, the blue timeseries points representing the B sensor are plotted over top of the red points representing the A sensor. The code to generate the plot is:

The pat_scatterplot function provides a multi-panel scatterplot for variables in the PAT data object with an example of the plot shown in Fig. 6. This plot allows the researcher to determine if there is a lack of correlation between the A and B sensor channels or if there are higher than expected correlations between PM2.5 concentrations and weather

Fig. 3. Plot generated to visualize the QC_01 algorithm for SCSB_20 located in Seal Beach, CA. A/B separate provides a timeseries plot for channel A and B; A/B
difference provides a timeseries of the mean difference between the A and B channels; t-test p-value provides a timeseries of the p-value statistic between the two
channels; A/B minimum count provides the minimum count of data points in 1-hr for the A and B channels; and hourly_AB_01 provides the 1-hr quality controlled
average data processed with the hourly_AB_01 function.


Fig. 4. Plot generated with the rolling Hampel filter identifying potential outliers in red asterisks for SCSB_20 from June 27 to July 08, 2018. (For interpretation of
the references to color in this figure legend, the reader is referred to the Web version of this article.)
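
The rolling Hampel filter behind Fig. 4 can be illustrated with a short, language-agnostic sketch. The Python below is not AirSensor code: the function name hampel_flags, the 1.4826 scale factor (the usual constant that converts a MAD into a standard-deviation estimate for normally distributed data), the truncation of windows at the series edges, and the synthetic series are all assumptions made for illustration. Only the window of 23 points and the threshold of 8 are taken from the defaults quoted in the text.

```python
from statistics import median


def hampel_flags(values, window_size=23, threshold=8.0):
    """Flag points that lie far from the rolling median, measured in MAD units."""
    half = window_size // 2
    k = 1.4826  # assumed scale factor: MAD -> standard-deviation estimate
    flags = []
    for i, x in enumerate(values):
        # Centered window, truncated at the ends of the series (an assumption).
        window = values[max(0, i - half):min(len(values), i + half + 1)]
        med = median(window)
        mad = median(abs(v - med) for v in window)
        sigma = k * mad
        # A zero MAD means the window is dominated by one value; flag nothing.
        flags.append(sigma > 0 and abs(x - med) > threshold * sigma)
    return flags


# Synthetic series: a gently varying baseline, one isolated spike,
# and a sustained 15-point plateau standing in for a multi-hour event.
pattern = [10.0, 11.0, 12.0]
series = [pattern[i % 3] for i in range(80)]
series[10] = 200.0
for i in range(40, 55):
    series[i] = 80.0

flags = hampel_flags(series)
flagged_indices = [i for i, flagged in enumerate(flags) if flagged]
print(flagged_indices)
```

In this synthetic series the isolated spike is flagged while the sustained plateau is not, because the plateau dominates its own window and therefore sets the rolling median. That is the same qualitative behavior the text describes for the fireworks period.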

Fig. 5. Scatter plot and timeseries rendered using the pat_internalFit function to compare channel A and B within a single PA-II sensor, SCSB_20, for 2018.
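
The slope, intercept, and R2 reported for a channel A versus channel B comparison are the statistics of an ordinary least-squares line. The sketch below shows how such values can be computed; it is written in Python for illustration and is not the pat_internalFit() implementation. The helper name linear_fit, the choice of channel A as the predictor, and the synthetic readings (constructed to lie exactly on a line with slope 1.05 and intercept −0.8) are assumptions.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic, noise-free readings used only to exercise the function.
channel_a = [5.0, 10.0, 15.0, 20.0, 30.0]
channel_b = [1.05 * a - 0.8 for a in channel_a]

slope, intercept, r_squared = linear_fit(channel_a, channel_b)
print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```

With real data the points scatter around the line, so R2 falls below 1; a low R2 or a slope far from 1 between the two channels is the kind of disagreement the comparison plot is meant to expose.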

conditions (temperature and humidity). This plot also provides the timeseries and distribution of data points for PM, temperature and humidity. In Fig. 6, the distribution plots for the A and B sensor channels indicate PM2.5 concentrations for this sensor are typically less than 25 μg/m3. The datetime column provides an indication of periods of downtime with a noticeable downtime seen in August and September of 2018.

A sensor can also be compared to the nearest regulatory air monitoring station (AMS) with the pat_externalFit() function (Fig. 7). In this example, the sensor is 3.1 km away from the regulatory AMS equipped


Fig. 6. Plot generated using the pat_scatterplot function to graphically view the variables in the PAT timeseries data object.

with a Met One Beta Attenuation Monitor (BAM), which is a U.S. EPA designated Class III FEM (EQPM-0308-170) for PM2.5. The time resolution of the regulatory PM2.5 data is hourly. To match LCS data with the regulatory data, this function uses the QC procedures previously described to hourly aggregate the sensor data. The user can specify which QC algorithm to apply or create custom QC functions. Fig. 7 indicates that while the sensor follows the typical daily PM2.5 trends of the nearby regulatory-grade instrument for PM2.5 with R2 > 0.73, the sensor tends to estimate higher concentrations than the regulatory-grade instrument. This slope/intercept offset could be due to a local emission source impacting this particular sensor location or could be due to sensor measurement bias error that has been identified in prior publications (Feenstra et al., 2019; Magi et al., 2019). For the time-series in Fig. 7, the purple colored points represent the 1-hr PurpleAir sensor data

Fig. 7. Scatter plot and timeseries plot rendered using the pat_externalFit() function which compares the PA-II sensor, SCSB_20, in Seal Beach, CA to a nearby
regulatory-grade PM2.5 instrument in Long Beach, CA.


and the black colored points represent the regulatory-grade instrument data. If the two agree closely within an hour, the black point would be plotted on top of the purple point for that hour. The plot in Fig. 7 is created with the following:

The pat_dygraph function returns an interactive time-series plot for both channel A and B allowing the user to zoom in/out and investigate date/times when PM2.5 concentrations may be higher than normal (Fig. 8). Using the interactive time-slider located below the plot allows the user to quickly zoom in to further investigate dates and times with particle pollution events. With a small amount of code, the dygraph provides a versatile, interactive plot, where the user can explore a large amount of data at customizable levels with the time slider and zoom in/out features. Fig. 8 is created with the following:

2.3.5. Hourly QC data object (sensor)
The sensor data object is generated on a per sensor basis from a PAT data object with the pat_createAirSensor() function. The user will need to specify a PAT data object, time averaging period, parameter, channel, QC algorithm, and minimum count. The QC algorithms applied in creating the sensor data object are described earlier in Section 2.2.4 with regards to the QC functions that can be applied to a PAT timeseries data object. The functions for sensor data objects begin with “sensor_”. An example creating a sensor data object is shown in the code below:

Plots available for the sensor data object within the AirSensor package include a bivariate polar plot and pollution rose, which wrap functions from the openair R package. The meteorological data used to generate these plots is retrieved from NOAA using the worldmet R package. The bivariate polar plot and pollution rose, which are shown in Fig. 9 and Fig. 10 respectively, provide the user with the ability to couple wind direction and wind speed with PM2.5 pollutant data to determine whether pollution events can be attributed to specific meteorological conditions and potentially identify pollution sources. A more in-depth analysis of these plots and their application in analyzing AM datasets is accessible within the published literature on the openair R package development (Carslaw and Ropkins, 2012) and use of bivariate polar plots (Carslaw and Beevers, 2013; Grange et al., 2016). The pollution rose and polar plot are generated by the following code:

2.3.6. Timestamp and time averaging for AirSensor data objects and functions
AirSensor data objects and functions have been designed to appropriately handle timestamps and various time zones of potential users. Users should understand how time stamps are stored and visualized within AirSensor and take appropriate steps when creating and visualizing AirSensor data objects, especially if using plotting functions outside of the AirSensor package to visualize data. The PurpleAir API provides access to data stored in Coordinated Universal Time (UTC). The AirSensor data objects (PAS, PAT, and sensor) all store data with a UTC timestamp. When creating or loading either a PAT or a sensor data object, the user can specify the local time zone of the sensor selected. If a time zone is not specified when creating a data object for a single day, the date/time parameters will be passed as UTC, which for a sensor located in the Pacific Time Zone (+8h UTC) would return a data object with data from 08:00 AM to 08:00 AM local time of the following day. In AirSensor, time stamps are labeled and time averages are coded as “time beginning”. For example, a 1-hr time average with a timestamp of 14:00 would be an average of the data collected between 14:00 and 14:59. This holds true even with the 2-min time-matched channel A and B sensor data available in AirSensor. The PurpleAir PA-II channel A and B sensors report at different times within a 120-sec time interval. In AirSensor, the seconds are dropped and data from the A and B sensor are assigned to a 2-min time beginning time stamp for matching purposes between the two OEM sensors within a PA-II sensor. Since data is stored as UTC, the plotting functions within AirSensor are coded to appropriately apply time shifts based on the sensor’s location (time zone) so that data will be plotted and displayed in the local time of that sensor’s location.
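
The “time beginning” convention of Section 2.3.6 can be made concrete with a small sketch. It is written in Python for illustration and does not reproduce AirSensor internals. The helper names, the alignment of 2-min bins to even minutes, the plain averaging of the two channels (no QC is applied), the fixed UTC−8 offset standing in for Pacific Standard Time, and the readings themselves are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def floor_to_2min(ts):
    """Drop seconds and round the minute down to an even value (time beginning)."""
    return ts.replace(minute=ts.minute - ts.minute % 2, second=0, microsecond=0)


def hourly_means(readings):
    """Average (timestamp, value) pairs into hours labeled by the start of the hour."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


UTC = timezone.utc
PST = timezone(timedelta(hours=-8), "PST")  # fixed offset, no daylight saving

# Hypothetical raw readings: the two channels report at different seconds.
channel_a = [
    (datetime(2018, 1, 15, 22, 0, 13, tzinfo=UTC), 10.0),
    (datetime(2018, 1, 15, 22, 2, 14, tzinfo=UTC), 12.0),
    (datetime(2018, 1, 15, 22, 58, 20, tzinfo=UTC), 14.0),
    (datetime(2018, 1, 15, 23, 0, 15, tzinfo=UTC), 30.0),
]
channel_b = [
    (datetime(2018, 1, 15, 22, 1, 5, tzinfo=UTC), 11.0),
    (datetime(2018, 1, 15, 22, 3, 40, tzinfo=UTC), 13.0),
    (datetime(2018, 1, 15, 22, 59, 59, tzinfo=UTC), 15.0),
    (datetime(2018, 1, 15, 23, 1, 30, tzinfo=UTC), 28.0),
]

# Time-match the channels on their 2-min time-beginning stamps.
a_bins = {floor_to_2min(ts): value for ts, value in channel_a}
b_bins = {floor_to_2min(ts): value for ts, value in channel_b}
matched = [(t, (a_bins[t] + b_bins[t]) / 2) for t in sorted(a_bins) if t in b_bins]

# Aggregate in UTC, then shift the labels to local time only for display.
hourly_utc = hourly_means(matched)
hourly_local = {
    hour.astimezone(PST).strftime("%Y-%m-%d %H:%M"): mean
    for hour, mean in hourly_utc.items()
}
print(hourly_local)
```

Keeping storage and aggregation in UTC and converting only at display time is what lets the same data object be plotted correctly for sensors in different time zones.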

Fig. 8. Dygraph plot with interactive time-slider generated by the pat_dygraph function.


Fig. 9. PM2.5 Pollution Rose generated by the sensorpollutionRose() function for SCSB_20 located in Seal Beach, CA for June 27 to July 08, 2018.

2.4. AirSensor DataViewer web application

2.4.1. AirSensor DataViewer overview
The DataViewer application was developed to provide an online interactive data experience for the SGSC networks. These communities and sensor names are listed in SI Table S2. This interactive web application provides access to the functionality of the AirSensor R package. Citizen scientists that would not be able to download R and run code or scripts to access, process, and visualize community data are now able to visualize their community data through the DataViewer. While the infrastructure to generate the types of plots that had resonated with community group members during the workshops was developed in the AirSensor package, the ability for community group members to use that infrastructure and generate visualizations in an interactive web application without writing a single piece of code is provided in the DataViewer application. Plots that generated the most interest with community group members, including calendar plots, concentration maps, community time-lapse videos, and sensor performance plots between the A and B internal sensors and between the sensor and nearest reference PM2.5 monitor, were prioritized for incorporation in the DataViewer. The following sections will provide an overview of the back-end infrastructure required for the DataViewer application and the methodology for the DataViewer color scale and timelapse videos. The front-end of the DataViewer, which is the online web application and the primary point of interaction for community members, is highlighted in the results section.

2.4.2. Cloud computing resources
Cloud computing provides computing services over the internet using a pay-as-you-go pricing model. Computing services typically include computing power, storage, networking, and analytics. Cloud computing can provide benefits by allowing programmers to focus on building new and innovative applications rather than acquiring and maintaining the infrastructure required for their computational needs. The cloud can provide benefits with cost reductions for IT infrastructure and can increase the scalability, elasticity, reliability, and security of computational services in comparison to computation services provisioned locally or on-premise. Azure, which is Microsoft’s public cloud computing platform, was used to support the computational requirements of the DataViewer application. The application could also be run on another public cloud platform or on premise if desired. The computation services required include running scheduled tasks (cron jobs) for creating data objects, storing data in structured data directories, and hosting the DataViewer application. The data archive consists of a set of flat files defined by a simple directory and naming protocol with the data ingest scripts written in the R programming language. A virtual machine (VM) was configured on Azure with the structured directories for the data directories along with required software (i.e., Git, Apache, Docker, and R). A second VM was configured to host the DataViewer application. Fig. 11 provides a simplified system architecture for the DataViewer application.


Fig. 10. PM2.5 Bivariate Polar Plot generated by the sensorpolarPlot() function for SCSB_20 located in Seal Beach, CA for June 27 to July 08, 2018.

2.4.3. DataViewer color scale
Determining an appropriate color scale for pollutant concentrations generated by LCS is challenging. Historically, air quality has been colored according to the AQI with values ranging from 0 to 500 with six distinct color categories: good (green), moderate (yellow), unhealthy for sensitive groups (orange), unhealthy (red), very unhealthy (purple), and hazardous (maroon). Historically, AQI has been calculated at 24-h averages due to the scientific information about air pollution exposure and public health. In 2013, the U.S. EPA released a new AQI calculation method (NowCast Reff method) for PM2.5 that calculates AQI hourly based on the previous 12 h with the most recent hourly pollutant concentrations given larger weighting when air quality is changing rapidly (Mintz et al., 2013). The U.S. EPA in the Air Sensor Toolbox suggested a new pilot version color/concentration scale that could be used for 1-min high time resolution data from LCS (U.S. Environmental Protection Agency, 2019). This scale uses four shades of blue for low, medium, high, and very high PM2.5 concentrations and is shown in SI Fig. S2. The scale from the Air Sensor Toolbox was created for 1-min sensor data in contrast to this work in which LCS data is processed with QC algorithms and time-averaged to 1-hr concentrations prior to being displayed in the DataViewer application. Furthermore, the authors wanted to provide users with a clearer differentiation among the higher pollutant levels sometimes indicated by the sensors. Hence, a new color scheme was developed for the DataViewer that includes 5 concentration categories represented by two colors (blue and purple) with variations in the hue and luminance as shown in Table 1.

2.4.4. FF MPEG and digital stills and video stills creation
One of the desires of the community groups was to view historical time-lapse concentration maps to view past air quality events in their communities. To accomplish this task, cron jobs run hourly to create video still images for each of the 14 SGSC. These images are stored in the structured data directory in sequence and converted into mp4 video files using FFmpeg, which is a FOSS (Dawes, 2019; FFmpeg, 2019).

3. Results

3.1. AirSensor package
The AirSensor R package meets the community needs for those desiring to work with PurpleAir LCS data programmatically in the R environment. Through the AirSensor package, real-time and historical


Fig. 11. System architecture for the DataViewer application.

data from the SGSC (listed in SI Table S2) can be accessed, loaded into R, and visualized using pre-built plotting functions. These plotting functions allow the user to create useful and interactive plots that can be shared within a community group and deliver actionable information for the community members to answer questions like “When is particle pollution highest in my community?” and “What time of day or day of week would be best to plan an outdoor activity (e.g., walk dog or golf game) to potentially reduce my particle pollution exposure?” AirSensor creates a data flow for the end-user to create data objects for synoptic data, time-series data, and QC hourly PM2.5 data. With the functions of this R package highlighted in the methods section, the user can easily create informative plots for community members to understand their local historical air quality trends with minimal coding required. The AirSensor R package and associated functions provide the necessary back-end software analysis and plotting functions to create the front-end DataViewer web application. The DataViewer is usable and useful to a much broader segment of the public and is the primary point of interaction for community members to gain insights into their local air quality conditions. This solution provides an example of how these types of tools and solutions can enhance public engagement with data from LCS networks.

Table 1
DataViewer color/concentration scale for 1-Hr PM2.5 concentrations.

3.2. DataViewer application

3.2.1. User interface: Tabular structure and plotting features
The DataViewer application, version 0.9.7, has a hierarchical page and tab structure with 4 top-level pages: Explore, View Data, Latest Data, and About. The View Data page is for viewing tabled data and provides the ability to download data in 3 to 30-day intervals on a per sensor basis. For SGSC, historical data can be accessed back to the start of the SGSC deployments: October 01, 2017. The View Data page includes high resolution (2-min) time-matched PA-II PM2.5 data from the A and B sensor channels, temperature (°F), and relative humidity (%). This data output provides the user with a clean time-matched data set for the A and B sensor. Creating a similar data set outside of the AirSensor R package or DataViewer application would likely be time consuming and difficult, especially if the user were not proficient with Microsoft Excel or data science environments. The Latest Data page provides visual access to the latest non-QC data on a per sensor basis with timeseries plots provided for sensor channel A, channel B, humidity, and temperature. The “About” page provides an overview of the DataViewer, its intended purpose, QC procedures, and a disclaimer message.

3.2.2. Explore page: Tabs and functionality
The Explore page has the most functionality for exploring and analyzing community AM data and includes six tabs: Overview, Calendar, Raw Data, Daily Patterns, Compare, and Timelapse. In the Overview tab, the user can select a community, a single sensor (sensor name), a date (end date), and view past data with options for viewing the prior 3, 7, 15, or 30 days to the selected end date. The Overview tab (Fig. 12) provides a map that displays the average PM2.5 for all sensors within the selected community for the time period selected (3, 7, 15, or 30-day average) and a bar chart that displays hourly PM2.5 concentrations for the selected sensor. This overview tab provides the user with access to historical pollutant concentrations for the user-selected timeframe for their community and individual sensor. By changing the date, the user can quickly identify spatial differences between locations since the map indicates an average PM2.5 concentration for the entire timeframe


Fig. 12. Overview tab in the DataViewer application showing PM2.5 concentrations and sensor locations for Seal Beach, CA.

chosen (3–30 days). Additionally, the user can quickly scan the bar chart for when higher than typical PM2.5 concentrations were recorded for a particular sensor.

In the Calendar tab, a 1-year calendar plot is rendered for a single selected sensor. The user selects a community, sensor, and date with a calendar plot being generated for the entire calendar year of the date selected (Fig. 13). The calendar plot is interactive and when the user hovers over a date, the 24-Hr averaged PM2.5 concentration is displayed.

Fig. 13. Calendar Plot generated using the AirSensor DataViewer Application. The darker shades indicate higher levels of pollution as set forth in the color scale
provided in Table 1. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)


The calendar plot is easily understood by community members and provides an intuitive view of a complete year of PM2.5 data for a single sensor. The calendar tab is a great place to start when exploring a community data set to find dates with atypical 24-hr PM2.5 concentrations. The user can then further examine these atypical pollution events at higher time resolution with other tabs available within the Explore page. The calendar plot especially resonated with the community members and sensor hosts and therefore was a priority for inclusion in the DataViewer application. Calculating and rendering the calendar plot is computationally expensive and may take a few moments to display when interacting with the DataViewer application, but the result is well worth the wait for this informative plot. Throughout the workshops, we received the most feedback and discussion from community members when showing the calendar plot. The calendar plot engaged the audience and facilitated effective discussions during community workshops. Community members who were otherwise quiet, or who could not recall what might have caused poor air quality in their community during the past several months, were able to identify days with poor air quality and what might have caused them when they viewed the calendar plot with the color-coded concentrations.

The Raw Data tab provides the raw time-series data for channels A and B, humidity, and temperature. Below the time series plots, the Raw Data tab provides a comparison between the channel A and B sensors with both a time-series and a scatterplot that indicates the regression statistics between channels A and B. This functionality uses the pat_internalFit() function from the AirSensor R package which was previously shown in Fig. 5. These comparison plots provide the user with the ability to check on the performance of an individual sensor by viewing how well the two internal raw sensors within the PA-II agree for the selected time period. If a user is concerned with the performance of an individual sensor, this tab can be used to determine if both the raw sensors are responding to changes in particle concentrations similarly. Low correlation and/or a large slope/intercept offset are indicative of a sensor performance issue and that one or both sensors may be experiencing a malfunction.

The Daily Patterns tab (Fig. 14) provides a bar chart illustrating the diurnal trend for PM2.5, a pollution rose, and a summary table for the NOAA weather data for the date range selected. The daily patterns bar chart provides the average concentration by hour of day. With this tab, the DataViewer user can determine on average what hour of the day has the highest and lowest particle pollution. This plot helps to inform users as to historical trends within their community and provides information from which the community member can infer what time of day may be best for scheduling physical activity to reduce particle pollution exposure based on historical air pollution trend data. The pollution rose allows the user to determine if pollution can be attributed to specific meteorological conditions.

The Compare tab provides a comparison between the sensor data and the nearest AMS equipped with a continuous regulatory PM2.5 instrument. The Compare tab provides a map indicating the location of the sensor and nearest AMS along with a timeseries and scatter plot comparison for the two data sources, allowing the user to determine if the selected sensor follows the typical trends for PM2.5 recorded at the nearby regulatory AMS for the date range selected. The DataViewer application is using the AirSensor pat_externalFit() plotting function which was shown prior in Fig. 7. While the distance between the regulatory monitor and the LCS is provided on the Sensor-Monitor Comparison timeseries plot, the map provided in the DataViewer on this tab

Fig. 14. Snapshot of the Daily Patterns tab in the DataViewer application.


allows the user the opportunity to visualize the distance between and spatial context of the two monitoring locations. Understanding the siting of the LCS and the regulatory AMS is crucial to understanding the information provided by the comparison plot. If either the sensor or regulatory monitor is installed in a near-source environment (e.g., near-road), the user should not expect the two measurements to agree.

The final tab in the Explore page is the Timelapse tab. This tab provides the user with the ability to generate a 6-day timelapse PM2.5 concentration video on a per community basis (Fig. 15). Right-clicking on the video allows the user to save an MP4 video to their computer and share if desired. This timelapse concentration map allows the user to view pollution events that may have taken place within a community during a selected time frame and visualize the flow of pollutants through a community. An informative approach to using this timelapse video is first to use the calendar plot feature to identify dates with elevated PM2.5 mass concentrations (μg/m3). After identifying those dates, the user can then choose an inclusive date range to view the community timelapse to better understand the pollution event.

4. Discussion

While online systems exist to view real-time and recently recorded measurements, FOSS tools for accessing, processing, and analyzing historical AM data collected by LCS are less available to the public. Developing FOSS tools for archiving, interpreting, and communicating data from sensors has been identified as a concrete next step towards building a system for filling the air quality data gap (Pinder et al., 2019). This work provides a FOSS R package and a web application designed to fill that gap by providing the software tools to view both real-time and historical hyper-local air quality information generated by LCS networks. Access to hyper-local air quality information is expected to spawn an increased desire to interact with air quality information and allow community members to take appropriate actions based on results generated from their community monitoring networks. The AirSensor R package and DataViewer application provide a framework and data flow for communities to transform their community monitoring data sets into insightful information through interactive data experiences and data explorations. When meaningful results and observations are formulated, the community members can take appropriate actions to reduce their exposure to air pollutants. These actions could include planning transportation (e.g., walk, bike, motor vehicle) routes to reduce air pollution exposure and scheduling physical activity events (e.g., golf game, sporting practice, sporting event) during hours of the day or day of the week that have been identified to have lower PM2.5 pollution based on historical data analysis. Our experience with sharing the DataViewer with the community leaders and members participating in the project has been positive with users enjoying the interactive data experience provided within the DataViewer. These community members have shared how this DataViewer provides them with the analysis capabilities to better understand their local air quality conditions. Plots that previously seemed out of reach due to required technical data analysis skills and coding experience are now readily available and generated with only a few selections and mouse clicks within the DataViewer application.

FOSS software developments provide efficiency by building a community of proactive data users around shared tools and allowing for multiple parties (i.e. agencies, entities, individuals) to contribute to software development and enhancing software functionalities. This benefit has already been realized with the USFS AirFire group funding further developments to AirSensor for functions to calculate state-of-health metrics designed to categorize whether sensors are functioning properly. This information will be used in the context of wildfire air quality response. FOSS allows for researchers to collaborate and build upon the foundation established in this development. FOSS developments can also provide a high level of transparency in terms of data analysis and integrity as the end-user is able to select which post-processing steps are appropriate for their data analysis. With FOSS tools and publicly available data sets, researchers can reproduce data analysis techniques and develop additional functions with the interoperability associated with FOSS development.

5. Conclusions

This novel work brings these software systems to the end-users or community members in a FOSS format with all the advantages of open software developments. Not only is the end-user able to access, process,

Fig. 15. Community timelapse video tab.


and analyze historical sensor data, but the user also has access to the source code and functions with the option to create their own custom functions for QC, filters, and advanced analytics. Allowing the community to build upon this existing work provides benefits to the sensing community as a whole. Developing this software in the R-environment also provides for data fusion enrichment by coupling the collected AM sensor data with meteorological data and regulatory AM data through other open-source packages in the R environment. The AirSensor package has established a foundation upon which further enhancements and refinements can be developed. Both AirSensor and DataViewer source codes are available on GitHub and the authors invite collaboration and input to help shape the AirSensor open source project to best meet the needs of the air sensing community.

The AirSensor R package is sensor specific, working with any publicly registered Purple Air PA-II sensors. The DataViewer solution is both sensor- and project-specific and therefore limited to the PA-II sensors deployed by South Coast AQMD in SGSC. The authors believe that the data flow works well for AM sensor data with the data objects going from synoptic data to time-series data and then to hourly QC sensor data. The blueprint developed to make the DataViewer operational could be applied to other projects and communities to visualize data collected by their PurpleAir LCS networks. The work discussed in this paper focused on the initial data handling and analysis capabilities required for a community AM network of PM2.5 sensors. Planned future work will focus on several improvements to the AirSensor R package, the data archive database design, and the DataViewer application. The AirSensor R package and archive will be improved by adding functionality to handle unique timeseries identifiers and incorporating PM1 and PM10 data. Additional plotting functionality will include enhancements to create multi-sensor comparison plots and visualize sensor state-of-health metrics for both individual sensors and sensor networks. Additional enhancements to the R package may include developing models to provide hyper local air quality forecast for the community. The DataViewer will be enhanced by improving the appearance, usability, data handling, and performance of the application.

Funding

This research has been supported by a grant from the U.S. Environmental Protection Agency’s Science to Achieve Results (STAR) program

SPEC) at South Coast AQMD. The authors would also like to thank the community groups’ leaders, trainers, coordinators, and members/sensor hosts that participated in the U.S. EPA STAR grant and provided valuable feedback that allowed us to create and improve this work. The authors thank Ms. Emma Ranheim who assisted in user testing the AirSensor R package.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.envsoft.2020.104832.

References

Air Fire Tools, 2020. WFAQRP-AirFire tools information. Viewed 02/05/2020, from. https://tools.airfire.org.
Air Quality Egg, 2020. Air quality Egg portal. Viewed 02/05/2020, from. https://airqualityegg.com/portal/.
AirNow, 2020. "AirNow developer tools." AirNow API. Viewed 02/04/2020, from. https://docs.airnowapi.org/.
Allaire, J., 2020. RStudio, PBC. https://blog.rstudio.com/2020/01/29/rstudio-pbc/ 2020.
BreezoMeter, 2020. Air quality map. Viewed 02/04/2020, from. https://breezometer.com/air-quality-map.
Callahan, J., Aras, R., Dingels, Z., Hagg, J., Kim, J., Martin, H., Miller, H., Pease, S., Thompson, R., Yang, A., 2019. PWFSLSmoke: Utilities for Working with Air Quality Monitoring Data. R package version 1.2.103, from. https://github.com/MazamaScience/PWFSLSmoke.
Carslaw, D., 2019. Worldmet: Import Surface Meteorological Data from NOAA Integrated Surface Database (ISD). R package version 0.8.7, from. http://github.com/davidcarslaw/worldmet.
Carslaw, D.C., Beevers, S.D., 2013. Characterising and understanding emission sources using bivariate polar plots and k-means clustering. Environ. Model. Software 40, 325–329.
Carslaw, D.C., Ropkins, K., 2012. Openair - an R package for air quality data analysis. Environ. Model. Software 27–28, 52–61.
Conway, D., 2013. The data science venn diagram. Viewed 05/31/19, from. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram.
Dawes, B., 2019. Using FFmpeg to convert image sequences to video. Analogue + Digital Viewed 12/27/19, from. http://brendandawes.com/blog/ffmpeg-images-to-video.
Feenstra, B., Papapostolou, V., Hasheminassab, S., Zhang, H., Boghossian, B.D., Cocker, D., Polidori, A., 2019. Performance evaluation of twelve low-cost PM2.5 sensors at an ambient air monitoring site. Atmos. Environ. 216, 116946.
FFmpeg, 2019. FFmpeg. Viewed 12/27/19, from. https://www.ffmpeg.org/about.html.
Gibert, K., Horsburgh, J.S., Athanasiadis, I.N., Holmes, G., 2018. Environmental data science. Environ. Model. Software 106, 4–12.
Grange, S.K., Lewis, A.C., Carslaw, D.C., 2016. Source apportionment advances using polar plots of bivariate correlation and regression statistics. Atmos. Environ. 145,
to the South Coast Air Quality Management District. 128–134.
HabitatMap, 2020. AirCasting map. Viewed 02/05/2020, from. http://aircasting.ha
bitatmap.org/mobile_map.
Disclaimer statement Hagler, G.S.W., Williams, R., Papapostolou, V., Polidori, A., 2018. Air quality sensors and
data adjustment algorithms: when is it No longer a measurement? Environ. Sci.
This publication was developed under Assistance Agreement No. Technol. 52 (10), 5530–5531.
IQAir, 2020. AirVisual map. from. https://www.airvisual.com/air-quality-map.
R836184 awarded by the U.S. Environmental Protection Agency to Kadiyala, A., Kumar, A., 2017a. Applications of Python to evaluate environmental data
South Coast AQMD. It has not been formally reviewed by EPA. The views science problems. Environ. Prog. Sustain. Energy 36 (6), 1580–1586.
expressed in this document are solely those of the authors and do not Kadiyala, A., Kumar, A., 2017b. Applications of R to evaluate environmental data science
problems. Environ. Prog. Sustain. Energy 36 (5), 1358–1364.
necessarily reflect those of the U.S. EPA. The South Coast AQMD and U. Luftdaten, 2020. Measuring air data with citizen science. Viewed 02/05/2020, from. htt
S. EPA do not endorse any products or commercial services mentioned in ps://luftdaten.info/en/home-en/.
this publication. Magi, B.I., Cupini, C., Francis, J., Green, M., Hauser, C., 2019. Evaluation of PM2.5
measured in an urban setting using a low-cost optical particle counter and a Federal
Equivalent Method Beta Attenuation Monitor. Aerosol. Sci. Technol. 13.
Declaration of competing interest Mintz, D., Stone, S., Dickerson, P., Davis, A., 2013. Transitioning to a New NowCast
Method. Viewed 11/21/19, from. https://www3.epa.gov/airnow/ani/pm25_aqi_re
porting_nowcast_overview.pdf.
The authors declare that they have no known competing financial Ok Lab Stuttgart, 2020. Open data Stuttgart. Viewed 02/05/2020, from. www.github.
interests or personal relationships that could have appeared to influence com/opendata-stuttgart.
the work reported in this paper. OpenAQ, 2020. Open data: countries. Viewed 01/24/2020, from. https://openaq.org/
#/countries.
Pinder, R.W., Klopp, J.M., Kleiman, G., Hagler, G.S.W., Awe, Y., Terry, S., 2019.
Acknowledgements Opportunities and challenges for filling the air quality data gap in low- and middle-
income countries. Atmos. Environ. 215, 116794.
Plume Labs, 2020. Air quality map. Viewed 02/04/2020, from. https://air.plumelabs.
The authors would like to thank Dr. Jonathan Callahan and Hans
com/air-quality-map.
Martin at Mazama Science, Inc. (Seattle, WA) for their collaboration and PurpleAir, 2020. PurpleAir map. Viewed 02/05/2020, from. https://www.purpleair.
contributions in the development of the AirSensor R-package and the com/map.
DataViewer application tools along with their valuable feedback on this RStudio, 2019. R studio. Viewed 11/08/2019, from. https://rstudio.com/.
Sandhaus, S., Kaufmann, D., Ramirez-Andreotta, M., 2019. Public participation, trust and
manuscript. The sensor data used and presented in this paper was data sharing: gardens as hubs for citizen science and environmental health literacy
collected by the Air Quality Sensor Performance Evaluation Center (AQ- efforts. Int. J. Sci. Educ. Part B-Communication and Public Engagement 9 (1), 54–71.


Smart Citizen Kit, 2020. Smart Citizen Kit Map, 02/05/2020, from https://smartcitizen.me/kits/.
Snyder, E.G., Watkins, T.H., Solomon, P.A., Thoma, E.D., Williams, R.W., Hagler, G.S.W., Shelow, D., Hindin, D.A., Kilaru, V.J., Preuss, P.W., 2013. The changing paradigm of air pollution monitoring. Environ. Sci. Technol. 47 (20), 11369–11377.
The R environment, 2019. The R environment. Viewed 11/08/2019, from https://www.r-project.org/about.html.
U.S. Environmental Protection Agency, 2019. Air sensor Toolbox: what do my sensor readings mean? Sensor scale pilot project. Viewed 11/21/2019, from https://www.epa.gov/air-sensor-toolbox/what-do-my-sensor-readings-mean-sensor-scale-pilot-project.
uRADMonitor, 2020. Global Environmental Monitoring Network. Viewed 02/05/2020, from https://www.uradmonitor.com/.
World Air Quality Index Project, 2020. World’s air pollution: real-time air quality Index. Viewed 01/24/2020, from https://waqi.info/.
