Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
SlideShare a Scribd company logo

1

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Yann-A¨el Le Borgne, Gianluca Bontempi
Machine Learning Group
Department of Computer Science
Universit´e Libre de Bruxelles
May 15th, 2009

2

Preliminaries Adaptive Model Selection Adaptive Model Selection
Wireless Sensor Networks
Wireless sensors: latest trend of Moore’s law (1965)
“The number of transistors that can be
placed inexpensively on an integrated circuit
doubles every two years.”
1950 1990 2000 2010
Computing devices get
Smaller
Cheaper
→ Enable new kinds of interactions with our world

3

Preliminaries Adaptive Model Selection Adaptive Model Selection
Wireless Sensor Networks
Wireless sensors
Sensor nodes can collect, process and communicate data
[Warneke et al., 2001; Akyildiz et al., 2002]
TMote Sky
Deputy dust
Sensors: Light, temperature, humidity,
pressure, acceleration, sound, . . .
Radio: ∼ 10s kbps, ∼ 10s meters
Microprocessor: A few MHz
Memory: ∼ 10s KB

4

Preliminaries Adaptive Model Selection Adaptive Model Selection
Wireless Sensor Networks
Environmental monitoring and periodic data collection
Base
station
Wireless node
Internet
Sensor network
Base station
separate calibration procedures to test the system prior to
placing it in the field, and then spent a day in the forest
with ropes, harnesses, and a notebook.
We decided on the following envelope for our deployment:
Time: One month during the early summer, sampling all
sensors once every 5 minutes. The early summer con-
tains the most dynamic microclimatic variation. We
decided that sampling every 5 minutes would be suffi-
cient to capture that variation.
Vertical Distance: 15m from ground level to 70m from
ground level, with roughly a 2-meter spacing between
nodes. This spatial density ensured that we could cap-
ture gradients in enough detail to interpolate accu-
rately. The envelope began at 15m because most of
the foliage was in the upper region of the tree.
Angular Location: The west side of the tree. The west
side had a thicker canopy and provided the most buffer-
ing against direct environmental effects.
Radial Distance: 0.1-1.0m from the trunk. The nodes
were placed very close to the trunk to ensure that we
were capturing the microclimatic trends that affected
the tree directly, and not the broader climate.
Figure 1 shows the final placement of each mote in the
tree. We also placed several nodes outside of our angular
and radial envelope in order to monitor the microclimate in
the immediate vicinity of other biological sensing equipment
that had previously been installed.
Figure 1: The placement of nodes within the tree
4.1 Hardware and Network Architecture
The sensor node platform was a Mica2Dot, a repackaged
Mica2 mote produced by Crossbow, with a 1 inch diameter
form factor. The mote used an Atmel ATmega128 micro-
controller running at 4 MHz, a 433 MHz radio from Chip-
con operating at 40Kbps, and 512KB of flash memory. The
mote was connected to digital sensors using I2C and SPI
serial protocols and to analog sensors using the on-board
ADC.
The choice of measured parameters was driven by the bio-
logical requirements. We measured traditional climate vari-
ables – temperature, humidity, and light levels. Tempera-
ture and relative humidity feed directly into transpiration
models for redwood forests. Photosynthetically active radi-
ation (PAR, wavelength from 350 to 700 nm) provides infor-
mation about energy available for photosynthesis and tells
us about drivers for the carbon balance in the forest. We
measure both incident (direct) and reflected (ambient) levels
of PAR. Incident measurements provide insight into the en-
ergy available for photosynthesis, while the ratio of reflected
to incident PAR allows for eventual validation of satellite
remote sensing measurements of land surface reflectance.
The Sensirion SHT11 digital sensor provided temperature
(± 0.5◦
C) and humidity (± 3.5%) measurements. The in-
cident and reflected PAR measurements were collected by
two Hamamatsu S1087 photodiodes interfaced to the 10-bit
ADC on Mica2Dot.
The platform also included a TAOS TSL2550 sensor to
measure total solar radiation (300nm - 1000nm), and an
Intersema MS5534A to measure barometric pressure, but
we chose not to use them in our deployment. During cali-
bration, we found that the TSR sensor was overly sensitive,
and would not produce useful information in direct sunlight.
Because TSR and PAR would have told roughly the same
story, and because PAR was more useful from the biology
viewpoint, we decided not to gather data on total solar ra-
diation. As for the pressure sensor, barometric pressure is
simply too diffuse a phenomenon to show appreciable differ-
ences over the height of a single redwood tree. A standard
pressure gradient would exist as a direct function of height,
but any pressure changes due to weather would affect the
entire tree equally. Barometric pressure sensing should be
useful in future large-scale climate studies.
The package for such a deployment needs to protect the
electronics from the weather while safely exposing the sen-
sors. Our chosen sensing modalities place specific require-
ments on the package. Standardized temperature and hu-
midity sensing should be performed in a shaded area with
adequate airflow, implying that the enclosure must provide
such a space while absorbing little radiated heat. The out-
put of the sensors that measure direct radiation is dependent
on the sensor orientation, so the enclosure must expose these
sensors and level their sensing plane. The sensors measuring
ambient levels of PAR must be shaded but need a relatively
wide field of view.
The package designed for this deployment is shown in Fig-
ure 2. The mote, the battery, and two sensor boards fit in-
side the sealed cylindrical enclosure. The enclosure is milled
from white HDPE, and reflects most of the radiated heat.
The endcaps of the cylinder form two sensing surfaces – one
captures direct radiation, the other captures all other mea-
surements. The white “skirt” provides extra shade, protec-
53
One-to-one applications Many-to-one applications
Data is sent by nodes periodically to a base station.
Applications: Medical, interactive arts, ecology, industry,
disaster prevention.

5

Preliminaries Adaptive Model Selection Adaptive Model Selection
Wireless Sensor Networks
Challenges in environmental monitoring:
Long-running applications (months or years),
Limited energy on sensor nodes.
Operation mode Telos node
Standby 5.1 µA
MCU Active 1.8 mA
MCU + Radio RX 21.8 mA
MCU + Radio TX (0dBm) 19.5 mA
The radio is the most energy consuming module.
95% of energy consumption in typical data collection tasks [Madden, 2003].
If run continuoulsy with the radio, the lifetime is about 5 days.

6

Preliminaries Adaptive Model Selection Adaptive Model Selection
Wireless Sensor Networks
Challenges in environmental monitoring:
Long-running applications (months or years),
Limited energy on sensor nodes.
Operation mode Telos node
Standby 5.1 µA
MCU Active 1.8 mA
MCU + Radio RX 21.8 mA
MCU + Radio TX (0dBm) 19.5 mA
The radio is the most energy consuming module.
95% of energy consumption in typical data collection tasks [Madden, 2003].
If run continuoulsy with the radio, the lifetime is about 5 days.

7

Preliminaries Adaptive Model Selection Adaptive Model Selection
Supervised learning
Goal:
Using examples, find relationships in data by means of
prediction models hθ (parametric functions).
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80 100 120
020406080100120
x
y
Time
Measurement
t
si[t]
Model
si[t] = θt
Training
examples

8

Preliminaries Adaptive Model Selection Adaptive Model Selection
Machine learning
Modeling data with parametric functions
Let S = {1, 2, . . . , S} be the set of S sensor nodes.
Let t ∈ N denote time instants, or epochs.
Let si [t] be the measurement of sensor i ∈ S at epoch t.
A model is a parametric function
hθ : Rn
→ R
x → ˆsi [t] = hθ(x)
hθ : Model with parameter θ ∈ Rp
x ∈ Rn: Input.
ˆsi [t] ∈ R: approximation to si [t]
Temporal models, e.g., ˆsi [t] = θ1si [t − 1] + θ2si [t − 2]
x = (si [t − 1], si [t − 2]): past measurements of sensor i are
used to model si [t].
Spatial models, e.g., ˆsi [t] = θ1sj [t] + θ2sk [t]
x = (sj [t], sk [t]), j, k ∈ S: measurements of sensors j and k are
used to model measurements of sensor i.

9

Preliminaries Adaptive Model Selection Adaptive Model Selection
Machine learning
Modeling data with parametric functions
Let S = {1, 2, . . . , S} be the set of S sensor nodes.
Let t ∈ N denote time instants, or epochs.
Let si [t] be the measurement of sensor i ∈ S at epoch t.
A model is a parametric function
hθ : Rn
→ R
x → ˆsi [t] = hθ(x)
hθ : Model with parameter θ ∈ Rp
x ∈ Rn: Input.
ˆsi [t] ∈ R: approximation to si [t]
Temporal models, e.g., ˆsi [t] = θ1si [t − 1] + θ2si [t − 2]
x = (si [t − 1], si [t − 2]): past measurements of sensor i are
used to model si [t].
Spatial models, e.g., ˆsi [t] = θ1sj [t] + θ2sk [t]
x = (sj [t], sk [t]), j, k ∈ S: measurements of sensors j and k are
used to model measurements of sensor i.

10

Preliminaries Adaptive Model Selection Adaptive Model Selection
Machine learning
Modeling data with parametric functions
Let S = {1, 2, . . . , S} be the set of S sensor nodes.
Let t ∈ N denote time instants, or epochs.
Let si [t] be the measurement of sensor i ∈ S at epoch t.
A model is a parametric function
hθ : Rn
→ R
x → ˆsi [t] = hθ(x)
hθ : Model with parameter θ ∈ Rp
x ∈ Rn: Input.
ˆsi [t] ∈ R: approximation to si [t]
Temporal models, e.g., ˆsi [t] = θ1si [t − 1] + θ2si [t − 2]
x = (si [t − 1], si [t − 2]): past measurements of sensor i are
used to model si [t].
Spatial models, e.g., ˆsi [t] = θ1sj [t] + θ2sk [t]
x = (sj [t], sk [t]), j, k ∈ S: measurements of sensors j and k are
used to model measurements of sensor i.

11

Preliminaries Adaptive Model Selection Adaptive Model Selection
Learning
Learning procedure
The model parameters are obtained by a learning procedure,
based on a training set of N examples. A loss function L(y, ˆy)
is used to minimize the model error.
Target function
Prediction model
Training set
output ˆy
output yinput x
error L(y, ˆy)
Learning procedure
Loss function: Quadratic, Zero/One, . . .
Learning procedure: Linear/nonlinear regression, K nearest
neighbors, SVM, . . .

12

Preliminaries Adaptive Model Selection Adaptive Model Selection
Learning
Learning procedure
The model parameters are obtained by a learning procedure,
based on a training set of N examples. A loss function L(y, ˆy)
is used to minimize the model error.
Target function
Prediction model
Training set
output ˆy
output yinput x
error L(y, ˆy)
Learning procedure
Loss function: Quadratic, Zero/One, . . .
Learning procedure: Linear/nonlinear regression, K nearest
neighbors, SVM, . . .

13

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection (AMS)

14

Preliminaries Adaptive Model Selection Adaptive Model Selection
State of the art: Temporal replicated models
Overview
Base
station
Wireless node
Model Copy of modelhθ hθ
Models are used to predict sensors’ measurements over time.
A user defined threshold determines when a sensor node
updates the model.
Constant model [Olston et al., 2001]
ˆsi [t] = si [t − 1]
Most simple: no parameter to compute

15

Preliminaries Adaptive Model Selection Adaptive Model Selection
Temporal replicated models
Overview
Temperature measurements, Solbosch greenhouse.
160 170 180 190 200 210 220
30323436384042
Accuracy: 2°C
Constant model
Time instants
Temperature(°C)
q q q q q
Sensor node
Base station
5 updates instead of 58
→ more than 90% of communication savings.

16

Preliminaries Adaptive Model Selection Adaptive Model Selection
Temporal replicated models
From simple to complex model
Constant model
ˆsi [t] = si [t − 1]
Most simple
No parameter to compute
Not complex
Autoregressive model AR(p)
ˆsi [t] = θ1si [t−1]+. . .+θpsi [t−p]
Regression
θ = (XT X)−1XT Y
using N past observations
Least Mean Square
Provides a way to compute
θ recursively with µ as the
step size
0 5 10 15 20
202530354045
Accuracy: 2°C
Constant model
Time (Hour)
Temperature(°C)
q qq qqq qq q q qq q q q q q q q q q
0 5 10 15 20
202530354045
Accuracy: 2°C
AR(2)
Time (Hour)
Temperature(°C)
q qq q q qqq q qq q q q q q q

17

Preliminaries Adaptive Model Selection Adaptive Model Selection
Temporal replicated models
From simple to complex model
Constant model
ˆsi [t] = si [t − 1]
Most simple
No parameter to compute
Not complex
Autoregressive model AR(p)
ˆsi [t] = θ1si [t−1]+. . .+θpsi [t−p]
Regression
θ = (XT X)−1XT Y
using N past observations
Least Mean Square
Provides a way to compute
θ recursively with µ as the
step size
0 5 10 15 20
202530354045
Accuracy: 2°C
Constant model
Time (Hour)
Temperature(°C)
q qq qqq qq q q qq q q q q q q q q q
0 5 10 15 20
202530354045
Accuracy: 2°C
AR(2)
Time (Hour)
Temperature(°C)
q qq q q qqq q qq q q q q q q

18

Preliminaries Adaptive Model Selection Adaptive Model Selection
State of the art: Temporal replicated models
Pros and cons
Pros:
Guarantee the observer with accuracy.
Simple or complex models can be used (from constant model
[Olston, 2001] to autoregressive models [Santini et al., Tulone
et al., 2006]).
Cons:
In most cases, no a priori information is available on the
measurements. Which model to choose a priori?
The metric (update rate) does not consider the number of
parameters of the models.

19

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Motivation
Tradeoff: More complex models better predict measurements,
but have a higher number of parameters.
Model complexity
Metric
Communication costs
Model error
AR(p) : ˆsi[t] =
p
j=1
θjsi[t − j]

20

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Collection of models
With Adaptive Model Selection (AMS), a collection of K
models {hk}, 1 ≤ k ≤ K, of increasing complexity are run by
the node. Wk: metric estimating the communication savings.
h1 h2 h3 h4
Model complexity
(W1) (W2) (W3) (W4)

21

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Metric to assess communication savings
Metric suggested: the weighted update rate
Wk[t] = CkUk[t]
Update rate Uk [t]: percentage of updates for model k at
epoch t ([Olston, 2001, Jain et al., 2004, Santini et al., Tulone
et al., 2006]).
Model cost Ck : takes into account the number of parameters
of the k-th model.
Ck = P
P−D+1
P: Size of the packet.
D: Size of the data load.
→ P − D is the packet overhead
SYNC Packet Address Message Group Data . . . Data CRC SYNC
BYTE Type Type ID Length BYTE
1 2 3 5 6 7 . . . Size D P-2 P

22

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Model selection
When an update is required, the model hk with
k = argminkWk[t] is sent to the base station.
Assuming stationarity in the data, the confidence in each
estimated Wk[t] increases with t.
Running poorly performing models is detrimental to energy
consumption.
Racing [Maron, 1997]: Model selection technique based on
the Hoeffding bound, which allows to discard poorly
performing models.

23

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Model selection
When an update is required, the model hk with
k = argminkWk[t] is sent to the base station.
Assuming stationarity in the data, the confidence in each
estimated Wk[t] increases with t.
Running poorly performing models is detrimental to energy
consumption.
Racing [Maron, 1997]: Model selection technique based on
the Hoeffding bound, which allows to discard poorly
performing models.

24

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Hoeffding bound
Let x be a random variable with range R. Let µ be its mean.
Let µ[t] be an estimate of µ using t samples of x.
Given a confidence 1 − δ, the Hoeffding bound states that
P(|µ − µ[t]| < ∆) > 1 − δ
with ∆ = R ln 1/δ
2t [Hoeffding, 1963].
In AMS, the random variable considered is the model
performance Wk. Wk[t] is the estimate of Wk after t epochs.
We have Wk[t] = CkU[t]. The range of Wk[t] is R = 100Ck
as U[t] ≤ 100.

25

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Hoeffding bound
Let x be a random variable with range R. Let µ be its mean.
Let µ[t] be an estimate of µ using t samples of x.
Given a confidence 1 − δ, the Hoeffding bound states that
P(|µ − µ[t]| < ∆) > 1 − δ
with ∆ = R ln 1/δ
2t [Hoeffding, 1963].
In AMS, the random variable considered is the model
performance Wk. Wk[t] is the estimate of Wk after t epochs.
We have Wk[t] = CkU[t]. The range of Wk[t] is R = 100Ck
as U[t] ≤ 100.

26

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Racing
Best model: Wk[t] for which k = arg minkWk[t]. Upper
bound is Wk[t] + 100Ck
ln 1/δ
2t .
If a model hk has
Wk [t] − 100Ck
ln 1/δ
2t
> Wk[t] + 100Ck
ln 1/δ
2t
then it can be discarded.
Model type
Weightedupdaterate
h1 h2 h3 h4 h5 h6
Upper bound
for error
Estimated error
for
h3
h4
The test used is Wk [t] − Wk[t] > 100(Ck + Ck ) ln 1/δ
2t
Using racing, models h1 and h6 are discarded.

27

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Racing
Model type
Weightedupdaterate
h1 h2 h3 h4 h5 h6
Upper bound
for error
Estimated error
for
h3
h4
The test used is Wk [t] − Wk[t] > 100(Ck + Ck ) ln 1/δ
2t
Using racing, models h1 and h6 are discarded.

28

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Experimental evaluation
14 time series, various types of measured physical quantities.
Data set Sensed quantity Sampling period Duration Number of samples
S Heater temperature 3 seconds 6h15 3000
I Light light 5 minutes 8 days 1584
M Hum humidity 10 minutes 30 days 4320
M Temp temperature 10 minutes 30 days 4320
NDBC WD wind direction 1 hour 1 year 7564
NDBC WSPD wind speed 1 hour 1 year 7564
NDBC DPD dominant wave period 1 hour 1 year 7562
NDBC AVP average wave period 1 hour 1 year 8639
NDBC BAR air pressure 1 hour 1 year 8639
NDBC ATMP air temperature 1 hour 1 year 8639
NDBC WTMP water temperature 1 hour 1 year 8734
NDBC DEWP dewpoint temperature 1 hour 1 year 8734
NDBC GST gust speed 1 hour 1 year 8710
NDBC WVHT wave height 1 hour 1 year 8723
Error threshold is set to 0.01r where r is the range of the
measurements.
AMS is run with k = 6 models: the constant model (CM) and
autoregressive models AR(p) with p ranging from 1 to 5.

29

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Experimental evaluation
Assuming a packet overhead of 24 bytes, the model cost is set
to CCM = 1 for the CM, and to CAR(p) = 24+2p
24+1 for an AR(p).
CM AR1 AR2 AR3 AR4 AR5 AMS
S Heater 74 78 68 70 76 81 AR2
I Light 38 42 44 48 51 53 CM
M Hum 53 55 55 60 62 66 CM
M Temp 48 50 50 54 56 60 CM
NDBC DPD 65 89 89 95 102 109 CM
NDBC AWP 72 75 81 88 93 99 CM
NDBC BAR 51 52 44 47 49 50 AR2
NDBC ATMP 39 41 40 43 46 49 CM
NDBC WTMP 27 28 23 25 27 28 AR2
NDBC DEWP 57 54 58 62 67 71 AR1
NDBC WSPD 74 87 92 99 106 113 CM
NDBC WD 85 84 91 98 104 111 AR1
NDBC GST 80 84 90 96 103 110 CM
NDBC WVHT 58 58 63 67 71 76 CM
Bold numbers report significantly better update rates
(Hoeffding bound, δ = 0.05).
For all time series, the AMS selects the best model.

30

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Experimental evaluation
Number of models remaining over time:
0 200 400 600 800 1000
02468101214
timeChange[1, ]
rep(1,7)
6 4 3 2 1
6 1
6 5 4 3 1
6 5 3 1
6 2 1
6 3 2 1
6 5 4 3 2 1
6 5 4 1
6 5 1
6 4 3 2 1
6 1
6 3 2 1
6 3 2 1
6 4 3 2 1
Time instants
S Heater
I Light
M Hum
M Temp
NDBC DPD
NDBC AWP
NDBC BAR
NDBC ATMP
NDBC WTMP
NDBC DEWP
NDBC WSPD
NDBC WD
NDBC GST
NDBC WVHT
The speed of convergence of the racing algorithm depends on
the time series, and is in most cases reasonably fast.

31

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Conclusions
In summary, Adaptive Model Selection
Takes into account the cost of sending model parameters,
Allows sensor nodes to determine autonomously the model
which best fits their measurements,
Provides a statistically sound selection mechanism to discard
poorly performing models,
Gave in experimental results about 45% of communication
savings on average,
Was implemented in TinyOS, the reference operating system.

32

Preliminaries Adaptive Model Selection Adaptive Model Selection
Additional contributions
Several deployments at the ULB.
Data sets (and code) available at www.ulb.ac.be/di/labo.
Microclimate monitoring - Solbosch greenhouses.
18 sensors in three greenhouses. Several experiments.
Data collected: Temperature, humidity and light.
Sampling interval: 5 minutes.
Experimental setups monitoring - Unit of Social Ecology.
18 sensors in three experimental labs, running for 5 days.
Data collected: Temperature, humidity and light.
Sampling interval: 5 minutes.
PIMAN project (R´egion Bruxelles Capitale - 2007/2008):
Goal: localize an operator in an industrial environment.
Techniques: Triangulation, multidimensional scaling, Kalman filters.
Several deployments (up to 48 sensors).

33

Preliminaries Adaptive Model Selection Adaptive Model Selection
Thank you for your attention
Questions?

34

Preliminaries Adaptive Model Selection Adaptive Model Selection
Wireless Sensor Networks
Data centric networking
A sensor network can be seen as a distributed database.
SQL can be used as the language to interact with the network:
SELECT temperature FROM sensors
WHERE location=[0,0, 15, 35]
DURATION=00:00:00,10:00:00
EPOCH DURATION 30s
1
5
7
6
3
2 4
Base
station
1
5
7
6
3
2 4
Base
station
1
5
7
6
3
2 4
Base
station
The query is broadcasted from the base station to the
network. Sensors involved in the query establish a routing
structure towards the base station.

35

Preliminaries Adaptive Model Selection Adaptive Model Selection
Wireless Sensor Networks
Challenges in environmental monitoring:
Long-running applications (months or years),
Limited energy on sensor nodes.
ess Sensor Networks
allenges in environmental monitoring:
Long-running applications (months or years),
Limited energy on sensor nodes.
Operation mode Telos node
Standby 5.1 µA
MCU Active 1.8 mA
MCU + Radio RX 21.8 mA
MCU + Radio TX (0dBm) 19.5 mA
he radio is the most energy consuming module.
% of energy consumption in typical data collection tasks [Madden, 2003].
un continuoulsy with the radio, the lifetime is about 5 days.
x10
The radio is the most energy consuming module.
95% of energy consumption in typical data collection tasks [Madden, 2003].
If run continuoulsy with the radio, the lifetime is about 5 days.

36

Preliminaries Adaptive Model Selection Adaptive Model Selection
Wireless Sensor Networks
Challenges in environmental monitoring:
Long-running applications (months or years),
Limited energy on sensor nodes.
ess Sensor Networks
allenges in environmental monitoring:
Long-running applications (months or years),
Limited energy on sensor nodes.
Operation mode Telos node
Standby 5.1 µA
MCU Active 1.8 mA
MCU + Radio RX 21.8 mA
MCU + Radio TX (0dBm) 19.5 mA
he radio is the most energy consuming module.
% of energy consumption in typical data collection tasks [Madden, 2003].
un continuoulsy with the radio, the lifetime is about 5 days.
x10
The radio is the most energy consuming module.
95% of energy consumption in typical data collection tasks [Madden, 2003].
If run continuoulsy with the radio, the lifetime is about 5 days.

37

Preliminaries Adaptive Model Selection Adaptive Model Selection
Machine learning
Overview
Goal: Uncover structure and relationships in a set of
observations, by means of models (mathematical functions).
!
!
!
!
! !
!
!
!
!
!
!
! !
!
! !
!
!
!
!
!
0 1 2 3 4 5 6 7
01234567
x[1:22]
y2[1:22]
!
!
!
!
! !
!
!
!
!
!
!
! !
!
! !
!
!
!
!
!
0 1 2 3 4 5 6 7
01234567
x[1:22]
y2[1:22]
Variable 1 ( )x Variable 1 ( )x
Variable2()y
Variable2()y
Learning
Procedure
Observations
y = h(x)Model
A learning procedure is used to find the model.

38

Preliminaries Adaptive Model Selection Adaptive Model Selection
Machine learning
Learning methodology
Unknown relationship
Observations
Learning procedure
Model
Input Output
Output
x y
ˆy
L(y, ˆy)Error
Input x can have several dimensions (Image classification)
Different models exist (linear models, neural networks,
decision trees, ...) with specific learning procedures.

39

Preliminaries Adaptive Model Selection Adaptive Model Selection
Machine learning
Modeling sensor data
Temporal model:
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 20 40 60 80 100 120
020406080100120
x
y
Time
Measurement
t
si[t]
Model
si[t] = θt
Training
examples
Input: Time.
Output: The measurement si [t] of a sensor i at time t.
Model: si [t] = θt.
The model approximates the set of measurements with just one
parameter θ.

40

Preliminaries Adaptive Model Selection Adaptive Model Selection
Learning with wireless sensor data
Thesis statement
Machine learning techniques can be used to reduce
communication by approximating sensor data with models.
→ Instead of sending all the measurements, only the parameters of
the models are transmitted.
Effective approach as sensor data are
temporally and spatially related (correlations)
Noisy: exact measurements rarely needed.

41

Preliminaries Adaptive Model Selection Adaptive Model Selection
Contribution I:
Adaptive Model Selection (AMS)
Temporal modeling

42

Preliminaries Adaptive Model Selection Adaptive Model Selection
Replicated models
Overview
Recall: In environmental monitoring, a sensor sends its
measurements periodically.
Measurements s[t] are sent at every time t.
Base
station
Wireless node
s[t]
s[t]
t
s[t]
t
Replicated models:
Models h are sent instead of the measurements.
Base
station
Wireless node
s[t]
t t
h
ˆs[t]

43

Preliminaries Adaptive Model Selection Adaptive Model Selection
Replicated models
Overview
Models computed by the sensor node
→ The node can compare the model prediction with the true
measurements:
A new model is sent if |s[t] − ˆs[t]| >
is user-defined, and application dependent.
Simple learning procedure must be used. Most simple model:
Constant model [Olston et al., 2001]
ˆsi [t] = si [t − 1]
Simply: The next measurement is the same as the previous
one
no parameter to compute

44

Preliminaries Adaptive Model Selection Adaptive Model Selection
Replicated models
Constant model
Temperature measurements, Solbosch greenhouse. = 2◦C.
160 170 180 190 200 210 220
30323436384042
Accuracy: 2°C
Constant model
Time instants
Temperature(°C)
q q q q q
Sensor node
Base station
5 updates instead of 58
→ more than 90% of communication savings.

45

Preliminaries Adaptive Model Selection Adaptive Model Selection
Replicated models
Autoregressive models
More complex models can be used: autoregressive models AR(p)
[Santini et al., Tulone et al., 2006].
ˆs[t] = θ1s[t − 1] + . . . + θps[t − p]
0 5 10 15 20
202530354045
Accuracy: 2°C
AR(2)
Time (Hour)
Temperature(°C)
● ●● ● ● ●●● ● ●● ● ● ● ● ● ●
Time (hours)
Temperature(°C)
An AR(2) reduces the number
of updates by 6 percents
in comparison to
the constant model.

46

Preliminaries Adaptive Model Selection Adaptive Model Selection
Replicated models
Pros and cons
Pros:
Guarantee the observer with accuracy.
Simple or complex models can be used.
Cons:
In most cases, no a priori information is available on the
measurements. Which model to choose a priori?
The metric (update rate) does not consider the number of
parameters of the models.

47

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Motivation
Tradeoff: More complex models better predict measurements,
but have a higher number of parameters.
Model complexity
Metric
Communication costs
Model error
AR(p) : ˆsi[t] =
p
j=1
θjsi[t − j]

48

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Collection of models
A collection of K models {hk}, 1 ≤ k ≤ K, of increasing
complexity are run by the node.
Base
station
Wireless node
s[t]
t t
{h1, h2,
. . . , hK}
ˆs[t]
h2
Wk: new metric estimating the communication costs.
When an update is needed, the model with the lowest Wk is
sent.

49

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Metric to assess communication savings
Wk: the weighted update rate
Wk[t] = CkUk[t]
Update rate Uk [t]: percentage of updates for model k at
epoch t ([Olston, 2001, Jain et al., 2004, Santini et al., Tulone
et al., 2006]).
Model cost Ck : takes into account the number of parameters
of the k-th model.
Ck = P
P−D+1
P: Size of the packet.
D: Size of the data load.
→ P − D is the packet overhead
SYNC Packet Address Message Group Data . . . Data CRC SYNC
BYTE Type Type ID Length BYTE
1 2 3 5 6 7 . . . Size D P-2 P

50

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Model selection
Model performances Wk[t] are estimated over time.
When data collection starts, no idea which model is best.
Running poorly performing models is detrimental to energy
consumption.
Racing [Maron, 1997]: Model selection technique based on
the Hoeffding bound, which allows to select the best
performing model.

51

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Model selection
Model performances Wk[t] are estimated over time.
When data collection starts, no idea which model is best.
Running poorly performing models is detrimental to energy
consumption.
Racing [Maron, 1997]: Model selection technique based on
the Hoeffding bound, which allows to select the best
performing model.

52

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Racing
Model type
Weightedupdaterate
h1 h2 h3 h4 h5 h6
W1[t]
Upper bound
forWk[t]
Lower bound
for W1[t]
W1[t]
W2[t]
W3[t]
W4[t]
W5[t]
W6[t]
At first, all models are in competition.

53

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Racing
Model type
Weightedupdaterate
h1 h2 h3 h4 h5 h6
W1[t]
Upper bound
for
Wk[t]
W1[t]
W2[t] W3[t]
W4[t]
W5[t]
W6[t]
As time passes, model h1 statistically outperforms h6.

54

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Racing
Model type
Weightedupdaterate
h1 h2 h3 h4 h5 h6
Upper bound
for
Wk[t]
W1[t]
W2[t]
W3[t]
W4[t]
W5[t]
W3[t]
h3 then statistically outperforms h5.

55

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Racing
Model type
Weightedupdaterate
h1 h2 h3 h4 h5 h6
Upper bound
for
Wk[t]
W1[t]
W2[t]
W3[t]
W4[t]
W3[t]
h3 finally is selected as the best one.

56

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Experimental evaluation
14 time series, various types of measured physical quantities.
Data set Sensed quantity Sampling period Duration Number of samples
S Heater temperature 3 seconds 6h15 3000
I Light light 5 minutes 8 days 1584
M Hum humidity 10 minutes 30 days 4320
M Temp temperature 10 minutes 30 days 4320
NDBC WD wind direction 1 hour 1 year 7564
NDBC WSPD wind speed 1 hour 1 year 7564
NDBC DPD dominant wave period 1 hour 1 year 7562
NDBC AVP average wave period 1 hour 1 year 8639
NDBC BAR air pressure 1 hour 1 year 8639
NDBC ATMP air temperature 1 hour 1 year 8639
NDBC WTMP water temperature 1 hour 1 year 8734
NDBC DEWP dewpoint temperature 1 hour 1 year 8734
NDBC GST gust speed 1 hour 1 year 8710
NDBC WVHT wave height 1 hour 1 year 8723
Error threshold is set to 0.01r where r is the range of the
measurements.
AMS is run with k = 6 models: the constant model (CM) and
autoregressive models AR(p) with p ranging from 1 to 5.

57

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Experimental evaluation
CM AR1 AR2 AR3 AR4 AR5 AMS
S Heater 74 78 68 70 76 81 AR2
I Light 38 42 44 48 51 53 CM
M Hum 53 55 55 60 62 66 CM
M Temp 48 50 50 54 56 60 CM
NDBC DPD 65 89 89 95 102 109 CM
NDBC AWP 72 75 81 88 93 99 CM
NDBC BAR 51 52 44 47 49 50 AR2
NDBC ATMP 39 41 40 43 46 49 CM
NDBC WTMP 27 28 23 25 27 28 AR2
NDBC DEWP 57 54 58 62 67 71 AR1
NDBC WSPD 74 87 92 99 106 113 CM
NDBC WD 85 84 91 98 104 111 AR1
NDBC GST 80 84 90 96 103 110 CM
NDBC WVHT 58 58 63 67 71 76 CM
Bold numbers report significantly better update rates
(Hoeffding bound, δ = 0.05).
For all time series, the AMS selects the best model.

58

Preliminaries Adaptive Model Selection Adaptive Model Selection
Adaptive Model Selection
Conclusions
In summary, Adaptive Model Selection
Takes into account the cost of sending model parameters,
Allows sensor nodes to determine autonomously the model
which best fits their measurements,
Provides a statistically sound selection mechanism to discard
poorly performing models,
Gave in experimental results about 45% of communication
savings on average,
Energy for computation is not a problem (negligible),
Was implemented in TinyOS, the reference operating system.

More Related Content

Adaptive model selection in Wireless Sensor Networks

  • 1. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Yann-A¨el Le Borgne, Gianluca Bontempi Machine Learning Group Department of Computer Science Universit´e Libre de Bruxelles May 15th, 2009
  • 2. Preliminaries Adaptive Model Selection Adaptive Model Selection Wireless Sensor Networks Wireless sensors: latest trend of Moore’s law (1965) “The number of transistors that can be placed inexpensively on an integrated circuit doubles every two years.” 1950 1990 2000 2010 Computing devices get Smaller Cheaper → Enable new kinds of interactions with our world
  • 3. Preliminaries Adaptive Model Selection Adaptive Model Selection Wireless Sensor Networks Wireless sensors Sensor nodes can collect, process and communicate data [Warneke et al., 2001; Akyildiz et al., 2002] TMote Sky Deputy dust Sensors: Light, temperature, humidity, pressure, acceleration, sound, . . . Radio: ∼ 10s kbps, ∼ 10s meters Microprocessor: A few MHz Memory: ∼ 10s KB
  • 4. Preliminaries Adaptive Model Selection Adaptive Model Selection Wireless Sensor Networks Environmental monitoring and periodic data collection Base station Wireless node Internet Sensor network Base station separate calibration procedures to test the system prior to placing it in the field, and then spent a day in the forest with ropes, harnesses, and a notebook. We decided on the following envelope for our deployment: Time: One month during the early summer, sampling all sensors once every 5 minutes. The early summer con- tains the most dynamic microclimatic variation. We decided that sampling every 5 minutes would be suffi- cient to capture that variation. Vertical Distance: 15m from ground level to 70m from ground level, with roughly a 2-meter spacing between nodes. This spatial density ensured that we could cap- ture gradients in enough detail to interpolate accu- rately. The envelope began at 15m because most of the foliage was in the upper region of the tree. Angular Location: The west side of the tree. The west side had a thicker canopy and provided the most buffer- ing against direct environmental effects. Radial Distance: 0.1-1.0m from the trunk. The nodes were placed very close to the trunk to ensure that we were capturing the microclimatic trends that affected the tree directly, and not the broader climate. Figure 1 shows the final placement of each mote in the tree. We also placed several nodes outside of our angular and radial envelope in order to monitor the microclimate in the immediate vicinity of other biological sensing equipment that had previously been installed. Figure 1: The placement of nodes within the tree 4.1 Hardware and Network Architecture The sensor node platform was a Mica2Dot, a repackaged Mica2 mote produced by Crossbow, with a 1 inch diameter form factor. The mote used an Atmel ATmega128 micro- controller running at 4 MHz, a 433 MHz radio from Chip- con operating at 40Kbps, and 512KB of flash memory. The mote was connected to digital sensors using I2C and SPI serial protocols and to analog sensors using the on-board ADC. The choice of measured parameters was driven by the bio- logical requirements. We measured traditional climate vari- ables – temperature, humidity, and light levels. Tempera- ture and relative humidity feed directly into transpiration models for redwood forests. Photosynthetically active radi- ation (PAR, wavelength from 350 to 700 nm) provides infor- mation about energy available for photosynthesis and tells us about drivers for the carbon balance in the forest. We measure both incident (direct) and reflected (ambient) levels of PAR. Incident measurements provide insight into the en- ergy available for photosynthesis, while the ratio of reflected to incident PAR allows for eventual validation of satellite remote sensing measurements of land surface reflectance. The Sensirion SHT11 digital sensor provided temperature (± 0.5◦ C) and humidity (± 3.5%) measurements. The in- cident and reflected PAR measurements were collected by two Hamamatsu S1087 photodiodes interfaced to the 10-bit ADC on Mica2Dot. The platform also included a TAOS TSL2550 sensor to measure total solar radiation (300nm - 1000nm), and an Intersema MS5534A to measure barometric pressure, but we chose not to use them in our deployment. During cali- bration, we found that the TSR sensor was overly sensitive, and would not produce useful information in direct sunlight. Because TSR and PAR would have told roughly the same story, and because PAR was more useful from the biology viewpoint, we decided not to gather data on total solar ra- diation. As for the pressure sensor, barometric pressure is simply too diffuse a phenomenon to show appreciable differ- ences over the height of a single redwood tree. A standard pressure gradient would exist as a direct function of height, but any pressure changes due to weather would affect the entire tree equally. Barometric pressure sensing should be useful in future large-scale climate studies. The package for such a deployment needs to protect the electronics from the weather while safely exposing the sen- sors. Our chosen sensing modalities place specific require- ments on the package. Standardized temperature and hu- midity sensing should be performed in a shaded area with adequate airflow, implying that the enclosure must provide such a space while absorbing little radiated heat. The out- put of the sensors that measure direct radiation is dependent on the sensor orientation, so the enclosure must expose these sensors and level their sensing plane. The sensors measuring ambient levels of PAR must be shaded but need a relatively wide field of view. The package designed for this deployment is shown in Fig- ure 2. The mote, the battery, and two sensor boards fit in- side the sealed cylindrical enclosure. The enclosure is milled from white HDPE, and reflects most of the radiated heat. The endcaps of the cylinder form two sensing surfaces – one captures direct radiation, the other captures all other mea- surements. The white “skirt” provides extra shade, protec- 53 One-to-one applications Many-to-one applications Data is sent by nodes periodically to a base station. Applications: Medical, interactive arts, ecology, industry, disaster prevention.
  • 5. Preliminaries Adaptive Model Selection Adaptive Model Selection Wireless Sensor Networks Challenges in environmental monitoring: Long-running applications (months or years), Limited energy on sensor nodes. Operation mode Telos node Standby 5.1 µA MCU Active 1.8 mA MCU + Radio RX 21.8 mA MCU + Radio TX (0dBm) 19.5 mA The radio is the most energy consuming module. 95% of energy consumption in typical data collection tasks [Madden, 2003]. If run continuoulsy with the radio, the lifetime is about 5 days.
  • 6. Preliminaries Adaptive Model Selection Adaptive Model Selection Wireless Sensor Networks Challenges in environmental monitoring: Long-running applications (months or years), Limited energy on sensor nodes. Operation mode Telos node Standby 5.1 µA MCU Active 1.8 mA MCU + Radio RX 21.8 mA MCU + Radio TX (0dBm) 19.5 mA The radio is the most energy consuming module. 95% of energy consumption in typical data collection tasks [Madden, 2003]. If run continuoulsy with the radio, the lifetime is about 5 days.
  • 7. Preliminaries Adaptive Model Selection Adaptive Model Selection Supervised learning Goal: Using examples, find relationships in data by means of prediction models hθ (parametric functions). ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 20 40 60 80 100 120 020406080100120 x y Time Measurement t si[t] Model si[t] = θt Training examples
  • 8. Preliminaries Adaptive Model Selection Adaptive Model Selection Machine learning Modeling data with parametric functions Let S = {1, 2, . . . , S} be the set of S sensor nodes. Let t ∈ N denote time instants, or epochs. Let si [t] be the measurement of sensor i ∈ S at epoch t. A model is a parametric function hθ : Rn → R x → ˆsi [t] = hθ(x) hθ : Model with parameter θ ∈ Rp x ∈ Rn: Input. ˆsi [t] ∈ R: approximation to si [t] Temporal models, e.g., ˆsi [t] = θ1si [t − 1] + θ2si [t − 2] x = (si [t − 1], si [t − 2]): past measurements of sensor i are used to model si [t]. Spatial models, e.g., ˆsi [t] = θ1sj [t] + θ2sk [t] x = (sj [t], sk [t]), j, k ∈ S: measurements of sensors j and k are used to model measurements of sensor i.
  • 9. Preliminaries Adaptive Model Selection Adaptive Model Selection Machine learning Modeling data with parametric functions Let S = {1, 2, . . . , S} be the set of S sensor nodes. Let t ∈ N denote time instants, or epochs. Let si [t] be the measurement of sensor i ∈ S at epoch t. A model is a parametric function hθ : Rn → R x → ˆsi [t] = hθ(x) hθ : Model with parameter θ ∈ Rp x ∈ Rn: Input. ˆsi [t] ∈ R: approximation to si [t] Temporal models, e.g., ˆsi [t] = θ1si [t − 1] + θ2si [t − 2] x = (si [t − 1], si [t − 2]): past measurements of sensor i are used to model si [t]. Spatial models, e.g., ˆsi [t] = θ1sj [t] + θ2sk [t] x = (sj [t], sk [t]), j, k ∈ S: measurements of sensors j and k are used to model measurements of sensor i.
  • 10. Preliminaries Adaptive Model Selection Adaptive Model Selection Machine learning Modeling data with parametric functions Let S = {1, 2, . . . , S} be the set of S sensor nodes. Let t ∈ N denote time instants, or epochs. Let si [t] be the measurement of sensor i ∈ S at epoch t. A model is a parametric function hθ : Rn → R x → ˆsi [t] = hθ(x) hθ : Model with parameter θ ∈ Rp x ∈ Rn: Input. ˆsi [t] ∈ R: approximation to si [t] Temporal models, e.g., ˆsi [t] = θ1si [t − 1] + θ2si [t − 2] x = (si [t − 1], si [t − 2]): past measurements of sensor i are used to model si [t]. Spatial models, e.g., ˆsi [t] = θ1sj [t] + θ2sk [t] x = (sj [t], sk [t]), j, k ∈ S: measurements of sensors j and k are used to model measurements of sensor i.
  • 11. Preliminaries Adaptive Model Selection Adaptive Model Selection Learning Learning procedure The model parameters are obtained by a learning procedure, based on a training set of N examples. A loss function L(y, ˆy) is used to minimize the model error. Target function Prediction model Training set output ˆy output yinput x error L(y, ˆy) Learning procedure Loss function: Quadratic, Zero/One, . . . Learning procedure: Linear/nonlinear regression, K nearest neighbors, SVM, . . .
  • 12. Preliminaries Adaptive Model Selection Adaptive Model Selection Learning Learning procedure The model parameters are obtained by a learning procedure, based on a training set of N examples. A loss function L(y, ˆy) is used to minimize the model error. Target function Prediction model Training set output ˆy output yinput x error L(y, ˆy) Learning procedure Loss function: Quadratic, Zero/One, . . . Learning procedure: Linear/nonlinear regression, K nearest neighbors, SVM, . . .
  • 13. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection (AMS)
  • 14. Preliminaries Adaptive Model Selection Adaptive Model Selection State of the art: Temporal replicated models Overview Base station Wireless node Model Copy of modelhθ hθ Models are used to predict sensors’ measurements over time. A user defined threshold determines when a sensor node updates the model. Constant model [Olston et al., 2001] ˆsi [t] = si [t − 1] Most simple: no parameter to compute
  • 15. Preliminaries Adaptive Model Selection Adaptive Model Selection Temporal replicated models Overview Temperature measurements, Solbosch greenhouse. 160 170 180 190 200 210 220 30323436384042 Accuracy: 2°C Constant model Time instants Temperature(°C) q q q q q Sensor node Base station 5 updates instead of 58 → more than 90% of communication savings.
  • 16. Preliminaries Adaptive Model Selection Adaptive Model Selection Temporal replicated models From simple to complex model Constant model ˆsi [t] = si [t − 1] Most simple No parameter to compute Not complex Autoregressive model AR(p) ˆsi [t] = θ1si [t−1]+. . .+θpsi [t−p] Regression θ = (XT X)−1XT Y using N past observations Least Mean Square Provides a way to compute θ recursively with µ as the step size 0 5 10 15 20 202530354045 Accuracy: 2°C Constant model Time (Hour) Temperature(°C) q qq qqq qq q q qq q q q q q q q q q 0 5 10 15 20 202530354045 Accuracy: 2°C AR(2) Time (Hour) Temperature(°C) q qq q q qqq q qq q q q q q q
  • 17. Preliminaries Adaptive Model Selection Adaptive Model Selection Temporal replicated models From simple to complex model Constant model ˆsi [t] = si [t − 1] Most simple No parameter to compute Not complex Autoregressive model AR(p) ˆsi [t] = θ1si [t−1]+. . .+θpsi [t−p] Regression θ = (XT X)−1XT Y using N past observations Least Mean Square Provides a way to compute θ recursively with µ as the step size 0 5 10 15 20 202530354045 Accuracy: 2°C Constant model Time (Hour) Temperature(°C) q qq qqq qq q q qq q q q q q q q q q 0 5 10 15 20 202530354045 Accuracy: 2°C AR(2) Time (Hour) Temperature(°C) q qq q q qqq q qq q q q q q q
  • 18. Preliminaries Adaptive Model Selection Adaptive Model Selection State of the art: Temporal replicated models Pros and cons Pros: Guarantee the observer with accuracy. Simple or complex models can be used (from constant model [Olston, 2001] to autoregressive models [Santini et al., Tulone et al., 2006]). Cons: In most cases, no a priori information is available on the measurements. Which model to choose a priori? The metric (update rate) does not consider the number of parameters of the models.
  • 19. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Motivation Tradeoff: More complex models better predict measurements, but have a higher number of parameters. Model complexity Metric Communication costs Model error AR(p) : ˆsi[t] = p j=1 θjsi[t − j]
  • 20. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Collection of models With Adaptive Model Selection (AMS), a collection of K models {hk}, 1 ≤ k ≤ K, of increasing complexity are run by the node. Wk: metric estimating the communication savings. h1 h2 h3 h4 Model complexity (W1) (W2) (W3) (W4)
  • 21. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Metric to assess communication savings Metric suggested: the weighted update rate Wk[t] = CkUk[t] Update rate Uk [t]: percentage of updates for model k at epoch t ([Olston, 2001, Jain et al., 2004, Santini et al., Tulone et al., 2006]). Model cost Ck : takes into account the number of parameters of the k-th model. Ck = P P−D+1 P: Size of the packet. D: Size of the data load. → P − D is the packet overhead SYNC Packet Address Message Group Data . . . Data CRC SYNC BYTE Type Type ID Length BYTE 1 2 3 5 6 7 . . . Size D P-2 P
  • 22. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Model selection When an update is required, the model hk with k = argminkWk[t] is sent to the base station. Assuming stationarity in the data, the confidence in each estimated Wk[t] increases with t. Running poorly performing models is detrimental to energy consumption. Racing [Maron, 1997]: Model selection technique based on the Hoeffding bound, which allows to discard poorly performing models.
  • 23. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Model selection When an update is required, the model hk with k = argminkWk[t] is sent to the base station. Assuming stationarity in the data, the confidence in each estimated Wk[t] increases with t. Running poorly performing models is detrimental to energy consumption. Racing [Maron, 1997]: Model selection technique based on the Hoeffding bound, which allows to discard poorly performing models.
  • 24. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Hoeffding bound Let x be a random variable with range R. Let µ be its mean. Let µ[t] be an estimate of µ using t samples of x. Given a confidence 1 − δ, the Hoeffding bound states that P(|µ − µ[t]| < ∆) > 1 − δ with ∆ = R ln 1/δ 2t [Hoeffding, 1963]. In AMS, the random variable considered is the model performance Wk. Wk[t] is the estimate of Wk after t epochs. We have Wk[t] = CkU[t]. The range of Wk[t] is R = 100Ck as U[t] ≤ 100.
  • 25. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Hoeffding bound Let x be a random variable with range R. Let µ be its mean. Let µ[t] be an estimate of µ using t samples of x. Given a confidence 1 − δ, the Hoeffding bound states that P(|µ − µ[t]| < ∆) > 1 − δ with ∆ = R ln 1/δ 2t [Hoeffding, 1963]. In AMS, the random variable considered is the model performance Wk. Wk[t] is the estimate of Wk after t epochs. We have Wk[t] = CkU[t]. The range of Wk[t] is R = 100Ck as U[t] ≤ 100.
  • 26. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Racing Best model: Wk[t] for which k = arg minkWk[t]. Upper bound is Wk[t] + 100Ck ln 1/δ 2t . If a model hk has Wk [t] − 100Ck ln 1/δ 2t > Wk[t] + 100Ck ln 1/δ 2t then it can be discarded. Model type Weightedupdaterate h1 h2 h3 h4 h5 h6 Upper bound for error Estimated error for h3 h4 The test used is Wk [t] − Wk[t] > 100(Ck + Ck ) ln 1/δ 2t Using racing, models h1 and h6 are discarded.
  • 27. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Racing Model type Weightedupdaterate h1 h2 h3 h4 h5 h6 Upper bound for error Estimated error for h3 h4 The test used is Wk [t] − Wk[t] > 100(Ck + Ck ) ln 1/δ 2t Using racing, models h1 and h6 are discarded.
  • 28. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Experimental evaluation 14 time series, various types of measured physical quantities. Data set Sensed quantity Sampling period Duration Number of samples S Heater temperature 3 seconds 6h15 3000 I Light light 5 minutes 8 days 1584 M Hum humidity 10 minutes 30 days 4320 M Temp temperature 10 minutes 30 days 4320 NDBC WD wind direction 1 hour 1 year 7564 NDBC WSPD wind speed 1 hour 1 year 7564 NDBC DPD dominant wave period 1 hour 1 year 7562 NDBC AVP average wave period 1 hour 1 year 8639 NDBC BAR air pressure 1 hour 1 year 8639 NDBC ATMP air temperature 1 hour 1 year 8639 NDBC WTMP water temperature 1 hour 1 year 8734 NDBC DEWP dewpoint temperature 1 hour 1 year 8734 NDBC GST gust speed 1 hour 1 year 8710 NDBC WVHT wave height 1 hour 1 year 8723 Error threshold is set to 0.01r where r is the range of the measurements. AMS is run with k = 6 models: the constant model (CM) and autoregressive models AR(p) with p ranging from 1 to 5.
  • 29. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Experimental evaluation Assuming a packet overhead of 24 bytes, the model cost is set to CCM = 1 for the CM, and to CAR(p) = 24+2p 24+1 for an AR(p). CM AR1 AR2 AR3 AR4 AR5 AMS S Heater 74 78 68 70 76 81 AR2 I Light 38 42 44 48 51 53 CM M Hum 53 55 55 60 62 66 CM M Temp 48 50 50 54 56 60 CM NDBC DPD 65 89 89 95 102 109 CM NDBC AWP 72 75 81 88 93 99 CM NDBC BAR 51 52 44 47 49 50 AR2 NDBC ATMP 39 41 40 43 46 49 CM NDBC WTMP 27 28 23 25 27 28 AR2 NDBC DEWP 57 54 58 62 67 71 AR1 NDBC WSPD 74 87 92 99 106 113 CM NDBC WD 85 84 91 98 104 111 AR1 NDBC GST 80 84 90 96 103 110 CM NDBC WVHT 58 58 63 67 71 76 CM Bold numbers report significantly better update rates (Hoeffding bound, δ = 0.05). For all time series, the AMS selects the best model.
  • 30. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Experimental evaluation Number of models remaining over time: 0 200 400 600 800 1000 02468101214 timeChange[1, ] rep(1,7) 6 4 3 2 1 6 1 6 5 4 3 1 6 5 3 1 6 2 1 6 3 2 1 6 5 4 3 2 1 6 5 4 1 6 5 1 6 4 3 2 1 6 1 6 3 2 1 6 3 2 1 6 4 3 2 1 Time instants S Heater I Light M Hum M Temp NDBC DPD NDBC AWP NDBC BAR NDBC ATMP NDBC WTMP NDBC DEWP NDBC WSPD NDBC WD NDBC GST NDBC WVHT The speed of convergence of the racing algorithm depends on the time series, and is in most cases reasonably fast.
  • 31. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Conclusions In summary, Adaptive Model Selection Takes into account the cost of sending model parameters, Allows sensor nodes to determine autonomously the model which best fits their measurements, Provides a statistically sound selection mechanism to discard poorly performing models, Gave in experimental results about 45% of communication savings on average, Was implemented in TinyOS, the reference operating system.
  • 32. Preliminaries Adaptive Model Selection Adaptive Model Selection Additional contributions Several deployments at the ULB. Data sets (and code) available at www.ulb.ac.be/di/labo. Microclimate monitoring - Solbosch greenhouses. 18 sensors in three greenhouses. Several experiments. Data collected: Temperature, humidity and light. Sampling interval: 5 minutes. Experimental setups monitoring - Unit of Social Ecology. 18 sensors in three experimental labs, running for 5 days. Data collected: Temperature, humidity and light. Sampling interval: 5 minutes. PIMAN project (R´egion Bruxelles Capitale - 2007/2008): Goal: localize an operator in an industrial environment. Techniques: Triangulation, multidimensional scaling, Kalman filters. Several deployments (up to 48 sensors).
  • 33. Preliminaries Adaptive Model Selection Adaptive Model Selection Thank you for your attention Questions?
  • 34. Preliminaries Adaptive Model Selection Adaptive Model Selection Wireless Sensor Networks Data centric networking A sensor network can be seen as a distributed database. SQL can be used as the language to interact with the network: SELECT temperature FROM sensors WHERE location=[0,0, 15, 35] DURATION=00:00:00,10:00:00 EPOCH DURATION 30s 1 5 7 6 3 2 4 Base station 1 5 7 6 3 2 4 Base station 1 5 7 6 3 2 4 Base station The query is broadcasted from the base station to the network. Sensors involved in the query establish a routing structure towards the base station.
  • 35. Preliminaries Adaptive Model Selection Adaptive Model Selection Wireless Sensor Networks Challenges in environmental monitoring: Long-running applications (months or years), Limited energy on sensor nodes. ess Sensor Networks allenges in environmental monitoring: Long-running applications (months or years), Limited energy on sensor nodes. Operation mode Telos node Standby 5.1 µA MCU Active 1.8 mA MCU + Radio RX 21.8 mA MCU + Radio TX (0dBm) 19.5 mA he radio is the most energy consuming module. % of energy consumption in typical data collection tasks [Madden, 2003]. un continuoulsy with the radio, the lifetime is about 5 days. x10 The radio is the most energy consuming module. 95% of energy consumption in typical data collection tasks [Madden, 2003]. If run continuoulsy with the radio, the lifetime is about 5 days.
  • 36. Preliminaries Adaptive Model Selection Adaptive Model Selection Wireless Sensor Networks Challenges in environmental monitoring: Long-running applications (months or years), Limited energy on sensor nodes. ess Sensor Networks allenges in environmental monitoring: Long-running applications (months or years), Limited energy on sensor nodes. Operation mode Telos node Standby 5.1 µA MCU Active 1.8 mA MCU + Radio RX 21.8 mA MCU + Radio TX (0dBm) 19.5 mA he radio is the most energy consuming module. % of energy consumption in typical data collection tasks [Madden, 2003]. un continuoulsy with the radio, the lifetime is about 5 days. x10 The radio is the most energy consuming module. 95% of energy consumption in typical data collection tasks [Madden, 2003]. If run continuoulsy with the radio, the lifetime is about 5 days.
  • 37. Preliminaries Adaptive Model Selection Adaptive Model Selection Machine learning Overview Goal: Uncover structure and relationships in a set of observations, by means of models (mathematical functions). ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 0 1 2 3 4 5 6 7 01234567 x[1:22] y2[1:22] ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 0 1 2 3 4 5 6 7 01234567 x[1:22] y2[1:22] Variable 1 ( )x Variable 1 ( )x Variable2()y Variable2()y Learning Procedure Observations y = h(x)Model A learning procedure is used to find the model.
  • 38. Preliminaries Adaptive Model Selection Adaptive Model Selection Machine learning Learning methodology Unknown relationship Observations Learning procedure Model Input Output Output x y ˆy L(y, ˆy)Error Input x can have several dimensions (Image classification) Different models exist (linear models, neural networks, decision trees, ...) with specific learning procedures.
  • 39. Preliminaries Adaptive Model Selection Adaptive Model Selection Machine learning Modeling sensor data Temporal model: ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 20 40 60 80 100 120 020406080100120 x y Time Measurement t si[t] Model si[t] = θt Training examples Input: Time. Output: The measurement si [t] of a sensor i at time t. Model: si [t] = θt. The model approximates the set of measurements with just one parameter θ.
  • 40. Preliminaries Adaptive Model Selection Adaptive Model Selection Learning with wireless sensor data Thesis statement Machine learning techniques can be used to reduce communication by approximating sensor data with models. → Instead of sending all the measurements, only the parameters of the models are transmitted. Effective approach as sensor data are temporally and spatially related (correlations) Noisy: exact measurements rarely needed.
  • 41. Preliminaries Adaptive Model Selection Adaptive Model Selection Contribution I: Adaptive Model Selection (AMS) Temporal modeling
  • 42. Preliminaries Adaptive Model Selection Adaptive Model Selection Replicated models Overview Recall: In environmental monitoring, a sensor sends its measurements periodically. Measurements s[t] are sent at every time t. Base station Wireless node s[t] s[t] t s[t] t Replicated models: Models h are sent instead of the measurements. Base station Wireless node s[t] t t h ˆs[t]
  • 43. Preliminaries Adaptive Model Selection Adaptive Model Selection Replicated models Overview Models computed by the sensor node → The node can compare the model prediction with the true measurements: A new model is sent if |s[t] − ˆs[t]| > is user-defined, and application dependent. Simple learning procedure must be used. Most simple model: Constant model [Olston et al., 2001] ˆsi [t] = si [t − 1] Simply: The next measurement is the same as the previous one no parameter to compute
  • 44. Preliminaries Adaptive Model Selection Adaptive Model Selection Replicated models Constant model Temperature measurements, Solbosch greenhouse. = 2◦C. 160 170 180 190 200 210 220 30323436384042 Accuracy: 2°C Constant model Time instants Temperature(°C) q q q q q Sensor node Base station 5 updates instead of 58 → more than 90% of communication savings.
  • 45. Preliminaries Adaptive Model Selection Adaptive Model Selection Replicated models Autoregressive models More complex models can be used: autoregressive models AR(p) [Santini et al., Tulone et al., 2006]. ˆs[t] = θ1s[t − 1] + . . . + θps[t − p] 0 5 10 15 20 202530354045 Accuracy: 2°C AR(2) Time (Hour) Temperature(°C) ● ●● ● ● ●●● ● ●● ● ● ● ● ● ● Time (hours) Temperature(°C) An AR(2) reduces the number of updates by 6 percents in comparison to the constant model.
  • 46. Preliminaries Adaptive Model Selection Adaptive Model Selection Replicated models Pros and cons Pros: Guarantee the observer with accuracy. Simple or complex models can be used. Cons: In most cases, no a priori information is available on the measurements. Which model to choose a priori? The metric (update rate) does not consider the number of parameters of the models.
  • 47. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Motivation Tradeoff: More complex models better predict measurements, but have a higher number of parameters. Model complexity Metric Communication costs Model error AR(p) : ˆsi[t] = p j=1 θjsi[t − j]
  • 48. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Collection of models A collection of K models {hk}, 1 ≤ k ≤ K, of increasing complexity are run by the node. Base station Wireless node s[t] t t {h1, h2, . . . , hK} ˆs[t] h2 Wk: new metric estimating the communication costs. When an update is needed, the model with the lowest Wk is sent.
  • 49. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Metric to assess communication savings Wk: the weighted update rate Wk[t] = CkUk[t] Update rate Uk [t]: percentage of updates for model k at epoch t ([Olston, 2001, Jain et al., 2004, Santini et al., Tulone et al., 2006]). Model cost Ck : takes into account the number of parameters of the k-th model. Ck = P P−D+1 P: Size of the packet. D: Size of the data load. → P − D is the packet overhead SYNC Packet Address Message Group Data . . . Data CRC SYNC BYTE Type Type ID Length BYTE 1 2 3 5 6 7 . . . Size D P-2 P
  • 50. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Model selection Model performances Wk[t] are estimated over time. When data collection starts, no idea which model is best. Running poorly performing models is detrimental to energy consumption. Racing [Maron, 1997]: Model selection technique based on the Hoeffding bound, which allows to select the best performing model.
  • 51. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Model selection Model performances Wk[t] are estimated over time. When data collection starts, no idea which model is best. Running poorly performing models is detrimental to energy consumption. Racing [Maron, 1997]: Model selection technique based on the Hoeffding bound, which allows to select the best performing model.
  • 52. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Racing Model type Weightedupdaterate h1 h2 h3 h4 h5 h6 W1[t] Upper bound forWk[t] Lower bound for W1[t] W1[t] W2[t] W3[t] W4[t] W5[t] W6[t] At first, all models are in competition.
  • 53. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Racing Model type Weightedupdaterate h1 h2 h3 h4 h5 h6 W1[t] Upper bound for Wk[t] W1[t] W2[t] W3[t] W4[t] W5[t] W6[t] As time passes, model h1 statistically outperforms h6.
  • 54. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Racing Model type Weightedupdaterate h1 h2 h3 h4 h5 h6 Upper bound for Wk[t] W1[t] W2[t] W3[t] W4[t] W5[t] W3[t] h3 then statistically outperforms h5.
  • 55. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Racing Model type Weightedupdaterate h1 h2 h3 h4 h5 h6 Upper bound for Wk[t] W1[t] W2[t] W3[t] W4[t] W3[t] h3 finally is selected as the best one.
  • 56. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Experimental evaluation 14 time series, various types of measured physical quantities. Data set Sensed quantity Sampling period Duration Number of samples S Heater temperature 3 seconds 6h15 3000 I Light light 5 minutes 8 days 1584 M Hum humidity 10 minutes 30 days 4320 M Temp temperature 10 minutes 30 days 4320 NDBC WD wind direction 1 hour 1 year 7564 NDBC WSPD wind speed 1 hour 1 year 7564 NDBC DPD dominant wave period 1 hour 1 year 7562 NDBC AVP average wave period 1 hour 1 year 8639 NDBC BAR air pressure 1 hour 1 year 8639 NDBC ATMP air temperature 1 hour 1 year 8639 NDBC WTMP water temperature 1 hour 1 year 8734 NDBC DEWP dewpoint temperature 1 hour 1 year 8734 NDBC GST gust speed 1 hour 1 year 8710 NDBC WVHT wave height 1 hour 1 year 8723 Error threshold is set to 0.01r where r is the range of the measurements. AMS is run with k = 6 models: the constant model (CM) and autoregressive models AR(p) with p ranging from 1 to 5.
  • 57. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Experimental evaluation CM AR1 AR2 AR3 AR4 AR5 AMS S Heater 74 78 68 70 76 81 AR2 I Light 38 42 44 48 51 53 CM M Hum 53 55 55 60 62 66 CM M Temp 48 50 50 54 56 60 CM NDBC DPD 65 89 89 95 102 109 CM NDBC AWP 72 75 81 88 93 99 CM NDBC BAR 51 52 44 47 49 50 AR2 NDBC ATMP 39 41 40 43 46 49 CM NDBC WTMP 27 28 23 25 27 28 AR2 NDBC DEWP 57 54 58 62 67 71 AR1 NDBC WSPD 74 87 92 99 106 113 CM NDBC WD 85 84 91 98 104 111 AR1 NDBC GST 80 84 90 96 103 110 CM NDBC WVHT 58 58 63 67 71 76 CM Bold numbers report significantly better update rates (Hoeffding bound, δ = 0.05). For all time series, the AMS selects the best model.
  • 58. Preliminaries Adaptive Model Selection Adaptive Model Selection Adaptive Model Selection Conclusions In summary, Adaptive Model Selection Takes into account the cost of sending model parameters, Allows sensor nodes to determine autonomously the model which best fits their measurements, Provides a statistically sound selection mechanism to discard poorly performing models, Gave in experimental results about 45% of communication savings on average, Energy for computation is not a problem (negligible), Was implemented in TinyOS, the reference operating system.