
PayLess: A Low Cost Network Monitoring Framework for Software Defined Networks

Shihabur Rahman Chowdhury, Md. Faizul Bari, Reaz Ahmed, and Raouf Boutaba
David R. Cheriton School of Computer Science, University of Waterloo
{sr2chowdhury | mfbari | r5ahmed | rboutaba}@uwaterloo.ca

Abstract—Software Defined Networking promises to simplify network management tasks by separating the control plane (a central controller) from the data plane (switches). OpenFlow has emerged as the de facto standard for communication between the controller and switches. Apart from providing flow control and communication interfaces, OpenFlow provides a flow-level statistics collection mechanism from the data plane. It exposes a high-level interface for per-flow and aggregate statistics collection. Network applications can use this high-level interface to monitor network status without being concerned about the low-level details. In order to keep the switch design simple, this statistics collection mechanism is implemented as a pull-based service, i.e., network applications, and in turn the controller, have to periodically query the switches about flow statistics. The frequency of polling the switches determines monitoring accuracy and network overhead. In this paper, we focus on this trade-off between monitoring accuracy, timeliness and network overhead. We propose PayLess, a monitoring framework for SDN. PayLess provides a flexible RESTful API for flow statistics collection at different aggregation levels. It uses an adaptive statistics collection algorithm that delivers highly accurate information in real time without incurring significant network overhead. We utilize the Floodlight controller’s API to implement the proposed monitoring framework. The effectiveness of our solution is demonstrated through emulations in Mininet.

978-1-4799-0913-1/14/$31.00 © 2014 IEEE

I. INTRODUCTION

Monitoring is crucial to network management. Management applications require accurate and timely statistics on network resources at different aggregation levels, yet the network overhead for statistics collection should be minimal. Accurate and timely statistics are essential for many network management tasks, like load balancing, traffic engineering, enforcing Service Level Agreements (SLAs), accounting and intrusion detection. Management applications may need to monitor network resources at different aggregation levels. For example, an ISP’s billing system would require monthly upstream and downstream usage data for each user; an SLA enforcement application may require per-queue packet drop rates at ingress and egress switches to ensure bounds on packet drops; and a load balancing application may require a switch’s per-port traffic per unit time.

A well designed network monitoring framework should provide the management applications with a wide selection of network metrics to monitor at different levels of aggregation, accuracy and timeliness. Ideally, it is the responsibility of the monitoring framework to select and poll the network resources unless otherwise specified by the management applications. The monitoring framework should accumulate, process and deliver the monitored data at the requested aggregation level and frequency, without introducing too much monitoring overhead into the system.

Although accurate and timely monitoring is essential for seamless network management, contemporary solutions for monitoring IP networks are ad hoc in nature and hard to implement. Monitoring methods in IP networks can be classified as direct or sampling-based [1], [2], [3]. Direct measurement methods incur significant network overhead, while sampling-based methods overcome this problem by sacrificing accuracy. Moreover, different network equipment vendors have proprietary technologies to collect statistics about the traffic [1], [3], [4]. The lack of openness and interoperability between these methods and technologies has made traffic statistics collection a complex task in traditional IP networks.

More recently, Software Defined Networking (SDN) has emerged with the promise to facilitate network programmability and ease management tasks. SDN proposes to decouple the control plane from the data plane. The data plane functionality of packet forwarding is built into the switching fabric, whereas the control plane functionality of controlling network devices is placed in a logically centralized software component called the controller. The control plane provides a programmatic interface for developing management programs, as opposed to providing a configuration interface for tuning network properties. From a management point of view, this added programmability opens the opportunity to reduce the complexity of distributed configuration and ease network management tasks [5].

The OpenFlow [6] protocol has been accepted as the de facto interface between the control and data planes. OpenFlow provides per-flow¹ statistics collection primitives at the controller. The controller can poll a switch to collect statistics on the active flows. Alternatively, it can request a switch to push flow statistics (upon flow timeout) at a specific frequency. The controller has a global view of the network. Sophisticated and effective monitoring solutions can be developed using these capabilities of an OpenFlow controller. However, in the current scenario, a network management application for SDN would be a part of the control plane, rather than being independent of it. This is due to the heterogeneity in controller technologies and the absence of a uniform abstract view of the network resources.

¹ A flow is identified by an ordered set of Layer 2-4 header fields.

In this paper, we propose PayLess, a network monitoring framework for SDN. PayLess offers a number of advantages
towards developing network management applications on top of the SDN controller platform. First, PayLess provides an abstract view of the network and a uniform way to request statistics about the resources. Second, PayLess itself is developed as a collection of pluggable components. Interaction between these components is abstracted by well-defined interfaces. Hence, one can develop custom components and plug them into the PayLess framework. Highly variable tasks, like data aggregation level and sampling method, can be easily customized in PayLess. We also study the resource-accuracy trade-off in network monitoring and propose a variable frequency adaptive statistics collection scheduling algorithm.

The rest of this paper is organized as follows. We begin with a discussion of some existing IP network monitoring tools, OpenFlow-inspired monitoring tools, and variable rate adaptive data collection methods used in sensor and IP networks (Section II). Then we present the architecture of PayLess (Section III), followed by our proposed flow statistics collection scheduling algorithm (Section IV). The next section describes the implementation of a link utilization monitoring application using the proposed algorithm (Section V). We evaluate and compare the performance of our link utilization monitoring application with that of FlowSense [7] through emulations using Mininet (Section VI). Finally, we conclude this paper and point out some future directions of our work (Section VII).

II. RELATED WORKS

There exist a number of flow-based network monitoring tools for traditional IP networks. NetFlow [1] from Cisco is the most prevalent one. NetFlow probes are attached to a switch as special modules. These probes collect either complete or sampled traffic statistics, and send them to a central collector [4]. NetFlow version 9 has been adopted as a common and universal standard by the IP Flow Information Export (IPFIX) IETF working group, so that non-Cisco devices can send data to NetFlow collectors. NetFlow provides information such as source and destination IP addresses, port numbers, byte counts, etc. It supports different technologies like multicast, IPSec, and MPLS. Another flow sampling method is sFlow [2], which was introduced and is maintained by InMon as an open standard. It uses time-based sampling for capturing traffic information. Another proprietary flow sampling method is JFlow [3], developed by Juniper Networks. JFlow is quite similar to NetFlow: it provides detailed information about each flow by applying statistical sampling, just like NetFlow and sFlow. Unlike sFlow, NetFlow and JFlow are both proprietary solutions and incur a large up-front licensing and setup cost to be deployed in a network. sFlow is less expensive to deploy, but it is not widely adopted by vendors.

Recently, a good number of network monitoring tools based on OpenFlow have been proposed. Yu et al. propose OpenSketch [8], a three-stage packet processing pipeline design for SDN. OpenSketch allows the development of more expressive traffic measurement applications by proposing a clean-slate design of the packet processing pipeline. OpenTM [9] focuses more on efficiently measuring the traffic matrix using existing technology. It proposes several heuristics to choose an optimal set of switches to be monitored for each flow. After a switch has been selected, it is continuously polled for collecting flow-level statistics. Instead of continuously polling a switch, PayLess offers an adaptive scheduling algorithm for polling that achieves the same level of accuracy as continuous polling with much less communication overhead. In [10], the authors have motivated the importance of identifying large traffic aggregates in a network and proposed a monitoring framework utilizing secondary controllers to identify and monitor such aggregates using a small set of rules that changes dynamically with traffic load. This work differs significantly from PayLess: whereas PayLess’s target is to monitor all flows in a network, this work monitors only large aggregate flows. FlowSense [7] proposes a passive push-based monitoring method where FlowRemoved messages are used to estimate per-flow link utilization. While the communication overhead of FlowSense is quite low, its estimation is quite far from the actual value, and it works well only when there is a large number of short-duration flows. FlowSense cannot capture traffic bursts if they do not coincide with another flow’s expiry. More recently, the authors in [11] have proposed an adaptive SDN-based monitoring method focusing on anomaly detection. They manipulate the aggregation level of the packet forwarding rules to switch between different granularity levels of traffic measurement. In contrast, we propose a framework for developing a wide range of network monitoring applications and a variable rate adaptive sampling algorithm, which adapts in the time scale.

There has been an everlasting trade-off between statistics collection accuracy and resource usage for monitoring in IP networks. Monitoring in SDN also needs to make a trade-off between resource overhead and measurement accuracy, as discussed by the authors in [12]. Variable rate adaptive sampling techniques have been proposed in different contexts to reduce resource consumption while providing satisfactory accuracy of the collected data. Variable rate sampling techniques that save resources while achieving higher accuracy have been extensively discussed in the literature in the context of sensor networks [13], [14], [15], [16], [17], [18]. The main focus of these sampling techniques has been to effectively collect data using the sensors while trying to minimize the sensors’ energy consumption, since energy is often a scarce resource for sensors. Adaptive sampling techniques have also been studied in the context of traditional IP networks [19], [20]. However, to the best of our knowledge, adaptive sampling for monitoring SDN has not been explored yet.

III. SYSTEM DESCRIPTION

A. PayLess Architecture

Fig. 1 shows the software stack for a typical SDN setup along with our monitoring framework. OpenFlow controllers (e.g., NOX [21], POX [22], Floodlight [23], etc.) provide a platform to write custom network applications that are oblivious to the complexity and heterogeneity of the underlying network. An OpenFlow controller provides a programming interface, usually referred to as the Northbound API, to the network applications. Network applications can obtain an abstract view of the network through this API. It also provides interfaces for controlling traffic flows and collecting statistics at different aggregation levels (e.g., flow, packet, port, etc.). The required statistics collection granularity varies from application to application. Some applications require per-flow statistics, while others require aggregate statistics.
For example, an ISP’s user billing application would expect to get usage data for all traffic passing through the user’s home router. Unfortunately, neither the OpenFlow API nor the available controller implementations (e.g., NOX, POX and Floodlight) support these aggregation levels. Moreover, augmenting a controller’s implementation with monitoring functionality would greatly increase design complexity. Hence, a separate layer for abstracting monitoring complexity from the network applications and the controller implementation is required.

To this end, we propose PayLess: a low-cost, efficient network statistics collection framework. PayLess is built on top of an OpenFlow controller’s northbound API and provides a high-level RESTful API. The monitoring framework takes care of translating the high-level monitoring requirements expressed by the applications. It also hides the details of statistics collection and storage management. Network monitoring applications built on top of this framework use the RESTful API provided by PayLess and remain shielded from the underlying low-level details.

Fig. 1. SDN Software Stack. [Diagram labels: applications (L2/L3/L4 Forwarding, Firewall, ..., Monitoring Apps); App Development Framework; PayLess API; Monitoring Framework (PayLess); Northbound API; Control Plane (Floodlight / NOX / POX etc.); OpenFlow Protocol; OpenFlow Enabled Switch Network.]

We elaborate the monitoring framework (PayLess) portion of Fig. 1 and show its components in Fig. 2. These components are explained in detail below:

• Request Interpreter: This component is responsible for translating the high-level primitives expressed by the applications to flow-level primitives. For example, a user billing application may request the usage of a user by specifying the user’s identity (e.g., email address or registration number). This component interacts with other modules to translate this high-level identifier to network-level primitives.

• Scheduler: The scheduler component schedules polling of switches in the network for gathering statistics. OpenFlow-enabled switches can provide per-flow statistics, per-queue statistics as well as per-port aggregate statistics. The scheduler determines which type of statistics to poll, based on the nature of the request it received from an application. The polling time-stamps are determined by a scheduling algorithm. In the next section, we describe a statistics collection scheduling algorithm for our framework. However, the scheduler is well isolated from the other components in our framework. One can develop a customized scheduling algorithm for statistics collection and seamlessly integrate it within the PayLess framework.

• Switch Selector: We have to identify and select one (or more) switches for statistics collection when a statistics collection event is scheduled. This component determines the set of switches to poll for obtaining the required statistics at the scheduled time stamps. For example, to collect statistics about a flow, it is sufficient to query only the ingress switch, and the statistics for the intermediate switches can be determined by simple calculations. The authors of [9] have discussed a number of heuristics for switch selection in the context of traffic matrix calculation in SDN.

• Aggregator & Data Store: This module is responsible for collecting raw data from the selected switches and storing these raw data in the data store. It aggregates the collected raw data to compute monitoring information at the requested aggregation levels. The data store is an abstraction of a persistent storage system. It can range from regular files to relational databases to key-value stores.

Fig. 2. PayLess Network Monitoring Framework. [Diagram labels: network monitoring applications (written in any programming language), e.g., Intrusion Detection System, Link Usage Monitor, User Billing, Differentiated QoS Management; PayLess RESTful API; Monitoring Framework with Request Interpreter, Scheduler, Switch Selector, Aggregator and Data Store; Northbound API.]

B. Monitoring API

PayLess provides a RESTful API for rapid development of network monitoring applications. Any programming language can be used to access this API. A network application can express, in its own context, high-level primitives to be monitored and get the collected data from the PayLess data store at different aggregation levels. Here, we provide a few examples to illustrate how network applications can access this API. Every network application needs to create a MonitoringRequest (Fig. 3) object and register it with PayLess. The MonitoringRequest object contains the following information about a monitoring task:
{"MonitoringRequest": {
  "Type": ["performance" | "security" | "failure" | ...],
  "Metrics": [
    {"performance": ["latency", "jitter", "throughput", "packet-drop", ...]},
    {"security": ["IDS-alerts", "ACL-violations", "Firewall-alerts", ...]},
    {"failure": ["MTBF", "MTTR"]}
  ],
  "Entity": ["<uri_to_script>"],
  "AggregationLevel": ["flow" | "table" | "port" | "switch" | "user" | "custom": "uri_to_script"],
  "Priority": ["real-time" | "medium" | "low" | "custom": "monitoring-frequency"],
  "Monitor": ["direct" | "adaptive" | "random-sampling" | "optimized" | "custom": "uri_to_script"],
  "Logging": ["default" | "custom": "<uri_to_log_format>"]
}}

Fig. 3. MonitoringRequest object
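Fig. 3 is a template rather than a literal document: several fields list alternative values, and a concrete request picks one value per field. The following sketch shows how a client might check such a choice before registering it. It assumes that the predefined values in Fig. 3 are the complete set apart from a {"custom": ...} entry. It also treats the aggregation level as mandatory, as the field description that follows states. The names ALLOWED, REQUIRED and check_request are hypothetical and are not part of PayLess.

```python
# Predefined values read off Fig. 3. Treating them as exhaustive (apart from a
# {"custom": ...} entry) is an assumption of this sketch, not a PayLess rule.
ALLOWED = {
    "AggregationLevel": {"flow", "table", "port", "switch", "user"},
    "Priority": {"real-time", "medium", "low"},
    "Monitor": {"direct", "adaptive", "random-sampling", "optimized"},
    "Logging": {"default"},
}
# The aggregation level is mandatory; the other checked fields are optional here.
REQUIRED = {"AggregationLevel"}


def check_request(request: dict) -> list:
    """Return the problems found in a concrete MonitoringRequest (empty if none).

    Hypothetical helper: each checked field must hold exactly one choice, either a
    predefined string or a {"custom": <uri or value>} object.
    """
    body = request.get("MonitoringRequest", {})
    problems = []
    for field, choices in ALLOWED.items():
        if field not in body:
            if field in REQUIRED:
                problems.append(f"{field}: missing (required)")
            continue
        values = body[field]
        if not isinstance(values, list) or len(values) != 1:
            problems.append(f"{field}: expected a one-element list")
            continue
        value = values[0]
        if isinstance(value, dict) and set(value) == {"custom"}:
            continue  # custom script URI, polling frequency, or log format
        # Check the type first so that an unhashable dict never reaches the set lookup.
        if not isinstance(value, str) or value not in choices:
            problems.append(f"{field}: unknown value {value!r}")
    return problems


if __name__ == "__main__":
    # "250ms" is an illustrative custom polling frequency, not a value from the paper.
    example = {"MonitoringRequest": {"AggregationLevel": ["router"],
                                     "Priority": [{"custom": "250ms"}]}}
    print(check_request(example))  # ["AggregationLevel: unknown value 'router'"]
```

Open-ended fields such as Type and Metrics end in "..." in Fig. 3, so this sketch does not try to validate them.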

• Type: the network application needs to specify what type of metrics it wants to be monitored, e.g., performance, security, fault-tolerance, etc.

• Metrics: for each selected monitoring type, the network application needs to provide the metrics that should be monitored and logged. Performance metrics may include delay, latency, jitter, throughput, etc. For security monitoring, metrics may include IDS-alerts, firewall-alerts, ACL-violations, etc. for a specific switch, port, or user. Failure metrics can be mean-time-between-failures (MTBF) or mean-time-to-repair (MTTR) for a switch, link, or flow table.

• Entity: this is an optional parameter and depends on the type of metric to be monitored. This parameter specifies the network entities that need to be monitored. In PayLess, network users², switches, switch ports, flow-tables, traffic flows, etc. can be uniquely identified and monitored. A network monitoring application can specify which entities it wants to monitor, or leave the field empty, depending on the type of metric to be monitored.

• Aggregation Level: network applications must specify the aggregation level (e.g., flow, port, user, switch, etc.) for statistics collection. PayLess provides a set of predefined aggregation levels (Fig. 3), as well as the option to provide a script to specify custom aggregation levels.

• Priority: PayLess provides the option to set priority levels for each metric to be monitored. We have three predefined priority levels: real-time, medium, and low. Alternatively, an application can specify a custom polling frequency. The PayLess framework is responsible for selecting the appropriate polling frequencies for the predefined priorities.

• Monitor: This parameter specifies the monitoring method, for example, direct, adaptive, random-sampling, or optimized. The default monitoring method is optimized, in which case the PayLess framework selects the appropriate monitoring method for balancing accuracy, timeliness, and network overhead. Apart from the predefined sampling methods, an application may provide a link to a customized monitoring method.

• Logging: A network application can optionally provide a LogFormat object to the framework for customizing the output format. If no such object is provided, PayLess writes the logs in its default format.

² Network users can be identified in the way described in [24].

The MonitoringRequest object is specified using JSON. Attributes of this object along with some possible values are shown in Fig. 3. A network application registers a MonitoringRequest object through PayLess’s RESTful API. After the registration is successful, PayLess provisions monitoring resources for capturing the requested statistics and places them in the data store. In response to a monitoring request, PayLess returns a data access-id to the network application. The network application uses this access-id to retrieve collected data from the data store.

For example, an ISP’s network application for user billing may specify the MonitoringRequest object as shown in Fig. 4. Here, the application wants to monitor performance metrics (throughput and packet drops) for particular users with a low priority using the direct monitoring technique, and to log the collected data in PayLess’s default format.

{"MonitoringRequest": {
  "Type": ["performance"],
  "Metrics": [
    {"performance": [
      "throughput",
      "packet-drop"
    ]}
  ],
  "Entity": [{"user": "<user_id>"}],
  "AggregationLevel": ["user"],
  "Priority": ["medium"],
  "Monitor": ["direct"],
  "Logging": ["default"]
}}

Fig. 4. MonitoringRequest for user billing application

Another example is a real-time media streaming service that needs to provide differentiated QoS to its users. This application needs flow-level real-time monitoring data to make optimal routing decisions. A possible sample MonitoringRequest object is shown in Fig. 5.
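The register-then-retrieve flow described above can be sketched from the client side using the URIs listed in Table I. The sketch makes several assumptions. The host and port in BASE_URL are hypothetical. Passing the JSON object as a `data` query-string parameter, rather than for example in a POST body, is an assumption. Writing the entity as a {"user": ...} object inside the list is also an assumption, made so that the request is valid JSON. The helper names billing_request, register_url and retrieve_url are illustrative only.

```python
import json
from urllib.parse import urlencode

# Hypothetical address of the PayLess REST service; the paper does not give one.
BASE_URL = "http://localhost:8080"


def billing_request(user_id: str) -> dict:
    """MonitoringRequest mirroring Fig. 4 (user billing)."""
    return {"MonitoringRequest": {
        "Type": ["performance"],
        "Metrics": [{"performance": ["throughput", "packet-drop"]}],
        "Entity": [{"user": user_id}],  # assumed JSON form of the user entity
        "AggregationLevel": ["user"],
        "Priority": ["medium"],
        "Monitor": ["direct"],
        "Logging": ["default"],
    }}


def register_url(request: dict) -> str:
    """Register URI from Table I, carrying the JSON object as the data parameter."""
    # Compact separators keep the encoded query string short.
    query = urlencode({"data": json.dumps(request, separators=(",", ":"))})
    return f"{BASE_URL}/payless/object/monitor_request/register?{query}"


def retrieve_url(access_id: str) -> str:
    """Retrieve URI from Table I, using the access-id returned on registration."""
    return f"{BASE_URL}/payless/log/retrieve?" + urlencode({"access-id": access_id})


if __name__ == "__main__":
    print(register_url(billing_request("alice")))
    print(retrieve_url("42"))  # "42" stands in for a real access-id
```

An application would issue the register call, keep the access-id that PayLess returns, and later request the retrieve URI with it. The text does not specify the HTTP methods or the response format, so the sketch only builds the URIs.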
{"MonitoringRequest": {
  "Type": ["performance"],
  "Metrics": [
    {"performance": [
      "throughput",
      "latency",
      "jitter",
      "packet-drop"
    ]}
  ],
  "Entity": [{"flow": "<flow_specification>"}],
  "AggregationLevel": ["flow"],
  "Priority": ["real-time"],
  "Monitor": ["adaptive"],
  "Logging": ["default"]
}}

Fig. 5. MonitoringRequest for differentiated QoS

PayLess also provides API functions for listing, updating, and deleting MonitoringRequest objects. Table I provides a few example API URIs and their parameters for illustration purposes. The first four URIs provide the basic CRUD functionality for the MonitoringRequest object. The fifth URI is used for accessing collected data from the data store.

IV. AN ADAPTIVE MONITORING METHOD

In this section, we present an adaptive monitoring algorithm that can be used to monitor network resources. Our goal is to achieve accurate and timely statistics while incurring little network overhead. We assume that the underlying switch-to-controller communication is performed using the OpenFlow protocol. Therefore, before diving into the details of the algorithm, we present a brief overview of the OpenFlow messages that are used in our framework.

OpenFlow identifies a flow using the fields obtained from the layer 2, layer 3 and layer 4 headers of a packet. When a switch receives a flow that does not match any rule in its forwarding table, it sends a PacketIn message to the controller. The controller installs the necessary forwarding rules in the switches by sending a FlowMod message. The controller can specify an idle timeout for a forwarding rule. This refers to the inactivity period after which a forwarding rule (and eventually the associated flow) is evicted from the switch. When a flow is evicted, the switch sends a FlowRemoved message to the controller. This message contains the duration of the flow as well as the number of bytes matching this flow entry in the switch. FlowSense [7] proposes to monitor link utilization at zero cost by tracking the PacketIn and FlowRemoved messages only. However, this method has a large average delay between consecutive statistics retrievals. It also does not perform well in monitoring traffic spikes. In addition to these messages, the controller can send a FlowStatisticsRequest message to the switch to query it about a specific flow. The switch sends the duration and byte count for that flow in a FlowStatisticsReply message to the controller.

An obvious approach to collecting flow statistics is to poll the switches periodically at a constant interval by sending the FlowStatisticsRequest message. A high polling frequency (i.e., a short polling interval) will generate highly accurate statistics. However, this will induce significant monitoring overhead in the network. To strike a balance between statistics collection accuracy and incurred network overhead, we propose a variable frequency flow statistics collection algorithm.

We propose that when the controller receives a PacketIn message, it will add a new flow entry to an active flow table along with an initial statistics collection timeout, τ milliseconds. If the flow expires within τ milliseconds, the controller will receive its statistics in a FlowRemoved message. Otherwise, in response to the timeout event (i.e., after τ milliseconds), the controller will send a FlowStatisticsRequest message to the corresponding switch to collect statistics about that flow. If the collected data for that flow does not change significantly within this time period, i.e., the difference between the previous and current byte count for that flow is not above a threshold, say Δ1, the timeout for that flow is multiplied by a small constant, say α. For a flow with a low packet rate, this process may be repeated until a maximum timeout value of Tmax is reached. On the other hand, if the difference between the old and new data becomes larger than another threshold Δ2, the scheduling timeout of that flow is divided by another constant β. For a heavy flow, this process may be repeated until a minimum timeout value of Tmin is reached. The rationale behind this timeout adjustment is that we maintain a higher polling frequency for flows that significantly contribute to link utilization, and a lower polling frequency for flows that do not significantly contribute to link utilization at that moment. If their contribution increases, the scheduling timeout will adjust according to the proposed algorithm, raising the polling frequency as traffic increases.

We optimize this algorithm further by batching FlowStatisticsRequest messages together for flows with the same timeout. This reduces the spread of monitoring traffic in the network without affecting the effectiveness of polling with a variable frequency. The pseudocode of this algorithm is shown in Algorithm 1.

V. IMPLEMENTATION: LINK UTILIZATION MONITORING

As a concrete use case of our proposed framework and monitoring algorithm, we have implemented a prototype link utilization monitoring application on the Floodlight controller platform. We have chosen Floodlight as the controller platform for its highly modular design and the rich set of APIs to perform operations on the underlying OpenFlow network.

It is worth mentioning that our prototype implementation is intended to perform experiments and to show the effectiveness of our algorithm. Hence, we have made the following simplifying assumption about flow identification and matching, without any loss of generality. Since we are monitoring link utilization, it is sufficient for us to identify the flows by their source and destination IP addresses. We performed the experiments using iperf [25] in UDP mode. The underlying network also had some DHCP traffic, which also uses UDP. We filtered out the DHCP traffic while adding the flows to the active flow table by looking at the destination UDP port numbers (DHCP uses destination ports 67 and 68 for DHCP requests and replies, respectively). It is worth
noting that not all the components of our proposed monitoring framework are in place yet. Therefore, we resorted to implementing the link utilization monitoring application as a Floodlight module.

TABLE I. PAYLESS RESTFUL API

RESTful API URI                               Parameter(s)
/payless/object/monitor_request/register      data=<JSON data as shown in Fig. 3>
/payless/object/monitor_request/update        id=<request id>&data=<JSON data as shown in Fig. 3>
/payless/object/monitor_request/list          id=<application id>
/payless/object/monitor_request/delete        id=<request id>
/payless/log/retrieve                         access-id=<access id>

Algorithm 1 FlowStatisticsCollectionScheduling(Event e)
  globals: active_flows     // Currently active flows
           schedule_table   // Associative table of active flows, indexed by polling timeout
           U                // Utilization statistics (output of this algorithm)
  if e is an Initialization event then
    active_flows ← ∅, schedule_table ← ∅, U ← ∅
  end if
  if e is a PacketIn event then
    f ← ⟨e.switch, e.port, Tmin, 0⟩
    schedule_table[Tmin] ← schedule_table[Tmin] ∪ {f}
  else if e is timeout τ in schedule_table then
    for all flows f ∈ schedule_table[τ] do
      send a FlowStatisticsRequest to f.switch
    end for
  else if e is a FlowStatisticsReply event for flow f then
    diff_byte_count ← e.byte_count − f.byte_count
    diff_duration ← e.duration − f.duration
    checkpoint ← current time stamp
    U[f.port][f.switch][checkpoint] ← ⟨diff_byte_count, diff_duration⟩
    f.byte_count ← e.byte_count, f.duration ← e.duration
    if diff_byte_count < Δ1 then
      f.τ ← min(f.τ × α, Tmax)
      Move f to schedule_table[f.τ]
    else if diff_byte_count > Δ2 then
      f.τ ← max(f.τ / β, Tmin)
      Move f to schedule_table[f.τ]
    end if
  end if

We intercepted the PacketIn and FlowRemoved messages to keep track of flow installations and removals from the switches, respectively. We also maintained a hash table indexed by the schedule timeout value. Each bucket with timeout τ contains a list of active flows that need to be polled every τ milliseconds. Each bucket in the hash table is also assigned a worker thread that wakes up every τ milliseconds and sends a FlowStatisticsRequest message to the switches corresponding to the flows in its bucket. The FlowStatisticsReply messages are received asynchronously by the monitoring module, which creates a measurement checkpoint for each reply message. The contribution of a flow is calculated by dividing its differential byte count from the previous checkpoint by the differential time duration from the previous checkpoint. The monitoring module examines the measurement checkpoints of the corresponding link and updates the utilization at previous checkpoints if necessary. The active flow entries are moved between hash table buckets with lower or higher timeout values depending on the change in byte count since the previous measurement checkpoint. Currently, we have a basic REST API, which provides an interface to get the link statistics (in JSON format) of all the links in the network. However, our future objective is to provide a REST API that allows external applications to register a particular flow for monitoring and to obtain its statistics.

Although the current implementation makes some assumptions about flow identification and matching, this does not reduce the generality of our proposed algorithm. Our long-term goal is to have a fully functional implementation of the PayLess framework for efficient flow statistics collection. Developing network monitoring applications will be greatly simplified by the statistics exposed by our framework. It is worth mentioning that the proposed scheduling algorithm lies at the core of the scheduler component of this framework, and no assumptions about the algorithm’s implementation were made in this prototype. The only assumptions made here concern the implementation of the link utilization monitoring application that uses our framework.

VI. EVALUATION

In this section, we present the performance of a demo application for monitoring link utilization. This application is developed using the PayLess framework. We have also implemented FlowSense and compared it to PayLess, since both target the same use case. We have also implemented a baseline scenario, where the controller periodically polls the switches at a constant interval to gather link utilization information. We have used Mininet to emulate a network consisting of hosts and OpenFlow switches. Details on the experimental setup are provided in Section VI-A. Section VI-B explains the evaluation metrics. Finally, the results are presented in Section VI-C.

Fig. 6. Timing Diagram of Experiment Traffic. [Diagram labels: flows (h1,h8, 10Mbps), (h2,h7, 20Mbps), (h2,h7, 50Mbps) and (h3,h6, 20Mbps) on a time axis from T = 0s to T = 60s, with marks at 4, 10, 12, 14, 17, 23, 25, 28, 30, 33, 35, 38, 48 and 53 s.]

A. Experiment Setup

We have used a 3-level tree topology, as shown in Fig. 7, for this evaluation. UDP flows for a total duration of 100s between hosts were generated using iperf. Fig. 6 is the timing diagram showing the start time, throughput and end time of each flow. We have set the idle timeout of the active flows in a switch to 5s. We have also deliberately introduced pauses of different durations between the flows in the traffic to experiment with different scenarios. Pauses shorter than the soft timeout were placed between the 28th and 30th seconds, and also between the 33rd and 35th seconds, to observe how the proposed scheduling algorithm and FlowSense react to sudden traffic spikes. The minimum and maximum polling intervals for our scheduling algorithm were set to 500ms and 5s, respectively. For the constant polling case, a polling interval of 1s was used. The parameters Δ1 and Δ2 described in Section IV were set to 100MB. Finally, we have set α and β described in Section IV to 2 and 6, respectively. β was set to a higher value to quickly
react and adapt to any change in traffic. 70
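The parameter choices above can be traced through the timeout update of Algorithm 1. The Python sketch below is illustrative only and is not PayLess source code: it assumes timeouts in milliseconds, reads 100MB as 100 × 10^6 bytes, and uses a hypothetical starting interval and hypothetical byte counts. The function name `next_timeout` is ours, and the sketch covers only the τ update, not the move of the flow to a new `schedule_table` bucket.

```python
# Sketch of the per-flow timeout update in Algorithm 1 (not PayLess source code).
# Assumptions: timeouts in milliseconds; 100MB read as 100 * 10**6 bytes.
ALPHA = 2               # growth factor for quiet flows (Section VI-A)
BETA = 6                # shrink factor for busy flows (Section VI-A)
T_MIN_MS = 500          # minimum polling interval
T_MAX_MS = 5000         # maximum polling interval
DELTA_1 = 100 * 10**6   # lower byte-count threshold
DELTA_2 = 100 * 10**6   # upper byte-count threshold


def next_timeout(tau_ms, diff_byte_count):
    """Return a flow's new polling interval after one FlowStatisticsReply."""
    if diff_byte_count < DELTA_1:
        # Little traffic since the last poll: back off, capped at Tmax.
        return min(tau_ms * ALPHA, T_MAX_MS)
    if diff_byte_count > DELTA_2:
        # Heavy traffic: poll more often, floored at Tmin.
        return max(tau_ms / BETA, T_MIN_MS)
    # Between the thresholds: keep the current interval.
    return tau_ms


# Hypothetical trace: five quiet polls followed by two busy ones.
tau = T_MIN_MS
trace = []
for diff in [0, 0, 0, 0, 0, 300 * 10**6, 300 * 10**6]:
    tau = next_timeout(tau, diff)
    trace.append(round(tau, 1))
print(trace)  # [1000, 2000, 4000, 5000, 5000, 833.3, 500]
```

With β = 6, a single heavy-traffic reply shrinks a 5s interval to about 833ms and a second one reaches Tmin, which illustrates the quick reaction that motivated choosing β larger than α.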
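[Figure: 3-level tree with root switch Sw-0, switches Sw-1 and Sw-2 at the second level, switches Sw-3 to Sw-6 at the third level, and hosts h1 to h8 attached below]
Fig. 7. Topology for Experiment

[Figure: link utilization (Mbps, 0 to 80) versus time (second, 0 to 60) for Flowsense, PayLess and periodic polling]
Fig. 8. Utilization Measurement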
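The utilization reported in Fig. 8 rests on the per-checkpoint computation described in Section V: a flow's contribution is its differential byte count divided by its differential duration. The minimal Python sketch below shows that arithmetic converted to Mbps. It assumes durations in seconds and 1 Mbps = 10^6 bit/s, sums the contributions of the flows crossing one link, and omits the retroactive update of earlier checkpoints that the monitoring module performs. The function names and sample values are hypothetical.

```python
def flow_contribution_mbps(diff_byte_count, diff_duration_s):
    """Throughput of one flow between two checkpoints, in Mbps (10**6 bit/s)."""
    if diff_duration_s <= 0:
        raise ValueError("diff_duration_s must be positive")
    return diff_byte_count * 8 / diff_duration_s / 1e6


def link_utilization_mbps(samples):
    """Sum per-flow contributions for the flows crossing one link.

    samples holds (diff_byte_count, diff_duration_s) pairs, one per flow.
    Simplification: each flow is taken over its own polling window and
    no earlier checkpoints are revised.
    """
    return sum(flow_contribution_mbps(b, d) for b, d in samples)


# Hypothetical replies for three flows on the Sw-0/Sw-1 link.
samples = [
    (1_250_000, 1.0),  # 10 Mbps flow polled after 1 s
    (6_250_000, 1.0),  # 50 Mbps flow polled after 1 s
    (1_250_000, 0.5),  # 20 Mbps flow polled after 0.5 s
]
print(link_utilization_mbps(samples))  # 80.0
```

Because each flow may be polled on a different schedule, the real module has to reconcile windows of different lengths; this sketch treats each reply independently to keep the arithmetic visible.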


B. Evaluation Metrics

Utilization: Link utilization is measured as the instantaneous throughput obtained from a link, in units of Mbps. We report the utilization of the link between switches Sw-0 and Sw-1 (Fig. 7). Given the traffic mix, this link lies on the path of all the flows and is the most heavily used. It also exhibits a good amount of variation in utilization. We also experiment with different values of the minimum polling interval (Tmin) and show its effect on the trade-off between accuracy and monitoring overhead.

Overhead: We compute overhead in terms of the number of FlowStatisticsRequest messages sent from the controller. We compute the overhead at timeout expiration events, when a number of flows with the same timeout are queried for statistics.

C. Results

1) Utilization: Fig. 8 shows the utilization of the link between Sw-0 and Sw-1 over the simulation time, measured using three different techniques. The baseline scenario, i.e., periodic polling, bears the most resemblance to the traffic in Fig. 6. Flowsense fails to capture the traffic spikes because of the large granularity of its measurement. The traffic pauses shorter than the soft timeout value cause Flowsense to report less than the actual utilization. In contrast, our proposed algorithm very closely follows the utilization pattern obtained from periodic polling. Although it did not fully capture the first spike in the traffic, it quickly adjusted itself to capture the next traffic spike.

[Figure: monitoring overhead (OpenFlow messages, 0 to 30) versus time (second, 0 to 60) for PayLess and periodic polling]
Fig. 9. Messaging Overhead

2) Overhead: Fig. 9 shows the messaging overhead of the baseline scenario and of our proposed algorithm. Since Flowsense does not send FlowStatisticsRequest messages, it has zero messaging overhead and is therefore not shown in the figure. The fixed polling method polls all the active flows after the fixed timeout expires, which injects a large number of messages into the network at each query time. Our proposed algorithm, on the other hand, reduces these message spikes by assigning different timeouts to flows and thereby spreading the messages over time. Fig. 9 also shows that our algorithm has more query points across the timeline, but at each query point it sends out fewer messages to collect flow statistics. In some cases, our algorithm sends out 50% fewer messages than the periodic polling method.

Although Flowsense has zero measurement overhead, it is much less accurate than our adaptive scheduling algorithm. In addition, the monitoring traffic incurred by PayLess is very low, only 6.6 messages per second on average, compared to 13.5 messages per second on average for periodic polling. In summary, the proposed algorithm for scheduling flow statistics collection can achieve an accuracy close to that of constant periodic polling while incurring a reduced messaging overhead.

[Figure: message overhead (number of messages/minute, 0 to 250) and measurement error (RMS, 0 to 100) for Tmin = 250, 500, 1000 and 2000ms]
Fig. 11. Overhead and measurement error
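The burst-smoothing effect described in the overhead results can be reproduced with a toy model. The Python sketch below counts FlowStatisticsRequest messages per instant, assuming one request per flow at each timeout expiry and ignoring how τ adapts over time. The per-flow start offsets and timeouts in the adaptive case are hypothetical values chosen for illustration, not measurements behind Fig. 9.

```python
from collections import Counter


def messages_per_instant(flows, horizon_ms):
    """Count statistics requests sent at each instant up to horizon_ms.

    flows is a list of (start_ms, tau_ms) pairs: each flow is first polled
    at start_ms + tau_ms and then every tau_ms milliseconds.
    Assumes one FlowStatisticsRequest per flow per timeout expiry.
    """
    counts = Counter()
    for start_ms, tau_ms in flows:
        t = start_ms + tau_ms
        while t <= horizon_ms:
            counts[t] += 1
            t += tau_ms
    return counts


# Six flows polled together every 1 s, as in the constant-polling baseline.
periodic = messages_per_instant([(0, 1000)] * 6, horizon_ms=10_000)

# The same six flows with hypothetical per-flow timeouts and start times,
# as an adaptive scheduler might assign them.
adaptive = messages_per_instant(
    [(0, 1000), (250, 2000), (500, 2000), (0, 5000), (750, 5000), (100, 4000)],
    horizon_ms=10_000,
)

for name, counts in (("periodic", periodic), ("adaptive", adaptive)):
    print(f"{name}: total={sum(counts.values())}, peak={max(counts.values())}")
```

In this toy setting the baseline sends 60 requests with bursts of 6, while the adaptive assignment sends 23 with bursts of at most 2. Part of the reduction comes from quiet flows receiving longer timeouts, not only from spreading; the actual savings depend on the traffic.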
[Figure: link utilization (Mbps, 0 to 300) versus time (second, 0 to 100) in four panels: Actual Utilization, T-min = 250ms, T-min = 500ms, T-min = 1000ms]
Fig. 10. Effect of Tmin on Measured Utilization

3) Effect of Minimum Polling Interval, Tmin: As explained in Algorithm 1, our scheduling algorithm adapts to the traffic pattern. For a rapidly changing traffic spike, the polling interval sharply decreases and reaches Tmin. In Fig. 10, we present the impact of Tmin on monitoring accuracy. Evidently, the monitoring data is very accurate for Tmin = 250ms, and accuracy gradually degrades with higher values of Tmin. However, monitoring accuracy comes at the cost of network overhead, as presented in Fig. 11. This figure presents the root-mean-square (RMS) error in monitoring accuracy alongside the messaging overhead for different values of Tmin. This parameter can be adjusted to trade off accuracy against messaging overhead, depending on the application requirements.

VII. CONCLUSION AND FUTURE WORK

In this paper, we have introduced PayLess, a flexible and extendable monitoring framework for SDN. To the best of our knowledge, PayLess is the only monitoring framework for SDN. Almost every aspect of monitoring can be specified using PayLess's generic RESTful API. Moreover, the core components of the PayLess framework can be replaced by custom implementations without affecting the other components. To demonstrate the effectiveness of the PayLess framework, we have presented an adaptive scheduling algorithm for flow statistics collection. We implemented a concrete use case of monitoring link utilization using the proposed algorithm, and we evaluated and compared its performance with that of Flowsense and a periodic polling method. We found that the proposed algorithm can achieve higher accuracy of statistics collection than Flowsense. Yet, the incurred messaging overhead is up to 50% of the overhead of an equivalent periodic polling strategy.

Our long-term goal in this line of work is to provide an open-source, community-driven monitoring framework for SDN. This should provide a full-fledged abstraction layer on top of the SDN control platform for seamless development of network monitoring applications. We also plan to demonstrate the effectiveness of our platform by implementing an autonomic QoS policy enforcement application [26] on top of PayLess and performing large-scale experiments in our OpenFlow testbed [27]. It is also one of our future goals to make PayLess compatible with distributed controller platforms [28].

ACKNOWLEDGEMENT

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) in part under its Discovery program and in part under the Smart Applications on Virtual Infrastructure (SAVI) Research Network.

REFERENCES

[1] “Cisco NetFlow site reference,” http://www.cisco.com/en/US/products/ps6601/products_white_paper0900aecd80406232.shtml.
[2] “Traffic Monitoring using sFlow,” http://www.sflow.org/.
[3] A. C. Myers, “JFlow: Practical mostly-static information flow control,” in Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. ACM, 1999, pp. 228–241.
[4] Cisco Systems, “Cisco CNS NetFlow Collection Engine,” http://www.cisco.com/en/US/products/sw/netmgtsw/ps1964/index.html.
[5] H. Kim and N. Feamster, “Improving network management with software defined networking,” Communications Magazine, IEEE, vol. 51, no. 2, pp. 114–119, 2013.
[6] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: enabling innovation in campus networks,” SIGCOMM Comput. Commun. Rev., vol. 38, no. 2, pp. 69–74, 2008.
[7] C. Yu, C. Lumezanu, Y. Zhang, V. Singh, G. Jiang, and H. V. Madhyastha, “FlowSense: Monitoring Network Utilization with Zero Measurement Cost,” in Passive and Active Measurement. Springer, 2013, pp. 31–41.
[8] M. Yu, L. Jose, and R. Miao, “Software defined traffic measurement with OpenSketch,” in Proceedings 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI, vol. 13, 2013.
[9] A. Tootoonchian, M. Ghobadi, and Y. Ganjali, “OpenTM: traffic matrix estimator for OpenFlow networks,” in Passive and Active Measurement. Springer, 2010, pp. 201–210.
[10] L. Jose, M. Yu, and J. Rexford, “Online measurement of large traffic aggregates on commodity switches,” in Proc. of the USENIX HotICE workshop, 2011.
[11] Y. Zhang, “An adaptive flow counting method for anomaly detection in SDN,” in Proceedings of the ninth ACM conference on Emerging networking experiments and technologies. ACM, 2013, pp. 25–30.
[12] M. Moshref, M. Yu, and R. Govindan, “Resource/Accuracy Tradeoffs in Software-Defined Measurement,” in Proceedings of HotSDN 2013, August 2013, to appear.
[13] A. Jain and E. Y. Chang, “Adaptive sampling for sensor networks,” in Proceedings of the 1st international workshop on Data management for sensor networks: in conjunction with VLDB 2004. ACM, 2004, pp. 10–16.
[14] B. Gedik, L. Liu, and P. Yu, “ASAP: An adaptive sampling approach to data collection in sensor networks,” Parallel and Distributed Systems, IEEE Transactions on, vol. 18, no. 12, pp. 1766–1783, 2007.
[15] A. D. Marbini and L. E. Sacks, “Adaptive sampling mechanisms in sensor networks,” in London Communications Symposium, 2003.
[16] J. Kho, A. Rogers, and N. R. Jennings, “Decentralized control of adaptive sampling in wireless sensor networks,” ACM Transactions on Sensor Networks (TOSN), vol. 5, no. 3, p. 19, 2009.
[17] C. Alippi, G. Anastasi, M. Di Francesco, and M. Roveri, “An adaptive sampling algorithm for effective energy management in wireless sensor networks with energy-hungry sensors,” Instrumentation and Measurement, IEEE Transactions on, vol. 59, no. 2, pp. 335–344, 2010.
[18] R. Willett, A. Martin, and R. Nowak, “Backcasting: adaptive sampling for sensor networks,” in Information Processing in Sensor Networks, 2004. IPSN 2004. Third International Symposium on, 2004, pp. 124–133.
[19] E. Hernandez, M. Chidester, and A. George, “Adaptive sampling for network management,” Journal of Network and Systems Management, vol. 9, no. 4, pp. 409–434, 2001.
[20] G. Androulidakis, V. Chatzigiannakis, and S. Papavassiliou, “Network anomaly detection and classification via opportunistic sampling,” Network, IEEE, vol. 23, no. 1, pp. 6–12, 2009.
[21] N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. McKeown, and S. Shenker, “NOX: Towards an operating system for networks,” SIGCOMM Comput. Commun. Rev., vol. 38, no. 3, pp. 105–110.
[22] “POX OpenFlow Controller,” https://github.com/noxrepo/pox.
[23] “Floodlight OpenFlow controller,” http://www.projectfloodlight.org/floodlight/.
[24] M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. McKeown, and S. Shenker, “Ethane: Taking control of the enterprise,” in ACM SIGCOMM Computer Communication Review, vol. 37, no. 4. ACM, 2007, pp. 1–12.
[25] “Iperf: TCP/UDP Bandwidth Measurement Tool,” http://iperf.fr/.
[26] M. F. Bari, S. R. Chowdhury, R. Ahmed, and R. Boutaba, “PolicyCop: An Autonomic QoS Policy Enforcement Framework for Software Defined Networks,” in Future Networks and Services (SDN4FNS), 2013 IEEE SDN for. IEEE, 2013, pp. 1–7.
[27] A. R. Roy, M. F. Bari, M. F. Zhani, R. Ahmed, and R. Boutaba, “Design and Management of DOT: A Distributed OpenFlow Testbed,” in 14th IEEE/IFIP Network Operations and Management Symposium (NOMS 2014).
[28] M. F. Bari, A. R. Roy, S. R. Chowdhury, Q. Zhang, M. F. Zhani, R. Ahmed, and R. Boutaba, “Dynamic Controller Provisioning in Software Defined Networks,” in 9th International Conference on Network and Service Management 2013 (CNSM 2013), Oct 2013, pp. 18–25.
