Arduino
• Arduino UNO
• Arduino Nano
• Arduino Mega
• Arduino Due
• Arduino Bluetooth
Types of Arduino Boards
1) Arduino UNO
• The Arduino UNO is based on the ATmega328P microcontroller.
• The Arduino UNO includes 6 analog input pins, 14 digital pins, a USB connector, a power jack, and an ICSP (In-Circuit Serial Programming) header.
• It is the most widely used board and the standard form factor among all available Arduino boards.
• It is also recommended for beginners as it is easy to use.
2) Arduino Nano
• The Arduino Nano is a small Arduino board based on the ATmega328P (or, in older versions, the ATmega168) microcontroller.
• The connectivity is the same as on the Arduino UNO board.
• The Nano board is a small, robust, and flexible microcontroller board.
• It is smaller in size than the UNO board.
• To start a project with the Arduino Nano board, you need the Arduino IDE and a mini-USB cable.
• The Arduino Nano includes an I/O pin set of 14 digital pins and 8 analog pins.
• It also includes 6 power pins and 2 reset pins.
3) Arduino Mega
• The Arduino Mega is based on the ATmega2560 microcontroller, which is an 8-bit microcontroller.
• It has the advantage of working with more memory space.
• The Arduino Mega includes 54 digital I/O pins, 16 analog input pins, an ICSP header, a reset button, 4 UART (Universal Asynchronous Receiver/Transmitter) ports, a USB connection, and a power jack.
4) Arduino Due
• The Arduino Due is based on a 32-bit ARM core.
• It was the first Arduino board developed around an ARM microcontroller.
• It consists of 54 digital input/output pins and 12 analog pins.
• The microcontroller on the board is the Atmel SAM3X8E ARM Cortex-M3 CPU.
• It has two USB ports: a native USB port and a programming port.
5) Arduino Bluetooth
• The Arduino Bluetooth board is based on the ATmega168 microcontroller.
• It is also known as the Arduino BT board.
• The components on the board are 16 digital pins, 6 analog pins, a reset button, a 16 MHz crystal oscillator, an ICSP header, and screw terminals.
• The screw terminals are used for power.
• The Arduino Bluetooth board can be programmed wirelessly over a Bluetooth connection.
Arduino IDE
• The Arduino IDE is open-source software used to write and upload code to Arduino boards.
• The IDE application runs on different operating systems such as Windows, Mac OS X, and Linux.
• It supports the programming languages C and C++. IDE stands for Integrated Development Environment.
• A program or code written in the Arduino IDE is called a sketch.
• We need to connect the Arduino (or Genuino) board to the IDE to upload a sketch written in the Arduino IDE software.
• A sketch is saved with the extension '.ino'.
Each section of the Arduino IDE:
• Toolbar buttons: the icons displayed on the toolbar are New, Open, Save, Upload, and Verify.
• Menu Bar
Introduction to Django
• Django is a Python framework that makes it easier to create websites using Python.
• Django emphasizes reusability of components, also referred to as DRY (Don't Repeat Yourself), and comes with ready-to-use features like a login system, database connection, and CRUD operations (Create, Read, Update, Delete).
• Django is especially helpful for database-driven websites.
• A database-driven website is one that uses a database for collecting and storing information.
• Django officially supports the following databases: PostgreSQL, MariaDB, MySQL, Oracle, and SQLite.
How does Django Work?
• Django follows the MVT design pattern (Model, View, Template).
• Model - The data you want to present, usually data from a
database.
• View - A request handler that returns the relevant template and
content - based on the request from the user.
• Template - A text file (like an HTML file) containing the layout of
the web page, with logic on how to display the data.
Model
• The model provides data from the database.
• In Django, data is delivered through an Object Relational Mapper (ORM), a technique designed to make it easier to work with databases.
• The most common way to extract data from a database is SQL.
• One problem with SQL is that you have to have a pretty good understanding of the database structure to be able to work with it.
• Django, with its ORM, makes it easier to communicate with the database without having to write complex SQL statements.
• The models are usually located in a file called models.py (a minimal sketch follows below).
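• As an illustration only (the Member model and its fields are hypothetical, not from the source), a minimal models.py might look like this:

# models.py -- a minimal sketch of a Django model (hypothetical example)
from django.db import models

class Member(models.Model):
    # Each attribute maps to a column in the database table
    firstname = models.CharField(max_length=255)
    lastname = models.CharField(max_length=255)
    joined_date = models.DateField(null=True)

    def __str__(self):
        return f"{self.firstname} {self.lastname}"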
View
• A view is a function or method that takes HTTP requests as arguments, imports the relevant model(s), finds out what data to send to the template, and returns the final result (see the sketch below).
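• For illustration (the model and template name are hypothetical), a simple view in views.py could look like this:

# views.py -- a minimal sketch of a Django view (hypothetical example)
from django.shortcuts import render
from .models import Member

def member_list(request):
    # Query the model through the ORM and pass the data to a template
    members = Member.objects.all()
    return render(request, 'members/member_list.html', {'members': members})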
• Advanced Analytics - Spark supports not only 'Map' and 'Reduce' but also SQL queries, streaming data, machine learning (ML), and graph algorithms.
Components of Spark
The Spark framework includes:
• Spark Core as the foundation for the platform
• Spark SQL for interactive queries
• Spark Streaming for real-time analytics
• Spark MLlib for machine learning
• Spark GraphX for graph processing
• Spark Core: It is responsible for memory management, fault recovery, scheduling, distributing and monitoring jobs, and interacting with storage systems. Spark Core is exposed through application programming interfaces (APIs) built for Java, Scala, Python, and R.
• MLlib - Machine Learning Library
• Spark includes MLlib, a library of algorithms for doing machine learning on data at scale.
• Machine learning models can be trained by data scientists with R or Python on any Hadoop data source, saved using MLlib, and imported into a Java or Scala-based pipeline.
• Spark was designed for fast, interactive computation that runs in memory, enabling machine learning to run quickly.
• The algorithms include the ability to do classification, regression, clustering, collaborative filtering, and pattern mining (a small PySpark sketch follows this list).
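• As a rough sketch (the file name and column names are invented for illustration), training a k-means clustering model with PySpark's MLlib might look like this:

# A minimal PySpark MLlib sketch: k-means clustering (file and columns are hypothetical)
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Load a CSV of numeric sensor readings (hypothetical file and columns)
df = spark.read.csv("readings.csv", header=True, inferSchema=True)

# Combine feature columns into the single vector column expected by MLlib
assembler = VectorAssembler(inputCols=["temperature", "humidity"], outputCol="features")
features = assembler.transform(df)

# Train a k-means model with 3 clusters and attach cluster labels to each row
model = KMeans(k=3, featuresCol="features").fit(features)
model.transform(features).show()

spark.stop()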
• Spark Streaming - Real-Time Analytics
• Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming analytics.
• Spark Streaming supports data from Twitter, Kafka, Flume, HDFS, ZeroMQ, and many other sources found in the Spark Packages ecosystem (see the sketch below).
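• A minimal sketch using the classic DStream API (the socket host/port are made up for illustration): count words arriving on a TCP socket in one-second batches.

# A minimal Spark Streaming (DStream) sketch; the socket source is hypothetical
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="streaming-sketch")
ssc = StreamingContext(sc, 1)  # 1-second micro-batches

# Read lines from a TCP socket and count words in each batch
lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()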
• Spark SQL - Interactive Queries
• Spark SQL is a distributed query engine that provides low-latency, interactive queries up to 100x faster than MapReduce.
• It includes a cost-based optimizer, columnar storage, and code generation for fast queries, while scaling to thousands of nodes.
• Business analysts can use standard SQL or the Hive Query Language for querying data.
• Developers can use APIs available in Scala, Java, Python, and R (a short PySpark example follows).
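• As a hedged sketch (the JSON file and columns here are hypothetical), registering a DataFrame as a temporary view and querying it with standard SQL in PySpark could look like this:

# A minimal Spark SQL sketch; the JSON file and columns are hypothetical
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparksql-sketch").getOrCreate()

# Load data into a DataFrame and expose it to SQL as a temporary view
people = spark.read.json("people.json")
people.createOrReplaceTempView("people")

# Business analysts can now use standard SQL against the view
adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
adults.show()

spark.stop()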
• GraphX - Graph Processing
• Spark GraphX is a distributed graph processing framework built on top of Spark.
• GraphX provides ETL, exploratory analysis, and iterative graph computation to enable users to interactively build and transform a graph data structure at scale.
Data Analytics for IoT
• The volume, velocity, and variety of data generated by data-intensive IoT systems are so large that it is difficult to store, manage, process, and analyse the data using traditional databases and data processing tools.
• Analysis of data can be done with aggregation functions (sum, min, max, count, average), or with ML methods such as clustering and classification (a toy example follows this list).
• Clustering - grouping similar data items together such that data items that are more similar to each other than to other data items are put in one cluster.
• Classification is used for categorizing objects into predefined categories.
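• As a toy sketch (using scikit-learn as one possible tool, with made-up readings), clustering and classification on numeric IoT data could look like this:

# Toy sketch: clustering and classification on invented sensor readings
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Hypothetical (temperature, humidity) readings
readings = [[22, 40], [23, 42], [35, 20], [36, 18], [21, 45]]

# Clustering: group similar readings together (no labels needed)
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(readings)
print("cluster labels:", clusters)

# Classification: learn predefined categories from labelled examples
labels = ["normal", "normal", "hot", "hot", "normal"]
clf = DecisionTreeClassifier().fit(readings, labels)
print("prediction for [34, 22]:", clf.predict([[34, 22]]))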
REST Services, Analytics Component
(IoT Intelligence)
Deployment design of a forest fire detection system
• Deployment design of a forest fire detection system with multiple end nodes deployed in a forest.
• The end nodes are equipped with sensors for measuring temperature, humidity, light, and carbon monoxide (CO) at various locations in the forest.
• Each end node sends data independently to the cloud using REST-based communication.
• The data collected in the cloud is analysed to predict whether a fire has broken out in the forest.
The data schema is: Timestamp, Temperature (°C), Humidity (%), Light (lux), CO (parts per million).
• A measurement of 1 lux is equal to the illumination of a one-square-metre surface that is one metre away from a single candle.
• ppm is used to measure the concentration of a contaminant in soils and sediments.
• Parts per million (ppm) is the number of units of mass of a contaminant per million units of total mass (a quick worked example follows).
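• As a quick worked example (the numbers are invented):

# ppm = units of contaminant mass per million units of total mass (toy numbers)
contaminant_mass_mg = 5        # 5 mg of contaminant
total_mass_mg = 1_000_000      # in 1 kg (1,000,000 mg) of soil
ppm = contaminant_mass_mg / total_mass_mg * 1_000_000
print(ppm)  # -> 5.0 ppm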
• Xively Cloud was designed to enable developers to connect, manage, and analyze data from IoT devices. It provided features such as device connectivity, data management, real-time analytics, and visualization tools.
1. Apache Hadoop
• Hadoop is an open-source framework from Apache used to store, process, and analyze data that is very huge in volume.
• Hadoop is written in Java.
• It is used for batch/offline processing.
• It is used by Facebook, Yahoo, Google, Twitter, LinkedIn, and many more.
• Moreover, it can be scaled up just by adding nodes to the cluster.
1.1 MapReduce Programming Model
• MapReduce is a programming framework that allows us to perform parallel processing on large data sets in a distributed environment.
• A MapReduce program is composed of a map procedure, which performs filtering and sorting, and a reduce method, which performs a summary operation.
• The main components of a MapReduce program are the Mapper and the Reducer.
• Languages used: Java, Python, or others.
• Mapper: The Mapper class splits the input data into key-value pairs (e.g. gender: M/F, color: green, price: 100).
• Reducer: The Reducer class takes the key-value pairs output by the Mapper and reduces them to a result (a small word-count sketch follows this list).
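• A minimal sketch of the Mapper/Reducer idea in Python (a word-count example in the spirit of Hadoop Streaming; it is illustrative, not a complete Hadoop job):

# Word-count sketch illustrating the map and reduce phases (not a full Hadoop job)
from collections import defaultdict

def mapper(lines):
    """Map: split the input into (word, 1) key-value pairs."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce: sum the counts for each key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

if __name__ == "__main__":
    text = ["big data needs big tools", "data data data"]
    print(reducer(mapper(text)))  # {'big': 2, 'data': 4, 'needs': 1, 'tools': 1}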
Data flow in MapReduce and Example
1.2 Hadoop MapReduce Job Execution
• This section describes the MapReduce job execution workflow and the steps involved in job submission, job initialization, task selection, and task execution.
JobTracker:
• Distributes MapReduce tasks to specific nodes in the cluster.
• Client applications submit jobs to the JobTracker.
• The JobTracker submits the work to the chosen TaskTracker nodes.
TaskTracker:
• A TaskTracker is a node in the cluster that accepts tasks - Map, Reduce, and Shuffle operations - from a JobTracker.
• Every TaskTracker is configured with a set of slots; these indicate the number of tasks that it can accept.
DataNode:
• DataNodes are the slave nodes in HDFS.
• The actual data is stored on DataNodes.
• A functional filesystem has more than one DataNode, with data replicated
across them.
• On startup, a DataNode connects to the NameNode; spinning until that
service comes up.
Purpose of DataNode:
• The DataNode stores HDFS data in files in its local file system.
• The DataNode has no knowledge about HDFS files.
• It stores each block of HDFS data in a separate file in its local file system.
• The DataNode does not create all files in the same directory.
File Block In HDFS
• Data in HDFS is always stored in terms of blocks. A single file is divided into multiple blocks of 128 MB each, which is the default block size, and this can also be changed manually. Nowadays, block sizes of 128 MB to 256 MB are commonly used in Hadoop (a quick computation sketch follows).
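• For example (a quick computation with an invented file size):

# How a file is split into HDFS blocks (toy numbers; default block size 128 MB)
import math

file_size_mb = 300        # hypothetical file size
block_size_mb = 128       # HDFS default block size

num_blocks = math.ceil(file_size_mb / block_size_mb)
last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb
print(num_blocks, last_block_mb)  # -> 3 blocks: 128 MB, 128 MB, 44 MB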
MapReduce Job Execution Workflow for Hadoop
• The job execution starts when the client applications submit jobs to the job tracker.
• The JobTracker returns a JobID to the client application.
• The JobTracker talks to the NameNode to determine the location of the data.
• The JobTracker locates TaskTracker nodes with available slots at or near the data.
• The TaskTrackers send out heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that they are still alive.
• A 'heartbeat' is a periodic signal sent from a worker daemon (such as a TaskTracker or DataNode) to its master (the JobTracker or NameNode). The signal is taken as a sign of vitality; if there is no response to the signal, it is understood that there are technical problems with the node.
• These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster new work can be delegated.
• The JobTracker submits the work to the TaskTracker nodes when they poll for tasks.
• To choose a task for a TaskTracker, the JobTracker uses various scheduling
algorithms.
• The default scheduling algorithm in Hadoop is FIFO (First-In, First-Out).
• In FIFO, a work queue is maintained and the JobTracker pulls the oldest job first for scheduling.
• There is no notion of job priority or job size in FIFO scheduling.
• The TaskTracker nodes are monitored using the heartbeat signals that are sent by the TaskTrackers to the JobTracker.
• The TaskTracker spawns a separate JVM process for each task, so that a failing task does not bring down the TaskTracker itself.
• The TaskTracker monitors these processes while capturing the output and exit codes.
• When a process finishes, successfully or not, the TaskTracker notifies the JobTracker.
• When a task fails, the TaskTracker notifies the JobTracker, which decides whether to resubmit the job to some other TaskTracker or mark that specific record as something to avoid.
• The JobTracker can blacklist a TaskTracker as unreliable if there are repeated task failures.
• When the job is completed, the JobTracker updates its status.
• Client applications can poll the JobTracker for the status of their jobs.
1.4 Hadoop Cluster Setup
Steps involved in setting up a Hadoop cluster are described as follows:
Install Java: Hadoop requires Java 6 or a later version.
Daemons are processes that run in the background of the system. The Hadoop daemons include the NameNode, Secondary NameNode, DataNode, JobTracker, and TaskTracker.
Storm vs. Hadoop MapReduce:
• Storm is stateless: a Storm topology runs until it is shut down by the user or an unexpected, unrecoverable failure occurs.
• Hadoop MapReduce is stateful: MapReduce jobs are executed in sequential order and are completed eventually.
Many industries can use Storm for real-time big data processing such as:
• Credit card companies can use it for fraud detection on swipe.
• Investment banks can use it for trade pattern analysis in real time.
• Retail stores can use it for dynamic pricing.
• Transportation providers can use it for route suggestions based on traffic
data.
• Healthcare providers can use it for the monitoring of ICU sensors.
• Telecom organizations can use it for processing switch data.
Storm Data Model: The Storm data model consists of tuples and streams.
Tuple
• A tuple is an ordered list of named values, similar to a database row. Each field in the tuple has a data type that can be dynamic. A field can be of any data type such as a string, integer, float, double, boolean, or byte array. User-defined data types are also allowed in tuples.
• For example, for stock market data with the schema (ticker, year, value, status), some tuples can be:
  (ABC, 2011, 20, GOOD)
  (ABC, 2012, 30, GOOD)
  (ABC, 2012, 32, BAD)
  (XYZ, 2011, 25, GOOD)
Stream
• A stream in Storm is an unbounded sequence of tuples.
• For example, if the above tuples are stored in a file stocks.txt, then the command cat stocks.txt produces a stream. If a process is continuously appending data to stocks.txt, then it becomes an unbounded stream.
Storm Architecture
• Storm has a master-slave architecture.
• There is a master service called Nimbus running on a single node called the master node.
• There are slave services called supervisors running on each worker node.
• Each supervisor manages one or more worker processes, called workers, that run in parallel to process the input.
• The diagram shows the Storm architecture with one master
node and five worker nodes.
• The Nimbus process is running on the master node.
• There is one supervisor process running on each worker node.
• There are multiple worker processes running on each worker
node.
• The workers get the input from the file system or database and
store the output also to a file system or database.
Storm Processes
• A ZooKeeper cluster is used for coordinating the Nimbus, supervisor, and worker processes.
Nimbus process
• Assigns and distributes the tasks to the worker nodes
• Monitors the tasks
• Reassigns tasks on node failure
Supervisor process
• Runs on each worker node of the cluster
• Runs each task as a separate process called a worker process
• Communicates with Nimbus using ZooKeeper
• The number of worker processes for each task can be configured
Worker process
• Runs on any worker node of the cluster
• Started and monitored by the supervisor process
• Runs either spout or bolt tasks
• The number of worker processes for each task can be configured
Sample Program
• A log processing program takes each line from the log file, filters the messages based on the log type, and outputs the log type.
• Input: a log file containing error, warning, and informational messages. This is a growing file receiving a continuous stream of log messages.
• Output: the type of each message (ERROR, WARNING, or INFO).
Let us continue with the sample program.
• The program given below contains a single spout and a single bolt.
The spout does the following:
• Opens the file, reads each line, and outputs the entire line as a tuple.
The bolt does the following:
• Reads each tuple from the spout and checks whether the tuple contains the string ERROR, WARNING, or INFO.
• Outputs only ERROR, WARNING, or INFO.
LineSpout {
    foreach line = readLine(logfile) {
        emit(line);
    }
}

LogTypeBolt(tuple) {
    if (tuple contains "ERROR")   emit("ERROR");
    if (tuple contains "WARNING") emit("WARNING");
    if (tuple contains "INFO")    emit("INFO");
}
• The spout is named LineSpout.
• It has a loop to read each line of input and outputs the entire line.
• The emit function is used to output the line as a stream of tuples.
• The bolt is named LogTypeBolt. It takes the tuple as input.
• If the line contains the string ERROR, then it outputs the string ERROR.
• If the line contains the string WARNING, then it outputs the string
WARNING.
• Similarly, if the line contains the string INFO, then it outputs the string
INFO.
Storm Components
• Storm provides two types of components that process the input stream: spouts and bolts. Spouts process external data to produce streams of tuples. Spouts produce tuples and send them to bolts. Bolts process the tuples from input streams and produce some output tuples. Input streams to a bolt may come from spouts or from another bolt.
• The diagram shows a Storm topology consisting of one spout and two bolts. The spout gets the data from an external data source and produces a stream of tuples. The first bolt takes the output tuples from the spout and processes them to produce another set of tuples. The second bolt takes the output tuples from bolt 1 and stores them in an output stream.
• Storm Example
• Let us illustrate Storm with an example.
• Problem: stock market data, which is continuously sent by an external system, should be processed so that data with GOOD status is inserted into a database, whereas data with BAD status is written to an error log.
• Storm Solution: the topology will have one spout and two bolts. The spout will get the data from the external system and convert it into a stream of tuples. These tuples will be processed by two bolts: those with status GOOD will be processed by bolt 1, and those with status BAD will be processed by bolt 2. Bolt 1 will save the tuples to a Cassandra database. Bolt 2 will save the tuples to an error log file.
• The diagram shows the Storm topology for the above solution. There is one spout that gets the input from an external data source. Bolt 1 processes the tuples from the spout and stores the tuples with GOOD status in Cassandra. Bolt 2 processes the tuples from the spout and stores the tuples with BAD status in an error log. A plain-Python simulation of this topology is sketched after this description.
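• To make the data flow concrete, here is a plain-Python simulation of that topology; it does not use the real Storm API, and the spout, bolts, 'database', and error log are stand-ins:

# Plain-Python simulation of the spout -> bolt1/bolt2 topology (not the real Storm API)
def spout(records):
    """Stand-in spout: turn raw records into (ticker, year, value, status) tuples."""
    for rec in records:
        yield tuple(rec)

def bolt1_store_good(tup, database):
    """Stand-in for bolt 1: store GOOD tuples in the 'database' (a list here)."""
    database.append(tup)

def bolt2_log_bad(tup, error_log):
    """Stand-in for bolt 2: write BAD tuples to the 'error log' (a list here)."""
    error_log.append(tup)

if __name__ == "__main__":
    stock_feed = [("ABC", 2011, 20, "GOOD"), ("ABC", 2012, 32, "BAD"),
                  ("XYZ", 2011, 25, "GOOD")]
    database, error_log = [], []
    for tup in spout(stock_feed):
        if tup[3] == "GOOD":
            bolt1_store_good(tup, database)
        else:
            bolt2_log_bad(tup, error_log)
    print("database:", database)
    print("error log:", error_log)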