PRECISION AGRICULTURE SUPPORT
USING SCALA/SPARK
Project Report
SRIRAM RV
SPRING SEMESTER
ADVISOR: PROFESSOR BRAD RUBIN
Table of Contents
1.0 PURPOSE OF PROJECT
2.0 PROJECT DESCRIPTION
2.0 Why Agriculture Data
3.0 DATASET
3.1 Data Source:
3.2 Details about Dataset:
3.3 Sample Data
Weather data
Moisture Data
Image Data
3.4 Schema
Weather data
Moisture Data
3.5 Data Description:
Weather Data:
Moisture Data
4.0 PROJECT IMPLEMENTATION
4.1 Data Ingestion using Kafka
4.2 Kafka producer
4.4 Kafka Broker
4.5 Kafka Consumer
5.0 ADDITIONAL TOOLS
5.1 Maven
5.2 Scala Build tool
5.3 Git
6.0 OUTPUT INTERPRETATION
7.0 IMPROVING THE KAFKA ARCHITECTURE
7.1. Making kafka architecture more robust
7.2. Having dedicated Kafka Broker to improve performance
8.0 FUTURE RESEARCH
9.0. CONCLUSION
BIBLIOGRAPHY
1.0 PURPOSE OF PROJECT
Over the last few years, big data tools have focused on both structured and unstructured data.
Image processing, however, is one area that still needs more attention, and it is also an area
of personal interest. This project gives me an opportunity to experiment with streaming images
and weather data captured in the UST greenhouse, and to get a feel for image processing with
Scala/Spark on Hadoop more generally.
I will gain experience in technologies such as Scala, Spark, Spark Streaming, and image
processing in the domain of food technology, skills that I cannot otherwise obtain in the GPS
curriculum.
2.0 PROJECT DESCRIPTION
The purpose of the project is to stream real-time weather data captured by direct sensors,
together with RGB images captured by drones, and to perform image processing and weather data
analytics using the Scala/Spark ecosystem on a Hadoop computing cluster. Since image
processing and streaming with Spark are new technologies to GPS, part of the project will
focus on experimenting with different tools to find a more reliable way of storing images
and streamed data in HDFS.
The UST greenhouse will be growing plants for the Precision Agriculture project run by the UST
School of Engineering. The greenhouse has a local weather station that will broadcast
weather data such as temperature, humidity, light intensity, barometric pressure, position
(latitude/longitude), wind speed and direction, and rainfall. The broadcast will be continuous at
10-second intervals (in CSV format). The equipment in the greenhouse is a prototype for field
use, which is useful both for analyzing plant health and for creating a model for each of the six
plant species that will be grown. In addition, high-resolution images of the plants will be taken
in the visible and near-IR regions of the light spectrum. These images will be taken every
couple of days.
2.0 Why Agriculture Data
Working with agricultural data gives me an opportunity to experiment with streaming images
and weather data captured in the UST greenhouse. The data captured in the greenhouse is highly
detailed and gives me experience working with data from the food technology domain.
3.0 DATASET
3.1 Data Source:
The data source for this project is a live stream of weather and moisture data, captured by
sensors through an Arduino chip and streamed using a Kafka producer.
3.2 Details about Dataset:
• Sensor data were captured every second.
• Total number of days of weather data stored in HDFS: 90 days.
• Total number of days of moisture data stored in HDFS: 85 days.
• Total number of days of image data stored: 90 days.
3.3 Sample Data
Weather data
Fig 1: Sensor Weather data from Arduino
Moisture Data
Fig 2: Sensor data from Arduino
Image Data
Image data was captured every other day over a period of 90 days.
Fig 3: Images from the greenhouse
3.4 Schema
Weather data
Date | Time | Wind Direction | Wind Speed | Humidity | Temperature | Rain | Pressure | Battery | Light Level
Table 1 : Weather Data Schema
Moisture Data
Date | Time | Moist 2 | Moist 6 | Moist 8 | Moist 11 | Moist 10 | Moist 1 | Moist 9 | Moist 7 | Moist 5 | Temp | PAR
Table 2 : Moisture Data Schema
3.5 Data Description:
Weather Data:
Date & Time: Timestamp of the recording
Wind Direction: Direction of the wind
Wind Speed: Speed of the wind
Wind Gust: Speed of wind gusts
Humidity: Percentage of water vapor in the air
Temperature: Ambient temperature
Rain: Rain percentage
Pressure: Air pressure
Battery: Battery level of the Arduino
Light: Light exposure
Moisture Data
Moist 2: Moisture of plot 2
Moist 6: Moisture of plot 6
Moist 8: Moisture of plot 8
Moist 11: Moisture of plot 11
Moist 10: Moisture of plot 10
Moist 1: Moisture of plot 1
Moist 9: Moisture of plot 9
Moist 7: Moisture of plot 7
Moist 5: Moisture of plot 5
Temp: Soil temperature
PAR: Moisture metrics
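As an illustration of how one weather record could be read into a typed structure, the sketch below parses a single CSV line in the column order of Table 1. It is only a sketch under stated assumptions: the comma delimiter, the separate date and time columns, the field types, and the names WeatherRecord and WeatherParser are my own choices and are not taken from the project code.

```scala
import scala.util.Try

// One weather record, with fields in the column order of Table 1.
// The field types are assumptions; the report does not state them.
final case class WeatherRecord(
  date: String,
  time: String,
  windDirection: String,
  windSpeed: Double,
  humidity: Double,
  temperature: Double,
  rain: Double,
  pressure: Double,
  battery: Double,
  lightLevel: Double
)

object WeatherParser {
  // Parses one comma-separated line. Returns None if the line does not
  // have exactly ten fields or if a numeric field cannot be read.
  def parse(line: String): Option[WeatherRecord] = {
    val fields = line.split(",", -1).map(_.trim)
    if (fields.length != 10) None
    else Try {
      WeatherRecord(
        fields(0), fields(1), fields(2),
        fields(3).toDouble, fields(4).toDouble, fields(5).toDouble,
        fields(6).toDouble, fields(7).toDouble, fields(8).toDouble,
        fields(9).toDouble
      )
    }.toOption
  }
}
```

Returning an Option rather than throwing means that a malformed sensor line can be skipped without stopping the stream. The moisture records of Table 2 could be handled the same way with their own case class.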
4.0 PROJECT IMPLEMENTATION
4.1 Data Ingestion using Kafka
Kafka is a distributed messaging system that is used here to transmit moisture and weather data from the
Arduino chip to HDFS. The Kafka architecture depends mainly on three components: producer, broker and
consumer. ZooKeeper is used to monitor the rate of data flowing in and out of the Kafka broker.
The diagram below shows the architecture of the precision agriculture project. The Kafka producer streams
the data produced in the greenhouse and sends it to the Kafka broker. The Kafka producer gets the
addresses of the brokers through ZooKeeper.
Fig 4: Kafka Architectural Diagram
4.2 Kafka producer
The Kafka producer is the sender side of the Kafka distributed messaging system. The producer assigns
messages to their respective topics and sends them to the brokers by topic. The producer also obtains the
address of the Kafka brokers, which is attached to the header of the packet when the data is sent.
The weather data, moisture data and image data are distinguished by separate topics: “weather-data”,
“moisture-data” and “image-data”.
The snippet below sets up the Kafka producer with string serialization for both key and value. The bootstrap
server setting is the list of Kafka broker addresses (host:port) that the producer first connects to.
Fig 5 : Configuring the kafka producer
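A producer configuration of the kind described in this section might look like the sketch below. It only builds the settings object, so it runs without a Kafka installation. The property keys and the serializer class name are those of the Kafka Java client; the object name ProducerConfigSketch and the broker address localhost:9092 are placeholders of mine, not values from the project.

```scala
import java.util.Properties

object ProducerConfigSketch {
  // String serializer class shipped with the Kafka Java client.
  val StringSerializer = "org.apache.kafka.common.serialization.StringSerializer"

  // Builds the producer settings: the broker list plus string
  // serialization for both key and value.
  // `brokers` is a comma-separated host:port list (placeholder value below).
  def producerProps(brokers: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", brokers)
    props.setProperty("key.serializer", StringSerializer)
    props.setProperty("value.serializer", StringSerializer)
    props
  }

  def main(args: Array[String]): Unit = {
    val props = producerProps("localhost:9092")
    // With the kafka-clients library on the classpath, these settings
    // would be passed to: new KafkaProducer[String, String](props)
    println(props.getProperty("bootstrap.servers"))
  }
}
```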
The snippet below creates the message object, which contains the topic and the message to be sent to
the Kafka broker. The send function of the Kafka producer, which was created from the Kafka configuration,
then sends the message to the broker.
Fig 6: Sending the message to kafka broker
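The pairing of a message with its topic can be sketched without the Kafka library as follows. The three topic names come from this section; the types Reading, Weather, Moisture, Image and Message, and the object TopicRouter, are hypothetical names used only for illustration. The image payload is shown as a string (for example a file path) to match the string serializers; how the project encoded images is not stated.

```scala
// The three kinds of data produced in the greenhouse.
sealed trait Reading { def payload: String }
final case class Weather(payload: String) extends Reading
final case class Moisture(payload: String) extends Reading
final case class Image(payload: String) extends Reading

// Hypothetical stand-in for the message object: a topic plus the text to send.
final case class Message(topic: String, value: String)

object TopicRouter {
  // Maps each kind of reading to the topic named in this section.
  def topicFor(reading: Reading): String = reading match {
    case _: Weather  => "weather-data"
    case _: Moisture => "moisture-data"
    case _: Image    => "image-data"
  }

  // Builds the message for a reading. With the kafka-clients library,
  // the equivalent object would be
  //   new ProducerRecord[String, String](topic, value)
  // and it would be handed to producer.send(...).
  def toMessage(reading: Reading): Message =
    Message(topicFor(reading), reading.payload)
}
```

Because Reading is a sealed trait, the compiler can warn if a new kind of data is added without a topic being assigned to it.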
4.4 Kafka Broker
The Kafka broker is the server side of the Kafka distributed messaging system and is capable of handling
hundreds of read and write operations per second. It can be expanded elastically without downtime.
Data streams are partitioned and spread over a cluster of machines, which allows data streams larger than
the capacity of a single machine. The Kafka broker can be monitored through ZooKeeper on port
2181. By default, the Kafka broker has a retention period of 168 hours (7 days).
Fig 7 : Monitoring the messages using Zookeeper
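To give a feel for what the 168-hour retention period means for this data, the short calculation below converts it into a number of retained messages. It assumes one message per sensor reading and that only the time-based limit applies; the object name RetentionSketch is my own.

```scala
object RetentionSketch {
  // Default broker retention period in hours (log.retention.hours).
  val retentionHours: Long = 168L

  // Seconds covered by the retention window: 168 * 60 * 60.
  val retentionSeconds: Long = retentionHours * 60 * 60

  // Number of messages kept per stream if one message is produced
  // every `intervalSeconds` seconds.
  def retainedMessages(intervalSeconds: Long): Long =
    retentionSeconds / intervalSeconds

  def main(args: Array[String]): Unit = {
    println(retainedMessages(1))   // one reading per second, as in Section 3.2
    println(retainedMessages(10))  // one reading every 10 seconds, as in Section 2.0
  }
}
```

At one reading per second this gives 604,800 messages per stream, and at one reading every 10 seconds it gives 60,480. In practical terms, a consumer that is offline for less than 7 days can still catch up from the broker.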
4.5 Kafka Consumer
The Kafka consumer is the receiver side of the Kafka distributed messaging system. It fetches data from
the brokers topic by topic. The consumer runs in the cluster and stores the data in HDFS for further
processing. Below is the sample consumer code, which connects to the PA cluster. The topic set contains
the list of topics to fetch from the broker.
Fig 8 : Configuring Kafka Consumer
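A consumer configuration and topic set along these lines can be sketched as follows. This is not the project's code: it uses the settings of the plain Kafka Java consumer, whereas the project may have used the Spark Streaming Kafka integration instead. The broker address, the group id pa-ingest and the object name ConsumerConfigSketch are placeholders of mine; the topic names are those given in Section 4.2.

```scala
import java.util.Properties

object ConsumerConfigSketch {
  // String deserializer class shipped with the Kafka Java client.
  val StringDeserializer = "org.apache.kafka.common.serialization.StringDeserializer"

  // Topics to fetch from the broker (names from Section 4.2).
  val topics: Set[String] = Set("weather-data", "moisture-data")

  // Builds the consumer settings. `brokers` is a comma-separated
  // host:port list and `groupId` names the consumer group; both are
  // placeholders chosen by the caller.
  def consumerProps(brokers: String, groupId: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", brokers)
    props.setProperty("group.id", groupId)
    props.setProperty("key.deserializer", StringDeserializer)
    props.setProperty("value.deserializer", StringDeserializer)
    props
  }

  def main(args: Array[String]): Unit = {
    val props = consumerProps("localhost:9092", "pa-ingest")
    // With the kafka-clients library on the classpath, these settings
    // would be passed to new KafkaConsumer[String, String](props), and
    // the topic set would be passed to its subscribe method.
    println(s"${props.getProperty("group.id")} -> ${topics.mkString(", ")}")
  }
}
```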
5.0 ADDITIONAL TOOLS
5.1 Maven
Maven was used for dependency management, to bring all the jars from the server into the local repository.
This dependency resolution made it possible to develop the code in a Windows environment. Maven was also
used to specify the versions of Spark and Kafka, and all the jar files for that version of Spark were
stored in the local repository.
Fig 9 : Maven dependency management
5.2 Scala Build tool
The Scala Build Tool (SBT) was used to create the package and jar files, which were transferred to the
cluster and the VM using WinSCP.
Fig 10 : SBT build
5.3 Git
Git is a distributed revision control and source code management (SCM) system. An online GitHub
repository was used for the precision agriculture project to store all the code related to the project
and share it with the team.
Below is the GitHub link for the precision agriculture project.
https://github.com/sri303030/Data-Ingestion-using-Kafka
6.0 OUTPUT INTERPRETATION
With the help of the consumer, the streamed data is sent to HDFS and stored in two separate folders to
distinguish weather data from moisture data.
Below is the output from the weather data folder
Fig 11: weather data folder
Below is output from the moisture data folder
Fig 12: Moisture Data Folder
7.0 IMPROVING THE KAFKA ARCHITECTURE
The Kafka architecture can be improved in two ways:
1. Making the Kafka architecture more robust.
2. Having a dedicated Kafka broker to improve performance.
7.1. Making kafka architecture more robust
In the precision agriculture project, both the broker and the consumer ran on the same system, because the
data ingestion requirement was to store data in HDFS. To make the architecture more robust, the consumer
should run on a remote system or cluster that has access to the Kafka broker. That way, if the Kafka
broker fails, the data can still be retrieved from the consumer.
7.2. Having dedicated Kafka Broker to improve performance
In the precision agriculture project, the Kafka broker runs as part of the cluster. To avoid noise in the
cluster, the broker should be a dedicated system or set of systems. This also removes the overhead that
the Kafka broker places on the Hadoop environment and speeds up the other processes.
8.0 FUTURE RESEARCH
1. Implement the bridge between HDFS and Spark SQL and store tables as persistent data in Hive.
2. Implement real-time machine learning using Spark MLlib.
3. Connect the live data to a reporting tool to analyze it and create useful reports.
9.0. CONCLUSION
Kafka is a rapidly growing distributed messaging system with various applications in the field of
engineering. In the precision agriculture project, agricultural data from the greenhouse was
captured and streamed to the Hadoop environment using Kafka and Spark. The project also gave me exposure
to handling different big data problems in real-time situations and helped me understand the Kafka
architecture.
BIBLIOGRAPHY
Apache Kafka. http://kafka.apache.org/
Jain, R. (2014). Real time Analytics with Apache Kafka and Apache Spark.
Wang, H., Can, D., Kazemzadeh, A., Bar, F., & Narayanan, S. (2012). A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle. Paper presented at the Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea. http://www.aclweb.org/anthology/P12-3020

Ashwin_ThesisAshwin_Thesis
Ashwin_Thesis
 
Process Model
Process ModelProcess Model
Process Model
 
Intelligent Weather Service
Intelligent Weather Service Intelligent Weather Service
Intelligent Weather Service
 
Ar Quality M System project presentation
Ar Quality M System project presentationAr Quality M System project presentation
Ar Quality M System project presentation
 
Hh3413401342
Hh3413401342Hh3413401342
Hh3413401342
 
Building a fully-automated Fast Data Platform
Building a fully-automated Fast Data PlatformBuilding a fully-automated Fast Data Platform
Building a fully-automated Fast Data Platform
 
Building a fully-automated Fast Data Platform
Building a fully-automated Fast Data PlatformBuilding a fully-automated Fast Data Platform
Building a fully-automated Fast Data Platform
 
Realtime wether station for monitoring and control of agricultre
Realtime wether station for monitoring and control of agricultreRealtime wether station for monitoring and control of agricultre
Realtime wether station for monitoring and control of agricultre
 
UDP Report
UDP ReportUDP Report
UDP Report
 
IRJET- Smart Management of Crop Cultivation using IoT and Machine Learning
IRJET- Smart Management of Crop Cultivation using IoT and Machine LearningIRJET- Smart Management of Crop Cultivation using IoT and Machine Learning
IRJET- Smart Management of Crop Cultivation using IoT and Machine Learning
 

Precision Agriculture Data Ingestion Using Kafka

  • 1. PRECISION AGRICULTURE SUPPORT USING SCALA/SPARK Project Report SRIRAM RV SPRING SEMESTER ADVISOR: PROFESSOR BRAD RUBIN
  • 2. Table of Contents
    1.0 PURPOSE OF PROJECT (p. 4)
    2.0 PROJECT DESCRIPTION (p. 4)
      2.0 Why Agriculture Data (p. 5)
    3.0 DATASET (p. 5)
      3.1 Data Source (p. 5)
      3.2 Details about Dataset (p. 5)
      3.3 Sample Data (p. 5)
        Weather data (p. 5)
        Moisture Data (p. 5)
        Image Data (p. 6)
      3.4 Schema (p. 6)
        Weather data (p. 6)
        Moisture Data (p. 6)
      3.5 Data Description (p. 7)
        Weather Data (p. 7)
        Moisture Data (p. 7)
    4.0 PROJECT IMPLEMENTATION (p. 8)
      4.1 Data Ingestion using Kafka (p. 8)
      4.2 Kafka producer (p. 8)
      4.4 Kafka Broker (p. 9)
      4.5 Kafka Consumer (p. 10)
    5.0 ADDITIONAL TOOLS (p. 10)
      5.1 Maven (p. 10)
      5.2 Scala Build tool (p. 11)
      5.3 Git (p. 11)
    6.0 OUTPUT INTERPRETATION (p. 12)
    7.0 IMPROVING THE KAFKA ARCHITECTURE (p. 12)
      7.1. Making kafka architecture more robust (p. 12)
      7.2. Having dedicated Kafka Broker to improve performance (p. 13)
  • 3. 8.0 FUTURE RESEARCH (p. 13)
    9.0. CONCLUSION (p. 13)
    BIBLIOGRAPHY (p. 14)
  • 4. 1.0 PURPOSE OF PROJECT
    Over the last few years, big data tools have focused on both structured and unstructured data. Image processing, however, is an area that needs more attention, and it is also an area of interest to me. This project gives me an opportunity to experiment with streaming images and weather data captured in the UST greenhouse, and to get a feel for image processing with Scala/Spark on Hadoop more generally. I will gain experience with technologies such as Scala, Spark, Spark Streaming, and image processing in the domain of food technology, skills that I cannot otherwise obtain in the GPS curriculum.
    2.0 PROJECT DESCRIPTION
    The purpose of the project is to stream real-time weather data captured by direct sensors, together with RGB images captured by drones, and to perform image processing and weather data analytics using the Scala/Spark ecosystem on a Hadoop computing cluster. Since image processing and streaming with Spark are new technologies to GPS, part of the project will focus on experimenting with different tools to find a more reliable way of storing images and streamed data in HDFS.
    The UST greenhouse will be growing plants for the Precision Agriculture project run by the UST School of Engineering. The greenhouse has a local weather station that will broadcast weather data such as temperature, humidity, light intensity, barometric pressure, position (latitude/longitude), wind speed and direction, and rainfall. The broadcast will be continuous at 10-second intervals (in CSV format). The equipment in the greenhouse is a prototype for field use, which is useful both for analyzing plant health and for creating a model for each of the six plant species that will be grown. In addition, high-resolution images of the plants will be taken in the visible and near-IR regions of the light spectrum, every couple of days.
  • 5. 2.0 Why Agriculture Data
    Agricultural data gives me an opportunity to experiment with streaming images and weather data captured in the UST greenhouse. The data captured in the greenhouse is highly detailed and gives me experience working with data from food technology.
    3.0 DATASET
    3.1 Data Source:
    The data source for this project is the live stream of weather and moisture data captured by sensors through an Arduino chip and streamed using a Kafka producer.
    3.2 Details about Dataset:
      • The sensor data was captured every second.
      • The weather data stored in HDFS covers 90 days.
      • The moisture data stored in HDFS covers 85 days.
      • The image data stored covers 90 days.
    3.3 Sample Data
    Weather data
    Fig 1: Sensor weather data from Arduino
    Moisture Data
  • 6. Fig 2: Sensor data from Arduino
    Image Data
    Image data was captured every other day over a period of 90 days.
    Fig 3: Images from the greenhouse
    3.4 Schema
    Weather data
    Date Time | Wind Direction | Wind Speed | Humidity | Temperature | Rain | Pressure | Battery | Light Level
    Table 1: Weather Data Schema
    Moisture Data
    Date Time | Moist 2 | Moist 6 | Moist 8 | Moist 11 | Moist 10 | Moist 1 | Moist 9 | Moist 7 | Moist 5 | Temp | Par
    Table 2: Moisture Data Schema
  • 7. 3.5 Data Description:
    Weather Data:
      Date & time: Timestamp of the recording
      Wind Direction: Direction of wind
      Wind Speed: Speed of wind
      Wind Gust: Gust of wind
      Humidity: Percentage of water in air
      Temperature: Temperature
      Rain: Rain percentage
      Pressure: Air pressure
      Battery: Battery level of the Arduino
      Light: Light exposure
    Moisture Data
      Moist 2: Moisture of plot 2
      Moist 6: Moisture of plot 6
      Moist 8: Moisture of plot 8
      Moist 11: Moisture of plot 11
      Moist 10: Moisture of plot 10
      Moist 1: Moisture of plot 1
      Moist 9: Moisture of plot 9
      Moist 7: Moisture of plot 7
      Moist 5: Moisture of plot 5
      Temp: Soil temperature
      PAR: Moisture metrics
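To make the weather schema concrete, the following is a minimal Scala sketch of how one CSV line from the weather station might be parsed into a typed record. The field names follow Table 1; the column order, the numeric types, and the sample values in the usage note are assumptions, since the report does not reproduce the raw format in text. Wind Gust is left out because it appears in the description but not in Table 1.

```scala
import scala.util.Try

// Field names follow Table 1; column order and numeric types are assumptions.
final case class WeatherRecord(
    dateTime: String,
    windDirection: Double,
    windSpeed: Double,
    humidity: Double,
    temperature: Double,
    rain: Double,
    pressure: Double,
    battery: Double,
    lightLevel: Double
)

object WeatherRecord {
  // Parses one comma-separated line; returns None when the column count
  // is wrong or a numeric field cannot be read.
  def parse(line: String): Option[WeatherRecord] = {
    val cols = line.split(",", -1).map(_.trim)
    if (cols.length != 9) None
    else Try {
      val n = cols.tail.map(_.toDouble)
      WeatherRecord(cols(0), n(0), n(1), n(2), n(3), n(4), n(5), n(6), n(7))
    }.toOption
  }
}
```

For a hypothetical line such as `2017-04-01 10:00:00,180,3.2,45.0,22.5,0.0,1013.2,4.1,0.8`, `WeatherRecord.parse` returns a record wrapped in `Some`; a malformed line yields `None`, so bad sensor readings can be dropped before they reach HDFS.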
  • 8. 4.0 PROJECT IMPLEMENTATION
    4.1 Data Ingestion using Kafka
    Kafka is a distributed messaging system that transmits the moisture and weather data from the Arduino chip to HDFS. The Kafka architecture depends mainly on three components: producer, broker and consumer. Zookeeper is used to monitor the frequency of data flowing in and out of the Kafka broker. The diagram below shows the architecture of the precision agriculture project. The Kafka producer streams the data produced in the greenhouse and sends it to the Kafka broker. The Kafka producer gets the addresses of the brokers through Zookeeper.
    Fig 4: Kafka Architectural Diagram
    4.2 Kafka producer
    The Kafka producer is the sender side of the Kafka distributed messaging system. The producer assigns messages to their respective topics and sends them to the brokers based on topic. The producer also gets the address of the Kafka brokers, which is attached to the header of the packet when the data is sent. The weather, moisture and image data are differentiated using separate topics: "weather-data", "moisture-data" and "image-data". The snippet in Fig 5 sets up the Kafka producer with the key and value serializers set to string serialization. The bootstrap server setting is the list of Kafka broker addresses (host:port).
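The project's own configuration is the one shown in Fig 5. As a rough, dependency-free sketch of the same idea, the producer settings can be assembled as a `java.util.Properties` object. The property keys and the serializer class name are the standard ones from the Kafka Java client; the object name `ProducerSettings` and the broker address in the usage note are hypothetical.

```scala
import java.util.Properties

object ProducerSettings {
  // Topic names as given in section 4.2.
  val WeatherTopic  = "weather-data"
  val MoistureTopic = "moisture-data"
  val ImageTopic    = "image-data"

  // String serializer class shipped with the Kafka Java client.
  private val StringSerializer =
    "org.apache.kafka.common.serialization.StringSerializer"

  // `brokers` is a comma-separated host:port list (the value used in the
  // project is not stated in the report, so callers must supply it).
  def build(brokers: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", brokers)
    props.setProperty("key.serializer", StringSerializer)
    props.setProperty("value.serializer", StringSerializer)
    props
  }
}
```

With the `kafka-clients` library on the classpath, the result of `ProducerSettings.build("localhost:9092")` could be passed to `new KafkaProducer[String, String](props)`. Keeping the topic names as constants avoids typos when the same names are reused on the consumer side.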
  • 9. Fig 5: Configuring the Kafka producer
    The snippet in Fig 6 creates the message object, which contains the topic and the message to be sent to the Kafka broker. The send function of the Kafka producer binds the Kafka configuration instance to the message and sends it to the broker.
    Fig 6: Sending the message to the Kafka broker
    4.4 Kafka Broker
    The Kafka broker is the server side of the Kafka distributed messaging system and is capable of handling hundreds of read and write operations per second. It can expand elastically without downtime. Data streams are partitioned and spread over a cluster of machines, which allows streams larger than the capacity of any single machine. The Kafka broker can be monitored through Zookeeper on port 2181. By default, the Kafka broker has a retention period of 168 hours.
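To put the default retention period in perspective, the short sketch below works out how many records a single stream accumulates within the 168-hour window. The two sampling intervals are taken from the report (one reading per second in section 3.2, one broadcast every 10 seconds in section 2.0); the object and method names are illustrative. The figures are only an approximation, because Kafka applies retention to whole log segments rather than to individual records.

```scala
object Retention {
  // Kafka's default broker retention (log.retention.hours), in hours.
  val DefaultRetentionHours: Long = 168L

  // Records one stream accumulates within the retention window,
  // assuming one record every `intervalSeconds` seconds.
  def retainedRecords(retentionHours: Long, intervalSeconds: Long): Long =
    retentionHours * 3600L / intervalSeconds

  def main(args: Array[String]): Unit = {
    println(DefaultRetentionHours / 24)                 // retention window in days
    println(retainedRecords(DefaultRetentionHours, 1))  // one record per second
    println(retainedRecords(DefaultRetentionHours, 10)) // one record every 10 seconds
  }
}
```

In other words, the default window is 7 days: about 604,800 records per stream at one reading per second, or 60,480 at one reading every 10 seconds. A consumer that is down for longer than the window can miss data that the broker has already deleted.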
  • 10. Fig 7: Monitoring the messages using Zookeeper
    4.5 Kafka Consumer
    The Kafka consumer is the receiver side of the Kafka distributed messaging system; it fetches data topic by topic from the brokers. The consumer runs in the cluster and stores the data in HDFS for further processing. Fig 8 shows the sample consumer code, which connects to the PA cluster. The topic set contains the list of topics to fetch from the broker.
    Fig 8: Configuring Kafka Consumer
    5.0 ADDITIONAL TOOLS
    5.1 Maven
    Maven was used for dependency management, bringing all the jars from the server into the local repository. This made it possible to develop the code in a Windows environment. Maven specified the versions of Spark and Kafka that were used, and all the jar files for that version of Spark were stored in the local repository.
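Returning to the consumer configuration in section 4.5: the project's code is the one in Fig 8, which is not reproduced in text. The sketch below only illustrates the two pieces that section names, a set of topics and a map of connection parameters. The parameter keys and the deserializer class name are the standard Kafka consumer settings; the object name, the group id and the broker address are hypothetical, and the topic set is limited to the two streams that section 6.0 reports as stored in HDFS.

```scala
object ConsumerSettings {
  // Topics to fetch from the broker (names from section 4.2).
  val Topics: Set[String] = Set("weather-data", "moisture-data")

  // String deserializer class shipped with the Kafka Java client.
  private val StringDeserializer =
    "org.apache.kafka.common.serialization.StringDeserializer"

  // Connection parameters for a consumer; `brokers` is a comma-separated
  // host:port list and `groupId` identifies the consumer group.
  def kafkaParams(brokers: String, groupId: String): Map[String, String] =
    Map(
      "bootstrap.servers"  -> brokers,
      "group.id"           -> groupId,
      "key.deserializer"   -> StringDeserializer,
      "value.deserializer" -> StringDeserializer
    )
}
```

A call such as `ConsumerSettings.kafkaParams("localhost:9092", "pa-consumer")` gives a parameter map of the kind a Kafka consumer or a Spark Streaming Kafka source expects; how it is wired in depends on the integration library and version in use.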
  • 11. Fig 9: Dependency Injection
    5.2 Scala Build tool
    The Scala Build Tool (SBT) was used to create the package and jar files, which were transferred to the cluster and VM using WinSCP.
    Fig 10: SBT build
    5.3 Git
    Git is a distributed revision control and source code management (SCM) system. An online Git repository was used to store the precision agriculture project code and share it with the team. The repository is available at:
    https://github.com/sri303030/Data-Ingestion-using-Kafka
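As an illustration of the dependency pinning described in sections 5.1 and 5.2, a `build.sbt` for a project of this kind might look like the sketch below. The report does not state which Spark, Kafka or Scala versions were used, so the project name and every version number here are assumptions chosen only to show the shape of the file; the actual build is the one in Fig 9 and Fig 10.

```scala
// build.sbt (sketch; project name and all versions are assumed, not from the report)
name := "precision-agriculture-ingestion"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Kafka Java client, used by the producer
  "org.apache.kafka" % "kafka-clients" % "0.10.2.1",
  // Spark Streaming and its Kafka integration, used by the consumer
  "org.apache.spark" %% "spark-streaming" % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.1"
)
```

Running `sbt package` on such a file produces the jar that is then copied to the cluster. Marking Spark itself as `provided` keeps it out of the jar, on the assumption that the cluster already supplies it.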
  • 12. 6.0 OUTPUT INTERPRETATION
    The consumer sends the streamed data to HDFS, where it is stored in two separate folders to distinguish weather data from moisture data. Below is the output from the weather data folder.
    Fig 11: Weather data folder
    Below is the output from the moisture data folder.
    Fig 12: Moisture data folder
    7.0 IMPROVING THE KAFKA ARCHITECTURE
    The Kafka architecture can be improved in two ways:
    1. Making the Kafka architecture more robust.
    2. Having a dedicated Kafka broker to improve performance.
    7.1. Making kafka architecture more robust
    In the precision agriculture project, both the broker and the consumer ran on the same system, because the data ingestion requirement was to store data in HDFS. To make the architecture more robust, the consumer should be a remote system or cluster with access to the Kafka broker. Then, if the Kafka broker fails, the data can still be retrieved from the consumer.
  • 13. 7.2. Having dedicated Kafka Broker to improve performance
    The Kafka broker runs as part of the cluster in the precision agriculture project. To avoid noise in the cluster, the broker should be a dedicated system or set of systems. This also removes the overhead that the Kafka broker places on the Hadoop environment and speeds up all the processes.
    8.0 FUTURE RESEARCH
    1. Implement the bridge between HDFS and Spark SQL and store the tables as persistent data in Hive.
    2. Implement real-time machine learning using Spark MLlib.
    3. Connect the live data to a reporting tool, analyze the live data and create useful reports.
    9.0. CONCLUSION
    Kafka is a rapidly growing distributed messaging system with various applications in the field of engineering. With the help of the precision agriculture project, agricultural data from the greenhouse was captured and streamed to the Hadoop environment using Kafka and Spark. This project also gave me exposure to handling different big data problems in real-time situations and helped me understand the Kafka architecture.
  • 14. BIBLIOGRAPHY
    Apache Kafka. http://kafka.apache.org/
    Jain, R. (2014). Real time Analytics with Apache Kafka and Apache Spark.
    Wang, H., Can, D., Kazemzadeh, A., Bar, F., & Narayanan, S. (2012). A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle. Paper presented at the Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Republic of Korea. http://www.aclweb.org/anthology/P12-3020