This document describes a project to implement precision agriculture support using Scala and Spark on a Hadoop cluster. The project ingests real-time weather and moisture data from sensors in a greenhouse via Kafka producers and consumers. Weather data (temperature, humidity, and related readings) are captured every 10 seconds, plant images every couple of days, and both are stored in HDFS. The Kafka architecture is made more robust with remote consumers and dedicated brokers to avoid cluster noise and improve performance for real-time analytics and machine learning on the streaming agricultural data.
Precision Agriculture Data Ingestion Using Kafka
1.0 PURPOSE OF PROJECT
Big data tooling over the last few years has focused on both structured and unstructured data. However, image processing is one area that needs more attention, and it has been an area of interest of mine as well. Through this project, I will get the opportunity to experiment with streaming images and weather data captured in the UST greenhouse, and to get a feel for image processing with Scala/Spark on Hadoop more generally.

I will gain experience with technologies such as Scala, Spark, Spark Streaming, and image processing in the domain of food technology, giving me skills that I could not otherwise obtain in the GPS curriculum.
2.0 PROJECT DESCRIPTION
The purpose of the project is to stream real-time weather data captured by direct sensors, along with RGB images captured by drones, in order to perform image processing and weather data analytics leveraging the Scala/Spark ecosystem on a Hadoop computing cluster. Since image processing and streaming with Spark are new technologies to GPS, part of the project will focus on experimenting with different tools and finding a more reliable way of storing images and streamed data in HDFS.
The UST greenhouse will be growing plants for the Precision Agriculture project run by the UST
School of Engineering. The greenhouse has a local weather station that will be broadcasting
weather data such as temperature, humidity, light intensity, barometric pressure, position
(latitude/longitude), wind speed and direction and rainfall. The broadcast will be continuous at
10-second intervals (in CSV format). The equipment in the greenhouse is a prototype for field
use which is useful for both analysis of plant health and creating a model for each of the six
plant species that will be grown. In addition, high resolution images will be taken of the plants
in the visible and near IR regions of the light spectrum. The periodicity of these images will be
every couple of days.
2.1 WHY AGRICULTURE DATA
Agricultural data gives me an opportunity to experiment with streaming images and weather data captured in the UST greenhouse. The data captured in the greenhouse is highly detailed and provides experience working with data from food technology.
3.0 DATASET
3.1 Data Source:
The data source used for this project is the live stream of weather and moisture data captured by sensors through an Arduino chip and streamed using a Kafka producer.
3.2 Details about Dataset:
The sensor data were captured every second.
Total number of days of weather data stored in HDFS: 90 days.
Total number of days of moisture data stored in HDFS: 85 days.
Total number of days of image data stored: 90 days.
3.3 Sample Data
Weather data
Fig 1: Sensor Weather data from Arduino
Moisture Data
Fig 2: Sensor moisture data from the Arduino
Image Data
Image data was captured every other day over a period of 90 days.
Fig 3: Images from the greenhouse
3.4 Schema
Weather data
Date | Time | Wind Direction | Wind Speed | Humidity | Temperature | Rain | Pressure | Battery | Light Level
Table 1: Weather Data Schema
Moisture Data
Date | Time | Moist 2 | Moist 6 | Moist 8 | Moist 11 | Moist 10 | Moist 1 | Moist 9 | Moist 7 | Moist 5 | Temp | PAR
Table 2: Moisture Data Schema
3.5 Data Description:
Weather Data:
Date & Time: Timestamp of the recording
Wind Direction: Direction of the wind
Wind Speed: Speed of the wind
Wind Gust: Gust of the wind
Humidity: Percentage of water vapor in the air
Temperature: Ambient air temperature
Rain: Rain percentage
Pressure: Air pressure
Battery: Battery level of the Arduino
Light: Light exposure
Moisture Data
Moist 2: Moisture of plot 2
Moist 6: Moisture of plot 6
Moist 8: Moisture of plot 8
Moist 11: Moisture of plot 11
Moist 10: Moisture of plot 10
Moist 1: Moisture of plot 1
Moist 9: Moisture of plot 9
Moist 7: Moisture of plot 7
Moist 5: Moisture of plot 5
Temp: Soil temperature
PAR: Photosynthetically active radiation
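The weather fields above map naturally onto a small case class; below is a minimal parsing sketch for one 10-second CSV record. The field order, names, and types are assumptions, since the report does not show the raw broadcast format.

```scala
// Hypothetical record layout; the actual CSV column order is an assumption.
case class WeatherRecord(
  date: String, time: String, windDirection: String, windSpeed: Double,
  humidity: Double, temperature: Double, rain: Double,
  pressure: Double, battery: Double, lightLevel: Double)

object WeatherCsv {
  // Parse one comma-separated line broadcast by the weather station.
  def parse(line: String): WeatherRecord = {
    val f = line.split(",").map(_.trim)
    WeatherRecord(f(0), f(1), f(2), f(3).toDouble, f(4).toDouble,
      f(5).toDouble, f(6).toDouble, f(7).toDouble, f(8).toDouble,
      f(9).toDouble)
  }
}
```

A typed record like this makes downstream Spark analytics simpler than working with raw strings.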
4.0 PROJECT IMPLEMENTATION
4.1 Data Ingestion using Kafka
Kafka is a distributed messaging system that transmits the moisture and weather data from the Arduino chip to HDFS. The Kafka architecture depends mainly on three components: producer, broker, and consumer. ZooKeeper is used to monitor the flow of data in and out of the Kafka broker.
The diagram below shows the architecture of the precision agriculture project. The Kafka producer streams the data produced in the greenhouse and sends it to the Kafka broker. The producer gets the addresses of the brokers through ZooKeeper.
Fig 4: Kafka Architectural Diagram
4.2 Kafka producer
The Kafka producer is the sender side of the Kafka distributed messaging system. The producer assigns messages to their respective topics and sends them to the brokers by topic. The producer also obtains the addresses of the Kafka brokers, which are attached to the packet header when the data is sent.
The weather, moisture, and image data are differentiated using the topics “weather-data”, “moisture-data”, and “image-data”.
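The routing of the three streams onto their topics can be sketched as a simple mapping. The function name is illustrative; only the topic strings come from the report.

```scala
// Map each sensor stream to its Kafka topic; topic names are from section 4.2.
def topicFor(kind: String): String = kind match {
  case "weather"  => "weather-data"
  case "moisture" => "moisture-data"
  case "image"    => "image-data"
  case other      => sys.error(s"unknown stream kind: $other")
}
```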
Below is the snippet to set up the Kafka producer with both key and value serialized as strings. The bootstrap servers setting is the broker list of the Kafka cluster.
Fig 5: Configuring the Kafka producer
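Since Fig 5 is an image, here is a hedged reconstruction of what such a configuration typically looks like with string key/value serializers; the broker address is a placeholder assumption, not the project's actual host.

```scala
import java.util.Properties

// Producer configuration: bootstrap.servers is the broker list,
// and both key and value use Kafka's StringSerializer.
val props = new Properties()
props.put("bootstrap.servers", "broker1:9092") // placeholder broker address
props.put("key.serializer",
  "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer",
  "org.apache.kafka.common.serialization.StringSerializer")
// A producer would then be built from these properties:
// val producer = new KafkaProducer[String, String](props)  // requires kafka-clients on the classpath
```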
Below is the snippet used to create the message object, which contains the topic and the messages to be sent to the Kafka broker. The send function of the Kafka producer binds the Kafka configuration instance with the messages and sends them to the broker.
Fig 6: Sending the message to the Kafka broker
4.3 Kafka Broker
The Kafka broker is the server side of the Kafka distributed messaging system and is capable of handling hundreds of read and write operations per second. It can expand elastically without downtime. Data streams are partitioned and spread over a cluster of machines, which allows data streams larger than the capacity of a single machine. The Kafka broker can be monitored through ZooKeeper on port 2181. By default, the Kafka broker comes with a retention period of 168 hours.
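The 168-hour default retention mentioned above is controlled by the broker's log.retention.hours setting; a minimal server.properties fragment might look like the following (the broker id and listener port are placeholder assumptions):

```properties
broker.id=0
listeners=PLAINTEXT://:9092
zookeeper.connect=localhost:2181
# Default log retention: messages older than 168 hours (7 days) are deleted.
log.retention.hours=168
```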
Fig 7: Monitoring the messages using ZooKeeper
4.4 Kafka Consumer
The Kafka consumer is the receiver side of the Kafka distributed messaging system; it fetches data from the brokers topic by topic. The consumer runs in the cluster and stores the data in HDFS for further processing.
Below is the sample consumer code that connects to the PA cluster. The topic set contains the list of topics we want to fetch from the broker.
Fig 8: Configuring the Kafka consumer
5.0 ADDITIONAL TOOLS
5.1 Maven
Maven was used for dependency management, bringing in all the jars from a remote server to the local repository. This made it possible to develop the code from a Windows environment. Maven also allowed specifying the versions of Spark and Kafka that were used, and all the jar files related to that version of Spark were stored in the local repository.
Fig 9: Maven dependency management
5.2 Scala Build tool
The Scala Build Tool (SBT) was used to create the package and jar files, which were transferred to the cluster and VM using WinSCP.
Fig 10: SBT build
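A build.sbt of the kind shown in Fig 10 might look like the following; the project name, Scala version, and library versions are assumptions, not taken from the report.

```scala
name := "precision-ag-ingestion"
scalaVersion := "2.11.8"

// Spark is marked "provided" because the cluster supplies it at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided",
  "org.apache.kafka" %  "kafka-clients"   % "0.9.0.1"
)
```

Running `sbt package` then produces the jar that WinSCP transfers to the cluster.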
5.3 Git
Git is a distributed revision control and source code management (SCM) system; GitHub provides an online repository for storing all of the code related to the project. Git was used in the precision agriculture project to store the code online and share it with the team.
Below is the Git link for the precision agriculture project.
https://github.com/sri303030/Data-Ingestion-using-Kafka
6.0 OUTPUT INTERPRETATION
The streamed data is sent by the consumer to HDFS and stored in two different folders to distinguish between weather data and moisture data.
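One way the consumer might derive the two folder paths from the topic names, with a date partition added for convenience (the paths and naming scheme are illustrative assumptions):

```scala
import java.time.LocalDate

// Derive an HDFS output directory from the Kafka topic plus the record date.
def outputPath(topic: String, date: LocalDate): String =
  s"/precision-ag/${topic.stripSuffix("-data")}/dt=$date"
```

Date-partitioned directories like this also make later Spark SQL or Hive queries over a day range cheaper.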
Below is the output from the weather data folder
Fig 11: Weather data folder
Below is the output from the moisture data folder
Fig 12: Moisture Data Folder
7.0 IMPROVING THE KAFKA ARCHITECTURE
The Kafka architecture can be improved in two ways:
1. Making the Kafka architecture more robust.
2. Using a dedicated Kafka broker to improve performance.
7.1 Making the Kafka architecture more robust
In the precision agriculture project, both the broker and the consumer were running on the same system, since the requirement of data ingestion was to store data in HDFS. To make the architecture more robust, the consumer should be a remote system or cluster with access to the Kafka broker. This way, if the Kafka broker fails, the data can still be retrieved from the consumer.
7.2 Using a dedicated Kafka broker to improve performance
The Kafka broker runs as part of the cluster in the precision agriculture project. To avoid noise in the cluster, the broker should be a dedicated system or set of systems. This also eliminates the overhead that the Kafka broker adds to the Hadoop environment and speeds up all the processes.
8.0 FUTURE RESEARCH
1. Implement the bridging between HDFS and Spark SQL, and store tables as persistent data in Hive.
2. Implement real-time machine learning using Spark MLlib.
3. Connect the live data to a reporting tool to analyze it and create useful reports.
9.0 CONCLUSION
Kafka is a rapidly growing distributed messaging system with various applications in the field of engineering. With the help of the precision agriculture project, agricultural data from the greenhouse was captured and streamed into the Hadoop environment using Kafka and Spark. This project also gave me exposure to handling big data problems in real-time situations and helped me understand the Kafka architecture.