
BDA Presentation


K. J. Somaiya Institute of Engineering & Information Technology, Mumbai


Department of Computer Engineering 
Academic Year 2022-23

Title of the Paper :-


Various approaches to improve MapReduce performance in Hadoop
Year and Author of the Paper:-
Jisha S Manjaly and Dr. T. Subbulakshmi
Presented by:- 30- Bhavit Shah

Supervisor : Prof. Mrunali Desai


Topics :-
● Abstract of the Paper
● Introduction
Abstract of the Paper :-
Every day, petabytes of data are generated by business communities across the globe.
Generating meaningful insights from such large datasets is challenging. Bigdata is a
combination of homogeneous and heterogeneous data, and it can be structured,
unstructured or semi-structured. Hadoop is a framework for processing Bigdata in a
distributed way. MapReduce is the aggregation technique Hadoop uses to process this
Bigdata; Map and Reduce are its two main stages. This paper focuses on different
MapReduce scheduling techniques and performance improvement methods in conjunction
with Hadoop MapReduce. It also covers the challenges of various MapReduce approaches
in Bigdata analytics.
Introduction :-
• Bigdata is an important research area across all fields. Bigdata analysis aims to
collect petabytes of data and produce the desired output by applying different algorithms.
• The data can be text, images, audio, video and other types of files. Its main
characteristics are volume, velocity, variety and veracity. Velocity is the speed at
which the data is collected, processed and analyzed.
• Volume indicates the size of the data and variety indicates its different forms. Veracity
reflects the uncertainty of the data and its value. Today, most e-commerce
applications generate terabytes of data per second.
• Analyzing and processing this Bigdata is quite challenging. To process such a huge
amount of data, a processing model called MapReduce has been introduced. MapReduce
is well suited to run under the distributed framework called Hadoop.
HADOOP
● Hadoop is a Java-based software framework designed for analyzing and processing
huge amounts of data in scalable, distributed applications.
● Hadoop has 3 components: Hadoop Common, the Hadoop Distributed File System
(HDFS) and MapReduce.
● HDFS is a file system that stores data in 64 MB blocks and keeps 3 replicas of each
block for processing by distributed applications.
● The MapReduce model has two main operations, Map and Reduce. MapReduce is
a well-optimized technique for processing Bigdata in distributed computing.
● All the tasks are executed in parallel and the results are aggregated at the end.
This improves performance compared to sequential processing of the data.
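The Map and Reduce operations described above can be illustrated with a word count. This is a minimal single-machine sketch in plain Python, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are illustrative, and a thread pool stands in for the cluster nodes that would run map tasks on separate blocks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group the emitted values by key across all map outputs."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Map tasks run concurrently, one per chunk (a stand-in for one per block).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    # Each key is reduced independently, so reducers could also run in parallel.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["Hadoop processes Bigdata", "MapReduce processes Bigdata in Hadoop"]
    print(word_count(chunks))
```

Because each chunk is mapped independently and each key is reduced independently, the same structure scales out when the chunks live on different machines.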
Hadoop 3-layer architecture.
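The HDFS block size and replica count mentioned on this slide give a quick estimate of how a file is laid out. The sketch below is only arithmetic; the function name `hdfs_footprint` and its return keys are illustrative, and it assumes the final block is not padded to the full block size.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=64, replicas=3):
    """Estimate block count and raw storage for one file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        "blocks": blocks,
        # Every block is stored `replicas` times across the cluster.
        "block_replicas": blocks * replicas,
        # Assumes a partial last block only occupies its actual size.
        "raw_storage_mb": file_size_mb * replicas,
    }


if __name__ == "__main__":
    print(hdfs_footprint(1024))
```

For a 1024 MB file this gives 16 blocks, 48 stored block replicas and 3072 MB of raw storage.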
DIFFERENT MAPREDUCE SCHEDULING TECHNIQUES
Use-case 1: Roadside Infrastructures
● Hundreds of sensors are integrated with IoT services. The road infrastructure is
also undergoing large-scale deployment of connected IoT technologies
(i.e., traffic lights, signs, and road cameras).
● The Internet of Vehicles (IoV) will enable smart vehicles to communicate
with the roadside infrastructure and peer vehicles. Autonomous connected vehicles
and their integration into TOSC will escalate the velocity and veracity of the data that is
generated and shared.
● The data from EVs, drivers, EVSE, and infrastructure constitute the big data of EVs,
which requires data analytics tools running on TOSC clouds or edge nodes.
Use-case 2: EV Charging Infrastructure
● Mobile apps let drivers check the status of their EVs and remotely control their charging.
These applications collect vehicle and trip data. EV data mostly come from onboard
sensors (OBSs) and Battery Management Systems (BMSs). The State of Charge (SoC) of EV
batteries is a key driver for most charging and discharging decisions.
● Details such as tracing of malfunctioning batteries and heating and cooling (Battery
Thermal Management (BTM)) can be recorded in BMS logs. Based on BMS logs, state of
health (SoH) information can be obtained, and the impact of Vehicle to Grid (V2G) services on
battery life can be estimated.
● The IoT/IoV network enables tracking of other details, such as how much air conditioning
is used or how a driver accelerates or brakes.
Use-case 3: Other Smart City Services
● Big data generated from EV fleet dynamics can be used by municipal bodies (Emergency
Response Corporation) to make decisions on stand-alone public charging stations.
● Prediction of charging load.
● Various kinds of data have been employed, such as road traffic density, distribution of gas
stations, and vehicle ownership. Several studies also use travel patterns of taxi fleets to
derive optimal routing of charging stations.
COMPUTING PLATFORMS FOR SMART ELECTRIC VEHICULAR NETWORKS
Distributed Cloud Computing
● According to International Data Corporation’s (IDC’s) presentation “The Digital
Universe of Opportunities,” the overall volume of data created and copied worldwide
was 4.4 zettabytes (ZB) in 2013 and was projected to exceed 44 ZB by 2020.
● In addition to volume, the velocity of the data is growing as a result of advances in
communication technologies and IoT. Distributed cloud models are envisioned to
stimulate the development of storage, execution and analytics frameworks for such
scenarios.
PLATFORM USED

Software Implemented through Python


A desktop application is implemented in the Python programming language. Python
libraries such as PyAudio capture the audio input that is then converted to text.
– Python 2.7.x is preferred.
– PyCharm Community Edition IDE.
– Operating System – Ubuntu (Linux).
– ISL/ASL data sets from Google.
METHODOLOGY

1. Audio input on a Personal Digital Assistant (PDA) using the Python PyAudio module.
2. Conversion of audio to text using the Google Speech API.
3. Dependency parser for analysing the grammatical structure of the sentence and
establishing relationships between words.
4. ISL Generator: produces the ISL form of the input sentence using ISL grammar rules.
5. Generation of sign language with a signing avatar.
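Steps 3 and 4 can be sketched with a toy reordering function. Everything in it is an assumption made for illustration: the slides do not list the ISL grammar rules, so this sketch assumes subject-object-verb order and a small set of dropped function words, and the `(word, role)` input stands in for real dependency parser output.

```python
# Hypothetical function words dropped from the gloss (not the project's rule set).
DROPPED = {"a", "an", "the", "is", "am", "are", "was", "were", "to"}

# Assumed target order: subject, then object, then verb.
ROLE_ORDER = {"subj": 0, "obj": 1, "verb": 2}


def to_isl_gloss(tagged_words):
    """Reorder (word, role) pairs into an upper-case gloss sequence.

    `tagged_words` is a stand-in for dependency parser output; the role
    names "subj", "obj" and "verb" are illustrative labels.
    """
    kept = [(word.lower(), role) for word, role in tagged_words
            if word.lower() not in DROPPED]
    # sorted() is stable, so words sharing a role keep their original order.
    # Words with an unknown role are placed alongside objects.
    ordered = sorted(kept, key=lambda pair: ROLE_ORDER.get(pair[1], 1))
    return [word.upper() for word, _ in ordered]


if __name__ == "__main__":
    sentence = [("I", "subj"), ("am", "aux"), ("eating", "verb"),
                ("an", "det"), ("apple", "obj")]
    print(to_isl_gloss(sentence))
```

Each gloss in the returned list would then be looked up as an animation for the signing avatar in step 5.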
FUTURE SCOPE
● The system can be extended to incorporate facial expressions and body language,
so that the context and tone of the input speech are fully conveyed.
● A mobile and web-based version of the application would extend its reach to more
people.
● Integrating a hand gesture recognition system based on computer vision would
establish a two-way communication system.
2. GARBAGE DETECTION

● Waste pollution is one of the biggest environmental issues in the modern world.
● Types of waste: wet, dry, recyclable and non-recyclable.
● Develop a web application that scans images and detects whether the waste shown
is recyclable or not.
● It will also classify the waste as dry or wet.
● This is to help society by segregating waste.
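The two decisions above (dry or wet, recyclable or not) can be sketched as a lookup applied to an object detector's output. The label names, the table contents and the confidence threshold are all hypothetical and are not taken from any dataset's class list.

```python
# Hypothetical table: detected label -> (moisture category, recyclable?).
WASTE_TABLE = {
    "plastic_bottle": ("dry", True),
    "paper": ("dry", True),
    "food_scraps": ("wet", False),
    "styrofoam": ("dry", False),
}


def classify_waste(label, confidence, threshold=0.5):
    """Turn one detection into the two segregation decisions."""
    # Low-confidence or unknown detections are left undecided.
    if confidence < threshold or label not in WASTE_TABLE:
        return {"label": label, "moisture": "unknown", "recyclable": None}
    moisture, recyclable = WASTE_TABLE[label]
    return {"label": label, "moisture": moisture, "recyclable": recyclable}


if __name__ == "__main__":
    print(classify_waste("plastic_bottle", 0.91))
    print(classify_waste("food_scraps", 0.30))
```

Keeping the mapping in a table separates the detection model from the segregation policy, so the categories can be changed without retraining.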
PROBLEM STATEMENT
Many of us cannot tell whether waste is recyclable or not, or whether it is dry
or wet. Hence, we are developing this project so that waste can be managed
properly.

OBJECTIVE
● To provide segregation of waste.
● To use it during clean-up drives.
TECHNICAL SPECIFICATIONS

● COCO and TACO datasets.
● EfficientDet for object detection
● CNN model
● Software Implementation through Python
Machine Learning Implementation
● When constructing our classifier, we first transform each email into a format
suitable for the machine learning algorithm.
● Each email is represented by a vector that contains a value (binary or
continuous) for every extracted feature.
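The email-to-vector step can be sketched as follows. The four features and the word list are hypothetical examples of binary and continuous values, since the slides do not list the features that were actually extracted.

```python
import re

# Simple URL patterns; real emails would need more careful parsing.
URL_RE = re.compile(r"https?://[^\s\"'>]+", re.IGNORECASE)
IP_URL_RE = re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}", re.IGNORECASE)

# Hypothetical list of words that often appear in phishing emails.
URGENT_WORDS = ("verify", "suspended", "urgent", "password")

FEATURE_NAMES = ["num_links", "has_ip_url", "has_urgent_word", "has_html_form"]


def email_to_vector(body):
    """Return one value per entry in FEATURE_NAMES for an email body."""
    text = body.lower()
    links = URL_RE.findall(body)
    return [
        float(len(links)),                                      # continuous
        1.0 if IP_URL_RE.search(body) else 0.0,                 # binary
        1.0 if any(w in text for w in URGENT_WORDS) else 0.0,   # binary
        1.0 if "<form" in text else 0.0,                        # binary
    ]


if __name__ == "__main__":
    sample = "Please verify your account at http://192.168.0.1/login now"
    print(dict(zip(FEATURE_NAMES, email_to_vector(sample))))
```

Every email yields a vector of the same length and order, which is what lets the same classifier be trained on all of them.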
Machine Learning Implementation (Cont…)

● Logistic Regression
● Random Forest
● Neural Networks
● SVM
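As an illustration of the first model in the list, here is a minimal logistic regression trained by stochastic gradient descent on feature vectors. It is a standard-library sketch on made-up toy data, not the presenters' implementation; in practice a machine learning library would normally be used, and the learning rate and epoch count are arbitrary choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by per-sample gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, features):
    """Return 1 (phishing) if the predicted probability is at least 0.5."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


if __name__ == "__main__":
    # Toy data: the label simply follows the first (binary) feature.
    X = [[1, 1], [1, 0], [0, 1], [0, 0]]
    y = [1, 1, 0, 0]
    w, b = train_logistic(X, y)
    print([predict(w, b, row) for row in X])
```

The other listed models (Random Forest, Neural Networks, SVM) consume the same kind of feature vectors and differ only in how the decision boundary is learned.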
Conclusion

● Phishing has become a serious threat to global security and the economy. The rapid
emergence of new phishing websites and distributed phishing attacks has made it
difficult to keep blacklists up to date.
● Because phishing attack patterns change rapidly, current phishing detection techniques
need to be greatly enhanced to combat emerging attacks effectively.
● Using this technique should improve the predictive accuracy of a classifier,
since effective classification of emails depends on the phishing features identified
during the learning stage of the classification.
    THANK YOU
          
