Machine Learning and Data Analytics For Environment
E-mail: tharsanee@bitsathy.ac.in
Abstract. Innovations in Machine Learning and Data Analytics have the potential to affect numerous
aspects of Environmental Science (ES). Data Analytics works on big data, that is, collections of data
resources characterised by their variety, velocity, veracity and volume. Big data contributes to the ES
arena in applications such as weather forecasting, energy sustainability and disaster
management, aided by the advent of techniques such as Remote Sensing and Information and
Communication Technologies. Though big data is used to accomplish data analysis and
interpretation for ES, efficient ways of storing, processing and retrieving the data are still
required. Machine Learning and Deep Learning are subfields of artificial intelligence
which deal with training models to learn from data without being explicitly programmed.
Combining Machine Learning and Deep Learning makes it possible to exploit the full
power of data analytics. These techniques show high potential for process
optimization, information-centric decision making and scientific discovery. Such
developments will help ES make real-time autonomous decisions by extracting
useful insights from huge volumes of data. They also help bridge the gap between the
theoretical background of ES and its practical implementation. The primary objective of this
survey is to outline the basic concepts of Machine Learning, Deep Learning and Data
Analytics, to identify state-of-the-art applications in ES, and to examine the prospective benefits
of information-centric investigation in ES.
1. Introduction
In the early 1990s, environmental science was driven by the study of the physical, chemical and other
natural resources through which living things interact. Environmental data is growing rapidly, which
raises major issues of complexity, size and resolution. Environmental scientists dealing with
interdisciplinary problems have struggled both with processing the data intelligently and with
analysing its large volume. Extracting heterogeneous data from different sources to
perform integrated analysis and acquire knowledge requires broad expertise in Data Science. The
growing popularity of Data Science techniques helps in managing environmental systems. Data Science is
used to model different scenarios and protocols, leading to data-driven innovation in business sectors.
Environmental scientists are therefore compelled to consider how the various interdisciplinary
scientific problems can be solved with existing or innovative data science approaches. Data Science
adds value to the Environmental Sciences as it provides a realistic approach [1].
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
FIC-SISTEEM-2020 IOP Publishing
IOP Conf. Series: Materials Science and Engineering 955 (2020) 012107 doi:10.1088/1757-899X/955/1/012107
A stream of data from environmental sensors has to be analysed to obtain useful information.
The data are acquired from remote sensing satellites, from sensors that measure air and water quality,
from weather and climate observations, and from ground-based sensors that measure the magnitude of
earthquakes and other geographical events. To extract information from the sensor or satellite data,
different meta-heuristic bio-inspired algorithms have been used. Processing or extracting
information from remote sensing satellite data or sensor data requires training, methods and efficient
tools. The significant challenges and the rapid growth of Environmental Science have
paved the way for research applying different data science techniques. The following
challenges were identified as part of the investigation into complications in environmental data [2].
● Demand for trained and experienced data scientists for environmental studies
● A gap in identifying the proper approach for building a framework
● Lack of knowledge in identifying environmental issues
● A robust way of dealing with environmental data
● Optimal techniques to extract information
● Protocols for choosing the right method of analysis
● Information exchange and preservation of data
● Storage cost and energy consumption
Machine learning techniques, a futuristic approach in data mining, emerged from Artificial
Intelligence and are now widely used in the field of environmental science. They are applied
in processing satellite data, in climate prediction and forecasting, and in designing and
examining environmental data. Extracting valuable information from such data calls for a
realistic and modern approach that includes linear statistical analysis, time series analysis, Feed
Forward Neural Network models, non-linear optimization, generalization learning, classification
models, regression models, Principal Component Analysis and correlation analysis. These models
help in formulating an optimal approach to various environmental science problems [3]. Deep
Learning, a subset of machine learning, uses data structures often termed Artificial Neural
Networks. Different Deep Learning models provide feasible approaches and specialized computational
tools for the benefit of environmental science. Deep learning allows the environmental data scientist
to concentrate more on pre-processing tasks and offers optimized capabilities for real-time,
large-scale integrated environmental monitoring. This paper gives a brief explanation of different
machine learning methods and deep learning algorithms that could be applied to
various Environmental Data Analysis tasks and applications [4].
in several classification as well as regression problems [2]. In this algorithm, the input space of the
dataset is mapped to a preferably high-dimensional feature space so that the classification
problem becomes easier to solve. In the simplest sense, the algorithm can be viewed as a machine that
divides the dataset into parts so that meaningful patterns can be derived more precisely. It thus forms
a decision boundary that places each data point in its respective class. The algorithm is best suited
to classification problems in which the data points are linearly separable.
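The idea of lifting the input space into a higher-dimensional feature space can be illustrated with a minimal sketch. It assumes a toy one-dimensional dataset and a hand-picked feature map x -> (x, x^2), and it fits the boundary with a plain perceptron rule rather than the algorithm discussed in the text. The names `feature_map`, `train_perceptron` and `predict` are hypothetical and do not come from the source.

```python
def feature_map(x):
    """Lift a 1-D input into a 2-D feature space (x, x^2)."""
    return (x, x * x)


def train_perceptron(samples, labels, epochs=1000, lr=1.0):
    """Fit a linear boundary w . phi(x) + b = 0 with the perceptron rule."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            phi = feature_map(x)
            score = w[0] * phi[0] + w[1] * phi[1] + b
            if y * score <= 0:  # misclassified, or exactly on the boundary
                w[0] += lr * y * phi[0]
                w[1] += lr * y * phi[1]
                b += lr * y
                errors += 1
        if errors == 0:  # every training point is on the correct side
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the boundary."""
    phi = feature_map(x)
    return 1 if w[0] * phi[0] + w[1] * phi[1] + b > 0 else -1


# Toy data (assumed): class +1 lies far from zero, class -1 close to zero.
# No single threshold on x separates them, but a line in (x, x^2) does.
xs = [-3.0, -2.5, -2.0, 2.0, 2.5, 3.0, -0.5, 0.0, 0.5]
ys = [1, 1, 1, 1, 1, 1, -1, -1, -1]
w, b = train_perceptron(xs, ys)
predictions = [predict(w, b, x) for x in xs]
```

In the original input space the two classes cannot be split by one cut point; after the mapping, a straight decision boundary is enough, which is the effect the paragraph describes.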
several inputs with different interconnected features, which together contribute towards
producing the output. Dealing with this type of complexity is a great challenge; data
analytics combined with machine learning and deep learning techniques works well for analysing and
deriving useful insights from complex environmental data.
Figure 3. Deep learning framework for analysis of remote sensing data [9]
Deep learning architectures like Convolutional Networks (ConvNets), sparse AutoEncoders (AE) and
Deep Belief Networks (DBN) are hierarchical frameworks that learn high-level
feature representations in the deep layers, built automatically from the features learned in the
shallow layers. In target recognition, when a new candidate is given as input, the framework
automatically extracts the low-level and high-level features, and a classification algorithm
then decides whether the proposal is the target or not [9].
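The hierarchy from low-level to high-level features can be sketched in a few lines. This is a toy illustration with a fixed, hand-written edge filter, not a trained ConvNet, AE or DBN; the images, the kernel, the threshold and all function names are assumptions made for the example.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses only."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max_pool(feature_map):
    """Summarise a whole response map by its strongest activation."""
    return max(max(row) for row in feature_map)


def is_target(image, threshold=1.5):
    """Shallow stage: edge responses. Deeper stage: pooled summary. Then classify."""
    vertical_edge = [[-1.0, 1.0], [-1.0, 1.0]]  # assumed hand-made filter
    low_level = relu(conv2d_valid(image, vertical_edge))
    high_level = global_max_pool(low_level)
    return high_level >= threshold


# Two assumed 3x4 grey-level images: one with a dark-to-bright edge, one flat.
with_edge = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
flat = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
```

In a real framework the filters are learned from data and many such stages are stacked; the sketch only shows how a later stage consumes what an earlier stage produced before a final decision is made.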
5.5. Agriculture
Agriculture plays a significant role in improving a country's economy.
Automation in agriculture is an emerging area that helps people improve farming in terms of
irrigation, crop establishment and monitoring, weeding to decrease the use of pesticides, and disease
identification in plants. CNNs play a major role in agricultural applications such as plant disease
identification. Images of infected and healthy plants are captured with a camera and stored in
a database with labels, and a CNN is trained to learn the features of both. The
phases involved in detecting plant disease are: data collection, data cleaning (pre-processing),
image segmentation, feature extraction and classification. When a new image is given as
input, the CNN predicts whether the sample belongs to a healthy or an unhealthy plant. In addition,
CNN architectures such as AlexNet, LeNet, CaffeNet and GoogLeNet are widely utilised in leaf disease
prediction, land cover and crop type categorisation, soil and root segmentation, fruit counting and crop
yield estimation [11].
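The phases listed above can be sketched end to end on toy data. This is only an illustration under strong assumptions: the images are tiny hand-made RGB grids, the single feature is a hypothetical "brown fraction", and a nearest-centroid rule stands in for the CNN that the text describes. None of the names or thresholds come from the source.

```python
def clean(image):
    """Pre-processing: clamp every channel into the valid 0-255 range."""
    return [[tuple(min(255, max(0, c)) for c in px) for px in row] for row in image]


def segment_leaf(image):
    """Segmentation: drop near-white background pixels, keep the rest."""
    return [px for row in image for px in row if min(px) < 200]


def brown_fraction(leaf_pixels):
    """Feature step: share of leaf pixels in which red exceeds green."""
    if not leaf_pixels:
        return 0.0
    brown = sum(1 for r, g, b in leaf_pixels if r > g)
    return brown / len(leaf_pixels)


def extract_feature(image):
    return brown_fraction(segment_leaf(clean(image)))


def fit_centroids(labelled_images):
    """Training: average the feature value for each label."""
    sums, counts = {}, {}
    for image, label in labelled_images:
        sums[label] = sums.get(label, 0.0) + extract_feature(image)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(image, centroids):
    """Classification: pick the label whose centroid is closest."""
    value = extract_feature(image)
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Data collection (assumed): two labelled 2x3 images and one new sample.
WHITE, GREEN, BROWN = (255, 255, 255), (40, 160, 40), (150, 90, 30)
healthy = [[WHITE, GREEN, GREEN], [GREEN, GREEN, WHITE]]
infected = [[WHITE, BROWN, GREEN], [BROWN, BROWN, WHITE]]
centroids = fit_centroids([(healthy, "healthy"), (infected, "infected")])
new_sample = [[GREEN, BROWN, BROWN], [WHITE, BROWN, GREEN]]
label = classify(new_sample, centroids)
```

A CNN would replace the hand-written feature and the centroid rule with representations learned from the labelled images, but the order of the phases stays the same.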
Chatbots, or Digital Assistants, are another milestone of AI; they let people
communicate by messaging in natural language. Chatbots are widely employed in various services
on websites and in mobile applications. Similarly, agricultural chatbots are designed to help farmers by
answering their queries with advice and recommendations. AI-enabled agricultural drones are effectively
used in monitoring crops, spraying pesticides and herbicides, health assessment and soil analysis.
AI-based agriculture curbs the excess use of water, herbicides and pesticides, preserves soil fertility
and helps farmers improve the quality and productivity of their crops [13].
It is inferred that Artificial Intelligence can be well utilized for several environmental science use
cases. ES deals with huge amounts of data in image formats; hence a Convolutional Neural Network can
be a suitable deep learning framework to extract the hidden features, and classification algorithms
like Support Vector Machine can be used to classify the extracted features. Other frameworks like
Recurrent Neural Networks, Autoencoders and Boltzmann machines can be used to extract useful information
from data sources represented in formats other than images. Deep learning models are expected to
perform better as the amount of data increases. Numerous datasets covering geodata,
environment, geospatial, water and climate data are available online and can be tested across different DL
frameworks and ML classification techniques. The self-learning capabilities of deep learning and the
visualization techniques of data analytics act as a foundation for automatic categorization of the spatial
and pictorial data involved in various aspects of environmental science.
6. Conclusion
This paper gives an informative survey of environmental science and shows that the
dynamic impact of environmental change is a great challenge, one that can be faced, and for which
optimal solutions can be found, with the emerging fields of Data Science, Machine
Learning and Deep Learning. Environmental Data Science offers a profound understanding of how
sensed environmental data has to be processed and helps us to overcome different environmental
hazards. The gathering of data will continue to increase in step with our ability to analyse it.
As part of the survey, existing machine learning and deep learning algorithms were discussed,
a few of which have been applied to diverse environmental issues. As we march
towards a smart world, there is an increasing need to adopt modern technologies such as Artificial
Intelligence (AI) that reduce environmental risks. AI supports society in handling high-impact
environmental risks which have an adverse effect on industries and human health. Most essential is
to bring the various research disciplines (Machine Learning, Deep Learning, Environmental Data
Science, Health etc.) into a circle of discussion to frame policies that address and resolve
environmental challenges.
References
[1] Karina Gibert, Jeffery S. Horsburgh, Ioannis N. Athanasiadis and Geoff Holmes 2018
Environmental Data Science
[2] Hsieh, William. 2009 Machine learning in the environmental sciences. Neural networks and
kernels.
[3] Hsieh 2009 Cambridge: Cambridge University Press. pp. 274-317.
[4] Aakash Lamba, Phillip Cassey, Ramesh Raja Segaran and Lian Pin Koh 2019 Deep learning for
environmental conservation
[5] Alexander Y Sun, Bridget R Scanlon 2019 IOP Publishing Ltd, Environmental Research Letters
14(7).
[6] Hino, M., Benami, E. & Brooks, N. 2018 Nat Sustain 1 pp. 583–588.
[7] Manish Kumar Goyal, Chandra S. P. Ojha, and Donald H. Burn 2017 Sustainable water
resources management pp. 165-178.
[8] Qiangqiang Yuan, Huanfeng Shen, Tongwen Li, Zhiwei Li, Shuwen Li, Yun Jiang, Hongzhang
Xu, Weiwei Tan, Qianqian Yang, Jiwen Wang, Jianhao Gao, Liangpei Zhang 2020 Remote
Sensing of Environment 241.
[9] L. Zhang, L. Zhang and B. Du 2016 IEEE Geoscience and Remote Sensing Magazine 4(2) pp.
22-40.
[10] Pereira, Tonismar dos S. et al 2018 Eng. Agríc. [online] 38, pp. 142-148.
[11] Kamilaris A, Prenafeta-Boldú F X 2018 The Journal of Agricultural Science pp. 1–11.
[12] Saleem, Muhammad Hammad and Potgieter, Johan and Arif, Khalid Mahmood 2019 Plants
8(11) pp. 468.
[13] Tanha Talaviya, Dhara Shah, Nivedita Patel, Hiteshri Yagnik, Manan Shah 2020 Artificial
Intelligence in Agriculture 4, pp. 58-73.
[14] Blair Gordon S., Henrys Peter, Leeson Amber, Watkins John, Eastoe Emma, Jarvis Susan,
Young Paul J. 2019 Frontiers in Environmental Science 7 pp. 121.