Water Quality Classification Using Machine Learning

The document presents a study on water quality classification using machine learning techniques, specifically comparing Decision Tree, K-Nearest Neighbour, and Support Vector Machine models. The research highlights the inefficiencies of current water quality classification methods and aims to develop a prototype that improves accuracy in classifying water quality. Results indicate that the Decision Tree model outperforms the others with an accuracy of 97.37%.

8th IEEE International Conference and Workshops on Recent Advances and Innovations in Engineering- ICRAIE 2023

(IEEE Record #59459)

Water Quality Classification Using Machine Learning
2023 IEEE 8th International Conference on Recent Advances and Innovations in Engineering (ICRAIE) | 979-8-3503-1551-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICRAIE59459.2023.10468171

Fikri Firas Tajul Arifin
College of Computing, Informatics, and Mathematics
Universiti Teknologi MARA (UiTM) Shah Alam
Shah Alam, Selangor, Malaysia
fikri.firas37@gmail.com

Zanariah Idrus
College of Computing, Informatics, and Mathematics
Universiti Teknologi MARA (UiTM) Kedah Branch
Merbok, Kedah, Malaysia
zanaidrus@uitm.edu.my

Shamimi A. Halim
College of Computing, Informatics, and Mathematics
Universiti Teknologi MARA (UiTM) Shah Alam
Shah Alam, Selangor, Malaysia
shamimi134@uitm.edu.my

Ahmad Afif Ahmarofi
College of Computing, Informatics, and Mathematics
Universiti Teknologi MARA (UiTM) Kedah Branch
Merbok, Kedah, Malaysia
ahmadafif@uitm.edu.my

Khairul Adilah Ahmad
College of Computing, Informatics, and Mathematics
Universiti Teknologi MARA (UiTM) Kedah Branch
Merbok, Kedah, Malaysia
adilah475@uitm.edu.my

Abstract— Water quality is crucial as it directly affects the ecosystem and human health. However, current water quality classification methods are inefficient because they do not compare prediction accuracy between machine learning methods. In this regard, the objective of this study is to classify water quality based on the proposed machine learning tools. To fulfill that, a preliminary study was conducted by collecting related information in the research domain through articles, electronic books, and online databases. The data for the prototype’s dataset were obtained from an electronic book published by the Pakistan Council of Research in Water Resources in 2021. Subsequently, the data pre-processing phase was conducted using WEKA software, which includes the crucial steps to transform the data into a cleaner format and make the model more accurate. The model for each technique was developed using Python in Jupyter Notebook. The accuracy score for each model was also computed in this phase. The findings of this research show that the Decision Tree model performs excellently with an accuracy of 97.37% compared to the Support Vector Machine and K-Nearest Neighbour models, with an accuracy of 95.69% and 74.72%, respectively. Consequently, implementing a multi-class classification system can help future researchers classify more accurately and reduce the misclassification of water quality.

Keywords— machine learning, classification, water quality, decision tree

I. INTRODUCTION

Water is an essential resource for life since it is required for the existence of almost all living organisms, including humans [1]. It is essential to drink clean water since it assists in the body's elimination of toxins, contributes to the preservation of health, and protects against dangerous illnesses. Diseases such as cholera, diarrhea, dysentery, hepatitis A, typhoid, and polio are connected to polluted water and inadequate sanitation, as stated by WHO [2][3]. As part of the Sustainable Development Goal (SDG) number 6, the Joint Monitoring Programme for Water Supply and Sanitation (JMP) set an objective that all people should have the right to access safe and affordable drinking water by the year 2030. This goal was designated as a target by the JMP.

In the field, hydrologists collect water samples from various water sources such as taps, tube wells, distribution networks, hand pump/dug wells, streams, springs and dams, rivers, and lakes [4]. The water samples collected are kept in a plastic or glass container to be transported to laboratories, where their parameters are analyzed. To determine water quality, water scientists analyze the water samples based on biological, physical, and chemical parameters and decide whether or not they follow the regulations and standards. Unfortunately, not all countries use the standards set by the WHO. Some countries have their own regulations and standards for water quality. Thus, adopting the Guidelines for Drinking Water Quality (GDWQ) is not mandatory [5]. The guideline values are not meant to be used entirely as regulations and standards at the national or subnational level. It is considered impossible to define one set of conditions governing the water parameters for every country, as each includes different biological, physical, and chemical parameters that are considered essential for the country’s GDWQ. Some of the factors that affect the difference in the water parameters are the environmental conditions, agricultural and industrial activities, and the sources that are readily accessible [5].

Regulations and standards are the most critical aspects of deciding the quality of the water. Water scientists face a tough and time-consuming task when they must classify the quality of the water based on the permissible limits of biological, physical, and chemical parameters for a large number of water samples. As a result, it is necessary to conduct monitoring and analysis to ascertain whether or not the water source is consumable without risk in one's day-to-day life. Several strategies, including machine learning, the Internet of Things (IoT), and cloud computing, have been put into practice to monitor the water's quality [6][7][8][9]. However, current water quality classification methods are inefficient because they do not compare prediction accuracy between machine learning methods.



Authorized licensed use limited to: MULTIMEDIA UNIVERSITY. Downloaded on December 01,2024 at 09:16:45 UTC from IEEE Xplore. Restrictions apply.
This study aims to implement machine learning techniques to assist the water quality classification process. In this regard, the objective of this study is to develop a prototype that can classify the water quality based on the highest accuracy of machine learning models, namely Decision Tree (DT), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM). Previous works are discussed in Section II, the methodology is explained in Section III, the results are elaborated in Section IV, and the conclusion is presented in Section V.

II. LITERATURE REVIEW

A. Biological Parameter
The biological parameter for water uses living organisms that exist in the water as an indicator of water safety [10][11]. The GDWQ identifies up to 43 microbial parameters. This study covers Total Coliform Bacteria and Escherichia coli (E. coli) as defined in the standards and regulations for Pakistan.

B. Physical Parameter
The physical parameters represent the physical factors of the water. They consist of turbidity, color, solids, electrical conductivity, temperature, taste, and odor of the water.

Besides, research also found that drinking water quality does not prioritize most chemical parameters [5][12]. There has not been any substantial evidence that chemical exposure impacted public health. However, in developing regulations and standards, related authorities need to select which chemical parameters are the most critical. Related authorities will prioritize a chemical only if there is solid evidence that exposure to it in drinking water affects health and if the chemical is found throughout the country's drinking water or sources. The GDWQ does not list certain chemicals because their concentration level is not considered a health concern. Nevertheless, the GDWQ still presents values for chemicals found in drinking water and sources.

C. Water Permissible Limits
Each of the water parameters has a maximum permissible limit set by a nation's drinking water regulations and standards as a guideline.

D. Water Regulations and Standards
The majority of nations throughout the world have different standards and regulations due to the differences in environmental conditions, agricultural and industrial activities, and the sources that are readily accessible [5]. Therefore, there have not been any general guidelines set by the WHO to use globally.

It was found that the previous method, that is, the laboratory process, involves various experimental steps in the laboratory. Therefore, this procedure is time-consuming, uneconomical, and involves hazardous chemicals [6]. In this regard, machine learning is implemented to measure water quality.

E. Machine Learning
Machine learning represents a branch of Artificial Intelligence (AI) that enables computers to learn from information automatically without being explicitly programmed [13][14]. Machine learning is a popular technology innovation in Industry 4.0 for improved healthcare systems, agricultural activities, smart cities, and many more. Numerous studies have explored the use of machine learning techniques for predicting water quality [15][16]. Many researchers have employed conventional machine learning models such as Decision Tree (DT) [17][18], K-Nearest Neighbors (KNN) [19][20], and Support Vector Machine (SVM) [17][21][22]. Other models such as Artificial Neural Network (ANN) [22][23][24], Naïve Bayes (NB) [17][25], Gradient Boosting (GB) [26][27], and Random Forest (RF) [28] have also been adopted in classifying water quality.

A DT model, which represents the classification in the form of a graphical tree structure, is well-known for being highly interpretable and provides an understanding of the water quality classification factors [29][30]. On the other hand, the presence of local variations within water quality datasets makes the KNN model well-suited for water quality classification [20]. Like the DT and KNN models, the SVM model is effective in dealing with high-dimensional datasets. High dimensionality is common in water quality classification because of the number of attributes and parameters that need to be measured [22][31][32].

The capability of the ANN model to handle non-linear relationships that occur in data, especially in classification, enables the model to cater for non-linear interactions between parameters and capture the patterns effectively [22][33]. In addition, the GB model also exhibits the capability to handle non-linear relationships. The iterative nature of the model in adapting and gradually improving its understanding of non-linear data relationships results in accurate [17] and robust classification [34]. On the other hand, the RF model can handle missing data that occur in the water quality dataset without the need for extensive imputation [35]. The model provides a feature importance measurement to indicate each feature's relevance.

F. Pakistan Council of Research in Water Resources 2021 dataset
The Pakistan Council of Research in Water Resources 2021 dataset is a dataset provided by the council under the Ministry of Science and Technology [36]. The council has played a role as a water sector research and development organization by establishing water testing laboratories in 24 cities around Pakistan. Like many other water quality datasets, the Pakistan Council of Research in Water Resources 2021 dataset also presents a number of dataset issues [37], such as imbalanced data [38], missing data [39], data non-linearity [40], and data interpretability [41].

Based on the literature, the above-stated dataset issues are well-suited to be handled by DT, KNN, and SVM. Therefore, this research explores the capability of those three models to resolve the issues and subsequently proposes the best model to classify the water quality.

III. RESEARCH METHODOLOGY
Figure 1 below shows the seven main phases to help the researcher keep track of achieving the objective of developing the prototype.

Fig. 1. Overview Methodology of The Research Phase

The research phase in Figure 1 consists of a preliminary study, data collection, data preprocessing, model development, interface design, prototype development, and documentation. In the preliminary study stage, the water parameters that indicate whether water is safe for human consumption are identified from websites, electronic books, and related articles. Subsequently, the water quality dataset is collected from an electronic book entitled “Drinking Water Quality in Pakistan Current Status and Challenges”, published by the Pakistan Council of Research in Water Resources (PCRWR) in 2021.

After that, data pre-processing is conducted by using the Waikato Environment for Knowledge Analysis (WEKA) to remove any outliers. Data pre-processing is an essential method in preparing the data for a robust classification outcome. Then, a classification model is developed through machine learning tools such as DT, KNN, and SVM by using the Python language and Jupyter Notebook. At this stage, the best model is selected based on the highest accuracy. Next, the graphical user interface for the prototype is designed with the Gradio Python library, while the Python language and Jupyter Notebook are utilized once again for the development of the prototype. Finally, the results from this study are documented using Microsoft Word.

IV. RESULTS AND DISCUSSION

A. Data Preprocessing
The water data collected from the electronic book entitled “Drinking Water Quality in Pakistan Current Status and Challenges”, published by the Pakistan Council of Research in Water Resources (PCRWR) in 2021, are messy, which makes it difficult to perform any modeling or data analysis to uncover patterns of helpful information. The related pre-processing is done using WEKA, a software that provides tools to perform data pre-processing. The results of the data pre-processing for water data using WEKA are presented in Figure 2.

Fig. 2. Cleaned Data Set

The cleaned dataset is then saved and exported in comma-separated values file format to be analyzed in Jupyter Notebook. Subsequently, the cleaned data are utilized in the development of a classification model using machine learning, as further explained in the following subsection.

Figure 3 below shows the bar chart of water distribution by its quality. The x-axis represents the safety category, i.e., whether the water is safe or unsafe. The y-axis represents the number of samples present in the dataset.

Fig. 3. Bar Chart of Water Distribution

The graph shows 269 unsafe samples and 166 safe samples, a difference of 103. The total number of samples, safe and unsafe combined, is 435.

After that, Figure 4 shows the descriptive statistics summary table for the dataset. The summary describes the water data by presenting a collection of crucial statistical measures in a single place.

Fig. 4. Summary Of Descriptive Statistics

The measures included in the descriptive statistics begin with the count, representing the total number of observations in the data. From the observation, the count for every water parameter is equal, i.e., 435, which means that null values are absent. Besides, the measure also indicates the data available for analysis. The mean represents the average value of the data, while the standard deviation measures how widely the data are spread out. From the observation, the lowest spread is that of the pH, with a value of 0.35, and the highest is that of the Total Dissolved Solids (TDS), with a value of 1626.22. The minimum (min) and maximum (max) show the lowest and highest values in the data. Moreover, the quartiles 25%, 50%, and 75% divide the dataset into quarters at Quartile 1, Quartile 2, and Quartile 3, respectively. Quartile 1 is the median of the lower half of the data, Quartile 2 is the median, and Quartile 3 is the median of the upper half. These measures can describe the distribution of values within the dataset and help identify the outliers.

Subsequently, Figure 5 shows the box and whiskers plot of feature distribution by the water quality.
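The summary statistics and quartiles described above could be reproduced along the following lines. This is a minimal sketch only: the data frame is a hypothetical toy sample (its column names and values are invented, not taken from the PCRWR dataset), and the 1.5 × IQR fence is a common box-plot convention assumed here rather than a setting reported in the study.

```python
import pandas as pd

# Hypothetical toy sample: values are invented for illustration and are
# NOT taken from the PCRWR dataset.
df = pd.DataFrame({
    "pH": [7.1, 7.4, 6.9, 7.8, 7.2, 7.0, 7.5, 7.3],
    "TDS": [310, 450, 520, 2900, 380, 610, 495, 7200],
})

# count, mean, std, min, 25%, 50%, 75%, max for every numeric column
summary = df.describe()

# Equal counts across columns with no nulls means no missing values
missing_total = int(df.isnull().sum().sum())


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]


# Assumed convention: k = 1.5, the usual default for box-plot whiskers
tds_outliers = iqr_outliers(df["TDS"])

print(summary)
print("missing values:", missing_total)
print("TDS outliers:", tds_outliers.tolist())
```

With real data, the toy frame would be replaced by the cleaned CSV exported from WEKA, loaded with `pd.read_csv`.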

Fig. 5. Box And Whiskers Plot Of Feature Distribution

The figure is a graphical representation of the dataset that uses the median, quartiles, and outliers to display the distribution of the data. The box of the boxplot represents the interquartile range (IQR), which is the range between the first and third quartiles (Q1 and Q3). The line inside the box represents the median of the dataset. The whiskers of the boxplot extend from the box toward the minimum and maximum values of the dataset, with any data points beyond the whiskers being considered outliers. Based on an observation of Figure 5, the majority of the water parameters show a very significant difference in values from the median to the outliers. This shows that the values of most parameters in the dataset are widely spread.

Figure 6 shows the correlation between each water parameter for this project. The matrix helps to understand which of the water parameters are strongly related to each other and can help determine which parameter is more critical in maintaining overall water quality.

Fig. 6. Correlation Matrix

In the matrix layout, rows and columns represent each water parameter. The value at the intersection of a row and a column is the correlation coefficient between the two variables. The correlation value lies between -1 and 1. The value measures the strength and direction of the linear relationship between the variables. A correlation coefficient of -1 indicates a perfect negative correlation, which means that if the value of one water parameter increases, the related variable decreases. A correlation coefficient of 1 indicates a perfect positive correlation, which means that if the value of one water parameter increases, the related variable also increases. A correlation coefficient of 0 indicates that there is no correlation between the two water parameters. From observation of the matrix, Total Dissolved Solids correlates with Hardness and Chloride with values of 0.98 and 0.97, respectively. Hardness and Escherichia coli show no correlation, as the value obtained is 0.00. Finally, the strongest negative correlations are those of pH with Hardness and with Arsenic, with a value of -0.17.

B. Model Development
For the DT, KNN, and SVM techniques, the default parameter settings are presented in Table 1 as follows.

TABLE 1. DEFAULT PARAMETERS FOR DT, KNN, AND SVM

Techniques                      Default Parameters
Decision tree (DT)              max_depth: None; criterion: Gini; max_features: None; min_samples_split: 2
K-nearest neighbor (KNN)        n_neighbors: 5; weights: uniform; algorithm: auto; metric: Minkowski; leaf_size: 30
Support Vector Machine (SVM)    C: 1.0; kernel: RBF

Figure 7 compares the accuracy results for DT, KNN, and SVM using the default parameters. The techniques used two split ratios for comparison: the 70:30 split ratio, where 70% of the data is used for training and 30% for testing, and the 80:20 split ratio, where 80% of the data is used for training and 20% for testing.

Fig. 7. Comparison of Accuracy Between DT, KNN, and SVM

From the observation, DT and SVM achieved the highest accuracy for an 80:20 split ratio with values of 96.55% and 68.97%. For 70:30, the techniques only achieved 96.18% and 60.31%, respectively. This result is likely because more data is available for training the technique in the 80:20 split ratio. With more data, the technique can better learn underlying patterns and relationships in the data, thus leading to a better accuracy result on the test set.

To obtain the most suitable parameters for the three techniques, i.e., DT, KNN, and SVM, a hyperparameter tuning process is conducted to adjust each of their parameters. The project uses the “GridSearchCV” technique, which trains and evaluates the model over a wide range of possible hyperparameter combinations and selects the best-performing combination based on accuracy.
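The split ratios and grid search described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' notebook: the data are synthetic (generated with `make_classification` to mimic 435 two-class samples), the search grids are hypothetical because the paper does not list the ranges it searched, and the 5-fold cross-validation and stratified split are assumptions.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned dataset (435 rows, two classes).
# NOT the PCRWR data; feature count and class weights are assumptions.
X, y = make_classification(
    n_samples=435, n_features=8, n_informative=5,
    weights=[0.62, 0.38], random_state=0,
)

# Hypothetical grids: the paper does not report the ranges it searched.
searches = {
    "DT": (DecisionTreeClassifier(random_state=0),
           {"criterion": ["gini", "entropy"],
            "min_samples_split": [2, 4, 6]}),
    "KNN": (KNeighborsClassifier(),
            {"n_neighbors": [3, 4, 5],
             "weights": ["uniform", "distance"]}),
    "SVM": (SVC(),
            {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
}

results = {}
for test_size in (0.30, 0.20):  # the 70:30 and 80:20 split ratios
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=0
    )
    for name, (model, grid) in searches.items():
        # Baseline: library default parameters, scored on the held-out set
        default_acc = clone(model).fit(X_tr, y_tr).score(X_te, y_te)

        # Tuned: exhaustive search over the grid, cross-validated accuracy
        search = GridSearchCV(model, grid, scoring="accuracy", cv=5)
        search.fit(X_tr, y_tr)
        tuned_acc = search.score(X_te, y_te)

        results[(name, test_size)] = (
            default_acc, tuned_acc, search.best_params_
        )

for key, (default_acc, tuned_acc, params) in results.items():
    print(key, f"default={default_acc:.4f}", f"tuned={tuned_acc:.4f}", params)
```

Because the data here are synthetic, the printed accuracies will not match the figures reported in the paper; only the procedure is illustrated.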

Figure 8 shows the accuracy comparison of the DT technique using the default parameters and after the hyperparameter tuning.

Fig. 8. Comparison of DT Accuracy

Based on the observation, the highest accuracy, 97.37%, is obtained from the hyperparameter tuning with the 70:30 split using the optimal tuning {criterion: entropy, max_depth: None, max_features: None, min_samples_split: 6}. The lowest accuracy, 96.18%, is obtained with the default parameters and the 70:30 split ratio.

Figure 9 presents the accuracy comparison of the KNN technique using the default parameters and after the hyperparameter tuning.

Fig. 9. Comparison of KNN Accuracy

Based on the observation, the highest accuracy, 74.72%, is obtained from the hyperparameter tuning with the 80:20 split using the optimal tuning {algorithm: auto, leaf_size: 1, metric: Euclidean, n_neighbors: 4, weights: distance}. The lowest accuracy, 68.70%, is obtained with the default parameters and the 70:30 split ratio.

Figure 10 indicates the accuracy comparison of the SVM technique using the default parameters and after the hyperparameter tuning.

Fig. 10. Comparison of SVM Accuracy

Based on the observation, the highest accuracy, 95.69%, is obtained from the hyperparameter tuning with the 80:20 split using the optimal tuning {C:1, kernel: linear}. The lowest accuracy, 60.31%, is obtained with the default parameters and the 70:30 split ratio.

After observing the accuracy comparison between all techniques, the DT is found to be the best model, as it achieves the highest accuracy in classifying water quality, 97.37%. The result indicates that the DT performs well on datasets that consist of two class labels. Therefore, the technique is used for the primary model development of this project. Figure 11 shows the performance evaluation using the Confusion Matrix for the DT model.

Fig. 11. Confusion Matrix of The DT Technique

The figure shows that the model performs well, as it correctly classifies 48 True Positive (TP) and 77 True Negative (TN) samples, i.e., 125 of the 131 test samples. In contrast, misclassifications are as low as 3 False Positive (FP) and 3 False Negative (FN).

V. CONCLUSION
The objective of this research, i.e., the development of a machine learning model to classify water quality, has been achieved. The models for each technique, i.e., DT, KNN, and SVM, are compared in terms of their accuracy after hyperparameter tuning, which adjusts the parameters of each model to boost the accuracy level. From the comparison phase, the DT model obtained the highest accuracy (97.37%). Consequently, the DT model was chosen as the best model to classify water quality. For future work, testing and comparing other techniques such as Fuzzy Logic and Evolutionary Algorithms can be considered to observe whether there is an improvement in the accuracy score. Furthermore, implementing a multi-class classification system can guide researchers in reducing the misclassification of water quality for a better life.

REFERENCES
[1] Haq, Muhammad I’tikafi Khoirul, Fauzian Dwi Ramadhan, Fatimah Az-Zahra, Linda Kurniawati, and Afrida Helen. "Classification of water potability using machine learning algorithms." In 2021 International Conference on Artificial Intelligence and Big Data Analytics, pp. 1-5. IEEE, 2021.
[2] WHO. 2022a. "Drinking Water." https://www.who.int/news-room/fact-sheets/detail/drinking-water.
[3] Alijanzadeh Maliji, B., Babayeemehr, A., Rohani, K., Mehrabani, S., & Aghajanpour, F. (2023). Role of the World Health Organization in Management of Gastrointestinal Diseases Caused by Contaminated Water in Children in the Middle East: A Review Article. Journal of Pediatrics Review, 11(1), 59-66.
[4] Rasheed, H., F. Altaf, K. Anwaar, and M. Ashraf. "Drinking Water Quality in Pakistan: Current Status and Challenges. Pakistan Council of Research in Water Resources (PCRWR), Islamabad." All rights reserved by PCRWR. The authors encourage fair use of this material for non-commercial purposes with proper citation (2021): 141.

[5] WHO. 2018b. "Developing drinking water regulations and standards. General guidance with a special focus on countries with limited resources." In Routledge Handbook of Water and Health, 1–68. http://apps.who.int/iris/bitstream/handle/10665/272969/9789241513944-eng.pdf.
[6] Ragi, Nikhil M., Ravishankar Holla, and G. Manju. "Predicting water quality parameters using machine learning." In 2019 4th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), pp. 1109-1112. IEEE, 2019.
[7] Uddin, M. G., Nash, S., Rahman, A., & Olbert, A. I. (2023). Performance analysis of the water quality index model for predicting water state using machine learning techniques. Process Safety and Environmental Protection, 169, 808-828.
[8] Sarker, Iqbal H., Asif Irshad Khan, Yoosef B. Abushark, and Fawaz Alsolami. "Internet of Things (IoT) security intelligence: a comprehensive overview, machine learning solutions and research directions." Mobile Networks and Applications (2022): 1-17.
[9] Azrour, M., Mabrouki, J., Fattah, G., Guezzaz, A., & Aziz, F. (2022). Machine learning algorithms for efficient water quality prediction. Modeling Earth Systems and Environment, 8(2), 2793-2801.
[10] WHO. 2018a. "A global overview of national regulations and standards for drinking-water quality." Verordnung Über Die Qualität von Wasser Für Den Menschlichen Gebrauch (Trinkwasserverordnung - TrinkwV
[11] Partyka, M. L., & Bond, R. F. (2022). Wastewater reuse for irrigation of produce: a review of research, regulations, and risks. Science of the Total Environment, 828, 154385.
[12] Han, X., Liu, X., Gao, D., Ma, B., Gao, X., & Cheng, M. (2022). Costs and benefits of the development methods of drinking water quality index: A systematic review. Ecological Indicators, 144, 109501.
[13] Sarker, Iqbal H. "Machine learning: Algorithms, real-world applications, and research directions." SN computer science 2, no. 3 (2021): 160.
[14] Gordan, Meisam, Saeed-Reza Sabbagh-Yazdi, Zubaidah Ismail, Khaled Ghaedi, Páraic Carroll, Daniel McCrum, and Bijan Samali. "State-of-the-art review on advancements of data mining in structural health monitoring." Measurement 193 (2022): 110939.
[15] Sharifani, K., & Amini, M. (2023). Machine Learning and Deep Learning: A Review of Methods and Applications. World Information Technology and Engineering Journal, 10(07), 3897-3904.
[16] Ghobadi, F., & Kang, D. (2023). Application of Machine Learning in Water Resources Management: A Systematic Literature Review. Water, 15(4), 620.
[17] Nasir, N., Kansal, A., Alshaltone, O., Barneih, F., Sameer, M., Shanableh, A., & Al-Shamma'a, A. (2022). Water quality classification using machine learning algorithms. Journal of Water Process Engineering, 48, 102920.
[18] Gorgan-Mohammadi, F., Rajaee, T., & Zounemat-Kermani, M. (2023). Decision tree models in predicting water quality parameters of dissolved oxygen and phosphorus in lake water. Sustainable Water Resources Management, 9(1), 1.
[19] Nababan, A. A., Khairi, M., & Harahap, B. S. (2022). Implementation of K-Nearest Neighbors (KNN) algorithm in classification of data water quality. Jurnal Mantik, 6(1), 30-35.
[20] Juna, A., Umer, M., Sadiq, S., Karamti, H., Eshmawi, A. A., Mohamed, A., & Ashraf, I. (2022). Water quality prediction using KNN imputer and multilayer perceptron. Water, 14(17), 2592.
[21] Derdour, A., Jodar-Abellan, A., Pardo, M. Á., Ghoneim, S. S., & Hussein, E. E. (2022). Designing Efficient and Sustainable Predictions of Water Quality Indexes at the Regional Scale Using Machine Learning Algorithms. Water, 14(18), 2801.
[22] Shamsuddin, I. I. S., Othman, Z., & Sani, N. S. (2022). Water quality index classification based on machine learning: A case from the Langat River Basin model. Water, 14(19), 2939.
[23] Oğuz, A., & Ertuğrul, Ö. F. (2023). A survey on applications of machine learning algorithms in water quality assessment and water supply and management. Water Supply, 23(2), 895-922.
[24] Najwa Mohd Rizal, N., Hayder, G., Mnzool, M., Elnaim, B. M., Mohammed, A. O. Y., & Khayyat, M. M. (2022). Comparison between regression models, support vector machine (SVM), and artificial neural network (ANN) in river water quality prediction. Processes, 10(8), 1652.
[25] Ilić, M., Srdjević, Z., & Srdjević, B. (2022). Water quality prediction based on Naïve Bayes algorithm. Water Science and Technology, 85(4), 1027-1039.
[26] Malek, N. H. A., Wan Yaacob, W. F., Md Nasir, S. A., & Shaadan, N. (2022). Prediction of water quality classification of the Kelantan River Basin, Malaysia, using machine learning techniques. Water, 14(7), 1067.
[27] Priyadarshini, I., Alkhayyat, A., Obaid, A. J., & Sharma, R. (2022). Water pollution reduction for sustainable urban development using machine learning techniques. Cities, 130, 103970.
[28] Cengiz, A. V. C. I., Budak, M., Yağmur, N., & Balçik, F. (2023). Comparison between random forest and support vector machine algorithms for LULC classification. International Journal of Engineering and Geosciences, 8(1), 1-10.
[29] Gakii, C., & Jepkoech, J. (2019). A classification model for water quality analysis using decision tree.
[30] Park, J., Lee, W. H., Kim, K. T., Park, C. Y., Lee, S., & Heo, T. Y. (2022). Interpretation of ensemble learning to predict water quality using explainable artificial intelligence. Science of the Total Environment, 832, 155070.
[31] Ahmarofi, A.A., Kassa, F.M., Ishak, M. K. (2021). "Predicting the Cycle Time at a Production Line Through the Development of the 3-3-1 Multilayer Perceptron Artificial Neural Networks with Formulated Momentum Rate." In Intelligent Manufacturing and Mechatronics: Proceedings of SympoSIMM 2020, pp. 165-173. Singapore: Springer Singapore, 2021.
[32] Ahmarofi, A. A., Ramli, R., Abidin, N. Z., Jamil, J. M., & Shaharanee, I. N. (2020). Variations on the number of hidden nodes through multilayer perceptron networks to predict the cycle time. Journal of Information and Communication Technology, 19(1), 1-19. https://doi.org/10.32890/jict2020.19.1.1
[33] Patil, D., Kar, S., & Gupta, R. (2023). Classification and Prediction of Developed Water Quality Indexes Using Soft Computing Tools. Water Conservation Science and Engineering, 8(1), 16.
[34] Khoi, D. N., Quan, N. T., Linh, D. Q., Nhi, P. T. T., & Thuy, N. T. D. (2022). Using machine learning models for predicting the water quality index in the La Buong River, Vietnam. Water, 14(10), 1552.
[35] Mamat, N., & Razali, S. F. M. (2023). Comparisons of Various Imputation Methods for Incomplete Water Quality Data: A Case Study of The Langat River, Malaysia. Jurnal Kejuruteraan, 35(1), 191-201.
[36] Pakistan Council of Research in Water Resources. (2021). PCRWR Annual Report 2020-2021.
[37] Rasool, U., Yin, X., Xu, Z., Rasool, M. A., Senapathi, V., Hussain, M., ... & Trabucco, J. C. (2022). Mapping of groundwater productivity potential with machine learning algorithms: A case study in the provincial capital of Baluchistan, Pakistan. Chemosphere, 303, 135265.
[38] Ahmed, M., Mumtaz, R., & Hassan Zaidi, S. M. (2021). Analysis of water quality indices and machine learning techniques for rating water pollution: A case study of Rawal Dam, Pakistan. Water Supply, 21(6), 3225-3250.
[39] Khan, M. T., Shoaib, M., Hammad, M., Salahudin, H., Ahmad, F., & Ahmad, S. (2021). Application of machine learning techniques in rainfall–runoff modelling of the soan river basin, Pakistan. Water, 13(24), 3528.
[40] Farooq, M. U., Zafar, A. M., Raheem, W., Jalees, M. I., & Aly Hassan, A. (2022). Assessment of algorithm performance on predicting total dissolved solids using artificial neural network and multiple linear regression for the groundwater data. Water, 14(13), 2002.
[41] Adnan, R. M., Mostafa, R. R., Elbeltagi, A., Yaseen, Z. M., Shahid, S., & Kisi, O. (2022). Development of new machine learning model for streamflow prediction: Case studies in Pakistan. Stochastic Environmental Research and Risk Assessment, 1-35.

