Water quality index using modified random forest technique: Assessing novel input features

Document Type

Article

Publication Date

1-1-2022

Abstract

Water quality analysis is essential to understanding the ecological status of aquatic life. Conventional water quality index (WQI) assessment methods are limited to features such as water acidity or basicity (pH), dissolved oxygen (DO), biological oxygen demand (BOD), chemical oxygen demand (COD), ammoniacal nitrogen (NH3-N), and suspended solids (SS). These features are often insufficient to represent the water quality of a river polluted by heavy metals. Therefore, this paper explores and analyzes novel input features in order to formulate an improved WQI. In this work, prospective insights into the feasibility of alternative water quality input variables as new discriminant features are discussed. The new discriminant features are a step toward formulating adaptive water quality parameters according to the land use activities surrounding the river. The results and analysis obtained in this study demonstrate the feasibility of predicting WQI using new input features. This work analyzes 17 new input features, namely conductivity (COND), salinity (SAL), turbidity (TUR), dissolved solids (DS), nitrate (NO3), chloride (Cl), phosphate (PO4), arsenic (As), chromium (Cr), zinc (Zn), calcium (Ca), iron (Fe), potassium (K), magnesium (Mg), sodium (Na), E. coli, and total coliform, in predicting WQI using machine learning techniques. Five regression algorithms, namely random forest (RF), AdaBoost, support vector regression (SVR), decision tree regression (DTR), and multilayer perceptron (MLP), are applied for preliminary model selection. The results show that the RF algorithm exhibits the best prediction performance, with an R² of 0.974. This work then proposes a modified RF that incorporates the synthetic minority oversampling technique (SMOTE) into the conventional RF method. The proposed modified RF achieves an accuracy of 77.68%, a precision of 74%, a recall of 69%, and an F1-score of 71%.
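The abstract does not describe how SMOTE is combined with RF, so the following is only a minimal, dependency-free sketch of the standard SMOTE idea. It assumes the WQI is handled as class labels (the reported accuracy, precision, recall, and F1-score suggest a classification setting). Each synthetic sample is placed on the line segment between a minority-class sample and one of its k nearest neighbours of the same class. The functions `smote` and `balance`, and the toy feature values and class labels, are hypothetical illustrations, not the authors' code or data.

```python
import math
import random


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic samples from a list of minority-class vectors.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within its own class, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(list(features))
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [list(r) for r in X], list(y)  # originals come first
    for offset, (label, rows) in enumerate(sorted(by_class.items())):
        deficit = target - len(rows)
        if deficit > 0:
            X_out.extend(smote(rows, deficit, k=k, seed=seed + offset))
            y_out.extend([label] * deficit)
    return X_out, y_out


# Hypothetical toy data: two input features, WQI classes "II" and "IV"
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [5.0, 8.0], [5.5, 7.5]]
y = ["II", "II", "II", "II", "IV", "IV"]
Xb, yb = balance(X, y, k=1)
print(yb.count("II"), yb.count("IV"))  # both classes now have 4 samples
```

In practice, the same step is usually done with imbalanced-learn's `SMOTE` class (`fit_resample(X, y)`) followed by scikit-learn's `RandomForestClassifier`. Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data; the abstract does not state how the authors split their data.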
In addition, a sensitivity analysis is included to highlight the importance of the turbidity variable in WQI prediction. The results of the sensitivity analysis show that certain water quality variables absent from the conventional WQI formulation are important for predicting WQI.
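The abstract does not name the sensitivity-analysis method. One common choice is one-at-a-time (OAT) perturbation: scale a single input by a small factor while holding the others fixed, and measure how much the model output moves. The sketch below assumes that method; `oat_sensitivity` is a hypothetical helper, and `toy_model` is an arbitrary linear stand-in rather than the paper's fitted RF, so its ranking says nothing about the real data.

```python
from statistics import mean


def oat_sensitivity(predict, X, feature_names, delta=0.10):
    """One-at-a-time sensitivity: mean absolute change in the model output
    when a single input is scaled by (1 + delta), all others held fixed.

    Note: a feature whose value is 0 is left unchanged by scaling.
    """
    baseline = [predict(row) for row in X]
    scores = {}
    for j, name in enumerate(feature_names):
        changes = []
        for row, base in zip(X, baseline):
            perturbed = list(row)
            perturbed[j] = perturbed[j] * (1 + delta)
            changes.append(abs(predict(perturbed) - base))
        scores[name] = mean(changes)
    # Rank inputs from most to least influential
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


def toy_model(row):
    """Hypothetical linear stand-in for a trained WQI model."""
    tur, cond, zn = row
    return 80 - 0.5 * tur - 0.01 * cond - 2.0 * zn


X = [[20.0, 100.0, 0.5], [40.0, 300.0, 1.0]]
ranking = oat_sensitivity(toy_model, X, ["TUR", "COND", "Zn"])
print(ranking)
```

OAT is simple and model-agnostic but ignores interactions between inputs; permutation importance (for example, scikit-learn's `sklearn.inspection.permutation_importance`) is a common alternative for tree ensembles such as RF.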

Keywords

Artificial intelligence, Random forest, Environmental modeling, Alternative inputs, SMOTE

Divisions

biomedengine, sch_ecs, Institute of Biological Sciences

Funders

Ministry of Higher Education through MRUN Young Researchers Grant Scheme (MY-RGS), MR001-2019, UM-RU Grant, ST065-2021

Publication Title

CMES-Computer Modeling in Engineering & Sciences

Volume

132

Issue

3

Publisher

Tech Science Press

Publisher Location

871 CORONADO CENTER DR, SUITE 200, HENDERSON, NV 89052 USA
