LAPSE:2023.6890
Published Article

Development of a Water Quality Event Detection and Diagnosis Framework in Drinking Water Distribution Systems with Structured and Unstructured Data Integration
February 24, 2023
Abstract
Various detection approaches that identify anomalous events (e.g., discoloration, contamination) by analyzing data collected from smart meters (so-called structured data) have recently been developed for many water distribution systems (WDSs). Although some of them have shown promising results, meters often fail to collect or transmit data (i.e., missing data), so these methods may frequently fail to identify anomalies. The clear next step is therefore to combine structured data with unstructured data, which has no fixed format (e.g., textual content, images, and colors) and is often shared through social media platforms. However, no previous work has been carried out in this regard. This study proposes a framework that combines structured and unstructured data to identify WDS water quality events by collecting turbidity data (structured data) and text data uploaded to social networking services (SNSs) (unstructured data). In the proposed framework, water quality events are identified by applying data-driven detection tools to the structured data and cosine similarity to the unstructured data. The results indicate that structured data-driven tools successfully detect accidents with large magnitudes but fail to detect small failures. When the proposed framework is used, those undetected accidents are successfully identified. Thus, combining structured and unstructured data is necessary to maximize WDS water quality event detection.
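The abstract names cosine similarity as the tool applied to the SNS text. As a rough illustration of that idea only, the sketch below scores posts against a reference description using bag-of-words term counts. The tokenizer, the reference sentence, the 0.3 threshold, and the function names are all assumptions made for this example; the record does not describe how the study vectorized or thresholded its text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical reference description of a discoloration event (an assumption).
REFERENCE = Counter(tokenize("tap water is brown and cloudy with discoloration"))


def score_post(post):
    """Similarity of one post to the reference description."""
    return cosine_similarity(Counter(tokenize(post)), REFERENCE)


def flag_posts(posts, threshold=0.3):
    """Return posts whose similarity meets the (assumed) threshold."""
    return [p for p in posts if score_post(p) >= threshold]
```

In a combined setting, posts flagged this way could serve as a second signal when turbidity readings are missing or the deviation is too small for a data-driven detector to register.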
Record ID
Keywords
anomaly detection, framework, structured and unstructured data integration, water distribution system, water quality, water quality event
Subject
Suggested Citation
Kim T, Jung D, Yoo DG, Hong S, Jun S, Kim JH. Development of a Water Quality Event Detection and Diagnosis Framework in Drinking Water Distribution Systems with Structured and Unstructured Data Integration. (2023). LAPSE:2023.6890
Author Affiliations
Kim T: Department of Civil, Environmental and Architectural Engineering, Korea University, Seoul 02841, Republic of Korea
Jung D: School of Civil, Environmental and Architectural Engineering, Korea University, Seoul 02841, Republic of Korea
Yoo DG: Department of Civil Engineering, The University of Suwon, Hwaseong-si 18323, Republic of Korea
Hong S: Division of Data Science, The University of Suwon, Hwaseong-si 18323, Republic of Korea
Jun S: Hyper-Converged Forensic Research Center for Infrastructure, Korea University, Seoul 02841, Republic of Korea
Kim JH: School of Civil, Environmental and Architectural Engineering, Korea University, Seoul 02841, Republic of Korea
Journal Name
Energies
Volume
15
Issue
24
First Page
9300
Year
2022
Publication Date
2022-12-08
ISSN
1996-1073
Version Comments
Original Submission
Other Meta
PII: en15249300, Publication Type: Journal Article
External Link

https://doi.org/10.3390/en15249300
Publisher Version
Meta
Record Statistics
Record Views
396
Version History
[v1] (Original Submission)
Feb 24, 2023
Verified by curator on
Feb 24, 2023
This Version Number
v1
URL Here
https://psecommunity.org/LAPSE:2023.6890
Record Owner
Auto Uploader for LAPSE
