LAPSE:2023.11115
Published Article

Big-Data Analysis and Machine Learning Based on Oil Pollution Remediation Cases from CERCLA Database
February 27, 2023
Abstract
The U.S. Environmental Protection Agency’s (EPA) Superfund program, established under the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA), has built an open-source database of nearly 2000 US soil remediation cases since 1980, providing detailed information and references for researchers worldwide carrying out remediation work. However, the cases are largely independent of one another, so the database as a whole is not systematically organized and offers limited guidance. In this study, the basic features of all 144 soil remediation projects in four major oil-producing states (California, Texas, Oklahoma and Alaska) were extracted from the CERCLA database, and the correlations among pollutant species, site characteristics and the choice of remediation method were analyzed using traditional and machine learning techniques. A Decision Tree Classifier was selected as the machine learning model. The results showed that the growth in new contaminated sites has slowed in recent years; physical remediation was the most commonly used method, with a probability of application above 80%. The presence of benzene, toluene, ethylbenzene and xylene (BTEX) substances and the geographical location of the site were the two most influential factors in the choice of remediation method for a specific site; the maximum weights of these two features reached 0.304 and 0.288, respectively.
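The feature weights mentioned in the abstract are the kind of quantity a decision tree derives from impurity decreases. As a rough illustration only, the sketch below computes the Gini impurity decrease of a single root-level split for two features and normalizes the results so they sum to 1. The records, feature names (`btex`, `state`) and labels are hypothetical toy values, not data from the CERCLA database, and this is not the authors' implementation; a full decision tree would accumulate such decreases over every node rather than the root alone.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(rows, labels, feature):
    """Impurity decrease from splitting once on a categorical feature."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average impurity of the child groups after the split.
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


# Hypothetical toy records (assumed for illustration, NOT CERCLA data).
rows = [
    {"btex": 1, "state": "CA"},
    {"btex": 1, "state": "TX"},
    {"btex": 1, "state": "CA"},
    {"btex": 0, "state": "TX"},
    {"btex": 0, "state": "OK"},
    {"btex": 0, "state": "AK"},
]
labels = ["physical", "physical", "physical",
          "chemical", "biological", "physical"]

# Root-level impurity decrease per feature, normalized to sum to 1.
gains = {f: gini_gain(rows, labels, f) for f in ("btex", "state")}
total = sum(gains.values())
weights = {f: g / total for f, g in gains.items()} if total else gains
print(weights)
```

The normalized values play the same role as the feature weights in the abstract, ranking which feature separates the remediation classes best, but on this toy data they carry no meaning beyond showing the calculation.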
Record ID
Keywords
CERCLA, Machine Learning, oil-contaminated soil, soil remediation
Subject
Suggested Citation
Li H, Zhou Z, Long T, Wei Y, Xu J, Liu S, Wang X. Big-Data Analysis and Machine Learning Based on Oil Pollution Remediation Cases from CERCLA Database. (2023). LAPSE:2023.11115
Author Affiliations
Li H: School of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266000, China
Zhou Z: School of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266000, China
Long T: State Environmental Protection Key Laboratory of Soil Environmental Management and Pollution Control, Nanjing Institute of Environmental Sciences, Ministry of Ecology and Environment, Nanjing 210042, China
Wei Y: School of Computer Science & Engineering, South China University of Technology, Guangzhou 510006, China
Xu J: School of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266000, China
Liu S: School of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266000, China
Wang X: School of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266000, China
Journal Name
Energies
Volume
15
Issue
15
First Page
5698
Year
2022
Publication Date
2022-08-05
ISSN
1996-1073
Version Comments
Original Submission
Other Meta
PII: en15155698, Publication Type: Journal Article
External Link

https://doi.org/10.3390/en15155698
Publisher Version
Meta
Record Statistics
Record Views
240
Version History
[v1] (Original Submission)
Feb 27, 2023
Verified by curator on
Feb 27, 2023
This Version Number
v1
Citations
Most Recent
This Version
URL Here
https://psecommunity.org/LAPSE:2023.11115
Record Owner
Auto Uploader for LAPSE
