LAPSE:2018.0244
Published Article
Principal Component Analysis of Process Datasets with Missing Values
Kristen A. Severson, Mark C. Molaro, Richard D. Braatz
July 31, 2018
Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Missing data are typically handled during data pre-processing, but they can also be handled during model building. This article considers missing data within the context of principal component analysis (PCA), a method originally developed for complete data that is widely applied in industry for multivariate statistical process control. Because missing data are prevalent and PCA has been so successful on complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms, and suggestions are made for choosing the most appropriate algorithm for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
Keywords
chemometrics, machine learning, missing data, multivariable statistical process control, principal component analysis, process data analytics, process monitoring, Tennessee Eastman problem
Suggested Citation
Severson KA, Molaro MC, Braatz RD. Principal Component Analysis of Process Datasets with Missing Values. (2018). LAPSE:2018.0244
Author Affiliations
Severson KA: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Molaro MC: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Braatz RD: Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Journal Name: Processes
Volume: 5
Issue: 3
Article Number: E38
Year: 2017
Publication Date: 2017-07-06
Published Version
ISSN: 2227-9717
Version Comments: Original Submission
Other Meta: PII: pr5030038, Publication Type: Journal Article
External Link: doi:10.3390/pr5030038 (Publisher Version)
Files: 1v1.pdf (924 kB), Main Article, Jul 31, 2018
License: CC BY 4.0
Record Views: 835
Version History: v1 (Original Submission), Jul 31, 2018
Verified by curator on: Jul 31, 2018
This Version Number: v1
URL: https://psecommunity.org/LAPSE:2018.0244
Original Submitter: Auto Uploader for LAPSE