LAPSE:2023.11724
Published Article

Physics-Based Method for Generating Fully Synthetic IV Curve Training Datasets for Machine Learning Classification of PV Failures
February 27, 2023
Abstract
Classification machine learning models require high-quality labeled datasets for training. Among the most useful datasets for photovoltaic array fault detection and diagnosis are module or string current-voltage (IV) curves. Unfortunately, such datasets are rarely collected due to the cost of high-fidelity monitoring, and the data that are available are generally not ideal, often suffering from unbalanced classes, noise due to environmental conditions, and few samples. In this paper, we propose an alternative approach that uses physics-based simulations of string-level IV curves as a fully synthetic training corpus that is independent of the test dataset. In our example, the training corpus consists of baseline (no fault), partial soiling, and cell crack system modes. The training corpus is used to train a 1D convolutional neural network (CNN) for failure classification. The approach is validated by comparing the model's ability to classify failures on a real, measured IV curve testing corpus obtained from laboratory and field experiments. A model trained on a fully synthetic dataset achieves accuracy identical to that of a model trained on measured data. When evaluating the measured data's test split, an accuracy of 100% was obtained both when simulations and when measured data were used as the training corpus. When evaluating all of the measured data, an accuracy of 96% was obtained with a fully synthetic training dataset. Using physics-based modeling results as a training corpus for failure detection and classification has many practical advantages: each PV system is configured differently, and it would be nearly impossible to train on labeled measured data for every configuration.
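The abstract describes replacing measured training data with simulated, labeled IV curves. As a minimal sketch of that idea, and explicitly not the authors' simulation pipeline, the Python below assumes a single-diode cell model with illustrative parameter values, modules made of three 20-cell substrings with ideal bypass diodes, and a crude stand-in for partial soiling (a reduced photocurrent on every cell of a chosen substring). Cell cracks are not modeled. Each simulated curve is then resampled onto a uniform V/Voc grid, one common way to produce the fixed-length vectors a 1D CNN expects. All names (`cell_voltage`, `make_string`, `iv_curve`, `resample`) and parameter values are hypothetical.

```python
import math

# Illustrative single-diode cell parameters (assumed values, not from the paper).
I0 = 1e-10            # diode saturation current [A]
N_VT = 1.2 * 0.02569  # ideality factor times thermal voltage near 25 C [V]
RS = 0.005            # series resistance [ohm]
RSH = 10.0            # shunt resistance [ohm]
V_BYPASS = -0.5       # an ideal bypass diode clamps a substring at this voltage [V]
CELLS_PER_SUB = 20    # assumed: 3 substrings of 20 cells per module


def cell_voltage(i, iph):
    """Solve the implicit single-diode equation for cell voltage at current i."""
    def residual(v):
        vd = v + i * RS
        return iph - I0 * (math.exp(vd / N_VT) - 1.0) - vd / RSH - i

    # residual() decreases monotonically in v; this bracket is valid for the
    # parameters and currents used here.
    lo, hi = -200.0, 1.0
    for _ in range(60):  # bisection
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def make_string(n_modules, iph=9.0, soiled=None):
    """Return substrings, each a list of cell photocurrents [A].

    `soiled` maps (module, substring) -> irradiance fraction; a crude stand-in
    for partial soiling that scales the photocurrent of every cell in it.
    """
    soiled = soiled or {}
    return [[iph * soiled.get((m, s), 1.0)] * CELLS_PER_SUB
            for m in range(n_modules) for s in range(3)]


def string_voltage(i, substrings):
    """Voltage of series-connected substrings at current i."""
    total = 0.0
    for cells in substrings:
        counts = {}
        for iph in cells:  # identical cells share one solve
            counts[iph] = counts.get(iph, 0) + 1
        v_sub = sum(n * cell_voltage(i, iph) for iph, n in counts.items())
        total += max(v_sub, V_BYPASS)  # bypass diode conducts when reverse-biased
    return total


def iv_curve(substrings, isc, n_points=50):
    """Sweep current from 0 to isc and return (voltages, currents)."""
    currents = [isc * k / (n_points - 1) for k in range(n_points)]
    return [string_voltage(i, substrings) for i in currents], currents


def resample(voltages, currents, n_out=32):
    """Interpolate I/Imax onto a uniform V/Voc grid in [0, 1]."""
    voc = voltages[0]  # zero current is the open-circuit point
    imax = max(currents)
    pts = sorted(zip(voltages, currents))  # ascending voltage
    out = []
    for k in range(n_out):
        v = voc * (k / (n_out - 1))
        for (v0, i0), (v1, i1) in zip(pts, pts[1:]):
            if v0 <= v <= v1:
                t = 0.0 if v1 == v0 else (v - v0) / (v1 - v0)
                out.append((i0 + t * (i1 - i0)) / imax)
                break
        else:
            raise ValueError("grid point outside the simulated voltage range")
    return out


# Usage: one baseline and one partially soiled two-module string.
x_base = resample(*iv_curve(make_string(2), isc=9.0))
x_soil = resample(*iv_curve(make_string(2, soiled={(0, 0): 0.5}), isc=9.0))
```

Each vector pairs naturally with a class label ("baseline", "soiling") to form a synthetic training set. Normalizing by Voc and Imax is a design choice that makes curves from strings of different sizes comparable, which matters when the goal is to avoid training on measured data from each specific system.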
Keywords
IV curves, neural networks, photovoltaic systems, simulation
Suggested Citation
Hopwood MW, Stein JS, Braid JL, Seigneur HP. Physics-Based Method for Generating Fully Synthetic IV Curve Training Datasets for Machine Learning Classification of PV Failures. (2023). LAPSE:2023.11724
Author Affiliations
Hopwood MW: Department of Statistics and Data Science, University of Central Florida, Orlando, FL 32816, USA; Sandia National Laboratories, Albuquerque, NM 87123, USA [ORCID]
Stein JS: Sandia National Laboratories, Albuquerque, NM 87123, USA [ORCID]
Braid JL: Sandia National Laboratories, Albuquerque, NM 87123, USA [ORCID]
Seigneur HP: Florida Solar Energy Center, University of Central Florida, Cocoa, FL 32922, USA [ORCID]
Journal Name
Energies
Volume
15
Issue
14
First Page
5085
Year
2022
Publication Date
2022-07-12
ISSN
1996-1073
Version Comments
Original Submission
Other Meta
PII: en15145085, Publication Type: Journal Article
Publisher Version
https://doi.org/10.3390/en15145085
Version History
[v1] (Original Submission): Feb 27, 2023
Verified by curator on Feb 27, 2023
URL
https://psecommunity.org/LAPSE:2023.11724
Record Owner
Auto Uploader for LAPSE