LAPSE:2020.0910
Published Article
Measuring Performance Metrics of Machine Learning Algorithms for Detecting and Classifying Transposable Elements
Simon Orozco-Arias, Johan S. Piña, Reinel Tabares-Soto, Luis F. Castillo-Ossa, Romain Guyot, Gustavo Isaza
August 5, 2020
Because of the promising results obtained by machine learning (ML) approaches in several fields, the use of ML to solve problems in bioinformatics is becoming more common every day. In genomics, a current issue is the detection and classification of transposable elements (TEs), because of the tedious tasks involved in bioinformatics methods. Thus, ML was recently evaluated on TE datasets, demonstrating better results than bioinformatics applications. A crucial step for ML approaches is the selection of metrics that realistically measure the performance of algorithms. Each metric has specific characteristics and measures properties that may be different from the predicted results. Although the most common way to compare measures is empirical analysis, a non-result-based methodology, called measure invariance properties, has been proposed. These properties are calculated on the basis of whether a given measure changes its value under certain modifications of the confusion matrix, giving comparative parameters independent of the datasets. Measure invariance properties make metrics more or less informative, particularly on unbalanced, monomodal, or multimodal negative class datasets and for real or simulated datasets. Although several studies have applied ML to detect and classify TEs, no work has evaluated performance metrics in TE tasks. Here, we analyzed 26 different metrics used in binary, multiclass, and hierarchical classification, through bibliographic sources and their invariance properties. Then, we corroborated our findings using freely available TE datasets and commonly used ML algorithms. Based on our analysis, the most suitable metrics for TE tasks must be stable, even with highly unbalanced datasets, multimodal negative classes, and training datasets with errors or outliers.
Based on these parameters, we conclude that the F1-score and the area under the precision-recall curve are the most informative metrics since they are calculated based on other metrics, providing insight into the development of an ML application.
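The idea of checking whether a measure changes under a modification of the confusion matrix can be illustrated with a minimal Python sketch. It is not taken from the article: the `Confusion` class, the function names, and all counts are hypothetical, chosen only to show one modification (adding true negatives, as happens when the negative class grows) applied to two standard measures, accuracy and the F1-score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix counts (hypothetical helper, not from the article)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(c: Confusion) -> float:
    # Accuracy uses all four cells, including the true negatives.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1_score(c: Confusion) -> float:
    # F1 is the harmonic mean of precision and recall, which simplifies to
    # 2*TP / (2*TP + FP + FN); the true negatives do not appear in it.
    # Assumes the denominator is nonzero.
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


# Illustrative counts only: the same classifier errors, but the second
# matrix has a much larger negative class (more true negatives).
balanced = Confusion(tp=80, fp=20, fn=20, tn=80)
unbalanced = Confusion(tp=80, fp=20, fn=20, tn=8000)

for name, c in [("balanced", balanced), ("unbalanced", unbalanced)]:
    print(f"{name}: accuracy={accuracy(c):.4f}  F1={f1_score(c):.4f}")
```

In this sketch the F1-score is identical for both matrices, while accuracy rises as true negatives are added, even though the errors on the positive class are unchanged. This is one way to see why a measure that ignores true negatives can be more informative on highly unbalanced data.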
Keywords
classification, deep learning, detection, Machine Learning, metrics, transposable elements
Suggested Citation
Orozco-Arias S, Piña JS, Tabares-Soto R, Castillo-Ossa LF, Guyot R, Isaza G. Measuring Performance Metrics of Machine Learning Algorithms for Detecting and Classifying Transposable Elements. (2020). LAPSE:2020.0910
Author Affiliations
Orozco-Arias S: Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170001, Colombia; Department of Systems and Informatics, Universidad de Caldas, Manizales 170004, Colombia [ORCID]
Piña JS: Research Group in Software Engineering, Universidad Autónoma de Manizales, Manizales 170001, Colombia
Tabares-Soto R: Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales 170001, Colombia [ORCID]
Castillo-Ossa LF: Department of Systems and Informatics, Universidad de Caldas, Manizales 170004, Colombia
Guyot R: Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales 170001, Colombia; Institut de Recherche pour le Développement, Univ. Montpellier, UMR DIADE, 34394 Montpellier, France [ORCID]
Isaza G: Department of Systems and Informatics, Universidad de Caldas, Manizales 170004, Colombia [ORCID]
Journal Name
Processes
Volume
8
Issue
6
Article Number
E638
Year
2020
Publication Date
2020-05-27
Published Version
ISSN
2227-9717
Version Comments
Original Submission
Other Meta
PII: pr8060638, Publication Type: Journal Article
External Link

doi:10.3390/pr8060638
Publisher Version
License
CC BY 4.0
Version History
[v1] (Original Submission)
Aug 5, 2020
 
Verified by curator on
Aug 5, 2020
This Version Number
v1
Citations
This Version
https://psecommunity.org/LAPSE:2020.0910
Original Submitter
Calvin Tsay
Links to Related Works
Directly Related to This Work
Publisher Version