LAPSE:2024.1979
Published Article
Mus4mCPred: Accurate Identification of DNA N4-Methylcytosine Sites in Mouse Genome Using Multi-View Feature Learning and Deep Hybrid Network
August 28, 2024
N4-methylcytosine (4mC) is a critical epigenetic modification that plays a pivotal role in regulating many biological processes, including gene expression, DNA replication, and cellular differentiation. Traditional experimental methods for detecting DNA 4mC sites are time-consuming, labor-intensive, and costly, which makes them unsuitable for large-scale or high-throughput research. Computational methods enable rapid and cost-effective analysis of DNA 4mC sites across entire genomes. In this study, we focus on identifying DNA 4mC sites in the mouse genome. Although some computational methods can already predict DNA 4mC sites in the mouse genome, there is still significant room for improvement in prediction accuracy because these methods do not fully capture the multifaceted characteristics of DNA sequences. To address this issue, we propose a new deep learning predictor, Mus4mCPred, which uses multi-view feature learning and deep hybrid networks to accurately predict DNA 4mC sites in the mouse genome. Mus4mCPred first applies different encoding methods to extract feature vectors from DNA sequences, then feeds the features generated by these encodings into various hybrid deep learning models to learn more sophisticated representations, and finally fuses the extracted multi-view features to serve as the final features for DNA 4mC site prediction in the mouse genome. Multi-view features capture data characteristics more comprehensively and enhance the feature representation of DNA sequences. On the independent test, the sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews' correlation coefficient (MCC) were 0.7688, 0.9375, 0.8531, and 0.7165, respectively. Mus4mCPred outperformed other state-of-the-art methods, achieving accurate identification of 4mC sites in the mouse genome.
Record ID
Keywords
bioinformatics, deep learning, DNA N4-methylcytosine sites, feature fusion
Subject
Suggested Citation
Wang X, Du Q, Wang R. Mus4mCPred: Accurate Identification of DNA N4-Methylcytosine Sites in Mouse Genome Using Multi-View Feature Learning and Deep Hybrid Network. (2024). LAPSE:2024.1979
Author Affiliations
Wang X: School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450002, China; Henan Provincial Key Laboratory of Data Intelligence for Food Safety, Zhengzhou University of Light Industry, Zhengzhou 450002, China
Du Q: School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450002, China
Wang R: School of Electronic Information, Zhengzhou University of Light Industry, Zhengzhou 450002, China
Journal Name
Processes
Volume
12
Issue
6
First Page
1129
Year
2024
Publication Date
2024-05-30
ISSN
2227-9717
Version Comments
Original Submission
Other Meta
PII: pr12061129, Publication Type: Journal Article
External Link
https://doi.org/10.3390/pr12061129
Publisher Version
Record Statistics
Record Views
43
Version History
[v1] (Original Submission)
Aug 28, 2024
Verified by curator on
Aug 28, 2024
This Version Number
v1
URL Here
https://psecommunity.org/LAPSE:2024.1979
Record Owner
PSE Press