LAPSE:2023.2964
Published Article

Machine Learning Approaches for Discriminating Bacterial and Viral Targeted Human Proteins
February 21, 2023
Abstract
Infectious diseases are among the core biological challenges for public health. Recognizing pathogen-specific mechanisms is important for improving our understanding of these diseases, and differentiating between bacterial- and viral-targeted human proteins is important for improving both prognosis and treatment for the patient. Here, we introduce machine learning-based classifiers to discriminate between the two groups of human proteins, using sequence, network, and gene ontology features of the human proteins. Among the classifiers and feature sets tested, the deep neural network (DNN) classifier with amino acid composition (AAC), dipeptide composition (DC), and pseudo-amino acid composition (PAAC) (445 features in total) achieved the best area under the curve (AUC) value (0.939), F1-score (94.9%), and Matthews correlation coefficient (MCC) value (0.81). We found that the top 100 bacteria-targeted and the top 100 virus-targeted human proteins, selected from candidate pools of 1618 and 3916 proteins, respectively, were part of distinct enriched biological processes and pathways. Our proposed method will help to differentiate between bacterial and viral infections based on the targeted human proteins on a global scale. Furthermore, identifying the crucial pathogen targets in the human proteome would help us to better understand pathogen-specific infection strategies and develop novel therapeutics.
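To make the sequence features named in the abstract concrete, the following is a minimal sketch of how amino acid composition (AAC, 20 values) and dipeptide composition (DC, 400 values) could be computed for one protein sequence. It is an illustration under stated assumptions, not the authors' implementation: the function names `aac`, `dc`, and `sequence_features` are hypothetical, non-standard residues are simply dropped before counting, and PAAC is omitted because it needs physicochemical property tables that this record does not give. Together AAC and DC account for 420 of the 445 features; the remaining 25 would come from PAAC.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _clean(seq):
    """Upper-case the sequence and drop non-standard residues (a simplifying assumption)."""
    return "".join(r for r in seq.upper() if r in AMINO_ACIDS)


def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues (20 values)."""
    s = _clean(seq)
    n = len(s)
    return [s.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dc(seq):
    """Dipeptide composition: fraction of each overlapping residue pair (400 values)."""
    s = _clean(seq)
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    n = len(pairs)
    return [pairs.count(a + b) / n if n else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


def sequence_features(seq):
    """Concatenate AAC and DC into one 420-dimensional feature vector."""
    return aac(seq) + dc(seq)
```

A vector built this way for each human protein, extended with PAAC values, could then be passed to any classifier; the DNN architecture and training settings used in the article are not described in this record.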
Record ID
Keywords
classification, deep learning, DNN, host-pathogen interactions, infectious diseases, Machine Learning, pathogen-specific infection
Subject
Suggested Citation
Barman RK, Mukhopadhyay A, Maulik U, Das S. Machine Learning Approaches for Discriminating Bacterial and Viral Targeted Human Proteins. (2023). LAPSE:2023.2964
Author Affiliations
Barman RK: Division of Virology, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata 700010, India; Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
Mukhopadhyay A: Department of Computer Science and Engineering, University of Kalyani, Kalyani 741235, India
Maulik U: Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
Das S: Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata 700010, India; ICMR-National Institute of Occupational Health, Ahmedabad 380016, India
Journal Name
Processes
Volume
10
Issue
2
First Page
291
Year
2022
Publication Date
2022-01-31
ISSN
2227-9717
Version Comments
Original Submission
Other Meta
PII: pr10020291, Publication Type: Journal Article
External Link (Publisher Version)
https://doi.org/10.3390/pr10020291
Meta
Version History
[v1] (Original Submission)
Feb 21, 2023
Verified by curator on
Feb 21, 2023
This Version Number
v1
URL Here
https://psecommunity.org/LAPSE:2023.2964
Record Owner
Auto Uploader for LAPSE
