LAPSE:2023.2679
Published Article

A Combined Text-Based and Metadata-Based Deep-Learning Framework for the Detection of Spam Accounts on the Social Media Platform Twitter
February 21, 2023
Abstract
Social networks have become an integral part of daily life, and communication over them has grown along with their rapid expansion. Twitter is one of the most popular networks in the Middle East. Like other social media platforms, Twitter is vulnerable to spam accounts that spread malicious content. Arab countries are among the most targeted, possibly because of the lack of effective technologies that support the Arabic language. In addition, Arabic is a complex language with extensive grammar rules and many dialects, which makes extracting text data challenging. Innovative methods to combat spam on Twitter have been the subject of many recent studies. This paper addressed the detection of Arabic spam accounts on Twitter by collecting an Arabic dataset suitable for spam detection. The dataset included premium-feature data obtained through the Twitter premium API. Data were labeled by flagging suspended accounts. A combined framework based on deep-learning methods was proposed, with several advantages, including more accurate and faster results while demanding fewer computational resources. Two types of data were used: text-based data with a convolutional neural network (CNN) model and metadata with a simple neural network model. The combined output of the two models classified accounts as spam or not spam. The results showed that the proposed combined model achieved an accuracy of 94.27% using premium feature data and outperformed the best models tested thus far in the literature.
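The two-branch design described in the abstract can be illustrated with a minimal forward-pass sketch. It assumes that the two branches are fused by concatenating their features before a single sigmoid output; the abstract does not state the fusion method, layer sizes, or metadata fields, so every name, dimension, and weight below is hypothetical and untrained. It is not the authors' implementation.

```python
import math
import random

random.seed(0)  # reproducible illustrative weights (untrained)

def rand_matrix(rows, cols):
    """Small random weights; a trained model would learn these."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(vec, weights, bias):
    """Fully connected layer with ReLU: one output per weight row."""
    return [relu(sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]

def conv1d_maxpool(embedded, filters, width):
    """Apply each filter to every window of `width` tokens, then global max-pool.

    Assumes len(embedded) >= width.
    """
    features = []
    for filt in filters:
        activations = []
        for start in range(len(embedded) - width + 1):
            # Flatten the window of token vectors into one vector.
            window = [x for tok in embedded[start:start + width] for x in tok]
            activations.append(relu(sum(f * x for f, x in zip(filt, window))))
        features.append(max(activations))
    return features

# Hypothetical sizes, chosen only to keep the example small.
VOCAB, EMBED_DIM, WIDTH, N_FILTERS = 50, 4, 3, 5
META_DIM, META_HIDDEN = 3, 4

embedding = rand_matrix(VOCAB, EMBED_DIM)            # token id -> vector
filters = rand_matrix(N_FILTERS, WIDTH * EMBED_DIM)  # text (CNN) branch
meta_w = rand_matrix(META_HIDDEN, META_DIM)          # metadata branch
meta_b = [0.0] * META_HIDDEN
out_w = [random.uniform(-0.5, 0.5) for _ in range(N_FILTERS + META_HIDDEN)]
out_b = 0.0

def predict_spam_probability(token_ids, metadata):
    """Text branch + metadata branch -> concatenation -> sigmoid."""
    embedded = [embedding[t] for t in token_ids]
    text_features = conv1d_maxpool(embedded, filters, WIDTH)
    meta_features = dense(metadata, meta_w, meta_b)
    fused = text_features + meta_features  # list concatenation (assumed fusion)
    logit = sum(w * x for w, x in zip(out_w, fused)) + out_b
    return sigmoid(logit)

# Hypothetical inputs: tokenized tweet text and scaled account metadata.
tokens = [3, 17, 42, 8, 25]
meta = [0.2, 0.9, 0.1]
p = predict_spam_probability(tokens, meta)
print("spam" if p >= 0.5 else "not spam")
```

Because the weights are random, the printed label carries no meaning; the sketch only shows how text features and metadata features could flow into one decision.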
Record ID
LAPSE:2023.2679
Keywords
Arabic spam account, deep convolution neural networks, deep learning, online social network, spam detection
Suggested Citation
Alhassun AS, Rassam MA. A Combined Text-Based and Metadata-Based Deep-Learning Framework for the Detection of Spam Accounts on the Social Media Platform Twitter. (2023). LAPSE:2023.2679
Author Affiliations
Alhassun AS: Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia
Rassam MA: Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia; Faculty of Engineering and Information Technology, Taiz University, Taiz 6803, Yemen
Journal Name
Processes
Volume
10
Issue
3
First Page
439
Year
2022
Publication Date
2022-02-22
ISSN
2227-9717
Version Comments
Original Submission
Other Meta
PII: pr10030439, Publication Type: Journal Article
External Link

https://doi.org/10.3390/pr10030439
Publisher Version
Meta
Record Statistics
Record Views
236
Version History
[v1] (Original Submission)
Feb 21, 2023
Verified by curator on
Feb 21, 2023
This Version Number
v1
URL Here
https://psecommunity.org/LAPSE:2023.2679
Record Owner
Auto Uploader for LAPSE
