Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/5944
Title: Text Simplification using Hybrid Semantic Compression and Support Vector Machine for Troll Threat Sentences
Authors: Bakar, J. A. 
Yusoff, N.
Harun, N. H. 
Nadzir, M. M. 
Omar, S. 
Keywords: Text simplification;semantic compression;machine learning;natural language processing;cyber bullying
Issue Date: 2023
Publisher: Science and Information Organization
Project: FRGS-RACER 
Journal: International Journal of Advanced Computer Science and Applications 
Abstract: 
Text Simplification (TS) is an emerging field in Natural Language Processing (NLP) that aims to make complex text more accessible. However, research on TS for the Malay language (Bahasa Malaysia), which is widely spoken in Southeast Asia, remains limited. The challenges in this domain revolve around data availability, feature engineering, and the suitability of methods for text simplification. Previous studies predominantly employed a single method, such as semantic compression or machine learning, with the Support Vector Machine (SVM) classifier consistently achieving an accuracy of approximately 70% in identifying troll sentences, that is, statements containing threats from online trolls, who are notorious for their disruptive online behavior. This study combines semantic compression and machine learning across the lexical, syntactic, and semantic levels, using frequency dictionaries as semantic features. SVM and Decision Tree classifiers are applied and tested on 6,836 data samples, divided into training and testing sets. In a comparison of both classifiers with and without semantic features, SVM with semantic features achieves an average accuracy of 92.37%, while Decision Tree with semantic features reaches 91.21%. The proposed TS method is evaluated on troll sentences, which are often associated with cyberbullying. Cyberbullying has been reported to be a significant issue, with Malaysia ranking as the second worst out of the 28 countries surveyed in Asia. The outcomes of the study could therefore support applications, such as machine translation and relation extraction, that help prevent cyberbullying in Malaysia.
Description: 
Web of Science / Scopus
URI: http://hdl.handle.net/123456789/5944
ISSN: 2158-107X
DOI: 10.14569/IJACSA.2023.0141035
Appears in Collections: Faculty of Data Science and Computing - Journal (Scopus/WOS)

Files in This Item:
File: SCOPUS_IJACSA_Text Simplification.pdf
Size: 1.45 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.