Please use this identifier to cite or link to this item:
http://hdl.handle.net/123456789/5944
Title: | Text Simplification using Hybrid Semantic Compression and Support Vector Machine for Troll Threat Sentences |
Authors: | Bakar, J. A.; Yusoff, N.; Harun, N. H.; Nadzir, M. M.; Omar, S. |
Keywords: | Text simplification; semantic compression; machine learning; natural language processing; cyber bullying |
Issue Date: | 2023 |
Publisher: | Science and Information Organization |
Project: | FRGS-RACER |
Journal: | International Journal of Advanced Computer Science and Applications |
Abstract: | Text Simplification (TS) is an emerging field in Natural Language Processing (NLP) that aims to make complex text more accessible. However, there is limited research on TS in the Malay language, known as Bahasa Malaysia, which is widely spoken in Southeast Asia. The challenges in this domain revolve around data availability, feature engineering, and the suitability of methods for text simplification. Previous studies predominantly employed single methods, such as semantic compression or machine learning, with the Support Vector Machine (SVM) classifier consistently achieving an accuracy of approximately 70% in identifying troll sentences, that is, statements containing threats from online trolls notorious for their disruptive behavior. This study combines semantic compression and machine learning methods across lexical, syntactic, and semantic levels, utilizing frequency dictionaries as semantic features. SVM and Decision Tree classifiers are applied and tested on 6,836 datasets, divided into training and testing sets. When SVM and Decision Tree are compared with and without semantic features, SVM with semantics achieves an average accuracy of 92.37%, while Decision Tree with semantics reaches 91.21%. The proposed TS method is evaluated on troll sentences, which are often associated with cyberbullying. Cyberbullying has been reported to be a significant issue, with Malaysia ranking second worst among the 28 countries surveyed in Asia. Therefore, the outcomes of the study could potentially offer means, such as machine translation and relation extraction, to help prevent cyberbullying in Malaysia. |
Description: | Web of Science / Scopus |
URI: | http://hdl.handle.net/123456789/5944 |
ISSN: | 2158-107X |
DOI: | 10.14569/IJACSA.2023.0141035 |
Appears in Collections: | Faculty of Data Science and Computing - Journal (Scopus/WOS) |
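
The abstract above mentions frequency dictionaries used as semantic features alongside lexical ones. This record does not say how those dictionaries are built or scored, so the following is only a minimal sketch under stated assumptions: the function names, the scoring rule, and the toy English sentences are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_frequency_dictionary(sentences):
    """Count how often each lowercased token appears in a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


def semantic_score(sentence, troll_freq, normal_freq):
    """Hypothetical semantic feature: share of token frequency mass that
    comes from the troll dictionary.

    Returns a value in [0, 1], or 0.5 when no token is in either dictionary.
    """
    troll_mass = 0
    normal_mass = 0
    for token in sentence.lower().split():
        # Counter returns 0 for tokens it has never seen.
        troll_mass += troll_freq[token]
        normal_mass += normal_freq[token]
    total = troll_mass + normal_mass
    return troll_mass / total if total else 0.5


def extract_features(sentence, troll_freq, normal_freq):
    """Combine a lexical feature (token count) with the semantic score."""
    tokens = sentence.split()
    return [len(tokens), semantic_score(sentence, troll_freq, normal_freq)]


# Toy placeholder sentences (assumption): not data from the study.
troll_sentences = ["i will hurt you", "you will regret this"]
normal_sentences = ["see you at lunch", "this is a nice day"]

troll_freq = build_frequency_dictionary(troll_sentences)
normal_freq = build_frequency_dictionary(normal_sentences)

features = extract_features("i will find you", troll_freq, normal_freq)
print(features)
```

Feature vectors of this kind could then be passed to a classifier such as an SVM or a decision tree; which features and classifier settings the authors actually used is described in the paper itself, not in this record.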
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
SCOPUS_IJACSA_Text Simplification.pdf | | 1.45 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.