Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/4868
Title: Automatic POS tagging of Arabic words using the YAMCHA machine learning tool
Authors: Elnily A. 
Abdelghany, A. 
Keywords: machine learning;POS tagging;support vector machine
Issue Date: 2022
Publisher: Institute of Electrical and Electronics Engineers Inc
Conference: Proceedings of the 20th Conference on Language Engineering, ESOLEC 2022 
Abstract: 
Automatic POS tagging is the process of automatically assigning the proper part-of-speech (POS) tag to each word in a text based on its context. Most NLP applications rely on this process as a crucial preprocessing step. This study proposes a machine learning-based Arabic POS tagger. The YamCha tool is the machine learning system employed in this study; it uses Support Vector Machines (SVMs) as its learning algorithm. SVMs classify data with high accuracy because they rely on only a subset of the training data (the support vectors). Consequently, a substantial amount of data annotated at the POS level is needed to train the system. A corpus of 100,039 words was used in this study, divided into training and testing parts of 64,608 and 35,431 words, respectively. A tag set of 48 morphological tags was used in training and testing. To reach the best automatic POS tagging result, the system was trained multiple times while varying the range of linguistic information used in the training process, and new texts were then tested and evaluated. The lowest error rate achieved was 11.4%, reached when the word preceding the target word was considered in the training process without considering its POS tag (F:-10: 0).
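The abstract describes varying the range of linguistic context (surrounding words and, optionally, the already-assigned tags of preceding words) fed to the classifier. The following is a minimal pure-Python sketch of that sliding-window feature extraction; the function name, feature keys, and window representation here are illustrative assumptions, not YamCha's actual API or notation.

```python
def window_features(words, i, word_window, tag_window, tags):
    """Build a feature dict for token i of a sentence.

    word_window: offsets of surrounding words to include (analogous to
                 YamCha's word-feature context, e.g. (-1, 0) for the
                 preceding word plus the target word itself).
    tag_window:  offsets of already-assigned tags to include; only
                 negative offsets are usable, since tagging proceeds
                 left to right and later tags do not exist yet.
    tags:        tags assigned so far (tags[j] for j < i).
    """
    feats = {}
    for off in word_window:
        j = i + off
        # Pad positions that fall outside the sentence boundary.
        feats[f"W{off}"] = words[j] if 0 <= j < len(words) else "<PAD>"
    for off in tag_window:
        j = i + off
        if off < 0 and j >= 0:
            feats[f"T{off}"] = tags[j]
    return feats

# Example: the configuration the abstract reports as best -- the
# preceding word is included, but its POS tag is not (empty tag window).
words = ["the", "cat", "sat"]
print(window_features(words, 1, (-1, 0), (), []))
# {'W-1': 'the', 'W0': 'cat'}
```

Each such feature dict would then be vectorized and passed to an SVM classifier; widening or narrowing `word_window` and `tag_window` corresponds to the retraining experiments the abstract describes.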
Description: 
Scopus
URI: http://hdl.handle.net/123456789/4868
DOI: 10.1109/ESOLEC54569.2022.10009473
Appears in Collections:Faculty of Language Studies and Human Development - Proceedings


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.