ICAST 2024
Conference Management System

Enhancing Indonesian Text Processing with Rule-Based Stemming for Affixed and Reduplicated Words
Irwan Setiawan, Fitri Diani, Yadhi A. Permana, Suprihanto

Department of Computer and Informatics Engineering, Politeknik Negeri Bandung, Indonesia


Abstract

This paper presents the development and evaluation of two rule-based stemming algorithms, SFAIS (Suffix-First Approach Indonesian Stemmer) and PFAIS (Prefix-First Approach Indonesian Stemmer), designed to address the morphological challenges of Indonesian, particularly affixed and reduplicated words. We also construct comprehensive datasets comprising 31,310 unique root words and 19,075 unique affixed words, including 1,966 reduplicated words derived from 6,872 root words, and make them publicly available to support further research. SFAIS achieved an Index Compression Factor of 65.09, a Word Stemmed Factor of 99.89%, and a Correctly Stemmed Words Factor of 93.24%, whereas PFAIS achieved 63.94, 98.08%, and 92.17%, respectively. SFAIS also achieved a higher overall accuracy (93.14%) than PFAIS (90.40%). Error analysis showed that SFAIS produced fewer errors in total (1,309, compared with 1,832 for PFAIS), including fewer under-stemming and mis-stemming errors. These results indicate that the suffix-first approach processes Indonesian affixed and reduplicated words more accurately. The study contributes to Indonesian natural language processing through more accurate stemming algorithms and publicly available datasets.
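
The abstract does not spell out the SFAIS and PFAIS rules or the metric formulas, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes a tiny hypothetical root list (ROOTS), a simplified affix inventory (particles, possessive pronouns, derivational suffixes, meN-/peN- allomorphs, and a few plain prefixes), a hypothetical gold list (GOLD), and commonly used definitions of ICF, WSF, and CSWF expressed as percentages over the test words; all function names are hypothetical. It shows how the order of affix removal can change the output: with both "beri" and "ikan" in the root list, "berikan" resolves to "beri" (beri + -kan) when suffixes are removed first and to "ikan" (ber- + ikan) when prefixes are removed first.

```python
# Toy sketch: hypothetical root list and simplified rules, NOT the SFAIS/PFAIS rule sets.
ROOTS = {"baca", "tulis", "pukul", "sapu", "ambil", "beri", "ikan", "buku", "main", "makan"}

PARTICLES = ("lah", "kah", "pun")
POSSESSIVES = ("nya", "ku", "mu")
DERIVATIONAL = ("kan", "an", "i")
# meN-/peN- allomorphs and the initial consonant they may have replaced ("" = none).
NASAL_PREFIXES = (("meny", ("s",)), ("meng", ("", "k")), ("mem", ("", "p")),
                  ("men", ("", "t")), ("me", ("",)), ("peny", ("s",)),
                  ("peng", ("", "k")), ("pem", ("", "p")), ("pen", ("", "t")), ("pe", ("",)))
PLAIN_PREFIXES = ("ber", "ter", "di", "ke", "se")


def suffix_forms(word):
    """The word, then each successively suffix-stripped form (at most one suffix per class)."""
    forms, current = [word], word
    for group in (PARTICLES, POSSESSIVES, DERIVATIONAL):
        for suffix in group:
            # Keep at least three letters so short roots such as "buku" are not cut.
            if current.endswith(suffix) and len(current) - len(suffix) >= 3:
                current = current[: -len(suffix)]
                forms.append(current)
                break
    return forms


def prefix_candidates(word):
    """Candidate remainders after removing one prefix, restoring a dropped consonant."""
    candidates = []
    for prefix, restorations in NASAL_PREFIXES:
        if word.startswith(prefix):
            candidates += [r + word[len(prefix):] for r in restorations]
    candidates += [word[len(p):] for p in PLAIN_PREFIXES if word.startswith(p)]
    return candidates


def suffix_first(word):
    forms = suffix_forms(word)
    for form in forms:  # 1) dictionary check after suffix removal alone
        if form in ROOTS:
            return form
    for form in forms:  # 2) then prefix removal on each suffix-stripped form
        for cand in prefix_candidates(form):
            if cand in ROOTS:
                return cand
    return word  # unresolved: returned unchanged


def prefix_first(word):
    if word in ROOTS:
        return word
    for cand in prefix_candidates(word) + [word]:  # prefix removal is tried first
        for form in suffix_forms(cand):
            if form in ROOTS:
                return form
    return word  # unresolved: returned unchanged


def stem(word, core):
    """Reduplication ("buku-buku", "bermain-main"): accept if both halves share a root."""
    if "-" in word:
        left, right = word.split("-", 1)
        if core(left) == core(right):
            return core(left)
    return core(word)  # otherwise stem the whole token


def evaluate(gold, core):
    """Assumed definitions: ICF, WSF, CSWF as percentages over the unique test words."""
    stems = {w: stem(w, core) for w in gold}
    n = len(gold)
    icf = (n - len(set(stems.values()))) / n * 100
    wsf = sum(s != w for w, s in stems.items()) / n * 100
    cswf = sum(s == gold[w] for w, s in stems.items()) / n * 100
    return stems, round(icf, 2), round(wsf, 2), round(cswf, 2)


GOLD = {"membaca": "baca", "dibaca": "baca", "menulis": "tulis", "memukul": "pukul",
        "menyapu": "sapu", "mengambil": "ambil", "berikan": "beri", "buku-buku": "buku",
        "bukunya": "buku", "bermain-main": "main", "makanlah": "makan", "kerjakan": "kerja"}

if __name__ == "__main__":
    for name, core in (("suffix-first", suffix_first), ("prefix-first", prefix_first)):
        stems, icf, wsf, cswf = evaluate(GOLD, core)
        print(name, stems["berikan"], icf, wsf, cswf)
    # suffix-first beri 16.67 91.67 91.67
    # prefix-first ikan 16.67 91.67 83.33
```

In this toy run, "kerjakan" is left unstemmed by both orders because its root "kerja" is missing from the hypothetical dictionary, which lowers WSF and CSWF; the only difference between the two orders is the "berikan" case. The numbers say nothing about the paper's datasets or results.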

Keywords: Indonesian Language Processing, Rule-Based Stemming, Morphological Analysis, Affixed Words, Reduplicated Words

Topic: Artificial Intelligence (AI)

Corresponding Author: Irwan Setiawan