Article Info

Pashto Language Stemming Algorithm

Sebghatullah Aslamzai, Saidah Saad
dx.doi.org/10.17576/apjitm-2015-0401-03

Abstract

This paper presents a stemming algorithm for the morphological analysis of less popular, or minor, languages such as Pashto. As a less popular language, Pashto lacks the resources and tools needed for applications such as document indexing, clustering, language processing, text analysis, database search systems, information retrieval, and other linguistic applications. A review of the literature shows that very few morphological studies have been conducted on Pashto and that the language has not yet been fully analyzed. In addition, no stemming algorithm has been proposed for extracting Pashto root words from a Pashto corpus. The proposed algorithm takes a Pashto corpus directly as input and handles both inflectional and derivational morphemes; its output is a meaningful root word without affixes. To validate the algorithm, two native speakers of Pashto were recruited to evaluate its accuracy and strength.

Keywords

Morphology, Conflation, Stemming algorithm, Root word, Pashto rules.

Area

Knowledge Technology