Arabic Stemming: Challenges and Techniques

KING SAUD UNIVERSITY
COLLEGE OF COMPUTER & INFORMATION SCIENCES
DEPARTMENT OF COMPUTER SCIENCE
CSC595
June 2010

Abstract

Arabic text applications use stemming as a preprocessing stage. This paper introduces the problems that stemming raises. It also mentions different stemming applications and presents stemming techniques, with a discussion of each.

1 Introduction

Arabic is a Semitic language. It differs from the most widely used languages, and its morphological structure is more complicated. It is written from right to left [1]. The shape of a letter varies depending on its position in the word. The Arabic parts of speech are noun, verb, and particle [2]. Diacritical marks are sometimes used with the letters to indicate the pronunciation of the word. In derivation, some letters are mutated, as in Table 1 [3]. Mutation is the process of replacing one letter with another.

Word    Pattern    Derivation    Mutation
صبر     افتعل      اصتبر         اصطبر

Table 1: An example of mutation

Affixes are attached to the beginning and the end of words. Prefixes indicate the conjugation person of verbs in the present tense. Suffixes are the conjugation terminations of verbs as well as the dual, plural, and feminine marks for nouns. Moreover, pronouns are attached at the end of words [4].

Some applications have to reduce a word to a smaller part. This process is called stemming. A stemmer removes all affixes from a word to reduce it to its root. As a result, it is used to get better results when searching for a word and t...

... middle of paper ...

...lications, 2004, pp. 543.
[16] K. Taghva et al., "Arabic stemming without a root dictionary", in International Conference on Information Technology: Coding and Computing, 2005.
[17] S. Ghwanmeh et al., "Enhanced Algorithm for Extracting the Root of Arabic Words", in Sixth International Conference on Computer Graphics, Imaging and Visualization, 2009, pp. 388-391.
[18] R. Alshalabi, "Pattern-based stemmer for finding Arabic roots", Information Technology Journal, vol. 4, no. 1, pp. 38-43, 2005.
[19] H. Al-Serhan and A. Ayesh, "A Triliteral Word Roots Extraction Using Neural Network For Arabic", in The 2006 International Conference on Computer Engineering and Systems, 2006, pp. 436-440.
[20] M. Momani and J. Faraj, "A Novel Algorithm to Extract Tri-Literal Arabic Roots", in International Conference on Computer Systems and Applications, 2007, pp. 309-315.
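
As a rough illustration of the affix removal described in the introduction, the following Python sketch strips one prefix and one suffix from a word. It is not one of the techniques surveyed in the paper. The affix lists, the minimum stem length, and the function name strip_affixes are all assumptions chosen for the example, and a real stemmer would need much larger lists plus handling for infixes and letter mutation.

```python
# Illustrative affix lists (assumptions for this sketch, not taken from the paper).
PREFIXES = ["ال", "ي", "ت", "ن", "أ"]     # definite article and present-tense person markers
SUFFIXES = ["ون", "ات", "ان", "ها", "ة"]  # plural, dual, attached pronoun, feminine marks

# Assumption: never cut a word below three letters, the most common root length.
MIN_STEM_LEN = 3


def strip_affixes(word: str) -> str:
    """Remove at most one prefix and one suffix, trying longer affixes first."""
    stem = word
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM_LEN:
            stem = stem[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
            stem = stem[:-len(suffix)]
            break
    return stem


if __name__ == "__main__":
    # "they write" and "the teachers", each with one prefix and one suffix
    for w in ["يكتبون", "المعلمون"]:
        print(w, "->", strip_affixes(w))
```

With these lists, يكتبون is reduced to كتب, which is its root, while المعلمون is reduced only to معلم, a stem that still carries a derivational letter. Reaching the root in the second case requires pattern matching of the kind the root-extraction references above address.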
