Arabic Stemming: Challenges and Techniques

King Saud University
College of Computer & Information Sciences
Department of Computer Science
CSC595
June 2010

Abstract

Arabic text applications use stemming as a preprocessing stage. This paper introduces the problems that stemming raises. Moreover, different stemming applications are mentioned. Additionally, stemming techniques are presented, with a discussion of each.

1 Introduction

Arabic is a Semitic language. It differs from most widely used languages, and its morphological structure is more complicated. It is written from right to left [1]. The shape of a letter varies depending on its position in the word. The Arabic parts of speech are noun, verb, and particle [2]. Diacritical marks are sometimes attached to letters to indicate how a word is pronounced. During derivation, some letters are mutated, as shown in Table 1 [3]. Mutation is the process of replacing one letter with another.

Word    Pattern    Derivation    Mutation
صبر     افتعل      اصتبر         اصطبر

Table 1: An example of the mutation

Affixes are attached to the beginning and the end of words. Prefixes indicate the person of verbs conjugated in the present tense. Suffixes are the conjugation endings of verbs, as well as the dual, plural, and feminine markers of nouns. Moreover, pronouns are attached to the end of words [4].

Some applications have to reduce a word to a smaller part. This process is called stemming. A stemmer removes all affixes from a word to reduce it to its root. As a result, it is used to get better results when searching for a word and t... ... middle of paper ...
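As a rough illustration of the affix removal described in the introduction, the following Python sketch strips diacritical marks and then removes at most one prefix and one suffix from a word. The affix lists, the minimum stem length, and the function names are assumptions made for this example and are not taken from the paper. The sketch stops at affix stripping: it does not handle patterns, infixes, or mutation, so it will not always reach the root.

```python
import re

# Arabic diacritical marks (fathatan through sukun), U+064B..U+0652.
DIACRITICS = re.compile("[\u064B-\u0652]")

# Illustrative affix lists only (assumption); longer affixes are listed first
# so that they are tried before their shorter substrings.
PREFIXES = ["وال", "بال", "ال", "و", "ي", "ت", "ن"]
SUFFIXES = ["ات", "ون", "ين", "ان", "ها", "هم", "ة", "ه"]

# Assumption: never cut a word below three letters.
MIN_STEM_LEN = 3


def strip_diacritics(word):
    """Remove diacritical marks so that only the base letters remain."""
    return DIACRITICS.sub("", word)


def strip_affixes(word):
    """Remove at most one listed prefix and one listed suffix from a word."""
    word = strip_diacritics(word)
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for example in ["المعلمون", "يكتبون", "ولد"]:
        print(example, strip_affixes(example))
```

The length check keeps short words such as ولد intact even though they begin with a letter that also appears in the prefix list. Single-letter prefixes are still a known source of over-stemming, because a word may simply start with one of those letters.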

