Stemming algorithms have been used in information retrieval (IR) for decades; however, there is no consensus that stemming enhances the effectiveness of IR systems. Many studies have investigated the effectiveness of stemming using test collections, and the results are mixed. Harman (1991) tested three stemming algorithms on large English corpora and concluded that none of them achieved a significant improvement in IR performance. Later studies (Abu-Salem et al., 1999; Jinxi and Croft, 1998; Hull, 1996; Krovetz, 1993) find that stemming is useful and enhances the effectiveness of IR systems. These studies indicate that stemming is one of the most important factors in improving the effectiveness of information retrieval systems. As a consequence, stemming algorithms are now widely used for this purpose. Abu-Salem et al. (1999) explain that in information retrieval systems, grouping words that share the same base or root increases the success rate when matching documents to a query. For the present study, I agree with Savoy (1999) and others who support the idea that stemming is useful, especially when long lists of retrieved documents are analyzed.
Many stemmers have been developed for a wide range of languages, including English, French, German, Dutch, Swedish, Latin, Malay, Indonesian, Slovene, Turkish, Arabic and Hebrew. Leah, Lisa, et al. (2002: 275) point out that “stemmers are generally tailored for each specific language”. Building a stemmer therefore requires some linguistic knowledge of the language and an understanding of the needs of information retrieval. The concept of all stemmers is the reduction of the corpora size so that Info...
... middle of paper ...
... true that stemming is useful in merging words which differ in form but are semantically equivalent; however, it can equally merge words which differ in form and are also semantically distinct. Moreover, stemmers offer no solution for homographs, so they can conflate word forms that are completely different in meaning. In terms of IR applications, stemmers make two kinds of error: over-stemming and under-stemming. Strong stemmers tend to form larger stem classes in which unrelated forms are wrongly conflated; this error is defined as over-stemming. Weak stemmers, in turn, fail to conflate variant forms of the same stem, leaving them ungrouped; this error is called under-stemming. The present section introduces the main stemming algorithms for English corpora, illustrating how they carry out stemming tasks.
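The two error types can be made concrete with a toy suffix stripper. The sketch below is not any of the published stemmers; the suffix lists, the function names and the minimum stem length are assumptions chosen only to show how a strong rule set over-stems and a weak one under-stems.

```python
from collections import defaultdict

# Hypothetical suffix lists, chosen only for illustration.
# Longer suffixes are listed first so that they are tried first.
WEAK_SUFFIXES = ("s",)
STRONG_SUFFIXES = ("ions", "ion", "ity", "ing", "ed", "al", "e", "s")


def strip_suffix(word, suffixes, min_stem=3):
    """Remove the first matching suffix, keeping at least min_stem characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem_classes(words, suffixes):
    """Group words into stem classes: stem -> list of word forms."""
    classes = defaultdict(list)
    for word in words:
        classes[strip_suffix(word, suffixes)].append(word)
    return dict(classes)


related = ["connect", "connected", "connecting", "connections"]
unrelated = ["universe", "university", "universal"]

# Under-stemming: the weak rule set leaves related forms in separate classes.
print(stem_classes(related, WEAK_SUFFIXES))

# Over-stemming: the strong rule set merges semantically distinct words.
print(stem_classes(unrelated, STRONG_SUFFIXES))
```

With the weak list, the four forms of "connect" end up in four different classes, whereas the strong list groups them correctly but also places "universe", "university" and "universal" in a single class.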
First, a brief background on the three dimensions of language discussed throughout this paper. The functional, semantic, and thematic dimensions of language, as previously mentioned, are often used in parallel with each other. Because of this, it is important to be able to identify them as they take place and differentiate between these dimensions i...
Information retrieval (IR) is concerned with representing, storing, organising and retrieving information. The information should be easy to access, since users are more interested in information they can reach easily. The information retrieval process covers searching for documents, for information within documents and for metadata about documents, as well as searching relational databases and the World Wide Web. According to Shing Ping Tucker (2008), e-commerce is a rapidly growing segment of the internet.
In today’s fast-paced technological world, search engines have become a popular part of people’s daily routines. A search engine is an information retrieval system that allows someone to search the...
Technologies like word prediction can compare a typed word with the words in a dictionary list and flag a mismatch as a misspelled word. This helps writers who often unconsciously reorder letters while typing. It can also assist a writer in guessing the spelling of a word.
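The dictionary comparison described above can be sketched in a few lines. This is a minimal illustration, not how any particular word-prediction product works: the word list and the function names are hypothetical, and the spelling guesses rely on `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical, very small dictionary list used only for illustration.
DICTIONARY = {"the", "their", "receive", "spelling", "writer"}


def is_misspelled(word):
    """A typed word that is not in the dictionary list counts as a mismatch."""
    return word.lower() not in DICTIONARY


def transposition_fixes(word):
    """Dictionary words reachable by swapping one pair of adjacent letters."""
    w = word.lower()
    fixes = []
    for i in range(len(w) - 1):
        candidate = w[:i] + w[i + 1] + w[i] + w[i + 2:]
        if candidate in DICTIONARY and candidate not in fixes:
            fixes.append(candidate)
    return fixes


def guess_spelling(word, n=3):
    """Up to n dictionary words that are most similar to the typed word."""
    return difflib.get_close_matches(word.lower(), sorted(DICTIONARY), n=n)


print(is_misspelled("teh"))            # True: "teh" is not in the list
print(transposition_fixes("recieve"))  # reordered letters are swapped back
print(guess_spelling("speling"))       # similar words offered as guesses
```

The transposition check targets the reordered-letter case directly, while the similarity lookup covers the broader case of a writer who is unsure of a spelling.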
Information retrieval is, simply put, a field concerned with organizing information. In other terms, IR emphasizes the range of different materials that need to be searched. Other researchers characterize IR by contrast: a database system has strong structure and typing, whereas the objects typically searched in IR lack structure. In practice, an information retrieval system has to deal with incomplete or underspecified information in the form of the queries issued by users. IR uses techniques for storing, recovering and often disseminating recorded data, especially through the use of a computerized system.
Which is great for mapping concepts to words, and it’ll even deal well with homographs, identical words that mean completely different things. You can deal with those through context: the days of “hydraulic ram” being translated into “water sheep” are pretty much gone.
This paper describes abbreviation as a word-formation process in Arabic, which comprises three important processes: acronyms, clipping and blending. Although these processes are not widespread in Arabic, they are important to know and to discuss. This research therefore aims to define each process and to give explanations and examples of each. The Merriam-Webster dictionary defines abbreviation as “a shortened form of a written word or phrase used in place of the whole word or phrase”. Abbreviation contains three parts which are acronyms (الأوائلية اللفظة), clipping (الإجتزاء) and blending
This paper has presented a multiple-ontology query processing method and analyzed case studies on domain-specific, ontology-based query expansion. The use of ontologies for information retrieval, in particular in the area of query expansion, has been presented. Concept-based query expansion that retains the original keywords yields more desirable and useful results. Compound words add complexity to query expansion, and further experiments are desirable to study the effects of using ontologies for query expansion. Finally, further research is outlined on exploiting ontology-based information retrieval in the Cloud.
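Concept-based expansion that keeps the original keywords can be illustrated with a small sketch. It is not the method of the paper summarized above: the ontology here is a hypothetical dictionary mapping a concept to related terms, and `expand_query` is an illustrative name.

```python
# Hypothetical toy ontology: concept -> related terms. A real system would
# draw these from one or more domain ontologies.
ONTOLOGY = {
    "car": ["automobile", "vehicle"],
    "doctor": ["physician"],
}


def expand_query(keywords, ontology=ONTOLOGY):
    """Return the original keywords followed by their related terms, without duplicates."""
    expanded = []
    for keyword in keywords:
        # The original keyword is always kept; related terms are added after it.
        for term in [keyword] + ontology.get(keyword, []):
            if term not in expanded:
                expanded.append(term)
    return expanded


print(expand_query(["car", "rental"]))
# ['car', 'automobile', 'vehicle', 'rental']
```

Keywords with no entry in the ontology, such as "rental" here, pass through unchanged, so the expanded query never loses a term the user typed.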