Classifying the Arabic Language Texts Part 1

Several studies and procedures for classifying Arabic-language texts exist, but they were carried out mostly in different environments and did not rely on a unified standard or a unified dataset. As a result, it has been difficult to determine precisely which classification technique is the most accurate. Arabic language processing has also not been studied as thoroughly as the processing of other languages. Finding the roots and stems of Arabic words is an important phase in conducting research on the most effective Arabic NLP applications, so we are interested in applying algorithms to these phases. Arabic has a complex structure, which makes NLP research on it difficult.
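To make the stemming phase concrete, the sketch below is a minimal, simplified Arabic light stemmer: it strips a leading conjunction و, at most one article-like prefix, and a list of common suffixes, without attempting full root extraction. The affix lists and the minimum-length thresholds are illustrative assumptions loosely modeled on published light stemmers; they are not the stemmer used in this thesis.

```python
# Simplified Arabic light stemmer (illustrative sketch; affix lists are assumptions).
# "وال" is not listed because the leading-waw step already removes its و.
PREFIXES = ["بال", "كال", "فال", "لل", "ال"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str) -> str:
    """Return a light stem of an Arabic word by stripping common affixes."""
    # Strip a leading conjunction waw only if at least 3 letters remain.
    if word.startswith("و") and len(word) >= 4:
        word = word[1:]
    # Strip at most one article-like prefix, only if at least 2 letters remain.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # Try each suffix in list order; several may be stripped in sequence,
    # each only if at least 2 letters remain afterwards.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word
```

For example, `light_stem("المعلمون")` reduces "the teachers" to "معلم", and `light_stem("والكتاب")` reduces "and the book" to "كتاب". Light stemming is cheap but imprecise: it can over-strip words whose first letters merely look like affixes, which is one reason root-based and light-stemming approaches are compared in Arabic NLP work.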
This thesis presents a study and analysis of classification algorithms using a unified environment and a single dataset, together with the challenges these algorithms face, in order to demonstrate their effectiveness and accuracy on a large dataset, given the growth of data and the continuous expansion of the Internet.
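One way to keep the comparison "unified" is to fix a single stratified train/test split and reuse it for every algorithm, so that differences in accuracy come from the algorithms rather than from the data. The sketch below assumes a simple interface in which a classifier has `fit(docs, labels)` (returning itself) and `predict(doc)`; the 80/20 ratio, the seed value, and the `MajorityBaseline` placeholder are hypothetical choices for illustration, not the protocol used in this thesis.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) keeping the class mix similar in both parts."""
    rng = random.Random(seed)  # fixed seed so every algorithm sees the same split
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def evaluate(make_classifier, docs, labels, train_idx, test_idx):
    """Fit a fresh classifier on train_idx and return its accuracy on test_idx."""
    clf = make_classifier().fit([docs[i] for i in train_idx],
                                [labels[i] for i in train_idx])
    correct = sum(clf.predict(docs[i]) == labels[i] for i in test_idx)
    return correct / len(test_idx)


class MajorityBaseline:
    """Hypothetical placeholder: always predicts the most frequent training label."""

    def fit(self, docs, labels):
        # Ties go to the label encountered first in the training data.
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, doc):
        return self.label
```

In a full comparison, each algorithm (K-NN, Naive Bayes, and so on) would be wrapped in the same interface and passed to `evaluate` with the same `train_idx` and `test_idx`.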
Several algorithms can be used to classify Arabic texts into groups, which helps retrieve information more quickly and makes searches more accurate; examples include K-NN, Decision Trees, Naive Bayes, and Random Forest.
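As an illustration of one of the listed algorithms, the sketch below is a from-scratch multinomial Naive Bayes classifier with additive (Laplace) smoothing. It assumes documents are already tokenized (for example, split on whitespace and stemmed) and uses `alpha=1.0` as an assumed default; the category names and tokens in the usage are hypothetical and not taken from the Diab dataset.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes over pre-tokenized documents."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.n_docs = len(labels)
        self.word_counts = defaultdict(Counter)      # token counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def _log_posterior(self, tokens, label):
        # log P(class) + sum of log P(token | class), up to a shared constant.
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda c: self._log_posterior(tokens, c))
```

Usage: after `clf = MultinomialNB().fit([["مباراة", "فريق", "هدف"], ["مريض", "طبيب", "علاج"]], ["sports", "medicine"])`, the call `clf.predict(["طبيب", "مريض"])` returns `"medicine"`. Working in log space avoids floating-point underflow when a document contains many tokens.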

We used the Diab dataset, which is structured as follows:
The dataset has nine categories, each containing 300 documents. Each category has its own directory that holds all files belonging to that category. We also built two further collections: the second collection has nine categories, each containing 600 documents, again with one directory per category, and the third dataset has n...
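Because each category is stored in its own directory, a collection can be loaded by treating each subdirectory name as the label and each file inside it as one document. The sketch below assumes that layout and UTF-8 encoding (the actual encoding of the Diab files is an assumption here); `load_corpus` is a hypothetical helper, not part of any dataset distribution.

```python
from pathlib import Path


def load_corpus(root, encoding="utf-8"):
    """Return (texts, labels) for a corpus laid out as root/<category>/<file>."""
    texts, labels = [], []
    # Sort directories and files so the document order is reproducible.
    for category_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for doc_path in sorted(p for p in category_dir.iterdir() if p.is_file()):
            # errors="replace" keeps loading even if a file has stray bytes.
            texts.append(doc_path.read_text(encoding=encoding, errors="replace"))
            labels.append(category_dir.name)
    return texts, labels
```

Calling `collections.Counter(labels)` on the result is a quick way to check that each category in a collection holds the expected number of documents (300 or 600 per category for the first two collections described above).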

... middle of paper ...

...characters in the field of medicine, and it can also appear in another field, such as sports, but with a different meaning.
In the world of computers and the Internet, there must be solutions to these problems; otherwise, searching for and retrieving information on the Internet becomes ineffective and may take a long time to satisfy the user's request.
Information retrieval must be more precise, closely related to the topic the user wants, and focused on the domain the user needs. Broad topics, multiple sources, and large vocabularies increase the complexity of information retrieval, so it is necessary to determine the paths to follow during search or retrieval rather than searching at random, which wastes time on unrelated paths. This is where text classification becomes important, in order to
