Methodology
In this chapter we provide more insight into the Naïve Bayes algorithm and show how the method works. We also look at how our model will be developed, the various data sets that will be used in the process, and how they were chosen. Finally, we look at feature selection and how it will be applied.
THE NAÏVE BAYES CLASSIFIER
Bayes' rule:
P (H | E) = P (E | H) × P (H) / P (E)
The fundamental concept of Bayes' rule is that the probability of a hypothesis or event (H) can be calculated based on the presence of some observed evidence (E). From Bayes' rule, we have:
1. A prior probability of H or P(H): This is the probability of an event before observing the evidence.
2. A posterior probability of H or P(H | E): This is the probability of an event after observing the evidence.
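As a concrete illustration of how the posterior is obtained from the prior, the likelihood, and the evidence probability, the short sketch below works through the arithmetic of Bayes' rule. The probability values are invented purely for illustration.

```python
# Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)
# The numbers below are illustrative values only, not estimates from data.

p_h = 0.3           # prior P(H): probability of the event before seeing evidence
p_e_given_h = 0.8   # likelihood P(E | H)
p_e = 0.4           # evidence probability P(E)

# posterior P(H | E): probability of the event after observing the evidence
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 2))  # 0.6
```

Observing the evidence here doubles the probability assigned to H, from the prior 0.3 to the posterior 0.6.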
For example, to estimate the probability of a mail being classified as belonging to the Human Resources (HR) class, we usually use evidence such as the frequency of words like “Employment”.
Using the equation above, let ‘HR’ be the event of a mail belonging to HR and ‘Employment’ be the evidence that the word Employment occurs in the mail. Then we have

P (HR | Employment) = P (Employment | HR) × P (HR) / P (Employment)
P (Employment | HR) is the probability that the word Employment occurs in a mail belonging to HR. Of course, “Employment” could occur in many other mail classes, such as Joint Venture or Procurement and Contracting, but here we only consider “Employment” in the context of the class HR. This probability can be estimated from historical mail collections.
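The estimates above can be obtained by simple counting over a labeled mail collection. The sketch below shows this on a tiny invented corpus; the class labels and mail texts are made up for illustration only.

```python
# Estimating the probabilities in Bayes' rule from a toy labeled mail
# collection. The corpus below is invented purely for illustration.

corpus = [
    ("HR", "employment contract for new hire"),
    ("HR", "employment benefits update"),
    ("Procurement", "contract renewal for supplier"),
    ("Joint Venture", "partnership agreement draft"),
]

word = "employment"
cls = "HR"

n_mails = len(corpus)
n_cls = sum(1 for label, _ in corpus if label == cls)
n_word = sum(1 for _, text in corpus if word in text.split())
n_word_in_cls = sum(1 for label, text in corpus
                    if label == cls and word in text.split())

p_cls = n_cls / n_mails                   # prior P(HR)
p_word = n_word / n_mails                 # evidence P(Employment)
p_word_given_cls = n_word_in_cls / n_cls  # likelihood P(Employment | HR)

# Bayes' rule: posterior P(HR | Employment)
p_cls_given_word = p_word_given_cls * p_cls / p_word
print(p_cls_given_word)  # 1.0, since "employment" occurs only in HR mails here
```

In a realistic corpus the word would also occur in other classes, so the evidence probability would be larger and the posterior would fall below 1.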
P (HR) is the prior probability of the HR class. This probability can be estimated from r...
... middle of paper ...
...st results. Because no information about the test set was used in developing the classifier, the results of this experiment should be indicative of actual performance in practice.
It is highly important not to look at the test data while developing the classifier, and to run systems on it as sparingly as possible. Violating this rule invalidates your results, because by running many variant systems and keeping the tweaks that work best on the test set you have implicitly tuned your system to the test data.
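This discipline can be sketched as a single held-out split made before any development work begins. The placeholder mail labels below are hypothetical; the point is only that the test portion is set aside once and evaluated on once.

```python
import random

# Hold out a test set before any development work, and touch it only once.
# The labeled examples below are placeholders for a real mail collection.

data = [("mail_%d" % i, "HR" if i % 2 == 0 else "Other") for i in range(100)]

rng = random.Random(42)  # fixed seed so the split is reproducible
rng.shuffle(data)

split = int(0.8 * len(data))
train_set = data[:split]  # used freely for development and tuning
test_set = data[split:]   # used only once, for the final evaluation

print(len(train_set), len(test_set))  # 80 20
```

All model selection and tweaking happens on `train_set` (or a further split of it); `test_set` is consulted only for the final reported numbers.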
Feature Selection
CONCLUSION
In this chapter we have described the Naïve Bayes theory and how we built the classifier. In the next chapter we will take a closer look at the training set and test set, and carry out an evaluation of the classifier we developed.
Support Vector Machine (SVM): Over the past several years there has been a significant amount of research on support vector machines, and today SVM applications are becoming more common in text classification. In essence, support vector machines define hyperplanes that try to separate the values of a given target field. The hyperplanes are defined using kernel functions; the most popular kernel types are linear, polynomial, radial basis and sigmoid. Support vector machines can be used for both classification and regression. Several characteristics have been observed in vector-space-based methods for text classification [15, 16], including the high dimensionality of the input space, the sparsity of document vectors, the linear separability of most text classification problems, and the belief that few features are relevant.
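The four kernel types named above can be written as plain functions on feature vectors. In the sketch below, the parameter names (gamma, coef0, degree) and their default values follow common SVM library conventions and are illustrative assumptions, not values prescribed by this work.

```python
import math

# The four popular SVM kernel types, written as plain functions on vectors.
# Parameter names (gamma, coef0, degree) follow common library conventions.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # radial basis function: decays with squared Euclidean distance
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

x, y = [1.0, 0.0, 2.0], [0.5, 1.0, 1.0]
print(linear_kernel(x, y))  # 2.5
```

Each kernel implicitly measures similarity in a different feature space, which is what lets the separating hyperplane be non-linear in the original input space.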
In the clustering process, semi-supervised learning is a class of machine learning methods that makes use of both labeled and unlabeled data for training, characteristically a small quantity of labeled data together with a large quantity of unlabeled data. Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Feature selection encompasses identifying a subset of the most beneficial features that yields results comparable to the original full set of features. A feature selection algorithm may be appraised from both the efficiency and the effectiveness points of view: efficiency concerns the time necessary to discover a subset of features, while effectiveness relates to the quality of that subset. Traditional methodologies for clustering data are based on metric similarities, i.e., measures that are non-negative, symmetric, and satisfy the triangle inequality; in this project a graph-based algorithm replaces this process, since more recent approaches, such as the Affinity Propagation (AP) algorithm, can also take general non-metric similarities as input.
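A minimal feature selection sketch is shown below: each word is scored by how frequent and how concentrated in a single class it is, a crude stand-in for measures such as information gain, and only the top-scoring words are kept. The toy documents and the scoring formula are invented for illustration and are not the specific method used in this work.

```python
# Feature selection sketch: score each word by how unevenly it is
# distributed across classes, then keep the top-scoring words.
# The documents and scoring rule are illustrative only.

docs = [
    ("HR", "employment benefits policy"),
    ("HR", "employment leave request"),
    ("Procurement", "supplier invoice payment"),
    ("Procurement", "invoice contract payment"),
]

def class_counts(word):
    counts = {}
    for label, text in docs:
        if word in text.split():
            counts[label] = counts.get(label, 0) + 1
    return counts

vocab = {w for _, text in docs for w in text.split()}

def score(word):
    counts = class_counts(word)
    total = sum(counts.values())
    # Highest when the word is frequent and concentrated in one class.
    return max(counts.values()) ** 2 / total

top = sorted(vocab, key=score, reverse=True)[:3]
print(sorted(top))  # ['employment', 'invoice', 'payment']
```

Words that occur often but only within one class (here "employment" for HR, "invoice" and "payment" for Procurement) are exactly the ones a classifier benefits from keeping.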
This refers to assembling and examining statistical information relevant to the human resource system, drawn either from world-class organizations overall or from the closest equivalent companies globally.
Given a message, classify whether the message is of positive, negative, or neutral sentiment. For messages conveying both a positive and a negative sentiment, whichever sentiment is stronger should be chosen.
We represent these outcomes by a set called a sample space. For a coin, the sample space consists of two outcomes: heads and tails.
Future research is needed to compare the classification abilities of this method with other case-based classification methods in various situations, and to explore further ways of evaluating customer reviews using the DRSA method and the 4emka2 decision software.
Text mining is used in the medical domain for the identification and classification of technical terms within biological science that correspond to domain concepts.
Thus we maximize P(Ci | X). The class Ci for which P(Ci | X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem, P(Ci | X) = P(X | Ci) × P(Ci) / P(X).
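The maximum posteriori decision can be sketched as an argmax over classes; since P(X) is identical for every class, it can be dropped and only P(X | Ci) × P(Ci) compared. The class names and probability values below are illustrative assumptions.

```python
# Maximum a posteriori decision: choose the class Ci maximizing P(Ci | X).
# P(X) is the same for all classes, so maximizing P(X | Ci) * P(Ci) suffices.
# The class names and probabilities below are illustrative only.

priors = {"HR": 0.4, "Procurement": 0.35, "Joint Venture": 0.25}        # P(Ci)
likelihoods = {"HR": 0.09, "Procurement": 0.02, "Joint Venture": 0.01}  # P(X | Ci)

scores = {c: likelihoods[c] * priors[c] for c in priors}
best = max(scores, key=scores.get)
print(best)  # HR
```

Dividing each score by their sum would recover the actual posteriors, but the argmax, and therefore the predicted class, is unchanged.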
Machine learning systems can be categorized according to many different criteria. We will discuss three: classification on the basis of the underlying learning strategies used; classification on the basis of the representation of the knowledge or skill acquired by the learner; and classification in terms of the application domain of the performance system for which the knowledge is acquired.
Machine learning is a branch of artificial intelligence that aims at solving real-life engineering problems. It provides the opportunity to learn without being explicitly programmed and is based on the concept of learning from data. It is so ubiquitous that we may use it dozens of times a day without even knowing it. The advantage of machine learning (ML) methods is that they use mathematical models, heuristic learning, knowledge acquisition and decision trees for decision making, and thus provide controllability, observability and stability. Such a system is also easy to update, for example by adding a new patient's record.
This research study is arranged as follows: Section 2 reviews the related work done in this field; Section 3 describes our proposed neuro-fuzzy classification based method; Section 4 explains the methodology in terms of our proposed neuro-fuzzy method, MLP and SVM; Section 5 discusses the classification performance analysis and results; and Section 6 is reserved for the conclusion.
Text mining is a variation on a field called data mining, which tries to find interesting patterns in large databases. Little research has been carried out on text data mining [8]. On the basis of this research, information retrieval techniques such as text indexing, text classification and text summarization have been developed to handle unstructured documents (Soundararajan et al., 2014).