Naïve Bayesian Classification Essay


B. Naïve Bayesian Classification

In machine learning, naïve Bayesian classification is a family of simple probabilistic classifiers based on Bayes' theorem (or Bayes' rule) with naive (strong) independence assumptions between the features. It is one of the most efficient and effective classification algorithms, and it represents both a supervised learning method and a statistical method for classification. Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. In other words, naïve Bayesian classifiers assume that there are no dependencies amongst attributes. This assumption is called class conditional independence. It is made to simplify the computations involved.

P(X) is the prior probability of X [20]. The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:

1. Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector, X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n attributes, respectively, A1, A2, ..., An.

2. Suppose that there are m classes, C1, C2, ..., Cm. Given a tuple X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to the class Ci if and only if P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i. Thus we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum a posteriori hypothesis. By Bayes' theorem, P(Ci|X) = P(X|Ci)P(Ci) / P(X).
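As a rough illustration of these two steps, here is a minimal Python sketch; the class names, prior probabilities, and likelihood values are made-up placeholders, not taken from this essay. It scores each class by P(X|Ci)P(Ci) and predicts the class with the highest score; P(X) can be dropped because it is the same constant for every class.

```python
# Minimal sketch of steps 1 and 2: predict the class Ci maximizing P(X|Ci)P(Ci).
# The class names and probability values below are hypothetical placeholders.

priors = {"buys": 0.64, "does_not_buy": 0.36}         # P(Ci), estimated from training data
likelihoods = {"buys": 0.044, "does_not_buy": 0.019}  # P(X|Ci) for one particular tuple X

def predict(priors, likelihoods):
    # Maximum a posteriori choice: argmax over Ci of P(X|Ci) * P(Ci).
    scores = {c: likelihoods[c] * priors[c] for c in priors}
    return max(scores, key=scores.get), scores

label, scores = predict(priors, likelihoods)
print(label, scores)  # -> 'buys' wins, since 0.044 * 0.64 > 0.019 * 0.36
```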

Given data sets with many attributes, it would be extremely computationally expensive to compute P(X|Ci). In order to reduce computation in evaluating P(X|Ci), the naive assumption of class conditional independence is made. This presumes that the values of the attributes are conditionally independent of one another, given the class label of the tuple (i.e., that there are no dependence relationships among the attributes). Thus, P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci). We can easily estimate the probabilities P(x1|Ci), P(x2|Ci), ..., P(xn|Ci) from the training tuples. Recall that here xk refers to the value of attribute Ak for tuple X. For each attribute, we look at whether the attribute is categorical or continuous-valued. For instance, to compute P(X|Ci), we consider the following:

(a) If Ak is categorical, then P(xk|Ci) is the number of tuples of class Ci in D having the value xk for Ak, divided by |Ci,D|, the number of tuples of class Ci in D.

(b) If Ak is continuous-valued, then we need to do a bit more work, but the calculation is straightforward. A continuous-valued attribute is typically assumed to have a Gaussian distribution with a mean μ and standard deviation σ, so that P(xk|Ci) = g(xk, μCi, σCi), where g(x, μ, σ) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)).
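The short Python sketch below illustrates cases (a) and (b) on a toy, entirely hypothetical training set (the attributes, labels, and values are assumptions for illustration only): P(xk|Ci) is estimated by relative frequency for the categorical attribute and by a Gaussian density for the continuous attribute, and the factors are then multiplied as in the product above.

```python
import math

# Toy training tuples: (categorical attribute A1, continuous attribute A2, class label).
# All values here are made up purely to illustrate the estimates described above.
D = [("yes", 25.0, "C1"), ("yes", 32.0, "C1"), ("no", 30.0, "C1"),
     ("no", 45.0, "C2"), ("yes", 50.0, "C2")]

def categorical_likelihood(value, cls):
    # Case (a): count of class-Ci tuples with Ak = xk, divided by |Ci,D|.
    in_class = [t for t in D if t[2] == cls]
    return sum(1 for t in in_class if t[0] == value) / len(in_class)

def gaussian_likelihood(value, cls):
    # Case (b): Gaussian density using the class's sample mean and standard deviation.
    xs = [t[1] for t in D if t[2] == cls]
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))
    return math.exp(-(value - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Class-conditional likelihood of a new tuple X = ("yes", 35.0) under the independence assumption.
for cls in ("C1", "C2"):
    p = categorical_likelihood("yes", cls) * gaussian_likelihood(35.0, cls)
    print(cls, p)
```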
