Nt1330 Unit 5 Algorithm

The training dataset S is a vector whose elements xi ∈ S represent attributes or features of the sample dataset. The training data are drawn from the varying classes from which each sample is extracted. At any given node of the tree generated by C4.5, the algorithm selects the attribute of the data that most effectively divides the set of samples S into subsets belonging to one class or the other. The splitting criterion is the normalized information gain, that is, the reduction in entropy expected from splitting the data on a given attribute. The attribute with the highest normalized information gain is chosen to make the decision at that node. The C4.5 algorithm then continues on the smaller sub-lists, again selecting the attribute with the highest normalized information gain, until the decision tree is completely formed.
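The following is a minimal sketch, not taken from the source, of how this splitting criterion can be computed in Python: the information gain of an attribute divided by its split information (the "gain ratio" used by C4.5). The function names and the assumption that samples are rows of categorical values are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(samples, labels, attribute_index):
    """Normalized information gain: information gain divided by split information."""
    total = len(samples)
    # Partition the class labels by the value of the chosen attribute.
    partitions = {}
    for sample, label in zip(samples, labels):
        partitions.setdefault(sample[attribute_index], []).append(label)
    # Expected entropy after the split, weighted by subset size.
    remainder = sum(len(subset) / total * entropy(subset) for subset in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum((len(s) / total) * math.log2(len(s) / total) for s in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0
```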

3.2.4 Improved C4.5 Algorithm
1. Select the dataset as input to the algorithm for processing.
2. Create a root node for the tree.
3. If all examples in the dataset are positive, return the single-node tree Root, with label = +.
4. If all examples in the dataset are negative, return the single-node tree Root, with label = -.
5. If the set of attributes is empty, return the single-node tree Root, with label = the most common value of Target_attribute in the dataset S (a code sketch of these steps follows the list).
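Below is a hedged sketch, under stated assumptions, of how steps 1 to 5 can be realized as a recursive tree-building routine. It reuses the `gain_ratio` helper sketched earlier; the name `build_tree` and the tuple-based node representation are illustrative choices, not the source's implementation.

```python
from collections import Counter

def build_tree(samples, labels, attributes):
    """Return a leaf label, or a (attribute_index, {value: subtree}) decision node."""
    # Steps 3-4: all samples share one class, so return a single-node leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Step 5: no attributes remain, so return the most common class label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the highest normalized information gain.
    best = max(attributes, key=lambda a: gain_ratio(samples, labels, a))
    branches = {}
    for value in set(s[best] for s in samples):
        subset = [(s, l) for s, l in zip(samples, labels) if s[best] == value]
        sub_samples, sub_labels = zip(*subset)
        remaining = [a for a in attributes if a != best]
        branches[value] = build_tree(list(sub_samples), list(sub_labels), remaining)
    return (best, branches)
```

The chosen attribute is removed from the candidate set before recursing, so each smaller sub-list is split on the remaining attribute with the highest normalized information gain, as described above.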

Each path from the root of the tree to a leaf gives the conditions that must be satisfied if a case is to be classified by that leaf. C4.5 generalizes this prototype rule by dropping any conditions that are irrelevant to the class, guided again by the heuristic for estimating true error rates. The set of rules is reduced further based on the MDL principle described above. There are usually substantially fewer final rules than there are leaves on the tree, and yet the accuracy of the tree and of the derived rules is similar. Rules have the added advantage of being more easily understood by people.
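The sketch below illustrates the rule-simplification idea in a simplified form; it is an assumption for illustration, not C4.5's exact pessimistic-error pruning. A rule is a list of (attribute_index, value) conditions plus a target class, and any condition whose removal does not lower the rule's accuracy on the covered training cases is dropped.

```python
def covers(conditions, sample):
    """True if the sample satisfies every condition of the rule."""
    return all(sample[attr] == value for attr, value in conditions)

def rule_accuracy(conditions, target_class, samples, labels):
    """Fraction of cases covered by the rule that belong to the target class."""
    covered = [l for s, l in zip(samples, labels) if covers(conditions, s)]
    return (sum(l == target_class for l in covered) / len(covered)) if covered else 0.0

def simplify_rule(conditions, target_class, samples, labels):
    """Greedily drop conditions that do not reduce the rule's estimated accuracy."""
    conditions = list(conditions)
    improved = True
    while improved:
        improved = False
        base = rule_accuracy(conditions, target_class, samples, labels)
        for cond in list(conditions):
            trial = [c for c in conditions if c != cond]
            if rule_accuracy(trial, target_class, samples, labels) >= base:
                conditions = trial
                improved = True
                break
    return conditions
```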
