Essay On Feature Selection


The algorithms described in this chapter are known to the programming community as feature selection algorithms. The subject has been studied by researchers for decades, and a large number of methods have been proposed. Throughout this chapter, and for the remainder of the thesis, the terms attribute and feature are used interchangeably to refer to the predictor variables. In this framework, dimensionality reduction techniques are typically made up of two basic components [3], [9], [21]:

Evaluation criterion: a measure to assess the relevance of an attribute subset.
Search strategy: a procedure to generate candidate subsets for evaluation.

Once an evaluation criterion has been decided upon, feature selection becomes a combinatorial search problem: candidate subsets of attributes are generated and evaluated over many iterations until one is found that classifies the data correctly while reducing the dimensionality of the sample space. Search strategies are generally divided into three categories [3], [21] (a sketch of a heuristic strategy follows the list):

Complete (exhaustive, best-first, branch and bound, beam search, etc.)
Heuristic (forward/backward sequential selection, greedy, etc.)
Stochastic (simulated annealing, genetic algorithms, etc.)
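
To make the heuristic category concrete, the following is a minimal sketch of greedy forward sequential selection in Python. The scoring function stands in for whatever evaluation criterion has been chosen; the function name and the toy criterion below are illustrative assumptions, not algorithms from this thesis.

    from typing import Callable, FrozenSet, Iterable

    def forward_sequential_selection(
        features: Iterable[str],
        score: Callable[[FrozenSet[str]], float],
    ) -> FrozenSet[str]:
        # Grow the subset one attribute at a time, keeping the addition
        # that most improves the evaluation criterion; stop as soon as
        # no single addition helps.
        remaining = set(features)
        selected: FrozenSet[str] = frozenset()
        best = score(selected)
        while remaining:
            cand, cand_score = max(
                ((f, score(selected | {f})) for f in remaining),
                key=lambda pair: pair[1],
            )
            if cand_score <= best:
                break
            selected, best = selected | {cand}, cand_score
            remaining.discard(cand)
        return selected

    # Toy criterion: pretend attributes "a" and "c" are the relevant ones.
    relevance = {"a": 0.6, "c": 0.3}
    toy_score = lambda subset: sum(relevance.get(f, -0.05) for f in subset)
    print(forward_sequential_selection(["a", "b", "c", "d"], toy_score))
    # frozenset({'a', 'c'})

A complete strategy would instead have to consider all 2^d - 1 non-empty subsets of d attributes, which is exactly the combinatorial burden the heuristic and stochastic strategies are designed to avoid.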

Feature weighting, as described by Wettschereck [20], differs from feature selection in that it evaluates attributes individually and assigns each a weight according to some bias, rather than comparing attributes against a user-defined threshold that determines their relevance. Each of the algorithms explored in this chapter uses feature weighting to update the importance of a particular attribute, that is, its impact on the classification problem.
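
The weighting schemes used by those algorithms are introduced later; purely as an illustration of the idea, the sketch below implements a Relief-style update in the spirit of Kira and Rendell, one well-known feature weighting method. The function name, sampling scheme, and parameters are assumptions of this sketch, not a description of the algorithms in this chapter.

    import numpy as np

    def relief_weights(X, y, n_iter=100, seed=0):
        # Relief-style weighting: for a randomly drawn sample, raise the
        # weight of each attribute by how much it separates the sample
        # from its nearest neighbour of a different class (nearest miss)
        # and lower it by how much it differs from the nearest neighbour
        # of the same class (nearest hit).
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iter):
            i = rng.integers(n)
            dists = np.abs(X - X[i]).sum(axis=1)
            dists[i] = np.inf  # never match the sample with itself
            same = y == y[i]
            hit = np.argmin(np.where(same, dists, np.inf))
            miss = np.argmin(np.where(~same, dists, np.inf))
            w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
        return w / n_iter

Attributes that consistently distinguish the classes accumulate positive weight while irrelevant attributes hover near zero, which is the sense in which a weight reflects an attribute's impact on the classification problem.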

2.1.1 Related Work

A theoretical framework presented by Koller and Sahami introd...

... middle of paper ...

When the predictor is ordered, that is, when there is a logical order to the values associated with the attribute, we must split the node in a way that preserves the existing order of the values. For m distinct values this gives m - 1 possible splits. For example, if age is the predictor and the available values are 18 to 21, there are four distinct ordered values; thus m = 4 and there are three possible splits that maintain order: 18 versus 19-21, 18-19 versus 20-21, and 18-20 versus 21 [1].
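
A short sketch makes the count explicit; it simply enumerates the m - 1 order-preserving splits and reproduces the age example above (Python, written for this illustration):

    def ordered_splits(values):
        # Sort the distinct values and cut between consecutive ones,
        # giving the m - 1 splits that preserve the existing order.
        vals = sorted(set(values))
        return [(vals[:i], vals[i:]) for i in range(1, len(vals))]

    print(ordered_splits([18, 19, 20, 21]))
    # [([18], [19, 20, 21]), ([18, 19], [20, 21]), ([18, 19, 20], [21])]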

Categorical predictors carry no ordering requirement. A categorical variable with k categories therefore has 2^(k-1) - 1 possible splits, making the computational burden much heavier. There are also no restrictions on how a categorical predictor may be split, but the theoretical treatment of categorical predictors is peripheral to this thesis.
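
For completeness, the sketch below enumerates every binary split of a categorical predictor and confirms the 2^(k-1) - 1 count; the category names are invented for the example:

    from itertools import combinations

    def categorical_splits(categories):
        # Fix one category on the left side so each unordered split is
        # generated exactly once, then take every subset of the rest.
        cats = list(categories)
        anchor, rest = cats[0], cats[1:]
        splits = []
        for r in range(len(rest) + 1):
            for combo in combinations(rest, r):
                left = {anchor, *combo}
                right = set(cats) - left
                if right:  # skip the split with an empty side
                    splits.append((left, right))
        return splits

    print(len(categorical_splits(["red", "green", "blue", "grey"])))
    # 7 == 2**(4 - 1) - 1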
