The Importance Of Data Integration In Data Mining

Data Integration:

Data mining often requires data integration, which means merging data from multiple data stores. Careful integration helps reduce and avoid redundancies and inconsistencies in the resulting dataset. This in turn improves the accuracy and speed of the subsequent data mining process. Semantic heterogeneity and differences in the structure of data pose great challenges in data integration.

Entity Identification Problem:

A data analysis task often involves data integration, which combines data from multiple sources into a coherent data store. These sources may include multiple databases, …
Metadata can also be used to help transform the data. When matching attributes from one database to another during data integration, special attention must be paid to the structure of the data. This is to ensure that any attribute functional dependencies and referential constraints in the source system match those in the target system.
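
As an illustration of the structural check described above, here is a minimal Python sketch that tests whether a functional dependency holds in a set of rows. The attribute names (zip, city), the sample rows, and the function name are hypothetical and chosen only for this example. A real integration would also compare the declared constraints in each system's schema.

```python
def functional_dependency_holds(rows, lhs, rhs):
    """Return True if every value of `lhs` maps to exactly one value of `rhs`."""
    seen = {}
    for row in rows:
        key = row[lhs]
        value = row[rhs]
        # A second, different rhs value for the same lhs value breaks lhs -> rhs.
        if key in seen and seen[key] != value:
            return False
        seen[key] = value
    return True


# Hypothetical rows from a source system and a target system.
source_rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
]
target_rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},  # violates zip -> city
]

source_ok = functional_dependency_holds(source_rows, "zip", "city")
target_ok = functional_dependency_holds(target_rows, "zip", "city")
print(source_ok, target_ok)
```

If the dependency holds in the source but not in the target, the mismatch is a sign that the attributes were matched incorrectly or that the target data needs cleaning first.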

Redundancy and Correlation Analysis:

In data integration, one of the most important issues is redundancy. An attribute may be redundant if it can be derived from another attribute or set of attributes. Inconsistencies in attribute or dimension naming can also cause redundancies in the resulting dataset. Some redundancies can be detected by correlation analysis.
For example, given two attributes, such analysis can measure how strongly one attribute implies the other, based on the available data. For nominal data, we use the χ² (chi-square) test. For numeric attributes, we can use the correlation coefficient and covariance, both of which assess how one attribute’s values vary from those of another.
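
The measures named above can be sketched in a few lines of Python. This is only an illustration using the standard textbook definitions (Pearson chi-square statistic, population covariance, Pearson correlation coefficient). The function names and the sample data are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def chi_square(pairs):
    """Pearson chi-square statistic for two nominal attributes.

    `pairs` is a list of (a_value, b_value) tuples, one per record.
    """
    n = len(pairs)
    a_counts = Counter(a for a, _ in pairs)
    b_counts = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for a, a_n in a_counts.items():
        for b, b_n in b_counts.items():
            expected = a_n * b_n / n  # count expected if A and B were independent
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def covariance(xs, ys):
    """Population covariance of two numeric attributes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation coefficient: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical nominal data: the two attributes always occur together.
dependent = [("a", "x")] * 20 + [("b", "y")] * 20
# Hypothetical nominal data: every combination is equally common.
independent = [(a, b) for a in "ab" for b in "xy"] * 10

# Hypothetical numeric data: the same price stored in dollars and in cents.
price_usd = [1.0, 2.0, 3.0, 4.0]
price_cents = [100.0, 200.0, 300.0, 400.0]

print(chi_square(dependent), chi_square(independent))
print(correlation(price_usd, price_cents))
```

A large chi-square value, or a correlation coefficient close to +1 or -1, suggests that one of the two attributes may be redundant. Deciding whether a chi-square value is significant still requires comparing it against the chi-square distribution for the appropriate degrees of freedom, which this sketch leaves out.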

Another source of data redundancy is the use of denormalized tables. Inconsistencies often arise between duplicates because of inaccurate data entry.

Data Value Conflict Detection and Resolution:

Data integration also involves the detection and resolution of data value conflicts. For example, for the same real-world entity, attribute values from different sources may differ. These differences may be due to differences in representation, scaling, or encoding. Attributes may also differ in abstraction level, where an attribute in one system is recorded at a lower abstraction level than the same attribute in another.
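
A small Python sketch can make scaling and encoding conflicts concrete. Everything here is hypothetical: the record layout, the gender codes, and the tolerance are assumptions chosen for the example, not part of any particular system. The idea is to bring both records to a common representation first and only then compare them.

```python
LB_TO_KG = 0.45359237  # kilograms per international pound

# Hypothetical encodings used by two different source systems.
GENDER_CODES = {"M": "male", "F": "female", "1": "male", "2": "female"}


def normalize(record):
    """Convert a source record to a common scale (kg) and a common encoding."""
    weight = record["weight"]
    if record["weight_unit"] == "lb":
        weight = weight * LB_TO_KG
    return {
        "id": record["id"],
        "weight_kg": round(weight, 2),
        "gender": GENDER_CODES[record["gender"]],
    }


def find_conflicts(a, b, tolerance=0.5):
    """List the attributes on which two normalized records still disagree."""
    conflicts = []
    if abs(a["weight_kg"] - b["weight_kg"]) > tolerance:
        conflicts.append("weight_kg")
    if a["gender"] != b["gender"]:
        conflicts.append("gender")
    return conflicts


# The same real-world entity as recorded by two hypothetical sources.
source_a = {"id": 7, "weight": 70.0, "weight_unit": "kg", "gender": "M"}
source_b = {"id": 7, "weight": 154.3, "weight_unit": "lb", "gender": "1"}

conflicts = find_conflicts(normalize(source_a), normalize(source_b))
print(conflicts)
```

Compared as raw values, the two records disagree on both weight and gender. After normalization, the remaining difference in weight falls within the tolerance, so no conflict is reported.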

Association Rule:

The main goal of association rule mining is to extract interesting correlations and relationships among items in large datasets.

Association rule mining:

Association rule mining is the task of finding frequent patterns and associations among sets of items or objects in transaction databases, relational databases, and other information repositories.
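
To make the idea of frequent patterns concrete, the following Python sketch uses the standard support and confidence measures, which this essay does not define. It enumerates candidate itemsets by brute force, which is only practical for tiny examples and is not an efficient algorithm such as Apriori. The transactions and function names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets whose support reaches `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the share that also contain `consequent`."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

frequent = frequent_itemsets(transactions, min_support=0.5)
bread_to_milk = confidence({"bread"}, {"milk"}, transactions)
print(frequent)
print(bread_to_milk)
```

In this toy dataset, bread and milk appear together in half of the transactions, and two out of the three transactions that contain bread also contain milk.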
