Analyzing The ID3 Algorithm For Reading Data Stored On Multiple Data Sources

Length: 840 words (2.4 double-spaced pages)

This project implements the ID3 algorithm for reading data stored in multiple data sources. It falls under the broader topic of data mining, which is the extraction and processing of useful information from different sources; in essence, data mining is the search for required or useful data within a large database. When the outcome is categorical (for example, a yes/no decision), a decision tree is commonly used for the analysis. Decision trees have the advantage of being easy to build, analyse, and modify. The ID3 algorithm generates a decision tree from a given dataset.
The ID3 algorithm constructs a decision tree from the given dataset. The internal nodes of the tree test attributes of the data, the branches correspond to the possible values of those attributes, and the leaves hold the outcomes featured in the dataset. The speaker identifies two important terms: entropy and information gain. Entropy comes from information theory and is defined as the average amount of information carried by each message at the receiver. Informally, entropy can be understood as impurity: the more evenly a set of examples is mixed across classes, the higher its entropy, and the more information is needed on average to determine an example's class. The reduction in entropy obtained by splitting the data on an attribute is termed information gain. When constructing the decision tree, the aim at each node is to find the attribute that returns the highest information gain.
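The two quantities can be written as H(S) = −Σ p_i log₂ p_i, where p_i is the fraction of examples in S with class i, and Gain(S, A) = H(S) − Σ_v (|S_v| / |S|) H(S_v), where S_v is the subset of S whose attribute A has value v. The sketch below is a minimal Python rendering of these two formulas. It assumes each training example is a dict mapping column names to values; the function names and the toy rows are hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(S) = -sum(p * log2(p)), in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Gain(S, A): entropy of the target before the split minus the weighted entropy after it."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


# Hypothetical toy rows: humidity separates the labels perfectly, temperature does not.
rows = [
    {"temperature": "Hot", "humidity": "High", "play": "No"},
    {"temperature": "Mild", "humidity": "High", "play": "No"},
    {"temperature": "Hot", "humidity": "Normal", "play": "Yes"},
    {"temperature": "Mild", "humidity": "Normal", "play": "Yes"},
]
print(entropy([r["play"] for r in rows]))             # 1.0 (two classes, evenly mixed)
print(information_gain(rows, "humidity", "play"))     # 1.0
print(information_gain(rows, "temperature", "play"))  # 0.0
```

An evenly mixed two-class set has the maximum entropy of 1 bit, a pure set has 0, and an attribute that leaves every branch as mixed as the parent yields no gain.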
The presenter explains that the ID3 algorithm accepts the training data and a list of attributes as input and returns a decision tree as output. The procedure may be summarised as follows. First, the entropy that results from splitting the dataset on each attribute is calculated. The attribute with minimum entropy is used as reference and ...
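A minimal sketch of the usual recursive form of ID3 is shown below, assuming the same dict-per-row representation as above. Choosing the attribute with the lowest weighted entropy after the split is equivalent to choosing the one with the highest information gain, since the parent's entropy is the same for every candidate. The names `id3` and `split_entropy`, the nested-dict tree shape, the majority-label fallback, and the toy rows are all illustrative assumptions, not the project's code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def split_entropy(rows, attribute, target):
    """Weighted entropy of the target labels after partitioning rows on `attribute`."""
    result = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        result += len(subset) / len(rows) * entropy(subset)
    return result


def id3(rows, attributes, target):
    """Build a tree: a leaf is a class label, an internal node is {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:  # pure node: every example has the same label
        return labels[0]
    if not attributes:  # nothing left to split on: fall back to the majority label
        return Counter(labels).most_common(1)[0][0]
    # Lowest weighted entropy after the split is the same as highest information gain.
    best = min(attributes, key=lambda a: split_entropy(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, remaining, target)
    return {best: branches}


# Hypothetical toy rows, for illustration only (not the project's data).
rows = [
    {"weather": "Sunny", "humidity": "High", "play": "No"},
    {"weather": "Sunny", "humidity": "High", "play": "No"},
    {"weather": "Sunny", "humidity": "Normal", "play": "Yes"},
    {"weather": "Overcast", "humidity": "High", "play": "Yes"},
    {"weather": "Rain", "humidity": "Normal", "play": "Yes"},
    {"weather": "Rain", "humidity": "High", "play": "No"},
]
print(id3(rows, ["weather", "humidity"], "play"))
```

On these rows, splitting on humidity leaves about 0.54 bits of weighted entropy versus about 0.79 for weather, so humidity becomes the root; the "Normal" branch is already pure, and the "High" branch is split further on weather.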

... middle of paper ...

... It is commonly utilised by the machine learning community for the empirical analysis of learning algorithms and as a source of datasets.
The implementation uses the “Whether to play Tennis” example. Its dataset consists of attributes such as temperature, humidity, and weather. Each attribute value is tagged with a row number, termed “rownum”, that identifies the record it belongs to. For each combination of attribute values, the “Whether to play Tennis” column holds a binary label: “Yes” or “No”.
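One way to picture the role of “rownum” when the attributes live in different databases is to join the sources on that key before running ID3. The sketch below is a hypothetical illustration only: two in-memory SQLite databases (linked with SQLite's `ATTACH DATABASE`) stand in for the SQL Server databases, and the table names `weather` and `conditions`, the schema name `source_b`, the function `load_training_rows`, and the rows themselves are all assumptions made up for the example.

```python
import sqlite3


def load_training_rows(conn):
    """Join attribute columns from two sources on rownum into one dict per record."""
    cursor = conn.execute(
        """
        SELECT w.rownum AS rownum, w.weather AS weather, w.temperature AS temperature,
               c.humidity AS humidity, c.play AS play
        FROM main.weather AS w
        JOIN source_b.conditions AS c ON w.rownum = c.rownum
        ORDER BY w.rownum
        """
    )
    columns = [d[0] for d in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical setup: the "main" database holds weather attributes, and an attached
# database "source_b" holds humidity plus the play/no-play label, both keyed by rownum.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS source_b")
conn.execute("CREATE TABLE main.weather (rownum INTEGER PRIMARY KEY, weather TEXT, temperature TEXT)")
conn.execute("CREATE TABLE source_b.conditions (rownum INTEGER PRIMARY KEY, humidity TEXT, play TEXT)")
conn.executemany(
    "INSERT INTO main.weather VALUES (?, ?, ?)",
    [(1, "Sunny", "Hot"), (2, "Overcast", "Mild"), (3, "Rain", "Cool")],
)
conn.executemany(
    "INSERT INTO source_b.conditions VALUES (?, ?, ?)",
    [(1, "High", "No"), (2, "High", "Yes"), (3, "Normal", "Yes")],
)

rows = load_training_rows(conn)
for row in rows:
    print(row)
```

An inner join silently drops any rownum that is missing from one of the sources, so a real pipeline would likely check for such gaps before training.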
The speaker concludes the presentation by stating that the project builds a decision tree using the ID3 algorithm and derives a set of rules from it. The primary focus is on data stored across multiple SQL Server databases. It is also worth mentioning the importance of validating the attributes and of pruning the decision tree when the model is complex; if these steps are neglected, the results may not be coherent.
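Rules can be read off a decision tree by turning each root-to-leaf path into one IF-THEN statement. The sketch below assumes the tree is a nested dict of the form {attribute: {value: subtree}} with class labels at the leaves; that representation, the function names, the target name `play`, and the example tree are hypothetical choices for illustration.

```python
def tree_to_rules(tree, conditions=()):
    """Turn each root-to-leaf path of a nested-dict tree into a (conditions, label) rule."""
    if not isinstance(tree, dict):  # a leaf holds the predicted class label
        return [(list(conditions), tree)]
    attribute, branches = next(iter(tree.items()))  # one attribute is tested per node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules


def format_rule(conditions, label, target="play"):
    """Render a rule as IF ... THEN ... text; a bare leaf becomes an unconditional rule."""
    clause = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
    return f"IF {clause} THEN {target} = {label}"


# Hypothetical tree in the {attribute: {value: subtree}} shape; leaves are labels.
tree = {"humidity": {"High": {"weather": {"Overcast": "Yes", "Rain": "No", "Sunny": "No"}},
                     "Normal": "Yes"}}
for conditions, label in tree_to_rules(tree):
    print(format_rule(conditions, label))
```

Each leaf yields exactly one rule, so pruning the tree (collapsing subtrees into leaves) directly shortens and reduces the rule set.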
