How Hadoop Saved the World


Table of Contents

List of Figures

Literature Review

History of Hadoop Technology

Applications of Hadoop

Main Components of Hadoop

MapReduce

Map Step

Reduce Step

Hadoop Distributed File System (HDFS)

Advantages and Disadvantages of using Hadoop

Advantages

Disadvantages

Competitors to the Hadoop Technology

Conclusion

References

List of Figures

Figure 1: MapReduce Programming Model

Figure 2: HDFS architecture

Figure 3: HDFS Operation Process

Literature Review

Apache Hadoop is a free, Java-based programming framework that supports the processing of large data sets across distributed computing environments; in other words, it is a cluster-computing framework, released under a free license, for data-intensive distributed applications. The project was inspired by the Google File System and Google's MapReduce papers (Eadline, 2013). According to Eadline (2013), Hadoop was designed to provide fast and reliable analysis of both complex and clustered data. Consequently, many enterprises deployed Hadoop alongside their existing IT systems, allowing them to combine old and new data within a single framework. Major industrial players using the technology include IBM, Yahoo, and Google (Lavalle, Lesser, Shockley, Hopkins & Kruschwitz, 2011).
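The MapReduce model at the heart of Hadoop can be illustrated with a small word-count simulation in plain Python. This is only a sketch of the programming model, not the Hadoop Java API; the function names here are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    # Map step: emit a (word, 1) pair for every word in the input split.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle step: group all emitted values by their key,
    # mimicking how the framework routes pairs to reducers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce step: combine all values for one key into a single result.
    return key, sum(values)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["the"])  # "the" appears three times across the documents
```

In a real Hadoop cluster the map and reduce functions run in parallel on many machines, and the shuffle is performed by the framework over the network; the logic of each phase, however, is exactly as shown.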

History of Hadoop Technology

Hadoop was created and developed by Doug Cutting, who is also recognized as the creator of Apache Lucene, a widely used text-search library. Hadoop grew out of Apache Nutch, an open-source web search engine that was itself a Lucene subproject. According to Cutting, the name Hadoop is not an acronym but simply a made-up name; likewise, the 'contrib' module and other subprojects were given names largely unrelated to their functions (Krishnan, 2013).

According to Lynch (2008), building a web search engine from scratch was an ambitious objective, given both the software required and the cost of indexing the web. Development was expensive, but Doug Cutting and Mike Cafarella believed the project was worth the cost, since its success would ultimately help democratize search-engine algorithms. Nutch was started in 2002, and a working crawler and search system soon emerged.

However, the developers soon realized that the Nutch architecture could not scale to the billions of pages on the web. Help arrived in 2003, when Google published a paper on the Google File System (GFS), describing the production storage architecture it used to hold the very large files generated by its web crawling and indexing.
