Google File Systems (GFS) and Big Data Technologies (Non-Relational Databases)


Google File System (GFS) was developed by Google to meet the company's rapidly growing data processing needs. Hadoop Distributed File System (HDFS), on the other hand, was developed by Yahoo and is maintained by Apache as an open source framework that serves different clients with different needs. Though GFS and HDFS are distributed file systems developed by different vendors, they have been designed to meet the following goals:

• They should be able to run on inexpensive commodity hardware and tolerate the component failures that such hardware makes routine.

• The systems should be able to manage huge files efficiently.

• The system should be scalable, reliable and have high throughput.

• The system should be able to support large streaming reads as well as large concurrent appends to the same file.

The common and distinguishing features of Google File System (GFS) and Hadoop Distributed File System (HDFS) are as follows:

GFS file content is divided into 64 MB chunks, and each chunk is subdivided into 64 KB blocks. A chunk is identified by its chunk handle, and each chunk is replicated three times by default. Each block in a chunk has its own 32-bit checksum. HDFS file content is divided into 128 MB blocks. A datanode stores each block replica as two files, one for the data and another for the checksums and generation stamp.
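The sizes above imply some simple arithmetic, sketched here in Python. The function names are hypothetical, and CRC-32 from the standard `zlib` module is only an assumption standing in for whatever 32-bit checksum GFS actually uses.

```python
import zlib

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB GFS chunk
BLOCK_SIZE = 64 * 1024          # 64 KB checksum block
CHECKSUM_BYTES = 4              # one 32-bit checksum per block


def locate(offset):
    """Map a byte offset in a file to (chunk index, block index within the chunk)."""
    chunk_index, within_chunk = divmod(offset, CHUNK_SIZE)
    return chunk_index, within_chunk // BLOCK_SIZE


def block_checksums(data):
    """Return one 32-bit checksum per 64 KB block (CRC-32 is an assumption)."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


blocks_per_chunk = CHUNK_SIZE // BLOCK_SIZE             # 1024 blocks
checksum_overhead = blocks_per_chunk * CHECKSUM_BYTES   # 4096 bytes per chunk
```

Under these figures a full chunk carries 1024 checksums, or about 4 KB of checksum data for every 64 MB of file data.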

In GFS, the client receives a read request from the application and sends it to the master; the master then replies with the corresponding chunk handle and replica locations. The client uses this information to get the required data from the replicas. The replicas are divided into primary and secondary while performing a write oper...
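The read path described above can be sketched as a toy in-memory model. The `Master`, `ChunkServer` and `client_read` names are hypothetical and only illustrate the division of labour: metadata comes from the master, data comes from a replica. Networking, caching and reads that span several chunks are left out.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as in the text


class Master:
    """Hypothetical master: holds only metadata, never file data."""

    def __init__(self):
        # (file name, chunk index) -> (chunk handle, replica locations)
        self.chunk_table = {}

    def lookup(self, filename, chunk_index):
        return self.chunk_table[(filename, chunk_index)]


class ChunkServer:
    """Hypothetical chunkserver: stores chunk bytes keyed by chunk handle."""

    def __init__(self):
        self.chunks = {}

    def read(self, handle, start, length):
        return self.chunks[handle][start:start + length]


def client_read(master, servers, filename, offset, length):
    """Read bytes that lie within a single chunk."""
    chunk_index, start = divmod(offset, CHUNK_SIZE)
    handle, locations = master.lookup(filename, chunk_index)  # metadata only
    replica = servers[locations[0]]                           # any replica will do
    return replica.read(handle, start, length)                # data from replica


# Usage: one file with a single chunk replicated on three servers.
master = Master()
servers = {name: ChunkServer() for name in ("cs1", "cs2", "cs3")}
master.chunk_table[("/logs/a", 0)] = ("handle-0", ["cs1", "cs2", "cs3"])
for server in servers.values():
    server.chunks["handle-0"] = b"hello gfs"

result = client_read(master, servers, "/logs/a", 6, 3)
print(result)  # b'gfs'
```

Keeping file data off the master in this way is what lets a single master serve many clients without becoming a bandwidth bottleneck.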

... middle of paper ...

...tions of the system.

GFS and HDFS have a special functionality called snapshot, by which they can quickly make a copy of a file or directory tree at any time. This is similar to the copy-on-write functionality of the Andrew File System (AFS).
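The copy-on-write idea behind a fast snapshot can be illustrated with a small reference-counting sketch. The `SnapshotStore` class is hypothetical and is not how either system is implemented; it only shows why a snapshot is cheap (metadata is copied, data is shared) and when the real copy happens (on the first write to a shared chunk).

```python
class SnapshotStore:
    """Toy store: files map to lists of chunk ids; chunks are reference counted."""

    def __init__(self):
        self.files = {}    # file name -> list of chunk ids
        self.chunks = {}   # chunk id -> bytes
        self.refs = {}     # chunk id -> number of files pointing at it
        self.next_id = 0

    def _new_chunk(self, data):
        cid = self.next_id
        self.next_id += 1
        self.chunks[cid] = data
        self.refs[cid] = 1
        return cid

    def create(self, name, chunk_data):
        self.files[name] = [self._new_chunk(d) for d in chunk_data]

    def snapshot(self, src, dst):
        """Copy only metadata; both files now share the same chunks."""
        self.files[dst] = list(self.files[src])
        for cid in self.files[dst]:
            self.refs[cid] += 1

    def write(self, name, index, data):
        """Copy a shared chunk before the first write to it."""
        cid = self.files[name][index]
        if self.refs[cid] > 1:
            self.refs[cid] -= 1
            cid = self._new_chunk(self.chunks[cid])
            self.files[name][index] = cid
        self.chunks[cid] = data

    def read(self, name, index):
        return self.chunks[self.files[name][index]]


store = SnapshotStore()
store.create("/data", [b"aaa", b"bbb"])
store.snapshot("/data", "/data.snap")
store.write("/data", 0, b"zzz")   # triggers a copy of chunk 0 only
```

After the write, `/data` sees the new contents while `/data.snap` still sees the old ones, and the untouched second chunk remains shared between the two.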

GFS aims at reducing real-time work to big batch operations, while HDFS aims at developing a true secondary namenode, as in Facebook's Avatar node.

