The Google File System


This paper presents the Google File System (GFS), which the authors built to handle Google's massive data processing needs. GFS pursues four goals: high performance, scalability, reliability, and availability. These goals are not easy to reach, and there are many obstacles. To deal with component failures, which can affect the system's reliability and availability, the authors rely on constant monitoring, error detection, fault tolerance, and automatic recovery. Handling bigger files is becoming very important because data keeps growing rapidly, so they reconsidered I/O operations and block sizes. They also favor appending over overwriting, both to optimize performance and to guarantee atomicity. Flexibility and simplicity were also considered in the design. GFS supports the following operations: open, close, read, write, create, delete, snapshot (create a copy of a file), and record append (multiple clients append data to the same file at the same time).
The authors made six assumptions when designing GFS. First, the system should be able to detect, tolerate, and recover from component failures. Second, large files are the norm today and should be managed efficiently. Third, read operations are performed many times, so small reads should be sorted to enhance performance. Fourth, large files are usually written once and then appended to rather than modified, so the design favors appending over updating or overwriting. Fifth, since multiple clients could append to the same file at the same time, there should be well-defined semantics for that. Sixth, they considered that high sustained bandwidth is more importa...
... middle of paper ...
...the primary master is not working. GFS ensures data integrity by using checksums to detect corrupted data. GFS also has diagnostic tools to debug and isolate problems and to analyze performance.
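The checksum-based integrity check mentioned above can be illustrated with a minimal sketch. It is not the GFS implementation: the block size, the use of CRC32 from Python's `zlib` module, and the function names `block_checksums` and `find_corrupt_blocks` are all assumptions made for illustration. The idea is only that data is split into fixed-size blocks, a checksum is stored per block, and a later read recomputes the checksums to see which blocks no longer match.

```python
import zlib
from typing import List

# Assumed block size for this sketch; not taken from the essay.
BLOCK_SIZE = 64 * 1024


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return one CRC32 checksum per fixed-size block of `data`."""
    return [
        zlib.crc32(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


def find_corrupt_blocks(
    data: bytes, expected: List[int], block_size: int = BLOCK_SIZE
) -> List[int]:
    """Return the indices of blocks whose checksum differs from `expected`."""
    actual = block_checksums(data, block_size)
    if len(actual) != len(expected):
        raise ValueError("block count does not match the stored checksums")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Usage: record checksums at write time, then verify at read time.
original = bytes(200_000)                # 200,000 zero bytes
stored = block_checksums(original)       # checksums kept alongside the data
damaged = bytearray(original)
damaged[70_000] ^= 0xFF                  # flip one byte inside the second block
bad_blocks = find_corrupt_blocks(bytes(damaged), stored)
```

Keeping one checksum per block rather than one per file means a single flipped byte identifies only the affected block, so a reader can fetch just that block from another replica instead of discarding the whole file.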
The GFS design and implementation team evaluated GFS in three ways: micro-benchmarks, measurements of real-world clusters, and a workload breakdown. They tried to address all the bottlenecks they found. While designing and deploying GFS, the team faced operational and technical issues, some of them related to disks and to Linux. GFS provides a location-independent namespace, replication, and high fault tolerance. However, GFS does not cache file data. In conclusion, GFS is better suited to day-to-day bulk data processing than to instant transactions such as online banking. The GFS team states that GFS has met Google's storage needs.
