The Importance Of Big Data


2.1 Big data

Big data[1] is described as a massive amount of data that is complex, diverse, and heterogeneous in nature. Such data is difficult to process, analyze, and store with traditional systems[4].

According to a review of big data[1], we are now awash in digital data. Up to 2003, a total of 5 exabytes of data had been created, but today that amount is generated in 2 days. The volume of data is growing at such a pace that it is expected to reach 8 zettabytes in the coming year. Currently, storing all of the world's data would require billions of powerful computers. Social media creates a vast variety of semi-structured and unstructured data, and mobile subscribers send 10 billion text messages daily. The velocity at which videos, audio, tweets, posts, emails, and social interactions are created is unprecedented. Given this growth, it is estimated that the amount of information will increase 50-fold over the coming 10 years. Big data is important in many fields: storing logs in the IT industry, storing and analyzing disease patterns in healthcare, digital data optimization, social media interactions, and demand forecasting and risk reduction in financial institutions. However, security and privacy are considered the main issues for big data. They need to be tackled by implementing a framework that includes authentication or a cryptographically secure system.

2.2 MapReduce

At a time when we are deluged with data, parallel processing of tasks has become essential to process huge amounts of data in reasonable time. MapReduce[8] is a programming model developed by Google to process large datasets on parallel computers such as commodity clusters. The main idea of the MapReduce mode...
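As a rough illustration of the programming model, the sketch below runs the map, shuffle, and reduce steps in a single Python process. The word-count task and the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are assumptions chosen for the example; a real deployment would distribute the map and reduce tasks across the machines of a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially (illustrative only)."""
    mapped = []
    for doc in documents:  # on a cluster, each document could go to a different worker
        mapped.extend(map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

Because each call to `map_phase` depends only on its own document, and each call to `reduce_phase` only on one key, both steps can in principle run in parallel.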

... middle of paper ...

...ons i.e., it doesn't move to the next stage until all the tasks of the previous stage are completed. But this property causes performance degradation and makes it difficult to support online processing. To address this, an incremental MapReduce framework has been developed that processes data the way streaming engines do. Each task runs continuously with a sliding window, and the system generates MapReduce outputs by reading the items within the window.
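The sliding-window idea can be sketched as follows. This is only a minimal single-process illustration, not the framework described above: the class name `SlidingWindowCount`, the count-based window, and the per-item counting task are all assumptions made for the example.

```python
from collections import Counter, deque


class SlidingWindowCount:
    """Keep per-item counts over the most recent `window_size` items."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()    # items currently inside the window, oldest first
        self.counts = Counter()  # running aggregate over the window

    def add(self, item):
        """Consume one item and return the updated counts for the window."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.window_size:
            # Evict the oldest item and undo its contribution.
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return dict(self.counts)


if __name__ == "__main__":
    task = SlidingWindowCount(window_size=2)
    for item in ["a", "b", "a"]:
        print(task.add(item))
```

Unlike a batch job, this task never waits for the whole input: each arriving item updates the output immediately, and old items stop contributing once they leave the window.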

e) Performance Tuning: MapReduce programs are mostly used for data analysis. To complete these programs in reasonable time, it is best to provide automatic optimization for MapReduce programs. A static analysis approach called MANIMAL has been proposed for automatic optimization of a single MapReduce job. In this approach, an analyzer examines the program code before execution, without any runtime information.
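To give a feel for what examining program code before execution can mean, the sketch below uses Python's standard `ast` module to find which record fields a map function reads, without ever running it. This is not MANIMAL's analyzer; the sample `MAP_SOURCE`, the function name `fields_read`, and the dictionary-style record are hypothetical, and the code assumes Python 3.9 or later.

```python
import ast

# Hypothetical map function, kept as source text so it is analyzed, not executed.
MAP_SOURCE = '''
def map_fn(record):
    if record["rank"] > 10:
        yield record["url"], 1
'''


def fields_read(source, param="record"):
    """Return the constant string keys used to subscript `param` in `source`."""
    tree = ast.parse(source)
    fields = set()
    for node in ast.walk(tree):
        # Match expressions of the form: record["some_field"]
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id == param
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)
        ):
            fields.add(node.slice.value)
    return fields


if __name__ == "__main__":
    print(sorted(fields_read(MAP_SOURCE)))
```

An optimizer that knows a job touches only a few fields could, for instance, avoid reading the remaining ones from storage; whether and how a given system exploits such information is outside the scope of this sketch.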

2.3 HBase
