What's a Hadoop Cluster?


A Hadoop cluster consists of a single master node and multiple worker nodes. The master node runs a JobTracker, a TaskTracker, a NameNode, and a DataNode. A slave (worker) node acts as both a DataNode and a TaskTracker, though a cluster can also have data-only and compute-only slave nodes. Hadoop requires JRE (Java Runtime Environment) 1.6 or higher, and the standard start-up and shutdown scripts require Secure Shell (SSH) to be set up between the nodes in the cluster.
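As a rough, hedged sketch of how a client locates these master daemons, the snippet below reads the NameNode and JobTracker addresses from the client configuration. The property names (fs.default.name, mapred.job.tracker) are Hadoop 1.x-era settings, and the configuration file paths are assumptions made for this example rather than details from the article.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Minimal sketch: print which master daemons a client would contact.
// Assumes Hadoop 1.x-style property names and example config paths.
public class ShowClusterMasters {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Example locations; normally core-site.xml and mapred-site.xml
        // are picked up from the classpath ($HADOOP_CONF_DIR).
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));

        // NameNode address (HDFS) and JobTracker address (MapReduce v1).
        System.out.println("NameNode:   " + conf.get("fs.default.name"));
        System.out.println("JobTracker: " + conf.get("mapred.job.tracker"));
    }
}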

In a larger cluster, an extra NameNode called the secondary NameNode is configured to avoid a single point of failure. HDFS is managed with a dedicated NameNode that hosts the file-system index, and the secondary NameNode can generate snapshots of the NameNode's memory structures, which helps prevent file-system corruption and reduces data loss. Similarly, job scheduling can be managed through a standalone JobTracker server. In clusters where the Hadoop MapReduce engine is deployed against an alternate file system, the NameNode, secondary NameNode, and DataNode roles of HDFS are replaced by their file-system-specific equivalents.
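To make the JobTracker's scheduling role concrete, here is a minimal word-count job driver in the style of the stock Hadoop MapReduce examples. It is only a sketch: the input and output paths are placeholders, and Job.getInstance assumes a Hadoop 2.x-or-later client library (older releases used the now-deprecated Job constructor instead).

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Word-count driver: the client submits the job, and the cluster's
// scheduler (the JobTracker in MRv1) farms the map and reduce tasks
// out to the TaskTrackers on the worker nodes.
public class WordCountDriver {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit (word, 1) for each token
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();           // add up the counts for this word
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");  // Hadoop 2.x+ client API
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /user/demo/input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /user/demo/output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}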
HDFS has a master/slave architecture: it contains one master node, called the NameNode, and multiple slave (worker) nodes, called DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes they run on.
The master server manages the file-system namespace and controls access to files by clients. HDFS exposes a file-system namespace into which users can store data as files. Internally, a file is divided into a number of blocks stored across DataNodes. Namespace operations such as opening, closing, and renaming files and directories are executed by the NameNode, which also determines the mapping of blocks to DataNodes. Serving read and write requests from the file system's clients is the responsibility of the DataNodes, which also handle block creation, deletion, and replication upon instruction from the NameNode.
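As an illustrative sketch of these namespace and read/write operations from the client side, the snippet below uses Hadoop's Java FileSystem API; the /user/demo paths and file names are invented for the example and are not taken from the article.

import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Sketch of HDFS client operations: the NameNode handles the namespace
// calls (create, rename, delete), while the actual bytes are written to
// and read from DataNodes.
public class HdfsNamespaceDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);   // connects to the configured NameNode

        Path tmp = new Path("/user/demo/report.tmp");   // hypothetical paths
        Path fin = new Path("/user/demo/report.txt");

        // Write: namespace entry created on the NameNode, blocks stored on DataNodes.
        try (OutputStream out = fs.create(tmp)) {
            out.write("hello hdfs\n".getBytes("UTF-8"));
        }

        // Rename: a pure namespace operation handled by the NameNode.
        fs.rename(tmp, fin);

        // Read: block locations come from the NameNode, data from the DataNodes.
        try (FSDataInputStream in = fs.open(fin)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        fs.delete(fin, false);   // remove the file (non-recursive)
    }
}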

... middle of paper ...

... and scale an Apache Hadoop cluster within 10 minutes.
Deploy clusters with MapReduce, HDFS, Hive, HiveServer, and Pig. Configuration profiles are fully customizable, including dedicated machines or machines shared with other workloads, DHCP or static IP networking, and local or shared storage.

Speed up time to insight: upload and download data and run MapReduce jobs and Pig and Hive scripts from the Project Serengeti interface. Through existing tools, users can consume data in HDFS over a HiveServer SQL connection. On-demand elastic scalability separates compute nodes without losing data locality and allows compute nodes to be scaled out or decommissioned as needed. Availability of the Apache Hadoop cluster is also improved, including a highly available NameNode and JobTracker to avoid single points of failure, fault tolerance (FT) for the JobTracker and NameNode, and one-click HA for Pig, HBase, and Hive.

