Apache Hadoop is the prominent Open Source framework scalable for processing gigantic datasets in distributed systems.
It is strongly recommended for Big Data implementations to store and process huge volumes of data and analyzes unstructured, multi-dimensional and complex data.
The Apache Hadoop Core is consisting of the following modules
These are Java libraries and utilities requisite for other Hadoop modules. This includes the components and implementations for the common I/O operations and utilities to handle the distributed filesystem.
It has implementations of abstraction over Java RPC and serialization required to be used in other Hadoop modules.
The Hadoop Common package also provides source code and documentation, as well as a contribution section that includes different projects from the Hadoop. Hadoop Common is also known as Hadoop Core.
Hadoop Distributed File System (HDFS)
The main purpose of Hadoop Distributed File System, HDFS is to store data consistently even in the presence of failures including NameNode failures, DataNode failures and network partitions.
The NameNode is a solo point of failure for the HDFS cluster and a DataNode stores data in the HDFS. HDFS uses a master/slave architecture in which one device-the master controls, track and manages one or more other devices-the slaves.
The HDFS cluster contains of single NameNode and a master server manages the file system namespace and controls access to files.
YARN is the basic prerequisite for Enterprise Hadoop Infrastructure, offering resource management and a central platform to provide reliable operations, security, and data governance tools across all clusters.
It also covers new technologies found within the data center so that they can take advantage of cost effective, linear-scale storage and processing. YARN allows different data processing engines including graph processing,
interactive processing, stream processing and batch processing to run, track and process data stored in HDFS.
MapReduce is a programming model appropriate for processing of huge data.
Hadoop is proficient for running MapReduce programs written in various languages:
MapReduce programs work in two phases: Map phase, Reduce phase. An input to each phase is key-value pairs.
MapReduce programs are parallel in nature and are excellent for performing large-scale data analysis using multiple machines in the cluster.
For details see Official web site of Hadoop.
Nub8 Big Data consultants offer following services related to Hadoopbase solutions:
- Hadoop Consulting
- Hadoop Architecture Design
- Hadoop development and implementation
- Hadoop integrations for third party solutions
- Hadoop Support and Maintenance