What is Hadoop?

By: Emiley J

The multitude of user-generated content on social websites, together with other sources such as the Internet of Things, has led to the accumulation and storage of massive amounts of data, aptly termed Big Data. Big Data technologies are still a work in progress and continue to mature as big players such as IBM, Microsoft, and Google work on tools and technologies to handle it.

Besides these big players, there are open source alternatives such as Apache Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
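To give a feel for the "simple programming model" mentioned above, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It is plain Python rather than the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration; in a real cluster the framework would run these steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all counts emitted for one word into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of text splits."""
    mapped = []
    for document in documents:
        mapped.extend(map_phase(document))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

Because each call to `map_phase` touches only its own split and each call to `reduce_phase` touches only one key, the work can in principle be spread over many machines, which is what lets this style of program scale.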

All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and should therefore be handled automatically in software by the framework. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.
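The idea of handling failures in software rather than in hardware can be sketched roughly as follows. This is not Hadoop's actual scheduler: `run_with_failover`, `make_task`, and the node names are hypothetical, and the "cluster" is simulated in a single process. The sketch only shows the general pattern of re-running a unit of work on another machine when one machine fails.

```python
def run_with_failover(task, nodes, split):
    """Try the task on each node in turn and return the first successful result.

    Returns a (node, result) tuple. Raises RuntimeError if every node fails.
    """
    errors = []
    for node in nodes:
        try:
            return node, task(node, split)
        except RuntimeError as exc:
            # Record the failure and move on to the next candidate node.
            errors.append((node, str(exc)))
    raise RuntimeError(f"all nodes failed: {errors}")


def make_task(failed_nodes):
    """Build a toy task that sums its input but fails on the given (simulated) nodes."""
    def task(node, split):
        if node in failed_nodes:
            raise RuntimeError(f"{node} is down")
        return sum(split)
    return task


if __name__ == "__main__":
    task = make_task(failed_nodes={"node-1"})
    print(run_with_failover(task, ["node-1", "node-2", "node-3"], [1, 2, 3]))
```

The caller never needs to know that `node-1` failed; it simply receives the result from the first node that completed the work, which mirrors the assumption that failures are routine and should be absorbed by the framework.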

Yahoo and Facebook have also implemented Hadoop to manage their petabytes of storage, which keep growing each day. Adoption at this scale suggests that Hadoop is a proven technology that has been tested under heavy load.

There is no doubt that Hadoop implementations of this scale require skilled programmers and engineers. However, innovative services are now available from companies such as Xplenty, which offer Hadoop as a Service. With Hadoop as a Service, there is no waiting for a system to be built. Resources can be diverted to other areas directly related to the business plan instead of being spent on building and maintaining a data center. This saves money and delivers results much sooner.
