Operational AI WIKI

Hadoop

Apache Hadoop is a collection of open-source software utilities that let users harness a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.

Hadoop is made up of three main components:

  • HDFS: the bottom-layer component for storage.
  • YARN: job scheduling and cluster resource management.
  • MapReduce: parallel processing.

In more detail:

HDFS (Hadoop Distributed File System) is the primary data storage system used by Hadoop applications. It employs a NameNode and DataNode architecture to implement a distributed file system that provides high-throughput access to data across highly scalable Hadoop clusters. HDFS breaks files into blocks and distributes them across the nodes of the cluster.

YARN (Yet Another Resource Negotiator) is the architectural center of Hadoop. It allows multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, to work on data stored in a single platform.

MapReduce is a processing technique and programming model for distributed computing, with its native API written in Java. A MapReduce job contains two main tasks: Map and Reduce. Map takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key/value pairs). Reduce takes the output of Map as its input and combines the tuples that share a key into a smaller, aggregated set of results. A concrete sketch of this model follows below.
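
To make the Map and Reduce steps concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class name WordCount and the input/output paths taken from the command line are illustrative, not part of the definition above.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: for each input line, emit a (word, 1) key/value pair per token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);   // emit (word, 1)
      }
    }
  }

  // Reduce: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);   // emit (word, total count)
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory (e.g. on HDFS)
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (e.g. on HDFS)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, a job like this would typically be submitted to the cluster (with YARN handling scheduling and resources) via the hadoop jar command, reading its input from and writing its results to HDFS paths.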
