Cluster
A collection of objects in a data set that typically share similar characteristics and are different from other clusters in the database; clustering is useful as a data-preprocessing step. A Hadoop cluster is known as “nodes.”
Hard clustering is where a data point can only belong to one cluster whereas in soft clustering is the probability a data point belongs to each predefined cluster.