[ Data Storage in Hadoop cluster ]
This is a question from a Hadoop book. I thought the answer was 200 TB, but that is not correct. Can anyone explain?
Assume that there are 50 nodes in your Hadoop cluster with a total of 200 TB (4 TB per node) of raw disk space allocated to HDFS storage. Assuming Hadoop's default configuration, how much data will you be able to store?
HDFS has a default replication factor of 3, so every block of your data is stored as 3 copies in HDFS unless a different replication factor is specified when the file is created.
Therefore, under the default HDFS configuration, you can store only 200/3 ≈ 66.67 TB of actual data.
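As a quick check of the arithmetic, here is a minimal Python sketch. The function name `usable_capacity_tb` is made up for illustration, and the calculation assumes replication is the only overhead (it ignores things like space reserved for non-HDFS use or temporary job output, which reduce the figure further on a real cluster).

```python
def usable_capacity_tb(nodes: int, tb_per_node: float, replication: int = 3) -> float:
    """Estimate how much user data fits in HDFS, counting replication only."""
    raw_tb = nodes * tb_per_node   # total raw disk given to HDFS
    return raw_tb / replication    # each block is stored `replication` times


# 50 nodes x 4 TB = 200 TB raw, divided by the default replication factor of 3
print(round(usable_capacity_tb(50, 4), 2))  # 66.67
```

Passing `replication=1` returns the full 200 TB, which is the figure you get if replication is left out of the calculation.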