Fix under-replicated blocks in HDFS
In this tutorial, let’s look at how to fix under-replicated blocks in HDFS.
How to check for under-replicated, missing, or corrupted blocks in the cluster?
You can check block information from the NameNode UI or with the fsck command.
1. NameNode UI – http://<namenode-IP>:50070/dfshealth.html#tab-overview
2. Using the fsck command
bash$ hdfs fsck /
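If you also want to list the files that have missing or corrupted blocks, fsck has a dedicated option for that; a minimal sketch (the exact output format may vary between Hadoop versions):
# List files that currently have corrupt or missing blocks
bash$ hdfs fsck / -list-corruptfileblocks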
Either of these two methods will show you the overall health status of the Hadoop cluster.
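As a quick sanity check, the dfsadmin report also includes cluster-wide block counters; a rough sketch (the exact label text may differ slightly between Hadoop versions):
# Print the cluster-wide counters for under-replicated, corrupt and missing blocks
bash$ hdfs dfsadmin -report | grep -iE 'under replicated|corrupt|missing'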
Now let’s see how to get the list of under-replicated files.
# grep and extract the file name only
bash$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}'
Once you have the list of under-replicated files, you can set their replication factor back to 3, which is the default.
# Set replication
bash$ hadoop fs -setrep 3 /file_name
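If you want the command to block until the new replication factor is actually reached, setrep also accepts a wait flag; note that this can take a long time on large files:
# Wait until the replication target is met before returning
bash$ hadoop fs -setrep -w 3 /file_name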
Use the command below to write the results to a file, then trigger replication in a loop for each listed file.
bash$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under-replicated-files
bash$ for hdfsfile in `cat /tmp/under-replicated-files`; do echo "Triggering replication for $hdfsfile :"; hadoop fs -setrep 3 $hdfsfile; done
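If the list is long or contains unusual path names, a while-read loop over the same /tmp/under-replicated-files file is a slightly safer variant of the same idea; just a sketch:
# Read the file list line by line and re-set replication for each path
bash$ while read -r hdfsfile; do
        echo "Triggering replication for $hdfsfile"
        hadoop fs -setrep 3 "$hdfsfile"
      done < /tmp/under-replicated-files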
Note: Targeting the specific file names saves time and puts less load on HDFS. Alternatively, you can blindly trigger replication on the root (/) directory; Hadoop will then check every file and trigger replication for any that do not meet the replication factor. This is a costly job and consumes more cluster resources.
# Trigger blindly on the root folder
bash$ hadoop fs -setrep 3 /
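Once replication has caught up, you can re-run fsck to verify that the under-replicated count in the summary has dropped back to zero; a quick check might look like this (the exact summary wording can vary by version):
# Verify the fix: the summary should report 0 under-replicated blocks and a healthy filesystem
bash$ hdfs fsck / | grep -iE 'under.?replicated|healthy'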