Compressing files is usually worthwhile when storing large amounts of data in HDFS, because compressed data takes significantly less space than the raw data. Hadoop supports several I/O compression codecs, and the right choice depends on the scenario, how frequently the data is accessed, and similar factors. The codecs commonly used with Hadoop are listed below; most ship with Hadoop, while LZO has to be installed separately.
Type | Codecs |
---|---|
gzip | org.apache.hadoop.io.compress.GzipCodec |
bzip2 | org.apache.hadoop.io.compress.BZip2Codec |
LZO | com.hadoop.compression.lzo.LzopCodec |
Snappy | org.apache.hadoop.io.compress.SnappyCodec |
Deflate | org.apache.hadoop.io.compress.DeflateCodec |
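To get a feel for the space savings, here is a minimal sketch that compresses the same data with three of the algorithms from the table. It uses Python's standard library rather than the Hadoop codec classes, so it only illustrates the algorithms themselves (Snappy and LZO are left out because they are not in the standard library). The sample data is made up for illustration, and real ratios depend heavily on the data.

```python
import bz2
import gzip
import zlib

# Hypothetical sample: repetitive, log-like text, which compresses well.
raw = b"2024-01-01 INFO request served in 12 ms\n" * 10_000

# Standard-library counterparts of the gzip, bzip2 and Deflate rows above.
compressors = {
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "deflate": zlib.compress,
}

# Compressed size in bytes for each algorithm.
sizes = {name: len(fn(raw)) for name, fn in compressors.items()}

print(f"raw: {len(raw)} bytes")
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / len(raw):.2%} of raw)")
```

On real datasets the differences between codecs are usually smaller than on this artificial sample, and compression speed matters as much as the ratio.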
Manual configuration of IO Compression Codecs
To add a new I/O compression library, add its codec class to the io.compression.codecs property in the Hadoop “core-site.xml” configuration file. Multiple codecs can be listed as comma-separated values. The core-site.xml file is usually located in the “/etc/hadoop/conf/” directory.
<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.BZip2Codec,
    org.apache.hadoop.io.compress.DeflateCodec,
    org.apache.hadoop.io.compress.SnappyCodec,
    org.apache.hadoop.io.compress.Lz4Codec
  </value>
</property>
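After editing the file, it can help to check which codecs the property actually lists. The following is a rough sketch that parses a core-site.xml document and splits the io.compression.codecs value on commas. The helper name `list_codecs` and the embedded sample XML are assumptions made for illustration; this is not how Hadoop itself loads the setting, and on a real cluster you would read the file from your configuration directory instead.

```python
import xml.etree.ElementTree as ET

# Sample configuration for illustration only; it mirrors the property above.
CORE_SITE_XML = """
<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>
      org.apache.hadoop.io.compress.DefaultCodec,
      org.apache.hadoop.io.compress.GzipCodec,
      org.apache.hadoop.io.compress.BZip2Codec,
      org.apache.hadoop.io.compress.DeflateCodec,
      org.apache.hadoop.io.compress.SnappyCodec,
      org.apache.hadoop.io.compress.Lz4Codec
    </value>
  </property>
</configuration>
"""


def list_codecs(xml_text):
    """Return the codec class names listed in io.compression.codecs."""
    root = ET.fromstring(xml_text.strip())
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name == "io.compression.codecs":
            value = prop.findtext("value", default="")
            # Split on commas and drop surrounding whitespace and empty items.
            return [c.strip() for c in value.split(",") if c.strip()]
    return []  # property not present


codecs = list_codecs(CORE_SITE_XML)
for codec in codecs:
    print(codec)
```

A missing class name or a stray character in the comma-separated list is easy to spot in this output.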
Configure IO Compression Codecs using Cloudera Manager
- Log in to Cloudera Manager
- Navigate to the HDFS service
- Click Configuration and search for io.compression.codecs
- Click the + sign and add the codecs in the new text box (see the screenshot below).
Cloudera IO Compression codecs
Enabling the LZO codec requires a separate installation, which we will cover in the next blog post.