Mass-gzip files inside HDFS using the power of Hadoop

I have a bunch of text files sitting in HDFS that I need to compress: several hundred files totaling several hundred gigabytes of data. There are several ways to do this. I could copy each file down to local disk, compress it, and re-upload it to HDFS, one file at a time. This takes an excessively long […]