
Hadoop DataNode logs filling with clienttrace messages

So, you’re probably like me.

You have a shiny, new Cloudera Hadoop cluster. Everything is zooming along smoothly. Until you find that your /var/log/hadoop datanode logs are growing at a rate of a bazillion gigabytes per day. What do you do, hot shot? WHAT DO YOU DO?

Actually, it’s pretty simple.

We were getting alerts on our cluster that /var was filling across various datanodes (where we also happen to have HBase running). We were doing about four gigabytes a day just in the datanode logs alone. That seemed excessive. Peering through the logs, we found that a large percentage of entries looked something like this:

2012-03-26 23:59:57,411 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.1.1.134:50010, dest: /10.1.1.114:44972, bytes: 20548, op: HDFS_READ, cliID: DFSClient_1020093912, offset: 0, srvID: DS-1061365566-10.1.1.134-50010-1331849447734, blockid: blk_1530028621327446680_9137244, duration: 309000
2012-03-26 23:59:57,415 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.1.1.134:50010, dest: /10.1.1.108:40058, bytes: 27507, op: HDFS_READ, cliID: DFSClient_345923441, offset: 0, srvID: DS-1061365566-10.1.1.134-50010-1331849447734, blockid: blk_4340309148758316988_10579186, duration: 909000
2012-03-26 23:59:57,415 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.1.1.134:50010, dest: /10.1.1.114:44973, bytes: 23734, op: HDFS_READ, cliID: DFSClient_1020093912, offset: 0, srvID: DS-1061365566-10.1.1.134-50010-1331849447734, blockid: blk_4111106876676617467_8649685, duration: 568000
2012-03-26 23:59:57,423 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.1.1.134:50010, dest: /10.1.1.119:45662, bytes: 20840, op: HDFS_READ, cliID: DFSClient_-635119343, offset: 0, srvID: DS-1061365566-10.1.1.134-50010-1331849447734, blockid: blk_4655242753989962972_6418861, duration: 310000
2012-03-26 23:59:57,434 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.1.1.134:50010, dest: /10.1.1.108:40061, bytes: 17728, op: HDFS_READ, cliID: DFSClient_345923441, offset: 0, srvID: DS-1061365566-10.1.1.134-50010-1331849447734, blockid: blk_6737196232027719961_7900492, duration: 1071000
2012-03-26 23:59:57,470 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.1.1.134:50010, dest: /10.1.1.105:53569, bytes: 19998, op: HDFS_READ, cliID: DFSClient_-1106211444, offset: 0, srvID: DS-1061365566-10.1.1.134-50010-1331849447734, blockid: blk_801375605494816473_8641680, duration: 856000
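If you want to put a number on “a large percentage,” a quick grep against the current log makes it obvious. The file glob below is a guess at the log layout, so adjust it for wherever your datanode logs actually land:

# Count clienttrace entries versus total lines in the current datanode log.
# The glob is a placeholder; point it at your real log file.
$ grep -c 'DataNode.clienttrace' /var/log/hadoop/*datanode*.log
$ wc -l /var/log/hadoop/*datanode*.log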

Doing a handy-dandy Google search, we found this thread discussing the very problem we were seeing. Looks like this is performance trace data that the datanode emits for block reads, and in our case the volume was driven by how heavily the HBase META region interacts with it.

No big deal.

The fix? We’ll just turn down logging on this specific Java class. The following command changes the log level at runtime, so we don’t have to restart the datanodes right now.

$ sudo -u hdfs hadoop daemonlog -setlevel localhost:50075 org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace WARN
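That handles a single node. To hit the whole cluster, a simple loop works; the hostnames below are placeholders and 50075 is the default datanode HTTP port, so adjust both if your setup differs:

# datanode01..03 are placeholder hostnames; 50075 is the default datanode HTTP port.
for dn in datanode01 datanode02 datanode03; do
  sudo -u hdfs hadoop daemonlog -setlevel ${dn}:50075 \
    org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace WARN
done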

Later, we’ll add the following to /etc/hadoop/conf/log4j.properties on all datanodes and restart them.

log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=WARN
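One way to push that out is another loop over the datanodes. The hostnames are placeholders, and the init script name depends on your CDH release (hadoop-0.20-datanode on CDH3, hadoop-hdfs-datanode on CDH4), so check yours before running it:

# Placeholder hostnames; substitute your real datanode list.
# The service name varies by CDH release -- verify it on your hosts first.
for dn in datanode01 datanode02 datanode03; do
  scp /etc/hadoop/conf/log4j.properties ${dn}:/tmp/log4j.properties
  ssh -t ${dn} 'sudo cp /tmp/log4j.properties /etc/hadoop/conf/log4j.properties \
    && sudo service hadoop-0.20-datanode restart'
done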
