Restarting HBase Regionservers using JSON and jq

We run HBase as part of our Hadoop cluster. HBase sits on top of HDFS and is split into two parts: the HBase Master and the HBase Regionservers. The master coordinates which regionservers are in control of each specific region.

Automating Recovery Responses

We periodically have to do some minor maintenance and upkeep, including restarting daemons that have died. The first pass of restarts is handled by Nagios checks with event handlers. Nagios periodically connects to the daemon status URL provided on each HBase Master and HBase Regionserver. If that connection times out or is considered slow, the event handler attempts to restart the daemon exactly once. If the daemon doesn’t recover, Nagios treats it as down and notifies us. More often than not, this resolves the issue.
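
As a rough illustration, the event handler boils down to a small shell script along these lines. This is a minimal sketch, not our exact production script; it assumes the standard Nagios service event handler macros are passed as arguments in the usual order.

#!/bin/sh
# Hypothetical Nagios event handler sketch.
# $1 = service state, $2 = state type, $3 = current check attempt.
if [ "$1" = "CRITICAL" ] && [ "$2" = "SOFT" ] && [ "$3" = "1" ]; then
    # First soft failure: try the restart exactly once. If the daemon
    # stays down, later attempts fall through and Nagios notifies us.
    sudo service hbase-regionserver restart
fi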

But, for cases where the HBase Regionserver daemon repeatedly dies in a short period of time (or just never restarts when the eventhandler triggers it), we can end up with a number of dead Regionservers that need to be manually restarted. So, how do we do that?

The naive approach is to log in to each server, check whether the HBase Regionserver is running, and restart it if it isn’t. This can take a long time, depending on the number of servers in your cluster.
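
In script form, the naive approach looks something like this (the host list file is a hypothetical placeholder for however you track your Regionserver nodes):

# Visit each node in turn, one at a time.
for host in $(cat regionserver-hosts.txt); do
    # Restart only when the status check reports the daemon is not running.
    ssh "$host" "sudo service hbase-regionserver status || sudo service hbase-regionserver restart"
done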

But, there’s a shortcut we could utilize. The HBase Master keeps track of all Regionservers that have ever contacted it. If one of those Regionservers dies and never connects back up, the Master places that Regionserver on a dead list. The dead list is published in the HBase Master status page. Unfortunately, it’s wrapped in a bunch of HTML and no one really wants to code something to unwrap that.

Further Cleanup using HBase JMX Metric Data

Alternatively, the HBase Master publishes that same information in its JMX data. Let’s figure out how to access it and make use of it to clean up dead Regionservers in a more automated way.

First, let’s access the JMX url. This will always live at /jmx on the HBase Master status port. It provides a lot of information. Go check out yours, then come back.

$ curl -s http://hbase-master:60010/jmx | wc
 178869 2000518 12915506

In our cluster, we’ve got a lot of tables, a lot of servers, and just a lot of metric information about what’s going on. You can see that the JSON returned from our master comprises 178 thousand lines of JSON. That’s a 12 megabyte JSON data structure. We want to parse that from the command line so we can iterate through each section, but we don’t want to write a custom script in python, ruby, or perl with a JSON module just for this.

Instead, we’re going to use a tool called jq. Think of it as sed for JSON data. You can use it to parse, query, and output specific parts of your JSON data for easier use on the command line. jq, like sed, can be as easy or as difficult as you want to make it. You should take a few moments to read through the tutorial and manual if you’d like to learn more.

Let’s begin with understanding how jq works.

All the HBase data lives in a top-level key called beans. Each major area of metrics lives in several data structures under that. Each data structure has a name key that defines what area of metrics that data structure covers. There are sections that cover Java memory statistics, logging information, buffer pool information, and HBase statistics.
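
Before tackling the real 12 megabyte document, it can help to try jq against a tiny document with the same shape (the values here are made up):

$ echo '{"beans":[{"name":"java.lang:type=Memory"},{"name":"hadoop:service=HBase,name=Info"}]}' | \
  jq '.beans[].name'
"java.lang:type=Memory"
"hadoop:service=HBase,name=Info"

The .beans[].name filter says: take the beans array, iterate over each element, and emit its name key.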

Let’s figure out what each chunk is.

$ curl -s http://hbase-master:60010/jmx | jq '.beans[].name' | sort
"JMImplementation:type=MBeanServerDelegate"
"com.sun.management:type=HotSpotDiagnostic"
"hadoop:service=HBase,name=Info"
"hadoop:service=HBase,name=RPCStatistics-60000"
"hadoop:service=Master,name=Master"
"hadoop:service=Master,name=MasterStatistics"
"java.lang:type=ClassLoading"
"java.lang:type=Compilation"
"java.lang:type=GarbageCollector,name=ConcurrentMarkSweep"
"java.lang:type=GarbageCollector,name=ParNew"
"java.lang:type=Memory"
"java.lang:type=MemoryManager,name=CodeCacheManager"
"java.lang:type=MemoryPool,name=CMS Old Gen"
"java.lang:type=MemoryPool,name=CMS Perm Gen"
"java.lang:type=MemoryPool,name=Code Cache"
"java.lang:type=MemoryPool,name=Par Eden Space"
"java.lang:type=MemoryPool,name=Par Survivor Space"
"java.lang:type=OperatingSystem"
"java.lang:type=Runtime"
"java.lang:type=Threading"
"java.nio:type=BufferPool,name=direct"
"java.nio:type=BufferPool,name=mapped"
"java.util.logging:type=Logging"

Since we’re looking for dead Regionserver information, we probably want to look in the hadoop:service area. I know that this data lives in hadoop:service=Master,name=Master because I’ve previously looked through the full JSON output. You should spend some time looking at each section to figure out what useful information may be in each.

Let’s focus on this Master section.

One thing you’ll note is that the pipe character inside the jq expression acts just like the pipe character on a normal command line: it feeds the output of the filter on its left into the filter on its right.

Here, we want to look into the beans hierarchy and extract out the data structure that contains the hadoop:service=Master,name=Master data.

$ curl -s http://hbase-master:60010/jmx | \
  jq '.beans[] 
    | select(.name == "hadoop:service=Master,name=Master")'

This prints out information about the current Regionservers: which regions are assigned to each one and the load each region is seeing. That’s all great and useful if you want per-Regionserver statistics from the Master’s viewpoint, but we only need the subset of the Master data that doesn’t deal with individual Regionservers.

Let’s have jq remove the Regionservers data so we can focus on the Master info.

$ curl -s http://hbase-master:60010/jmx | \
  jq '.beans[] 
    | select(.name == "hadoop:service=Master,name=Master") 
    | del(.RegionServers[])'
{
  "IsActiveMaster": true,
  "DeadRegionServers": [
    "hdn01.example.net:60020",
    "hdn13.example.net:60020",
    "hdn01.example.net:60020",
    "hdn20.example.net:60020",
    "hdn12.example.net:60020"
  ],
  "ZookeeperQuorum": "hdn01.example.net:2181,hdn01.example.net:2181,hdn01.example.net:2181",
  "RegionServers": [],
  "RegionsInTransition": [],
  "name": "hadoop:service=Master,name=Master",
  "modelerType": "org.apache.hadoop.hbase.master.MXBeanImpl",
  "ClusterId": "1777204b-2fba-49ab-ae93-e9f9a8bbe10b",
  "MasterStartTime": 1420585763679,
  "MasterActiveTime": 1420585914045,
  "Coprocessors": [
    "AccessController"
  ],
  "ServerName": "hbase-master.example.net,60000,1420585762914",
  "AverageLoad": 85.88349514563107
}

Immediately, you can see that the Master we’re talking to is the active one, that we have a handful of dead Regionservers, and that there’s some load on the cluster. The RegionServers section is empty because we purposefully deleted it from the output with jq.

We’ve whittled down the data structure and discovered that what we want is the DeadRegionServers data structure. Let’s extract that further so we can use it.

$ curl -s http://hbase-master:60010/jmx | \
  jq '.beans[] 
    | select(.name == "hadoop:service=Master,name=Master") 
    | .DeadRegionServers[] '  
"hdn01.example.net:60020"
"hdn13.example.net:60020"
"hdn01.example.net:60020"
"hdn20.example.net:60020"
"hdn12.example.net:60020"

We’ve extracted the list of dead Regionservers. You’ll note there’s still some extraneous stuff here: the quotes and the port numbers need to go, and hdn01 shows up twice, so we’ll deduplicate while we’re at it. We want to feed this host list into pdsh so we can distribute the ssh commands in parallel.

$ curl -s http://hbase-master:60010/jmx |
  jq '.beans[] 
    | select(.name == "hadoop:service=Master,name=Master") 
    | .DeadRegionServers[] '  |\
  sed -e 's/:60020//' |\
  tr -d '"' |\
  sort -u |\
  tr '\n' ,
hdn01.example.net,hdn12.example.net,hdn13.example.net,hdn20.example.net,

Putting it all together, we can now initiate the actual restarts.

$ host_list=$(curl -s http://hbase-master:60010/jmx |   jq '.beans[] 
    | select(.name == "hadoop:service=Master,name=Master") 
    | .DeadRegionServers[] '  |  sed -e 's/:60020//' |  tr -d '"' |  sort -u |  tr '\n' , )

$ echo $host_list
hdn01.example.net,hdn12.example.net,hdn13.example.net,hdn20.example.net,

$ sudo pdsh -w $host_list service hbase-regionserver restart
hdn12: Restarting Hadoop HBase regionserver daemon: Stopping Hadoop HBase regionserver daemon: no regionserver to stop because kill -0 of pid 17917 failed with status 1
hdn01: Restarting Hadoop HBase regionserver daemon: Stopping Hadoop HBase regionserver daemon: no regionserver to stop because kill -0 of pid 29969 failed with status 1
hdn13: Restarting Hadoop HBase regionserver daemon: Stopping Hadoop HBase regionserver daemon: no regionserver to stop because kill -0 of pid 18833 failed with status 1
hdn12: hbase-regionserver.
hdn01: hbase-regionserver.
hdn20: Restarting Hadoop HBase regionserver daemon: Stopping Hadoop HBase regionserver daemon: no regionserver to stop because kill -0 of pid 10849 failed with status 1
hdn20: hbase-regionserver.
hdn13: hbase-regionserver.
hdn12: Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-hdn12.out
hdn01: Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-hdn01.out
hdn13: Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-hdn13.out
hdn20: Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-hdn20.out
hdn12: hbase-regionserver.
hdn13: hbase-regionserver.
hdn01: hbase-regionserver.
hdn20: hbase-regionserver.

Finally … PROFIT!

We now have the basic outline of what we need to automate this recovery. For example, we could build a Nagios check around this that automatically restarts dead Regionservers whenever any show up on the dead list. Or we could make it restart only if more than X percent of the cluster is in a dead state. The point is, there’s a lot of information in the JMX JSON output that you can use to drive automated responses, and jq makes that easy for tools and responses that don’t require a full-blown scripting environment.
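
As a hedged sketch of that second idea: the 10 percent threshold is an arbitrary assumption, jq’s -r flag is used to drop the quotes we previously stripped with tr, and we assume .RegionServers holds one entry per live Regionserver.

#!/bin/sh
# Hypothetical wrapper: only auto-restart once more than 10 percent
# of the cluster's Regionservers are on the dead list.
JMX_URL="http://hbase-master:60010/jmx"
FILTER='.beans[] | select(.name == "hadoop:service=Master,name=Master")'

dead=$(curl -s "$JMX_URL" | jq "$FILTER | .DeadRegionServers | length")
live=$(curl -s "$JMX_URL" | jq "$FILTER | .RegionServers | length")
total=$((dead + live))

if [ "$total" -gt 0 ] && [ $((dead * 100 / total)) -gt 10 ]; then
    # Strip the port, dedupe, and flatten to a comma-separated host list.
    host_list=$(curl -s "$JMX_URL" | jq -r "$FILTER | .DeadRegionServers[]" \
        | sed -e 's/:60020//' | sort -u | tr '\n' ,)
    sudo pdsh -w "$host_list" service hbase-regionserver restart
fi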

The JMX data is not limited to HBase daemons. We also use this within HDFS for tracking things in the Namenode and within the JobTracker to extract some data about jobs. If you’ve got other Hadoop daemons running, you should see if they export the JMX data for further interesting uses!
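
For example, the Namenode serves the same kind of endpoint on its status port (50070 by default on the Hadoop versions of this era; the hostname here is a placeholder). The bean names differ from HBase’s, so start by listing them just like we did above:

$ curl -s http://namenode:50070/jmx | jq '.beans[].name' | sort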
