
HBase Motel: SPLITS check in but don’t check out

In HBase, the Master process will periodically call for a region to be split if it grows too large. Normally this happens automatically, though you can also trigger a split by hand; in our case, we rarely do.
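
For reference, a manual split is just an HBase shell command. A minimal sketch, using the table name that appears later in this post and a made-up split key:

$ hbase shell
hbase(main):001:0> split 'datatable.prod'                      # ask the Master to split the table's regions
hbase(main):002:0> split 'datatable.prod', '\x00\x00\x00\x01'  # or split the region containing this (made-up) key at that key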

A new Master SPLIT behavior: let’s investigate

We have an older HBase cluster running Cloudera CDH4.4 in an isolated environment. Yesterday, while investigating some slow responses from the HBase Master, we ran into a behavior we hadn't seen before (we're not sure whether it actually caused the slowness). The Master's logs were streaming warnings about split attempts on a non-existent region.

2015-02-05 15:55:01,834 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region a08b16c6bcf66eb7aa9f22dc292d7593 not found on server hdn08.example.net,60020,1418346137367; failed processing
2015-02-05 15:55:01,834 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region a08b16c6bcf66eb7aa9f22dc292d7593 from server hdn08.example.net,60020,1418346137367 but it doesn't exist anymore, probably already processed its split

Looking back through all of our available logs, we saw this event occurring five to six times a second for as long as we had logs. A curious behavior indeed.

A Google search led us to several fruitful JIRA issues and discussion threads.

The three discussions point out that when the HBase Master gets stuck in this loop, there is a region consistently listed as an Unassigned Region. According to Apache HBase AssignmentManager Improvements, there are multiple states a region can move through, culminating in the region being assigned (open) or unassigned (offline). If a region isn't assigned, the Master believes that no Regionserver is currently managing it. Aside: that blog entry from Cloudera has a good overview of the Region States and how they transition. Definitely worth a read.

What does the ZooKeeper say?

HBase uses ephemeral ZooKeeper nodes to keep track of whether or not a region is unassigned.

If you look at your ZooKeeper cluster, under the /hbase/unassigned path, you might see unassigned regions. We looked and found one matching the region being split.

[zk: hbase-master:2181(CONNECTED) 2] ls /hbase/unassigned
[a08b16c6bcf66eb7aa9f22dc292d7593]

Of course, we wanted to see what was in that ephemeral node.

[zk: hbase-master:2181(CONNECTED) 25] get /hbase/unassigned/a08b16c6bcf66eb7aa9f22dc292d7593

cZxid = 0x2d01c6c2a9
ctime = Thu Dec 11 19:02:51 CST 2014
mZxid = 0x2e02ed59c4
mtime = Thu Feb 05 15:55:05 CST 2015
pZxid = 0x2d01c6c2a9
cversion = 0
dataVersion = 41023576
aclVersion = 0
ephemeralOwner = 0xd3493f9af7bf3f77
dataLength = 387
numChildren = 0

Looks like we have something actively holding open the ephemeral node for this unassigned region.

In theory, you can map the ephemeralOwner back to the client session that holds it by having ZooKeeper dump its session state using one of the ZooKeeper “Four Letter Words” for status monitoring, like the following. Note: I didn’t actually do this, because the original HBase Master log already gave me a good hint that it was a specific HBase Regionserver.

$ echo dump | nc hdn08.example.net 2181 | less
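
Had we needed it, grepping that dump for the session ID shown in ephemeralOwner should list the ephemeral paths that session owns. A hypothetical follow-up we never ran:

$ echo dump | nc hdn08.example.net 2181 | grep -A 5 0xd3493f9af7bf3f77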

One thing to note in that ephemeral node information is the ctime (creation time): that ephemeral node was created over two months ago! Holy crap. That means we had potentially been in this state for a while without noticing it.

Sanity check: does the region even exist?

We looked on HDFS in /hbase/datatable.prod to see if the region even existed anymore. It no longer did, though that doesn’t mean it never existed at some point in the past.
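
The check itself is just a directory listing on HDFS, roughly like the following, with the region directory name taken from the Master's warnings:

$ hadoop fs -ls /hbase/datatable.prod | grep a08b16c6bcf66eb7aa9f22dc292d7593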

Let’s go over to the Regionserver and see if it’s doing anything with this.

Checking out the Regionserver

Ok, so we had a good idea that the Master was continually attempting to split an unassigned region. We verified that the region no longer existed. When we looked through the Regionserver logs, we were in luck!

On December 11, the Regionserver experienced a failure around 18:35. Our Nagios checks noticed that the Regionserver was down and restarted it via an event handler at 19:02. On startup, it replayed edits for several regions, including the region involved in the SPLIT attempts.

Now, there’s a lot of stuff in here, so I’ll try to break it down.

The Regionserver starts up and goes through its list of regions that it needs to open.

2014-12-11 19:02:38,016 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593.

The Regionserver sees that there are some edits that need to be recovered, so it replays them. I’ve trimmed the log down to one line, since there were dozens of these.

2014-12-11 19:02:44,783 INFO org.apache.hadoop.hbase.regionserver.HRegion: Replaying edits from hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/recovered.edits/0000000002220945146; minSequenceid=2220941803; path=hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/recovered.edits/0000000002220945146

As part of the recovery, it wrote out what it needed to and set the region to an online state.

2014-12-11 19:02:47,890 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/.tmp/7c0b3eaa09644da8962d16c4b48bcf9c: CompoundBloomFilterWriter
2014-12-11 19:02:48,089 INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/.tmp/7c0b3eaa09644da8962d16c4b48bcf9c)
2014-12-11 19:02:48,090 INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=2221478309, memsize=2.9 M, into tmp file hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/.tmp/7c0b3eaa09644da8962d16c4b48bcf9c
2014-12-11 19:02:48,532 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/f1/7c0b3eaa09644da8962d16c4b48bcf9c, entries=1081, sequenceid=2221478309, filesize=1.3 M
2014-12-11 19:02:48,532 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~2.9 M/3016352, currentsize=0/0 for region datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593. in 897ms, sequenceid=2221478309, compaction requested=true; wal=null
2014-12-11 19:02:50,273 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593.; next sequenceid=2221478310
2014-12-11 19:02:50,285 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Post open deploy tasks for region=datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593., daughter=false

It then began a compaction of that region.

2014-12-11 19:02:50,293 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on f1 in region datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593.
2014-12-11 19:02:50,294 INFO org.apache.hadoop.hbase.regionserver.Store: Starting compaction of 3 file(s) in f1 of datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593. into tmpdir=hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/.tmp, seqid=2221478309, totalSize=4.9 M
2014-12-11 19:02:50,296 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Updated row datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593. with server=hdn08.example.net,60020,1418346137367
2014-12-11 19:02:50,297 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Done with post open deploy task for region=datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593., daughter=false
2014-12-11 19:02:50,473 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/.tmp/eb964dcf575f407faa02376169e1a10e: CompoundBloomFilterWriter
2014-12-11 19:02:50,898 INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/.tmp/eb964dcf575f407faa02376169e1a10e)
2014-12-11 19:02:50,960 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/.tmp/eb964dcf575f407faa02376169e1a10e to hdfs://hadoopcluster:8020/hbase/datatable.prod/a08b16c6bcf66eb7aa9f22dc292d7593/f1/eb964dcf575f407faa02376169e1a10e
2014-12-11 19:02:51,556 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 3 file(s) in f1 of datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593. into eb964dcf575f407faa02376169e1a10e, size=2.6 M; total size for store is 1.7 G
2014-12-11 19:02:51,560 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: completed compaction: regionName=datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593., storeName=f1, fileCount=3, fileSize=4.9 M, priority=3, time=7267975470031351; duration=1sec

And once it completed the compaction, it began the split, offlining the region.

2014-12-11 19:02:51,679 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593.
2014-12-11 19:02:51,821 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593.
2014-12-11 19:02:52,860 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593. in META

There it sat, producing no further information about the region. Meanwhile, the HBase Master appears to have been retrying the split processing repeatedly. Forever.

Thirty-three (33!) days later, we see a log entry from the Regionserver about the ZooKeeper connection going wonky for a bit, which shows that the cluster still thought the ephemeral node needed to exist at that point.

2015-01-13 10:13:25,247 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/a08b16c6bcf66eb7aa9f22dc292d7593
2015-01-13 10:21:35,278 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/a08b16c6bcf66eb7aa9f22dc292d7593

What to do

Well, there were three things we could do according to the mailing list and JIRA tickets.

  1. Restart the HBase Master to see if that un-sticks the unassigned region, or at least gets the Master out of this SPLIT loop.
  2. Connect to ZooKeeper and rmr the ephemeral node by hand (roughly the command sketched below). We felt this was potentially dangerous.
  3. Restart the HBase Regionserver associated with the ephemeral node. Ending its ZooKeeper session would remove the ephemeral node for us, with the same effect as rmring it by hand.
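
For completeness, the manual removal in option 2 would have been a single ZooKeeper CLI command, something like the following (we never ran it, so treat this as a sketch):

[zk: hbase-master:2181(CONNECTED) 3] rmr /hbase/unassigned/a08b16c6bcf66eb7aa9f22dc292d7593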

We attempted the HBase Master restart. That didn’t help.

We restarted the HBase Regionserver, hoping this would kill the ephemeral node; our last-ditch effort would have been the manual removal. Luckily, as soon as we restarted the Regionserver, the node went away and the Master stopped issuing the SPLIT attempts. The Regionserver spit this out upon shutdown.

2015-02-05 15:55:05,143 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META updated, and report to master. Parent=datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1414245609859.a08b16c6bcf66eb7aa9f22dc292d7593., new regions: datatable.prod,\x00\x00\x00\x00\x9C80\x0A,1418346171669.324412e441cb1fff286f3c063e21ac8e., datatable.prod,\x00\x00\x00\x00\x9CDP\x8A,1418346171669.8823dea5dcbef0a9158498a0d28e14a2.. Split took 1340hrs, 52mins, 13sec

Yeah. Read that closely. Split took 1340hrs, 52mins, 13sec. That’s approximately 55 days. Something clearly happened here. Unfortunately, we can’t tell what, because we no longer have the HBase Master logs from that time period: we keep a much tighter rein on their cleanup than we do for the Regionservers, since the write volume of the Master logs is just way too large compared to any single Regionserver.

Conclusion

Well, it’s clear we are still hitting a bug that’s supposed to be fixed in the version of HBase we’re running. We have a plan to upgrade, but we’re still dealing with the logistics. Second, it’s clear that we need better monitoring to detect conditions like this. Unfortunately, I’m not aware of anything out there that watches for this specific condition. We’ll probably either write something that scans the Master and Regionserver logs for this behavior, or create monitors that watch the age of the ephemeral nodes under /hbase/unassigned.
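
If we go the ZooKeeper route, a crude check might look something like the sketch below. This is just an illustration of the idea, not something we actually run: it assumes zkCli.sh and GNU date are on the PATH, the ensemble address is made up, and parsing zkCli's human-readable output like this is admittedly brittle.

#!/bin/bash
# Rough sketch: warn about anything that has been sitting in /hbase/unassigned too long.
ZK="hbase-master:2181"        # made-up ensemble address
MAX_AGE_SECS=$((6 * 3600))    # complain about anything unassigned for more than 6 hours
now=$(date +%s)

# zkCli prints connection noise first; the last line of 'ls' output is the child list,
# e.g. [a08b16c6bcf66eb7aa9f22dc292d7593]
regions=$(zkCli.sh -server "$ZK" ls /hbase/unassigned 2>/dev/null | tail -1 | tr -d '[],')

for region in $regions; do
  # 'stat' prints lines like: ctime = Thu Dec 11 19:02:51 CST 2014
  ctime=$(zkCli.sh -server "$ZK" stat "/hbase/unassigned/$region" 2>/dev/null \
            | awk -F' = ' '/^ctime/ {print $2}')
  created=$(date -d "$ctime" +%s 2>/dev/null) || continue
  age=$(( now - created ))
  if [ "$age" -gt "$MAX_AGE_SECS" ]; then
    echo "WARNING: region $region has been unassigned for $(( age / 3600 )) hours"
  fi
done

Even a crude threshold like this would have flagged our 55-day-old ephemeral node on the first run.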

I’m not sure which we’ll do. But what an interesting and shocking ride through HBase. Again.

Travis Campbell
Staff Systems Engineer at ghostar
Travis Campbell is a seasoned Linux Systems Engineer with nearly two decades of experience, ranging from dozens to tens of thousands of systems in the semiconductor industry, higher education, and high volume sites on the web. His current focus is on High Performance Computing, Big Data environments, and large scale web architectures.