Hadoop distcp network failures with WebHDFS

… or why do I get “Cannot assign requested address” errors?! At some point or another, every Hadoop Operations person will have to copy large amounts of data from one cluster to another. This is a trivial task thanks to hadoop distcp, but it is not without its quirks and issues. I will discuss a […]

Google Chrome, SPNEGO, and WebHDFS on Hadoop

I’ve previously noted that we’re using Kerberos to handle authentication on our Hadoop clusters. One feature that we previously lacked, because of configuration issues, was the ability to use WebHDFS to browse around the cluster. With our latest cluster, we figured out the right incantation of Kerberos and SPNEGO configurations […]

Oozie Install, why do you hate me?

We’ve been slowly migrating towards managing our Hadoop infrastructure with Cloudera Manager (CM). Our latest cluster is entirely managed via CM, enabling us to easily wire up features that we previously had no need for.  One of the new features we wanted to work with was Oozie. No problem, right?  The process is pretty simple. […]

5-whys at Hubspot: an Introspective response

Today I ran across “Post mortems at Hubspot: What I learned from 250 Whys”. It is a good review of Hubspot’s experience using 5-whys to facilitate post-mortems. The part that most caught my eye was the idea that “slow down” probably should not be the initial response to development velocity and mistakes if you don’t also consider the cost […]

Treat your Hadoop nodes like cattle

I’ve built compute clusters of various sizes, from hundreds to tens of thousands of systems, for almost two decades now. One of the things I learned early on is that, for compute clusters, you want to treat each system as cookie-cutter as possible. By that, I mean there should be a minimal set of differences […]

Verify Hadoop Cluster node health with Serverspec

One of the biggest challenges I have running Hadoop clusters is constantly validating that the health and well-being of the cluster meets my standards for operation.  Hadoop, like any large software ecosystem, is composed of many layers of technologies, starting from the physical machine, up into the operating system kernel, the distributed filesystem layer, the […]

Transparent Huge Pages on Hadoop makes me sad.

Today I (re)learned that I should pay attention to the details of Linux kernel bugs associated with my favorite distribution. Especially if I’m working on CentOS 6/Red Hat Enterprise Linux (RHEL) 6 nodes running Transparent Huge Pages on Hadoop workloads. I was investigating some performance issues on our largest Hadoop cluster related to Datanode I/O […]

What’s in my datacenter tool kit?

Every Operations person or datacenter (DC) junkie that I know has a datacenter tool kit of some sort, containing their favorite bits of gear for doing work inside the cold, lonely world of the datacenter. Now, one would like to think that each company stocks the right tools for their folks to work, but tools […]

HBase Motel: SPLITS check in but don’t check out

In HBase, the Master process will periodically call for the splitting of a region if it becomes too large. Normally, this happens automatically, though you can manually trigger a split. In our case, we rarely do an explicit region split by hand.

A new Master SPLIT behavior: let’s investigate

We have an older HBase cluster […]

Restarting HBase Regionservers using JSON and jq

We run HBase as part of our Hadoop cluster. HBase sits on top of HDFS and is split into two parts: the HBase Master and the HBase Regionservers. The master coordinates which regionservers are in control of each specific region.

Automating Recovery Responses

We periodically have to do some minor maintenance and upkeep, including restarting […]