
Verify Hadoop Cluster node health with Serverspec

One of the biggest challenges I have running Hadoop clusters is constantly validating that the health and well-being of the cluster meet my standards for operation. Hadoop, like any large software ecosystem, is composed of many layers of technology, starting at the physical machine and moving up through the operating system kernel, the distributed filesystem layer, the compute layer, and beyond. As you add more and more complexity to the cluster, it becomes more and more necessary to verify the ongoing state of the system. Hadoop health must be checked at every layer if your sanity is to stay intact. Otherwise, you will find yourself putting out small fire after small fire, wondering when the next cinder will burn your cluster down.

A healthy node is a happy node

Every node is sacred. Every node is great. If a node lay wasted, Hadoop gets quite irate. Ok, not really. Within a Hadoop cluster of any reasonable size, you don’t need to fixate on a single node. Hadoop can tolerate nodes coming and going. But it’s a waste of resources not to have as many nodes as possible up and running optimally.

Now, this isn’t specifically geared towards monitoring, but you could use the same methodology (and even the same tools) to do your monitoring.

The first thing we’re doing here is defining a set of test cases that describe what you declare to be healthy, and then running those tests to validate the expected outcome. This is very similar to what you would find in test-driven development (TDD) practices. More specifically, this is called behavior-driven development (BDD), an outgrowth of TDD. In my case, the automation code is the development base. I use Serverspec as the ruleset for defining the tests and what their valid output should be.

Behavior driven development?

I won’t go into great detail about behavior driven development (BDD); there are many good articles about this.  BDD has a few principles behind it.

  1. When you define a test, it should be in terms of the expected behavior of the software.
  2. Write behaviors as a simple narrative. This is known as Given-When-Then.
    1. For example: Given a Hadoop Datanode, when the Datanode process is present, then it should be listening on port 50075 (sketched as a test just after this list).
  3. Behaviors define the business logic you’re trying to validate.  Validate the behavior, validate the code.
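
As a sketch, that narrative maps almost one-to-one onto an executable test (using Serverspec, which is introduced below; the service name hadoop-hdfs-datanode is just one common packaging of the Datanode):

# Given a Hadoop Datanode...
describe service('hadoop-hdfs-datanode') do
  # ...when the Datanode process is present...
  it { should be_running }
end

# ...then it should be listening on port 50075.
describe port(50075) do
  it { should be_listening }
end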

Simple? Good. Now, on to Serverspec.

What is Serverspec?

Serverspec is a BDD framework written on top of Ruby’s RSpec. Its primary function is to allow you to write your tests in RSpec and validate the physical behavior of servers. It is simple to install:

$ sudo gem install serverspec

It is simple to write tests for. For example, you can use it to validate that a particular filesystem is mounted:

describe file('/') do
  it { should be_mounted }
end

or that a kernel parameter is set:

describe 'Linux kernel parameters' do
  context linux_kernel_parameter('net.ipv4.tcp_syncookies') do
    its(:value) { should eq 1 }
  end
end

or even that a port is open and in the LISTEN state:

describe port(80) do
  it { should be_listening }
end

You can use it standalone or with your favorite Ruby-based automation tool, such as Puppet or Chef.
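
To run checks like these locally on a node, a minimal spec_helper is enough. This is a sketch that assumes you execute the tests directly on the host being checked (the :exec backend) rather than over SSH:

# spec/spec_helper.rb -- minimal setup for running checks on the local node
require 'serverspec'

# Execute commands directly on this host; use the :ssh backend to test remotely.
set :backend, :exec

Each spec file then just needs a require 'spec_helper' at the top and can be run with rspec.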

Using Serverspec to check Hadoop node health

So now that you have a very basic understanding of Serverspec, let’s talk about how it helps us validate a node’s health. Ordinarily, you don’t care how an individual node is doing within Hadoop; you care about the overall cluster health and performance. But over time, if you’re not paying attention to nodes as they drop off, you’ll find that the efficiency of your cluster drops with them. You’ll want a standard way of validating that the physical attributes of each node are as you expect.

In my clusters, I want to validate that several things are always correctly set. While automation tools like Puppet help a lot, we’ve run into situations where the runtime state definitely did not match what the configuration said it should be.

For this discussion, I will be focusing on the compute nodes in the cluster: the nodes that run the Datanode process, the Tasktracker, the HBase Regionserver, and so on. The sketch after the list shows how a few of these expectations translate into Serverspec tests.

  • Every node runs the Datanode process and should be configured to start on boot. The process should always be running and should be listening on port 50075.
  • Every node runs the Tasktracker process and should be configured to start on boot.  The process should always be running and should be listening on port 50060.
  • Every node runs the Regionserver process and should be configured to start on boot. The process should always be running and should be listening on port 60020.
  • /etc/hosts should contain a definition for localhost and the fully qualified domain name (and short name) for this host.
  • Every node should have twelve separate mount points for the HDFS filesystems. Each filesystem should be ext4, with specific mount options and specific top-level file ownerships.
  • Each node should have specific kernel configurations or settings for performance tuning or fault recovery:
    • the kernel’s kernel.panic setting configured for 300 seconds (to work around a specific type of hardware fault we encounter).
    • vm.swappiness should be set to zero to prevent swapping from occurring.
    • Disable the intel_idle driver so the maximum CPU C-state is zero.
    • Disable Transparent Huge Pages
  • Every node should have a bonded network device using Link Aggregation Control Protocol (LACP).  It should contain specific physical interfaces.  Each interface should be at least 1Gbps.
  • Every node should run a Link Layer Discovery Protocol (LLDP) daemon so we have an easier time determining which switch port(s) the node is connected to.
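
As a sketch, here is how a few of the behaviors above translate into Serverspec tests. The service names, mount points, and values follow the list, but treat them as assumptions to adjust for your own layout:

# Datanode: enabled at boot, running, and listening on its HTTP port.
describe service('hadoop-hdfs-datanode') do
  it { should be_enabled }
  it { should be_running }
end

describe port(50075) do
  it { should be_listening }
end

# HDFS mount points: mounted as ext4 and owned by root.
%w[/hdfs/01 /hdfs/02].each do |mount|    # expand to all twelve in practice
  describe file(mount) do
    it { should be_mounted.with(type: 'ext4') }
    it { should be_owned_by 'root' }
    it { should be_grouped_into 'root' }
  end
end

# Kernel settings for fault recovery and performance.
describe linux_kernel_parameter('kernel.panic') do
  its(:value) { should eq 300 }
end

describe linux_kernel_parameter('vm.swappiness') do
  its(:value) { should eq 0 }
end

# LLDP daemon, so we can map nodes to switch ports.
describe service('lldpd') do
  it { should be_enabled }
  it { should be_running }
end

The bonding, interface speed, and Transparent Huge Page checks are written in the same style using Serverspec’s bond, interface, and file resources; you can see their results in the sample output further down.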

Tying Serverspec into Monitoring

All of these behaviors define the minimum required set of features that a system should be running with before we release it into the cluster. Puppet handles setting most (if not all) of these things. Combining Puppet and Serverspec, I now have an easily expandable set of smoke tests for adding new systems to a cluster. I also have the basis of a new monitoring framework to tack onto my monitoring system. Both activities overlap in requirements, but using Serverspec allows me to define the behaviors only once.

There are two ways that I would tie this into our monitoring tools and Hadoop.  The first way is to write a wrapper around Serverspec so that we alert on individual node health failures.  But that goes against what we really care about in Hadoop:  the cluster matters, not the node.  Monitor the cluster health, figure out when it drops below a certain threshold of performance, and then alert on that.

The second way is to inject this indirectly into the Hadoop cluster in a way that proactively prevents compute tasks from running on a bad server. Because Serverspec can tell us the current health state of a node, we can leverage the Tasktracker blacklist to let nodes shoot themselves in the head if they go bad for some reason.

To do this, you configure the Tasktracker to periodically run a script defined in mapred.healthChecker.script.path. The script’s output is one of two things: nothing, or a line beginning with ERROR. If it is ERROR, the node is flagged for blacklisting. Once in the blacklist, the Jobtracker stops sending MapReduce tasks to that node until the Tasktracker is restarted or the health check reports healthy again. The script needs to be fast and non-blocking, so what will probably work best is a small script, run by the Tasktracker, that reads a state file, plus a second script, run frequently from cron, that wraps the Serverspec tests and writes out the current health state.
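
As a sketch of that split, assuming the state file lives at /var/tmp/node_health and the test suite is checked out under /opt/hadoop-serverspec (both hypothetical paths), the two pieces might look like this:

# update_node_health.rb -- run frequently from cron; executes the Serverspec
# suite and records the result for the fast reader script below.
require 'open3'

_out, _err, status = Open3.capture3('rspec', chdir: '/opt/hadoop-serverspec')
File.write('/var/tmp/node_health', status.success? ? 'OK' : 'FAIL')

# node_health_check.rb -- the script pointed at by mapred.healthChecker.script.path.
# It only reads the state file, so it stays fast and non-blocking.
begin
  state = File.read('/var/tmp/node_health').strip
rescue Errno::ENOENT
  state = 'UNKNOWN'   # cron hasn't written a state yet; treat as unhealthy
end

# The Tasktracker blacklists the node when a line of output starts with ERROR.
puts "ERROR node failed health checks (#{state})" unless state == 'OK'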

You can read more about this in the Hadoop configuration documentation.

Where to go from here?

Right now, I have the start of the Serverspec tests written for the business logic I defined above.  I’m still determining the best way to break things up so that I can utilize common code across both Hadoop and non-Hadoop parts of my environment.  Right now, I’m primarily targeting the work for server acceptance testing of my new clusters, but I intend to move towards using it as part of our overall health monitoring at some point.

You can see the code and play with it yourself.

https://github.com/hcoyote/hadoop-serverspec

$ git clone https://github.com/hcoyote/hadoop-serverspec
$ cd hadoop-serverspec
$ rspec

HDFS Filesystems
 File "/hdfs/01"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/02"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/03"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/04"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/05"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/06"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/07"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/08"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/09"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/10"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/11"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755
 File "/hdfs/12"
  should be mounted
  should be owned by "root"
  should be grouped into "root"
  should be mode 755

Hadoop Datanode Daemon
 Package "hadoop-hdfs-datanode"
  should be installed
 Service "hadoop-hdfs-datanode"
  should be enabled
  should be running
 Port "50020"
  should be listening

Hadoop requirements to run
 File "/etc/hosts"
  should contain /127.0.0.1 localhost localhost.localdomain/
  should contain "hdn20.example.net"
 Host "127.0.0.1"
  should be resolvable
 Host "localhost"
  should be resolvable
 Host "hdn20.example.net"
  should be resolvable

Hadoop Tasktracker Daemon
 Package "hadoop-0.20-mapreduce-tasktracker"
  should be installed
 Service "hadoop-0.20-mapreduce-tasktracker"
  should be enabled
  should be running

Hardware Configuration
 Linux kernel parameter "kernel.panic"
  value
   should eq 300

HBase Regionserver
 Package "hbase-regionserver"
  should be installed
 Service "hbase-regionserver"
  should be enabled
  should be running
 Port "60030"
  should be listening

Network Configuration
 Physical Network
  Interface "em1"
   should exist
   speed
    should eq 1000
  Interface "em2"
   should exist
   speed
    should eq 1000
  Interface "p3p1"
   should exist
   speed
    should eq 1000
 LACP
  Bond "bond0"
   should exist
   should have interface "em1"
   should have interface "em2"
   should have interface "p3p1"
 LLDP
  Package "lldpd"
   should be installed
  Service "lldpd"
   should be enabled
   should be running

CPU Kernel Configurations for Max Performance
 File "/proc/cmdline"
  content
   should match /intel_idle.max_cstate=0?/
 Linux kernel parameter "vm.swappiness"
  value
   should eq 0

Transparent Huge Page Support
 File "/sys/kernel/mm/transparent_hugepage/defrag"
  content
   should match /\[never\]/
 File "/sys/kernel/mm/transparent_hugepage/enabled"
  content
   should match /\[never\]/

Finished in 20.38 seconds (files took 0.89765 seconds to load)
82 examples, 0 failures
