
Google Chrome, SPNEGO, and WebHDFS on Hadoop

I’ve previously noted that we’re using Kerberos to handle authentication on our Hadoop clusters.  One feature we had previously gone without because of configuration issues was the ability to use WebHDFS to browse around the cluster.  With our latest cluster, we figured out the right incantation of Kerberos and SPNEGO configuration to make this work, validating it from Google Chrome and curl.  The Hadoop-side setup is reasonably well documented, so I won’t go into it in depth.  There are a few ways to get the browser side working.
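For orientation only, the WebHDFS side mostly boils down to handing the NameNode an HTTP service principal and keytab.  A minimal sketch of the relevant hdfs-site.xml properties (the property names come from the Apache WebHDFS documentation; the realm and keytab path here are placeholders for your environment):

<!-- hdfs-site.xml: SPNEGO for WebHDFS (sketch only; adjust for your distribution) -->
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- _HOST expands to the NameNode's FQDN at runtime -->
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.NET</value>
</property>
<property>
  <!-- keytab containing that HTTP principal; the path is a placeholder -->
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/etc/hadoop/conf/HTTP.keytab</value>
</property>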

Validating SPNEGO is working on WebHDFS

The easiest way to determine if SPNEGO is working on your cluster is to hit the WebHDFS path with curl.

First, make sure you’re authenticated to your Kerberos domain.

$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_1000)

$ kinit
Password for hcoyote@EXAMPLE.NET:

$ klist 
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: hcoyote@EXAMPLE.NET

Valid starting     Expires            Service principal
06/10/15 12:13:18  06/13/15 12:13:18  krbtgt/EXAMPLE.NET@EXAMPLE.NET
        renew until 06/17/15 12:13:18

Next, you will invoke curl with the negotiate option and the user set to anyUser.  This is a fake user, required only so that curl initializes its authentication code; the real user is determined as part of the Kerberos authentication process.  (The -v flag is what produces the verbose exchange shown below.)

$ curl -s -v --negotiate \
  -u:anyUser \
   http://namenode.prd.hdp.example.net:50070/webhdfs/v1/?op=LISTSTATUS

curl will send the initial request …

*   Trying 10.8.16.68...
* Connected to namenode.prd.hdp.example.net (10.8.16.68) port 50070 (#0)
> GET /webhdfs/v1/?op=LISTSTATUS HTTP/1.1
> User-Agent: curl/7.29.0
> Host: namenode.prd.hdp.example.net:50070
> Accept: */*

… and encounter an HTTP 401 from Jetty, requiring curl to send an authenticated request:

HTTP/1.1 401 Authentication required
Cache-Control: must-revalidate,no-cache,no-store
Date: Wed, 10 Jun 2015 19:00:12 GMT
Pragma: no-cache
Date: Wed, 10 Jun 2015 19:00:12 GMT
Pragma: no-cache
Content-Type: text/html; charset=iso-8859-1
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=/; Expires=Thu, 01-Jan-1970 00:00:00 GMT; HttpOnly
Content-Length: 1404
Server: Jetty(6.1.26.cloudera.4)

curl will then resend the request with the appropriate SPNEGO negotiation parameters enabled.

* Connection #0 to host namenode.prd.hdp.example.net left intact
* Issue another request to this URL: 'HTTP://namenode.prd.hdp.example.net:50070/webhdfs/v1/?op=LISTSTATUS'
* Found bundle for host namenode.prd.hdp.example.net: 0x71b050
* Re-using existing connection! (#0) with host namenode.prd.hdp.example.net
* Connected to namenode.prd.hdp.example.net (10.8.16.68) port 50070 (#0)
* Server auth using GSS-Negotiate with user ''
> GET /webhdfs/v1/?op=LISTSTATUS HTTP/1.1
> Authorization: Negotiate <random data>
> User-Agent: curl/7.29.0
> Host: namenode.prd.hdp.example.net:50070
> Accept: */*

The important thing to note is that the Authorization header is set to Negotiate, followed by a random-looking string (redacted above).  This string is the authentication token generated from the Kerberos data.  Now that we’re supplying the token data, Jetty responds with a WWW-Authenticate header containing its own Negotiate token.

HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Wed, 10 Jun 2015 19:00:12 GMT
Date: Wed, 10 Jun 2015 19:00:12 GMT
Pragma: no-cache
Expires: Wed, 10 Jun 2015 19:00:12 GMT
Date: Wed, 10 Jun 2015 19:00:12 GMT
Pragma: no-cache
Content-Type: application/json
WWW-Authenticate: Negotiate <second random string>
Set-Cookie: hadoop.auth="u=hcoyote&p=hcoyote@EXAMPLE.NET&t=kerberos&e=1433998812103&s=MVV9i0IEmSPwabHNNqoLOBJRaPE="; Path=/; Expires=Thu, 11-Jun-2015 05:00:12 GMT; HttpOnly
Transfer-Encoding: chunked
Server: Jetty(6.1.26.cloudera.4)

Additionally, Jetty sets the hadoop.auth cookie to make it easier to authenticate in the future.  This allows the web browser to pass a pre-authenticated token back and forth without incurring the additional delay of a full Kerberos authentication on every request.
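You can exercise the cookie from the command line as well.  As a rough sketch (the cookie jar path is arbitrary), curl can store the hadoop.auth cookie on the first request and replay it on subsequent ones, skipping the Negotiate round trip until the cookie expires:

# First request: authenticate via SPNEGO and save the hadoop.auth cookie
$ curl -s --negotiate -u:anyUser -c /tmp/hadoop.cookiejar \
   "http://namenode.prd.hdp.example.net:50070/webhdfs/v1/?op=LISTSTATUS"

# Subsequent requests: replay the cookie; no Kerberos exchange needed
$ curl -s -b /tmp/hadoop.cookiejar \
   "http://namenode.prd.hdp.example.net:50070/webhdfs/v1/?op=LISTSTATUS"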

A side-trip into your ticket cache

One thing you may notice after your first SPNEGO authentication is an additional HTTP entry in your Kerberos ticket cache.  This is the service ticket acquired during the negotiation process.

$ klist 
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: hcoyote@EXAMPLE.NET

Valid starting     Expires            Service principal
06/10/15 13:59:16  06/13/15 13:59:16  krbtgt/EXAMPLE.NET@EXAMPLE.NET
        renew until 06/17/15 13:59:16
06/10/15 14:00:12  06/13/15 13:59:16  HTTP/namenode.prd.hdp.example.net@EXAMPLE.NET
        renew until 06/17/15 13:59:16
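If you just want to watch that service ticket get acquired, MIT Kerberos ships a kvno utility that fetches a ticket for a named principal directly; requesting the NameNode’s HTTP principal should produce the same cache entry (the key version number reported will vary):

$ kvno HTTP/namenode.prd.hdp.example.net@EXAMPLE.NET
HTTP/namenode.prd.hdp.example.net@EXAMPLE.NET: kvno = 2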

So, now that we have verified our SPNEGO configuration is working, let’s move on to enabling Chrome.

All Chrome, No SPNEGO

When I originally set this up, I followed the pretty simple procedure for configuring Chrome’s SPNEGO support.  Under Linux, all you need to do is pass a few startup flags that build the whitelist of domain names Chrome is willing to send Negotiate credentials to.

Pretty easy.

$ google-chrome --auth-server-whitelist="*.hdp.example.net" \
    --auth-negotiate-delegate-whitelist="*.hdp.example.net"

But when I attempted this today, I found that no matter what I did, the WebHDFS access would fail:

[Screenshot: Chrome showing a WebHDFS authentication failure]

This previously worked a few weeks ago.  Digging around, I realized that Chrome on my workstation had been updated to version 43 today.  I ran across Chromium fails to Negotiate [with SPNEGO], where they note that as of Chrome/Chromium 41, the Negotiate options aren’t correctly enabled when passed via the command line.  Well, great.  Now what do I do?  I can’t tell people to downgrade to an older version of Chrome, because that introduces security risks into their personal environments.

One comment on the bug suggests that using Chrome policies to manage the SPNEGO whitelist is the workaround for now.

Enabling SPNEGO Policy Whitelisting in Chrome

So how do we do that?

Just as every person previously had to enable the command-line options, we now have to manage the policy on each machine where this option needs to be set.  The first step is to create the directory from which the policy file will be read at Chrome startup.

$ sudo mkdir -p /etc/opt/chrome/policies/managed

Then create the policy file.  This file is a JSON data structure that looks like this:

{ 
  "AuthServerWhitelist": "*.example.net",
  "AuthNegotiateDelegateWhitelist": "*.example.net"
}

Place the JSON in /etc/opt/chrome/policies/managed/spnego.json.  The name of the policy file appears to be unimportant.
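If you’re pushing this to a fleet of workstations, the whole thing scripts easily.  A sketch (the spnego.json name is arbitrary, as noted above):

$ sudo mkdir -p /etc/opt/chrome/policies/managed
$ sudo tee /etc/opt/chrome/policies/managed/spnego.json <<'EOF'
{
  "AuthServerWhitelist": "*.example.net",
  "AuthNegotiateDelegateWhitelist": "*.example.net"
}
EOF

Once Chrome restarts, chrome://policy should list AuthServerWhitelist and AuthNegotiateDelegateWhitelist as active policies.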

Now restart Chrome without the whitelisting command-line options.  When you view the WebHDFS URL, it should look like this, provided you’re correctly authenticated to your Kerberos domain.  (The contents of your HDFS directories will, of course, differ.)

[Screenshot: Chrome successfully listing HDFS directories via WebHDFS]
