
Pig ‘local’ mode fails when Kerberos auth is enabled.

I ran across this interesting Kerberos authentication bug today on Cloudera’s CDH4. It appears to affect all versions of pig, but only when running in local mode.

I wanted to run pig in local mode, which means pig fires up everything it needs to run the MapReduce job on your local machine without contacting any configured cluster resources. In today’s case, I was doing some one-off testing that should not have required any resources on the cluster. The system I ran this on is what I term a client node: it has all the configuration and software necessary to talk to the cluster, but runs no daemons. It is purely a launching point.
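For reference, the execution mode is chosen with pig’s -x flag; the script name here is just a placeholder:

# Default mapreduce mode: submits the job to the configured cluster
$ pig myscript.pig

# Local mode: runs the whole job on the local machine
$ pig -x local myscript.pig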

In this particular case, the pig ‘local’ mode job would get halfway through and then begin complaining about being unable to get a Kerberos TGT. This made no sense: we’re supposed to be running disconnected from the cluster. Digging around, I found that
PIG-3507 had been filed on this specific issue, but no workaround had been suggested and the provided patch had been reverted due to compatibility issues.

The bug report suggests the likely cause: pig reads in core-site.xml and, based on that, sets hadoop.security.authentication to kerberos. If that’s the case, then we should be able to override it simply by pointing the shell environment variable $HADOOP_CONF_DIR at a location without a Hadoop configuration in it.
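For reference, this is the property pig is picking up. On a secure cluster, grepping the client configuration (the path assumes a typical CDH layout) shows something like:

$ grep -A 1 'hadoop.security.authentication' /etc/hadoop/conf/core-site.xml
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>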

So, we did that.

$ HADOOP_CONF_DIR="/etc/hadoop/conf.empty" pig -x local

And it worked. In Cloudera configurations (at least the RPM-based ones), /etc/hadoop/conf.empty exists and contains nothing. If you don’t have it, you can create an empty directory, point $HADOOP_CONF_DIR at that, and your local pig runs should start working again.
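If you’re building that empty directory yourself, a minimal sketch looks like this (the directory name is arbitrary):

# Create an empty directory to stand in for the Hadoop client configs
$ mkdir -p "$HOME/hadoop-conf-empty"

# Point pig at it for this run only; your normal environment is untouched
$ HADOOP_CONF_DIR="$HOME/hadoop-conf-empty" pig -x local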

Oh, and here are the errors pig was throwing when we did NOT do this.

2014-05-21 15:13:29,846 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:agg (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2014-05-21 15:13:29,847 [main] WARN  org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo. Not retrying because failovers (15) exceeded maximum allowed (15)
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "test.example.net/192.168.1.123"; destination host is: "namenode.example.net":8020; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:763)
    at org.apache.hadoop.ipc.Client.call(Client.java:1241)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy10.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:629)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1545)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:820)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1378)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
    at org.apache.pig.PigServer.storeEx(PigServer.java:933)
    at org.apache.pig.PigServer.store(PigServer.java:900)
    at org.apache.pig.PigServer.openIterator(PigServer.java:813)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:604)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:581)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:545)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:629)
    at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:252)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1290)
    at org.apache.hadoop.ipc.Client.call(Client.java:1208)
    ... 34 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:137)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:457)
    at org.apache.hadoop.ipc.Client$Connection.access$1400(Client.java:252)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:622)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:619)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:618)
    ... 37 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:130)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:106)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:172)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:209)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:195)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
    ... 46 more
2014-05-21 15:13:29,852 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.