
Pig ‘local’ mode fails when Kerberos auth is enabled.

I ran across this interesting Kerberos authentication bug today on Cloudera’s CDH4. It appears to affect all versions of pig, but only when running in local mode.

I wanted to run pig in local mode. In local mode, pig runs the entire MapReduce job on your local machine without contacting any configured cluster resources. In today's case, I was doing some one-off testing that should not have required any resources on the cluster. The system I ran this on is what I term a client node: it has all the configuration and software necessary to talk to the cluster, but runs no daemons. It is purely a launching point.

In this particular case, the pig ‘local’ mode job would get halfway through and then begin complaining about being unable to get a Kerberos TGT. This made no sense: we were supposed to be running disconnected from the cluster. Digging around, I found that PIG-3507 had been filed on this specific issue, but no workaround had been suggested, and the provided patch had been reverted due to compatibility issues.
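As a quick sanity check before blaming pig, you can confirm whether the shell actually has a TGT cached. A small sketch using klist from MIT Kerberos (the -s flag exits silently with a non-zero status when no valid credentials are cached):

```shell
# Check for a valid Kerberos TGT in the default credential cache.
# `klist -s` prints nothing and exits non-zero if the cache is empty,
# expired, or missing (or if klist itself is not installed).
if klist -s 2>/dev/null; then
    msg="TGT present"
else
    msg="no valid TGT in the credential cache"
fi
echo "$msg"
```

In our case there was no TGT at all, which is exactly why a job that should never have needed Kerberos was failing.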

The bug report suggests that the problem is caused by pig reading in core-site.xml and, based on it, setting hadoop.security.authentication to kerberos. If that is the case, we should be able to override it simply by pointing the $HADOOP_CONF_DIR shell environment variable at a directory containing no Hadoop configuration.
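For reference, this is the setting in question as it typically appears in a secured cluster's core-site.xml (the property name is Hadoop's real one; the comment describing the file location reflects the usual CDH layout):

```xml
<!-- /etc/hadoop/conf/core-site.xml on a Kerberos-secured cluster -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
```

When pig's local mode picks this file up, it dutifully tries to authenticate via Kerberos even though it never needed to talk to the cluster.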

So, we did that.

$ HADOOP_CONF_DIR="/etc/hadoop/conf.empty" pig -x local

And it worked. In Cloudera configurations (at least, the RPM-based ones), /etc/hadoop/conf.empty exists and is empty. If you don't have that directory, create an empty one, point $HADOOP_CONF_DIR at it, and your local pig runs should start working again.
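Putting the workaround together as a small sketch (the mktemp fallback is my addition; any empty directory will do):

```shell
# Prefer Cloudera's shipped empty config dir; otherwise make a throwaway one.
conf_dir=/etc/hadoop/conf.empty
if [ ! -d "$conf_dir" ] || [ -n "$(ls -A "$conf_dir" 2>/dev/null)" ]; then
    conf_dir=$(mktemp -d)   # any empty directory works
fi
export HADOOP_CONF_DIR="$conf_dir"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
# With that in place, local runs stop reading the cluster's core-site.xml:
#   pig -x local yourscript.pig
```

The guard also falls back to a temp directory if conf.empty exists but someone has dropped files into it, since any configuration found there would defeat the point.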

Oh, and here are the errors pig was throwing when we did NOT do this.

2014-05-21 15:13:29,846 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:agg (auth:KERBEROS) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2014-05-21 15:13:29,847 [main] WARN  org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo. Not retrying because failovers (15) exceeded maximum allowed (15)
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "test.example.net/192.168.1.123"; destination host is: "namenode.example.net":8020; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:763)
    at org.apache.hadoop.ipc.Client.call(Client.java:1241)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy10.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:629)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at $Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1545)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:820)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1378)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
    at org.apache.pig.PigServer.storeEx(PigServer.java:933)
    at org.apache.pig.PigServer.store(PigServer.java:900)
    at org.apache.pig.PigServer.openIterator(PigServer.java:813)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:604)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:581)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:545)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:629)
    at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:252)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1290)
    at org.apache.hadoop.ipc.Client.call(Client.java:1208)
    ... 34 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:137)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:457)
    at org.apache.hadoop.ipc.Client$Connection.access$1400(Client.java:252)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:622)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:619)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:618)
    ... 37 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:130)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:106)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:172)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:209)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:195)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
    ... 46 more
2014-05-21 15:13:29,852 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
