in Hadoopery

Oozie Install, why do you hate me?

We’ve been slowly migrating towards managing our Hadoop infrastructure with Cloudera Manager (CM). Our latest cluster is entirely managed via CM, enabling us to easily wire up features that we previously had no need for.  One of the new features we wanted to work with was Oozie.

No problem, right?  The process is pretty simple.

  1. Login to CM.
  2. Select your cluster.
  3. In the Actions menu, select Add a Service.
  4. In the Add a Service wizard, select Oozie and click Continue.
  5. You should be prompted for some service dependencies if they aren’t already installed.  In our case, we were prompted for
    1. HDFS
    2. YARN
    3. Zookeeper
  6. Click Continue.
  7. Assign the Oozie Server role to a node and click Continue.
    1. We selected a NameNode.
  8. Setup the database connection.

… and at this point, we hit the problem.

In our environment, we’ve standardized on MySQL with master/slave replication configurations for availability reasons.  We configured Oozie Server to use a DB connection on one of our MySQL clusters, only to discover that CM fails at this stage with this error:

User can’t run DDL statements on the specified database. Attempt to create and drop a table failed.

Wait, what?

The game is afoot, Watson!

With no other useful output than that, we begin double checking everything. We verify that the CM node can talk to the database server as the configured user.  We verify that the NameNode running the Oozie Server can talk to the database server as the configured user.  We verify that the configured user has full privileges on the configured database.  We even throw SUPER privileges on the user just to rule that out, even though the documentation doesn’t note any special privileges.

No dice.  The only thing we can do now is start digging into the CM installation process for Oozie to determine what sort of MySQL schema operations it’s doing as part of the connection test.

The first thing we need to do is look at the CM logs.  By default, this dumps things into /var/log/cloudera-scm-server/cloudera-scm-server-log. We notice that every time we re-run the connection test, the following gets dumped to the log.

2015-06-04 16:49:42,850 INFO 1138375412@scm-web-20131:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing host command OozieTestDatabaseConnection BasicCmdArgs{args=[namenode.prod.hdp.example.net, mysql, dbs1.example.net:3301, cdhprodooziedb, oozie_user, somepassword]}. Host: DbHost{id=55, hostId=9f2bdac3-ae16-4048-a26e-fda9503a3816, hostName=namenode.prod.hdp.example.net}
2015-06-04 16:49:49,694 INFO CommandPusher:com.cloudera.cmf.service.AbstractOneOffHostCommand: Unsuccessful 'OozieTestDatabaseConnection'
2015-06-04 16:49:49,695 INFO CommandPusher:com.cloudera.cmf.service.AbstractDbConnectionTestCommand: Command exited with code: 9

Ok.  That’s almost helpful.  At least we know it’s exiting non-zero, indicating an error.  It also spits out some more specific debug information about the command that was executed.

+ OPENJAVA8_HOME_CANDIDATES=('/usr/lib/jvm/java-1.8.0-openjdk' '/usr/lib/jvm/java-8-openjdk')
+ local OPENJAVA8_HOME_CANDIDATES
+ MISCJAVA_HOME_CANDIDATES=('/Library/Java/Home' '/usr/java/default' '/usr/lib/jvm/default-java' '/usr/lib/jvm/java-openjdk' '/usr/lib/jvm/jre-openjdk')
+ local MISCJAVA_HOME_CANDIDATES
+ case ${BIGTOP_JAVA_MAJOR} in
+ JAVA_HOME_CANDIDATES=(${JAVA7_HOME_CANDIDATES[@]} ${JAVA6_HOME_CANDIDATES[@]} ${JAVA8_HOME_CANDIDATES[@]} ${MISCJAVA_HOME_CANDIDATES[@]} ${OPENJAVA7_HOME_CANDIDATES[@]} ${OPENJAVA6_HOME_CANDIDATES[@]} ${OPENJAVA8_HOME_CANDIDATES[@]})
+ '[' -z /usr/java/jdk1.7.0_55_x64 ']'
+ verify_java_home
+ '[' -z /usr/java/jdk1.7.0_55_x64 ']'
+ echo JAVA_HOME=/usr/java/jdk1.7.0_55_x64
+ source_parcel_environment
+ '[' '!' -z '' ']'
+ JAVA=/usr/java/jdk1.7.0_55_x64/bin/java
+ [[ -z /usr/share/cmf ]]
+ MGMT_CLASSPATH='/usr/share/cmf/lib/*'
+ JDBC_JARS=/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar
+ MGMT_CLASSPATH='/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/*'
++ pwd
+ MGMT_CLASSPATH='/var/run/cloudera-scm-agent/process/5006-OOZIE.OOZIE_SERVER-test-db-connection:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/*'
+ MGMT_JAVA_OPTS='-Djava.net.preferIPv4Stack=true '
+ exec /usr/java/jdk1.7.0_55_x64/bin/java -Djava.net.preferIPv4Stack=true -Djava.security.egd=file:///dev/urandom -cp '/var/run/cloudera-scm-agent/process/5006-OOZIE.OOZIE_SERVER-test-db-connection:/usr/share/java/mysql-connector-java.jar:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/cmf/lib/*' com.cloudera.enterprise.dbutil.DbCommandExecutor db.properties

That looks promising. We can see that CM is invoking a DbCommandExecutor java class to do the testing. But, where is it executing this from? What of the things I’ve learned while poking around the file system layout for CM is that all run-time state is kept in either /var/run/cloudera-scm-agent or /var/run/cloudera-scm-server, whether it’s config files, Kerberos keytabs, or other related information.

In this case, we can see that a path is being added to the CLASSPATH environment variable showing the run-time directory for this DbCommandExecutor instance. Looking on the CM server, we find nothing there. We next checked the NameNode where the Oozie Server was intended to run on and find a recent copy of that run in /var/run/cloudera-scm-agent/process/5006-OOZIE.OOZIE_SERVER-test-db-connection.  Inside this directory is a logs/stderr.log and logs/stdout.log containing the output of the command. The stderr.log didn’t show anything useful; it was just another dump of shell script output.

The stdout.log put us on the right track. It showed an actual MySQL error.

Thu Jun  4 16:49:47 CDT 2015
JAVA_HOME=/usr/java/jdk1.7.0_55_x64
[2015-06-04 16:49:47,521] INFO     0[main] - com.cloudera.enterprise.dbutil.DbCommandExecutor.testDbConnection(DbCommandExecutor.java:237) - Successfully connected to database.
[2015-06-04 16:49:47,553]ERROR    32[main] - com.cloudera.enterprise.dbutil.DbCommandExecutor$DbConnectionTestException.logError(DbCommandExecutor.java:514) - Unable to create/drop a table.
java.sql.SQLException: CREATE TABLE ... SELECT is forbidden when @@GLOBAL.ENFORCE_GTID_CONSISTENCY = 1.
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2618)
    at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1749)
    at com.mysql.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1666)
    at com.cloudera.enterprise.dbutil.DbCommandExecutor.runDdlStatementPair(DbCommandExecutor.java:337)
    at com.cloudera.enterprise.dbutil.DbCommandExecutor.testIfUserCanCreateTable(DbCommandExecutor.java:313)
    at com.cloudera.enterprise.dbutil.DbCommandExecutor.testIfUserCanRunDdl(DbCommandExecutor.java:279)
    at com.cloudera.enterprise.dbutil.DbCommandExecutor.testDbConnection(DbCommandExecutor.java:238)
    at com.cloudera.enterprise.dbutil.DbCommandExecutor.main(DbCommandExecutor.java:134)
[2015-06-04 16:49:47,556]ERROR    35[main] - com.cloudera.enterprise.dbutil.DbCommandExecutor$DbConnectionTestException.logError(DbCommandExecutor.java:518) - Exiting with exit code 9

When you have eliminated the impossible …

This is our eureka moment and where it all begins to make sense. In our environment, we have begun implementing a new feature in recent versions of MySQL called the Global Transaction ID (GTID). GTID is a way to create a globally unique ID associated with each transaction that occurs.  It’s primary benefit is to improve the way that replication occurs between a master and it’s slaves.  This helps make it easier to do failovers between servers and ensure that the state on the new master is equivalent to the state on the old master.

One of the caveats of GTID being turned on is that certain operations are forbidden for consistency reasons, so they get disabled.  See that SQLException? With GTID, you may no longer use CREATE TABLE ... SELECT queries, which the Oozie setup attempts to do.

At this point, the only solution we have is to disable GTID on this set of MySQL servers.  Not a big deal, but it might be in other environments if you’re sharing MySQL clusters across multiple use cases.

The long way around the mountain

As implied before, CM is pretty new for our environment, so we’re still figuring out how it works and what it’s limitations are.  The process I used for finding the log files ended up being the long way around the mountain.  While writing all this up, I figured out that I could have found this information via the CM Recent Command logging.

In CM, you can

  1. Click the Home link.
  2. Click the All Recent Commands link.
  3. Look for Test Database Connection for Oozie Server.
    1. You might need to show more than the last 40 commands.

So, I could have found this out much more quickly had I been aware of that.  Now I know.  And knowing is half the Hadoop Battle.

Travis Campbell
Staff Systems Engineer at ghostar

Travis Campbell is a seasoned Linux Systems Engineer with nearly two decades of experience, ranging from dozens to tens of thousands of systems in the semiconductor industry, higher education, and high volume sites on the web. His current focus is on High Performance Computing, Big Data environments, and large scale web architectures.