The Big SQL service check runs a small set of basic SQL statements to verify the health of the installation.¬† The service check is run automatically at the end of install, but you can also run it manually from the Big SQL service menu.
In this article we will walk through a service check failure and attempt to recover from it.¬† You may have encountered a service check failure that does not match the one described on this page, but hopefully the information here will help you investigate the problem.
When we encounter a service check failure, this is what we see after running the service check from the Big SQL service menu.
If we click on the ‘BigInsights – Big SQL Service Check’ link we see the host and then the operation summary.¬† Click through each one to the Ambari stderr and stdout logs.¬† The service check typically lists individual SQL statement errors in stdout, and if we scroll down to the first error, this is what we see:
create hadoop table smoke_1006105712(c1 integer, c2 integer) DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned: SQL5105N The statement failed because a Big SQL component encountered an error. Component receiving the error: "". Component returning the error: "HIVE". Log entry identifier: "[BSL-0-592d356ed]". SQLSTATE=58040
SQL5105N is a generic DB2 error that is reported when an error occurs in a Big SQL component.¬† In this case we can see the error was returned from Hive with a log entry identifier of ‘BSL-0-592d356ed’.
Big SQL logs are found in /var/ibm/bigsql/logs.¬† The ‘BSL-0’ prefix tells us that the log entry is in the file bigsql.log on node number 0.¬† This is our head node.¬† If you aren’t sure which host a particular node was created on you can check sqllib/db2nodes.cfg in your Big SQL user home directory.¬† In my small cluster node 0 is on host bdavm488.
# cat ~bigsql/sqllib/db2nodes.cfg 0 bdavm488.svl.ibm.com 0 1 bdavm307.svl.ibm.com 0
We go to host bdavm488 and search the file /var/ibm/bigsql/logs/bigsql.log for log entry ‘BSL-0-592d356ed’.
Author’s Note: we could have used the SYSHADOOP.LOG_ENTRY function to extract the log entry from the log file.
Here it is, with a truncated stack:
2016-06-27 10:06:11,821 WARN com.ibm.biginsights.catalog.translator.hive.HiveExceptionTranslator [Thread-4] : [BSL-0-592d356ed]: FAILED: SemanticException MetaException(message:org.apache.hadoop.security.AccessControlException: Permission denied: user=bigsql, access=EXECUTE, inode="/apps/hive/warehouse/bigsql.db":hive:hadoop:drwxrwx--- at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:320) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720) at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3851) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1011) ...
From this error we can tell:
- The user bigsql does not have execute permission on /apps/hive/warehouse/bigsql.db (Permission denied: user=bigsql, access=EXECUTE, inode=”/apps/hive/warehouse/bigsql.db”).
- I’m not sure if this is the intent of the log entry, but it may be suggesting that the user needs to be a member of the ‘hadoop’ group to have execute access (‘:hive:hadoop:drwxrwx—‘).
- The bottom-most entry in the truncated stack tells us the HDFS NameNode is performing the permission check.
This gives us a few things to look at, each corresponding to an item from the list above.¬† We should:
- Check the Big SQL user on each host, to see which groups it belongs to.
- Try to verify that it is hadoop group membership which is needed to have execute permission on /apps/hive/warehouse/bigsql.db.
- Try to assess the health of the HDFS NameNode component.
- Search the internet to see if anybody else has hit this problem.¬† This would probably be first in our list, but let’s try to figure it out on our own.
Check the Big SQL user on each host
This isn’t too bad because the cluster I am using only has four hosts.
bdavm306: id: bigsql: no such user bdavm307: uid=2824(bigsql) gid=1006(hadoop) groups=1006(hadoop),100(users),984(hive),43210(bigsqladm) bdavm397: uid=2824(bigsql) gid=1022(hadoop) groups=1022(hadoop),100(users),984(hive),43210(bigsqladm) bdavm488: uid=2824(bigsql) gid=1006(hadoop) groups=1006(hadoop),100(users),983(hive),43210(bigsqladm)
Three of the four hosts have a Big SQL user, and all three of those have hadoop as the primary group ID.¬† On host bdavm397 the hadoop group has a different group ID compared to the other two hosts.¬† At this point we aren’t going to conclude any of this is a problem.¬† We will just make note of it and move on.
Try to verify it is hadoop group membership we need
The Big SQL user is not able to list the permissions of the directory, though in trying to do so we discovered a quick way to reproduce the error.
$ hadoop fs -ls -d /apps/hive/warehouse/bigsql.db ls: Permission denied: user=bigsql, access=EXECUTE, inode="/apps/hive/warehouse/bigsql.db":hive:hadoop:drwxrwx---
This error occurred on bdavm397 and bdavm488, two hosts where the Big SQL user is in the hadoop group, but the group has a different ID on each.¬† I’m not sure what to make of this.¬† We are a member of the hadoop group, so either it isn’t the hadoop group that we need to be a member of, or something else is wrong.¬† Let’s continue.
Assess the health of the HDFS NameNode component
In the main Ambari UI we can click on the HDFS service to get a summary of each component.
The NameNode component is started.¬† Another component called SNameNode is stopped.¬† If we hover our mouse over each component we can see which host they are on.
The NameNode is on bdavm306 and the SNameNode is on bdavm307.¬† Going back in our investigation, we saw that host bdavm306 did not have a bigsql user.¬† Host bdavm307 does have a bigsql user, but maybe having SNameNode stopped is making it unable to perform the permission check, or maybe SNameNode is only used when NameNode is not operational.
In any case, we have a few different things we can try.¬† Should we check the internet first?¬† I don’t want to waste all the effort we’ve put into this, so I will leave that exercise to you if you’re at all curious.¬† It will be amusing if this post comes up at the top of the search results.
I see two alternatives, neither of which is particularly disruptive, so let’s try both.¬† First we can try starting the SNameNode, then rerun the service check.
Well, that wasn’t it.¬† At this point we should check the new service check failure to make sure it is the same as the old one… it is.
The other alternative is to create the Big SQL user on bdavm306 with hadoop as the primary group.¬† You may have noticed earlier that we had the Big SQL user on three hosts even though Big SQL only had components on two of the three.¬† Maybe the creation of the user on bdavm306 failed, or somebody accidentally removed the user.
We can create the Big SQL user with the useradd command.¬† Here is the command I used to create it with the same group membership seen on the other hosts:
# useradd bigsql -g hadoop --groups hadoop,users,hive,bigsqladm
Now let’s rerun the service check… voila!
We can conclude the Big SQL user must exist on the HDFS NameNode host.¬† Problem solved.¬† Good work, team!
The scenario we just covered ran the service check manually from the Big SQL service page in the Ambari UI.¬† As mentioned earlier, the service check is also run at the end of install.¬† Here are some images to show you a service check failure during install.¬† The process of investigation would be the same as the one we just went through.¬† We can exit the Add Service Wizard when an install service check fails, since the check is run post-install and only if Ambari thinks the service installed successfully.
Click the ‘Warnings encountered’ link and we see the following:
Click the ‘Check BigInsights – Big SQL’ link to view the Ambari stderr and stdout logs.¬† Happy hunting!