A prerequisite check is run on the Big SQL head host and every worker host early in the install process.  If your install fails within a few minutes and the Add Service Wizard screen looks something like this, you may be dealing with a precheck failure:

[Screenshot: Add Service Wizard showing the failed install]

Well, the good news is that precheck failures are among the easiest to diagnose and recover from. First let’s verify we are dealing with a precheck failure. Click the ‘Failures encountered’ link for each failing host (there may be more than one).

[Screenshot: failure details for the failing host, including the ‘Big SQL Head Install’ link]

A second link (in this example ‘Big SQL Head Install’) will take you to the Ambari stderr and stdout logs. Start by scrolling down to the bottom of the stderr output.

File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call tries=tries, try_sleep=try_sleep) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper result = _call(command, **kwargs_copy) File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call raise Fail(err_msg) resource_management.core.exceptions.Fail: Execution of 'sudo getent group hadoop' returned 1. sudo: sorry, you must have a tty to run sudo

Look at that!  I tried to force a precheck failure by adding requiretty to my sudoers file, and instead the install failed before the precheck was even called.  Only a handful of operations are performed before the precheck runs, and this is one of them.  You learn something new every day.
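
If you want to reproduce the symptom outside of Ambari, the same command can be run the way the agent runs it, in a session with no terminal attached. This is just a quick sanity check of my own, run from any machine that can SSH to the head host as root; adjust the host name for your cluster.

# Run the command from the stack trace without allocating a tty;
# with 'Defaults requiretty' in /etc/sudoers this fails with
# "sudo: sorry, you must have a tty to run sudo".
ssh -T root@bdavm306.svl.ibm.com 'sudo getent group hadoop'

# Forcing a tty with -t makes the same command succeed, which points
# at requiretty rather than at the sudo rule itself.
ssh -t root@bdavm306.svl.ibm.com 'sudo getent group hadoop'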

I will append an actual precheck failure to the end of this article, but for now let’s run with what we have. Thankfully the error is clear and points us to the root cause of the problem. I’m going to pretend I didn’t intentionally cause the install to fail and we can try to diagnose this as if it were an unexpected event. And how should we do that? To your favorite internet search engine!

A search for ‘Big SQL tty’ brings us right to the always helpful and up-to-date IBM Knowledge Center, which instructs me to edit sudoers and comment out the following line:

Defaults requiretty
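
One safe way to make that edit is with visudo, which validates the file before saving. This is a rough sketch assuming the default /etc/sudoers location; repeat it on the head host and every worker host.

# Find the offending line, if present, on this host.
sudo grep -n requiretty /etc/sudoers

# Edit with visudo so a syntax error cannot lock you out of sudo,
# and put a '#' in front of the 'Defaults requiretty' line.
sudo visudo

# Re-run the tty-less check from earlier (assumes you can SSH to the
# host as root); it should now succeed.
ssh -T root@bdavm306.svl.ibm.com 'sudo getent group hadoop'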

After that is done, I want to retry the install. Luckily, I am still in the Add Service Wizard and there is a ‘Retry’ button just above the list of hosts.

[Screenshot: the ‘Retry’ button above the host list in the Add Service Wizard]

I click the button and the install starts (almost) from scratch. With the tty problem fixed it seems destined to succeed. Let’s reboot a worker host and see what happens!

[Screenshot: Ambari reacting to the rebooted worker host]

Ambari recognized right away that something was up on the worker host. Big SQL can function just fine so long as there is a head node and at least one worker node, so the install continues. I refresh my tea and browse the internet while I wait for it to complete. I’m actually not 100% sure of what the outcome will be. I’m going to make a prediction, and I promise I won’t edit this after the fact. Install of the other worker completes successfully, but the head host times out waiting for a response from the worker host that was rebooted.

… time passes …

Okay, this is what we end up with:

[Screenshot: Add Service Wizard after the reboot, showing the failed worker and the head host warning]

Here is what we see when clicking on ‘Failures encountered’ for the worker host, bdavm488:

2016-06-17 10:28:34,443 - Execute['sudo /var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/BIGSQL/package/scripts/bigsql-check-mask.sh /var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/BIGSQL/package/scripts'] {}
2016-06-17 10:28:34,455 - Directory['/home/bigsql/hosts'] {'owner': 'bigsql', 'recursive': True}
Retrying ssh to bdavm306.svl.ibm.com from bdavm488.svl.ibm.com with user bigsql. On attempt 1 with result rc=255
Retrying ssh to bdavm306.svl.ibm.com from bdavm488.svl.ibm.com with user bigsql. On attempt 2 with result rc=255
Retrying ssh to bdavm306.svl.ibm.com from bdavm488.svl.ibm.com with user bigsql. On attempt 3 with result rc=255

Comparing this to the Ambari log from a healthy worker, we can see that Ambari lost track of this worker’s install operation and the log ends abruptly.
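
If you want to poke around on the worker itself, the Ambari agent keeps local copies of the command logs it reports to the server. The paths below are the usual Ambari agent locations; treat this as a sketch, since they can vary by version.

# On the rebooted worker: per-command output and error files written
# by the agent.
ls -lt /var/lib/ambari-agent/data/ | head

# The agent's own log shows when it stopped and when it re-registered
# after the reboot.
tail -50 /var/log/ambari-agent/ambari-agent.log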

The head host (bdavm306) is showing ‘Warnings encountered’. If we click through to see the Ambari logs, near the bottom of stdout we see this:

/var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/BIGSQL/package/scripts
User : bigsql
HeadNode : bdavm306.svl.ibm.com
TargetHost: bdavm488.svl.ibm.com
sudo su - bigsql sh -c "ssh -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes bdavm488.svl.ibm.com /var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/BIGSQL/package/scripts/getdb2level.sh"
sudo su - bigsql sh -c "ssh -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes bdavm306.svl.ibm.com /var/lib/ambari-agent/cache/stacks/BigInsights/4.2/services/BIGSQL/package/scripts/getdb2level.sh"
HeadDb2Level=160613 (bdavm306.svl.ibm.com)
TargDb2Level=UNKNOWN (bdavm488.svl.ibm.com)

From this we can see the head host was not able to determine the DB2 level of the worker host.  That suggests one of two things: an error communicating with the worker host, or a failure to install DB2 on the worker host.  I would treat both scenarios the same way: verify the head host has root SSH to the worker, then verify as best I can that the worker host is operating normally (i.e., CPU and memory consumption appear normal).
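
Here is a rough sketch of those checks, run from the head host, using the host names and the default bigsql user from this cluster; substitute your own.

# Confirm root and bigsql can reach the worker without a password
# prompt (BatchMode=yes fails fast instead of hanging on a prompt).
ssh -o BatchMode=yes root@bdavm488.svl.ibm.com hostname
sudo su - bigsql -c "ssh -o BatchMode=yes bdavm488.svl.ibm.com hostname"

# A quick look at load and memory on the worker.
ssh root@bdavm488.svl.ibm.com "uptime; free -m"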

Since the head host is complaining about the DB2 version on the worker host, we could also manually check whether DB2 is installed by running ‘yum list installed | grep -i db2’. It is not installed on that worker, which doesn’t surprise me because I rebooted the worker well before that phase of the install. Everything else looks okay, and I can see that Ambari even automatically restarted the agent on that host.

[root@bdavm488 ~]# ambari-agent status
Found ambari-agent PID: 2981
ambari-agent running.
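
Had the agent not come back on its own, restarting it by hand and checking its log is usually all it takes to get the host heartbeating again. This is a generic sketch rather than something this particular failure required.

# Restart the agent and confirm it is running.
ambari-agent restart
ambari-agent status

# Watch it re-register with the Ambari server.
tail -20 /var/log/ambari-agent/ambari-agent.log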

Sometimes a hiccup like this can happen, and before getting too crazy trying to dig into the root cause of the failure, let’s give that ‘Retry’ button one more go. Right away I can see my other worker (bdavm307) is listed as install complete, so it stayed where we left it: a clean install that was waiting to be registered as part of the Big SQL service on the head host.

After the other worker and Big SQL head install complete, the service starts and the service check is run.  Here we are, a clean install.

[Screenshot: Add Service Wizard showing a clean, completed install]

Now back to that precheck failure I promised you.  I managed to get the precheck to fail by changing the default name of the Big SQL service user to one that exceeds the eight-character limit.  In the Ambari UI this appeared as a failure on the head node, just as it did with our requiretty failure earlier.  When I click through to the Ambari logs I see the cause of the failure:

REPORT_LOG: /tmp/bigsql_too_long/logs/bigsql-precheck-1-2016-06-21_10.48.54.5409.report
======================================================================
bigsql-precheck report
OK Userid check
FAIL Big SQL user name length
The name "bigsql_too_long" used for the Big SQL user is too long. It needs
to be 8 characters or less.
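
If you would rather catch this before the wizard does, the length check is trivial to script ahead of time. The user name below is just the one I used to force the failure; plug in whatever you plan to enter in the wizard.

# The precheck requires the Big SQL service user name to be 8
# characters or fewer.
BIGSQL_USER=bigsql_too_long
if [ ${#BIGSQL_USER} -gt 8 ]; then
  echo "ERROR: '$BIGSQL_USER' is ${#BIGSQL_USER} characters; the limit is 8."
fi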

Click OK to return to the Add Service Wizard and you will notice we have a problem.  Ambari does not allow us to back up to change the user name.  Though unfortunate, this does give us an opportunity to explore another topic.

Stay tuned.
