IBM Support

Customize Ambari Service Check Timeout Value - Hadoop Dev

Technical Blog Post



Overview

Ambari terminates a service check that does not finish within a predefined period of time (the timeout). This mechanism ensures that no service check runs for an extended period and ties up system resources.

On a real-world cluster, the cluster's performance and current workload strongly affect how quickly a service check can finish. A busy or slower cluster may need more time than the default allows, so you may need to adjust the timeout to match your cluster.

In this article, we describe how to change the timeout value for a service check.

Customize Service Check Timeout

The service check timeout is defined in the service's metainfo.xml file, in the <commandScript> section that runs the service check script, as shown in the following snippet. The value is specified in seconds, and the default is 300 seconds.


<commandScript>
  <script>scripts/service_check.py</script>
  <scriptType>PYTHON</scriptType>
  <timeout>300</timeout>
</commandScript>
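To check whether a particular service metainfo.xml declares this value, you can read it with Python's standard xml.etree.ElementTree module. The following is a minimal sketch, not part of Ambari: the function name read_service_check_timeout and the sample file are hypothetical, and it assumes the usual layout in which the service check <commandScript> is a direct child of <service> (component-level <commandScript> elements sit under <components> and are not matched).

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_service_check_timeout(metainfo_xml: str) -> Optional[int]:
    """Return the service check timeout in seconds declared by a service
    metainfo.xml, or None if this file does not declare one (for example,
    because the service inherits it through <extends>)."""
    root = ET.fromstring(metainfo_xml)
    # The service check <commandScript> is a direct child of <service>;
    # component <commandScript> elements live under <components> and are
    # not matched by this path.
    node = root.find(".//service/commandScript/timeout")
    if node is None or not (node.text or "").strip():
        return None
    return int(node.text.strip())


# Hypothetical service metainfo.xml wrapping the snippet above.
SAMPLE = """<metainfo>
  <schemaVersion>2.0</schemaVersion>
  <services>
    <service>
      <name>YARN</name>
      <commandScript>
        <script>scripts/service_check.py</script>
        <scriptType>PYTHON</scriptType>
        <timeout>300</timeout>
      </commandScript>
    </service>
  </services>
</metainfo>"""

print(read_service_check_timeout(SAMPLE))  # 300
```

A result of None does not mean there is no timeout; it means this file inherits it, which is the case the next section covers.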

Locate metainfo.xml for Service Check Timeout

Because Ambari supports inheritance at both the stack level and the service level, the service check timeout may not appear in the service's metainfo.xml in the current stack. In this case, trace the inheritance path at both levels, up to the top of the inheritance hierarchy if necessary, until you find the service metainfo.xml that defines the timeout.

Service-level inheritance is defined by the <extends> element in the service metainfo.xml, as in the following example. <extends> points to the location of the parent service definition. A relative path is resolved against /var/lib/ambari-server/resources.


<services>
  <service>
    <name>YARN</name>
    <version>2.7.2</version>
    <extends>common-services/YARN/2.7.2.4.2</extends>
  </service>
</services>
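Resolving a service-level <extends> value to a directory is a path join against the resources directory. The following minimal Python sketch assumes the default /var/lib/ambari-server/resources location described above; the function service_extends_path is hypothetical, not an Ambari API.

```python
import os.path
import xml.etree.ElementTree as ET
from typing import Optional

# Ambari Server resources directory, against which a relative <extends> resolves.
RESOURCES_DIR = "/var/lib/ambari-server/resources"


def service_extends_path(metainfo_xml: str,
                         resources_dir: str = RESOURCES_DIR) -> Optional[str]:
    """Return the directory of the parent service named by <extends>,
    or None if this service definition does not extend another one."""
    # ".//service" matches whether the text is a full <metainfo> file or a
    # bare <services> fragment like the one shown above.
    node = ET.fromstring(metainfo_xml).find(".//service/extends")
    target = (node.text or "").strip() if node is not None else ""
    if not target:
        return None
    # os.path.join returns `target` unchanged if it is already absolute.
    return os.path.normpath(os.path.join(resources_dir, target))


SNIPPET = """<services>
  <service>
    <name>YARN</name>
    <version>2.7.2</version>
    <extends>common-services/YARN/2.7.2.4.2</extends>
  </service>
</services>"""

print(service_extends_path(SNIPPET))
# /var/lib/ambari-server/resources/common-services/YARN/2.7.2.4.2
```

The directory it returns is where the parent service's own metainfo.xml lives, which is the next file to inspect.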

Stack-level inheritance is defined by the <extends> element in the stack metainfo.xml. <extends> names the stack version one level above the current stack. For example, the BigInsights 4.1 stack defines the following in its metainfo.xml. Because <extends> is 4.0, BigInsights 4.1 inherits from BigInsights 4.0.


<versions>
  <active>true</active>
</versions>
<extends>4.0</extends>
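The stack-level chain can be followed the same way: read <extends> from each stack metainfo.xml until a stack has none. In the sketch below, parent_stack_version and stack_chain are hypothetical helpers, and the caller supplies each stack metainfo.xml as text keyed by version. On disk these files are typically under /var/lib/ambari-server/resources/stacks/<stack name>/<version>/metainfo.xml, but treat that layout as an assumption to verify on your installation. The 4.0 entry is shown without <extends> purely for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def parent_stack_version(stack_metainfo_xml: str) -> Optional[str]:
    """Return the version named by a stack metainfo.xml <extends>, or None."""
    node = ET.fromstring(stack_metainfo_xml).find("extends")
    text = (node.text or "").strip() if node is not None else ""
    return text or None


def stack_chain(version: str, stack_files: Dict[str, str]) -> List[str]:
    """Follow stack-level <extends> from `version` up to the root stack.

    `stack_files` maps a stack version such as "4.1" to the text of that
    version's stack metainfo.xml. Raises KeyError if a parent version is
    missing from `stack_files`, and ValueError on an inheritance cycle.
    """
    chain = [version]
    while True:
        parent = parent_stack_version(stack_files[chain[-1]])
        if parent is None:
            return chain
        if parent in chain:
            raise ValueError(f"stack inheritance cycle at version {parent}")
        chain.append(parent)


# Stack metainfo.xml contents modeled on the BigInsights example above.
STACKS = {
    "4.1": "<metainfo><versions><active>true</active></versions>"
           "<extends>4.0</extends></metainfo>",
    "4.0": "<metainfo><versions><active>true</active></versions></metainfo>",
}

print(stack_chain("4.1", STACKS))  # ['4.1', '4.0']
```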

If the service metainfo.xml in the current stack has an <extends> element, examine the service metainfo.xml it points to for the service check timeout, and customize the value as needed.

If the service metainfo.xml in the current stack has no <extends> element, examine the stack-level metainfo.xml to find the older (parent) stack, then check the corresponding service metainfo.xml in that stack. That file may contain its own <extends> element pointing to a service metainfo.xml higher in the hierarchy, or it may define the service check timeout explicitly, in which case you can customize it as needed.
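Putting these two rules together, the search can be sketched as follows. This is an illustrative Python sketch, not Ambari's own resolution logic: locate_timeout and read_metainfo_from_disk are hypothetical names, and the caller passes the service's directory in the current stack followed by its directory in each older stack, newest first. It also assumes that an explicit <extends> replaces inheritance from the older stack, so once an <extends> chain ends without a timeout, the older stacks are not consulted.

```python
import os.path
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

RESOURCES_DIR = "/var/lib/ambari-server/resources"
TIMEOUT_PATH = ".//service/commandScript/timeout"
EXTENDS_PATH = ".//service/extends"


def _find_text(xml_text: str, path: str) -> Optional[str]:
    """Return the stripped text of the first element matching `path`, if any."""
    node = ET.fromstring(xml_text).find(path)
    text = (node.text or "").strip() if node is not None else ""
    return text or None


def read_metainfo_from_disk(directory: str) -> Optional[str]:
    """Return the text of <directory>/metainfo.xml, or None if it is absent."""
    path = os.path.join(directory, "metainfo.xml")
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as f:
        return f.read()


def locate_timeout(
    stack_service_dirs: List[str],
    read_metainfo: Callable[[str], Optional[str]] = read_metainfo_from_disk,
    resources_dir: str = RESOURCES_DIR,
) -> Optional[Tuple[str, int]]:
    """Return (directory, seconds) for the service metainfo.xml that defines
    the service check timeout, or None if no definition is found."""
    for stack_dir in stack_service_dirs:  # current stack first, then older ones
        current, seen = stack_dir, set()
        while current not in seen:
            seen.add(current)
            text = read_metainfo(current)
            if not text:
                break  # no metainfo.xml in this directory
            timeout = _find_text(text, TIMEOUT_PATH)
            if timeout is not None:
                return current, int(timeout)
            parent = _find_text(text, EXTENDS_PATH)
            if parent is None:
                break  # no <extends>: fall back to the older stack
            # Service-level <extends>: relative paths resolve against resources_dir.
            current = os.path.normpath(os.path.join(resources_dir, parent))
        if current != stack_dir:
            # Assumption: an explicit <extends> replaces stack-level inheritance,
            # so a dead end in that chain ends the search.
            return None
    return None
```

Passing a dictionary's get method as read_metainfo lets you try the logic on in-memory XML before pointing it at the real resources directory.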

Make Custom Timeout Take Effect

After you have located the service metainfo.xml that defines the service check timeout and adjusted the value to suit your cluster's performance, restart Ambari Server for the new timeout to take effect. You do not need to restart the Ambari Agents.
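Changing the value is a one-number edit that you can make in any text editor. If you prefer a script, the sketch below rewrites only the digits inside the service check <timeout> and leaves the rest of the file, including comments and indentation, unchanged. It is a hypothetical helper, not an Ambari tool, and it assumes that <script> appears before <timeout> inside the <commandScript> block, as in the snippet at the start of this article.

```python
import re

# A <commandScript> block that runs scripts/service_check.py, with the number
# inside its <timeout> captured as group 2. The tempered token
# (?:(?!</commandScript>).)*? stops a match from spilling into another
# <commandScript> block (for example, a component's script).
_SERVICE_CHECK_TIMEOUT = re.compile(
    r"(<commandScript>(?:(?!</commandScript>).)*?"
    r"<script>\s*scripts/service_check\.py\s*</script>"
    r"(?:(?!</commandScript>).)*?<timeout>\s*)(\d+)(\s*</timeout>)",
    re.DOTALL,
)


def set_service_check_timeout(metainfo_xml: str, seconds: int) -> str:
    """Return metainfo_xml with the service check timeout set to `seconds`.

    Only the digits inside <timeout> change. Raises ValueError if this file
    does not define a service check timeout (it may be inherited instead).
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    updated, count = _SERVICE_CHECK_TIMEOUT.subn(
        lambda m: f"{m.group(1)}{seconds}{m.group(3)}", metainfo_xml, count=1
    )
    if count == 0:
        raise ValueError("no service check <timeout> in this metainfo.xml")
    return updated


before = """<commandScript>
  <script>scripts/service_check.py</script>
  <scriptType>PYTHON</scriptType>
  <timeout>300</timeout>
</commandScript>"""
print(set_service_check_timeout(before, 600))
```

After writing the result back to metainfo.xml (keep a copy of the original), restart Ambari Server, for example by running ambari-server restart on the Ambari Server host.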

Summary

In this article, we described the purpose of the service check timeout and how to customize it based on cluster performance and workload. We also briefly described how to trace the stack and service inheritance hierarchy to find the service metainfo.xml that defines the timeout. Finally, for the updated timeout to take effect, you need to restart Ambari Server so that it reads the new value for the service.


UID

ibm16260067