Ambari wizard-based installs of the IBM Open Platform (IOP) are reasonably simple and provide a straightforward way to customize the cluster layout and configuration. Ambari also supports a non-interactive install method through a feature called Ambari blueprints. There are a few good articles out there that explain blueprint-based installations with Ambari. In this blog post we will cover the basics of using blueprints with the IOP stack and call out a few nuances of using blueprints for cluster deployment.

A blueprint-based install involves using REST APIs to submit cluster deployment instructions to the Ambari server. For this purpose you can use any REST API client. We will make use of curl to illustrate the usage.

Before we get into details of the procedure we will briefly define the two artifacts used in discussions involving blueprint-based installs – blueprint and cluster template.

    • Blueprint: Defines what the cluster comprises in terms of the set of service components, the component layout and the component configurations.
    • Cluster template: Maps the component layout to physical hosts of the cluster.

The steps for a blueprint-based install can be summarized by the following:

  1. Create a blueprint
  2. Create a cluster template
  3. Prepare the nodes of the cluster
  4. Register the blueprint
  5. Submit the cluster creation request
  6. Monitor the progress of the cluster creation request
  7. Start any services found to have not started successfully

When you use the Ambari REST API calls for cluster deployment, the bulk of the work is accomplished by two POST method calls where the detailed instructions are specified in the request body. When we use curl as the REST API client, these instructions can be passed in via JSON files that can be customized for different clusters. The JSON file used for the first request is commonly referred to as the blueprint. Step 4 above makes use of this JSON file to register the blueprint with the Ambari server. The JSON file for the second request is commonly referred to as the cluster template and it has a reference to the blueprint it is to be applied to. Step 5 above makes use of the cluster template to submit the cluster creation request to the Ambari server.

Each of the steps listed is elaborated below.

1. Create a blueprint

You can generate a blueprint from an existing cluster or create one by using an editor. When possible, export a blueprint from an existing cluster whose profile matches the cluster to be created. Starting with the blueprint of a successfully running cluster is the best way to ensure a valid cluster service component layout and suitable configuration property values. Ideally, the existing cluster should use nodes similar to those that will make up the new cluster. Obviously, this is not always possible. If you do not have a matching cluster to export a blueprint from, you can hand-edit a JSON file from scratch, or adapt a blueprint exported from a cluster with a different topology to fit the cluster you are creating.

The blueprint includes

    • “Blueprints”: the stack and version this blueprint applies to
    • “host_groups”: the set of components for services from that stack that you want to install, together with how they are grouped together. A grouping is defined by a host-group. A set of host-groups defines the component layout for the cluster.
    • “configurations”: the configuration for components. Configuration can apply cluster-wide or only for specific host-groups depending on where it is specified.

When using a hand-edited blueprint, topology validation might fail. While it is possible to proceed by disabling topology validation, this is not recommended as it is likely to lead to other problems. When hand-editing the blueprint, you can leave any or all of the configuration empty. If the configuration is left empty, the configuration specified in the cluster template is used; if that too is empty or unspecified, the default configuration for the service components is applied, at least in versions of Ambari prior to 2.2.0. With wizard-based installs, Ambari’s stack advisor comes into play and determines the actual configuration property values, overriding the defaults. Ambari 2.2.0 adds a feature that lets you specify a configuration recommendation strategy in the cluster template, thus leveraging the Ambari stack advisor recommendations during blueprint deployment. This is a great enhancement that brings a blueprint deployment closer to what a wizard-based install achieves. We will not use this option in our examples, because the latest available version of IOP at the time of writing, IOP 4.1, includes Ambari 2.1.0.
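
For readers on Ambari 2.2.0 or later, the sketch below shows where such a strategy would sit in the cluster template (the cluster template itself is covered in step 2). The field name config_recommendation_strategy and the value shown follow the Ambari blueprint documentation for 2.2.0; verify them against your Ambari version before relying on this.

{
  "blueprint" : "4masters_Nslaves",
  "config_recommendation_strategy" : "ONLY_STACK_DEFAULTS_APPLY",
  "default_password" : "passw0rd",
  "host_groups" : [
    { "name" : "slaves_and_clients", "host_count" : "3" }
  ]
}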

The following is an example of a curl command used to export a blueprint. Note that for all examples we use environment variables for AMBARI_SERVER, AMBARI_SERVER_PORT, CLUSTER_NAME, BLUEPRINT_NAME, AMBARI_PASSWORD and so forth.
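
For example, the variables might be defined as follows before running any of the commands. The values shown are placeholders to be replaced with your own; 8080 is the default Ambari server port.

export AMBARI_SERVER=node1.svl.ibm.com
export AMBARI_SERVER_PORT=8080
export AMBARI_PASSWORD=admin
export CLUSTER_NAME=mycluster
export BLUEPRINT_NAME=4masters_Nslaves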

curl -u admin:${AMBARI_PASSWORD} -H "X-Requested-By: ambari" -X GET "http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/clusters/${CLUSTER_NAME}?format=blueprint" > ${BLUEPRINT_NAME}_blueprint.json

A usable blueprint for the IOP 4.1 stack is shown here.

{
  "Blueprints" : {
    "stack_name" : "BigInsights",
    "stack_version" : "4.1"
  },
  "host_groups" : [
    {
      "components" : [
        { "name" : "AMBARI_SERVER" },
        { "name" : "HDFS_CLIENT" },
        { "name" : "HIVE_CLIENT" },
        { "name" : "KNOX_GATEWAY" },
        { "name" : "MAPREDUCE2_CLIENT" },
        { "name" : "METRICS_MONITOR" },
        { "name" : "SOLR" },
        { "name" : "SPARK_JOBHISTORYSERVER" },
        { "name" : "SPARK_THRIFTSERVER" },
        { "name" : "YARN_CLIENT" }
      ],
      "configurations" : [ ],
      "name" : "master_1",
      "cardinality" : "1"
    },
    {
      "components" : [
        { "name" : "APP_TIMELINE_SERVER" },
        { "name" : "HDFS_CLIENT" },
        { "name" : "HISTORYSERVER" },
        { "name" : "KAFKA_BROKER" },
        { "name" : "METRICS_MONITOR" },
        { "name" : "NAMENODE" },
        { "name" : "ZOOKEEPER_SERVER" }
      ],
      "configurations" : [ ],
      "name" : "master_2",
      "cardinality" : "1"
    },
    {
      "components" : [
        { "name" : "HBASE_MASTER" },
        { "name" : "HDFS_CLIENT" },
        { "name" : "MAPREDUCE2_CLIENT" },
        { "name" : "METRICS_COLLECTOR" },
        { "name" : "METRICS_MONITOR" },
        { "name" : "OOZIE_SERVER" },
        { "name" : "RESOURCEMANAGER" },
        { "name" : "SECONDARY_NAMENODE" },
        { "name" : "YARN_CLIENT" },
        { "name" : "ZOOKEEPER_SERVER" }
      ],
      "configurations" : [ ],
      "name" : "master_3",
      "cardinality" : "1"
    },
    {
      "components" : [
        { "name" : "HDFS_CLIENT" },
        { "name" : "HIVE_METASTORE" },
        { "name" : "HIVE_SERVER" },
        { "name" : "MAPREDUCE2_CLIENT" },
        { "name" : "METRICS_MONITOR" },
        { "name" : "MYSQL_SERVER" },
        { "name" : "PIG" },
        { "name" : "SQOOP" },
        { "name" : "WEBHCAT_SERVER" },
        { "name" : "YARN_CLIENT" },
        { "name" : "ZOOKEEPER_CLIENT" },
        { "name" : "ZOOKEEPER_SERVER" }
      ],
      "configurations" : [ ],
      "name" : "master_4",
      "cardinality" : "1"
    },
    {
      "components" : [
        { "name" : "DATANODE" },
        { "name" : "FLUME_HANDLER" },
        { "name" : "HBASE_CLIENT" },
        { "name" : "HBASE_REGIONSERVER" },
        { "name" : "HBASE_REST_SERVER" },
        { "name" : "HCAT" },
        { "name" : "HDFS_CLIENT" },
        { "name" : "HIVE_CLIENT" },
        { "name" : "MAPREDUCE2_CLIENT" },
        { "name" : "METRICS_MONITOR" },
        { "name" : "NODEMANAGER" },
        { "name" : "OOZIE_CLIENT" },
        { "name" : "PIG" },
        { "name" : "SLIDER" },
        { "name" : "SPARK_CLIENT" },
        { "name" : "SQOOP" },
        { "name" : "YARN_CLIENT" },
        { "name" : "ZOOKEEPER_CLIENT" }
      ],
      "configurations" : [ ],
      "name" : "slaves_and_clients",
      "cardinality" : "3"
    }
  ],
  "configurations" : [
    {
      "hive-site" : {
        "properties" : {
          "javax.jdo.option.ConnectionPassword": "passw0rd"
        }
      }
    }
  ]
}

The above blueprint specifies all the services and components found in IOP’s BigInsights 4.1 stack. It uses 4 master nodes plus any number of slave/client nodes you might want to specify. The component list in each host group above is sorted alphabetically; a blueprint generated by exporting it from an existing cluster will not have the components sorted in this manner. Another difference to note is that the configurations are left empty for brevity, except for the cluster-wide configuration for hive-site. The reason for including this specific property is explained in Note (4) below. Note also that some SLAVE (e.g. METRICS_MONITOR) and CLIENT (e.g. HDFS_CLIENT) components appear in the MASTER host groups due to dependencies in the stack’s core service definitions. While this blueprint can be customized to some extent, when changing the layout or placement of service components it is best not to assume that any of the SLAVE or CLIENT components on hosts used for MASTER components are redundant. Care should also be taken to preserve constraints that are internally defined by Ambari; for instance, WEBHCAT_SERVER must be colocated on the same host as HIVE_SERVER. If you plan on a different number of master nodes, it is best to export a blueprint from a wizard-installed cluster with that number of master nodes. Starting from such a blueprint helps ensure that layout dependencies are satisfied and gives you the best chance of a successful deployment.

2. Create a cluster template

The cluster template has to be created by hand, editing a file to match the required format and contents. The cluster template includes

    • “blueprint”: the registered blueprint name
    • “host_groups”: host-groups corresponding to those defined in the blueprint but mapping to actual hosts or a “host count” based on certain rules or predicates
    • “configurations”: configuration to override any configurations defined in the registered blueprint
    • “default_password”: default password for all passwords that are not specified via configuration sections in the blueprint or cluster template. These passwords are normally flagged as fields requiring user input before a wizard-based install can navigate to the next stage.

The cluster template is supplied with the Ambari REST API call that triggers the cluster deployment in step 5. A usable cluster template is shown here. Note that in this case “configurations” has been omitted entirely instead of being defined as an empty array. The host_count specified for the slaves_and_clients host group must be strictly adhered to, i.e. those hosts must be available. Note (2) elaborates on this.

{
  "blueprint" : "4masters_Nslaves",
  "default_password" : "passw0rd",
  "host_groups" : [
    {
      "name" : "master_1",
      "hosts" : [
        {
          "fqdn" : "node1.svl.ibm.com"
        }
      ]
    },
    {
      "name" : "master_2",
      "hosts" : [
        {
          "fqdn" : "node2.svl.ibm.com"
        }
      ]
    },
    {
      "name" : "master_3",
      "hosts" : [
        {
          "fqdn" : "node3.svl.ibm.com"
        }
      ]
    },
    {
      "name" : "master_4",
      "hosts" : [
        {
          "fqdn" : "node4.svl.ibm.com"
        }
      ]
    },
    {
      "name" : "slaves_and_clients",
      "host_count" : "12"
    }
  ]
}

3. Prepare the nodes of the cluster

There are two options for preparing the nodes:

    • Install the Ambari server and agents and manually register the hosts for the Ambari agents.
    • Install the Ambari server and then use the install-wizard only for the purpose of installing Ambari agents and host registration.

In either case, you must ensure that the nodes meet the prerequisites for installing the IOP stack, and that the hosts are “clean”: artifacts left over from previous installations, if any, can affect the blueprint deployment. Host registration using the install wizard performs the primary checks as part of registration. If you choose the manual registration method, remember to use alternative means to verify that the hosts meet the prerequisites. Failures during blueprint-based installs are often harder to diagnose, so it is all the more important not to skip this step.
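
As a rough sketch of the manual registration route on a RHEL/CentOS node, assuming the Ambari repository is already configured on the host, the commands look something like the following; consult the Ambari installation documentation for your version for the authoritative procedure.

# Install the Ambari agent package on each node
yum install -y ambari-agent

# Point the agent at the Ambari server: set the hostname property
# in the [server] section of the agent configuration file
sed -i "s/^hostname=.*/hostname=${AMBARI_SERVER}/" /etc/ambari-agent/conf/ambari-agent.ini

# Start the agent; it registers itself with the Ambari server
ambari-agent start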

You can check the successfully registered hosts by using the following command:

curl -u admin:${AMBARI_PASSWORD} http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/hosts

Running this and verifying that all the expected hosts are successfully registered before starting a blueprint install is highly recommended.
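
If you simply want the registered host names, one quick (if crude) way during scripted installs is to filter the response for the host_name field:

curl -s -u admin:${AMBARI_PASSWORD} http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/hosts | grep '"host_name"'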

4. Register the blueprint

Before a blueprint is used for a cluster deployment it has to be registered with the Ambari server using the associated Ambari REST API.

curl -u admin:${AMBARI_PASSWORD} -H "X-Requested-By: ambari" -X POST -d @./${BLUEPRINT_NAME}_blueprint.json http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/blueprints/${BLUEPRINT_NAME}

The response to this request is empty unless there is an error. You can check whether the blueprint has been registered by using the following command and verifying that the blueprint_name is listed in the response:

curl -u admin:${AMBARI_PASSWORD} http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/blueprints

The response to this request will be something similar to:

{
  "href" : "http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/blueprints/",
  "items" : [
    {
      "href" : "http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/blueprints/${BLUEPRINT_NAME}",
      "Blueprints" : {
        "blueprint_name" : "${BLUEPRINT_NAME}"
      }
    }
  ]
}

5. Submit the cluster creation request

The cluster template from step 2 is used in the call that triggers the cluster deployment. The cluster template has a reference to the registered blueprint to be used for the cluster deployment.

Example:

curl -u admin:${AMBARI_PASSWORD} -H "X-Requested-By: ambari" -X POST -d @./${BLUEPRINT_NAME}_cluster_template.json http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/clusters/${CLUSTER_NAME}

The response to this request will be something similar to:

{
    "href": "http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/clusters/${CLUSTER_NAME}/requests/1",
    "Requests": {
        "id": 1,
        "status": "Accepted"
    }
}

6. Monitor the progress of the cluster creation request

Using the href in the response to the cluster creation request, you can monitor the progress and completion status.

curl -u admin:${AMBARI_PASSWORD} -s -H "X-Requested-By: ambari" -X GET http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/clusters/${CLUSTER_NAME}/requests/1 | head -n 24

progress_percent and request_status are two of the fields of primary interest. When the progress_percent is 100.0 and request_status is COMPLETED, the blueprint-based installation is deemed complete.
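
A small polling loop can watch the request until it reaches a terminal state. The following is a minimal sketch, assuming the request id is 1 as in the response above; the grep/sed extraction is deliberately simple, and a JSON-aware tool would be more robust.

REQUEST_URL=http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/clusters/${CLUSTER_NAME}/requests/1
while true; do
  # Pull the request_status field out of the JSON response
  STATUS=$(curl -s -u admin:${AMBARI_PASSWORD} -H "X-Requested-By: ambari" "${REQUEST_URL}" \
    | grep '"request_status"' | head -n 1 | sed 's/.*"request_status" *: *"\([^"]*\)".*/\1/')
  echo "$(date) request_status=${STATUS}"
  case "${STATUS}" in
    COMPLETED|FAILED|ABORTED) break ;;
  esac
  sleep 30
done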

Note that once the cluster deployment has been triggered, you are also able to monitor the progress by using a browser to log on to the Ambari Server and clicking on the box to display background operations.

[Image: Ambari top menu bar]

7. Start any services that are found to have not started successfully.

This step is not always necessary. Sometimes, placement of service components can lead to timing issues where certain dependent components are not “ready” even though they were started before the components that depend on them. This can result in some service components failing to start even though the installation initially appears to have been successful. Blueprint installations attempt to parallelize the install and startup of service components across hosts, so they are prone to such problems. As a good practice, issue a “Start All” command immediately upon completion of the blueprint installation. The following curl command uses the REST API for this purpose and starts any components remaining in the stopped (“INSTALLED”) state.

curl -u admin:${AMBARI_PASSWORD} -i -H "X-Requested-By: ambari" -X PUT -d "{\"RequestInfo\":{\"context\":\"_PARSE_.START.ALL_SERVICES\",\"operation_level\":{\"level\":\"CLUSTER\",\"cluster_name\":\"${CLUSTER_NAME}\"}},\"Body\":{\"ServiceInfo\":{\"state\":\"STARTED\"}}}" http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/clusters/${CLUSTER_NAME}/services

Note:

(1) With IOP 4.1, which uses Ambari 2.1.0, Ambari stack advisor recommendations are bypassed when blueprints are used. If configurations are left empty in the blueprint and cluster template, the default configuration property values are applied. The recommendation, therefore, is to use the values from a blueprint exported from a similar installation.

(2) The cardinality field specified for a host group in the blueprint is optional and is only used as a hint to the deployer. When a blueprint is exported from an existing cluster, the cardinality value records the number of hosts in that cluster represented by the host group. The host_count value in a cluster template, which is an alternative to specifying explicit hosts (“fqdn”) in a host group, must be set to the exact number of designated or free hosts. These are the hosts matching any host_predicate specified for the host group or, in the absence of one, unused hosts that are registered with the Ambari server. When a host predicate is not specified, host_count should be no more than the remaining free hosts registered with the Ambari server. If fewer hosts are available than the specified host_count, the blueprint install gets stuck in a state of PENDING HOST ASSIGNMENT. When in doubt, explicitly specify the hosts. For example, to specify 3 hosts in the host group “slaves_and_clients”:

        {
            "name": "slaves_and_clients",
            "hosts": [
                {
                    "fqdn": "node101.svl.ibm.com"
                },
                {
                    "fqdn": "node102.svl.ibm.com"
                },
                {
                    "fqdn": "node103.svl.ibm.com"
                }
            ]
        }
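
For completeness, a host group in the cluster template can also combine host_count with a host_predicate. The predicate syntax below follows the examples in the Ambari blueprint documentation (predicates over host resource properties such as Hosts/cpu_count); treat it as illustrative and check it against your Ambari version.

        {
            "name": "slaves_and_clients",
            "host_count": "3",
            "host_predicate": "Hosts/cpu_count>1"
        }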

(3) Violating certain constraints in service definitions, such as colocation of components or service dependencies, can result in errors during blueprint deployment that are hard to diagnose. The general recommendation is to start from a blueprint generated from a small cluster that is known to work, map it to the larger cluster, and not to disable topology validation.

(4) The example blueprint above explicitly specifies a value for javax.jdo.option.ConnectionPassword. Due to a problem in the default configuration for HIVE, the default_password substitution does not occur, which results in HIVE_SERVER shutting down post-install due to an authentication issue. Specifying this property will not be necessary in future releases of IOP.

(5) Deployment and management of multiple clusters via a single Ambari server does not work and is not yet supported.

(6) Specifying a stack or version in the blueprint that the Ambari version does not support will not work correctly and is discouraged; there is currently no validation for this aspect.

In this post, we covered the basic mechanics of blueprint-based cluster installation. As an example we used a reasonable cluster layout comprising four master nodes and any number of slave/client nodes, applied it to IOP’s BigInsights 4.1 stack, and walked through the sequence of steps as a tutorial. As should be clear by now, simple scripting of steps 3-7 yields a blueprint install that can be reused with different cluster templates for multiple cluster installations.
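
To illustrate that last point, a minimal wrapper over steps 4 and 5 needs nothing beyond the two curl calls shown earlier, reusing the same environment variables; the polling loop from step 6 and the “Start All” call from step 7 can be appended to complete the automation.

#!/bin/bash
# Step 4: register the blueprint with the Ambari server
curl -u admin:${AMBARI_PASSWORD} -H "X-Requested-By: ambari" -X POST \
  -d @./${BLUEPRINT_NAME}_blueprint.json \
  http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/blueprints/${BLUEPRINT_NAME}

# Step 5: submit the cluster creation request using the cluster template
curl -u admin:${AMBARI_PASSWORD} -H "X-Requested-By: ambari" -X POST \
  -d @./${BLUEPRINT_NAME}_cluster_template.json \
  http://${AMBARI_SERVER}:${AMBARI_SERVER_PORT}/api/v1/clusters/${CLUSTER_NAME}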

Blueprint-based installs can also be used for scaling a cluster by adding nodes, deploying HA clusters and so forth. With recent enhancements in Ambari 2.2.0, deployment of Kerberized clusters is possible, and a few other enhancements are in the pipeline. Some of this material will be covered in blog posts to follow. Blueprint-based installation has some quirks but presents a neat mechanism for automating cluster deployments.
