Tutorial

Deploying custom foundation model in watsonx.ai on IBM Cloud

Tune your models for your domain, then import them to use the power of the generative AI platform, watsonx.ai

By

Saloni Saluja

AI engineers can use foundation models in watsonx.ai to develop generative AI solutions. Now, you can import and deploy custom foundation models in watsonx.ai on IBM Cloud. That means you can fine-tune a foundation model outside the watsonx platform and then import and deploy it for use in the watsonx.ai Prompt Lab or through the watsonx.ai API, just like any other foundation model. You can easily leverage a foundation model that is tuned for a specific language, industry, or business domain.

In this tutorial, we will import an open-source foundation model from Hugging Face to demonstrate the deployment process. For cloud object storage, we will use the Amazon Simple Storage Service (S3). You’ll learn how to import and deploy custom foundation models using watsonx.ai on IBM Cloud, including the considerations, requirements, and best practices to ensure a smooth deployment.

Prerequisites

You must have the following accounts:

  • An IBM Cloud account with access to watsonx.ai and a deployment space
  • A Hugging Face account (to download the model with an access token)
  • Access to S3-compatible cloud object storage, such as IBM Cloud Object Storage or Amazon S3

Steps

To deploy custom foundation models in watsonx.ai on IBM Cloud, complete these steps:

  1. Review the requirements
  2. Download the custom foundation model
  3. Convert the model to the required format
  4. Set up cloud object storage and then add the model
  5. Import the custom foundation model asset to a deployment space
  6. Create a deployment for your custom foundation model
  7. Prompt your custom foundation model

Step 1. Review the requirements

Your custom foundation model must meet these requirements:

  • It must be compatible with the Text Generation Inference (TGI) standard
  • It must be built with supported model architecture
  • It must be built with a supported model type; for quantized models, gptq is the supported quantization method
  • It must be in the safetensors format
  • It must include the model details in a config.json file. This file is required to load the model in the Text Generation Inference Server (TGIS) runtime, and the watsonx.ai deployment service requires that config.json be present in the main content folder for the foundation model after it is uploaded to cloud object storage.
  • It must include a tokenizer (a tokenizer.json file)

For this tutorial, we will be using the Falcon-40b model from Hugging Face, a repository for open source foundation models used by many model builders. Let’s check this model to ensure it meets the requirements.

  1. Open the Falcon-40b model on the Hugging Face website, and click Files and Versions.

  2. Verify that the config.json and tokenizer.json files exist.

  3. Make sure that the model is in the safetensors format (has the .safetensors extension) in the foundation model main folder.

    Falcon-40b model files on Hugging Face

  4. Open the config.json file and check the model type. For this model, config.json shows that it is built with the supported Falcon architecture.

    Falcon-40b model config.json file with model type highlighted
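
If you prefer to verify these details from the command line, you can inspect the config.json file after you download the model in Step 2. This optional check assumes the model was downloaded to ${MODEL_DIR}; model_type and architectures are standard Hugging Face config keys, though the exact values depend on the model you use:

     # Show the model type and architecture fields from the downloaded config.json
     grep -E '"model_type"|"architectures"' ${MODEL_DIR}/config.json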

Step 2. Download the custom foundation model

Follow these steps to download a custom foundation model with the Hugging Face command-line interface:

  1. Install the huggingface-cli package with pip:

     pip install -U "huggingface_hub[cli]"
    
  2. Verify that the huggingface-cli is correctly set up:

     huggingface-cli --help
    
  3. Set the HF_TOKEN environment variable to your Hugging Face access token and use it to log in to the Hugging Face CLI:

     export HF_TOKEN=***** 
     huggingface-cli login --token ${HF_TOKEN}
    
  4. Set up a directory on your local disk to download your model to, and set up a name for your model:

     export MODEL_NAME=<model_name>
     export MODEL_DIR=<model_dir>
     mkdir ${MODEL_DIR}
    
  5. Download the model by using the Hugging Face command-line interface:

     huggingface-cli download ${MODEL_NAME} --local-dir ${MODEL_DIR} --cache-dir ${MODEL_DIR}
    
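For example, to download the Falcon-40b model used in this tutorial, the commands might look like the following. The local directory path is purely illustrative; point MODEL_DIR at any location with enough free disk space for the model weights:

     # Example values for this tutorial's Falcon-40b model (adjust MODEL_DIR to your environment)
     export MODEL_NAME=tiiuae/falcon-40b
     export MODEL_DIR=${HOME}/models/falcon-40b
     mkdir -p ${MODEL_DIR}
     huggingface-cli download ${MODEL_NAME} --local-dir ${MODEL_DIR} --cache-dir ${MODEL_DIR}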

Step 3. Convert the model to the required format

You must make sure that your model is compatible with the Text Generation Inference (TGI) standard and is built with a supported model architecture and model type. For more information, see Planning to deploy a custom foundation model in the watsonx docs.

If your model is not in the .safetensors format and does not contain the tokenizer.json file, follow these steps to convert your model to the .safetensors format:

  1. Set up the TGIS image and pull the image:

     export TGIS_IMAGE="quay.io/modh/text-generation-inference:rhoai-2.8-58cac74"
     podman pull ${TGIS_IMAGE}
    
  2. Convert the model by using the TGIS image:

     container_id=$(podman run -itd --privileged -u 0 -v ${MODEL_DIR}:/tmp ${TGIS_IMAGE} tail -f /dev/null)
     podman exec -it ${container_id} bash -c 'export MODEL_PATH=/tmp ; text-generation-server convert-to-safetensors ${MODEL_PATH} ; text-generation-server convert-to-fast-tokenizer ${MODEL_PATH}'
    
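As a quick sanity check after the conversion, confirm that the safetensors weight files, config.json, and tokenizer.json are present in your model directory (the exact file names vary by model):

     # List the files the deployment requires; missing files indicate the conversion did not complete
     ls ${MODEL_DIR}/*.safetensors ${MODEL_DIR}/config.json ${MODEL_DIR}/tokenizer.json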

Step 4. Set up cloud object storage and then add the model

To deploy a custom foundation model for inferencing with watsonx.ai, you must upload the model to cloud storage. You can use the IBM Cloud Object Storage bucket that is associated with your deployment space, or an external cloud storage service.

When the model is uploaded, create a corresponding model asset in a deployment space. If you upload your model to a remote cloud storage, you must create a connection to it that is based on your personal task credentials.

Follow these steps to set up and add your downloaded model to Amazon Simple Storage Service (Amazon S3).

  1. Install the Amazon Web Services command-line interface with pip:

     pip install awscli
    
  2. Set up the required environment variables:

     export AWS_ACCESS_KEY_ID="<your AWS access key>"
     export AWS_SECRET_ACCESS_KEY="<your AWS secret access key>"
     export ENDPOINT="<s3 endpoint URL>"
     export BUCKET_NAME="<name of the bucket to upload the model>"
     export MODEL_FOLDER="<name of the new folder to create in the bucket>"
    
  3. Add the model to your cloud object storage bucket by using the AWS CLI:

     echo "Model folder name in cos bucket $MODEL_FOLDER"
     aws --endpoint-url ${ENDPOINT} s3 cp ${MODEL_DIR} s3://${BUCKET_NAME}/${MODEL_FOLDER}/ --recursive --follow-symlinks
    
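When the copy completes, you can list the bucket contents to confirm that the model files, including config.json, landed in the expected folder. If you use IBM Cloud Object Storage, the endpoint URL typically has the form https://s3.<region>.cloud-object-storage.appdomain.cloud:

     # Verify that the uploaded model folder contains the expected files
     aws --endpoint-url ${ENDPOINT} s3 ls s3://${BUCKET_NAME}/${MODEL_FOLDER}/ --recursive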

Step 5. Import the custom foundation model asset to a deployment space

After storing the model in cloud storage, you can add the model asset by importing it to a deployment space. You can use either the watsonx.ai user interface or the watsonx.ai API.

Use the watsonx.ai user interface

Follow these steps to import your custom foundation model from the user interface in watsonx.ai:

  1. In your deployment space, go to Assets and then click Import.
  2. Select the Custom foundation model option.
  3. Select the connection to the cloud storage where the model is located (or create a new connection, if needed).
  4. Select the folder that contains your model.

    Folder where your foundation model lives

  5. Enter the required information. If you don't submit any entries for model parameters, default values are used.

    Model parameter values

  6. Click Import.

Use the watsonx.ai API

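The API calls that follow pass an IAM bearer token in the TOKEN environment variable. If you do not already have a token, one common way to obtain one is to exchange an IBM Cloud API key at the IAM token endpoint; this example assumes your API key is stored in IBMCLOUD_API_KEY and that jq is installed:

export TOKEN=$(curl -s -X POST "https://iam.cloud.ibm.com/identity/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
  --data-urlencode "apikey=${IBMCLOUD_API_KEY}" | jq -r '.access_token')
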
Alternatively, use this curl command to create the model asset by using the watsonx.ai API.

curl -X POST "https://<your cloud hostname>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{
    "type": "custom_foundation_model_1.0",
    "framework": "custom_foundation_model",
    "version": "1.0",
    "name": "<asset name>",
    "software_spec": {
        "name": "watsonx-cfm-caikit-1.0"
    },
    "space_id": "<your space ID>",
    "foundation_model": {
        "model_id": "<model ID>",
        "parameters": [
        {
            "name": "dtype",
            "default": "float16",
            "type": "string",
            "display_name": "Data Type",
            "options": ["float16","bfloat16"]
        },
        {
            "name": "max_batch_size",
            "default": 256,
            "type": "number",
            "display_name": "Max Batch Size"
        }],

        "model_location": {
            "type": "container",
            "connection": {
                "id": "<your connection ID>"
            },
            "location": {
                "bucket": "<bucket where the model is located>"
            }
        }
    }
}'
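
The response includes the ID of the new model asset, which you need when you create the deployment in the next step. If you script this call, you can save the JSON payload above to a file (for example, model_asset.json, a name used here only for illustration) and capture the ID with jq; metadata.id is where ml/v4 responses usually return the new asset ID, but verify the field against your actual response:

# Capture the new model asset ID for use in the deployment request
export MODEL_ASSET_ID=$(curl -s -X POST "https://<your cloud hostname>/ml/v4/models?version=2024-01-29" \
  -H "Authorization: Bearer $TOKEN" \
  -H "content-type: application/json" \
  --data @model_asset.json | jq -r '.metadata.id')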

Step 6. Create a deployment for your custom foundation model

Follow these steps to create a deployment for a custom foundation model:

  1. In your deployment space or your project, go to the Assets tab.
  2. Find your model in the asset list, click the Menu icon, and select Deploy.
  3. Enter a name for your deployment and optionally enter a serving name, description, and tags.
  4. Select a configuration for your model.

    Screenshot of the configuration of the model

  5. Optional: If you want to override some of the base model parameters, click Model deployment parameters and then enter new parameter values:

    • Data type: Choose float16 or bfloat16 to specify the data type for your model.
    • Max batch size: Enter the maximum batch size for your model.
    • Max concurrent requests: Enter the maximum number of concurrent requests that can be made to your model.
    • Max new tokens: Enter the maximum number of tokens that can be created for your model for an inference request.
    • Max sequence length: Enter the maximum sequence length for your model.
  6. Click Create.

    When the custom foundation model asset has been created, you are ready to create the online deployment. To create the deployment with the watsonx.ai API instead of the user interface, run this curl command:

     curl -X POST "https://<your cloud hostname>/ml/v4/deployments?version=2024-01-29" \
     -H "Authorization: Bearer $TOKEN" \
     -H "content-type: application/json" \
     --data '{
       "asset":{
         "id": "<your custom foundation model asset id>"
       },
       "online":{
         "parameters":{
           "serving_name":"test_custom_fm",
           "foundation_model": {
               "max_sequence_length": 4096
           }
         }
       },
       "hardware_spec": {
         "id": "<your custom hardware spec id>", // Use either "id" or "name"
         "num_nodes": 1
       },
       "description": "Testing deployment using custom foundation model",
       "name": "custom_fm_deployment",
       "space_id": "<your space id>" // for project deployments, use "project_id"
     }'
    
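    Deploying a large model can take some time because the model content is fetched from object storage and loaded onto the serving hardware. You can poll the deployment with the API until it reports a ready state; the status path shown here follows the usual ml/v4 deployment response shape, so confirm it against your own output:

     # Check the current state of the deployment (for example, initializing or ready)
     curl -s "https://<your cloud hostname>/ml/v4/deployments/<your deployment ID>?version=2024-01-29&space_id=<your space ID>" \
       -H "Authorization: Bearer $TOKEN" | jq '.entity.status.state'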

Guidelines for selecting a hardware specification

When you deploy your custom foundation model, a suggested configuration is pre-selected for you. However, this configuration might not always be the best fit for your specific model.

If you have a different model, follow these rules:

  • Assign the Small configuration to any double-byte precision model under 26B parameters, subject to testing and validation.
  • Assign the Medium configuration to any double-byte precision model between 27B and 53B parameters, subject to testing and validation.
  • Assign the Large configuration to any double-byte precision model between 54B and 106B parameters, subject to testing and validation.

If the selected configuration fails during the testing and validation phase, consider exploring the next higher configuration available. For example, the Falcon-40b model used in this tutorial is a double-byte precision model with 40 billion parameters, so it falls in the Medium band.

Step 7. Prompt your custom foundation model

Now that you have stored and deployed your custom foundation model, you can start using it. You can use the Prompt Lab to prompt the model and generate responses or create a prompt programmatically.

To prompt the custom model using the watsonx.ai API, run this code:

curl -X POST "https://<your cloud hostname>/ml/v1/deployments/<your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{
 "input": "Hello, what is your name",
 "parameters": {
    "max_new_tokens": 200,
    "min_new_tokens": 20
 }
}'

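The response is returned as JSON. If you want only the generated text, you can pipe the response through jq; results[0].generated_text is the typical location of the generated output in the text generation response, so adjust the path if your output differs:

# Send the same prompt and print only the generated text
curl -s -X POST "https://<your cloud hostname>/ml/v1/deployments/<your deployment ID>/text/generation?version=2024-01-29" \
  -H "Authorization: Bearer $TOKEN" \
  -H "content-type: application/json" \
  --data '{"input": "Hello, what is your name", "parameters": {"max_new_tokens": 200, "min_new_tokens": 20}}' \
  | jq -r '.results[0].generated_text'
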
Congratulations, your model is now fully deployed and ready to use!

Summary and next steps

By deploying a custom foundation model to watsonx.ai on IBM Cloud, you are able to work with a model that best fits your project and business needs. A custom model can be any model that is built with an architecture supported by watsonx.ai, which greatly expands your options and flexibility in terms of the models that best fit your specific use case.

In this tutorial, we covered the steps necessary to install and deploy a custom foundation model, using the example of a foundation model from Hugging Face. To learn how to deploy custom foundation models on-prem, see Deploying custom foundation models in watsonx.ai on-prem.

To learn about other deployment options, see the watsonx documentation.