AI engineers can use watsonx.ai to incorporate foundation models into generative AI solutions. Now, you can import and deploy custom foundation models in watsonx.ai on IBM Cloud. This means you can fine-tune a foundation model outside the watsonx platform and then import and deploy it for use in the watsonx.ai Prompt Lab or through the watsonx.ai API, just like any other foundation model. You can easily leverage a foundation model that is tuned for a specific language, industry, or business domain.
In this tutorial, we will import an open-source foundation model from Hugging Face to demonstrate the deployment process. For cloud object storage, we will use the Amazon Simple Storage Service (S3). You’ll learn how to import and deploy custom foundation models using watsonx.ai on IBM Cloud, including the considerations, requirements, and best practices to ensure a smooth deployment.
Step 1. Check that the custom foundation model meets the requirements
Before you download and upload a model, make sure that it meets these requirements:
It must include the model details in a config.json file. The config.json file is required to load the model in the Text Generation Inference Server (TGIS) runtime, and the watsonx.ai deployment service requires that the config.json file is present in the main content folder for the foundation model after it is uploaded to the cloud object storage.
It must include a tokenizer file, such as tokenizer.json, so that input text can be converted to tokens.
For this tutorial, we will be using the Falcon-40b model from Hugging Face, a repository for open-source foundation models that is used by many model builders. Let's check this model to ensure that it meets the requirements.
Open the Falcon-40b model on the Hugging Face website, and click Files and versions.
Verify that the config.json and tokenizer.json files exist.
Make sure that the model is in the safetensors format (has the .safetensors extension) in the foundation model main folder.
Open the config.json file and check the model type. For Falcon-40b, the file shows that the model is built with the supported falcon architecture, as in the abridged snippet below.
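At the time of writing, the relevant fields in the Falcon-40b config.json look similar to this abridged snippet (check the actual file, because repository contents can change):
{
  "architectures": ["FalconForCausalLM"],
  "model_type": "falcon",
  ...
}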
Step 2. Download the custom foundation model
Follow these steps to download a custom foundation model with the Hugging Face command-line interface:
Install the huggingface-cli package with pip:
pip install -U "huggingface_hub[cli]"
Verify that the huggingface-cli is correctly set up:
huggingface-cli --help
Configure the HF_TOKEN environment variable, use it to log in to huggingface-cli, and then download the model:
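For example, with a Hugging Face access token created in your account settings (the token value is a placeholder):
export HF_TOKEN="<your Hugging Face token>"
huggingface-cli login --token $HF_TOKEN
Then download the model files to a local folder. For the Falcon-40b model used in this tutorial, the command looks like this (recent versions of huggingface_hub provide the download subcommand):
huggingface-cli download tiiuae/falcon-40b --local-dir falcon-40b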
Step 3. Convert the model to the safetensors format (if needed)
You must make sure that your model is compatible with the Text Generation Inference (TGI) standard and is built with a supported model architecture and model type. For more information, see Planning to deploy a custom foundation model in the watsonx docs.
If your model is not in the .safetensors format or does not contain the tokenizer.json file, convert the model to the .safetensors format before you upload it, as in the sketch below.
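A minimal sketch of one common conversion approach, assuming that the model can be loaded with the Hugging Face transformers library (the directory paths are placeholders):
# convert_to_safetensors.py - re-save a model in the safetensors format
from transformers import AutoModelForCausalLM, AutoTokenizer

source_dir = "<path to the downloaded model>"   # placeholder
output_dir = "<path for the converted model>"   # placeholder

# Load the weights from the original checkpoint files and re-save
# them as .safetensors files alongside config.json.
model = AutoModelForCausalLM.from_pretrained(source_dir)
model.save_pretrained(output_dir, safe_serialization=True)

# Re-save the tokenizer so that tokenizer.json is present in the output folder.
tokenizer = AutoTokenizer.from_pretrained(source_dir)
tokenizer.save_pretrained(output_dir)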
Step 4. Set up cloud object storage and then add the model
To deploy a custom foundation model for inferencing with watsonx.ai, you must upload the model to cloud storage. You can use the bucket in the IBM Cloud Object Storage that is associated with your deployment space or an external cloud storage.
When the model is uploaded, create a corresponding model asset in a deployment space. If you upload your model to a remote cloud storage, you must create a connection to it that is based on your personal task credentials.
Follow these steps to set up and add your downloaded model to Amazon Simple Storage Service (Amazon S3).
Install the Amazon Web Services command-line interface with pip:
pip install awscli
Set up the required environment variables:
export AWS_ACCESS_KEY_ID="<your AWS access key>"
export AWS_SECRET_ACCESS_KEY="<your AWS secret access key>"
export ENDPOINT="<s3 endpoint URL>"
export BUCKET_NAME="<name of the bucket to upload the model>"
export MODEL_DIR="<path to the local folder that contains the downloaded model>"
export MODEL_FOLDER="<name of the new folder to create in the bucket>"
Add the model to the cloud object storage bucket by using the Amazon Web Services CLI:
echo"Model folder name in cos bucket $MODEL_FOLDER"
aws --endpoint-url ENDPOINTs3cp{ENDPOINT} s3 cp ENDPOINTs3cp{MODEL_DIR} s3://BUCKETNAME/{BUCKET_NAME}/BUCKETNAME/{MODEL_FOLDER}/ --recursive--follow-symlinks
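Optionally, you can verify the upload by listing the contents of the model folder:
aws --endpoint-url ${ENDPOINT} s3 ls s3://${BUCKET_NAME}/${MODEL_FOLDER}/ --recursive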
Step 5. Import the custom foundation model asset to a deployment space
After storing the model in cloud storage, you can add the model asset by importing it to a deployment space. You can use either the watsonx.ai user interface or the watsonx.ai API.
Use the watsonx.ai user interface
Follow these steps to import your custom foundation model from the user interface in watsonx.ai:
In your deployment space, go to Assets and then click Import.
Select the Custom foundation model option.
Select the connection to the cloud storage where the model is located (or create a new connection, if needed).
Select the folder that contains your model.
Enter the required information. If you don't submit any entries for model parameters, default values are used.
Click Import.
Use the watsonx.ai API
You can also create the model asset by calling the watsonx.ai API with curl, as sketched below.
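These API examples assume that the $TOKEN environment variable contains an IBM Cloud IAM bearer token, which you can generate from an IBM Cloud API key:
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
-H "Content-Type: application/x-www-form-urlencoded" \
--data "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=<your IBM Cloud API key>"
The following request is a sketch of the model asset creation call. The payload fields, including the software_spec name, are assumptions based on the watsonx.ai models API, so verify them against the watsonx.ai API reference before use:
curl -X POST "https://<your cloud hostname>/ml/v4/models?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{
  "type": "custom_foundation_model_1.0",
  "framework": "custom_foundation_model",
  "version": "1.0",
  "name": "custom_fm_asset",
  "software_spec": {
    "name": "watsonx-cfm-caikit-1.0"
  },
  "space_id": "<your space id>",
  "foundation_model": {
    "model_id": "<id to assign to your model>",
    "model_location": {
      "connection_id": "<your cloud storage connection asset id>",
      "bucket": "<your bucket name>",
      "file_path": "<model folder in the bucket>"
    }
  }
}'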
Step 6. Create a deployment for your custom foundation model
Follow these steps to create a deployment for a custom foundation model:
In your deployment space or your project, go to the Assets tab.
Find your model in the asset list, click the Menu icon, and select Deploy.
Enter a name for your deployment and optionally enter a serving name, description, and tags.
Select a configuration for your model.
Optional: If you want to override some of the base model parameters, click Model deployment parameters and then enter new parameter values:
Data type: Choose float16 or bfloat16 to specify the data type for your model.
Max batch size: Enter the maximum batch size for your model.
Max concurrent requests: Enter the maximum number of concurrent requests that can be made to your model.
Max new tokens: Enter the maximum number of tokens that can be created for your model for an inference request.
Max sequence length: Enter the maximum sequence length for your model.
Click Create.
Alternatively, when the custom foundation model asset has been created, you can create the online deployment by using the watsonx.ai API. Run this curl command:
curl -X POST "https://<your cloud hostname>/ml/v4/deployments?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{"asset":{
"id":<your custom foundation model asset id>
},
"online":{
"parameters":{
"serving_name":"test_custom_fm",
"foundation_model": {
"max_sequence_length": 4096
}
}
},
"hardware_spec": {
"id": "<your custom hardware spec id>", // Use either "id"or"name""num_nodes": 1
},
"description": "Testing deployment using custom foundation model",
"name":"custom_fm_deployment",
"space_id":<your space id> // for project deployments, use"project_id"
}'
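Deploying a large model can take a while. You can poll the deployment until its status reports ready; a minimal status check looks like this (the query parameters are the standard ones for the deployments API):
curl -X GET "https://<your cloud hostname>/ml/v4/deployments/<your deployment ID>?version=2024-01-29&space_id=<your space id>" \
-H "Authorization: Bearer $TOKEN"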
Guidelines for selecting a hardware specification
When you deploy your custom foundation model, a suggested configuration is pre-selected for you. However, this configuration might not always be the best fit for your specific model.
If you have a different model, follow these rules:
Assign the Small configuration to any double-byte precision model under 26B parameters, subject to testing and validation.
Assign the Medium configuration to any double-byte precision model between 27B and 53B parameters, subject to testing and validation.
Assign the Large configuration to any double-byte precision model between 54B and 106B parameters, subject to testing and validation.
For example, the 40B-parameter Falcon model used in this tutorial falls in the Medium range when it runs in a double-byte data type such as float16 or bfloat16.
If the selected configuration fails during the testing and validation phase, consider exploring the next higher configuration available.
Step 7. Prompt your custom foundation model
Now that you have stored and deployed your custom foundation model, you can start using it. You can use the Prompt Lab to prompt the model and generate responses or create a prompt programmatically.
To prompt the custom model using the watsonx.ai API, run this code:
curl -X POST "https://<your cloud hostname>/ml/v1/deployments/<your deployment ID>/text/generation?version=2024-01-29" \
-H "Authorization: Bearer $TOKEN" \
-H "content-type: application/json" \
--data '{"input": "Hello, what is your name",
"parameters": {
"max_new_tokens": 200,
"min_new_tokens": 20
}
}'
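A successful response includes the generated text. Abridged, with field names based on the watsonx.ai text generation API (verify against the API reference):
{
  "results": [
    {
      "generated_text": "...",
      "generated_token_count": 200,
      "stop_reason": "max_tokens"
    }
  ]
}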
Congratulations, your model is now fully deployed and ready to use!
Summary and next steps
By deploying a custom foundation model to watsonx.ai on IBM Cloud, you are able to work with a model that best fits your project and business needs. A custom model can be any model that is built with an architecture supported by watsonx.ai, which greatly expands your options and flexibility in terms of the models that best fit your specific use case.
In this tutorial, we covered the steps necessary to import and deploy a custom foundation model, using the example of a foundation model from Hugging Face. To learn how to deploy custom foundation models on-prem, see Deploying custom foundation models in watsonx.ai on-prem.