Train machine learning models with Federated Learning

In this tutorial, you use the MNIST data set of handwritten digits and IBM Federated Learning to train a machine learning model. You use IBM Watson Studio to create and run the Federated Learning aggregator in IBM Cloud, while the remote training parties run on one or more systems of your choice.

Prerequisites

To train a machine learning model using IBM Federated Learning, you need:

Estimated time

It should take you approximately 30 – 45 minutes to complete the tutorial.

Steps

Prepare party environments

You can run the remote training party on virtually any machine. However, before doing so you must complete a few prerequisite steps to prepare your environment. Complete this section on each machine that you intend to use as the remote party.

  1. Install Python 3.7.10 on your system, then follow the Creating a Federated Learning experiment instructions to set up your party environment. These steps install the Watson Machine Learning client, the machine learning frameworks, and their dependent packages.

  2. Create a new directory on your system.

  3. Download and extract the MNIST data set in the directory that you created.

     wget https://api.dataplatform.cloud.ibm.com/v2/gallery-assets/entries/903188bb984a30f38bb889102a1baae5/data -O MNIST.zip
     unzip MNIST.zip
    
  4. Download the MNIST data handler in the directory that you created.

     wget https://raw.githubusercontent.com/IBMDataScience/sample-notebooks/master/Files/mnist_keras_data_handler.py
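
Before moving on, it can help to confirm that the party directory is complete. The following is a minimal sketch, assuming the archive extracts to the two .pkl files that are listed later in this tutorial. The function names `missing_party_files` and `is_python_37` are hypothetical helpers written for this illustration and are not part of any IBM library.

```python
import sys
from pathlib import Path

# File names taken from this tutorial; adjust them if your downloads differ.
EXPECTED_FILES = (
    "mnist-keras-train.pkl",
    "mnist-keras-test.pkl",
    "mnist_keras_data_handler.py",
)


def missing_party_files(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in expected if not (base / name).is_file()]


def is_python_37(version_info=sys.version_info):
    """Return True when the given version tuple is a Python 3.7 release."""
    return tuple(version_info[:2]) == (3, 7)
```

Calling `missing_party_files(".")` from the directory that you created returns an empty list when all three files are in place.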
    

Generate API keys for remote parties

Create an API key for each user who will be running the remote training party. This key is used later in this tutorial to authenticate the remote party with the aggregator.

  1. Go to IBM Cloud, and log in to your account.
  2. Click Manage, then click Access (IAM).

    Accessing IAM

  3. Click API keys on the left, then click Create an IBM Cloud API key.

  4. Name your API key, then click Create.

    Naming API

  5. Save your API key in a safe place. You need it later in this tutorial to authenticate the remote party.
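
One way to avoid pasting the key directly into files that you might share is to keep it in an environment variable and read it at run time. The following is a minimal sketch of that idea. The variable name `IBM_CLOUD_API_KEY` and the function `load_api_key` are assumptions made for this illustration; the tutorial itself only requires that the key ends up in the party connector script.

```python
import os

# Hypothetical variable name chosen for this sketch; any name works.
API_KEY_ENV_VAR = "IBM_CLOUD_API_KEY"


def load_api_key(env=os.environ, var_name=API_KEY_ENV_VAR):
    """Return the API key stored in `var_name`, or raise if it is unset or blank."""
    api_key = env.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your IBM Cloud API key."
        )
    return api_key
```

With the variable exported in your shell, `load_api_key()` returns the key, and a missing or blank value fails early with a clear message.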

Create and set up a project in Watson Studio

Before running your Federated Learning experiment, you must create a project and associate it with a Watson Machine Learning instance. You must also add the users who are running the remote parties to your project as collaborators. These are the users for whom you generated API keys in the previous section.

  1. Go to dataplatform.cloud.ibm.com, and log in to your account.

  2. Click Projects, then click New Project.

  3. Click Create an empty project.

    Creating a new project

  4. Name your project, and click Create.

    Naming the project

  5. From your new project, click the Settings tab. Under the Associated services section, click Add service, then Watson.

    Adding a service

  6. Select an existing Watson Machine Learning instance, and click Associate service.

    Note: If you do not have a Watson Machine Learning instance, you can create one now.

    Associating service

  7. Back in your project, click the Access Control tab, then click Add collaborators.

    Adding collaborators

  8. Enter the email addresses of the users who are running the remote training parties, assign them the Editor role, and click Invite.

    Note: If you are running the remote party as the same user who created the project, you can skip this step. By default, the project creator already has the Admin role.

    Email addresses

Start the Federated Learning experiment

  1. Download the untrained MNIST model from https://github.com/IBMDataScience/sample-notebooks/blob/master/Files/tf_mnist_model.zip.

  2. From your project, click Add to project, then click Federated Learning.

    Asset type

  3. Name your experiment, and click Next.

    Naming the experiment

  4. Select Tensorflow 2 for the Machine learning framework. Then, under Model specifications, click Select.

    Machine learning framework

  5. Upload the untrained MNIST model file, tf_mnist_model.zip, that you downloaded in step 1.

    Uploading data set

  6. Name the untrained model, and click Import.

    Naming model

  7. Choose Simple average for the Fusion method, and click Next.

    Fusion method

  8. Click Next to use the default hyperparameter values.

    Hyperparameter values

  9. On the Remote Training System page, click Add new systems.

    Adding new systems

  10. Name your remote training system. Under Allowed users, select the user that is running the remote training party, then click Add.

    Note: The user must be added as a project collaborator to appear in the Allowed users list.

    Adding remote systems

  11. Repeat the previous step for each additional remote training system. You must create a remote training system for each remote party that you intend to use. Click Add systems when you are done.

    Add systems

  12. Ensure that the remote training systems are checked, and click Next.

    Remote training systems

  13. Review your Federated Learning experiment settings, and click Create to start the aggregator.

    Reviewing and creating settings

  14. The Federated Learning experiment shows a Pending status while the aggregator starts. After the aggregator starts, the status changes to Setup – Waiting for remote systems.

    Starting aggregator
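
For intuition about the Simple average fusion method selected earlier in this section: it is generally described as giving every party's model update equal weight when the aggregator combines them. The following sketch illustrates that idea with plain Python lists. It is not IBM's implementation, and the function name `simple_average` and the sample weights are hypothetical.

```python
def simple_average(party_weights):
    """Return the element-wise mean of equally long weight vectors, one per party."""
    if not party_weights:
        raise ValueError("at least one party update is required")
    length = len(party_weights[0])
    if any(len(weights) != length for weights in party_weights):
        raise ValueError("all party updates must have the same length")
    count = len(party_weights)
    # zip(*...) groups the i-th weight of every party together.
    return [sum(values) / count for values in zip(*party_weights)]


# Two hypothetical parties, each reporting three flattened weights.
averaged = simple_average([[0.0, 2.0, 4.0], [2.0, 4.0, 8.0]])
print(averaged)  # [1.0, 3.0, 6.0]
```

Because every party counts equally, a party with very little data influences the result as much as a party with a lot of data, which is the main trade-off of this method.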

Connect remote parties and begin training

Complete this section on each remote party that is used in your Federated Learning experiment.

  1. Log in to dataplatform.cloud.ibm.com.

  2. Open the project that contains the Federated Learning experiment, and click the Assets tab.

  3. Select the experiment that you created in the Federated Learning experiments section.

    Selecting experiment

  4. Click View Setup Information.

    View setup information

  5. Click the download icon beside the remote training system, and click Party connector script.

    Party connector script

  6. Save the party connector script in the same directory where you downloaded the MNIST data set and data handler earlier in this tutorial. You should now have the following files in that directory. The name of your party connector script will differ from the one shown here.

     mnist-keras-test.pkl
     mnist-keras-train.pkl
     mnist_keras_data_handler.py
     rts_US-East_cb4e9621-c46a-4f6c-9377-276afd1b6941.py
    
  7. Edit the party connector script rts_US-East_cb4e9621-c46a-4f6c-9377-276afd1b6941.py.

    1. Replace <api_key> with the API key that you generated earlier.

    2. Replace the data section of the party connector script so that it contains the data handler class name, path to the data handler Python file, and the path to the MNIST data set.

      "data": {
          "name": "MnistTFDataHandler",
          "path": "./mnist_keras_data_handler.py",
          "info": {
              "train_file": "./mnist-keras-train.pkl",
              "test_file": "./mnist-keras-test.pkl"
          },
      },
      
  8. Save the changes to the party connector script.

  9. Run the party connector script.

     python3 rts_US-East_cb4e9621-c46a-4f6c-9377-276afd1b6941.py
    
  10. When the party has successfully connected to the aggregator, you should see a Received Heartbeat from Aggregator message in the output.

     2021-06-01 16:28:41,763 | 1.0.0 | INFO | ibmfl.util.config                                  | No model config provided for this setup.
     2021-06-01 16:28:41,763 | 1.0.0 | INFO | ibmfl.util.config                                  | No fusion config provided for this setup.
     2021-06-01 16:28:41,768 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | Websockets Sender initialized
     2021-06-01 16:28:41,769 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | WSConnection : Initialize Party Communications
     2021-06-01 16:28:41,769 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | **** PartySendLoopThread
     2021-06-01 16:28:41,769 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | **** PartyRecvLoopThread
     2021-06-01 16:28:41,769 | 1.0.0 | INFO | ibmfl.party.party                                  | Party initialization successful
     2021-06-01 16:28:41,769 | 1.0.0 | INFO | ibmfl.party.party                                  | Party not registered yet.
     2021-06-01 16:28:41,770 | 1.0.0 | INFO | ibmfl.party.party                                  | Registering party...
     2021-06-01 16:28:41,770 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | Sending serialized message to aggregator
     2021-06-01 16:28:41,770 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | PartySendLoop: Number of active messages ready to send: 1
     2021-06-01 16:28:42,922 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | Received Heartbeat from Aggregator
     2021-06-01 16:28:47,100 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | PartySendLoop: Holding for message to send
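
If you capture the party output in a file, the connection check can be automated. The following is a minimal sketch, assuming the log keeps the pipe-separated layout shown in the sample output (timestamp, version, level, logger, message). The helper names `parse_log_line` and `has_heartbeat` are hypothetical and are not part of the ibmfl package.

```python
HEARTBEAT_MESSAGE = "Received Heartbeat from Aggregator"


def parse_log_line(line):
    """Split a party log line into (timestamp, version, level, logger, message)."""
    parts = [part.strip() for part in line.split("|", 4)]
    if len(parts) != 5:
        return None  # Not in the expected five-field layout.
    return tuple(parts)


def has_heartbeat(lines):
    """Return True if any log line reports a heartbeat from the aggregator."""
    for line in lines:
        parsed = parse_log_line(line)
        if parsed and parsed[4] == HEARTBEAT_MESSAGE:
            return True
    return False


# Two lines copied from the sample output in this tutorial.
sample = [
    "2021-06-01 16:28:41,770 | 1.0.0 | INFO | ibmfl.party.party                                  | Registering party...",
    "2021-06-01 16:28:42,922 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | Received Heartbeat from Aggregator",
]
print(has_heartbeat(sample))  # True
```

The check matches the message text exactly, so it needs to be adjusted if a later release words the heartbeat message differently.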
    

Monitor training progress and performance

  1. Training begins after all of the remote parties have connected to the aggregator.

    Begin training

  2. You can use the Watson Studio interface to monitor the training progress and model performance.

    Accuracy over rounds

  3. When training is complete, click Save model to project. Name your trained model, and click Save.

    Saving model

Congratulations. You have trained a machine learning model using IBM Federated Learning. You can now deploy the model to a Watson Studio space for scoring.

Conclusion

This tutorial walked you through the process of using the MNIST data set of handwritten digits and IBM Federated Learning to train a machine learning model. You also used Watson Studio to create and run the Federated Learning aggregator in IBM Cloud while the remote training parties ran on one or more systems of your choice.