
Build a Twitter predictive model using AutoAI and IBM Cloud Functions

In today’s environment, much of the focus has shifted toward data. The amount of data generated and consumed grows every day, with roughly 5 exabytes added daily. Everything we do generates data, whether turning a light on or off or commuting to work. This data can yield information that can be used to predict outcomes and extract patterns. Data mining, or data science, is the process of discovering patterns, insights, and associations from data. This tutorial shows you how to implement a predictive model on that data to gather insights. You’ll learn how to create a predictive model using AutoAI on IBM® Watson™ Studio, a cloud-based environment for data scientists. Specifically, you’ll learn how to predict and optimize your Twitter interaction so that you can drive the most traffic to your tweets.

Learning objectives

This tutorial explains how you can extract data, create a CSV file and upload it to IBM Cloud Object Storage, create a data connection from Watson Studio to IBM Cloud Object Storage, and then refine the data and use it to build, deploy, and test the predictive model with AutoAI.

After completing this tutorial, you’ll understand how to:

  • Work with IBM Cloud Functions to extract data from Twitter
  • Create and upload a CSV file to IBM Cloud Object Storage from an IBM Cloud Function
  • Use Watson Studio and AutoAI to build a predictive model using CSV data
  • Use the deployed model to predict and optimize your Twitter interactions

Prerequisites

To follow this tutorial, you need:

  • An IBM Cloud account
  • A Twitter Developer account (only if you want to extract your own tweets instead of using the sample data)
  • Python installed on your machine (only needed for Step 3)

Estimated time

It should take you approximately 60 minutes to complete this tutorial.

Steps

Use sample data or get your own?

The first thing that you need is tweets to analyze. This step explains how to get these tweets. However, if you don’t want to get your own tweets, you can use the ufone_tweets.csv sample data set. If you use the sample data set, then skip the Twitter API access and IBM Cloud Function sections of this tutorial.

Step 1: Getting Twitter API access

If you’re using the sample data, then skip to Step 2.

Before using tweepy to get tweets, you must generate your Consumer API keys. Go to your Twitter Developer account, hover over your name in the upper right, and create your app. Complete the required information.

Twitter Developer account

After your app is created, select the Keys and tokens tab. You see your Consumer API key and Consumer API secret key, which you’ll be using later in the tutorial. These keys can be revoked and regenerated, but as with any other key, you should keep these secret. (In this tutorial, you won’t be using the API tokens so you can ignore them.)

Keys and tokens
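To confirm that the keys work before moving on, you can run a quick check with tweepy, the same library used in Step 3 (this sketch assumes the tweepy 3.x API). Replace the placeholder strings with your own keys.

import tweepy

# Replace with the Consumer API keys from the Keys and tokens tab
consumer_key = "YOUR_CONSUMER_API_KEY"
consumer_secret = "YOUR_CONSUMER_API_SECRET_KEY"

# App-only authentication; the access tokens are not needed for reading public timelines
auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth)

# Fetch one public tweet to verify that the credentials are valid
tweets = api.user_timeline(screen_name="@CharlizeAfrica", count=1)
print(tweets[0].text)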

Step 2: Creating a Cloud Object Storage service

  1. Log in to your IBM Cloud Account.
  2. Click Create Resource, and search for Object Storage.

    IBM Cloud Object Storage

  3. Choose the free Lite plan, change the name if you want, and click Create.

    Choosing plan

    You can now find the new Cloud Object Storage instance in your resource list under Storage.

  4. After you open your instance, click Buckets from the left-side pane, then click Create bucket (you can choose any type of bucket). Make sure to note the name of your bucket after you create it.

    Naming your bucket

  5. Go to Service Credentials, and select the service credential that was just created. If no credential is listed, click New credential to generate one. Click the arrow to expand the credentials, and note the api_key, iam_serviceid_crn, and resource_instance_id values.

    Service credentials

  6. Go to Endpoints, and choose your resiliency and location. Note the Private URL because you’ll need it in the following steps.

    Endpoints

Your bucket is now ready. Make sure to have your:

  • Bucket name
  • API Key
  • Service ID
  • Resource Instance ID
  • Endpoint URL
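If you want to verify these values before using them in code, the following is a minimal sketch using the ibm_boto3 library (install it with pip install ibm-cos-sdk). Every placeholder string stands for one of the values you just noted.

import ibm_boto3
from botocore.client import Config

# Fill in the values you noted from Service Credentials and Endpoints
client = ibm_boto3.client(service_name="s3",
                          ibm_api_key_id="COS_API_KEY",
                          ibm_service_instance_id="COS_RESOURCE_INSTANCE_ID",
                          config=Config(signature_version="oauth"),
                          endpoint_url="https://" + "COS_ENDPOINT_URL")

# List the objects in the bucket (it's empty for now) to verify access
response = client.list_objects_v2(Bucket="BUCKET_NAME")
print(response.get("KeyCount", 0), "objects found")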

Again, if you’re using the sample data, then you can directly upload the file in your bucket and skip Step 3 (jump to Step 4).

Step 3: Create IBM Cloud Functions

This step is only valid if you started with Step 1.

IBM Cloud Functions is IBM’s Function-as-a-Service (FaaS) programming platform where you write simple, single-purpose functions known as Actions. Actions can be attached to Triggers, which execute the Action when a specific, defined event occurs.
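Every Python Action is simply a module with a main function that accepts a dictionary of parameters and returns a JSON-serializable dictionary. A minimal sketch:

# The smallest possible Python Action: main receives the event
# parameters as a dict and must return a dict
def main(params):
    name = params.get("name", "world")
    return {"message": "Hello, " + name}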

Create an Action

Usually, you create Actions directly in IBM Cloud, but in this case you want to use tweepy, an external Python library for accessing the Twitter API. External libraries are not pre-installed in the IBM Cloud Functions runtime environment, so you must write your Python code, package it together with a local virtual environment in a .zip file, and then push it to IBM Cloud.

If you don’t have Python, then download and install the latest version. After it’s installed, make sure to install virtualenv.

pip install virtualenv
  1. Create a directory that you can use to create your virtual environment. In this tutorial, it’s named twitterApp.

     cd desktop; mkdir twitterApp; cd twitterApp
    
  2. From the twitterApp directory, create a virtual environment named virtualenv. Your virtual environment must be named virtualenv.

     virtualenv virtualenv
    
  3. From your directory (in this case twitterApp), activate your virtualenv virtual environment.

     source virtualenv/bin/activate
    
  4. Install the tweepy module.

      pip install tweepy
    
  5. Stop the virtualenv.

     deactivate
    
  6. Copy the following code, save it to a file named main.py in the twitterApp directory, and add the corresponding credentials that you got from Step 1 (Consumer keys) and Step 2 (Cloud Object Storage credentials). You can also change the Twitter handle that you want to analyze. (This tutorial uses Charlize Theron’s Twitter handle.) The code gets the data from Twitter, creates a CSV file that contains the data, and uploads the file to the object storage service that you created at the beginning. After you run this function, a CSV file containing the tweets’ information is uploaded to your bucket in Cloud Object Storage.

     import tweepy
     import pandas as pd
     from botocore.client import Config
     import ibm_boto3

     # Twitter API credentials
     consumer_key = <"YOUR_CONSUMER_API_KEY">
     consumer_secret = <"YOUR_CONSUMER_API_SECRET_KEY">
     screen_name = "@CharlizeAfrica"  # you can put your own Twitter username; here we analyze Charlize Theron's profile

     def main(dict):
         tweets = get_all_tweets()
         createFile(tweets)

         return {"message": 'success'}

     def get_all_tweets():
         # initialize tweepy with app-only authentication
         auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
         api = tweepy.API(auth)

         # collect up to 3200 tweets, the user timeline limit of the Twitter API
         alltweets = []
         for status in tweepy.Cursor(api.user_timeline, screen_name=screen_name).items(3200):
             alltweets.append(status)

         return alltweets

     def createFile(tweets):
         # keep only the fields needed for the model
         outtweets = []
         for tweet in tweets:
             outtweets.append([tweet.created_at.hour,
                               tweet.text,
                               tweet.retweet_count,
                               tweet.favorite_count])

         # connect to IBM Cloud Object Storage
         client = ibm_boto3.client(service_name='s3',
                                   ibm_api_key_id=<"COS_API_KEY">,
                                   ibm_service_instance_id=<"COS_RESOURCE_INSTANCE_ID">,
                                   config=Config(signature_version='oauth'),
                                   endpoint_url="https://" + <"COS_ENDPOINT_URL">)

         # write the tweets to a local CSV file
         cols = ['hour', 'text', 'retweets', 'favorites']
         table = pd.DataFrame(outtweets, columns=cols)
         table.to_csv('tweets_data.csv', index=False)

         # upload the CSV file to your bucket
         try:
             client.upload_file(Filename="tweets_data.csv", Bucket=<'BUCKET_NAME'>, Key='tweets.csv')
         except Exception as e:
             print(Exception, e)
         else:
             print('File Uploaded')
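     # Optional addition (not part of the original tutorial code): a local
     # smoke test. With real credentials filled in, run `python main.py`
     # inside the activated virtualenv to confirm the upload works before
     # packaging the code in the next step.
     if __name__ == "__main__":
         print(main({}))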
    
  7. From the twitterApp directory, create a .zip archive of the virtualenv folder and the main.py file. These files must be in the top level of your .zip file.

     zip -r twitterApp.zip virtualenv main.py
    
  8. Push this function to IBM Cloud by logging in to your IBM Cloud account, making sure to target your organization and space. You can find out more about this process in the IBM Cloud CLI documentation.

     ibmcloud login
    
  9. Create an action called twitterAction from the .zip archive that you just created (right-click the file, and check Get Info on a Mac or Properties on Windows™ to get its path), specifying the entry point, which is the main function in the code, and the --kind flag for the runtime.

     ibmcloud fn action create twitterAction </path/to/file/>twitterApp.zip --kind python:3.7 --main main
    
  10. Go back to IBM Cloud, and click Cloud Functions on the left side of the window.

    Cloud functions

  11. Click Actions, making sure that the right namespace is selected. You see the action that was created. Click it, and then click Invoke to run it.

    Invoking the action

    You can also run it directly from the terminal using the following command.

     ibmcloud fn action invoke twitterAction --result
    

If you go to your bucket in the Cloud Object Storage service that you created at the beginning of the tutorial, you see that a tweets.csv file has been uploaded. This file contains the tweets extracted by your IBM Cloud Function.

Create a Trigger

Now, create a Trigger that invokes your Action.

  1. Choose Triggers from the left pane, and click Create Trigger.

    Selecting a Trigger

  2. Choose Periodic for the trigger type. This means that your event is time based: the function is invoked at a specific time.

    Choosing trigger type

  3. Name your trigger, define a timer, and click Create. In this example, the timer is set for Sundays: every Sunday at 4:00 am (GMT+4), the trigger fires and invokes the action to fetch Twitter data and create a new CSV file with the latest tweets.

    Naming the trigger

  4. Click Add to connect this trigger to the Action.

    Connecting the trigger

  5. Choose the Select Existing tab, select your Action, and click Add. Now, your Action is connected to this Trigger and gets fired based on the time that you specified.

    Adding the trigger time

Step 4: Create a Watson Studio service

Similar to how you created the Cloud Object Storage service at the beginning of the tutorial, you’ll use the same process to create a Watson Studio service.

  1. Search for Watson Studio, and select the Lite plan to create it. You can find the new instance under Services in the Resource summary (the main dashboard of your IBM Cloud account). Click it, and then click Get Started. This launches the Watson Studio platform.

    Launching Watson Studio

  2. Click Create Project, and then Create an empty project.

    Creating a project

  3. Name the project, and give it a description. Make sure to choose the Cloud Object Storage service that you created previously.

    Naming the project

Step 5: Create a connection to Cloud Object Storage

  1. Click Add to project. Here, you see many assets that you can use in Watson Studio. You want to create a connection to your Cloud Object Storage service so that you can access the tweets.csv file. After you access this file, you have access to the data inside of it. You’ll use this data to build your machine learning model with AutoAI.

    Adding to projects

  2. Click Connection to start creating your connection to your Cloud Object Storage service.

    Creating a connection

  3. Click Cloud Object Storage.

    Connecting to cloud object storage

  4. Name your connection, and complete the information with the credentials that you got from Step 2 (Cloud Object Storage credentials). Just add the API_KEY, Resource Instance ID, and Login URL, which is the endpoint. You can leave the other fields empty.

    Naming the connection

  5. Click Add to project, and select Connected data. Select your source, which is the connection created in the previous step, then select your bucket and choose the tweets.csv file. Name your asset, and click Create.

    Adding connected data

Step 6: Refine the data

The data is already prepared, but you must convert the hour, favorites, and retweets columns to the Integer type. Start with hour.

Refining the data

  1. Click the three dots next to the hour column, select Convert column, and then choose Integer. Repeat the same process for favorites and retweets.

    Choosing integer

  2. Click Save and create a job when you’re finished.

    Saving the job

  3. Name the job, and click Create and Run.

    Naming the job

This job creates a new data set based on the one that you already have, but with your refinements applied, that is, with the three columns converted to Integer. As you can see, the output of this job is a file named Tweets_shaped.csv. Wait until the status of the job shows Completed.

Creating the file
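For reference, the conversion that this job applies is equivalent to the following pandas sketch. This is only an illustration (Data Refinery does the work for you through the UI); the column names are the ones created in Step 3.

import pandas as pd

# Load the extracted tweets and cast the numeric columns to integers,
# mirroring the Data Refinery "Convert column" operation
df = pd.read_csv("tweets.csv")
df[["hour", "retweets", "favorites"]] = df[["hour", "retweets", "favorites"]].astype(int)
print(df.dtypes)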

Now, you should see three assets just like the following image. The Tweets_shaped.csv file is now the main file that you’ll use in AutoAI to create your predictive model.

Three assets

Step 7: Create an AutoAI experiment

  1. Click Add to project, and choose AutoAI experiment.

    Selecting AutoAI experiment

  2. Name your experiment, and choose a machine learning instance. The instance is needed so that you can deploy your model at the end. If you don’t have one, Watson Studio asks you to create it directly, and you can then proceed normally.

    Choosing a machine learning instance

  3. Add your file by selecting the Tweets_shaped.csv file that was generated from the Data Refinery.

    Adding the file

  4. You want to predict the best time to share your tweets, so choose hour as the prediction column. You see that the prediction type is Regression because you want to predict a continuous value, and the optimized metric is RMSE (root mean squared error; see the short sketch after this list). You can change and customize your experiment by clicking Experiment Settings.

    Predicting best time

  5. In Experiment settings, go to Prediction. Here, you can see all of the algorithms that you can use in your experiment, and you can change how many of them are used. For example, if you choose 3, the experiment uses the top three algorithms. For every algorithm, AutoAI generates four pipelines: the first pipeline is the plain algorithm with no enhancements, the second adds hyperparameter optimization (HPO), the third adds HPO and feature engineering, and the last adds HPO, feature engineering, and a second round of HPO. Because you are using three algorithms, you get a total of 12 pipelines (3 × 4 = 12), so AutoAI builds and evaluates 12 candidates to find your best model.

    Experiment settings
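As a quick reference for the optimized metric, RMSE measures the average prediction error in the units of the target, which here is the hour of the day. A minimal sketch of how it is computed:

import numpy as np

# Root mean squared error: the square root of the mean squared
# difference between predicted and actual target values
def rmse(y_true, y_pred):
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

print(rmse([14, 9, 20], [13.5, 10, 18]))  # example values only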

Step 8: Build and evaluate the models

AutoAI generates the 12 candidate models for your use case, and there are different ways to understand and visualize the results. The following image shows a Relationship Map, which shows how AutoAI builds and generates the pipelines. Every color represents a type of algorithm, and each algorithm has its own four pipelines, as discussed in the previous step.

Pipelines

You can click Swap view to check the Progress Map, which is another way to visualize how AutoAI is generating your pipelines in a sequential way.

Progress Map

You can check the Pipeline leaderboard to see which model is best. In this case, Pipeline 12 is the best model, using the Random Forest Regressor with all three enhancements (the first HPO, feature engineering, and the second HPO).

Pipeline leaderboard
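For intuition about what the winning pipeline roughly corresponds to, here is a simplified scikit-learn sketch of a Random Forest regression on the shaped data. It is an illustration only, not the pipeline that AutoAI actually generated: it skips AutoAI’s HPO and feature engineering and uses only the numeric columns, ignoring the text column.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative only: a plain Random Forest on the refined data
df = pd.read_csv("Tweets_shaped.csv")
X = df[["retweets", "favorites"]]  # numeric features only
y = df["hour"]                     # target: hour of the day

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out split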

AutoAI shows you the comparison between all of these pipelines. If you click Pipeline comparison, you see a metric chart that compares your candidates.

Pipeline comparison

Because Pipeline 12 is the best model, click it to get a better understanding of it. For example, you can check its Feature Importance to see the key features in making decisions for the predictive model. In this example, the retweets field is the most important factor for the prediction. You see new features generated, like NewFeature_3 and NewFeature_0. These are combinations of different features (for example, a combination of retweets and favorites) that are generated with feature engineering to enhance the model.

Looking at the model

Step 9: Save and deploy the model

Now, save and deploy the model so that you can start using it.

  1. Click Save as, and choose Model. This saves the model, and you can now access it from the main dashboard of your project in Assets under the Models section.

    Saving the model

  2. Click this newly created model, select the Deployments tab, and click Add Deployment to create the deployment (you must give it a name). This is a web deployment that can be accessed through a REST call.

    Deploying the model

  3. Wait until the status of the deployment is Ready in the Deployments tab, then click the deployment’s name.

    Clicking the deployment

Step 10: Test the model

The model is now ready for you to start using.

  1. Select the Test tab, and enter data in the fields. You can put the data in a JSON format if you prefer (this is easier in cases where you have a lot of fields, but here you have only three fields).

  2. Click Predict, and you see the result under values. In this example, the value is 14.5, which corresponds to 2:30 pm. This means that the best time for Charlize Theron (remember that you’re using her data) to share a tweet that could get approximately 7000 retweets and 2000 favorites is 2:30 pm. You can put your own user name in the IBM Cloud Function if you want to predict the best time for your own account.

If you want to implement this model in your application, click the Implementation tab. It shows the endpoint URL and code snippets in different programming languages (cURL, Java, JavaScript, Python, and Scala) that you can use in your application.

Implementation
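For example, calling the scoring endpoint from Python might look like the following sketch. The payload layout shown is the Watson Machine Learning v4 style and the field names match this tutorial’s data, but treat both as assumptions: copy the exact URL and payload structure from your own Implementation tab.

import requests

# Exchange your IBM Cloud API key for an IAM access token
token_resp = requests.post(
    "https://iam.cloud.ibm.com/identity/token",
    data={"apikey": "YOUR_IBM_CLOUD_API_KEY",
          "grant_type": "urn:ibm:params:oauth:grant-type:apikey"})
mltoken = token_resp.json()["access_token"]

# Score the deployment; copy the exact scoring URL from the Implementation tab
scoring_url = "YOUR_DEPLOYMENT_SCORING_URL"
payload = {"input_data": [{
    "fields": ["text", "retweets", "favorites"],
    "values": [["sample tweet text", 7000, 2000]]}]}

response = requests.post(scoring_url, json=payload,
                         headers={"Authorization": "Bearer " + mltoken})
print(response.json())  # the predicted hour appears in the returned values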

Summary

In this tutorial, you learned to extract data from Twitter, create a CSV file that contains this data, and upload it to IBM Cloud Object Storage using IBM Cloud Functions. Then, you learned how to create a predictive model on this data to optimize future tweeting and increase the user’s audience using Watson Studio and AutoAI.