From trustworthy AI to data fabric, DataOps and ModelOps, these sessions are for you


As an emerging or established professional in artificial intelligence, machine learning, and data science, staying current with the latest trends, directions, and capabilities is part of the job. Whether that means following the latest from IBM Research, getting hands-on with Project Debater, or exploring thought leadership on trustworthy AI, the Data & AI Digital Developer Conference aims to educate, excite, and illuminate on these matters and more, with sessions dedicated to:

  • Project Debater
  • Trustworthy AI
  • Automated machine learning
  • Federated machine learning
  • Data fabric, DataOps and ModelOps

This blog highlights a few key sessions that you can watch right now, no registration required. Access all the sessions, including the courses, free and on demand by registering at ibm.biz/devcon-ai.

Course 1: Jump-start your journey – Earn the badge

In this course, you get hands-on experience in performing data exploration and analytics using Python and related libraries. Additionally, you gain experience training simple machine learning models using Python, automatic model training using AutoAI, and deploying the resulting models as API endpoints.
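
If you're wondering what that looks like in practice, here's a minimal sketch of the kind of exploration and baseline model training the course walks through, using pandas and scikit-learn (the file and column names are placeholders):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Explore the data; "customers.csv" and "churn" are placeholder names.
df = pd.read_csv("customers.csv")
print(df.describe())                # summary statistics per column
print(df["churn"].value_counts())  # class balance of the target

# Train a simple baseline model (assumes the features are already numeric).
X = df.drop(columns=["churn"])
y = df["churn"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

In the course itself, AutoAI then automates this kind of model search, and the resulting model is deployed as an API endpoint.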

Course 2: From Development to Production

In this course, learn how to take your model to production using IBM Cloud Pak for Data as a Service. Throughout the course, you get an understanding of machine learning and MLOps methodologies and gain hands-on experience putting them into practice.

Project Debater

Track 1: Project Debater – how persuasive can a computer be?

Project Debater is the first AI system that can meaningfully debate a human opponent. The system, an IBM Grand Challenge, is designed to build coherent, convincing speeches on its own, as well as provide rebuttals to the opponent’s main arguments. In 2019, Project Debater competed against Harish Natarajan, who holds the world record for most debate victories, in an event held in San Francisco that was broadcast live worldwide. This presentation tells the story of Project Debater, from conception to a climactic final event.

Track 4: Using Project Debater services for analyzing survey data

When you have a large collection of texts representing people's opinions (such as product reviews, survey answers, or social media posts), it can be difficult to understand the key issues that come out of the data. Going over thousands of comments is prohibitively expensive. Existing automated approaches are often limited to identifying recurring phrases or concepts and the overall sentiment toward them, but they do not provide detailed or actionable insights. In this presentation, gain hands-on experience in using Project Debater services to analyze and derive insights from open-ended answers. The data used is a community survey conducted in 2016-2017 in Austin, Texas. We analyze the open-ended answers in different ways by using four Debater services: the Argument Quality service, the Key Point Analysis service, the Term Wikifier service, and the Term Relater service. You'll also see how they can be combined into a powerful text analysis tool.

This session also has a step-by-step tutorial.
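
To give a flavor of what the tutorial covers, here is a minimal sketch of scoring survey answers with the Argument Quality service through the Debater Early Access Python SDK. Treat the client and method names as assumptions based on the SDK's published examples, and check the documentation for your version:

```python
from debater_python_api.api.debater_api import DebaterApi

# Assumption: client names follow the Debater Early Access Python SDK examples.
debater_api = DebaterApi(apikey="YOUR_API_KEY")
arg_quality = debater_api.get_argument_quality_client()

topic = "Improving the quality of life in Austin"
answers = [
    "We need more frequent and reliable public transportation.",
    "things are fine I guess",
]

# Each request pairs a candidate sentence with the topic it argues about.
scores = arg_quality.run([{"sentence": a, "topic": topic} for a in answers])

# Rank answers by quality; high scorers would feed Key Point Analysis next.
for answer, score in sorted(zip(answers, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {answer}")
```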

Trustworthy AI

Track 1: Trustworthy AI

In this presentation, learn why trust is needed in critical and consequential applications of artificial intelligence, and the key ways to achieve it throughout the development lifecycle.

Track 2: Demand Proof: The urgent need for Data Lineage and Provenance

IBM leaders in AI, Beth Rudden and Wouter Oosterbosch, discuss the rise of artificial intelligence and controversies surrounding its ethical use in our socially homogeneous tech industry. Specifically, they discuss the urgent need for data lineage and provenance to assure trust in AI while analyzing the status quo. Finally, Beth and Wouter describe real use cases that reveal methods and processes that your organizations can implement today for a more responsible future with AI.

Track 3: Beyond Accuracy: What does it take to trust AI decisions?

Now that AI is being used in high-risk applications with serious consequences, it is important that it be worthy of society’s trust. This presentation sketches out what it means for an AI system to be trustworthy, including more well-known characteristics like fairness, explainability, and robustness, but also emerging characteristics such as causality, transparency, and uncertainty quantification. You learn how to work towards these goals throughout the machine learning lifecycle.

Track 4: Uncertainty Quantification 360 – Learn how to apply the UQ360 toolkit

Uncertainty Quantification (UQ) is critical to get actionable predictions from machine learning models. UQ gives the model the ability to say it is unsure about its predictions and adds a layer of transparency. This presentation demonstrates the use of an open source toolkit, called Uncertainty Quantification 360 (UQ360), to estimate, understand, and communicate uncertainty in machine learning model predictions through the AI application lifecycle.
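
UQ360 ships its own estimators and metrics, but the underlying idea is easy to illustrate. The sketch below is not the UQ360 API; it shows, with scikit-learn's quantile loss, what it means for a model to report a prediction interval instead of a single number, and to check how often that interval actually covers the truth:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

# One model per quantile yields a 90% prediction interval around the point estimate.
point = GradientBoostingRegressor().fit(X, y)
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)

X_new = np.array([[2.5]])
print("prediction:  ", point.predict(X_new)[0])
print("90% interval:", (lower.predict(X_new)[0], upper.predict(X_new)[0]))

# A basic quality check: how often does the interval contain the observed value?
covered = (lower.predict(X) <= y) & (y <= upper.predict(X))
print("empirical coverage:", covered.mean())
```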

Automated machine learning

Track 5: Stay Fast with Automated Machine Learning

Data is essential for every enterprise, and machine learning capabilities are advancing every day. Automated machine learning helps data scientists unlock new opportunities in their data by developing machine learning models rapidly and automatically. It gives data scientists a fast start on models for various use cases, and it lets enterprises bridge the skills gap by building without code and deploying models to a production environment in a matter of minutes, saving time and effort.

Track 4: Automatic Time Series Modeling with AutoAI

As a data scientist building time series models, you might struggle to choose between machine learning-based and statistical forecasting models, and to select among the many kinds of transformers. You might also wonder whether there is a way to evaluate your models beyond a single holdout set. AutoAI in IBM Cloud Pak for Data as a Service has a new option that can help you automatically. In this session, learn how easily and quickly you can use AutoAI to discover the best-performing time series machine learning pipelines and evaluate your models with cross-validation.
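
The cross-validation the session refers to differs from ordinary k-fold because the folds must respect temporal order. AutoAI handles this for you; the sketch below just illustrates the evaluation idea with scikit-learn's TimeSeriesSplit on a toy lag-feature model:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Toy series: predict the next value from the previous three (lag features).
series = np.sin(np.arange(200) / 10) + np.random.default_rng(0).normal(scale=0.1, size=200)
X = np.column_stack([series[i:i - 3] for i in range(3)])
y = series[3:]

# Each fold trains on the past and validates on the future, never the reverse.
scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    scores.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))
print("MAE per fold:", np.round(scores, 3))
```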

Track 4: AutoAI notebooks – how does that work?

AI generating Python code to create AI. Does it work? Can you trust it? What if AI could generate Python code for model training and inferencing in a fully transparent way? How does that sound? Scary? Futuristic? Join this session and see Watson AutoAI notebooks in action.

Track 3: Automated Machine Learning for Unsupervised Data

Automated machine learning (or AutoML) is the task of automatically discovering models for data. AutoML automates the resource-intensive process of model discovery and reduces the time needed for this task. AutoML has largely focused on supervised/labeled data because labels can be effectively used for the model optimization process. In contrast, unsupervised machine learning problems, by definition, do not have labels (or are not allowed to use them). Therefore, applying standard AutoML techniques is a significant challenge for unsupervised problems.

In this talk, get introduced to OPTUS (Pipeline Optimization for Unsupervised Data), which is a system for unsupervised AutoML. It uses meta-learning for building an AutoML solution for unsupervised problems such as outlier detection and clustering.
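
OPTUS itself relies on meta-learning, but the core difficulty is easy to see: without labels, pipeline search has to fall back on internal quality measures. The toy sketch below (not the OPTUS method) ranks candidate clustering pipelines with scikit-learn's silhouette score:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# With no labels to optimize against, rank candidates by an internal metric.
candidates = {f"kmeans_k{k}": KMeans(n_clusters=k, n_init=10, random_state=0)
              for k in (2, 3, 4, 5, 6)}

scores = {name: silhouette_score(X, model.fit_predict(X))
          for name, model in candidates.items()}

print(scores)
print("selected:", max(scores, key=scores.get))
```

Internal metrics like silhouette can disagree with the downstream goal, which is part of why a meta-learning approach like OPTUS's is appealing.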

Federated machine learning

Track 3: Federated Learning in the Enterprise: Leveraging decentralized data

Data quality is essential to machine learning, and more representative training data is better. In a large organization, data often cannot be curated in a single data lake for privacy, regulatory, or practical reasons. It might reside in data centers in different countries, in separate business units, or in application silos that have grown over time. Federated learning (FL) is an approach to machine learning in which the training data is not managed centrally. Data is retained in the locations that participate in the FL process and is never shared. This approach is just starting to be used in enterprises. This session discusses the basic approach of federated learning, how it can be applied both to neural networks and to classical methods (from linear models to XGBoost), and how to make it work in practice in an enterprise. Federated learning is now available in Watson Studio and Watson Machine Learning as a beta, but the session focuses on concepts, applications, and where FL fits in a company.
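
The best-known FL aggregation rule, federated averaging, makes the idea concrete: parties send model updates, never rows. Here is a minimal NumPy sketch of that rule on a toy linear model (an illustration of the concept, not IBM's implementation):

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One party trains locally (linear regression via gradient descent)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging(parties, rounds=20, n_features=3):
    """Aggregator averages party weights, weighted by local data size."""
    w = np.zeros(n_features)
    sizes = np.array([len(y) for _, y in parties])
    for _ in range(rounds):
        updates = np.stack([local_update(w, X, y) for X, y in parties])
        w = sizes @ updates / sizes.sum()  # raw data never leaves a party
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
parties = []
for n in (100, 50, 200):  # three parties with different amounts of data
    X = rng.normal(size=(n, 3))
    parties.append((X, X @ true_w + rng.normal(scale=0.1, size=n)))
print(federated_averaging(parties))  # converges close to true_w
```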

Track 4: Training machine learning models with IBM Federated Learning

With federated learning, you can train a single machine learning model collaboratively and securely using data from multiple remote parties — all without the need to share or centralize the training data. Learn how to use federated learning on IBM Cloud Pak for Data to train a machine learning model. In this session, you’ll use Watson Studio to create a remote training system and start the federated learning aggregator. Additionally, you’ll download the Python scripts used to connect the remote training parties and install the Watson Machine Learning Python client, use a local data set on each of the remote training parties to start the federated learning training job, monitor the progress and accuracy of the federated learning model, and deploy the trained model.
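
For a flavor of the party side of that workflow, here is a rough sketch using the Watson Machine Learning Python client. The `remote_training_systems` calls and the configuration are assumptions pieced together from the session description; the connector script you download in the session supplies the real names and settings:

```python
from ibm_watson_machine_learning import APIClient

# Connect to Watson Machine Learning (credentials are placeholders).
wml_credentials = {"url": "https://us-south.ml.cloud.ibm.com", "apikey": "YOUR_API_KEY"}
client = APIClient(wml_credentials)
client.set.default_project("YOUR_PROJECT_ID")

# Assumption: the downloaded connector script defines the real party
# configuration (data handler, path to the local data set, and so on).
party_config = {}  # filled in from the downloaded script

# Assumption: method names mirror the workflow described in the session.
party = client.remote_training_systems.create_party("REMOTE_TRAINING_SYSTEM_ID", party_config)
party.run(aggregator_id="TRAINING_JOB_ID", asynchronous=False)
```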

Track 5: Growing Trees Together Privately: An Introduction to Gradient Boosted Decision Tree Models for Federated Learning

Federated learning is a method for training machine learning algorithms without directly sharing raw data across different parties. However, many recently proposed methods for FL have focused strictly on linear models, neural networks, and kernel-based approaches. This talk introduces the implementation of a novel gradient boosted decision tree algorithm for federated learning. It highlights some of the key advantages that tree-based models offer in an FL setting and demonstrates how they complement the desirable properties that FL aims to provide.
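
One reason trees suit FL well is that split finding needs only aggregate gradient statistics, which parties can share as histograms instead of raw rows. The toy NumPy sketch below shows an XGBoost-style gain computed from merged party histograms (an illustration of the idea, not the algorithm presented in the talk):

```python
import numpy as np

def local_histogram(x, grad, hess, bin_edges):
    """One party bins its feature values and sums gradient statistics per bin."""
    bins = np.digitize(x, bin_edges)
    n_bins = len(bin_edges) + 1
    g, h = np.zeros(n_bins), np.zeros(n_bins)
    for b, gi, hi in zip(bins, grad, hess):
        g[b] += gi
        h[b] += hi
    return g, h  # only these aggregates leave the party

def best_split(g_hists, h_hists, lam=1.0):
    """Aggregator merges party histograms and scores each candidate split."""
    g, h = np.sum(g_hists, axis=0), np.sum(h_hists, axis=0)
    G, H = g.sum(), h.sum()
    best_gain, best_bin, gl, hl = -np.inf, None, 0.0, 0.0
    for b in range(len(g) - 1):
        gl, hl = gl + g[b], hl + h[b]
        gr, hr = G - gl, H - hl
        gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain

rng = np.random.default_rng(0)
edges = np.linspace(0, 1, 9)  # candidate thresholds shared by all parties
hists = []
for _ in range(3):  # three parties, each using only its local data
    x = rng.random(200)
    grad = np.where(x > 0.6, -1.0, 1.0) + rng.normal(scale=0.1, size=200)
    hists.append(local_histogram(x, grad, np.ones(200), edges))
g_h, h_h = zip(*hists)
print(best_split(list(g_h), list(h_h)))  # picks a bin near the 0.6 boundary
```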

Data fabric and DataOps

Track 3: Data Fabric and why you should care

Analytics teams need data, lots of data, but simply putting raw data into the cloud or a data lake, or opening data stores for broader access, doesn't create meaningful insights. In fact, it not only leads to poor business decisions, but also increases the risk of inadvertent data exposure. The situation is getting even more challenging due to exponential data growth and its distribution across hybrid and multicloud environments. In this session, learn how a data fabric-based architecture enables you to dynamically connect, integrate, protect, and automate your data management processes, and thereby significantly reduce the time to deliver trusted data to the business.

Track 3: DataOps for Data Science and MLOps

The conventional wisdom for data science is that more data is better. However, this is not always true. In many cases, a smaller, but clean, data set can result in better model performance than a large, noisy data set. But, how can you get to a clean data set given the inherent variability in human behavior that produces most data sets? That’s where DataOps comes in. DataOps is an emerging, proactive approach to data management that focuses on orchestrating people, processes, and technology to quickly deliver high-quality data to data consumers. This talk explains what DataOps is in more detail, and how it can integrate with MLOps to help data scientists produce and manage better models, no matter the size of your data set.

Track 5: Data Lakes & Data Privacy

Many companies have planned data lakes in recent years, but these plans could not always be implemented successfully. One key struggle is missing governance. Data privacy is a key topic for businesses seeking to balance legal requirements, customer satisfaction, and analytical possibilities. During this session, you gain insights into the processes and methods for implementing a heterogeneous data lake with integrated data privacy.