A developer's guide to data for AI
This blog is part of the Db2 for AI learning path.
| # | Title | Type |
|---|---|---|
| 100 | A developer's guide to data for AI | Blog |
| 101 | Collect home sales data using a high-performance CRUD app | Pattern |
| 201a | Predict home value using Golang and in-memory database machine learning functions | Pattern |
| 201b | Predict home value using Python and machine learning | Pattern |
Db2 has a long and varied history. It's a database released in the 1980s, exclusively for IBM mainframes. In the 1990s, support expanded to Linux, UNIX, and Windows. In the 2000s, Db2 was released on z/OS, and IBM announced that DB2 could serve as a storage engine for MySQL. The 2010s were all about cloud: Db2 on Cloud and Db2 Warehouse on IBM Cloud were released. Now that AI is all the rage, Db2 is adding support for AI. Wait, what?
I'm here to tell you that Db2 is great for developers and AI. Don't believe me? Read on. I'll be talking about some amazing capabilities in Db2 11.5 and two other tools made specifically for developers using Db2. These include:
- SDKs for popular languages to build modern applications
- Stored procedures for machine learning
- VSCode extension, Db2Connect
SDKs for popular languages
The Db2 team has open sourced its connectors, written for modern programming languages. This allows developers to build applications using languages like Node.js, Python, and even Go. They are all hosted under the ibmdb organization on GitHub.
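To make the connector idea concrete, here's a minimal Python sketch using the `ibm_db` driver from the ibmdb organization. The hostname, database, credentials, and the `build_dsn`/`fetch_one_row` helper names are placeholders for illustration, not values from any real service:

```python
def build_dsn(host, port, database, user, password):
    """Assemble a connection string in the form the ibm_db driver expects."""
    return (
        f"DATABASE={database};HOSTNAME={host};PORT={port};"
        f"PROTOCOL=TCPIP;UID={user};PWD={password};"
    )

def fetch_one_row(dsn, sql):
    """Open a connection, run a query, and return the first row as a dict."""
    import ibm_db  # pip install ibm_db; imported lazily so the sketch stands alone
    conn = ibm_db.connect(dsn, "", "")
    try:
        stmt = ibm_db.exec_immediate(conn, sql)
        return ibm_db.fetch_assoc(stmt)
    finally:
        ibm_db.close(conn)

# Placeholder credentials -- substitute your own Db2 on Cloud service values.
dsn = build_dsn("example.db2.cloud.ibm.com", 50000, "BLUDB", "user", "secret")
```

With real credentials, `fetch_one_row(dsn, "SELECT 1 FROM SYSIBM.SYSDUMMY1")` would round-trip a query; the same pattern applies to the Node.js and Go connectors.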
Stored procedures for machine learning
IBM Db2 Warehouse has built-in machine learning functions to train and run models on data residing in Db2 Warehouse on Cloud. Operations are performed by the Db2 Warehouse engine itself without having to move the data. This way you can achieve greater performance in terms of computation and retrieval of the results. You can read more about the different algorithms that IBM Db2 Warehouse on Cloud supports in the form of stored procedures here.
Here's an example of a stored procedure call that creates a linear regression model from a training dataset. The example below calculates the average arrival delay for each carrier over the past 12 months. This helps end users make sure their flights arrive on time so they don't miss connections or meetings. The call creates a model called BFD12_ARR based on data that resides in a table called BFD12_TRAIN and predicts the column called ARR_DELAY:
CALL IDAX.LINEAR_REGRESSION ('model=BFD12_ARR, intable=BFD12_TRAIN, id=ID, target=ARR_DELAY, incolumn=OP_CARRIER; MANUFACTURER; MODEL; DAY_OF_WEEK; MONTH; DEP_TIME_BLK, coldefrole=ignore, calculatediagnostics=false');
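To make the training step concrete, here's a pure-Python sketch of the underlying math for a single numeric predictor: closed-form ordinary least squares. This is not the IDAX implementation (which handles many columns and categorical features), and the toy numbers below are invented, but it shows what "training" computes:

```python
def fit_simple_linear(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is covariance over variance; intercept anchors the line at the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

# Toy training rows: departure hour vs. arrival delay in minutes.
hours  = [6, 9, 12, 15, 18, 21]
delays = [2, 4, 7, 9, 13, 15]
intercept, slope = fit_simple_linear(hours, delays)
```

The stored procedure does the same kind of fit inside the engine, across all the `incolumn` features at once, and persists the result as the named model.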
Using the model created above, we can call the PREDICT_LINEAR_REGRESSION stored procedure to predict arrival delays for new flights. See below for an example:
CALL IDAX.PREDICT_LINEAR_REGRESSION ('model=BFD12_ARR, intable=BFD12_TEST, outtable=BFD12_ARR_OUT, id=ID');
The PREDICT_LINEAR_REGRESSION stored procedure above uses the linear regression model BFD12_ARR to predict arrival delays for data residing in the BFD12_TEST table and stores the output in the BFD12_ARR_OUT table. Each new table generated is linked by the primary key ID.
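In the same spirit as the training sketch, here's what the predict step amounts to for a one-feature model: apply the stored coefficients to each row of the test data, keyed by ID, and emit (ID, prediction) pairs, the analog of the output table. The coefficients and rows below are invented for illustration:

```python
def predict(model, rows):
    """Apply an (intercept, slope) model to (id, feature) rows,
    returning (id, prediction) pairs -- the analog of the outtable."""
    a, b = model
    return [(row_id, a + b * x) for row_id, x in rows]

# Hypothetical model coefficients and test rows keyed by ID.
model = (-3.75, 0.9)               # (intercept, slope)
test_rows = [(101, 9), (102, 18)]  # (ID, departure hour)
out_table = predict(model, test_rows)
```

The stored procedure version does this inside the engine, joining predictions back to the source rows by the `id` column, so the data never leaves the warehouse.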
How it works
We can't jump right to using the stored procedure the way it's depicted above. The first thing we have to do is train the model. Once the model is created, we can invoke it with the right SQL syntax. Training the model is where the processing overhead lies, not the invocation. Like any machine learning model, it will need to be retrained as the data evolves over time.
Here’s a snippet of traditional SQL showing how to do the same thing, without using the stored procedure:
SELECT "OP_CARRIER", AVG("ARR_DELAY") AS "AVERAGE ARRIVAL DELAY" FROM BFD_12MONTHS GROUP BY "OP_CARRIER" ORDER BY "AVERAGE ARRIVAL DELAY" ASC;
The traditional SQL calculates average delays based on the existing data. What it can't do is predict future flight delays. To overcome that, we can use Db2 Warehouse's built-in stored procedures to create models and use those models to predict future results.
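The traditional query maps directly onto a group-by average. As a sketch, here is the same computation in plain Python over a handful of hypothetical (carrier, delay) rows:

```python
from collections import defaultdict

# Hypothetical flight rows: (carrier code, arrival delay in minutes).
rows = [("AA", 5.0), ("DL", -2.0), ("AA", 11.0), ("DL", 4.0), ("UA", 0.0)]

totals = defaultdict(lambda: [0.0, 0])  # carrier -> [delay sum, row count]
for carrier, delay in rows:
    totals[carrier][0] += delay
    totals[carrier][1] += 1

# Average delay per carrier, ascending -- the GROUP BY plus ORDER BY of the SQL.
averages = sorted(
    ((carrier, s / n) for carrier, (s, n) in totals.items()),
    key=lambda pair: pair[1],
)
```

This only summarizes history; like the SQL, it has no way to score an unseen flight, which is exactly the gap the model-based stored procedures fill.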
For more details on this feature, see the code pattern Running machine learning queries on Db2.
Why should a developer care?
- AI is useful and can make your application powerful.
- Creating models is hard and time-consuming, and new functions/abstractions are required to load and update models and to score data.
- Having machine learning as a stored procedure addresses all of these issues. Leverage machine learning without having to learn TensorFlow to build a model, call it as easily as you make other database calls, and inject some AI into your application.
- Since operations are performed by the Db2 Warehouse engine itself, there's no need to move the data. This way you can achieve greater performance in terms of computation and retrieval of the results.
Db2 Augmented Data Explorer
IBM Db2 Augmented Data Explorer is a free tool that employs sophisticated analytical techniques and provides natural language-powered search and discovery to its users. If application developers have to provide a search capability in their application, they normally have to build the whole stack, from search and indexing to a good graphical user experience.
With ADE's RESTful API, developers can crawl and index the data, then provide a search experience over that data that includes auto-selected visualizations. Instead of worrying about the end-to-end experience, developers can integrate this free technology to provide a best-in-class search and discovery experience to their application's users. Learn more.
The last piece I'll mention is a new VS Code extension called Db2Connect. With Db2Connect, you can set up your Db2 connection and test your connection and app locally without having to deploy your app.
L-R: Extension search results; extension install page; extension configuration panel
So, what’s this all mean? My interpretation is that using Db2 for AI will be:
- Easier – Developers will not have to know how to create models, just how to call them. Using the open source connectors makes this a breeze.
- Safer – The data and model will stay in the same place.
- Faster – The model lives in the same database, so there is no need to load models off disk.
This blog is the first part of a learning path that guides you in quickly coming up to speed on what IBM Db2 for AI offers and how it's used. The learning path consists of step-by-step patterns and complete examples of working code. As you proceed through the learning path, you'll learn more features and different use cases for Db2 for AI.
So let's get started. The first step is to learn how to use IBM Db2 Warehouse on Cloud to create a Node.js web application that creates, updates, and deletes records from the database.