This blog was written by the Qanta.ai and IBM Alpha Zone Accelerator teams.


Qanta.ai provides mortgage lenders with an AI-driven assistant that automates large parts of the lending process, focusing on first engagements.

Qanta.ai applied to the IBM Alpha Zone Accelerator because growing demand for scale and resilience from financial institutions required a strong cloud solution such as IBM Cloud. In this post, we will share the new Qanta.ai solution, which is based on a serverless architecture, and show how it solves performance, scale, and cost challenges.

Key challenges

Radically fluctuating throughput, in terms of “bot messages per second”: Chatbot developers can usually predict how many users their systems will need to handle by analyzing past behavior or similar use cases. Qanta.ai, however, provides a white-label solution to multiple clients, each with a different use case and each operating without coordinating with Qanta.ai. This resulted in traffic peaks that Qanta.ai could not control. For example, a client running a campaign that engages a large mailing list could produce a two- or three-order-of-magnitude traffic spike within a split second, without warning. Typical fluctuations can be handled with a traditional load-balancing approach: a load-balancing server that adds backend servers and redirects traffic as load changes. Sudden, unpredictable spikes like these, however, required a whole different approach to scalability.

Banking requirements: The Qanta.ai team faced challenging infrastructure requirements. Typically, they were asked to implement the same configuration in a trusted cloud for an external POC, in a hybrid or private cloud for pilots, and to provision an on-premises solution for full production. Qanta.ai needed an agile, open-source infrastructure that could be deployed to each of these environments without having to rewrite the code base.

Architecture


Qanta.ai decided to try IBM Cloud Functions (based on Apache OpenWhisk), the serverless infrastructure on IBM Cloud. They rewrote their core Node.js service, which had relied on a constantly listening HTTP server, so that their core AI function is invoked as a serverless action instead. They isolated the AI capability to run independently and added a new routing server that handles multiple front-end interfaces through dedicated webhooks.
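To make the shift concrete, here is a minimal sketch of what a Node.js OpenWhisk action looks like. The parameter and field names (`userId`, `bankId`, `message`) are illustrative assumptions, not Qanta.ai's actual API; the point is that OpenWhisk calls an exported `main(params)` function and expects a JSON-serializable result, so no always-on HTTP listener is needed.

```javascript
// Sketch of a Node.js OpenWhisk action (field names are hypothetical).
// OpenWhisk invokes the exported main() with a params object; the action
// returns a plain object instead of writing to an HTTP response.
function main(params) {
  const { userId, bankId, message } = params;
  if (!message) {
    return { error: 'missing message' };
  }
  // In the real action, this is where the isolated core AI function runs.
  return {
    userId,
    bankId,
    reply: `echo: ${message}`
  };
}

// OpenWhisk's Node.js runtime picks up the exported main function.
exports.main = main;
```

Because the action is a plain function, it scales out automatically: each incoming message gets its own invocation rather than queuing on one listener process.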

The serverless architecture also removed the effects of message “stacking” and the delays in sending messages back to users. With the previous “serverful” architecture, one server was in charge of aggregating messages for every user. Now each instance can send messages independently, so after synchronization the response time is the same for any number of users, instead of growing in proportion to the number of active users.

Chat flow

The flow of a “chat” call is as follows:

  1. A call comes from Facebook or a web interface through a secure webhook.
  2. The communication server authenticates the call and invokes the “conversation” action in OpenWhisk, with the information about the user, the bank, and the incoming message.
  3. The user's info and conversation state are pulled from Redis using the user ID.
  4. The conversation flow is pulled using the bank ID from Redis.
  5. The message is sent to an NLP processing engine, which returns intent and entities.
  6. Based on the received data, the Qanta.ai AI engine chooses the next required step and returns the answer message to the user.
  7. Based on the conversation, reports, notifications, or API calls may be invoked. For example, a request to generate an interactive graph explaining the lifetime of a specific mortgage results in that graph being embedded into the chat interface.
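The steps above can be sketched as a single “conversation” action. Everything here is illustrative: the `store` and `nlp` interfaces, the key scheme, and the flow structure are assumptions used to show the shape of the pipeline, not Qanta.ai's actual implementation.

```javascript
// Sketch of the chat flow: pull user state and the bank's conversation
// flow from the cache, run NLP, then pick the next step. The store/nlp
// objects are injected so the action stays testable; all names are hypothetical.
async function conversation(params, store, nlp) {
  // Steps 3-4: fetch user state and the bank's conversation flow.
  const user = await store.get(`user:${params.userId}`);
  const flow = await store.get(`flow:${params.bankId}`);

  // Step 5: extract intent and entities from the incoming message.
  const { intent, entities } = await nlp.parse(params.message);

  // Step 6: choose the next required step and build the reply.
  const next = flow.steps[intent] || flow.steps.fallback;
  return { reply: next.prompt, state: { ...user, lastIntent: intent, entities } };
}
```

Injecting the cache and NLP clients also means the same action code runs unchanged against a local stub in debug mode and against the real services in production.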

An additional advantage of a serverless architecture is that new functionality can be implemented as a separate OpenWhisk action, which is much easier than setting up a dedicated server. Even more importantly, you pay only for actual usage, not for a whole dedicated system that has to be sized for peak load. For example, analytics operations can run without creating a dedicated system that is, by definition, 99% idle.

Qanta.ai added integration with Redis by Compose, another IBM Cloud service, as a fast, scalable data cache. Retrieving small data items from Redis on the IBM Cloud servers was dramatically faster than Qanta.ai's previous database-backed architecture. When deployed locally (debug mode), Qanta.ai achieved a 3x throughput improvement, while in production (deployed on IBM Cloud) they achieved a 10x throughput improvement (<50 ms per call).
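The cache access pattern is simple: small JSON items (user state, conversation flows) keyed by an ID. A minimal sketch of such a helper layer is below; the key scheme and the JSON-per-key layout are assumptions, and the functions accept any client exposing `get`/`set` (such as the `redis` npm client), so the same code works against a stub locally.

```javascript
// Sketch of the cache layer: small JSON items stored under string keys.
// Works with any client exposing async get/set (e.g. a Redis client);
// the key naming is hypothetical, not Qanta.ai's actual layout.
async function getJSON(client, key) {
  const raw = await client.get(key);
  return raw === null ? null : JSON.parse(raw);
}

async function putJSON(client, key, value) {
  await client.set(key, JSON.stringify(value));
}
```

Keeping each cached item small is what makes the per-call retrieval cheap enough to fit inside the <50 ms budget quoted above.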

For debugging, Qanta.ai wrote a wrapper function that calls the local version of their OpenWhisk actions code. They built a dedicated debugging webhook on their communication server, connected through ngrok to a local server that runs the debugging wrapper. The wrapper dispatches to the actions code according to the API call. This “hack” lets them use a step-by-step debugger on their code.

The bottom line

Switching to IBM Cloud and using IBM Cloud Functions (based on Apache OpenWhisk) and Redis by Compose achieved impressive results. Qanta.ai achieved full scalability and minimized cost without needing to worry about load balancing, throughput, or memory issues for multiple simultaneous users. Before the change, even with as few as four simultaneous users, each of them was slowed down by the need to handle the others; now Qanta.ai can support a virtually unlimited number of users with the same efficiency. Another major benefit is paying only for the exact usage of IBM Cloud Functions, rather than for a whole system that must be sized for peak time yet sits idle the rest of it. This new infrastructure will allow Qanta.ai to focus future development efforts on business logic that brings more value to their users.
