Enabling real-time streaming big data for compliance assessment

Objective

Our business has millions of pages that our customers interact with every day. Gathering useful information, in a compliant manner, is essential to ensuring we follow a principles-based compliance approach. If your business matches the industry claim that roughly 40% of IT is 'shadow' IT, can you be sure your principles are adequately applied in practice? In this article, we describe a solution that gathers data from millions of web interactions a day and turns it into compliance assessments.

Landscape

A typical enterprise today has hundreds, thousands, or potentially millions of customer interactions in a day. The actual web experience a customer has is complex, traversing many diverse business processes and web sites created by numerous organizations. What customers see is the primary measure of an enterprise's success, so being able to assess the compliance position your customers actually experience is well worth the investment. That investment involves processing large volumes of data and extracting key indicators of what is compliant and what is not.

One touch point may lead into an area that is not well disciplined, simply by the nature of shipping an MVP and iterating on it to react to market success. Another touch point may be a recent acquisition or a third-party tool. Although the best-of-breed DevOps chains have numerous ways to scan the bits and pieces of a solution, elements can be bound into the final solution that cannot be scanned, such as third-party tools that are compliant in one configuration but not in another. We also need a method that can gather information without depending on developers investing in best-of-breed DevOps tool chains. We need a complete end-to-end solution that gives us insights based on where customers travel on our web properties, does not slow down their user experience, and does not depend on individual developers scanning every piece of the solution. Capturing that data is an extensive big data problem, and you need to pick the right needles out of the haystack.

Our solution

At a high level, our approach starts with embedding a JavaScript snippet in the design templates to capture which scripts are being executed. This snippet generates a lot of data, which then needs to be curated and persisted. On the server side, that data is run through deeper rules and enriched with master data before it reaches the metrics dashboard, as described below.

Figure 1
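
To make the capture step concrete, here is a minimal sketch of what such an embedded snippet might look like, assuming a same-origin collection endpoint. The endpoint path, payload shape, and function name are illustrative assumptions, not the actual implementation.

```javascript
// Minimal sketch of a capture snippet an executable design template could embed.
// It records which scripts are executing on the page and ships a small payload
// to a collection endpoint without blocking the page.
// The endpoint path and payload shape are illustrative.
function captureScriptInventory() {
  var payload = {
    pageUrl: window.location.href,
    capturedAt: new Date().toISOString(),
    scripts: Array.prototype.map.call(document.scripts, function (script) {
      return script.src || 'inline';
    })
  };

  // sendBeacon queues the request asynchronously, so the capture does not
  // slow down the customer's experience even during page unload.
  var body = new Blob([JSON.stringify(payload)], { type: 'application/json' });
  navigator.sendBeacon('/api/compliance/capture', body);
}

// Capture once when the template finishes loading.
window.addEventListener('load', captureScriptInventory);
```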

Embedding web asset compliance validation into the design system is a more proactive way to gather data from the customer's view, and it alleviates the need for individual teams to set up specific procedures to run scans. The design system, as embodied by executable templates, provides a common look and feel for the entire enterprise. The value of a common executable design template is a cohesive and intuitive user experience, enabling offering managers to use time-tested design concepts and executable templates that implement those concepts with little or no adaptation by the offering development team. The compliance landscape is complex, and this approach is an effective way to shed light on how closely offering managers are pursuing principled behavior. In addition, we can combine data from other sources, such as master data, ownership management chains, and results from other scans, to provide a more complete compliance picture.

The design system and executable templates are a complex set of content and libraries that provide offerings with a common and efficient web user experience. Because our design system and executables are leveraged by Single Page Applications (SPAs), which are the basis of most offerings, this approach uniquely provides wide coverage that scanning tools alone cannot achieve without intimate knowledge of the SPA's end-user interactions. Often, we find the landing page is compliant via scanning, but we need to know what happens deep in the application and in the other ecosystems where the solution connects the end user.
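
Because client-side navigation never reloads the page, one way the embedded script could extend coverage past the landing page is to observe SPA route changes and re-run the capture on each view. The sketch below wraps history.pushState and listens for popstate; this is a common pattern, not necessarily how the prototype is wired, and it reuses the captureScriptInventory routine from the earlier sketch.

```javascript
// Sketch: extend coverage past the landing page by re-running the capture on
// SPA route changes. Wrapping history.pushState and listening for popstate is
// one common way to observe client-side navigation; captureScriptInventory is
// the routine from the earlier sketch.
(function observeSpaNavigation(onNavigate) {
  var originalPushState = history.pushState;
  history.pushState = function () {
    var result = originalPushState.apply(this, arguments);
    onNavigate();
    return result;
  };
  window.addEventListener('popstate', function () {
    onNavigate();
  });
})(function handleRouteChange() {
  // Give the SPA a moment to render the new view before capturing it.
  setTimeout(captureScriptInventory, 0);
});
```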

The embedded JavaScript in the design templates can use the same technologies for compliance as our web pages, helping ensure we capture only the data that is allowed or for which we have consent. The script can use the Akamai location API and the TrustArc cookie consent value to ensure we are not collecting data from data subjects who have specifically opted out of data collection, or who live in a geographic location that requires them to specifically opt in before data collection can proceed. The script performs high-level acquisition rules and sends the payload of its analysis to a service that puts the data on a Kafka (or IBM Message Hub) queue. The IBM Streams implementation harvests the data from the Kafka queue and enables services to run server-side rules against the transient transactional payload. Part of the server-side processing includes executing deeper rules, mapping remediation keys to rule violations, keeping a persistent record of violations encountered, and mapping to master data to enrich the compliance analysis and the end-user consumption of the metrics dashboard. The results of the IBM Streams and server-side rules execution are stored in a Postgres persistent store. Elasticsearch is implemented to enable fast and flexible queries, upon which a service layer is built to expose APIs that the metrics dashboard can display. The service layer adds the content and links, rule violation explanations, and remediation guidance.
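
As a rough illustration of the server side, the sketch below substitutes a plain Node.js Kafka consumer (kafkajs) for the IBM Streams harvesting step and shows one simple rule being applied, with violations written to Postgres (pg) and indexed in Elasticsearch (assuming the v8 @elastic/elasticsearch client). The topic, table, index, rule, and approved-host names are assumptions for illustration only, not the production configuration.

```javascript
// Sketch only: consume captured payloads from Kafka, apply a sample rule,
// persist violations to Postgres, and index them in Elasticsearch for the
// dashboard's query layer. All names are illustrative assumptions.
const { Kafka } = require('kafkajs');
const { Pool } = require('pg');
const { Client } = require('@elastic/elasticsearch');

const kafka = new Kafka({ clientId: 'compliance-rules', brokers: [process.env.KAFKA_BROKER] });
const consumer = kafka.consumer({ groupId: 'compliance-rules' });
const pg = new Pool({ connectionString: process.env.DATABASE_URL });
const es = new Client({ node: process.env.ELASTICSEARCH_URL });

// Example rule: flag external script hosts that are not on an approved list.
const APPROVED_HOSTS = new Set(['www.example.com', 'cdn.example.com']);
function evaluate(payload) {
  return (payload.scripts || [])
    .filter((src) => src !== 'inline')
    .filter((src) => {
      try {
        return !APPROVED_HOSTS.has(new URL(src, payload.pageUrl).hostname);
      } catch (err) {
        return true; // an unparsable src is itself worth flagging
      }
    })
    .map((src) => ({ ruleId: 'UNAPPROVED_SCRIPT_HOST', scriptSrc: src }));
}

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'compliance-events', fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const payload = JSON.parse(message.value.toString());
      for (const violation of evaluate(payload)) {
        // Keep a persistent record of the violation ...
        await pg.query(
          'INSERT INTO violations (page_url, rule_id, script_src, seen_at) VALUES ($1, $2, $3, $4)',
          [payload.pageUrl, violation.ruleId, violation.scriptSrc, payload.capturedAt]
        );
        // ... and index it for fast, flexible dashboard queries.
        await es.index({
          index: 'compliance-violations',
          document: { ...violation, pageUrl: payload.pageUrl, seenAt: payload.capturedAt }
        });
      }
    }
  });
}

run().catch(console.error);
```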

Summary

Our business needs a method to assess an end customer's view of our principles-based compliance approach. Although a number of scanning tools are available to developers, these scans are only one of many inputs. Embedding scanning tools into the DevOps pipeline to avoid deploying suspect solutions remains a best practice, but those scans do not assess the entire solution the end customer uses. The end-to-end experience our customer has is complex and involves many configuration and integration variations; therefore, we need to test the solution as it performs in the eyes of the end customer.

The approach we've prototyped embeds web asset compliance validation into the design system and alleviates the need for individual teams to set up specific procedures to run scans. The compliance landscape is complex, and this approach is an effective way to shed light on how closely offering managers are pursuing principled behavior. In addition, we can combine data from other sources to enrich the compliance picture.

We can use this compliance position to propose web assets that should be blocked, avoiding the risk that non-compliant activity injures the IBM brand. This solution is not a 'one-and-done' implementation. The approach will grow as numerous other privacy laws and security best practices become more specific and as more elements of detection become visible in the front-end user's browser execution.