The article is authored by Dan Selman and Andy Ritchie. It is for enterprise architects considering putting in place an event driven architecture that will incorporate aspects of event processing, analytics, situation detection and decision-making.
Event Driven Architecture is an architectural and organizational discipline that recognizes that enterprises need to collect, analyze and react (in a timely fashion) to events occurring within, and outside, the enterprise. Large corporations built on traditional OLTP and batch processing can struggle to build the engaging and personalized systems that cut across their traditional IT systems.
Many enterprises are therefore beginning a transformational journey towards becoming Event Enabled Enterprises, and leveraging Event Driven Architectures to create new business models and improve customer satisfaction.
Decision Management is a key component of a successful Event Driven Architecture. Where EDA provides event sourcing and routing capabilities across a wide array of systems, the decision management solution allows sophisticated decision-making software to detect emergent situations and trigger timely action.
Enterprises that successfully combine EDA and DM can rapidly react to emergent business threats and opportunities from internal and external events. Crucially they can also measure the quality of their decision making, and rapidly modify decision making criteria to optimize outcomes.
Intelligent choices must be made in an event-driven architecture about which events to process, those to store, and for how long each event type needs to be persisted. Any given event may be used in many business contexts, which can lead to a more opportunistic and flexible approach to building applications. An event-driven approach, where changes in state are monitored as they happen, enables the enterprise to respond in a much more timely fashion than a batch approach, where analytics or situation detection runs intermittently.
From Event Collection to Insight to Action
Event processing is only a means to an end — and enterprises need to extract insight, and take meaningful action, from event streams. Taking action typically requires detecting that a situation has occurred, or is likely to occur, and then triggering an appropriate and timely response. This process is sometimes referred to as the OODA loop or the Detect, Build Context, Decide, Act sequence.
Situation detection typically involves operationalizing event patterns and predictive (statistical) models, and combining them with contextual information about the overall environment. Patterns range from the simple, such as "when a thermostat event occurs, where the temperature is over 45 degrees centigrade, then emit an alarm", to complex patterns that require fusing disparate event streams, calling predictive models, calculating aggregates over event history, performing geospatial calculations and determining the current state of a business entity of interest.
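The simple thermostat pattern above can be sketched as an event-condition-action function. This is an illustrative sketch only; the event fields and the alarm-emitting callback are assumptions, not the API of any particular product:

```python
from dataclasses import dataclass

# Illustrative event shape; real event schemas vary by source.
@dataclass
class ThermostatEvent:
    sensor_id: str
    temperature_c: float  # degrees centigrade
    timestamp: float      # epoch seconds

def on_thermostat_event(event, emit_alarm):
    """Event-condition-action: when a thermostat event arrives with a
    temperature over 45 degrees centigrade, emit an alarm event."""
    if event.temperature_c > 45.0:
        emit_alarm({"type": "HighTemperatureAlarm",
                    "sensor_id": event.sensor_id,
                    "at": event.timestamp})

alarms = []
on_thermostat_event(ThermostatEvent("t-1", 47.2, 1700000000.0), alarms.append)
```

More realistic patterns would replace the single condition with correlation over event history and context, but the event-condition-action shape stays the same.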
Once a situation has been detected then an appropriate response must be triggered. This can also involve a sequence of decisions, or calls to predictive models to determine the likelihood of a particular favorable outcome.
In general terms the definition of a situation and the appropriate response constitutes a business policy, which serves an overarching business goal. For example, for an airline the goal may be to improve customer satisfaction, and the policy might state that “A GOLD customer should never have more than a total of 30 minutes of flight delays within a 4-week period”. The policy may state that when the excessive delay situation occurs, then the appropriate response is to trigger an operational audit and offer GOLD customers additional mileage reward points.
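The airline policy above amounts to a rolling-window aggregate over delay events per customer. A minimal sketch, with hypothetical field names and an in-memory store standing in for a real situation-detection platform:

```python
from collections import defaultdict

FOUR_WEEKS = 28 * 24 * 3600  # window length in seconds

class DelayPolicy:
    """Hypothetical sketch: accumulate flight-delay events per customer
    and flag GOLD customers exceeding 30 minutes of total delay
    within a rolling 4-week window."""
    def __init__(self):
        self.delays = defaultdict(list)  # customer_id -> [(ts, minutes)]

    def on_delay(self, customer_id, tier, ts, minutes):
        window = self.delays[customer_id]
        window.append((ts, minutes))
        # Discard delays that have aged out of the 4-week window.
        self.delays[customer_id] = [(t, m) for t, m in window
                                    if ts - t <= FOUR_WEEKS]
        total = sum(m for _, m in self.delays[customer_id])
        if tier == "GOLD" and total > 30:
            return {"action": "offer_mileage_points",
                    "customer": customer_id,
                    "total_delay": total}
        return None
```

A production platform would also persist this context and trigger the operational audit; the sketch shows only the detection side of the policy.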
Finally the action must be triggered, and potentially orchestrated over time.
Receive Events → Detect Situation → Decide Appropriate Response → Take Action
Capabilities of Event Driven Architectures
Enterprise architects building event-enabled enterprises need to consider a broad range of capabilities. Let’s break down the capabilities into six areas and consider each in turn.
- Event Sourcing
- Event Distribution and Filtering
- Event Collection and Analysis
- Detection of Situations and Decision Making
- Taking Action
- Reporting and Monitoring
There is an incredible variety of events within most enterprises. These can range from real-time events generated by web sites, process flows, mobile applications, events that have to be replayed from batch or ETL jobs, transactional feeds from systems of record, real-time events generated by sensors, through to externally generated events that are of interest, such as social media content, traffic reports or weather feeds.
At minimum these events should carry basic temporal information about when an event occurred, and the type of event. Some event types also carry important spatial information that encodes where the event occurred. Often knowing both when and where something happened is critical to understanding the importance of an event and how to respond. In many cases an event is correlated with a business entity: a "Change of Address" event is related to a Customer, for example. In practical terms this means that a Change of Address event contains a timestamp, information about the new address and a customer id.
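The Change of Address event described above can be modelled directly: a timestamp (the temporal information), the new address, and the customer id that correlates the event with a business entity. The field names here are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative event shape: temporal information plus a reference to
# the correlated business entity (the customer).
@dataclass(frozen=True)
class ChangeOfAddressEvent:
    customer_id: str      # correlates the event with a Customer entity
    new_address: str      # the payload: the customer's new address
    occurred_at: datetime # when the event occurred

evt = ChangeOfAddressEvent(
    customer_id="CUST-42",
    new_address="221B Baker Street, London",
    occurred_at=datetime(2024, 5, 1, tzinfo=timezone.utc),
)
```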
Architects should approach event sourcing as pragmatically as possible, working with the stakeholders to establish the business value of each potential event source, and whether additional value can be extracted by correlating with other event sources. In general it is best to work backwards from a business goal, action, or situation definition, deriving the event sources that need to be tapped and potentially correlated. See Situation Driven Design by Dan Selman for a methodology you can use as a starting point. For example, a commercial vehicle fleet management company may have a batch feed of fuel card transactions used to refuel vehicles. They may have a realtime feed of vehicle movement events (telematics). By correlating these two event streams they can detect vehicles that need maintenance (unusually low MPG for the make and model of the vehicle) or detect people using their company fuel cards to refuel their personal vehicles (fuel fraud). Working backwards from the business problem ensures that there’s a shared understanding of which event sources are required, and helps build a business case that justifies the IT investment to bring the events into an infrastructure where they can be understood and correlated.
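The fleet-management correlation above reduces to comparing actual fuel efficiency (telematics mileage divided by fuel-card gallons) against the expected MPG for the vehicle. A hedged sketch; the threshold, tolerance and field names are assumptions for illustration:

```python
# Hypothetical correlation of two event streams: batch fuel-card
# transactions (gallons purchased) and telematics (miles driven).
# Unusually low MPG suggests maintenance issues or fuel fraud.
def check_fuel_efficiency(vehicle_id, gallons_purchased, miles_driven,
                          expected_mpg, tolerance=0.6):
    """Return a flag when actual MPG falls well below the expected
    MPG for the vehicle's make and model, else None."""
    if gallons_purchased <= 0:
        return None  # no fuel purchased in this period
    actual_mpg = miles_driven / gallons_purchased
    if actual_mpg < expected_mpg * tolerance:
        return {"vehicle": vehicle_id,
                "actual_mpg": round(actual_mpg, 1),
                "expected_mpg": expected_mpg,
                "flag": "investigate"}
    return None
```

In practice the two feeds arrive at different cadences (batch vs realtime), so the correlation would run over a reconciliation window rather than a single pair of numbers.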
Event Distribution and Filtering
Events come in many different formats (JSON, XML, binary…) and may be transported via many different protocols, from MQTT, to AMQP to IBM MQ, to HTTP POST, to batch replay. The frequency of the events can range from millions per second to a few per year. The events may have a structured (easily machine readable) format or may require considerable pre-processing to extract structure (free text, image, video, sensor data).
The event distribution and filtering architecture will be heavily influenced by all these factors. Processing events that encode vital signs from 50 medical devices in an ICU ward will likely require a different event distribution and filtering approach than processing the cash withdrawal events for bank accounts.
In general terms (though this is not always the case) events that occur very frequently (the movement of a stock market ticker for example) have limited business value in isolation. Events that are less frequent and more coarse-grained often have more business value. It is often the case therefore that high-velocity event sources are summarized using streaming analytics into a stream of less frequent events with higher business/semantic meaning. For example, in a hospital ICU, the high-velocity streams of events from 50 medical devices get summarized into a Heart Attack event, which triggers an alarm that calls a nurse working nearby within the ICU. The distribution and filtering of the medical device events may require different technology than the distribution of the Heart Attack event.
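The summarization step above can be illustrated with a toy example: collapse a high-velocity vital-signs stream into a single low-frequency alert when readings stay outside a safe band for several consecutive samples. The thresholds are purely illustrative assumptions, not clinical guidance:

```python
from collections import deque

class VitalsSummarizer:
    """Illustrative sketch: summarize a high-velocity heart-rate stream
    into a rare, high-meaning alert event when the rate stays outside
    a safe band for N consecutive samples."""
    def __init__(self, low=40, high=150, consecutive=5):
        self.low, self.high, self.n = low, high, consecutive
        self.recent = deque(maxlen=consecutive)

    def on_sample(self, bpm):
        self.recent.append(bpm)
        if (len(self.recent) == self.n and
                all(b < self.low or b > self.high for b in self.recent)):
            self.recent.clear()  # reset after emitting the summary event
            return {"event": "CardiacAlert"}
        return None
```

A real streaming-analytics platform would apply far richer signal analysis, but the shape is the same: many raw samples in, few semantically meaningful events out.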
We see three broad categories of approaches for event distribution and filtering:
- Publish and Subscribe Messaging. Used in enterprise environments when event volume is low to medium and there’s a strong focus on loose coupling of systems, message transformation, enrichment and integration. Traditionally offered by ESB or messaging products such as IBM Integration Bus, or IBM MQ. These products have great mediation capabilities and can transform and enrich structured messages using powerful transformation languages and connectors to existing systems of record. Queues and topics can be used to distribute, filter and subscribe to events.
- Point-to-point messaging when event volumes are high, or if there’s limited need for loose coupling and declarative specification of integration. This could be as simple as an HTTP POST from a mobile application to a server side application, or a TCP/IP socket connection between a sensor and a backend system. It may also be used to replay a large batch of events that are received from a non-realtime data source, such as a file transfer or ETL job.
- Streaming analytics when event volumes are high or there’s a need to apply sophisticated analytics to interpret the event stream, such as facial recognition, text processing or waveform analysis. This is the domain for tools such as IBM InfoSphere Streams that specialize in applying analytics to high velocity and high variety event streams. Often streaming analytics is used to summarize the event stream into lower frequency events with higher business meaning. For example, in a security context, a 30 frames per second CCTV stream monitoring a parking garage is transformed into a lower volume, but more actionable, stream of “Car XYZ entering garage A” events.
Event Collection and Analysis
Almost all Event Driven Architectures need to collect received events and perform some level of batch analytics. This can range from simply logging received events and using log aggregation and query tools to search and visualize them, to placing events into very sophisticated Operational Data Stores and using Big Data techniques such as Map-Reduce, query capabilities and visualization to extract insights from the historical events. Data scientists may use predictive analytics tools, such as IBM SPSS Modeler, to build predictive models based on correlations within the captured data sets.
In some cases batch reports are generated from the Operational Data Store using Business Intelligence or Reporting tools, such as those offered by IBM Cognos.
Events that are stored for offline (batch) analytics are often retained for long periods, however their business value may quickly diminish as the events age. A business decision needs to be made to determine how long to keep the events based on legal and IT operational cost considerations.
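One way to express the retention decision above is a simple per-event-type policy table consulted by a purge job. The event types and retention periods below are hypothetical placeholders, not recommendations:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: how long each event type is kept in
# the Operational Data Store, balancing legal requirements against
# IT operational cost. Periods are illustrative only.
RETENTION = {
    "ClickStream": timedelta(days=30),       # low value once aged
    "Transaction": timedelta(days=365 * 7),  # e.g. legal retention
    "SensorReading": timedelta(days=90),
}

def is_expired(event_type, occurred_at, now=None):
    """Return True when an event has outlived its retention period."""
    now = now or datetime.now(timezone.utc)
    ttl = RETENTION.get(event_type)
    if ttl is None:
        return False  # unknown types are kept until classified
    return now - occurred_at > ttl
```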
Detection of Situations and Decision Making
At the heart of the Event Driven Architecture are software components for situation detection and decision-making. IT architects and developers need powerful capabilities to define situations, and the actions that should be taken when situations occur. These capabilities allow IT and business users to very quickly modify the sequence of events (event patterns) that define a situation, as well as the decision-making rules that determine the appropriate response. A rule-based approach provides flexibility, and the state required to run the rules (context and event history) is managed by the situation detection platform. The ability to detect situations quickly, and scale elastically, is provided by an in-memory data/compute grid, ensuring that access to the context is fast, transactional and highly-available.
IBM Operational Decision Manager – Decision Server Insights provides a flexible platform for rule-based situation detection, describing situations as a set of Event-Condition-Action rules that can make use of a rich decision-making context, comprised of: historical events, information about key business entities, integration with systems of record, spatial information and calls to predictive models.
Not all of the context required to make the best decisions may be provided by the events themselves. In such cases the decision-making platform must enrich the contextual data from other sources, such as predictive analytics scoring services, systems of engagement for client segmentation, or systems of record.
Once a situation has been detected and an appropriate response has been decided upon, an automated action needs to be triggered. This could be as simple as writing a record into a database to indicate that the situation occurred, sending a realtime alert such as an SMS or tweet, opening a case in a Case Management tool so that a human can investigate potential fraud, or triggering a workflow or business process to orchestrate tasks carried out by humans or machines.
The software that detects situations and decides upon an appropriate response typically raises an event (sometimes called an Action Event) and places it back into the event distribution and filtering infrastructure. In most cases this infrastructure is provided by an ESB, such as IBM Integration Bus, or message-oriented middleware, such as IBM MQ. Applications that are responsible for carrying out the action listen on the queues provided by the ESB, and when an Action Event arrives they run the appropriate routines.
In other cases a simpler, point-to-point architecture may be used, with the situation detection software sending the Action Event to an HTTP endpoint, where the action is triggered.
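The Action Event pattern described above can be sketched with an in-process queue standing in for the messaging middleware: the detection component emits an Action Event, and a listener dispatches it to the matching handler. The event types and handlers are illustrative assumptions:

```python
import queue

# Minimal sketch of the Action Event pattern: situation detection
# places an event on a queue; a listener consumes it and runs the
# matching routine. A real deployment would use an ESB or
# message-oriented middleware rather than an in-process queue.
action_queue = queue.Queue()

HANDLERS = {
    "SendSMS": lambda e: f"sms sent to {e['to']}",
    "OpenCase": lambda e: f"case opened for {e['subject']}",
}

def emit_action_event(event):
    """Called by the situation-detection component."""
    action_queue.put(event)

def process_one():
    """Called by the action application listening on the queue."""
    event = action_queue.get()
    handler = HANDLERS.get(event["type"])
    return handler(event) if handler else None

emit_action_event({"type": "SendSMS", "to": "+15550100"})
result = process_one()
```

The dispatch-by-event-type table mirrors how listener applications route Action Events to routines; swapping the in-process queue for middleware changes the transport, not the shape.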
Reporting and Monitoring
Many systems require that operators have realtime visibility into the behavior of the running solutions. This can take the form of monitoring of business Key Performance Indicators, as provided by IBM Business Monitor, to IT infrastructure monitoring, as provided by IBM Tivoli. It can also take the form of periodically generated summary reports, as provided by IBM Cognos. Typically the realtime monitoring and dashboard tools will consume events flowing across the event bus, while the non-realtime report generation tools will run against the Operational Data Store.