Updated 12 April 2016
Setting up an ODM Advanced – Decision Server Insights topology can seem complicated, because there are so many options. However, many of the options are not realistic in real world scenarios. So, we can reduce our main choices to just two: highly and continuously available or not. There are minor variations of those two main choices, plus scaling.
This document mainly covers configuring DSI on Linux, AIX and Windows. z/OS requires special considerations, which are not covered.
Highly availability (HA) means that the system continues to operate across certain hardware and software faults. When properly configured, DSI can tolerate the loss of a single hardware node or software server. Continuous availability (CA) is related. It means that the cluster can remain operational during maintenance operations, like a hardware upgrade or operating system update.
DSI accomplishes CA in an HA-like way – by allowing a portion of the topology to be taken offline for maintenance, while the rest of the cluster continues to run normally. In addition to hardware and operating system maintenance, DSI can be continuously available during the installation of a DSI fix pack, and some â€śdotâ€ť releases. Upgrading to a major release however, requires that the whole cluster be taken down for migration.
Types of DSI Servers
Below are the 5 server types in a DSI topology. (Server memory requirements are listed.) Note that “server” refers to an instance of Liberty, and “node” refers to the computers on which the servers run. Some types of servers can be co-located on the same node.
Runtime – This is the main component in DSI, where events are processed, and where entities are stored. All the runtime servers constitute a DSI “cluster”. The event processing and entity storage is partitioned across this cluster. (8-124GB)
Input connectivity – Input connectivity servers receive events over HTTP or JMS and send them to the runtime to be processed. For applications that use the gateway API directly, input connectivity servers may not be required. (5GB)
Output connectivity – Output connectivity servers receive emitted events from the runtime, and send them via HTTP or JMS. (5GB)
Catalog – Catalog servers are a WXS component, which track the placement of partitions in the WXS cluster. To achieve HA catalog servers must not be installed on the same node as a runtime server. See Configuring Catalog Servers for DSI High Availability for more information. (2GB)
Backing database – The backing database is used to store events offline, when they are not needed in memory. It is also used to recover, after the whole cluster has been shut down. The backing database is a prerequisite for, and not strictly a part of DSI. DSI supports DB2 and Oracle. Note that for a HA cluster, you will need an HA DBMS. (8GB per core)
The runtime and database need to be scaled to handle particular solutions. Although this document addresses some aspects of scaling, it is not a reference for sizing.
DSI Server Placement
To achieve high availability it is essential that all the servers of the same type be distributed on multiple hardware nodes. Otherwise, if one of the hardware nodes goes down, the whole cluster may fail. This is an important consideration, especially in virtualized environments.
These are the requirements for an HA topology:
- Runtime servers must be distributed across at least 3 nodes – preferably at least 4 nodes.
- All other server types must be distributed across at least 2 nodes.
- Multiple servers on a given node are fine, as long as you have the minimum number of nodes hosting that server type, e.g. if memory sizing requires 8 runtime servers, hosting 2 servers per node is OK. However, unless you need more runtime servers for memory scalability, it’s better to have only one server per node.
- Catalog servers and runtime servers must not be hosted on the same node.
- DSI will not be able to use parallelism to achieve scale, i.e. adding cores wonâ€™t help. You may start to see this problem if you expect more than 10,000 events per day per entity â€“ or at lower numbers, if the events are not evenly distributed among the entities.
- If the event horizon is long, the agents associated with each entity will be tracking too many events to achieve good performance. You can process millions of events per day with a time horizon of some minutes, but you can run into performance problems with a thousands of events per day with a time horizon of a year.
- The combination of a non-HA DBMS and writeBehind persistence is likely to result in data loss, if the DMBS becomes inaccessible.
- If configured for writeThrough persistence, a DSI topology may be able to endure a short period of the DBMS unavailability without incurring data loss, although event processing will stop, while the DBMS is inaccessible.
It is often convenient to run the smaller components (inbound connectivity, outbound connectivity, and catalog) on the same nodes, but in separate Liberty instances. Co-location greatly simplifies the topology. As long as these severs run only DSI components, they are covered under the supporting program license. If any other application is run on these servers, they require a separate WAS license.
The key advantage of a separate Liberty instance is to allow starting and stopping each server type independently. Typically, for example, inbound connectivity servers are stopped first as a controlled way to stop the flow of events before shutting down the runtime and outbound connectivity servers.
Conversely, it’s often best for runtime servers to be located separately from the other server types. One advantage to the independence is that the full PVU-licensed capacity of the runtime nodes, can be devoted to runtime processing.
Exact sizing for a solution can be involved, because of the combination of memory requirements for entity and event storage, plus cores (CPUs) for event processing. Sizing methodologies are beyond the scope of this article.
DSI can use either Java heap or Extreme Memory to store events, entities, and agent state. If your platform supports it, you should use extreme memory, which uses operating system heap. XM allows you to scale servers up 124GB memory; without XM the limit is 60GB. In addition to allowing more memory, XM has overall performance advantages.
DSI runtime servers are often not CPU limited. A single core can process millions of events per day. (Emitted events that are reprocessed by the system, and events processed by more than one entity are counted separately.) Nodes hosting runtime servers often have 8-16 cores, though sometimes more or less. VMs sometimes have fewer cores. In large-memory non-XM configurations, you should add more cores to speed up GC processing.
When using Linux nodes, there is a limit on the maximum number of execution threads per user. (This is shown under “max user processes” in the output of the “ulimit -a” command.) The default maximum is too low for DSI, which has highly parallel processing. Set the limit to at least 10000 on all nodes, which are hosting any type of DSI server. You can set this with “ulimit -u 10000”. Failing to set a sufficiently high limit, can result in false OutOfMemoryError exceptions, and the loss of a server.
Horizon and Event Caching
The horizon is how long into the past a solution can “see” events. Agents keep references to the events that they need. When several agents process the same event and each agent sets a different event horizon, DSI keeps the events for the longest horizon, up to the maxHorizon property setting. Since retaining events can have a significant impact on capacity, the maxHorizon property allows an administrator to place an ultimate limit on event retention. Changing the horizon affects the events that are visible to an agent. Note that the horizon can be increased in anticipation of future rules that are going to need a longer scope.
The eventCacheDuration property setting can be used to reduce the overall memory requirements for a solution. When eventCacheDuration is set, events are removed from memory after that time, but will be reloaded from the backing database as required. If eventCacheDuration is set to half the length of your horizon, it could reduce the overall memory requirements by more than a third. The more events per entity, the more dramatic the reduction. If you have many entities, which are only used infrequently, you can reduce eventCacheDuration to very small times â€“ like 1 minute.
Whenever you change eventCacheDuration, you must also consider changing the Extreme Scale objectgrid.xml configuration for timeToLive. timeToLive is used to control how long events are retained, after events are reloaded from the database. Thus, timeToLive controls how long an event says in memory after every use, except the after the initial create. Since timeToLive is not configurable dynamically, it should be set more conservatively than eventCacheDuration.
Counterinutitively, you must use caution when designing a solution that expects to have only a small number of entities, because there are potential performance problems with large number of events per entity. There are two potential problems with only a few hundred entities:
So, if you have 200 entities, you may start to see processing limitations at about 2M events/day, assuming that the events are evenly distributed among the entities.
Since agents have to manage lists of events within the horizon, the number of events for a single entity should not be too large. More than 1000 events per entity within an agentâ€™s horizon is a cause for concern.
Extreme Scale Configuration
These are the recommended settings for the main objectgriddeployment.xml settings. (You can see examples in the non-HA and HA configurations below.)
developmentMode=”false” – To achieve HA for production or realistic testing developmentMode must be set to false. If set to true, WXS ignores some of the other settings.
maxAsyncReplicas=”1″ – Asynchronous partition replicas allow the system to recover quickly in an HA/CA situation. This value should be set to 1 in HA environments. The number of asynchronous replicas is automatically reduced for small clusters.
maxSyncReplicas=”1″ – Synchronous partition replicas allow the system to handle HA/CA situation without losing data. maxSyncReplicas is set to the number of nodes, which you can lose at the same time, and still not lose data. Normally, this should be set to 1, because setting it higher incurs significant additional overhead. The number of synchronous replicas is automatically reduced in a single-server cluster.
minSyncReplicas=”0″ – minSyncReplicas is set to 0, so that the system can run, even if there is no replica. If this is set to 1 (not recommended), the system will stop, if there is only 1 node in the runtime cluster.
numberOfPartitions=”127″ – DSI needs a large number of partitions to run efficiently. A good value for numberOfPartitons across the cluster is to add up all the cores across the cluster, double it and round up to the next prime number. (Do not set below 13 in a cluster.) Since this value cannot be changed, while the cluster is running, you may want to increase it further for future scale-out. For example, if you start with 4 16-core nodes, but you think that eventually you might need 8 nodes, set numberOfPartitions to 257.
numInitialContainers=”3″ – The cluster will not start until this number of servers are running. numInitialContainers is important to prevent a storm of partition placement, when you initially start a cluster. One way to calculate this value is (maxSyncReplicas + maxAsyncReplicas + 1). Another way is one less than the number of servers in your cluster. Often the value is 3, but it must be set lower for clusters of 1 or 2 servers.
Configuring a Topology for Non-HA
If you don’t need high availability, you typically only need a single node, unless you have very high memory or throughput requirements. A single node is faster and easier than a cluster.
Plan to run the input and output connectivity servers, if any, and the catalog server on the same node with the runtime server. Itâ€™s OK to run the backing database on the same node, but be sure to consider its capacity requirement.
A good starting point is to install everything on a node with 32GB RAM – twice that if you have the DBMS on the same node.
Configure Extreme Scale this way (assuming no XM):
numberOfPartitions=”17″ (twice the total number of cores across the cluster, rounded up to prime)
Leaving room for the operating system, here is the recommended Java memory configuration for the runtime server (assuming no XM and using 14GB memory of the 32GB RAM available on the node for the runtime server):
Configuring a Topology for HA
If you need high availability, you’ll need a high availability DBMS, plus 6 nodes for DSI – 2 nodes to host the input and output connectivity servers plus the catalog servers, and 4 nodes for the runtime servers. A 4 runtime server minimum is best for an HA cluster, so that transitions are smoother during node shutdowns for maintenance.
You’ll need 16GB RAM on 2 nodes to host the input and output connectivity servers plus the catalog servers. Place one instance of each type of server on each node.
For the runtime servers 32GB RAM on each of the 4 nodes is a good starting point. Use XM, if your platform supports it.
Configure Extreme Scale this way (assuming XM on a node with 32GB RAM):
maxXMSize=21000 (RAM – heap – overhead)
numberOfPartitions=”67″ (twice the total number of cores across the cluster, rounded up to prime)
Leaving room for the operating system, here is a recommended Java memory configuration for the runtime server (assuming XM):
An HA DBMS is required to build a truely highly available topology. If however, you are not able to set up an HA DBMS, note the following when using an HA DSI topology with a non-HA DBMS: