Introduction

A team of experts from IBM and ScyllaDB set out to demonstrate the value of running ScyllaDB on an IBM® POWER9™ processor-based server compared to running Apache Cassandra on an Intel Xeon SP processor-based server.

The test results confirm that Scylla has better throughput for both read and write operations when running on IBM POWER9 processor-based servers.

This article describes the test environments and hardware configurations that were used, the test results with supporting data, and POWER9 sizing examples with tuning recommendations. It also explains the POWER9 capabilities that contribute to Scylla's superior performance.

Introduction to Scylla

Scylla is an open source NoSQL database that is a drop-in alternative to Apache Cassandra. Both Scylla and Cassandra are wide column stores. Scylla retains the best features of Cassandra while providing up to 10 times better performance and lower latency in some workloads. Scylla's highly scalable architecture enables it to handle one million database transactions per second on a single server. Like Cassandra, Scylla has automatic failover and replication across nodes to maximize availability and reliability. Workload conditioning provides a set of dynamic scheduling algorithms that minimize latency jitter for database operations and reduce compaction, streaming, and repair times. Scylla's architecture enables the database to use both scale-up and scale-out hardware and processor models.

Read Scylla Architecture for more details.

Read Scylla Apache Cassandra Compatibility to learn how compatible Scylla is with Cassandra.

Value of IBM Power Systems for Scylla

There are several key reasons why Scylla is a good match for IBM POWER9 servers. The sharding feature within Scylla makes full use of simultaneous multithreading (SMT) on the IBM Power® architecture for optimal parallelism. The Seastar framework, on which Scylla is built, uses a shared-nothing model that shards all requests onto individual CPUs and allows efficient message passing between CPUs without time-consuming locking.

The POWER9 processor, with its enhanced cache hierarchy, has a total of 120 MB of L3 cache organized as twelve 20-way associative regions with advanced placement policies. The L3 caches are fed by 7 TBps of on-chip bandwidth. Along with a peak I/O bandwidth of 80 GBps, powered by PCIe Gen4 (with Gen3 compatibility), POWER9 provides a very high data bandwidth environment that is superior to x86.

Also, POWER9 allows more memory capacity for larger in-memory database support.

Test environment

Two test environments were set up: one for running Apache Cassandra on Intel Xeon SP and another for running Scylla on POWER9. Each server was configured with comparable resources for the testing, and the same performance benchmark was used in both the POWER9 and Intel server environments.

Benchmark

The open source cassandra-stress tool was used for benchmarking and load testing. cassandra-stress is a Java-based tool that provides a built-in method to populate test data and perform a variety of workload stress tests.

Two key database operations were tested: read operations and write operations. Throughput was measured for each test run. Throughput is measured as the number of read (or write) operations per second. The tests were run multiple times for both read and write operations, and averages were calculated. Tests were also run using a mix of both read operations (80%) and write operations (20%).

In the cassandra-stress tests, the write tests were run with the uniform population model. The read and mixed-operation tests were run with a Gaussian population model over a key range of 9 million, with a midpoint of 4.5 million and a standard deviation of 10,000.

In this test, 56 cassandra-stress loaders were used.

Each write test iteration starts when the cluster has no data (data is cleared, and the database is restarted before each iteration).

A data set size of 156 GB was used in the tests, with one row of data per partition and an average row size of 310 bytes (consistent with 56 loaders each writing roughly 9 million rows: about 504 million rows at ~310 bytes is approximately 156 GB). Here is the schema that was used:


cqlsh> DESC KEYSPACE ks1

CREATE KEYSPACE ks1 WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '1'}
    AND durable_writes = true;

CREATE TABLE ks1.standard1 (
    key blob PRIMARY KEY,
    "C0" blob,
    "C1" blob,
    "C2" blob,
    "C3" blob,
    "C4" blob
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Once the database was running on each server, the cassandra-stress tool was invoked using the following commands for the write, read, and mixed tests.

For write tests:


        /home/user1/scylla-tools-java/tools/bin/cassandra-stress write no-warmup n=9000000 -node <IP> 
        -rate threads=100 -mode native cql3 -pop seq=<key_range> 
        -schema keyspace=ksx replication(strategy=NetworkTopologyStrategy, datacenter1=1)
      

For read tests:


        /home/user1/scylla-tools-java/tools/bin/cassandra-stress read no-warmup n=9000000 -node <IP> 
        -rate threads=400 -mode native cql3 -pop dist=gaussian(key_range,midpoint,10000) 
        -schema keyspace=ksx replication(strategy=NetworkTopologyStrategy, datacenter1=1)
      

For 80% read and 20% write tests:


        /home/user1/scylla-tools-java/tools/bin/cassandra-stress mixed ratio(write=2,read=8) no-warmup n=9000000 
        -node <IP> -rate threads=400  -mode native cql3 -pop dist=gaussian(key_range,midpoint,10000) 
        -schema keyspace=ks1 replication(strategy=NetworkTopologyStrategy, datacenter1=1) 
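
The <IP>, <key_range>, and midpoint placeholders were filled in per run. How the 56 loaders were launched is not recorded in this article; the sketch below shows one plausible arrangement, assuming each loader populates its own keyspace over the same 1..9,000,000 key range (the wrapper script, the ks$i naming, and the parallel fan-out are assumptions, not the original harness):

        #!/bin/bash
        # Hypothetical driver sketch: start 56 write loaders in parallel,
        # one keyspace per loader, each writing 9 million sequential keys.
        STRESS=/home/user1/scylla-tools-java/tools/bin/cassandra-stress
        NODE="${1:?usage: $0 <database-server-ip>}"
        for i in $(seq 1 56); do
          "$STRESS" write no-warmup n=9000000 -node "$NODE" \
              -rate threads=100 -mode native cql3 -pop seq=1..9000000 \
              -schema keyspace=ks$i \
              'replication(strategy=NetworkTopologyStrategy,datacenter1=1)' &
        done
        wait    # block until every loader has finished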
      

Software configuration

Table 1 shows the key software installed for POWER9 and Intel Xeon processor-based test servers. The servers’ operating systems were each configured in a bare metal, non-virtualized mode for these tests.

Table 1. Software configuration for the POWER9 and Intel Xeon database servers

Component           IBM POWER9 server            Intel Xeon SP server
Database solution   Scylla Enterprise 2018.1.0   Apache Cassandra 3.11.2
Operating system    RHEL 7.5 LE for POWER9       RHEL 7.5

Hardware configuration

Table 2 shows the configuration of the POWER9 and Intel Xeon processor-based test servers. Both were two-socket systems with the same memory and storage capacity.

Table 2. Hardware configuration for the POWER9 and Intel Xeon database servers

Component   IBM POWER9 server                               Intel Xeon SP server
Model       IBM Power System LC922                          Intel Xeon SP Gold 6140
Processor   2 sockets x 22 cores, 2.6 GHz                   2 sockets x 18 cores, 2.3 GHz
Memory      256 GB                                          256 GB
Storage     Two internal HDDs; PCIe-attached 1.6 TB NVMe    Two internal HDDs; PCIe-attached 1.6 TB NVMe

Network configuration

The cassandra-stress test driver was installed and run on a separate server from the database servers described in Table 2. A 40 Gbps network switch was used, and an identical network configuration was used between the stress test server and each database server under test.

Test results

The test results show that Scylla on POWER9 delivers better throughput than Cassandra running on an Intel processor-based server. These results mean that POWER9 systems can complete more Scylla database operations in the same physical footprint, because both test servers were 2U (two rack units) in size. It also implies that individual operations complete faster, so users can access their results sooner.

Throughput

Write throughput of Scylla on POWER9 is 5.83 times better in terms of operations per second than Apache Cassandra on Intel Xeon SP. The write throughput results are shown in Figure 1.

Figure 1. Write throughput on IBM POWER9 versus Intel Xeon SP

Read throughput of Scylla on POWER9 is 2.13 times better than Apache Cassandra on Intel Xeon SP. The read throughput results are shown in Figure 2.

Figure 2. Read throughput on IBM POWER9 versus Intel Xeon SP

Mixed throughput of Scylla on POWER9 is 3.16 times better than that of Apache Cassandra on Intel Xeon SP. The mixed workload was 80% read operations and 20% write operations. The throughput results are shown in Figure 3.

Figure 3. Mixed throughput on IBM POWER9 versus Intel Xeon SP

Tuning

ScyllaDB provides a System Configuration Guide that documents configuration recommendations for a variety of environment elements, including hardware, storage, file system, networking, and the Scylla software itself. Several scripts are provided to make configuration easier. In addition, database tuning and POWER9 tuning are recommended.
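
For example, most of these settings can be applied with the helper scripts that ship with Scylla packages. The script names below illustrate the standard Scylla tooling rather than a record of the exact commands used in this test, and their options vary by release:

        # Interactive system setup: NIC interrupt affinity, disk I/O scheduler, clocksource, and so on
        sudo scylla_setup
        # Benchmark the attached disks and generate Scylla's I/O configuration
        sudo scylla_io_setup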

Database tuning

Several key aspects of the ScyllaDB and Cassandra databases were tuned for the test environment.

Here is the command invocation that was used for starting Scylla on the POWER9 processor-based server:


        /home/user1/.ccm/scylla-1/node1/bin/scylla --options-file /home/user1/.ccm/scylla-1/node1/conf/scylla.yaml 
        --log-to-stdout 1 --memory 200G --default-log-level info --smp 172 --cpuset 4-175 --max-io-requests 480 
        --num-io-queues 120 --developer-mode false --collectd 0 --prometheus-address 172.31.254.91 
        --background-writer-scheduling-quota 0.5 --auto-adjust-flush-quota 1 --load-balance none 
        --api-address 10.13.116.2 --collectd-hostname bsnode16-1.node1
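
The resource-related flags in this invocation line up with the LC922 configuration in Table 2. The annotations below are an interpretation of the standard Seastar/Scylla startup options, not text from the original test notes:

        # --smp 172 --cpuset 4-175   run 172 shards pinned to hardware threads 4-175
        #                            (44 cores x SMT4 = 176 threads; the first four threads
        #                            are left for the operating system and interrupt handling)
        # --memory 200G              reserve 200 GB of the 256 GB of RAM for Scylla
        # --num-io-queues 120        spread disk I/O across 120 I/O queues
        # --max-io-requests 480      cap the number of concurrent I/O requests issued to the disk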
      

Here is the command invocation that was used for starting Cassandra on the Intel Xeon processor-based server:

 
        java -Xloggc:/home/scylla/.ccm/cassandra-1/node1/logs/gc.log -ea
          -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError
          -Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking
          -XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem
          -Djava.net.preferIPv4Stack=true -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
          -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
          -XX:+PrintPromotionFailure -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
          -XX:GCLogFileSize=10M -Xms64G -Xmx64G
          -XX:CompileCommandFile=/home/scylla/.ccm/cassandra-1/node1/conf/hotspot_compiler
          -javaagent:/home/scylla/.ccm/repository/3.11.2/lib/jamm-0.3.0.jar
          -Dcassandra.jmx.local.port=7100 -Dcom.sun.management.jmxremote.authenticate=false
          -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password
          -Djava.library.path=/home/scylla/.ccm/repository/3.11.2/lib/sigar-bin
          -Dcassandra.migration_task_wait_in_seconds=2 -XX:OnOutOfMemoryError=kill -9 %p
          -Dlogback.configurationFile=logback.xml
          -Dcassandra.logdir=/home/scylla/.ccm/repository/3.11.2/logs
          -Dcassandra.storagedir=/home/scylla/.ccm/repository/3.11.2/data
          -Dcassandra-pidfile=/home/scylla/.ccm/cassandra-1/node1/cassandra.pid -cp
          /home/scylla/.ccm/cassandra-1/node1/conf:/home/scylla/.ccm/repository/3.11.2/build/classes/main:/home/scylla/.ccm/repository/3.11.2/build/classes/thrift:/home/scylla/.ccm/repository/3.11.2/lib/airline-0.6.jar:/home/scylla/.ccm/repository/3.11.2/lib/antlr-runtime-3.5.2.jar:/home/scylla/.ccm/repository/3.11.2/lib/apache-cassandra-3.11.2.jar:/home/scylla/.ccm/repository/3.11.2/lib/apache-cassandra-thrift-3.11.2.jar:/home/scylla/.ccm/repository/3.11.2/lib/asm-5.0.4.jar:/home/scylla/.ccm/repository/3.11.2/lib/caffeine-2.2.6.jar:/home/scylla/.ccm/repository/3.11.2/lib/cassandra-driver-core-3.0.1-shaded.jar:/home/scylla/.ccm/repository/3.11.2/lib/commons-cli-1.1.jar:/home/scylla/.ccm/repository/3.11.2/lib/commons-codec-1.9.jar:/home/scylla/.ccm/repository/3.11.2/lib/commons-lang3-3.1.jar:/home/scylla/.ccm/repository/3.11.2/lib/commons-math3-3.2.jar:/home/scylla/.ccm/repository/3.11.2/lib/compress-lzf-0.8.4.jar:/home/scylla/.ccm/repository/3.11.2/lib/concurrentlinkedhashmap-lru-1.4.jar:/home/scylla/.ccm/repository/3.11.2/lib/concurrent-trees-2.4.0.jar:/home/scylla/.ccm/repository/3.11.2/lib/disruptor-3.0.1.jar:/home/scylla/.ccm/repository/3.11.2/lib/ecj-4.4.2.jar:/home/scylla/.ccm/repository/3.11.2/lib/guava-18.0.jar:/home/scylla/.ccm/repository/3.11.2/lib/HdrHistogram-2.1.9.jar:/home/scylla/.ccm/repository/3.11.2/lib/high-scale-lib-1.0.6.jar:/home/scylla/.ccm/repository/3.11.2/lib/hppc-0.5.4.jar:/home/scylla/.ccm/repository/3.11.2/lib/jackson-core-asl-1.9.13.jar:/home/scylla/.ccm/repository/3.11.2/lib/jackson-mapper-asl-1.9.13.jar:/home/scylla/.ccm/repository/3.11.2/lib/jamm-0.3.0.jar:/home/scylla/.ccm/repository/3.11.2/lib/javax.inject.jar:/home/scylla/.ccm/repository/3.11.2/lib/jbcrypt-0.3m.jar:/home/scylla/.ccm/repository/3.11.2/lib/jcl-over-slf4j-1.7.7.jar:/home/scylla/.ccm/repository/3.11.2/lib/jctools-core-1.2.1.jar:/home/scylla/.ccm/repository/3.11.2/lib/jflex-1.6.0.jar:/home/scylla/.ccm/repository/3.11.2/lib/jna-4.2.2.jar:/home/scylla/.ccm/repository/3.11.2/lib/joda-time-2.4.jar:/home/scylla/.ccm/repository/3.11.2/lib/json-simple-1.1.jar:/home/scylla/.ccm/repository/3.11.2/lib/jstackjunit-0.0.1.jar:/home/scylla/.ccm/repository/3.11.2/lib/libthrift-0.9.2.jar:/home/scylla/.ccm/repository/3.11.2/lib/log4j-over-slf4j-1.7.7.jar:/home/scylla/.ccm/repository/3.11.2/lib/logback-classic-1.1.3.jar:/home/scylla/.ccm/repository/3.11.2/lib/logback-core-1.1.3.jar:/home/scylla/.ccm/repository/3.11.2/lib/lz4-1.3.0.jar:/home/scylla/.ccm/repository/3.11.2/lib/metrics-core-3.1.0.jar:/home/scylla/.ccm/repository/3.11.2/lib/metrics-jvm-3.1.0.jar:/home/scylla/.ccm/repository/3.11.2/lib/metrics-logback-3.1.0.jar:/home/scylla/.ccm/repository/3.11.2/lib/netty-all-4.0.44.Final.jar:/home/scylla/.ccm/repository/3.11.2/lib/ohc-core-0.4.4.jar:/home/scylla/.ccm/repository/3.11.2/lib/ohc-core-j8-0.4.4.jar:/home/scylla/.ccm/repository/3.11.2/lib/reporter-config3-3.0.3.jar:/home/scylla/.ccm/repository/3.11.2/lib/reporter-config-base-3.0.3.jar:/home/scylla/.ccm/repository/3.11.2/lib/sigar-1.6.4.jar:/home/scylla/.ccm/repository/3.11.2/lib/slf4j-api-1.7.7.jar:/home/scylla/.ccm/repository/3.11.2/lib/snakeyaml-1.11.jar:/home/scylla/.ccm/repository/3.11.2/lib/snappy-java-1.1.1.7.jar:/home/scylla/.ccm/repository/3.11.2/lib/snowball-stemmer-1.3.0.581.1.jar:/home/scylla/.ccm/repository/3.11.2/lib/ST4-4.0.8.jar:/home/scylla/.ccm/repository/3.11.2/lib/stream-2.5.2.jar:/home/scylla/.ccm/repository/3.11.2/lib/thrift-server-0.3.7.jar:/home/scylla/.ccm/repository/3.11.2/lib/jsr223//.jar
          -Dcassandra.join_ring=True -Dcassandra.logdir=/home/scylla/.ccm/cassandra-1/node1/logs
          -Dcassandra.boot_without_jna=true org.apache.cassandra.service.CassandraDaemon

Cassandra was tuned based on the DataStax JVM tuning guide at https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html

POWER9 tuning

During the tests, the following key POWER9 system-level settings were adjusted to achieve the best performance of Scylla running on POWER9.

Frequency

Set the CPU frequency governor to the performance mode so that the processor runs at its maximum (turbo) frequency. In a bare metal environment, the CPU frequency governor is controlled by the operating system (OS). Use the cpupower command to verify and set the governor as follows:

cpupower -c 0-175 frequency-info

cpupower -c 0-175 frequency-set -g performance
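
To double-check the active governor outside of cpupower, the cpufreq sysfs interface can also be read directly (a generic Linux check, not something from the original test procedure):

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor    # expected output after the change: performance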

Simultaneous multithreading (SMT)

Using SMT, you can concurrently run instruction streams of multiple threads on the same core. On IBM POWER9 processor-based servers, SMT4 is the default setting and most workloads will run well with this default setting. For databases including Scylla, setting SMT to the SMT4 mode is considered a best practice. While POWER9 processor-based servers offer up to four threads per core, Intel processor-based servers provide only two threads per core. Thus, when ScyllaDB runs, it uses four shards per core on POWER9 and two shards per core on Intel processor-based servers.

Use the ppc64_cpu command to set the SMT mode to 4 or 2 as follows:

SMT4: ppc64_cpu --smt=4

SMT2: ppc64_cpu --smt=2
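
To confirm the change, display the current SMT mode and the resulting threads per core (standard ppc64_cpu and lscpu usage, not taken from the original test notes):

ppc64_cpu --smt                          # prints the current mode, for example: SMT=4
lscpu | grep -i 'thread(s) per core'     # shows the threads per core seen by the OS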

Hardware sizing

Table 3 shows an example of POWER9 configurations for three different installation sizes.

Table 3. Example Scylla installation sizes for IBM POWER9 servers

Installation size       IBM Power model   Cores                           Memory   Disk               Network
Test, minimal           LC921 or LC922    16                              64 GB    Fast SSD or NVMe   1 Gbps
Production              LC922             22 (1 socket)                   128 GB   Fast SSD or NVMe   10 Gbps
Analytics, heavy duty   LC922             44 (2 sockets, 22 cores each)   256 GB   Fast SSD or NVMe   10-40 Gbps

The recommended disk size is 30 times the memory size. For the example sizes in Table 3, that works out to roughly 1.8 TB for Test, 3.7 TB for Production, and 7.5 TB for Analytics, heavy duty.
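
As a quick sanity check of that rule of thumb, the memory sizes from Table 3 can be multiplied out (a simple illustrative sketch; the 30x ratio is the guideline stated above):

for mem_gb in 64 128 256; do
    echo "${mem_gb} GB memory -> $(( mem_gb * 30 )) GB of disk"
done
# prints 1920 GB, 3840 GB, and 7680 GB, in line with the TB figures quoted above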

Summary

Scylla was tested with IBM POWER9 processor-based servers and superior throughput was achieved with both database read and write operations. There are several key reasons why Scylla runs well on IBM POWER9 servers. The sharding feature within Scylla makes full use of simultaneous multithreading on the Power architecture for optimal parallelism. POWER9 provides a very high data bandwidth environment that is superior to x86. And, POWER9 allows more memory capacity for larger in-memory database support.