Benchmarking Apache Cassandra with tlp-stress

This post will introduce you to tlp-stress, a tool for benchmarking Apache Cassandra. I started tlp-stress back when I was working at The Last Pickle. At the time, I was spending a lot of time helping teams identify the root cause of performance issues and needed a way of benchmarking. I found cassandra-stress to be difficult to use and configure, so I ended up writing my own tool that worked in a manner that I found to be more useful. If you’re looking for a tool to assist you in benchmarking Cassandra, and you’re looking to get started quickly, this might be the right tool for you.

Baked in Workloads

tlp-stress centers on the idea that most of what you want to test in Cassandra can be modeled with a small number of common patterns. There are only so many variations of a key/value workload, and only so many of those variations matter. tlp-stress makes the assumption that it’s more important to get testing quickly than to perfectly mimic a production workload down to the field names.

Out of the box, you can run workloads that exercise basic key/value access, counters, materialized views, time series, maps, sets, ALLOW FILTERING, and a time series based on UDTs. I intend to add workloads that test the new SAI / vector features going into 5.0, as well as querying virtual tables in 4.0 and up.

Get Started

tlp-stress was originally developed at The Last Pickle, which has since been acquired by DataStax. At the time, I was the primary author, and since I left TLP the project appears to have been abandoned. I’m picking back up where I left off with my fork of tlp-stress. I haven’t set up package publishing yet, so you’ll need to build from source to get started with the latest and greatest. Most notably, my fork has proper support for Java 11 and updates all the underlying build dependencies.

Here’s how to build tlp-stress:

Local Usage

The easiest way to get going is to simply build the project for local usage:

$ ./gradlew shadowjar installdist

DEB Package - Debian / Ubuntu

To build a deb package for Debian / Ubuntu machines, which use APT for package management:

$ ./gradlew buildDeb
$ find . -name '*deb'
./build/distributions/tlp-stress_6.0.0_all.deb

RPM

Red Hat, CentOS, and Fedora all use RPM packages:

$ ./gradlew buildrpm
$ find . -name '*rpm'
./build/distributions/tlp-stress-6.0.0.noarch.rpm

Docker

tlp-stress uses the Jib Gradle plugin to make it easy to create and publish Docker images to your registry. You’ll want to use one of the following, depending on whether you want to publish to a Docker registry, build a tarball, or build to your own Docker daemon:

$ ./gradlew jib
$ ./gradlew jibBuildTar
$ ./gradlew jibDockerBuild

Running It

Now that you have tlp-stress built (and maybe even installed somewhere), let’s start by getting a list of the workloads that it is aware of:

$ tlp-stress list
Available Workloads:

AllowFiltering
BasicTimeSeries
CountersWide
KeyValue
LWT
Locking
Maps
MaterializedViews
RandomPartitionAccess
Sets
UdtTimeSeries

You can run any of these workloads by running tlp-stress run WORKLOAD.

Great! We’ve got some workloads baked in already. Let’s try the KeyValue workload. We’ll use the -d (duration) flag to keep the test short and supply 30s, for 30 seconds.

$ tlp-stress run KeyValue -d 30s
Creating tlp_stress:
CREATE KEYSPACE
 IF NOT EXISTS tlp_stress
 WITH replication = {'class': 'SimpleStrategy', 'replication_factor':3 }

Creating schema
Executing 1000000 operations with consistency level LOCAL_ONE
Connected
Creating Tables
CREATE TABLE IF NOT EXISTS keyvalue (
                        key text PRIMARY KEY,
                        value text
                        ) WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND default_time_to_live = 0
Preparing queries
Initializing metrics
Connecting
Creating generator random
1 threads prepared.
Starting main runner
Running created, sleeping
[Thread 0]: Running the profile for 1000000 iterations...
                 Writes                                  Reads                                  Deletes                       Errors
  Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  Latency (p99)  1min (req/s) |   Count  1min (errors/s)
  44565             17             0 |   44475             12             0 |       0              0             0 |       0                0
 103069          22.39       17752.4 |  103030          11.29       17735.8 |       0              0             0 |       0                0
 183059          22.39       17752.4 |  183062          10.07       17735.8 |       0              0             0 |       0                0
 279818          10.69      18481.28 |  279606           8.01      18465.19 |       0              0             0 |       0                0
 380727          11.07      19519.94 |  380767           6.19         19508 |       0              0             0 |       0                0
 479882          11.07      19519.94 |  480138           6.14         19508 |       0              0             0 |       0                0
 499938           11.7      19519.94 |  500062           6.14         19508 |       0              0             0 |       0                0
Stress complete, 0.
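
If you want to sanity-check a run, the table above is easy to pull apart. The following is a rough Python sketch, not part of tlp-stress: the helper name parse_row is made up, and the column meanings are assumed from the printed header. It extracts the counts from the last row of the sample run and adds them up.

```python
# Hypothetical helper, not part of tlp-stress: parse one metrics row from the
# console output above. Column meanings are assumed from the printed header.

def parse_row(line):
    """Split a metrics row into per-operation (count, p99, 1min rate) tuples."""
    groups = [g.split() for g in line.split("|")]
    writes, reads, deletes = (
        (int(g[0]), float(g[1]), float(g[2])) for g in groups[:3]
    )
    errors = int(groups[3][0])
    return {"writes": writes, "reads": reads, "deletes": deletes, "errors": errors}


# Last row of the sample run, copied from the output above.
final_row = (
    " 499938           11.7      19519.94 |  500062           6.14"
    "         19508 |       0              0             0 |       0"
    "                0"
)

row = parse_row(final_row)
total_ops = row["writes"][0] + row["reads"][0] + row["deletes"][0]
print(total_ops)
```

The write and read counts in the final row add up to 1,000,000, which matches the "Executing 1000000 operations" line in the log.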

Take a look at the help (tlp-stress -h) to see what else is supported. The power of tlp-stress lies in the ability to customize the existing workloads, rather than creating your own from scratch. For example, you can change the compaction strategy, the compression strategy, or many of the other parameters that are available at table creation time. Read/write ratios can be adjusted with the -r flag, driver options can be set, and concurrency limits can be raised.
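
To make the read/write ratio concrete, here is a small Python sketch of how a ratio could split operations between reads and writes. It illustrates the idea only and is not how tlp-stress is implemented; choose_operations and its parameters are made-up names, and it assumes the ratio is expressed as a fraction between 0 and 1.

```python
import random
from collections import Counter


def choose_operations(read_ratio, total, seed=42):
    """Pick 'read' or 'write' for each operation with probability read_ratio."""
    if not 0.0 <= read_ratio <= 1.0:
        raise ValueError("read_ratio must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return Counter(
        "read" if rng.random() < read_ratio else "write" for _ in range(total)
    )


mix = choose_operations(read_ratio=0.5, total=100_000)
print(mix["read"], mix["write"])
```

A ratio of 0.5 gives a roughly even mix, similar to the near-even split of writes and reads in the sample KeyValue run above.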

Limitations and Alternatives

Nothing is perfect, especially software. tlp-stress makes a deliberate tradeoff: it is easy to get started with and easy to tweak, but a bit more work if you need something special. Other tools, such as Netflix’s NDBench or DataStax’s other benchmarking tool, NoSQLBench, offer the ability to test datastores besides Cassandra and use configuration to drive workloads without as much opinion. If you’re looking for that kind of flexibility, tlp-stress is probably not for you. If you’re looking to get up and running in just a few minutes, tlp-stress is likely a good fit.

Aside from the feature differences, it’s been noted that tlp-stress’s lack of a query scheduler means it suffers from coordinated omission. This is due to the underlying query model being pull based: tlp-stress uses a concurrency limit, so once the limit is reached a new request is only issued when an in-flight request completes. That keeps it from overwhelming Cassandra, which can be good or bad depending on your outlook. The downside is that when the cluster slows down, the requests that would have been sent during the slowdown are delayed rather than timed, so latency numbers may appear better than what you would see in production. I will be addressing this in a future update.
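
To see why the omission effect described above matters, here is a toy Python simulation. It is not tlp-stress code, and all names and numbers are made up for illustration. A single-slot, pull-based client tries to send one request every 2 ms, the server normally answers in 1 ms, and one request stalls for 100 ms. The sketch records latency two ways: from the moment each request was actually sent, and from the moment it was scheduled to be sent.

```python
import math


def simulate(num_requests=1000, interval_ms=2.0, service_ms=1.0,
             stall_at=500, stall_ms=100.0):
    """Run one request at a time; return (measured, corrected) latency lists."""
    measured, corrected = [], []
    previous_finish = 0.0
    for i in range(num_requests):
        intended_start = i * interval_ms
        # Pull-based client: the next request waits for the previous one.
        actual_start = max(intended_start, previous_finish)
        duration = stall_ms if i == stall_at else service_ms
        finish = actual_start + duration
        measured.append(finish - actual_start)     # timer starts at send
        corrected.append(finish - intended_start)  # timer starts on schedule
        previous_finish = finish
    return measured, corrected


def percentile(values, pct):
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


measured, corrected = simulate()
print("p99 timed from send:    ", percentile(measured, 99), "ms")
print("p99 timed from schedule:", percentile(corrected, 99), "ms")
```

In this toy model only the stalled request looks slow when the timer starts at send time, so the p99 stays at the normal service time. Timing from the schedule also counts the requests that queued up behind the stall, and the p99 rises sharply. A query scheduler makes that second measurement possible because it knows when each request was supposed to start.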

The TLP repo currently builds and appears to run on Java 11, but the jobs never start correctly. My fork runs correctly. The documentation still lives in the TLP repo. Once I have some free time I’ll get it building here.

Conclusion

Hopefully this post helps you understand why and when you’d want to use tlp-stress. It’s not perfect by any means, but it has been used to identify performance issues in hundreds of clusters, including some of the biggest deployments in the world. Check out the repo on GitHub.
