Benchmarking Apache Cassandra with tlp-stress
This post will introduce you to tlp-stress, a tool for benchmarking Apache Cassandra. I started tlp-stress back when I was working at The Last Pickle. At the time, I was spending a lot of time helping teams identify the root cause of performance issues and needed a way of benchmarking. I found cassandra-stress to be difficult to use and configure, so I ended up writing my own tool that worked in a manner that I found to be more useful. If you’re looking for a tool to assist you in benchmarking Cassandra, and you’re looking to get started quickly, this might be the right tool for you.
Baked in Workloads
tlp-stress centers around the idea that most of what you want to test in Cassandra can be modeled around a small number of common patterns. There are only so many variations of a key value workload, and only so many of those variations matter. tlp-stress makes the assumption that it’s more important to get testing quickly than to perfectly mimic a production workload down to the field names.
Out of the box, you can run workloads that exercise basic key/value access, counters, materialized views, time series, maps, sets, usage of ALLOW FILTERING, and a time series based on UDTs. I intend to add workloads to test the new SAI / Vector features going into 5.0 as well as querying virtual tables in 4.0 and up.
Get Started
tlp-stress was originally developed at The Last Pickle, which has since been acquired by DataStax. I was the primary author at the time, and since I left TLP the project appears to have been abandoned. I’m picking back up where I left off with my fork of tlp-stress. I haven’t set up package publishing yet, so you’ll need to build from source to get started with the latest and greatest. Most notably, my fork has proper support for Java 11 and updates all the underlying build dependencies.
Here’s how to build tlp-stress:
Local Usage
The easiest way to get going is to simply build the project for local usage:
$ ./gradlew shadowjar installdist
DEB Package - Debian / Ubuntu
For Debian / Ubuntu machines, which use APT to manage packages, build a deb package:
$ ./gradlew buildDeb
$ find . -name '*deb'
./build/distributions/tlp-stress_6.0.0_all.deb
RPM
Red Hat, CentOS, and Fedora all use RPM packages:
$ ./gradlew buildrpm
$ find . -name '*rpm'
./build/distributions/tlp-stress-6.0.0.noarch.rpm
Docker
tlp-stress uses the Jib Gradle plugin to make it easy to create and publish Docker images to your registry. You’ll want to use one of the following, depending on whether you want to publish to a Docker registry, build a tarball, or build to your own Docker daemon:
$ ./gradlew jib
$ ./gradlew jibBuildTar
$ ./gradlew jibDockerBuild
Running It
Now that you have tlp-stress built (and maybe even installed somewhere), let’s start by getting a list of the workloads that it is aware of:
$ tlp-stress list
Available Workloads:
AllowFiltering
BasicTimeSeries
CountersWide
KeyValue
LWT
Locking
Maps
MaterializedViews
RandomPartitionAccess
Sets
UdtTimeSeries
You can run any of these workloads by running tlp-stress run WORKLOAD.
Great! We’ve got some workloads baked in already. Let’s try the KeyValue workload. We’ll use the -d (duration) flag to keep the test short and supply 30s, for 30 seconds.
$ tlp-stress run KeyValue -d 30s
Creating tlp_stress:
CREATE KEYSPACE
IF NOT EXISTS tlp_stress
WITH replication = {'class': 'SimpleStrategy', 'replication_factor':3 }
Creating schema
Executing 1000000 operations with consistency level LOCAL_ONE
Connected
Creating Tables
CREATE TABLE IF NOT EXISTS keyvalue (
key text PRIMARY KEY,
value text
) WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND default_time_to_live = 0
Preparing queries
Initializing metrics
Connecting
Creating generator random
1 threads prepared.
Starting main runner
Running created, sleeping
[Thread 0]: Running the profile for 1000000 iterations...
               Writes                 |                 Reads                 |                Deletes                |          Errors
   Count  Latency (p99)  1min (req/s) |    Count  Latency (p99)  1min (req/s) |    Count  Latency (p99)  1min (req/s) |    Count  1min (errors/s)
   44565             17             0 |    44475             12             0 |        0              0             0 |        0                0
  103069          22.39       17752.4 |   103030          11.29       17735.8 |        0              0             0 |        0                0
  183059          22.39       17752.4 |   183062          10.07       17735.8 |        0              0             0 |        0                0
  279818          10.69      18481.28 |   279606           8.01      18465.19 |        0              0             0 |        0                0
  380727          11.07      19519.94 |   380767           6.19         19508 |        0              0             0 |        0                0
  479882          11.07      19519.94 |   480138           6.14         19508 |        0              0             0 |        0                0
  499938           11.7      19519.94 |   500062           6.14         19508 |        0              0             0 |        0                0
Stress complete, 0.
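If you want to sanity-check a run, the report rows are easy to pick apart. The sketch below is my own illustration, not part of tlp-stress: it assumes each row keeps the pipe-separated layout shown above (writes, reads, deletes, errors) and that the Count columns are cumulative. The helper name parse_row is hypothetical.

```python
# Final row of the sample report above, copied verbatim.
FINAL_ROW = "499938 11.7 19519.94 | 500062 6.14 19508 | 0 0 0 | 0 0"


def parse_row(row):
    """Split one report row into its four pipe-separated groups.

    Assumes the column order shown in the sample output:
    count, p99 latency, 1-minute rate for writes, reads and deletes,
    then count and 1-minute rate for errors.
    """
    writes, reads, deletes, errors = [part.split() for part in row.split("|")]

    def op(fields):
        return {
            "count": int(fields[0]),
            "p99": float(fields[1]),
            "rate_1min": float(fields[2]),
        }

    return {
        "writes": op(writes),
        "reads": op(reads),
        "deletes": op(deletes),
        "errors": {"count": int(errors[0]), "rate_1min": float(errors[1])},
    }


parsed = parse_row(FINAL_ROW)
total_ops = sum(parsed[kind]["count"] for kind in ("writes", "reads", "deletes"))
read_fraction = parsed["reads"]["count"] / total_ops

print("total operations:", total_ops)
print("read fraction:", read_fraction)
```

For this run the write and read counts add up to 1,000,000, which matches the "Executing 1000000 operations" line, and the split between reads and writes is very close to even.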
Take a look at the help (tlp-stress -h) to see what else is supported. The power of tlp-stress lies in the ability to customize the existing workloads, rather than creating your own from scratch. For example, you can change the compaction strategy, the compression strategy, or many of the other table parameters that are available at table creation time. Read / write ratios can be adjusted with the -r flag, and you can also set driver flags and increase concurrency limits.
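As a rough sketch of how you might script a sweep over read / write ratios, the snippet below only builds and prints the command lines so you can review them before running anything. It uses just the -d and -r flags mentioned in this post. The assumption that -r takes a fraction between 0 and 1 is mine, and the helper name stress_command is hypothetical, so confirm the accepted values with tlp-stress -h first.

```python
import shlex


def stress_command(workload, duration, read_ratio=None):
    """Build a tlp-stress command line as a list of arguments.

    read_ratio is ASSUMED to be a fraction between 0 and 1 passed to -r;
    check `tlp-stress -h` for the values your build accepts.
    """
    cmd = ["tlp-stress", "run", workload, "-d", duration]
    if read_ratio is not None:
        if not 0.0 <= read_ratio <= 1.0:
            raise ValueError("read_ratio must be between 0 and 1")
        cmd += ["-r", str(read_ratio)]
    return cmd


# Print one command per ratio instead of executing it (a dry run).
for ratio in (0.2, 0.5, 0.8):
    print(shlex.join(stress_command("KeyValue", "30s", ratio)))
```

To actually run a command, the same list can be handed to subprocess.run(cmd, check=True) once tlp-stress is on your PATH.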
Limitations and Alternatives
Nothing is perfect, especially software. tlp-stress makes a deliberate tradeoff: it is easy to get started with and easy to tweak, but it takes a bit more work if you need something special. Other tools, such as Netflix’s ndbench or DataStax’s other benchmarking tool, nosqlbench, offer the ability to test other datastores besides Cassandra and use configuration to drive workloads without as much opinion. If you’re looking for that kind of flexibility, tlp-stress is probably not for you. If you’re looking to get up and running in just a few minutes, tlp-stress is likely a good fit.
Aside from the feature differences, it’s been noted that tlp-stress’s lack of a query scheduler means it suffers from coordinated omission. This is due to the underlying query model being pull based: a new request is only issued once there is room under the concurrency limit, so the time a request would have spent waiting to be sent is never measured. The concurrency limit also keeps tlp-stress from overwhelming Cassandra, which can be good or bad depending on your outlook. The impact is that latency numbers may appear better than what you would see in production. I will be addressing this in a future update.
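To make the effect concrete, here is a toy simulation. It is not how tlp-stress is implemented; it is a minimal sketch that assumes a client with a single in-flight slot, a target of one request every 2 ms, a 1 ms service time, and one 100 ms stall. All names and numbers are made up for illustration. It compares the latency a pull-based client records (send to response) with the latency measured from when each request was supposed to be sent.

```python
def simulate(service_times_ms, interval_ms):
    """Single-slot client: request i is scheduled at i * interval_ms but
    cannot be sent until the previous response has arrived."""
    free_at = 0
    recorded, from_schedule = [], []
    for i, service in enumerate(service_times_ms):
        intended = i * interval_ms      # when the request should have been sent
        sent = max(intended, free_at)   # when it actually could be sent
        done = sent + service
        recorded.append(done - sent)           # what a pull-based client sees
        from_schedule.append(done - intended)  # includes time spent waiting to send
        free_at = done
    return recorded, from_schedule


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer such as 99."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100)
    return ordered[rank - 1]


# 1000 requests at 1 ms each, with a single 100 ms stall in the middle.
service_times = [1] * 1000
service_times[500] = 100

recorded, from_schedule = simulate(service_times, interval_ms=2)
print("p99 recorded by client:", percentile(recorded, 99))
print("p99 measured from schedule:", percentile(from_schedule, 99))
```

In this toy setup the client records a p99 of 1 ms, because only one request out of a thousand looks slow, while the p99 measured against the intended schedule is 90 ms, because every request queued behind the stall was delayed too.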
The TLP repo currently builds and appears to run on Java 11, but the jobs never start correctly. My fork runs correctly. The documentation still lives in TLP. Once I have some free time I’ll get this building here.
Conclusion
Hopefully this post helps you understand why and when you’d want to use tlp-stress. It’s not perfect by any means, but it has been used to identify performance issues in hundreds of clusters including some of the biggest deployments in the world. Check out the repo on GitHub.
If you found this post helpful, please consider sharing to your network. I'm also available to help you be successful with your distributed systems! Please reach out if you're interested in working with me, and I'll be happy to schedule a free one-hour consultation.