Cassandra 3.2 Overview
The 3.0 release of Apache Cassandra marked an important milestone. One of the biggest updates was CASSANDRA-8099, the JIRA to modernize the storage engine. It was also the first release in the new Tick-Tock cycle, which lands a new release of Cassandra every month. Even .x numbers (such as 3.2) are feature releases, and odd .x numbers (such as 3.1) are bug-fix releases. Cassandra 3.2, released about a week ago, is the first feature release following 3.0. This post briefly covers the changes.
Better JBOD support
CASSANDRA-6696 improves JBOD support in Cassandra by distributing data to disks based on token range rather than randomly. This should reduce the impact of a disk failure by isolating it to specific token ranges on a machine rather than all the token ranges that the machine is responsible for.
There’s a more thorough blog post on the DataStax Developer Blog about the improvements in JBOD. There’s also a follow-up JIRA, CASSANDRA-10540, that will partition data on each disk based on token range, which will hopefully improve data density among other things.
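To make the token-range idea concrete, here is a minimal Python sketch. It is not Cassandra's actual implementation: it assumes the Murmur3Partitioner token space (-2**63 to 2**63 - 1), pretends a node owns the whole space, and uses hypothetical helper names (disk_boundaries, disk_for_token). It only illustrates how a fixed token-to-disk mapping confines a disk failure to one slice of tokens.

```python
from bisect import bisect_left

# Assumption: Murmur3Partitioner token space, -2**63 .. 2**63 - 1.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def disk_boundaries(num_disks):
    """Return the inclusive upper token bound of each disk's slice.

    Hypothetical helper: splits the whole token space into equal,
    contiguous slices, one per disk.
    """
    span = MAX_TOKEN - MIN_TOKEN + 1
    return [MIN_TOKEN + span * (i + 1) // num_disks - 1 for i in range(num_disks)]


def disk_for_token(token, boundaries):
    """Pick the disk whose slice contains the token (first bound >= token)."""
    return bisect_left(boundaries, token)


boundaries = disk_boundaries(4)
tokens = [MIN_TOKEN, -1, 0, 2**62, MAX_TOKEN]
placement = {t: disk_for_token(t, boundaries) for t in tokens}

# If disk 2 fails, only tokens mapped to its slice are affected.
failed_disk = 2
affected = [t for t, d in placement.items() if d == failed_disk]
print(placement)
print(affected)
```

With random placement, any token could have data on the failed disk. With a fixed mapping like this one, the affected tokens are exactly those in the failed disk's slice.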
Hints compression
CASSANDRA-9428 adds user-defined compression (including encryption) for hints. While it may seem minor at first, compression can make a big difference when writing to spinning disks, and encryption is often necessary with financial data, so this can end up being a big deal for a lot of users.
Improvements to index building
CASSANDRA-10681 and CASSANDRA-10678 are dependencies of CASSANDRA-10661, a big upgrade to the secondary indexes available in Cassandra.
Improvements to aggregation functions
Casting has been added, making user-defined aggregations significantly more useful. Without a cast, taking the avg() of 1 and 2 yields 1, since the output type matches the input type (similar to Oracle and SQL Server).
cqlsh:test> create table jon ( id int, val int, ts timestamp, primary key (id, val));
cqlsh:test> insert into jon (id, val) values (2, 1);
cqlsh:test> insert into jon (id, val) values (1, 2);
cqlsh:test> select avg(val) from jon;
system.avg(val)
-----------------
1
CASSANDRA-10310 adds support for CAST(), which allows us to get results back in whatever type works best for us.
cqlsh:test> select avg(CAST(val as float)) from jon;
system.avg(cast(val as float))
--------------------------------
1.5
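The difference between the two queries comes down to integer versus floating-point arithmetic. The following Python sketch mirrors that behavior outside of Cassandra. The function names are hypothetical, and the integer version only matches truncating division for non-negative inputs.

```python
def avg_same_type(values):
    """Average that keeps the integer input type, like avg() on an int column.

    Uses floor division, which equals truncation for non-negative inputs.
    """
    return sum(values) // len(values)


def avg_as_float(values):
    """Average after converting each value to float, like avg(CAST(val as float))."""
    return sum(float(v) for v in values) / len(values)


vals = [1, 2]
print(avg_same_type(vals))  # 1
print(avg_as_float(vals))   # 1.5
```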
If you’re thinking about using any of these features, grab the latest download from cassandra.apache.org and give it a test. Be sure to report any issues you find to the Cassandra JIRA.