Performance Tuning

Compression Performance and Ratio: The Final Frontier for Cassandra Node Density

This is the seventh post in my series on optimizing Apache Cassandra for maximum cost efficiency through increased node density. We’ve already covered streaming operations, compaction strategies, repair processes, query throughput optimization, garbage collection, and efficient disk access. Now, we’ll focus on the final major factor impacting node density: compression performance and ratio.

At a high level, these are the leading factors that impact node density:

  • Streaming Throughput
  • Compaction Throughput and Strategies
  • Various Aspects of Repair
  • Query Throughput
  • Garbage Collection and Memory Management
  • Efficient Disk Access
  • Compression Performance and Ratio (this post)
  • Linearly Scaling Subsystems with CPU Core Count and Memory

Why Compression Matters for Node Density

Compression is one of the most overlooked yet impactful factors affecting Cassandra node density. It directly influences:

cassandra compression performance tuning

Efficient Disk Access: Optimizing Storage for High-Density Cassandra Nodes

This is the sixth post in my series on optimizing Apache Cassandra for maximum cost efficiency through increased node density. We’ve already covered streaming operations, compaction strategies, repair processes, query throughput optimization, and garbage collection. Now, we’ll focus on one of the most fundamental aspects of database performance: efficient disk access.

For a quick refresher, these are the leading factors that impact node density:

  • Streaming Throughput
  • Compaction Throughput and Strategies
  • Various Aspects of Repair
  • Query Throughput
  • Garbage Collection and Memory Management
  • Efficient Disk Access (this post)
  • Compression Performance and Ratio
  • Linearly Scaling Subsystems with CPU Core Count and Memory

Why Disk Access Matters for Node Density

In my time operating and optimizing Cassandra clusters, I’ve found that efficient disk access becomes significantly more important as node density increases. When your nodes hold just 1-2 TB of data, you might barely notice inefficient disk access patterns. But push that to 10-20 TB per node, and these same inefficiencies turn into critical bottlenecks that can cripple your entire system.

cassandra disk IO storage

Analyzing Cassandra Performance with Flame Graphs

One of the challenges of running large-scale distributed systems is being able to pinpoint problems. It’s all too common to blame a random component (usually a database) whenever there’s a hiccup, even when there’s no evidence to support the claim. We’ve already discussed the importance of monitoring tools, graphing and alerting metrics, and using distributed tracing systems like Zipkin to correctly identify the source of a problem in a complex system.

cassandra performance tuning flame graphs