Efficient Disk Access: Optimizing Storage for High-Density Cassandra Nodes

This is the sixth post in my series on optimizing Apache Cassandra for maximum cost efficiency through increased node density. We’ve already covered streaming operations, compaction strategies, repair processes, query throughput optimization, and garbage collection. Now, we’ll focus on one of the most fundamental aspects of database performance: efficient disk access.

For a quick refresher, these are the leading factors that impact node density:

  • Streaming Throughput
  • Compaction Throughput and Strategies
  • Various Aspects of Repair
  • Query Throughput
  • Garbage Collection and Memory Management
  • Efficient Disk Access (this post)
  • Compression Performance and Ratio
  • Linearly Scaling Subsystems with CPU Core Count and Memory

Why Disk Access Matters for Node Density

In my years of operating and optimizing Cassandra clusters, I’ve found that efficient disk access becomes significantly more important as node density increases. When your nodes hold just 1-2TB of data, you might barely notice inefficient disk access patterns. But push that to 10-20TB per node, and these same inefficiencies transform into critical bottlenecks that can cripple your entire system.

I recently worked with a client who had scaled their data volume per node from 5TB to 15TB without adjusting their disk access patterns. The result? Compaction couldn’t keep pace with writes, read latencies tripled, and they experienced their first production outage in three years. This isn’t uncommon - disk efficiency becomes the defining factor in how much data you can pack onto each node.

The impact of disk efficiency on node density spans several interconnected dimensions:

  1. Throughput: The raw speed at which data flows to and from storage - at high densities, even a 20% throughput improvement can mean the difference between a stable system and one that’s constantly fighting to keep up
  2. Latency: How quickly your storage responds to requests - in dense nodes, latency spikes frequently cascade into application-level timeouts
  3. IOPS: The number of input/output operations per second your storage can handle - dense nodes can easily exhaust IOPS limits, especially on cloud storage such as AWS EBS, where IOPS are provisioned and rate limited
  4. Consistency: How predictable your performance remains under varying workloads - dense nodes tend to experience more dramatic performance swings
  5. Resource Contention: How different database operations compete for limited disk resources - in high-density deployments, operations like compaction, repair, and client reads frequently interfere with each other

When you optimize disk access patterns, the benefits compound dramatically. In my testing with the new read-ahead buffer in Cassandra 5.0.4 (which I’ll discuss in detail below), I’ve seen nodes comfortably handle 2-3x more data while maintaining or even improving performance. This directly translates to significant cost savings - one client reduced their node count from 30 to 12, saving over $250,000 annually in cloud infrastructure costs.

Understanding Cassandra’s Disk Usage Patterns

Before diving into optimizations, it’s crucial to understand how Cassandra uses storage:

Sequential Operations

  1. Memtable Flushes: Convert in-memory data to immutable SSTables on disk
  2. Compaction: Merge multiple SSTables into fewer, larger files
  3. Streaming: Transfer data between nodes during bootstrapping, decommissioning, etc.
  4. Hints Writing: Store hints for temporarily unavailable nodes
  5. Commit Log Writes: Append every mutation to the commit log for durability before it is applied to a memtable

Random Operations

  1. Point Lookups: Retrieve specific rows based on partition key
  2. Range Scans: Seek to the start of a range, then read contiguous data
  3. Index Lookups: Access secondary index data structures

As node density increases, both sequential and random operations become more numerous and resource-intensive, making optimization even more critical.

Diagnosing Disk Performance Issues

Before optimizing, you need to identify where your bottlenecks are. Here are the essential diagnostic tools:

1. Basic I/O Statistics

Use iostat to get a baseline understanding of disk activity:

iostat -xm 5

Look for sustained high %util (device saturation), elevated await (average per-request latency), and a growing queue depth (aqu-sz, or avgqu-sz on older sysstat versions). Keep in mind that %util alone can be misleading on NVMe devices, which service many requests in parallel.

2. Advanced I/O Analysis

For deeper insights, use tools like fio to benchmark your storage:

fio --name=random-read --ioengine=libaio --direct=1 --bs=4k --size=4G --numjobs=1 --runtime=240 --filename=/data/cassandra/test.fio --rw=randread --iodepth=64

This tests random read performance, critical for Cassandra’s read path.
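
To see how the same volume behaves under compaction-style sequential I/O, it helps to run a companion sequential test with a larger block size (the file path, size, and runtime are placeholders to adjust for your environment):

fio --name=seq-read --ioengine=libaio --direct=1 --bs=256k --size=4G --numjobs=1 --runtime=240 --filename=/data/cassandra/test.fio --rw=read --iodepth=16

Comparing the two results shows how much of your storage’s bandwidth is only reachable with large requests, which is exactly the gap the internal read-ahead buffer discussed below is designed to close.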

3. Flamegraphs for I/O Bottlenecks

Capture wall-clock flamegraphs to identify I/O bottlenecks:

./profiler.sh -d 60 -e wall -f /tmp/io-profile.html <cassandra-pid>

This shows where time is spent waiting for I/O operations.

4. Cassandra Metrics

Monitor Cassandra’s built-in metrics:

nodetool tpstats

Look for blocked tasks in read/write stages, which often indicate I/O bottlenecks.
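
It also helps to correlate thread pool pressure with compaction backlog and per-table latency; these standard nodetool commands cover that (substitute your own keyspace and table names):

# Pending and active compactions competing for disk bandwidth
nodetool compactionstats

# Per-table local read and write latency distributions
nodetool tablehistograms <keyspace> <table>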

Key Disk Optimizations for High-Density Nodes

Now that we know how to diagnose issues, let’s look at specific optimizations for high-density environments:

1. Read-Ahead Settings (CASSANDRA-15452)

One of the most impactful recent improvements in Cassandra 5.0.4 and above is the implementation of buffered reads with internal read-ahead (CASSANDRA-15452). This optimization addresses a fundamental inefficiency in how Cassandra reads data from disk.

The Problem

Prior to this improvement, Cassandra would:

  • Perform many small (~4KB) reads during compaction and range scans
  • Make excessive system calls to the filesystem
  • Waste IOPS, particularly in cloud environments where IOPS are often limited and costly
  • Achieve suboptimal throughput despite available bandwidth

In my testing, I observed that small 4KB reads would only achieve around 12MB/s on cloud storage, while the same storage could deliver 120MB/s with larger read sizes (256KB).
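
You can reproduce this effect on your own hardware with a quick direct-I/O comparison using dd; the file path is a placeholder (the fio test file from above works well), and both commands read the same 1 GB:

# Small reads, similar to Cassandra’s pre-5.0.4 behavior (1 GB in 4 KB requests)
dd if=/data/cassandra/test.fio of=/dev/null bs=4k count=262144 iflag=direct

# Large reads, similar to the internal read-ahead buffer (1 GB in 256 KB requests)
dd if=/data/cassandra/test.fio of=/dev/null bs=256k count=4096 iflag=direct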

The Solution: Internal Read-Ahead Buffer

Cassandra 5.0.4+ implements an internal read-ahead buffer that:

  • Pre-fetches larger blocks of data (similar to BufferedInputStream)
  • Reduces the number of system calls
  • Makes more efficient use of available IOPS
  • Uses POSIX_FADV_DONTNEED on compaction files to improve memory usage

For high-density nodes, this improvement is transformative, allowing compaction and range scans to operate much more efficiently even with huge volumes of data.

2. OS-Level Read-Ahead Settings

While Cassandra now has its own internal read-ahead, you should still optimize the OS-level read-ahead settings:

# For local SSD with random access patterns (values are in 512-byte sectors: 8 = 4 KB)
blockdev --setra 8 /dev/nvme0n1

# For EBS volumes with larger partitions (16 sectors = 8 KB)
blockdev --setra 16 /dev/xvdf

Note that blockdev --setra takes its value in 512-byte sectors, not kilobytes. Default read-ahead settings (commonly 128-256 KB) are optimized for sequential workloads but can severely degrade random read performance by reading unnecessary data. Lowering read-ahead matters most for point lookups, which don’t benefit from the internal read-ahead buffer.
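
To confirm what a device is actually using, and to sanity-check the sector math, read the current value back in both sectors and kilobytes (device names are examples):

# Current read-ahead in 512-byte sectors
blockdev --getra /dev/nvme0n1

# The same value expressed in KB
cat /sys/block/nvme0n1/queue/read_ahead_kb

Keep in mind that blockdev --setra does not survive a reboot, so persist the value with a udev rule or your configuration management tool.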

3. I/O Scheduler Selection

Choose the right I/O scheduler based on your storage type:

# For SSDs and NVMe drives
echo "none" > /sys/block/nvme0n1/queue/scheduler

# For traditional HDDs (on older, non-multiqueue kernels use "deadline")
echo "mq-deadline" > /sys/block/sda/queue/scheduler

Modern SSDs perform best with minimal scheduling interference, while spinning disks benefit from a scheduler that reorders and merges requests.
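
Reading the scheduler file back shows the available options with the active one in brackets, and a udev rule is one way to make the choice persistent across reboots (the device match below is an example to adapt):

# Active scheduler is shown in brackets, e.g. [none] mq-deadline kyber
cat /sys/block/nvme0n1/queue/scheduler

# Example content for /etc/udev/rules.d/60-io-scheduler.rules
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"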

4. Multiple Data Directories

Distribute load across multiple devices:

# In cassandra.yaml
data_file_directories:
    - /data1/cassandra/data
    - /data2/cassandra/data
    - /data3/cassandra/data

For high-density nodes, consider at least 4-8 data directories across different physical devices to maximize throughput.
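
JBOD-style layouts only help if the directories really sit on separate devices and fill at similar rates; a quick check like the following (using the example paths above) makes imbalances obvious:

# Confirm each directory maps to its own device and watch for skewed usage
df -h /data1/cassandra/data /data2/cassandra/data /data3/cassandra/data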

5. Commit Log Optimization

Place commit logs on dedicated high-performance storage:

# In cassandra.yaml
commitlog_directory: /commitlog/cassandra/commitlog

For write-heavy workloads, consider NVMe storage for commit logs to reduce write latency.

Cloud-Specific Storage Optimizations

For cloud-based deployments, specific optimizations yield significant benefits:

AWS EBS Optimization

If using AWS EBS volumes:

  1. Use io2 Volumes: For predictable IOPS requirements
  2. Right-size IOPS: Calculate based on expected peak load plus 20% headroom
  3. Enable EBS Optimization: Ensure dedicated bandwidth for EBS traffic
  4. Consider Nitro-based Instances: These provide better EBS performance

# Example AWS CLI command to create an optimized io2 volume
aws ec2 create-volume --volume-type io2 --iops 16000 --size 1000 --availability-zone us-west-2a
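
Existing gp3 and io2 volumes can also be tuned in place without detaching them, which is useful when a density increase outgrows the original IOPS sizing; the volume ID below is a placeholder:

# Raise provisioned IOPS on an existing volume using Elastic Volumes
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --iops 20000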

My testing with the new internal read-ahead buffer (CASSANDRA-15452) showed dramatic improvements on AWS EBS volumes. With the buffer, throughput increased from around 12MB/s to over 100MB/s for the same workload and cost.

GCP Persistent Disk Optimization

If using Google Cloud:

  1. Use SSD Persistent Disks: For consistent performance
  2. Size for IOPS: Remember that IOPS scale with disk size
  3. Consider Local SSDs: For temporary data like commit logs
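
As a concrete starting point, an SSD persistent disk can be created with gcloud; the disk name, size, and zone below are placeholders, and remember that the disk’s IOPS and throughput ceilings grow with its provisioned size:

gcloud compute disks create cassandra-data-1 --type=pd-ssd --size=2000GB --zone=us-central1-a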

Azure Disk Optimization

If using Azure:

  1. Use Premium SSD or Ultra Disk: For predictable performance
  2. Enable Host Caching: Read caching can improve performance for read-heavy workloads
  3. Right-size IOPS and Throughput: Ultra Disk allows independent sizing

Advanced Disk Optimization Techniques

For pushing the limits of node density, consider these advanced techniques:

1. RAID 0 for Data Directories

Stripe data across multiple devices for increased throughput:

mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

For cloud deployments, this can be particularly effective with local NVMe storage.
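
After building the array, verify that all members are active and note the chunk size so the filesystem can be aligned to it in the next step:

# Quick health overview of all md arrays
cat /proc/mdstat

# Detailed geometry, including the chunk size to match with su/sw below
mdadm --detail /dev/md0

Remember that RAID 0 provides no redundancy; Cassandra’s replication is what protects the data, so treat a member failure as a node replacement.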

2. Filesystem Tuning

Optimize your filesystem settings:

# For XFS (recommended); su should match the RAID chunk size and sw the number of devices
mkfs.xfs -f -K -d su=128k,sw=4 /dev/md0

# Mount with optimal options
mount -o noatime,nodiratime /dev/md0 /data/cassandra

Disabling access-time updates eliminates unnecessary metadata writes on every read. Older guides also recommend the nobarrier and discard mount options; nobarrier has been removed from modern XFS, and a periodic fstrim is generally preferable to online discard on busy nodes.
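
To make these options survive a reboot and still reclaim freed blocks on SSDs, pair a persistent fstab entry with the periodic trim timer available on systemd-based distributions (device and mount point match the examples above):

# /etc/fstab entry
/dev/md0  /data/cassandra  xfs  noatime,nodiratime  0 0

# Periodic TRIM instead of the discard mount option
systemctl enable --now fstrim.timer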

3. Disk Layout Strategies

For extremely high-density nodes:

  1. Hot/Cold Separation: Place frequently accessed data on faster storage
  2. Commit Log on Separate Physical Device: Eliminate contention with data files
  3. Dedicated Devices for High-Write Tables: Isolate tables with different access patterns

4. Direct I/O for Commit Logs

Bypassing the page cache for commit log writes reduces memory pressure and can improve write latency stability; Cassandra 5.0 adds native support for a direct I/O commit log access mode. Pair it with a sync strategy that matches your durability requirements:

# In cassandra.yaml
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

Periodic sync batches fsyncs to smooth write latency, while direct I/O keeps large sequential commit log writes from churning the page cache.
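
My understanding is that the direct I/O path is toggled by a dedicated commit log access-mode setting introduced in 5.0; treat the exact key below as an assumption and verify it against the cassandra.yaml shipped with your release:

# In cassandra.yaml (Cassandra 5.0+; confirm the option name for your version)
commitlog_disk_access_mode: direct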

Real-World Disk Optimization Example

Let me share a case study from a production environment where we optimized disk performance for high-density nodes, incorporating the new read-ahead buffer in Cassandra 5.0.4:

Before Optimization:

  • 10TB per node on AWS EBS gp3 volumes
  • Default read-ahead (256 KB)
  • Single data directory
  • Cassandra 4.1 without internal read-ahead buffer
  • Range scan throughput: ~15 MB/s
  • Compaction throughput: ~150 MB/s

After Optimization:

  • 20TB per node on AWS EBS io2 volumes
  • Optimized OS read-ahead (16 KB)
  • Four data directories across four io2 volumes in RAID 0
  • Upgraded to Cassandra 5.0.4 with internal read-ahead buffer
  • Range scan throughput: ~110 MB/s (7.3x improvement)
  • Compaction throughput: ~450 MB/s (3x improvement)

The key changes were:

  1. Upgrading to Cassandra 5.0.4 with the internal read-ahead buffer (CASSANDRA-15452)
  2. Switching from gp3 to io2 EBS volumes with properly sized IOPS
  3. Reducing OS-level read-ahead to 16KB
  4. Implementing RAID 0 across multiple EBS volumes
  5. Separating commit logs to dedicated high-IOPS storage

The benefit of these optimizations goes beyond just performance; they directly enable higher node density by allowing efficient processing of much larger data volumes per node.

Monitoring and Maintenance

For high-density nodes, ongoing monitoring is crucial:

  1. I/O Statistics: Track utilization, wait times, and queue depths
  2. Disk Space: Monitor free space across all data directories
  3. Performance Metrics: Watch for correlations between disk issues and application performance
  4. Regular Assessment: As data volume grows, storage configurations need periodic review

Consider implementing automated alerts for:

  • Sustained high disk utilization
  • Increasing I/O wait times
  • Uneven space usage across data directories
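
A minimal cron-able check along these lines can cover the disk-space and imbalance alerts until the same thresholds are wired into your monitoring stack; the paths and the 80% threshold are assumptions to adjust:

#!/usr/bin/env bash
# Warn when any Cassandra data directory crosses 80% usage
for dir in /data1/cassandra/data /data2/cassandra/data /data3/cassandra/data; do
    usage=$(df --output=pcent "$dir" | tail -n 1 | tr -dc '0-9')
    if [ "$usage" -ge 80 ]; then
        echo "WARNING: $dir is at ${usage}% capacity"
    fi
done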

Conclusion

Efficient disk access is fundamental to achieving high node density in Cassandra clusters. The new internal read-ahead buffer in Cassandra 5.0.4+ represents a major step forward in this area, addressing a long-standing inefficiency in how Cassandra interacts with storage.

By combining this improvement with the other strategies outlined in this post, you can significantly boost storage performance while increasing the amount of data each node can efficiently handle. The impact is multiplicative when combined with the other strategies we’ve covered in this series, enabling you to push node density higher than ever before and dramatically reduce infrastructure costs.

Remember that storage optimization is highly dependent on your specific hardware or cloud provider. Always test changes in a staging environment before applying them to production, and monitor closely after implementation.

In our next post, we’ll explore how compression performance and ratio optimizations enable higher node density and further reduce operational costs.

If you found this post helpful, please consider sharing it with your network. I'm also available to help you be successful with your distributed systems! Please reach out if you're interested in working with me, and I'll be happy to schedule a free one-hour consultation.