I used to think Wikipedia was pretty cool. The idea of being able to collaborate on a topic and create an article together sounds wonderful. Today, I say, it sucks. Want to know why? I edited an article, updating it with correct information, and my edit was rejected within 5 minutes. ** 00:01, 16 August 2006 (hist) (diff) Answerbag (rv - Vandalism. Jonathan Haddad is not listed on the site under staff.
Well, I’m not sure who’s going to be reading this, but if you are, and you are curious about rustyrazorblade.com, here’s the breakdown. I am a web developer with a BS in computer science. What I have to share is a combination of what I’ve learned in school and what I’ve learned developing and optimizing several web sites. I used to work at Intermix (creators of MySpace) on grab.com, a site for casual gaming.
One of the challenges of running large-scale distributed systems is being able to pinpoint problems. It’s all too common to blame a random component (usually a database) whenever there’s a hiccup, even when there’s no evidence to support the claim. We’ve already discussed the importance of monitoring tools, graphing and alerting metrics, and using distributed tracing systems like Zipkin to correctly identify the source of a problem in a complex system.
This is the third post in our series on performance tuning with Apache Cassandra. In our first post, we discussed how we can use Flame Graphs to visually diagnose performance problems. In our second post, we discussed JVM tuning, and how the different JVM settings can have an effect on different workloads. In this post, we’ll dig into a table-level setting which is usually overlooked: compression. Compression options can be specified when creating or altering a table, and compression is enabled by default if no options are specified.
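As a rough illustration of what setting compression on an existing table looks like, here is a minimal sketch that builds the CQL statement as a string. The helper name `compression_statement`, the keyspace and table names, and the option values are assumptions made up for this example; the option keys `class` and `chunk_length_in_kb` are the ones used by Cassandra 3.0 and later, and older versions use different names.

```python
def compression_statement(keyspace, table,
                          compressor="LZ4Compressor",
                          chunk_length_in_kb=64):
    """Build an ALTER TABLE statement that sets table-level compression.

    Hypothetical helper for illustration only. The values passed in are
    not validated or escaped, so only use trusted input.
    """
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compression = {{'class': '{compressor}', "
        f"'chunk_length_in_kb': {chunk_length_in_kb}}};"
    )


if __name__ == "__main__":
    # Example with made-up keyspace and table names.
    print(compression_statement("ks", "events"))
```

The resulting string can be run in cqlsh or passed to a driver session. The chunk length shown here is only a placeholder, not a recommendation.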
One of the big challenges people face when starting out with Cassandra and time series data is understanding how their write workload will affect the cluster. Writing too quickly to a single partition can create hot spots that limit your ability to scale out. Partitions that get too large can lead to issues with repair, streaming, and read performance. Reading from the middle of a large partition carries a lot of overhead, and results in increased GC pressure.
In this post we’ll explore a new compaction strategy available in Apache Cassandra. We’ll dig into its use cases and limitations, and share our experiences of using it with various production clusters. Time Window Compaction Strategy: how does it work and when should you use it? Cassandra uses a Log Structured Merge Tree engine, which allows high write throughput by flushing immutable chunks of data, in the form of SSTables, to disk and deferring consistency on the read phase.
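To make the write path described above a little more concrete, here is a toy sketch of the log-structured merge idea: writes land in an in-memory table, full tables are flushed as immutable sorted chunks, and reads reconcile versions by picking the newest timestamp. The class `TinyLSM`, its `memtable_limit` parameter, and the logical clock are all hypothetical names invented for this example. This is not how Cassandra is implemented, and it leaves out compaction, tombstones, and on-disk storage entirely.

```python
class TinyLSM:
    """Toy log-structured merge store (illustrative only)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}      # key -> (timestamp, value), mutable
        self.sstables = []      # flushed chunks, oldest first, never modified
        self.memtable_limit = memtable_limit
        self.clock = 0          # logical write timestamp

    def write(self, key, value):
        # Writes only touch the in-memory table, which keeps them cheap.
        self.clock += 1
        self.memtable[key] = (self.clock, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into an immutable, key-sorted chunk.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        # Reconciliation is deferred to read time: collect every version
        # of the key and return the one with the newest timestamp.
        versions = []
        if key in self.memtable:
            versions.append(self.memtable[key])
        for sstable in self.sstables:
            for k, versioned in sstable:
                if k == key:
                    versions.append(versioned)
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])[1]


if __name__ == "__main__":
    db = TinyLSM(memtable_limit=2)
    db.write("a", 1)
    db.write("b", 2)   # memtable is full, so it is flushed
    db.write("a", 3)   # newer version of "a" stays in the memtable
    print(db.read("a"), db.read("b"), len(db.sstables))
```

Because a key can have versions in several chunks, reads get more expensive as chunks accumulate, which is the cost that compaction exists to keep in check.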
In our first post about TimeWindowCompactionStrategy, Alex Dejanovski discussed use cases and the reasons for its introduction in 3.0.8 as a replacement for DateTieredCompactionStrategy. In our experience switching production environments storing time series data to TWCS, we have seen the performance of many production systems improve dramatically. The examples Alex gives for making use of TWCS work great for recent versions of Cassandra. However, a significant number of users are still using 2.
Compaction in Apache Cassandra isn’t usually the first (or second) topic that gets discussed when it’s time to start optimizing your system. Most of the time we focus on data modeling and query patterns. An incorrect data model can turn a single query into hundreds of queries, resulting in increased latency, decreased throughput, and missed SLAs. If you’re using spinning disks, the problem is magnified by time-consuming disk seeks.