cassandra

Introduction to Spark & Cassandra

Jan 2, 2015
I’ve been messing with Apache Spark quite a bit lately. If you aren’t familiar, Spark is a general-purpose engine for large-scale data processing. Initially it comes across as simply a replacement for Hadoop, but that would be selling it short. Big time. In addition to bulk processing (goodbye MapReduce!), Spark includes a SQL engine; stream processing via Kafka, Flume, ZeroMQ; machine learning; and graph processing. Sounds awesome, right? That’s because it is, babaganoush.

READ MORE
Diagnosing Problems in Production Webinar Posted

Nov 20, 2014
The webinar from Nov 18, Diagnosing Problems in Production, has been posted to YouTube. I’ve embedded it at the bottom of this post. The webinar is an extended version of the talk I gave at the Cassandra Summit with Blake Eggleston, which I recapped in my blog as well. I had almost double the time to talk in the webinar and so I was able to go into more detail

READ MORE
Getting Started With Pandas and HDF5

Nov 15, 2014
Yesterday I was pulling down some stock data from Yahoo, with the goal of building out a machine learning training set using Spark and Cassandra. If you haven’t tried Cassandra yet, it’s a database built for high availability and linear scalability. I’ve got an intro talk up here. Spark is another Apache project that kicks Cassandra into overdrive by providing a framework for batch analytics, streaming, and machine learning. On the way is support for graph operations, which makes me giddy.

READ MORE
Cassandra Summit Recap: Diagnosing Problems in Production

Sep 18, 2014
Introduction: Last week at the Cassandra Summit I gave a talk with Blake Eggleston on diagnosing performance problems in production. We spoke to about 300 people for about 25 minutes followed by a healthy Q&A session. I’ve expanded on our presentation to include a few extra tools, screenshots, and more clarity on our talking points. There’s finally a lot of material available for someone looking to get started with Cassandra. There are several introductory videos on YouTube by both me and Patrick McFadin as well as videos on time series data modeling.

READ MORE
CQLEngine Intro Posted on YouTube

Jun 26, 2014
READ MORE
CQLEngine now using the Python Native Driver

Jun 24, 2014
I’m happy to announce that cqlengine is now using the Python Native Driver. For the most part, this should be a trivial upgrade. See the notes below on upgrading. The Good News: Significantly less code to maintain in cqlengine itself. We no longer need to maintain connection pools or deal with failover, dead servers, server discovery, or server removal. The native driver multiplexes queries over each socket, so fewer sockets stay open. Notifications can be sent back to the client from the server.

READ MORE
No Downtime Database Migrations

Jun 23, 2014
Introduction: Back at my last job, we successfully migrated from MongoDB to Cassandra without any downtime. We did two webinars with DataStax at the time (I am now a DataStax employee). Our first webinar was a general overview of the migration. In the second, we covered some of the lessons we learned after being in production with Cassandra for a while. We touched on our migration process, but didn’t get deep into the details.

READ MORE
Cassandra FAQ: Can I start with a Single Node?

Sep 18, 2013
A question frequently asked on the mailing list by developers new to Cassandra is whether it’s possible to start with a single node and scale up as their needs grow. This seems to come most often from people familiar with MySQL, Mongo, or another database which uses replication to scale reads. The short answer to this question is yes, you can absolutely run a one-node cluster. However, it’s important to understand the caveats of doing so.

READ MORE
What's new in cqlengine 0.7

Aug 31, 2013
Recently we released version 0.7 of cqlengine, the Python object mapper for CQL3. We’ve been steadily moving towards full support of all of CQL3 for both queries and table configuration. This post will outline the new features and provide examples of how to use them. Counters: With counter support finally included, it’s now possible to create and use tables with counter columns. They are exposed to the Python application as simple integers, and changes to their values will be sent as deltas to Cassandra.

READ MORE
Cassandra, CQL3, and Time Series Data with timeuuid

Oct 2, 2012
Cassandra is a BigTable-inspired database created at Facebook. It was open sourced several years ago and is now an Apache project. In Cassandra, a row can be very wide and is identified by a key. Think of it as more like a giant array. The data is stored on disk sorted by the key you pick, meaning if you pick the right sort option and key you can have some really fast queries.

READ MORE