GETTING STARTED WITH PANDAS AND HDF5

Yesterday I was pulling down some stock data from Yahoo, with the goal of building out a machine learning training set using Spark and Cassandra. If you haven’t tried Cassandra yet, it’s a database built for high availability and linear scalability. I’ve got an intro talk up here. Spark is another Apache project that kicks Cassandra into overdrive by providing a framework for...

NEW HOUSE, NEW DESK

When I moved out of my last place I decided it was time for a grown-up desk. I left behind a beat-down Ikea that I had used for close to a decade; I think it has more than served its purpose. Since I had recently gotten into woodworking, I figured this would be the perfect opportunity to build something awesome. I wish I had thought to take a picture of the old desk in all its (lack...

21 WAYS TO MINIMIZE EMPLOYEE RETENTION

It’s important to be able to maximize turnover and confusion while minimizing employee retention. This is by no means an exhaustive list, but it will, without a doubt, be successful, unlike your business. Eliminate all privacy. Employees should feel like they’re being watched at all times. Ideally utilize an open floor plan, which can maximize distractions. If an open room isn’t...

CASSANDRA SUMMIT RECAP: DIAGNOSING PROBLEMS IN PRODUCTION

Introduction Last week at the Cassandra Summit I gave a talk with Blake Eggleston on diagnosing performance problems in production. We spoke to about 300 people for about 25 minutes followed by a healthy Q&A session. I’ve expanded on our presentation to include a few extra tools, screenshots, and more clarity on our talking points. There’s finally a lot of material available for...

SAY HELLO TO MEATBOT

What is Meatbot? Meatbot is a HipChat bot for managing status updates for our growing team of Evangelists at DataStax. It’s built in Python 2.7, utilizing the Will library. The status updates are stored in Cassandra using cqlengine. Yep, it’s up on GitHub. There are a few simple commands. First, you tell Meatbot about each project you work on. Once you’ve got your projects,...

PYTHON FOR PROGRAMMERS

When I started learning Python, there were a few things I wish I had known about. It took a while to learn them all. This is my attempt to compile the highlights into a single post. This post is targeted towards experienced programmers just getting started with Python who want to skip the first few months of researching the Python equivalents of tools they are already used to. The sections on...

THE MYTH OF SCHEMA-LESS

I have grown increasingly frustrated with the world as people have become more and more convinced that “schema-less” is actually a feature to be proud of (or even exists). For over ten years I’ve worked with close to a dozen different databases in production and have not once seen “schemaless” truly manifest. What’s extremely frustrating is seeing this from...

CQLENGINE INTRO POSTED ON YOUTUBE

CQLENGINE NOW USING THE PYTHON NATIVE DRIVER

I’m happy to announce that cqlengine is now using the Python Native Driver. For the most part, this should be a trivial upgrade. See the notes below on upgrading. The Good News: Significantly less code to maintain in cqlengine itself. We no longer need to maintain connection pools or deal with failover, dead servers, server discovery, or server removal. Native driver multiplexes queries over each...

NO DOWNTIME DATABASE MIGRATIONS

Introduction Back at my last job, we successfully migrated from MongoDB to Cassandra without any downtime. We did two webinars with DataStax at the time (I am now a DataStax employee). Our first webinar was a general overview of the migration. In the second, we covered some of the lessons we learned after being in production with Cassandra for a while. We touched on our migration process, but...