Just wanted to let everyone know I’m going to be doing a Google Hangout on Air on Thursday, 2pm PT / 5PM ET on Python Performance Profiling. I’m going to be covering several tools and exposing a variety of ways of understanding your applications. You can RSVP on the event page.
I’ll be answering Q&A along the way so be sure to have your questions ready and upvote the ones you find useful!
Posts
- I’ve been messing with Apache Spark quite a bit lately. If you aren’t familiar, Spark is a general purpose engine for large scale data processing. Initially it comes across as simply a replacement for Hadoop, but that would be selling it short. Big time. In addition to bulk processing (goodbye MapReduce!), Spark includes: SQL engine Stream processing via Kafka, Flume, ZeroMQ Machine Learning Graph Processing Sounds awesome, right? That’s because it is, babaganoush.
- The webinar from Nov 18, Diagnosing Problems in Production, has been posted to YouTube. I’ve embedded it at the bottom of this post. The webinar is an extended version of the talk I gave at the Cassandra Summit with Blake Eggleston, which I recapped in my blog as well. I had almost double the time to talk in the webinar and so I was able to go into more detail
- Yesterday I was pulling down some stock data from Yahoo, with the goal of building out a machine learning training set using Spark and Cassandra. If you haven’t tried Cassandra yet, it’s a database built for high availability and linear scalability. I’ve got a intro talk up here. Spark is another apache project that kicks Cassandra into overdrive by providing a framework for batch analytics, streaming, and machine learning. On the way is support for graph operations which makes me giddy.
- When I moved out of my last place I decided it was time for a grown up desk. I left behind a beat down Ikea that I had used for close to a decade, I think it has more than served it’s purpose. Since I had recently gotten into woodworking, I figured this would be the perfect opportunity to build something awesome. I wish I had thought to take a picture of the old desk in all it’s (lack of) glory, but alas, you’ll just have to imagine a desk that’s on the verge of collapse.
- It’s important to be able to maximize turnover and confusion while minimizing employee retention. This is by no mean an exhaustive list, but it will, without a doubt, be successful, unlike your business. Eliminate all privacy. Employees should feel like they’re being watched at all times. Ideally utilize an open floor plan, which can maximize distractions. If an open room isn’t available, cram as many people into small rooms as humanly possible.
- Introduction Last week at the Cassandra Summit I gave a talk with Blake Eggleston on diagnosing performance problems in production. We spoke to about 300 people for about 25 minutes followed by a healthy Q&A session. I’ve expanded on our presentation to include a few extra tools, screenshots, and more clarity on our talking points. There’s finally a lot of material available for someone looking to get started with Cassandra. There’s several introductory videos on YouTube by both me and Patrick McFadin as well as videos on time series data modeling.
- What is Meatbot? Meatbot is a HipChat bot for managing status updates for our growing team of Evangelists at DataStax. It’s built in Python 2.7, utilizing the Will library. The status updates are stored in Cassandra using cqlengine. Yep, it’s up on github. There’s a few simple commands. First, you tell Meatbot about each project you work on. Once you’ve got your projects, you can list them with lsproject or delete them with rmproject.
- When I started learning Python, there’s a few things I wish I had known about. It took a while to learn them all. This is my attempt to compile the highlights into a single post. This post is targeted towards experienced programmers just getting started with Python who want to skip the first few months of researching the Python equivalents of tools they are already used to. The sections on package management and standard tools will be helpful to beginners as well.
- I have grown increasingly frustrated with the world as people have become more and more convinced that “schema-less” is actually a feature to be proud of (or even exists). For over ten years I’ve worked with close to a dozen different databases in production and have not once seen “schemaless” truly manifest. What’s extremely frustrating is seeing this from vendors, who should really know better. At best, we should be using the description “provides little to no help in enforcing a schema” or “you’re on your own, good luck.