The last few months have been a nonstop whirlwind of traveling and speaking. I’ve been very fortunate to have spoken at Strata New York, given a couple of sessions at the Cassandra Summit, and even had a few minutes on stage for the Cassandra Summit keynote (I’m at minute 22 with Luke Tillman). When I have time, I end up hacking on random projects. For example, a couple of months ago I was working on a recommendation engine for KillrVideo. I also end up playing with bleeding-edge builds of Cassandra and Spark.
The downside to all this hacking is that I tend to move between my projects very frequently. It’s time for a new long-term project based on a past passion.
I’d like to introduce my newest project, KillrAnswers. KillrAnswers is a microservice written primarily in Rust, using DataStax Enterprise for storage and search.
If you’ve been programming for a while you’ve probably seen one of the StackOverflow sites. The format is simple at first glance - people ask questions, other people answer. Almost 10 years ago I worked on Answerbag.com, another Q&A site not specific to programmers. It turns out there’s plenty of complexity in Q&A, such as leaderboards, managing nested categories, and calculating various statistics. This project is my attempt to create a highly scalable, easy to deploy Q&A system that can be added to any application.
As I mentioned above, KillrAnswers is written as a microservice primarily in Rust. I’ll be using Python for the admin dashboards. For analytics, streaming, and machine learning I’ll be using PySpark. Graphs and visualizations will be rendered using Pandas, Matplotlib and Seaborn. I’ll likely use Bottle to serve the dashboard.
I’ve chosen Rust because it’s a great mix of safety and performance. I currently develop against nightlies which have been remarkably stable.
Instead of REST calls built on a web server, I’ve opted to use Cap’n Proto. The Rust RPC library is on GitHub. I’m going down this route because it makes a lot more sense to have typed remote calls than something like REST + JSON, where the contract between client and server is typically worked out through trial and error. Cap’n Proto lets me specify a schema and interface, and easily generate client libraries in different languages.
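To give a feel for what "typed remote calls" buys you, here is a minimal sketch in plain Rust. It does not use Cap'n Proto or its generated code at all; the trait stands in for the interface a schema would describe, and every name in it (`AskQuestionRequest`, `AskQuestionResponse`, `QuestionService`, `InMemoryQuestions`) is hypothetical and made up for illustration, not taken from KillrAnswers.

```rust
// Hypothetical request and response types. None of these names come from
// the KillrAnswers codebase or from the Cap'n Proto API.
#[derive(Debug, Clone, PartialEq)]
struct AskQuestionRequest {
    user_id: u64,
    category_id: u32,
    title: String,
    body: String,
}

#[derive(Debug, Clone, PartialEq)]
struct AskQuestionResponse {
    question_id: u64,
}

// Stand-in for the interface a schema would describe: one call with a
// typed argument and a typed result.
trait QuestionService {
    fn ask_question(&mut self, req: AskQuestionRequest) -> Result<AskQuestionResponse, String>;
}

// In-memory implementation, used only to exercise the trait locally.
struct InMemoryQuestions {
    next_id: u64,
    stored: Vec<(u64, AskQuestionRequest)>,
}

impl InMemoryQuestions {
    fn new() -> Self {
        InMemoryQuestions { next_id: 1, stored: Vec::new() }
    }
}

impl QuestionService for InMemoryQuestions {
    fn ask_question(&mut self, req: AskQuestionRequest) -> Result<AskQuestionResponse, String> {
        // Reject obviously invalid input before assigning an id.
        if req.title.trim().is_empty() {
            return Err("title must not be empty".to_string());
        }
        let id = self.next_id;
        self.next_id += 1;
        self.stored.push((id, req));
        Ok(AskQuestionResponse { question_id: id })
    }
}

fn main() {
    let mut service = InMemoryQuestions::new();
    let response = service.ask_question(AskQuestionRequest {
        user_id: 42,
        category_id: 7,
        title: "How do I model nested categories?".to_string(),
        body: "Looking for a schema that scales.".to_string(),
    });
    println!("{:?} ({} stored)", response, service.stored.len());
}
```

A misspelled field or a wrong type here is a compile error rather than a runtime surprise, which is the property I'm after. With a real schema, the same guarantee extends to generated clients in other languages.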
For storage, I’m using DataStax Enterprise 4.8, which is based on Cassandra 2.1. The goal is to be able to handle potentially millions of requests per second across multiple data centers. I’ll be utilizing the search integration for question, answer, and category search, as well as question similarity search using term vectors.
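For readers who haven't run into term vectors before, the rough idea behind similarity search is to turn each question into a bag of weighted terms and compare those bags. The sketch below is only an illustration of that idea using raw term counts and cosine similarity; it is not how DSE Search computes similarity, and the function names `term_vector` and `cosine_similarity` are my own.

```rust
use std::collections::HashMap;

// Build a simple term-frequency vector: lowercase each token and count it.
// Real search engines apply analyzers, stop words, and weighting schemes
// that this sketch deliberately leaves out.
fn term_vector(text: &str) -> HashMap<String, f64> {
    let mut counts = HashMap::new();
    for token in text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
    {
        *counts.entry(token.to_lowercase()).or_insert(0.0) += 1.0;
    }
    counts
}

// Cosine similarity between two sparse term vectors.
// Returns 0.0 when either vector is empty.
fn cosine_similarity(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a
        .iter()
        .map(|(term, weight)| weight * b.get(term).copied().unwrap_or(0.0))
        .sum();
    let norm_a = a.values().map(|w| w * w).sum::<f64>().sqrt();
    let norm_b = b.values().map(|w| w * w).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn main() {
    let q1 = term_vector("How do I reset my password?");
    let q2 = term_vector("I forgot my password, how can I reset it?");
    let q3 = term_vector("Best pizza toppings");
    println!("q1 vs q2: {:.3}", cosine_similarity(&q1, &q2));
    println!("q1 vs q3: {:.3}", cosine_similarity(&q1, &q3));
}
```

Questions that share many terms score close to 1, and questions with no terms in common score 0, which is enough to surface likely duplicates when someone asks a new question.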
I’ll be following up on each of these topics with dedicated in-depth blog posts as the project grows and starts to take shape.
KillrAnswers is open source and available on GitHub. There’s currently not much there, but in the weeks to come I’m expecting to get quite a bit added. Enjoy!