One of the challenges of learning a new database is getting used to a new way of modeling data. PostgreSQL looks different from Redis, which looks different from a graph database, which looks different again from Cassandra.

Cassandra Dataset Manager aims to cut down the frustrating trial and error involved in learning proper data modeling techniques for Apache Cassandra and DataStax Enterprise. It does this by providing curated data models designed by professionals with years of experience. Think of it as a package manager for Cassandra data models and sample data.

First, install and start Cassandra or DataStax Enterprise. Cassandra Dataset Manager (abbreviated cdm) is a Python package and can be installed from PyPI as follows:

pip install cassandra-dataset-manager

Once cdm is installed, you’ll have a new command-line tool, appropriately named cdm. Use it to update the local dataset list and install the movielens-small dataset:

cdm update
cdm install movielens-small
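
If you want to script these two steps, for example when provisioning a development machine, a minimal Python sketch could look like the following. It only shells out to the two cdm commands shown above. The function names `cdm_commands` and `install_dataset` are made up for this illustration and are not part of cdm.

```python
import shutil
import subprocess


def cdm_commands(dataset):
    """Return the two cdm invocations shown above as argument lists."""
    return [["cdm", "update"], ["cdm", "install", dataset]]


def install_dataset(dataset="movielens-small"):
    """Run the commands in order, stopping at the first failure."""
    # Fail early with a readable message if the cdm tool is not installed.
    if shutil.which("cdm") is None:
        raise RuntimeError("cdm not found on PATH; install it with pip first")
    for cmd in cdm_commands(dataset):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Calling `install_dataset()` with Cassandra running should have the same effect as typing the two commands by hand.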

You should see a bit of helpful text showing progress as the data model is loaded and sample data inserted into the database.

Open the CQL shell and do the following:

cqlsh -k movielens_small

cqlsh:movielens_small> desc movies;

CREATE TABLE movielens_small.movies (
    id uuid PRIMARY KEY,
    avg_rating float,
    genres set<text>,
    name text,
    release_date date,
    url text,
    video_release_date date
)

cqlsh:movielens_small> select * from movies limit 1;

You can see the data model of the movies table as well as some of the test data. Each project will describe the data model in its README so it’s easy to understand what you’re looking at. Tutorials written in Jupyter or Zeppelin notebooks will be available for each dataset, showing different ways of working with the underlying data.
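
The same rows can be read from application code. Here is a minimal sketch, assuming the DataStax Python driver is installed (`pip install cassandra-driver`) and a node is listening on 127.0.0.1. The column names come from the table definition above. The helper names `format_movie` and `fetch_movies` are hypothetical and not part of cdm.

```python
def format_movie(row):
    """Render one movies row as a single line of text."""
    # genres is a set<text> column; an empty set may come back as None.
    genres = ", ".join(sorted(row.genres or []))
    rating = "n/a" if row.avg_rating is None else f"{row.avg_rating:.2f}"
    return f"{row.name} [{genres}] {rating}"


def fetch_movies(limit=5, host="127.0.0.1", keyspace="movielens_small"):
    """Fetch a few rows from the movies table (needs a running node)."""
    # Imported here so format_movie works without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster([host])
    try:
        session = cluster.connect(keyspace)
        # int() keeps the limit numeric before it is placed in the query.
        query = f"SELECT name, genres, avg_rating FROM movies LIMIT {int(limit)}"
        return list(session.execute(query))
    finally:
        cluster.shutdown()
```

With the dataset installed, `for row in fetch_movies(): print(format_movie(row))` prints one line per movie. The driver returns rows as named tuples by default, which is why the columns are read as attributes.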

Documentation is coming along and will receive the majority of my attention over the next week.

CDM is developed under the permissive Apache License and is fully open source, hosted on GitHub.