Introduction

Building a web app that relies on database calls with CPython (the standard Python distribution) is pretty easy, but it can suffer from performance problems. Python itself isn’t particularly fast, and in 2.x, its concurrency story is especially weak.

For starters, there’s the dreaded GIL. The GIL prevents us from taking advantage of multi-core systems, so even if we try to use threads we’re missing out on their main performance benefit, which is parallel computation.

Some languages that lack concurrency via threading, such as JavaScript, have used asynchronous I/O to improve performance in applications that frequently hit the disk or network. Sadly, Python 2.7 has no native support for async I/O, which makes us angry.

In this post I’ll show you how to maximize the performance of your Python applications by making asynchronous network calls to Apache Cassandra, a distributed database that scales to hundreds of nodes.

How does this affect us in the real world?

First off, when we talk to a database from our application, a lot of time gets spent in I/O. In my experience, that’s where most of the time goes in most applications. If our app is just sitting around waiting on I/O, that time is wasted, and we’re not utilizing the CPU at all.

Python’s threading, the GIL, and the lack of async support in 2.7 can be very frustrating. Fortunately, there have been some great projects over the years that have either patched async calls into Python or shipped as an entirely separate distribution. For instance, two alternative distributions, IronPython and Jython, don’t have the GIL and its associated problems. There’s also Stackless Python, which, among other features, manages microthreads at the interpreter level, avoiding heavyweight OS threads.

My favorite part of Stackless is its microthreading. Fortunately, someone else liked the idea enough to bring a similar concept to CPython: green threads, available in the greenlet library. Someone else loved greenlets enough to write an awesome wrapper for them called gevent that makes about a million things easier.

At a simple level, gevent lets us easily create green threads that will automatically yield control to other green threads when an I/O event happens. This lets us take full advantage of a single CPU. We can run multiple application instances, 1 per core, to take advantage of multiple CPUs. Good times.

In order to use gevent, we have to import it as early as possible and call monkey.patch_all() to make sure all our I/O calls yield automatically. Here are our first two lines:

from gevent import monkey
monkey.patch_all()
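Before we involve a database, here’s a small self-contained sketch of that cooperative yielding (no Cassandra required, and the names are mine): gevent.sleep() yields control just like a monkey-patched I/O call would.

```python
import gevent

results = []

def worker(name, delay):
    # gevent.sleep() yields to other green threads,
    # exactly like a patched I/O call would while waiting
    gevent.sleep(delay)
    results.append(name)

# The greenlet with the shorter "I/O wait" finishes first,
# even though it was spawned second
slow = gevent.spawn(worker, "slow", 0.02)
fast = gevent.spawn(worker, "fast", 0.01)
gevent.joinall([slow, fast])

print(results)  # ['fast', 'slow']
```

If these were OS threads sleeping, we’d get the same ordering, but here a single thread services both workers by switching at each yield point.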

Let’s take a look at how to take advantage of two features of the gevent library when working with cqlengine, the object mapper for Cassandra (now included with the driver).

First, let’s build out a simple database model. We’ll go with a simple User table, having an id and name fields. We’ll also sync it to the database as part of our test script. For brevity, I’m leaving imports and certain basic housekeeping out of the blog post, but you can reference the final script to see how everything is done.

class User(Model):
    __keyspace__ = "test"
    __table_name__ = "user"

    user_id = UUID(primary_key=True)
    first_name = Text()
    last_name = Text()

create_keyspace_simple("test", replication_factor=1)
drop_table(User)
sync_table(User)

Creating a user row with synchronous calls is trivial; let’s take a look. I’m using the sweet Faker library to generate fake names; I’ve initialized it at the top of my test script.

user = User.create(user_id=uuid4(),
            first_name=faker.first_name(),
            last_name=faker.last_name())
print "user: ", user

We get back our new user:

user:  User <user_id=ceffe0f7-9fe9-4d02-9408-486e5f92f5d4>

What if we want to do this asynchronously, without waiting for the result? We can use gevent.spawn(), which returns a Greenlet that acts as a future: we can get its result later on. Let’s take a look at this code:

future = gevent.spawn(User.create, user_id=uuid4(),
                                   first_name=faker.first_name(),
                                   last_name=faker.last_name())
print "Future: ", future
print "Future result: ", future.get()
Future:  <Greenlet at 0x10c4fb910: <bound method ModelMetaClass.create of <class '__main__.User'>>(first_name=u'Christian', last_name=u'Nolan', user_id=UUID('3b833b09-906b-4db8-ad0b-32a4065062f2'))>

Future result:  User <user_id=3b833b09-906b-4db8-ad0b-32a4065062f2>

Here you can see a call to spawn returns a future, which we can block on to get a result. Pretty useful. What if we want to create a bunch of records? How do we manage our green threads? You can imagine it’s a little bit more housekeeping to keep a bunch of futures around and wait on all of them to return. This is where the gevent.pool module comes in handy. I’m using a simple Timer class I’ve defined to automatically time a block of code and print the elapsed time. Let’s take a look at creating 1000 rows using synchronous code:

num = 1000
with Timer("create users sync"):
    for x in range(num):
        # print "Creating user {}".format(x)
        User.create(user_id=uuid4(),
                    first_name=faker.first_name(),
                    last_name=faker.last_name())

The output:

create users sync: 0.686779975891s
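The Timer class used above isn’t shown in the post; a minimal sketch of a context manager that produces that kind of output might look like this (my implementation; the one in the final script may differ):

```python
from __future__ import print_function
import time

class Timer(object):
    """Context manager that prints the elapsed time for a block of code."""
    def __init__(self, label):
        self.label = label

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.time() - self.start
        print("{}: {}s".format(self.label, self.elapsed))

# timing a block that sleeps for ~10ms
with Timer("sleep for 10ms"):
    time.sleep(0.01)
```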

In the above example, I created a thousand users synchronously, waiting on the result of each local database call. In the next example, I’ll tweak my code to use a Pool. A pool is cool because it manages my green threads for me behind a functional concept, map(): applying a function to each element of a list and returning a new list of the results.

pool = Pool(100)
with Timer("create users async"):
    def create_user(i):
        User.create(user_id=uuid4(),
                    first_name=faker.first_name(),
                    last_name=faker.last_name())

    pool.map(create_user, range(num))

create users async: 0.630080938339s

Wait a sec, what happened here? Well, I’m running this test on my laptop, so we’re really not seeing much network latency. The calls to Cassandra take such a small amount of time that we see almost no performance benefit. I’ll rerun my tests, this time adding a time.sleep call before creating each user. gevent.monkey patches time.sleep for us, so any time that function is called, control yields to another green thread. Here are the results of my test using four different latency times:

[jtable]
Latency (s),Sync Time (s),Async Time w/ Pool (s)
.001,1.9114,0.676971
.005,6.4570,0.63234
.010,11.89398,0.62965
.100,104.49747,1.27341
[/jtable]

You can see how adding latency drastically affects the synchronous loop, but has much less of an effect on the Pool. In fact, increasing the latency by 100x only increased the total pool time by about 2x.
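You can reproduce the same effect without Cassandra at all. Here’s a self-contained sketch (names and numbers are illustrative) in which 100 green threads each “wait” 10ms, yet the whole batch completes in roughly the time of a single wait rather than the 1 second a synchronous loop would take:

```python
from gevent import monkey
monkey.patch_all()

import time
from gevent.pool import Pool

def fake_query(i):
    # time.sleep is monkey-patched, so this yields to
    # other green threads instead of blocking the process
    time.sleep(0.01)
    return i

pool = Pool(100)
start = time.time()
results = pool.map(fake_query, range(100))
elapsed = time.time() - start

print("{} queries in {:.3f}s".format(len(results), elapsed))
```

With a Pool of 100, all the sleeps overlap; shrink the pool size and you’ll watch the total time climb back toward the synchronous figure.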

Why not move to Python 3.x?

Honestly, I lack a strong motivation. I don’t have anything against Python 3, and there are good reasons to switch over, but there are a lot of apps on 2.7 that can’t move over yet, so the blind answer “move to 3.x” just isn’t in the cards for a lot of folks. I know people who are still on Python 2.6, which is ancient by today’s standards but still widely used.

Fortunately, most of the top 200 packages are Python 3 compatible, so it’s possible we’ll soon see teams shift to using Python 3 by default. gevent is listed as Python 3 compatible, but I believe it still uses the libev event loop rather than the one provided by asyncio.

Hopefully at this point you see the benefits of using gevent in your Python / Cassandra applications. In an I/O-bound application, you can greatly increase performance by performing queries in parallel using spawn and Pool.map calls.