Accessing Private Variables in the JVM

In this post I’ll discuss an uncommonly used but useful technique: accessing variables and methods that have been declared private on the JVM, using the Apache Commons Lang library to work around the restriction. The description from the project page reads:

The standard Java libraries fail to provide enough methods for manipulation of its core classes. Apache Commons Lang provides these extra methods.

A couple of weeks ago I was working on a project that required parsing some CQL statements. There isn’t a standard parser separate from the Cassandra project at the moment, so I decided to pull in the entirety of cassandra-all from Maven Central. The parser in Cassandra isn’t really designed to be used as a library. In particular, org.apache.cassandra.cql3.QueryProcessor has a parseStatement(String) method, but the ParsedStatement it returns doesn’t expose any of its private variables via getters. I felt particularly determined for some reason, so I decided to investigate a workaround.

I’ve created a small sample project to demonstrate how to gain access to variables marked as private. To keep things concise, I wrote the sample using Kotlin, but you could just as easily use Java, Scala, or whatever your favorite JVM language is.

The example I’ll be referencing is small enough to fit in a single file. The source code can be found here.

Cassandra’s Parser Internals

For my example, I’ll be parsing this CREATE TABLE statement and gaining access to the private variables:

val query = """CREATE TABLE mytable (id int,
                        |cluster timeuuid,
                        |another_cluster timeuuid,
                        |val text,
                        |another map<int, int>,
                        |primary key(id, cluster, another_cluster))
                        |WITH CLUSTERING 
                        |ORDER BY (cluster DESC, 
                                   another_cluster ASC)""".trimMargin()

Near the top of the sample program, this line parses a query:

val parsed = QueryProcessor.parseStatement(query)

The object that’s returned is of type CreateTableStatement. We’ll want to access the following private variables, as can be seen in the debugger:

  • definitions, for the specifications of types
  • keyAliases, for the partition keys
  • columnAliases, for the clustering keys

We’ll need to use reflection to get access. Fortunately, the Apache Commons Lang library makes this trivial. We begin by importing FieldUtils:

import org.apache.commons.lang3.reflect.FieldUtils

Next, we look up the field on the class using FieldUtils.getField(), then read its value from the parsed object with FieldUtils.readField(). Passing true as the last argument (forceAccess) overrides the access limitation.

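// Look up the private field; `true` forces access despite the `private` modifier.
val field = FieldUtils.getField(parsed.javaClass, "definitions", true)
// Read the field's value from the parsed statement.
val definitions = FieldUtils.readField(field, parsed, true) as HashMap<*, *>
for ((name, type) in definitions) {
    println("Field: $name, Type: $type")
}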

Running this code prints each field name and its type (technically each value is an org.apache.cassandra.cql3.CQL3Type.Raw, but I think it gets the point across):

Field: cluster, Type: timeuuid
Field: val, Type: text
Field: another, Type: map<int, int>
Field: id, Type: int
Field: another_cluster, Type: timeuuid
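
If you’re curious what a forced read looks like without the library, here is a minimal sketch using only java.lang.reflect. It is not the Commons Lang implementation, just the same idea: find the declared field, lift the access check, read the value. The Statement class and the findField and readPrivate helpers are hypothetical names made up for this illustration, so it runs without Cassandra on the classpath.

```kotlin
import java.lang.reflect.Field

// Hypothetical stand-in for a library class that hides its state.
class Statement {
    private val definitions: Map<String, String> =
        linkedMapOf("id" to "int", "val" to "text")
}

// Walk up the class hierarchy until some class declares a field with this name.
fun findField(type: Class<*>, name: String): Field? {
    var current: Class<*>? = type
    while (current != null) {
        val match = current.declaredFields.firstOrNull { it.name == name }
        if (match != null) return match
        current = current.superclass
    }
    return null
}

// Read a field's value regardless of its visibility modifier.
fun readPrivate(target: Any, name: String): Any? {
    val field = findField(target.javaClass, name)
        ?: throw NoSuchFieldException(name)
    field.isAccessible = true // lifts the `private` check for this Field object only
    return field.get(target)
}

fun main() {
    val definitions = readPrivate(Statement(), "definitions") as Map<*, *>
    for ((column, type) in definitions) {
        println("Field: $column, Type: $type")
    }
}
```

Setting isAccessible only affects that one Field object; the class itself is unchanged, and other code still sees the field as private.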

In the sample program I also demonstrate how to access the fields of the primary key, but I won’t list that here (it’s more of the same code). The result of the full program can be seen below:

Running the parser, yay
CREATE TABLE mytable (id int,
cluster timeuuid,
another_cluster timeuuid,
val text,
another map<int, int>,
primary key(id, cluster, another_cluster))
WITH CLUSTERING
ORDER BY (cluster DESC,
          another_cluster ASC)
19:59:42.175 [main] INFO  o.a.cassandra.cql3.QueryProcessor - Initialized prepared statement caches with 0 MB (native) and 0 MB (Thrift)
Field: cluster, Type: timeuuid
Field: val, Type: text
Field: another, Type: map<int, int>
Field: id, Type: int
Field: another_cluster, Type: timeuuid
Primary keys
Partition key: [id]
Clustering key: cluster.
Clustering key: another_cluster.
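
For the primary key portion that I skipped above, the code might look roughly like the sketch below. This is not the code from the sample project. FakeCreateTable is a hypothetical stand-in, and the list shapes I gave keyAliases and columnAliases are assumptions chosen to match the printed output; the real Cassandra types may differ. It uses plain reflection so it runs on its own. Against the real parsed object you would call FieldUtils.readField(parsed, "keyAliases", true) instead.

```kotlin
// Hypothetical stand-in; field types are assumptions, not Cassandra's actual ones.
class FakeCreateTable {
    private val keyAliases: List<List<String>> = listOf(listOf("id"))
    private val columnAliases: List<String> = listOf("cluster", "another_cluster")
}

// Read a private field declared directly on the target's class as a list.
fun readList(target: Any, name: String): List<*> {
    val field = target.javaClass.getDeclaredField(name)
    field.isAccessible = true
    return field.get(target) as List<*>
}

// Build one line per partition key group and one per clustering column.
fun describeKeys(target: Any): List<String> {
    val lines = mutableListOf("Primary keys")
    for (partition in readList(target, "keyAliases")) {
        lines += "Partition key: $partition"
    }
    for (clustering in readList(target, "columnAliases")) {
        lines += "Clustering key: $clustering."
    }
    return lines
}

fun main() {
    describeKeys(FakeCreateTable()).forEach(::println)
}
```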

At this point you should be able to access stuff you couldn’t before. Generally speaking, this should be a last resort. When something is marked private, there’s an implicit warning that the underlying code might change and that it’s not safe to mess with. I strongly suggest finding almost any workaround other than using this technique!
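
If you do end up relying on it, it’s worth assuming the field may disappear in a future release. Here’s a small sketch of one way to fail softly: wrap the read in Kotlin’s runCatching so a renamed or removed field shows up as a failed Result rather than an uncaught exception. The Parsed class and the readPrivateSafely helper are hypothetical names for illustration only.

```kotlin
// Hypothetical stand-in for a third-party class whose internals may change between releases.
class Parsed {
    private val definitions: Map<String, String> = mapOf("id" to "int")
}

// Wrap the reflective read so callers must handle the failure case explicitly.
fun readPrivateSafely(target: Any, name: String): Result<Any?> = runCatching {
    val field = target.javaClass.getDeclaredField(name) // throws NoSuchFieldException if absent
    field.isAccessible = true                           // may throw if access cannot be forced
    field.get(target)
}

fun main() {
    val parsed = Parsed()
    readPrivateSafely(parsed, "definitions")
        .onSuccess { println("definitions = $it") }
    readPrivateSafely(parsed, "renamedField")
        .onFailure { println("Could not read field: $it") }
}
```

On newer JDKs, forcing access can also fail for classes inside modules that aren’t opened for reflection, which is another reason to treat every one of these reads as something that might not work.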
