AWS Open Source Blog

Code and beyond: How we contribute to the Apache Cassandra community

AWS was built for running open source. Since AWS launched in 2006 we have contributed to a broad variety of open source software projects, from Redis to Linux to Apache Lucene to Kubernetes, and will continue to do so as we seek to help our customers. Code, however, is not always our only contribution, and sometimes it’s not even our best contribution. Cloud can be a force multiplier for open source by making projects accessible to more developers and easier to build and run in production, which frees up engineering time for other contributions.

In launching Amazon Managed Apache Cassandra Service – a scalable, highly available, and managed Cassandra-compatible database service – we are responding to customer requests to ease the burden associated with self-managing Cassandra, so that you can focus on writing CQL (Cassandra Query Language) application code. But we also expect that this service will contribute to Cassandra by helping to grow its community in ways directly in line with the operating principles of The Apache Software Foundation, as well as enabling, supporting, and making active contributions to the Cassandra code base.

In open source, code matters. But so does the operation of that code. We will be contributing to Cassandra on both fronts.

Dynamo origins

Amazon helped spark the end of the one-size-fits-all-relational-database era many years ago. As Amazon CTO Werner Vogels describes, in 2004 Amazon’s retail operations depended on Oracle Enterprise Edition, yet “We were pushing [Oracle’s] limits…and were unable to sustain the availability, scalability, and performance needs that our growing Amazon business demanded.” The company assembled “a small group of distributed systems experts…and designed a horizontally scalable distributed database that would scale out for both reads and writes to meet the long-term needs of our business.” This was the genesis of Amazon Dynamo, and it worked so well that in 2007 the company published Dynamo: Amazon’s Highly Available Key-value Store. Five years later, we launched DynamoDB, a fully managed, multi-region, multi-master, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications.

Inspired by the Dynamo paper, engineers at Facebook developed Cassandra. In 2009, Facebook contributed Cassandra to the Apache Software Foundation, and Apache Cassandra was born. Despite this common DNA and a common goal of storing key-value data at massive scale, the two databases took different approaches to APIs and storage architectures, among other things.

Growing the community of happy Cassandra users

Today we have many customers who run Cassandra using Amazon EC2, some of whom discover Cassandra through the AWS Marketplace. A sizeable number have asked us to develop a Cassandra-compatible database service that scales as easily as DynamoDB while allowing them to continue to program using CQL, making it easier for SQL-savvy developers to quickly and easily make the jump to non-relational databases. With Amazon Managed Apache Cassandra Service, you can run your Cassandra workloads on AWS using the same Cassandra application code, Apache 2.0 licensed drivers, and tools as you use today.

This is the first area in which we are contributing to Cassandra.

By removing the operational burden from developers building Cassandra applications, we hope to grow Cassandra’s popularity. By launching a Cassandra-compatible service, we bring the speed of deployment, scalability, and availability benefits associated with AWS, as well as enterprise features like encryption and access management (IAM). In addition, Amazon Managed Apache Cassandra Service demonstrates the long-term commitment that AWS is making to the Cassandra API and the associated community of developers. Finally, by offering a Cassandra-compatible service and investing in developer evangelism, AWS will help build awareness of Cassandra.

Thanks to this new service, we also expect that Cassandra code contributors will be able to devote more time to higher-value innovation, such as adding new CQL features and capabilities, rather than focusing their efforts on undifferentiated enhancements to functional areas such as management, compaction, and garbage collection.

The Apache Software Foundation is unique in its approach to building great open source software, focusing on community above code:

“Being a committer does not necessarily mean you commit code, it means you are committed to the project and are productively contributing to its success.”

For Cassandra, we take this philosophy to heart. By launching a managed Cassandra-compatible service, we are firmly committed to the project. Our customers who use Amazon Managed Apache Cassandra Service will be the most successful if Cassandra has a vibrant community of users and developers. With this commitment to the project, code contributions naturally follow.

Collaborating on code

While there are promising areas like the pluggable storage engine proposed by engineers at Instagram as a way to lower JVM overhead, there are also more basic types of investments we can make today that can significantly help Cassandra developers.

For example, in addition to programming with CQL, developers love the Cassandra API. As we work with the Cassandra API libraries, we will contribute bug fixes. We will also improve the developer experience of building applications on Cassandra. One example is built-in support for AWS authentication (SigV4), which will simplify and streamline managing credentials for customers running Cassandra on Amazon EC2, because Amazon EC2 and AWS IAM can automatically handle distribution and management of credentials using instance roles. As we do this work to help AWS customers, we want to extend similar benefits to all.
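To make the SigV4 idea above a little more concrete, here is a minimal sketch of the publicly documented Signature Version 4 signing-key derivation, using only the Python standard library. It is an illustration, not the service's or any driver's actual implementation: the function names are hypothetical, and the service name "cassandra" and region used in the usage note are assumptions. The point it shows is that the long-lived secret is never sent over the wire; a key scoped to a date, region, and service is derived from it, which is part of why short-lived instance-role credentials fit this scheme well.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive a SigV4 signing key scoped to one date, region, and service.

    date_stamp is in YYYYMMDD form. Each step keys the next HMAC with the
    previous digest, so the result is only valid for this exact scope.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature of an already-built string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

A caller would derive the key with something like `derive_signing_key(secret, "20191203", "us-east-1", "cassandra")` (all four values are placeholders) and then sign the string to sign that SigV4 defines for the request. Building that string, and wiring the result into a Cassandra driver's authentication handshake, is outside the scope of this sketch.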

Today we are also announcing $100,000 in initial funding, in the form of AWS promotional credits, for testing Cassandra-related applications, and we hope that you will apply. Our open source credits program, announced earlier this year, has already enabled us to help support a number of open source foundations and projects. Typically, these credits are used to perform upstream and performance testing, CI/CD, or storage of artifacts on AWS. We hope this program will free up resources for Cassandra developers to further expand and innovate.

Everyone benefits from a stronger Cassandra community. We are excited to contribute to the Cassandra project by making it easier for users to tap into the power of this great database, even as we also contribute code. We look forward to working closely with the Cassandra community to make Cassandra even better.

Interested in working with our team to help build Amazon Managed Apache Cassandra Service and contribute to Cassandra in meaningful ways? We’re hiring and would love to hear from you.

Matt Asay

Matt Asay (pronounced "Ay-see") has been involved in open source and all that it enables (cloud, machine learning, data infrastructure, mobile, etc.) for nearly two decades, working for a variety of open source companies and writing regularly for InfoWorld and TechRepublic. You can follow him on Twitter (@mjasay).