I recently remembered why I started blogging: “to collect my experience and share it publicly”. So I’ll keep it going, and this time I want to write down how to upgrade a Cassandra cluster smoothly, with no downtime.
At first sight it seems obvious: Cassandra is distributed storage, so you can upgrade each node independently. But it’s also a bit tricky, because Cassandra has so many concepts and moving parts. When introducing such a major change, you’ll probably worry about how not to break any of them.
Also, as with every DB upgrade, what matters most is how your application behaves afterwards. Support for older protocol versions may be removed in newer releases. Storage may behave in a way the application doesn’t expect. There can be a lot of pitfalls. So, before we start enjoying the benefits of the upgrade, we have to be 200% sure that the application still works, or at the very least doesn’t work worse with the new database.
2.x → 3.0 algorithm
- Stop Cassandra. Every time before stopping, run nodetool drain to flush memtables to SSTables and stop accepting connections. Only after that, stop the process.
- Back up configuration files. Cassandra configs differ a lot between versions.
- Remove the 2.x packages. Don’t be scared: you remove only the binaries, not the data itself.
- Remove the DataStax repo. DataStax no longer maintains packages for the open-source Cassandra project, so get rid of the 2.x repository the way your package manager suggests. You might also have 2.x Cassandra installed from the Apache repository; drop that repo entry too.
- Set up the Apache repo, following the instructions on the official site.
- Install the new Cassandra 3.0 packages. It’s better to install the cassandra-tools package together with cassandra.
- Check the diff between the configuration formats and adapt your config to the new one. Expect a huge diff between versions: new parameters are introduced, deprecated ones are moved to the end of the config, and so on. Double-check settings like the listen address, seeds and snitch (endpoint_snitch, really important!). Once you’re done, put the new config into your configuration management system (Ansible, Chef, etc.).
- Start Cassandra. Wait until it handshakes with the other nodes in the cluster and shows up as UN (Up/Normal) in nodetool status.
- Upgrade SSTables. Run nodetool upgradesstables in the background. That’s the most important step, and you must not forget it. The node works without it, but the data stays in the old on-disk format; only after the SSTables are rewritten will you feel what the new storage engine means. On a huge dataset the upgrade takes a lot of time, and it also depends on your hardware. In my case, upgrading 1 TB of SSTables on an EC2 i3.2xlarge instance took 12 hours, half a day. The upgrade is single-threaded, so be patient. To make it faster, you can remove the compaction throughput limit with nodetool setcompactionthroughput 0.
- Repeat the sequence on each node, one by one. Hello, automation! I did it with an Ansible playbook, adding tasks with a maintenance tag to the cassandra role. Whatever tool you use, do automate it.
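If you don’t use Ansible, the per-node sequence above can be sketched as a bash function. This is a minimal sketch, not my playbook: it assumes a Debian/Ubuntu host, a systemd unit called cassandra, configs in /etc/cassandra, the package name cassandra, and the apt repo URL the Apache docs currently list; the list-file path, the DRY_RUN switch and the function names are my own. By default it only prints the commands, and it pauses for the manual config merge.

```shell
#!/usr/bin/env bash
# Sketch of the 2.x -> 3.0 sequence for ONE node. Assumptions (not from the post):
# Debian/Ubuntu, systemd unit "cassandra", configs in /etc/cassandra, package
# name "cassandra", and the apt list file below. Commands are only printed
# unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
CONF_DIR="${CONF_DIR:-/etc/cassandra}"
REPO_LIST="${REPO_LIST:-/etc/apt/sources.list.d/cassandra.sources.list}"

run() {
  if [ "$DRY_RUN" = "0" ]; then "$@"; else echo "+ $*"; fi
}

pause() {
  if [ "$DRY_RUN" = "0" ]; then
    printf '%s, then press Enter: ' "$1"; read -r _
  else
    echo "+ pause: $1"
  fi
}

# Block until `nodetool status` lists the given address as UN (Up/Normal).
wait_for_un() {
  if [ "$DRY_RUN" != "0" ]; then echo "+ wait until $1 is UN"; return 0; fi
  until nodetool status 2>/dev/null |
      awk -v a="$1" '$1 == "UN" && $2 == a { ok = 1 } END { exit !ok }'; do
    sleep 10
  done
}

upgrade_node_to_30() {
  local addr="${1:?usage: upgrade_node_to_30 <address as shown in nodetool status>}"
  run nodetool drain                              # flush memtables, stop accepting connections
  run sudo systemctl stop cassandra
  run sudo cp -a "$CONF_DIR" "$CONF_DIR.bak-2x"   # keep the 2.x configs for the diff
  run sudo apt-get remove -y cassandra            # binaries only; /var/lib/cassandra stays
  # Old DataStax/2.x repo entries must be gone and the Apache KEYS imported
  # (see the official install docs) before this point.
  run sudo sh -c "echo 'deb https://debian.cassandra.apache.org 30x main' > $REPO_LIST"
  run sudo apt-get update
  run sudo apt-get install -y cassandra cassandra-tools
  # dpkg may ask about changed conffiles, and the package may start the service.
  pause "Merge $CONF_DIR.bak-2x into the new cassandra.yaml (seeds, listen address, snitch)"
  run sudo systemctl restart cassandra
  wait_for_un "$addr"
  run nodetool setcompactionthroughput 0          # 0 = no throttling
  run nodetool upgradesstables                    # can take hours; run inside tmux/screen
  run nodetool setcompactionthroughput 16         # assumed: your yaml uses the default 16 MB/s
}
```

Source the file and run upgrade_node_to_30 10.0.0.1 to review the plan, then DRY_RUN=0 upgrade_node_to_30 10.0.0.1 on the node itself (the address is just an example). Restoring the throughput limit at the end is my addition; use whatever value your cassandra.yaml had.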
3.0 → 3.x algorithm
The principle is the same as for the upgrade from 2.x. I went from 3.0 to 3.11. There are only two important things:
- There is no version between 3.0 and 3.11 available in the Apache repository, so you don’t need to go through all the minor versions; just upgrade to the latest one. 3.11 is the last 3.x line, so by the time you read this post, upgrade directly to its latest patch release.
- You don’t need to upgrade SSTables, hooray! The storage engine is the same, so you can move quickly from node to node.
So, what are the exact steps?
- Stop Cassandra
- Back up configuration files
- Remove the 3.0 packages
- Change the Apache repo file/setting to point to the new version. Just look at the docs again.
- Install the new 3.11 packages
- Check the diff between configuration formats and adapt your config to the new one.
- Start Cassandra
- Repeat the sequence with each node one by one.
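Compared to the 2.x → 3.0 steps, the main mechanical difference is the apt release series (30x → 311x) and the absence of upgradesstables. A minimal sketch of the repo switch, assuming the repo is configured as a single "deb ... 30x main" line; the function name is hypothetical, and the list-file path in the comment is just an example:

```shell
#!/usr/bin/env bash
# Rewrite an Apache apt source line from one release series to another,
# e.g. "deb https://debian.cassandra.apache.org 30x main" -> "... 311x main".
# Assumption: the series appears as " <series> main" in the given list file.
switch_series() {
  local list="$1" from="${2:-30x}" to="${3:-311x}"
  if ! grep -q " ${from} main" "$list"; then
    echo "series ${from} not found in ${list}" >&2
    return 1
  fi
  sed -i.bak "s/ ${from} main/ ${to} main/" "$list"   # original kept as ${list}.bak
}

# Per node, after nodetool drain and stopping the service (as root):
#   switch_series /etc/apt/sources.list.d/cassandra.sources.list
#   apt-get update && apt-get install -y cassandra cassandra-tools
#   merge configs, start, then check: nodetool version  (ReleaseVersion: 3.11.x)
# No nodetool upgradesstables this time.
```

The grep guard makes the function fail loudly instead of silently leaving the old series in place when the file has a different layout.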
What do you gain from the upgrade?
- There are no breaking changes. Almost. The only one is that Cassandra 3.x dropped support for native protocol v1 and v2. Most likely, upgrading your application’s driver will be enough. Generally, your application should work as usual.
- The storage engine has been refactored. In practice, this means better read/write performance and much lower disk usage. In my case, disk usage dropped by 30% (!!) after the SSTables upgrade. I think that’s an awesome step forward.
- New development features, such as:
  - materialized views, which handle server-side denormalization automatically, keeping base and view data consistent;
  - JSON support;
  - user-defined functions, which allow simple server-side calculations and aggregations.
- And, of course, a longer support window. Who wants to use software whose bug tracker nobody answers?
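To get a feel for the development features listed above, here is a small sketch that prints CQL for a throwaway keyspace; the keyspace, table, view and function names are hypothetical. User-defined functions only work with enable_user_defined_functions: true in cassandra.yaml, and later 3.x releases flag materialized views as experimental, so try this on a test cluster.

```shell
#!/usr/bin/env bash
# Prints CQL exercising three 3.x features on a hypothetical scratch keyspace.
# Pipe it into cqlsh on a TEST cluster: features_demo_cql | cqlsh <host>
features_demo_cql() {
  cat <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, email text, country text);

-- Materialized view: the server maintains a second copy of the rows keyed by country.
CREATE MATERIALIZED VIEW IF NOT EXISTS demo.users_by_country AS
  SELECT * FROM demo.users
  WHERE country IS NOT NULL AND id IS NOT NULL
  PRIMARY KEY (country, id);

-- JSON support: insert and select whole rows as JSON documents.
INSERT INTO demo.users JSON '{"id": 1, "email": "ann@example.com", "country": "DE"}';
SELECT JSON * FROM demo.users_by_country WHERE country = 'DE';

-- User-defined function (needs enable_user_defined_functions: true).
CREATE FUNCTION IF NOT EXISTS demo.email_domain (email text)
  RETURNS NULL ON NULL INPUT
  RETURNS text
  LANGUAGE java
  AS 'return email.substring(email.indexOf("@") + 1);';
SELECT id, demo.email_domain(email) FROM demo.users;
EOF
}
```

The view’s primary key has to contain the base table’s key (id) plus at most one extra column (country), and every view key column needs an IS NOT NULL filter; that is why the WHERE clause looks redundant.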
Can I upgrade from 2.x to 3.x directly with no intermediate upgrade to 3.0?
I think you can try, but going step by step lets you spot problems much faster, because the changelog between the old and new versions at each step is much smaller.
Can I upgrade nodes in parallel?
You can, but make sure you won’t compromise your data and its availability: check how vnodes are distributed between your instances, and keep in mind the replication factor of your keyspaces. I didn’t do this, because my team has had a bad experience with parallel upgrades.
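The replication-factor part of that check is simple arithmetic: a QUORUM read or write needs floor(RF/2) + 1 replicas of a token range, so at most RF minus that many replicas of the same range can be down at the same time. A small sketch of the math (the function names are mine; it does not inspect your cluster, and nodetool describering <keyspace> is one way to see which endpoints own each range):

```shell
#!/usr/bin/env bash
# Replicas needed for QUORUM with replication factor $1.
quorum() { echo $(( $1 / 2 + 1 )); }

# Replicas of one token range that may be down while QUORUM still succeeds.
max_down() { echo $(( $1 - ($1 / 2 + 1) )); }

# Succeeds if taking $2 replicas of the same range offline keeps QUORUM for RF=$1.
safe_to_stop() { [ "$2" -le "$(max_down "$1")" ]; }

for rf in 1 2 3 5; do
  echo "RF=$rf quorum=$(quorum "$rf") max_down=$(max_down "$rf")"
done
```

With RF=3 only one replica of any range may be down, and in a small vnode cluster almost every pair of nodes is likely to share some range, which is why one node at a time is the safe default.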
How can I see the progress of the SSTables upgrade?
That’s pretty straightforward. Although you won’t see a progress bar anywhere, you can count files by name pattern in your data directory.
To track the number of new-format files (run it from the data directory; the upgrade is done when the second number almost equals the first):
watch -n 5 "find . -name '*.db' | wc -l; find . -name '*mc*.db' | wc -l"
To track the number of old-format files (the upgrade is done when the second number drops to zero):
watch -n 5 "find . -name '*.db' | wc -l; find . -name '*la*.db' | wc -l"
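If you’d rather see a single percentage, here is a small sketch that counts only the Data.db component (one per SSTable) in the 2.2+/3.x file layout. The mc/la prefixes come from the watch commands above; newer 3.0.x/3.11.x releases may write a later format letter (for example md or me), and 2.1 uses a different naming scheme, so check what ls shows on your node and pass the prefixes explicitly. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Rough upgradesstables progress: share of SSTables already in the new format.
# Counts one *-Data.db file per SSTable, 2.2+/3.x naming: <version>-<gen>-big-Data.db.
# Usage: sstable_progress [data_dir] [new_prefix] [old_prefix]
sstable_progress() {
  local dir="${1:-.}" new_v="${2:-mc}" old_v="${3:-la}" new old total
  new=$(find "$dir" -name "${new_v}-*-Data.db" | wc -l | tr -d ' ')
  old=$(find "$dir" -name "${old_v}-*-Data.db" | wc -l | tr -d ' ')
  total=$(( new + old ))
  if [ "$total" -eq 0 ]; then
    echo "no ${new_v}/${old_v} SSTables under $dir" >&2
    return 1
  fi
  echo "${new}/${total} SSTables upgraded ($(( 100 * new / total ))%)"
}
```

For example, sstable_progress /var/lib/cassandra/data, assuming the default data directory of the Debian package. Counting Data.db files instead of every .db component keeps the numbers per SSTable rather than per file.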
Is it actually easy?
Yep, though it takes a lot of time. I actually did this task in the background of my regular work. If you have tight time limits or hundreds of nodes in a single cluster, please skip this approach.
Why do you write about such obvious stuff? Are you stupid?
Maybe I am. I want to keep my blog going and record what I’ve experienced here, that’s all. It’s not a space shuttle design that attracts dozens of geeks; it’s just about my life and engineering.