CockroachDB

By Cockroach Labs

Certified enterprise ready

CockroachDB is a cloud-native, distributed SQL database. It implements a standard, developer-friendly SQL interface and provides linear, automated scale for your data without the labor intensive overhead of manual sharding.

Software version

21.1

Runs on

OpenShift 4.6

Delivery method

Operator

Highlights

Capabilities level

Last updated

11/8/2023, 09:39 AM

FAQs

CockroachDB is a distributed SQL database built on a transactional and strongly-consistent key-value store. It scales horizontally; survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; supports strongly-consistent ACID transactions; and provides a familiar SQL API for structuring, manipulating, and querying data.

CockroachDB is inspired by Google's Spanner and F1 technologies, and the source code is freely available.
CockroachDB is well suited for applications that require reliable, available, and correct data, and millisecond response times, regardless of scale. It is built to automatically replicate, rebalance, and recover with minimal configuration and operational overhead. Specific use cases include:

Distributed or replicated OLTP
Multi-datacenter deployments
Multi-region deployments
Cloud migrations
Cloud-native infrastructure initiatives
CockroachDB returns single-row reads in 2ms or less and single-row writes in 4ms or less, and supports a variety of SQL and operational tuning practices for optimizing query performance. However, CockroachDB is not yet suitable for heavy analytics / OLAP.
CockroachDB scales horizontally with minimal operator overhead. You can run it on your local computer, a single server, a corporate development cluster, or a private or public cloud. Adding capacity is as easy as pointing a new node at the running cluster.

At the key-value level, CockroachDB starts off with a single, empty range. As you put data in, this single range eventually reaches a threshold size (64MB by default). When that happens, the data splits into two ranges, each covering a contiguous segment of the entire key-value space. This process continues indefinitely; as new data flows in, existing ranges continue to split into new ranges, aiming to keep a relatively small and consistent range size.

When your cluster spans multiple nodes (physical machines, virtual machines, or containers), newly split ranges are automatically rebalanced to nodes with more capacity. CockroachDB communicates opportunities for rebalancing using a peer-to-peer gossip protocol by which nodes exchange network addresses, store capacity, and other information.
CockroachDB is designed to survive software and hardware failures, from server restarts to datacenter outages. This is accomplished without confusing artifacts typical of other distributed systems (e.g., stale reads) using strongly-consistent replication as well as automated repair after failures.

Replication

CockroachDB replicates your data for availability and guarantees consistency between replicas using the Raft consensus algorithm, a popular alternative to Paxos. You can define the location of replicas in various ways, depending on the types of failures you want to secure against and your network topology. You can locate replicas on:

Different servers within a rack to tolerate server failures
Different servers on different racks within a datacenter to tolerate rack power/network failures
Different servers in different datacenters to tolerate large scale network or power outages
In a CockroachDB cluster spread across multiple geographic regions, the round-trip latency between regions will have a direct effect on your database's performance. In such cases, it is important to think about the latency requirements of each table and then use the appropriate data topologies to locate data for optimal performance and resiliency. For a step-by-step demonstration, see Low Latency Multi-Region Deployment.

Automated Repair

For short-term failures, such as a server restart, CockroachDB uses Raft to continue seamlessly as long as a majority of replicas remain available. Raft makes sure that a new “leader” for each group of replicas is elected if the former leader fails, so that transactions can continue and affected replicas can rejoin their group once they’re back online. For longer-term failures, such as a server/rack going down for an extended period of time or a datacenter outage, CockroachDB automatically rebalances replicas from the missing nodes, using the unaffected replicas as sources. Using capacity information from the gossip network, new locations in the cluster are identified and the missing replicas are re-replicated in a distributed fashion using all available nodes and the aggregate disk and network bandwidth of the cluster.
CockroachDB guarantees serializable SQL transactions, the highest isolation level defined by the SQL standard. It does so by combining the Raft consensus algorithm for writes and a custom time-based synchronization algorithms for reads.

Stored data is versioned with MVCC, so reads simply limit their scope to the data visible at the time the read transaction started.

Writes are serviced using the Raft consensus algorithm, a popular alternative to Paxos. A consensus algorithm guarantees that any majority of replicas together always agree on whether an update was committed successfully. Updates (writes) must reach a majority of replicas (2 out of 3 by default) before they are considered committed.

To ensure that a write transaction does not interfere with read transactions that start after it, CockroachDB also uses a timestamp cache which remembers when data was last read by ongoing transactions.

This ensures that clients always observe serializable consistency with regards to other concurrent transactions.
The CAP theorem states that it is impossible for a distributed system to simultaneously provide more than two out of the following three guarantees:

Consistency
Availability
Partition Tolerance
CockroachDB is a CP (consistent and partition tolerant) system. This means that, in the presence of partitions, the system will become unavailable rather than do anything which might cause inconsistent results. For example, writes require acknowledgements from a majority of replicas, and reads require a lease, which can only be transferred to a different node when writes are possible.

Separately, CockroachDB is also Highly Available, although "available" here means something different than the way it is used in the CAP theorem. In the CAP theorem, availability is a binary property, but for High Availability, we talk about availability as a spectrum (using terms like "five nines" for a system that is available 99.999% of the time).

Being both CP and HA means that whenever a majority of replicas can talk to each other, they should be able to make progress. For example, if you deploy CockroachDB to three datacenters and the network link to one of them fails, the other two datacenters should be able to operate normally with only a few seconds' disruption. We do this by attempting to detect partitions and failures quickly and efficiently, transferring leadership to nodes that are able to communicate with the majority, and routing internal traffic away from nodes that are partitioned away.