Replication for fault tolerance
Description
In this architecture, there are five servers configured. Two are used to host copies of the data. The other three servers are used to coordinate the replication of data. With this example, we'll create a database and table that will be replicated across both data nodes using the ReplicatedMergeTree table engine.
Level: Basic
Terminology
Replica
A copy of data. ClickHouse always has at least one copy of your data, and so the minimum number of replicas is one. This is an important detail, you may not be used to counting the original copy of your data as a replica, but that is the term used in ClickHouse code and documentation. Adding a second replica of your data provides fault tolerance.
Shard
A subset of data. ClickHouse always has at least one shard for your data, so if you do not split the data across multiple servers, your data will be stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server. The destination server is determined by the sharding key, and is defined when you create the distributed table. The sharding key can be random or as an output of a hash function. The deployment examples involving sharding will use rand()
as the sharding key, and will provide further information on when and how to choose a different sharding key.
Distributed coordination
ClickHouse Keeper provides the coordination system for data replication and distributed DDL queries execution. ClickHouse Keeper is compatible with Apache ZooKeeper.
Environment
Architecture Diagram
Node | Description |
---|---|
clickhouse-01 | Data |
clickhouse-02 | Data |
clickhouse-keeper-01 | Distributed coordination |
clickhouse-keeper-02 | Distributed coordination |
clickhouse-keeper-03 | Distributed coordination |
In production environments, we strongly recommend using dedicated hosts for ClickHouse keeper. In test environment it is acceptable to run ClickHouse Server and ClickHouse Keeper combined on the same server. The other basic example, Scaling out, uses this method. In this example we present the recommended method of separating Keeper from ClickHouse Server. The Keeper servers can be smaller, 4GB RAM is generally enough for each Keeper server until your ClickHouse Servers grow very large.