ClickHouse is a column store database management system (DBMS) for online analytical processing (OLAP). In most cases, ClickHouse is compatible with SQL standards. ClickHouse is developed by Yandex, the largest technology company that provides the largest search engine in Russia. ClickHouse uses various engine technologies at the underlying layer, and one of the highlights of ClickHouse is fast
ClickHouse supports basic data types and a wide array of complex data types. The examples of complex data types include the array, JSON, tuple, and set.
Data storage model
ClickHouse is a column store DBMS. In OLAP databases, a column store DBMS needs to read only the relevant columns in blocks and in a continuous way. A column store DBMS offers the following significant benefits:
(1) Provides a high compression ratio, because data of the same type is stored.
(2) Provides high processing efficiency, because the number of the data types to be processed is reduced.
(3) Delivers high query efficiency, because data is stored in CPU caches in a continuous way.
Data distribution methods (sharding)
ClickHouse can distribute data to a single node or across the nodes of a cluster. To distribute data across the nodes of a cluster, ClickHouse divides data into shards, and distributes the shards across different nodes. ClickHouse supports four distribution methods: hash-based sharding, random sharding, fixed sharding, and sharding based on custom expressions. ClickHouse supports various sharding methods, which allows you to select a data distribution method based on your business needs. This helps you avoid data skew and maximize the large-scale parallel processing capabilities of clusters.
- Hash-based sharding: uses the values of a column as key values based on a hash function, and stores data to the node in a cluster based on the key values.
- Random sharding: distributes the written data across nodes in a random way.
- Fixed sharding: distributes the written data to the specified nodes.
- Sharding based on custom expressions: calculates values based on the valid expression that you specify, and distributes data across the nodes of a cluster based on the calculated values.
Data distribution methods (partitioning)
ClickHouse does not support subpartitions. The major partitioning methods for ClickHouse are implemented based on each value of the partition keys, such as date-based partitioning. Typical partitioning expressions include to YYYYMM(), to Monday(), and enumeration expressions.
You can use partitioning to narrow down the range of the required data based on your business characteristics. This helps you reduce the response time of queries.
You can also use partitioning to help you manage the time to live (TTL) of data. ClickHouse provides an advanced system that you can use to manage the TTL of data. This allows you to manage the TTL of the data stored in partitions, and remove the expired data.
ClickHouse allows you to create a desired number of sparse indexes for a column. The indexes occupy a small amount of storage space. Therefore, ClickHouse offers cost-effective services for you to store a single terabyte of data.
Different compression algorithms offer different data compression ratios. Two typical compression algorithms are LZ4 and ZSTD. The compression ratio of the LZ4 compression algorithm is about 3.7:1. The compression ratio of the ZSTD compression algorithm is about 5.0:1. LZ4 compresses data faster at a lower compression ratio than ZSTD. ZSTD compresses data at a low speed, but compresses data faster and provides better compression results than the traditional ZLIB compression algorithm. Therefore, ZSTD is a substitute for ZLIB.
High availability of data
ClickHouse allows you to replicate a raw table to create multiple replicas. This ensures the high availability of data. By default, all the replicas are in the active mode. Therefore, you can query the data that is stored in replicas. You can create zero to N replicas for data shards.
To maximize data value and reduce maintenance costs, ClickHouse provides automatic TTL management. You can manage the TTL of data based on the following granularities:
- Columns: If partial data in a column expires, ClickHouse replaces the expired data with default values. If all the data in a column expires, ClickHouse deletes the column.
- Rows: If the data in a row expires, ClickHouse deletes the row.
- Partitions: If the data in a partition expires, ClickHouse deletes the partition.
Parallel computing capabilities of a single node
ClickHouse divides data into partitions, and divides each partition into smaller granularities by creating indexes. Multiple CPU cores process a part of granularities to implement parallel data processing.
Distributed computing capabilities
ClickHouse supports linear scale-out of distributed computing capabilities, and provides a variety of policies for you to distribute queries across nodes.
Multi-table association analysis
ClickHouse supports JOIN queries and analysis between large wide tables and dimension tables. ClickHouse does not support JOIN queries and analysis between large wide tables. ClickHouse does not support right joins and outer joins.
The latest version of ClickHouse supports role-based access control (RBAC).
Comparison between Hologres and ClickHouse
ClickHouse and Hologres are in a competitive relationship based on the preceding analysis of ClickHouse storage and computing capabilities.
|Strategic positioning||Product positioning||An open source, distributed column store database.||A data warehouse that can process petabytes of data in real time.|
|Basic features||System architecture||Features the coupling of storage and computing and uses the massively parallel processing (MPP) architecture that is also known as a shared nothing architecture.||Decouples storage from computing, and provides a quick method for you to scale out computing and storage nodes in a separate way.||Hologres
A new trend in big data scenarios is to decouple storage from computing. Hologres allows you to scale out computing and storage nodes in a separate way. You can configure computing resources and storage space based on your business needs. This enables Hologres to offer flexible and cost-effective solutions.
|Polymorphic storage||Not supported. ClickHouse supports only column store.||Supported. Hologres supports row store and column store.||Hologres
Hologres supports the row store of traditional databases, and the column store of big data databases. The Hologres team is working on support for the external storage feature. This feature allows you to store data in the Hadoop Distributed File System (HDFS). Hologres supports polymorphic storage, which allows you to select a storage model based on data access frequency and access methods.
|Multiple sharding methods||Supports four sharding methods if data is distributed across the nodes of a cluster: hash-based sharding, random sharding, fixed sharding, and sharding based on custom expressions.||Supports two sharding methods if data is distributed across the nodes of a cluster: hash-based sharding and random sharding.||ClickHouse
ClickHouse supports more sharding methods than Hologres. You can use these methods to distribute data in a fine-grained manner based on your business needs. This allows you to distribute data and schedule cluster resources in an appropriate way.
|Multi-level partitioning||Not supported. Currently, ClickHouse supports only partitioning based on each value of the partition keys, and does not support subpartitions.||Not supported. Currently, Hologres supports only partitioning based on each value of the partition keys, and does not support subpartitions.||Hologres
The Hologres team is working on support for multi-level partitioning and dynamic partitioning. Multi-level partitioning helps you manage tables in a fine-grained manner. Dynamic partitioning automatically loads data to corresponding partitions based on the values of partition columns. This helps you provide flexible and intelligent services.
|Real-time writes||Supported. However, ClickHouse requires micro-batch processing, and does not allow you to query data immediately after the data is written. ClickHouse may fail to merge data in a timely manner if a large number of transactions per second (TPS) is written.||Supported. You can query data immediately after the data is written even if a large number of TPS is written.||Hologres|
|SQL support||Complies with SQL standards in most cases. ClickHouse does not support window functions, and provides limited capabilities of joining tables.||Highly compatible with the protocols, the syntax, and the ecosystem of PostgreSQL.||Hologres
Allows you to select products based on scenarios and ecosystems, and supports the standard SQL syntax. ClickHouse provides fast queries and analysis, but does not support the advanced features of data warehouses, such as the support for various SQL standards. Another disadvantage of ClickHouse is that its community ecosystem is less inclusive than Hologres.
|Advanced features||Storage capability||Stores data in on-premises file systems. You need to maintain replicas and ensure that data is synchronized among the replicas. The storage capacity depends on the cluster size. ClickHouse supports linear scale-out to improve storage capacity. ClickHouse also supports a wide variety of storage engines and compression algorithms.||Stores data in Apsara Distributed File System or the HDFS. The storage capacity depends on the cluster size. Hologres supports linear scale-out to improve storage capacity. A single table can store a maximum of 3 PB data. Hologres supports diverse storage models and compression algorithms to improve storage capabilities.|
|Query and analysis capabilities||Provides faster single-table query analysis than other open source OLAP engines. No performance tests are performed based on tens of billions or even hundreds of billions of data records. ClickHouse provides limited capabilities of joining tables.||Allows you to query tens of billions of data records in real time and receive analysis results in several sub-seconds. Hologres allows you to query hundreds or even thousands of billions of data records in real time, and receive analysis results in several seconds. Hologres provides powerful capabilities of joining tables.||Hologres
Allows you to query and analyze large amounts of data in real time, and provides powerful capabilities of joining tables.
|Concurrent queries per second (QPS) supported in complex OLAP scenarios||Supports low concurrent QPS.||Supports high concurrent QPS.||Hologres
Uses an asynchronous architecture and supports more QPS than MPP databases and HBase online data services.
|Concurrent QPS support by online data services||Supports low concurrent QPS.||Supports high concurrent QPS.|
|Federated computing||Not supported.||Supports real-time offline federated computing.||Hologres
Creates external tables for the data in disparate data sources based on the precondition that data privacy and security are ensured. This allows you to implement federated queries and computing.
|Reliability||Atomicity, consistency, isolation, and durability (ACID) of transactions||Not supported.||Limited support.||Hologres
Supports the atomicity of transactions.
|Transaction isolation||ClickHouse does not support transaction isolation, but supports eventual consistency.||Supports snapshot isolation.|
|Backup and disaster recovery||Supported. Allows you to create zero to N standard replicas in big data scenarios. Multiple replicas can provide the disaster recovery capability.||Supported. Provides three standard replicas for disaster recovery in big data scenarios.||ClickHouse
Allows you to replicate a raw table to create multiple replicas. This ensures high availability.
(1) By default, all the replicas are in the active mode. Therefore, you can query the data that is stored in replicas.
(2) You can create zero to N replicas for data shards.
|Ease of use||SQL syntax||DDL||Supported. The keywords are create, alter, and drop. The objects are database, table, view, dictionary, role, role policy, quota, and settings profile.||Supported. The keywords are create, alter, and drop. The objects are database, table, view, schema, cast, extension, role, user, user mapping, and group.||Hologres
Supports more SQL syntax patterns and Data Definition Language (DDL) objects than ClickHouse. Hologres also supports JOIN queries. You can use all the types of joins, such as left joins and right joins. Another highlight of Hologres is the support for rollbacks. If you modify the data in a database and have not committed the modifications, you can perform rollbacks.
|DML||Supported. The keywords are select, insert, update, and delete.||Supported. The keywords are select, insert, update, and delete.|
|DCL||Supported. The keywords are grant and revoke.||Supported. The keywords are grant, revoke, and rollback.|
|Development tools||Uses terminal tools and open source tools.||Uses terminal tools, open source tools, and Alibaba Cloud tools.|
|Scalability||Uses a shared nothing architecture where only a few node interactions are involved. ClickHouse supports linear scale-out, which indicates that you can add nodes to improve parallel processing capabilities. However, if your storage or computing resources are insufficient, you have to scale out the storage and computing resources at the same time.||Hologres allows you to add storage and computing nodes in a separate way for linear scale-out. This helps you improve parallel processing capabilities.||Hologres
Hologres uses an architecture where storage is decoupled from computing. The architecture of this type is suitable for real-time big data processing and computing. The reasons are described as follows:
(1) If scale-out is required to remove storage or computing bottlenecks, Hologres users can improve storage or computing capabilities based on business needs. This helps Hologres users prevent the waste of storage or computing resources.
(2) Hologres users do not need to change the allocation plans of storage or computing resources at a high frequency.
(3) The architecture helps Hologres users handle data migration challenges during scale-out. During the scale-out, large amounts of data may need to be migrated.
|Operations and maintenance (O&M)||Requires complex O&M. You must specify topology information for data shards in an explicit way. After nodes are added or removed, the ClickHouse system cannot automatically detect the topology changes of the cluster, or balance data distribution across nodes. This results in high O&M costs.||Provides fully managed services.
Allows the system to automatically detect the topology changes of a cluster. You do not need to concern yourself with the topology changes.
Provides fully managed services. This indicates that no maintenance efforts are required.
|Costs||ClickHouse supports only a few standard SQL statements, which results in high adoption costs.||Hologres is highly compatible with the PostgreSQL ecosystem, which allows you to use the Hologres service at a low cost.||Hologres
Allows you to use its features at a lower cost than ClickHouse.
|Scenarios||Connects data silos, stores large amounts of data at a low cost, and provides high query performance.||Provides a real-time data warehouse service. Connects data silos, stores large amounts of data, allows you to query and analyze large amounts of data in real time, and supports the elastic scale-out of clusters. Hologres provides full SQL support.|
Hologres provides the following advantages over ClickHouse:
(1) Hologres uses an architecture where storage is decoupled from computing. Hologres allows you to scale out storage and computing resources in a separate way, and configure storage and computing resources based on your business needs. This helps you save costs.
(2) Hologres supports polymorphic storage. This helps you save storage space and improve storage usage. Hologres supports multi-level partitioning. You can divide data into partitions in an intelligent way and identify the required data ranges for queries.
(3) Hologres is highly compatible with PostgreSQL. Hologres supports the standard PostgreSQL syntax, provides an advanced community ecosystem, and allows you to use the Hologres features at a low cost.
(4) Hologres supports federated queries based on disparate data sources.
(5) Hologres allows you to query and analyze large amounts of data in real time.
(6) Hologres provides a real-time cloud data warehouse service. Hologres provides fully managed services, which indicates that no maintenance efforts are required.