Traditional databases fall into two categories according to their emphasis: transaction-focused OLTP systems and analytics-focused OLAP systems. With the growth of the Internet, data volumes have increased exponentially, and single-machine databases can no longer meet business needs. This is especially true in analytics, where a single query may need to process a large part of the data or even all of it, so the pressure from massive data is particularly acute. This pressure drove the big data revolution that began with Hadoop over the past decade or so and met the demand for massive data analytics. At the same time, a number of distributed database products have emerged to cope with data growth in OLTP scenarios.
To analyze the data in an OLTP system, the standard practice is to synchronize it to an OLAP system on a regular basis (e.g. every day). This architecture ensures that analytical queries do not affect online transactions. However, periodic synchronization means that analytics results are not based on the latest data, and this delay costs us the opportunity to make more timely business decisions. To solve this problem, the HTAP architecture has emerged in recent years. It allows us to analyze the data in the OLTP database directly, which ensures the timeliness of analytics. Analytics is no longer a unique capability of traditional OLAP systems or big data systems. A natural question is: since HTAP can do analytics, will it replace big data systems? What is the next stop for big data?
In order to answer this question, we will take the recommendation system as an example to analyze the typical scenarios of big data systems.
When a shopping application shows you exactly what you want to buy, or a short video application plays your favorite music, a recommendation system is working its magic. The core goal of an advanced recommendation system is to make personalized recommendations based on the real-time behavior of users. Each interaction between a user and the system optimizes the next experience in real time. To support such a system, the big data technology stack has evolved into a very complex and fragmented one.
The following figure shows a big data technology stack that supports a real-time recommendation system.
In order to provide high-quality real-time personalized recommendation, the recommendation system relies heavily on real-time features and continuous updating of models.
Real-time features can be divided into two categories:
- The system collects massive numbers of user behavior events (such as browsing and clicking) and transaction records (such as payment records synchronized from the OLTP database). The volume of this data is very large (traffic may reach tens of millions or even hundreds of millions of events per second), and most of it does not come from the transaction system. For convenience of later use, the data is imported into the system (a in the figure) and, at the same time, joined with various dimension tables to derive a series of important features (1 in the figure), which are updated to the recommendation system in real time to optimize the user experience. The real-time dimension table join requires point lookups with low latency and high throughput to keep up with the newly generated data.
- The system also uses sliding windows and other methods to compute features across various dimensions and time granularities (such as the number of clicks in the past 5 minutes, the number of views in the past 7 days, and the sales in the past 30 days of a particular commodity). Depending on the granularity of the sliding window, these aggregations may be completed through stream computation or batch processing.
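As a rough illustration of the sliding-window features described above (for example, clicks on an item in the past 5 minutes), here is a minimal single-process sketch in Python. The class name `SlidingWindowCounter` and its methods are hypothetical, and the sketch assumes events for each key arrive in non-decreasing timestamp order. A production system would compute this in a stream or batch engine rather than in memory.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events per key inside the window (now - window_seconds, now].

    Hypothetical helper for illustration only; assumes timestamps for a
    given key are recorded in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = {}  # key -> deque of event timestamps (seconds)

    def record(self, key, ts):
        # Append the event timestamp to the per-key queue.
        self.events.setdefault(key, deque()).append(ts)

    def count(self, key, now):
        q = self.events.get(key)
        if q is None:
            return 0
        cutoff = now - self.window_seconds
        # Evict timestamps that have fallen out of the window.
        while q and q[0] <= cutoff:
            q.popleft()
        return len(q)


# Clicks on one item, with a 5-minute (300 s) window.
clicks = SlidingWindowCounter(window_seconds=300)
for ts in (0, 100, 290, 310):
    clicks.record("item-42", ts)
clicks_at_310 = clicks.count("item-42", now=310)  # the click at t=0 has expired
clicks_at_400 = clicks.count("item-42", now=400)  # the click at t=100 has expired too
```

Longer windows such as 7 or 30 days would typically not keep every timestamp; pre-aggregated buckets computed in batch are the more likely choice there.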
These data are also used to generate real-time and offline machine learning samples, and the trained models will be continuously updated to the recommendation system after verification.
What is explained above is the core of an advanced recommendation system, but it is only the tip of the iceberg. A complete set of supporting systems for real-time model monitoring, verification, analytics, and tuning is also needed. These include using a real-time dashboard to view the results of A/B tests (3), using interactive analytics (4) for BI, and refining and tuning the model. In addition, the operations team uses various complex queries to gain insight into the progress of the business and carries out targeted marketing by means of customer targeting and product recommendation.
This example shows a very complex but typical big data scenario, from real-time data import (a) to pre-aggregation (b), from data service (1), continuous aggregation (3), to interactive query (4), to batch processing (2). Such complex scenarios have very diversified requirements for big data systems. We have seen two new trends in the practice of building these systems.
Real-time: The business needs to gain insight quickly from data that has just been collected. Written data needs to be visible within seconds or even less than a second. The lengthy offline ETL process is becoming intolerable. At the same time, the collected data is much larger than the data synchronized from the OLTP system; event logs such as user browsing and clicking are several orders of magnitude larger. The system needs to provide low-latency queries while ingesting data at extremely high throughput.
Hybrid serving and analytics: Traditional OLAP systems often play a relatively static role in business. We gain business insight (such as pre-calculated views and models) by analyzing data, and then provide online data services through another system based on the acquired knowledge. Serving and analytics here are disconnected processes. In contrast, the ideal business decision-making process is an online process of continuous optimization. Serving generates a large amount of new data, and we need to run complex analytics on this new data. The insight generated by the analytics is fed back to the service in real time to create greater commercial value. Serving and analytics form a closed loop.
Existing solutions address the need for real-time serving/analytics convergence by combining a series of products. For example, Apache Flink performs real-time pre-aggregation, the aggregated data is stored in products such as Apache Druid that provide multi-dimensional analytics, and data services are provided through products such as Apache HBase. This siloed ("chimney") development model inevitably creates isolated data islands and thus unnecessary data duplication. The complex data synchronization between the various products also makes data consistency and security a challenge. This complexity makes it difficult for application development to respond quickly to new requirements, slows business iteration, and adds significant overhead to development, operation, and maintenance.
We believe that real-time service/analytics integration should be implemented through a unified Hybrid Serving/Analytical Processing (HSAP) system.
With such a system, application developers no longer need to deal with multiple different products or learn and work around the problems and limitations of each one, which greatly simplifies the business architecture and improves development and operation efficiency. Such a unified system avoids unnecessary data duplication and thus saves cost. At the same time, this architecture brings second-level or even sub-second freshness to the system, which makes business decisions more timely and allows data to create greater commercial value.
Although a distributed HTAP system has the capability of real-time analytics, it cannot solve the problem of big data.
First of all, the data synchronized from the transaction system is only a small part of the data that the real-time recommendation system needs to process. Most of the other data comes from non-transaction sources such as logs (users often browse dozens or even hundreds of times before each purchase). Most of the analytics is conducted on this non-transactional data. However, an HTAP system does not hold this data, so it cannot analyze it.
Can this non-transactional data be written into an HTAP system for analytics? Let’s analyze the difference in data writing modes between HTAP and HSAP systems. The cornerstone and advantage of an HTAP system is its support for fine-grained distributed transactions. Transactional data is usually written into an HTAP system as many small distributed transactions. However, data from logs and similar sources does not have the semantics of fine-grained distributed transactions. Importing this non-transactional data into an HTAP system would inevitably incur unnecessary overhead.
In contrast, an HSAP system does not need such high-frequency small distributed transactions. There are generally two modes of data writing in an HSAP system:
1: real-time writes of massive numbers of single records;
2: relatively low-frequency distributed batch writes.
This allows an HSAP system to make a series of design optimizations, which improves cost effectiveness and avoids the unnecessary overhead of importing non-transactional data into an HTAP system.
Even if we don’t care about these expenses, assuming that we can write all the data into HTAP system regardless of cost, will we solve the problem? The answer is still no.
Supporting OLTP scenarios is a prerequisite for HTAP systems. To achieve this, HTAP systems often adopt row-based storage, and analytical queries on row-based storage are far less efficient than on columnar storage. Having the ability to analyze is not the same as being able to analyze efficiently. To provide efficient analytics, an HTAP system would have to copy a large amount of non-transactional data to columnar storage, which is bound to be costly. It is better to copy the small amount of transactional data to an HSAP system at lower cost, which also better avoids any impact on the online transaction system.
Therefore, we believe that HTAP and HSAP will complement each other and lead the direction of database and big data respectively.
Challenges of HSAP
As a brand-new architecture, HSAP faces challenges very different from those of existing big data and traditional OLAP systems.
High concurrency mixed workload:
An HSAP system needs to handle query concurrency far beyond that of a traditional OLAP system.
In practice, the concurrency of data service queries goes far beyond that of OLAP queries. For example, we have seen data services that need to process tens of millions of queries per second, which is 5 orders of magnitude higher than the concurrency of OLAP queries. Data service queries also have more stringent latency requirements than OLAP queries. The greater challenge is that the system must handle very complex analytical queries while serving data service queries. These mixed query workloads make very different trade-offs between latency and throughput. Using system resources efficiently to handle such different queries while meeting the SLO of each one is a huge challenge.
High Throughput Real-Time Data Import:
An HSAP system also needs to support real-time writes of massive data while processing a highly concurrent query load. The amount of data written in real time far exceeds what traditional OLAP systems are required to handle. For example, the real-time recommendation scenario above continuously writes tens of millions or even hundreds of millions of events per second. Another difference from traditional OLAP systems is that an HSAP system has strict requirements for data freshness. Written data needs to be visible within seconds or even less than a second, so that serving and analytics results stay timely.
Flexibility and Scalability:
Both the write load and the query load may have sudden peaks, which places high demands on the flexibility and scalability of the system. In practice, we have observed that peak write load can reach 2.5 times the average and peak query load can reach 3 times the average. Moreover, the write and query peaks do not necessarily occur at the same time, so the system must be able to adjust rapidly to whichever peak it is facing.
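To make the effect of non-coinciding peaks concrete, the following Python sketch compares two provisioning strategies using the peak ratios quoted above (2.5x for writes, 3x for queries). Everything else is an assumption made for illustration: the function name `capacity_needed`, the average loads of 100 resource units each, the premise that one workload sits at its average while the other peaks, and the premise that resources can be shifted freely between the two.

```python
def capacity_needed(avg_write, avg_query, write_peak=2.5, query_peak=3.0):
    """Return (static, elastic) capacity in arbitrary resource units.

    static:  writes and queries each get their own fixed pool sized for
             their own peak.
    elastic: one shared pool sized for the worse of the two peaks, assuming
             the other workload is at its average at that moment
             (an illustrative assumption, not a measured property).
    """
    static = avg_write * write_peak + avg_query * query_peak
    elastic = max(
        avg_write * write_peak + avg_query,   # write peak, average queries
        avg_write + avg_query * query_peak,   # query peak, average writes
    )
    return static, elastic


# Hypothetical averages: 100 units for writes and 100 units for queries.
static_units, elastic_units = capacity_needed(100, 100)
savings = 1 - elastic_units / static_units
```

Under these assumptions the shared elastic pool needs 400 units instead of 550, roughly 27% less, which is the kind of benefit that rapid adjustment between peaks is meant to unlock.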
System Design of HSAP
In order to tackle these challenges, we believe that a typical HSAP system can adopt an architecture similar to the above figure.
Disaggregation of storage and computation:
All data is stored in a distributed file system. We scale the system via sharding. A Storage Manager manages these shards, and a Resource Manager manages the computing resources of the system so that it can meet the demands of high-throughput data writing and querying. This architecture scales quickly with changes in workload: computing resources are expanded when the query load grows, and storage resources are expanded when data volume increases rapidly. The separation of storage and computation ensures that these operations complete quickly without waiting for data to be moved or copied. This architecture greatly simplifies operation and maintenance and helps guarantee the stability of the system.
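The following Python sketch illustrates why scaling compute can be fast when storage is disaggregated: shards stay where they are in the shared file system, and only the mapping from shards to compute nodes changes. The function `assign_shards`, the round-robin policy, and the shard and node names are all hypothetical; the article does not describe the actual assignment strategy.

```python
def assign_shards(shard_ids, nodes):
    """Round-robin assignment of shards to compute nodes.

    Only this mapping (metadata) changes when nodes are added or removed;
    the shard data itself is assumed to live in a shared distributed file
    system and is never copied here.
    """
    if not nodes:
        raise ValueError("at least one compute node is required")
    assignment = {node: [] for node in nodes}
    for i, shard in enumerate(sorted(shard_ids)):
        assignment[nodes[i % len(nodes)]].append(shard)
    return assignment


shards = [f"shard-{i}" for i in range(8)]

# Normal load: two compute nodes serve four shards each.
before = assign_shards(shards, ["node-a", "node-b"])

# Query peak: scale out to four nodes; each now serves two shards.
after = assign_shards(shards, ["node-a", "node-b", "node-c", "node-d"])
```

A real system would likely prefer an assignment that minimizes how many shards change owners, but the point of the sketch is that the change is a metadata update rather than a data migration.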
Unified real-time storage:
To support various query patterns, a unified real-time storage layer is crucial. Queries can be broadly divided into two types: point lookup queries (mostly from data services) and complex analytical queries that scan large amounts of data (mostly from analytics). Of course, many queries fall in between. The two query types place different requirements on data storage. Row-based storage supports point lookups more efficiently, while columnar storage has obvious advantages for queries that scan a lot of data. A PAX-like layout is a compromise between row storage and columnar storage, but it comes at the cost of not achieving the best performance in either the point lookup or the scan scenario. We want the best in both scenarios, so the system supports both row storage and columnar storage, and users can select the storage format for each table according to its use case. For tables that have both requirements at the same time, we allow users to select both storage formats through an index abstraction, and the system keeps the two consistent through its index maintenance mechanism. In practice, we have found that the efficiency and flexibility of this design serve the business better.
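The idea of one table exposing both a row layout and a columnar layout that are kept consistent on every write can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not the system's actual storage engine: the class `DualFormatTable`, its methods, and the example column names are all hypothetical.

```python
class DualFormatTable:
    """Toy table that maintains a row store and a column store together."""

    def __init__(self, columns, key):
        self.columns = list(columns)
        self.key = key
        self.rows = {}                              # row store: key -> full record
        self.cols = {c: [] for c in self.columns}   # column store: name -> values
        self.pos = {}                               # key -> offset in the column store

    def upsert(self, record):
        k = record[self.key]
        # Update the row store (serves point lookups).
        self.rows[k] = dict(record)
        # Update the column store in the same call so both views stay consistent.
        if k in self.pos:
            i = self.pos[k]
            for c in self.columns:
                self.cols[c][i] = record[c]
        else:
            self.pos[k] = len(self.cols[self.key])
            for c in self.columns:
                self.cols[c].append(record[c])

    def get(self, k):
        """Point lookup: one dictionary access on the row store."""
        return self.rows.get(k)

    def column_sum(self, column):
        """Analytical scan: reads only the one column it needs."""
        return sum(self.cols[column])


items = DualFormatTable(["item_id", "clicks", "sales"], key="item_id")
items.upsert({"item_id": 1, "clicks": 10, "sales": 2})
items.upsert({"item_id": 2, "clicks": 5, "sales": 1})
items.upsert({"item_id": 1, "clicks": 12, "sales": 3})  # update an existing row
```

After these writes, `items.get(1)` returns the updated record from the row store and `items.column_sum("clicks")` reflects the same update through the column store, which is the consistency property the index maintenance mechanism is described as providing.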
Workload isolation:
The SLO (Service Level Objective) of the system under mixed workloads is guaranteed by scheduling. Ideally, a large query should be able to use all resources. When multiple queries run at the same time, they need to share resources fairly. Since service-oriented point lookup queries are usually simple and require few resources, fair scheduling keeps their latency within bounds even when complex analytical queries are running. In a distributed system, scheduling can be divided into distributed scheduling and in-process scheduling. The coordinator decomposes a query into multiple tasks, which are distributed to different processes, and it must adopt certain strategies to ensure fairness. Equally important, different tasks must share resources fairly within a process. Since the operating system does not understand the relationship between tasks, we have implemented a user-space scheduler in each process to support workload isolation more flexibly.
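The effect of fair sharing inside a process can be illustrated with a small round-robin simulation in Python. This is only a sketch of the general idea and not the scheduler described above: the function `run_round_robin`, the notion of a fixed-size "time slice", and the task sizes are assumptions chosen for illustration.

```python
from collections import deque


def run_round_robin(tasks):
    """Simulate fair time slicing among tasks in one process.

    tasks: list of (name, steps) pairs, where steps is a positive number of
    equal-sized time slices the task needs. Returns {name: tick at which
    the task finished}.
    """
    queue = deque(tasks)
    tick = 0
    finished_at = {}
    while queue:
        name, remaining = queue.popleft()
        tick += 1          # run this task for one time slice
        remaining -= 1
        if remaining <= 0:
            finished_at[name] = tick
        else:
            queue.append((name, remaining))  # back of the line
    return finished_at


# A heavy analytical query arrives first, then a small point lookup.
finished = run_round_robin([("analytical", 1000), ("lookup", 2)])
```

With fair slicing the lookup finishes at tick 4 instead of waiting for all 1000 slices of the analytical query, while the analytical query finishes at tick 1002, only two slices later than it would have alone.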
Openness of the system:
Many businesses already use other storage platforms or computing engines, so a new system must consider integration with existing systems. For queries that require high timeliness, integrating computing and storage brings obvious advantages. For offline computing without such timeliness requirements, however, the storage layer can provide a unified interface that opens up the data, which allows other engines to pull data out for processing and gives the business greater flexibility. The other side of openness is the ability to process data stored in other systems, which can be realized through federated queries.
Application of HSAP
Here we share how HSAP is applied in Alibaba’s search recommendation refined operation business. The following figure shows an example of the architecture before HSAP was adopted.
Original Search Recommendation Refined Operation Business Architecture
Previously, we could only meet the business requirements through the complex cooperation of a series of storage and computing engines (HBase, Druid, Hive, Drill, Redis, etc.), and the multiple storage systems had to be kept approximately in sync through data synchronization tasks. This kind of business architecture is extremely complex, and developing the whole system consumes a great deal of time.
Upgraded Search Recommendation Refined Operation Business Architecture
We upgraded this business with an HSAP system for Double 11 in 2019, and the new architecture was greatly simplified. Double 11 is a global shopping carnival where customers can buy commodities at massive discounts; purchases by Alibaba shoppers exceeded 268.4 billion yuan (USD $37.96 billion) during the 2019 event. The HSAP system supported a total of 145 million online queries, which in turn supported analysis and decision-making for very complex businesses. Behind these analyses, 130 million real-time records were written without generating redundant data.
User, commodity, and merchant attribute data, together with massive user behavior data, flow into the HSAP system from online and offline ETL. The HSAP system provides query and analytics services such as real-time data visualization, real-time reports, effect tracking, and real-time data applications. It supports better decisions through real-time data visualization, real-time sales forecasts, real-time inventory monitoring, real-time BI reports, real-time monitoring of business progress, monitoring of operational growth, and tracking of algorithm effects. Data products such as real-time labels, real-time portraits, competition analytics, customer targeting, product recommendation, and bonus distribution contribute to precise operation and decision-making. Real-time data services support algorithm control, inventory monitoring and early warning, and other services. A single HSAP system enables data sharing and reuse across all channels and all procedures, and it meets the analytics and query requirements of operators, product owners, algorithm owners, developers, analysts, and senior managers from their different business perspectives.
By providing unified real-time storage without requiring any data replication, the HSAP architecture offers a one-stop service for point lookup queries, OLAP analytics, online data services, and other diversified queries and services. This new architecture greatly reduces application complexity and enables us to respond quickly to new business requirements. Its second-level or even sub-second freshness makes decisions more prompt and efficient, which allows data to create greater commercial value.