TiDB Statistics: Sync Load and Async Load

In my previous article, I introduced statistics in TiDB and how TiDB initializes them. However, there is an issue: even with comprehensive initialization, statistics may still be missing for columns that are not indexed. Additionally, after initialization, statistics may be evicted under memory pressure. In this article, I will introduce two methods to load statistics in TiDB on the fly: Sync Load and Async Load....

February 5, 2025 · 7 min · Rustin liu

TiDB Statistics: Understanding the Initialization Process

Statistics collection is a crucial process in modern database systems, forming the backbone of query optimization. In TiDB, statistics are indispensable, serving as the sole source of information for estimating query costs and selecting the most efficient execution plan. TiDB collects several types of statistics for each table, including TopN values (the most frequent values, reflecting data skew), histograms (data distribution), the Number of Distinct Values (NDV), and other statistical metrics. These statistics are stored in system tables, such as mysql....

February 5, 2025 · 11 min · Rustin liu

Batch Dumping Statistics Delta

Recently, we have been tackling the challenge of supporting 3 million tables within a single TiDB cluster. One of the most significant hurdles we’ve faced is optimizing the performance of statistics collection. In its current implementation, TiDB gathers basic table information from all servers and consolidates it into a single system table. While functional, this approach becomes highly inefficient when managing millions of tables, consuming excessive CPU and taking a considerable amount of time....

December 14, 2024 · 8 min · Rustin liu