Why is my primary key not used? How can I check?
Covers a common reason why a primary key is not used in ordering and how we can confirm
Checking your Primary Key
Users may see cases where their query is slower than expected, in the belief they are ordering or filtering by a primary key. In this article we show how users can confirm the key is used, highlighting common reasons its not.
Create table
Consider the following simple table:
CREATE TABLE logs
(
`code` LowCardinality(String),
`timestamp` DateTime64(3)
)
ENGINE = MergeTree
ORDER BY (code, toUnixTimestamp(timestamp))
Note how our ordering key includes toUnixTimestamp(timestamp)
as the second entry.
Populate data
Populate this table with 100m rows:
INSERT INTO logs SELECT
['200', '404', '502', '403'][toInt32(randBinomial(4, 0.1)) + 1] AS code,
now() + toIntervalMinute(number) AS timestamp
FROM numbers(100000000)
0 rows in set. Elapsed: 15.845 sec. Processed 100.00 million rows, 800.00 MB (6.31 million rows/s., 50.49 MB/s.)
SELECT count()
FROM logs
┌───count()─┐
│ 100000000 │ -- 100.00 million
└───────────┘
1 row in set. Elapsed: 0.002 sec.
Basic filtering
If we filter by code we can see the number of rows scanned in the output. - 49.15 thousand
. Notice how this is a subset of the total 100m rows.
SELECT count() AS c
FROM logs
WHERE code = '200'
┌────────c─┐
│ 65607542 │ -- 65.61 million
└──────────┘
1 row in set. Elapsed: 0.021 sec. Processed 49.15 thousand rows, 49.17 KB (2.34 million rows/s., 2.34 MB/s.)
Peak memory usage: 92.70 KiB.
Furthermore, we can confirm the use of the index with the EXPLAIN indexes=1
clause:
EXPLAIN indexes = 1
SELECT count() AS c
FROM logs
WHERE code = '200'
┌─explain────────────────────────────────────────────────────────────┐