Summary: in this tutorial, you will learn how to use the SQLite CUME_DIST() function to calculate the cumulative distribution of a value within a group of values.
Introduction to SQLite CUME_DIST() Function
CUME_DIST() is a window function that returns the cumulative distribution of a value relative to the values in the group.
Here’s the syntax of the CUME_DIST() function:
CUME_DIST() OVER (
    [PARTITION BY partition_expression]
    [ORDER BY order_list]
)
In this syntax:
- The PARTITION BY clause specifies how the rows are grouped into partitions to which the CUME_DIST() function applies. If you skip the PARTITION BY clause, the function treats the whole result set as a single partition.
- The ORDER BY clause specifies the order of rows in each partition to which the CUME_DIST() function applies. If you omit the ORDER BY clause, the function returns 1 for all rows.
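The effect of omitting the ORDER BY clause can be seen in a small sketch. The inline table sample_values and the column aliases are hypothetical names used only for illustration, and the query assumes SQLite 3.25.0 or later, the first release with window functions.

```sql
-- Hypothetical inline data: three distinct values, no real table needed.
WITH sample_values(v) AS (
    VALUES (10), (20), (30)
)
SELECT
    v,
    -- No ORDER BY: every row is a peer of every other row, so each gets 1.0.
    CUME_DIST() OVER () AS without_order,
    -- With ORDER BY: the result grows from 1/3 up to 1.0.
    CUME_DIST() OVER (ORDER BY v) AS with_order
FROM sample_values
ORDER BY v;
```

The without_order column should be 1.0 on every row, while with_order increases with v.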
Suppose N is the value in the current row of the column specified in the ORDER BY clause, and the rows are ordered from low to high. The cumulative distribution of N is then calculated using the following formula:
(number of rows with values <= N) / (number of rows in the window or partition)
The return value of the CUME_DIST() function is greater than 0 and less than or equal to 1:
0 < CUME_DIST() <= 1
Rows with the same value in the ORDER BY column receive the same result.
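The formula can be checked by computing it by hand next to the built-in function. The following is a minimal sketch, not a recommended way to compute the value; the inline table scores and all aliases are hypothetical, and SQLite 3.25.0 or later is assumed.

```sql
-- Hypothetical inline data with one tie (20 appears twice).
WITH scores(v) AS (
    VALUES (10), (20), (20), (30)
)
SELECT
    cur.v,
    CUME_DIST() OVER (ORDER BY cur.v) AS cume_dist_builtin,
    -- Manual formula: rows with a value <= the current value,
    -- divided by all rows; * 1.0 forces real division.
    (SELECT COUNT(*) FROM scores AS s WHERE s.v <= cur.v) * 1.0
        / (SELECT COUNT(*) FROM scores) AS cume_dist_manual
FROM scores AS cur
ORDER BY cur.v;
```

Both columns should agree on every row: 0.25 for 10, 0.75 for both rows with 20, and 1.0 for 30. The two tied rows share a result because each counts the other as a row with a value less than or equal to its own.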
SQLite CUME_DIST() function example
First, create a new table named products for the demonstration:
CREATE TABLE products(
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    color TEXT NOT NULL,
    price REAL NOT NULL
);
Second, insert some rows into the products table:
INSERT INTO products
    (name, color, price)
VALUES
    ('A', 'red', 100),
    ('B', 'red', 200),
    ('C', 'red', 200),
    ('D', 'black', 300),
    ('E', 'black', 400),
    ('F', 'white', 500);
Third, query data from the products table:
SELECT * FROM products;
Output:
id | name | color | price
---+------+-------+------
1 | A | red | 100.0
2 | B | red | 200.0
3 | C | red | 200.0
4 | D | black | 300.0
5 | E | black | 400.0
6 | F | white | 500.0
(6 rows)
Fourth, calculate the cumulative distribution of the price in the products table:
SELECT
    name,
    CUME_DIST() OVER (
        ORDER BY price
    ) PriceCumulativeDistribution
FROM
    products
-- Sort the result explicitly so the row order is deterministic.
ORDER BY
    price,
    name;
Here is the output:
name | PriceCumulativeDistribution
-----+----------------------------
A | 0.16666666666666666
B | 0.5
C | 0.5
D | 0.6666666666666666
E | 0.8333333333333334
F | 1.0
(6 rows)
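Because the query omits the PARTITION BY clause, the function treats the whole result set as a single partition, so the number of rows evaluated is 6. For example, products B and C both cost 200, and three of the six rows have a price less than or equal to 200, so both receive 3 / 6 = 0.5.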
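To make the arithmetic behind each row visible, the numerator and denominator can be shown next to the result. This is an illustrative sketch that assumes the products table created above; the aliases rows_le_price, total_rows, and price_cume_dist are made up for this example.

```sql
SELECT
    name,
    price,
    -- With ORDER BY and no explicit frame, the window covers all rows up to
    -- and including the current row's peers, i.e. rows with price <= this price.
    COUNT(*) OVER (ORDER BY price) AS rows_le_price,
    -- An empty OVER () spans the whole result set: 6 rows here.
    COUNT(*) OVER () AS total_rows,
    CUME_DIST() OVER (ORDER BY price) AS price_cume_dist
FROM products
ORDER BY price, name;
```

For every row, price_cume_dist should equal rows_le_price divided by total_rows, for example 1 / 6 for product A and 3 / 6 for products B and C.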
The following example uses the CUME_DIST() function to calculate the cumulative distribution of prices partitioned by color:
SELECT
    name,
    color,
    price,
    CUME_DIST() OVER (
        PARTITION BY color
        ORDER BY price
    ) PriceCumulativeDistribution
FROM
    products
-- price and name break ties so the row order within a color is deterministic.
ORDER BY
    color,
    price,
    name;
Output:
name | color | price | PriceCumulativeDistribution
-----+-------+-------+----------------------------
D | black | 300.0 | 0.5
E | black | 400.0 | 1.0
A | red | 100.0 | 0.3333333333333333
B | red | 200.0 | 1.0
C | red | 200.0 | 1.0
F | white | 500.0 | 1.0
(6 rows)
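A common follow-up is filtering on the result. SQLite does not allow window functions directly in a WHERE clause, so the call has to sit in a subquery or CTE. The sketch below assumes the products table created above; the alias price_cume_dist and the 0.5 threshold are arbitrary choices for illustration.

```sql
SELECT
    name,
    color,
    price,
    price_cume_dist
FROM (
    -- Compute the per-color cumulative distribution first...
    SELECT
        name,
        color,
        price,
        CUME_DIST() OVER (
            PARTITION BY color
            ORDER BY price
        ) AS price_cume_dist
    FROM products
)
-- ...then keep only rows in the lower half of their color's price range.
WHERE price_cume_dist <= 0.5
ORDER BY color, price;
```

With the sample data, this should return product D (black, 0.5) and product A (red, about 0.33). The single white product is excluded because the only row in a partition always has a cumulative distribution of 1.0.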
Summary
- Use the CUME_DIST() function to calculate the cumulative distribution of a value in a set of values.