SQLite CUME_DIST() Function

Summary: in this tutorial, you will learn how to use the SQLite CUME_DIST() function to calculate the cumulative distribution of a value within a group of values.

Introduction to SQLite CUME_DIST() Function

The CUME_DIST() is a window function that returns the cumulative distribution of a value relative to the values in the group.

Here’s the syntax of the CUME_DIST() function:

CUME_DIST() OVER (
  [PARTITION BY partition_expression]
  [ORDER BY order_list]
)Code language: SQL (Structured Query Language) (sql)

In this syntax:

  • The PARTITION BY clause specifies how the rows are grouped into partitions to which the CUME_DIST() function applies. If you skip the PARTITION BY clause, the function treats the whole result set as a single partition.
  • The ORDER BY clause specifies the order of rows in each partition to which the CUME_DIST() function applies. If you omit the ORDER BY clause, the function returns 1 for all rows.

Suppose N is the value of the current row of the column specified in the ORDER BY clause and the order of rows is from low to high, the cumulative distribution of a value is calculated using the following formula:

The number of rows with values <= N / The number of rows in the window or partition
Code language: SQL (Structured Query Language) (sql)

The return value of the CUME_DIST() function is greater than 0 and less than or equal to 1:

0 < CUME_DIST() <= 1Code language: SQL (Structured Query Language) (sql)

The rows with the same values receive the same result.

SQLite CUME_DIST() function example

First, create a new table named products for the demonstration:

CREATE TABLE products(
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    color TEXT NOT NULL,
    price REAL NOT NULL 
);Code language: SQL (Structured Query Language) (sql)

Second, insert some rows into the products table:

INSERT INTO products
   (name, color, price)
VALUES
   ('A', 'red', 100),
   ('B', 'red', 200),
   ('C', 'red', 200),
   ('D', 'black', 300),
   ('E', 'black', 400),
   ('F', 'white', 500);Code language: SQL (Structured Query Language) (sql)

Third, query data from the products table:

SELECT * FROM products;Code language: SQL (Structured Query Language) (sql)

Output:

id | name | color | price
---+------+-------+------
1  | A    | red   | 100.0
2  | B    | red   | 200.0
3  | C    | red   | 200.0
4  | D    | black | 300.0
5  | E    | black | 400.0
6  | F    | white | 500.0

(6 rows)

Fourth, calculate the cumulative distribution of the price in the products table:

SELECT
  name,
  CUME_DIST() OVER (
    ORDER BY
      price
  ) PriceCumulativeDistribution
FROM
  products;Code language: SQL (Structured Query Language) (sql)

Here is the output:

name | PriceCumulativeDistribution
-----+----------------------------
A    | 0.16666666666666666
B    | 0.5
C    | 0.5
D    | 0.6666666666666666
E    | 0.8333333333333334
F    | 1.0

(6 rows)

Because we skipped the PARTITION BY clause, the function treated the whole result set as a single partition. Therefore, the number of rows to be evaluated is 6.

The following example uses the CUME_DIST() function to calculate the cumulative distribution of prices partitioned by colors:

SELECT
  name,
  color,
  price,
  CUME_DIST() OVER (
    PARTITION BY color
    ORDER BY price
  ) PriceCumulativeDistribution
FROM
  products
ORDER BY color;Code language: SQL (Structured Query Language) (sql)

Output:

name | color | price | PriceCumulativeDistribution
-----+-------+-------+----------------------------
D    | black | 300.0 | 0.5
E    | black | 400.0 | 1.0
A    | red   | 100.0 | 0.3333333333333333
B    | red   | 200.0 | 1.0
C    | red   | 200.0 | 1.0
F    | white | 500.0 | 1.0

(6 rows)

Summary

  • Use the CUME_DIST() function to calculate the cumulative distribution of value in a set of values.
Was this tutorial helpful ?