Thursday, January 31, 2019

rank vs dense_rank vs row_number with partition

  • One of the most obvious and useful set of window functions are ranking functions where rows from your result set are ranked according to a certain scheme. 
  • There are three ranking functions: ROW_NUMBER() RANK() DENSE_RANK() 
DDL Scripts
Create table dup_employees
(
     id int,
     first_name nvarchar(50),
     last_name nvarchar(50),
     gender nvarchar(50),
     salary int
);
Insert into dup_employees values (1, 'Mark', 'Hastings', 'Male', 60000); Insert into dup_employees values (1, 'Mark', 'Hastings', 'Male', 60000); Insert into dup_employees values (1, 'Mark', 'Hastings', 'Male', 60000); Insert into dup_employees values (2, 'Mary', 'Lambeth', 'Female', 30000); Insert into dup_employees values (2, 'Mary', 'Lambeth', 'Female', 30000); Insert into dup_employees values (3, 'Ben', 'Hoskins', 'Male', 70000); Insert into dup_employees values (3, 'Ben', 'Hoskins', 'Male', 70000); Insert into dup_employees values (3, 'Ben', 'Hoskins', 'Male', 70000);



dup_employees table

+------+------------+-----------+--------+--------+
| id   | first_name | last_name | gender | salary |
+------+------------+-----------+--------+--------+
|    1 | Mark       | Hastings  | Male   |  60000 |
|    1 | Mark       | Hastings  | Male   |  60000 |
|    1 | Mark       | Hastings  | Male   |  60000 |
|    2 | Mary       | Lambeth   | Female |  30000 |
|    2 | Mary       | Lambeth   | Female |  30000 |
|    3 | Ben        | Hoskins   | Male   |  70000 |
|    3 | Ben        | Hoskins   | Male   |  70000 |
|    3 | Ben        | Hoskins   | Male   |  70000 |
+------+------------+-----------+--------+--------+

ROW_NUMBER()
Assigns unique numbers to each row within the PARTITION given the ORDER BY clause
Note that some SQL dialects (e.g. SQL Server) require an explicit ORDER BY clause in the OVER() clause:

RANK()
Behaves like ROW_NUMBER(), except that “equal” rows are ranked the same. If we substitute RANK() into our previous query

DENSE_RANK()
Trivially, DENSE_RANK() is a rank with no gaps, i.e. it is “dense”.

PARTITION
*** Would be meaningful when we have the same partition and order by column for ROW_NUMBER. When we use it with RANK() and DENSE_RANK() it would try to reset to previous and would not be meaningful

In the select query we are using ROW_NUMBER,RANK AND DENSE_RANK with and without PARTITION
select row_number() over (order by id) as rn ,
       row_number() over (partition by id) as rn_p,
       rank() over (order by id) as rnk ,
       dense_rank() over (order by id) as dnrk,
       rank() over (partition by id order by id) as p_rnk,
       dense_rank() over (partition by id order by id) p_dense_rnk ,
       first_name 
from dup_employees;

Output
+----+------+-----+-------+------+-------------+------------+
| rn | rn_p | rnk | p_rnk | dnrk | p_dense_rnk | first_name |
+----+------+-----+-------+------+-------------+------------+
|  1 |    1 |   1 |     1 |    1 |           1 | Mark       |
|  2 |    2 |   1 |     1 |    1 |           1 | Mark       |
|  3 |    3 |   1 |     1 |    1 |           1 | Mark       |
|  4 |    1 |   4 |     1 |    2 |           1 | Mary       |
|  5 |    2 |   4 |     1 |    2 |           1 | Mary       |
|  6 |    1 |   6 |     1 |    3 |           1 | Ben        |
|  7 |    2 |   6 |     1 |    3 |           1 | Ben        |
|  8 |    3 |   6 |     1 |    3 |           1 | Ben        |
+----+------+-----+-------+------+-------------+------------+

No comments:

Post a Comment