How to Use ROW_NUMBER() in DB2

Data Tutorial

IBM DB2 provides a number of flexible and powerful functions that save the effort of manually iterating through data or performing other menial tasks when running queries. This group of functions are known as online analytical processing or OLAP functions.

In this tutorial we’ll explore the ROW_NUMBER() OLAP function in particular to provide an overview of how this function can be utilized in day-to-day scenarios.

What is ROW_NUMBER()?

ROW_NUMBER() is a function that generates a psuedo-column containing consecutive numbers starting from 1 and counting up for each row of returned results (hence the name of ROW_NUMBER()).

To see ROW_NUMBER() in action, let’s take a look at some tables in our LIBRARIAN schema: books and languages.

SELECT
    books."id",
    TRIM(books."title") title,
    TRIM(books."author") author,
    books."year_published",
    books."language_id"
FROM
    books;

With result rows looking like this:

id  TITLE                       AUTHOR                  year_published  language_id
 1   In Search of Lost Time     Marcel Proust                     1913            2
 2   Ulysses                    James Joyce                       1922            1
 3   Don Quixote                Miguel de Cervantes               1605            5
 4   Moby Dick                  Herman Melville                   1851            1
 5   Hamlet                     William Shakespeare               1601            1
 6   War and Peace              Leo Tolstoy                       1869            8
 7   The Odyssey                Homer                             -700            9
 8   The Great Gatsby           F. Scott Fitzgerald               1925            1
 9   The Divine Comedy          Dante Alighieri                   1472           10
10   Madame Bovary              Gustave Flaubert                  1857            2
SELECT * FROM languages;

With a simple list of language names:

id  name
--  -------
 1  English
 2  French
 3  German
 4  Mandarin
 5  Spanish
 6  Arabic
 7  Japanese
 8  Russian

Now to make use of ROW_NUMBER(), the simplest way to begin is to throw it in as another SELECT column. To keep things tidy, we’ll only return a few standard columns from our books table:

SELECT
    books."id",
    TRIM(books."title") title,
    ROW_NUMBER() OVER () rownumber
FROM
    books;
id    TITLE                     ROWNUMBER
 1    In Search of Lost Time            1
 2    Ulysses                           2
 3    Don Quixote                       3
 4    Moby Dick                         4
 5    Hamlet                            5
 6    War and Peace                     6
 7    The Odyssey                       7
 8    The Great Gatsby                  8

We can see by the results that, as expected, our addition of the ROW_NUMBER() function is simply adding a consecutive number to our result rows (in this case, they match up with the id field as well).

Using the Order By Clause

Just adding a consecutive number to each row can have its uses, but typically you’ll require more of the functionality provided by ROW_NUMBER. You may have noticed that in our query above, ROW_NUMBER() is followed by an empty OVER() function call. It is inside these parentheses of OVER() that you can specify both PARTITION and ORDER BY clauses, allowing you to effectively group and/or order the resulting data before assigning it a consecutive number from the ROW_NUMBER() function.

For example, we can specify within the OVER() call that we want to ORDER BY our title field, like so:

SELECT
    ROW_NUMBER() OVER (ORDER BY books."title") AS RowNumber,
    books."title"
FROM
    books;

The results, as one might expect, are simply the list of books in alphabetical order with consecutive row number values associated:

ROWNUMBER   title
1           2001: A Space Odyssey
2           2666
3           A Bell for Adano
4           A Bend in the River
5           A Book of Luminous Things
6           A Canticle for Leibowitz
7           A Christmas Carol
8           A Clockwork Orange

Granted, this isn’t very useful in and of itself, since we’re not making use of the ROW_NUMBER() counting and the same thing could be accomplished without with a standard ORDER BY clause. However, we can really start to glean the power of ROW_NUMBER() when we throw in the final clause into the mix, PARTITION BY.

Using the Partition By Clause

As briefly mentioned earlier, in addition to the ORDER BY clause used for ordering, we can also add the PARTITION BY clause which acts similarly to the GROUP BY clauses of other relational database systems.

In effect, this allows us to group up our result set based on the values in a particular field, and then we can proceed with the ORDER BY and incremental number assignment using ROW_NUMBER().

As an example of the power of this functionality, take a look at the following query:

SELECT
    ROW_NUMBER() OVER (PARTITION BY books."language_id" ORDER BY books."title") AS RowNumber,
    languages."name",
    books."title"
FROM
    books
LEFT OUTER JOIN
    languages
ON
    books."language_id" = languages."id"
ORDER BY
    languages."name" DESC;

There’s a lot going on here, but the basic explanation is that we want to retrieve the written languages.name value associated with each book, so rather than seeing the languages.id number that represents it, we can instead see that War and Peace was published in Russian and so on. Thus – ignoring the ROW_NUMBER() selection clause for now – the query is simply an OUTER JOIN of our books and languages table and displaying the languages.name and books.title of each book in the system, sorted in reverse alphabetical order by languages.name.

Now let’s examine what our ROW_NUMBER() select clause is accomplishing:

ROW_NUMBER() OVER (PARTITION BY books."language_id" ORDER BY books."title")

By adding the PARTITION BY sub-clause, we’re telling DB2 that we want to partition (or group) our resulting data based on matches in the books.language_id field. As we’ll see in a moment when we examine the resulting data, these partitions are considered separate chunks of data to the ROW_NUMBER() function, which means the consecutive row number counter will in fact reset itself at the beginning of each chunk of partitioned data.

The ORDER BY sub-clause will, as expected, ensure that the ordering of each partitioned chunk is based on the books.title field. Thus, all books written in English are grouped up and ordered by title, then all the books in Spanish and so on.

With this purpose in mind, now we can look at the results of this query:

ROWNUMBER   name                        title
        1   Spanish                     Don Quixote
        1   Russian                     The Brothers Karamazov
        2   Russian                     War and Peace
        1   Italian                     The Divine Comedy
        1   Greek                       The Odyssey
        1   French                      In Search of Lost Time
        2   French                      Madame Bovary
        1   English                     2001: A Space Odyssey
        2   English                     2666
        3   English                     A Bend in the River

Note: For this example database, only a handful of books were actually assigned an appropriate language, in order to better illustrate the results.

Immediately we can see some interesting behavior due to our PARTITION BY and ORDER BY sub-clauses in the ROW_NUMBER() function. Since the parent query was told to ORDER BY languages.name in descending order, we begin with the first row of Don Quixote which was published in Spanish. Notice that this first result row has a ROWNUMBER of 1 as expected, but then the next row also has a ROWNUMBER of 1.

This behavior is due to the power of PARTITION BY. Since we partitioned (grouped) our resulting data based on the language of the books, we’ve effectively created six separate partitions within the dataset (one for each language). Due to this partitioning, DB2 counts up each ROW_NUMBER() in a given partition as expected, but as soon as a new partition begins, the ROW_NUMBER() is reset back to 1 and the counting begins anew.

While this just begins to scratch the surface of what ROW_NUMBER() can accomplish, this has hopefully given you a bit of insight into the types of queries and partionings that can be generated with clever use of this particular OLAP function.