How to Sort a Data Frame by Multiple Columns in R
To begin understanding how to properly sort data frames in
R, we of course must first generate a data frame to manipulate.
# run.R
# Generate data frame
dataframe <- data.frame(
  x = c("apple", "orange", "banana", "strawberry"),
  y = c("a", "d", "b", "c"),
  z = c(4:1)
)

# Print data frame
dataframe
Note: The spacing isn't necessary, but it improves legibility.
The run.R script outputs the list of vectors in our data frame as expected, in the order they were entered:
$ Rscript run.R
           x y z
1      apple a 4
2     orange d 3
3     banana b 2
4 strawberry c 1
The Order Function
While perhaps not the easiest sorting method to type out in terms of syntax, the one most readily available in every installation of R, because it is part of the base package, is the order() function. The order() function accepts a number of arguments, but at the simplest level the first argument must be a sequence of values or logical vectors. It returns the positions that would put those values in sorted order, rather than the sorted values themselves.
For example, we can use
order() to simply sort a vector of five randomly ordered numbers with this script:
# Create unordered vector
vector <- c(2, 5, 1, 3, 4)

# Print vector
vector

# Sort in ascending order
vector[order(vector)]
Executing the script, we see the initial output of the unordered vector, followed by the ordered vector:
$ Rscript run.R
[1] 2 5 1 3 4
[1] 1 2 3 4 5
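To make the indexing step more concrete, the following minimal sketch prints what order() returns on its own before the result is used as an index. The variable name idx is introduced here purely for illustration, and decreasing = TRUE is shown as one way to reverse the direction.

```r
# Same vector as in the example above
vector <- c(2, 5, 1, 3, 4)

# order() returns positions, not values:
# the position of the smallest value first, then the next smallest, and so on
idx <- order(vector)
print(idx)

# Indexing the vector with those positions yields the sorted values
print(vector[idx])

# decreasing = TRUE reverses the direction of the sort
print(vector[order(vector, decreasing = TRUE)])
```

Because order() works with positions, the same index can be reused to rearrange any other object of the same length, which is what makes it suitable for sorting the rows of a data frame.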
Sorting a Data Frame by Vector Name
With the order() function in our tool belt, we'll start sorting our data frame by passing in the vector names within the data frame.
For example, using our previously generated
dataframe object, we can sort by the vector
z by adding the following code to our script:
# Sort by vector name [z]
dataframe[
  with(dataframe, order(z)),
]
What we're effectively doing is calling our original
dataframe object, and passing in the new index order that we'd like to have. This index order is generated using the
with() function, which effectively creates a new environment using the passed in data in the first argument along with an expression for evaluating that data in the second argument.
Thus, we're reevaluating the
dataframe data using the
order() function, and we want to order based on the
z vector within that data frame. This returns a new index order for the data frame values, which is then finally evaluated within the [brackets] of
dataframe, outputting our new ordered result.
$ Rscript run.R
           x y z
1      apple a 4
2     orange d 3
3     banana b 2
4 strawberry c 1
           x y z
4 strawberry c 1
3     banana b 2
2     orange d 3
1      apple a 4
Consequently, we see our original unordered output, followed by a second output with the data sorted by column z.
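As a point of comparison, the with() call can be replaced by referring to the column directly with the $ operator. The sketch below rebuilds the same data frame and checks that both forms give the same result; the names sorted_with and sorted_dollar are illustrative only.

```r
# Rebuild the example data frame so this snippet runs on its own
dataframe <- data.frame(
  x = c("apple", "orange", "banana", "strawberry"),
  y = c("a", "d", "b", "c"),
  z = 4:1
)

# Using with(): z is looked up inside the data frame
sorted_with <- dataframe[with(dataframe, order(z)), ]

# Using $: the column is referenced explicitly
sorted_dollar <- dataframe[order(dataframe$z), ]

# Both approaches should produce the same rows in the same order
print(identical(sorted_with, sorted_dollar))
print(sorted_dollar)
```

The with() form tends to be shorter when several columns are involved, since the data frame name is written only once.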
Sorting by Column Index
Similar to the above method, it's also possible to sort based on the numeric
index of a column in the data frame, rather than the specific name.
Instead of using the with() function, we can pass the result of order() directly into the brackets of dataframe. We indicate that we want to sort by the column at index 1 by using the dataframe[,1] syntax, which causes R to return the values of that column as a vector. In other words, just as when we passed in the z vector name above, order() sorts based on the values within the column at index 1:
dataframe[
  order( dataframe[,1] ),
]
As expected, we get our normal output followed by the sorted output in the first column:
$ Rscript run.R
           x y z
1      apple a 4
2     orange d 3
3     banana b 2
4 strawberry c 1
           x y z
1      apple a 4
3     banana b 2
2     orange d 3
4 strawberry c 1
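The sketch below looks at what dataframe[, 1] actually hands to order(), and shows one way to reverse the sort on that column. The names first_column and descending are illustrative and not part of the original script.

```r
# Rebuild the example data frame so this snippet runs on its own
dataframe <- data.frame(
  x = c("apple", "orange", "banana", "strawberry"),
  y = c("a", "d", "b", "c"),
  z = 4:1
)

# With the row position left empty, dataframe[, 1] returns the first
# column as a plain vector rather than a one-column data frame
first_column <- dataframe[, 1]
print(first_column)

# dataframe[[1]] selects the same column
print(identical(first_column, dataframe[[1]]))

# decreasing = TRUE sorts the first column in reverse alphabetical order
descending <- dataframe[order(dataframe[, 1], decreasing = TRUE), ]
print(descending)
```

Sorting by index is convenient for quick work, although it silently targets a different column if the column order of the data frame ever changes.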
Sorting by Multiple Columns
In some cases, we may want to sort by multiple columns. Thankfully, doing so is very simple with the previously described methods.
To sort by multiple columns using vector names, add additional arguments to the order() function call. Each additional argument is used only to break ties left by the arguments before it:
# Sort by vector name [z] then [x]
dataframe[
  with(dataframe, order(z, x)),
]
Similarly, to sort by multiple columns based on column index, add additional arguments to
order() with differing indices:
# Sort by column index [1] then [3]
dataframe[
  order( dataframe[,1], dataframe[,3] ),
]
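Since the example data frame has no repeated values, the effect of a second sort column is hard to see there. The following sketch uses a hypothetical data frame, fruit, whose column names and values are invented for illustration and whose group column contains ties. It also shows one common way to mix directions by negating a numeric column.

```r
# Hypothetical data with repeated values in `group`
fruit <- data.frame(
  name  = c("apple", "orange", "banana", "strawberry", "cherry"),
  group = c("b", "a", "b", "a", "a"),
  score = c(3, 1, 5, 4, 2)
)

# group ascending; ties within a group are broken by score ascending
asc <- fruit[order(fruit$group, fruit$score), ]
print(asc)

# group ascending; ties broken by score descending
# (negating works here because score is numeric)
mixed <- fruit[order(fruit$group, -fruit$score), ]
print(mixed)
```

The negation trick applies only to numeric columns; for character columns a different approach is needed to reverse the direction of a single key.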