
How to Rename Columns in the Pandas Python Library

Posted by AJ Welch

The Pandas Python library is an extremely powerful tool for data manipulation, analysis, and plotting. However, the power (and therefore complexity) of Pandas can be overwhelming, given the many functions, methods, and capabilities the library provides.

In this brief tutorial we’ll explore the basic use of the DataFrame in Pandas, which is the core data structure of the library, and how to use the index and column labels to keep track of the data within the DataFrame.

Creating a Basic DataFrame

For this tutorial, we need something to work with, so we’ll create a very simple data frame which consists of 3 book titles and author names:

import pandas as pd

pd.DataFrame(
  [
    (
      'The Hobbit',
      'J.R.R. Tolkien'
    ),
    (
      'Robinson Crusoe',
      'Daniel Defoe'
    ),
    (
      'Moby-Dick',
      'Herman Melville'
    )
  ]
)

Note: Throughout the tutorial the examples will include a great deal of excess spacing. This spacing is not required, but serves to better illustrate the syntax we’re using.

The result of the above DataFrame creation is a simple 3-row, 2-column table with automatically generated numeric indices and columns:

  0 1
0 The Hobbit J.R.R. Tolkien
1 Robinson Crusoe Daniel Defoe
2 Moby-Dick Herman Melville
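As a small sketch of where those automatic labels live, the DataFrame’s columns and index attributes can be inspected directly. The variable names default_columns and default_index are only illustrative.

```python
import pandas as pd

# Same three books as above, written compactly.
df = pd.DataFrame([
    ('The Hobbit', 'J.R.R. Tolkien'),
    ('Robinson Crusoe', 'Daniel Defoe'),
    ('Moby-Dick', 'Herman Melville'),
])

# With no labels supplied, Pandas numbers the columns and rows from 0.
default_columns = list(df.columns)  # [0, 1]
default_index = list(df.index)      # [0, 1, 2]

print(default_columns)
print(default_index)
print(df.shape)  # (rows, columns)
```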

Adding Columns and Indices

When initially creating a DataFrame, it is entirely possible to specify the column and index labels. To do so, we’ll need to specify values for the data, index and columns parameters:

pd.DataFrame(
  data=[
    (
      'The Hobbit',
      'J.R.R. Tolkien'
    ),
    (
      'Robinson Crusoe',
      'Daniel Defoe'
    ),
    (
      'Moby-Dick',
      'Herman Melville'
    )
  ],
  columns=[
    'title',
    'author'
  ],
  index=[
    'first',
    'second',
    'third'
  ]
)
  title author
first The Hobbit J.R.R. Tolkien
second Robinson Crusoe Daniel Defoe
third Moby-Dick Herman Melville

Now we see our data structure has some appropriate index and column labels that make a bit of sense. However, what happens when we have an existing DataFrame and we want to update the column labels on the fly?
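Before turning to that question, here is a brief sketch of why meaningful labels are useful: values can be looked up by name rather than by position. The example assumes the labeled DataFrame from above; the variable names author_of_second and titles are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ('The Hobbit', 'J.R.R. Tolkien'),
        ('Robinson Crusoe', 'Daniel Defoe'),
        ('Moby-Dick', 'Herman Melville'),
    ],
    columns=['title', 'author'],
    index=['first', 'second', 'third'],
)

# Look up a single cell by its index label and column label.
author_of_second = df.loc['second', 'author']

# Select a whole column by its label.
titles = df['title']

print(author_of_second)
print(list(titles))
```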

Modifying Column Labels

There are two ways to alter the column labels: assigning to the columns attribute and calling the rename method.

Using the Columns Attribute

If we already have a DataFrame, the simplest way to overwrite the column labels is to assign a new list of names to its columns attribute. The list must contain exactly one name per existing column.

For example, if we take our original DataFrame:

df = pd.DataFrame(
  [
    (
      'The Hobbit',
      'J.R.R. Tolkien'
    ),
    (
      'Robinson Crusoe',
      'Daniel Defoe'
    ),
    (
      'Moby-Dick',
      'Herman Melville'
    )
  ]
)
df
  0 1
0 The Hobbit J.R.R. Tolkien
1 Robinson Crusoe Daniel Defoe
2 Moby-Dick Herman Melville

We can modify the column labels by adding the following line:

df.columns = [
  'title',
  'author'
]
df
  title author
0 The Hobbit J.R.R. Tolkien
1 Robinson Crusoe Daniel Defoe
2 Moby-Dick Herman Melville
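One thing to watch for with this approach is that the new list has to match the number of columns. The sketch below assigns three names to a two-column DataFrame and catches the resulting error; the extra 'year' label and the mismatch_error variable are hypothetical and exist only for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    ('The Hobbit', 'J.R.R. Tolkien'),
    ('Robinson Crusoe', 'Daniel Defoe'),
    ('Moby-Dick', 'Herman Melville'),
])
df.columns = ['title', 'author']

# Three names for two columns: Pandas refuses the assignment.
mismatch_error = None
try:
    df.columns = ['title', 'author', 'year']  # 'year' is a made-up extra label
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
print(list(df.columns))  # labels are unchanged after the failed assignment
```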

Using the Rename Method

The other technique for renaming column labels is to call the rename method on the DataFrame object, passing a dictionary that maps existing labels to new labels as the columns parameter:

df = pd.DataFrame(
  [
    (
      'The Hobbit',
      'J.R.R. Tolkien'
    ),
    (
      'Robinson Crusoe',
      'Daniel Defoe'
    ),
    (
      'Moby-Dick',
      'Herman Melville'
    )
  ]
)
df.rename(
  columns={
    0 : 'title',
    1 : 'author'
  },
  inplace=True
)
df

  title author
0 The Hobbit J.R.R. Tolkien
1 Robinson Crusoe Daniel Defoe
2 Moby-Dick Herman Melville

It’s important to note that since the rename method renames existing labels, each entry in the dictionary must give the existing label as the key and the new label as the value, as shown in the example above.
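Because rename works from existing labels, it can also change just some of the columns and leave the rest alone. A minimal sketch, assuming a DataFrame that already has title and author labels; the replacement label 'writer' is a hypothetical name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ('The Hobbit', 'J.R.R. Tolkien'),
        ('Robinson Crusoe', 'Daniel Defoe'),
        ('Moby-Dick', 'Herman Melville'),
    ],
    columns=['title', 'author'],
)

# Only 'author' appears in the mapping, so 'title' keeps its label.
renamed = df.rename(columns={'author': 'writer'})

print(list(renamed.columns))
```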

Also, we pass True for the inplace parameter here because we want to update the existing DataFrame rather than have the call return a new DataFrame.
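To illustrate the difference, the sketch below calls rename both ways on the same data. The variable names mapping, renamed, labels_after_copy, and returned are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([
    ('The Hobbit', 'J.R.R. Tolkien'),
    ('Robinson Crusoe', 'Daniel Defoe'),
    ('Moby-Dick', 'Herman Melville'),
])
mapping = {0: 'title', 1: 'author'}

# Without inplace, rename returns a new DataFrame and leaves df untouched.
renamed = df.rename(columns=mapping)
labels_after_copy = list(df.columns)

# With inplace=True, df itself is modified and the call returns None.
returned = df.rename(columns=mapping, inplace=True)

print(labels_after_copy)
print(list(renamed.columns))
print(returned)
print(list(df.columns))
```

Assigning the returned DataFrame to a variable (for example df = df.rename(columns=mapping)) is an equally valid way to get the same end result.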