Using make.names() in R

By: Karthik Janar

When doing data analysis, use consistent naming conventions for files, variables and, especially, column names. This matters for two reasons:

  • When merging multiple datasets, consistent column names let the merge match columns without manual renaming
  • As in any programming language, clean names without spaces or special characters are easier to type and to reference in code
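To illustrate the first point, here is a minimal sketch using two hypothetical data frames, `staff` and `pay`, whose key column is spelled "Emp Code" in one and "Emp.Code" in the other. The data frames and their values are invented for illustration. It relies on make.names(), the function this article covers, to bring both sets of names to the same form before calling merge().

```r
# Hypothetical data frames: the same key column, spelled two different ways.
staff <- data.frame(c(20001, 20002), c("Brian", "Jeff"))
names(staff) <- c("Emp Code", "First Name")

pay <- data.frame(c(20001, 20002), c(10000, 15000))
names(pay) <- c("Emp.Code", "Salary(SGD)")

# With the raw names, the two data frames share no column name,
# so merge() would have nothing to match on by default.
shared_before <- intersect(names(staff), names(pay))
shared_before  # character(0)

# After cleaning, both key columns are called "Emp.Code".
names(staff) <- make.names(names(staff))
names(pay)   <- make.names(names(pay))

# merge() joins on the columns the two data frames have in common.
merged <- merge(staff, pay)
names(merged)
```

Once both data frames use the same cleaned key name, merge() can pick it up without an explicit `by` argument.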

The make.names() function in R helps with the second point: it converts arbitrary strings into syntactically valid R names. To demonstrate it, let us use a simple data frame.

Create a simple employee data frame with four variables and four rows of values.

vcode <- c(20001,20002,20003,20004)
vFname <- c("Brian","Jeff","Roger","Karthik")
vLname <- c("Caffo","Leek","Peng","Janar")
vSal <- c(10000,15000,18000,20000)
emp <- data.frame(vcode,vFname,vLname,vSal)
str(emp)
## 'data.frame':    4 obs. of  4 variables:
##  $ vcode : num  20001 20002 20003 20004
##  $ vFname: Factor w/ 4 levels "Brian","Jeff",..: 1 2 4 3
##  $ vLname: Factor w/ 4 levels "Caffo","Janar",..: 1 3 4 2
##  $ vSal  : num  10000 15000 18000 20000

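As str() shows, the column names default to the names of the vectors we created earlier. (In R 4.0.0 and later, data.frame() no longer converts strings to factors by default, so the two name columns appear as chr rather than Factor.) Let us replace the default names with more descriptive ones. We have deliberately included spaces and brackets to show how make.names() converts them.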

names(emp) <- c("Code","First Name","Last Name", "Salary(SGD)")
str(emp)
## 'data.frame':    4 obs. of  4 variables:
##  $ Code       : num  20001 20002 20003 20004
##  $ First Name : Factor w/ 4 levels "Brian","Jeff",..: 1 2 4 3
##  $ Last Name  : Factor w/ 4 levels "Caffo","Janar",..: 1 3 4 2
##  $ Salary(SGD): num  10000 15000 18000 20000

Now let us call make.names() to clean the column names.

names(emp) <- make.names(names(emp))
emp
##    Code First.Name Last.Name Salary.SGD.
## 1 20001      Brian     Caffo       10000
## 2 20002       Jeff      Leek       15000
## 3 20003      Roger      Peng       18000
## 4 20004    Karthik     Janar       20000

The spaces and brackets have been replaced with dots, and every column name is now a syntactically valid R name. Make it a habit to clean the column names of every data frame you read from a file as a first step in data cleaning.
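Replacing invalid characters with dots is not the only thing make.names() does. The sketch below uses a hypothetical vector of raw names (not taken from the employee data above) to show three further behaviours: a name starting with a digit gets an "X" prefix, a reserved word gets a trailing dot, and duplicates are only disambiguated when unique = TRUE is passed.

```r
# Hypothetical raw column names, chosen to exercise different make.names() rules.
raw <- c("First Name", "2023 Sales", "if", "Salary(SGD)", "Salary(SGD)")

# Default call: invalid characters become ".", a leading digit gets an "X"
# prefix, and a reserved word gets a trailing ".". Duplicates are kept as-is.
clean <- make.names(raw)
clean
# "First.Name" "X2023.Sales" "if." "Salary.SGD." "Salary.SGD."

# unique = TRUE additionally appends a numeric suffix to repeated names.
clean_unique <- make.names(raw, unique = TRUE)
clean_unique
# "First.Name" "X2023.Sales" "if." "Salary.SGD." "Salary.SGD..1"
```

If a source file may contain repeated headers, unique = TRUE is worth considering, since duplicate column names make it ambiguous which column a name refers to.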


