Using make.names() in R
By: Karthik Janar
When doing data analysis, it is good practice to use consistent naming conventions for files, variables and especially column names. This matters for two reasons:
- When merging multiple datasets, if the column names are consistent, the merge happens seamlessly
- As with naming conventions in any programming language, clean names without spaces or special characters are easier and safer to use in code.
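A minimal sketch of the first point, using two hypothetical tables (`emp` and `salary` and their values are made up for illustration). By default, merge() joins on every column name the two data frames share. If the key column is spelled differently in each, there is nothing to join on, and merge() silently returns every combination of rows (a Cartesian product). Cleaning both sets of names the same way with make.names() lets the keys line up again.

```r
# Hypothetical tables: the key column is "Emp.Code" in one and "Emp Code"
# (with a space, as it might arrive from a spreadsheet) in the other
emp    <- data.frame(Emp.Code = c(20001, 20002), Name = c("Brian", "Jeff"))
salary <- data.frame(`Emp Code` = c(20001, 20002), Salary = c(10000, 15000),
                     check.names = FALSE)

# No column name in common, so merge() falls back to a Cartesian product:
# every employee row is paired with every salary row (2 x 2 = 4 rows)
bad <- merge(emp, salary)
nrow(bad)

# After cleaning the names the same way, the key columns match and
# merge() joins on "Emp.Code" as intended (2 rows)
names(salary) <- make.names(names(salary))
good <- merge(emp, salary)
good
```

Passing by.x and by.y to merge() explicitly is the other way to handle mismatched key names.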
The make.names() function in R converts a character vector, such as a set of column names, into syntactically valid R names. To demonstrate the make.names() function, let us use a simple data frame.
Create a simple employee data frame with four variables and four rows of values.
vcode <- c(20001,20002,20003,20004)
vFname <- c("Brian","Jeff","Roger","Karthik")
vLname <- c("Caffo","Leek","Peng","Janar")
vSal <- c(10000,15000,18000,20000)
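# stringsAsFactors = TRUE reproduces the factor columns shown below; since R 4.0.0 the default is FALSE
emp <- data.frame(vcode,vFname,vLname,vSal,stringsAsFactors = TRUE)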
str(emp)
## 'data.frame': 4 obs. of 4 variables:
## $ vcode : num 20001 20002 20003 20004
## $ vFname: Factor w/ 4 levels "Brian","Jeff",..: 1 2 4 3
## $ vLname: Factor w/ 4 levels "Caffo","Janar",..: 1 3 4 2
## $ vSal : num 10000 15000 18000 20000
As you can see, str() shows the column names as the names of the vectors we created earlier. So let us first give the columns more descriptive names, as below. We have included spaces and brackets on purpose to show how make.names() converts them.
names(emp) <- c("Code","First Name","Last Name", "Salary(SGD)")
str(emp)
## 'data.frame': 4 obs. of 4 variables:
## $ Code : num 20001 20002 20003 20004
## $ First Name : Factor w/ 4 levels "Brian","Jeff",..: 1 2 4 3
## $ Last Name : Factor w/ 4 levels "Caffo","Janar",..: 1 3 4 2
## $ Salary(SGD): num 10000 15000 18000 20000
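Names like "First Name" and "Salary(SGD)" are allowed in a data frame, but they are not syntactically valid R names, so every reference to them has to be wrapped in backticks. A small sketch that re-creates two of the columns above (only two rows, for brevity):

```r
# Two columns with non-syntactic names; check.names = FALSE keeps them as typed
emp <- data.frame(`First Name` = c("Brian", "Jeff"),
                  `Salary(SGD)` = c(10000, 15000),
                  check.names = FALSE, stringsAsFactors = FALSE)

# emp$First Name would be a syntax error; backticks are required instead
emp$`First Name`
mean(emp$`Salary(SGD)`)

# Comparing each name with its make.names() version shows which are not valid
names(emp) == make.names(names(emp))
```

Both comparisons in the last line are FALSE, which is a quick way to spot columns that need cleaning.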
Now let us call make.names() to clean the column names.
names(emp) <- make.names(names(emp))
emp
## Code First.Name Last.Name Salary.SGD.
## 1 20001 Brian Caffo 10000
## 2 20002 Jeff Leek 15000
## 3 20003 Roger Peng 18000
## 4 20004 Karthik Janar 20000
The spaces and brackets have been replaced with dots, and the names are now syntactically valid, so they can be used directly, for example emp$First.Name. Make it a habit to clean the column names of every data frame you read from a file source as the first step of data cleaning.
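When the data comes from a CSV file, read.csv() already passes the header through make.names() by default via its check.names = TRUE argument, which also makes duplicate names unique. The sketch below uses a made-up CSV string through the text argument so that it runs without a file; pass check.names = FALSE to keep the raw names and clean them yourself later.

```r
# Made-up CSV content with a space, a leading digit and brackets in the header
csv <- "Emp Code,1st Name,Salary(SGD)
20001,Brian,10000
20002,Jeff,15000"

# Keep the header exactly as written in the file
raw <- read.csv(text = csv, check.names = FALSE)
names(raw)   # "Emp Code" "1st Name" "Salary(SGD)"

# Default behaviour: the header is passed through make.names()
clean <- read.csv(text = csv)
names(clean) # "Emp.Code" "X1st.Name" "Salary.SGD."

# Cleaning the raw names by hand gives the same result
identical(make.names(names(raw), unique = TRUE), names(clean))  # TRUE

# Other rules: reserved words get a trailing dot, duplicates get a suffix
make.names(c("if", "Code", "Code"), unique = TRUE)  # "if." "Code" "Code.1"
```

Note the "X" prepended to "1st Name": a valid R name cannot start with a digit.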