[R] Efficient way to create new column based on comparison with another dataframe

Gaius Augustus gaiusjaugustus at gmail.com
Fri Jan 29 19:52:06 CET 2016


I have two data frames. One has chromosome arm information, and the other
has SNP position information. I am trying to assign each SNP an arm
identity. I'd like to create this new column by comparing each SNP's
position with the reference file.

*1) Mapfile (has millions of rows)*

Name    Chr   Position
S1      1      3000
S2      1      6000
S3      1      1000

*2) Chr.Arms file (has 39 rows)*

Chr    Arm    Start   End
1      p      0       5000
1      q      5001    10000
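To make the question reproducible, the two example tables above can be recreated as data frames. Only the three SNPs and two arms shown are included; the real files are much larger. stringsAsFactors = FALSE is an assumption, added so Name and Arm stay character columns instead of factors.

```r
# Example Mapfile: one row per SNP
Mapfile <- data.frame(Name = c("S1", "S2", "S3"),
                      Chr = c(1, 1, 1),
                      Position = c(3000, 6000, 1000),
                      stringsAsFactors = FALSE)
# Example Chr.Arms: one row per chromosome arm, with Start/End coordinates
Chr.Arms <- data.frame(Chr = c(1, 1),
                       Arm = c("p", "q"),
                       Start = c(0, 5001),
                       End = c(5000, 10000),
                       stringsAsFactors = FALSE)
str(Mapfile)
str(Chr.Arms)
```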


*R Script that works, but slow:*
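# Preallocate the result rather than growing it inside the loop
Arms <- rep(NA_character_, nrow(Mapfile))
for (line in seq_len(nrow(Mapfile))) {
  # Arm on the same Chr whose Start..End range (inclusive) contains Position
  hit <- Chr.Arms$Arm[Mapfile$Chr[line] == Chr.Arms$Chr &
                      Mapfile$Position[line] >= Chr.Arms$Start &
                      Mapfile$Position[line] <= Chr.Arms$End]
  # SNPs outside every range stay NA instead of raising an error
  if (length(hit) > 0) Arms[line] <- as.character(hit[1])
}
Mapfile$Arm <- Arms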


*Output Table:*

Name   Chr   Position   Arm
S1      1     3000      p
S2      1     6000      q
S3      1     1000      p


In words: for each row of Mapfile I want to 1) find the Chr.Arms rows with
the same Chr, 2) among those, find the row where START <= POSITION <= END,
and 3) take its ARM value and place it in a new column.
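The steps above can be turned around so that the loop runs over the 39 arm rows instead of the millions of SNP rows. Each iteration then does one vectorized comparison across all SNPs. This is a minimal sketch that assumes the column names shown above, numeric or character Chr columns, inclusive Start/End bounds, and non-overlapping arms. assign_arms_by_row is a hypothetical helper name.

```r
# Hypothetical helper: label each SNP with the arm whose [Start, End] range contains it.
# Loops over the (few) arm rows rather than the (many) SNP rows.
assign_arms_by_row <- function(map, arms) {
  result <- rep(NA_character_, nrow(map))  # SNPs outside every range stay NA
  for (i in seq_len(nrow(arms))) {
    # One vectorized comparison over all SNPs for this arm
    in_arm <- map$Chr == arms$Chr[i] &
      map$Position >= arms$Start[i] &
      map$Position <= arms$End[i]
    result[in_arm] <- as.character(arms$Arm[i])
  }
  result
}

Mapfile <- data.frame(Name = c("S1", "S2", "S3"),
                      Chr = c(1, 1, 1),
                      Position = c(3000, 6000, 1000),
                      stringsAsFactors = FALSE)
Chr.Arms <- data.frame(Chr = c(1, 1), Arm = c("p", "q"),
                       Start = c(0, 5001), End = c(5000, 10000),
                       stringsAsFactors = FALSE)
Mapfile$Arm <- assign_arms_by_row(Mapfile, Chr.Arms)
print(Mapfile)
```

The cost is roughly (number of arms) x (number of SNPs) comparisons. These comparisons run as vectorized operations in compiled code rather than as millions of interpreted loop iterations.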

This R script works, but with millions of rows it is slow. Surely there is
a faster, less processing-intensive way to do it.
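Another base-R option is findInterval(), which does a binary search of each position against the sorted arm start coordinates. This is a sketch, not a benchmarked recommendation. Working one chromosome at a time keeps intervals from different chromosomes apart. The sketch assumes that arms on the same chromosome do not overlap and that Start/End are inclusive. Positions that fall in a gap or beyond the last End are left as NA. assign_arms_interval is a hypothetical helper name.

```r
# Hypothetical helper: per-chromosome binary search with findInterval()
assign_arms_interval <- function(map, arms) {
  result <- rep(NA_character_, nrow(map))  # default for unmatched SNPs
  for (chr in unique(arms$Chr)) {
    # Arms on this chromosome, sorted by Start (findInterval needs sorted breaks)
    a <- arms[arms$Chr == chr, ]
    a <- a[order(a$Start), ]
    idx <- which(map$Chr == chr)
    pos <- map$Position[idx]
    # k[j] = index of the last arm with Start <= pos[j] (0 if before every Start)
    k <- findInterval(pos, a$Start)
    ok <- !is.na(k) & k > 0
    # Keep only positions that are also at or below that arm's End
    ok[ok] <- pos[ok] <= a$End[k[ok]]
    result[idx[ok]] <- as.character(a$Arm[k[ok]])
  }
  result
}

Mapfile <- data.frame(Name = c("S1", "S2", "S3"),
                      Chr = c(1, 1, 1),
                      Position = c(3000, 6000, 1000),
                      stringsAsFactors = FALSE)
Chr.Arms <- data.frame(Chr = c(1, 1), Arm = c("p", "q"),
                       Start = c(0, 5001), End = c(5000, 10000),
                       stringsAsFactors = FALSE)
Mapfile$Arm <- assign_arms_interval(Mapfile, Chr.Arms)
print(Mapfile)
```

The work per SNP grows with log(number of arms on that chromosome). With only 39 arms, a simple loop over the arm rows may be just as fast in practice. It is worth timing the options on the real Mapfile (for example with system.time()).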

Thanks in advance for any help,
Gaius




More information about the R-help mailing list