[R] Problem merging data frames and duplicates
William Michels
wjm1 at caa.columbia.edu
Wed May 27 15:27:08 CEST 2015
Hi Frank!
Ok, bind columns together in a state-wise fashion, allowing for state
duplicates. Below (maybe cheesy) uses the state abbreviations
"state.abb" in the datasets package. Also uses two functions
"rbind.na" and "cbind.na", available from Andrej-Nikolai Spiess'
website at: http://www.dr-spiess.de . These functions are pretty
helpful in that they bind without recycling, inserting NAs instead.
by.state1 <- function(df1, df2) {
df_out <- data.frame( state1=character(), locus1=numeric(),
state2=character(), locus2=numeric() )
for(i in 1:50) {
df_out_per <- cbind.na(df1[df1$state == state.abb[i], ],
df2[df2$state == state.abb[i], ])
df_out <- rbind.na(df_out, df_out_per)
}
df_out
}
data.frame(state=c("IL", "IL", "LA","LA", "MS","MS", "AR", "AR"),
locus=c(1,1,2,2,3,4,5,6)) -> x
##edit y to correct number of loci (6 not 7)
data.frame(state=c("IL", "IL", "AR", "AR", "TN","TN"),
locus=c(1,1,2,3,3,4)) -> y
> by.state1(x,y)
state locus state locus
7 AR 5 AR 2
8 AR 6 AR 3
3 IL 1 IL 1
4 IL 1 IL 1
31 LA 2 <NA> NA
41 LA 2 <NA> NA
5 MS 3 <NA> NA
6 MS 4 <NA> NA
9 <NA> NA TN 3
10 <NA> NA TN 4
The rows will be in state alphabetical order. If you need the row
numbers cleaned up as well (numeric order) you can pre-merge your data
to states.abb:
> a <- merge(state.abb, x, by= 1, all=F)
> b <- merge(state.abb, y, by= 1, all=F)
> colnames(a) <- c("state", "locus")
> colnames(b) <- c("state", "locus")
> by.state1(a,b)
state locus state locus
1 AR 5 AR 2
2 AR 6 AR 3
3 IL 1 IL 1
4 IL 1 IL 1
5 LA 2 <NA> NA
6 LA 2 <NA> NA
7 MS 3 <NA> NA
8 MS 4 <NA> NA
9 <NA> NA TN 3
10 <NA> NA TN 4
>
Hope this helps,
Bill
William Michels, Ph.D.
On Wed, May 27, 2015 at 4:01 AM, Frank Burbrink <burbrink666 at gmail.com> wrote:
> I have figured out a cheesy work around since these problems have to do with
> not having unique identifiers for the States:
>
> 1) Append a unique identifier to each state such that both AR becomes ARa
> and ARb
> 2) run the normal merge(x,y,by=1, all=T)
> 3) Use subst to cut the appended identifiers.
>
> While this is clunky I can just write a function to do it all at once.
>
> On Wed, May 27, 2015 at 6:20 AM, Frank Burbrink <burbrink666 at gmail.com>
> wrote:
>>
>> Thanks Bill,
>>
>> However, unique(merge(x, y, by = 1, all=T)) is giving me:
>>
>> state locus.x locus.y
>> 1 AR 5 2
>> 2 AR 5 3
>> 3 AR 6 2
>> 4 AR 6 3
>> 5 IL 1 1
>> 9 LA 2 NA
>> 11 MS 3 NA
>> 12 MS 4 NA
>> 13 TN NA 3
>> 14 TN NA 4
>>
>> This has AR repeated twice and the normal double IL and LA now only listed
>> singly.
>>
>> What I am hoping for is something like this:
>>
>> state locus.x locus.y
>> 1 AR 5 2
>> 4 AR 6 3
>> 7 IL 1 1
>> 8 IL 1 1
>> 9 LA 2 NA
>> 10 LA 2 NA
>> 11 MS 3 NA
>> 12 MS 4 NA
>> 13 TN NA 3
>> 14 TN NA 4
>>
>> On Wed, May 27, 2015 at 3:53 AM, William Michels <wjm1 at caa.columbia.edu>
>> wrote:
>>>
>>> Hi Frank,
>>>
>>> It looks like you're very close. I think you want:
>>>
>>> unique(merge(x, y, by = 1, all=T))
>>>
>>> Gabor Grothendieck's sqldf package is very useful if you're more
>>> comfortable with SQL-type syntax, see:
>>>
>>> https://github.com/ggrothendieck/sqldf
>>>
>>> Best Regards,
>>>
>>> William (Bill) Michels, Ph.D.
>>>
>>>
>>>
>>> On Tue, May 26, 2015 at 5:12 PM, Frank Burbrink
>>> <burbrink666 at gmail.com> wrote: <SNIP>
>>
>>
>>
>>
>> --
>>
>> ***********************************
>> Frank T. Burbrink, Ph.D.
>> Professor
>> Biology Department
>> 6S-143
>> 2800 Victory Blvd.
>> College of Staten Island/CUNY
>> Staten Island, New York 10314
>> E-Mail:Frank.Burbrink at csi.cuny.edu
>> Phone:718-982-3961
>> Web Page: http://scholar.library.csi.cuny.edu/~fburbrink/
>> ***********************************
>> Chair
>> Ecology, Evolutionary Biology, and Behavior
>> Doctoral Subprogram
>> Biology Program
>> City University of New York
>> Graduate Center
>> 365 Fifth Avenue
>> New York, NY 10016-4309
>> **********************************
>
>
>
>
> --
>
> ***********************************
> Frank T. Burbrink, Ph.D.
> Professor
> Biology Department
> 6S-143
> 2800 Victory Blvd.
> College of Staten Island/CUNY
> Staten Island, New York 10314
> E-Mail:Frank.Burbrink at csi.cuny.edu
> Phone:718-982-3961
> Web Page: http://scholar.library.csi.cuny.edu/~fburbrink/
> ***********************************
> Chair
> Ecology, Evolutionary Biology, and Behavior
> Doctoral Subprogram
> Biology Program
> City University of New York
> Graduate Center
> 365 Fifth Avenue
> New York, NY 10016-4309
> **********************************
More information about the R-help
mailing list