[R] problems with merge() - the output has many repeated lines

Cecilia Carmo cecilia.carmo at ua.pt
Sun Aug 22 19:23:56 CEST 2010


I have done
intersect(names(df1), names(df2))
[1] "firm" "year"

This is the key I used to merge
merge(df1,df2,by=c("firm","year"))

And there is just one firm/year row in df1 that matches 
a given firm/year row in df2. df1 has more firm/year 
rows than df2, and the extra ones don't match any row in df2.
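
One way to test that claim is to count repeated firm/year keys in each data
frame and compare the row count merge() should produce with what it actually
returns. The sketch below is only an illustration: the toy data and the
columns sales and assets are made up, and it assumes the repeated rows are
exact copies. It is not a diagnosis of the real df1 and df2.

```r
# Toy data (hypothetical): the key firm 1 / year 2009 appears twice in each frame.
df1 <- data.frame(firm = c(1, 1, 2), year = c(2009, 2009, 2009),
                  sales = c(10, 10, 20))
df2 <- data.frame(firm = c(1, 1), year = c(2009, 2009),
                  assets = c(5, 5))

key <- c("firm", "year")

# Number of rows whose key already occurred earlier in the same data frame.
dup1 <- sum(duplicated(df1[key]))
dup2 <- sum(duplicated(df2[key]))

# Count rows per key in each frame, then join the counts.
# Each shared key contributes n1 * n2 rows to the merged result.
n1 <- aggregate(x = list(n1 = rep(1, nrow(df1))), by = df1[key], FUN = sum)
n2 <- aggregate(x = list(n2 = rep(1, nrow(df2))), by = df2[key], FUN = sum)
counts <- merge(n1, n2, by = key)
expected_rows <- sum(counts$n1 * counts$n2)

merged <- merge(df1, df2, by = key)
nrow(merged) == expected_rows

# If the duplicates are exact copies, dropping them first gives one row per key.
clean <- merge(unique(df1), unique(df2), by = key)
```

If dup1 and dup2 are both 0 on the real data, repeated keys are not the
cause and the problem lies elsewhere.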

Cecília

On Sun, 22 Aug 2010 12:09:57 -0500,
  Erik Iverson <eriki at ccbr.umn.edu> wrote:
> Cecilia -
> 
> Find which columns you're matching on:
> 
> intersect(names(df1), names(df2))
> 
> Maybe that will shed some light on the issue.
> 
> On 08/22/2010 12:02 PM, Cecilia Carmo wrote:
>> Thanks, but I don't have multiple matches and the lines repeated in the
>> final dataframe are exactly equal in all columns.
>>
>> Cecília
>>
>> On Sat, 21 Aug 2010 10:58:53 -0500,
>> Hadley Wickham <hadley at rice.edu> wrote:
>>> You may find a close reading of ?merge helpful, particularly this
>>> sentence: "If there is more than one match, all possible
>>> matches contribute one row each" (so check that you don't have
>>> multiple matches).
>>>
>>> Hadley
>>>
>>> On Sat, Aug 21, 2010 at 10:45 AM, Cecilia Carmo <cecilia.carmo at ua.pt>
>>> wrote:
>>>> Hi everyone,
>>>>
>>>> I have been merging many big dataframes (about 80000 rows each) and I
>>>> never had this problem, but now it happened to me and I want to know if
>>>> someone knows what could be happening.
>>>> The final dataframe has many rows, an impossible number! I have done
>>>> edit(dataframe) and I saw that there are many repeated rows (all equal).
>>>>
>>>> Thanks for any help,
>>>>
>>>> Cecília Carmo
>>>> Universidade de Aveiro
>>>> Portugal
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Assistant Professor / Dobelman Family Junior Chair
>>> Department of Statistics / Rice University
>>> http://had.co.nz/
>>
>


