[R] problems when merging two data sets

Francois COLLIN
Tue Feb 5 22:26:52 CET 2019


I quite agree with Jeff Newmiller and Bert Gunter.

The error you get ("'by' must specify a uniquely valid column") is a 
very common one when the function merge is misused. That said, merge 
is the right choice here. Have you read the manual of the function by 
running the command `?merge`? That is always a good start.
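
As a rough illustration of how this error typically arises, here is a 
minimal sketch. The data frames, column names and values are invented 
for the example (they are not taken from the original post); the point 
is only that `by` names a column that does not exist in both tables.

```r
# Hypothetical toy data; all names and values are assumptions for illustration.
occurrences <- data.frame(
  species_name = c("Abies alba", "Acer campestre"),
  occurrence   = c(12, 30)
)
red_list <- data.frame(
  red_list_species = c("Abies alba"),
  status           = c("VU")
)

# "species_name" exists in `occurrences` but not in `red_list`,
# so merge() cannot resolve `by` for the second table and stops.
msg <- tryCatch(
  merge(occurrences, red_list, by = "species_name"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If the key columns have different names in the two tables, `by.x` and 
`by.y` are the arguments to use instead of `by`.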

Here is what the function signature looks like:

`merge(x, y, by = intersect(names(x), names(y)), by.x = by, by.y = by, 
all = FALSE, all.x = all, all.y = all, sort = TRUE, suffixes = 
c(".x",".y"), no.dups = TRUE, incomparables = NULL, ...)`

In your case, you probably need only four arguments:

`merge(x = dataset1, y = dataset2, by.x = "key1", by.y = "key2")`

In this example, key1 is the name of the column in dataset1 whose values 
should be matched against the column named key2 in dataset2.
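
To make this concrete, here is a small sketch with made-up data. The 
species names, occurrence counts and status codes are assumptions for 
illustration only; `key1` and `key2` stand in for whatever your real 
column names are.

```r
# Hypothetical stand-ins for the two tables described in the question.
dataset1 <- data.frame(
  occurrence = c(5, 12, 7),
  key1 = c("Abies alba", "Acer campestre", "Adonis vernalis"),
  stringsAsFactors = FALSE
)
dataset2 <- data.frame(
  key2 = c("Adonis vernalis", "Abies alba"),
  status = c("EN", "VU"),
  stringsAsFactors = FALSE
)

# Match dataset1$key1 against dataset2$key2.
# With the default all = FALSE, only species found in both tables are kept.
merged <- merge(x = dataset1, y = dataset2, by.x = "key1", by.y = "key2")
print(merged)
```

The key column in the result takes the name given in `by.x` (here 
`key1`), and the two tables do not need to have the same number of rows.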

Again, read the manual to understand the other arguments. I would 
especially advise you to look at the arguments suffixes, all.x and all.y, 
which will help you do exactly what you want.
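
As a sketch of what these arguments do (again with invented data and 
column names, so treat it as an assumption about your setup rather than 
a ready-made solution): `all.x = TRUE` keeps every row of the first 
table, and `suffixes` renames non-key columns that share a name.

```r
# Hypothetical data; the shared column "source" is added only to show `suffixes`.
dataset1 <- data.frame(
  key1   = c("Abies alba", "Acer campestre", "Adonis vernalis"),
  source = c("survey", "survey", "atlas"),
  stringsAsFactors = FALSE
)
dataset2 <- data.frame(
  key2   = c("Adonis vernalis", "Abies alba"),
  status = c("EN", "VU"),
  source = c("list A", "list B"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps all rows of dataset1; species absent from dataset2 get NA.
# suffixes distinguishes the two "source" columns in the result.
left <- merge(dataset1, dataset2,
              by.x = "key1", by.y = "key2",
              all.x = TRUE,
              suffixes = c(".occ", ".red"))
print(left)
```

Species that are not on the (hypothetical) Red list end up with `NA` in 
the `status` column, which is probably what you want when the second 
table is much shorter than the first.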

Cheers,

Francois COLLIN

On 05/02/2019 19:49, Bert Gunter wrote:
> Show us your code! (as the posting guide below requests. Please read the
> posting guide).
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Feb 5, 2019 at 10:04 AM sasa kosanic <sasa.kosanic using gmail.com> wrote:
>
>> Dear All,
>>
>> I would like to merge two data sets, but I am doing something wrong.
>> The first data set contains 2 columns: 'species occurrence' in Germany
>> (column 1) and 'species names' (column 2).
>> The second one contains the names of 'Red list species' (column 1) and
>> 'species status' (column 2).
>> I would like to match the Red list species with the species names from the
>> first table and assign the species status.
>> I tried the merge function but got this error: "'by' must specify
>> a uniquely valid column"
>> I also tried the function left_join, without success.
>>
>> Also, the two data sets differ in size: table 1 has 7189 rows
>> and table 2 just 426 rows, as we do not have many Red list species.
>>
>> I would appreciate your help.
>>
>> Kind regards,
>> Sasha
>>
>>
>> Dr Sasha Kosanic
>> Ecology Lab (Biology Department)
>> Room M842
>> University of Konstanz
>> Universitätsstraße 10
>> D-78464 Konstanz
>> Phone: +49 7531 883321 & +49 (0)175 9172503
>>
>> http://cms.uni-konstanz.de/vkleunen/
>> https://tinyurl.com/y8u5wyoj
>> https://tinyurl.com/cgec6tu
>>
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
