[R] conditional filling of data.frame - improve code

Fri Mar 11 10:15:50 CET 2022

Thank you, Rui, for your input.
I thought about mapply() too, but I'm not confident with it; I usually 
prefer loops (they are more intuitive to me).

It's good to have the choice :)

Ivan

--
Dr. Ivan Calandra
Imaging lab
RGZM - MONREPOS Archaeological Research Centre
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 11/03/2022 at 10:14, Rui Barradas wrote:
> Hello,
>
> I hadn't posted an answer because my mapply is more complicated than 
> the original and much more complicated than Jeff's merge, but here it 
> is. If there's a problem with the output of merge, maybe the 
> mapply can be of use: only the column expressly named is created.
> The result is equal to the original.
> I have changed the name exp to exp1.
>
> mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1", 
> "sample1-1", "sample1-1", "sample2-1"))
> exp1 <- list(ex1 = c("sample1-1", "sample1-2"), ex2 = c("sample2-1", 
> "sample2-2" , "sample2-3"))
>
> for(i in names(exp1)) {
>   mydata[mydata[["sample"]] %in% exp1[[i]], "experiment"] <- i
> }
>
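> # must create the new column beforehand
> mydata[["experiment2"]] <- NA_character_
> # \(...) is the function shorthand (R >= 4.1); <<- modifies mydata in the
> # enclosing environment; invisible() hides mapply's return value
> invisible(mapply(\(value, name, s){
>   i <- which(s %in% value)
>   mydata[["experiment2"]][i] <<- name
> }, exp1, names(exp1), MoreArgs = list(s = mydata$sample)))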
>
> mydata
> #     sample experiment experiment2
> #1 sample2-2        ex2         ex2
> #2 sample2-3        ex2         ex2
> #3 sample1-1        ex1         ex1
> #4 sample1-1        ex1         ex1
> #5 sample1-1        ex1         ex1
> #6 sample2-1        ex2         ex2
>
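For comparison, the same column can be filled without a loop and without <<- by indexing a named vector. This is only a minimal sketch on the example data from this thread; `lookup` and the column name `experiment3` are names made up here and are not part of Rui's code.

```r
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"))
exp1 <- list(ex1 = c("sample1-1", "sample1-2"),
             ex2 = c("sample2-1", "sample2-2", "sample2-3"))

# Named lookup vector: the names are sample IDs, the values are experiment IDs.
# rep() repeats each experiment name once per sample listed under it.
lookup <- setNames(rep(names(exp1), lengths(exp1)),
                   unlist(exp1, use.names = FALSE))

# Index the lookup by sample ID; samples without an entry would give NA.
# as.character() guards against 'sample' being a factor, and unname()
# drops the sample IDs that indexing carries over as names.
mydata[["experiment3"]] <- unname(lookup[as.character(mydata[["sample"]])])

mydata
```

This assumes every sample ID appears under exactly one experiment in exp1; with duplicates, the first entry in `lookup` would be used.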
>
> Hope this helps,
>
> Rui Barradas
>
> At 08:48 on 11/03/2022, Ivan Calandra wrote:
>> In my first trials, I made a typo, which resulted in more columns 
>> than needed in the output of merge, which is why I needed more 
>> formatting. But now, it is indeed done all in one line and it is, as 
>> I said already, nicer anyway!
>>
>> On 11/03/2022 at 08:47, Jeff Newmiller wrote:
>>> What a strange objection. You wouldn't keep the inline definition of 
>>> expts in working code... that would be in a reference data file, and 
>>> the merge is one line.
>>>
>>> On March 10, 2022 11:24:27 PM PST, Ivan Calandra 
>>> <ivan.calandra using rgzm.de> wrote:
>>>> Thank you Jeff and Tim for your ideas. Indeed, merge/join is probably
>>>> the nicest way. Still, the code becomes much longer because I need
>>>> more formatting of the input and output objects than with my ugly for
>>>> loop :)
>>>>
>>>> Cheers,
>>>> Ivan
>>>>
>>>> On 10/03/2022 at 18:58, Ebert,Timothy Aaron wrote:
>>>>> You could try some of the "join" commands from dplyr.
>>>>> https://dplyr.tidyverse.org/reference/mutate-joins.html
>>>>> https://statisticsglobe.com/r-dplyr-join-inner-left-right-full-semi-anti 
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>> Tim
>>>>> -----Original Message-----
>>>>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Jeff 
>>>>> Newmiller
>>>>> Sent: Thursday, March 10, 2022 11:25 AM
>>>>> To: r-help using r-project.org; Ivan Calandra <ivan.calandra using rgzm.de>; 
>>>>> R-help <r-help using r-project.org>
>>>>> Subject: Re: [R] conditional filling of data.frame - improve code
>>>>>
>>>>> Use merge.
>>>>>
>>>>> expts <- read.csv( text =
>>>>> "expt,sample
>>>>> ex1,sample1-1
>>>>> ex1,sample1-2
>>>>> ex2,sample2-1
>>>>> ex2,sample2-2
>>>>> ex2,sample2-3
>>>>> ", header=TRUE, as.is=TRUE )
>>>>>
>>>>> mydata <- data.frame(sample = c("sample2-2", "sample2-3", 
>>>>> "sample1-1", "sample1-1", "sample1-1", "sample2-1"))
>>>>>
>>>>> merge( mydata, expts, by="sample", all.x=TRUE )
>>>>>
>>>>>
>>>>> On March 10, 2022 7:50:23 AM PST, Ivan Calandra 
>>>>> <ivan.calandra using rgzm.de> wrote:
>>>>>> Dear useRs,
>>>>>>
>>>>>> I would like to improve my ugly (though working) code, but I think I
>>>>>> need a completely different approach and I just can't think outside
>>>>>> the box!
>>>>>>
>>>>>> I have some external information about which sample(s) belong to
>>>>>> which experiment. I need to get that manually into R (either by
>>>>>> typing it directly in a script or by reading a CSV file, but that
>>>>>> makes no difference):
>>>>>> exp <- list(ex1 = c("sample1-1", "sample1-2"), ex2 = c("sample2-1",
>>>>>> "sample2-2" , "sample2-3"))
>>>>>>
>>>>>> Then I have my data, only with the sample IDs:
>>>>>> mydata <- data.frame(sample = c("sample2-2", "sample2-3", 
>>>>>> "sample1-1",
>>>>>> "sample1-1", "sample1-1", "sample2-1"))
>>>>>>
>>>>>> Now I want to add a column to mydata with the experiment ID. The
>>>>>> best I could find is this:
>>>>>> for (i in names(exp)) mydata[mydata[["sample"]] %in% exp[[i]],
>>>>>> "experiment"] <- i
>>>>>>
>>>>>> In this example, the experiment ID could be extracted from the
>>>>>> sample IDs, but this is not the case with my real data, so it really
>>>>>> is a matter of matching. Of course, my real data also has other
>>>>>> columns.
>>>>>>
>>>>>> I'm pretty sure the last line (with the loop) can be improved in
>>>>>> terms of readability (speed is not an issue here). I have close to no
>>>>>> constraints on 'exp' (here I chose a list, but anything could do);
>>>>>> the only thing that cannot change is the format of 'mydata'.
>>>>>>
>>>>>> Thank you in advance!
>>>>>> Ivan
>>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
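
One detail worth keeping in mind with the merge approach from this thread: merge() sorts its result on the `by` column by default, so the rows do not come back in the original order of mydata. If the row order matters, match() against the same reference table is one possible alternative. The following is a minimal sketch, assuming the reference table is written as a plain data.frame (called `expts`, as in Jeff's message) instead of being read from a file; `merged` is just an illustrative name.

```r
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"))

# Reference table: one row per sample, with the experiment it belongs to
expts <- data.frame(
  expt   = c("ex1", "ex1", "ex2", "ex2", "ex2"),
  sample = c("sample1-1", "sample1-2", "sample2-1", "sample2-2", "sample2-3")
)

# merge() with the default sort = TRUE orders the result by 'sample',
# so the rows are no longer in the order of mydata
merged <- merge(mydata, expts, by = "sample", all.x = TRUE)

# match() gives, for each row of mydata, the position of its sample in
# expts (NA if absent), so the original row order of mydata is kept
idx <- match(mydata[["sample"]], expts[["sample"]])
mydata[["experiment"]] <- expts[["expt"]][idx]

mydata
```

Like the loop, this adds only the one named column and leaves the rest of mydata untouched. It assumes each sample appears only once in `expts`, since match() returns the first match.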