[R] creating a dataframe with full_join and looping over a list of lists.

Matthew mccormack using molbio.mgh.harvard.edu
Tue Mar 26 00:48:55 CET 2019


This is fantastic! It was exactly what I was looking for. It is part
of a larger Shiny app, so it is difficult to provide a working example as
part of the post. After figuring out how your code works (I am an R
novice), I made a couple of small tweaks and it works great! Thank you
very much, Jim, for the work you put into this.

Matthew
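
For readers who want the same aligned layout without an explicit inner loop, here is a minimal base-R sketch. It is not Jim's splice() function; it swaps in %in% and ifelse() to the same end. The three short vectors are a made-up miniature of comatgs, and `aligned` is a name introduced only for this illustration. Adding the columns with [[<- keeps a non-syntactic name such as `_source_info_` unchanged.

```r
# Hypothetical miniature of comatgs, including the empty `_source_info_`
# element that appears in the real list.
comatgs <- list(
  WRKY8_colamp_a  = c("AT1G02920", "AT1G72520", "AT5G64750"),
  `_source_info_` = character(0),
  bHLH10_col_a    = c("AT1G72520", "AT5G20230")
)

# Sorted superset of every gene ID; this becomes the first column.
myenter <- sort(unique(unlist(comatgs, use.names = FALSE)))

# For each list, keep the ID on the rows where it is present, NA elsewhere.
aligned <- lapply(comatgs, function(ids) {
  ifelse(myenter %in% ids, myenter, NA_character_)
})

# Add one column per list; [[<- does not alter the column names.
mydf3 <- data.frame(myenter, stringsAsFactors = FALSE)
for (nm in names(aligned)) {
  mydf3[[nm]] <- aligned[[nm]]
}

print(mydf3)
```

Unlike splice(), this version does not need the individual lists to be sorted, because each row is filled by a membership test against myenter rather than by walking two sorted vectors in step.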

On 3/21/2019 11:01 PM, Jim Lemon wrote:
> Hi Matthew,
> Remember, keep it on the list so that people know the status of the request.
> I couldn't get this to work with the "_source_info_" variable. It
> seems to be unreadable as a variable name. So, this _may_ be what you
> want. I don't know if it can be done with "merge" and I don't know the
> function "full_join".
>
> WRKY8_colamp_a<-as.character(
>   c("AT1G02920","AT1G06135","AT1G07160","AT1G11925","AT1G14540","AT1G16150",
>   "AT1G21120","AT1G26380","AT1G26410","AT1G35210","AT1G49000","AT1G51920",
>   "AT1G56250","AT1G66090","AT1G72520","AT1G80840","AT2G02010","AT2G18690",
>   "AT2G30750","AT2G39200","AT2G43620","AT3G01830","AT3G54150","AT3G55840",
>   "AT4G03460","AT4G11470","AT4G11890","AT4G14370","AT4G15417","AT4G15975",
>   "AT4G31940","AT4G35180","AT5G01540","AT5G05300","AT5G11140","AT5G24110",
>   "AT5G25250","AT5G36925","AT5G46295","AT5G64750","AT5G64905","AT5G66020"))
>
> bHLH10_col_a<-as.character(c("AT1G72520","AT3G55840","AT5G20230","AT5G64750"))
>
> bHLH10_colamp_a<-as.character(
>   c("AT1G01560","AT1G02920","AT1G16420","AT1G17147","AT1G35210","AT1G51620",
>   "AT1G57630","AT1G72520","AT2G18690","AT2G19190","AT2G40180","AT2G44370",
>   "AT3G23250","AT3G55840","AT4G03460","AT4G04480","AT4G04540","AT4G08555",
>   "AT4G11470","AT4G11890","AT4G16820","AT4G23280","AT4G35180","AT5G01540",
>   "AT5G05300","AT5G20230","AT5G22530","AT5G24110","AT5G56960","AT5G57010",
>   "AT5G57220","AT5G64750","AT5G66020"))
>
> # let myenter be the sorted superset
> myenter<-
>   sort(unique(c(WRKY8_colamp_a,bHLH10_col_a,bHLH10_colamp_a)))
>
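> # Align y to x: return a vector as long as x that holds y's values at the
> # positions where they occur in x and NA elsewhere.
> # Assumes x and y are sorted and every element of y occurs in x.
> splice<-function(x,y) {
>   nx<-length(x)
>   ny<-length(y)
>   newy<-rep(NA,nx)
>   if(ny) {
>    yi<-1
>    for(xi in seq_len(nx)) {
>     if(x[xi] == y[yi]) {
>      newy[xi]<-y[yi]
>      yi<-yi+1
>     }
>     # stop once every element of y has been placed
>     if(yi>ny) break
>    }
>   }
>   return(newy)
> }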
>
> comatgs<-list(WRKY8_colamp_a=WRKY8_colamp_a,
>   bHLH10_col_a=bHLH10_col_a,bHLH10_colamp_a=bHLH10_colamp_a)
> mydf3<-data.frame(myenter,stringsAsFactors=FALSE)
> for(j in seq_along(comatgs)) {
>   tmp<-data.frame(splice(myenter,sort(comatgs[[j]])))
>   names(tmp)<-names(comatgs)[j]
>   mydf3<-cbind(mydf3,tmp)
> }
>
> Jim
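
On the open question of whether this can be done with merge(): a minimal base-R sketch follows, under the assumption that each list element is a character vector of gene IDs. The helper `as_keyed` and the names `keyed` and `joined` are made up for this illustration, and the two short vectors are a miniature stand-in for comatgs. Each list becomes a two-column data frame, with a shared key column "AGI" plus a copy of the IDs named after the list, so joining by "AGI" does not lose the list name.

```r
# Hypothetical miniature of comatgs: named character vectors of gene IDs.
comatgs <- list(
  WRKY8_colamp_a = c("AT1G02920", "AT1G72520", "AT5G64750"),
  bHLH10_col_a   = c("AT1G72520", "AT5G20230")
)

# Turn one list element into a two-column data frame: the join key "AGI"
# plus a copy of the IDs in a column named after the list.
as_keyed <- function(nm) {
  ids <- comatgs[[nm]]
  out <- data.frame(AGI = ids, ids, stringsAsFactors = FALSE)
  names(out)[2] <- nm
  out
}

keyed <- lapply(names(comatgs), as_keyed)

# Outer-join everything on "AGI"; all = TRUE keeps IDs found in only one list.
joined <- Reduce(function(x, y) merge(x, y, by = "AGI", all = TRUE), keyed)

print(joined)
```

If dplyr is installed, full_join(x, y, by = "AGI") should be able to stand in for the merge() call inside Reduce(), since both perform an outer join on the key column.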
>
> On Fri, Mar 22, 2019 at 10:29 AM Matthew
> <mccormack using molbio.mgh.harvard.edu> wrote:
>> Hi Jim,
>>
>>      Thanks for the reply.  That was pretty dumb of me.  I took that out of the loop.
>>
>> comatgs is longer than this but here is a sample of 4 of 569 elements:
>>
>> $WRKY8_colamp_a
>>   [1] "AT1G02920" "AT1G06135" "AT1G07160" "AT1G11925" "AT1G14540" "AT1G16150" "AT1G21120"
>>   [8] "AT1G26380" "AT1G26410" "AT1G35210" "AT1G49000" "AT1G51920" "AT1G56250" "AT1G66090"
>> [15] "AT1G72520" "AT1G80840" "AT2G02010" "AT2G18690" "AT2G30750" "AT2G39200" "AT2G43620"
>> [22] "AT3G01830" "AT3G54150" "AT3G55840" "AT4G03460" "AT4G11470" "AT4G11890" "AT4G14370"
>> [29] "AT4G15417" "AT4G15975" "AT4G31940" "AT4G35180" "AT5G01540" "AT5G05300" "AT5G11140"
>> [36] "AT5G24110" "AT5G25250" "AT5G36925" "AT5G46295" "AT5G64750" "AT5G64905" "AT5G66020"
>>
>> $`_source_info_`
>> character(0)
>>
>> $bHLH10_col_a
>> [1] "AT1G72520" "AT3G55840" "AT5G20230" "AT5G64750"
>>
>> $bHLH10_colamp_a
>>   [1] "AT1G01560" "AT1G02920" "AT1G16420" "AT1G17147" "AT1G35210" "AT1G51620" "AT1G57630"
>>   [8] "AT1G72520" "AT2G18690" "AT2G19190" "AT2G40180" "AT2G44370" "AT3G23250" "AT3G55840"
>> [15] "AT4G03460" "AT4G04480" "AT4G04540" "AT4G08555" "AT4G11470" "AT4G11890" "AT4G16820"
>> [22] "AT4G23280" "AT4G35180" "AT5G01540" "AT5G05300" "AT5G20230" "AT5G22530" "AT5G24110"
>> [29] "AT5G56960" "AT5G57010" "AT5G57220" "AT5G64750" "AT5G66020"
>>
>>
>>        I have been thinking of something like this:
>>
>> lenmyen <- length(myenter)                      # get length of longest list
>> length(comatgs[[j]]) <- lenmyen                 # make each list length of myenter
>> atglsts <- as.data.frame(comatgs[j])            # create dataframe
>> colnames(atglsts) <- "AGI"                      # rename column to 'AGI'
>>
>> mydf3 <- full_join(mydf3, atglsts, by = "AGI")  # full_join
>>
>> Matthew
>>
>> On 3/21/2019 7:12 PM, Jim Lemon wrote:
>>
>> Hi Matthew,
>> First thing, don't put:
>>
>> mydf3 <- data.frame(myenter)
>>
>> inside your loop, otherwise you will reset the value of mydf3 each
>> time and end up with only "myenter" and the final list. Without some
>> idea of the contents of comatgs, it is difficult to suggest a way to
>> get what you want.
>>
>> Jim
>>
>> On Fri, Mar 22, 2019 at 8:16 AM Matthew
>> <mccormack using molbio.mgh.harvard.edu> wrote:
>>
>>     My apologies, my first e-mail formatted very poorly when sent, so I am trying again with something I hope will be less confusing.
>>
>> I have been trying to create a dataframe by looping through a list of lists,
>> and using dplyr's full_join so as to keep common elements on the same row.
>> But, I have a couple of problems.
>>
>> 1) The lists have different numbers of elements.
>>
>> 2) In the final dataframe, I would like the column names to be the names
>> of the lists.
>>
>> Is it possible ?
>>
>> Code:
>>
>> for(j in avector){
>>   mydf3 <- data.frame(myenter)
>>   atglsts <- as.data.frame(comatgs[j])
>>   mydf3 <- full_join(mydf3, atglsts)
>> }
>>
>> Explanation:
>>
>> # Start out with a list, myenter, to dataframe. mydf3 now has 1 column.
>> # This first column will be the longest column in the final mydf3.
>> # Loop through a list of lists, comatgs, and with each loop a particular list
>> # is made into a dataframe of one column, atglsts.
>> # The name of the column is the name of the list.
>> # Each atglsts dataframe has a different number of elements.
>> # What I want to do, is to add the newly made dataframe, atglsts, as a
>> # new column of the data frame, mydf3 using full_join
>> # in order to keep common elements on the same row.
>> # I could rename the colname to 'AGI' so that I can join by 'AGI',
>> # but then I would lose the name of the list.
>> # In the final dataframe, I want to know the name of the original list
>> # the column was made from.
>>
>> Matthew