[R] creating a dataframe with full_join and looping over a list of lists.
Jim Lemon
drj|m|emon @end|ng |rom gm@||@com
Fri Mar 22 04:01:17 CET 2019
Hi Matthew,
Remember, keep it on the list so that people know the status of the request.
I couldn't get this to work with the "_source_info_" element. It is not
a syntactic R name (it starts with an underscore), so it can only be
used inside backticks, as in comatgs$`_source_info_`. So, this _may_ be
what you want. I don't know if it can be done with "merge", and I'm not
familiar with dplyr's "full_join".
WRKY8_colamp_a<-as.character(
c("AT1G02920","AT1G06135","AT1G07160","AT1G11925","AT1G14540","AT1G16150",
"AT1G21120","AT1G26380","AT1G26410","AT1G35210","AT1G49000","AT1G51920",
"AT1G56250","AT1G66090","AT1G72520","AT1G80840","AT2G02010","AT2G18690",
"AT2G30750","AT2G39200","AT2G43620","AT3G01830","AT3G54150","AT3G55840",
"AT4G03460","AT4G11470","AT4G11890","AT4G14370","AT4G15417","AT4G15975",
"AT4G31940","AT4G35180","AT5G01540","AT5G05300","AT5G11140","AT5G24110",
"AT5G25250","AT5G36925","AT5G46295","AT5G64750","AT5G64905","AT5G66020"))
bHLH10_col_a<-as.character(c("AT1G72520","AT3G55840","AT5G20230","AT5G64750"))
bHLH10_colamp_a<-as.character(
c("AT1G01560","AT1G02920","AT1G16420","AT1G17147","AT1G35210","AT1G51620",
"AT1G57630","AT1G72520","AT2G18690","AT2G19190","AT2G40180","AT2G44370",
"AT3G23250","AT3G55840","AT4G03460","AT4G04480","AT4G04540","AT4G08555",
"AT4G11470","AT4G11890","AT4G16820","AT4G23280","AT4G35180","AT5G01540",
"AT5G05300","AT5G20230","AT5G22530","AT5G24110","AT5G56960","AT5G57010",
"AT5G57220","AT5G64750","AT5G66020"))
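# let myenter be the sorted superset of all the IDs
myenter<-
 sort(unique(c(WRKY8_colamp_a,bHLH10_col_a,bHLH10_colamp_a)))
# splice() walks two sorted vectors together. x is the superset and
# y a sorted subset of it; the result is as long as x, holding each
# element of y at the position where it matches x and NA elsewhere.
splice<-function(x,y) {
 nx<-length(x)
 ny<-length(y)
 newy<-rep(NA,nx)
 # an empty y (e.g. character(0)) leaves a column of NAs
 if(ny) {
  yi<-1
  for(xi in seq_len(nx)) {
   if(x[xi] == y[yi]) {
    newy[xi]<-y[yi]
    yi<-yi+1
   }
   # stop once every element of y has been placed
   if(yi>ny) break
  }
 }
 return(newy)
}
comatgs<-list(WRKY8_colamp_a=WRKY8_colamp_a,
 bHLH10_col_a=bHLH10_col_a,bHLH10_colamp_a=bHLH10_colamp_a)
# create mydf3 once, outside the loop, with the superset as column 1
mydf3<-data.frame(myenter,stringsAsFactors=FALSE)
for(j in seq_along(comatgs)) {
 # one column per list, named after the list
 tmp<-data.frame(splice(myenter,sort(comatgs[[j]])),
  stringsAsFactors=FALSE)
 names(tmp)<-names(comatgs)[j]
 mydf3<-cbind(mydf3,tmp)
}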
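For comparison, here is a shorter base-R sketch that skips the sorted walk. For each list, `myenter %in% v` marks which IDs of the superset that list contains, and `ifelse()` puts the ID there or NA otherwise. Passing `check.names = FALSE` to `data.frame()` keeps a non-syntactic name such as `_source_info_`, and its empty vector simply becomes a column of NAs. The three short lists are illustrative stand-ins, not the real data.

```r
# Shortened stand-ins for the real gene lists (illustration only)
comatgs <- list(
  WRKY8_colamp_a  = c("AT1G02920", "AT1G72520", "AT3G55840"),
  `_source_info_` = character(0),
  bHLH10_col_a    = c("AT1G72520", "AT3G55840", "AT5G20230")
)

# Sorted superset of every ID in every list
myenter <- sort(unique(unlist(comatgs, use.names = FALSE)))

# One column per list: the ID where the list contains it, NA otherwise
cols <- lapply(comatgs, function(v) ifelse(myenter %in% v, myenter, NA))

# check.names = FALSE keeps non-syntactic names such as `_source_info_`
mydf3 <- data.frame(myenter, cols, check.names = FALSE,
                    stringsAsFactors = FALSE)
mydf3
```

Unlike splice(), this does not rely on the lists being sorted.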
Jim
On Fri, Mar 22, 2019 at 10:29 AM Matthew
<mccormack using molbio.mgh.harvard.edu> wrote:
>
> Hi Jim,
>
> Thanks for the reply. That was pretty dumb of me. I took that out of the loop.
>
> comatgs is longer than this but here is a sample of 4 of 569 elements:
>
> $WRKY8_colamp_a
> [1] "AT1G02920" "AT1G06135" "AT1G07160" "AT1G11925" "AT1G14540" "AT1G16150" "AT1G21120"
> [8] "AT1G26380" "AT1G26410" "AT1G35210" "AT1G49000" "AT1G51920" "AT1G56250" "AT1G66090"
> [15] "AT1G72520" "AT1G80840" "AT2G02010" "AT2G18690" "AT2G30750" "AT2G39200" "AT2G43620"
> [22] "AT3G01830" "AT3G54150" "AT3G55840" "AT4G03460" "AT4G11470" "AT4G11890" "AT4G14370"
> [29] "AT4G15417" "AT4G15975" "AT4G31940" "AT4G35180" "AT5G01540" "AT5G05300" "AT5G11140"
> [36] "AT5G24110" "AT5G25250" "AT5G36925" "AT5G46295" "AT5G64750" "AT5G64905" "AT5G66020"
>
> $`_source_info_`
> character(0)
>
> $bHLH10_col_a
> [1] "AT1G72520" "AT3G55840" "AT5G20230" "AT5G64750"
>
> $bHLH10_colamp_a
> [1] "AT1G01560" "AT1G02920" "AT1G16420" "AT1G17147" "AT1G35210" "AT1G51620" "AT1G57630"
> [8] "AT1G72520" "AT2G18690" "AT2G19190" "AT2G40180" "AT2G44370" "AT3G23250" "AT3G55840"
> [15] "AT4G03460" "AT4G04480" "AT4G04540" "AT4G08555" "AT4G11470" "AT4G11890" "AT4G16820"
> [22] "AT4G23280" "AT4G35180" "AT5G01540" "AT5G05300" "AT5G20230" "AT5G22530" "AT5G24110"
> [29] "AT5G56960" "AT5G57010" "AT5G57220" "AT5G64750" "AT5G66020"
>
>
> I have been thinking of something like this:
>
> lenmyen <- length(myenter) # get length of longest list
> length(comatgs[[j]]) <- lenmyen # make each list length of myenter
> atglsts <- as.data.frame(comatgs[j]) # create dataframe
> colnames(atglsts) <- "AGI" # rename column to 'AGI'
>
> mydf3 <- full_join(mydf3, atglsts, by = "AGI") # full_join
>
> Matthew
>
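Matthew's sketch above renames the single column to "AGI", which lets the join key line up but throws the list name away. One way round that, shown here as a minimal base-R sketch, is to give each per-list data frame two columns: the join key `AGI` and a copy of the IDs in a column named after the list. `merge(..., all = TRUE)` performs a full outer join, as dplyr's `full_join()` does, and `Reduce()` stands in for the explicit loop so the starting data frame is created only once. The helper `one_list()` and the toy data are hypothetical.

```r
# Shortened stand-in for comatgs (illustration only)
comatgs <- list(
  WRKY8_colamp_a = c("AT1G02920", "AT1G72520", "AT3G55840"),
  bHLH10_col_a   = c("AT1G72520", "AT3G55840", "AT5G20230")
)

# Hypothetical helper: a two-column data frame for one list, holding
# the join key AGI plus a copy of the IDs in a column named after the list
one_list <- function(nm) {
  df <- data.frame(AGI = comatgs[[nm]], stringsAsFactors = FALSE)
  df[[nm]] <- comatgs[[nm]]
  df
}

# Start from the superset once, then full-outer-join each list onto it
start <- data.frame(AGI = sort(unique(unlist(comatgs, use.names = FALSE))),
                    stringsAsFactors = FALSE)
mydf3 <- Reduce(function(acc, nm) merge(acc, one_list(nm), by = "AGI", all = TRUE),
                names(comatgs), start)
mydf3
```

No NA padding of the lists is needed: the full join fills the rows a list does not contain with NA.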
> On 3/21/2019 7:12 PM, Jim Lemon wrote:
>
> Hi Matthew,
> First thing, don't put:
>
> mydf3 <- data.frame(myenter)
>
> inside your loop, otherwise you will reset the value of mydf3 each
> time and end up with only "myenter" and the final list. Without some
> idea of the contents of comatgs, it is difficult to suggest a way to
> get what you want.
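A minimal sketch of the point about the loop, using two hypothetical toy lists: if the data frame is created inside the loop it is thrown away on every pass, so only the last list's column survives; creating it once before the loop keeps them all.

```r
# Toy lists standing in for comatgs (illustration only)
lists <- list(a = c("x", "y"), b = c("y", "z"))
superset <- sort(unique(unlist(lists, use.names = FALSE)))

# Wrong: the data frame is re-created on every pass,
# so only the last list's column survives
for (nm in names(lists)) {
  wrong <- data.frame(superset, stringsAsFactors = FALSE)
  wrong[[nm]] <- ifelse(superset %in% lists[[nm]], superset, NA)
}

# Right: initialise once, before the loop, then add one column per pass
right <- data.frame(superset, stringsAsFactors = FALSE)
for (nm in names(lists)) {
  right[[nm]] <- ifelse(superset %in% lists[[nm]], superset, NA)
}

names(wrong)  # "superset" "b"
names(right)  # "superset" "a" "b"
```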
>
> Jim
>
> On Fri, Mar 22, 2019 at 8:16 AM Matthew
> <mccormack using molbio.mgh.harvard.edu> wrote:
>
> My apologies, my first e-mail was formatted very poorly when sent, so I am trying again with something I hope will be less confusing.
>
> I have been trying to create a dataframe by looping through a list of lists,
> and using dplyr's full_join so as to keep common elements on the same row.
> But I have a couple of problems.
>
> 1) The lists have different numbers of elements.
>
> 2) In the final dataframe, I would like the column names to be the names
> of the lists.
>
> Is this possible?
>
> Code:
>
> for(j in avector){
>   mydf3 <- data.frame(myenter)
>   atglsts <- as.data.frame(comatgs[j])
>   mydf3 <- full_join(mydf3, atglsts)
> }
>
> Explanation:
> # Start out by turning a list, myenter, into a dataframe. mydf3 now
> # has 1 column.
> # This first column will be the longest column in the final mydf3.
> # Loop through a list of lists, comatgs; on each pass a particular list
> # is made into a dataframe of one column, atglsts.
> # The name of the column is the name of the list.
> # Each atglsts dataframe has a different number of elements.
> # What I want to do is add the newly made dataframe, atglsts, as a
> # new column of the data frame mydf3 using full_join,
> # in order to keep common elements on the same row.
> # I could rename the column to 'AGI' so that I can join by 'AGI',
> # but then I would lose the name of the list.
> # In the final dataframe, I want to know the name of the original list
> # the column was made from.
>
> Matthew
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.