[Bioc-devel] Use of SummarizedExperiment or MultiAssayExperiment for many data frames / nested list objects

James W. MacDonald
Fri Jan 31 15:55:29 CET 2020


I think the last sentence of your email answers your question? You don't
know how to do this, primarily because you seem not to know anything about
SE or MAE objects, and you need to learn about the latter in order to do
the former. So I would recommend reading up about (probably) MAE objects
and figuring out how your existing data would fit into that framework.
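
As a starting point for seeing how the existing tables might fit, here is a minimal base-R sketch that splits one wide table into a log2FC matrix and an adjusted p-value matrix with one column per time point, which is the rectangular shape SE assays expect. The helper name split_measure is hypothetical, and the toy rows are the first three miR rows (D1 and D2 only) from the message quoted below.

```r
# Toy stand-in for the miR table; values copied from the quoted message.
miR <- data.frame(
  D1.Log2FC  = c(-0.008006934, 0.299802302, 0.430125310),
  D1.adjPVal = c(0.97706031, 0.30094186, 0.06476131),
  D2.Log2FC  = c(-0.008296431, 0.511083040, 0.483677350),
  D2.adjPVal = c(0.9503666129, 0.0489321072, 0.0228474958),
  row.names  = c("mmu-let-7b-3p", "mmu-let-7c-5p", "mmu-let-7d-3p")
)

# Hypothetical helper: collect every column ending in ".<measure>" into a
# matrix whose column names are the bare time points (D1, D2, ...).
split_measure <- function(df, measure) {
  suffix <- paste0(".", measure)
  keep <- endsWith(colnames(df), suffix)
  m <- as.matrix(df[, keep, drop = FALSE])
  colnames(m) <- sub(suffix, "", colnames(m), fixed = TRUE)
  m
}

log2fc  <- split_measure(miR, "Log2FC")
adjpval <- split_measure(miR, "adjPVal")
```

Both matrices keep the gene names as row names and share the same dimensions, so they could serve as two assays of a single object. Users could still supply plain data frames, with the conversion done inside the package.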

On Fri, Jan 31, 2020 at 8:31 AM Krutik Patel (PGR) <K.Patel5 using newcastle.ac.uk>
wrote:

> Hello Bioc-Devel,
>
> This will be a long-winded question and I apologise for that; I just want
> to be thorough.
>
> I recently submitted a package to Bioconductor for review, and received
> a response asking me to use SummarizedExperiment or MultiAssayExperiment
> as the standard format for my package. I looked into the usage of SE/MAE
> and do think they are very useful. I just find it difficult to envision the
> usage of these objects in my package, namely because I do not use
> sequencing data and so I do not have any phenoData.
>
> The input to my package is differential expression results for microRNA
> and mRNA, and I feel like that should stay as data frames to make it
> easier for users to use. From these data frames, many other data frames and
> nested lists are created. I will give a short demonstration of how my
> package functions below and I would appreciate it if anyone could
> demonstrate to me how to incorporate SEs or MAEs.
>
> # This will load test data
> > miR <- mm_miR
> > mRNA <- mm_mRNA
> # Visualise the data
> > head(miR [1:5, 1:5])
>
>                  D1.Log2FC D1.adjPVal    D2.Log2FC   D2.adjPVal  D3.Log2FC
> mmu-let-7b-3p -0.008006934 0.97706031 -0.008296431 0.9503666129 -0.1153951
> mmu-let-7c-5p  0.299802302 0.30094186  0.511083040 0.0489321072  0.4663393
> mmu-let-7d-3p  0.430125310 0.06476131  0.483677350 0.0228474958  0.4301441
> mmu-let-7e-3p  0.417901606 0.06543412  0.448677130 0.0301945611  0.3051121
> mmu-let-7e-5p  0.637167321 0.01010895  0.984529549 0.0001462246  0.8917273
>
> > head(mRNA [1:5, 1:5])
>
>          D1.Log2FC   D1.adjPVal  D2.Log2FC   D2.adjPVal D3.Log2FC
> A2m       1.336002 0.4627700063  4.0470385 0.0114355180  3.688919
> AA986860 -1.886142 0.0239685308 -0.8686382 0.2892313624 -1.115943
> Aadac    -2.493883 0.0022213531 -2.1678098 0.0051038251 -1.338884
> Aadat    -3.647727 0.0006583596 -3.3660043 0.0011145806 -2.616356
> Aass     -1.283668 0.0101430103 -1.9567394 0.0004421697 -1.315752
> # As you can see, creating this type of data would be quite simple for a
> # user if it is kept as data frames
>
> # We use the following functions to retrieve annotation IDs
> # They will produce several data frames each
> > getIDs_miR_mouse(miR)
>
> > head(miR_ensembl)
>
>          GENENAME   ID
> 1   mmu-let-7b-3p <NA>
> 2   mmu-let-7c-5p <NA>
> 3   mmu-let-7d-3p <NA>
> 4   mmu-let-7e-3p <NA>
> 5   mmu-let-7e-5p <NA>
> 6 mmu-let-7f-1-3p <NA>
>
> > head(miR_entrez)
>          GENENAME   ID
> 1   mmu-let-7b-3p <NA>
> 2   mmu-let-7c-5p <NA>
> 3   mmu-let-7d-3p <NA>
> 4   mmu-let-7e-3p <NA>
> 5   mmu-let-7e-5p <NA>
> 6 mmu-let-7f-1-3p <NA>
>
> > getIDs_miR_mouse(mRNA)
>
>   GENENAME                 ID
> 1      A2m ENSMUSG00000030111
> 2 AA986860 ENSMUSG00000042510
> 3    Aadac ENSMUSG00000027761
> 4    Aadat ENSMUSG00000057228
> 5     Aass ENSMUSG00000029695
> 6     Abat ENSMUSG00000057880
>
>   GENENAME     ID
> 1      A2m 232345
> 2 AA986860 212439
> 3    Aadac  67758
> 4    Aadat  23923
> 5     Aass  30956
> 6     Abat 268860
>
> # The following function will combine the two data frames into a new one
> genetic_data <- CombineGenes(miR_data = miR, mRNA_data = mRNA)
>
> # This function will turn the new data frame into a nested list, split
> # by a common string
> genelist <- GenesList(method = "c", genetic_data = genetic_data,
> timeString = "D")
> > as.data.frame(lapply(genelist, function(x) dim(x)))
>
>     D1   D2   D3   D7  D14
> 1 2278 2278 2278 2278 2278
> 2    2    2    2    2    2
>
> # Then we can filter out "non-significant" values
> > as.data.frame(lapply(filtered_genelist, function(x) dim(x)))
>
>     D1   D2   D3   D7 D14
> 1 1108 1389 1037 1196 380
> 2    2    2    2    2   2
>
>
> I could go on, but I think the point is clear. This package is full of data
> frames and nested lists, and it would be nice to use SE or MAE to tidy up
> the global environment. Is there a way of turning many data frames and
> nested lists into an SE or MAE object? If there is, please do let me know. I
> am not sure how to do this, and I feel as though it would be a necessary
> process to (at least) explore if I want my package on Bioconductor.
>
> Many Thanks, Krutik.
>
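
On the closing question of the quoted message, a rough sketch of one possible layout, not a definitive design: each table becomes a SummarizedExperiment with log2FC and adjPVal as assays and time points as columns, annotation IDs go in rowData, and the miR and mRNA objects are bundled in a MultiAssayExperiment. The helper de_to_se and the 0.05 cutoff are assumptions for illustration; the values and Ensembl IDs are copied from the quoted output.

```r
library(SummarizedExperiment)
library(MultiAssayExperiment)

tp <- c("D1", "D2")  # time points used as columns

# Hypothetical helper: one SE per table, one assay per measure.
de_to_se <- function(log2fc, adjpval) {
  SummarizedExperiment(
    assays  = list(log2FC = log2fc, adjPVal = adjpval),
    colData = DataFrame(timepoint = colnames(log2fc),
                        row.names = colnames(log2fc))
  )
}

# matrix() fills column by column: all D1 values first, then D2.
miR_names <- list(c("mmu-let-7c-5p", "mmu-let-7e-5p"), tp)
miR_se <- de_to_se(
  matrix(c(0.299802302, 0.637167321, 0.511083040, 0.984529549),
         nrow = 2, dimnames = miR_names),
  matrix(c(0.30094186, 0.01010895, 0.0489321072, 0.0001462246),
         nrow = 2, dimnames = miR_names)
)

mRNA_names <- list(c("A2m", "Aadac"), tp)
mRNA_se <- de_to_se(
  matrix(c(1.336002, -2.493883, 4.0470385, -2.1678098),
         nrow = 2, dimnames = mRNA_names),
  matrix(c(0.4627700063, 0.0022213531, 0.0114355180, 0.0051038251),
         nrow = 2, dimnames = mRNA_names)
)

# Annotation IDs can live in rowData instead of separate data frames.
rowData(mRNA_se)$ensembl <- c("ENSMUSG00000030111", "ENSMUSG00000027761")

# Bundle both experiments; with no sampleMap supplied, the constructor
# matches assay column names to the row names of colData.
mae <- MultiAssayExperiment(
  experiments = ExperimentList(miR = miR_se, mRNA = mRNA_se),
  colData     = DataFrame(timepoint = tp, row.names = tp)
)

# Per-time-point filtering can be computed on demand rather than stored
# as a nested list (0.05 is an assumed threshold).
sig_D1 <- rownames(mRNA_se)[assay(mRNA_se, "adjPVal")[, "D1"] < 0.05]
```

Whether time points are the right thing to treat as columns is a judgment call; a package reviewer may prefer a different arrangement.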


-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
