[Bioc-devel] Use of SummarizedExperiment or MultiAssayExperiment for many data frames / nested list objects
Krutik Patel (PGR)
K@P@te|5 @end|ng |rom newc@@t|e@@c@uk
Fri Jan 31 14:30:51 CET 2020
Hello Bioc-Devel,
This will be a long-winded question, and I apologise for that; I just want to be thorough.
I recently submitted a package to Bioconductor for review, and received a response asking me to use SummarizedExperiment or MultiAssayExperiment as the standard format for my package. I looked into the usage of SE/MAE and do think they are very useful. I just find it difficult to envision how to use these objects in my package, namely because I do not use sequencing data and so I do not have any phenoData.
The input to my package is differentially expressed data from microRNA and mRNA experiments, and I feel that it should stay as data frames to make it easier for users. From these data frames, many other data frames and nested lists are created. I will give a short demonstration of how my package functions below, and I would appreciate it if anyone could show me how to incorporate SEs or MAEs.
# This will load test data
> miR <- mm_miR
> mRNA <- mm_mRNA
# Visualise the data
> head(miR [1:5, 1:5])
                 D1.Log2FC D1.adjPVal    D2.Log2FC   D2.adjPVal  D3.Log2FC
mmu-let-7b-3p -0.008006934 0.97706031 -0.008296431 0.9503666129 -0.1153951
mmu-let-7c-5p  0.299802302 0.30094186  0.511083040 0.0489321072  0.4663393
mmu-let-7d-3p  0.430125310 0.06476131  0.483677350 0.0228474958  0.4301441
mmu-let-7e-3p  0.417901606 0.06543412  0.448677130 0.0301945611  0.3051121
mmu-let-7e-5p  0.637167321 0.01010895  0.984529549 0.0001462246  0.8917273
> head(mRNA [1:5, 1:5])
         D1.Log2FC   D1.adjPVal  D2.Log2FC   D2.adjPVal D3.Log2FC
A2m       1.336002 0.4627700063  4.0470385 0.0114355180  3.688919
AA986860 -1.886142 0.0239685308 -0.8686382 0.2892313624 -1.115943
Aadac    -2.493883 0.0022213531 -2.1678098 0.0051038251 -1.338884
Aadat    -3.647727 0.0006583596 -3.3660043 0.0011145806 -2.616356
Aass     -1.283668 0.0101430103 -1.9567394 0.0004421697 -1.315752
# As you can see, creating this type of data would be quite simple for a user if it is kept as data frames
# We use the following functions to retrieve annotation IDs
# They will produce several data frames each
> getIDs_miR_mouse(miR)
> head(miR_ensembl)
         GENENAME   ID
1   mmu-let-7b-3p <NA>
2   mmu-let-7c-5p <NA>
3   mmu-let-7d-3p <NA>
4   mmu-let-7e-3p <NA>
5   mmu-let-7e-5p <NA>
6 mmu-let-7f-1-3p <NA>
> head(miR_entrez)
         GENENAME   ID
1   mmu-let-7b-3p <NA>
2   mmu-let-7c-5p <NA>
3   mmu-let-7d-3p <NA>
4   mmu-let-7e-3p <NA>
5   mmu-let-7e-5p <NA>
6 mmu-let-7f-1-3p <NA>
> getIDs_miR_mouse(mRNA)
  GENENAME                 ID
1      A2m ENSMUSG00000030111
2 AA986860 ENSMUSG00000042510
3    Aadac ENSMUSG00000027761
4    Aadat ENSMUSG00000057228
5     Aass ENSMUSG00000029695
6     Abat ENSMUSG00000057880
  GENENAME     ID
1      A2m 232345
2 AA986860 212439
3    Aadac  67758
4    Aadat  23923
5     Aass  30956
6     Abat 268860
# The following function will combine the two data frames into a new one
> genetic_data <- CombineGenes(miR_data = miR, mRNA_data = mRNA)
# This function will split the new data frame into a list of data frames, grouped by a common string in the column names
> genelist <- GenesList(method = "c", genetic_data = genetic_data, timeString = "D")
> as.data.frame(lapply(genelist, function(x) dim(x)))
    D1   D2   D3   D7  D14
1 2278 2278 2278 2278 2278
2    2    2    2    2    2
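To make the splitting step above concrete, here is a minimal base-R sketch of how a wide data frame might be divided into a list with one data frame per time point. It is not the package's GenesList implementation: split_by_timepoint and the toy values are hypothetical, and the sketch assumes every column is named <time>.<statistic>.

```r
# Toy stand-in for the combined data frame (values are made up for illustration).
genetic_data <- data.frame(
  D1.Log2FC  = c(-0.01, 0.30, 1.34),
  D1.adjPVal = c(0.98, 0.30, 0.46),
  D2.Log2FC  = c(-0.01, 0.51, 4.05),
  D2.adjPVal = c(0.95, 0.05, 0.01),
  row.names  = c("mmu-let-7b-3p", "mmu-let-7c-5p", "A2m")
)

# Hypothetical helper: group columns by the time-point prefix before the first ".".
split_by_timepoint <- function(df) {
  prefix <- sub("\\..*$", "", colnames(df))
  # factor(..., levels = unique(prefix)) keeps time points in column order
  groups <- split(seq_along(prefix), factor(prefix, levels = unique(prefix)))
  lapply(groups, function(idx) df[, idx, drop = FALSE])
}

genelist_sketch <- split_by_timepoint(genetic_data)
names(genelist_sketch)        # one element per time point
sapply(genelist_sketch, dim)  # rows and columns of each element
```

Each element keeps the gene names as row names, so the per-time-point data frames can still be matched back to the original rows.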
# Then we can filter out "non-significant" values
> as.data.frame(lapply(filtered_genelist, function(x) dim(x)))
    D1   D2   D3   D7 D14
1 1108 1389 1037 1196 380
2    2    2    2    2   2
I could go on, but I think the point is clear. This package is full of data frames and nested lists, and it would be nice to use SE or MAE to tidy up the global environment. Is there a way of turning many data frames / nested lists into an SE or MAE object? If there is, please do let me know; I am not sure how to do this, and I feel it is something I need to (at least) explore if I want my package on Bioconductor.
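For illustration only, a minimal sketch of one way such a conversion could look. It treats each statistic (Log2FC, adjPVal) as an assay and each time point as a column, then bundles the miR and mRNA objects into a MultiAssayExperiment. The helper df_to_se and the toy values are hypothetical, and the sketch assumes every time point carries every statistic; whether this layout suits the package is an open question.

```r
library(SummarizedExperiment)
library(MultiAssayExperiment)

# Hypothetical helper: wide "<time>.<stat>" data frame -> SummarizedExperiment
# with one assay per statistic and one column per time point.
df_to_se <- function(df) {
  parts <- do.call(rbind, strsplit(colnames(df), ".", fixed = TRUE))
  times <- unique(parts[, 1])
  stats <- unique(parts[, 2])
  assays <- lapply(setNames(stats, stats), function(s) {
    m <- as.matrix(df[, paste(times, s, sep = "."), drop = FALSE])
    colnames(m) <- times
    m
  })
  SummarizedExperiment(
    assays  = assays,
    colData = S4Vectors::DataFrame(timepoint = times, row.names = times)
  )
}

# Toy inputs (values rounded from the printout above, two time points only).
miR <- data.frame(
  D1.Log2FC = c(-0.008, 0.300), D1.adjPVal = c(0.977, 0.301),
  D2.Log2FC = c(-0.008, 0.511), D2.adjPVal = c(0.950, 0.049),
  row.names = c("mmu-let-7b-3p", "mmu-let-7c-5p")
)
mRNA <- data.frame(
  D1.Log2FC = c(1.336, -1.886, -2.494), D1.adjPVal = c(0.463, 0.024, 0.002),
  D2.Log2FC = c(4.047, -0.869, -2.168), D2.adjPVal = c(0.011, 0.289, 0.005),
  row.names = c("A2m", "AA986860", "Aadac")
)

# Time points act as the shared "samples"; the sample map is inferred because
# the experiment column names match the row names of colData.
mae <- MultiAssayExperiment(
  experiments = ExperimentList(miR = df_to_se(miR), mRNA = df_to_se(mRNA)),
  colData     = S4Vectors::DataFrame(timepoint = c("D1", "D2"),
                                     row.names = c("D1", "D2"))
)

assay(mae[["mRNA"]], "Log2FC")  # genes x time points matrix of log2 fold changes
```

In a layout like this, the Ensembl and Entrez IDs could plausibly live in rowData of each SummarizedExperiment instead of in separate data frames.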
Many Thanks, Krutik.