[BioC] identifying drosophila miRNA targets
James W. MacDonald
jmacdon at uw.edu
Fri Mar 29 23:11:14 CET 2013
Hi Fiona,
Probably the easiest way to do this is to convert the flybase_cg IDs to
Ensembl IDs.
## read sanger data in
## there is some weird cruft in line 4685, so it's best to just remove
## the thirteenth column
dat <- read.table("v5.txt.drosophila_melanogaster", sep = "\t",
                  stringsAsFactors = FALSE)[,-13]
library(org.Dm.eg.db)
library(drosophila2.db)
library(affycoretools)
## map flybase_cg IDs to ensembl
x <- select(org.Dm.eg.db, gsub("-[A-Z]+", "", dat[,12]), c("ENSEMBL"),
            "FLYBASECG")
## there are some duplicates here, but I don't think it will matter
## merge back together and write back out
dat$merge <- gsub("-[A-Z]+", "", dat[,12])
dat2 <- merge(dat, x, by.x = "merge", by.y = 1, all.x = TRUE)
write.table(dat2, "v5.txt.drosophila_melanogaster2", sep = "\t",
            col.names = FALSE, row.names = FALSE, quote = FALSE)
## note that I say the file is not sanger, and then tell mirna2mrna()
## which columns to use. miRNA and mRNA are the miRNA and mRNA probe ID
## vectors (e.g. featureNames(sigmiRNA) and featureNames(sigmRNA)).
test <- mirna2mrna(miRNAids = miRNA,
                   miRNAannot = "v5.txt.drosophila_melanogaster2",
                   mRNAids = mRNA, orgPkg = "org.Dm.eg.db",
                   chipPkg = "drosophila2.db", sanger = FALSE,
                   miRNAcol = 2, mRNAcol = 14)
With the truncated mRNA and miRNA probe IDs you give below, I get no
mappings, but I assume you have way more mRNA transcripts.
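To see what the ID-stripping and merge steps do, here is a toy run in base R. All IDs are made up for illustration (the "ENS_A"/"ENS_B" values stand in for whatever select() returns), and the data frames are hypothetical stand-ins for the microcosm file and the select() result. It shows that gsub("-[A-Z]+", "", ...) turns a transcript ID like "CG0001-RA" into its flybase_cg gene ID, that all.x = TRUE keeps transcripts with no Ensembl match (as NA), and where the duplicate rows come from: one flybase_cg ID mapping to more than one Ensembl ID multiplies the rows.

```r
## Toy stand-in for the targets file: one column of miRNA names and one of
## transcript IDs with a "-RA"/"-RB" suffix. All values are made up.
dat <- data.frame(mirna = c("dme-miR-1", "dme-miR-2", "dme-miR-3"),
                  transcript = c("CG0001-RA", "CG0001-RB", "CG0002-RA"),
                  stringsAsFactors = FALSE)

## Strip the transcript suffix to get the flybase_cg gene ID
dat$merge <- gsub("-[A-Z]+", "", dat$transcript)

## Hypothetical stand-in for the select() result (FLYBASECG -> ENSEMBL).
## CG0001 maps to two IDs to show how duplicate rows arise.
x <- data.frame(FLYBASECG = c("CG0001", "CG0001"),
                ENSEMBL = c("ENS_A", "ENS_B"),
                stringsAsFactors = FALSE)

## all.x = TRUE keeps CG0002 even though it has no mapping (ENSEMBL is NA)
dat2 <- merge(dat, x, by.x = "merge", by.y = 1, all.x = TRUE)
print(dat2)
```

The two CG0001 transcripts each pair with both Ensembl IDs (four rows), and CG0002 survives with an NA, giving five rows in total.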
Let me know if this works for you.
Best,
Jim
On 3/29/2013 8:09 AM, Fiona Ingleby wrote:
> Hi Jim,
>
> Thanks very much for pointing that out - it seems mirna2mrna is
> exactly what I was after; I don't know how I managed to overlook it.
>
> I'm a bit puzzled about the results I'm getting, however, and so if
> you have a minute to think this through then I'd be really
> grateful. The help pages are pretty clear, and so I've managed to get
> the function to run with my data without any problems, but I'm getting
> 'named list()' as output. Which might simply suggest that there are no
> correlations between the miRNAs and mRNAs in my data (?). But I'm not
> convinced and I'm wondering if I've done something wrong somewhere
> along the way (I'm looking at 39 differentially expressed miRNAs along
> with 2638 differentially expressed mRNAs, so I'd be surprised if there
> were none that correlate with each other).
>
> I'm wondering if I'm doing something daft like using RNA IDs in the
> wrong format (which might be one explanation for getting 0 matches
> returned from the database?). At the moment I'm just taking character
> vectors directly from the ExpressionSet. So I have 2 ExpressionSets,
> each representing only the probes which are significantly
> differentially expressed in each dataset - I've called these sigmRNA
> (2638 x 12 samples) and sigmiRNA (39 x 12 samples) for mRNA and miRNA
> respectively.
>
> > featureNames(sigmRNA)
> [1] "1622906_at" "1622915_at" "1622917_a_at" "1622920_at"
> "1622926_at" "1622932_s_at" "1622935_at" "1622940_at" "1622946_at"
> [10] "1622952_at" "1622956_at" "1622959_at" "1622960_at"
> "1622965_s_at" "1622974_at" "1622975_at" "1622978_at" "1622992_at"
> [19] "1623002_at" "1623004_a_at" "1623008_at" "1623019_a_at"
> "1623022_at" "1623025_at" "1623026_a_at" "1623030_at" "1623031_a_at"
>
> …and so on for 2638 entries.
>
> > featureNames(sigmiRNA)
> [1] "dme-miR-1002_st" "dme-miR-1004_st" "dme-miR-1017_st"
> "dme-miR-124_st" "dme-miR-2500_st" "dme-miR-286_st"
> [7] "dme-miR-2a_st" "dme-miR-306_st" "dme-miR-310_st"
> "dme-miR-311_st" "dme-miR-312_st" "dme-miR-313_st"
>
> …etc. So I'm using mirna2mrna like this:
>
> test <- mirna2mrna(miRNAids = featureNames(sigmiRNA),
>     # downloaded from the EBI website and saved in the working directory
>     miRNAannot = "v5.txt.drosophila_melanogaster",
>     mRNAids = featureNames(sigmRNA),
>     orgPkg = "org.Dm.eg.db", chipPkg = "drosophila2.db",
>     sanger = TRUE, miRNAcol = NULL, mRNAcol = NULL, transType = "ensembl")
>
> and then I get:
>
> > test
> named list()
>
> I've put the sessionInfo() output at the bottom of the email. I also
> looked through the source code on the Bioconductor code search
> website, pulled out the 'convertIDs' function, and ran this as an
> independent function on the lists of RNAs to check to see what it was
> doing, but I can't see anything that looks odd to me - it removes the
> '_st'/'_at' as I expected.
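One quick way to check whether the miRNA IDs are the problem is to strip the probe suffix yourself and count how many of the resulting names appear in the miRNA column of the targets file. A minimal base-R sketch: the probe IDs are the first few from the list above, but annot_mirnas is a made-up stand-in for that column (in practice you would read the file with read.table() and take the relevant column). This is not the convertIDs code, just the same idea of dropping "_st".

```r
## Probe IDs as returned by featureNames() on the miRNA ExpressionSet
## (first few values from the list above)
mirna_probes <- c("dme-miR-1002_st", "dme-miR-1004_st", "dme-miR-124_st",
                  "dme-miR-2a_st")

## Made-up stand-in for the miRNA-name column of the targets file
annot_mirnas <- c("dme-miR-124", "dme-miR-2a", "dme-miR-hypothetical")

## Drop the trailing "_st" probe suffix
mirna_ids <- sub("_st$", "", mirna_probes)

## Which probes have at least one entry in the annotation?
found <- mirna_ids %in% annot_mirnas
print(data.frame(probe = mirna_probes, id = mirna_ids, found = found))
```

If found comes back all FALSE against the real file, the IDs are not in a matching format; the same kind of check applies on the mRNA side once the probe sets have been mapped to transcript IDs.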
>
> So I'm a bit stuck. I'm sure I've misunderstood something, but can't
> pick out what it is myself. I suppose it's totally possible that the
> analysis is fine and there are just no correlations between the miRNAs
> and mRNAs of interest in my data - but I thought I would check. If you
> (or anyone) has any ideas, I'd really appreciate the help.
>
> Thanks again,
>
> Fiona
>
> Dr Fiona C Ingleby
>
> Postdoctoral Research Fellow
> University of Sussex
>
> Email: F.Ingleby at sussex.ac.uk
> Website: fionaingleby.weebly.com
> Tel: +44(0)1273678559
>
> > sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] drosophila2.db_2.8.1 org.Dm.eg.db_2.8.0 RSQLite_0.11.2
> DBI_0.2-5 AnnotationDbi_1.20.7 Biobase_2.18.0
> [7] BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] IRanges_1.16.6 parallel_2.15.2 stats4_2.15.2 tools_2.15.2
>
>
>
> On 28 Mar 2013, at 16:43, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>> Hi Fiona,
>>
>> I have a function called mirna2mrna (yeah, I know, lame function
>> name...) in my affycoretools package that does this, based on the
>> Sanger MicroCosm targets data that you can download here:
>>
>> http://www.ebi.ac.uk/enright-srv/microcosm/cgi-bin/targets/v5/download.pl
>>
>> There is also a function makeHmap() that will create a heatmap with
>> the miRNA/mRNA pairs, where the color of the cells is based on the
>> correlation between the two RNA species (with the intent to show
>> negative correlations, indicating that the miRNA is hypothetically
>> causing premature degradation of the mRNA).
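The correlation idea described here can be sketched in base R. This is not makeHmap()'s code. It assumes Pearson correlation across samples, uses simulated expression values and hypothetical miRNA/target pairs, and assumes the miRNA and mRNA arrays were run on the same samples with the columns in the same order.

```r
## Simulated expression matrices: rows are features, columns are the same
## 12 samples in the same order (values are random, not real data)
set.seed(1)
mirna_expr <- matrix(rnorm(2 * 12), nrow = 2,
                     dimnames = list(c("miR_a", "miR_b"), NULL))
## mRNA_x is built to track -miR_a; mRNA_y is unrelated noise
mrna_expr <- rbind(mRNA_x = -mirna_expr["miR_a", ] + rnorm(12, sd = 0.1),
                   mRNA_y = rnorm(12))

## Hypothetical predicted target pairs (miRNA -> target mRNA)
pairs <- data.frame(mirna = c("miR_a", "miR_b"),
                    mrna  = c("mRNA_x", "mRNA_y"),
                    stringsAsFactors = FALSE)

## Pearson correlation across samples for each predicted pair
pairs$r <- mapply(function(m, g) cor(mirna_expr[m, ], mrna_expr[g, ]),
                  pairs$mirna, pairs$mrna)

## A strongly negative r is the pattern consistent with the miRNA
## driving degradation of its target (the -0.5 cutoff is arbitrary)
pairs$negative <- pairs$r < -0.5
print(pairs)
```

The first pair comes out close to -1 by construction; the second is just noise, so its correlation carries no meaning.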
>>
>> I think the help pages for these two functions are reasonable, but
>> please let me know if you have any questions.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On 3/28/2013 12:30 PM, Fiona Ingleby wrote:
>>> Hi everyone,
>>>
>>> I am working with mRNA data from Affy 'drosophila2' arrays and miRNA
>>> data from Affy 'mirna3' arrays. I have identified a list of
>>> differentially expressed mRNAs and miRNAs. I'm having a bit of
>>> trouble with some downstream analyses and I'm hoping someone might
>>> be able to offer some help.
>>>
>>> I would like to use my list of differentially expressed miRNAs to
>>> access online databases (e.g. miRBase, microRNA.org…) and extract
>>> the names of all the potential target mRNAs. Then I'd like to use
>>> this list of mRNAs to look through my mRNA expression data. I'm
>>> aware of packages like 'RmiR' and 'microRNA' which have built-in
>>> functions for finding miRNA targets, but as far as I can tell,
>>> 'RmiR' uses miRNA databases for humans only and 'microRNA' works
>>> with human and mouse data only. So is there a package I am unaware
>>> of (or another application of 'RmiR'/'microRNA' that I am unaware
>>> of) for looking at drosophila data?
>>>
>>> So far I have also considered the 'biomaRt' package to see if the
>>> database query function on there can help me, but I haven't had much
>>> luck. For instance, if I try an example list of miRNAs:
>>>
>>> mirna<-c("dme-miR-1002","dme-miR-312","dme-miR-973")
>>> library(biomaRt)
>>> ensembl<-useMart("ensembl",dataset="dmelanogaster_gene_ensembl")
>>> getBM(attributes="mirbase_accession",filters="mirbase_id",values=mirna,mart=ensembl)
>>>
>>> then 'logical(0)' is returned, as if there are no records for those
>>> miRNAs - but by searching the database manually I know the records
>>> are there.
>>>
>>> Alternatively I can try:
>>>
>>> miRNA<- getBM(c("mirbase_accession","mirbase_id", "ensembl_gene_id",
>>> "start_position", "chromosome_name"), filters = c("with_mirbase"),
>>> values = list(T), mart = ensembl)
>>>
>>> which returns a table of various bits of information on miRNAs, but
>>> I cannot adapt this command to just look at my list of miRNAs of
>>> interest (i.e. the 'mirna' vector above). I've included the
>>> sessionInfo() output for these at the bottom of the email, but I
>>> suspect my problem is more to do with the fact I'm not going about
>>> this the right way (as opposed to a problem with package versions
>>> and coding etc.). I'm not even sure that using 'biomaRt' will give
>>> me the information I eventually want (the target mRNAs of these
>>> miRNAs); I was just trying it out to see what it was capable of in
>>> terms of querying these databases. So I apologise for the
>>> vagueness. Since I haven't managed to get very far by myself,
>>> it's difficult to be more specific, but I'd really appreciate it if
>>> anyone could offer some advice, even just to point me in the
>>> direction of a useful package which might have gone unnoticed by me.
>>>
>>> Many thanks,
>>>
>>> Fiona
>>>
>>> Dr Fiona C Ingleby
>>> Postdoctoral Research Fellow
>>> University of Sussex
>>> Email: F.Ingleby at sussex.ac.uk
>>> Website: fionaingleby.weebly.com
>>>
>>>
>>>> sessionInfo()
>>> R version 2.15.2 (2012-10-26)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] biomaRt_2.14.0 affy_1.36.1 Biobase_2.18.0
>>> BiocGenerics_0.4.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.26.0 BiocInstaller_1.8.3 grid_2.15.2
>>> lattice_0.20-14 Matrix_1.0-11 MCMCglmm_2.17
>>> [7] preprocessCore_1.20.0 RCurl_1.95-4.1 tools_2.15.2
>>> XML_3.95-0.2 zlibbioc_1.4.0
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099