[BioC] Using custom CDF with 'make.cdf.env'
Scott Robinson
Scott.Robinson at glasgow.ac.uk
Wed Aug 27 17:58:12 CEST 2014
Ah!
Sorry, reading that reply I instantly saw the problem – I forgot to change the probe set ID for the individual rows.
Thanks very much James
From: James W. MacDonald [mailto:jmacdon at uw.edu]
Sent: 27 August 2014 16:52
To: Scott Robinson
Cc: bioconductor at r-project.org
Subject: Re: Using custom CDF with 'make.cdf.env'
Hi Scott,
I see some of what you have done. As an example, you moved things around, and changed the 'Cell' number:
C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st miRNA-1_0.CDF
129939:Name=bta-let-7a_st
129946:Cell1=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 0
129947:Cell2=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 1
129948:Cell3=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 2 11
129949:Cell4=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 3
C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st newmir1.cdf
43056:Cell5=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 4 11
43057:Cell6=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 5 11
43058:Cell7=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 6 11
43059:Cell8=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 7 11
This won't change anything. In both cases, there is a probeset called bta-let-7a_st, that has four identical probes. Putting these data somewhere else in the cdf won't change the way it is parsed.
In other words, this:
C:\Users\BioinfAdmin\Desktop> sed -n '43050,43111p' newmir1.cdf
StopPosition=59
CellHeader=X Y PROBE FEAT QUAL EXPOS POS CBASE PBASE TBA
Cell1=2 190 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 0 11 G
Cell2=196 180 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 1 11
Cell3=211 187 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 2 11
Cell4=29 205 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 3 11
Cell5=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 4 11
Cell6=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 5 11
Cell7=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 6 11
Cell8=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 7 11
Cell9=2 189 ACTCCATCATCCAACATATCAA control cbr-let-7_st 8 11 G
Cell10=178 178 ACTCCATCATCCAACATATCAA control cbr-let-7_st 9 11
Cell11=212 189 ACTCCATCATCCAACATATCAA control cbr-let-7_st 10 11
Cell12=189 181 ACTCCATCATCCAACATATCAA control cbr-let-7_st 11 11
Cell13=179 178 ACTCCATCATCCAACATATCAA control cel-let-7_st 12 11
Cell14=80 157 ACTCCATCATCCAACATATCAA control cel-let-7_st 13 11
Cell15=215 191 ACTCCATCATCCAACATATCAA control cel-let-7_st 14 11
Cell16=190 181 ACTCCATCATCCAACATATCAA control cel-let-7_st 15 11
Cell17=79 157 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 16 11
Cell18=213 189 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 17 11
Cell19=182 179 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 18 11
Cell20=196 181 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 19 11
Cell21=205 184 ACTCCATCATCCAACATATCAA control dre-let-7a_st 20 11
Cell22=188 181 ACTCCATCATCCAACATATCAA control dre-let-7a_st 21 11
Cell23=216 191 ACTCCATCATCCAACATATCAA control dre-let-7a_st 22 11
Cell24=83 157 ACTCCATCATCCAACATATCAA control dre-let-7a_st 23 11
Cell25=77 157 ACTCCATCATCCAACATATCAA control fru-let-7a_st 24 11
Cell26=212 188 ACTCCATCATCCAACATATCAA control fru-let-7a_st 25 11
Cell27=193 181 ACTCCATCATCCAACATATCAA control fru-let-7a_st 26 11
Cell28=182 180 ACTCCATCATCCAACATATCAA control fru-let-7a_st 27 11
Cell29=188 180 ACTCCATCATCCAACATATCAA control gga-let-7a_st 28 11
Cell30=211 189 ACTCCATCATCCAACATATCAA control gga-let-7a_st 29 11
Cell31=78 157 ACTCCATCATCCAACATATCAA control gga-let-7a_st 30 11
Cell32=199 180 ACTCCATCATCCAACATATCAA control gga-let-7a_st 31 11
Cell33=214 188 ACTCCATCATCCAACATATCAA control gga-let-7j_st 32 11
Cell34=191 181 ACTCCATCATCCAACATATCAA control gga-let-7j_st 33 11
Cell35=180 177 ACTCCATCATCCAACATATCAA control gga-let-7j_st 34 11
Cell36=203 180 ACTCCATCATCCAACATATCAA control gga-let-7j_st 35 11
Cell37=211 188 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 36 11
Cell38=184 179 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 37 11
Cell39=195 181 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 38 11
Cell40=82 157 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 39 11
Cell41=179 177 ACTCCATCATCCAACATATCAA control mml-let-7a_st 40 11
Cell42=190 182 ACTCCATCATCCAACATATCAA control mml-let-7a_st 41 11
Cell43=214 191 ACTCCATCATCCAACATATCAA control mml-let-7a_st 42 11
Cell44=202 180 ACTCCATCATCCAACATATCAA control mml-let-7a_st 43 11
Cell45=183 179 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 44 11
Cell46=84 157 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 45 11
Cell47=194 181 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 46 11
Cell48=212 187 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 47 11
Cell49=76 157 ACTCCATCATCCAACATATCAA control rno-let-7a_st 48 11
Cell50=192 181 ACTCCATCATCCAACATATCAA control rno-let-7a_st 49 11
Cell51=181 177 ACTCCATCATCCAACATATCAA control rno-let-7a_st 50 11
Cell52=212 191 ACTCCATCATCCAACATATCAA control rno-let-7a_st 51 11
Cell53=187 181 ACTCCATCATCCAACATATCAA control tni-let-7a_st 52 11
Cell54=128 77 ACTCCATCATCCAACATATCAA control tni-let-7a_st 53 11
Cell55=81 157 ACTCCATCATCCAACATATCAA control tni-let-7a_st 54 11
Cell56=213 191 ACTCCATCATCCAACATATCAA control tni-let-7a_st 55 11
Cell57=214 189 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 56 11
Cell58=185 179 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 57 11
Cell59=22 202 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 58 11
Cell60=197 181 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 59 11
will not create a single probeset for let-7a across all species. And combining 60 identical 22-mer probes into a single probeset is about as useless as having 15 individual probesets of four identical probes each. Either way you are running RMA (or whatever) on essentially the same information, and the only differences between probes are entirely due to technical variability. These arrays are, within the constraints of Affy's system, about as good as you can do. Which is to say, not very good.
If you really want to merge them, then you also have to make the probeset IDs identical within each block. Here that means stripping off the prepended species abbreviation and converting the gga-let-7j probes to let-7a_st; then you would have just one probeset. But that will be a lot of work for what I imagine will be very little gain.
Best,
Jim
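The renaming described above can be sketched in base R. This is only an illustration, not a tested CDF rewriter: the helper names `get_qual` and `strip_species` are hypothetical, the rows are assumed to be whitespace-separated with the probe set ID in the fifth field (QUAL) as in the excerpt, and the species abbreviation is assumed to be exactly three lowercase letters followed by a hyphen. Any other fields in the file that refer to the probe set would presumably need the same treatment, which is not handled here.

```r
# Three Cell rows copied from the newmir1.cdf excerpt in this thread.
cell_rows <- c(
  "Cell1=2 190 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 0 11 G",
  "Cell5=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 4 11",
  "Cell33=214 188 ACTCCATCATCCAACATATCAA control gga-let-7j_st 32 11"
)

# Hypothetical helper: pull the fifth field (QUAL, the probe set ID) from each row.
get_qual <- function(rows) {
  fields <- strsplit(sub("^Cell[0-9]+=", "", rows), "[[:space:]]+")
  vapply(fields, function(f) f[5], character(1))
}

# Hypothetical helper: drop a leading three-letter species abbreviation
# (assumption: every prefix looks like "hsa-", "bta-", "gga-", ...).
strip_species <- function(ids) sub("^[a-z]{3}-", "", ids)

old_ids <- get_qual(cell_rows)
new_ids <- strip_species(old_ids)
print(data.frame(old = old_ids, new = new_ids))
```

Note that stripping the prefix alone leaves gga-let-7j_st as let-7j_st, so it would still need to be mapped to let-7a_st by hand (or by grouping on the probe sequence) to end up in the same probeset.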
On Wed, Aug 27, 2014 at 11:19 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
Hi Scott,
As far as I can tell, you haven't made any changes to the cdf at all:
> z <- make.cdf.env("newmir1.cdf")
Reading CDF file.
Creating CDF environment
Wait for about 78 dots.........................................................................
> z
<environment: 0x00000000113d5c08>
> length(ls(z))
[1] 7815
> zz <- as.list(z)
> table(sapply(zz, nrow))
   4    8    9   10   11   20   25   40   50   67   73   88   89   90   91   92   94
6703    8   14   32  959    9    1    1    2    1    1    1    2    1    1    1   78
> y <- make.cdf.env("miRNA-1_0.CDF")
Reading CDF file.
Creating CDF environment
Wait for about 78 dots..........................................................................
> yy <- as.list(y)
> length(yy)
[1] 7815
> table(sapply(yy, nrow))
   4    8    9   10   11   20   25   40   50   67   73   88   89   90   91   92   94
6703    8   14   32  959    9    1    1    2    1    1    1    2    1    1    1   78
> all.equal(names(zz), names(yy))
[1] TRUE
Best,
Jim
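The check above compares probeset names and row counts. A stricter comparison, sketched here in base R, would test whether two environments hold identical objects under identical names. The helper `same_env_contents` and the toy environments are hypothetical; the toy entries are small two-column matrices labelled "pm" and "mm", which is an assumption about what a CDF environment stores, and in practice the arguments would be the objects returned by make.cdf.env.

```r
# Hypothetical helper: TRUE if two environments contain the same named objects.
same_env_contents <- function(a, b) {
  la <- as.list(a)
  lb <- as.list(b)
  # as.list() on an environment returns elements in arbitrary order,
  # so sort both lists by name before comparing.
  la <- la[sort(names(la))]
  lb <- lb[sort(names(lb))]
  identical(la, lb)
}

# Toy stand-ins for CDF environments (assumption: one index matrix per probeset).
make_toy_env <- function(ids) {
  env <- new.env()
  for (id in ids) {
    m <- matrix(1:8, ncol = 2, dimnames = list(NULL, c("pm", "mm")))
    assign(id, m, envir = env)
  }
  env
}

old_env    <- make_toy_env(c("hsa-let-7a_st", "bta-let-7a_st"))
new_env    <- make_toy_env(c("bta-let-7a_st", "hsa-let-7a_st"))
merged_env <- make_toy_env("let-7a_st")

unchanged <- same_env_contents(old_env, new_env)     # same contents, different insertion order
changed   <- same_env_contents(old_env, merged_env)  # different probeset names
print(c(unchanged = unchanged, changed = changed))
```

If the comparison returns TRUE for the original and the edited CDF, the edits did not change how the file is parsed.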
On Wed, Aug 27, 2014 at 10:31 AM, Scott Robinson <Scott.Robinson at glasgow.ac.uk> wrote:
Dear All,
Since it exceeds 1MB, here is a link to the old ("miRNA-1_0.CDF") and new ("newmir1.cdf") CDFs, test script and example CEL file:
http://www.files.com/set/53fdeb0aa2176
Thanks,
Scott
________________________________________
From: Scott Robinson [guest] [guest at bioconductor.org]
Sent: 27 August 2014 13:11
To: bioconductor at r-project.org; Scott Robinson
Cc: makecdfenv Maintainer
Subject: Using custom CDF with 'make.cdf.env'
Dear List,
I made a custom CDF by modifying the original Affymetrix miRNA v1 file. As there is a great deal of redundancy in this chip, I have condensed the original 7815 probe sets into 6190 probe sets (by 'moving' probes from one set to another). However, when I make and attach my new CDF environment I still seem to have 7815 probe sets, so presumably I have done something wrong.
I have read the vignette and many posts similar to mine, but still cannot work out what I am doing wrong. Perhaps the problem is with the CDF itself? I have a short script testing the functionality, the output of which I have copied in below. I will gladly attach the script, CDFs and an example CEL file if there is nothing obviously wrong with the code. I would do this now, but there doesn't appear to be an option on the web form.
Many thanks,
Scott
> folder <- "C:\\Work\\COPD-ASTHMA\\microRNA files\\newCDF\\test\\"
>
> setwd(paste0(folder,"CEL"))
> options(stringsAsFactors=FALSE)
> library(affy)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following object is masked from ‘package:stats’:
xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, as.data.frame, cbind, colnames, duplicated, eval,
Filter, Find, get, intersect, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
tapply, union, unique, unlist
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> library(makecdfenv)
Loading required package: affyio
>
> cleancdfname("newmir1.cdf")
[1] "newmir1.cdf"
> newmir1 = make.cdf.env("newmir1.cdf")
Reading CDF file.
Creating CDF environment
Wait for about 78 dots.......................................................................
> Data <- ReadAffy()
> Data@cdfName <- "newmir1"
>
> Data
AffyBatch object
size of arrays=230x230 features (17 kb)
cdf=newmir1 (7815 affyids)
number of samples=1
number of genes=7815
annotation=mirna102xgain
notes=
>
> dim(exprs(rma(Data)))
Background correcting
Normalizing
Calculating Expression
[1] 7815 1
-- output of sessionInfo():
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] makecdfenv_1.36.0 affyio_1.28.0 affy_1.38.1 Biobase_2.20.1
[5] BiocGenerics_0.6.0
loaded via a namespace (and not attached):
[1] BiocInstaller_1.10.4 preprocessCore_1.22.0 tools_3.0.2
[4] zlibbioc_1.6.0
--
Sent via the guest posting facility at bioconductor.org.
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099