[BioC] Using custom CDF with 'make.cdf.env'
James W. MacDonald
jmacdon at uw.edu
Wed Aug 27 17:51:53 CEST 2014
Hi Scott,
I can see some of what you have done. For example, you moved some entries
around and changed the 'Cell' numbers:
C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st miRNA-1_0.CDF
129939:Name=bta-let-7a_st
129946:Cell1=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 0
129947:Cell2=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 1
129948:Cell3=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 2 11
129949:Cell4=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 3
C:\Users\BioinfAdmin\Desktop>grep -n bta-let-7a_st newmir1.cdf
43056:Cell5=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 4 11
43057:Cell6=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 5 11
43058:Cell7=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 6 11
43059:Cell8=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 7 11
This won't change anything. In both cases there is a probeset called
bta-let-7a_st that has four identical probes. Putting these data somewhere
else in the CDF won't change the way it is parsed.
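
To make that concrete, here is a minimal R sketch, assuming (as this reply suggests) that probes end up grouped by the probeset name in the QUAL column rather than by their 'Cell' number or their position in the file. The helper parse_cells() is hypothetical and written only for this illustration; it is not how makecdfenv reads a CDF internally. The cell lines are copied from the two grep outputs above.

```r
# Toy illustration only: parse_cells() is a hypothetical helper, not part of
# makecdfenv, and this is not the package's actual CDF parser.
parse_cells <- function(lines) {
  # Drop the "CellN=" prefix, then split the remaining fields on whitespace.
  fields <- strsplit(sub("^Cell[0-9]+=", "", lines), "[[:space:]]+")
  data.frame(
    x    = as.integer(sapply(fields, `[`, 1)),  # X coordinate
    y    = as.integer(sapply(fields, `[`, 2)),  # Y coordinate
    qual = sapply(fields, `[`, 5),              # probeset name (QUAL column)
    stringsAsFactors = FALSE
  )
}

# bta-let-7a_st cells as they appear in the original CDF ...
old_lines <- c(
  "Cell1=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 0",
  "Cell2=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 1",
  "Cell3=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 2 11",
  "Cell4=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 3"
)

# ... and after being moved and renumbered in the modified CDF.
new_lines <- c(
  "Cell5=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 4 11",
  "Cell6=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 5 11",
  "Cell7=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 6 11",
  "Cell8=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 7 11"
)

old <- parse_cells(old_lines)
new <- parse_cells(new_lines)

# Same probeset name, same four (x, y) locations in both files.
table(old$qual)
table(new$qual)
identical(old, new)
```

Under that assumption, the renumbered cells still describe the same four probes under the same probeset name, which is consistent with the probeset count staying at 7815.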
In other words, this:
C:\Users\BioinfAdmin\Desktop> sed -n '43050,43111p' newmir1.cdf
StopPosition=59
CellHeader=X Y PROBE FEAT QUAL EXPOS POS CBASE PBASE TBA
Cell1=2 190 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 0 11 G
Cell2=196 180 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 1 11
Cell3=211 187 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 2 11
Cell4=29 205 ACTCCATCATCCAACATATCAA control hsa-let-7a_st 3 11
Cell5=185 178 ACTCCATCATCCAACATATCAA control bta-let-7a_st 4 11
Cell6=197 180 ACTCCATCATCCAACATATCAA control bta-let-7a_st 5 11
Cell7=83 156 ACTCCATCATCCAACATATCAA control bta-let-7a_st 6 11
Cell8=210 187 ACTCCATCATCCAACATATCAA control bta-let-7a_st 7 11
Cell9=2 189 ACTCCATCATCCAACATATCAA control cbr-let-7_st 8 11 G
Cell10=178 178 ACTCCATCATCCAACATATCAA control cbr-let-7_st 9 11
Cell11=212 189 ACTCCATCATCCAACATATCAA control cbr-let-7_st 10 11
Cell12=189 181 ACTCCATCATCCAACATATCAA control cbr-let-7_st 11 11
Cell13=179 178 ACTCCATCATCCAACATATCAA control cel-let-7_st 12 11
Cell14=80 157 ACTCCATCATCCAACATATCAA control cel-let-7_st 13 11
Cell15=215 191 ACTCCATCATCCAACATATCAA control cel-let-7_st 14 11
Cell16=190 181 ACTCCATCATCCAACATATCAA control cel-let-7_st 15 11
Cell17=79 157 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 16 11
Cell18=213 189 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 17 11
Cell19=182 179 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 18 11
Cell20=196 181 ACTCCATCATCCAACATATCAA control cfa-let-7a_st 19 11
Cell21=205 184 ACTCCATCATCCAACATATCAA control dre-let-7a_st 20 11
Cell22=188 181 ACTCCATCATCCAACATATCAA control dre-let-7a_st 21 11
Cell23=216 191 ACTCCATCATCCAACATATCAA control dre-let-7a_st 22 11
Cell24=83 157 ACTCCATCATCCAACATATCAA control dre-let-7a_st 23 11
Cell25=77 157 ACTCCATCATCCAACATATCAA control fru-let-7a_st 24 11
Cell26=212 188 ACTCCATCATCCAACATATCAA control fru-let-7a_st 25 11
Cell27=193 181 ACTCCATCATCCAACATATCAA control fru-let-7a_st 26 11
Cell28=182 180 ACTCCATCATCCAACATATCAA control fru-let-7a_st 27 11
Cell29=188 180 ACTCCATCATCCAACATATCAA control gga-let-7a_st 28 11
Cell30=211 189 ACTCCATCATCCAACATATCAA control gga-let-7a_st 29 11
Cell31=78 157 ACTCCATCATCCAACATATCAA control gga-let-7a_st 30 11
Cell32=199 180 ACTCCATCATCCAACATATCAA control gga-let-7a_st 31 11
Cell33=214 188 ACTCCATCATCCAACATATCAA control gga-let-7j_st 32 11
Cell34=191 181 ACTCCATCATCCAACATATCAA control gga-let-7j_st 33 11
Cell35=180 177 ACTCCATCATCCAACATATCAA control gga-let-7j_st 34 11
Cell36=203 180 ACTCCATCATCCAACATATCAA control gga-let-7j_st 35 11
Cell37=211 188 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 36 11
Cell38=184 179 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 37 11
Cell39=195 181 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 38 11
Cell40=82 157 ACTCCATCATCCAACATATCAA control mdo-let-7a_st 39 11
Cell41=179 177 ACTCCATCATCCAACATATCAA control mml-let-7a_st 40 11
Cell42=190 182 ACTCCATCATCCAACATATCAA control mml-let-7a_st 41 11
Cell43=214 191 ACTCCATCATCCAACATATCAA control mml-let-7a_st 42 11
Cell44=202 180 ACTCCATCATCCAACATATCAA control mml-let-7a_st 43 11
Cell45=183 179 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 44 11
Cell46=84 157 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 45 11
Cell47=194 181 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 46 11
Cell48=212 187 ACTCCATCATCCAACATATCAA control mmu-let-7a_st 47 11
Cell49=76 157 ACTCCATCATCCAACATATCAA control rno-let-7a_st 48 11
Cell50=192 181 ACTCCATCATCCAACATATCAA control rno-let-7a_st 49 11
Cell51=181 177 ACTCCATCATCCAACATATCAA control rno-let-7a_st 50 11
Cell52=212 191 ACTCCATCATCCAACATATCAA control rno-let-7a_st 51 11
Cell53=187 181 ACTCCATCATCCAACATATCAA control tni-let-7a_st 52 11
Cell54=128 77 ACTCCATCATCCAACATATCAA control tni-let-7a_st 53 11
Cell55=81 157 ACTCCATCATCCAACATATCAA control tni-let-7a_st 54 11
Cell56=213 191 ACTCCATCATCCAACATATCAA control tni-let-7a_st 55 11
Cell57=214 189 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 56 11
Cell58=185 179 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 57 11
Cell59=22 202 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 58 11
Cell60=197 181 ACTCCATCATCCAACATATCAA control xtr-let-7a_st 59 11
will not create a single probeset for let-7a across all species. And trying
to combine 60 identical probes into a single probeset is about as useless
as having 15 individual probesets made up of four identical probes each. You
are still running RMA (or whatever) on essentially the same information, with
the only differences between probes being entirely due to technical
variability. These arrays are, within the constraints of Affy's system,
about as good as you can do. Which is to say, not very good.
If you really want to collapse these into one probeset, then you also have to
make the probeset IDs identical within each block. So here you would have to
strip off the prepended species abbreviation, and convert the gga-let-7j
probes (and likewise the cbr-let-7 and cel-let-7 probes) to let-7a_st, and
then you would have just one probeset. But that will be a lot of work for
what I imagine will be very little gain.
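
A rough sketch of the renaming step in R, under stated assumptions: the IDs below are a handful of the probeset names from the block above, and the regular expression assumes every ID starts with a three- or four-letter species abbreviation followed by a hyphen (an unprefixed ID such as let-7a_st would be mangled by it, so check first). Note that cbr-let-7_st and cel-let-7_st strip down to let-7_st, so they would need the same treatment as gga-let-7j to end up in the one probeset. This only shows the name mapping; the CDF file itself would still have to be rewritten accordingly.

```r
# A few probeset IDs taken from the let-7 block shown earlier.
ids <- c("hsa-let-7a_st", "bta-let-7a_st", "cbr-let-7_st",
         "cel-let-7_st", "gga-let-7a_st", "gga-let-7j_st")

# Strip the prepended species abbreviation
# (assumption: 3-4 lowercase letters followed by a hyphen).
stripped <- sub("^[a-z]{3,4}-", "", ids)

# Fold the remaining let-7 variants into let-7a_st.
stripped[stripped %in% c("let-7j_st", "let-7_st")] <- "let-7a_st"

# Old-to-new name mapping, and the number of distinct probesets left.
mapping <- data.frame(old = ids, new = stripped, stringsAsFactors = FALSE)
print(mapping)
length(unique(mapping$new))
```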
Best,
Jim
On Wed, Aug 27, 2014 at 11:19 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Scott,
>
> As far as I can tell, you haven't made any changes to the cdf at all:
>
> > z <- make.cdf.env("newmir1.cdf")
> Reading CDF file.
> Creating CDF environment
> Wait for about 78
> dots.........................................................................
> > z
> <environment: 0x00000000113d5c08>
> > length(ls(z))
> [1] 7815
> > zz <- as.list(z)
> > table(sapply(zz, nrow))
>
>    4    8    9   10   11   20   25   40   50   67   73   88   89   90   91   92   94
> 6703    8   14   32  959    9    1    1    2    1    1    1    2    1    1    1   78
> > y <- make.cdf.env("miRNA-1_0.CDF")
> Reading CDF file.
> Creating CDF environment
> Wait for about 78
> dots..........................................................................
> > yy <- as.list(y)
> > length(yy)
> [1] 7815
> > table(sapply(yy, nrow))
>
>    4    8    9   10   11   20   25   40   50   67   73   88   89   90   91   92   94
> 6703    8   14   32  959    9    1    1    2    1    1    1    2    1    1    1   78
> > all.equal(names(zz), names(yy))
> [1] TRUE
>
> Best,
>
> Jim
>
>
>
>
> On Wed, Aug 27, 2014 at 10:31 AM, Scott Robinson <
> Scott.Robinson at glasgow.ac.uk> wrote:
>
>> Dear All,
>>
>> Since it exceeds 1MB, here is a link to the old ("miRNA-1_0.CDF") and new
>> ("newmir1.cdf") CDFs, test script and example CEL file:
>>
>> http://www.files.com/set/53fdeb0aa2176
>>
>> Thanks,
>>
>> Scott
>> ________________________________________
>> From: Scott Robinson [guest] [guest at bioconductor.org]
>> Sent: 27 August 2014 13:11
>> To: bioconductor at r-project.org; Scott Robinson
>> Cc: makecdfenv Maintainer
>> Subject: Using custom CDF with 'make.cdf.env'
>>
>> Dear List,
>>
>> I made a custom CDF by modifying the original Affymetrix miRNA v1 file.
>> As there is a great level of redundancy in this chip I have condensed the
>> original 7815 probe sets into 6190 probe sets (by 'moving' probes from one
>> set to another), however when I try making and attaching my new CDF
>> environment I still seem to have 7815 probe sets so presumably I must have
>> done something wrong.
>>
>> I have read the vignette and many similar posts to mine however still
>> cannot work out what I am doing wrong. Perhaps the problem is with the CDF
>> itself? I have a short script testing the functionality, the output of
>> which I have copied in below. I will gladly attach the script, CDFs and
>> example CEL file if there is nothing obviously wrong with the code - would
>> do this now but there doesn't appear to be an option on the webform.
>>
>> Many thanks,
>>
>> Scott
>>
>>
>> > folder <- "C:\\Work\\COPD-ASTHMA\\microRNA files\\newCDF\\test\\"
>> >
>> > setwd(paste0(folder,"CEL"))
>> > options(stringsAsFactors=FALSE)
>> > library(affy)
>> Loading required package: BiocGenerics
>> Loading required package: parallel
>>
>> Attaching package: ‘BiocGenerics’
>>
>> The following objects are masked from ‘package:parallel’:
>>
>> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>> clusterExport, clusterMap, parApply, parCapply, parLapply,
>> parLapplyLB, parRapply, parSapply, parSapplyLB
>>
>> The following object is masked from ‘package:stats’:
>>
>> xtabs
>>
>> The following objects are masked from ‘package:base’:
>>
>> anyDuplicated, as.data.frame, cbind, colnames, duplicated, eval,
>> Filter, Find, get, intersect, lapply, Map, mapply, match, mget,
>> order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
>> rbind, Reduce, rep.int, rownames, sapply, setdiff, sort, table,
>> tapply, union, unique, unlist
>>
>> Loading required package: Biobase
>> Welcome to Bioconductor
>>
>> Vignettes contain introductory material; view with
>> 'browseVignettes()'. To cite Bioconductor, see
>> 'citation("Biobase")', and for packages 'citation("pkgname")'.
>>
>> > library(makecdfenv)
>> Loading required package: affyio
>> >
>> > cleancdfname("newmir1.cdf")
>> [1] "newmir1.cdf"
>> > newmir1 = make.cdf.env("newmir1.cdf")
>> Reading CDF file.
>> Creating CDF environment
>> Wait for about 78
>> dots.......................................................................
>> > Data <- ReadAffy()
>> > Data@cdfName <- "newmir1"
>> >
>> > Data
>> AffyBatch object
>> size of arrays=230x230 features (17 kb)
>> cdf=newmir1 (7815 affyids)
>> number of samples=1
>> number of genes=7815
>> annotation=mirna102xgain
>> notes=
>> >
>> > dim(exprs(rma(Data)))
>> Background correcting
>> Normalizing
>> Calculating Expression
>> [1] 7815 1
>>
>>
>> -- output of sessionInfo():
>>
>> > sessionInfo()
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United Kingdom.1252
>> [2] LC_CTYPE=English_United Kingdom.1252
>> [3] LC_MONETARY=English_United Kingdom.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United Kingdom.1252
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] makecdfenv_1.36.0 affyio_1.28.0 affy_1.38.1 Biobase_2.20.1
>> [5] BiocGenerics_0.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] BiocInstaller_1.10.4 preprocessCore_1.22.0 tools_3.0.2
>> [4] zlibbioc_1.6.0
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099