[BioC] Detection copy number variations with cn.MOPS package

Günter Klambauer klambauer at bioinf.jku.at
Mon Dec 10 10:22:35 CET 2012


Hello Laura,

     For the "CNV regions"  I join the individual CNV regions, when they 
cluster at some location. Therefore the CNV region can be longer
than the individual CNVs. To obtain the copy number for this region I 
summarize the counts from the initial segments. Therefore it
may be that the posterior probability of CN2 is the largest for all 
samples. The posterior probabilities for other copy numbers should also 
be high.
I suggest you look in the slot "cnvs", "segmentation" or 
"integerCopyNumber" -- there you should find copy numbers different from 2.
Another possibility is to lower the parameter "priorImpact". This 
parameter enforces the algorithm to explain the data by copy number 2. 
Lowering it
leads to more detections. It should be adjusted to each data set.

I hope this helps,
Regards,
Günter

On 12/07/2012 04:35 PM, lpascual wrote:
> Dear group,
>
> I'm trying to detect copy number variations with the cn.MOPS package. I
> have eight different samples (coming from different individuals
> re-sequenced by Solexa). Sequences have been mapped against the
> reference genome with  BWA, the genome coverage of my samples ranges
> from 13x to 6x. I have run the cn.mops using the package default parameters.
> However, when I run the algorithm it detects some CNV regions for which
> the copy number is 2 for all the individuals. Does anyone have an
> explanation for this result?
> I paste you here the code I have used and one example of a detected CNV
> where the copy number is 2.
>
> Thanks in advance
>
> Laura
>
>   > sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>    [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
>    [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
>    [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
>    [7] LC_PAPER=C                 LC_NAME=C
>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Rsamtools_1.8.6      Biostrings_2.24.1    cn.mops_1.2.6
> [4] GenomicRanges_1.8.12 IRanges_1.14.4       Biobase_2.16.0
> [7] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-4.1 stats4_2.15.1  zlibbioc_1.2.0
>
>   > Countsreads_SL2.40ch02<-getReadCountsFromBAM(BAMFiles,
> c("Cervil","Criollo_new","Ferum_new","LA0147","LA1420","Levovil","Plovdiv","Stupike"),refSeqName=c("SL2.40ch02"),mode="unpaired")
>
>   > resCNMOPS_SL2.40ch02<-cn.mops(Countsreads_SL2.40ch02)
>
>   > head(params(resCNMOPS_SL2.40ch02),n=11L)
> $method
> [1] "cn.mops"
> $folds
> [1] 0.025 0.500 1.000 1.500 2.000 2.500 3.000 3.500 4.000
> $classes
> [1] "CN0" "CN1" "CN2" "CN3" "CN4" "CN5" "CN6" "CN7" "CN8"
> $priorimpact
> [1] 1
> $cyc
> [1] 20
> $normType
> [1] "poisson"
> $normQu
> [1] 0.25
> $upperThreshold
> [1] 0.5
> $lowerThreshold
> [1] -0.9
> $minWidth
> [1] 3
> $SegmentationParams
> character(0)
>
>   > *Countsreads*_SL2.40ch02[1512]
> GRanges with 1 range and 8 elementMetadata cols:
>           seqnames               ranges strand |   Levovil Criollo_new
> Ferum_new Stupike   Plovdiv    LA1420    Cervil    LA0147
> <Rle> <IRanges> <Rle> | <integer> <integer> <integer> <integer>
> <integer> <integer> <integer> <integer>
>     [1] SL2.40ch02 [*43819001, 43848000*]      * | *112         403
> 164 **1306      1192      1181       907       617*
>
>       seqlengths:
>      SL2.40ch02
>              NA
>
>   > *cnvs*(resCNMOPS_SL2.40ch02)[6]
> GRanges with 1 range and 4 elementMetadata cols:
>           seqnames               ranges strand |  sampleName
> median      mean *CN*
> <Rle> <IRanges> <Rle> | <factor> <numeric> <numeric> *<character>*
>     [1] SL2.40ch02 [*43819001, 43906000*]      * | Criollo_new
> 0.5849618   0.63563 *CN2*
>
>     seqlengths:
>      SL2.40ch02
>              NA
>
>   > *cnvr*(resCNMOPS_SL2.40ch02)[7]
> GRanges with 1 range and 8 elementMetadata cols:
>           seqnames               ranges strand | CN.Levovil
> CN.Criollo_new    CN.Ferum_new CN.Stupike CN.Plovdiv CN.LA1420 CN.Cervil
> CN.LA0147
> <Rle> <IRanges> <Rle> | <factor> <factor> <factor> <factor> <factor>
> <factor> <factor> <factor>
>     [1] SL2.40ch02 [*43819001, 43906000*]      * | *CN2            CN2
> CN2        CN2        CN2       CN2       CN2       CN2*
>
>     seqlengths:
>      SL2.40ch02
>              NA
>
>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list