[Bioc-devel] seqlevelsStyle() warning-message by message-warning, and NCBI/Ensembl seq styles

Robert Castelo robert.castelo at upf.edu
Tue Jan 26 09:32:07 CET 2016


hi Hervé,

that's great, thanks for implementing quickly these bits.

robert.

On 01/26/2016 01:19 AM, Hervé Pagès wrote:
> Hi Robert,
>
> I've made the following changes to the seqlevelsStyle() getter and
> setter:
> 1) No more warning when the getter returns more than 1 compatible
> style.
> 2) When more than 1 style is supplied, the setter uses the 1st one
> only with a warning.
>
> That should address the issues you reported below. I'm sure the behavior
> of the seqlevelsStyle() setter could be refined when more than 1 style
> is supplied but the new behavior should get us going for now.
>
> These changes are in GenomeInfoDb 1.6.3 (release) and 1.7.5 (devel).
>
> Note that the Ensembl mappings were contributed by Jo last week (thanks
> Jo) and they're indeed the same as the NCBI mappings for Human but they
> differ for all the other organisms supported by GenomeInfoDb at the
> moment. Anyway, generally speaking, it sounds like the user should be
> able to do seqlevelsStyle(x) <- "Ensembl" independently of whether this
> will result in seqlevels that are the same as if s/he had done
> seqlevelsStyle(x) <- "NCBI".
>
> Cheers,
> H.
>
> On 01/25/2016 02:39 AM, Robert Castelo wrote:
>> hi,
>>
>> i would like to ask if current line #142 of
>> GenomeInfoDb/R/seqlevelsStyle.R:
>>
>> message("warning! Multiple seqlevels styles found.")
>>
>> could be replaced by
>>
>> warning("Multiple seqlevels styles found.")
>>
>> since, after all, the message is a warning.
>>
>> the reason for my request is that a recent update on GenomeInfoDb added
>> the 'Ensembl' sequence style and this executes the previous line when i
>> try to figure out the sequence style of a BAM file produced from human
>> sequence data and GATK. for instance:
>>
>> path2bam <- file.path(system.file("extdata",
>> package="VariantFiltering"), "NA12878.subset.bam")
>>
>> hdr <- scanBamHeader(path2bam)
>> names(hdr[[1]]$targets)
>> [1] "1" "2" "3" "4" "5"
>> [6] "6" "7" "8" "9" "10"
>> [11] "11" "12" "13" "14" "15"
>> [16] "16" "17" "18" "19" "20"
>> [21] "21" "22" "X" "Y" "MT"
>> [26] "GL000207.1" "GL000226.1" "GL000229.1" "GL000231.1" "GL000210.1"
>> [31] "GL000239.1" "GL000235.1" "GL000201.1" "GL000247.1" "GL000245.1"
>> [36] "GL000197.1" "GL000203.1" "GL000246.1" "GL000249.1" "GL000196.1"
>> [41] "GL000248.1" "GL000244.1" "GL000238.1" "GL000202.1" "GL000234.1"
>> [46] "GL000232.1" "GL000206.1" "GL000240.1" "GL000236.1" "GL000241.1"
>> [51] "GL000243.1" "GL000242.1" "GL000230.1" "GL000237.1" "GL000233.1"
>> [56] "GL000204.1" "GL000198.1" "GL000208.1" "GL000191.1" "GL000227.1"
>> [61] "GL000228.1" "GL000214.1" "GL000221.1" "GL000209.1" "GL000218.1"
>> [66] "GL000220.1" "GL000213.1" "GL000211.1" "GL000199.1" "GL000217.1"
>> [71] "GL000216.1" "GL000215.1" "GL000205.1" "GL000219.1" "GL000224.1"
>> [76] "GL000223.1" "GL000195.1" "GL000212.1" "GL000222.1" "GL000200.1"
>> [81] "GL000193.1" "GL000194.1" "GL000225.1" "GL000192.1" "NC_007605"
>> [86] "hs37d5"
>>
>> seqlevelsStyle(names(hdr[[1]]$targets))
>> warning! Multiple seqlevels styles found.
>> [1] "NCBI" "Ensembl"
>>
>> this is all fine in interactive mode so that the user is aware that
>> he/she may have to take action on this fact. however, at a specific
>> place of my package VariantFiltering i'd like to take action without
>> telling anything to the user. for that purpose i would like to use
>> suppressWarnings() but does not work currently with the message() call.
>> i could use suppressMessages() but would not prefer to.
>>
>> on a side note, both NCBI and Ensembl sequence style seem to be
>> identical:
>>
>> genomeStyles()$Homo_sapiens
>> circular auto sex NCBI UCSC dbSNP Ensembl
>> 1 FALSE TRUE FALSE 1 chr1 ch1 1
>> 2 FALSE TRUE FALSE 2 chr2 ch2 2
>> 3 FALSE TRUE FALSE 3 chr3 ch3 3
>> 4 FALSE TRUE FALSE 4 chr4 ch4 4
>> 5 FALSE TRUE FALSE 5 chr5 ch5 5
>> 6 FALSE TRUE FALSE 6 chr6 ch6 6
>> 7 FALSE TRUE FALSE 7 chr7 ch7 7
>> 8 FALSE TRUE FALSE 8 chr8 ch8 8
>> 9 FALSE TRUE FALSE 9 chr9 ch9 9
>> 10 FALSE TRUE FALSE 10 chr10 ch10 10
>> 11 FALSE TRUE FALSE 11 chr11 ch11 11
>> 12 FALSE TRUE FALSE 12 chr12 ch12 12
>> 13 FALSE TRUE FALSE 13 chr13 ch13 13
>> 14 FALSE TRUE FALSE 14 chr14 ch14 14
>> 15 FALSE TRUE FALSE 15 chr15 ch15 15
>> 16 FALSE TRUE FALSE 16 chr16 ch16 16
>> 17 FALSE TRUE FALSE 17 chr17 ch17 17
>> 18 FALSE TRUE FALSE 18 chr18 ch18 18
>> 19 FALSE TRUE FALSE 19 chr19 ch19 19
>> 20 FALSE TRUE FALSE 20 chr20 ch20 20
>> 21 FALSE TRUE FALSE 21 chr21 ch21 21
>> 22 FALSE TRUE FALSE 22 chr22 ch22 22
>> 23 FALSE FALSE TRUE X chrX chX X
>> 24 FALSE FALSE TRUE Y chrY chY Y
>> 25 TRUE FALSE FALSE MT chrM chMT MT
>>
>> this leads to the previous use case with a BAM file, and also to this
>> other one:
>>
>> library(BSgenome.Hsapiens.NCBI.GRCh38)
>>
>> seqlevelsStyle(BSgenome.Hsapiens.NCBI.GRCh38)
>> warning! Multiple seqlevels styles found.
>> [1] "NCBI" "Ensembl"
>>
>> note that now if you have a UCSC-style GRanges object like this one:
>>
>> gr <- GRanges(c("chr1", "chr2"), IRanges(c(10, 20), c(30, 40)))
>> seqlevelsStyle(gr)
>> [1] "UCSC"
>>
>> that you want to use with the BSgenome object, the following simple
>> operation will not work anymore:
>>
>> seqlevelsStyle(gr) <- seqlevelsStyle(BSgenome.Hsapiens.NCBI.GRCh38)
>> warning! Multiple seqlevels styles found.
>> Error in mapSeqlevels(x_seqlevels, value, drop = FALSE) :
>> the supplied seqname style must be a single string
>>
>> of course, this has an easy fix:
>>
>> seqlevelsStyle(gr) <- seqlevelsStyle(BSgenome.Hsapiens.NCBI.GRCh38)[1]
>> warning! Multiple seqlevels styles found.
>>
>> but in my view, we loose simplicity in the grammar of moving back and
>> forth between sequence styles. one workaround to keep the former simpler
>> syntax would be that 'seqlevelsStyle()<-' takes the first value and
>> gives a warning instead of prompting an error.
>>
>> in any case, just out of curiosity, what is the reason to have two
>> identical sequence styles under different names?
>>
>>
>> thanks!
>>
>> robert.
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550



More information about the Bioc-devel mailing list