[Bioc-devel] Non-ASCII in datase from Biomart EMBL via Gviz package

Martin, Tiphaine tiphaine.martin at kcl.ac.uk
Mon Oct 13 15:31:34 CEST 2014


Both methods work well.
Thanks,
Tiphaine

________________________________________
From: Hahne, Florian <florian.hahne at novartis.com>
Sent: 13 October 2014 08:46
To: Vincent Carey; Martin, Tiphaine
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] Non-ASCII in datase from Biomart EMBL via Gviz package

Hi Tiphaine,
You can follow Vince's advice and transform all the data into proper ASCII
characters. Or you can just get rid of the culprit (the @biomart slot
of the object) before serialising. The easiest way to do that is:
foo@biomart <- NULL
The slot is only present to cache the BiomaRt connection, which is lost
anyways when serialising. The object is smart enough to realise that and
just reconnects the next time it is plotted. That is how I handled things
for the serialised BiomartGeneRegionTracks in Gviz.
Florian
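
The pattern described above (drop a cached connection before serialising,
then rebuild it on first use) can be sketched with a toy S4 class. This is
only an illustration under stated assumptions: ToyConnection, ToyTrack and
ensureConnection are hypothetical names, not Gviz or biomaRt classes or
functions, and the real reconnection logic in Gviz may differ.

```r
library(methods)

# Hypothetical stand-in for a live BiomaRt connection (not a real biomaRt class).
setClass("ToyConnection", representation(label = "character"))
setClassUnion("ToyConnectionOrNULL", c("ToyConnection", "NULL"))

# Hypothetical track class with a cache slot, mimicking the @biomart slot.
setClass("ToyTrack",
         representation(name = "character", biomart = "ToyConnectionOrNULL"))

# Re-create the cached connection only when it is missing.
ensureConnection <- function(track) {
  if (is.null(track@biomart)) {
    track@biomart <- new("ToyConnection", label = "reconnected")
  }
  track
}

foo <- new("ToyTrack", name = "ENSEMBL",
           biomart = new("ToyConnection", label = "M\u00fcller cell"))

# Drop the cache (and the non-ASCII string it carries) before serialising.
foo@biomart <- NULL

path <- tempfile(fileext = ".rda")
save(foo, file = path)

# Later: load the object and let it reconnect on first use.
env <- new.env()
load(path, envir = env)
restored <- ensureConnection(env$foo)
```

The class union with "NULL" is what allows the slot to be emptied without
breaking the validity of the object.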



On 12/10/14 20:35, "Vincent Carey" <stvjc at channing.harvard.edu> wrote:

>I don't know exactly how you are triggering this warning.  If you have the
>ability to prefilter your content before serializing, that may be best.
>The following is from the gwascat package.  You have very little chance,
>I believe, of getting an institutional guarantee that only ASCII will go
>into their emissions.
>
>fixNonASCII = function(df) {
>  # TRUE if any element of x fails to convert to ASCII or changes on conversion
>  hasNonASCII = function(x) {
>    asc = iconv(x, "latin1", "ASCII")
>    any(asc != x | is.na(asc))
>  }
>  havebad = sapply(df, hasNonASCII)
>  if (!any(havebad)) return(df)
>  message("NOTE: input data had non-ASCII characters replaced by '*'.")
>  # Replace the offending bytes in every affected column
>  for (i in which(havebad))
>    df[, i] = iconv(df[, i], to = "ASCII", sub = "*")
>  df
>}
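
A minimal sketch of the same prefiltering idea on a toy data frame. It is a
variant, not the gwascat code: replace_non_ascii and is_ascii are
hypothetical helper names, only character columns are touched, and the
input is assumed to be UTF-8 encoded (as the 'M<c3><bc>ller' bytes in the
warning suggest).

```r
# Replace non-ASCII bytes in the character columns of a data frame.
# Assumption: the strings are UTF-8 encoded.
replace_non_ascii <- function(df, sub = "*") {
  is_chr <- vapply(df, is.character, logical(1))
  for (col in names(df)[is_chr]) {
    df[[col]] <- iconv(df[[col]], from = "UTF-8", to = "ASCII", sub = sub)
  }
  df
}

# TRUE when every character of every string is plain ASCII.
is_ascii <- function(x) {
  all(vapply(x, function(s) all(utf8ToInt(s) < 128L), logical(1)))
}

cells <- data.frame(
  id = 1:3,
  cell_type = c("Schwann cell", "M\u00fcller cell", "Purkinje cell"),
  stringsAsFactors = FALSE
)

cleaned <- replace_non_ascii(cells)

is_ascii(cells$cell_type)    # the raw column still contains a non-ASCII character
is_ascii(cleaned$cell_type)  # the cleaned column does not
```

Restricting the loop to character columns leaves numeric columns untouched,
at the cost of skipping factor levels, which would need separate handling.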
>
>
>
>On Sun, Oct 12, 2014 at 2:14 PM, Martin, Tiphaine <tiphaine.martin at kcl.ac.uk> wrote:
>
>> Hi,
>>
>>
>> I need to create a BiomartGeneRegionTrack dataset with the Gviz package
>> to run the examples in my packages. But when I run
>>
>> "R CMD check coMET", I get the following warning from the data check:
>>
>>
>>  checking data for non-ASCII characters ... WARNING
>>   Warning: found non-ASCII strings
>>   '[alpha cell,acidophil cell,acinar cell,adipoblast,adipocyte,amacrine
>> cell,beta cell,capsular cell,cementocyte,chief
>> cell,chondroblast,chondrocyte,chromaffin cell,chromophobic
>> cell,corticotroph,delta cell,dendritic cell,enterochromaffin
>> cell,ependymocyte,epithelium,erythroblast,erythrocyte,fibroblast,fibrocyte,follicular
>> cell,germ cell,germinal epithelium,giant cell,glial
>> cell,glioblast,goblet cell,gonadotroph,granulosa cell,haemocytoblast,hair
>> cell,hepatoblast,hepatocyte,hyalocyte,interstitial cell,juxtaglomerular
>> cell,keratinocyte,keratocyte,lemmal cell,leukocyte,luteal
>> cell,lymphocytic stem cell,lymphoid cell,lymphoid stem cell,macroglial
>> cell,mammotroph,mast
>> cell,medulloblast,megakaryoblast,megakaryocyte,melanoblast,melanocyte,mesangial
>> cell,mesothelium,metamyelocyte,monoblast,monocyte,mucous neck
>> cell,muscle cell,myelocyte,myeloid cell,myeloid stem
>> cell,myoblast,myoepithelial
>> cell,myofibrobast,neuroblast,neuroepithelium,neuron,odontoblast,osteoblast,osteoclast,osteocyte,oxyntic
>> cell,parafollicular cell,paraluteal cell,peptic
>> cell,pericyte,phaeochromocyte,phalangeal cell,pinealocyte,pituicyte,plasma
>> cell,platelet,podocyte,proerythroblast,promonocyte,promyeloblast,promyelocyte,pronormoblast,reticulocyte,retinal
>> pigment epithelium,retinoblast,somatotroph,stem cell,sustentacular
>> cell,teloglial cell,zymogenic cell,small cell,Th1,Cell Type,M<c3><bc>ller
>> cell,primary oocyte,Claudius' cell,Th2,follicular dendritic
>> cell,astrocyte,white,T-lymphoblast,basal cell,T-lymphocyte,helper induced
>> T-lymphocyte:Th2,B-lymphocyte,neutrophil,oocyte,unclassifiable (Cell
>> Type),natural killer cell,helper induced T-lymphocyte,brown,CD4+,Hensen
>> cell,lymphocyte,cardiac muscle cell,lymphoblast,Paneth cell,alveolar
>> macrophage,macrophage,squamous cell,oligodendrocyte,smooth muscle
>> cell,gamete,spermatid,Schwann cell,CD34+,spermatocyte,helper induced
>> T-lymphocyte:Th1,astroblast,eosinophil,oligodendroblast,basophil,peripheral
>> blood mononuclear cell,histiocyte,Sertoli cell,endothelium,granulocyte,spermatozoon,Merkel
>> cell,skeletal muscle cell,thymocyte,foam cell,ovum,secondary spermatocyte,Langerhans
>> cell,primary spermatocyte,transitional,Purkinje cell,Kupffer cell,secondary
>> oocyte,B-lymphoblast]' in object 'biomTrack'
>>
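
To locate strings like the one flagged in the warning before saving a data
object, a byte-level test on character vectors is usually enough. The
sketch below is an assumption-laden illustration: has_non_ascii is a
hypothetical helper, and cell_types is a made-up three-element sample
rather than the real Biomart filter values.

```r
# TRUE for each string that contains at least one byte outside the ASCII range.
has_non_ascii <- function(x) {
  vapply(x, function(s) any(as.integer(charToRaw(s)) > 127L),
         logical(1), USE.NAMES = FALSE)
}

cell_types <- c("alpha cell", "M\u00fcller cell", "Claudius' cell")

# Positions of the offending elements.
which(has_non_ascii(cell_types))

# tools::showNonASCII() reports the offending elements as a message.
tools::showNonASCII(cell_types)
```

Working on raw bytes avoids any dependence on the current locale or on how
the strings are marked by Encoding().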
>>
>> library(Gviz)
>>
>> chrom <- "chr2"
>> start <- 38290160
>> end <- 38303219
>> gen <- "hg19"
>>
>> # showId is assumed to be a logical defined earlier in the session
>> biomTrack <- BiomartGeneRegionTrack(genome = gen,
>>                                     chromosome = chrom, start = start,
>>                                     end = end, name = "ENSEMBL",
>>                                     fontcolor = "black",
>>                                     groupAnnotation = "group",
>>                                     just.group = "above", showId = showId)
>>
>>
>> Do you have any idea how to fix this warning? I think we need to raise
>> it with EMBL so that it gets corrected, don't we?
>>
>>
>> Tiphaine
>>
>>
>> ----------------------------
>> Tiphaine Martin
>> PhD Research Student | King's College
>> The Department of Twin Research & Genetic Epidemiology | Genetics &
>> Molecular Medicine Division
>> St Thomas' Hospital
>> 4th Floor, Block D, South Wing
>> SE1 7EH, London
>> United Kingdom
>>
>> email : tiphaine.martin at kcl.ac.uk
>> Fax: +44 (0) 207 188 6761
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
