[BioC] Reading GFF using Starr
Wolfgang Huber
whuber at embl.de
Fri Mar 4 23:22:38 CET 2011
Dear Feseha
I would suggest omitting the 'feature' argument in your call to
'read.gffAnno' and then select those rows that you care about yourself.
The 'Starr' maintainer might be able to provide more details in the
function's manual page, or to allow 'feature' to be a vector or a
regular expression.
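A minimal sketch of "select the rows yourself", using only base R rather than Starr. It reads a GFF file as a tab-delimited table and keeps rows whose feature (the 3rd GFF column) matches either a vector of types or a regular expression. The tiny GFF written here is made-up example data, and read_gff_table is a hypothetical helper that is not part of Starr. I have not checked which columns read.gffAnno returns, so the same subsetting may need adapting to its output.

```r
# Hypothetical helper (not part of Starr): read a GFF file into a data frame
# with the nine standard GFF column names.
read_gff_table <- function(path) {
  gff <- read.delim(path, header = FALSE, comment.char = "#",
                    quote = "", stringsAsFactors = FALSE)
  names(gff) <- c("seqid", "source", "feature", "start", "end",
                  "score", "strand", "frame", "attributes")
  gff
}

# Made-up example data with a few NCBI-style feature types.
gff_file <- tempfile(fileext = ".gff")
writeLines(c(
  "##gff-version 3",
  "chrI\tRefSeq\tgene\t100\t900\t.\t+\t.\tID=gene1",
  "chrI\tRefSeq\tmRNA\t100\t900\t.\t+\t.\tID=rna1",
  "chrI\tRefSeq\tCDS\t150\t850\t.\t+\t0\tID=cds1",
  "chrI\tRefSeq\ttRNA\t1200\t1272\t.\t-\t.\tID=rna2"
), gff_file)

gff <- read_gff_table(gff_file)  # the "##" header line is skipped as a comment

# Select rows by a vector of feature types ...
transcripts <- gff[gff$feature %in% c("mRNA", "tRNA"), ]

# ... or by a regular expression (here: any feature type ending in "RNA").
rnas <- gff[grepl("RNA$", gff$feature), ]
```

Using %in% covers the "feature as a vector" case, and grepl() covers the regular-expression case.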
Best wishes
Wolfgang
On Mar/4/11 9:44 PM, Feseha Abebe-Akele wrote:
> Dear Wolfgang,
>
> "cat" indeed helped with reading the GFF. However, I am still unclear about the
> feature="transcript" parameter. In the example that shipped with the package,
> all entries are "transcript". In the GFF I downloaded from NCBI, the same
> column is populated by things like CDS, gene, tRNA, etc. Am I supposed to
> convert entries such as CDS, gene, mRNA, tRNA, snRNA ... which appear in the
> 3rd (feature) column of the GFF into a generic "transcript" entry, or would
> Starr take them in as is with the feature="transcript" parameter and use them?
>
> Thanks a lot.
>
> Feseha
>
>
>
> * Wolfgang Huber <whuber at embl.de> [Fri 04 Mar 2011 01:30:18 PM EST]:
>
>> Dear Feseha
>>
>> I am not sure whether this will solve your question, but have you tried
>>
>> cat chrI.gff chrII.gff chrIII.gff chrIV.gff chrV.gff chrX.gff > all.gff
>>
>> (on the OS command line) and then
>>
>> transcriptAnno = read.gffAnno("all.gff", feature="transcript")
>>
>> (in R). Alternatively, if you are so unfortunate as to work with an
>> operating system that does not have 'cat', you could also, e.g., use R's
>> readLines and writeLines.
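A rough sketch of the readLines/writeLines route, in base R only. The per-chromosome files written here are tiny placeholders standing in for the real chrI.gff ... chrX.gff, and dataPath is set to a temporary directory purely for illustration; in practice it would point at the directory holding the downloaded files.

```r
# Placeholder location; replace with the directory holding the real GFF files.
dataPath <- tempdir()
chroms <- c("I", "II", "III", "IV", "V", "X")
gffs <- file.path(dataPath, paste0("chr", chroms, ".gff"))

# Placeholder content: one GFF line per chromosome, for illustration only.
for (i in seq_along(chroms)) {
  writeLines(paste0("chr", chroms[i], "\tRefSeq\tgene\t1\t100\t.\t+\t.\tID=g", i),
             gffs[i])
}

# Equivalent of: cat chrI.gff chrII.gff ... chrX.gff > all.gff
allGff <- file.path(dataPath, "all.gff")
writeLines(unlist(lapply(gffs, readLines)), allGff)
```

The resulting all.gff is the single file that the read.gffAnno call above would then read. If each per-chromosome file starts with its own "##gff-version" header, those comment lines end up in the middle of all.gff, exactly as they would with cat.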
>>
>> Best wishes
>> Wolfgang
>>
>>
>>
>> On Mar/2/11 3:48 AM, Feseha Abebe-Akele wrote:
>>> Hello everyone,
>>> I am trying to analyze tiling array data using the Starr package,
>>> and I am stuck at reading the GFF files for the 7 genomic sequences
>>> of C. elegans. The example that comes with the vignette uses a
>>> single, minimal GFF file (20 lines?), which is nowhere near the
>>> 56 MB of (combined) GFF files I have.
>>>
>>> My question is: how do I read in multiple GFF files for analysis?
>>> Among other things, I have tried reading them like this:
>>>
>>> gffs <- c(file.path(dataPath,"chrI.gff"),
>>> file.path(dataPath,"chrII.gff"), file.path(dataPath,"chrIII.gff"),
>>> file.path(dataPath,"chrIV.gff"), file.path(dataPath,"chrV.gff"),
>>> file.path(dataPath,"chrX.gff"))
>>>
>>> transcriptAnno <- read.gffAnno(gffs, feature="transcript")
>>>
>>> But none worked for me.
>>>
>>> I would appreciate any help in getting my analysis to the next level:
>>>
>>> FYI:
>>> I am trying to analyze differential expression between TEST and
>>> CONTROL on the C. elegans Tiling Array 1.0 chips.
>>>
>>> Thanks
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>>
>>
>> Wolfgang Huber
>> EMBL
>> http://www.embl.de/research/units/genome_biology/huber
>>
>>
>
>
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber