[Bioc-sig-seq] BED to WIG format conversion

Ivan Gregoretti ivangreg at gmail.com
Fri Sep 18 15:24:24 CEST 2009


Hi Patrick and everybody,


> To everyone,
> What other data reduction operations would you like to have on bed file
> import?
>
>
> Patrick

BED functionality must-haves:

Well, a very common task is to load all chromosome BED records while
segregating them by strand. In ChIP-seq analysis, for example, an
accumulation of forward reads on the left and reverse reads on the
right is a good indicator of true peak presence.

So, we need to be given the choice of loading "+", "-", or
unspecified. The BED specification
http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED
says that a record without field number 6 (strand) is perfectly valid.
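
To make the strand choice concrete, here is a minimal base-R sketch.
The function name read_bed_by_strand and the "*" label for unspecified
strand are hypothetical, not part of any existing package. It assumes a
tab-delimited BED file with no track or browser lines and the same
number of fields on every line.

```r
# Hypothetical helper (not an existing package function): read a BED file
# and keep only the records on the requested strand(s).
read_bed_by_strand <- function(path, strand = c("+", "-", "*")) {
  bed <- read.table(path, sep = "\t", header = FALSE,
                    stringsAsFactors = FALSE, comment.char = "", quote = "")
  # Name the leading standard BED columns that are actually present.
  bed_names <- c("chrom", "start", "end", "name", "score", "strand")
  n <- min(ncol(bed), length(bed_names))
  names(bed)[seq_len(n)] <- bed_names[seq_len(n)]
  # A record without field 6 is valid BED: label it as unspecified ("*").
  if (!"strand" %in% names(bed)) bed$strand <- "*"
  bed$strand[!bed$strand %in% c("+", "-")] <- "*"
  bed[bed$strand %in% strand, , drop = FALSE]
}

# Usage with a small temporary file
tmp <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t136\tr1\t0\t+",
             "chr1\t300\t336\tr2\t0\t-",
             "chr2\t50\t86\tr3\t0\t+"), tmp)
forwardReads <- read_bed_by_strand(tmp, strand = "+")
reverseReads <- read_bed_by_strand(tmp, strand = "-")
```

Here strand = "*" would select the records that carry no usable strand
field, and the default returns everything.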

Now, regarding the WIG block counting, the user should be able to
specify the shiftSize. What's shiftSize? Well, each read is only the
end of a DNA fragment that is typically 120 to 200 bases long. So, the
inferred centre of the fragment should be the read's start position
plus 60 to 100 bases. If the read matches the reverse strand, then the
inferred centre of the fragment should be its 'end' minus 60 to 100
bases. That offset is the shiftSize.

When no strand is specified, the centre of the tag should be an acceptable choice.
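
The shift rule above could be sketched roughly as follows. The function
name inferred_centre and the default shiftSize of 75 are only
illustrative assumptions; "*" again stands for an unspecified strand.

```r
# Hypothetical helper: infer the fragment centre for each read.
#   "+" reads: start + shiftSize
#   "-" reads: end   - shiftSize
#   anything else (no strand): midpoint of the tag itself
inferred_centre <- function(start, end, strand, shiftSize = 75) {
  centre <- floor((start + end) / 2)   # fallback when strand is unspecified
  is_fwd <- strand == "+"
  is_rev <- strand == "-"
  centre[is_fwd] <- start[is_fwd] + shiftSize
  centre[is_rev] <- end[is_rev] - shiftSize
  centre
}

centres <- inferred_centre(start  = c(100, 300, 500),
                           end    = c(136, 336, 536),
                           strand = c("+", "-", "*"),
                           shiftSize = 80)
# centres is 180 256 518
```

The WIG block counts would then be accumulated on these centres rather
than on the raw read positions.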

BED functionality to brag about:

It would be extremely useful to be able to selectively load BED
records contained in a set of genomic regions. (Something like the
%in% functionality that Martin recently added to the ShortRead
package.)
So, let's imagine a tags-containing file and a big regions-containing
file. Then we'd do

myBigRegions <- import('myBigRegions.bed')
insideRegions <- import('myTags.bed', in=myBigRegions, strand=c("+"))
or also perhaps
outsideRegions <- import('myTags.bed', not_in=myBigRegions, strand=c("+"))
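
The calls above are only a proposed interface. To illustrate the
intended result, here is a small base-R sketch on data frames that are
already in memory. The helper tags_in_regions is hypothetical, and it
assumes that "contained" means the tag lies entirely within a region on
the same chromosome.

```r
# Hypothetical helper: flag tags that lie entirely inside at least one region.
# A simple quadratic scan, fine for illustration but not for big files.
tags_in_regions <- function(tags, regions) {
  vapply(seq_len(nrow(tags)), function(i) {
    any(regions$chrom == tags$chrom[i] &
        regions$start <= tags$start[i] &
        regions$end   >= tags$end[i])
  }, logical(1))
}

tags <- data.frame(chrom  = c("chr1", "chr1", "chr2"),
                   start  = c(120, 900, 40),
                   end    = c(156, 936, 76),
                   strand = c("+", "+", "-"),
                   stringsAsFactors = FALSE)
regions <- data.frame(chrom = c("chr1", "chr2"),
                      start = c(100, 0),
                      end   = c(500, 100),
                      stringsAsFactors = FALSE)

inside <- tags_in_regions(tags, regions)
insideRegions  <- tags[inside  & tags$strand == "+", ]   # first tag only
outsideRegions <- tags[!inside & tags$strand == "+", ]   # second tag only
```

Doing the same filtering at import time would avoid holding the tags
that get discarded in memory, which is the point of the request.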

Thank you,

Ivan



Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1592
Fax: 1-301-496-9878


