[BioC] CGH analysis without genome positions

Wed Apr 7 23:36:41 CEST 2010

On Wed, Apr 7, 2010 at 4:25 PM, adam_pgsql <adam_pgsql at witneyweb.org> wrote:
>
> On 7 Apr 2010, at 00:01, Sean Davis wrote:
>
>> On Tue, Apr 6, 2010 at 6:53 PM, adam_pgsql <adam_pgsql at witneyweb.org> wrote:
>>>
>>> On 6 Apr 2010, at 19:15, Sean Davis wrote:
>>>
>>>> On Tue, Apr 6, 2010 at 1:58 PM, Sean Davis <seandavi at gmail.com> wrote:
>>>>> On Tue, Apr 6, 2010 at 1:47 PM, adam_pgsql <adam_pgsql at witneyweb.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to do some CGH analysis with Agilent arrays, but all the analysis methods seem to require genome position information. Does anyone know of any packages that will call genes as present/absent without the genome position?
>>>>>>
>>>>>
>>>>> I don't think of CGH analysis as "present/absent", but perhaps I am
>>>>> not clear on what you mean by CGH analysis.  For Agilent arrays,
>>>>> presumably you have two colors, one representing the sample and the
>>>>> other the reference.  Simply make a ratio and then rank the probes
>>>>> based on that.
>>>>
>>>> I'm making an assumption here that you are using some custom array
>>>> based on an organism with no assembled genome.  If there is an
>>>> assembled genome, then you should map your probes to the genome using
>>>> an alignment tool (blast, blat, etc.) and use those alignments for
>>>> more standard CGH analysis.
>>>
>>> Thanks Sean for your reply.
>>>
>>> This is a custom bacterial pan-genome array. The problem is that many of the oligos target genes found in unfinished genome sequences (not the reference strain), and as such I don't really have a genome position. Also, due to the nature of bacterial genomes, when I hybridise DNA from unsequenced strains there is no guarantee that the gene arrangement would be exactly the same as in the sequenced reference strain.
>>>
>>> In terms of "present/absent", I would like to score each gene sequence represented on the array as present or absent in the test strain. I guess this could be done by ranking the ratios and determining some cutoff for presence or absence, but the question is: are there any tools that provide a more statistically sound approach to suggesting a good cutoff value to use?
>>>
>>
>> Hi, Adam.
>>
>> There are many ways to go here, but one would really need to know the
>> experimental design in more detail.  If you have replicates, then
>> there are MANY statistical methodologies that could be applied to find
>> differences between the reference and the test.  Any gene expression
>> hypothesis testing packages could probably be applied.
>>
>> Sean
>
> Thanks again for your reply, Sean.
>
> For the arrays that have been performed so far, the design is simply test against reference strain (no biological replicates), 3 or more different oligos per gene, printed in duplicate. The problem with the reference design is that, as I mentioned before, many of the oligos map to genes that are not present in the reference strain, so there will be lots of features with little or no signal in the reference channel. We would in fact like to be able to do this with single colour data if possible. Are there any packages that could help with this?
>

Agilent generates several statistics that might be relevant.  You
might look at the Feature Extraction manual to determine which columns
of output will help you determine if a single channel is thought to be
above background.
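
As a minimal sketch of that idea (in Python, outside Bioconductor), a
gene could be called present when enough of its features are flagged as
above background. The flag is assumed to come from a Feature Extraction
column such as gIsWellAboveBG; check the manual for the exact column
name in your FE version. The function name and the min_fraction
threshold are hypothetical and would need tuning on known
present/absent genes.

```python
from collections import defaultdict


def call_genes_present(features, min_fraction=0.5):
    """Call each gene present/absent from per-feature above-background flags.

    features: iterable of (gene_name, is_above_bg) pairs, one per feature
              (oligo replicate). is_above_bg is assumed to be a 0/1 flag
              taken from the Feature Extraction output.
    min_fraction: hypothetical cutoff; a gene is called present when at
              least this fraction of its features are flagged.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for gene, above_bg in features:
        total[gene] += 1
        if above_bg:
            flagged[gene] += 1
    return {gene: flagged[gene] / total[gene] >= min_fraction for gene in total}


# Toy data, invented for illustration only.
features = [
    ("geneA", 1), ("geneA", 1), ("geneA", 0),
    ("geneB", 0), ("geneB", 0), ("geneB", 1), ("geneB", 0),
]
calls = call_genes_present(features)
```

With 3 or more oligos per gene printed in duplicate, each gene has at
least six features to vote with, so a single failed spot does not
decide the call.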

In any case, I don't think there are any Bioconductor packages that
will do exactly what you want without some creativity.
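
For the two-colour data, the ratio ranking suggested earlier in the
thread could be sketched roughly as follows: compute a log2
test/reference ratio per feature, summarise each gene by the median
over its oligos, and rank. The function name and the pseudocount are
assumptions made for illustration, and this does not solve the problem
of genes missing from the reference strain, where the reference
channel is near zero and the ratio is unreliable.

```python
import math
from collections import defaultdict
from statistics import median


def rank_genes_by_log_ratio(features, pseudocount=1.0):
    """Rank genes by median log2(test/reference) across their features.

    features: iterable of (gene_name, test_signal, ref_signal) tuples.
    pseudocount: hypothetical constant added to both channels so that
                 zero signals do not produce an undefined ratio.
    Returns a list of (gene_name, median_log2_ratio), lowest first,
    so likely-absent genes appear at the top.
    """
    per_gene = defaultdict(list)
    for gene, test, ref in features:
        ratio = (test + pseudocount) / (ref + pseudocount)
        per_gene[gene].append(math.log2(ratio))
    summary = {gene: median(values) for gene, values in per_gene.items()}
    return sorted(summary.items(), key=lambda item: item[1])


# Toy signals, invented for illustration only.
features = [
    ("geneA", 1023, 1023), ("geneA", 511, 1023), ("geneA", 2047, 1023),
    ("geneB", 0, 1023), ("geneB", 3, 1023), ("geneB", 1, 1023),
]
ranked = rank_genes_by_log_ratio(features)
```

Where to place the cutoff on the ranked list is still a judgment call;
strains with known gene content would be one way to calibrate it.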

Sean