[BioC] Missing ProbeSets in Affymetrix MoGene 1.0 ST chips

Thu Sep 11 23:58:25 CEST 2008

Hi folks, I got a reply from Affymetrix today regarding the missing  
probesets on the MoGene chip:

Reply from Casey Gates at Affymetrix:
--------------------------------------------------------------------
The 48 transcript cluster IDs that you have identified as not in the PGF
file are from what we call low-coverage transcript clusters: those having
fewer than 4 probes. These tend to be very short sequences of little
biological interest, and they were excluded from the PGF with the intent
that they should not be analyzed by users. So the advice is that you can
safely ignore them.

The reason they are in the NetAffx CSV file is that the NetAffx team
used the GFF files as a source for the array design data, which contain
these low-coverage transcript clusters. They should have been excluded
from the CSV annotation files and NetAffx website, and they will be
excluded in future annotation releases.
----------------------------------------------------------------------
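The exclusion rule in the reply above amounts to counting probes per transcript cluster and dropping any cluster with fewer than 4. A minimal base-R sketch follows, assuming a hypothetical probe-level table (`probes`, with made-up cluster IDs `A`, `B`, `C`). The column names are illustrative only, and building such a table from the GFF or PGF files is not covered here.

```r
# Hypothetical probe-level table: one row per probe, tagged with the
# transcript cluster it belongs to. IDs and column names are made up.
probes <- data.frame(
  transcript_cluster_id = c("A", "A", "A", "A", "B", "B", "C"),
  probe_id = 1:7,
  stringsAsFactors = FALSE
)

# Count probes per transcript cluster.
probe_counts <- table(probes$transcript_cluster_id)

# Per the reply above, clusters with fewer than 4 probes are
# "low-coverage" and were left out of the PGF file.
min_probes <- 4
low_coverage <- names(probe_counts)[as.vector(probe_counts < min_probes)]
analysable <- names(probe_counts)[as.vector(probe_counts >= min_probes)]

low_coverage  # "B" "C"
analysable    # "A"
```

The threshold is kept in a variable so the same check can be rerun if a later annotation release uses a different cut-off.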

I hope that helps everyone,

Mark

On 05/09/2008, at 10:27 AM, Mark Cowley wrote:

> No, not yet! I will do so now.
>
> On 04/09/2008, at 10:52 PM, James W. MacDonald wrote:
>
>> Have you asked anybody at Affy?
>>
>> Mark Cowley wrote:
>>> Dear list,
>>> There are 93 transcript_cluster_ids on the MoGene 1.0 ST chip
>>> that are listed in the CSV annotation file and searchable under
>>> the MoGene chip on NetAffx, but that are not present in the
>>> [unsupported] CDF file from NetAffx.
>>> 45 of these IDs are present in the MoGene PGF file and
>>> correspond to the antigenomic probesets, but the remaining 48 are
>>> not in the PGF file either.
>>> From NetAffx, the 48 non-control probesets are: 11 snRNAs, a
>>> RefSeq gene (Lphn2) and 2 other novel transcripts, with the
>>> remaining 44 having no annotation other than their genomic
>>> location. This isn't a problem unless Lphn2 is your gene of
>>> interest (it isn't in my case), but it would be nice to know
>>> what's going on here!
>>> If you RMA-normalise using the CDF file (as GeneSpring does),
>>> you end up with 93 rows of missing data; if you normalise
>>> using the PGF/CLF files, you still miss out on the
>>> remaining 48 probesets.
>>> Has anyone else come across this, and does anyone know what is going on?
>>> These transcript_cluster_ids are:
>>> c("10361826", "10362430", "10362444", "10362452", "10502768",  
>>> "10532622", "10349381", "10350469", "10354866", "10362438",  
>>> "10362872", "10369759", "10374030", "10391748", "10395778",  
>>> "10411504", "10422960", "10436496", "10436660", "10446349",  
>>> "10453719", "10457089", "10458079", "10460144", "10461932",  
>>> "10481652", "10482786", "10487009", "10498317", "10501216",  
>>> "10502040", "10503414", "10513713", "10521665", "10535929",  
>>> "10546555", "10552810", "10553535", "10560364", "10582560",  
>>> "10582566", "10582570", "10582576", "10585872", "10586931",  
>>> "10592453", "10601614", "10602194", "10338002", "10338005",  
>>> "10338006", "10338007", "10338008", "10338009", "10338010",  
>>> "10338011", "10338012", "10338013", "10338014", "10338015",  
>>> "10338016", "10338018", "10338019", "10338020", "10338021",  
>>> "10338022", "10338023", "10338024", "10338027", "10338028",  
>>> "10338030", "10338031", "10338032", "10338033", "10338034",  
>>> "10338038", "10338039", "10338040", "10338043", "10338045",  
>>> "10338046", "10338048", "10338049", "10338050", "10338051",  
>>> "10338052", "10338053", "10338054", "10338055", "10338057",  
>>> "10338058", "10338061", "10338062")
>>> cheers,
>>> Mark
>>> -----------------------------------------------------
>>> Mark Cowley, BSc (Bioinformatics)(Hons)
>>> Peter Wills Bioinformatics Centre
>>> Garvan Institute of Medical Research, Sydney, Australia
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> -- 
>> James W. MacDonald, M.S.
>> Biostatistician
>> Hildebrandt Lab
>> 8220D MSRB III
>> 1150 W. Medical Center Drive
>> Ann Arbor MI 48109-0646
>> 734-936-8662
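The comparison in the original post (93 IDs in the CSV annotation but not in the CDF, of which 45 are in the PGF and 48 are not) reduces to set differences. A minimal base-R sketch, using hypothetical placeholder IDs (`tc1` to `tc4`) in place of the real vectors; extracting the actual ID lists from the CSV, CDF and PGF files is assumed to have been done already and is not shown.

```r
# Hypothetical placeholder IDs standing in for the transcript_cluster_ids
# read from each file (the parsing step is assumed, not shown).
csv_ids <- c("tc1", "tc2", "tc3", "tc4")  # NetAffx CSV annotation
cdf_ids <- c("tc1")                       # CDF file
pgf_ids <- c("tc1", "tc2")                # PGF file

# Annotated in the CSV but absent from the CDF.
missing_from_cdf <- setdiff(csv_ids, cdf_ids)

# Split those into IDs the PGF still has and IDs missing from both files.
in_pgf_only <- intersect(missing_from_cdf, pgf_ids)
missing_from_both <- setdiff(missing_from_cdf, pgf_ids)

missing_from_cdf   # "tc2" "tc3" "tc4"
in_pgf_only        # "tc2"
missing_from_both  # "tc3" "tc4"
```

On the real files, the post reports 93, 45 and 48 IDs for these three sets respectively.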