[BioC] ALL dataset by Chiaretti et al.

Robert Gentleman rgentlem at fhcrc.org
Sat Apr 16 14:11:45 CEST 2005


   All packages have maintainers (this is true for data packages too).
If you want to know the intimate details of a package, the maintainer
is the first person to contact.
   It is very simple to find out who that is:
 > packageDescription("ALL")$Maintainer
[1] "Xiaochun Li <xiaochun at jimmy.harvard.edu>"

Also note that the source you give for the package is not appropriate. It
was put there for students of that course, but more recent versions are
available through the standard channels (and those should be preferred),
i.e. http://www.bioconductor.org/data/experimental.html

   Next, you need to be a bit more specific about what you want to do.
If you want to verify the exact outputs, that is in general hard and
sometimes impossible. My guess is that it is impossible for this paper
(although you should be able to come close). Why is it impossible?
First, you will need access to the original data (which I believe is
available, although it seems that three CEL files have not been put up;
I will see about tracking down the differences). Next, you need access
to the right version of the software used to do the preprocessing (in
this case you will need to find the version of dChip that was used,
which is hard to do, as it changed often and that was some years ago).
You might be able to ask the first author for some of the transformed
data and see how things go from there. And so on. For most of the
Bioconductor/R software used you should be able to get old versions,
but since then bugs have been fixed, ideas improved and so on.
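
Since the exact software versions matter so much here, it helps to record
them alongside any results you produce. The following is only a minimal
sketch; the package name "stats" is a stand-in chosen because it ships
with R, and the variable names are illustrative.

```r
# Record the R version and the attached packages next to the results.
info <- sessionInfo()
r_version <- R.version.string
print(info)

# Version of one specific package. "stats" is used only because it
# ships with R; replace it with the package of interest (e.g. "ALL").
pkg_version <- as.character(packageVersion("stats"))
cat("R:", r_version, "\nstats:", pkg_version, "\n")
```

Saving this output with the analysis makes it much easier to say later
which versions produced a given set of numbers.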

   If instead what you want is to come reasonably close, then it is
somewhat easier. Reading the paper should have told you that the
analysis was on patients with T-cell ALL (and that there are two types
of ALL, B-cell derived and T-cell derived). The ALL package then gives:
 > table(ALL$BT)

  B B1 B2 B3 B4  T T1 T2 T3 T4
  5 19 36 23 12  5  1 15 10  2

  This certainly suggests a starting point, as there are precisely 33
samples with T-cell derived ALL (5 + 1 + 15 + 10 + 2).
But, as I said above, different methodologies were used for
normalization, so the actual values will differ (sometimes by a
lot) from those used in the original paper and hence the answers you
get will differ too, though probably not by very much. If they seem
very different, then the first author of the original paper is the
person to contact.
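
As a rough sketch of how the 33 T-cell samples could be picked out: the
first part below only re-adds the counts from the table(ALL$BT) output
quoted above, and the second part assumes the ALL package is installed
and that the T-cell stages are exactly the BT levels starting with "T".
The variable names are illustrative.

```r
# Counts copied from the table(ALL$BT) output quoted above.
bt_counts <- c(B = 5, B1 = 19, B2 = 36, B3 = 23, B4 = 12,
               T = 5, T1 = 1, T2 = 15, T3 = 10, T4 = 2)

# Levels whose label starts with "T" are the T-cell derived samples.
is_t  <- grepl("^T", names(bt_counts))
n_t   <- sum(bt_counts[is_t])   # number of T-cell samples
n_all <- sum(bt_counts)         # number of samples overall
cat("T-cell samples:", n_t, "of", n_all, "\n")

# With the ALL package installed, the same pattern selects the samples
# themselves. The guard keeps the script runnable without the package.
if (requireNamespace("ALL", quietly = TRUE)) {
  library(ALL)                  # also attaches Biobase
  data(ALL)
  t_samples <- ALL[, grepl("^T", as.character(ALL$BT))]
  print(dim(t_samples))
}
```

Whether these are precisely the patients analysed in the paper is still
something to confirm against the publication or with its first author.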

  Best wishes,

On Apr 15, 2005, at 9:30 AM, Heike Pospisil wrote:

> Dear list member,
> I would like to reconstruct the analysis presented by Chiaretti et al.  
> in BLOOD Vol. 103(7), 2004. The data set ALL within library(ALL)  
> (found at MBI Lab4 by Sandrine Dudoit et al.) consists of 128 samples.  
> But I read in the publication that 33 patients were evaluated by gene  
> expression profiling. (Btw: the source  
> http://bioconductor.org/Docs/Papers/2002/Chiaretti gives only 30  
> CEL-files.. ?) Which of the 128 samples in ALL are the 33 mentioned  
> patients in the publication?
> Moreover, it would be great to have the original R code behind the  
> above mentioned publication - does it exists somewhere in the  
> Bioconductor repository?
> Thanks in advance,
> Heike
> --  
> Dr. Heike Pospisil
> Center for Bioinformatics, University of Hamburg
> Bundesstrasse 43, 20146 Hamburg, Germany
> phone: +49-40-42838-7303 fax: +49-40-42838-7312
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

Robert Gentleman                          phone:  (206) 667-7700
Head, Program in Computational Biology    fax:    (206) 667-1319
Division of Public Health Sciences        office: M2-B865
Fred Hutchinson Cancer Research Center
email: rgentlem at fhcrc.org
