[Bioc-devel] ExperimentHub::GSE62944 outdated

Ludwig Geistlinger Ludwig.Geistlinger at bio.ifi.lmu.de
Thu Jun 2 16:06:49 CEST 2016


Hi,

I would like to do some analysis on the TCGA data as provided in
ExperimentHub's GSE62944 ExpressionSet.

The Description of the dataset reads:

"TCGA re-processed RNA-Seq data from 9264 Tumor Samples and 741 normal
samples across 24 cancer types"

However, when loading the dataset via

> library(ExperimentHub)
> eh <- ExperimentHub()
> query(eh , "GSE62944")
> tcga_data <- eh[["EH1"]]

and counting the samples

> dim(tcga_data)
Features  Samples
   23368     7706

as well as the cancer types

> length(table(pData(tcga_data)[,"CancerType"]))

I observe discrepancies with the above description (7706 samples rather
than the stated 9264 + 741 = 10005), indicating that this is an outdated
version of the dataset.

Is it possible to

(1) update it accordingly, and
(2) include a varLabel, i.e. a pData column, indicating whether each sample
is a tumor or an adjacent normal sample for the respective cancer type?

That would be great!
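
As a stopgap for (2), the tumor/normal status could probably be derived
from the sample names, assuming they are standard TCGA barcodes, in which
characters 14-15 encode the sample type (01-09 tumor, 10-19 normal,
20-29 control). A minimal sketch follows; classify_tcga_sample is a
hypothetical helper, not part of ExperimentHub, and the barcodes below are
made up for illustration:

```r
# Hypothetical helper, not part of ExperimentHub or the GSE62944 record.
# Assumes each input is a TCGA barcode whose characters 14-15 hold the
# two-digit sample type code (01-09 tumor, 10-19 normal, 20-29 control).
classify_tcga_sample <- function(barcodes) {
  code <- suppressWarnings(as.integer(substr(barcodes, 14, 15)))
  out <- rep(NA_character_, length(barcodes))
  out[which(code >= 1 & code <= 9)] <- "tumor"
  out[which(code >= 10 & code <= 19)] <- "normal"
  out[which(code >= 20 & code <= 29)] <- "control"
  out  # NA where no valid sample type code could be parsed
}

# Illustrative barcodes only (not taken from the dataset)
classify_tcga_sample(c("TCGA-XX-0000-01A", "TCGA-XX-0000-11A"))
# "tumor" "normal"

# Possible use on the ExpressionSet, if its sample names are barcodes:
# pData(tcga_data)$SampleType <- classify_tcga_sample(sampleNames(tcga_data))
```

Anything that does not parse as a barcode comes back as NA, so a quick
table(..., useNA = "ifany") would show whether the assumption holds.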

Thanks & best,
Ludwig

-- 
Dr. Ludwig Geistlinger

Lehr- und Forschungseinheit für Bioinformatik
Institut für Informatik
Ludwig-Maximilians-Universität München
Amalienstrasse 17, 2. Stock, Büro A201
80333 München

Tel.: 089-2180-4067
eMail: Ludwig.Geistlinger at bio.ifi.lmu.de


