[BioC] Possible problem with lumi function addControlData2lumi and proposed solution
Pan Du
dupan at northwestern.edu
Wed Oct 22 05:36:52 CEST 2008
Thanks, Davis!
However, this problem has already been fixed in the development version. It should
work if you use development version 1.7.xx. Anyway, thanks for reporting
the problem.
Have a nice day,
Pan
On 10/21/08 5:00 AM, "bioconductor-request at stat.math.ethz.ch"
<bioconductor-request at stat.math.ethz.ch> wrote:
> Date: Mon, 20 Oct 2008 14:51:46 -0500
> From: "Davis, Wade" <davisjwa at health.missouri.edu>
> Subject: [BioC] Possible problem with lumi function
> addControlData2lumi and proposed solution
> To: <bioconductor at stat.math.ethz.ch>
> Message-ID:
> <09E4AABB118D2C47AEDFC6210BF1DEA902F2A702 at UM-XMAIL02.um.umsystem.edu>
> Content-Type: text/plain
>
> Dear Pan Du et al. (and other bioconductoRs):
>
> First, let me say that I have been using the lumi package for a few
> years now, and I appreciate the time and effort it takes to maintain a
> package. I am very grateful.
>
> I recently received some data from Illumina's Human HT-12 (V3)
> chip; this is the first time I have used it with lumi. I decided to finally
> take advantage of the controlData slot by using the addControlData2lumi
> function. However, when I ran it, I got the following error message:
>
>> addControlData2lumi(controlFile, eth.lumi)
> [1] "Inputting the data ..."
> [1] "Adding nuID to the data ..."
> Error in addControlData2lumi(controlFile, eth.lumi) :
>   SampleID does not match up between controlData and x.lumi!
> In addition: Warning message:
> In addNuId2lumi(x.lumi, lib = lib) :
>   Please provide the annotation file or lumi annotation library!
>
> I am primarily concerned about the message "SampleID does not match up
> between controlData and x.lumi!"
>
> I had the problem with both lumi 1.6.3 and 1.4.0.
>
> Digging into the function, it appears that the creation of the
> controlData data frame is causing the problem:
>
> function (controlData, x.lumi)
> {
>     if (missing(x.lumi) || missing(controlData))
>         stop("Both controlData and x.lumi are required!")
>     if (is.character(controlData)) {
>         controlFile <- controlData
>         allControlInfo <- lumiR.batch(controlFile, lib = NULL,
>             checkDupId = FALSE)
>         controlData <- as.data.frame(exprs(allControlInfo))
>         controlType <- as.character(pData(featureData(allControlInfo))$TargetID)
>         ProbeID <- as.character(pData(featureData(allControlInfo))$ProbeID)
>         controlData <- data.frame(controlType = controlType,
>             ProbeID = ProbeID, controlData)
>     }
>     if (is.matrix(controlData))
>         controlData <- as.data.frame(controlData)
>     if (is(controlData, "data.frame")) {
>         sampleID <- as.character(pData(phenoData(x.lumi))$sampleID)
>         if (is.null(sampleID))
>             sampleID <- sampleNames(x.lumi)
>         controlSampleID <- names(controlData)
>         if ("TargetID" %in% controlSampleID) {
>             controlSampleID[controlSampleID == "TargetID"] <- "controlType"
>         }
>         if (all(sampleID %in% controlSampleID)) {
>             x.lumi@controlData <- controlData[, c("controlType",
>                 "ProbeID", sampleID)]
>         }
>         else {
>             sampleIDInfo <- strsplit(sampleID, split = "_")
>             newID <- NULL
>             temp <- lapply(sampleIDInfo, function(x) {
>                 newID <<- c(newID, paste(x[1:2], collapse = "_"))
>             })
>             if (all(newID %in% controlSampleID)) {
>                 x.lumi@controlData <- controlData[, c("controlType",
>                   "ProbeID", newID)]
>             }
>             else {
>                 stop("SampleID does not match up between controlData and x.lumi!")
>             }
>         }
>         names(x.lumi@controlData) <- c("controlType", "ProbeID",
>             sampleNames(x.lumi))
>     }
>     else {
>         stop("Input data type is not supported!")
>     }
>     return(x.lumi)
> }
>
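The newID loop in the quoted function grows a vector with <<- from inside lapply(). As a minimal base-R sketch (not lumi's own code), the same truncation to the first two underscore-separated fields can be written with vapply(); truncateID is a hypothetical helper name, and the "_Grn" suffix is an invented example of an extra field.

```r
# Hypothetical helper: keep the first two "_"-separated fields of each ID,
# mirroring paste(x[1:2], collapse = "_") in the newID loop, without <<-.
truncateID <- function(ids) {
  parts <- strsplit(ids, split = "_", fixed = TRUE)
  vapply(parts, function(x) paste(x[1:2], collapse = "_"), character(1))
}

# Illustrative IDs; "_Grn" is a made-up extra field
ids <- c("4421321204_A_Grn", "4421321204_B")
truncateID(ids)  # returns c("4421321204_A", "4421321204_B")
```

vapply() returns a character vector directly and fails loudly if any element is not a single string, which avoids modifying a variable in an enclosing environment.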
> When controlData is first defined as
>
>     controlData <- as.data.frame(exprs(allControlInfo))
>
> the variable names from my data are:
>
> [1] "4421321204_A" "4421321204_B" "4421321204_C" "4421321204_D" "4421321204_E" "4421321204_F"
> [7] "4421321204_G" "4421321204_H" "4421321204_I" "4421321204_J" "4421321204_K" "4421321204_L"
>
> which are just the chip names.
>
> After this line:
>
>     controlData <- data.frame(controlType = controlType,
>         ProbeID = ProbeID, controlData)
>
> the variable names from my data become:
>
> [1] "controlType" "ProbeID" "X4421321204_A" "X4421321204_B" "X4421321204_C" "X4421321204_D"
> [7] "X4421321204_E" "X4421321204_F" "X4421321204_G" "X4421321204_H" "X4421321204_I" "X4421321204_J"
> [13] "X4421321204_K" "X4421321204_L"
>
> As you can see, the chip names now have the letter X prepended. Thus,
> when the function compares controlSampleID to newID, it finds that they
> do not match.
>
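The renaming described above can be reproduced in a few lines of base R. This is a minimal sketch: the two column names are copied from the message, while the numeric values and the controlType/ProbeID contents are placeholders.

```r
# Placeholder values; only the column names (copied from the message) matter.
controlData <- as.data.frame(matrix(0, nrow = 2, ncol = 2,
  dimnames = list(NULL, c("4421321204_A", "4421321204_B"))))
names(controlData)  # names kept as-is: "4421321204_A" "4421321204_B"

# data.frame() defaults to check.names = TRUE, so every column name is passed
# through make.names(), which prepends "X" to names that start with a digit.
combined <- data.frame(controlType = c("a", "b"), ProbeID = c("1", "2"),
                       controlData)
names(combined)  # "controlType" "ProbeID" "X4421321204_A" "X4421321204_B"

# The same renaming, calling make.names() directly
make.names(c("4421321204_A", "4421321204_B"))  # "X4421321204_A" "X4421321204_B"

# So a membership test against the original sample names now fails
sampleID <- c("4421321204_A", "4421321204_B")
all(sampleID %in% names(combined))  # FALSE
```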
> The X prefix on the chip names comes from make.names(): data.frame()
> defaults to check.names = TRUE, which passes the column names through
> make.names(), and a syntactically valid R name cannot start with a digit.
> I modified addControlData2lumi() by adding check.names = FALSE, as follows:
>
>     controlData <- data.frame(controlType = controlType,
>         ProbeID = ProbeID, controlData, check.names = FALSE)
>
> Then the function works fine.
>
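A minimal base-R sketch of the effect of the check.names = FALSE change, using the same placeholder data as before (only the column names come from the message):

```r
controlData <- as.data.frame(matrix(0, nrow = 2, ncol = 2,
  dimnames = list(NULL, c("4421321204_A", "4421321204_B"))))

# check.names = FALSE keeps the column names exactly as supplied
combined <- data.frame(controlType = c("a", "b"), ProbeID = c("1", "2"),
                       controlData, check.names = FALSE)

sampleID <- c("4421321204_A", "4421321204_B")
all(sampleID %in% names(combined))  # TRUE

# Selecting columns by sample name, as addControlData2lumi() does, now works
selected <- combined[, c("controlType", "ProbeID", sampleID)]
```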
> So I suggest making this change, although I do not know what unintended
> behavior it could cause in other lumi functions. If the change is made,
> more code might be needed to make sure that chip names are not
> duplicated, because check.names = FALSE also skips the de-duplication
> that make.names() would otherwise apply.
>
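With check.names = FALSE, data.frame() no longer makes repeated names unique, so a separate guard is one option. The sketch below is hypothetical (checkUniqueNames is not a lumi function) and uses a deliberately repeated chip name to show the difference.

```r
# Default: make.names(unique = TRUE) renames repeats, e.g. "X4421321204_A.1"
withCheck <- data.frame(`4421321204_A` = 1, `4421321204_A` = 2)

# check.names = FALSE: the repeated name is kept twice
noCheck <- data.frame(`4421321204_A` = 1, `4421321204_A` = 2,
                      check.names = FALSE)

# Hypothetical guard: stop if any column name occurs more than once
checkUniqueNames <- function(df) {
  dup <- unique(names(df)[duplicated(names(df))])
  if (length(dup) > 0)
    stop("Duplicated column names: ", paste(dup, collapse = ", "))
  invisible(df)
}

checkUniqueNames(withCheck)  # passes silently
try(checkUniqueNames(noCheck))  # error: Duplicated column names: 4421321204_A
```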
> I hope this message has been useful, and that I have not overlooked
> other messages on this topic. Thanks for your time.
>
> Regards,
> Wade
>
> J. Wade Davis, PhD
> University of Missouri