Dear Pan Du et al. (and other bioconductoRs):

 

First, let me say that I have been using the lumi package for a few
years now, and I appreciate the time and effort it takes to maintain a
package. I am very grateful.

 

I just recently received some data from Illumina's Human HT-12 (V3)
chip. It is the first time I've used it with lumi. I decided to finally
take advantage of the controlData slot by using the addControlData2lumi
function. However, when using it, I got the following error message.

 

> addControlData2lumi(controlFile, eth.lumi)
[1] "Inputting the data ..."
[1] "Adding nuID to the data ..."
Error in addControlData2lumi(controlFile, eth.lumi) : 
  SampleID does not match up between controlData and x.lumi!
In addition: Warning message:
In addNuId2lumi(x.lumi, lib = lib) :
  Please provide the annotation file or lumi annotation library!

 

I am primarily concerned about the message: "SampleID does not match up
between controlData and x.lumi!"

I had the problem with both lumi 1.6.3 and 1.4.0.

 

Digging into the function, it appears that the creation of the
controlData dataframe is causing the problem.

 

function (controlData, x.lumi) 
{
    if (missing(x.lumi) || missing(controlData)) 
        stop("Both controlData and x.lumi are required!")
    if (is.character(controlData)) {
        controlFile <- controlData
        allControlInfo <- lumiR.batch(controlFile, lib = NULL, 
            checkDupId = FALSE)
        controlData <- as.data.frame(exprs(allControlInfo))
        controlType <- as.character(pData(featureData(allControlInfo))$TargetID)
        ProbeID <- as.character(pData(featureData(allControlInfo))$ProbeID)
        controlData <- data.frame(controlType = controlType, 
            ProbeID = ProbeID, controlData)
    }
    if (is.matrix(controlData)) 
        controlData <- as.data.frame(controlData)
    if (is(controlData, "data.frame")) {
        sampleID <- as.character(pData(phenoData(x.lumi))$sampleID)
        if (is.null(sampleID)) 
            sampleID <- sampleNames(x.lumi)
        controlSampleID <- names(controlData)
        if ("TargetID" %in% controlSampleID) {
            controlSampleID[controlSampleID == "TargetID"] <- "controlType"
        }
        if (all(sampleID %in% controlSampleID)) {
            x.lumi@controlData <- controlData[, c("controlType", 
                "ProbeID", sampleID)]
        }
        else {
            sampleIDInfo <- strsplit(sampleID, split = "_")
            newID <- NULL
            temp <- lapply(sampleIDInfo, function(x) {
                newID <<- c(newID, paste(x[1:2], collapse = "_"))
            })
            if (all(newID %in% controlSampleID)) {
                x.lumi@controlData <- controlData[, c("controlType", 
                  "ProbeID", newID)]
            }
            else {
                stop("SampleID does not match up between controlData and x.lumi!")
            }
        }
        names(x.lumi@controlData) <- c("controlType", "ProbeID", 
            sampleNames(x.lumi))
    }
    else {
        stop("Input data type is not supported!")
    }
    return(x.lumi)
}

 

When controlData is first defined as 

        controlData <- as.data.frame(exprs(allControlInfo))

the variable names from my data are:

 [1] "4421321204_A" "4421321204_B" "4421321204_C" "4421321204_D" "4421321204_E" "4421321204_F"
 [7] "4421321204_G" "4421321204_H" "4421321204_I" "4421321204_J" "4421321204_K" "4421321204_L"

which are just the chip names.

 

After this line: 

        controlData <- data.frame(controlType = controlType, 
            ProbeID = ProbeID, controlData)

the variable names from my data become:

 [1] "controlType"   "ProbeID"       "X4421321204_A" "X4421321204_B" "X4421321204_C" "X4421321204_D"
 [7] "X4421321204_E" "X4421321204_F" "X4421321204_G" "X4421321204_H" "X4421321204_I" "X4421321204_J"
[13] "X4421321204_K" "X4421321204_L"

As you can see, the chip names now have the letter X prepended. Thus,
when the function checks whether sampleID (and then newID) appears in
controlSampleID, none of the names match, and it stops with the error.
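
To make the mismatch concrete, here is a minimal sketch in base R of just
the comparison step, using two of my chip names. The vectors are hand-built
stand-ins for the real objects (no lumi objects are involved), and I use
vapply in place of the lapply/<<- construct in the function, so treat it as
an illustration rather than the package's own code.

```r
# Stand-in for sampleNames(x.lumi): the chip names as lumi stores them.
sampleID <- c("4421321204_A", "4421321204_B")

# Stand-in for names(controlData) after data.frame() has altered them;
# make.names() is what data.frame() applies when check.names = TRUE.
controlSampleID <- c("controlType", "ProbeID", make.names(sampleID))

# First test in addControlData2lumi(): are the sample IDs present as-is?
directMatch <- all(sampleID %in% controlSampleID)

# Fallback test: rebuild each ID from its first two "_"-separated fields.
newID <- vapply(strsplit(sampleID, split = "_"),
                function(x) paste(x[1:2], collapse = "_"),
                character(1))
fallbackMatch <- all(newID %in% controlSampleID)

print(controlSampleID)
print(c(directMatch = directMatch, fallbackMatch = fallbackMatch))
```

Both tests come back FALSE, which is the path that leads to the stop() call.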

 

The X prefix on the chip names comes from the make.names function, which
data.frame calls because its default is check.names=TRUE. Names that
start with a digit are not syntactically valid in R, so make.names
prepends an X. I modified addControlData2lumi() by adding
check.names=FALSE as follows:

        controlData <- data.frame(controlType = controlType, 
            ProbeID = ProbeID, controlData, check.names = FALSE)

 

Then the function works fine.
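
For anyone who wants to see the effect without a control file, the
following base R sketch shows the same behavior on a toy matrix. The values,
control types, and probe IDs are made up for illustration; only the two
column names are taken from my data.

```r
# Toy matrix whose column names start with a digit, like the chip names.
ctrlMatrix <- matrix(c(10, 20, 30, 40), nrow = 2,
                     dimnames = list(NULL, c("4421321204_A", "4421321204_B")))
ctrlFrame <- as.data.frame(ctrlMatrix)   # names are still unchanged here

controlType <- c("housekeeping", "negative")   # made-up values
ProbeID <- c("1001", "1002")                   # made-up values

# Default (check.names = TRUE): make.names() rewrites the column names.
mangled <- data.frame(controlType = controlType, ProbeID = ProbeID,
                      ctrlFrame)

# With check.names = FALSE the column names pass through untouched.
intact <- data.frame(controlType = controlType, ProbeID = ProbeID,
                     ctrlFrame, check.names = FALSE)

print(names(mangled))
print(names(intact))
```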

 

So I suggest making this change, although I don't know what unintended
behavior this could cause with other lumi functions. If the change is
made, more code might be needed to make sure that chip names are not
duplicated.
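
As a rough idea of what such a safeguard could look like, here is a sketch
in base R. The helper name checkUniqueSampleNames is hypothetical and not
part of lumi. The reason for it is that with check.names=FALSE, data.frame
no longer makes repeated column names unique, so a duplicate would pass
through silently.

```r
# Hypothetical helper (not part of lumi): stop if any name is repeated.
checkUniqueSampleNames <- function(ids) {
    dup <- unique(ids[duplicated(ids)])
    if (length(dup) > 0) {
        stop("Duplicated sample names in controlData: ",
             paste(dup, collapse = ", "))
    }
    invisible(ids)
}

# Distinct chip names pass silently.
checkUniqueSampleNames(c("4421321204_A", "4421321204_B"))

# A repeated chip name triggers the error; capture the message to show it.
res <- tryCatch(
    checkUniqueSampleNames(c("4421321204_A", "4421321204_A")),
    error = function(e) conditionMessage(e))
print(res)
```

A check like this could be run on names(controlData) right after the
data.frame call, but where it belongs is for the maintainers to judge.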

 

I hope this message has been useful, and that I have not overlooked
other messages on this topic. Thank you for your time.

 

Regards,
Wade

 

 

J. Wade Davis, PhD

University of Missouri

 



