[BioC] read.ilmn error: length of 'dimnames' [1] not equal to array extent

Sun Nov 18 11:28:13 CET 2012

Hi Mark,

Thanks for the info. Yes, your control data are definitely not in GenomeStudio format, so read.ilmn won't be able to read them.

To use your control data for the normalization, you can read them in yourself (using read.table, for example), convert them to a matrix, and then combine (rbind) that matrix with the expression data of your regular probes (the 'E' component of the object returned by read.ilmn). You can then call 'neqc' to perform the normalization, but you will have to provide a 'status' vector to neqc that specifies the probe type of each row. You may assign 'regular' to the gene probes and assign each control probe its probe type using the information in your control data (e.g. 'negative', 'biotin', 'housekeeping').
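A minimal sketch of these steps, using made-up intensities in place of your files. The sample names, probe IDs and values are all invented for illustration, and I am assuming the control table has one row per control probe with its type in the first column, including several 'negative' rows (neqc uses the negative controls to estimate the background). The neqc call is skipped if limma is not installed.

```r
# Synthetic stand-ins for the real data (all names and values are made up).
set.seed(1)
samples <- c("S1", "S2", "S3")

# Plays the role of the 'E' component returned by read.ilmn():
# rows are regular probes, columns are samples.
E <- matrix(rexp(15, rate = 1 / 500) + 100, nrow = 5,
            dimnames = list(paste0("ILMN_", 1:5), samples))

# Plays the role of the control file after read.table(); the first
# column holds the control type, the rest hold intensities.
ctrl <- data.frame(
  ID = c("NEGATIVE", "NEGATIVE", "NEGATIVE", "BIOTIN", "HOUSEKEEPING"),
  S1 = c(95, 102, 99, 7000, 12700),
  S2 = c(97, 101, 104, 7300, 17400),
  S3 = c(93, 100, 98, 7200, 15500),
  stringsAsFactors = FALSE
)

# Convert to a numeric matrix with columns in the same order as E.
ctrl.mat <- as.matrix(ctrl[, colnames(E)])
rownames(ctrl.mat) <- ctrl$ID

# Stack regular probes on top of control probes.
combined <- rbind(E, ctrl.mat)

# One status entry per row: 'regular' for gene probes, and the
# lower-cased control type for control probes.
status <- c(rep("regular", nrow(E)), tolower(ctrl$ID))

# Background-correct and normalize; only attempted if limma is available.
if (requireNamespace("limma", quietly = TRUE)) {
  y <- limma::neqc(combined, status = status)
}
```

The important points are that the columns of the two matrices are in the same sample order before rbind, and that 'status' has exactly one entry per row of the combined matrix.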

Hope you can work this out, but let me know if I can be of any further help.

Cheers,

Wei

On Nov 18, 2012, at 12:11 AM, Mark Ebbert wrote:

> Wei,
> 
> I figured out part of the problem. The file with the sample data contains columns that read.ilmn is not expecting. Digging through the source code, I discovered that multiple column names *contain* the string "PROBE_ID". So when .read.oneilmnfile calls "pids <- x[, grep(tolower(probeid), tolower(colnames(x)))]", it gets multiple columns back, which causes the error when setting row names with "rownames(elist$E) <- pids". I deleted the extra column ("OBSOLETE_PROBE_ID") and was able to read in the sample data.
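
The failure described above can be reproduced in base R without limma. This is only an illustrative sketch: the small table is invented, and the exact-match lookup at the end is one possible workaround, not what read.ilmn itself does.

```r
# Toy table with two columns whose names contain "PROBE_ID" (made-up data).
x <- data.frame(
  PROBE_ID          = c("ILMN_1", "ILMN_2", "ILMN_3"),
  OBSOLETE_PROBE_ID = c("old_1", "old_2", "old_3"),
  S1                = c(110, 250, 98),
  stringsAsFactors  = FALSE
)
probeid <- "PROBE_ID"

# Pattern matching finds both columns, so the subset is a data frame,
# not a vector of probe IDs.
hits <- grep(tolower(probeid), tolower(colnames(x)))
pids_bad <- x[, hits]

# Using that data frame as row names fails on a matrix.
E <- matrix(x$S1, ncol = 1, dimnames = list(NULL, "S1"))
res <- tryCatch({
  rownames(E) <- pids_bad
  NULL
}, error = function(e) e)

# Possible workaround: match the column name exactly instead.
pid.col <- which(tolower(colnames(x)) == tolower(probeid))
pids <- x[, pid.col]
rownames(E) <- pids
```

Here 'res' holds the error raised by the row-name assignment, while the exact-match version selects a single column and the assignment succeeds. Deleting the extra column from the file, as done above, has the same effect.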
> 
> I still cannot read in the control data, however. I have included the first 10 lines below. After comparing this file to the example file from the Mammary Progenitor data, it's clear that my file is not in the same format. I'm not sure how to ask them to export the control data.
> 
> Thanks for your help!
> 
> 
> ### BEGIN FIRST 10 LINES OF CONTROL DATA ###
> ID      8636 - 53R      9431 - 80R      8629 - 49L      9437 - 86R      9434 - 82L      9428 - 70L      9422 - 73L      8648 - 59R      9427 - 70R      9438 - 86L      8649 - 59L      8624 - 47R      9433 - 82R      9436 - 84L      8625 - 47L      9423 - 77R      9426 - 76L      9421 - 73R      8637 - 53L      9425 - 76R      8640 - 42R      8620 - 45R      9424 - 77L      9435 - 84R      9430 - 78L      9429 - 78R      8628 - 49R      9432 - 80L      8631 - 50L      8641 - 42L      8630 - 50R      8621 - 45L
> Detected Genes (0.01)   6189    7056    6573    7359    6734    6758    6058    5160    6025    7469    6106    5919    5781    7609    5284    5782    6583    6433    6182    6566    6263    6101    6200    7262    6148    6579    6539    6926    5585    5948    5978    6866
> Detected Genes (0.05)   7486    8522    8125    8697    8152    8049    7471    6476    7591    8940    7468    7366    7076    8911    6698    7178    8000    7756    7534    8070    7741    7477    7673    8538    7651    7943    8082    8406    7110    7370    7704    8487
> Signal Average  563.274 700.856 646.443 724.202 641.452 572.532 535.633 316.646 588.907 751.397 595.575 601.884 423.521 701.26  399.673 475.459 620.766 524.632 599.798 538.671 571.853 434.97  531.654 684.812 540.199 552.591 582.514 633.396 357.146 459.406 556.973 578.572
> Signal P05      88.4618 87.9676 90.4178 88.9003 89.4656 87.836  87.8903 88.0403 115.642 116.99  116.533 114.427 114.204 117.668 110.959 111.491 111.402 102.736 106.848 107.111 104.073 101.953 101.687 103.844 101.826 99.6758 103.058 102.988 94.866  99.949  98.9388 95.6721
> Signal P25      99.773  99.2284 101.83  100.656 100.027 98.3976 97.9409 97.7289 130.536 131.414 130.087 127.673 126.811 131.573 123.858 125.926 125.11  114.428 119.247 119.54  115.425 113.579 113.176 116.577 114.845 111.179 114.728 114.671 105.125 111.011 110.364 108.536
> Signal P50      112.893 115.941 118.002 118.766 114.809 113.539 110.227 106.945 145.676 150.74  143.768 141.147 138.796 151.626 134.729 139.328 141.859 128.488 133.425 135.119 129.605 125.757 127.021 134.554 129.241 125.657 129.745 130.51  115.146 123.399 123.949 126.521
> Signal P75      226.307 306.497 277.366 330.317 283.021 260.861 214.124 144.805 269.787 371.962 262.388 257.241 217.723 366.383 188.87  227.385 298.45  242.636 265.713 269.495 256.326 205.254 237.958 334.767 257.626 263.459 276.011 303.819 170.6   212.57  243.08  298.179
> Signal P95      1933.16 2683.67 2362.11 2746.05 2430.87 2054.33 1849.95 821.062 1983.76 2773.07 1939.22 2041.77 1301.86 2547.98 1144.34 1402.24 2241.74 1667.22 2061.29 1787.84 1955.92 1301.94 1743.09 2547.19 1867.7  1984.56 2045.33 2358.17 1050.88 1421.2  1946.83 2093.26
> BIOTIN  6978.78 7279.75 7220.51 8722.11 9062.96 8442.25 8455.43 8197.56 11915.3 13687.3 12582.9 12884.3 13716.5 12710.7 13519   13061.7 13325.7 13305.8 14005.4 14736   16656.5 16071.6 16082.8 16576.3 12724.8 12945   13507.2 12573   12713   12840.7 12969.9 11471.8
> CY3_HYB 3233.7  3539.25 3489.48 3690.17 3659.12 3567.58 3495.55 3637.83 4627.33 4840.17 4942.27 4967.1  5534.5  4872.86 5115.99 5174.79 5176.34 5177.32 5242.69 5413.66 5637.46 6012.44 5675.89 5625.28 4688.33 4656.13 4834.07 4987.29 4584.55 4530.13 4617.75 4065.53
> HOUSEKEEPING    12691.3 17449.3 15492.7 18813.7 15275.8 16440.9 12521.1 7093.01 12566.1 18260.7 13124.4 12132.2 9642.46 18259   8504.51 12559.6 16721.7 12861.5 12751.4 16050.2 12026.1 9285.31 13422.9 15044.1 13949.6 14376.2 13672.3 14656   8102.84 10304.4 12381.6 13525.8
> LABELING        95.8223 93.0128 95.2206 96.4703 94.7787 93.499  94.2896 96.351  124.036 124.988 125.079 122.569 122.38  125.969 121.753 121.724 119.831 110.09  115.534 113.135 111.39  110.596 107.5   111.8   109.309 105.768 111.535 109.78  101.358 107.106 107.802 104.707
> LOW_STRINGENCY_HYB      3248.83 3548.49 3499.77 3703.72 3676.67 3582.29 3512.34 3644.03 4646.39 4852.42 4971.08 4989.31 5563.07 4902.84 5126.24 5191.66 5206.69 5196.74 5277.94 5440.25 5676.79 6043.03 5724.67 5663.11 4736.44 4707.98 4886.81 5034.88 4632.57 4568.5  4664.51 4094.51
> ### END ###
> 
> On Nov 17, 2012, at 1:52 AM, Wei Shi wrote:
> 
> Dear Mark,
> 
> Could you provide the commands you used for reading the data, along with your session info? It would also help in diagnosing the problem if you could provide the first 10 lines of your sample data.
> 
> Cheers,
> 
> Wei
> 
> On Nov 17, 2012, at 6:16 AM, Mark Ebbert wrote:
> 
> Hi,
> 
> This is my first time analyzing Illumina microarray data, so I'm not familiar with the data format as exported by BeadStudio or GenomeStudio. The problem is that I can't get read.ilmn to read in either the sample data or the control data. The error I get when reading in the sample data alone is as follows:
> 
> Error in `rownames<-`(`*tmp*`, value = list(PROBE_ID = c("ILMN_2896528",  :
> length of 'dimnames' [1] not equal to array extent
> 
> When reading in the control data I get the following error:
> 
> Error in readGenericHeader(fname, columns = expr) :
> Specified column headings not found in file
> 
> This data comes from the Genome Technology Access Center at WashU and they claim the data is directly exported from GenomeStudio.
> 
> Could it be from unexpected characters in the file (e.g. R doesn't like #, -, and other characters in certain situations)? There aren't any #'s in the file, but I'm curious whether read.ilmn handles these cases. The sample names do have '-' in them, but I tried removing those and it didn't make a difference.
> 
> I appreciate your help!
> 
> Mark