[BioC] read.ilmn error: length of 'dimnames' [1] not equal to array extent
Wei Shi
shi at wehi.EDU.AU
Sun Nov 18 11:28:13 CET 2012
Hi Mark,
Thanks for the info. Yes, your control data are definitely not in GenomeStudio format, so read.ilmn won't be able to read them in.
To use your control data for the normalization, you can read them in yourself (using read.table, for example), convert them to a matrix and then combine them (rbind) with the expression data of your regular probes (the 'E' component of the object returned by read.ilmn). You can then call 'neqc' to perform the normalization. However, you will have to provide a 'status' vector to neqc, with one entry per row of the combined matrix, specifying the probe type of each row. You can assign 'regular' to the gene probes and give the control probes their corresponding types using the information in your control data (e.g. 'negative', 'biotin', 'housekeeping' etc.).
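A minimal sketch of the combine-then-normalize steps described above, assuming the control file has already been read into a matrix `ctrl` with one row per control type and one column per array (the layout of the control excerpt quoted further down). `expr` is a toy stand-in for the 'E' component from read.ilmn: its probe IDs and values are placeholders, while the two control rows reuse values for the first two arrays of that excerpt. neqc estimates the background from the rows whose status is "negative", and the quoted excerpt has no NEGATIVE row, so the call is guarded rather than run unconditionally.

```r
# Toy stand-in for the 'E' component of the read.ilmn result:
# gene probes in rows, arrays in columns (values are placeholders).
expr <- matrix(c(120, 340, 95, 130, 310, 88), nrow = 3,
               dimnames = list(c("probe_1", "probe_2", "probe_3"),
                               c("8636 - 53R", "9431 - 80R")))

# Control matrix: one row per control type, same arrays as columns.
ctrl <- matrix(c(6978.78, 12691.3, 7279.75, 17449.3), nrow = 2,
               dimnames = list(c("BIOTIN", "HOUSEKEEPING"),
                               c("8636 - 53R", "9431 - 80R")))

# Put the control columns in the same array order before stacking the rows.
combined <- rbind(expr, ctrl[, colnames(expr), drop = FALSE])

# One status entry per row of 'combined': "regular" for gene probes and a
# lower-case probe type for each control row.
status <- c(rep("regular", nrow(expr)), tolower(rownames(ctrl)))

# neqc() from limma needs rows labelled "negative" to estimate the background,
# so only attempt it when such rows exist and limma is installed.
if ("negative" %in% status && requireNamespace("limma", quietly = TRUE)) {
  normalized <- limma::neqc(combined, status = status)
}
```

The key invariant is that `length(status) == nrow(combined)`, in the same row order.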
Hope you can work this out, but let me know if I can be of any further help.
Cheers,
Wei
On Nov 18, 2012, at 12:11 AM, Mark Ebbert wrote:
> Wei,
>
> I figured out part of the problem. The file with the sample data contains columns that read.ilmn is not expecting. Digging through the source code, I discovered that multiple column names *contain* the string "PROBE_ID". So when .read.oneilmnfile calls "pids <- x[, grep(tolower(probeid), tolower(colnames(x)))]", it gets multiple columns back, which causes the error when setting row names with "rownames(elist$E) <- pids". I deleted the extra column ("OBSOLETE_PROBE_ID") and was able to read in the sample data.
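The mechanism described above can be reproduced in base R with a toy data frame (the column layout and probe IDs are hypothetical, and probeid is set to "PROBE_ID" for illustration). Because grep() matches substrings, both PROBE_ID and OBSOLETE_PROBE_ID match, and subsetting with two indices returns a two-column data frame whose length (2, the number of columns) does not equal the number of rows. That is consistent with the "length of 'dimnames' [1] not equal to array extent" error in the original report. Dropping the extra column, as Mark did, avoids it; the exact-match line is only an illustration, not a change to limma.

```r
# Toy data frame with a column layout like the one Mark describes
# (column names and probe IDs are hypothetical).
x <- data.frame(PROBE_ID = c("ILMN_2896528", "probe_2", "probe_3"),
                OBSOLETE_PROBE_ID = c("old_1", "old_2", "old_3"),
                AVG_Signal = c(101.2, 88.4, 250.0),
                stringsAsFactors = FALSE)
probeid <- "PROBE_ID"

# Substring matching, as in the line quoted from .read.oneilmnfile:
# both "probe_id" and "obsolete_probe_id" match.
hits <- grep(tolower(probeid), tolower(colnames(x)))
pids <- x[, hits]    # a 2-column data frame, not a character vector
print(length(pids))  # 2 columns, but the expression matrix would have 3 rows

# Workaround used in the thread: drop the extra column before read.ilmn sees it.
x_fixed <- x[, colnames(x) != "OBSOLETE_PROBE_ID"]
out <- tempfile(fileext = ".txt")
write.table(x_fixed, out, sep = "\t", quote = FALSE, row.names = FALSE)

# Illustration only: an exact, case-insensitive match selects a single column.
exact <- which(tolower(colnames(x)) == tolower(probeid))
```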
>
> I still cannot read in the control data, however. I have included the first 10 lines below. After comparing this file to the example file from the Mammary Progenitor data, it's clear that my file is not in the same format. I'm not sure how to ask them to export the control data.
>
> Thanks for your help!
>
>
> ### BEGIN FIRST 10 LINES OF CONTROL DATA ###
> ID 8636 - 53R 9431 - 80R 8629 - 49L 9437 - 86R 9434 - 82L 9428 - 70L 9422 - 73L 8648 - 59R 9427 - 70R 9438 - 86L 8649 - 59L 8624 - 47R 9433 - 82R 9436 - 84L 8625 - 47L 9423 - 77R 9426 - 76L 9421 - 73R 8637 - 53L 9425 - 76R 8640 - 42R 8620 - 45R 9424 - 77L 9435 - 84R 9430 - 78L 9429 - 78R 8628 - 49R 9432 - 80L 8631 - 50L 8641 - 42L 8630 - 50R 8621 - 45L
> Detected Genes (0.01) 6189 7056 6573 7359 6734 6758 6058 5160 6025 7469 6106 5919 5781 7609 5284 5782 6583 6433 6182 6566 6263 6101 6200 7262 6148 6579 6539 6926 5585 5948 5978 6866
> Detected Genes (0.05) 7486 8522 8125 8697 8152 8049 7471 6476 7591 8940 7468 7366 7076 8911 6698 7178 8000 7756 7534 8070 7741 7477 7673 8538 7651 7943 8082 8406 7110 7370 7704 8487
> Signal Average 563.274 700.856 646.443 724.202 641.452 572.532 535.633 316.646 588.907 751.397 595.575 601.884 423.521 701.26 399.673 475.459 620.766 524.632 599.798 538.671 571.853 434.97 531.654 684.812 540.199 552.591 582.514 633.396 357.146 459.406 556.973 578.572
> Signal P05 88.4618 87.9676 90.4178 88.9003 89.4656 87.836 87.8903 88.0403 115.642 116.99 116.533 114.427 114.204 117.668 110.959 111.491 111.402 102.736 106.848 107.111 104.073 101.953 101.687 103.844 101.826 99.6758 103.058 102.988 94.866 99.949 98.9388 95.6721
> Signal P25 99.773 99.2284 101.83 100.656 100.027 98.3976 97.9409 97.7289 130.536 131.414 130.087 127.673 126.811 131.573 123.858 125.926 125.11 114.428 119.247 119.54 115.425 113.579 113.176 116.577 114.845 111.179 114.728 114.671 105.125 111.011 110.364 108.536
> Signal P50 112.893 115.941 118.002 118.766 114.809 113.539 110.227 106.945 145.676 150.74 143.768 141.147 138.796 151.626 134.729 139.328 141.859 128.488 133.425 135.119 129.605 125.757 127.021 134.554 129.241 125.657 129.745 130.51 115.146 123.399 123.949 126.521
> Signal P75 226.307 306.497 277.366 330.317 283.021 260.861 214.124 144.805 269.787 371.962 262.388 257.241 217.723 366.383 188.87 227.385 298.45 242.636 265.713 269.495 256.326 205.254 237.958 334.767 257.626 263.459 276.011 303.819 170.6 212.57 243.08 298.179
> Signal P95 1933.16 2683.67 2362.11 2746.05 2430.87 2054.33 1849.95 821.062 1983.76 2773.07 1939.22 2041.77 1301.86 2547.98 1144.34 1402.24 2241.74 1667.22 2061.29 1787.84 1955.92 1301.94 1743.09 2547.19 1867.7 1984.56 2045.33 2358.17 1050.88 1421.2 1946.83 2093.26
> BIOTIN 6978.78 7279.75 7220.51 8722.11 9062.96 8442.25 8455.43 8197.56 11915.3 13687.3 12582.9 12884.3 13716.5 12710.7 13519 13061.7 13325.7 13305.8 14005.4 14736 16656.5 16071.6 16082.8 16576.3 12724.8 12945 13507.2 12573 12713 12840.7 12969.9 11471.8
> CY3_HYB 3233.7 3539.25 3489.48 3690.17 3659.12 3567.58 3495.55 3637.83 4627.33 4840.17 4942.27 4967.1 5534.5 4872.86 5115.99 5174.79 5176.34 5177.32 5242.69 5413.66 5637.46 6012.44 5675.89 5625.28 4688.33 4656.13 4834.07 4987.29 4584.55 4530.13 4617.75 4065.53
> HOUSEKEEPING 12691.3 17449.3 15492.7 18813.7 15275.8 16440.9 12521.1 7093.01 12566.1 18260.7 13124.4 12132.2 9642.46 18259 8504.51 12559.6 16721.7 12861.5 12751.4 16050.2 12026.1 9285.31 13422.9 15044.1 13949.6 14376.2 13672.3 14656 8102.84 10304.4 12381.6 13525.8
> LABELING 95.8223 93.0128 95.2206 96.4703 94.7787 93.499 94.2896 96.351 124.036 124.988 125.079 122.569 122.38 125.969 121.753 121.724 119.831 110.09 115.534 113.135 111.39 110.596 107.5 111.8 109.309 105.768 111.535 109.78 101.358 107.106 107.802 104.707
> LOW_STRINGENCY_HYB 3248.83 3548.49 3499.77 3703.72 3676.67 3582.29 3512.34 3644.03 4646.39 4852.42 4971.08 4989.31 5563.07 4902.84 5126.24 5191.66 5206.69 5196.74 5277.94 5440.25 5676.79 6043.03 5724.67 5663.11 4736.44 4707.98 4886.81 5034.88 4632.57 4568.5 4664.51 4094.51
> ### END ###
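A sketch for reading a file with this layout, assuming the real export is tab-delimited (the spaces inside sample names such as "8636 - 53R" suggest that the separators are tabs that were flattened when the lines were pasted into the email). A few lines of the excerpt are inlined as text so the example runs on its own; row.names = 1 uses the ID column for the row names and check.names = FALSE keeps the sample names unchanged. The "Detected Genes" and "Signal ..." rows appear to be per-array summary statistics rather than control-probe types, so they are set aside here.

```r
# A few lines of the control excerpt, assumed to be tab-delimited.
ctrl_txt <- paste(
  "ID\t8636 - 53R\t9431 - 80R",
  "Detected Genes (0.01)\t6189\t7056",
  "Signal Average\t563.274\t700.856",
  "BIOTIN\t6978.78\t7279.75",
  "CY3_HYB\t3233.7\t3539.25",
  sep = "\n")

# First column becomes row names; sample names are kept verbatim.
ctrl <- read.delim(text = ctrl_txt, row.names = 1, check.names = FALSE)

# Set aside the per-array summary rows, keeping one row per control type.
is_summary <- grepl("^(Detected Genes|Signal )", rownames(ctrl))
ctrl_mat <- as.matrix(ctrl[!is_summary, , drop = FALSE])
print(ctrl_mat)
```

For the real file, replace `text = ctrl_txt` with the file path.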
>
> On Nov 17, 2012, at 1:52 AM, Wei Shi wrote:
>
> Dear Mark,
>
> Could you provide the commands you used for reading the data and also your session info? It would also help in dissecting the problem if you could provide the first 10 lines of your sample data.
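For reference, both pieces of information requested above can be gathered from an R session. The file and its two lines of content below are placeholders written to a temporary file so the snippet runs on its own; substitute the path of the actual export.

```r
# Placeholder file standing in for the GenomeStudio sample-probe export.
f <- tempfile(fileext = ".txt")
writeLines(c("PROBE_ID\tAVG_Signal", "ILMN_2896528\t101.2"), f)

# First 10 lines of the file (fewer if the file is shorter), printed verbatim.
first_lines <- readLines(f, n = 10)
writeLines(first_lines)

# R version, platform and versions of attached and loaded packages.
sessionInfo()
```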
>
> Cheers,
>
> Wei
>
> On Nov 17, 2012, at 6:16 AM, Mark Ebbert wrote:
>
> Hi,
>
> This is my first time analyzing Illumina microarray data, so I'm not familiar with the data format as exported by BeadStudio or GenomeStudio. The problem is that I can't get read.ilmn to read in either the sample data or the control data. The error I get when reading in the sample data alone is as follows:
>
> Error in `rownames<-`(`*tmp*`, value = list(PROBE_ID = c("ILMN_2896528", :
> length of 'dimnames' [1] not equal to array extent
>
> When reading in the control data I get the following error:
>
> Error in readGenericHeader(fname, columns = expr) :
> Specified column headings not found in file
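One way to check a file before calling read.ilmn is to look at its header line for the expression column name. This sketch assumes read.ilmn's expr argument is left at its default, "AVG_Signal" (check ?read.ilmn for your limma version), and uses a temporary file holding the first columns of the control excerpt quoted above. A header with no AVG_Signal columns, like this one, is consistent with the "Specified column headings not found in file" error.

```r
# Temporary file holding two columns of the control excerpt (tab-delimited).
f <- tempfile(fileext = ".txt")
writeLines(c("ID\t8636 - 53R\t9431 - 80R",
             "BIOTIN\t6978.78\t7279.75"), f)

# Split the first line into column headings.
header <- strsplit(readLines(f, n = 1), "\t", fixed = TRUE)[[1]]

# Assumed default of read.ilmn's 'expr' argument.
expr_pattern <- "AVG_Signal"
has_expr <- any(grepl(expr_pattern, header, fixed = TRUE))
print(has_expr)  # FALSE for this layout
```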
>
> This data comes from the Genome Technology Access Center at WashU and they claim the data is directly exported from GenomeStudio.
>
> Could it be from unexpected characters in the file (e.g. R doesn't like #, -, and other characters in certain situations)? There aren't any #'s in the file, but I'm curious whether read.ilmn handles these cases. The sample names do have '-' in them, but I tried removing those and it didn't make a difference.
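On the question about special characters: in base R, spaces and '-' in column headings only change the names if the reading function applies make.names(), which read.table and read.delim do by default (check.names = TRUE). Whether read.ilmn does so internally is not assumed here; the toy two-line input below only shows the base-R behavior, using a sample name from the thread.

```r
# Two-line, tab-delimited toy input with a sample name from the thread.
txt <- "ID\t8636 - 53R\nBIOTIN\t6978.78"

# Default check.names = TRUE runs the headings through make.names().
default_names <- names(read.delim(text = txt))
# check.names = FALSE keeps the headings exactly as written.
kept_names <- names(read.delim(text = txt, check.names = FALSE))

print(default_names)  # "ID" "X8636...53R"
print(kept_names)     # "ID" "8636 - 53R"
```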
>
> I appreciate your help!
>
> Mark
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>