[BioC] ComBat
W. Evan Johnson
wej at bu.edu
Thu Nov 1 22:57:31 CET 2012
Hi Tam,
Sorry about the confusion. Two items:
1. Adding a column of numbers (row numbers) is a known "harmless" bug in the ComBat function. It basically adds a column of row numbers to your dataset, which can easily be deleted in Excel (and the data can be shifted over). However, I recognize that your case is a little different, hence:
2. On line 3154, your gene description has some character formatting that is causing issues with R's "read.table" function, which is used by ComBat. I think it's reading the apostrophe as the start of quoted text, so it then concatenates everything after that as text until the next apostrophe. Anyway, here is how you fix it: open up your dataset ('12arraysCombatImputed_2.txt') in Excel, then save it as a .csv. Then run ComBat using the option: type='csv'. Alternatively, you can remove the second column from your dataset for ComBat and then add it back in after adjustment.
I just did this myself on your data and it worked. However, you still need to delete the column of numbers from item #1 before the data are ready to go!
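A minimal sketch of the quoting behavior described in item 2, using made-up data rather than the actual file. It assumes ComBat reads the tab-delimited input with read.table's default quote setting, which is not confirmed here; the column names and gene descriptions are hypothetical.

```r
# Toy tab-delimited data (hypothetical). The apostrophes in "3' UTR" and
# "5' cap" mimic the kind of gene description that can trip up read.table.
lines <- c(
  "id\tdesc\tvalue",
  "g1\tkinase\t1.5",
  "g2\t3' UTR binding protein\t2.0",
  "g3\tphosphatase\t0.7",
  "g4\t5' cap binding protein\t1.1"
)

# Default quoting (quote = "\"'"): an apostrophe can open a quoted string
# that runs until the next apostrophe. Wrapped in try() because the exact
# outcome (merged rows, warning, or error) depends on the data.
bad <- try(
  suppressWarnings(read.table(text = lines, header = TRUE, sep = "\t",
                              stringsAsFactors = FALSE)),
  silent = TRUE
)
if (!inherits(bad, "try-error")) print(dim(bad))

# Quoting disabled: every input line stays its own row.
good <- read.table(text = lines, header = TRUE, sep = "\t", quote = "",
                   stringsAsFactors = FALSE)
print(dim(good))

# read.csv treats only the double quote as a quoting character by default,
# which would explain why the .csv route sidesteps the problem.
csv_lines <- gsub("\t", ",", lines, fixed = TRUE)
good_csv <- read.csv(text = csv_lines, stringsAsFactors = FALSE)
print(dim(good_csv))
```

If the dimensions printed for the default read differ from the other two, the apostrophes are being treated as quotes. Comparing row counts this way on the real file is a quick check before running ComBat.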
Also, I noticed that your variances are not well-behaved (see the plot that comes up), so I'd recommend that you use a non-parametric prior (par.prior=F). Note that this may take an hour or so to run, so make sure that the parametric prior is working before you try the non-parametric one.
Thanks!
Evan
On Oct 31, 2012, at 10:33 AM, SSc Array Core wrote:
> I am running Combat on the attached files. 2 channel array (with reference). Problem is the adjusted file returns an added column of numbers where CLID should be. This column then stops delivering said numbers around line 3154, returning to CLID, shifting all the information and data to the left. I am stumped as to why this is happening. Please advise.
>
> thanks,
>
> tam
>
> Reading Sample Information File
> Reading Expression Data File
> Found 2 batches
> Found 1 covariate(s)
> Found 260 Missing Data Values
> Standardizing Data across genes
> Fitting L/S model and finding priors
> Finding parametric adjustments
> Adjusting the Data
> Adjusted data saved in file: Adjusted_12arraysCombatImputed_2.txt_.xls
> > ComBat('12arraysCombatImputed_2.txt','sample_info_file_mouse.txt',skip=2,write=T)
> <12arraysCombatImputed_2.txt><sample_info_file_mouse.txt><Adjusted_12arraysCombatImputed_2.txt_.xls>