[BioC] ComBat: Could it utilize technical replicates?
Essi Laajala
essi.laajala at gmail.com
Mon Aug 12 12:29:11 CEST 2013
Dear Evan,
Thank you for your message! That's a good plan, but wouldn't it lead to a
singularity error? At least that is what happens for me. I had been using
the old ComBat script and have now tried the Bioconductor version as well.
I've attached the real sample_info file. (Sorry, it's a bit complicated:
there are actually 4 batches, and batch 4 is the one with the
re-hybridizations. I have 113 samples, 15 of which were re-hybridized, so
128 arrays altogether. The "high risk" group has the label E, "medium risk"
is T and "low risk" is P.) Here's what I did with the Bioconductor ComBat:
> library(sva)
> b <- sample_info[,"Batch"]
> mm <- model.matrix(~as.factor(Covariate1), data=sample_info)
> data_combat <- ComBat(exprs_data, b, mm)
Found 4 batches
Found 112 categorical covariate(s)
Standardizing Data across genes
Error in solve.default(t(design) %*% design) :
Lapack routine dgesv: system is exactly singular: U[51,51] = 0
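One possible way to see where such a singularity can come from is to check the rank of a design matrix on a small made-up sample sheet. This is only a sketch: the `toy` data frame and the `design_rank` helper are hypothetical, and the design (batch indicators next to sample indicators without the intercept) only approximates what ComBat builds internally. In the toy sheet, batch 2 shares no sample with any other batch, so its indicator column equals the sum of its samples' columns.

```r
# Hypothetical toy sample sheet (not the real data): samples A, B, F and G
# were re-hybridized on batch 4, but batch 2 (D, E) has no replicates.
toy <- data.frame(
  Batch  = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4),
  Sample = c("A", "B", "C", "D", "E", "F", "G", "A", "B", "F", "G")
)

# Hypothetical helper: bind batch indicators to sample indicators
# (intercept dropped) and report the rank next to the column count.
design_rank <- function(info) {
  batch_mod  <- model.matrix(~ 0 + factor(Batch), data = info)
  sample_mod <- model.matrix(~ factor(Sample), data = info)[, -1, drop = FALSE]
  design     <- cbind(batch_mod, sample_mod)
  c(rank = qr(design)$rank, columns = ncol(design))
}

# Rank is one less than the number of columns, so t(design) %*% design
# cannot be inverted.
res_before <- design_rank(toy)

# Simple heuristic: which batches contain at least one replicated sample?
shared        <- names(which(table(toy$Sample) > 1))
has_replicate <- sapply(split(toy$Sample %in% shared, toy$Batch), any)

# Re-hybridizing one batch 2 sample on batch 4 links batch 2 to the rest
# and restores full column rank in this toy example.
toy_fixed <- rbind(toy, data.frame(Batch = 4, Sample = "D"))
res_after <- design_rank(toy_fixed)

print(res_before)
print(has_replicate)
print(res_after)
```

In the attached sample sheet, the four batch 2 arrays (E121, E125, E128, E133) each appear only once, so the same kind of check might be worth running on the real design before calling ComBat.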
Best regards,
Essi
On Sun, Aug 11, 2013 at 2:57 AM, Johnson, William Evan <wej at bu.edu> wrote:
> Hi Essi,
>
> Yes, ComBat can definitely utilize this information. Just replace your
> current 'Covariate 1' with a covariate that just has the sample letter
> (e.g. A, B, C, C, D, D, E, ... ). Note that this will be sufficient because
> your 'Covariate 1' is nested within sample letter. Under this setup, ComBat
> will preserve all variation due to sample type (and as a result risk level)
> and effectively just use the repeated samples to adjust for batch.
>
> Hope this helps. Thanks!
>
> Evan
>
>
> On Aug 9, 2013, at 8:23 AM, Essi Laajala wrote:
>
> > Hi,
> >
> > I'm dealing with quite an unusual study design. Originally (due to
> unfortunate and inevitable circumstances) we had all "high_risk" and
> "medium_risk" samples on batch 1 and "low_risk" samples on batch 2. Then we
> discussed the batch effect and decided to re-hybridize some randomly
> selected samples from each risk group on batch 3. The resulting study
> design looks a bit like this (in reality we have 30 - 45 samples in
> each group and 16 samples re-hybridized but you'll get an idea):
> >
> > Array name   Batch   Covariate 1
> > sample_A     1       High_risk
> > sample_B     1       High_risk
> > sample_C     1       High_risk
> > sample_C_2   3       High_risk
> > sample_D     1       High_risk
> > sample_D_2   3       High_risk
> > sample_E     1       Medium_risk
> > sample_F     1       Medium_risk
> > sample_G     1       Medium_risk
> > sample_G_2   3       Medium_risk
> > sample_H     2       Low_risk
> > sample_I     2       Low_risk
> > sample_J     2       Low_risk
> > sample_J_2   3       Low_risk
> > sample_K     2       Low_risk
> > sample_K_2   3       Low_risk
> >
> > For example, sample_C and sample_C_2 are the same RNA sample, and the only
> difference between them is the batch (the same applies to D and D_2 etc.).
> Such array pairs should be valuable for estimating batch effects. The
> question is: Can ComBat utilize this information? Or can you recommend some
> other batch correction method that could? For now, I've applied ComBat
> after removing the replicated samples on batches 1 and 2 (C, D, G, J and K
> in the above example), but this is certainly not an optimal solution.
> >
> > Best regards,
> >
> > Essi Laajala
> > PhD student in bioinformatics
> > Turku, Finland
> >
>
>
-------------- next part --------------
"Batch" "Covariate1"
"E0107_53_E05.CEL" 1 "E0107"
"E0112_54_E06.CEL" 1 "E0112"
"E0116_55_E07.CEL" 1 "E0116"
"E022_13_B01.CEL" 1 "E022"
"E024_14_B02.CEL" 1 "E024"
"E025_15_B03.CEL" 1 "E025"
"E026_16_B04.CEL" 1 "E026"
"E027_17_B05.CEL" 1 "E027"
"E029_130023_E07.CEL" 4 "E029"
"E029_18_B06.CEL" 1 "E029"
"E031_19_B07.CEL" 1 "E031"
"E033_20_B08.CEL" 1 "E033"
"E036_21_B09.CEL" 1 "E036"
"E044_22_B10.CEL" 1 "E044"
"E048_23_B11.CEL" 1 "E048"
"E049_24_B12.CEL" 1 "E049"
"E050_25_C01.CEL" 1 "E050"
"E051_26_C02.CEL" 1 "E051"
"E052_27_C03.CEL" 1 "E052"
"E054_28_C04.CEL" 1 "E054"
"E057_30_C06.CEL" 1 "E057"
"E060_31_C07.CEL" 1 "E060"
"E061_32_C08.CEL" 1 "E061"
"E063_33_C09.CEL" 1 "E063"
"E066_34_C10.CEL" 1 "E066"
"E067_130023_F07.CEL" 4 "E067"
"E067_35_C11.CEL" 1 "E067"
"E068_36_C12.CEL" 1 "E068"
"E069_37_D01.CEL" 1 "E069"
"E070_38_D02.CEL" 1 "E070"
"E071_39_D03.CEL" 1 "E071"
"E074_40_D04.CEL" 1 "E074"
"E082_41_D05.CEL" 1 "E082"
"E083_42_D06.CEL" 1 "E083"
"E086_130023_G07.CEL" 4 "E086"
"E086_43_D07.CEL" 1 "E086"
"E088_44_D08.CEL" 1 "E088"
"E091_45_D09.CEL" 1 "E091"
"E093_46_D10.CEL" 1 "E093"
"E096_47_D11.CEL" 1 "E096"
"E098_48_D12.CEL" 1 "E098"
"E102_49_E01.CEL" 1 "E102"
"E104_130023_H07.CEL" 4 "E104"
"E104_50_E02.CEL" 1 "E104"
"E105_51_E03.CEL" 1 "E105"
"E106_52_E04.CEL" 1 "E106"
"E118_56_E08.CEL" 1 "E118"
"E120_57_E09.CEL" 1 "E120"
"E121_58_110049_E09.CEL" 2 "E121"
"E125_59_110049_F09.CEL" 2 "E125"
"E128_60_110049_G09.CEL" 2 "E128"
"E133_61_110049_H09.CEL" 2 "E133"
"P001_130003_A05.CEL" 3 "P001"
"P005_130003_B05.CEL" 3 "P005"
"P009_130003_C05.CEL" 3 "P009"
"P014_130003_D05.CEL" 3 "P014"
"P017_130003_E05.CEL" 3 "P017"
"P017_130023_A05.CEL" 4 "P017"
"P020_130003_G05.CEL" 3 "P020"
"P021_130003_H05.CEL" 3 "P021"
"P024_130003_A07.CEL" 3 "P024"
"P025_130003_B07.CEL" 3 "P025"
"P025_130023_C05.CEL" 4 "P025"
"P026_130003_C07.CEL" 3 "P026"
"P027_130003_D07.CEL" 3 "P027"
"P028_130003_E07.CEL" 3 "P028"
"P030_130003_F07.CEL" 3 "P030"
"P031_130003_G07.CEL" 3 "P031"
"P033_130003_H07.CEL" 3 "P033"
"P033_130023_D05.CEL" 4 "P033"
"P035_130003_A09.CEL" 3 "P035"
"P036_130003_B09.CEL" 3 "P036"
"P039_130003_C09.CEL" 3 "P039"
"P041_130003_D09.CEL" 3 "P041"
"P041_130023_E05.CEL" 4 "P041"
"P042_130003_E09.CEL" 3 "P042"
"P042_130023_F05.CEL" 4 "P042"
"P044_130003_F09.CEL" 3 "P044"
"P045_130003_G09.CEL" 3 "P045"
"P046_130003_H09.CEL" 3 "P046"
"P047_130003_A05.CEL" 3 "P047"
"P048_130003_B05.CEL" 3 "P048"
"P052_130003_C05.CEL" 3 "P052"
"P054_130003_D05.CEL" 3 "P054"
"P055_130003_E05.CEL" 3 "P055"
"P056_130003_F05.CEL" 3 "P056"
"P061_130003_G05.CEL" 3 "P061"
"P063_130003_H05.CEL" 3 "P063"
"P066_130003_A07.CEL" 3 "P066"
"P066_130023_G05.CEL" 4 "P066"
"P067_130003_B07.CEL" 3 "P067"
"P070_130003_C07.CEL" 3 "P070"
"P072_130003_D07.CEL" 3 "P072"
"P073_130003_E07.CEL" 3 "P073"
"P074_130003_F07.CEL" 3 "P074"
"P075_130003_G07.CEL" 3 "P075"
"P077_130003_H07.CEL" 3 "P077"
"P082_130003_A09.CEL" 3 "P082"
"P082_130023_H05.CEL" 4 "P082"
"T021_130023_A07.CEL" 4 "T021"
"T021_68_F04.CEL" 1 "T021"
"T032_71_F07.CEL" 1 "T032"
"T038_72_F08.CEL" 1 "T038"
"T056_73_F09.CEL" 1 "T056"
"T059_74_F10.CEL" 1 "T059"
"T062_75_F11.CEL" 1 "T062"
"T063_76_F12.CEL" 1 "T063"
"T064_77_G01.CEL" 1 "T064"
"T065_78_G02.CEL" 1 "T065"
"T066_79_G03.CEL" 1 "T066"
"T069_80_G04.CEL" 1 "T069"
"T070_81_G05.CEL" 1 "T070"
"T071_82_G06.CEL" 1 "T071"
"T073_83_G07.CEL" 1 "T073"
"T076_84_G08.CEL" 1 "T076"
"T077_85_G09.CEL" 1 "T077"
"T078_130023_B07.CEL" 4 "T078"
"T078_86_G10.CEL" 1 "T078"
"T093_110049_H09.CEL" 1 "T093"
"T094_90_H02.CEL" 1 "T094"
"T095_130023_C07.CEL" 4 "T095"
"T095_91_H03.CEL" 1 "T095"
"T099_93_H05.CEL" 1 "T099"
"T103_96_H08.CEL" 1 "T103"
"T106_98_H10.CEL" 1 "T106"
"T109_130023_D07.CEL" 4 "T109"
"T109_99_H11.CEL" 1 "T109"
"T111_100_H12.CEL" 1 "T111"