[BioC] edgeR uneven group sizes
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Jul 6 01:01:03 CEST 2013
Dear Charles,
The link you give is to a user question. I replied to that post
explaining how to solve the problem without removing samples. Have you
not read my reply?
https://stat.ethz.ch/pipermail/bioconductor/2012-November/049087.html
The advice that I gave there applies also to your data.
The problem is that the model.matrix() function in R adds superfluous
(all-zero) columns to the design matrix that have to be removed manually.
In your case you have to remove the design columns for Disease subjects 3
and 4, because there are no such subjects. It is beyond the scope of the edgeR
package to rewrite the model.matrix() function, which is maintained by R
core, so I can only advise on work-arounds.
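A minimal base-R sketch of this work-around, assuming that Charles's formula ~group + group:subject + group:time maps onto the Condition, Subject and Time columns of his quoted sample_data table, all coded as factors. It rebuilds that table, builds the design and drops the columns that contain only zeros (here the ones for Disease subjects 3 and 4). Filtering on all-zero columns is one illustrative way to do this, not necessarily the exact code from the linked reply.

```r
# Time-course layout from the quoted sample_data table:
# 4 control subjects and 2 disease subjects, each sampled at 0, 1 and 2 hours.
sample_data <- data.frame(
  Condition = factor(rep(c("control", "Disease"), times = c(12, 6)),
                     levels = c("control", "Disease")),
  Subject   = factor(c(rep(1:4, each = 3), rep(1:2, each = 3))),
  Time      = factor(rep(c("0hr", "1hr", "2hr"), times = 6))
)

# ~group + group:subject + group:time, written with the column names above.
design_full <- model.matrix(~ Condition + Condition:Subject + Condition:Time,
                            data = sample_data)

# model.matrix() still creates columns for Disease subjects 3 and 4,
# which do not exist, so those columns are all zeros.
empty <- colSums(design_full != 0) == 0
print(colnames(design_full)[empty])

# Remove them; the remaining design should have full column rank.
design <- design_full[, !empty, drop = FALSE]
stopifnot(qr(design)$rank == ncol(design))

# With edgeR loaded and a DGEList `dge`, the reduced design is then used as usual:
# dge <- estimateGLMCommonDisp(dge, design)
```

The estimateGLMCommonDisp() call is left as a comment so the sketch runs without edgeR installed; it assumes a DGEList named dge, as in the quoted error report.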
Best wishes
Gordon
On Fri, 5 Jul 2013, Charles Determan Jr wrote:
> Gordon,
>
> The reason I ask is that I get an error when I use the design formula
> ~group + group:subject + group:time and run
> estimateGLMCommonDisp(dge, design):
>
> Error in glmFit.default(y, design = design, dispersion = dispersion,
> offset = offset, :
> Design matrix not of full rank. The following coefficients not estimable:
>
>
> The mailing list post I am referring to, with the same error, is at the
> following link:
> https://stat.ethz.ch/pipermail/bioconductor/2012-November/049055.html
>
> Am I simply writing the design formula incorrectly? I would still like to
> account for the subject variation.
>
> Regards,
> Charles
>
> On Thu, Jul 4, 2013 at 6:49 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Charles,
>>
>> There is no requirement in edgeR for equal group sizes, and never has
>> been. I am puzzled why you might think there is such an assumption. edgeR
>> always allows you to use all the available data that is scientifically
>> meaningful.
>>
>> You say that you read "the initial posting that led to this section of
>> the manual and it said to drop the samples that don't have equal numbers"
>> but I do not know what you are referring to. I have never seen such advice.
>>
>> Best wishes
>> Gordon
>>
>> Date: Wed, 3 Jul 2013 09:49:30 -0500
>>> From: Charles Determan Jr <deter088 at umn.edu>
>>> To: Bioconductor mailing list <bioconductor at r-project.org>
>>> Subject: [BioC] edgeR uneven group sizes
>>>
>>> Hello,
>>>
>>> I recently had a question regarding repeated-measures RNA-seq analysis.
>>> It was thoroughly answered through an extension of Section 3.5 of the
>>> edgeR manual. However, this has led me to another question, as I
>>> attempted to extend these concepts to another experiment in which the
>>> sample size differs between groups. For example, below are a data frame
>>> modified from the edgeR user manual's example of comparisons both between
>>> and within subjects (Section 3.5) and another containing specific time
>>> points, both re-numbered as recommended by the manual.
>>>
>>> > targets
>>>    Disease Patient Treatment
>>> 1   Healthy       1      None
>>> 2   Healthy       1   Hormone
>>> 3   Healthy       2      None
>>> 4   Healthy       2   Hormone
>>> 5   Healthy       3      None
>>> 6   Healthy       3   Hormone
>>> 7  Disease1       1      None
>>> 8  Disease1       1   Hormone
>>> 9  Disease1       2      None
>>> 10 Disease1       2   Hormone
>>> 11 Disease2       1      None
>>> 12 Disease2       1   Hormone
>>> 13 Disease2       2      None
>>> 14 Disease2       2   Hormone
>>> 15 Disease2       3      None
>>> 16 Disease2       3   Hormone
>>>
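For the targets table above, the same problem appears: Disease1 has no patient 3, so one interaction column is empty. A sketch, assuming all three columns are factors and assuming the Section 3.5 example uses the formula ~Disease + Disease:Patient + Disease:Treatment. Instead of testing for all-zero columns, it uses base R's pivoted QR decomposition, which also flags columns that are nonzero but linearly dependent on others.

```r
# Between/within-subjects layout from the quoted targets table:
# Healthy and Disease2 have three patients each, Disease1 only two.
targets <- data.frame(
  Disease   = factor(rep(c("Healthy", "Disease1", "Disease2"), times = c(6, 4, 6)),
                     levels = c("Healthy", "Disease1", "Disease2")),
  Patient   = factor(c(rep(1:3, each = 2), rep(1:2, each = 2), rep(1:3, each = 2))),
  Treatment = factor(rep(c("None", "Hormone"), times = 8),
                     levels = c("None", "Hormone"))
)

# Patients nested within disease group, plus a hormone effect per group
# (formula assumed to match the Section 3.5 example).
design_full <- model.matrix(~ Disease + Disease:Patient + Disease:Treatment,
                            data = targets)

# Pivoted QR: columns pivoted beyond the numerical rank cannot be estimated
# (here the all-zero column for Disease1 patient 3).
qr_full   <- qr(design_full)
redundant <- qr_full$pivot[-seq_len(qr_full$rank)]
print(colnames(design_full)[redundant])

# Drop them; guard against an empty index, which would drop every column.
design <- if (length(redundant) > 0) {
  design_full[, -redundant, drop = FALSE]
} else {
  design_full
}
stopifnot(qr(design)$rank == ncol(design))
```

The if () guard matters because indexing with -integer(0) selects no columns at all, so a design that is already of full rank would otherwise be emptied rather than returned unchanged.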
>>> > sample_data
>>>    Condition Subject Time
>>> 1    control       1  0hr
>>> 2    control       1  1hr
>>> 3    control       1  2hr
>>> 4    control       2  0hr
>>> 5    control       2  1hr
>>> 6    control       2  2hr
>>> 7    control       3  0hr
>>> 8    control       3  1hr
>>> 9    control       3  2hr
>>> 10   control       4  0hr
>>> 11   control       4  1hr
>>> 12   control       4  2hr
>>> 13   Disease       1  0hr
>>> 14   Disease       1  1hr
>>> 15   Disease       1  2hr
>>> 16   Disease       2  0hr
>>> 17   Disease       2  1hr
>>> 18   Disease       2  2hr
>>>
>>> I have read the initial posting that led to this section of the
>>> manual, and it said to drop samples so that the groups have equal
>>> numbers. This doesn't seem to be a big deal if only a sample or two is
>>> dropped from one group, but it could be a problem in cases such as the
>>> ones above, where dropping four or six samples is more of a sacrifice.
>>> I am also thinking of experiments (with repeated/dependent samples)
>>> where group sizes vary even more because samples are difficult to
>>> acquire. Are there any recommendations from the community for such a
>>> situation? Everything I have found assumes that each group contains
>>> the same number of samples.
>>>
>>> Regards,
>>> --
>>> Charles Determan
>>> Integrated Biosciences PhD Candidate
>>> University of Minnesota