Thank you very much, Sean. I have been working with the aggregate() function
and it is exactly what I need. However, there is still one annoying detail
that I cannot get rid of. I hope you can help me with this too.
I have this text file:

A B C D
d1 2 23 2
d1 4 22 2
d1 5 24 2
d2 10 7 2
d2 20 8 3
d1 7 23 2
d3 2 14 30
d3 4 14 50
d2 30 8 4
d4 12 13 15
d5 1 5 90
d2 40 7 3
d6 34 2 5

(I use it as a test)
If I type:
> data<-read.table("test.txt",sep="\t")
> agr<-aggregate(data[2:4], by=list(data$V1), FUN=mean)

I get 21 warning messages and all the values are "NA", including header B,
C, and D.
However, if I remove the header line (A, B, C, D) from the file and type the
same commands, it works perfectly and I get what I wanted. The problem is
that the real datasets I need to work with are very large, and it is
difficult to remove and re-add the headers without risking a mistake.
Is there a command or argument that I should pass to the function in order
to solve this issue?

Thank you so much,

Hernando
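
A likely cause, offered as a sketch rather than a confirmed diagnosis:
read.table() defaults to header = FALSE, so the line "A B C D" is read as a
data row. Every column then holds text rather than numbers, and mean()
returns NA with a warning for each group and column (7 groups including "A",
times 3 columns, would account for the 21 warnings). Passing header = TRUE
should avoid editing the files. The example below inlines the test data
through the text argument so that it runs on its own; for the real file the
call would presumably be read.table("test.txt", sep = "\t", header = TRUE),
assuming the file really is tab-separated.

```r
# Test data from the message, inlined so the example is self-contained.
# Assumption: whitespace-separated here; the real file may be tab-separated.
txt <- "A B C D
d1 2 23 2
d1 4 22 2
d1 5 24 2
d2 10 7 2
d2 20 8 3
d1 7 23 2
d3 2 14 30
d3 4 14 50
d2 30 8 4
d4 12 13 15
d5 1 5 90
d2 40 7 3
d6 34 2 5"

# header = TRUE makes the first line the column names instead of a data row,
# so B, C and D are read as numeric columns.
data <- read.table(text = txt, header = TRUE)

# Group by column A (now addressed by name, not V1) and average B, C, D.
agr <- aggregate(data[2:4], by = list(A = data$A), FUN = mean)
print(agr)
```

With header = TRUE the grouping column is called A rather than V1, so
data$V1 in the original command has to become data$A.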

2010/6/11 Sean Davis <sdavis2@mail.nih.gov>

>
>
> On Fri, Jun 11, 2010 at 8:12 AM, Hernando Martínez <hernybiotec@gmail.com> wrote:
>
>> Hello everyone, my name is Hernando, and I am new to R. I have a little
>> problem that maybe you can help me with, as I have been looking through
>> the packages with no success, and it shouldn't be very difficult to solve.
>> I have a text file containing a list of genes, with expression values for
>> each along a set of microarray experiments. Ex:
>>
>> geneID     sample1     sample2     ....
>> gene1      45          58          ....
>> gene1      43          63          ....
>> gene2      32          21          ....
>> ....       ....        ....        ....
>>
>> In this list, some genes are repeated, but with different values (as in
>> the example). These repetitions come from different probes targeting the
>> same gene.
>> What I want is a new text file in which each gene appears only once, with
>> one of three possibilities for the expression values of repeated genes:
>>
>> - Each value (for each column, i.e. sample) is the average of the previous
>> values (in the example, gene1 should be 44 in sample 1 and 60.5 in
>> sample 2).
>> - Instead of the average, the median.
>> - The highest values.
>>
>> I would prefer the median or the average, but I don't know if getting the
>> highest values is easier.
>>
>> I have seen the "findLargest" function of the "genefilter" package, but it
>> works with probes and my files are already converted to geneIDs.
>>
>>
> Hi, Hernando.  Have a look at the aggregate() function.
>
> Sean
>
>
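
The three summaries asked for in the quoted message can be sketched with
aggregate() by changing only the FUN argument. The data frame below is a
hypothetical reconstruction of the three example rows from that message (the
column names sample1 and sample2 and the output file name are assumptions),
so treat it as an illustration rather than a tested pipeline for real
microarray data.

```r
# Hypothetical reconstruction of the example rows from the quoted message.
expr <- data.frame(
  geneID  = c("gene1", "gene1", "gene2"),
  sample1 = c(45, 43, 32),
  sample2 = c(58, 63, 21)
)

# Group on geneID; expr[-1] selects every column except the first (geneID).
by_gene <- list(geneID = expr$geneID)

gene_mean   <- aggregate(expr[-1], by = by_gene, FUN = mean)    # average
gene_median <- aggregate(expr[-1], by = by_gene, FUN = median)  # median
gene_max    <- aggregate(expr[-1], by = by_gene, FUN = max)     # highest value

print(gene_mean)

# Write one of the results to a tab-separated text file.
# The file name is an assumption; a temporary directory is used here.
out_file <- file.path(tempdir(), "genes_mean.txt")
write.table(gene_mean, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Each result has one row per gene, which matches the "each gene appearing
only once" requirement; which of the three summaries to keep is a choice for
the analysis, not something the code decides.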



-- 
Hernando Martínez Vergara


