[BioC] Pointers on importing peptide (protein) expression data

Tue Apr 26 12:52:43 CEST 2005

On Apr 26, 2005, at 6:06 AM, Jamie Sherman wrote:

> I'm new to BioConductor and have given the faqs a read through as well 
> as some of the tutorials but need someone to point me in the right 
> direction because the data I have is a little unusual. I'll explain 
> what I have and then what I would like to do.
>
> What I have.
> The data I have looks like this
>
> Protein_Name  Peptide_Mass: T1 T2 T3 T4 T5 T6 T7
>
> T[1-7]  is the expression ratio at time point [1-7] and is a ratio of 
> abundances of the sample to a reference.
>
> What I would like to do.
>
> This data seems similar to what you might get out of an array 
> experiment. I am wondering if I can load the data in a way that would 
> allow me to make use of the annotation package to attach GO 
> information and then use the applicable array analysis packages.

You will need a standard identifier for the proteins (such as an 
Entrez Gene ID or a RefSeq mRNA identifier).  Since microarrays are 
typically built around DNA, most of the annotation tools in 
Bioconductor are built around mRNA identifiers, not their protein 
counterparts.  Look at the AnnBuilder package for how to build an 
"annotation" package for your experiment.
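
Before building an annotation package, it can help to check how many of 
your proteins map to a standard identifier at all.  The following is 
only a rough sketch in base R, not the AnnBuilder workflow: the protein 
names, the id_map table and the identifier values are made-up 
placeholders, and in practice you would have to obtain the mapping from 
a suitable database yourself.

```r
# Hypothetical peptide table; names and masses are placeholders.
peptides <- data.frame(
  Protein_Name = c("ProtA", "ProtA", "ProtB", "ProtC"),
  Peptide_Mass = c(1045.6, 1310.7, 987.5, 1502.8),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from protein name to a gene-level identifier.
# The IDs below are placeholders, not real Entrez Gene IDs.
id_map <- data.frame(
  Protein_Name = c("ProtA", "ProtB"),
  EntrezGene   = c("1001", "1002"),
  stringsAsFactors = FALSE
)

# Left join: keep every peptide row, with NA where no identifier is known.
annotated <- merge(peptides, id_map, by = "Protein_Name", all.x = TRUE)

# Proteins that could not be mapped and so cannot be annotated yet.
unmapped <- unique(annotated$Protein_Name[is.na(annotated$EntrezGene)])
print(annotated)
print(unmapped)
```

Proteins that end up in unmapped would get no GO annotation, so it is 
worth resolving them before going further.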

>
> 	Is BioConductor a suitable tool for this?

Yes.

> 	What is the best way to load this data? (where should I be looking)
>

You can import your data as a tab-separated file using read.table.  
Just type ?read.table for help on using the function.
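
As a minimal sketch, assuming a tab-separated file with a header row in 
the layout you describe (the example rows written to a temporary file 
below are made up so that the code runs on its own; point read.table 
at your real file instead):

```r
# Made-up example rows in the described layout, written to a temp file
# purely so that this sketch is self-contained.
example_lines <- c(
  "Protein_Name\tPeptide_Mass\tT1\tT2\tT3\tT4\tT5\tT6\tT7",
  "ProtA\t1045.6\t1.0\t1.2\t1.5\t2.1\t2.0\t1.8\t1.1",
  "ProtA\t1310.7\t0.9\t1.1\t1.6\t2.3\t2.2\t1.7\t1.0",
  "ProtB\t987.5\t1.0\t0.8\t0.6\t0.5\t0.4\t0.6\t0.9"
)
infile <- tempfile(fileext = ".txt")
writeLines(example_lines, infile)

# Read the tab-separated file; the first line holds the column names.
peptides <- read.table(infile, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)

# Pull the seven time-point ratios into a numeric matrix with one row
# per peptide, labelled by protein name and peptide mass.
ratio_cols <- paste0("T", 1:7)
ratios <- as.matrix(peptides[, ratio_cols])
rownames(ratios) <- paste(peptides$Protein_Name,
                          peptides$Peptide_Mass, sep = "_")

# Ratios are commonly analysed on the log2 scale.
log_ratios <- log2(ratios)
str(peptides)
```

A numeric matrix with peptides in rows and time points in columns is 
the shape that most clustering functions in R expect.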

> 	Can you recommend analysis packages for clustering by GO and for 
> clustering on patterns in the protein expression data?
>

There are many ways to cluster data in R and Bioconductor.  hclust 
is a reasonable place to start.  The heatmap function clusters both 
samples and genes.  There is also the GoCluster package in 
Bioconductor.
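
A rough sketch of the hclust and heatmap route, using only base R.  The 
matrix here is random numbers standing in for log2 ratios, and the 
choices of a correlation-based distance, average linkage and two groups 
are assumptions for illustration rather than recommendations:

```r
set.seed(1)

# Random stand-in for a log2-ratio matrix: 6 peptides x 7 time points.
log_ratios <- matrix(rnorm(42), nrow = 6,
                     dimnames = list(paste0("pep", 1:6),
                                     paste0("T", 1:7)))

# Distance between peptides based on the correlation of their time
# profiles, so peptides with similar patterns end up close together.
d <- as.dist(1 - cor(t(log_ratios)))

# Hierarchical clustering of the peptides.
hc <- hclust(d, method = "average")

# Cut the tree into two groups (the number 2 is arbitrary here).
groups <- cutree(hc, k = 2)
print(groups)

# Heatmap with clustered rows; Colv = NA leaves the columns in their
# original order instead of clustering them.
heatmap(log_ratios, Colv = NA, scale = "row")
```

Setting Colv = NA is a deliberate choice for a time course, where the 
column order carries meaning; leave it out to cluster the samples too.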

The searchable archives of R and Bioconductor can also be very helpful.

1)  Searchable bioconductor archives
http://files.protsuggest.org/cgi-bin/biocond.cgi
2)  R site search (and archive search)
http://finzi.psych.upenn.edu/search.html

Sean