[BioC] Ringo/Starr getProfiles function

Mon Dec 14 02:59:47 CET 2009

Dear Wolfgang,

Thank you for your response.  Sorry about the "sessionInfo()" mistake... I think I included it earlier, but I will include it each time from now on (it is at the bottom of this message).  Benedikt Zacher caught my NA error earlier: my probeAnno object used one chromosome naming scheme (Affymetrix nomenclature) and the tssAnno used another (Ensembl nomenclature).  I am still struggling somewhat with other issues, if you should choose to read on.  Thank you very much for your time and assistance.

Sincerely,

Noah

Dear Benedikt,

Maybe it will come to me sending the R object to you, but I think I am close to understanding the workflow a little better.  I think I just missed some of the essential early steps needed to use Starr (sorry, I am new to R and trying to get up to speed quickly).  I think the following workflow would be helpful to include in the vignette (or maybe I am getting this wrong).  Here is what I am going to try (I left out some of the commands that load or read files):

### fire up R and create a raw annotation file using biomaRt

	library(biomaRt)

	ensembl = useMart("fungal_mart_3")

	ensembl = useDataset("scerevisiae_eg_gene", mart = ensembl)

	chrom <- c("I")

	transcriptAnno <- getBM(attributes = c("ensembl_gene_id", "chromosome_name",
						"strand", "transcript_start", "transcript_end"),
						filters = "chromosome_name", values = chrom, mart = ensembl)

	transAnnoChr = transcriptAnno

	transAnnoChr[,2][transAnnoChr[,2]=="I"] <- "Sc:Oct_2003;chr1"
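If more than one chromosome is needed, the same renaming could be done with a lookup vector instead of one assignment per chromosome.  This is only a sketch on a made-up table: toyAnno, chrMap and renamed are hypothetical names, and the "Sc:Oct_2003;chr2" entry is my guess at the naming pattern, which would need checking against the names actually stored in the bpmap file.

```r
# Toy stand-in for the getBM() result (values are made up).
toyAnno <- data.frame(
  ensembl_gene_id = c("geneA", "geneB", "geneC"),
  chromosome_name = c("I", "II", "I"),
  stringsAsFactors = FALSE
)

# Lookup from Ensembl-style names to bpmap-style names.
# Only the "I" entry comes from my data; the "II" entry is an assumed pattern.
chrMap <- c("I" = "Sc:Oct_2003;chr1", "II" = "Sc:Oct_2003;chr2")

# Index the lookup by name; chromosomes without a mapping come back as NA.
renamed <- unname(chrMap[toyAnno$chromosome_name])

# Stop early if any chromosome name had no mapping.
stopifnot(!any(is.na(renamed)))

toyAnno$chromosome_name <- renamed
```

The stopifnot() check is there because an unmapped name would otherwise turn into a silent NA.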

### change the column names to match your file in the Starr vignette and then put them in the right order:

	names(transAnnoChr) = c("name", "chr", "strand", "start", "end")

	chrOnly    = transAnnoChr[,2]
	startOnly  = transAnnoChr[,4]
	endOnly    = transAnnoChr[,5]
	strandOnly = transAnnoChr[,3]
	nameOnly   = transAnnoChr[,1]

	tssAnno = cbind(chrOnly,startOnly, endOnly, strandOnly, nameOnly)
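One thing I am unsure about: cbind() on plain vectors of mixed type returns a character matrix, so the start and end positions would become strings.  A possible alternative, sketched on made-up values, is to reorder the columns by name so the result stays a data.frame.  toyTrans, asMatrix and tssAnnoDf are hypothetical names used only for this illustration, and I have not checked which form the later Starr steps expect.

```r
# Toy stand-in for transAnnoChr after the column rename (values are made up).
toyTrans <- data.frame(
  name   = c("geneA", "geneB"),
  chr    = c("Sc:Oct_2003;chr1", "Sc:Oct_2003;chr1"),
  strand = c(1L, -1L),
  start  = c(100L, 500L),
  end    = c(400L, 900L),
  stringsAsFactors = FALSE
)

# cbind() on a character vector plus integer vectors coerces everything to character.
asMatrix <- cbind(toyTrans$chr, toyTrans$start, toyTrans$end)

# Selecting columns by name reorders them and keeps each column's original type.
tssAnnoDf <- toyTrans[, c("chr", "start", "end", "strand", "name")]
```

Here is.character(asMatrix) is TRUE, while tssAnnoDf$start is still an integer column.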

### make a minimal Expression set so that the writeGFF function works

	library(Biobase)

	exprs = as.matrix(tssAnno, header = TRUE, sep = "\t", row.names = 1, as.is = TRUE)

	minimalSet = new("ExpressionSet", exprs = exprs)

### Now we can use the Starr package to write the gffAnno file

	library(Starr)

	bpmap = readBpmap("Sc03b_MR_v04.bpmap")

	probeAnno = bpmapToProbeAnno(bpmap)

	writeGFF(minimalSet, probeAnno, "tssAnno.gff")

	transcriptAnno = read.gffAnno("tssAnno.gff", feature = ???)

### I am not sure what to use as a feature for the read.gffAnno function.  
### I think this should get me ready to use the getProfiles function, your thoughts??

Thank you so much for your help and for creating this package.  I have not run through these commands yet, but wanted to ask whether this workflow is the way to go:
1. Annotation file creation (biomaRt)
2. ExpressionSet creation (Biobase)
3. gffAnno creation (Starr)

Best,

Noah

>> sessionInfo()
> R version 2.10.0 (2009-10-26) 
> i386-apple-darwin9.8.0 
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] Starr_1.2.0        affxparser_1.18.0  affy_1.24.0        Ringo_1.10.0       Matrix_0.999375-31 lattice_0.17-26   
> [7] limma_3.2.1        RColorBrewer_1.0-2 Biobase_2.6.0     
> 
> loaded via a namespace (and not attached):
> [1] affyio_1.14.0        annotate_1.24.0      AnnotationDbi_1.8.0  DBI_0.2-4            genefilter_1.28.0   
> [6] MASS_7.3-3           preprocessCore_1.8.0 pspline_1.0-13       RSQLite_0.7-3        splines_2.10.0      
> [11] survival_2.35-7      xtable_1.5-5