[R] write.table: strange output has been produced

Igor Chernukhin igorc at essex.ac.uk
Wed Sep 19 20:44:07 CEST 2012


Hi Jim - 
Thank you for your reply.

-----------------------------8<------------------------------------
> str(annot)
'data.frame':   6895 obs. of  4 variables:
 $ id          : int  231803 231804 231805 231810 231811 231816 231818 177697 223131 231823 ...
 $ kogdefline  : Factor w/ 1898 levels "17 beta-hydroxysteroid dehydrogenase type 3, HSD17B3",..: 1633 693 704 1627 1042 507 1870 1448 730 185 ...
 $ kogClass    : Factor w/ 26 levels "","Amino acid transport and metabolism ",..: 26 4 24 20 18 24 10 22 25 6 ...
 $ kogGroup    : Factor w/ 5 levels "","CELLULAR PROCESSES AND SIGNALING",..: 3 4 2 2 2 2 2 3 3 2 ...

> str(statdata)
'data.frame':   3887 obs. of  8 variables:
 $ id            : chr  "267533" "246792" "271961" "237478" ...
 $ baseMean      : num  288 519 309 189 341 ...
 $ baseMeanA     : num  574 1025 617 375 661 ...
 $ baseMeanB     : num  1.392 13.592 0.535 2.23 21.621 ...
 $ foldChange    : num  0.002426 0.013258 0.000866 0.00594 0.032733 ...
 $ log2FoldChange: num  -8.69 -6.24 -10.17 -7.4 -4.93 ...
 $ pval          : num  2.82e-104 1.70e-94 4.82e-81 1.63e-79 6.62e-78 ...
 $ padj          : num  7.31e-100 2.20e-90 4.16e-77 1.06e-75 3.43e-74 ...
 - attr(*, "na.action")=Class 'omit'  Named int [1:1235] 17 18 20 22 31 33 39 43 44 45 ...
  .. ..- attr(*, "names")= chr [1:1235] "NA" "NA.1" "NA.2" "NA.3" ...

> str(extra)
'data.frame':   3887 obs. of  3 variables:
 $ kogdefline: chr  NA NA NA NA ...
 $ kogClass  : chr  NA NA NA NA ...
 $ kogGroup  : chr  NA NA NA NA ...
-------------------------8<----------------------------
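The str() output shows that id is an integer in annot but a character vector in statdata. match() coerces both of its arguments to a common type (here character) before comparing, so the lookup in the code further down should still find the right rows. A minimal sketch with a few made-up ids:

```r
# Made-up ids: integer keys (as in annot) vs character keys (as in statdata)
annot_id <- c(231803L, 231804L, 177697L)
stat_id  <- c("177697", "999999", "231803")

# match() coerces both vectors to character before comparing
pos <- match(stat_id, annot_id)
print(pos)

# Converting explicitly gives the same result and makes the intent obvious
pos2 <- match(stat_id, as.character(annot_id))
print(pos2)
```

Converting the key columns to one type up front is a matter of taste here, but it avoids surprises if the ids ever carry leading zeros or other formatting.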

I also tried "stringsAsFactors = FALSE"; it doesn't seem to make any
difference.
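A likely reason the flag changes nothing here: stringsAsFactors only affects character vectors, and rep(NA, n) is a logical vector, so the columns start out logical either way. Assigning character values into part of such a column coerces the whole column to character, which matches the chr columns shown by str(extra) above. A small sketch (n = 3 is arbitrary):

```r
n <- 3  # arbitrary size for illustration

with_flag    <- data.frame(a = rep(NA, n), stringsAsFactors = FALSE)
without_flag <- data.frame(a = rep(NA, n))

# rep(NA, n) is logical, so stringsAsFactors has nothing to convert
print(class(with_flag$a))
print(class(without_flag$a))

# Assigning strings into part of a logical column coerces the whole column
with_flag[c(1, 3), "a"] <- c("x", "y")
print(class(with_flag$a))
print(with_flag$a)
```

If character columns are wanted from the start, rep(NA_character_, n) creates them directly.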

-Igor

On Wed, 2012-09-19 at 13:36 -0400, jim holtman wrote:
> It would also be helpful if you could provide the output of 'str' for
> all the objects that you are using.
> 
> e.g.,  str(statdata)    str(extra)
> 
> 
> Also, when creating your data.frame, use "stringsAsFactors = FALSE":
> 
> extra = data.frame(kogdefline=rep(NA,n)
>     , kogClass = rep(NA,n)
>     , kogGroup = rep(NA,n)
>     , stringsAsFactors = FALSE
> )
> 
> On Wed, Sep 19, 2012 at 12:12 PM, Igor <igorc at essex.ac.uk> wrote:
> > Good afternoon all -
> >
> > While making steady progress in learning R after Matlab, I have encountered
> > a problem that I need some extra help to get past.
> > Basically, I want to merge data from a biological statistics dataset
> > with annotation data extracted from another dataset, using 'id' as the
> > cross-reference, and write the result to a report file. The first part goes
> > absolutely fine: I have merged both into a data.frame. But when I try
> > to write it to a CSV file using 'write.table', it seems to write
> > the 'data.frame' object but also inserts some parts of the
> > annotation data that are not supposed to be there...
> > Below is a short snapshot of the output file to illustrate. The
> > upper half is fine; that's how it should be. The lower half, which
> > actually appears to be space-separated rather than comma-separated, was
> > obviously taken from the annotation dataset and is not supposed to be there.
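For reference, base R's merge() can do the id cross-reference in one step. The sketch below uses made-up data and assumes, as in the str() output, that the key column is called id in both tables. Unlike the match()-based code, which keeps only the first annotation row per id, merge() returns one row per matching pair, so duplicated ids in the annotation (the snapshot shows 171660 and 171703 twice) produce extra rows. write.csv() with row.names = FALSE also drops the row-name column that write.table() writes by default (the leading "344", "18816", ... in the snapshot).

```r
# Made-up stand-ins for statdata and annot (not the poster's data)
statdata <- data.frame(id = c("171660", "171703", "999999"),
                       pval = c(1e-5, 2e-4, 0.3),
                       stringsAsFactors = FALSE)
annot <- data.frame(id = c(171660L, 171660L, 171703L),
                    kogClass = c("Signal transduction mechanisms",
                                 "Intracellular trafficking",
                                 "Cytoskeleton"),
                    stringsAsFactors = FALSE)

# Give the keys the same type before merging
annot$id <- as.character(annot$id)

# all.x = TRUE keeps statistics rows without an annotation (filled with NA)
merged <- merge(statdata, annot, by = "id", all.x = TRUE, sort = FALSE)
print(merged)

# Write without the row-name column
out <- tempfile(fileext = ".csv")
write.csv(merged, out, row.names = FALSE)
```

Whether one row per id (match()) or one row per annotation (merge()) is the right report layout depends on what the report is for.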
> >
> > --------------------------------8<--------------------------------------------
> > "344","166128",126.44286392082,179.904700814932,72.9810270267088,0.40566492535281,-1.3016395254146,2.47449355237252e-07,4.2901159299567e-06,"Chitinas
> > "18816","238247",92.5282508325735,135.981255262454,49.0752464026927,0.36089714209487,-1.47034037615176,2.5330054329543e-07,4.38862252337004e-06,"Prot
> > "22072","222365",30.8191942806426,52.4262903365628,9.21209822472236,0.17571524068522,-2.50868876576414,2.54433836512085e-07,4.40531098485028e-06,NA,N
> > "25062","226605",30.808007579908,50.3976662241578,11.2183489356581,0.22259659575825,-2.16749656564076,2.54934711860645e-07,4.41103467375713e-06,NA,NA
> > "7539","247009",75.4175439970731,34.4643221134552,116.370765880691,3.37655751642533,1.75555313265164,2.60010673210741e-07,4.49585878338091e-06,NA,NA,
> > "407","267139",425.559675915702,279.393013150954,571.72633868045,2.04631580522577,1.03302881149302,2.61074218843609e-07,4.51123710239304e-06,NA,NA,NA
> > "26530","171300",146.80096060985,80.0063286553601,213.595592564339,2.66973370924738,1.4166958484644,2.68061220749976e-07,4.62888115991058e-06,NA,NA,N
> > "3078","159013",34.3260176515511,52.4580790080106,16.1939562950917,0.308702808057816,-1.69570948866688,2.69104298652827e-07,4.64379716436078e-06,"40S
> > "4657","159998",133.10761487064,185.450704462326,80.7645252789532,0.435504009074069,-1.19924209513405,2.75544399955331e-07,4.75176501174632e-06,"IMP-
> >
> > 171597  171597  KOG1347 Uncharacterized membrane protein, predicted
> > efflux pump General function prediction only    POORLY CHARACTERIZED
> > 171658  171658  KOG4290 Predicted membrane protein  Function unknown
> > POORLY CHARACTERIZED
> > 171660  171660  KOG0903 Phosphatidylinositol 4-kinase, involved in
> > intracellular trafficking and secretion  Signal transduction mechanisms
> > CELLULAR
> > 171660  171660  KOG0903 Phosphatidylinositol 4-kinase, involved in
> > intracellular trafficking and secretion  Intracellular trafficking,
> > secretion, and
> > 171703  171703  KOG2674 Cysteine protease required for autophagy -
> > Apg4p/Aut2p  Cytoskeleton    CELLULAR PROCESSES AND SIGNALING
> > 171703  171703  KOG2674 Cysteine protease required for autophagy -
> > Apg4p/Aut2p  Intracellular trafficking, secretion, and vesicular
> > transport   CELLU
> > and metabolism     METABOLISM
> > --------------------------------8<--------------------------------------------
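One possible explanation (an assumption; nothing in the thread confirms it): the lower block looks like raw lines of the annotation file, which is what you would see if some annotation field contains embedded line breaks. That can happen, for example, if annot was read with read.table(), whose default quote = "\"'" treats an apostrophe inside a description as the start of a quoted string and can swallow the following lines into a single field; passing quote = "" disables quote processing. A hypothetical helper, find_line_breaks(), to list which rows of each text column contain line breaks:

```r
# Hypothetical toy data frame standing in for mergedData
df <- data.frame(id = c("1", "2"),
                 kogdefline = c("ok", "line one\nline two"),
                 stringsAsFactors = FALSE)

# For each character or factor column, return the rows containing \r or \n
find_line_breaks <- function(d) {
  is_text <- vapply(d, function(x) is.character(x) || is.factor(x), logical(1))
  cols <- names(d)[is_text]
  hits <- lapply(cols, function(cl) which(grepl("[\r\n]", as.character(d[[cl]]))))
  names(hits) <- cols
  hits[lengths(hits) > 0]  # keep only columns with at least one hit
}

print(find_line_breaks(df))
```

An empty list means no field contains a line break, which would rule this explanation out.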
> > And this is a piece of code that produced this:
> >
> > --------------------------------8<--------------------------------------------
> >>n = nrow(statdata)
> >>extra = data.frame(kogdefline=rep(NA,n), kogClass = rep(NA,n), kogGroup = rep(NA,n))
> >>subset = intersect(statdata$id, annot$id)
> >>MR = match(subset, annot$id)
> >>ML = match(subset, statdata$id)
> >
> >>extra[ML,1] = as.character(annot[MR,2])
> >>extra[ML,2] = as.character(annot[MR,3])
> >>extra[ML,3] = as.character(annot[MR,4])
> > # strangely, if I do
> > # extra[ML,] = as.character(annot[MR,2:4])
> > # it produces digits (???) instead of a string value
> >
> >>mergedData = data.frame(statdata, extra)
> >>write.table(mergedData, 'filename.csv', sep=',')
> > --------------------------------8<--------------------------------------------
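The "digits instead of a string value" comment in the code has a standard explanation: annot[MR, 2:4] is itself a data.frame, i.e. a list of three factor columns, and as.character() applied to a list returns one string per list element (typically a deparsed form of each factor's integer codes), not one string per cell. Converting each column separately avoids this. A sketch with made-up toy data; the names common, bad and the toy values are illustrative only:

```r
# Toy stand-ins for annot and statdata (hypothetical data)
annot <- data.frame(id = c(10L, 20L, 30L),
                    kogdefline = factor(c("alpha", "beta", "gamma")),
                    kogClass = factor(c("A", "B", "C")),
                    kogGroup = factor(c("G1", "G2", "G3")))
statdata <- data.frame(id = c("30", "99", "10"), stringsAsFactors = FALSE)

n <- nrow(statdata)
extra <- data.frame(kogdefline = rep(NA_character_, n),
                    kogClass = rep(NA_character_, n),
                    kogGroup = rep(NA_character_, n),
                    stringsAsFactors = FALSE)

common <- intersect(statdata$id, annot$id)
MR <- match(common, annot$id)
ML <- match(common, statdata$id)

# as.character() on a data frame gives one string per column, not per cell
bad <- as.character(annot[MR, 2:4])
print(length(bad))

# Convert each factor column to character, then assign all three at once
extra[ML, ] <- lapply(annot[MR, 2:4], as.character)
print(extra)
```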
> >
> > Any ideas why this is happening?
> >
> > Many thanks
> > -Igor
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
>
