[R] write.table: strange output has been produced
jim holtman
jholtman at gmail.com
Wed Sep 19 19:36:12 CEST 2012
It would also be helpful if you could provide the output of 'str' for
all the objects that you are using.
e.g.,

    str(statdata)
    str(extra)
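As a rough illustration of what that output tells you, here is a minimal sketch on a made-up data frame. The name `annot_demo` and its values are hypothetical, since the real objects were not posted. It sets stringsAsFactors = TRUE explicitly to mimic the default in R versions before 4.0.0:

```r
# Hypothetical stand-in for one of the objects in the thread; the real
# 'statdata' and 'annot' were never posted.
annot_demo <- data.frame(
  id         = c(171597, 171658),
  kogdefline = c("Uncharacterized membrane protein",
                 "Predicted membrane protein"),
  stringsAsFactors = TRUE   # the default in R before 4.0.0
)

# str() prints one line per column with its type, so a factor column
# is easy to tell apart from a character ("chr") column.
str(annot_demo)

# The same information, programmatically: one class name per column.
col_classes <- sapply(annot_demo, class)
print(col_classes)
```

If a text column shows up as "Factor" rather than "chr", that is worth knowing before assigning it into another data frame.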
Also, in creating your data.frame, use "stringsAsFactors = FALSE":

    extra = data.frame(kogdefline = rep(NA, n)
                     , kogClass = rep(NA, n)
                     , kogGroup = rep(NA, n)
                     , stringsAsFactors = FALSE
                     )
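Factor columns may also explain the digits you see with as.character(annot[MR,2:4]). The sketch below is an assumption about your data, not a diagnosis. It uses a made-up `annot_demo` with factor columns and shows that as.character() applied to several data.frame columns at once returns strings built from the integer codes, while converting column by column returns the labels:

```r
n <- 3

# Hypothetical annotation table; column names borrowed from the thread,
# values invented for illustration.
annot_demo <- data.frame(
  id       = c(10, 20, 30),
  kogClass = c("Cytoskeleton", "Function unknown", "Cytoskeleton"),
  kogGroup = c("CELLULAR", "POORLY CHARACTERIZED", "CELLULAR"),
  stringsAsFactors = TRUE    # mimic the pre-4.0.0 default
)

# Whole data frame at once: each factor column collapses to a single
# string built from its integer codes, not from its labels.
whole <- as.character(annot_demo[, 2:3])
print(whole)

# Column by column: the labels come through as ordinary strings.
by_col <- lapply(annot_demo[, 2:3], as.character)

# Start from character NAs so the columns are character from the outset.
extra <- data.frame(
  kogClass = rep(NA_character_, n),
  kogGroup = rep(NA_character_, n),
  stringsAsFactors = FALSE
)
extra$kogClass <- by_col$kogClass
extra$kogGroup <- by_col$kogGroup
str(extra)
```

This is why converting one column at a time, as in your working code, gives strings while the 2:4 version does not.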
On Wed, Sep 19, 2012 at 12:12 PM, Igor <igorc at essex.ac.uk> wrote:
> Good afternoon all -
>
> While making steady progress in learning R after Matlab, I encountered
> a problem that I need some extra help to get past.
> Basically, I want to merge data from a biological statistical dataset
> with annotation data extracted from another dataset, using an 'id'
> cross-reference, and write the result to a report file. The first part
> goes absolutely fine: I have merged both datasets into a data.frame. But
> when I try to write it to a csv file using 'write.table', it does write
> the 'data.frame' object but also inserts some parts of the
> annotation data which are not supposed to be there...
> Below is a small snapshot of the file output to illustrate. The
> upper half is fine; that's how it should be. The lower half, which
> actually appears to be space-separated rather than comma-separated, was
> obviously grabbed from the annotation dataset and is not supposed to be
> here.
>
> --------------------------------8<--------------------------------------------
> "344","166128",126.44286392082,179.904700814932,72.9810270267088,0.40566492535281,-1.3016395254146,2.47449355237252e-07,4.2901159299567e-06,"Chitinas
> "18816","238247",92.5282508325735,135.981255262454,49.0752464026927,0.36089714209487,-1.47034037615176,2.5330054329543e-07,4.38862252337004e-06,"Prot
> "22072","222365",30.8191942806426,52.4262903365628,9.21209822472236,0.17571524068522,-2.50868876576414,2.54433836512085e-07,4.40531098485028e-06,NA,N
> "25062","226605",30.808007579908,50.3976662241578,11.2183489356581,0.22259659575825,-2.16749656564076,2.54934711860645e-07,4.41103467375713e-06,NA,NA
> "7539","247009",75.4175439970731,34.4643221134552,116.370765880691,3.37655751642533,1.75555313265164,2.60010673210741e-07,4.49585878338091e-06,NA,NA,
> "407","267139",425.559675915702,279.393013150954,571.72633868045,2.04631580522577,1.03302881149302,2.61074218843609e-07,4.51123710239304e-06,NA,NA,NA
> "26530","171300",146.80096060985,80.0063286553601,213.595592564339,2.66973370924738,1.4166958484644,2.68061220749976e-07,4.62888115991058e-06,NA,NA,N
> "3078","159013",34.3260176515511,52.4580790080106,16.1939562950917,0.308702808057816,-1.69570948866688,2.69104298652827e-07,4.64379716436078e-06,"40S
> "4657","159998",133.10761487064,185.450704462326,80.7645252789532,0.435504009074069,-1.19924209513405,2.75544399955331e-07,4.75176501174632e-06,"IMP-
>
> 171597 171597 KOG1347 Uncharacterized membrane protein, predicted
> efflux pump General function prediction only POORLY CHARACTERIZED
> 171658 171658 KOG4290 Predicted membrane protein Function unknown
> POORLY CHARACTERIZED
> 171660 171660 KOG0903 Phosphatidylinositol 4-kinase, involved in
> intracellular trafficking and secretion Signal transduction mechanisms
> CELLULAR
> 171660 171660 KOG0903 Phosphatidylinositol 4-kinase, involved in
> intracellular trafficking and secretion Intracellular trafficking,
> secretion, and
> 171703 171703 KOG2674 Cysteine protease required for autophagy -
> Apg4p/Aut2p Cytoskeleton CELLULAR PROCESSES AND SIGNALING
> 171703 171703 KOG2674 Cysteine protease required for autophagy -
> Apg4p/Aut2p Intracellular trafficking, secretion, and vesicular
> transport CELLU
> and metabolism METABOLISM
> --------------------------------8<--------------------------------------------
> And this is the piece of code that produced it:
>
> --------------------------------8<--------------------------------------------
> n = nrow(statdata)
> extra = data.frame(kogdefline = rep(NA, n), kogClass = rep(NA, n),
>                    kogGroup = rep(NA, n))
> subset = intersect(statdata$id, annot$id)
> MR = match(subset, annot$id)
> ML = match(subset, statdata$id)
>
> extra[ML,1] = as.character(annot[MR,2])
> extra[ML,2] = as.character(annot[MR,3])
> extra[ML,3] = as.character(annot[MR,4])
> # strangely, if I do
> # extra[ML,] = as.character(annot[MR,2:4])
> # it produces digits (???) instead of a string value
>
> mergedData = data.frame(statdata, extra)
> write.table(mergedData, 'filename.csv', sep=',')
> --------------------------------8<--------------------------------------------
>
> Any ideas why this is happening?
>
> Many thanks
> -Igor
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.