[BioC] modify colClasses in read.columns?
Wolfgang Huber
huber at ebi.ac.uk
Sun Apr 27 23:20:48 CEST 2008
Dear Henrik,
with a file test.txt as follows:
A B C
1 4711 34.50
2 ZAZA 01.40
and the call
z=read.table("test.txt", colClasses=c("integer", "NULL", "character"),
header=TRUE, sep="\t")
I get
> str(z)
'data.frame': 2 obs. of 2 variables:
$ A: int 1 2
$ C: chr "34.50" "01.40"
so maybe the functionality you wish is already provided by read.table?
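For IDs like Henrik's, the same idea can be sketched in a self-contained way. The temporary file and the cut-down four-column data below are made up for illustration; only read.table and its colClasses argument come from the example above.

```r
# Write a small tab-delimited file (hypothetical data, reduced to 4 columns).
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID i\tID j\tA\tR1",
             "8414341.20\t8414342.20\t1\t0.425183",
             "8414341.20\t8414343.20\t1\t0.128981"), tmp)

# "character" keeps the IDs verbatim, "NULL" skips a column,
# "numeric" reads R1 as a number. check.names = FALSE keeps the
# column names with spaces ("ID i") unchanged.
ids <- read.table(tmp, header = TRUE, sep = "\t", check.names = FALSE,
                  colClasses = c("character", "character", "NULL", "numeric"))
str(ids)
```

Because the ID columns are never converted to numbers, the trailing zero in ".20" should survive.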
From looking at its code and man page, I don't think read.columns is
designed to accept user input for what it takes as colClasses. In fact,
when I try to supply colClasses to read.columns, I get:
Error in read.table(file = file, header = TRUE, col.names = allcnames:
formal argument "colClasses" matched by multiple actual arguments
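One way to combine column selection with user-chosen classes would be to build colClasses from the header line and call read.table directly. The following is only a sketch and not limma code: read_selected and its arguments keep and as_character are hypothetical names, and it assumes a tab-delimited file with a single header line.

```r
# Hypothetical helper (not part of limma): read only the columns named in
# 'keep' or 'as_character'; the latter are forced to class "character".
read_selected <- function(file, keep, as_character = character(0), sep = "\t") {
  # Read just the header line to learn the column names.
  header <- scan(file, what = "", sep = sep, nlines = 1, quiet = TRUE)
  # Drop every column by default ...
  classes <- rep("NULL", length(header))
  # ... let read.table guess the class of the columns we keep ...
  classes[header %in% keep] <- NA
  # ... and force the ID-like columns to stay character.
  classes[header %in% as_character] <- "character"
  read.table(file, header = TRUE, sep = sep, colClasses = classes,
             check.names = FALSE)
}

# Usage on made-up data resembling the example in the quoted mail.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID i\tID j\tNi\tR1\tB\tR2",
             "8414341.20\t8414342.20\t1\t0.425183\t1\t0.758413",
             "8414341.20\t8414343.20\t1\t0.128981\t1\t0.034859"), tmp)
x <- read_selected(tmp, keep = c("ID i", "ID j", "R1", "R2"),
                   as_character = c("ID i", "ID j"))
str(x)
```

Selecting by exact column name rather than by a search string is a deliberate simplification here; matching on a pattern such as "R" would need an extra step to avoid picking up unwanted columns.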
Best wishes
Wolfgang
> sessionInfo()
R version 2.8.0 Under development (unstable) (2008-04-27 r45517)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] fortunes_1.3-4
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
Henrik Parn wrote on 25/04/2008 21:21:
> Dear Herve,
>
> Thanks for your rapid answer!
>
> Sorry, I forgot to paste the sessionInfo into my previous mail:
>
> > sessionInfo()
> R version 2.7.0 (2008-04-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> Kingdom.1252;LC_MONETARY=English_United
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] coda_0.13-1 limma_2.13.8 lme4_0.99875-9
> Matrix_0.999375-9 lattice_0.17-6
>
> loaded via a namespace (and not attached):
> [1] grid_2.7.0 tools_2.7.0
>
>
> The read.columns function is a part of the limma package in Bioconductor:
> source("http://bioconductor.org/biocLite.R")
> biocLite("limma")
>
> I would like to use the read.columns function to read a subset of
> columns from several data files. Here are some example columns (out of
> many) and rows of the data:
>
> ID i        ID j        Ni  Nj  S   A  R1        B  R2        C  R3         D  R4
> 8414341.20  8414342.20  1   2   -1  1  0.425183  1  0.758413  1  0.551275   1  0.543045
> 8414341.20  8414343.20  1   3   -1  1  0.128981  1  0.034859  1  -0.001998  1  0.002093
>
> In this example, there are 13 tab-delimited columns of which I want to
> use only ID i, ID j, R1, R2, R3 and R4. The problem with the data in its
> current form is the unfortunate format of the ID i and ID j columns: I
> need ID i and ID j to be read as character although they look
> numeric (if they are read as numeric the .20 will become a .2). When I
> have used read.table(), I have first read all columns, and by using the
> argument colClasses = c("character", "character",...), I have preserved
> the format of ID i and ID j. In the next step I have selected only the
> relevant columns.
>
> I thought read.columns could be a convenient alternative to select only
> the relevant columns when reading the data, by using e.g. required.col =
> c("ID i", "ID j"), text.to.search = "R". However, in read.columns I
> cannot specify colClasses. As it says in the help text, "It uses
> required.col and text.to.search to set up the colClasses argument
> of read.table." So I wonder whether anyone could advise me on how to
> modify the read.columns code so that I can specify colClasses, if it is
> not too complicated.
>
> Thanks in advance!
>
>
> Henrik
>
>
>
> Herve Pages wrote:
>
>> Hi Henrik,
>>
>> I don't have read.columns() when I start a fresh R session so it looks
>> like it's not part of the default R installation. Which package does it
>> belong to? Providing your sessionInfo() is always a good idea as it would
>> at least give us a clue of where to look for the read.columns() function.
>> Also a small example (with code) of what you are trying to do would be
>> very useful.
>>
>> Thanks!
>> H.
>>
>>
>> Henrik Parn wrote:
>>
>>> Dear all,
>>>
>>> I have received some data sets with some variables that certainly
>>> look numeric: they are individual IDs composed of
>>> numbers separated by ".", e.g. 6534231.18, 8783234.20. Not
>>> surprisingly, they are treated as numeric by read.columns, and
>>> 8783234.20 ends up as 8783234.2 when read into R. When I used
>>> read.table I specified in colClasses that these variables should be
>>> read as character. However, in read.columns, required.col and
>>> text.to.search are used to set up the colClasses argument of
>>> read.table. Does anyone have a suggestion for how I can modify the
>>> read.columns function so that I can specify the colClasses myself?
>>>
>>> Thanks in advance!
>>>
>