[R] reading table

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jan 9 18:48:10 CET 2008


Read in the lines using readLines, delete all T and G characters
with gsub so that every row has the same number of fields, and
reread the result using read.table, treating "--" as NA:

Lines.raw <- "T  3     0    --    --     --     T     --    --  -- 18.98
3  1  6.75  4.39    39     --    --     -- 18.58
3  2  6.90  4.90    43     --    --     -- 18.63
3  3  7.07  5.39    48     --    --     -- 18.78
G  4     0  7.41  5.54     47     G     --    --  -- 18.90
4  1  7.44  5.99    30  10.93  5.30     23 18.95
4  2  7.27  6.05    23  11.16  5.74     19 18.96
4  3  7.27  5.54    27  11.58  5.95     18 18.97
"
# in reality next line would be Lines <- readLines("myfile.dat")
Lines <- readLines(textConnection(Lines.raw))
# strip the T/G labels (leaving 9 fields per row) and read "--" as NA
DF <- read.table(textConnection(gsub("[TG]", "", Lines)), na.strings = "--")
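
The original question asked for the 4th column of the file as a vector. A minimal sketch of that last step, assuming the data are read exactly as above: once the T/G labels are removed, the file's 4th column ends up as the 3rd column of DF (V3 is the default name read.table assigns), and its first entry is NA because the file had "--" there. The name v is only illustrative.

```r
# Rebuild DF from the sample data so this sketch runs on its own
Lines <- c("T  3     0    --    --     --     T     --    --  -- 18.98",
           "3  1  6.75  4.39    39     --    --     -- 18.58",
           "3  2  6.90  4.90    43     --    --     -- 18.63",
           "3  3  7.07  5.39    48     --    --     -- 18.78",
           "G  4     0  7.41  5.54     47     G     --    --  -- 18.90",
           "4  1  7.44  5.99    30  10.93  5.30     23 18.95",
           "4  2  7.27  6.05    23  11.16  5.74     19 18.96",
           "4  3  7.27  5.54    27  11.58  5.95     18 18.97")
DF <- read.table(textConnection(gsub("[TG]", "", Lines)), na.strings = "--")

# The file's 4th column is DF's 3rd column (V3) once the labels are gone
v <- DF$V3
v <- v[!is.na(v)]   # drop the row where the file had "--"
```

Note that gsub("[TG]", "", ...) removes every T and G on a line, which is safe here only because the remaining fields are all numeric or "--".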



On Jan 9, 2008 10:18 AM, Abi Ghanem josephine
<josephine.abighanem at ibpc.fr> wrote:
> Hi,
> I am encountering a problem in reading a file,
> the file looks like that:
> T  3  0       --       --     --    T     --       --     --     18.98
>   3  1      6.75     4.39    39          --       --     --     18.58
>   3  2      6.90     4.90    43          --       --     --     18.63
>   3  3      7.07     5.39    48          --       --     --     18.78
> G  4  0      7.41     5.54    47    G     --       --     --     18.90
>   4  1      7.44     5.99    30        10.93     5.30    23     18.95
>   4  2      7.27     6.05    23        11.16     5.74    19     18.96
>   4  3      7.27     5.54    27        11.58     5.95    18     18.97
> The first and the 7th columns contain only T and G.
> My problem is that I want to have the 4th column as a vector: 6.75, 6.90,
> 7.07, 7.41, 7.44, 7.27, 7.27.
>
> When I do a simple read.delim(data, sep="", header=FALSE), I get this:
>
> T  3     0    --    --     --     T     --    --  -- 18.98
> 3  1  6.75  4.39    39     --    --     -- 18.58
> 3  2  6.90  4.90    43     --    --     -- 18.63
> 3  3  7.07  5.39    48     --    --     -- 18.78
> G  4     0  7.41  5.54     47     G     --    --  -- 18.90
> 4  1  7.44  5.99    30  10.93  5.30     23 18.95
> 4  2  7.27  6.05    23  11.16  5.74     19 18.96
> 4  3  7.27  5.54    27  11.58  5.95     18 18.97
>
> with the first column containing T, 3, 3, 3, G, 4, 4, 4, so the values are
> shifted in the 1st and 5th rows.
>
>
> I tried changing sep="" to sep="\t", but then I don't get a matrix;
> I just get a single column:
> "  T  3  0       --       --     --    T     --       --     --     18.98"
> "     3  1      6.75     4.39    39          --       --     --     18.58"
> "     3  2      6.90     4.90    43          --       --     --     18.63"
> "     3  3      7.07     5.39    48          --       --     --     18.78"
> "  G  4  0      7.41     5.54    47    G     --       --     --     18.90"
> "     4  1      7.44     5.99    30        10.93     5.30    23     18.95"
> "     4  2      7.27     6.05    23        11.16     5.74    19     18.96"
> "     4  3      7.27     5.54    27        11.58     5.95    18     18.97"
>
> My question: is there a way to read the file while skipping the
> first and the 7th columns,
> or, failing that, how can I get a vector with the 4th column?
>
> Thanks for the help,
> Josephine
>
> --
>
>
> Josephine ABI GHANEM
> IBPC, UPR 9080
> 13, rue P. et M. Curie,
> 75005 Paris, FRANCE
>
> email: josephine.abighanem at ibpc.fr
> tel: 01 58 41 51 67
>      06 28 07 25 71



