[R] Read

Tue Feb 23 02:14:13 CET 2021

Let us assume that the maximum spacing between adjacent columns is two
spaces. The output should not be a fixed-width file; preferably it
should be a CSV file.
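
For illustration, here is a minimal base-R sketch under those assumptions. It is only a heuristic, not a definitive solution: a gap of at most two blanks is treated as an ordinary separator, and a wider gap is assumed to hide roughly one missing value per five blanks. The names parse_line, max_sep and pad, the value pad = 5, and the temporary output file are all hypothetical choices made for this example; the first token on each line is also assumed to belong to x1.

```r
# Hypothetical helper, not from any package: place the tokens of one line
# into n_col fields, guessing skipped columns from the width of each gap.
parse_line <- function(line, n_col, max_sep = 2, pad = 5) {
  out <- rep(NA_character_, n_col)
  g <- gregexpr("[^ ]+", line)
  m <- g[[1]]
  if (m[1] == -1) return(out)                 # blank line: all fields missing
  tokens <- regmatches(line, g)[[1]]
  starts <- as.integer(m)
  ends   <- starts + attr(m, "match.length") - 1L
  gaps   <- starts[-1] - ends[-length(ends)] - 1L   # blanks between neighbours
  # ASSUMPTION: gaps of up to max_sep blanks separate adjacent columns;
  # wider gaps skip about one column per `pad` blanks (at least one).
  skipped <- ifelse(gaps <= max_sep, 0, pmax(1, gaps %/% pad))
  cols <- cumsum(c(1, 1 + skipped))           # first token is assumed to be x1
  keep <- cols <= n_col                       # drop anything past the last column
  out[cols[keep]] <- tokens[keep]
  out
}

# Sample lines with the gap widths spelled out so the spacing is unambiguous.
sp <- function(n) strrep(" ", n)
lines <- c("x1  x2  x3 x4",
           "1 B12 ",
           paste0("2", sp(7), "C23 "),
           paste0("322 B32", sp(6), "D34 "),
           paste0("4", sp(12), "D44 "),
           paste0("51", sp(5), "D53"),
           paste0("60 D62", sp(9)))

header <- strsplit(trimws(lines[1]), " +")[[1]]
body <- t(vapply(lines[-1], parse_line, character(length(header)),
                 n_col = length(header), USE.NAMES = FALSE))
dat <- as.data.frame(body, stringsAsFactors = FALSE)
names(dat) <- header
dat[] <- lapply(dat, type.convert, as.is = TRUE)   # x1 becomes integer

# Write the result as CSV; missing values become empty fields.
out_file <- tempfile(fileext = ".csv")
write.csv(dat, out_file, row.names = FALSE, na = "")
```

The pad value would have to be tuned against the real file, and any line where the guess is wrong would still need to be checked by hand.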

On Mon, Feb 22, 2021 at 8:05 PM jim holtman <jholtman using gmail.com> wrote:
>
> Messed up; I did not see your 'desired' output, which will be hard to produce since there is no consistent number of spaces that would represent the desired column number.  Do you have any hint as to how to interpret the spacing, especially since you have several hundred more lines?  Is the output supposed to be 'fixed' field?
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
> On Mon, Feb 22, 2021 at 5:00 PM jim holtman <jholtman using gmail.com> wrote:
>>
>> Try this:
>>
>> > library(tidyverse)
>>
>> > text <-  "x1  x2  x3 x4\n1 B12 \n2       C23 \n322 B32      D34 \n4            D44 \n51     D53\n60 D62         "
>>
>> > # read in the data as characters and replace multiple blanks with single blank
>> > input <- read_lines(text)
>>
>> > input <- str_replace_all(input, ' +', ' ')
>>
>> > mydata <- read_delim(input, ' ', col_names = TRUE)
>> Warning: 5 parsing failures.
>> row col  expected    actual         file
>>   1  -- 4 columns 3 columns literal data
>>   2  -- 4 columns 3 columns literal data
>>   4  -- 4 columns 3 columns literal data
>>   5  -- 4 columns 2 columns literal data
>>   6  -- 4 columns 3 columns literal data
>>
>> > mydata
>> # A tibble: 6 x 4
>>      x1 x2    x3    x4
>>   <dbl> <chr> <chr> <lgl>
>> 1     1 B12   NA    NA
>> 2     2 C23   NA    NA
>> 3   322 B32   D34   NA
>> 4     4 D44   NA    NA
>> 5    51 D53   NA    NA
>> 6    60 D62   NA    NA
>> >
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>>
>> On Mon, Feb 22, 2021 at 4:49 PM Val <valkremk using gmail.com> wrote:
>>>
>>> That is my problem. The spacing between columns is not consistent.  It
>>>   may be  single space  or multiple spaces (two or three).
>>>
>>> On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap <williamwdunlap using gmail.com> wrote:
>>> >
>>> > You said the column values were separated by space characters.
>>> > Copying the text from gmail shows that some column names and column
>>> > values are separated by single spaces (e.g., between x1 and x2) and
>>> > some by multiple spaces (e.g., between x3 and x4).  Did the mail mess
>>> > up the spacing or is there some other way to tell where the omitted
>>> > values are?
>>> >
>>> > -Bill
>>> >
>>> > On Mon, Feb 22, 2021 at 2:54 PM Val <valkremk using gmail.com> wrote:
>>> > >
>>> > > I tried that one and it did not work. Please see the error message:
>>> > > Error in read.table(text = "x1  x2  x3 x4\n1 B12 \n2       C23
>>> > > \n322 B32      D34 \n4            D44 \n51     D53\n60 D62         ",
>>> > > :
>>> > >   more columns than column names
>>> > >
>>> > > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap <williamwdunlap using gmail.com> wrote:
>>> > > >
>>> > > > Since the columns in the file are separated by a space character, " ",
>>> > > > add the read.table argument sep=" ".
>>> > > >
>>> > > > -Bill
>>> > > >
>>> > > > On Mon, Feb 22, 2021 at 2:21 PM Val <valkremk using gmail.com> wrote:
>>> > > > >
>>> > > > > Hi all, I am trying to read a messy data set but am having
>>> > > > > difficulty.  The data has several columns separated by one or more
>>> > > > > blank spaces.  Each column value may have a different length across
>>> > > > > the rows.  The first row (header) has four columns. However, a row
>>> > > > > may not have all four column values.  For instance, the first data
>>> > > > > row has only the first two column values. The fourth data row has
>>> > > > > the first and last column values; the second and third column
>>> > > > > values are missing for this row.  How do I read this data set
>>> > > > > correctly? Here is my sample data set, the output and the desired
>>> > > > > output.  To make each data point clear, I have added the row and
>>> > > > > column numbers to it. I cannot use fixed-width format reading
>>> > > > > because each row may have a different length for a given column.
>>> > > > >
>>> > > > > dat<-read.table(text="x1  x2  x3 x4
>>> > > > > 1 B22
>>> > > > > 2         C33
>>> > > > > 322 B22      D34
>>> > > > > 4                 D44
>>> > > > > 51         D53
>>> > > > > 60 D62            ",header=T, fill=T,na.strings=c("","NA"))
>>> > > > >
>>> > > > > Output
>>> > > > >    x1  x2   x3 x4
>>> > > > > 1   1 B12 <NA> NA
>>> > > > > 2   2 C23 <NA> NA
>>> > > > > 3 322 B32  D34 NA
>>> > > > > 4   4 D44 <NA> NA
>>> > > > > 5  51 D53 <NA> NA
>>> > > > > 6  60 D62 <NA> NA
>>> > > > >
>>> > > > >
>>> > > > > Desired output
>>> > > > >    x1   x2   x3   x4
>>> > > > > 1   1  B22 <NA>   NA
>>> > > > > 2   2 <NA>  C33   NA
>>> > > > > 3 322  B32   NA  D34
>>> > > > > 4   4 <NA>   NA  D44
>>> > > > > 5  51 <NA>  D53   NA
>>> > > > > 6  60  D62 <NA>   NA
>>> > > > >
>>> > > > > Thank you,
>>> > > > >
>>> > > > > ______________________________________________
>>> > > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > > > > and provide commented, minimal, self-contained, reproducible code.