[R] Data separated by spaces, getting data into R using field lengths
Lauri Nikkinen
lauri.nikkinen at iki.fi
Tue Sep 8 14:42:17 CEST 2009
Thanks for the suggestion, but I don't have access to this
database; I just got this messy file.
-L
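
For the record, the sub() idea from the quoted reply below can be
sketched roughly as follows. It only holds under the assumption stated
there: the fourth field is all digits and the two text fields contain
no digits. The `lines` vector, the `pattern` and the tab delimiter are
illustrative choices, not part of the original file.

```r
# Sample lines as posted (single spaces, no padding).
lines <- c("DF12 This is an example 1 This",
           "DF12 This is an 1232 This is",
           "DF14 This is 12334 This is an",
           "DF15 This 23 This is an example")

# Assumed layout: two 2-character codes, then text without digits,
# then a run of digits, then more text.
pattern <- "^(..)(..) *([^0-9]*[^0-9 ]) +([0-9]+) +(.*)$"

# Stop early if any line does not fit the assumed layout.
stopifnot(grepl(pattern, lines))

# Rewrite each line as tab-delimited text, then parse it as usual.
tabbed <- sub(pattern, "\\1\t\\2\t\\3\t\\4\t\\5", lines)
dat <- read.delim(text = tabbed, header = FALSE, stringsAsFactors = FALSE)
str(dat)
```

A line with digits inside one of the text fields would be split in the
wrong place, so the result should be checked against the real file.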
2009/9/8 Duncan Murdoch <murdoch at stats.uwo.ca>:
> On 9/8/2009 8:21 AM, Lauri Nikkinen wrote:
>>
>> This data is from a database and the maximum length of a field is
>> defined. I mean that every column has a maximum length and I want to
>> use this maximum length as a separator. So if one "cell" in that
>> column is shorter than the maximum, "cell" should be padded with white
>> spaces or something like that. This seems to be hard to explain.
>
> Your problem is the intermediate file. Why not get R to read directly from
> the database, using RODBC?
>
> Duncan Murdoch
>
>>
>> Regards,
>> L
>>
>> 2009/9/8 Duncan Murdoch <murdoch at stats.uwo.ca>:
>>>
>>> On 9/8/2009 8:07 AM, Lauri Nikkinen wrote:
>>>>
>>>> Thanks, I tried it but I got
>>>>
>>>>> varlength <- c(2, 2, 18, 5, 18)
>>>>> read.fwf("c:temppi.txt", widths=varlength)
>>>>
>>>> V1 V2 V3 V4 V5
>>>> 1 DF 12 This is an exampl e 1 T his
>>>> 2 DF 12 This is an 1232 T his i s
>>>> 3 DF 14 This is 12334 Thi s is an
>>>> 4 DF 15 This 23 This is a n exa mple
>>>>
>>>> Which is not the way I want it.
>>>
>>> It looks as though that's because you don't have fixed width data.
>>> " This is an example" is 19 chars, including the leading space. You told
>>> R it was 18. " This is an " is only 12 characters.
>>>
>>> I would say you have two fixed width fields, and three varying fields,
>>> with no delimiters. If the middle one of the three always contains digits
>>> and the others don't, you can probably extract them using sub(), but you
>>> can't use any of the read.* functions to do this: your format is too
>>> strange.
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> structure(list(V1 = structure(c(1L, 1L, 1L, 1L), .Label = "DF", class
>>>> = "factor"),
>>>> V2 = c(12L, 12L, 14L, 15L), V3 = structure(c(4L, 3L, 2L,
>>>> 1L), .Label = c(" This 23 This is a", " This is 12334 Thi",
>>>> " This is an 1232 T", " This is an exampl"), class = "factor"),
>>>> V4 = structure(c(1L, 2L, 4L, 3L), .Label = c("e 1 T", "his i",
>>>> "n exa", "s is "), class = "factor"), V5 = structure(c(2L,
>>>> 4L, 1L, 3L), .Label = c("an ", "his", "mple", "s"), class =
>>>> "factor")), .Names = c("V1",
>>>> "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA,
>>>> -4L))
>>>>
>>>> Any ideas?
>>>> -L
>>>>
>>>> 2009/9/8 Duncan Murdoch <murdoch at stats.uwo.ca>:
>>>>>
>>>>> On 9/8/2009 7:53 AM, Lauri Nikkinen wrote:
>>>>>>
>>>>>> I have a text file similar to this (separated by spaces):
>>>>>>
>>>>>> x <- "DF12 This is an example 1 This
>>>>>> DF12 This is an 1232 This is
>>>>>> DF14 This is 12334 This is an
>>>>>> DF15 This 23 This is an example
>>>>>> "
>>>>>>
>>>>>> and I know the field lengths of each variable (there are 5 variables in
>>>>>> this data set), which are:
>>>>>>
>>>>>> varlength <- c(2, 2, 18, 5, 18)
>>>>>>
>>>>>> How can I import this kind of data into R, using the varlength
>>>>>> variable as a field separator indicator?
>>>>>
>>>>> See ?read.fwf.
>>>>>
>>>>> Duncan Murdoch
>>>>>
>>>
>>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
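
For comparison, read.fwf() does behave as hoped once every field really
is padded to its declared width, which is the layout described earlier
in the thread. A small sketch, assuming made-up padded lines built with
sprintf() from the sample values (the file as received is not padded
like this):

```r
varlength <- c(2, 2, 18, 5, 18)

# Hypothetical padded records: each field left-justified to its width.
padded <- sprintf("%-2s%-2s%-18s%-5s%-18s",
                  c("DF", "DF", "DF", "DF"),
                  c("12", "12", "14", "15"),
                  c("This is an example", "This is an", "This is", "This"),
                  c("1", "1232", "12334", "23"),
                  c("This", "This is", "This is an", "This is an example"))

# Every record now has the same length, the sum of the field widths.
stopifnot(all(nchar(padded) == sum(varlength)))

# strip.white = TRUE is passed on to read.table() and drops the padding
# from the character fields.
fixed <- read.fwf(textConnection(padded), widths = varlength,
                  strip.white = TRUE, stringsAsFactors = FALSE)
str(fixed)
```

If the lines in the real file do not all have the same length, the file
is not fixed width and the widths cannot be used as separators.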