[R] Data separated by spaces, getting data into R using fiel

(Ted Harding) Ted.Harding at manchester.ac.uk
Tue Sep 8 15:03:27 CEST 2009


On 08-Sep-09 12:21:53, Lauri Nikkinen wrote:
> This data is from database and the maximum length of a field is
> defined. I mean that every column has a maximum length and I want to
> use this maximum length as a separator. So if one "cell" in that
> column is shorter than the maximum, "cell" should be padded with white
> spaces or something like that. This seems to be hard to explain.
> 
> Regards,
> L

Perhaps not just hard to explain, but possibly impossible to implement
without further indications of where the breaks between fields may
occur, since it is possible for spaces to occur within a field!

Taking the example, and the field-width data, which you first supplied:


>>>> On 9/8/2009 7:53 AM, Lauri Nikkinen wrote:
>>>>>
>>>>> I have a text file similar to this (separated by spaces):
>>>>>
>>>>> x <- "DF12 This is an example 1 This
>>>>> DF12 This is an 1232 This is
>>>>> DF14 This is 12334 This is an
>>>>> DF15 This 23 This is an example
>>>>> "
>>>>>
>>>>> and I know the field lengths of each variable (there is 5
>>>>> variables in this data set), which are:
>>>>>
>>>>> varlength <- c(2, 2, 18, 5, 18)
>>>>>
>>>>> How can I import this kind of data into R, using the varlength
>>>>> variable as an field separator indicator?

I am now inferring that it might be as follows:

  record 1: DF|12|This is an example|1    |This              |
  record 2: DF|12|This is an        |1232 |This is           |
  record 3: DF|14|This is           |12334|This is an        |
  record 4: DF|15|This              |23   |This is an example|
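
If the file really were padded out to those widths, reading it would be
routine with read.fwf(). The following is only a sketch under that
assumption: the padded records are built by hand here to match the layout
inferred above, with no separator characters between fields, and they are
not the file as posted.

```r
# Field widths as given in the original question.
varlength <- c(2, 2, 18, 5, 18)

# Hypothetical field contents, following the inferred layout.
fields <- list(
  c("DF", "12", "This is an example", "1",     "This"),
  c("DF", "12", "This is an",         "1232",  "This is"),
  c("DF", "14", "This is",            "12334", "This is an"),
  c("DF", "15", "This",               "23",    "This is an example")
)

# Pad every field on the right with blanks up to its full width, then
# glue the fields of each record into one fixed-width line.
pad <- function(f) paste0(f, strrep(" ", varlength - nchar(f)), collapse = "")
padded <- vapply(fields, pad, character(1))
stopifnot(all(nchar(padded) == sum(varlength)))

# Read the fixed-width lines back, cutting at the known widths and
# dropping the padding again.
con <- textConnection(padded)
dat <- read.fwf(con, widths = varlength, colClasses = "character",
                strip.white = TRUE, comment.char = "")
close(con)
print(dat)
```

This works only because the padding is present. In the data as posted,
the padding has been lost, so the widths alone do not locate the breaks.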

This inference is based on:
1: noticing that the length of "This is an example" is 18, and
2: noticing that there are two cases of "18" in your field lengths,
     followed by
3: some mental shuffling to see how the data you supplied could fit
     into that pattern in a not-too-nonsensical way.

Without the final step (which is not computable, so R is out; and
step 1 also depends on recognising a complete coherent phrase),
you could have had:

  record 1: DF|12|This is an example|1    |This              |
  record 2: DF|12|This is an 1232   |This |is                |
  record 3: DF|14|This is 12334     |This |is an             |
  record 4: DF|15|This 23 This is   |an   |example           |

(or several similar variants), and there is nothing in the information
you supplied which could help to choose between them. You have no
definition of "cell"!
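
To make the ambiguity concrete, here is a rough sketch that enumerates
the candidate splits for record 2 as posted. The assumptions are mine, for
illustration only: the leading "DF12" supplies the first two fields, the
remaining three fields are each non-empty runs of consecutive tokens, and
a candidate is kept if each field fits within its stated width.

```r
# Record 2 exactly as posted; "DF12" is taken as fields 1 and 2.
rec <- "DF12 This is an 1232 This is"
tokens <- strsplit(rec, " +")[[1]][-1]   # the tokens after "DF12"
varlength <- c(2, 2, 18, 5, 18)

# Splitting the tokens into 3 consecutive non-empty fields means choosing
# 2 cut points among the gaps between neighbouring tokens.
cuts <- combn(length(tokens) - 1, 2)

candidates <- lapply(seq_len(ncol(cuts)), function(j) {
  grp <- cut(seq_along(tokens), breaks = c(0, cuts[, j], length(tokens)))
  vapply(split(tokens, grp), paste, character(1), collapse = " ")
})

# Keep only the candidates whose fields fit the widths of fields 3 to 5.
fits <- vapply(candidates,
               function(f) all(nchar(f) <= varlength[3:5]),
               logical(1))
splits <- vapply(candidates[fits], paste, character(1), collapse = "|")
print(splits)
```

Both "This is an|1232|This is" and "This is an 1232|This|is" come out as
admissible, along with others, and the widths give no reason to prefer
one over another.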

I see that you said (in a later mail):
  "I don't have an access to this database, I just got this messy file."

In that case, unless you have further information about how the
space-separated bits of text/number should be formed into individual
fields ("cells"), I think you are stuck -- you have no basis on
which to make progress.

If you have further information, please share it. Otherwise no-one
will be able to get past the above!

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Sep-09                                       Time: 14:03:12
------------------------------ XFMail ------------------------------



