[R] StrSplit
David Winsemius
dwinsemius at comcast.net
Sat Oct 9 19:04:37 CEST 2010
On Oct 9, 2010, at 12:46 PM, Jeffrey Spies wrote:
> Jim's solution is the ideal way to read in the data: using the sep=";"
> argument in read.table.
>
> However, if you do for some reason have a vector of strings like the
> following (maybe someone gives you an Rdata file instead of the raw
> data file):
>
> MF_Data <- c("106506;AIG India Liquid Fund-Institutional Plan-Daily
> Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010","106511;AIG
> India Liquid Fund-Institutional Plan-Growth
> Option;1210.4612;1210.4612;1210.4612;02-Oct-2010")
>
> Then you can use this to get a data frame:
>
> as.data.frame(do.call(rbind, lapply(MF_Data, function(x)
> unlist(strsplit(x, ';')))))
>
If you are suggesting that Jim's solution would not work here, then I
would disagree and suggest you try offering your vector (without the
<cr>'s inserted by our mail clients) to his code. It should work just
fine and be far more readable.
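
As a minimal sketch of what I mean (assuming each string has been re-joined onto a single line; `con` is only an illustrative name, and stringsAsFactors = FALSE is my own optional addition, not part of Jim's code):

```r
# Jeff's two-element vector, with each string kept on a single line
MF_Data <- c(
  "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010",
  "106511;AIG India Liquid Fund-Institutional Plan-Growth Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
)

# Jim's approach: let read.table treat the character vector like a file
con <- textConnection(MF_Data)
myData <- read.table(con, sep = ";", stringsAsFactors = FALSE)
close(con)  # close only the connection we opened

# Column types are inferred, so no manual conversion is needed
str(myData)
```

Closing just the one connection with close(con) is a little gentler than closeAllConnections(), which also shuts anything else you happen to have open.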
On the other hand, if you were offering this with an explanation that
strsplit's split argument is more flexible than the sep argument in
the read functions because it accepts regular expressions and so can
handle situations where multiple separators exist in the same line,
then I would applaud you.
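
A small sketch of that flexibility, under the assumption of lines that mix two separators (the `mixed` vector and its fund names are made up for illustration and are not from the original data):

```r
# Hypothetical input: each line uses both ';' and '|' as separators
mixed <- c("106506;AIG Liquid Fund|1001.0000;02-Oct-2010",
           "106511|AIG Growth Fund;1210.4612|02-Oct-2010")

# split is a regular expression by default; the class [;|] matches either one
parts <- strsplit(mixed, "[;|]")

# Bind the pieces row-wise, then convert the character matrix to a data frame
df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)

# Everything arrives as character, so numeric columns are converted by hand
df$V3 <- as.numeric(df$V3)
str(df)
```

The price of this route is that nothing is type-converted for you, which is one more reason to prefer read.table when a single separator is enough.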
--
David.
> Cheers,
>
> Jeff.
>
> On Sat, Oct 9, 2010 at 12:30 PM, jim holtman <jholtman at gmail.com>
> wrote:
>> Is this what you are after:
>>
>>> x <- c("Scheme Code;Scheme Name;Net Asset Value;Repurchase
>>> Price;Sale Price;Date"
>> + , ""
>> + ,"Open Ended Schemes ( Liquid )"
>> + , ""
>> + , ""
>> + , "AIG Global Investment Group Mutual Fund"
>> + , "106506;AIG India Liquid Fund-Institutional Plan-Daily Dividend
>> Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
>> + , "106511;AIG India Liquid Fund-Institutional Plan-Growth
>> Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
>> + , "106507;AIG India Liquid Fund-Institutional Plan-Weekly Dividend
>> Option;1001.8765;1001.8765;1001.8765;02-Oct-2010"
>> + , "106503;AIG India Liquid Fund-Retail Plan-DailyDividend
>> Option;1001.0000;1001.0000;1001.0000;02-Oct-2010")
>>>
>>> myData <- read.table(textConnection(x[7:10]), sep=';')
>>> closeAllConnections()
>>> str(myData)
>> 'data.frame': 4 obs. of 6 variables:
>> $ V1: int 106506 106511 106507 106503
>> $ V2: Factor w/ 4 levels "AIG India Liquid Fund-Institutional
>> Plan-Daily Dividend Option",..: 1 2 3 4
>> $ V3: num 1001 1210 1002 1001
>> $ V4: num 1001 1210 1002 1001
>> $ V5: num 1001 1210 1002 1001
>> $ V6: Factor w/ 1 level "02-Oct-2010": 1 1 1 1
>>> myData
>>       V1                                                              V2       V3       V4       V5          V6
>> 1 106506  AIG India Liquid Fund-Institutional Plan-Daily Dividend Option 1001.000 1001.000 1001.000 02-Oct-2010
>> 2 106511          AIG India Liquid Fund-Institutional Plan-Growth Option 1210.461 1210.461 1210.461 02-Oct-2010
>> 3 106507 AIG India Liquid Fund-Institutional Plan-Weekly Dividend Option 1001.876 1001.876 1001.876 02-Oct-2010
>> 4 106503          AIG India Liquid Fund-Retail Plan-DailyDividend Option 1001.000 1001.000 1001.000 02-Oct-2010
>>>
>>>
>>
>>
>> On Sat, Oct 9, 2010 at 12:18 PM, Santosh Srinivas
>> <santosh.srinivas at gmail.com> wrote:
>>> Newbie question ...
>>>
>>> I am looking for something equivalent to read.delim that accepts
>>> a line of text as its argument instead of a file.
>>>
>>> Below is my problem, I'm unable to get the exact output which is a
>>> simple data frame of the data where the delimiter exists ...
>>> coming quite close though
>>>
>>> I have a character vector with 10 elements called MF_Data
>>>> MF_Data [1:10]
>>> [1] "Scheme Code;Scheme Name;Net Asset Value;Repurchase
>>> Price;Sale Price;Date"
>>> [2] ""
>>> [3] "Open Ended Schemes ( Liquid )"
>>> [4] ""
>>> [5] ""
>>> [6] "AIG Global Investment Group Mutual Fund"
>>> [7] "106506;AIG India Liquid Fund-Institutional Plan-Daily
>>> Dividend Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
>>> [8] "106511;AIG India Liquid Fund-Institutional Plan-Growth
>>> Option;1210.4612;1210.4612;1210.4612;02-Oct-2010"
>>> [9] "106507;AIG India Liquid Fund-Institutional Plan-Weekly
>>> Dividend Option;1001.8765;1001.8765;1001.8765;02-Oct-2010"
>>> [10] "106503;AIG India Liquid Fund-Retail Plan-DailyDividend
>>> Option;1001.0000;1001.0000;1001.0000;02-Oct-2010"
>>>
>>>
>>> Now for the lines below ... they are delimited by ';' ... I am using
>>>
>>> tempTxt <- MF_Data[7]
>>> MF_Data_F <- unlist(strsplit(tempTxt,";", fixed = TRUE))
>>> tempTxt <- MF_Data[8]
>>> MF_Data_F1 <- unlist(strsplit(tempTxt,";", fixed = TRUE))
>>> MF_Data_F <- rbind(MF_Data_F,MF_Data_F1)
>>>
>>> But MF_Data_F is not a simple 2x6 data frame, which is what I want
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
David Winsemius, MD
West Hartford, CT