[R] problem with white space

jim holtman jholtman at gmail.com
Mon Mar 31 00:15:45 CEST 2008


Here is one way of doing it.  I would suggest that you read in the
data with readLines and then combine it into one single string so that
you can use substring on it.  Since you did not provide
commented, minimal, self-contained, reproducible code, I will take a
guess at what your data looks like:

# create some test data -- in practice this might be read in with readLines
sdata <- sapply(1:10, function(x){  # 10 lines of strings with 50 characters
    paste(sample(LETTERS, 50, TRUE), collapse='')
})
# put into one large string so you can do substring on it
sdata <- paste(sdata, collapse='')
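# now create 10 samples of size 20 and write them to files (file1, file2, ... file10)
for (i in 1:10){
    # positions are drawn without replacement here;
    # use sample(nchar(sdata), 20, TRUE) to resample with replacement
    x <- sample(nchar(sdata), 20)
    writeLines(paste(substring(sdata, x, x), collapse=''),
               con=paste("file", i, sep=''))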
}
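
The same idea can probably be stretched to cover all three questions: no
spaces in the input, 50-character lines in the output, and a loop instead of
one write command per rep.  What follows is only a sketch under my guesses
about the data.  The helper resample_chars, the names out_dir and n_reps, the
use of tempdir(), and the small sizes are all made up for illustration; the
real job would use readLines("data.txt"), 100000 characters and 500 reps.

```r
# Hypothetical helper (not from the original post): draw one bootstrap
# replicate of n characters, with replacement, and return it as a
# character vector of lines that are `width` characters long.
resample_chars <- function(chars, n, width = 50) {
    picked <- sample(chars, n, replace = TRUE)
    # label each sampled character with the output line it belongs to
    grp <- ceiling(seq_along(picked) / width)
    vapply(split(picked, grp), paste, character(1), collapse = "")
}

# stand-in for readLines("data.txt"): 10 lines of 50 characters each
lines_in <- replicate(10, paste(sample(LETTERS, 50, TRUE), collapse = ""))

# split the whole data set into single characters; no spaces are needed
chars <- strsplit(paste(lines_in, collapse = ""), "")[[1]]

out_dir <- tempdir()  # assumption: replicates go to a temporary directory
n_reps  <- 5          # the original question asks for 500
for (i in seq_len(n_reps)) {
    rep_lines <- resample_chars(chars, n = 200)
    writeLines(rep_lines, file.path(out_dir, paste0("Rep", i, ".txt")))
}
```

Splitting once into a vector of single characters lets sample() work on it
directly, so substring() does not have to be called for every rep.  Wrapping
the loop in another loop over the input files would handle the 20 datasets.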





On Sun, Mar 30, 2008 at 3:41 PM, Suraaga Kulkarni
<suraaga.kulkarni at gmail.com> wrote:
> Hi,
>
> I need to resample characters from a dataset that consists of an extremely
> long string that is written over hundreds of thousands of lines, each of
> length 50 characters.  I am currently doing this by first inserting a space
> after each character in the dataset and then using the following commands:
>
> y <- as.matrix(read.table("data.txt"), stringsAsFactors=FALSE)
> bstrap <- sample(length(y), 100000, TRUE)
> write(y[bstrap], file="Rep1.txt", ncolumns=50, append=FALSE)
> bstrap <- sample(length(y), 100000, TRUE)
> write(y[bstrap], file="Rep2.txt", ncolumns=50, append=FALSE)
> bstrap <- sample(length(y), 100000, TRUE)
> .
> .
> .
> and so on for 500 reps.
>
>
> I think there should be a better way of doing this.  My specific questions:
>
> 1. Is there a way to avoid inserting spaces between the characters before
> calling the "sample" command (because I don't want spaces between the
> resampled characters in the output either; see number 2 below)?
>
> 2. If I have no choice but to insert the spaces in my data before
> resampling, is there a way to output the resampled data without spaces, but
> simply as 50-character long strings one below the other)?  I tried inserting
> the following command: strip.white=TRUE in the write command line, but it
> gave me an error as it did not understand the command.
>
> 3. Finally, since I have to get 500 such resampled reps from each dataset
> (and there are over 20 such huge datasets) is there a way around having to
> write a separate write command for each rep?
>
> Any suggestions will be greatly appreciated.
>
> Thanks,
>
> S.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


