[R] "Best" way to merge 300+ .5MB dataframes?

Grant Rettke grettke at acm.org
Tue Aug 12 15:48:11 CEST 2014


Thank you all kindly.
Grant Rettke | ACM, AMA, COG, IEEE
grettke at acm.org | http://www.wisdomandwonder.com/
“Wisdom begins in wonder.” --Socrates
((λ (x) (x x)) (λ (x) (x x)))
“Life has become immeasurably better since I have been forced to stop
taking it seriously.” --Thompson


On Tue, Aug 12, 2014 at 1:07 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Aug 11, 2014, at 8:01 PM, John McKown wrote:
>
>> On Mon, Aug 11, 2014 at 9:43 PM, Thomas Adams <tea3rd at gmail.com> wrote:
>>> Grant,
>>>
>>> Assuming all your filenames are something like file1.txt,
>>> file2.txt, file3.txt, ... and using the Mac OS X Terminal app (after you cd
>>> to the directory where your files are located):
>>>
>>> This will strip off the 1st lines, that is, your header lines:
>>>
>>> # BSD sed (OS X) needs the empty '' after -i; with GNU sed, drop it
>>> for file in *.txt;do
>>> sed -i '' '1d' "${file}";
>>> done
>>>
>>> Then, do this:
>>>
>>> cat *.txt > newfilename.txt
>>>
>>> Doing both should only take a few seconds, depending on your file sizes.
>>>
>>> Cheers!
>>> Tom
>>>
>>
>> Using sed hadn't occurred to me. I guess I'm just "awk-ward" <grin/>.
>> A slightly different way would be:
>>
>> for file in *.txt;do
>>  sed '1d' ${file}
>> done >newfilename.txt
>>
>> that way the original files are not modified.  But it strips out the
>> header on the 1st file as well. Not a big deal, but the read.table
>> will need to be changed to accommodate that. Also, it creates an
>> otherwise unnecessary intermediate file "newfilename.txt". To get the
>> 1st file's header, the script could:
>>
>> # file1.txt stands for whichever file comes first
>> head -1 file1.txt >newfilename.txt
>> for file in *.txt;do
>>   sed '1d' "${file}"
>> done >>newfilename.txt
>>
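The header-preserving variant above can be sketched end to end as follows. This is only a minimal sketch under stated assumptions: the sample files, their contents, and the output name merged.csv are hypothetical and not from the thread. The output deliberately gets a non-.txt name so that the *.txt glob cannot match it and feed the merged file back into itself.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample inputs; every file starts with the same header line.
printf 'id,value\n1,a\n2,b\n' > file1.txt
printf 'id,value\n3,c\n' > file2.txt

# Hypothetical output name; not *.txt, so the glob below never matches it.
out=merged.csv

# Expand the glob once and take the header from the first file only.
set -- *.txt
head -n 1 "$1" > "$out"

# Append the rows of every file, dropping each file's header line.
for file in "$@"; do
  sed '1d' "$file"
done >> "$out"

cat "$out"
```

The originals are left unmodified, and read.table can then read the merged file with header = TRUE.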
>> I really like having multiple answers to a given problem. Especially
>> since I have a poorly implemented version of "awk" on one of my
>> systems. It is the vendor's "awk" and conforms exactly to the POSIX
>> definition with no additions. So I don't have the FNR built-in
>> variable. Your implementation would work well on that system. Well, if
>> there were a version of R for it. It is a branded UNIX system which
>> was designed to be totally __and only__ POSIX compliant, with few
>> (maybe no) extensions at all. IOW, it stinks. No, it can't be
>> replaced. It is the z/OS system from IBM which is EBCDIC based and
>> runs on the "big iron" mainframe, system z.
>>
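For comparison, where an awk with the FNR variable is available, the same merge is commonly written as a one-liner. This is a hedged sketch, not necessarily the awk version posted earlier in the thread; the sample files and the output name merged.csv are hypothetical.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample inputs; every file starts with the same header line.
printf 'id,value\n1,a\n2,b\n' > file1.txt
printf 'id,value\n3,c\n' > file2.txt

# NR counts lines across all inputs; FNR restarts at 1 for each file.
# "NR == 1" keeps the very first line (the first file's header);
# "FNR > 1" keeps every non-header line of every file.
awk 'FNR > 1 || NR == 1' *.txt > merged.csv

cat merged.csv
```

As with the sed loop, the output name is kept outside the *.txt pattern so the glob does not pick up the merged file.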
>> --
>
> On the Mac the awk equivalent is gawk. Within R you would use `system()` possibly using paste0() to construct a string to send.
>
> --
>
>
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


