[R] Newbie wants to compare 2 huge RDSs row by row.

Marsh Hardy ARA/RISK mhardy at ara.com
Sun Jan 28 04:14:51 CET 2018

Cool, looks like that'd do it, almost as if converting an entire record to a character string and comparing strings.

If your two objects have class "data.frame" (look at class(objectName)) and they
both have the same number of columns and the same order of columns and the
column types match closely enough (use all.equal(x1, x2) for that), then you can try
which( rowSums( x1 != x2 ) > 0)
E.g.,
> x1 <- data.frame(X=1:5, Y=rep(c("A","B"),c(3,2)))
> x2 <- data.frame(X=c(1,2,-3,-4,5), Y=rep(c("A","B"),c(2,3)))
> x1
X Y
1 1 A
2 2 A
3 3 A
4 4 B
5 5 B
> x2
X Y
1  1 A
2  2 A
3 -3 B
4 -4 B
5  5 B
> which( rowSums( x1 != x2 ) > 0)
[1] 3 4

If you want to allow small numeric differences but exact character matches
you will have to get a bit fancier.  Splitting the data.frames into character and
numeric parts and comparing each works well.
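
A minimal sketch of that idea, assuming both data.frames have the same dimensions and column order. The helper name mismatched_rows and the tolerance argument tol are made up for illustration and are not from any package. Unlike the rowSums( x1 != x2 ) version, this one also treats an NA on only one side as a mismatch.

```r
# Hypothetical helper: row numbers where x1 and x2 differ.
# Numeric columns are compared within `tol`; all other columns must match exactly.
# Assumes same dimensions and same column order in both data.frames.
mismatched_rows <- function(x1, x2, tol = 1e-8) {
  stopifnot(is.data.frame(x1), is.data.frame(x2),
            identical(dim(x1), dim(x2)))
  bad <- rep(FALSE, nrow(x1))
  for (j in seq_along(x1)) {
    a <- x1[[j]]
    b <- x2[[j]]
    if (is.numeric(a) && is.numeric(b)) {
      # equal, or within tolerance (a == b also covers Inf vs Inf)
      differs <- !(a == b | abs(a - b) <= tol)
    } else {
      # as.character() so factor and character columns compare by label
      differs <- as.character(a) != as.character(b)
    }
    # NA on both sides counts as a match, NA on one side as a mismatch
    differs[is.na(a) & is.na(b)] <- FALSE
    differs[xor(is.na(a), is.na(b))] <- TRUE
    bad <- bad | differs
  }
  which(bad)
}

x1 <- data.frame(X = 1:5, Y = rep(c("A", "B"), c(3, 2)))
x2 <- data.frame(X = c(1, 2, -3, -4, 5), Y = rep(c("A", "B"), c(2, 3)))
res <- mismatched_rows(x1, x2)   # rows 3 and 4, as in the example above
```

Looping over columns instead of building one big logical matrix keeps memory use modest, which may matter with ~95 columns and ~370,000 rows.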

Hi Guys, I apologize for my rank & utter newness at R.

I used summary() and found about 95 variables, both character and numeric, all with "Length:368842", which I assume is the # of records.

I'd like to know the record number (row #?) of any record where the data doesn't match in the 2 files of what should be the same output.

Also, it will be easier to provide helpful information if you'd describe what in your data you want to compare and what you hope to get out of the comparison.

Best wishes,
Ulrik

Hi Marsh,
An RDS is not a data structure such as a data.frame. It can be anything.
For example if I want to save my objects a, b, c I could do:
> saveRDS( list(a,b,c), file="tmp.RDS")
Then read them back later with
> myList <- readRDS( "tmp.RDS" )
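
So before comparing anything, it is worth checking what readRDS() actually returned. A small sketch of that check follows; the objects a, b, c and the temporary file are placeholders for illustration only.

```r
# Placeholder objects standing in for whatever was saved
a <- 1:3
b <- letters[1:2]
c <- data.frame(X = 1:2)

tmp <- tempfile(fileext = ".RDS")
saveRDS(list(a, b, c), file = tmp)

obj <- readRDS(tmp)
class(obj)                # "list" here, not "data.frame"
is.data.frame(obj)        # FALSE
str(obj, max.level = 1)   # one-line overview of each element
is.data.frame(obj[[3]])   # TRUE: only the third element is a data.frame
```

If class() reports "data.frame" for both of your files, the row-by-row comparison applies directly; otherwise you first need to pull out the pieces you want to compare.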

Eric

> Each RDS is 40 MBs. What's a slick code to compare them row by row, IDing
> row numbers with mismatches?
