[R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

Fri May 3 22:57:09 CEST 2024

On Fri, 3 May 2024 18:17:52 +0200
Maciej Nasinski <nasinski.maciej using gmail.com> wrote:

> I found the https://github.com/hrbrmstr/rdaradar solution and ran it
> on the 100 most downloaded R packages.
> Happily, all data/inst rda files are safe/non-exposed to RDS exploit
> (using the linked solution).

This is somewhat useful - knowing that there are no obvious exploits in
the 100 most downloaded CRAN packages is better than not knowing that -
but it is important to keep the big picture in mind. Bob himself said
that the script is "super basic". Currently, it only checks whether an
*.rda file, when loaded in the global environment, would shadow certain
important functions. This is not an attack a package author would
perform; this is something one would send directly to the victim.
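
To make the scope of such a check concrete, here is a minimal sketch of
the idea as I understand it. It is not Bob's actual code: the helper
name shadowed_names() and the choice of namespaces to compare against
are my own assumptions. It loads the file into a throwaway environment
and reports function objects whose names collide with exported names.
Note that inspecting the objects touches whatever was loaded, so only
run something like this in a disposable session on a patched R.

```r
# Hypothetical helper (not part of rdaradar): report objects in an .rda
# file that are functions AND share a name with something exported by
# the given namespaces, i.e. would mask it if loaded into globalenv().
shadowed_names <- function(rda_path, pkgs = c("base", "stats", "utils")) {
  e <- new.env(parent = emptyenv())
  objs <- load(rda_path, envir = e)  # load() returns the object names
  is_fun <- vapply(
    objs,
    function(n) is.function(get(n, envir = e, inherits = FALSE)),
    logical(1)
  )
  known <- unique(unlist(lapply(pkgs, getNamespaceExports)))
  objs[is_fun & objs %in% known]
}

# Usage: build a small test file that masks paste(), then inspect it.
demo_file <- tempfile(fileext = ".rda")
local({
  paste <- function(...) "not the real paste"
  harmless <- 1:3
  save(paste, harmless, file = demo_file)
})
print(shadowed_names(demo_file))
```

A function stored under a name that does not collide with anything
(or stored deeper inside another object) passes this check untouched.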

In order to defeat an attacker, you must think like an attacker.

Here's someone jokingly describing how they would trojan the world's
online shop checkout systems if they wanted to commit financial crimes:
https://archive.ph/FCdBu
(With kindness and pull requests.)

Here's someone spending two years to plant a fake maintainer with a
backdoor in a key free software project:
https://lwn.net/Articles/967192/
(The backdoor was assembled from obfuscated "test files for the
decompressor".)

Here's the 2015 Underhanded C Contest, where people competed in writing
the most harmless-looking code that would instead do something
nefarious: http://www.underhanded-c.org/

On the one hand, hiding the bad functions in a data file (which is
compressed and binary) instead of the R files (which are plain text and
indexed everywhere) would be the obvious first step, so it may be
useful to flag data files with functions in them for human review.
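
A rough sketch of such a flagging pass over an unpacked package
directory could look like the following. The helper name
flag_data_files() and the file-name pattern are assumptions of mine,
and it only looks at top-level objects in load()-able files. Like any
approach that actually loads the files, it belongs in a sandbox.

```r
# Hypothetical helper: list .rda/.RData files under `dir` whose
# top-level objects include at least one function.
flag_data_files <- function(dir) {
  files <- list.files(dir, pattern = "\\.(rda|rdata)$", recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)
  has_fun <- vapply(files, function(f) {
    e <- new.env(parent = emptyenv())
    load(f, envir = e)
    any(vapply(as.list(e, all.names = TRUE), is.function, logical(1)))
  }, logical(1))
  files[has_fun]  # candidates for human review, not a verdict
}
```

Called as flag_data_files("path/to/unpacked/pkg"), it returns the paths
that deserve a closer look. Plenty of legitimate packages store model
objects with functions inside, so expect false positives.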

On the other hand, an evil package author has so many tools at their
disposal that they may not need this one in particular. There are CRAN
packages with tens of megabytes of compiled code inside. Sneaking a
little extra something in a file starting with "// This is generated
grammar parser. Do not edit!" followed by an impenetrable wall of C
could be easier and stay undetected for longer. How many packages use
Java? You don't even have to ship the Java source together with an R
package, so one of your *.jars could have a poisoned dependency with
nobody being the wiser.

Attackers are very cunning, and we don't even know what exactly we are
looking for. We can automate some of it, but the kind of code review
that will spot an evil function tucked 50 layers inside a giant
auxiliary data object is a lot of effort, hours to days per package.
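
The automatable part can be sketched as a recursive walk over an
already loaded object. The helper find_functions() below is
hypothetical and deliberately incomplete: it descends through list
elements and attributes only. Environments, S4 slots, language objects
and the enclosing environments of closures are not covered, and each
of those is another place to hide things.

```r
# Hypothetical helper: return the "paths" of all functions reachable
# from `x` through list elements and attributes.
find_functions <- function(x, path = "obj") {
  if (is.function(x)) return(path)
  hits <- character(0)
  if (is.list(x)) {
    items <- unclass(x)  # avoid S3 dispatch on untrusted classes
    nms <- names(items)
    for (i in seq_along(items)) {
      label <- if (!is.null(nms) && nzchar(nms[i])) nms[i] else as.character(i)
      hits <- c(hits,
                find_functions(.subset2(items, i), paste0(path, "$", label)))
    }
  }
  attrs <- attributes(x)
  attrs$names <- NULL  # names were already used as labels above
  for (a in names(attrs)) {
    hits <- c(hits,
              find_functions(attrs[[a]], paste0("attr(", path, ", \"", a, "\")")))
  }
  hits
}

# Usage: a function buried three lists deep, plus one in an attribute.
obj <- list(a = 1, b = list(c = list(d = function() NULL)))
attr(obj, "hook") <- function() NULL
print(find_functions(obj))
```

This only tells you where the functions are. Deciding whether any of
them is evil is still the part that takes a human hours to days.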

> It will be great to run it on all CRAN packages, but I imagine we
> should be sure that the check is decent enough to not overload the
> servers without a need.

This probably counts as creating an unofficial CRAN mirror:
https://cran.r-project.org/mirror-howto.html

(I remember someone sending too many requests to download packages one
by one and losing access to CRAN from a university address as a result.)

You'll need 12.7 GB for the current versions of the packages or >400 GB
for the whole archive.

-- 
Best regards,
Ivan