[R-pkg-devel] Possible malware(?) in a vignette

Simon Urbanek @|mon@urb@nek @end|ng |rom R-project@org
Sat Jan 27 15:35:13 CET 2024


First, let's take a step back, because I think there is way too much confusion here.

The original report was about the vignette from the poweRlaw package version 0.70.6. That package contains a vignette file d_jss_paper.pdf with the SHA256 hash 9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9 (md5 e0439db551e1d34e9bf8713fca27887b). This is the same file that would have been available for download from the web view until the new version was published. I assume we are talking about the same file, since Iñaki's VirusTotal URL has exactly the same hash, i.e., the web view and the package are identical (I also checked the other hashes just to be really sure). That's why I think we're barking up the wrong tree here: this is not about cache poisoning, file swaps or anything like that. The file has never been modified; it is the same file that was submitted to CRAN in 2020.
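For anyone who wants to repeat the comparison on their own copy, here is a minimal sketch in Python. The two expected values are the hashes quoted above; the function names and the local file path are only illustrative, and this is not necessarily how I ran the check.

```python
import hashlib

# Hashes quoted in this message for d_jss_paper.pdf (poweRlaw 0.70.6).
EXPECTED_SHA256 = "9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9"
EXPECTED_MD5 = "e0439db551e1d34e9bf8713fca27887b"


def file_digests(path, chunk_size=1 << 20):
    """Return (sha256_hex, md5_hex) of a file, reading it in chunks."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    return sha256.hexdigest(), md5.hexdigest()


def matches_reported(path):
    """True if the file at `path` has both hashes quoted in this message."""
    return file_digests(path) == (EXPECTED_SHA256, EXPECTED_MD5)
```

Calling `matches_reported("d_jss_paper.pdf")` on a local copy (the path is hypothetical) tells you whether you are looking at the same bytes that VirusTotal was given.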

That's why I was saying that this most likely has nothing to do with CRAN at all; rather, the question is whether that old file has contained malware for the last 4 years or whether the AV software is simply misclassifying it (a false positive). I'm not a security expert, but based on the little information available and inspection of the streams I came to the conclusion that it's likely a false positive. The main reason is that submitting the *identical* PDF payload with just a one-byte change to the /ID (which is functionally not used by Acrobat) results in the file NOT being flagged as malicious by any of the security vendors on VirusTotal. That said, I may be wrong or missing something, which is why I was asking for someone with more expertise to actually look at the file as opposed to just trusting auto-generated reports that may be wrong. But that is beyond my power.

(Also, if it turns out that the file did contain malware, it would be good to know what we can do. For example, nowadays we are re-compressing streams and/or filtering through Ghostscript (GS), so one could imagine that this could also be effective at removing PDF malware, if it is real.)

More responses inline.


> On Jan 28, 2024, at 1:10 AM, Bob Rudis <bob using rud.is> wrote:
> 
> Simon: Is there a historical record of the hashes of just the PDFs
> that show up in the CRAN web view?
> 

Not for the website, but hashes are recorded in the packages, so you can verify that the file has not changed for years (I can directly confirm it has not changed as far back as May 2021).
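As a rough sketch of such a check in Python: the code below assumes an unpacked source package with a top-level file named MD5 whose lines look like `<md5> *<relative/path>`. That layout, the helper names and the example path are my assumptions here, so adjust them to what you actually find in the tarball.

```python
import hashlib
from pathlib import Path


def parse_md5_manifest(text):
    """Parse manifest lines of the form '<md5> *<path>' into {path: md5}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, name = line.partition(" ")
        # Drop the separator characters (spaces and '*') before the path.
        entries[name.lstrip(" *")] = digest.lower()
    return entries


def verify_package_file(pkg_dir, rel_path):
    """True if rel_path inside pkg_dir matches the hash recorded in pkg_dir/MD5."""
    pkg_dir = Path(pkg_dir)
    manifest = parse_md5_manifest((pkg_dir / "MD5").read_text())
    recorded = manifest.get(rel_path)
    if recorded is None:
        return False  # the file is not listed in the manifest
    actual = hashlib.md5((pkg_dir / rel_path).read_bytes()).hexdigest()
    return actual == recorded
```

For example, `verify_package_file("poweRlaw", "inst/doc/d_jss_paper.pdf")` (both paths hypothetical) compares the vignette on disk against the hash shipped with the package.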


> Ivan: do you know what mirror NOAA used at that time to get that version of
> the package? Or, did they pull it "directly" from cran.r-project.org
> (scare-quotes only b/c DNS spoofing is and has been a pretty solid attack
> vector)?
> 
> I've asked the infosec community if anyone has VT Enterprise to do a
> historical search on any PDFs that come directly from cran.r-project.org (I
> don't have VT Enterprise). It is possible there are other PDFs from that
> timeframe with similar issues (again, not saying CRAN had any issues; this
> could still be crawler cache poisoning).
> 
> I don't know if any university folks have grad student labor to harness,
> but having a few of them do some archive.org searches for other PDFs in
> that timeframe, and note the source of the archive (likely Common Crawl) if
> there are other real issues, that'd be a solid path forward for triage.
> 
> The fact that the current PDF on CRAN — which uses some of the same
> 7-year-old PDF & JPEG images from —
> https://github.com/csgillespie/poweRlaw/tree/main/vignettes — is not being
> flagged, means it's likely not an issue with Colin's sources.
> 
> Simon: it might be a good idea for all *.r-project.org sites to set up CAA
> records (
> https://en.wikipedia.org/wiki/DNS_Certification_Authority_Authorization)
> since that could help prevent adjacent TLS spoofing.
> 
> Also having something running (https://github.com/SSLMate/certspotter)
> can let y'all know if certs are created for *.r-project.org domains. That
> won't help for well-resourced attacks, but it does add some layers that may
> give a heads-up for any mid-grade spoofing attacks.
> 


All well meant, but remember that CRAN is mirrored worldwide, and we have control pretty much only over the WU master. That said, we can have a look, but DNS changes are not as easy as you would think.

Cheers,
Simon


