[R-meta] Importing Correlations from PDF to table format

Kiet Huynh kietduchuynh using gmail.com
Wed Mar 2 04:23:19 CET 2022

Hi Wolfgang,

Thank you for your recommendation. Using the tabulizer package together with the rcalc() function has done exactly what I was hoping for.
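
For anyone reading along who wants to see the reshaping step on its own, here is a minimal base R sketch of turning a correlation matrix into one row per pair of variables. It is not what rcalc() does internally and it does not compute the var-cov matrix. The matrix values, the variable names x1 to x3, and the helper name cor_to_long are all made up for illustration.

```r
# Hypothetical 3 x 3 correlation matrix (made-up values and variable names).
R <- matrix(c(1.0, 0.3, 0.5,
              0.3, 1.0, 0.6,
              0.5, 0.6, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("x1", "x2", "x3"), c("x1", "x2", "x3")))

# Hypothetical helper: one row per unique pair, taken from the lower triangle.
cor_to_long <- function(R) {
  stopifnot(is.matrix(R), nrow(R) == ncol(R))
  # (row, col) positions of the cells below the diagonal
  idx <- which(lower.tri(R), arr.ind = TRUE)
  data.frame(var1 = colnames(R)[idx[, "col"]],
             var2 = rownames(R)[idx[, "row"]],
             r    = R[idx],
             stringsAsFactors = FALSE)
}

long <- cor_to_long(R)
long
```

This only reads the lower triangle, so it assumes the matrix is symmetric; a quick isSymmetric(R) check beforehand would catch extraction errors where the two triangles disagree.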

I found the tabulizer package to be much more accurate than the pdftools package. The tabulizer package is mostly accurate, but sometimes it struggles with correctly identifying negative numbers in the correlation table. So I still have to do some data cleaning in R to fix incorrect values. Despite these issues, my process for coding meta-analysis is much more efficient and accurate now. 
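
In case it is useful to others, here is a rough sketch of the kind of clean-up I mean. It assumes the extracted cells arrive as character strings and that the negative signs go wrong because the PDF uses a Unicode minus or en dash, or puts a space after the sign. That is only a guess at the cause and may not match every PDF. The helper name clean_cor and the example strings are made up.

```r
# Hypothetical helper: convert extracted character cells to numeric correlations.
clean_cor <- function(x) {
  x <- gsub("[\u2212\u2013]", "-", x)  # Unicode minus and en dash -> ASCII hyphen
  x <- gsub("\\s+", "", x)             # drop stray whitespace, e.g. "- .12"
  x <- gsub("[^0-9.-]", "", x)         # drop significance stars and other symbols
  suppressWarnings(as.numeric(x))      # anything still unparseable becomes NA
}

# Made-up examples of cells as they might come out of a PDF table.
raw <- c("\u2212.23", "- .12", ".45**", "1", "")
cleaned <- clean_cor(raw)
cleaned
```

Cells that come back as NA, or values outside the range -1 to 1, are worth checking against the PDF by hand.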

Big thanks to you and James for your help!


> On Feb 28, 2022, at 11:04 AM, Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> Hi Kiet,
> The rcalc() function from metafor could be used for this. It even computes the var-cov matrix of the elements in the correlation matrix for you:
> library(metafor)
> R <- matrix(c(1, .3, .5, .3, 1, .6, .5, .6, 1), 3, 3)
> R
> rcalc(R, ni=50)
> Best,
> Wolfgang
>> -----Original Message-----
>> From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On
>> Behalf Of Kiet Huynh
>> Sent: Monday, 28 February, 2022 19:45
>> To: James Pustejovsky
>> Cc: R meta
>> Subject: Re: [R-meta] Importing Correlations from PDF to table format
>> Hi James,
>> Thank you for recommending these helpful packages. I was able to import the pdf
>> correlation table into a dataframe format in R. Are you aware of any R code that
>> could convert that correlation matrix dataframe into a meta-analysis type
>> dataframe (i.e., a column for variable 1, a column for variable 2, and a column
>> for correlation effect size)?
>> Best,
>> Kiet
>>> On Feb 25, 2022, at 10:58 AM, James Pustejovsky <jepusto using gmail.com> wrote:
>>> The pdftools package might be helpful:
>>> https://github.com/ropensci/pdftools
>>> It has very low-level utilities for extracting text from pdf. You'd still have
>>> to do some data clean-up to get the correlations into the form needed for
>>> analysis.
>>> The tabulizer package is meant to provide tools customized for working with pdf
>>> tables:
>>> https://github.com/ropensci/tabulizer
>>> But it requires Java and it appears to be archived on CRAN. I'm not sure what
>>> its development status is. Caveat emptor, I guess.
>>> James
>>> On Fri, Feb 25, 2022 at 12:20 PM Kiet Huynh <kietduchuynh using gmail.com> wrote:
>>>> Hello,
>>>> I was wondering if anyone knows of a way to automate in R (or any software) the
>>>> process of importing correlation values from PDF to usable data in a table format
>>>> that can be used in meta-analysis? My process has been to copy the correlations
>>>> manually one-by-one from the PDF to excel (which takes a lifetime!), and then
>>>> import the excel data into R. I'm sure there must be a better, faster, and less
>>>> error-prone way to do this.
>>> Thank you,
>>> Kiet
>>> ----
>>> Kiet D. Huynh, Ph.D.
>>> Pronouns: he/him
>>> CLEAR Goldblum-Carr Postdoctoral Fellow
>>> Palo Alto University
>>> 1791 Arastradero Rd.
>>> Palo Alto, CA 94304
