[R] Combine recursive lists in a single list or data frame and write it to file

Ek Esawi e@@wiek @ending from gm@il@com
Thu Dec 20 02:13:10 CET 2018


Hi All,

I am using the R tabulizer package to extract tables from PDF files.
The output is a set of lists of matrices. Besides the tables, the package
extracts a lot of extra text that is nearly impossible to clean with
regular expressions, so I want to clean it manually.
To do so I need to (1) combine all the lists into a single list or data
frame and (2) write that single object to a text file that I can edit.
I could not figure out how to do this.

I tried something like this, but it did not work:
lapply(MyTables, function(x)
lapply(x,write.table(file="temp.txt",append = TRUE)))
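
One possible way to do steps (1) and (2), given as a minimal sketch and not as a tested solution for these PDFs: flatten the two-level list with unlist(recursive = FALSE), then append each matrix to the same file with write.table(). The attempt above fails because write.table(file = ..., append = TRUE) is evaluated immediately, without any data, instead of being passed to lapply() as a function. The small MyTables object and the temporary file name below are made-up stand-ins for the real extract_tables() output.

```r
# Hypothetical stand-in for the real output: a list (one entry per PDF)
# of lists of character matrices with differing numbers of columns.
MyTables <- list(
  list(matrix(c("a", "b", "c", "d"), nrow = 2),
       matrix(c("e", "f", "g"), nrow = 1)),
  list(matrix(c("h", "i"), nrow = 2))
)

# (1) Flatten one level: a single list of matrices.
AllTables <- unlist(MyTables, recursive = FALSE)

# (2) Append every matrix to one tab-separated text file, with a
# marker line in front of each so the tables can be told apart.
OutFile <- tempfile(fileext = ".txt")   # assumption: use e.g. "temp.txt" in practice
for (k in seq_along(AllTables)) {
  cat("### table", k, "\n", file = OutFile, append = TRUE)
  write.table(AllTables[[k]], file = OutFile, append = TRUE, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}
```

A single list is used here instead of one data frame because the matrices have different numbers of columns, so they cannot simply be row-bound.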

 Any help is greatly appreciated.

 Here is my code:

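install.packages("rJava");     library(rJava)
install.packages("tabulizer"); library(tabulizer)
MyPath <- "C:/Users/name/Documents/tEMP"
# Extract all tables from every PDF in Path. If CalOrd == "Yes", order the
# results by the month name at the start of each file name.
ExtTable <- function(Path, CalOrd) {
  # Escape the dot and anchor at the end so only *.pdf / *.PDF files match
  FileNames <- dir(Path, pattern = "\\.(pdf|PDF)$", full.names = TRUE)
  MyFiles <- lapply(FileNames, function(i) extract_tables(i, method = "stream"))
  if (CalOrd == "Yes") {
    # Keep the first word of each file name and match it to a month
    MyOFiles <- gsub("(\\s.*)|(\\.(pdf|PDF)$)", "", basename(FileNames))
    MyOFiles <- match(MyOFiles, month.name)
    MyFiles <- MyFiles[order(MyOFiles)]
  }
  MyFiles
}
MyTables <- ExtTable(Path = MyPath, CalOrd = "No")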
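
To illustrate what the CalOrd = "Yes" branch is meant to do, here is a small sketch on made-up file names (the names are hypothetical, not my actual files). The first word of each name is matched against month.name and the positions are used to put the files in calendar order. The pattern escapes the dot and anchors the extension, which is slightly stricter than a bare ".pdf".

```r
# Hypothetical file names; the real ones are not shown in this post.
FileNames <- c("C:/tmp/March 2018.pdf", "C:/tmp/January 2018.PDF",
               "C:/tmp/February.pdf")

# Drop everything from the first whitespace on, or else the extension.
Months <- gsub("(\\s.*)|(\\.(pdf|PDF)$)", "", basename(FileNames))

# Position of each name in the calendar (NA if it is not a month name).
MonthNo <- match(Months, month.name)

# File names rearranged into calendar order.
Ordered <- FileNames[order(MonthNo)]
```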

Here is a cleaned portion of the output. The whole output consists of 3
lists, which contain 12, 15, and 12 sub-lists respectively.

[[2]][[2]]
      [,1]        [,2]    [,3]    [,4]  [,5]    [,6]    [,7]    [,8]    [,9]  [,10]
 [1,] ""          "Avg."  "+_ lo" "n"   "Med."  ""      "Avg."  "+_ lo" "n"   "Med."
 [2,] "SiOz"      "44.0"  "1.26"  "375" "44.1"  "Nb"    "4.8"   "6.3"   "58"  "2.7"
 [3,] "T i O  2"  "0.09"  "0.09"  "561" "0.09"  "Mo(b)" "50"    "30"    "3"   "35"
 [4,] "A1203"     "2.27"  "1.10"  "375" "2.20"  "Ru(b)" "12.4"  "4.1"   "3"   "12"
 [5,] "FeO total" "8.43"  "1.14"  "375" "8.19"  "Pd(b)" "3.9"   "2.1"   "19"  "4.1"
 [6,] "MnO"       "0.14"  "0.03"  "366" "0.14"  "Ag(b)" "6.8"   "8.3"   "17"  "4.8"
 [7,] "MgO"       "41.4"  "3.00"  "375" "41.2"  "Cd(b)" "41"    "14"    "16"  "37"
 [8,] "CaO"       "2.15"  "1.11"  "374" "2.20"  "In(b)" "12"    "4"     "19"  "12"
 [9,] "Na20"      "0.24"  "0.16"  "341" "0.21"  "Sn(b)" "54"    "31"    "6"   "36"
[10,] "K20"       "0.054" "0.11"  "330" "0.028" "Sb(b)" "3.9"   "3.9"   "11"  "3.2"
[11,] "P205"      "0.056" "0.11"  "233" "0.030" "Te(b)" "11"    "4"     "18"  "10"
[12,] "Total"     "98.88" ""      ""    "98.43" "Cs(b)" "10"    "16"    "17"  "1.5"
[13,] ""          ""      ""      ""    ""      "Ba"    "33"    "52"    "75"  "17"
[14,] "Mg-value"  "89.8"  "1.1"   "375" "90.0"  "La"    "2.60"  "5.70"  "208" "0.77"
[15,] "Ca/AI"     "1.28"  "1.6"   "374" "1.35"  "Ce"    "6.29"  "11.7"  "197" "2.08"
[16,] "AI/Ti"     "22"    "29"    "361" "22"    "Pr"    "0.56"  "0.87"  "40"  "0.21"
[17,] "F e / M n" "60"    "10"    "366" "59"    "Nd"    "2.67"  "4.31"  "162" "1.52"
[18,] ""          ""      ""      ""    ""      "Sm"    "0.47"  "0.69"  "214" "0.25"
[19,] "Li"        "1.5"   "0.3"   "6"   "1.5"   "Eu"    "0.16"  "0.21"  "201" "0.097"
[20,] "B"         "0.53"  "0.07"  "6"   "0.55"  "Gd"    "0.60"  "0.83"  "67"  "0.31"
[21,] "C"         "110"   "50"    "13"  "93"    "Tb"    "0.070" "0.064" "146" "0.056"
[22,] "F"         "88"    "71"    "15"  "100"   "Dy"    "0.51"  "0.35"  "58"  "0.47"
[23,] "S"         "157"   "77"    "22"  "152"   "Ho"    "0.12"  "0.14"  "54"  "0.090"
[24,] "C1"        "53"    "45"    "15"  "75"    "Er"    "0.30"  "0.22"  "52"  "0.28"
[25,] "Sc"        "12.2"  "6.4"   "220" "12.0"  "Tm"    "0.038" "0.026" "40"  "0.035"
[26,] "V"         "56"    "21"    "132" "53"    "Yb"    "0.26"  "0.14"  "201" "0.27"
[27,] "Cr"        "2690"  "705"   "325" "2690"  "Lu"    "0.043" "0.023" "172" "0.045"
[28,] "Co"        "112"   "10"    "166" "111"   "Hf"    "0.27"  "0.30"  "71"  "0.17"
[29,] "Ni"        "2160"  "304"   "308" "2140"  "Ta"    "0.40"  "0.51"  "38"  "0.23"
[30,] "Cu"        "11"    "9"     "94"  "9"     "W(b)"  "7.2"   "5.2"   "6"   "4.0"
[31,] "Zn"        "65"    "20"    "129" "60"    "Re(b)" "0.13"  "0.11"  "18"  "0.09"
[32,] "Ga"        "2.4"   "1.3"   "49"  "2.4"   "Os(b)" "4.0"   "1.8"   "18"  "3.7"
[33,] "Ge"        "0.96"  "0.19"  "19"  "0.92"  "Ir(b)" "3.7"   "0.9"   "34"  "3.0"
[34,] "As"        "0.11"  "0.07"  "7"   "0.10"  "Pt(b)" "7"     "-"     "1"   "-"
[35,] "Se"        "0.041" "0.056" "18"  "0.025" "Au(b)" "0.65"  "0.53"  "30"  "0.5"
[36,] "Br"        "0.01"  "0.01"  "6"   "0.01"  "Tl(b)" "1.2"   "1.0"   "13"  "0.9"
[37,] "Rb"        "1,9"   "4.8"   "97"  "0.38"  "Pb"    "0.16"  "0.11"  "17"  "0.16"
[38,] "Sr"        "49"    "60"    "110" "20"    "Bi(b)" "1.7"   "0.7"   "13"  "1.6"
[39,] "Y"         "4.4"   "5.5"   "86"  "3.1"   "Th*"   "0.71"  "1.2"   "71"  "0.22"
[40,] "Zr"        "21"    "42"    "82"  "8.0"   "U"     "0.12"  "0.23"  "48"  "0.040"

[[2]][[4]]
      [,1]       [,2]                 [,3]     [,4]                  [,5]     [,6]
 [1,] ""         "Spinel peridotites" ""       "Garnet  peridotites" ""       "Primitive"
 [2,] ""         "Avg. Meal."         "M-A sp" "M-A gt B-M"          "Jordan" "mantle"
 [3,] "SiO 2"    "44.0 44.1"          "44.15"  "44.99 45.00"         "45.55"  "44.8"
 [4,] "TiO 2"    "0.09 0.09"          "0.07"   "0.06 0.08"           "0.11"   "0.21"
 [5,] "A1203"    "2.27 2.20"          "1.96"   "1.40 1.31"           "1.43"   "4.45"
 [6,] "Cr203"    "0.39 0.39"          "0.44"   "0.32 0.38"           "0.34"   "0.43"
 [7,] "FeOtotal" "8.43 8.19"          "8.28"   "7.89 6.97"           "7.61"   "8.40"
 [8,] "Mn O"     "0.14 0.14"          "0.12"   "0.11 0.13"           "0.11"   "0.14"
 [9,] "MgO"      "41.4 41.2"          "42.25"  "42.60 44.86"         "43.55"  "37.2"
[10,] "NiO"      "0.27 0.27"          "0.27"   "0.26 0.29"           "-"      "0.24"
[11,] "CaO"      "2.15 2.20"          "2.08"   "0.82 0.77"           "1.05"   "3.60"
[12,] "Na  20"   "0.24 0.21"          "0.18"   "0.11 0.09"           "0.14"   "0.34"
[13,] "K 2 0"    "0.054 0.028"        "0.05"   "0.04 0.10"           "0.11"   "0.028"
[14,] "P205"     "0.056 0.030"        "0.02"   "- 0.01"              "-"      "0.022"
[15,] "Total"    "99.49 99.05"        "99.87"  "98.60 100.00"        "100.00" "99.86"
[16,] "Mg-value" "89.8 90.0"          "90.1"   "90.6 92.0"           "91.1"   "88.8"
[17,] "olivine"  "62 63"              "67"     "65 68"               "66"     "56 57"
[18,] "opx"      "24 24"              "22"     "28 25"               "28"     "22 17"
[19,] "cpx"      "12 11"              "9"      "3 2"                 "3"      "19 10"
[20,] "spinel"   "2 2"                "2"      "- -"                 "-"      "3 -"

Here is a portion of the output of str(MyTables):

str(MyTables)

List of 3
 $ :List of 12
  ..$ : chr [1:3, 1:2] "south of the artificial lake Lokka. Intrusive complexes" "of alkaline rocks are found at Sokli (phosphorite-bear-" "ing and a possible Nb-occurrence) in Finland, and at" "(Eriksson, 1992). During this period, Northern Europe" ...
  ..$ : chr [1:55, 1:15] "Element" "Ag" "Al" "Al_XRF" ...
  ..$ : chr [1:56, 1:2] "in the till is mainly of local origin, although some cob-" "bles and boulders may have been transported over sev-" "eral kilometres. The moraine formations in the study" "area are mostly gravelly and sandy tills, locally hum-" ...
  ..$ : chr [1:53, 1:2] "requisites. PCA accounts for maximum variance of all" "variables, while FA is based on the correlation structure" "of the variables. The model of factor analysis allows that" "the common factors do not explain the total variation of" ...
  ..$ : chr [1:54, 1:7] "lished examples of the use of factor analysis, it is neglec-" "ted that regional geochemical (and environmental) data" "almost never follow a normal distribution. Continuing Method" "with factor analysis in such a case must lead to biased" ...
  ..$ : chr [1:16, 1:2] "shows the factor loadings of the different variables" "entering each factor. Names of variables with an abso-" "lute value of the loadings <0.3 are not plotted. Fig. 5" "shows 8 results of factor analyses using a selection of all" ...
  ..$ : chr [1:21, 1:2] "pretable results, notwithstanding the fact that on the" "basis of the foregoing discussion it should probably not" "be used with these data. Do these results warrant the use" "of a quite work-intensive method? Unfortunately not," ...
  ..$ : chr [1:55, 1:8] "" "Ag" "Al" "Al_XRF" ...
  ..$ : chr [1:23, 1:2] "addition, geochemical reasoning (e.g. geochemical asso-" "ciations and/or pathfinder elements for different types of" "ore deposits) was used to select further sub-sets of vari-" "ables. In geochemistry, the selection of elements entered" ...
  ..$ : chr [1:55, 1:2] "Fig. 10C cuts several geological units, and is most likely" "indicative of alteration processes related to a deep-" "seated fault. It was revealed again in a factor analysis" "carried out with all those elements extracted by aqua" ...
  ..$ : chr [1:50, 1:2] "well justified in stating that it is not very scientific to" "play with the selection of elements and number of fac-" "tors extracted until one ‘‘finds’’ an ‘‘interesting’’ result." "On the other hand, even all the different results pre-" ...
  ..$ : chr [1:24, 1:2] "Niemelä, J., Ekman, I., Lukashov, A. (Eds.), 1993. Quaternary" "Deposits of Finland and Northwestern Part of Russian Fed-" "eration and Their Resources 1:1,000,000. Geological Survey" "of Finland, Espoo, Finland." ...
 $ :List of 15


