[R] How to create a vector by searching information in multiple data.tables in r?

Ivan Krylov
Fri Jan 31 21:01:29 CET 2020


On Fri, 31 Jan 2020 18:06:00 +0000
Ioanna Ioannou <ii54250 using msn.com> wrote:

> I want to extract e.g., the country from all these files. How can i
> add NA for the files for which the country is not mentioned?

I am starting from the beginning, since I don't know what you have
tried and where exactly you are stuck.

> A<- data.frame( name1 = c('fields', 'fields', 'fields'),
>                 name2 = c('category', 'asset', 'country'),
>                 value = c('Structure Class', 'Building', 'Colombia'))

Given one such data frame, we can use logical vector subscripts to
extract the 'country' field. The following command returns a logical
vector:

A[, 'name2'] == 'country'
# [1] FALSE FALSE  TRUE

If we pass it to the subscript operator (type ?'[' in the R prompt for
more info), we can get the matching rows of the data frame:

subs <- A[, 'name2'] == 'country'
A[subs, ]
#    name1   name2    value
# 3 fields country Colombia

Okay, now we just need to choose the correct column:

A[subs, 'value']
# [1] Colombia
# Levels: Building Colombia Structure Class

What happens if there is no "country" row?

C[C[, 'name2'] == 'country', 'value']
# factor(0)
# Levels: Building Fragility Structure Class

We get a 0-length vector instead of the NA we want. The length()
function and the `if` control-flow construct should let us test for
0-length vectors (see ?length and ?'if'):

x <- C[C[,'name2'] == 'country','value']
if (length(x) == 1) x else NA
# [1] NA

Bonus question: what happens if there is more than one "country" line
in the data frame? What should happen instead?
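
One way to explore the bonus question is sketched below. The data frame
D and the helper first_or_na() are made up for illustration (neither
comes from your files), and "warn and keep the first match" is only one
possible policy; stopping with an error may suit your data better.

```r
# D is a hypothetical data frame with two "country" rows
D <- data.frame(
  name1 = c('fields', 'fields', 'fields'),
  name2 = c('country', 'asset', 'country'),
  value = c('Colombia', 'Building', 'Greece'),
  stringsAsFactors = FALSE
)

x <- D[D[, 'name2'] == 'country', 'value']
length(x)
# [1] 2

# the length-1 test silently gives NA here, hiding the duplicate
res <- if (length(x) == 1) x else NA

# hypothetical alternative: warn about duplicates, keep the first match,
# and still return NA when nothing matches
first_or_na <- function(x) {
  if (length(x) > 1) warning("more than one match; using the first")
  if (length(x) >= 1) x[1] else NA
}
```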

See also:
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Index-vectors

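Note that the "value" column is a factor (that's why we are getting
these "Levels:" when we print the vectors; see ?factor). In R versions
before 4.0.0, data.frame() converts strings to factors unless told
otherwise with stringsAsFactors = FALSE. You want a character vector,
so we will coerce the value to the desired type using the
as.character() function.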
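
A small sketch of why the coercion matters: a factor stores integer
codes that point into its levels, so converting it to anything other
than character gives the code, not the label. The vector f below is
built by hand from the values in A purely for illustration.

```r
# build a factor by hand; levels are sorted alphabetically:
# "Building", "Colombia", "Structure Class"
f <- factor(c('Structure Class', 'Building', 'Colombia'))
country <- f[3]

as.character(country)  # the label we want
# [1] "Colombia"
as.integer(country)    # the internal code, i.e. position among the levels
# [1] 2
```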

> essentially i want a vector called country which will look like this:
> 
> Country <- c('Colombia', 'Greece', NA)

Once we have a procedure to deal with one data frame, we can apply it
to multiple data frames by putting the procedure into a function and
calling it on a list of data frames using one of the *apply functions
(see ?vapply):

# TODO: produce the list programmatically by calling the JSON reading
# function on a vector of filenames
dataframes <- list(A, B, C)
# perform an anonymous function on each of the data frames,
# return the result as a vector
sapply(dataframes, function(x) {
 country <- x[x[,'name2'] == 'country','value'] # look for "country" row
 # return the country as a string if found one row, NA otherwise
 if (length(country) == 1) as.character(country) else NA
})

I am pretty sure there are other ways to perform this operation, but I
find this one the easiest to explain.
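
As one of those other ways, here is a minimal sketch using vapply()
with a named function. The contents of B and C are stand-ins I made up
to match your desired output (I have not seen the real files), and the
name get_country is hypothetical.

```r
A <- data.frame(name1 = rep('fields', 3),
                name2 = c('category', 'asset', 'country'),
                value = c('Structure Class', 'Building', 'Colombia'))
# B and C are assumed contents, for illustration only
B <- data.frame(name1 = rep('fields', 3),
                name2 = c('category', 'asset', 'country'),
                value = c('Structure Class', 'Building', 'Greece'))
C <- data.frame(name1 = rep('fields', 3),
                name2 = c('category', 'asset', 'type'),
                value = c('Structure Class', 'Building', 'Fragility'))

# return the country as a string if exactly one row matches,
# a character NA otherwise
get_country <- function(x) {
  country <- x[x[, 'name2'] == 'country', 'value']
  if (length(country) == 1) as.character(country) else NA_character_
}

# character(1) tells vapply to expect one string per data frame
Country <- vapply(list(A, B, C), get_country, character(1))
Country
# a character vector: "Colombia", "Greece", NA
```

Unlike sapply(), vapply() checks every result against the declared
type, so the result stays a character vector even if none of the files
mention a country.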

-- 
Best regards,
Ivan

P.S.

> 	[[alternative HTML version deleted]]

Please post e-mails in plain text, not HTML. See
<http://www.R-project.org/posting-guide.html> for more info.


