[R] How to create a vector by searching information in multiple data.tables in r?
Ivan Krylov
kry|ov@r00t @end|ng |rom gm@||@com
Fri Jan 31 21:01:29 CET 2020
On Fri, 31 Jan 2020 18:06:00 +0000
Ioanna Ioannou <ii54250 using msn.com> wrote:
> I want to extract, e.g., the country from all these files. How can I
> add NA for the files for which the country is not mentioned?
I am starting from the beginning, since I don't know what you have
tried and where exactly you are stuck.
> A <- data.frame(name1 = c('fields', 'fields', 'fields'),
> name2 = c('category', 'asset',
> 'country'), value = c('Structure Class', 'Building', 'Colombia'))
Given one such data frame, we can use logical vector subscripts to
extract the 'country' field. The following command returns a logical
vector:
A[, 'name2'] == 'country'
# [1] FALSE FALSE TRUE
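For anyone who wants to paste this step into a fresh session, here is a self-contained version. It assumes the data frame from your message with the missing closing parenthesis added. I also set `stringsAsFactors = TRUE` explicitly (my addition) so the printouts match the factor output in this message on any R version:

```r
# Rebuild the example data frame A from the question.
# stringsAsFactors = TRUE is an assumption, added so the output matches
# the factor printouts in this message on any R version.
A <- data.frame(
  name1 = c('fields', 'fields', 'fields'),
  name2 = c('category', 'asset', 'country'),
  value = c('Structure Class', 'Building', 'Colombia'),
  stringsAsFactors = TRUE
)

# Element-wise comparison: TRUE where the 'name2' column equals 'country'
subs <- A[, 'name2'] == 'country'
print(subs)
# [1] FALSE FALSE  TRUE
```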
If we pass it to the subscript operator (type ?'[' at the R prompt for
more info), we can get the matching rows of the data frame:
subs <- A[, 'name2'] == 'country'
A[subs, ]
# name1 name2 value
# 3 fields country Colombia
Okay, now we just need to choose the correct column:
A[subs, 'value']
# [1] Colombia
# Levels: Building Colombia Structure Class
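The same extraction can also be written by selecting the column first with `$` and then indexing the resulting vector with the logical subscript. Both forms return the same factor. A small self-contained check, again rebuilding `A` with `stringsAsFactors = TRUE` (my assumption, to match the output above):

```r
A <- data.frame(
  name1 = c('fields', 'fields', 'fields'),
  name2 = c('category', 'asset', 'country'),
  value = c('Structure Class', 'Building', 'Colombia'),
  stringsAsFactors = TRUE
)
subs <- A[, 'name2'] == 'country'

# Row-and-column subscript: matching rows of the 'value' column
v1 <- A[subs, 'value']
# Column first, then a logical index into that vector
v2 <- A$value[subs]

identical(v1, v2)
# [1] TRUE
```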
What happens if there is no "country" row, as in your data frame C?
C[C[, 'name2'] == 'country', 'value']
# factor(0)
# Levels: Building Fragility Structure Class
We get a 0-length vector instead of the NA we want. The length()
function and the `if` control-flow construct should let us test for
0-length vectors (see ?length and ?'if'):
x <- C[C[,'name2'] == 'country','value']
if (length(x) == 1) x else NA
# [1] NA
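The lookup and the length test can be packed into one small function. `get_field` is a hypothetical name made up for illustration, and `C_demo` is a stand-in for your `C` without a "country" row (its exact contents are assumed). Returning `NA_character_` instead of a plain `NA` keeps the result a character string in both branches:

```r
# Hypothetical helper: return the 'value' for a given 'name2' key as a
# string, or NA if the key is absent (or, as written, occurs more than once).
get_field <- function(df, key) {
  x <- df[df[, 'name2'] == key, 'value']
  if (length(x) == 1) as.character(x) else NA_character_
}

A <- data.frame(
  name1 = c('fields', 'fields', 'fields'),
  name2 = c('category', 'asset', 'country'),
  value = c('Structure Class', 'Building', 'Colombia'),
  stringsAsFactors = TRUE
)
# Stand-in for C: no 'country' row (contents assumed for illustration)
C_demo <- data.frame(
  name1 = c('fields', 'fields'),
  name2 = c('category', 'asset'),
  value = c('Structure Class', 'Building'),
  stringsAsFactors = TRUE
)

get_field(A, 'country')
# [1] "Colombia"
get_field(C_demo, 'country')
# [1] NA
```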
Bonus question: what happens if there is more than one "country" line
in the data frame? What should happen instead?
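One way to explore the bonus question, using a made-up data frame `D` with two "country" rows: the `length(x) == 1` test falls through to `NA`, which silently hides the duplicate. Whether that is acceptable, or whether you would rather keep the first match or stop with an error, depends on your data. The `get_country_strict` function below is a hypothetical sketch of the error option, shown as one possible policy rather than the right one:

```r
# Made-up example with a duplicated 'country' row
D <- data.frame(
  name1 = c('fields', 'fields', 'fields'),
  name2 = c('country', 'asset', 'country'),
  value = c('Colombia', 'Building', 'Greece'),
  stringsAsFactors = FALSE
)

x <- D[D[, 'name2'] == 'country', 'value']
length(x)
# [1] 2
if (length(x) == 1) x else NA   # the duplicate silently becomes NA
# [1] NA

# Hypothetical stricter policy: NA when absent, an error when ambiguous
get_country_strict <- function(df) {
  x <- df[df[, 'name2'] == 'country', 'value']
  if (length(x) == 0) return(NA_character_)
  if (length(x) > 1) stop("more than one 'country' row")
  as.character(x)
}
```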
See also:
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Index-vectors
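Note that the "value" column is a factor (that's why "Levels:" lines
appear when we print the vectors; see ?factor). data.frame() converted
the strings to a factor because stringsAsFactors = TRUE was the default
before R 4.0.0; newer versions keep them as character. You want a
character vector, so we will coerce the value to the desired type using
the as.character() function.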
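A short illustration of what as.character() does to a factor: a factor stores integer codes plus a table of labels (the levels), and as.character() returns the labels, not the codes. The last two lines show that passing `stringsAsFactors = FALSE` to data.frame() keeps a column as character from the start:

```r
# A factor stores integer codes plus a table of labels ("levels");
# levels are sorted: Building, Colombia, Structure Class
f <- factor(c('Structure Class', 'Building', 'Colombia'))

# as.character() returns the label, not the integer code
as.character(f[3])
# [1] "Colombia"
as.integer(f[3])
# [1] 2

# Asking data.frame() not to create factors in the first place
df_chr <- data.frame(value = c('Building', 'Colombia'), stringsAsFactors = FALSE)
is.character(df_chr$value)
# [1] TRUE
```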
> Essentially, I want a vector called country which will look like this:
>
> Country <- c('Colombia', 'Greece', NA)
Once we have a procedure to deal with one data frame, we can apply it
to multiple data frames by putting the procedure into a function and
calling it on a list of data frames using one of the *apply functions
(see ?vapply):
# TODO: produce the list programmatically by calling the JSON reading
# function on a vector of filenames
dataframes <- list(A, B, C)
# perform an anonymous function on each of the data frames,
# return the result as a vector
sapply(dataframes, function(x) {
  country <- x[x[, 'name2'] == 'country', 'value'] # look for "country" row
  # return the country as a string if exactly one row matched, NA otherwise
  if (length(country) == 1) as.character(country) else NA_character_
})
I am pretty sure there are other ways to perform this operation, but I
find this one the easiest to explain.
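For the TODO about building the list from file names, here is an end-to-end sketch under a few labeled assumptions. Your files are JSON; to keep the example runnable with base R only, it writes three small data frames to temporary CSV files and reads them back with read.csv(), a swapped-in stand-in for whatever JSON reader you use. The contents of B and C and the helper `make_df` are made up for illustration. It also uses vapply() with `character(1)`, which checks that every call returns exactly one string:

```r
# Hypothetical helper to build data frames shaped like those in the question
make_df <- function(name2, value) {
  data.frame(name1 = 'fields', name2 = name2, value = value,
             stringsAsFactors = FALSE)
}
# Contents of the second and third data frames are assumed
dfs <- list(
  make_df(c('category', 'asset', 'country'),
          c('Structure Class', 'Building', 'Colombia')),
  make_df(c('category', 'country'), c('Structure Class', 'Greece')),
  make_df(c('category', 'asset'), c('Structure Class', 'Fragility'))
)

# Stand-in for the real input files: one temporary CSV per data frame
files <- vapply(seq_along(dfs), function(i) {
  path <- tempfile(fileext = '.csv')
  write.csv(dfs[[i]], path, row.names = FALSE)
  path
}, character(1))

# Build the list programmatically from the vector of file names
dataframes <- lapply(files, read.csv, stringsAsFactors = FALSE)

# vapply() errors out if any call does not return exactly one string
Country <- vapply(dataframes, function(x) {
  country <- x[x[, 'name2'] == 'country', 'value']
  if (length(country) == 1) as.character(country) else NA_character_
}, character(1))

Country   # the same as c('Colombia', 'Greece', NA)

unlink(files)  # remove the temporary files
```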
--
Best regards,
Ivan
P.S.
> [[alternative HTML version deleted]]
Please post e-mails in plain text, not HTML. See
<http://www.R-project.org/posting-guide.html> for more info.