[R] HOW TO FILTER DATA

Rui Barradas ruipbarradas at sapo.pt
Wed Jan 3 22:45:51 CET 2018


Hello,

If you want to select the rows that match a single IPC code, use `==`.
If you want to select the rows that match any of several IPC codes, use `%in%`.
The code below shows both.
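
As a side note on why `==` is not enough for several codes: `==` compares the two vectors element by element and recycles the shorter one, while `%in%` tests each value for membership in the set. The small vectors here (`ipc`, `wanted`) are made up for illustration only, not taken from your file.

```r
# Hypothetical toy vectors, only to show the difference between the operators
ipc    <- c("H04Q007/32", "G06K019/077", "H04M001/275", "H04Q007/32")
wanted <- c("H04Q007/32", "H04M001/275")

# `==` recycles `wanted` along `ipc` and compares position by position,
# so it silently misses matches that are not in the "right" position
eq <- ipc == wanted

# `%in%` asks, for each element of `ipc`, whether it occurs anywhere in `wanted`
inn <- ipc %in% wanted

print(eq)
print(inn)
```

So with more than one code to match, `%in%` is the operator that does what you mean.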


# Sample data from the original post; IPC is read as character, not factor
oecd <- read.table(text = "
Appln_id|Prio_Year|App_year|IPC
1|1999|2000|H04Q007/32
1|1999|2000|G06K019/077
1|1999|2000|H01R012/18
1|1999|2000|G06K017/00
1|1999|2000|H04M001/2745
1|1999|2000|G06K007/00
1|1999|2000|H04M001/02
1|1999|2000|H04M001/275
2|1991|1992|C12N015/62
2|1991|1992|C12N015/09
2|1991|1992|C07K019/00
2|1991|1992|C07K016/26
", header = TRUE, sep = "|", stringsAsFactors = FALSE)


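# The code(s) to keep
select_one <- "H04Q007/32"
select_many <- c("H04Q007/32", "H04M001/275")

# Keep only the rows whose IPC equals the single code
oecd2 <- subset(oecd, IPC == select_one)
# Keep only the rows whose IPC is any of the listed codes
oecd3 <- subset(oecd, IPC %in% select_many)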
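
Since you mention more than 11 million rows, here is a rough sketch of the same filter when reading from a file, using plain logical indexing instead of `subset()`. It is only a sketch under assumptions: the temporary file stands in for your real file, the column types in `colClasses` are guessed from your sample, and `path` and `oecd_keep` are names I made up.

```r
# Write a few sample rows to a temporary file so the example runs as-is;
# with the real data, point `path` at the OECD file instead (assumption).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Appln_id|Prio_Year|App_year|IPC",
  "1|1999|2000|H04Q007/32",
  "1|1999|2000|G06K019/077",
  "1|1999|2000|H04M001/275",
  "2|1991|1992|C12N015/62"
), path)

# Declaring the column types up front keeps IPC as character and saves
# read.table from guessing types on a large file. Types assumed from the sample.
oecd <- read.table(path, header = TRUE, sep = "|",
                   colClasses = c("integer", "integer", "integer", "character"),
                   quote = "", comment.char = "")

select_many <- c("H04Q007/32", "H04M001/275")

# Logical indexing: gives the same rows as subset(oecd, IPC %in% select_many)
keep <- oecd$IPC %in% select_many
oecd_keep <- oecd[keep, ]

print(oecd_keep)
```

Whether this is fast enough for the full file depends on your machine's memory, so treat it as a starting point rather than a tuned solution.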


Hope this helps,

Rui Barradas

On 1/3/2018 7:53 PM, Saptorshee Kanto Chakraborty wrote:
> Hello,
> 
> I have patent data from the OECD in a delimited text format, with IPC as
> one column. I want to filter the data by keeping only certain IPC codes in
> that column and deleting the rows that do not have the IPCs I need. Can
> anybody guide me in doing this? The IPC codes are string variables.
> 
> The data looks somewhat like the sample below, but it is a huge dataset
> containing more than 11 million rows.
> 
> 
> Appln_id|Prio_Year|App_year|IPC
> 1|1999|2000|H04Q007/32
> 1|1999|2000|G06K019/077
> 1|1999|2000|H01R012/18
> 1|1999|2000|G06K017/00
> 1|1999|2000|H04M001/2745
> 1|1999|2000|G06K007/00
> 1|1999|2000|H04M001/02
> 1|1999|2000|H04M001/275
> 2|1991|1992|C12N015/62
> 2|1991|1992|C12N015/09
> 2|1991|1992|C07K019/00
> 2|1991|1992|C07K016/26
> 
> 
> 
> Thanking You
> 