[R] comparing two strings from data

Eric Berger ericjberger at gmail.com
Fri Oct 13 06:39:05 CEST 2017


Combining and completing the advice from Greg and Boris, the complete
solution is two lines:

data_2 <- read.csv("excel_data.csv", stringsAsFactors = FALSE)
match_list <- match( data_2$data1, data_2$data2 )

The vector match_list will hold, for each entry of data_2$data1, its
position in data_2$data2 when a match exists and NA otherwise. Its length
will be the same as the length of data_2$data1.
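
As a minimal sketch of what match() returns, here is a toy example that
stands in for excel_data.csv. The vectors data1 and data2 are made up for
illustration (the ID strings are borrowed from the original post; their
arrangement is an assumption):

```r
# Toy stand-ins for the two columns; values are illustrative only.
data1 <- c("AST2017000005534", "CTS2017000079930", "CTS2017000079931")
data2 <- c("TUR2017000001428", "CTS2017000079931", "CTS2017000071989",
           "AST2017000005534")

# For each element of data1: position of its first match in data2, or NA.
match_list <- match(data1, data2)
match_list
# [1]  4 NA  2

# %in% only answers "is it present?"; it is defined in terms of match().
found <- data1 %in% data2
found
# [1]  TRUE FALSE  TRUE
```

If only the matched positions are wanted, match_list[!is.na(match_list)]
drops the NA entries.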

You should get experience in reading the help information for R functions.
In this case, type ?match to get information about the 'match' function.
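
The error from the original post can also be reproduced without the CSV
file. The following is a small sketch, again with made-up values, of why ==
fails on two factors whose level sets differ, and why comparing the
character values avoids the problem:

```r
# Two factors with different level sets, similar to what read.csv()
# produced without stringsAsFactors = FALSE (values are illustrative only).
f1 <- factor(c("AST2017000005534", "CTS2017000079930"))
f2 <- factor(c("TUR2017000001428", "AST2017000005534"))

# == on factors with different level sets signals an error;
# tryCatch() captures the message instead of stopping the script.
res <- tryCatch(f1[1] == f2[2], error = function(e) conditionMessage(e))
res

# Converting to character restores ordinary string comparison.
same <- as.character(f1[1]) == as.character(f2[2])
same

# match() converts factors to character internally, so it works either way.
pos <- match(f1, f2)
pos
```

Reading the file with stringsAsFactors = FALSE, as above, avoids the
conversion step altogether.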

HTH,
Eric


On Fri, Oct 13, 2017 at 12:16 AM, Boris Steipe <boris.steipe at utoronto.ca>
wrote:

> It's generally a very good idea to examine the structure of data after you
> have read it in. str(data_2) would have shown you that read.csv() turned
> your strings into factors, and that's why the == operator no longer does
> what you think it does.
>
> Use ...
>
> data_2 <- read.csv("excel_data.csv", stringsAsFactors = FALSE)
>
> ... to turn this off. Also, the %in% operator will achieve more directly
> what you are trying to do. No need for loops.
>
> B.
>
>
>
>
> > On Oct 12, 2017, at 4:25 PM, Yasin Gocgun <yasing053 at gmail.com> wrote:
> >
> > Hi,
> >
> > I have two columns that contain numbers along with letters (as shown below)
> > and have different lengths. Each entry in the first column is likely to be
> > found in the second column at most once.
> >
> > For each entry of the first column, if that entry is found in the second
> > column, I would like to get the corresponding index. For instance, if the
> > first entry of the first column is the 5th entry in the second column, I
> > would like to keep this index 5.
> >
> > AST2017000005534   TUR2017000001428
> > CTS2017000079930    CTS2017000071989
> > CTS2017000079931     CTS2017000072015
> >
> > In a loop, when I use the following code to get those indices,
> >
> >
> > data_2 = read.csv("excel_data.csv")
> > column_1 = data_2$data1
> > column_2 = data_2$data2
> >
> > # 310 is the length of the first column
> > match_list <- array(0,dim=c(310,1));
> >
> > for (indx in 1: 310){
> >    for(indx2 in 1:713){ # 713 is the length of the second column
> >        if(column_1[indx] == column_2[indx2] ){
> >            match_list[indx,1] = indx2;
> >            break;
> >        }
> >    }
> > }
> >
> >
> > R provides the following error:
> >
> > Error in Ops.factor(column_1[indx], column_2[indx2]) :
> >  level sets of factors are different
> >
> > So can someone explain to me how I can resolve this issue?
> >
> > Thank you,
> >
> > Yasin
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
