[R] FOR TAKING PERCENTAGES of OTUS in each column (n=2910 COLUMNs)

David L Carlson dcarlson at tamu.edu
Tue Mar 7 14:41:31 CET 2017


If you read your data into R, it is simple to compute the percentages. Use Save As in Excel to save your data as a .csv (comma-separated values) file, and put that file in R's working directory (the default depends on your operating system; getwd() shows the current one). Then, as Jim indicated, use read.csv() to create a data frame in R by importing the file with

raw_data <- read.csv("YourData.csv")

You may need to add some arguments to read.csv() depending on whether you have column headings. Blank fields in Excel will be interpreted as missing values (NA), not zeros. You did not give us any of your data (even just the first 10 rows and columns), so it is impossible to be more specific. Once you have the data frame (and have replaced the missing values with zeros if necessary), the process is simple:
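As a rough sketch of the import step, assuming the table has column headings and species names in the first column (the names csv_text, ind1 to ind3 and sp_a to sp_c are made up here, and the text= argument only stands in for the real file so the example runs on its own):

```r
# Hypothetical stand-in for YourData.csv: species in rows, individuals in columns.
csv_text <- "species,ind1,ind2,ind3
sp_a,10,,5
sp_b,30,20,
sp_c,60,80,15"

# header = TRUE (the read.csv default) takes the first line as column names;
# row.names = 1 uses the first column (species names) as row labels, not data.
raw_data <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# Blank fields arrive as NA; replace them with zeros before computing percentages.
raw_data[is.na(raw_data)] <- 0

print(raw_data)
```

With a real file you would pass the file name instead of text=, for example read.csv("YourData.csv", row.names = 1), if your layout matches these assumptions.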

pct_data <- prop.table(as.matrix(raw_data), 2) * 100

will produce a matrix of percentages down each column (the 2 tells prop.table() to work by column) and store it as an object (variable) called pct_data. R uses different methods to store different kinds of data. The read.csv() function creates a data frame, which can handle a mixture of character and numeric data, but the prop.table() function expects a numeric matrix (or array) and returns one. The data you described is all numeric, so it is easy to switch the data frame to a matrix (and then back again if you want).

If you are going to use R, you will need to spend some time reading about how it works, but as you can see, that time invested will make some operations much simpler than in Excel and will allow you to conduct analyses that Excel does not even attempt.
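A small self-contained check of the idea, using made-up counts (the values and the names ind1, ind2, sp_a to sp_c and pct_df are only for illustration):

```r
# Hypothetical counts: 3 species (rows) by 2 individuals (columns).
raw_data <- data.frame(ind1 = c(10, 30, 60), ind2 = c(5, 0, 15),
                       row.names = c("sp_a", "sp_b", "sp_c"))

# Divide each cell by its column total and scale to percentages.
pct_data <- prop.table(as.matrix(raw_data), 2) * 100

# Every column of percentages should sum to 100.
stopifnot(all(abs(colSums(pct_data) - 100) < 1e-8))

# Convert back to a data frame if that is more convenient later on.
pct_df <- as.data.frame(pct_data)
print(round(pct_df, 1))
```

Note that a column whose total is zero would give NaN rather than percentages, so it may be worth checking colSums(raw_data) first.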

You can get details on these three functions by running the following commands in R:

?read.csv
?prop.table
?as.matrix

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352




-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jim Lemon
Sent: Tuesday, March 7, 2017 2:24 AM
To: Thilini Maddegoda Vidanelage <tmad109 at aucklanduni.ac.nz>; r-help mailing list <r-help at r-project.org>
Subject: Re: [R] FOR TAKING PERCENTAGES of OTUS in each column (n=2910 COLUMNs)

Hi Thilini,
It is fairly simple in R once you have imported the data. Say you have
a data frame obtained by exporting the Excel table to CSV and then
importing it with "read.csv". I'm not sure whether you have a number
in each cell or just a 0/1 absent/present value, but it may not
matter. Assume the data frame is named "tjdf":

for(column in 1:dim(tjdf)[2])
 tjdf[,paste("pct",column,sep="")]<-100*tjdf[,column]/sum(tjdf[,column])

Alternatively, you could create a new data frame with just the percentages.

Jim


On Tue, Mar 7, 2017 at 12:16 PM, Thilini Maddegoda Vidanelage
<tmad109 at aucklanduni.ac.nz> wrote:
> Hi,
> I am analyzing a huge Excel table with OTUs. In the table, I have 2910
> columns and 365 rows. Each column represents one individual (n=2910). Rows
> represent microbial species (n=365).
> I have the total of all OTUs of microbial species under each column. Then I
> need to get the percentages of each species in each individual. I started to
> do this in Excel but I have to repeat this 2910 times, which is going to
> be very time-consuming.  I am sure there should be a smart way to do this
> and am just wondering whether there is any R script to do this. Any help is
> much appreciated.
> Many thanks, Thilini
>
> *Thilini Jayasinghe*
> PhD Candidate
> Liggins Institute
> The University of Auckland
> Building 503/201, 85 Park Road, Grafton, Auckland 2023
> Mobile: +64 220211604
> Email: tmad109 at aucklanduni.ac.nz
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
