[R-sig-genetics] Polymorphic sites counting and fasta file expanding

Zhian N. Kamvar
Thu Apr 23 19:16:06 CEST 2020


You can use the functions in the ape package for this (where DNAbin is 
defined).

To find the number of polymorphic sites, you can use the `seg.sites()` 
function:

library(ape)

data(woodmouse)

length(seg.sites(woodmouse))
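
As a quick sanity check, here is a minimal sketch on a toy alignment where the answer is known in advance. The object `toy` and its sequences are made up for illustration; only `as.DNAbin()` and `seg.sites()` come from ape. `seg.sites()` returns the column indices of the sites that vary, so its length is the count of polymorphic sites.

```r
library(ape)

# Toy alignment (hypothetical data): 3 sequences, 6 sites.
# Only columns 2 and 5 differ among the sequences.
toy <- as.DNAbin(rbind(
  s1 = c("a", "c", "g", "t", "a", "c"),
  s2 = c("a", "t", "g", "t", "a", "c"),
  s3 = c("a", "c", "g", "t", "g", "c")
))

poly_idx <- seg.sites(toy)    # column indices of the polymorphic sites
n_poly   <- length(poly_idx)  # number of polymorphic sites

print(poly_idx)
print(n_poly)
```

If you also need to know where the variable sites are, keep `poly_idx` instead of only its length.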


To expand the FASTA file, read it in as a DNAbin object, then use the 
frequency table with `rep()` to expand the row indices of the matrix:

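# woodmouse is a DNAbin matrix, so the sequence labels are its row names;
# the counts in n are random and only stand in for a real frequency table
mytable <- data.frame(ids = rownames(woodmouse), n = sample(10, 
nrow(woodmouse), replace = TRUE), stringsAsFactors = FALSE)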

expanded_woodmouse <- woodmouse[with(mytable, rep(ids, n)), ]
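
For the case in the question (a FASTA file of unique haplotypes plus a frequency table), the same idea could look roughly like the sketch below. The file paths, the haplotype labels and the `ids`/`n` columns of `freq` are assumptions made up for illustration. The input file is written to a temporary location so the example is self-contained; with real data you would point `read.dna()` at your own file and build `freq` from your own table.

```r
library(ape)

# Hypothetical input: one sequence per unique haplotype, written to a
# temporary FASTA file so that the sketch runs on its own.
fasta_in <- tempfile(fileext = ".fasta")
writeLines(c(">hap1", "acgtac",
             ">hap2", "atgtac",
             ">hap3", "acgtgc"), fasta_in)

# Hypothetical frequency table: haplotype label and number of copies.
freq <- data.frame(ids = c("hap1", "hap2", "hap3"),
                   n   = c(3, 1, 2),
                   stringsAsFactors = FALSE)

# Read as a DNAbin matrix (this errors if sequence lengths differ).
haps <- read.dna(fasta_in, format = "fasta", as.matrix = TRUE)

# Repeat each haplotype label n times and index the rows by label.
expanded <- haps[rep(freq$ids, freq$n), ]

# Make the repeated labels unique before writing the file back out.
rownames(expanded) <- make.unique(rownames(expanded), sep = "_")

fasta_out <- tempfile(fileext = ".fasta")
write.dna(expanded, fasta_out, format = "fasta")
```

The labels in the frequency table must match the sequence names in the FASTA file exactly, otherwise the row indexing will fail with a subscript error.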

Hope that helps,

Zhian

On 4/23/20 6:02 AM, Guillaume SCHWOB wrote:
> Hello dear community,
>
> I would have two questions:
>
> The first one. I am looking for a way to count the number of polymorphic sites across a DNAbin object in R ?
> I was thinking about using the package Pegas, but I didn’t find any function that could do that.
>
> The second one. How can I expand a fasta file that contains unique sequences of haplotypes according to a frequency table ?
>
> I would be grateful to any guidance,
>
> Best wishes,
>
> Guillaume SCHWOB
> Post Doctorado en Ecología Microbiana
> Proyecto Genomics Antarctic Biodiversity (GAB)
>
> Laboratorio de Ecología Molecular (LEM)
> Departamento de Ciencias Ecológicas
> Facultad de Ciencias, Universidad de Chile
> Las Palmeras 3425,
> CP 7800003, Ñuñoa, Santiago, Chile
>
> http://www.antarcticgenomics.cl
> https://www.researchgate.net/profile/Guillaume_Schwob