[R-sig-genetics] ape, read.dna to phase fasta file not working properly, tajima

Ella Bowles bowlese at gmail.com
Fri Sep 1 00:11:58 CEST 2017


Hello,

I wanted to send a follow-up note to say that the developer helped me with
my problem. His reply was:

The problem is that your data are too big (too many sequences) and
tajima.test() needs to compute the matrix of all pairwise distances. You
could check this by trying:

dist.dna(DNAbin8c18, "N")
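As a rough check on that explanation, the size of the distance object can be estimated by hand. This is only a sketch: it assumes the pairwise distances are stored as one 8-byte double per pair (lower triangle only), and dist_gib is a hypothetical helper name, not a function from ape or pegas.

```r
# Approximate memory (in GiB, which R's error message labels "Gb") needed to
# hold all pairwise distances among n sequences, assuming one 8-byte double
# per pair. dist_gib is a hypothetical helper, not part of ape or pegas.
dist_gib <- function(n) {
  n_pairs <- n * (n - 1) / 2   # number of distinct pairs
  n_pairs * 8 / 2^30           # bytes converted to GiB
}

round(dist_gib(817452), 1)     # full data set: 2489.3, matching the error
dist_gib(10000)                # a subsample of 10000 sequences
dist_gib(1000)                 # a subsample of 1000 sequences
```

The first figure agrees with the "cannot allocate vector of size 2489.3 Gb" error in the original message, which is consistent with the pairwise-distance matrix being the bottleneck.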

One possibility for you is to randomly sample some of the sequences, and
repeat this many times, e.g.:

tajima.test(DNAbin8c18[sample(nrow(DNAbin8c18), size = 1000), ])

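Repeated in a loop, this could be:

n <- nrow(DNAbin8c18) # total number of sequences
N <- 1000 # number of repeats
RES <- matrix(NA, 3, N) # one column per repeat: D, Pval.normal, Pval.beta
for (i in 1:N)
    RES[, i] <- unlist(tajima.test(DNAbin8c18[sample(n, size = 10000), ]))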

You may adjust N and 'size =' so that this does not take too long to run.
Then you may look at the distribution of the rows of RES (one row per
statistic returned by tajima.test(), one column per repeat).
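For anyone wanting to try the resampling idea on something small first, here is a minimal sketch. It assumes the ape and pegas packages are installed and uses ape's bundled woodmouse alignment (15 sequences) as a stand-in for the real data. The subsample size of 10, the 100 repeats and the seed are arbitrary illustrative choices, not recommendations.

```r
library(ape)    # provides the woodmouse example alignment
library(pegas)  # provides tajima.test()

data(woodmouse)            # small DNAbin matrix standing in for the real data
set.seed(1)                # arbitrary seed, only for reproducibility

n <- nrow(woodmouse)       # total number of sequences
N <- 100                   # number of repeats (illustrative)
size <- 10                 # sequences drawn per repeat (illustrative)

# One column per repeat, one row per value returned by tajima.test()
RES <- matrix(NA_real_, 3, N,
              dimnames = list(c("D", "Pval.normal", "Pval.beta"), NULL))
for (i in seq_len(N))
    RES[, i] <- unlist(tajima.test(woodmouse[sample(n, size = size), ]))

# Summarise the distribution of each statistic across repeats
apply(RES, 1, quantile, probs = c(0.025, 0.5, 0.975), na.rm = TRUE)
hist(RES["D", ], main = "Tajima's D across subsamples", xlab = "D")
```

With the real data, woodmouse would be replaced by DNAbin8c18 and size chosen so that each distance matrix fits comfortably in memory.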

On Wed, Aug 30, 2017 at 4:49 PM, Ella Bowles <bowlese at gmail.com> wrote:

>>  fasta8c18.fa
> <https://drive.google.com/file/d/0B6qb8IlaQGFZX0Q0YzNUVGNHV1E/view?usp=drive_web>
> Hello,
>
> I'm trying to complete the very simple task of reading in an unphased
> fasta file and phasing it using ape, and then calculating Tajima's D using
> pegas, but my data doesn't seem to be reading in correctly. Input and
> output is as follows:
> library("ape")
> library("adegenet")
> library("ade4")
> library("pegas")
>
> > DNAbin8c18 <- read.dna(file="fasta8c18.fa", format="f")
> > data(DNAbin8c18)
> Warning message:
> In data(DNAbin8c18) : data set ‘DNAbin8c18’ not found
>
> ##clearly the data is not read in properly, so looked at what had been
> ##loaded
>
> > DNAbin8c18
> 817452 DNA sequences in binary format stored in a matrix.
>
> All sequences of same length: 96
>
> Labels:
> CLocus_12706_Sample_1_Locus_34105_Allele_0 [BayOfIslands_s08...
> CLocus_12706_Sample_2_Locus_31118_Allele_0 [BayOfIslands_s08...
> CLocus_12706_Sample_3_Locus_30313_Allele_0 [BayOfIslands_s09...
> CLocus_12706_Sample_5_Locus_33345_Allele_0 [BayOfIslands_s09...
> CLocus_12706_Sample_7_Locus_37388_Allele_0 [BayOfIslands_s09...
> CLocus_12706_Sample_8_Locus_29451_Allele_0 [BayOfIslands_s09...
> ...
>
> More than 10 million nucleotides: not printing base composition
>
> ##although likely won't work, trying taj d test to see what happens
> > tajima.test(DNAbin8c18)
> Error: cannot allocate vector of size 2489.3 Gb
>
> I'm sending the datafile along as a link as well.
>
> Any thoughts would be much appreciated.
>
> Ella
>
> --
> Ella Bowles, PhD
> Postdoctoral Researcher
> Department of Biology
> Concordia University
>
> Website: https://ellabowlesphd.wordpress.com/
> Email: bowlese at gmail.com
>



