[BioC] building a SNPlocs data package [was: other human genomes, other SNP sets?]
Paul Shannon
pshannon at systemsbiology.org
Thu Mar 5 14:13:48 CET 2009
'How to forge a BSgenome data package' answered my first question.
Is there a comparable write-up for building a SNPlocs data package?
All I have found is Herve's comments to Praveen on Feb 12 2009:
> The information in SNPlocs.Hsapiens.dbSNP.20071016 was retrieved
> from dbSNP, from this location to be precise:
>
> ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat/
Those flat files have SNP information from other assemblies (Celera and
HuRef), and that is some of the information I want.
(See the example below for one SNP in ORM1 on chromosome 9.)
If I want this level of detail, should I parse the original files myself?
Are the parsing code and instructions for building a SNPlocs package available?
Thanks,
- Paul
rs1766074|human|9606|snp|genotype=NO|submitterlink=YES|updated 2004-10-04 13:51
ss2622917|SC_JCM|AL356796.4_74652|orient=+|ss_pick=YES
SNP|alleles='C/T'|het=?|se(het)=?
VAL|validated=NO|min_prob=?|max_prob=?|notwithdrawn
CTG|assembly=Celera|chr=9|chr-pos=87735017|NW_924573.1|ctg-start=1222848|ctg-end=1222848|loctype=2|orient=-
LOC|ORM1|locus_id=5004|fxn-class=coding-synonymous|allele=A|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
LOC|ORM1|locus_id=5004|fxn-class=reference|allele=G|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
CTG|assembly=HuRef|chr=9|chr-pos=86692957|NW_001839236.2|ctg-start=2686784|ctg-end=2686784|loctype=2|orient=+
LOC|ORM1|locus_id=5004|fxn-class=coding-synonymous|allele=A|frame=3|residue=E|aa_position=141|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
LOC|ORM1|locus_id=5004|fxn-class=reference|allele=G|frame=3|residue=E|aa_position=141|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
CTG|assembly=reference|chr=9|chr-pos=116127163|NT_008470.18|ctg-start=24408547|ctg-end=24408547|loctype=2|orient=-
LOC|ORM1|locus_id=5004|fxn-class=coding-synonymous|allele=A|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
LOC|ORM1|locus_id=5004|fxn-class=reference|allele=G|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
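If the answer is "parse it yourself", the per-assembly placements can be pulled out of the CTG lines with very little code. The following is a minimal sketch, not the code used to build SNPlocs.Hsapiens.dbSNP.20071016. It assumes the layout seen in the rs1766074 record above: '|'-delimited fields, one CTG line per assembly placement, key=value pairs plus one bare contig accession. The helper name parse_ctg_lines is hypothetical. Whitespace around each field is stripped, so the sketch does not depend on exact spacing around the delimiters.

```python
# Hypothetical helper (not part of any Bioconductor package): extract the
# per-assembly placements from one dbSNP ASN1_flat record, assuming the
# '|'-delimited layout shown in the rs1766074 example.

def parse_ctg_lines(record_text):
    """Return one dict per CTG line, keyed by field name (assembly, chr, chr-pos, ...)."""
    placements = []
    for line in record_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if fields[0] != "CTG":
            continue  # skip rs/ss/SNP/VAL/LOC lines and blank lines
        placement = {}
        for field in fields[1:]:
            if "=" in field:
                key, value = field.split("=", 1)
                placement[key] = value
            elif field:
                # The one bare field on a CTG line is the contig accession.
                placement["contig"] = field
        placements.append(placement)
    return placements


# Abbreviated copy of the rs1766074 record (LOC lines mostly omitted).
record = """rs1766074|human|9606|snp|genotype=NO|submitterlink=YES|updated 2004-10-04 13:51
SNP|alleles='C/T'|het=?|se(het)=?
CTG|assembly=Celera|chr=9|chr-pos=87735017|NW_924573.1|ctg-start=1222848|ctg-end=1222848|loctype=2|orient=-
LOC|ORM1|locus_id=5004|fxn-class=coding-synonymous|allele=A|frame=3|residue=E|aa_position=149
CTG|assembly=HuRef|chr=9|chr-pos=86692957|NW_001839236.2|ctg-start=2686784|ctg-end=2686784|loctype=2|orient=+
CTG|assembly=reference|chr=9|chr-pos=116127163|NT_008470.18|ctg-start=24408547|ctg-end=24408547|loctype=2|orient=-
"""

for p in parse_ctg_lines(record):
    print(p["assembly"], p["chr"], int(p["chr-pos"]), p["orient"])
```

Run on the abbreviated record, this prints one line per assembly (Celera, HuRef, reference) with chromosome, position and strand. Splitting a whole ASN1_flat file into individual rs records is left out, since the record separator is not shown in the example above.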
On Mar 4, 2009, at 5:12 AM, Paul Shannon wrote:
> Has anyone created a BSgenome object from the Venter (HuRef),
> Watson, or other recently completed sequencing projects? Or SNPlocs
> data packages for these genomes?
>
> If not, can you offer any advice or cautions to me as I attempt to
> do so myself?
>
> Thanks -
>
> - Paul Shannon
> Institute for Systems Biology
> Seattle