[BioC] (no subject)

Fri Oct 31 00:43:46 MET 2003

1. I have altered this sentence as it was confusing. Please check that
it says what you intended. If not, please supply replacement text in
this comment. 

Being able to align pairs *OF* very large molecular sequences is an
essential process for a range of comparative genomic studies.

2. This phrase needs clarification, even for an abstract. Is there
another, more explicit way to say what you mean here?

Firstly, the features can define ‘alignment anchor points’ that can be
reasoned with as to their validity

*Firstly, the features can define ‘alignment anchor points’ that can be
reasoned to be significant biologically: that is, they provide a
biologically significant reference point (boundary between biological
features) that can be used to anchor an alignment.*

3. This term is relatively new so it would be valuable to supply the
reader with a definition after it in parentheses. Please advise.

(Common abbreviation, contraction of insertion/deletion.)

4. If it is common to use various programs in this strategy, that should
be said. If not, this sentence doesn’t really fit here.

In addition, alignment outputs that are in different file formats from
various programs can also add to the complexity of this approach.

*In addition, it is normal to use more than one program in the alignment
process, so program outputs that are in different file formats can
also add to the complexity of this approach.*

5. Which are the authors’ surnames? Please advise so that this citation
and the references can be amended.

Tao Jiang and Peng Zhao 2000

*Have to look for this one.*

6. Should this be ‘Arslan et al 2001’ or is this a separate publication
that needs to be added to the ref list?

[Arslan et al, 2001]

7. This means that it isn’t actually a dotplot that is displayed. If
that is true, what is displayed instead? Or do you mean a dotplot?

The display is either a dotplot analogue of these areas of similarity or
a percent identity.

*delete analogue*

8. This needs clarification. Perhaps you mean a value of percentage
similarity?

The display is either a dotplot analogue of these areas of similarity or
a percent identity

*The display is either a dotplot of these areas of similarity or a value
representing the percent identity of each individual area of similarity*
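
To make "a value representing the percent identity" concrete, here is a
minimal sketch of one way such a value could be computed for a single
aligned area. The function name, the gap character and the choice of
denominator (every alignment column, gaps included) are my assumptions
for illustration; alignment programs differ on these details, and this
is not taken from any of the programs discussed.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same base.

    Assumption: the denominator is the number of alignment columns,
    including columns where one sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both rows are gaps; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    # A gap never counts as a match.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Six columns, four of them identical.
value = percent_identity("ACGT-A", "ACCTTA")
```

Dividing by the shorter sequence length, or by ungapped columns only,
would give a different figure for the same alignment, so the definition
used should be stated wherever the value is reported.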

9. I was not able to find reference to this software in the url you
provided. Please confirm that you have the url listed correctly.

It's there, unless they mean an academic reference; there is none
that I am aware of for blastz itself, though it gets a passing mention
in the PipMaker reference.

10. Should this be 1990 to refer to the reference listed already or is
it a new one to add to the list?

*[Altschul et al 1990]*

11. This publication isn’t in the reference list. Please supply details,
or shall I delete the citation?

*Couldn't find Zhang as primary author, but a better reference would be:*

[Schwartz S., 2000]

Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R,
Hardison R, Miller W. (2000) PipMaker--a web server for aligning two
genomic DNA sequences. Genome Res 10(4):577-86.

12. I have modified this sentence to specify which software output was
being referred to. Please check that the modified text is correct.

*looks ok to me now*

13. Is this a specific jargon term or do you mean ‘a suite of programs’?
*MUMmer is more a pipeline (a linked chain of processing modules),
rather than a single program, and has three modules: (1) creation of the
suffix tree, (2) sorting and extraction, and (3) gap alignment.*

14. Is ‘k’ a variable, as the sentence implies? If so, it needs to be
italicised. Please advise if the ‘k’ in ‘k-tuples’ is also a variable (=
number of cont. bases).

Yes, both are variables, and according to my word-processor, both are
already italicised.

15. Occurrence of what?

Sequences in a database are pre-processed by breaking them into
consecutive k-tuples of k contiguous bases, which are then stored in a
hash table as the position of each occurrence.

*Sequences in a database are pre-processed by breaking them into
consecutive k-tuples of k contiguous bases, which are then stored in a
hash table as the position of each k-tuple.*

Note that this is a quote from the original abstract.
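
For anyone checking the wording of that sentence, the pre-processing
step it describes can be sketched roughly as follows. This is an
illustration only, not the original authors' code: the function name
and the `step` parameter are hypothetical, and I am assuming that
"consecutive" means non-overlapping k-tuples (so `step` defaults to k).

```python
from collections import defaultdict


def build_ktuple_index(sequence, k, step=None):
    """Map each k-tuple to the 0-based positions at which it starts."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Assumption: 'consecutive' k-tuples do not overlap, so advance by k.
    step = k if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    index = defaultdict(list)
    # Stop early enough that every slice is a full k bases long.
    for pos in range(0, len(sequence) - k + 1, step):
        index[sequence[pos:pos + k]].append(pos)
    return dict(index)


# With k = 2 the sequence is cut into AC, GT, AC, GT, AC.
index = build_ktuple_index("ACGTACGTAC", k=2)
# index == {"AC": [0, 4, 8], "GT": [2, 6]}
```

The point for the text is that what the hash table stores for each
k-tuple is the list of positions where that k-tuple occurs, which is
what the reworded sentence is trying to say.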

16. Do you mean FBSA here?

Sequences used to demonstrate this system against the reference sequence
AL022723.4.

*Sequences used to demonstrate the FBSA algorithm against the reference
sequence AL022723.4.*

17. Why in this column and in later tables and text do you not use the
suffixes in the sequence names (eg .4 or .1)?

consistency: should we add all version numbers, or remove all?
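
Whichever way we go, the names can be made consistent mechanically. A
small sketch, assuming the names follow the usual accession.version
pattern seen in AL022723.4 (the helper name is hypothetical):

```python
def split_accession(name):
    """Split 'AL022723.4' into ('AL022723', 4); version is None if absent."""
    base, sep, version = name.rpartition(".")
    # Only treat the suffix as a version if it is purely numeric.
    if sep and version.isdigit():
        return base, int(version)
    return name, None


base, version = split_accession("AL022723.4")
# base == "AL022723", version == 4
```

Taking only `base` gives the names without suffixes; keeping both parts
lets us check that every table entry carries a version number.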

18. The url details will be shifted into the references. For citation,
please provide Surnames and year webfile written if possible. For refs 
we will need a title of the web file or page and its version, what it is
(eg [computer program]), when you accessed it.

"We haven't published a paper on RepeatMasker yet but would appreciate
it if you could refer to either this web page (Smit, AFA & Green, P
RepeatMasker at http://ftp.genome.washington.edu/RM/RepeatMasker.html)
or to Smit, AFA & Green, P., unpublished results." 

[Smit, AFA & Green, P., unpublished results.]
Smit, AFA & Green, P., unpublished results. (circa 1997) Computer
program: RepeatMasker at
http://ftp.genome.washington.edu/RM/RepeatMasker.html

still going ...

-- 
William Kenworthy
BioInformatics Officer
  Centre for BioInformatics and Biological Computing
  WA State Lotteries Microarray Facility
W.Kenworthy at murdoch.edu.au
+61 (0)8 9360 2790