[Bioc-devel] Bioconductor code search (beta)

Itoshi NIKAIDO dritoshi at gmail.com
Wed Jun 6 23:42:07 CEST 2012


Hi Michael

Thank you for your suggestion.

My system uses a tokenizer based on the word boundary rules defined in
Unicode Standard Annex #29:
http://unicode.org/reports/tr29/#Word_Boundaries

Your idea is impressive. Another option I am considering is an n-gram
indexer, which indexes every contiguous sequence of n items (for
example, characters) in a given text. Bioconductor packages contain
not only R code but also C and C++ code, and an n-gram indexer does
not depend on the programming language. I need a general indexing
approach that works for many kinds of code and documents.
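A minimal sketch of the n-gram idea, in Python, assuming character trigrams and a simple in-memory inverted index. This is an illustration only, not the indexer behind the site; the file names and code snippets are made up.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of contiguous character n-grams of `text` (lowercased).

    No language-specific tokenizer is involved, so the same function
    works for R, C, C++ source files and documentation alike.
    """
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_index(docs, n=3):
    """Inverted index: map each n-gram to the set of document names containing it."""
    index = defaultdict(set)
    for name, body in docs.items():
        for gram in ngrams(body, n):
            index[gram].add(name)
    return index


def search(index, docs, query, n=3):
    """Return (sorted) names of documents that contain `query` as a substring.

    Every n-gram of the query must occur in a matching document, so
    intersecting the posting sets narrows the candidates; a final
    substring check removes false positives.
    """
    grams = ngrams(query, n)
    if not grams:
        # Query shorter than n: no n-grams to look up, so scan everything.
        return sorted(name for name, body in docs.items()
                      if query.lower() in body.lower())
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return sorted(name for name in candidates
                  if query.lower() in docs[name].lower())


# Hypothetical documents: made-up file names and code snippets.
docs = {
    "pkgA/R/trim.R": "trimLRPatterns <- function(Lpattern, Rpattern, subject) {}",
    "pkgB/R/reads.R": "trimmed <- trimLRPatterns(Rpattern = adapter, subject = x)",
    "pkgC/src/match.c": "SEXP match_pattern(SEXP pattern, SEXP subject);",
}
index = build_index(docs)
print(search(index, docs, "LRPattern"))  # ['pkgA/R/trim.R', 'pkgB/R/reads.R']
```

Because every substring of length n is indexed, a query such as `LRPattern` matches inside the longer identifier `trimLRPatterns`, which a whole-word index would typically store as a single term. The trade-off is a much larger index than one built from words.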

I would like to try these ideas and compare them.

Thanks

Itoshi NIKAIDO, Ph.D.
FF20 8296 ED6F D9E5 7D05  8A0F 65D8 C2F5 C8D7 2CE2


On Wed, Jun 6, 2012 at 11:13 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Hi Itoshi,
>
> This is a cool idea. I'm wondering if it would be easier to implement if you
> just used the R parser to do the tokenization. A Lucene analyzer for R code
> would be generally useful.
>
> Does anyone know if there are tools similar to "sonar" for Java that produce
> code quality metrics, etc? That would be fun to run over Bioconductor.
>
> Michael
>
>
> On Tue, Jun 5, 2012 at 11:06 AM, Itoshi NIKAIDO <dritoshi at gmail.com> wrote:
>>
>> Hi Martin
>>
>> Thank you for your quick response. I am now improving the tokenizer
>> to optimize the database index.
>>
>> Itoshi
>>
>> Itoshi NIKAIDO, Ph.D.
>> FF20 8296 ED6F D9E5 7D05  8A0F 65D8 C2F5 C8D7 2CE2
>>
>>
>> On Wed, Jun 6, 2012 at 2:36 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> > On 6/5/2012 10:33 AM, Itoshi NIKAIDO wrote:
>> >>
>> >> Dear Bioconductor developers,
>> >>
>> >> I'm Itoshi Nikaido, developer of the BrainStars package and maintainer
>> >> of the Japanese mirror of Bioconductor.
>> >>
>> >> http://www.bioconductor.org/packages/release/bioc/html/BrainStars.html
>> >> http://bioconductor.jp/
>> >>
>> >> Today I am happy to announce the opening of a full-text code search
>> >> engine for all Bioconductor packages (devel versions only). The site
>> >> has the following features:
>> >>
>> >> - fast full text search
>> >> - code highlighting
>> >> - facet search by package name
>> >>
>> >> The site is a beta version. I have several plans to improve it and
>> >> add functions:
>> >>
>> >> - Optimization of tokenizer
>> >> - Query keyword highlighting
>> >> - Linking to official SVN site in each code
>> >> - Permalinks that use the package file path instead of an opaque code ID
>> >>
>> >> and so on.
>> >>
>> >> Bioconductor Code Search
>> >> http://search.bioconductor.jp/
>> >>
>> >> If you have ideas for improving the site, please let me know.
>> >
>> >
>> > neat; I searched for
>> >
>> > trimLRPatterns
>> >
>> > and got hits in ShortRead but not Biostrings (where it is defined).
>> >
>> > Martin
>> >>
>> >>
>> >> Thanks
>> >>
>> >> --
>> >> Itoshi NIKAIDO, Ph.D.
>> >> RIKEN Center for Developmental Biology
>> >> D-E203, 2-2-3, Minatojima-minamimachi, Chuo-ku,
>> >> Kobe, Hyogo 650-0047, Japan
>> >> http://www.hackingisbelieving.org/
>> >>
>> >> _______________________________________________
>> >> Bioc-devel at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >
>> >
>> >
>> > --
>> > Dr. Martin Morgan, PhD
>> > Fred Hutchinson Cancer Research Center
>> > 1100 Fairview Ave. N.
>> > PO Box 19024 Seattle, WA 98109
>> >
>>
>
>


