[R] Parsing a Simple Chemical Formula
Bryan Hanson
hanson at depauw.edu
Mon Dec 27 02:28:17 CET 2010
Thanks Spencer, I'll definitely have a look at this package and its
vignettes. I believe I have looked at it before, but didn't catch it
on this particular search. Bryan
On Dec 26, 2010, at 8:16 PM, Spencer Graves wrote:
> p.s. help(pac=CHNOSZ) reveals that this package has 3 vignettes. I
> have not looked at these vignettes, but most vignettes provide
> excellent introductions (though rarely with complete coverage) of
> important capabilities of the package. (The 'sos' package includes
> a vignette, which exposes more capabilities than the example below.)
>
>
> ######################
> Have you considered the 'CHNOSZ' package?
>
>
>> makeup("C5H11BrO" )
>    count
> C      5
> H     11
> Br     1
> O      1
>
>
> I found this using the 'sos' package as follows:
>
>
> library(sos)
> cf <- ???'chemical formula'
> found 21 matches; retrieving 2 pages
> cf
>
>
> The print method for "cf" opened the results in a web browser,
> which showed that the "CHNOSZ" package had 14 of these 21 matches,
> and the other 7 were in 7 different packages. Moreover, the
> "CHNOSZ" package is devoted to "Chemical Thermodynamics and Activity
> Diagrams" and provides many more capabilities that might interest you.
>
>
> Hope this helps.
> Spencer
>
>
> On 12/26/2010 5:01 PM, Bryan Hanson wrote:
>> Well let me just say thanks and WOW! Four great ideas, each worthy of
>> study and I'll learn several things from each. Interestingly, these
>> solutions seem more general and more compact than the solutions I
>> found on the 'net using python and perl. More evidence for the power
>> of R! A big thanks to each of you! Bryan
>>
>> On Dec 26, 2010, at 7:26 PM, Gabor Grothendieck wrote:
>>
>>> On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson <hanson at depauw.edu>
>>> wrote:
>>>> Hello R Folks...
>>>>
>>>> I've been looking around the 'net and I see many complex solutions in
>>>> various languages to this question, but I have a pretty simple need
>>>> (and I'm not much good at regex). I want to use a chemical formula as
>>>> a function argument. The formula would be in "Hill order" which is to
>>>> list C, then H, then all other elements in alphabetical order. My
>>>> example will have only a limited number of elements, few enough that
>>>> one can search directly for each element. So some examples would be
>>>> C5H12, or C5H12O or C5H11BrO (note that for oxygen and bromine, O or
>>>> Br, there is no following number meaning a 1 is implied).
>>>>
>>>> Let's say
>>>>
>>>>> form <- "C5H11BrO"
>>>>
>>>> I'd like to get the count of each element, so in this case I need to
>>>> extract C and 5, H and 11, Br and 1, O and 1 (I want to calculate the
>>>> molecular weight by multiplying). Sounds pretty simple, but my
>>>> experiments with grep and strsplit don't immediately clue me into an
>>>> obvious solution. As I said, I don't need a general solution to the
>>>> problem of calculating molecular weight from an arbitrary formula,
>>>> that seems quite challenging, just a way to convert "form" into a list
>>>> or data frame which I can then do the math on.
>>>>
>>>> Here's hoping this is a simple issue for more experienced R users!
>>>> TIA,
>>>
>>> This can be done with strapply in the gsubfn package. It matches the
>>> regular expression against the target string and passes the back
>>> references (the parenthesized portions of the regular expression) to
>>> a specified function as successive arguments.
>>>
>>> Thus the first arg is form, your input string. The second arg is the
>>> regular expression, which matches an upper case letter optionally
>>> followed by lower case letters, all optionally followed by digits.
>>> The third arg is a function written in formula notation; strapply
>>> passes it the two back references as ..1 and ..2. The simplify
>>> argument is another function in formula notation, which turns the
>>> result into a matrix and then a data frame. Finally we make the
>>> second column of the data frame numeric.
>>>
>>> library(gsubfn)
>>>
>>> DF <- strapply(form,
>>>   "([A-Z][a-z]*)(\\d*)",
>>>   ~ c(..1, if (nchar(..2)) ..2 else 1),
>>>   simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
>>> DF[[2]] <- as.numeric(DF[[2]])
>>>
>>> DF looks like this:
>>>
>>>> DF
>>>   V1 V2
>>> 1  C  5
>>> 2  H 11
>>> 3 Br  1
>>> 4  O  1
>>>
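For readers without gsubfn installed, the same idea (match an upper case letter, optional lower case letters, optional digits, and treat a missing number as 1) can be sketched in base R. This is not the method posted above, only a rough equivalent built on regmatches and gregexpr; the column names element and count are my own choice, not something from the thread.

```r
form <- "C5H11BrO"

# Pull out every token of the form: upper-case letter, optional
# lower-case letters, optional digits (e.g. "C5", "H11", "Br", "O")
tokens <- regmatches(form, gregexpr("[A-Z][a-z]*[0-9]*", form))[[1]]

# Split each token into its element symbol and its count
element <- sub("[0-9]+$", "", tokens)
count <- sub("^[A-Za-z]+", "", tokens)
count[count == ""] <- "1"   # no trailing number means a count of 1

DF <- data.frame(element = element,
                 count = as.numeric(count),
                 stringsAsFactors = FALSE)
DF
```

Like the strapply version, this assumes a flat formula with no parentheses, charges, or hydrates.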
>>>
>>>
>>> --
>>> Statistics & Software Consulting
>>> GKX Group, GKX Associates Inc.
>>> tel: 1-877-GKX-GROUP
>>> email: ggrothendieck at gmail.com
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
> --
> Spencer Graves, PE, PhD
> President and Chief Operating Officer
> Structure Inspection and Monitoring, Inc.
> 751 Emerson Ct.
> San José, CA 95126
> ph: 408-655-4567
>
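The original question wanted the counts in order to compute a molecular weight by multiplying. A minimal sketch of that last step follows, assuming a data frame shaped like the DF in the strapply answer (columns V1 and V2) and a lookup vector atomic_weight, which is a hypothetical name of mine holding standard atomic weights rounded to three decimals for just the elements in this example.

```r
# Element counts for C5H11BrO, in the same shape as the strapply result
DF <- data.frame(V1 = c("C", "H", "Br", "O"),
                 V2 = c(5, 11, 1, 1),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: standard atomic weights (g/mol), rounded
atomic_weight <- c(C = 12.011, H = 1.008, Br = 79.904, O = 15.999)

# Stop early if the formula contains an element missing from the table
stopifnot(all(DF$V1 %in% names(atomic_weight)))

# Molecular weight = sum over elements of (count * atomic weight)
mol_weight <- sum(DF$V2 * atomic_weight[DF$V1])
mol_weight   # about 167.046 g/mol
```

Indexing the named vector by DF$V1 keeps the weights lined up with the counts, so the order of elements in the formula does not matter.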