[R] Parsing a Simple Chemical Formula
hanson at depauw.edu
Mon Dec 27 02:28:17 CET 2010
Thanks Spencer, I'll definitely have a look at this package and its
vignettes. I believe I have looked at it before, but didn't catch it
on this particular search. Bryan
On Dec 26, 2010, at 8:16 PM, Spencer Graves wrote:
> p.s. help(package = "CHNOSZ") reveals that this package has 3 vignettes. I
> have not looked at these vignettes, but most vignettes provide
> excellent introductions (though rarely with complete coverage) to
> important capabilities of the package. (The 'sos' package includes
> a vignette, which exposes more capabilities than the example below.)
> Have you considered the 'CHNOSZ' package?
>> makeup("C5H11BrO")
> C 5
> H 11
> Br 1
> O 1
> I found this using the 'sos' package as follows:
> cf <- ???'chemical formula'
> found 21 matches; retrieving 2 pages
> The print method for "cf" opened the results in a web browser,
> which showed that the "CHNOSZ" package had 14 of these 21 matches,
> and the other 7 were in 7 different packages. Moreover, the
> "CHNOSZ" package is devoted to "Chemical Thermodynamics and Activity
> Diagrams" and provides many more capabilities that might interest you.
> Hope this helps.
> On 12/26/2010 5:01 PM, Bryan Hanson wrote:
>> Well let me just say thanks and WOW! Four great ideas, each worthy of
>> study, and I'll learn several things from each. Interestingly, these
>> solutions seem more general and more compact than the solutions I
>> found on the 'net using Python and Perl. More evidence for the power
>> of R! A big thanks to each of you! Bryan
>> On Dec 26, 2010, at 7:26 PM, Gabor Grothendieck wrote:
>>> On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson <hanson at depauw.edu> wrote:
>>>> Hello R Folks...
>>>> I've been looking around the 'net and I see many complex solutions in
>>>> various languages to this question, but I have a pretty simple need
>>>> (and I'm not much good at regex). I want to use a chemical formula as
>>>> an argument. The formula would be in "Hill order", which lists C,
>>>> then H, then all other elements in alphabetical order. My example will
>>>> involve only a limited number of elements, few enough that one can
>>>> search directly for each element. So some examples would be C5H12,
>>>> C5H12O or C5H11BrO (note that for oxygen and bromine, O or Br, there
>>>> is no following number, meaning a 1 is implied).
>>>> Let's say
>>>>> form <- "C5H11BrO"
>>>> I'd like to get the count of each element, so in this case I need
>>>> C and 5, H and 11, Br and 1, O and 1 (I want to calculate the
>>>> weight by multiplying). Sounds pretty simple, but my experiments with
>>>> grep and strsplit don't immediately clue me in to an obvious solution.
>>>> As I said, I don't need a general solution to the problem of
>>>> calculating weight from an arbitrary formula; that seems quite
>>>> challenging. I just need a way to convert "form" into a list or data
>>>> frame which I can then do the math on.
>>>> Here's hoping this is a simple issue for more experienced R users!
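For comparison with the package-based replies in this thread, here is a base-R sketch that needs no add-on package. It is not one of the solutions posted here; the helper name `parse_formula` is hypothetical, and it assumes the formula contains only element symbols and counts (no parentheses, charges or hydrates). Like the strapply reply, it uses one regular expression with two parenthesized groups, one for the symbol and one for the optional count.

```r
# Hypothetical helper: split a simple formula such as "C5H11BrO" into
# element symbols and counts. Assumes no parentheses, charges or hydrates.
parse_formula <- function(form) {
  # Group 1: upper-case letter plus optional lower-case letters (symbol).
  # Group 2: optional digits (count).
  pattern <- "([A-Z][a-z]*)([0-9]*)"
  # Whole tokens, e.g. "C5", "H11", "Br", "O".
  tokens <- regmatches(form, gregexpr(pattern, form))[[1]]
  # For each token, regexec() reports the full match and both groups.
  parts <- regmatches(tokens, regexec(pattern, tokens))
  element <- vapply(parts, `[`, character(1), 2)
  digits <- vapply(parts, `[`, character(1), 3)
  # An empty count, as for Br and O, means 1.
  count <- as.numeric(ifelse(nzchar(digits), digits, "1"))
  data.frame(element = element, count = count, stringsAsFactors = FALSE)
}

parse_formula("C5H11BrO")
```

The result has one row per element (C 5, H 11, Br 1, O 1); parse_formula("C5H12") would give just two rows, C with 5 and H with 12.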
>>> This can be done by strapply in gsubfn. It matches the regular
>>> expression to the target string passing the back references (the
>>> parenthesized portions of the regular expression) through a
>>> function as successive arguments.
>>> Thus the first arg is form, your input string. The second arg is a
>>> regular expression which matches an upper case letter optionally
>>> followed by lower case letters, all of which is optionally followed
>>> by digits. The third arg is a function shown in a formula
>>> representation. strapply passes the back references (i.e. the
>>> portions within parentheses) to the function as its two arguments.
>>> The simplify argument is another function in formula notation which
>>> turns the result into a matrix and then a data frame. Finally we
>>> make the second column of the data frame numeric.
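>>> library(gsubfn)
>>> # back references: ..1 = element symbol, ..2 = count (may be empty)
>>> DF <- strapply(form, "([A-Z][a-z]*)([0-9]*)",
>>>     ~ c(..1, if (nchar(..2)) ..2 else 1),
>>>     simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
>>> DF[[2]] <- as.numeric(DF[[2]])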
>>> DF looks like this:
>>> V1 V2
>>> 1 C 5
>>> 2 H 11
>>> 3 Br 1
>>> 4 O 1
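Bryan's stated goal was the weight, obtained by multiplying each count by the element's atomic weight. A minimal sketch of that last step, starting from a data frame shaped like DF above, might look as follows. The lookup vector `atomic_weight` is an illustrative name not taken from any reply, and its values are rounded standard atomic weights in g/mol that should be checked against whatever reference table you rely on.

```r
# A data frame shaped like DF above: symbols in V1, numeric counts in V2.
DF <- data.frame(V1 = c("C", "H", "Br", "O"), V2 = c(5, 11, 1, 1),
                 stringsAsFactors = FALSE)

# Illustrative lookup table of rounded standard atomic weights (g/mol).
atomic_weight <- c(C = 12.011, H = 1.008, Br = 79.904, O = 15.999)

# Look up each symbol by name, multiply by its count and sum.
mw <- sum(atomic_weight[DF$V1] * DF$V2)
print(mw)
```

With these values C5H11BrO comes to roughly 167.05 g/mol. Indexing by name has a useful side effect: a symbol missing from the table yields NA, so the total becomes NA instead of being silently wrong.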
> Spencer Graves, PE, PhD