[R] Parsing a Simple Chemical Formula
Bryan Hanson
hanson at depauw.edu
Mon Dec 27 04:36:49 CET 2010
Hi David & others...
I did find the function you recommended, plus, it's even easier (but a
little hidden in the doc): >element(form, "mass"). But, this uses the
atomic masses from the periodic table, which are weighted averages of
the isotopes of each element. What I'm doing actually involves mass
spectrometry, so I need the isotope masses, which are integers (think
12C, 13C, 14C, but the periodic table says 12.011 reflecting the
relative abundances). I used Gabor's solution and got my little
function humming. Plus, I have several things to read through from
the various recommendations.
Thanks again, Bryan
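
A minimal sketch of the integer-isotope calculation described above, assuming the element counts have already been extracted from the formula. The helper name nominal_mass and the two isotope vectors are hypothetical, introduced only for illustration; the mass numbers (12C, 1H, 16O, 79Br, 81Br) are standard.

```r
# Element counts for C5H11BrO, as a formula parser would return them
counts <- c(C = 5, H = 11, Br = 1, O = 1)

# Hypothetical helper: nominal mass from integer isotope mass numbers.
# 'isotopes' is a named vector giving the mass number chosen per element.
nominal_mass <- function(counts, isotopes) {
  unknown <- setdiff(names(counts), names(isotopes))
  if (length(unknown) > 0) {
    stop("no isotope given for: ", paste(unknown, collapse = ", "))
  }
  # Index by name so the two vectors line up regardless of their order
  sum(counts * isotopes[names(counts)])
}

light <- c(C = 12, H = 1, O = 16, Br = 79)  # 12C, 1H, 16O, 79Br
heavy <- replace(light, "Br", 81)           # same, but with 81Br

nominal_mass(counts, light)  # 166
nominal_mass(counts, heavy)  # 168
```

Swapping a single entry of the isotope vector is enough to get the masses of the two bromine isotopologues, which is the kind of distinction the averaged periodic-table masses hide.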
On Dec 26, 2010, at 10:21 PM, David Winsemius wrote:
>
> On Dec 26, 2010, at 8:28 PM, Bryan Hanson wrote:
>
>> Thanks Spencer, I'll definitely have a look at this package and
>> its vignettes. I believe I have looked at it before, but didn't
>> catch it on this particular search. Bryan
>
> Using the thermo list that the makeup function accesses to get its
> valid atomic symbols, one can arrive at the answer you posited
> would be too difficult in your first posting, the atomic weight from
> the formula:
>
> > str(thermo$element)
> 'data.frame': 130 obs. of 6 variables:
> $ element: chr "Z" "O" "H" "He" ...
> $ state : chr "aq" "gas" "gas" "gas" ...
> $ source : chr "CWM89" "CWM89" "CWM89" "CWM89" ...
> $ mass : num 0 16 1.01 4 20.18 ...
> $ s : num -15.6 49 31.2 30.2 35 ...
> $ n : int 1 2 2 1 1 1 1 1 2 2 ...
>
> # Anchor each symbol so that, e.g., "C" does not also match "Cl"
> patts <- paste("^", rownames(makeup(form)), "$", sep="")
> makuform <- makeup(form)
> # Look up the atomic mass of each element in thermo$element
> makuform$amass <- sapply(patts, function(x) {
>   thermo$element[grep(x, thermo$element[[1]])[1], "mass"]
> })
> sum(makuform$amass * makuform$count)
> # [1] 167.0457
>
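
The lookup above can also be sketched with exact matching instead of anchored regular expressions. This version does not use CHNOSZ at all: the elements data frame is a hand-made stand-in for thermo$element, its mass values are assumed, and counts mimics what makeup() returned for C5H11BrO.

```r
# Stand-in for thermo$element (hypothetical table, assumed average masses)
elements <- data.frame(
  element = c("C", "H", "O", "Br", "B"),
  mass = c(12.011, 1.008, 15.999, 79.904, 10.81),
  stringsAsFactors = FALSE
)

# Element counts for C5H11BrO
counts <- c(C = 5, H = 11, Br = 1, O = 1)

# match() compares whole strings, so "B" can never be confused with "Br"
idx <- match(names(counts), elements$element)
stopifnot(!anyNA(idx))  # every element must be present in the table

mw <- sum(counts * elements$mass[idx])
mw  # about 167.046 with the masses assumed here
```

Because match() returns NA for a symbol that is missing from the table, the check catches a typo in the formula before it silently turns into an NA weight.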
>>
>> On Dec 26, 2010, at 8:16 PM, Spencer Graves wrote:
>>
>>> p.s. help(pac=CHNOSZ) reveals that this package has 3 vignettes.
>>> I have not looked at these vignettes, but most vignettes provide
>>> excellent introductions (though rarely with complete coverage) of
>>> important capabilities of the package. (The 'sos' package
>>> includes a vignette, which exposes more capabilities than the
>>> example below.)
>>>
>>>
>>> ######################
>>> Have you considered the 'CHNOSZ' package?
>>>
>>>
>>>> makeup("C5H11BrO" )
>>> count
>>> C 5
>>> H 11
>>> Br 1
>>> O 1
>>>
>>>
>>> I found this using the 'sos' package as follows:
>>>
>>>
>>> library(sos)
>>> cf <- ???'chemical formula'
>>> found 21 matches; retrieving 2 pages
>>> cf
>>>
>>>
>>> The print method for "cf" opened the results in a web browser,
>>> which showed that the "CHNOSZ" package had 14 of these 21 matches,
>>> and the other 7 were in 7 different packages. Moreover, the
>>> "CHNOSZ" package is devoted to "Chemical Thermodynamics and
>>> Activity Diagrams" and provides many more capabilities that might
>>> interest you.
>>>
>>>
>>> Hope this helps.
>>> Spencer
>>>
>>>
>>> On 12/26/2010 5:01 PM, Bryan Hanson wrote:
>>>> Well let me just say thanks and WOW! Four great ideas, each worthy
>>>> of study and I'll learn several things from each. Interestingly,
>>>> these solutions seem more general and more compact than the
>>>> solutions I found on the 'net using python and perl. More evidence
>>>> for the power of R! A big thanks to each of you! Bryan
>>>>
>>>> On Dec 26, 2010, at 7:26 PM, Gabor Grothendieck wrote:
>>>>
>>>>> On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson
>>>>> <hanson at depauw.edu> wrote:
>>>>>> Hello R Folks...
>>>>>>
>>>>>> I've been looking around the 'net and I see many complex solutions
>>>>>> in various languages to this question, but I have a pretty simple
>>>>>> need (and I'm not much good at regex). I want to use a chemical
>>>>>> formula as a function argument. The formula would be in "Hill
>>>>>> order" which is to list C, then H, then all other elements in
>>>>>> alphabetical order. My example will have only a limited number of
>>>>>> elements, few enough that one can search directly for each
>>>>>> element. So some examples would be C5H12, or C5H12O or C5H11BrO
>>>>>> (note that for oxygen and bromine, O or Br, there is no following
>>>>>> number meaning a 1 is implied).
>>>>>>
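
For reference, the Hill ordering mentioned above can be sketched in a few lines. The function name hill_order is hypothetical. It also covers the case the post does not need: in the Hill system, a formula without carbon lists all elements alphabetically, hydrogen included.

```r
# Hypothetical helper: put element symbols into Hill order
hill_order <- function(symbols) {
  symbols <- unique(symbols)
  if ("C" %in% symbols) {
    # Carbon first, then hydrogen (if present), then the rest alphabetically
    rest <- sort(setdiff(symbols, c("C", "H")), method = "radix")
    c("C", intersect("H", symbols), rest)
  } else {
    # No carbon: everything alphabetical, hydrogen included
    sort(symbols, method = "radix")
  }
}

hill_order(c("O", "Br", "H", "C"))  # "C" "H" "Br" "O"
hill_order(c("O", "H", "Na"))       # "H" "Na" "O"
```

method = "radix" is used so the ordering does not depend on the session's locale.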
>>>>>> Let's say
>>>>>>
>>>>>>> form <- "C5H11BrO"
>>>>>>
>>>>>> I'd like to get the count of each element, so in this case I need
>>>>>> to extract C and 5, H and 11, Br and 1, O and 1 (I want to
>>>>>> calculate the molecular weight by multiplying). Sounds pretty
>>>>>> simple, but my experiments with grep and strsplit don't
>>>>>> immediately clue me into an obvious solution. As I said, I don't
>>>>>> need a general solution to the problem of calculating molecular
>>>>>> weight from an arbitrary formula, that seems quite challenging,
>>>>>> just a way to convert "form" into a list or data frame which I can
>>>>>> then do the math on.
>>>>>>
>>>>>> Here's hoping this is a simple issue for more experienced R users!
>>>>>> TIA,
>>>>>
>>>>> This can be done by strapply in gsubfn. It matches the regular
>>>>> expression to the target string, passing the back references (the
>>>>> parenthesized portions of the regular expression) through a
>>>>> specified function as successive arguments.
>>>>>
>>>>> Thus the first arg is form, your input string. The second arg is
>>>>> the regular expression, which matches an upper case letter
>>>>> optionally followed by lower case letters, and all that is
>>>>> optionally followed by digits. The third arg is a function shown in
>>>>> a formula representation. strapply passes the back references (i.e.
>>>>> the portions within parentheses) to the function as the two
>>>>> arguments. The simplify argument is another function in formula
>>>>> notation, which turns the result into a matrix and then a data
>>>>> frame. Finally, we make the second column of the data frame
>>>>> numeric.
>>>>>
>>>>> library(gsubfn)
>>>>>
>>>>> DF <- strapply(form,
>>>>>   "([A-Z][a-z]*)(\\d*)",
>>>>>   ~ c(..1, if (nchar(..2)) ..2 else 1),
>>>>>   simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
>>>>> DF[[2]] <- as.numeric(DF[[2]])
>>>>>
>>>>> DF looks like this:
>>>>>
>>>>>> DF
>>>>> V1 V2
>>>>> 1 C 5
>>>>> 2 H 11
>>>>> 3 Br 1
>>>>> 4 O 1
>>>>>
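
The same element/count split can be sketched in base R alone, for readers who would rather not install gsubfn. This is an illustrative alternative, not the code posted above; the function name parse_formula and the column names are hypothetical. It uses the same pattern idea: an upper case letter, optional lower case letters, optional digits.

```r
# Hypothetical helper: split a simple formula such as "C5H11BrO"
# into element symbols and counts using base R only
parse_formula <- function(form) {
  # One token per element: symbol plus optional count, e.g. "C5", "Br"
  tokens <- regmatches(form, gregexpr("[A-Z][a-z]*[0-9]*", form, perl = TRUE))[[1]]
  element <- sub("[0-9]+$", "", tokens)     # drop the trailing digits
  count <- sub("^[A-Za-z]+", "", tokens)    # drop the leading letters
  count[count == ""] <- "1"                 # a missing count implies 1
  data.frame(element = element, count = as.numeric(count),
             stringsAsFactors = FALSE)
}

parse_formula("C5H11BrO")
```

Like the strapply version, this handles only flat formulas; parentheses, charges, and hydrates are out of scope.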
>>>>>
>>>>>
>>>>> --
>>>>> Statistics & Software Consulting
>>>>> GKX Group, GKX Associates Inc.
>>>>> tel: 1-877-GKX-GROUP
>>>>> email: ggrothendieck at gmail.com
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>>
>>> --
>>> Spencer Graves, PE, PhD
>>> President and Chief Operating Officer
>>> Structure Inspection and Monitoring, Inc.
>>> 751 Emerson Ct.
>>> San José, CA 95126
>>> ph: 408-655-4567
>>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>