## Main function `CorMID`

### Idea

The problem in GC-APCI-MS that we try to overcome is the formation of
fragments forming superimposed MIDs. The ones we identified so far are
[M+H], [M+], [M+H]-H_{2} and
[M+H]+H_{2}O-CH_{4}. If we assume [M+H] to be generally
the most abundant and hence use it as our fix point (base MID, shift =
0), than we observe superimposed MIDs starting at -2, -1 and +2 relative
to [M+H] for [M+], [M+H]-H_{2} and
[M+H]+H_{2}O-CH_{4} respectively.

The basic idea of the correction is that we measure a
superimposed/composite MID of one to several fragments all derived from
the same base MID. This base MID (or correct MID, *corMID*) is
exactly what we are looking for. Estimating the *corMID* is
complicated because we do not know the distribution of fragments,
*i.e.* the amount of the individually occurring fragments or
their ratios to each other respectively. Hence, we have to estimate the
*corMID* and the ratio vector *r* which in combination fit
our measurement best.

### Example

Lets start with an artificial Glucose spectrum where 10% is M6 labeled:

```
fml <- "C21Si5"
td1 <- CalcTheoreticalMDV(fml = fml, nbio = 6, nmz = 8)
bMID <- c(0.9, rep(0, 5), 0.1)
md1 <- apply(td1*bMID, 2, sum)
round(md1, 4)
#> M+0 M+1 M+2 M+3 M+4 M+5 M+6 M+7 M+8
#> 0.4791 0.2305 0.1321 0.0418 0.0130 0.0029 0.0607 0.0250 0.0148
```

**md1** represents the measured isotopologue
distribution which is equivalent to the vector of measured intensity
values normalized to the vector sum. Please note that the measured MID
contains additional peaks at M+7 and M+8, caused by the natural abundant
isotopes of carbon atoms attached during derivatization. Now we may use
function `CorMID`

to disentangle this vector.

```
CorMID(int=md1, fml=fml, r=unlist(list("M+H"=1)))
#> [class] 'CorMID'
#> MID [%] (estimated)
#> M0 M1 M2 M3 M4 M5 M6
#> 89.84 00.00 00.00 00.00 00.00 00.00 10.16
#> [attr] 'r' (fixed)
#> M+H
#> 1.00
#> [attr] 'err'
#> 0.002707
```

Notice, that we allowed only [M+H] to be present in option
*r*. The result is a labeled vector representing the corrected
MID (or base MID) and attributes providing information on the fitting
error *err* and the parameters *r*atio,
*ratio_status* and *mid_status* as used in the function
call. Please note that during the function call *mid* was
estimated and *r*atio was fixed.

We could achieve something similar testing for all currently defined
fragments by omitting the *r* option:

```
CorMID(int=md1, fml=fml)
#> [class] 'CorMID'
#> MID [%] (estimated)
#> M0 M1 M2 M3 M4 M5 M6
#> 89.84 00.00 00.00 00.00 00.00 00.00 10.16
#> [attr] 'r' (estimated)
#> M+H M+ M-H M+H2O-CH4
#> 1.00 0.00 0.00 0.00
#> [attr] 'err'
#> 0.002707
```

Here, we essentially get the same result as before (except for
*ratio* related attributes) because there is no superimposition
in our test data. *ratio* was estimated and other possible
adducts were tested but found to be of zero presence. Now lets generate
more difficult composite data **md2** to be fit by
including a 20% proton loss (or “[M+]” or “M-1”, respectively) on top of
**md1**.

```
md2 <- unlist(list("M-1" = 0, 0.8*md1)) + c(0.2*md1, 0)
round(md2, 4)
#> M-1 M+0 M+1 M+2 M+3 M+4 M+5 M+6 M+7 M+8
#> 0.0958 0.4294 0.2108 0.1140 0.0360 0.0109 0.0145 0.0536 0.0230 0.0119
```

We could have done the same with the convenience function
*recMID*:

```
fml <- "C21Si5"
bMID <- c(0.9, rep(0, 5), 0.1)
r <- list("M+H" = 0.8, "M+" = 0.2)
rMID <- CorMID::recMID(mid = bMID, r = r, fml = fml)
round(rMID, 4)
#> M-1 M+0 M+1 M+2 M+3 M+4 M+5 M+6 M+7 M+8
#> 0.0963 0.4314 0.2118 0.1146 0.0362 0.0110 0.0139 0.0510 0.0219 0.0121
#> attr(,"class")
#> [1] "recMID"
plot(rMID, ylim=c(0,0.45))
```

and let `CorMID`

decompose this back…

```
CorMID(int=md2, fml=fml)
#> [class] 'CorMID'
#> MID [%] (estimated)
#> M0 M1 M2 M3 M4 M5 M6
#> 89.84 00.00 00.00 00.00 00.00 00.00 10.16
#> [attr] 'r' (estimated)
#> M+H M+ M-H M+H2O-CH4
#> 0.80 0.20 0.00 0.00
#> [attr] 'err'
#> 0.002434
```

which is pretty close to the truth. :)

### More Function Details

Finally, let’s look into the mathematical details of the function.
Apart from some sanity checks and data preparation steps done by the
wrapper function `CorMID`

, the main idea is to model a
theoretical distribution based on a provided sum formula and fit a base
MID and fragment ratios according to measurement data by function
`FitMID`

which is discussed in the following. The approach is
brute force using two nested estimators for *r* and
*corMID* separately. It builds on the idea to test a crude grid
of parameters first, identify the best solution and use an iterative
method minimizing the grid to approach the true value.

The grid is set by an internal function `poss_local`

.
Basically, if we have a two carbon molecule we expect a *corMID*
of length=3 {M0, M1, M2}. Let’s assume that *corMID* = {0.9, 0,
0.1}. Using a wide grid (step size d= 0.5) we would than test the
following possibilities:

```
CorMID:::poss_local(vec=c(0.5,0.5,0.5), d=0.5, length.out=3)
#> Var1 Var2 Var3
#> 1 1.0 0.0 0.0
#> 2 0.5 0.5 0.0
#> 3 0.0 1.0 0.0
#> 4 0.5 0.0 0.5
#> 5 0.0 0.5 0.5
#> 6 0.0 0.0 1.0
```

and identify {1, 0, 0} as best match after subjecting to a testing function. Taking the best match as our new starting point, we decrease the step size of the grid by 50% and test in the next iteration:

```
CorMID:::poss_local(vec=c(1,0,0), d=0.25, length.out=3)
#> Var1 Var2 Var3
#> 1 1.000 0.000 0.000
#> 2 0.875 0.125 0.000
#> 3 0.750 0.250 0.000
#> 4 0.875 0.000 0.125
#> 5 0.750 0.125 0.125
#> 6 0.750 0.000 0.250
```

and will get closer to the truth and find {0.875, 0, 0.125} to give the lowest error.

In summary, using this approach we can approximate the optimal
vectors of *corMID* and *r* in a finite number of
iterations to reach a desired precision <0.1%. We can nest MID
fitting inside ratio fitting and thereby do both in parallel.

The error function currently employed is simply the square root of
the summed squared errors, comparing the provided measurement data and a
reconstructed MID based on a specific *corMID* and
*r*.