PeakLab v1 Documentation

White Paper: Part I - Generalized Chromatographic Models


In this first white paper, we describe our discovery of an equivalence in how chromatographic peak shapes depend on concentration in the HVL (Haarhoff-Van der Linde) and Wade-Thomas NLC theoretical models. These are the two models we have found to be far and away the most successful in modeling the nonlinear shapes that occur with chromatographic peaks. We then use this equivalence to develop generalized models capable of fitting the higher moments in chromatographic peaks, allowing high-accuracy fits of LC, GC, HPLC and ultraHPLC (with and without gradients), and preparative or high-overload peaks. In this initial white paper, we cover the generalization of the models that describe the actual chromatographic separation as it applies to analytical peaks, where only a third moment generalization is required.

In part II, we will address accounting for the additional nonidealities in the chromatographic flow and detection systems. By adding an instrument response function and fitting a convolution model to chromatographic data, we will demonstrate analytic fits with less than 10 ppm least-squares error and, in certain instances, fit errors as low as 1 ppm. By using Fourier methods in the fitting, we will show that the performance is suitable for routine analysis of chromatographic peaks.

In part III, we will specifically address gradient HPLC separations and the additional steps that must be taken to estimate the gradient strength in the chromatographic modeling on a peak-by-peak basis. In covering gradient HPLC fits, we will address twice-generalized chromatographic models, which also include fourth moment adjustments.

In part IV, we will address the additional challenges of modeling overload shapes arising from preparative chromatography, estimating the peak shapes that would have been generated had the column had infinite capacity and no overload had taken place.

Peak Shapes in Chromatography

If a chromatographic peak is 'fronted', there is a progression in the strength of this fronting as concentration increases:

This plot of real-world data covers two orders of magnitude of concentration. The peaks are normalized to unit area. Note that the peaks at very low concentration show little apparent fronting, and even an unusual and appreciable tailing. The peaks show an increasingly right triangular fronted shape as concentration increases.

Similarly, if a chromatographic peak is 'tailed', there is a progression in the strength of this tailing as concentration increases:

In the case of fronting, the higher concentration produces a later peak apex. In a tailed peak, the higher concentration produces an earlier apex.

It is this dependency of shape on concentration that sets true chromatographic models apart from other density models. You can double the concentration, as in the progression from the green to blue peaks, and see appreciably different shapes.

The HVL Chromatographic Model

The Haarhoff-Van der Linde (“HVL”) gas chromatography model is defined as follows:

(1)

If we look at the HVL as a statistical model absent its theoretical derivations, and in a form where the
area is an adjustable parameter, we have a four parameter function. In this form, a_{0} is the
peak area, a_{1} is the center or location value, a_{2} is the peak width or scale parameter,
and a_{3} is the shape parameter, positive for a right-skewed asymmetry, negative for left-skewed.
We have labeled the variables in the HVL model so that adjustable parameters a_{0}-a_{3}
correspond to moments 0-3.

The HVL, originally seen as applicable to GC and derived using adsorption isotherm arguments, produces a theoretical diffusion width.

When the a_{3} distortion parameter is negative, the peaks are fronted; when positive, the peaks
are tailed. In the HVL model, the a_{3} distortion is additionally scaled by a_{1}/a_{2}.
The a_{3} values in the plot above were adjusted for the a_{1} locations so that the fronted and
tailed shapes are mirror images, with identical measures of fronting and tailing at the two locations.

Note also the obvious issue with using the apex value for location when the concentration of a given solute can vary significantly. In this plot, the five fronted shapes would all fit to a 4.0 location and all five tailed shapes would fit to a 6.0 location. The same kind of issue applies to using the FWHM as a surrogate for the peak's second moment. In this case, all ten peaks in the plot fit to a single 0.125 width: the SD of the underlying Gaussian when no distortion is present, which is the width at the limit of infinite dilution.
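Equation (1) is not reproduced in this text, so the sketch below assumes a four-parameter HVL form chosen to match the parameter roles described above: a0 area, a1 center, a2 Gaussian SD, and a3 distortion entering through the exponent a1*a3/a2**2, which carries the extra a1/a2 scaling. It measures the apex, FWHM, and area of fronted and tailed shapes, so you can see the apex move away from a1 while the area stays at a0. The names `hvl` and `measure` are hypothetical, and the parameter values are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)


def hvl(x, a0, a1, a2, a3):
    """Assumed four-parameter HVL form.

    a0 = area, a1 = center of the infinite-dilution Gaussian, a2 = its SD,
    a3 = distortion (< 0 fronted, > 0 tailed). The exponent a1*a3/a2**2
    carries the extra a1/a2 scaling of a3.
    """
    if a3 == 0.0:
        raise ValueError("a3 must be nonzero; the a3 -> 0 limit is a Gaussian")
    z = (x - a1) / a2
    c = 1.0 / math.expm1(a1 * a3 / a2**2)  # 1/(exp(k) - 1), stable for small k
    cdf = 0.5 * math.erfc(-z / SQRT2)  # normal CDF
    return a0 * a2 / (a1 * a3 * SQRT2PI) * math.exp(-0.5 * z * z) / (c + cdf)


def measure(f, lo, hi, n=20001):
    """Apex position, FWHM and trapezoid area of f sampled on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    ys = [f(x) for x in xs]
    area = h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    top = max(range(n), key=ys.__getitem__)
    half = 0.5 * ys[top]

    def crossing(step):
        # walk out from the apex, then interpolate linearly at half height
        i = top
        while ys[i + step] >= half:
            i += step
        j = i + step
        return xs[i] + (half - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])

    return xs[top], crossing(1) - crossing(-1), area


# Fronted shapes at a1 = 4.0, tailed at a1 = 6.0, Gaussian SD 0.125.
for a1, a3 in ((4.0, -0.02), (4.0, -0.01), (6.0, 0.01), (6.0, 0.02)):
    apex, fwhm, area = measure(lambda x: hvl(x, 1.0, a1, 0.125, a3), a1 - 3.0, a1 + 3.0)
    print(f"a1={a1}  a3={a3:+.2f}  apex={apex:.3f}  FWHM={fwhm:.3f}  area={area:.4f}")
```

In this form the area is exactly a0 for any nonzero a3. The apex and FWHM, however, drift with the distortion, which is the point made above about apex and FWHM surrogates.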

The NLC Chromatographic Model

The Wade-Thomas non-linear liquid chromatography model (“NLC”) is defined as follows:

(2)

Where TFn is a modified Bessel function integral:

(3)

When the area is an adjustable parameter, the NLC is also a four-parameter function. As with the HVL,
a_{0} is the peak area, a_{1} is the center or location value, and a_{3} is
the distortion parameter, positive for the right-skewed asymmetry of a tailed peak and negative for the
left-skewed asymmetry of a fronted peak. This NLC parameterization uses a time constant for the a_{2}
kinetic parameter instead of the rate constant given in the original publication of the Wade-Thomas
NLC model. As such, the NLC a_{2} is similar to the HVL a_{2}, a scale parameter that
increases with the peak width.

The NLC, derived for LC where slow adsorption and desorption kinetics are present, or where mass transfer can be modeled by first order kinetics, produces a kinetic time constant.

As with the HVL, when the NLC's a_{3} distortion parameter is negative, the peaks are fronted,
and when positive, the peaks are tailed. In the NLC model, there is no additional scaling of the a_{3}
distortion. Shapes with equal-magnitude a_{3} values are not mirror images because of the asymmetry
of the underlying Giddings kinetic density, to which the NLC reduces at the infinite dilution (zero concentration)
limit.

Here too, using the apex and FWHM values as concentration-independent estimates of the location and broadening,
the first and second moments, is fraught with error. All ten of these NLC shapes
fit to the same a_{2} time constant. The five fronted NLC shapes fit to a 4.0 a_{1} center
value and the five tailed shapes to a 6.0 a_{1} value. These a_{1} values are the mean of the underlying
(zero distortion) Giddings density.

Note also that the 0.001 first order time constant value used in these plots represents exceptionally fast kinetics, and yet the shapes track the real-world data in the initial concentration plots.

The Generalized HVL Template

The HVL reduces to a Gaussian at infinite dilution. We will first generalize the HVL model, using the Gaussian or normal probability density function (PDF):

(4)

We also use the Gaussian or normal cumulative distribution function (CDF):

(5)

We also take note of the complement of the CDF, the reverse cumulative of the normal density, even though it is not used in the HVL:

(6)
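The cumulative (5) and its complement (6) need some care numerically. In a sketch like the one below (function names hypothetical; a1 is the mean and a2 the SD, as in the HVL), both are computed directly from `math.erfc` rather than taking the complement as 1 minus the CDF. This keeps precision in the far right tail, which matters for templates built on the CDF complement, as the NLC form is.

```python
import math

SQRT2 = math.sqrt(2.0)


def norm_pdf(x, a1, a2):
    """Gaussian density with mean a1 and SD a2."""
    z = (x - a1) / a2
    return math.exp(-0.5 * z * z) / (a2 * math.sqrt(2.0 * math.pi))


def norm_cdf(x, a1, a2):
    """Gaussian cumulative via erfc, which keeps precision in the left tail."""
    return 0.5 * math.erfc(-(x - a1) / (a2 * SQRT2))


def norm_rev_cdf(x, a1, a2):
    """Reverse cumulative 1 - CDF, computed directly instead of by subtraction."""
    return 0.5 * math.erfc((x - a1) / (a2 * SQRT2))


x_far = 10.0  # 10 SD above the center of a standard normal
print(1.0 - norm_cdf(x_far, 0.0, 1.0), norm_rev_cdf(x_far, 0.0, 1.0))
```

At 10 SD, the subtraction rounds to exactly 0.0, while the direct complement returns the true upper tail probability of about 7.6e-24.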

We can now rewrite the HVL as a generalized template that accepts any zero-distortion density:

(7)

To regenerate the HVL, **Density** is replaced with (4), the normal PDF, and **Cumulative** with
(5), the normal CDF. Note that any replacement is always done with a matched PDF-CDF (Density-Cumulative)
pair.
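Equation (7) is not reproduced in this text, so the sketch below assumes a template of the form y = a0*a2**2/(a1*a3) * Density(x) / (1/(exp(k) - 1) + Cumulative(x)), with k = a1*a3/a2**2. For the normal pair, this reduces to the HVL. The sketch illustrates why the pair must be matched. When Density is the derivative of Cumulative, the integral of Density/(c + Cumulative) over all x is ln((c + 1)/c) = k, so the prefactor a0/k returns an area of exactly a0 for any density family. The logistic pair and all helper names are illustrative choices, not part of PeakLab.

```python
import math


def hvl_template(x, a0, a1, a2, a3, density, cumulative):
    """Assumed generalized HVL template for a Density/Cumulative pair.

    y = a0*a2**2/(a1*a3) * density(x) / (1/(exp(k) - 1) + cumulative(x)),
    k = a1*a3/a2**2. With the normal PDF/CDF this is the four-parameter HVL.
    """
    k = a1 * a3 / a2**2
    return a0 * a2**2 / (a1 * a3) * density(x) / (1.0 / math.expm1(k) + cumulative(x))


def normal_pair(a1, a2):
    """Matched normal PDF/CDF with mean a1 and SD a2."""
    def pdf(x):
        return math.exp(-0.5 * ((x - a1) / a2) ** 2) / (a2 * math.sqrt(2.0 * math.pi))

    def cdf(x):
        return 0.5 * math.erfc(-(x - a1) / (a2 * math.sqrt(2.0)))

    return pdf, cdf


def logistic_pair(a1, s):
    """Matched logistic PDF/CDF with location a1 and scale s (illustrative ZDD)."""
    def pdf(x):
        return 1.0 / (4.0 * s * math.cosh((x - a1) / (2.0 * s)) ** 2)

    def cdf(x):
        return 0.5 * (1.0 + math.tanh((x - a1) / (2.0 * s)))

    return pdf, cdf


def trapezoid(f, lo, hi, n=40001):
    """Composite trapezoid rule on n equally spaced points."""
    h = (hi - lo) / (n - 1)
    return h * (sum(f(lo + i * h) for i in range(n)) - 0.5 * (f(lo) + f(hi)))


a0, a1, a2, a3 = 1.0, 5.0, 0.125, 0.01
g_pdf, g_cdf = normal_pair(a1, a2)
l_pdf, l_cdf = logistic_pair(a1, a2)
cases = {"normal": (g_pdf, g_cdf), "logistic": (l_pdf, l_cdf), "mismatched": (g_pdf, l_cdf)}
for name, (d, c) in cases.items():
    area = trapezoid(lambda x: hvl_template(x, a0, a1, a2, a3, d, c), a1 - 6.0, a1 + 6.0)
    print(f"{name:10s} area = {area:.6f}")
```

Both matched pairs return an area of a0. Pairing the normal density with the logistic cumulative does not, because the logarithmic integral no longer telescopes.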

The Generalized NLC Template

To create a generalized NLC model template, we use the Giddings density:

(8)

Here we take note of the Giddings cumulative, although it is not used in the NLC:

(9)

We also use the Giddings reverse cumulative:

(10)

The NLC generalized density template can then be expressed as follows:

(11)

Just as with the HVL template, we can create any number of NLC-based generalized models by inserting a matched density-cumulative pair other than the Giddings for the zero-distortion assumption.

To regenerate the NLC, **Density** is replaced with (8), the Giddings PDF, and **RevCumulative**
with (10), the Giddings CDF complement.

The Common Chromatographic Distortion Model

Despite different derivations across decades which targeted different types of chromatography, the generalized templates of the two models produce identical shapes for a given density-cumulative pair.

One can substitute the Gaussian PDF and CDF complement into the NLC template and generate a shape that the HVL model fits exactly. Similarly, the Giddings PDF and CDF can be inserted into the HVL template to produce a shape that the Wade-Thomas NLC model fits exactly.

Note that a_{1}, associated with the first moment, and a_{2}, associated with the
second moment, also appear in the templates. While the a_{1} center values are comparable
(both represent the mean of the underlying ZDD), the a_{2} values are very different representations
of the peak broadening: one is a Gaussian diffusion width, the other a Giddings kinetic time constant associated
with adsorption-desorption.

Apart from the distortion scaling in the HVL and the use of the CDF in the HVL and the CDF complement in the NLC, the only difference between the HVL and NLC models is their zero-distortion density assumption. The HVL assumes a diffusion-based Gaussian, the simplest possible probabilistic density assumption. The NLC assumes a first order Giddings density, the simplest kinetic density assumption possible.
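Here is one way to see that the choice between the CDF and its complement only changes the sign convention. Since equations (7) and (11) are not reproduced here, assume that both templates divide the density f by the constant c(k) = 1/(e^{k} - 1) plus a cumulative term, where F is the cumulative and \bar{F} = 1 - F is its complement. The identity below then shows that using the complement with distortion k is equivalent to using the cumulative with distortion -k, up to an overall sign:

```latex
c(k) = \frac{1}{e^{k}-1}, \qquad
c(-k) = \frac{1}{e^{-k}-1} = -\frac{e^{k}}{e^{k}-1} = -\bigl(1 + c(k)\bigr)

c(-k) + F(x) = -\bigl(c(k) + 1 - F(x)\bigr) = -\bigl(c(k) + \bar{F}(x)\bigr)

\frac{f(x)}{c(k) + \bar{F}(x)} = -\,\frac{f(x)}{c(-k) + F(x)}
```

For the symmetric Gaussian, \bar{F}(a_{1} + u) = F(a_{1} - u), so flipping the sign of the distortion simply mirrors the shape about a_{1}. For the right-asymmetric Giddings density, no such mirror relation holds.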

If you have long used both the HVL and NLC models in fitting chromatographic peaks, you were probably struck by the similarities in the fits. Part of this can be attributed to the similarity between the Gaussian and Giddings zero-distortion densities:

The Giddings density, the amber curve, is a slightly right-asymmetric peak compared to the symmetric
Gaussian, the blue curve. This difference in symmetry explains why the HVL produces mirror-image shapes about a_{1}
for positive and negative distortions of equal magnitude, whereas the NLC produces different tailed and fronted
shapes for the same magnitude of the a_{3} distortion parameter.
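The slight right asymmetry of the Giddings density can be quantified. Assume the first order kinetic model is parameterized with a1 as the mean and a2 as the time constant, consistent with the roles described above. The density is then a Poisson mixture: the number of adsorption events is Poisson with mean a1/a2, and each sojourn is exponential with time constant a2. Its cumulants are (a1/a2)*n!*a2**n, which give a variance of 2*a1*a2 and a skewness of 3/sqrt(2*a1/a2). The sketch below (hypothetical names) builds the density this way in log space, rather than through a modified Bessel function, and checks the moments numerically.

```python
import math


def giddings_pdf(t, a1, a2, width=12.0):
    """Giddings density as a Poisson(a1/a2) mixture of gamma(n, a2) densities.

    Assumed parameterization: a1 = mean, a2 = first order time constant.
    Terms are summed in log space so large a1/a2 does not overflow.
    """
    if t <= 0.0:
        return 0.0
    lam = a1 / a2  # mean number of adsorption events
    spread = width * math.sqrt(lam)
    lo = max(1, int(min(lam, t / a2) - spread))
    hi = int(max(lam, t / a2) + spread) + 1
    log_t, log_a2, log_lam = math.log(t), math.log(a2), math.log(lam)
    terms = [
        (-lam + n * log_lam - math.lgamma(n + 1))  # log Poisson weight
        + ((n - 1) * log_t - t / a2 - n * log_a2 - math.lgamma(n))  # log gamma pdf
        for n in range(lo, hi + 1)
    ]
    top = max(terms)
    return math.exp(top) * sum(math.exp(v - top) for v in terms)


def moments(f, lo, hi, n=1701):
    """Area, mean, variance and skewness of f on [lo, hi] by the trapezoid rule."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    w = [h * (0.5 if i in (0, n - 1) else 1.0) for i in range(n)]
    ys = [f(x) for x in xs]
    m0 = sum(wi * yi for wi, yi in zip(w, ys))
    mean = sum(wi * yi * x for wi, yi, x in zip(w, ys, xs)) / m0
    var = sum(wi * yi * (x - mean) ** 2 for wi, yi, x in zip(w, ys, xs)) / m0
    m3 = sum(wi * yi * (x - mean) ** 3 for wi, yi, x in zip(w, ys, xs)) / m0
    return m0, mean, var, m3 / var**1.5


a1, a2 = 5.0, 0.1  # illustrative, deliberately slow kinetics
m0, mean, var, skew = moments(lambda t: giddings_pdf(t, a1, a2), 0.0, 17.0)
print(m0, mean, var, skew, 3.0 / math.sqrt(2.0 * a1 / a2))
```

With the 0.001 time constant and centers of 4 to 6 used in the NLC plots, the same formula gives a skewness of only about 0.03, so the Giddings density is then very close to Gaussian.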

Extending the HVL and NLC Generalized Templates to Fit Higher Moments in Chromatographic Peaks

The major drawback of the basic HVL and NLC models is that the higher moments are fixed by the Gaussian and Giddings zero-distortion assumptions. Any non-ideality in the chromatographic separation, such as multiple-site adsorption in the kinetic model or asymmetry in the diffusion model, is not accommodated.

The HVL and NLC generalized templates allow any density-cumulative pair to be used. The zero distortion density (ZDD) need not have its higher moments locked in, as they are by the Gaussian or Giddings assumptions. To create generalized HVL and NLC models, all that is needed is to assume that the ZDD is neither Gaussian nor Giddings, but a more complex density whose third moment, the skewness, is broadly adjustable. This is what we refer to as a once-generalized model: the addition of third moment or skewness adjustments. Only when addressing gradient HPLC or overloaded preparative shapes is a twice-generalized model needed, one that also allows for adjustments in the kurtosis (fourth moment, fatness of tails).

Reducing the generalization problem to the zero concentration limiting density is an immense simplification, and one readily addressed by the statistical sciences. To create a once-generalized HVL or NLC, we can use any one of a number of generalized Gaussians in which the third and/or fourth moments are adjustable. The generalization problem thus becomes the straightforward one of finding a ZDD that readily fits HVL and NLC shapes as two families of curves, determined by two specific values of a third moment skewness parameter. Given the unlimited range of skewness, such a generalization would also model every chromatographic shape in which a skewness is introduced into the infinite dilution density.

A major benefit of a once-generalized closed-form model is an immense simplification of the NLC shape. If the generalization can accurately replicate the Giddings shape, it eliminates the need for the modified Bessel approximation and for the far more computationally demanding modified Bessel function integral, both of which make computing the NLC so onerous. The NLC shapes then become simply one of the infinite families of shapes the generalized models can produce, and the HVL shapes another.

Generalized Default ZDD (One Higher Moment)

We can adopt the widely used asymmetric generalized normal as the density in the templates. This density is not defined for all x, but it is computationally simple:

a_{0} = Area

a_{1} = Center (as mean of asymmetric peak)

a_{2} = Width (SD of underlying Gaussian)

a_{3} = Asymmetry (fronted -1 < a_{3} < 1 tailed)
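The density itself is not reproduced here, so the sketch below assumes that the asymmetric generalized normal is Hosking's three-parameter generalized normal, with location xi, scale alpha, and shape kappa. In that family, kappa = 0 is the Gaussian. Negative kappa gives a right-skewed (tailed) shape bounded on the left, and positive kappa a left-skewed (fronted) shape bounded on the right, which is why the density is not defined at all x. In this sketch, xi is the median. PeakLab's parameterization, with a1 as the mean and a3 signed positive for tailing, is not reproduced, and the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def _gn_y(x, xi, alpha, kappa):
    """Map x to the standard-normal variable y; None outside the support."""
    if kappa == 0.0:
        return (x - xi) / alpha
    u = -kappa * (x - xi) / alpha
    if u <= -1.0:  # beyond the bounded end of the support
        return None
    return -math.log1p(u) / kappa


def gn_pdf(x, xi, alpha, kappa):
    """Generalized normal density (assumed ZDD form)."""
    y = _gn_y(x, xi, alpha, kappa)
    if y is None:
        return 0.0
    return math.exp(-0.5 * y * y) / (math.sqrt(2.0 * math.pi) * (alpha - kappa * (x - xi)))


def gn_cdf(x, xi, alpha, kappa):
    """Matching cumulative Phi(y), clamped to 0 or 1 beyond the bounded end."""
    y = _gn_y(x, xi, alpha, kappa)
    if y is None:
        return 1.0 if kappa > 0.0 else 0.0
    return 0.5 * math.erfc(-y / SQRT2)


# A tailed ZDD (kappa < 0): its support starts at xi + alpha/kappa = 4.583...
for x in (4.5, 4.8, 5.0, 5.2, 5.6):
    print(x, gn_pdf(x, 5.0, 0.125, -0.3), gn_cdf(x, 5.0, 0.125, -0.3))
```

For kappa other than 0, the mean is xi - (alpha/kappa)*(exp(kappa**2/2) - 1), so for a tailed shape the mean lies to the right of the median.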

GenHVL - Default Generalized Normal ZDD

If we substitute this statistical PDF and its CDF into the HVL template for tailed shapes, and this same PDF and its CDF complement into the NLC for fronted shapes, we can construct the Generalized HVL model for chromatography:

a_{0} = Area

a_{1} = Center (as mean of asymmetric peak)

a_{2} = Width (SD of underlying Gaussian ZDD)

a_{3} = HVL Chromatographic distortion (-1 < a_{3} < 1)

a_{4} = ZDD asymmetry (-1 < a_{4} < 1)

Note that the a_{4} value controlling the skew of the GenHVL peak appears as a_{3} in
the ZDD nomenclature.

The once-generalized HVL model and the once-generalized NLC model produce identical shapes, and both
reproduce the HVL to full precision and the NLC to 6-8 significant digits. The GenHVL model reports a diffusion
width for a_{2} and a statistical asymmetry for the a_{4} parameter. The GenNLC differs
only in parameterization, reporting a first order kinetic time constant for a_{2} and an asymmetry
indexed to the Giddings/NLC for a_{4}.
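The sketch below makes two assumptions, since the equations are not reproduced here. First, the "asymmetric generalized normal" is taken to be Hosking's generalized normal. Second, the HVL template is taken to have the form y = a0*a2**2/(a1*a3) * Density / (1/(exp(k) - 1) + Cumulative), with k = a1*a3/a2**2. Combining them gives a GenHVL-like curve. For simplicity, the sketch uses the HVL template with the CDF for both signs of a3, rather than switching to the NLC template and CDF complement for fronted shapes as described above. It also uses Hosking's kappa (negative for tailing) in place of a4 and takes a1 as the median rather than the mean. All names are hypothetical.

```python
import math


def gn_pair(xi, alpha, kappa):
    """Matched PDF/CDF of a Hosking-style generalized normal (assumed ZDD)."""

    def y_of(x):
        if kappa == 0.0:
            return (x - xi) / alpha
        u = -kappa * (x - xi) / alpha
        return None if u <= -1.0 else -math.log1p(u) / kappa

    def pdf(x):
        y = y_of(x)
        if y is None:
            return 0.0
        return math.exp(-0.5 * y * y) / (math.sqrt(2.0 * math.pi) * (alpha - kappa * (x - xi)))

    def cdf(x):
        y = y_of(x)
        if y is None:
            return 1.0 if kappa > 0.0 else 0.0
        return 0.5 * math.erfc(-y / math.sqrt(2.0))

    return pdf, cdf


def gen_hvl(a0, a1, a2, a3, kappa):
    """Return y(x) for the GN pair inserted into the assumed HVL template."""
    pdf, cdf = gn_pair(a1, a2, kappa)
    scale = a0 * a2**2 / (a1 * a3)
    c = 1.0 / math.expm1(a1 * a3 / a2**2)
    return lambda x: scale * pdf(x) / (c + cdf(x))


tailed = gen_hvl(1.0, 5.0, 0.125, 0.01, -0.2)  # chromatographic tailing, tailed ZDD
print([round(tailed(x), 4) for x in (4.6, 4.8, 5.0, 5.2, 5.4)])
```

Because the pair is matched, the area stays at a0 for any kappa, and kappa = 0 returns the plain HVL.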

An Example of Fitting the GenHVL to Real-World Data

Even though we have a generalized model for the chromatographic separation that accounts for a third moment skewness in the infinite dilution density, we have not yet accounted for the real-world non-idealities in a chromatographic system. We will cover this in the next paper in this series. For this illustration, we will jump ahead somewhat and remove the IRF (instrument response function) before fitting the GenHVL to a real-world set of IC data standards containing a mix of appreciably fronted and tailed peaks. The instrument and system distortions are removed prior to fitting with a Fourier deconvolution procedure that uses values estimated in an IRF determination, which quantifies the non-idealities in the flow path and detection.

One of the largest tradeoffs in chromatographic modeling is in using a low enough concentration to see mostly Gaussian peaks and still having a high enough S/N to get effective fits on all components of interest. If one has a model which is capable of managing distorted shapes and reporting true theoretical location and broadening values, independent of concentration, then one can fit the more distorted shapes in a higher concentration sample and benefit from the improved S/N in the data.

Despite the additional noise introduced by the Fourier deconvolution, this high S/N sample, with the higher concentration fronting and tailing, fit to just 11 ppm least squares error. The following analytical fits are for three different concentrations of the above standard.

"Cation Std 5.0ppm (without PDCA)"

Fitted Parameters

| r^{2} Coef Det | DF Adj r^{2} | Fit Std Err | F-value | ppm uVar |
|---|---|---|---|---|
| 0.99996468 | 0.99996464 | 0.00611751 | 23,348,983 | 35.3179206 |

| Peak | Type | a0 | a1 | a2 | a3 | a4 |
|---|---|---|---|---|---|---|
| 1 | GenHVL | 2.39409195 | 4.86629842 | 0.04836896 | -0.0028304 | 0.01010560 |
| 2 | GenHVL | 0.68483314 | 7.09399421 | 0.06635864 | -0.0005339 | 0.01010560 |
| 3 | GenHVL | 0.79975294 | 8.27604890 | 0.07330294 | -0.0003202 | 0.01010560 |
| 4 | GenHVL | 0.36554694 | 12.3963875 | 0.11414019 | 0.00043770 | 0.01010560 |
| 5 | GenHVL | 1.27705415 | 27.3145721 | 0.27360663 | 0.01257608 | 0.01010560 |
| 6 | GenHVL | 0.72539077 | 34.1882845 | 0.33736125 | 0.00969516 | 0.01010560 |

Measured Values

| Peak | Type | Amplitude | Center | FWHM | Asym50 | FW Base | Asym10 |
|---|---|---|---|---|---|---|---|
| 1 | GenHVL | 17.5378848 | 4.96975730 | 0.12879519 | 0.51112283 | 0.26013732 | 0.44708895 |
| 2 | GenHVL | 4.09186251 | 7.11573005 | 0.15730496 | 0.90616835 | 0.31466816 | 0.88988455 |
| 3 | GenHVL | 4.34018865 | 8.28935683 | 0.17315652 | 0.94978109 | 0.34642154 | 0.94378805 |
| 4 | GenHVL | 1.27903707 | 12.3757241 | 0.26840030 | 1.06764105 | 0.53777053 | 1.09299830 |
| 5 | GenHVL | 1.77534645 | 26.8452258 | 0.67323521 | 1.75253266 | 1.37786763 | 2.01359269 |
| 6 | GenHVL | 0.84270757 | 33.8047084 | 0.80653519 | 1.45452305 | 1.63450870 | 1.60664078 |

"Cation Std 10ppm (without PDCA)"

Fitted Parameters

| r^{2} Coef Det | DF Adj r^{2} | Fit Std Err | F-value | ppm uVar |
|---|---|---|---|---|
| 0.99998410 | 0.99998408 | 0.00756954 | 51,856,486 | 15.9026113 |

| Peak | Type | a0 | a1 | a2 | a3 | a4 |
|---|---|---|---|---|---|---|
| 1 | GenHVL | 4.76595940 | 4.84877877 | 0.04839938 | -0.0054830 | 0.01425771 |
| 2 | GenHVL | 1.36140311 | 7.08591237 | 0.06632691 | -0.0010553 | 0.01425771 |
| 3 | GenHVL | 1.59490007 | 8.26994645 | 0.07297309 | -0.0006439 | 0.01425771 |
| 4 | GenHVL | 0.72807999 | 12.3913245 | 0.11433678 | 0.00099639 | 0.01425771 |
| 5 | GenHVL | 2.53401578 | 27.3046832 | 0.27963325 | 0.02528927 | 0.01425771 |
| 6 | GenHVL | 1.45127017 | 34.1888513 | 0.34552620 | 0.01984923 | 0.01425771 |

Measured Values

| Peak | Type | Amplitude | Center | FWHM | Asym50 | FW Base | Asym10 |
|---|---|---|---|---|---|---|---|
| 1 | GenHVL | 29.5298652 | 5.02552694 | 0.15434218 | 0.33255295 | 0.30954552 | 0.28134430 |
| 2 | GenHVL | 8.02653978 | 7.12919908 | 0.15955211 | 0.81889325 | 0.31899002 | 0.78692972 |
| 3 | GenHVL | 8.63873996 | 8.29746430 | 0.17358139 | 0.89449898 | 0.34700759 | 0.87811423 |
| 4 | GenHVL | 2.54584420 | 12.3460701 | 0.26838043 | 1.14773956 | 0.53899929 | 1.20028957 |
| 5 | GenHVL | 3.17348807 | 26.4873173 | 0.74615841 | 2.57612679 | 1.56417775 | 3.16318080 |
| 6 | GenHVL | 1.57429789 | 33.4774236 | 0.86111357 | 1.96515093 | 1.77985289 | 2.31704802 |

"Cation Std 25ppm (without PDCA)"

Fitted Parameters

| r^{2} Coef Det | DF Adj r^{2} | Fit Std Err | F-value | ppm uVar |
|---|---|---|---|---|
| 0.99998833 | 0.99998832 | 0.01374504 | 70,683,285 | 11.6669314 |

| Peak | Type | a0 | a1 | a2 | a3 | a4 |
|---|---|---|---|---|---|---|
| 1 | GenHVL | 11.8142696 | 4.81659065 | 0.05245682 | -0.0132830 | 0.01604070 |
| 2 | GenHVL | 3.37933473 | 7.07533282 | 0.06774599 | -0.0026710 | 0.01604070 |
| 3 | GenHVL | 3.95466006 | 8.26853142 | 0.07360840 | -0.0017388 | 0.01604070 |
| 4 | GenHVL | 1.80133596 | 12.4132466 | 0.11490052 | 0.00268649 | 0.01604070 |
| 5 | GenHVL | 6.28678029 | 27.2937733 | 0.29227426 | 0.06326881 | 0.01604070 |
| 6 | GenHVL | 3.59687759 | 34.2322923 | 0.36116313 | 0.05079139 | 0.01604070 |

Measured Values

| Peak | Type | Amplitude | Center | FWHM | Asym50 | FW Base | Asym10 |
|---|---|---|---|---|---|---|---|
| 1 | GenHVL | 51.3969315 | 5.13548306 | 0.22524705 | 0.18947414 | 0.43971849 | 0.15866465 |
| 2 | GenHVL | 18.3887055 | 7.18040938 | 0.17331581 | 0.61719324 | 0.34737404 | 0.55997115 |
| 3 | GenHVL | 20.5719358 | 8.34339547 | 0.18101521 | 0.73016511 | 0.36202471 | 0.68469431 |
| 4 | GenHVL | 6.21391502 | 12.2981959 | 0.27141977 | 1.39865836 | 0.55027979 | 1.53839159 |
| 5 | GenHVL | 6.17339933 | 25.7952864 | 0.95570974 | 4.78387753 | 2.05567140 | 6.26432429 |
| 6 | GenHVL | 3.20405947 | 32.8180426 | 1.05023079 | 3.46401927 | 2.23464512 | 4.40839341 |

Additive-free cation standards were processed at 5, 10, and 25 ppm concentration. The analysis consists of strongly baseline-resolved peaks, with highly fronted and tailed peaks at the higher concentrations. This example shows why one would want to make the extra effort to model chromatographic peaks mathematically. Let us assume it would be perfectly normal for the amount of a solute of interest to vary by a factor of five in a sample. The apex values for the first peak change from 4.970 to 5.135 with concentration, and those for the last peak change from 33.805 to 32.818. As for the FWHM values, the first peak varies from .129 to .225 and the last peak from .806 to 1.05 with increasing concentration.

If we look at the a_{1} fitted values, the center of the infinite dilution Gaussian, we see near
concentration independence. The first peak varies from 4.866 to 4.816, and the last from 34.188 to
34.232, across the 5x increase in concentration. If we look at the a_{2} fitted values, the standard
deviation of this infinite dilution Gaussian, the first peak varies from .0484 to .0524, and the last
from .337 to .361. The coefficient of variation (CV) for the a_{1} fitted values averages .15%, and for
the a_{2} widths, 2.26%. By contrast, the CV averages 1.06% for the apex (center) values and 11.8% for
the FWHM values.
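The a_{1}, a_{2}, and apex averages quoted above can be reproduced from the tables. Assume that each peak's CV uses the sample standard deviation (n - 1) across the three concentrations, and that these per-peak CVs are then averaged over the six peaks. With that convention, the figures come out near 0.15%, 2.26%, and 1.06%. The sketch below uses only the standard library, and the variable names are ours.

```python
import statistics

# Values for peaks 1-6 at 5, 10 and 25 ppm, copied from the tables above.
a1_fits = [
    (4.86629842, 4.84877877, 4.81659065), (7.09399421, 7.08591237, 7.07533282),
    (8.27604890, 8.26994645, 8.26853142), (12.3963875, 12.3913245, 12.4132466),
    (27.3145721, 27.3046832, 27.2937733), (34.1882845, 34.1888513, 34.2322923),
]
a2_fits = [
    (0.04836896, 0.04839938, 0.05245682), (0.06635864, 0.06632691, 0.06774599),
    (0.07330294, 0.07297309, 0.07360840), (0.11414019, 0.11433678, 0.11490052),
    (0.27360663, 0.27963325, 0.29227426), (0.33736125, 0.34552620, 0.36116313),
]
apex_fits = [  # measured Center column
    (4.96975730, 5.02552694, 5.13548306), (7.11573005, 7.12919908, 7.18040938),
    (8.28935683, 8.29746430, 8.34339547), (12.3757241, 12.3460701, 12.2981959),
    (26.8452258, 26.4873173, 25.7952864), (33.8047084, 33.4774236, 32.8180426),
]


def mean_cv_percent(rows):
    """Average over peaks of the sample CV (stdev with n - 1), in percent."""
    return statistics.mean(100.0 * statistics.stdev(r) / statistics.mean(r) for r in rows)


for name, rows in (("a1 center", a1_fits), ("a2 width", a2_fits), ("apex", apex_fits)):
    print(f"{name:10s} mean CV = {mean_cv_percent(rows):.2f}%")
```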

Fitting baseline-resolved peaks does more than just remove concentration effects from location and broadening
estimates. The a_{3} parameter estimates the measure of fronting or tailing, and a_{3}/a_{0}
will actually offer a concentration independent estimate of the fronted or tailed distortion in any given
peak. The higher moment a_{4} parameter, the skewness in the infinite dilution generalized Gaussian,
increases with concentration in this example, something that would perhaps be expected if this parameter
were estimating the measure of additional site adsorptions or collision effects and those were nonlinear
with concentration. The various parametric estimates tell you much more about each peak, and further,
when fits are this accurate, these five fitted parameters can completely reconstruct each peak, as shown
in the lower portion of the fitted plot.
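As a quick check on the a_{3}/a_{0} claim, the sketch below uses the fitted values from the three tables above. For each peak, it computes how much a_{3} and a_{3}/a_{0} change across the three standards, taken as the max/min ratio of their magnitudes. In these fits, a_{3} grows by a factor of roughly five, while a_{3}/a_{0} changes by less than about 25% for every peak. The helper name is ours.

```python
# (a0, a3) for peaks 1-6 at 5, 10 and 25 ppm, copied from the Fitted Parameters tables.
fits = {
    1: [(2.39409195, -0.0028304), (4.76595940, -0.0054830), (11.8142696, -0.0132830)],
    2: [(0.68483314, -0.0005339), (1.36140311, -0.0010553), (3.37933473, -0.0026710)],
    3: [(0.79975294, -0.0003202), (1.59490007, -0.0006439), (3.95466006, -0.0017388)],
    4: [(0.36554694, 0.00043770), (0.72807999, 0.00099639), (1.80133596, 0.00268649)],
    5: [(1.27705415, 0.01257608), (2.53401578, 0.02528927), (6.28678029, 0.06326881)],
    6: [(0.72539077, 0.00969516), (1.45127017, 0.01984923), (3.59687759, 0.05079139)],
}


def spread(values):
    """Ratio of the largest to the smallest magnitude in values."""
    mags = [abs(v) for v in values]
    return max(mags) / min(mags)


for peak, rows in fits.items():
    a3_spread = spread([a3 for _, a3 in rows])
    ratio_spread = spread([a3 / a0 for a0, a3 in rows])
    print(f"peak {peak}: a3 changes x{a3_spread:.2f}, a3/a0 changes x{ratio_spread:.2f}")
```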

White Paper: Part II - Instrument Response Convolution Models

In this white paper, we looked only at the enhancements necessary to fully model the actual chromatographic separation, and only for analytic peaks where a single third moment adjustment suffices to capture virtually all of the variance in the fitting. We have described a generalized HVL capable of fitting all HVL and NLC shapes, as well as those arising from any other third moment asymmetry in the infinite dilution ZDD.

Fitting the third moment skewness in the infinite dilution ZDD is important, but for most chromatographic fitting, removing instrumental effects will typically matter more in producing near zero-error fits. In the data above, we removed the instrumental/system distortions prior to fitting and realized fits with 35, 16, and 12 ppm error using this once-generalized HVL model. If the IRF is not pre-subtracted and a pure HVL is fit to this same data, the forced Gaussian ZDD assumption and the absence of any IRF modeling result in much higher errors of 2587, 2751, and 2414 ppm. To fit chromatographic peaks effectively, both the higher moment generalization and an accounting of the instrumental distortions are necessary.
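These ppm errors match the ppm uVar column in the tables above. That column agrees with the unexplained variance, 10^6 x (1 - r^{2}), within the rounding of the r^{2} column. For example, 1 - 0.99996468 = 3.532e-5, or about 35.3 ppm. Assuming that definition, the small helper below computes the same figure for any data/fit pair. The function name is ours, not PeakLab's.

```python
def ppm_unexplained(y, y_fit):
    """Unexplained variance in parts per million: 1e6 * SSE / SST = 1e6 * (1 - r^2)."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_fit))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1e6 * sse / sst


# r^2 and ppm uVar pairs copied from the three Fitted Parameters tables.
table = {
    "5 ppm": (0.99996468, 35.3179206),
    "10 ppm": (0.99998410, 15.9026113),
    "25 ppm": (0.99998833, 11.6669314),
}
for name, (r2, ppm) in table.items():
    print(f"{name:7s} 1e6*(1 - r2) = {1e6 * (1.0 - r2):.2f}   ppm uVar = {ppm:.2f}")
```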

In part II, we will describe the fitting of the real-world non-idealities in chromatographic data using convolution models.