PeakLab v1 Documentation Contents AIST Software Home AIST Software Support

GLM Review

The GLM Review consists of a set of graphical and numerical windows which cover the following:

Main Window Graph of Model/Residuals

If a prediction data matrix has not been imported using the Prediction button, the main graph window will consist of an upper graph containing both a Y and Y2 axis. The Y data vs Y estimate graph of the model data will be plotted on the Y axis along with a linear fit and prediction/confidence intervals of the x=Y data, y = Y estimated, y=a+bx linear fit. The residuals as Y estimated - Y data will be in the Y2 plot. The traditional definition of the Residuals are reversed so that an estimated value greater than the data value that appears above the fitted line also appears as a positive residual above the zero line. The residuals also include a y=a+bx linear fit. By default, the upper graph's background will reflect the quintiles of the Y-data.

Main Window Graph of Model/Prediction

If a prediction data matrix has been imported using the Prediction button, the main graph window will consist of an upper graph with the prediction in the Y2 plot and the model fit in the Y plot. The prediction plot is analogous to the model plot, except the data consists of the imported prediction matrix data and the fit is the done using this data and its predicted estimates. The default graph titles will contain one additional line of information specific to the prediction.

Main Window Graph of Significance of Predictors

The lower graph consists of the significance of this specific model (the t-value and location of each of the predictors). This is overlaid with the signed significance plot for the all retained models of this same predictor count. This reference consists of the sum of the t-values at each wavelength for the retained models of that predictor count, normalized by the total count of predictors across these same retained models. For this reference, a wavelength that appears as a strong predictor in many retained models will have a large magnitude significance peak. A significance of zero means that the wavelength did not appear in any of the retained models of that specific predictor count.

One advantage of the full permutation modeling is that a large count of near-optimal fitted models provide a map of the wavelengths/frequencies that furnish effective predictions as well as their significance. In the above plot, the six wavelengths in the model strongly map to the wavelengths and average significance from the one-hundred retained 6-parameter models. This six-predictor model is deemed strongly 'compliant' with the overall body of effective six-predictor models. PeakLab defines a compliance metric which can be used to sort models of a specific predictor count. This metric is described in the GLM Model List topic.

By selecting only certain predictor counts in the **Filter** menu of the **Model List**, you can
use the **Display the Average Significance Map for all Counts of Predictors** right click menu option
in the main graph to plot all of the significance for all of the model counts currently included in the
list. The following is the lower significance plot for the 5,6,7,8 predictor counts. Note the overall
consistency for these average significances and locations across the predictor counts:

These plots are automatically updated each time a different model is selected for review.

The GLM Review is designed to have as much information at your fingertips as your screen real-estate permits. You will likely find a decided efficiency benefit if you have a large 4K monitor, or multiple monitors where all of the different windows can be simultaneously displayed and refreshed each time a different model is selected in the Model List.

Model List of All Retained Models from the GLM/Stepwise Fitting

The Model
List is always displayed by default. The **List** menu has a **Keep List after Selection** item
which can be turned off to hide the list when reviewing a specific fit. If you use this, you will need
to click the **Model List** button in the main window to redisplay the models. This window is a large
selection list. Simply select the model you wish to review.

You can choose from a large set of prediction metrics in the list using the **List** menu. You can
also choose to display all or just the default metrics. You can also select the font and its size as well
as whether or not the list is displayed in color. Values which improve with increasing value are shown
in red, such as rē, and those which improve with decreasing value are shown in blue, such as the different
estimates of error.

The **Filter** menu allows you to individually toggle the different predictor counts on and off. You
can also limit the list to just the GLM fits, or to just the stepwise fits.

Numeric Summary of the Currently Selected Fit

The **Numeric** button in the GLM Review opens a window containing the Numeric
Summary for the fit of the model currently selected. This will always be an RTF (rich text format)
window where individual content can be highlighted and copied. This summary is automatically updated each
time a different model is selected in the Model
List.

Data-Residuals Summary of the Currently Selected Fit

The **Data** button in the GLM Review opens a Data
Summary window containing the sample by sample error information for the fit of the model currently
selected. This will be an RTF (rich text format) window where individual content can be highlighted and
copied, provided the data size is less than 2048. For larger sets, a much faster display procedure is
used for accommodating the display of up to the program's N=50,000 built-in data limit. This summary is
automatically updated each time a different model is selected in the Model
List.

Prediction Summary of Currently Selected Fit Using Separately Imported Predicted Data

The **Prediction** button in the GLM Review first opens a file selection window where you must specify
a file containing the data matrix that will be used to evaluate the prediction accuracy of model. The
prediction data will usually be out-of-sample data (that which was nowhere used in the design model),
but it can be any data matrix, including the data that was used for the original modeling.

You can only access the Prediction procedure from the GLM Review. This is opened at the conclusion of any current fit, or in loading any saved fit (these are binary files with BIN extensions).

The prediction data's X-predictor names or WL/Wn values must match those in the design data. The prediction
data's Y value column must be specified if its name doesn't match with any of the column names/identifiers
in the design data matrix file. If the Y data does not exist, you must check **No Y data is available**.

Once the prediction data has been loaded, a Prediction Summary window is opened containing Prediction statistics and the sample by sample prediction information for the model currently selected and graphed in the GLM Review. This will be an RTF (rich text format) window where individual content can be highlighted and copied, provided the size of the predicted data consists of less less than 2048 data samples/spectra. For larger sets, a much faster display procedure is used for accommodating the display of up to the program's N=50,000 built-in data limit.

The purpose of the GLM Prediction procedure is to evaluate the predictive accuracy of individual fitted models, and possibly to see the prediction statistics of the original design data when possible data outliers are omitted post-fitting by one or both of two different methods.

Significance Graph of All Retained Models at each Predictor Count

The **Significance** button in the GLM Review opens a Significance
plot of the overall predictors in the model set. This is a comprehensive visualization of the whole of
the fitting, and includes the full permutation GLM models, and if fitted, the smart stepwise models, and
the sparse PLS models. This global significance plot is unsigned, meaning that |t| is used in the sums
of the significance values. Because the absolute value is used, this global visualization does not give
you a picture of whether or not a predictor adds to or subtracts from the prediction estimate, only which
of the x-predictors or wavelengths are important.

The significance plot in the lower graph of the GLM Review shows a signed significance for the current
count of predictors in the presently selected model. The reference in this 2D plot is a sum of the signed
t-values at each predictor (x-value or wavelength) across all retained models having the predictor count
of the currently selected model. This sum is normalized to the total count of predictors across those
same retained models. In the Review's lower plot, you can use the **Display the Average Significance
Map for all Counts of Predictors** right click menu option in the main graph to plot the signed significance
for all of the model counts currently included in the model list.