Self-Validating Ensemble Modeling (SVEM)

Frequently Asked Questions


We provide answers to common Frequently Asked Questions (FAQs) about Self-Validating Ensemble Modeling (SVEM) to help you understand more about SVEM and how it’s used by engineers, scientists, and researchers.

If you have a question about SVEM that isn’t listed below, please send an email to info [at] predictum.com and we’ll ask our subject matter experts to help you.

Categories

General FAQs
FAQs on Space-Filling Designs using SVEM


General FAQs

What is SVEM?

SVEM stands for Self-Validating Ensemble Modeling. Predictum’s SVEM application uses a novel model-fitting algorithm that runs in JMP software and uses bootstrapping to construct an ensemble model.

The intermediate bootstrapped models use anti-correlated fractional weights to emulate partitioning the data into training and validation sets. With JMP standard software, SVEM can fit ensembles of neural net models.
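
To make the idea of anti-correlated fractional weights concrete, here is a minimal Python sketch. It assumes the weighting scheme described in the SVEM literature as we understand it. For each observation, draw u uniformly on (0, 1). Then use -ln(u) as its training weight and -ln(1 - u) as its validation weight. The function names are hypothetical, and this is not the code inside Predictum’s SVEM application.

```python
import math
import random


def svem_weights(n, rng):
    """Draw paired (training, validation) fractional weights for n observations.

    ASSUMPTION: u ~ Uniform(0, 1), training weight -ln(u), validation weight
    -ln(1 - u). A large training weight implies a small validation weight.
    """
    train, valid = [], []
    for _ in range(n):
        # random() can return exactly 0.0, so clamp u away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1 - 1e-12)
        train.append(-math.log(u))
        valid.append(-math.log(1.0 - u))
    return train, valid


def pearson(x, y):
    """Plain Pearson correlation, written out to avoid dependencies."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
w_train, w_valid = svem_weights(10_000, rng)
print(round(pearson(w_train, w_valid), 2))  # strongly negative
```

Each weight averages about 1, so no observation is dropped outright. Instead, an observation that counts heavily toward training counts lightly toward validation, and that trade-off is what emulates a train/validation split.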

Can I use SVEM with observational (i.e. non-designed) data?

Yes. As long as the data provide sufficient coverage of the space of interest, SVEM can reveal insights buried in observational data. In fact, you can combine observational data and designed-experiment data in a single SVEM analysis.

Is SVEM a technique for designing experiments?

No. SVEM’s methodology is used to fit predictive models. Its distinctive capability is that it applies machine-learning and cross-validation techniques to small data sets. SVEM can fit ensembles of highly flexible neural net models, as well as ensembles of traditional linear regression models and generalized regression models (the latter capability requires JMP Pro). This makes SVEM an ideal tool for building predictive models from the typically small data sets that designed experiments produce.
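
The Python sketch below shows the overall shape of an SVEM-style fit. It rests on several assumptions:

- A single continuous factor.
- Candidate polynomial models of degree 0 to 2. These stand in for the neural net or generalized regression fits that SVEM actually uses.
- The -ln(u) / -ln(1 - u) fractional weighting scheme, as we understand it from the SVEM literature.

In each bootstrap iteration, the training weights fit every candidate, the validation weights pick the winner, and the final prediction averages the winners across iterations. The names svem_fit and svem_predict, and the data, are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def weighted_poly_fit(xs, ys, ws, degree):
    """Weighted least-squares polynomial fit; returns [b0, b1, ...]."""
    p = degree + 1
    a = [[sum(w * x ** (i + j) for x, w in zip(xs, ws)) for j in range(p)] for i in range(p)]
    b = [sum(w * y * x ** i for x, y, w in zip(xs, ys, ws)) for i in range(p)]
    return solve(a, b)


def poly_value(coef, x):
    return sum(c * x ** i for i, c in enumerate(coef))


def svem_fit(xs, ys, n_boot=200, degrees=(0, 1, 2), seed=0):
    """Hypothetical SVEM-style ensemble: one selected model per bootstrap."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_boot):
        # ASSUMED anti-correlated fractional weights (u clamped away from 0 and 1).
        us = [min(max(rng.random(), 1e-12), 1 - 1e-12) for _ in xs]
        w_train = [-math.log(u) for u in us]
        w_valid = [-math.log(1 - u) for u in us]
        best_sse, best_coef = math.inf, None
        for d in degrees:
            coef = weighted_poly_fit(xs, ys, w_train, d)  # fit with training weights
            sse = sum(w * (y - poly_value(coef, x)) ** 2  # score with validation weights
                      for x, y, w in zip(xs, ys, w_valid))
            if sse < best_sse:
                best_sse, best_coef = sse, coef
        ensemble.append(best_coef)
    return ensemble


def svem_predict(ensemble, x):
    """Ensemble prediction: average of the selected models' predictions."""
    return sum(poly_value(c, x) for c in ensemble) / len(ensemble)


# Hypothetical 8-run, one-factor data set: y = 1 + 2x - 3x^2 plus fixed small noise.
xs = [i / 7 for i in range(8)]
noise = [0.03, -0.02, 0.01, -0.04, 0.02, 0.00, -0.01, 0.03]
ys = [1 + 2 * x - 3 * x * x + e for x, e in zip(xs, noise)]
ensemble = svem_fit(xs, ys)
print(round(svem_predict(ensemble, 0.5), 3))  # true noise-free value is 1.25
```

The same observations serve as both training and validation data in every iteration; only their weights change. That is how the approach can mimic cross-validation on a data set too small to split.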

How does SVEM compare to the classical techniques (polynomial regression models) typically used to model designed experiments?

SVEM provides stable predictive models (see the published research paper) that are more flexible than the traditional polynomial models. SVEM can model nonlinear behavior using neural net techniques. Notably, SVEM allows the use of space-filling designs.

Polynomial models impose limited curvature between points, which may not represent reality. Because they are based on neural networks, SVEM neural models can capture features in the design space that quadratic models smooth over. Additional features with JMP Pro include advanced options for fitting neural nets and the ability to fit generalized regression models.
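
The small Python sketch below illustrates the smoothing problem. It uses a hypothetical one-factor response with a narrow peak at x = 0.3. An ordinary least-squares quadratic fitted to 21 evenly spaced observations predicts a value far below the true peak height of 1.0, because a single parabola cannot bend sharply enough. The sketch shows only the quadratic side of the comparison; it does not fit a neural net.

```python
import math


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def quadratic_fit(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1*x + b2*x^2 via normal equations."""
    a = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    return solve(a, b)


# Hypothetical response with a narrow peak at x = 0.3, observed on a grid.
xs = [i / 20 for i in range(21)]
ys = [math.exp(-(((x - 0.3) / 0.05) ** 2)) for x in xs]
b0, b1, b2 = quadratic_fit(xs, ys)
fit_at_peak = b0 + b1 * 0.3 + b2 * 0.3 ** 2
print(round(fit_at_peak, 3))  # well below the true peak height of 1.0
```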

Classical optimal designs are optimal with respect to an intended model. To build them, you must specify in advance which interactions, quadratic terms, and other effects are of interest. SVEM does not require you to specify a model in advance.
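
To see what "optimal with respect to the intended model" means, the Python sketch below scores two hypothetical four-run, one-factor designs (coded -1 to 1). It uses the D-optimality criterion, det(X'X), which is one common optimality criterion (larger is better). The design with runs only at the ends wins for a straight-line model (16 versus 8), but it cannot estimate a quadratic term at all (determinant 0). The design with center runs scores 8 for both models.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    d = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            return 0.0  # singular: some model term cannot be estimated
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return d


def information_det(design, terms):
    """det(X'X) for a one-factor design under a model given as a list of terms."""
    X = [[t(x) for t in terms] for x in design]
    p = len(terms)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    return det(xtx)


linear = [lambda x: 1.0, lambda x: x]        # intercept + slope
quadratic = linear + [lambda x: x * x]       # adds a curvature term

ends_only = [-1, -1, 1, 1]
with_center = [-1, 0, 0, 1]

for name, design in [("ends_only", ends_only), ("with_center", with_center)]:
    print(name, information_det(design, linear), information_det(design, quadratic))
```

Which design is "optimal" depends entirely on the model you assumed before collecting any data.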

Are there any research papers that support the validity of the SVEM approach?

Yes. Several research papers describing applications in different industries have been published recently.

The following conference presentation is also available from JMP Software’s user community (a user account is required to access it):

P.J. Ramsey and C. Gotwalt, “Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques with Applications to Biopharmaceuticals” (PowerPoint presentation, JMP Discovery Summit Americas, October 25, 2018).  


Do I need JMP standard or JMP Pro to run SVEM? What are SVEM’s system requirements?

SVEM must be run in JMP software, but the requirements differ depending on which pathway you choose and which edition of JMP you have installed.

JMP Pro contains more extensive functionality and analytical capabilities than JMP standard. However, JMP standard is sufficient to run SVEM with Neural Net ensemble models, which are the models we find most useful for experimental and observational data. JMP Pro also lets you use SVEM to fit Generalized Regression models, and it provides additional options for fitting Neural Net models.

Which types of experimental designs are the most useful if I fit models using SVEM?

We recommend using a space-filling design when you fit models with SVEM. A space-filling design distributes a specified number of design points over a design region with as much coverage as possible. Space-filling designs do not assume any underlying model; they simply try to cover the entire design region.

Because its points cover the entire design region, a space-filling design lets you fit interaction effects and complex or nonlinear behavior that a traditional optimal design could overlook.
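
One way to put a number on coverage is the fill distance. This is the largest distance from any location in the region to its nearest design point, so smaller is better. The Python sketch below estimates it by Monte Carlo in the unit cube. This metric is our illustrative choice, not necessarily the criterion JMP optimizes, and fill_distance is a hypothetical name.

```python
import itertools
import math
import random


def fill_distance(design, n_probe=5000, seed=0):
    """Monte Carlo estimate of a design's fill (covering) distance in the unit cube:
    the largest distance from a random probe point to its nearest design point."""
    k = len(design[0])
    rng = random.Random(seed)  # fixed seed, so every call uses the same probes
    worst = 0.0
    for _ in range(n_probe):
        p = [rng.random() for _ in range(k)]
        worst = max(worst, min(math.dist(p, d) for d in design))
    return worst


corners = [list(c) for c in itertools.product([0.0, 1.0], repeat=3)]  # 2^3 factorial
center = [[0.5, 0.5, 0.5]]
print(round(fill_distance(corners), 3))           # corners only: empty middle
print(round(fill_distance(corners + center), 3))  # corners plus one center point
```

Because the same seeded probe points are reused, adding design points can only lower the estimate. For corners alone, the estimate approaches the distance from a corner to the center of the cube, about 0.866.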

When you use the neural net pathway, a space-filling design takes full advantage of SVEM’s ability to fit flexible models.

Here is a three-factor example of a space-filling design. Notice that the design spreads 25 points over the entire region.
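
As a rough illustration of how a 25-point, three-factor design like this can be generated, the Python sketch below works in the unit cube. It clusters a large random sample of candidate points with plain k-means and uses the cluster centroids as the runs. The approach is loosely in the spirit of clustering-based space-filling methods, but it is not JMP’s Fast Flexible Filling algorithm. The name cluster_design and its defaults are hypothetical.

```python
import math
import random


def cluster_design(n_points, n_factors, n_candidates=2000, iters=15, seed=0):
    """Hypothetical clustering-based space-filling design in the unit cube.

    Scatter many random candidates, group them into n_points clusters with
    plain k-means, and return the cluster centroids as design runs.
    """
    rng = random.Random(seed)
    cands = [[rng.random() for _ in range(n_factors)] for _ in range(n_candidates)]
    centers = [list(c) for c in rng.sample(cands, n_points)]
    for _ in range(iters):
        groups = [[] for _ in range(n_points)]
        for c in cands:
            nearest = min(range(n_points), key=lambda i: math.dist(c, centers[i]))
            groups[nearest].append(c)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers


design = cluster_design(25, 3)
for run in design[:3]:
    print([round(v, 3) for v in run])
```

Centroids are averages of points inside the region, so they sit in the interior rather than on the boundary. Rescale each column to your actual factor ranges before running the experiment.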


Does SVEM work with nominal factors?

SVEM works most effectively with continuous factors, though it can handle a blend of nominal and continuous factors. 

Can I use SVEM with factorial and fractional-factorial designs?

Yes, but keep in mind that factorial designs place their runs on the corners of the design region, perhaps with a few center points. These designs leave much of the design region without coverage, and this inherent limitation of classical factorial and fractional-factorial designs hinders SVEM’s ability to fit flexible models.

Can I use definitive screening designs or response surface designs with SVEM?

Definitive screening designs and I-optimal designs can be useful because they have at least three levels per factor, so they cover more of the design space than fractional factorial designs. However, they assume an underlying model, such as the full quadratic model. We generally recommend using space-filling designs instead.

Do I need blocking when running experiments across several days?

Blocking is a technique used in explanatory studies to remove variation that would otherwise obscure the effects of other factors or interactions. SVEM is designed to build predictive models and is not a technique to be used for hypothesis testing. If you are interested in prediction and if you have a variable, such as Day, Machine, or Supplier, that might explain a significant amount of variation, then you should obtain data across levels of this variable and include it in your SVEM model. 
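
JMP handles nominal columns such as Day for you. For readers who want to see what including Day in a regression-type model amounts to, the Python sketch below applies standard indicator (dummy) coding. Each day other than a reference day becomes a 0/1 column. The run values and the add_indicators name are hypothetical.

```python
def add_indicators(rows, column):
    """Replace a nominal column (e.g. Day) with 0/1 indicator columns.

    The first level in sorted order is the reference level and gets no
    column, so a regression model with an intercept stays estimable.
    """
    levels = sorted({row[column] for row in rows})
    expanded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels[1:]:
            new_row[f"{column}={level}"] = 1 if row[column] == level else 0
        expanded.append(new_row)
    return expanded


# Hypothetical runs spread over three days; factor values are placeholders.
runs = [
    {"Temperature": 50.0, "Time": 10.0, "Day": "Day1"},
    {"Temperature": 62.5, "Time": 14.0, "Day": "Day1"},
    {"Temperature": 55.0, "Time": 18.0, "Day": "Day2"},
    {"Temperature": 68.0, "Time": 11.0, "Day": "Day2"},
    {"Temperature": 58.0, "Time": 16.0, "Day": "Day3"},
]
for row in add_indicators(runs, "Day"):
    print(row)
```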


Traditional experiments require complete randomization. Where randomization can’t be assured, split-plot designs are required. Is that the case with SVEM?

If your goal is to build a predictive model, restricted randomization should not be a concern, no matter which modeling technique you choose to use. You can develop a predictive model using your data as it stands. SVEM is an excellent technique for developing your predictive model. 

Split-plot designs are used when certain factors are difficult to change while other factors are easy to change. The typical examples come from agriculture, where you can only apply some factors (called whole-plot factors), such as irrigation or fertilizer, to large areas. But within those areas, you can vary other factors (called split-plot factors), such as seed variety. The fact that randomization is restricted by the whole plots affects the independence of measurements, and this impacts hypothesis tests. 

If your goal is prediction, these hypothesis tests are not of interest to you. In the agricultural example above, to predict which seed variety provides the best yield under various conditions, you can simply build a predictive model from your data. It is true that including more whole plots would let you observe more of the variation associated with the whole-plot factors. However, you can model the data that you have and still develop a useful predictive model.

FAQs on Space-Filling Designs using SVEM

If I use a space-filling design, do I need to worry about setting/resetting factors between runs?

Traditional experiments require factor levels to be set and reset between all runs. For example, if two successive runs both use a Temperature setting of 50°C, the temperature must be reset between them. More specifically, the Temperature must be moved away from 50°C for a period of time and then returned, so that the new measurement reflects the variation in the process.

With space-filling designs, the design points are randomly placed within the space, so it is unlikely that any column will have the same value for successive points. However, we do suggest resetting factor levels between successive observations, if possible. This practice allows measurements to reflect the variation that is inherent in the process.
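
A quick way to find where resets matter is to scan the randomized run sheet for factors that keep the same setting from one run to the next. The Python sketch below does that; the run sheet and the repeated_settings name are hypothetical.

```python
import random


def repeated_settings(runs, factors):
    """Return (run index, factor) pairs where a factor keeps the previous run's
    setting. These are the places where an explicit reset (move the factor
    away, then back) is needed so that each run is set up independently."""
    flags = []
    for i in range(1, len(runs)):
        for factor in factors:
            if runs[i][factor] == runs[i - 1][factor]:
                flags.append((i, factor))
    return flags


# Hypothetical run sheet.
runs = [
    {"Temperature": 50, "pH": 6.5},
    {"Temperature": 50, "pH": 7.0},
    {"Temperature": 60, "pH": 7.0},
    {"Temperature": 45, "pH": 6.0},
]
random.Random(3).shuffle(runs)  # randomize the run order
print(repeated_settings(runs, ["Temperature", "pH"]))
```

With a space-filling design, this list is usually empty, because values rarely repeat within a column.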

How many points should I include in my space-filling design?

Choosing the ideal design size for a prediction problem is a largely unexplored area of design of experiments. As a general guideline for space-filling designs, a number of runs equal to two times the number of factors seems to work well for building good predictive models. To date, however, there has been no compelling research on this topic.

Traditional sample size calculations in design of experiments were always suspect, and they focus on hypothesis testing and power, which are irrelevant to prediction. For prediction, how the runs are distributed matters more than the number of runs alone. At present, no optimality criterion in design of experiments directly addresses this issue.

Space-filling designs don’t intentionally place design points on the boundary of the design region. Is this a problem?

We don’t see this as an issue. Space-filling designs place design points so that they cover the entire design space. This allows you to develop a predictive model that works well over the entire design space, and these models usually extend well up to the design space boundaries.

Of the different space-filling methods (such as Sphere Packing and Maximum Entropy), which is best?

We recommend the Fast Flexible Filling method with the Max Pro criterion, which is the default criterion in JMP software. This method attempts to distribute points throughout the design space by maximizing the distances between potential design points.
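
For intuition about the MaxPro (maximum projection) criterion, here is a Python sketch of it as it is usually defined in the literature. Average, over all pairs of points, the reciprocal of the product of squared coordinate differences, then take the 1/p-th power, where p is the number of factors. Smaller values are better. Any shared coordinate makes the product zero, so the criterion heavily penalizes points that repeat a value in any single factor. This favors designs that also spread out when projected onto subsets of the factors. JMP computes its criterion internally; maxpro is a hypothetical helper for illustration only.

```python
import itertools
import math


def maxpro(design):
    """MaxPro criterion (smaller is better):
    psi = (mean over point pairs of 1 / prod_l (x_il - x_jl)^2) ** (1 / p).
    Returns inf when two points share a value in any factor."""
    n, p = len(design), len(design[0])
    total = 0.0
    for a, b in itertools.combinations(design, 2):
        prod = 1.0
        for u, v in zip(a, b):
            prod *= (u - v) ** 2
        if prod == 0.0:
            return math.inf
        total += 1.0 / prod
    return (total / math.comb(n, 2)) ** (1.0 / p)


corners = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]         # 2^2 factorial
lhd = [[0.125, 0.375], [0.375, 0.875], [0.625, 0.125], [0.875, 0.625]]  # distinct levels
print(maxpro(corners), maxpro(lhd))
```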