
How Many Design Points Do I Need for My DOE?

This is an often-asked question.  In the traditional DOE setting, the required number of design points is driven by the model that you plan to fit. 

More specifically, you choose a polynomial model that you think describes the form of the true response surface, and the minimal number of design points is the number needed to estimate the coefficients of that polynomial model.
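To make that count concrete, here is a minimal sketch that counts the coefficients in a full polynomial model. The function names are ours, chosen for illustration; the only fact used is the standard count of monomials of degree at most d in k variables, which is C(k + d, d).

```python
from math import comb


def n_polynomial_terms(n_factors: int, degree: int) -> int:
    """Coefficients (including the intercept) in a full polynomial model
    of the given degree in n_factors variables."""
    return comb(n_factors + degree, degree)


def quadratic_term_breakdown(n_factors: int) -> dict:
    """Split the full quadratic model into its familiar groups of terms."""
    return {
        "intercept": 1,
        "linear": n_factors,
        "two_factor_interactions": comb(n_factors, 2),
        "pure_quadratic": n_factors,
    }


for k in (2, 3, 5):
    # Minimum number of runs needed just to estimate a full quadratic model
    print(k, "factors:", n_polynomial_terms(k, 2), "coefficients")
```

For two factors, the full quadratic model has 6 coefficients, so a design needs at least 6 distinct runs to estimate it. Any runs beyond that minimum go toward estimating error or checking lack of fit.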

Marie Gaudard is a Senior Data Scientist at Predictum. She specializes in predictive modeling, design of experiments, statistics, and machine learning. She consults extensively across industries.

A Simple Example

Let’s take a simple example of a two-factor design. The factors are X1 and X2 and the response is Y.

You decide to fit a response surface model using a 10-run face-centered central composite design. This design has two center points (rows 5 and 7). 
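If you want to see where the ten runs come from, the sketch below builds a face-centered central composite design in coded units. We assume each factor is scaled to the range -1 to +1, and the function name is ours. The row order will not match the table in this post, where the center points happen to fall in rows 5 and 7.

```python
from itertools import product


def face_centered_ccd(n_factors, n_center=2):
    """Face-centered central composite design in coded units (-1, 0, +1)."""
    # Factorial (corner) points: every combination of -1 and +1
    factorial = [tuple(float(v) for v in p)
                 for p in product((-1, 1), repeat=n_factors)]

    # Axial points sit on the face centers: one factor at -1 or +1, the rest at 0
    axial = []
    for i in range(n_factors):
        for level in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = level
            axial.append(tuple(point))

    # Replicated center points
    center = [(0.0,) * n_factors] * n_center
    return factorial + axial + center


design = face_centered_ccd(2)
for run, (x1, x2) in enumerate(design, start=1):
    print(run, x1, x2)
```

With two factors this gives 4 corner points, 4 face-center points, and 2 center points, for 10 runs in total.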

A table with data for a 2-factor design, listing X1 and X2 factors.

Here is a plot of your design points: 

Note that all of your design points are located on the boundary or at the center of the design region. This is a great design if your response surface is, in fact, quadratic.   

But what if the true response surface is not quadratic?  What if this is the true response surface: 

Image of a true response surface

If this is the true response surface, you have selected very good design points for fitting the wrong model!   

Your Fitted Model

Let’s assume that there is no noise in the data, so that the response values at your ten design points are the actual values given by the response surface.  This figure shows the design points and their values on the true response surface. 

An image of a model showing design points and their values on the true response surface.

Given the location of the design points (all on the boundary or at the center of the design region), the resulting fitted quadratic response surface looks like this: 

“That’s not quadratic,” you say.  But it is!  The linear terms dominate the fitted model.  Here is the equation for the model:

Y = 40 - 12.072*X1 + 13.333*X2 + 0.300*X1*X2 - 0.300*X1*X1 - 0.500*X2*X2
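You can check the claim that the linear terms dominate by comparing the size of the linear and second-order contributions over the design region. The sketch below assumes that X1 and X2 are coded to run from -1 to +1, which the post does not state explicitly. The function names are ours.

```python
def fitted_model(x1, x2):
    """The fitted quadratic model from the equation above."""
    return (40 - 12.072 * x1 + 13.333 * x2
            + 0.300 * x1 * x2 - 0.300 * x1 * x1 - 0.500 * x2 * x2)


def linear_part(x1, x2):
    return -12.072 * x1 + 13.333 * x2


def second_order_part(x1, x2):
    return 0.300 * x1 * x2 - 0.300 * x1 * x1 - 0.500 * x2 * x2


# Evaluate on a grid over the assumed coded region [-1, 1] x [-1, 1]
grid = [i / 10 for i in range(-10, 11)]
max_linear = max(abs(linear_part(a, b)) for a in grid for b in grid)
max_second = max(abs(second_order_part(a, b)) for a in grid for b in grid)

print("largest linear contribution:      ", round(max_linear, 3))
print("largest second-order contribution:", round(max_second, 3))
```

Under that assumption, the linear terms move the prediction by as much as about 25.4 units, while the interaction and squared terms together never contribute more than about 1.1. That is why the fitted surface looks like a plane.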

Let’s compare the fitted model to the true response function.  The true response function is shown in green and the fitted model in blue. 

If you are trying to find settings to match a target, or to maximize or minimize your response, the conclusions that you draw from your fitted model will be erroneous, and likely seriously so.

“How can I do better?” 

  1. Don’t assume that you know the shape of the response function. More and more, we find that response functions are complex. Default to the assumption that it has a complex shape.
  2. Fit a flexible model that can capture that complex shape. We find that neural nets provide flexible fits. However, using neural nets with small data sets requires an innovative methodology called SVEM (Self-Validating Ensemble Modeling).
  3. Use design settings that cover the space essentially uniformly.  We call such a design a space-filling design.
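To illustrate the third point, here is a minimal sketch of one common way to build a space-filling design: generate many random Latin hypercube designs and keep the one whose closest pair of points is farthest apart (a maximin criterion). This post does not say which space-filling algorithm produced the design shown here, so treat the method, the function names, and the coded range of -1 to +1 as our assumptions.

```python
import math
import random


def latin_hypercube(n_runs, n_factors, low=-1.0, high=1.0, seed=None):
    """Random Latin hypercube: each factor has exactly one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        # One jittered value inside each of n_runs equal-width strata of [0, 1)
        strata = [(i + rng.random()) / n_runs for i in range(n_runs)]
        rng.shuffle(strata)  # pair the strata randomly across factors
        columns.append([low + (high - low) * u for u in strata])
    return list(zip(*columns))


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(p, q)
               for i, p in enumerate(points)
               for q in points[i + 1:])


def maximin_latin_hypercube(n_runs, n_factors, n_candidates=200, seed=0):
    """Keep the candidate design whose closest pair of points is farthest apart."""
    candidates = (latin_hypercube(n_runs, n_factors, seed=seed + c)
                  for c in range(n_candidates))
    return max(candidates, key=min_pairwise_distance)


design = maximin_latin_hypercube(n_runs=10, n_factors=2)
for x1, x2 in design:
    print(round(x1, 3), round(x2, 3))
```

Unlike the central composite design, every run here sits at a different level of each factor, so the interior of the design region is sampled rather than only its boundary and center.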

In our example, we used ten design points. Here is a space-filling design consisting of ten design points: 

Note that this design gives much better coverage of the design space and supports the fitting of flexible models that reflect the nuances of the true response surface. 

The fit using SVEM with neural nets is shown below.  Note how well the SVEM model fits the true response surface values at the design settings.  

The figure below shows the SVEM fit (blue) and the true response surface (green). Note that the SVEM fit is very close to the true response surface, especially in the interior of the design region. 

“But how many design points?” 

So back to the original question of how many design points are needed.  The answer depends on the complexity of the true response surface and the type of model-fitting algorithm that you use. 

We advocate the use of space-filling designs with the SVEM modeling technique.  In future posts, we will talk more about SVEM and the recommended number of design points, but generally, SVEM combined with space-filling designs requires significantly fewer runs than do classical approaches. 

