
How Many Design Points Do I Need for My DOE?

This is an often-asked question.  In the traditional DOE setting, the required number of design points is driven by the model that you plan to fit. 

More specifically, you choose a polynomial model that you think describes the form of the true response surface, and the minimal number of design points is the number needed to estimate the coefficients of that polynomial model.
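To make the counting concrete, here is a minimal sketch of how many coefficients a full polynomial model has, and therefore the minimum number of runs needed to estimate it. The function name `n_polynomial_terms` is ours, chosen for illustration; the count itself is the standard one for a full polynomial of a given degree in k factors.

```python
from math import comb

def n_polynomial_terms(k: int, degree: int) -> int:
    """Number of coefficients (intercept included) in a full polynomial
    model of the given degree in k factors."""
    return comb(k + degree, degree)

# A full quadratic (degree 2) model: intercept, linear terms,
# two-factor interactions, and squared terms.
for k in (2, 3, 4, 5):
    print(f"{k} factors: at least {n_polynomial_terms(k, 2)} runs")
```

For two factors, the full quadratic model has six coefficients, so six runs is the bare minimum. Any runs beyond that, such as replicated center points, provide degrees of freedom for estimating error or checking lack of fit.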

Marie Gaudard is a Senior Data Scientist at Predictum. She specializes in predictive modeling, design of experiments, statistics, and machine learning. She consults extensively across industries.

A Simple Example

Let’s take a simple example of a two-factor design. The factors are X1 and X2 and the response is Y.

You decide to fit a response surface model using a 10-run face-centered central composite design. This design has two center points (rows 5 and 7). 

A table with data for a 2-factor design, listing X1 and X2 factors.
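As a rough sketch, the coded settings of a face-centered central composite design can be generated as follows. We assume coded units of -1, 0, and +1. The helper `face_centered_ccd` is a name we made up for illustration, and the run order it produces is not the run order in the table (where the center points are rows 5 and 7).

```python
from itertools import product

def face_centered_ccd(k: int, n_center: int = 2):
    """Coded settings for a face-centered central composite design.

    Assumes coded units: factorial points at -1/+1, axial points on the
    faces of the region (distance 1 from the center), center points at 0.
    """
    factorial = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for level in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = level          # move along one axis only
            axial.append(tuple(point))
    center = [tuple([0.0] * k)] * n_center
    return factorial + axial + center

design = face_centered_ccd(2, n_center=2)
for run, (x1, x2) in enumerate(design, start=1):
    print(run, x1, x2)
```

With two factors this gives four factorial points, four axial points, and two center points, for ten runs in total.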

Here is a plot of your design points: 

Note that all of your design points are located on the boundary or at the center of the design region. This is a great design if your response surface is, in fact, quadratic.   

But what if the true response surface is not quadratic?  What if this is the true response surface: 

Image of a true response surface

If this is the true response surface, you have selected very good design points for fitting the wrong model!   

Your Fitted Model

Let’s assume that there is no noise in the data, so that the response values at your ten design points are the actual values given by the response surface.  This figure shows the design points and their values on the true response surface. 

An image of a model showing design points and their values on the true response surface.

Given the location of the design points, all on the boundary or at the center of the design region, the resulting fitted quadratic response surface looks like this: 

“That’s not quadratic,” you say.  But it is!  The linear terms dominate the fitted model.  Here is the equation for the model:

Y = 40 - 12.072*X1 + 13.333*X2 + 0.300*X1*X2 - 0.300*X1*X1 - 0.500*X2*X2
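To see why the surface looks nearly flat, it helps to compare the size of the linear part of the equation with the size of the second-order part. The sketch below does that on a grid. It assumes that X1 and X2 are in coded units on [-1, 1]; the article does not state the factor ranges, so treat that as an assumption. The function names are ours.

```python
# Coefficients copied from the fitted equation above.
B0, B1, B2, B12, B11, B22 = 40.0, -12.072, 13.333, 0.300, -0.300, -0.500

def linear_part(x1, x2):
    return B1 * x1 + B2 * x2

def second_order_part(x1, x2):
    return B12 * x1 * x2 + B11 * x1 * x1 + B22 * x2 * x2

def fitted_y(x1, x2):
    return B0 + linear_part(x1, x2) + second_order_part(x1, x2)

# ASSUMPTION: factors are in coded units, so the design region is [-1, 1] x [-1, 1].
grid = [i / 10 for i in range(-10, 11)]
max_linear = max(abs(linear_part(a, b)) for a in grid for b in grid)
max_second = max(abs(second_order_part(a, b)) for a in grid for b in grid)

print(f"largest linear contribution:       {max_linear:.3f}")
print(f"largest second-order contribution: {max_second:.3f}")
```

Under that assumption, the linear terms move the prediction by as much as about 25 units, while the interaction and squared terms together never contribute more than about 1.1 units. The fitted surface is quadratic in form but behaves almost like a plane.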

Let’s compare the fitted model to the true response function.  The true response function is green, the fitted model is blue. 

If you are trying to find settings to match a target, or to maximize or minimize your response, the conclusions that you draw from your fitted model will be erroneous, and likely seriously so.

“How can I do better?” 

  1. Don’t assume that you know the shape of the response function. More and more, we find that response functions are complex. Default to the assumption that it has a complex shape.
  2. Fit a flexible model that can capture that complex shape. We find that neural nets provide flexible fits. However, using neural nets with small data sets requires an innovative methodology called SVEM (Self-Validating Ensemble Modeling).
  3. Use design settings that cover the space essentially uniformly.  We call such a design a space-filling design.

In our example, we used ten design points. Here is a space-filling design consisting of ten design points: 

Note that this design gives much better coverage of the design space and supports the fitting of flexible models that reflect the nuances of the true response surface. 
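The article does not say which space-filling method produced the design above. As one hedged illustration, a Latin hypercube is a common way to spread a small number of runs across the region. The sketch below uses only the Python standard library; the function name `latin_hypercube`, the coded range of [-1, 1], and the seed are all assumptions made for the example.

```python
import random

def latin_hypercube(n_runs, n_factors, low=-1.0, high=1.0, seed=0):
    """Simple Latin hypercube sample (illustrative, not the article's design).

    Each factor's range is cut into n_runs equal strata, and every stratum
    is used exactly once per factor, in a random order.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        strata = list(range(n_runs))
        rng.shuffle(strata)  # random pairing of strata across factors
        columns.append(
            [low + (high - low) * (s + rng.random()) / n_runs for s in strata]
        )
    return list(zip(*columns))  # one tuple of settings per run

design = latin_hypercube(n_runs=10, n_factors=2)
for run, (x1, x2) in enumerate(design, start=1):
    print(f"{run:2d}  X1={x1:+.3f}  X2={x2:+.3f}")
```

Unlike the central composite design, every run here takes a different level of each factor, so the interior of the region is sampled rather than only its boundary and center. A plain Latin hypercube can still leave gaps, so in practice designs are often optimized further for spacing.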

The fit using SVEM with neural nets is shown below.  Note how well the SVEM model fits the true response surface values at the design settings.  

The figure below shows the SVEM fit (blue) and the true response surface (green). Note that the SVEM fit is very close to the true response surface, especially in the interior of the design region. 

“But how many design points?” 

So back to the original question of how many design points are needed.  The answer depends on the complexity of the true response surface and the type of model-fitting algorithm that you use. 

We advocate the use of space-filling designs with the SVEM modeling technique.  In future posts, we will talk more about SVEM and the recommended number of design points, but generally, SVEM combined with space-filling designs requires significantly fewer runs than do classical approaches. 

For over 25 years, Predictum has enabled companies to achieve higher levels of productivity, operational improvement and innovation, and realize significant savings in cost, materials, and time. Our team of engineers, data scientists, statisticians, and programmers leverages deep expertise across various industries to provide our clients with unique solutions and services that transform data into insightful discoveries in engineering, science, and research. To get in touch with our team, visit www.predictum.com/contact.
