Photo of a bioreactor in a laboratory.

It’s no surprise that complex bioreactor platforms rely upon equally sophisticated data analysis methods.  

The problem is that linear regression has become the de facto method of modeling bioreactor factors, and most users optimize using the “one factor at a time” method.

Bioreactors are inherently complex systems: many interacting biological, chemical, and physical factors make desirable outcomes and higher yields difficult to achieve.

Imagine you are on board a ship, trying to locate a distant island. You strain your eyes and look for the coastline, but you can see very little.

Using linear regression modeling limits researchers’ ability to understand and assess the true behavior occurring inside bioreactors. This prevents them from optimizing for higher yields or stopping denaturation from occurring. It’s akin to forming a telescope with your hands to see the faraway island, when binoculars would bring the image closer and make it clearer.

In this article, we will explore how Predictum’s SVEM Machine Learning software improves bioreactor research. For the case study, click here.

The Case for Machine Learning Analysis with SVEM

As a biotechnologist, your challenge is scaling up to the large-scale bioreactor sizes of 100-1000 liters for biomanufacturing.

Imagine you are creating recombinant proteins using genetic modifications in a cell line, with the goal of expressing a specific amount of a therapeutic protein. You need to see clearly which factors affect that outcome so that you can maintain a stable therapeutic product.

Due to the high expense of running a bioreactor, you will have only a limited number of runs and a small resulting data set. Biotechnology companies and drug research programs hinge on whether their manufacturing process, or CMC (Chemistry, Manufacturing, and Controls), can be validated for FDA drug approval.

If a drug can’t prove its ability to be scaled up, it will not make it to market. The need for accurate, statistical predictions of drug mechanisms cannot be ignored. That’s where SVEM comes in. 

SVEM for Biotech

Graphic image that shows the SVEM logo, the SVEM tagline "Learn more from much less," and an image of the SVEM profiler tool.

SVEM is a data analysis tool that applies machine learning to Design of Experiments to provide predictive ability for small data sets that outperforms traditional methods.

There is ever-increasing interest in using big-data machine learning methods to build predictive models from DOE data. Machine learning has provided exceptional predictive power, but only where large data sets, with thousands of observations, can be partitioned into training and validation sets.

Explore SVEM Product Page

Predictum’s Phil Ramsey and JMP Chief Data Scientist Chris Gotwalt proposed a method of validation, referred to as self-validation, in which a data set is replicated so that the original set is used for training and the replicated set is used for validation. Applying random fractional gamma weights ensures anti-correlation between the two sets of data.

Using fractionally weighted bootstrapping (FWB) and self-validation, their methodology creates a series of self-validated models (an ensemble) that can be used together to make predictions.

Effectively, when you use SVEM (Self-Validating Ensemble Modeling) you are bridging machine learning and DOE methodologies for more accurate results, using fewer observations.
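To make the idea concrete, here is a minimal sketch of self-validation with fractionally weighted bootstrapping. It is not Predictum’s implementation. It assumes the weighting scheme described in the published SVEM literature (training weights -ln(U) and validation weights -ln(1 - U) for uniform U), and it swaps in a simple weighted ridge regression as the base model purely for illustration. The function names `weighted_ridge`, `svem_fit`, and `svem_predict`, the penalty grid, and the synthetic data are all hypothetical.

```python
import numpy as np

def weighted_ridge(X, y, w, lam):
    """Closed-form ridge fit with observation weights w (intercept not penalized)."""
    Xd = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    penalty = lam * np.eye(Xd.shape[1])
    penalty[0, 0] = 0.0                          # leave the intercept unshrunk
    XtW = Xd.T * w                               # same as Xd.T @ diag(w)
    return np.linalg.solve(XtW @ Xd + penalty, XtW @ y)

def svem_fit(X, y, n_boot=200, lams=(0.01, 0.1, 1.0, 10.0), seed=123):
    """Average the coefficients of n_boot self-validated ridge models."""
    rng = np.random.default_rng(seed)
    Xd = np.column_stack([np.ones(len(y)), X])
    members = []
    for _ in range(n_boot):
        # Assumed scheme: one uniform draw per run gives a pair of
        # anti-correlated weights for the training copy and the validation copy.
        u = rng.uniform(1e-12, 1.0 - 1e-12, size=len(y))
        w_train = -np.log(u)
        w_valid = -np.log(1.0 - u)
        best_err, best_beta = None, None
        for lam in lams:
            beta = weighted_ridge(X, y, w_train, lam)
            err = np.sum(w_valid * (y - Xd @ beta) ** 2)  # weighted validation error
            if best_err is None or err < best_err:
                best_err, best_beta = err, beta
        members.append(best_beta)
    return np.mean(members, axis=0)              # ensemble = mean of member models

def svem_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Synthetic 12-run example (made-up data, for illustration only).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(12, 2))
y_demo = 3.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=12)
beta_demo = svem_fit(X_demo, y_demo)
print(svem_predict(beta_demo, X_demo[:3]))
```

Every run appears in both the training and validation copies, so no data is held out. A run that is weighted heavily for training is weighted lightly for validation, which is what lets a 12-run design tune and validate a model at all.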

In the following hypothetical example, we will be applying SVEM to a pharmaceutical research case with the goal of figuring out which combination of factors will give us the highest yields for a protein of interest. 

Case Study: Optimizing Bioreactor Yield with SVEM

Let’s look at a hypothetical experiment. 

We will use X-gal screening of E. coli colonies to determine which colonies express recombinant protein pET28(b+), and then select the strain that provides the highest yield.

A bioreactor in a white room.

Assume we have performed the following experiments before utilizing the bioreactor:
  • Induce E. coli with IPTG (a mimic of allolactose)
  • Vary the number of sonication cycles and the cycle duration to determine the right time to extract protein
  • Apply a Bicinchoninic Acid (BCA) Assay kit to determine the amount of protein that was expressed
  • Calculate the standard curve of the E. coli population with linear regression
    • Conc = 9.9778 * A + 1.0913 (R² = 0.9752), used to determine the doubling time
    • Maximum cell growth rate (µmax) = 0.0165 min⁻¹
    • Doubling time = 42 minutes
  • Apply a Beta-Gal Assay to determine Enzyme Specific Activity
    • Select Normalization
    • Select Optimization
  • Confirm that the correct protein is being expressed using SDS-PAGE
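The growth figures in the list above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: that A in the standard curve is an absorbance reading, that µmax is expressed per minute, and that growth is exponential, so the doubling time is ln(2) / µmax. The function names are hypothetical.

```python
import math

def concentration_from_absorbance(a, slope=9.9778, intercept=1.0913):
    """Standard curve from the list above: Conc = 9.9778 * A + 1.0913."""
    return slope * a + intercept

def doubling_time(mu_max):
    """Doubling time for exponential growth: ln(2) / mu_max."""
    return math.log(2) / mu_max

mu_max = 0.0165  # assumed to be per minute
print(round(doubling_time(mu_max), 1))  # 42.0
```

The result matches the 42-minute doubling time quoted above, which supports reading µmax as a per-minute rate.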

Once these steps are complete, we are ready to scale up the manufacturing of recombinant protein inside a bioreactor. Starting with 100 mL, our goal is to scale to 1000 L, the target bioreactor size.

In a series of up-scaling tests, we need to ensure a stable protein with a high level of yield.  Below, we show the bioreactor data set from up-scaling tests.

Image of a JMP data table titled "Bioreactor Dataset."

Running SVEM for Bioreactor Data

In the following steps, we will use Predictum’s SVEM add-in for JMP Statistical Discovery. You can find more information and contact our team for licensing by visiting the SVEM product page. 

First, we will launch the SVEM add-in for JMP. The SVEM add-in is found under the Predictum menu: 

JMP Application, showing SVEM selected from menu.

We then select “Neural” and proceed to the SVEM application window.

To determine yield, we select “yield” as the Y response that we want to increase. Then we take “nitrogen,” “concentrations,” “temperature,” and “duration” as our X factors.

Image of user selecting experimental factors in the SVEM application.

We click run and then select a random seed of 123 for our machine learning model. Finally, we click “Go” at the bottom of the application window to run SVEM. 

Image of user inputting numbers into the "Random Seed" field.
Image of SVEM loading bar.

SVEM will then begin to run. After it is complete, you will see a graph and a general regression statistical summary for your data set.  

Next, click on the red triangle in the upper left-hand corner of your window to open the drop-down menu. Then select “Profiler”. 

The Profiler is a tool that allows you to simulate multivariate responses. Each graph shows how one X factor affects yield, and together the graphs show which factor settings give you the highest yield.

The Profiler is where the magic of SVEM happens for bioreactor optimization.

By clicking and dragging the red lines on any of the profiler graphs, we can determine what levels of input factors provide the highest yield. 

We will also observe how changing one input factor alters the response curves of the other input factors.
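What the Profiler does interactively can be approximated in code as a search over factor levels of a prediction function. The sketch below is not the JMP Profiler or the SVEM add-in. The response surface `toy_yield`, its coefficients, and the factor grid are entirely hypothetical and are not fitted to the bioreactor data set. It only illustrates how a joint search finds the best settings and how an interaction shifts the best level of one factor when another changes.

```python
import itertools

def toy_yield(nitrogen, temperature):
    """Hypothetical response surface with an interaction term (not fitted to any data)."""
    dn, dt = nitrogen - 20.0, temperature - 35.0
    return 50.0 - 0.5 * dn ** 2 - 0.2 * dt ** 2 + 0.3 * dn * dt

def best_settings(predict, grid):
    """Evaluate every combination of levels and return the one with the highest prediction."""
    names = list(grid)
    best = max(itertools.product(*grid.values()),
               key=lambda levels: predict(**dict(zip(names, levels))))
    return dict(zip(names, best))

def best_level(predict, levels, factor, fixed):
    """Best level of one factor while the other factors are held at fixed values."""
    return max(levels, key=lambda v: predict(**{**fixed, factor: v}))

grid = {
    "nitrogen": [16.0, 18.0, 20.0, 22.0, 24.0],
    "temperature": [30.0, 32.5, 35.0, 37.5, 40.0],
}
print(best_settings(toy_yield, grid))  # {'nitrogen': 20.0, 'temperature': 35.0}
print(best_level(toy_yield, grid["nitrogen"], "nitrogen", {"temperature": 40.0}))  # 22.0
```

Because of the interaction term, the best nitrogen level moves from 20.0 to 22.0 when temperature is held at 40.0. That is the kind of shift one-factor-at-a-time optimization can miss.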

If we set the Baffle variable to “No,” set Nitrogen to 21.95 g/L, and set Speed to 4440 rpm, we will get 71.44 nanograms of recombinant protein pET28(b+) for this theoretical bioreactor run. 

Gif of the SVEM profiler in use.

We can now use these predictive insights to run a real experiment with the optimized factors, listed above, to see if the new experiment matches our SVEM model.   

Upon running the new experiment, we end up with a result of 74 nanograms of protein as the new highest yield. 

The difference is roughly a 19% increase when comparing the previous bioreactor run (62 nanograms) with the bioreactor run after SVEM (74 nanograms).

An increase of that size in yield could produce millions in additional revenue for pharmaceutical manufacturers.
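The comparison between runs is simple arithmetic and can be checked directly. The sketch below uses only the three figures quoted in this example (62, 71.44, and 74 nanograms). The helper name `percent_change` is hypothetical.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

baseline, predicted, confirmed = 62.0, 71.44, 74.0

# Improvement of the confirmation run over the previous best run.
print(round(percent_change(baseline, confirmed), 1))   # 19.4

# How far the confirmation run landed from the model's prediction.
print(round(percent_change(predicted, confirmed), 1))  # 3.6
```

In this hypothetical case the confirmation run came in about 3.6% above the predicted 71.44 nanograms.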


We have only scratched the surface of how SVEM can improve work in biotech and pharmaceuticals. Check out our additional SVEM resources to learn more.

Interested in SVEM for your work?  

Visit our SVEM product page to learn more and contact our team for licensing information. 

Explore SVEM Product Page

Do you have a research problem that we can help solve? Our consultant team can help!

For over 25 years, Predictum has enabled companies to achieve higher levels of productivity, operational improvement and innovation, and realize significant savings in cost, materials, and time. Our team of engineers, data scientists, statisticians, and programmers leverages deep expertise across various industries to provide our clients with unique solutions and services that transform data into insightful discoveries in engineering, science, and research. To get in touch with our team, visit www.predictum.com/contact.