2014BKissanePhD

UFRGS

Anderson Ramos Carvalho

on 19/08/2024

Development of Rapid Spectroscopic
Methods for the Analysis of Cell
Culture Media

Bridget Kissane, B.Sc.

Thesis presented for the degree of Ph.D.
of the National University of Ireland

Submitted:
November 2014

Supervisor: Dr. A.G. Ryder
Head of Department: Prof. Paul Murphy

School of Chemistry
National University of Ireland, Galway

Table of Contents
Declaration ..................................................................................................................... I
List of abbreviations ..................................................................................................... II
Abstract ......................................................................................................................... V
1 Introduction ............................................................................................................ 1
1.1 Biopharmaceuticals – Next Generation Drugs ................................................ 1
1.2 Cell Lines and Expression systems ................................................................. 2
1.3 Culture Requirements ...................................................................................... 3
1.3.1 Feed Strategy ........................................................................................... 3
1.3.2 Cell Culture Media ................................................................................... 4
1.3.3 Media Advances....................................................................................... 7
1.4 Analysis of Cell Culture Media ..................................................................... 10
1.5 Process Analytical Technology (PAT) Principles ......................................... 12
1.6 Spectroscopic Methods Suitable for Cell Culture Media Analysis ............... 13
1.6.1 Raman Spectroscopy .............................................................................. 13
1.6.2 Surface Enhanced Raman Spectroscopy (SERS) .................................. 22
1.6.3 Fluorescence Spectroscopy .................................................................... 35
1.7 Project Objectives ......................................................................................... 50
2 Chemometrics, Materials and Methods ............................................................... 51
2.1 Chemometrics................................................................................................ 51
2.2 Qualitative and Quantitative analysis ............................................................ 51
2.3 Calibration Modelling ................................................................................... 52
2.4 Figures of Merit for Modelling ..................................................................... 53
2.4.1 Correlation Coefficient (r2) .................................................................... 53
2.4.2 Root Mean Square Error of Calibration (RMSEC)................................ 54
2.4.3 Root Mean Square Error of Cross Validation (RMSECV) .................... 54
2.4.4 Root Mean Square Error of Prediction (RMSEP).................................. 55
2.5 Multivariate Analysis .................................................................................... 55
2.5.1 Variance Analysis .................................................................................. 56

2.5.2 Regression .............................................................................................. 58
2.5.3 Factor Analysis ...................................................................................... 60
2.6 Data Pre-Processing ...................................................................................... 63
2.6.1 Mean Centring ....................................................................................... 63
2.6.2 Derivatives ............................................................................................. 63
2.6.3 Multiplicative Scatter Correction (MSC)............................................... 64
2.6.4 Normalisation ......................................................................................... 65
2.7 Variable/Wavelength Selection ..................................................................... 67
2.7.1 Moving Window Partial Least Squares (MWPLS) ............................... 68
2.8 Outliers .......................................................................................................... 68
Materials and Methods ................................................................................................. 70
2.9 Materials ........................................................................................................ 70
2.9.1 Sample Materials ................................................................................... 70
2.9.2 Colloid Materials ................................................................................... 70
2.10 Workflow Description ............................................................................... 70
2.11 Sample Preparation and Handling. ............................................................ 71
2.12 Datasets ...................................................................................................... 71
2.12.1 Model Media Samples: .......................................................................... 71
2.12.2 M1Glu Media Dataset ............................................................................ 72
2.12.3 M3Glu Media Dataset ............................................................................ 73
2.12.4 M5Glu Media Dataset ............................................................................ 74
2.12.5 T5 Test Dataset ...................................................................................... 76
2.13 Complex Media Components Experiments: .............................................. 76
2.13.1 eRDF Media Dataset (M5eRDF) ........................................................... 76
2.13.2 Yeastolate Media Dataset (M5Ye)......................................................... 77
2.14 Measurement Techniques .......................................................................... 78
2.14.1 Raman Spectroscopy and SERS ............................................................ 78
2.14.2 Fluorescence Spectroscopy .................................................................... 80

2.15 Sample Holders.......................................................................................... 81
2.16 Specific Chemometric Procedures ............................................................. 82
2.16.1 Baseline Offset Correction ..................................................................... 82
2.16.2 Water Elimination .................................................................................. 82
2.16.3 Water to Analyte Ratio .......................................................................... 83
2.16.4 Model Evaluation Settings ..................................................................... 83
3 Development using Raman Spectroscopy for the Analysis of Cell Culture Media Components ................................................................................................................. 86
3.1 Spectral Analysis ........................................................................................... 86
3.1.1 Averaged Aqueous D-glucose (M1Glu) Data........................................ 88
3.1.2 Baseline Offset Correction of the Aqueous D-glucose (M1Glu) Data .. 89
3.1.3 Water Background Elimination of the Aqueous D-glucose (M1Glu) Data ............. 90
3.2 Reproducibility of Raman Data Collection ................................................... 92
3.3 Evaluation of Spectral Range ........................................................................ 94
3.4 Calibration Modelling ................................................................................... 96
3.5 Spectral Pre-Processing of M1Glu Sample Set ............................................. 99
3.5.1 Pre-area and Post-area Selection for Spectral Pre-Processing ............... 99
3.5.2 Multiplicative Scatter Correction of M1GluR2 Data ........................... 100
3.5.3 Normalisation of M1GluR2 Data ......................................................... 103
3.5.4 Derivative Pre-Processing of M1GluR2 Data...................................... 108
3.5.5 MSC-FD and FD-MSC Pre-Processing of M1GluR2 Data ................. 110
3.6 Outcomes from M1Glu Data Analysis ........................................................ 113
3.7 Quantification of D-glucose in a ternary mixture (M3Glu-Data) ............... 114
3.7.1 Spectral Analysis of M3Glu Data ........................................................ 115
3.7.2 Reproducibility .................................................................................... 116
3.7.3 Quantitative Analysis: Calibrating D-glucose in M3Glu Data ............ 118
3.8 Quantification of D-glucose in a quinary mixture (M5Glu-Data) .............. 120
3.8.1 Spectral Analysis and Reproducibility of M5GLU Data ..................... 120

3.8.2 Quantification: Glucose in M5Glu Data .............................................. 124
3.9 Quantification of eRDF and Yeastolate in quinary mixtures (M5eRDF and M5Ye) .................................................................................................................... 125
3.9.1 Spectra Analysis of M5eRDF and M5Ye Data.................................... 125
3.9.2 Quantification: eRDF in M5eRDF....................................................... 126
3.9.3 Quantification: Yeastolate in M5Ye .................................................... 128
3.10 Model Validation ..................................................................................... 129
3.10.1 Prediction Performance by Sample Splitting into a Training and Test set ..... 130
3.10.2 Independent Test Set Prediction .......................................................... 132
3.11 General Conclusions: Raman Analysis ................................................... 134
4 Surface Enhanced Raman Spectroscopy (SERS) Analysis of Complex Media Components ............................................................................................................... 138
4.1 Rationale for Quantitative Analysis using SERS ........................................ 138
4.2 Experimental Considerations for SERS Analysis ....................................... 139
4.2.1 The Absorption Spectrum (λ max and FWHM) ..................................... 139
4.2.2 Sampling Time ..................................................................................... 140
4.2.3 Reproducibility .................................................................................... 143
4.3 Spectral Analysis of M5eRDF and M5Ye Data .......................................... 144
4.4 Region Selection for Quantitative Analysis ................................................ 146
4.5 Quantitative Analysis of Yeastolate in M5Ye SERS Data ......................... 146
4.6 Quantitative Analysis of eRDF in M5eRDF SERS Data ............................ 149
4.7 Model Evaluation ........................................................................................ 151
4.8 General Conclusions: SERS Analysis of M5eRDF and M5YE .................. 152
5 Fluorescence Spectroscopy Analysis of Complex Media Components ............ 154
5.1 The EEM/TSFS Analytical Procedure ........................................................ 155
5.2 Spectral Overview of Media Samples (M5eRDF and M5Ye) .................... 156
5.3 Assessing Fluorophore Contributions from Media Fluorescence ............... 160
5.3.1 Fluorophore Identification and Profile Changes by PARAFAC ......... 161
5.3.2 Fluorophore Identification by MCR Analysis ..................................... 165

5.4 Variance Analysis ....................................................................................... 171
5.4.1 PCA Analysis ....................................................................................... 171
5.4.2 ROBPCA Analysis............................................................................... 173
5.5 Quantitative Analysis of M5eRDF and M5Ye ............................................ 176
5.5.1 Correlation of eRDF Concentration to the M5eRDF Fluorescence Data ........ 178
5.5.2 Correlation of Ye Concentration to the M5Ye Fluorescence Data ...... 180
5.5.3 Model Evaluation ................................................................................. 182
5.6 General Conclusions: Fluorescence Study .................................................. 184
6 Conclusions and Future Work ........................................................................... 187
6.1 Spectroscopic Conclusions .......................................................................... 187
6.2 Future Studies and Solutions ....................................................................... 189
7 References .......................................................................................................... 191
8 Appendix ............................................................................................................ 214
8.1 Supplementary Information for Chapter One.............................................. 214
8.2 Supplementary Information for Chapter Two ............................................. 217
8.2.1 Kyokuto eRDF Medium ...................................................................... 217
8.2.2 Difco TC Yeastolate UF ...................................................................... 218
8.3 Supplementary Information for Chapter Three ........................................... 219
8.3.1 Calibration Models for M1Glu Data (Replicate Models) .................... 219
8.3.2 Calibration Models For M3Glu Data (Replicate Models) ................... 228
8.3.3 Calibration Models For M5Glu Data (Replicate models).................... 233
8.3.4 Calibration Models For M5eRDF Data for the Conventional Raman Data ....... 238
8.3.5 Calibration Models for M5Ye Data for the Conventional Raman ....... 240
8.4 Supplementary Information for Chapter Four ............................................. 242
8.4.1 Reproducibility of the Replicate SERS Data using PCA..................... 242
8.4.2 Calibration Models for M5eRDF for the SERS Data .......................... 244
8.4.3 Calibration Models for M5Ye for the SERS Data ............................... 256

8.5 Supplementary Information for Chapter Five ............................................. 268
8.5.1 Spectral Overview ................................................................................ 268
8.5.2 PARAFAC ........................................................................................... 268
8.5.3 Calibration Models for the Fluorescence Data (Replicate Runs) ........ 270

Declaration
I declare that the work included in this thesis is my own work and has not been
previously submitted for a degree to this or any other academic institution.

Bridget Kissane.

List of abbreviations
 ACO – Ant Colony Optimization
 AE-IPAD – Anion-Exchange Chromatography - Integrated Pulsed Amperometric Detection
 API – Active Pharmaceutical Ingredient
 ATP – Adenosine Triphosphate
 AVG – Averaged
 BC – Baseline Correction
 BSS – Balanced Salt Solutions
 CCD – Charge Coupled Device
 CD – Chemically Defined
 CHO – Chinese Hamster Ovary
 CIP – Cleaned In Place
 CoAdReS – Competitive Adaptive Reweighted Sampling
 DMEM – Dulbecco's Modified Eagle Medium
 E. coli – Escherichia coli
 ED – Electrochemical Detection
 EEM – Excitation Emission Matrix
 eRDF – Enhanced RPMI/DMEM/F12
 F12/F10 – Ham’s Nutrient Mixture medium
 FAD – Flavin Adenine Dinucleotide
 FD – First Derivative
 FDA – Food and Drug Administration
 FDMSC – First Derivative Multiplicative Scatter Correction
 FMN – Flavin Mononucleotide
 FSH – Follicle-Stimulating Hormone
 HEPES – N-2-hydroxyethylpiperazine-N’-2-ethanesulfonic Acid
 HPLC – High Performance Liquid Chromatography
 IC – Ion Chromatography
 IFE – Inner Filter Effect
 IR – Infrared
 LB – Lysogeny Broth
 LC-MS – Liquid Chromatography–Mass Spectrometry
 LFH – Laminar Flow Hood
 LH – Luteinizing Hormone

 LOD – Limit of Detection
 LOOCV – Leave One Out Cross Validation
 MCCV – Monte Carlo Cross Validation
 MCD – Minimum Covariance Determinant
 MDCK – Madin Darby Canine Kidney cell lines
 MEM – Minimal Essential Medium
 MIR – Mid Infrared
 MSC – Multiplicative Scatter Correction
 MWPLSR – Moving Window Partial Least Squares Regression
 NAD – Nicotinamide Adenine Dinucleotide
 NADH – Reduced Nicotinamide Adenine Dinucleotide
 NADP – Nicotinamide Adenine Dinucleotide Phosphate
 NADPH – Reduced Nicotinamide Adenine Dinucleotide Phosphate
 NIR – Near Infrared
 NMR – Nuclear Magnetic Resonance
 Norm – Normalisation
 PARAFAC – Parallel Factor Analysis
 PAT – Process Analytical Technology
 PCA – Principal Component Analysis
 Phe – Phenylalanine
 PLS – Partial Least Squares
 QE – Quantum Efficiency
 RDF – RPMI/DMEM/F12 2:1:1 Mixture
 REP – Relative Error of Prediction
 RET – Radiative Energy Transfer
 RMSEC – Root Mean Square Error of Calibration
 RMSECV – Root Mean Square Error of Cross Validation
 RMSEP – Root Mean Square Error of Prediction
 RNA – Ribonucleic Acid
 ROBPCA – Robust Principal Component Analysis
 RPMI – Roswell Park Memorial Institute medium
 SERS – Surface Enhanced Raman Spectroscopy
 SFS – Synchronous Fluorescence Scan
 SIMCA – Soft Independent Modelling by Class Analogy
 SSR – Sum of Squared Residuals
 TChr – Thiochrome

 Trp – Tryptophan
 TSB – Tryptic Soy Broth
 TSFS – Total Synchronous Fluorescence Scan
 Tyr – Tyrosine
 UPLS – Unfolded Partial Least Squares Regression
 UV – Ultraviolet
 VIP – Variable Importance in Projection
 WE – Water Elimination
 YPD – Yeast Extract-Peptone-Dextrose broth

Abstract
Industrial scale cell culture is used for the production of many therapeutic agents such
as proteins and vaccines. Cell culture medium is a vital raw material used in these
production processes. Formulation analysis of the medium is thus an essential task of
any bioprocess. The medium is a critical aspect of the process because it has to supply
all of the necessary nutrients and other factors to ensure growth and productivity.
Small variations in medium composition can alter cell metabolism, thereby changing
process efficiency and productivity. There is an ongoing need for analytical methods
to ensure reproducible medium formulations; therefore, real-time qualitative and
quantitative analysis of medium components by spectroscopic methods in
combination with chemometrics has the potential to be adapted as a PAT tool in
bioprocesses.

This thesis investigates the spectroscopic analysis and quantification of three medium
components - D-glucose, eRDF and yeastolate - in model medium formulations by
Raman, Surface Enhanced Raman Scattering (SERS) and two fluorescence
approaches (Excitation Emission Matrix (EEM) and Total Synchronous Fluorescence
Scan (TSFS)). These methods were used in conjunction with chemometrics to provide
a wealth of information about medium composition: qualitative assessment and outlier
detection through principal component analysis and robust principal component
analysis, fluorophore detection and identification using parallel factor analysis and
multivariate curve resolution, and quantitative analysis achieved with partial least
squares. These studies complement previous studies in this laboratory where specific
component quantification [1, 2] and variance analysis were used for characterising,
screening [3-5] and quantifying the performances of cell culture media by
spectroscopic methods [6-8].

The advantages of spectroscopic methods are that they require little to no sample
preparation and they give spectra with rich information content suitable for the
discrimination of subtle chemical and physical effects. The goal of this work was to
determine whether these spectroscopic methods could be used to accurately quantify medium
components, both simple (glucose) and complex (yeastolate and eRDF). The end-use
application was to develop a quality assurance method for correct medium
preparation/formulation.

Quantitative accuracy varied between the methods due to various experimental factors.
Several pre-processing techniques were used to minimise unwanted spectral
effects such as noise, intensity and baseline differences. With Raman, quantification
of D-glucose, eRDF and yeastolate was achieved with errors of ~5%, ~16% and
~38% respectively. The SERS model gave errors of 16% for eRDF and
12% for yeastolate, while the best fluorescence model gave errors of 5.4%
for yeastolate and 7.2% for eRDF. These models show the potential of these
spectroscopic methods for the measurement/identification of individual medium
components within complex cell culture medium. However, the error levels obtained
suggest that improvement could be achieved through modification of the current
experimental setup, which would then lead to more accurate prediction of component
concentrations.

1 Introduction
1.1 Biopharmaceuticals – Next Generation Drugs
Originating in the 1990s, the term biopharmaceuticals represents a class of
therapeutics produced by modern biotechnology techniques. These include protein-
based products (produced by genetic engineering) and monoclonal antibodies
(produced by hybridoma technology). During the 1990s the concept of nucleic acid
based drugs was developed for use in gene therapy and anti-sense technology. Such
products, as well as interfering RNAs and decoy oligonucleotides, are also considered
to be biopharmaceuticals [9].

Developed by Genentech in collaboration with Eli Lilly, Humulin¹, produced in E. coli,
became in 1982 the first biopharmaceutical to gain marketing approval. This
marked the true beginning of the biopharmaceutical industry [10, 11].

¹ Recombinant human insulin.

Figure 1 The number of Food and Drug Administration (FDA) approvals for new
biopharmaceutical products by year since the first biopharmaceutical in 1982 [12].

Since Humulin was first approved, the FDA has approved more than 100 new
recombinant protein therapeutics and more than 300 non-recombinant
biopharmaceuticals [13]. In 2012, 18 products received approval from the FDA; the
majority of these products were bio-better, me-too, or follow-on products, nine of
which were considered to be new biopharmaceutical entities [12].
1.2 Cell Lines and Expression systems
Cell lines are the hosts for the production of biopharmaceuticals due to their ability to
produce proteins that can be used for medical treatments. The choice of a cell culture
expression system depends on the product, product yield and timeframe required for
both the growth phase and purification stage. Generally, microbial systems grow
quickly on simple, inexpensive culture media. However, they are incapable of post-
translational modifications and the rate of incorrect folding of proteins is higher than
with mammalian cell lines [14].

The majority of approved biopharmaceuticals are expressed in mammalian cell lines,
mainly Chinese hamster ovary (CHO). However, expression in mammalian cell lines
is more technically complex and expensive when compared to E. coli based systems.
Eukaryotic cell lines, unlike prokaryotic cell lines, are capable of carrying out post-
translational modifications such as glycosylation. Many important biopharmaceuticals
are naturally glycosylated, such as erythropoietin, blood factor VIII and hormones
(follicle-stimulating hormone (FSH), luteinizing hormone (LH)). Glycosylation may
be required for biological activity, to increase serum half-life, protein stability, or
reduce immunological problems. In some cases unglycosylated versions of a naturally
glycosylated protein retain the therapeutic properties of the native protein. While
expression in lower eukaryotic systems such as Saccharomyces cerevisiae is possible,
glycosylation patterns are more similar to those of native human proteins if expressed in
an animal cell line [13, 15, 16]. Mammalian cells share many metabolic processes
and similar characteristics such as protein expression; however, some replica strains of
cell lines may differ in their metabolic requirements and production performance.
These differences originate from genomic changes that occur between transfection of
the parental line and cultivation of the cell line [17].

Current statistics of biopharmaceutical cell expression systems reveal the following
production figures [18]:

 45% originates from mammalian cell lines - CHO is dominant at 35%, while
other cell lines produce 10% of products
 40% originates from bacterial cell lines - 39% in Escherichia coli and 1% in
other bacteria
 The remaining 15% comes from yeast based fermentations.
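As a quick consistency check on the breakdown above, the following minimal Python sketch tallies the quoted shares. The percentages are transcribed from the list; the dictionary layout and the names `shares` and `group_totals` are illustrative assumptions and do not come from the cited source [18].

```python
# Shares of biopharmaceutical products by expression system, in percent,
# transcribed from the breakdown above [18]. The grouping is illustrative.
shares = {
    "mammalian": {"CHO": 35, "other mammalian": 10},
    "bacterial": {"Escherichia coli": 39, "other bacteria": 1},
    "yeast": {"yeast": 15},
}


def group_totals(breakdown):
    """Sum the sub-shares within each expression-system group."""
    return {group: sum(parts.values()) for group, parts in breakdown.items()}


totals = group_totals(shares)
overall = sum(totals.values())

for group, total in totals.items():
    print(f"{group}: {total}%")
print(f"overall: {overall}%")
```

The group totals reproduce the 45%, 40% and 15% figures quoted in the list, and together they account for all products.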
1.3 Culture Requirements
Bioprocesses involve the cultivation of cell lines within a bioreactor² for the
production of a desired product. The cultivation and production of target biological
products depend on bioreactor conditions (such as oxygen, pH, temperature and feed
strategy), nutrient supply and cell culture medium.
1.3.1 Feed Strategy
High concentrations of cells and long fermentation times would ideally result in a
higher product yield. However, growth and production can be constrained by the
accumulation of by-products and the depletion of vital nutrients. Several feed
strategies are available and are shown in Figure 2. The choice of feed strategy for the
bioreactor is critical: to obtain the optimum product yield at the end of a
fermentation cycle, it must be matched to the specific cell line [19, 20].

² A bioreactor or fermentor is a reaction vessel containing a liquid medium to support cell growth.
Fermentor refers to the vessel in which the fermentation of single-celled organisms occurs, while a
bioreactor is the vessel for the culture of animal cells.

Figure 2 Different types of feed strategies for fermentation and cell culturing³ [19-21].

³ Growth inhibitors are substances that hinder growth by interfering with metabolism and uptake of
nutrients. In cell culture systems growth inhibition can be caused by a build-up of metabolites such as
lactate, pyruvate, succinate, propionate, isobutyrate, and acetate.

1.3.2 Cell Culture Media
Fermentations are controlled by their supply of nutrients, irrespective of the culture
method; hence a critical aspect of any fermentation is the medium. Firstly, the
medium has to supply all of the necessary nutrients and other materials to maintain all
the different processes in the cell, which include the synthesis of new cells and cellular
products and the consumption of substrates for energy metabolism. It also supplies
vitamins and minerals to act as catalysts, and bulk inorganic ions which function as
both catalytic and physiological factors [20]. Its secondary functions are to minimise
adverse pH changes, minimise toxic by-product formation and maintain homeostasis.
A medium therefore comprises a basal medium⁴ and other nutrient supplements
such as insulin, cholesterol and lipids. A basal medium contains amino acids, minerals,
sugars, inorganic salts, vitamins, organic acids and buffers. The basic composition of
basal medium allows for a wide variety of supplements to be added to enhance growth
and productivity. The requirements vary among cell lines and these differences have
led to the development of an extensive collection of medium formulations [14, 17].
Formulation analysis is a vital task in cell culture medium analysis and
pre-formulation analysis highlights compositional faults prior to starting the culture.
Various spectroscopic techniques (NIR, MIR and Raman) have been applied for
monitoring nutrients during fermentation to ensure on-going process quality [22-25].
1.3.2.1 Energy Sources
Glucose is a primary fuel for heterotrophs⁵ and D-glucose is the natural form used by
animal cells [26]. Energy derived from glucose is stored in the high-energy phosphate
bonds in ATP or other nucleotide triphosphates. It is also stored in energy-rich
hydrogen atoms associated with the co-enzymes NADP and NAD. Animal cells need
a source of both carbohydrates and the amino acid glutamine to ensure the production
of high energy metabolites (ATP and NADPH). Glucose is readily consumed by cells; it
is, however, subject to glycolysis at high concentrations. It is therefore better to
supplement the medium with glucose, thus avoiding the formation of the pyruvate
by-product [20, 27, 28].

Glucose is metabolized by cells at a faster rate than other carbon sources (galactose
and fructose). Glucose and galactose use the same transporter into the cell but glucose
has a greater affinity and a higher uptake rate than galactose [29]. Fructose is another
carbon source that can be used. Fructose and galactose both result in reduced
formation of lactic acid, but also in a slower cell growth rate [20]. For Vero and
MDCK cell lines, fructose is used as the carbohydrate source as it helps maintain the
lactate/pyruvate ratio and a stable pH in high density cultures [30]. Galactose has been
used as a carbon source for CHO TF 70R cells, as it is a suitable substrate that gives an
acceptable growth rate and minimizes the generation of toxic by-products [31].

⁴ Minimal Essential Medium (MEM) and other basal media supply the basic needs for cellular
metabolism.
⁵ A heterotrophic organism utilizes organic compounds to obtain carbon that is essential for growth and
development. Examples of such organisms are animals, which are not capable of manufacturing food
from inorganic sources but must consume organic substrates for nutrition.
1.3.2.2 Amino Acids
Cells cannot synthesise all the essential amino acids and vitamins that they require.
Therefore, these nutrients have to be provided by the cell culture medium. There are
thirteen amino acids that are considered to be crucial for cultured cells: arginine,
cysteine, glutamine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine,
threonine, tryptophan, tyrosine and valine [10, 12, 13]. L-Glutamine is not stable in
solution and for this reason it is generally added as a separate component to the
medium.6 The breakdown and metabolism of glutamine produces ammonia, which is
toxic to the cells [32, 33].
1.3.2.3 Vitamins and Minerals
Cell culture medium contains different vitamins or precursors; most are water soluble
but some are fat soluble. Examples include biotin, folic acid, niacinamide, pantothenic
acid, pyridoxine, riboflavin, thiamine, vitamin B12, and ascorbic acid. All can be used
to optimize cell growth and productivity, depending on the requirements of the
culture. The B vitamins serve as functional group carriers for enzymes in various
metabolic pathways for all cell types. Other vitamins regulate cell cycle and redox
potential of specific cell lines [27, 28].

Cells require sodium, potassium, calcium, magnesium, chlorides and phosphates for
proliferation. Balanced Salt Solutions (BSS) of bulk ions supply the required
electrolytes needed for physiological roles. Ions are important for the following
reasons: maintenance of osmotic pressure, controlling the membrane potential, and
coordination of the transport channels in and out of the cells. Ions also participate in
oxidation-reduction, and are used in energy production (Krebs cycle) [27, 28].
Phenol red is added to the BSS as a visual indicator of pH. BSS may or may not be
buffered with bicarbonate, depending on the culture setup [34].

6 Glutamine in solution undergoes cyclisation to form a toxin, pyroglutamate (5- oxoproline, 5-
pyrrolidone-2-carboxylic acid). This reaction occurs at room temperature and is accelerated by heat.

1.3.2.4 Buffers
Buffers are added to cell culture media to maintain pH and avoid adverse changes.
The most common buffering system used in mammalian cell cultures is the
bicarbonate/carbon dioxide system as it mimics the buffering system of blood. The
bicarbonate/carbon dioxide buffer has a pKa of 6.3 at 37 °C and requires the use of a
closed culture system as a result of the gaseous nature of carbon dioxide [30, 35].
HEPES (N-2-hydroxyethylpiperazine-N’-2-ethanesulfonic acid) is a zwitterion buffer
with a pKa of 7.3 at 37 °C that is sometimes supplemented into cell culture media for
more effective buffering in the physiological pH range. HEPES is used in conjunction
with sodium bicarbonate, as bicarbonate also provides some nutritional value. HEPES
is added in a concentration range of approximately 10 mM to 25 mM to maintain pH
stability [36-38].
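
The buffering behaviour described above can be illustrated with the Henderson-Hasselbalch relation. The following is a minimal sketch, not part of the original work: it uses the two pKa values quoted in the text, while the target pH of 7.4 and the function name `base_fraction` are assumptions made for illustration. It treats each buffer as a simple closed acid/base pair and ignores gas exchange of carbon dioxide.

```python
def base_fraction(ph: float, pka: float) -> float:
    """Fraction of a buffer present in its conjugate-base form.

    Uses the Henderson-Hasselbalch relation: pH = pKa + log10([base]/[acid]).
    """
    ratio = 10 ** (ph - pka)  # [base]/[acid]
    return ratio / (1 + ratio)


# pKa values at 37 °C as quoted in the text.
PKA_BICARBONATE = 6.3
PKA_HEPES = 7.3

# Assumed target pH for illustration (the text only says "physiological pH range").
TARGET_PH = 7.4

for name, pka in [("bicarbonate/CO2", PKA_BICARBONATE), ("HEPES", PKA_HEPES)]:
    frac = base_fraction(TARGET_PH, pka)
    print(f"{name}: {frac:.0%} base form at pH {TARGET_PH}")
```

A buffer resists pH change most effectively when the acid and base forms are present in similar amounts, that is, when the pH is close to the pKa. Under these assumptions HEPES is close to that balance at pH 7.4, whereas the bicarbonate pair is mostly in its base form, which is consistent with HEPES being added for more effective buffering in the physiological range.
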
1.3.2.5 Serum in Cell Culture Media
Serum is an undefined biological fluid which is often added to basal medium as a
growth supplement. Serum contains a variety of factors7 that are required for cell
proliferation and expression. It also protects cells against stress induced by shear
forces within the bioreactor [39]. However, it can also contain adventitious agents like
bacterial endotoxins or immunogenic contaminants [40, 41]. Serum is an expensive
element of cell culture as the cost of screening is increasing. Commercial sera are
obtained from a number of different animal sources (e.g. fetal bovine and equine)
[21, 33, 42]. The choice of serum is based on the purity, cost, availability and ease of
storage. Vendors test sera for their ability to support the growth of cell lines and also
for their purity, based on the bioburden as well as endotoxin and haemoglobin levels
[42, 43].

7 Carrier protein; attachment regulators; defence molecules; growth factors; hormones; enzymes and
their regulators.

1.3.3 Media Advances
1.3.3.1 Serum Free Media
Interest in serum free media comes from the desire to minimise lot to lot variation, cost, foreign
antigens, unplanned viral contaminants and interference at the purification stage. The
development of serum free media has advanced existing culture methodologies to
facilitate bioprocesses without serum proteins and endogenous serum substances such
as hormones or natural antibodies [42]. A growing number of alternatives to animal
sera exist for cell lines and primary cultures [21, 33, 40, 42]. Hydrolysates are
enzymatic or acid digests of biological materials such as animal tissues (meat digest),
milk products (casein), microorganisms (yeast) and vegetables (soy, wheat gluten,
rice). Hydrolysates are relatively low-cost medium additives used to provide nutrients
and growth factors to cell cultures in order to partly or fully replace serum. These
hydrolysates are poorly defined complex mixtures of peptides, free amino acids,
lipids, polysaccharides, phenolics, vitamins, nucleic acids, and minerals. These
relatively low cost materials are an ideal addition for large scale production. Some
hydrolysates show an anti-apoptotic activity which can extend the fermentation
lifetime [44-46]. For a better defined production of therapeutic proteins, there is a
move towards animal free hydrolysates originating from yeast and vegetable (soy,
rice, wheat gluten, rapeseed and chickpeas) [47-49]. Yeast hydrolysate8 or yeastolate
is a medium supplement that is cost-effective, non-animal derived, and has been
shown to have a significant positive effect on cell growth. Yeastolate is a complex
mixture known to contain free amino acids, peptides, vitamins, minerals, and
carbohydrates, but it also contains a significant amount of unknown material [43, 50,
51].

8 Yeastolate is produced by culturing yeast to a certain volume. Once this volume is reached, the
process is stopped with heat shock. The cells are digested to produce unrefined hydrolysates. The
hydrolysate is then filtered, concentrated, ultra-filtered, and spray dried.

1.3.3.2 Chemically Defined Media
The development of chemically defined supplements for media is a gradual process.
Every cell type and fermentation process has its own specific requirements for
growth. Jayme’s review provides a good overview of the development of chemically
defined media [42]. The road to chemically defined cultivations started with Eagle’s
basal medium. This is an isotonic, pH balanced mixture of salts, amino acids,
vitamins and other essential nutrients [52]. Eagle’s is a simple medium that is fortified
with additional supplements like serum to support a wide range of mammalian cells.
The compositional information for various widely available media is listed in
Table 42 to Table 44. The development of serum free and chemically defined media
occurred through a series of steps, starting with serum based medium.
• Eagle’s medium was altered by increasing the amino acid content to form
Eagle’s Minimal Essential Medium (MEM). This modified version still
required serum for cell growth [33].
• MEM was further adjusted by Dulbecco to form DMEM, which contained a
fourfold increase in nutrient concentration [53].
• Another media series, Ham’s nutrient mixtures F12 and F10, was shown to
support growth and maintenance of different cell types [54].
• The merger of DMEM and F12 by Sato [55] gave a fortified basal medium.
This amalgamated medium still needed to be supplemented with serum to
support bioreactor production.
• Testing showed that under serum-free conditions, transferrin, insulin,
ethanolamine, linoleic acid, ascorbic acid, hydrocortisone and certain trace
element compounds stimulated hybridoma growth.
• The supplementation of DMEM/F12 with insulin, transferrin, selenium and
ethanolamine provided the additional nutrients required to facilitate serum free
cultivations [33, 56].
• The RPMI media series uses an increased level of nutrients while maintaining a
constant salt content. The combination of RPMI with supplemented
DMEM/F12 in a 1:1 ratio produced a formulation (RDF) with a performance
superior to that of DMEM/F12 alone [57-59].
• Further enhancement of basal RDF medium with a three-fold increase in the
level of amino acids and glucose gave the enriched RDF medium (eRDF) [57-59].
• eRDF is a chemically defined basal medium used in the production of
therapeutic proteins, and each formulation is proprietary to the individual
manufacturer.
• eRDF comprises over 50 compounds including inorganic salts, amino acids,
vitamins, HEPES buffer, glucose, and various others.

1.4 Analysis of Cell Culture Media
In large scale manufacturing, most operational parameters are set. Medium
formulations change with manufacturers, cell line and product type. Extensive use
testing9 is required to select the high performing lots that support cell culture
performance in large scale production [60]. The formulation of medium involves
selecting and blending various components, resulting in a complex medium matrix.
For effective and reproducible culturing, the correct medium formulation and
blending is essential. Changes to the medium can affect growth rate, product yield,
and quality [17]. Therefore, there is an ongoing need for new or improved analytical
methods to ensure reproducible medium formulation. Comprehensive and detailed
analysis of cell culture medium composition and variance can help control and
understand the very complex cell culture based manufacturing process.

In-depth analytical methods for cell culture medium can be time consuming,
multistep, challenging and expensive. Medium samples are centrifuged or filtered to
remove particulates, diluted and derivatized (if necessary) before testing. Detailed
method development is required to address all analytes present, and also to determine
analytes of interest, analyte concentration and overall analysis time. The analysis
method is dependent on the medium and should be determined individually for each
medium to ensure accurate measurement [61, 62]. Exact quantification of nutrients in
cell culture medium is desirable, if not necessary, in order to meet cell line
requirements. It is advantageous to characterise the medium ingredients (including
inorganic ions, carbohydrates, alcohols, and aliphatic carboxylic and amino acids)
because the presence or absence of specific components may impact the yield of the
desired products.

9 When testing a new basal medium, a scaled down production process is used. In the test, the new
material and the reference material are cultured side by side. The results compare cell growth, product
yield, nutrient usage and by-product formation for the reference versus new material.

When it comes to characterisation, most of these nutrients and metabolites are ionic or
polar in nature, and do not have the chromophores necessary for analysis by
absorption measurements. Ion chromatography (IC) with electrochemical detection
(ED) is a suitable technique for the determination of these components. Carbohydrate
analysis uses liquid chromatography with refractive index detection [63]. The
common method for amino acid detection involves liquid chromatography (reverse
phase or cation exchange) resulting in the detection of derivatives [62-64]. Pre- or
post-column derivatization based methods are limited to a specific range of amino
acids. High operating costs and the inability to detect multiple components, such as
certain amino acids and carbohydrates, render LC methods unattractive [62, 65]. In
order to advance the chromatographic measurements beyond derivative detection and
toward multiple component analysis, anion chromatography with integrated pulsed
amperometric detection was developed [65]. Hanko et al. [61, 62] have developed a
method using anion exchange chromatography – integrated pulsed amperometric
detection (AE–IPAD) technology that allows for simultaneous detection of amino
acids and carbohydrates in four media formulations (YPD broth, LB broth, MEM and
serum free-protein free hybridoma medium).

Identifying lot to lot variability in raw material is another aspect of media analysis.
Variability can arise from several sources: the producer, the raw material used, and
also the aging of the material. Variability in chemical composition and culture
performance may depend on extraction material (i.e. whether extraction is from yeast,
malt or protein digests). Medium supplements like hydrolysates are heterogeneous in
terms of molecular size and chemical diversity. The exact concentration varies per
manufacturer and it is impossible to identify and quantify every individual
component. Yeastolate is inexpensive and lot to lot variation is known to occur. When
six different lots were tested, the free amino acid content varied from 45% to 78%
resulting in different biomass levels and growth rates [66].

Various methods are used to analyse the different components in bioprocess studies.
Commonly glucose, lactate, glutamine, and glutamate are measured using enzyme-
based biosensors. The biosensors are amperometric electrodes that have immobilized
enzymes in their membranes and work by converting the glucose or other substrates
to hydrogen peroxide, which is oxidized to produce an amperometric signal
proportional to substrate concentration [67]. A single method is often unable to
quantify all of the different components. For example, Sun et al. used six different
tests to quantify nutrient, metabolite, product and by-product formation [68]. These
included a BioProfile analyser for glucose, lactate and ammonium; two different
enzyme based assays for galactose and ATP; and three different variations of HPLC for
quantifying amino acids, vitamins and antibody formation.

Most analytical approaches used for cell culture medium analysis are multistep and
time consuming in nature. As a result, there is a need for a rapid, sensitive and
inexpensive technique that is capable of monitoring multiple components in a single
measurement. Spectroscopic methods seem to be ideal candidates and in this work,
three spectroscopic methods (Raman, SERS and Fluorescence) were used to study cell
culture medium for formulation analysis.
1.5 Process Analytical Technology (PAT) Principles
The Food and Drug Administration (FDA) guidelines recommend that Process
Analytical Technology (PAT) should be adopted as a regulatory framework that will
encourage growth and innovation in pharmaceutical development, manufacturing, and
quality assurance [69]. FDA considers PAT to be “a system for designing, analysing,
and controlling manufacturing through timely measurements (i.e. in process) of
critical quality and performance attributes of raw and in-process materials and
processes”, with the objective to guarantee the final product quality. Using PAT
ideology, the intention is to design and develop well understood and efficient
processes that will consistently ensure a predefined quality at the end of
manufacturing. The main challenge for process control is that dynamic conditions
are difficult to predict or simulate. This makes robustness, together with high
precision measurements, vital to the overall process control. No specific technologies
are mentioned in the FDA guidance document, thus allowing for various methods to
be implemented. Spectroscopic methods offer fast, non-destructive analysis with
minimal sample preparation. Furthermore, they can be adapted for process monitoring
and have in-situ capabilities. For example, Raman spectroscopy has been shown to be
effective as a PAT tool for in-line and real time monitoring of processes such as
freeze drying, blending, active pharmaceutical ingredient (API) monitoring and
endpoint analysis [70-72]. Chemometric methods for multivariate data analysis were
applied to extract the relevant information from complex datasets, resulting in more
robust models with lower prediction error than manual data evaluation [73].

1.6 Spectroscopic Methods Suitable for Cell Culture
Media Analysis
Spectroscopic methods (such as NIR, Raman and Fluorescence) offer many
advantages: they are fast, easy to use, suitable for automation, require little sample
preparation and have lower setup costs. The development of spectroscopic methods
for media analysis is based on the non-destructive, rapid, reliable, and robust nature of
the measurements [74-81]. In the Nanoscale Biophotonics Laboratory (NBL), the
primary focus is developing rapid spectroscopic methods (Raman, surface enhanced
Raman spectroscopy (SERS) and fluorescence) for the analysis of cell culture media
and its components, and targeting their implementation in the biopharmaceutical
industry [3, 6, 82]. The principles and applications of three spectroscopic methods of
interest will be discussed in detail in the following sections.
1.6.1 Raman Spectroscopy
Raman spectroscopy is an optical method that makes use of inelastically scattered
light to measure molecular vibrations [83]. With Raman, when a sample is irradiated
using a monochromatic light source (e.g. from a laser), some light is scattered. When
the scattered light is studied spectroscopically, the majority of this light has the same
frequency as the incident light while a very small fraction is observed at different
frequencies. The scattered light with the same frequency as the incident light is known
as Rayleigh or elastic scattering, while the scattered light at a different frequency is
Raman (inelastic) scattering. Raman shifts10 are independent of the exciting frequency
and are characteristic of the species giving rise to the scattering. There are two types
of Raman transitions: photons may lose some of their energy (Stokes radiation) or
photons may gain some energy (anti-Stokes radiation) [83-85]. The intensity ratio of
the Rayleigh line is about 10⁻³ with respect to the incident excitation while the Raman
lines are at most 10⁻⁶. Rayleigh scattering can be 10⁴ to 10⁶ times stronger than
Raman scattering [86, 87].

10 Difference between the incident and scattered beam frequencies
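
The statement that Raman shifts are independent of the exciting frequency can be checked with a short calculation. This is an illustrative sketch only: the function names, the two excitation wavelengths (785 nm and 532 nm) and the 1000 cm⁻¹ band position are assumptions chosen for the example, not values taken from this work. Shifts are expressed in wavenumbers (cm⁻¹), the usual unit for Raman spectra.

```python
def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Raman shift in cm^-1 from the excitation and scattered wavelengths in nm.

    Positive values correspond to Stokes scattering (scattered light at a
    longer wavelength than the excitation).
    """
    return 1e7 / excitation_nm - 1e7 / scattered_nm


def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float) -> float:
    """Wavelength in nm at which a band with the given shift is observed."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)


# Hypothetical band at 1000 cm^-1 observed with two different lasers.
for laser_nm in (785.0, 532.0):
    observed = scattered_wavelength_nm(laser_nm, 1000.0)
    shift = raman_shift_cm1(laser_nm, observed)
    print(f"{laser_nm:.0f} nm laser: band at {observed:.1f} nm, shift {shift:.0f} cm^-1")
```

The band appears at a different absolute wavelength for each laser, but the shift relative to the excitation line is the same, which is why spectra recorded with different excitation sources can be compared directly on a Raman shift axis.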

When a molecule is placed in an electric field of strength E, a dipole moment P is induced
in the molecule. The magnitude of the induced dipole moment is P = αE, where α is
the polarizability of the molecule. If the molecule encounters electromagnetic
radiation of frequency ν₀, it experiences a varying electric field E. This in turn induces a
varying electric dipole moment, which causes an emission of light identical in
frequency to the incident radiation. This is elastic or Rayleigh scattering. If there is a
change in the polarizability of a bond during a rotation or vibration through
interaction with electromagnetic radiation, then the vibrational mode is Raman active
and the frequency of the emitted light is shifted from that of the incident radiation [87-89].
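
The classical argument above can be written out explicitly. The following is the standard textbook sketch rather than a derivation taken from this work; the symbols E₀ (field amplitude), q (vibrational coordinate), q₀ (vibrational amplitude), ν_v (vibrational frequency) and α₀ (polarizability at the equilibrium geometry) are introduced here for illustration, and the polarizability is assumed to vary linearly with q for small displacements.

```latex
\begin{align}
E &= E_0 \cos(2\pi \nu_0 t) \\
q &= q_0 \cos(2\pi \nu_v t) \\
\alpha &\approx \alpha_0 + \left(\frac{\partial \alpha}{\partial q}\right)_{0} q \\
P = \alpha E &= \alpha_0 E_0 \cos(2\pi \nu_0 t)
  + \frac{1}{2}\left(\frac{\partial \alpha}{\partial q}\right)_{0} q_0 E_0
  \left[\cos\bigl(2\pi(\nu_0 - \nu_v)t\bigr) + \cos\bigl(2\pi(\nu_0 + \nu_v)t\bigr)\right]
\end{align}
```

The first term oscillates at the incident frequency and corresponds to Rayleigh scattering. The two remaining terms oscillate at ν₀ − ν_v and ν₀ + ν_v and vanish unless the polarizability changes during the vibration, which is the condition for Raman activity stated in the text.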

Figure 3: (Left) A schematic illustrating the scattering of incident light as it interacts with a
molecule, giving off Rayleigh and Raman scatter and (Right) energy level diagram depicting
scattering processes of Rayleigh and Raman. E0 and E1 are the ground and first excited
electronic energy levels, respectively. Reproduced with permission from [90, 91].

The quantum theory for the scattering process treats monochromatic light of
frequency ν₀ as a stream of photons having energy hν₀, where h is Planck’s constant.
With Rayleigh scattering, the incident photons interact with a molecule and are
scattered without a change in frequency (elastic scattering). However, in the Raman
effect, the photon interacts with the vibrational energy levels of the molecule and the
frequency of the scattered radiation is shifted through either loss or gain of energy
from the incident light (inelastic scattering). A molecule undergoing a vibrational
transition from the ground vibrational energy level (v = 0) to the first excited
vibrational energy level (v′ = 1) will have a corresponding frequency of νᵥ and the
scattered photon will be diminished in energy by the amount hνᵥ. The energy of the
scattered photon will be h(ν₀ − νᵥ). This is known as Stokes scattering. In contrast,
if the molecule is already in an excited vibrational state when the photon interacts
with it, the transition v′ → v″ may be induced and the photon will be scattered with
an enhanced energy, h(ν₀ + νᵥ), that produces anti-Stokes Raman lines. At room
temperature, in accordance with the Boltzmann distribution, the population of molecules
in the ground vibrational state is much greater than that in the excited vibrational states.
As a result, the intensities of the anti-Stokes lines are much weaker than those
of the Stokes lines [91, 92].
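
The Boltzmann argument can be made concrete with a rough estimate of the relative population of the first excited vibrational level. This is a minimal sketch under stated assumptions: a non-degenerate vibration, a temperature of 298.15 K, and example band positions of 200, 1000 and 3000 cm⁻¹ that are chosen for illustration only. It gives the population ratio, not the full anti-Stokes/Stokes intensity ratio, which also contains a frequency-dependent factor.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e10    # speed of light, cm/s (so wavenumbers in cm^-1 can be used directly)
K_B = 1.380649e-23   # Boltzmann constant, J/K


def excited_to_ground_ratio(shift_cm1: float, temperature_k: float = 298.15) -> float:
    """Boltzmann population ratio N(v=1)/N(v=0) for a vibration at shift_cm1."""
    return math.exp(-H * C * shift_cm1 / (K_B * temperature_k))


# Example band positions (assumed for illustration).
for shift in (200.0, 1000.0, 3000.0):
    print(f"{shift:6.0f} cm^-1: N1/N0 = {excited_to_ground_ratio(shift):.2e}")
```

Under these assumptions the excited-state population falls off steeply with increasing vibrational frequency, so anti-Stokes bands at high Raman shift are particularly weak compared with their Stokes counterparts.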

For a molecule which possesses a centre of symmetry, such as CO₂, there is a useful
rule: the rule of mutual exclusion. This states that for molecules with a centre of
symmetry, fundamental transitions which are active in infrared (IR) absorption
spectroscopy are forbidden in Raman and vice versa. Together, Raman and IR
absorption spectroscopy provide a complete picture of the different vibrational
frequencies in a molecule. Groups which lack strong features in Raman may exhibit
intense bands in IR and vice versa. In molecules with symmetry elements other than a
centre of symmetry, certain bonds may be Raman active, IR active, both or neither.
All normal modes are allowed in both IR and Raman for complex molecules with no
symmetry. The methods are therefore complementary to one another [87-89].
1.6.1.1 Bioprocess Monitoring using Raman spectroscopy
Raman spectroscopy provides a non-destructive method for gathering both
macroscopic and microscopic information about biological molecules in cells, tissues,
media and plants [93]. The Raman fingerprint region is very sensitive to changes in
chemical composition, bonding and conformation. Raman offers several advantages
over other spectroscopic methods: detailed chemical/structural information content, a
relatively weak water signal (sample dependent) and minimal sample preparation
[94]. However, in many biological samples, a strong fluorescence signal can obscure
the Raman scattered light completely. The best excitation wavelength region for
biological studies is between 780 and 1064 nm. However, highly sensitive CCD cameras
(that deliver spectra with a good signal to noise ratio in a short time period) only work
well within a 780-850 nm excitation range. This wavelength range also reduces
fluorescence to an acceptable level [93].

As water is the principal component of cell culture media, its impact is important.
Comparative studies show that the spectra of solid amino acids differ significantly
from those of their aqueous solutions (Figure 4). The spectra of solid amino acids are
complex and detailed
compared to their aqueous solutions. The aqueous solutions are low concentration
samples, and as the signal intensity is proportional to concentration, this leads to a
weaker signal and loss of spectral detail [95, 96].

Figure 4 Raman spectra of solid and aqueous solutions of Phe (0.3 g/L), Trp (0.11 g/L) and Tyr
(0.004 g/L). Reproduced with permission from [95].

Even though water has a relatively weak Raman signal, it can be a significant issue
when looking at very low concentrations in aqueous solution. The impact of
the water signal on the amino acid signal matters here, since the samples prepared as
part of this study are low concentration aqueous solutions. The Raman method may
not be able to detect single analytes within an aqueous solution, but this work
tests its ability to detect the gross signal changes for a whole component (D-glucose,
eRDF and yeastolate).

Raman analysis can add supplementary information to current proteomic diagnostic
methods (chromatographic and mass spectrometry analysis). Raman spectra can give
insight into the changes in protein and amino acid interactions as information on the
microenvironment of aromatic amino acids can be reflected by intensity variations. If
binding or exposure to environmental changes occurs in the presence of aromatic
amino acids, the spectrum will highlight the change. For example, the Raman spectra
of insulin variants demonstrate that the method is capable of providing chemical
information to distinguish proteins of similar structure in biomedical testing (see
Figure 5). Using high quality Raman spectra of low concentration protein solutions
along with multivariate analysis techniques, small spectral differences associated with
insulin variants, as well as subtle differences among individual proteins,
peptides and mixtures, were identified [94].

Figure 5 Average Raman spectra of (a) human, (b) bovine, and (c) porcine insulin on the left and
difference spectra between human and porcine insulin on the right. Reproduced with permission
from [94].

There are several applications of Raman spectroscopy in bioprocess monitoring. The
value of Raman spectroscopy to bioprocess monitoring is that it is rapid, non-
invasive, adaptable to on-line measurements, and is easy to operate and maintain. In
addition, bench top instruments are available [3]. Raman analysis has been utilised in
a variety of applications, from carotenoid production, where a single compound
was monitored [97], to observing the biotransformation of glucose into ethanol in
yeast fermentation [98]. It has also been used to simultaneously measure the changing
concentrations of glucose (30–80 g/L) and lactic acid during lactic acid
fermentation by L. casei with error values of 2.5 g/L for glucose and 0.74 g/L for
lactic acid [24]. For more complex industrial bioprocesses such as the production of
gibberellic acid (GA3), the Raman results showed that it is possible to quantify the
GA3 product from the spectral data of unprocessed samples [99].

The flexibility of the Raman instrumentation increased with the use of a fiber optic
probe as the delivery and collection system. In-situ Raman spectra were measured for
an E. coli fermentation of phenylalanine production for simultaneous estimation of
glucose, acetate, formate, lactate, and phenylalanine [76]. The substrates were
modelled based on the Raman spectra and the HPLC reference method data. The error
levels for Raman models for a production run were glucose (4.16%), acetate (4.67%),
formate (5.5%), lactate (5.39%), and phenylalanine (not detected). The Raman
estimates for glucose consistently underestimated the reference method; the estimates
for acetate, formate and lactate showed qualitative agreement with error, while
phenylalanine was not detected by the Raman model. The results showed potential
despite the errors introduced by the physical environment of the bioreactor [76]. The
implementation of Raman analysis to fermentations also has potential for tracking
culture parameters. In-line Raman monitoring of a mammalian cell culture bioreactor
was applied for prediction of various media components (glutamine, glutamate,
glucose, lactate, and ammonium) and compared to the standard reference
measurements using a BioProfile 400 Analyzer [100]. Table 1 and Figure 6 show the
model predictions and accuracy based on the Raman spectra throughout the culture.
The predictions follow the overall expected trend. The Raman models accurately
predict decreases in nutrient levels (glutamine, glutamate, glucose) and increases in
metabolite levels (lactate and ammonium). The error levels for glutamine are
similar for the Raman and reference methods, as both show low accuracy because of
the low concentration of glutamine. The error levels for glutamate and lactate
compare better, as the models accurately predict the behaviour of these analytes. The
models for glucose and ammonium did follow the process; however, their error levels
are poor relative to the reference method, reducing their model accuracy. Overall, the
performances demonstrate that the Raman method is comparable to the reference
method and therefore Raman spectroscopy provides an attractive approach for
monitoring mammalian cell culture processes.

Figure 6 Comparison of measured nutrient and metabolite concentrations (solid diamonds) and
the predictions (solid lines) from the modelling of Raman data for (a) glutamine, (b) glutamate,
(c) glucose, (d) lactate, and (e) ammonium. Dashed lines indicate the standard deviation
measured for the reference method. Reproduced with permission from [100].

Table 1 Results for predictions of nutrient and metabolite concentrations using in-line Raman
measurements and standard reference measurements from the BioProfile 400 Analyzer.
Reproduced with permission from [100].
Media Component    Calibration Range    Raman % error    Reference % error
Glutamine (mM)     0.66–4.26            30.3             22.0
Glutamate (mM)     2.21–5.72            12.0             17.0
Glucose (g/L)      2.07–6.22            15.3             4.0
Lactate (g/L)      0.23–5.21            12.9             10.0
Ammonium (mM)      2.01–8.51            11.4             4.0
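
The comparison drawn from Table 1 can be summarised by the ratio of the Raman error to the reference error for each component. The snippet below is a small illustrative calculation using only the values listed in Table 1; the ratio itself and the name `error_ratio` are introduced here for convenience and do not appear in [100].

```python
# (Raman % error, reference % error) as listed in Table 1.
TABLE_1 = {
    "Glutamine": (30.3, 22.0),
    "Glutamate": (12.0, 17.0),
    "Glucose": (15.3, 4.0),
    "Lactate": (12.9, 10.0),
    "Ammonium": (11.4, 4.0),
}


def error_ratio(component: str) -> float:
    """Raman error divided by reference error; values below 1 favour Raman."""
    raman, reference = TABLE_1[component]
    return raman / reference


# Components ordered from most to least favourable for the Raman models.
ranked = sorted(TABLE_1, key=error_ratio)
for name in ranked:
    print(f"{name:<10} Raman/reference error ratio = {error_ratio(name):.2f}")
```

On this measure glutamate is the only component for which the Raman error is lower than the reference error, lactate and glutamine are broadly similar between the two methods, and glucose and ammonium show the largest gap.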

Previous work on the cell culture media analysis by Li et al. showed rapid
identification, characterisation and quality assessment of media components used in
industrial cell culturing [1-8, 101]. Raman was used to identify the different media
types and as a sample quality testing method. Chemometric methods (PCA and
SIMCA11) were used for sample evaluation. Five different chemically defined (CD)
commercial media components (Figure 7) were investigated. Each of these
components was used in a Chinese Hamster Ovary (CHO) based manufacturing of
recombinant proteins. The Raman spectra differed sufficiently to
identify the different media types, and outlier analysis allowed for identification
of suspect samples. The “normal” samples were selected for the routine identification
and quality evaluation of the different media components. Five distinct classes were
obtained through SIMCA classification (Figure 7b) where each medium type was
grouped according to their spectral differences [3, 102]. This study clearly showed
that the identification and classification of incoming materials was possible using the
Raman method.

11 SIMCA (Soft independent modelling of class analogy) is a classification method that outperforms
PCA which is based on total variance. In order to build a reliable model, it uses a series of PCA models
where samples are identified as class members by describing relevant spectral variance. The
classification is based on significance tests where the distance from the model center (leverage) and the
distance to the model space (residuals) are examined.
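
The class-modelling idea described in footnote 11 can be sketched in a few lines. This is a deliberately simplified, SIMCA-style illustration and not the procedure used in [3]: each class gets its own PCA model, and a new spectrum is assigned to the class whose model leaves the smallest scaled residual. The significance tests and the leverage criterion of full SIMCA are omitted, the spectra are synthetic, and all names (`fit_class_model`, `classify`, `medium_A`, `medium_B`) are hypothetical.

```python
import numpy as np


def residual_distance(X, mean, loadings):
    """Distance of each sample (row of X) to the PCA model space."""
    centred = X - mean
    reconstructed = centred @ loadings.T @ loadings
    return np.linalg.norm(centred - reconstructed, axis=1)


def fit_class_model(X, n_components):
    """Fit a PCA model to the spectra of one class (rows = samples)."""
    mean = X.mean(axis=0)
    # Right singular vectors of the centred data are the PCA loadings.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = vt[:n_components]
    # Typical residual size of the training samples, used for scaling.
    residual_sd = np.sqrt(np.mean(residual_distance(X, mean, loadings) ** 2))
    return mean, loadings, residual_sd


def classify(x, models):
    """Assign spectrum x to the class with the smallest scaled residual."""
    scores = {
        name: residual_distance(x[None, :], mean, loadings)[0] / sd
        for name, (mean, loadings, sd) in models.items()
    }
    return min(scores, key=scores.get), scores


# Synthetic "spectra": one Gaussian band per class with varying intensity.
rng = np.random.default_rng(0)
axis = np.linspace(0.0, 1.0, 50)


def synthetic(peak_centre, n):
    peak = np.exp(-((axis - peak_centre) ** 2) / 0.005)
    scale = rng.uniform(0.8, 1.2, size=(n, 1))
    return scale * peak + rng.normal(0.0, 0.01, size=(n, axis.size))


models = {
    "medium_A": fit_class_model(synthetic(0.3, 20), n_components=1),
    "medium_B": fit_class_model(synthetic(0.7, 20), n_components=1),
}
label, scores = classify(synthetic(0.3, 1)[0], models)
print(label, scores)
```

In a full SIMCA implementation a sample may be accepted by one class, by several, or by none, depending on the outcome of the significance tests; the nearest-model rule used here always returns exactly one class and is only meant to show how per-class PCA models separate spectra.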

Figure 7 Spectra of five different chemically defined (CD) commercial media and the SIMCA
classification of the 336 sample measurements using the pre-processed Raman spectra of CD–A1,
CD–A2, CD–S1, CD–S2, and eRDF samples. Reproduced with permission from [3].

When Raman was applied to the analysis of bioprocess samples (sample components
may include cells, fresh media, spent media and product proteins), the spectra
showed a strong water signal and weak signals for the media components. Li et al.
investigated the correlation between the Raman spectra and the glycoprotein yield
from 9 different time points. The generated models used the full region
(400–1053 cm⁻¹) and two variable selection methods (CoAdReS and ACO). The full range model
gave a poor performance while variable selection greatly improved the prediction
ability. The CoAdReS and ACO models were equally matched, but the ACO method
was very time consuming, which was not acceptable when the goal
was the development of a rapid analysis method. Using Raman spectra with
CoAdReS variable selection generated an accurate prediction of the glycoprotein
yield in a more timely manner. This opened up the possibility of developing the Raman
method for bioprocess evaluation of product yield during the process in order to
ensure consistent yields and prevent losses [6].

1.6.2 Surface Enhanced Raman Spectroscopy (SERS)
An enhanced Raman spectrum was observed for pyridine adsorbed on
electrochemically roughened silver like that in Figure 8. The initial conclusion by
Fleischmann et al. was that a roughened electrode surface area caused a local increase
in pyridine concentration leading to a stronger signal [103]. This was disproved by D.
L. Jeanmaire and R. P. Van Duyne who showed that the signal increase was caused by
a dramatic increase (an estimated 10⁵-fold enhancement) in the Raman scattering
cross section [92, 104, 105].

Figure 8 (a) SERS of the pyridine at silver films and (b) Raman spectrum of the aqueous solution
of 0.01 M pyridine in 0.1 M KCl. Reproduced with permission from [106].

SERS is a form of Raman spectroscopy involving the interaction of molecules with
nanostructured colloids or nanostructured metal surfaces – generally silver or gold.
The adsorption of molecules at or near nano-roughened metal surfaces can enhance
the Raman scattering efficiency by a factor of 10³ to 10⁶ compared to normal Raman
scattering. The signal enhancement of SERS combines the structural information of
vibrational spectroscopy with extreme sensitivity. The extreme sensitivity allows for
detection of a species at very low concentrations even down to single molecule levels.
The SERS effect also quenches the fluorescence background signal from adsorbed
species (Figure 9) [107-111].

Figure 9 (A) Raman spectrum of 0.05 g/L Acebutolol solution displaying fluorescence
interference and (B) surface enhanced Raman spectrum of 0.05 g/L Acebutolol showing reduced
background, adapted from and reproduced with permission from [112].
1.6.2.1 Mechanisms for SERS
SERS enhancement is the result of two different processes: electromagnetic and
chemical enhancement. Electromagnetic enhancement is the more dominant
phenomenon and occurs when nanostructured metal particles or roughened metal
surfaces are exposed to light of an appropriate wavelength. Chemical enhancement is
the result of an interaction between the adsorbate molecule and the metal, usually
involving electronic effects such as charge transfer. A SERS enhancement factor of
10⁶ can generally be broken down into an electromagnetic enhancement factor of 10⁴
and a chemical enhancement factor of 10² [87]. Electromagnetic field enhancement is
a long range (0 to ~30 nm) effect whereas chemical enhancement is very short range,
confined to molecules that have direct contact or monolayer coverage with the metal
surface [92, 105, 113]. In chemical adsorption, changes in the adsorbate are not
evident unless the adsorbate is conjugated with the chemical bonds directly. Chemical
adsorption is common in functional groups such as S–H and N–H [114].
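
Reading the enhancement factors quoted above as powers of ten, and assuming that the electromagnetic and chemical contributions combine multiplicatively (the symbols EF_EM and EF_chem are introduced here only for illustration), the breakdown can be written as:

```latex
\[
EF_{\mathrm{SERS}} \approx EF_{\mathrm{EM}} \times EF_{\mathrm{chem}}
= 10^{4} \times 10^{2} = 10^{6}
\]
```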

SERS signal strength is also dependent on the orientation of the adsorbed species. In
the case of pefloxacin (Figure 10), at high concentrations a more perpendicular
orientation of the adsorbed species occurs, whereas a flatter alignment is observed
with a lower concentration. At 10⁻⁶ M, the intensity of the 229 cm⁻¹ band increases,
corresponding to a carboxylate group interacting with the Ag surface in accordance
with the adsorbed molecules lying flat on the surface. At 10⁻⁴ M, the intensity of the
210 cm⁻¹ and 1656 cm⁻¹ bands increases, reflecting a more tilted orientation that
results from local steric hindrance, with increased surface coverage and/or repulsive
electrostatic forces between the adsorbed species [115].

Figure 10 (a) SERS spectra of pefloxacin at three concentrations (10⁻⁴, 10⁻⁵, and 10⁻⁶ M) and a
representation of possible orientations for pefloxacin adsorbed on silver colloid; (b) for
pefloxacin concentration of 10⁻⁶ M; (c) 10⁻⁵ M; (d) 10⁻⁴ M, reproduced with permission from
[115].
1.6.2.1.1 Electromagnetic Enhancement
The electromagnetic enhancement mechanism relates to the amplitude of the electric
field for light and is the result of nanostructured roughened metal surface structure
and interaction of the adsorbed molecules with surface plasmons [107]. The intensity
of an electromagnetic field is dependent on the number of excited electrons and on the
volume of nanostructures. When a beam of incident light interacts with a nano-
roughened metal surface, free electron-like behaviour is exhibited. When photons
interact with these electrons they begin to oscillate as a collective group across the
surface; these oscillations are termed surface plasmons. Surface plasmons have a
resonance frequency at which they absorb and scatter light most efficiently. The
frequency depends on the metal and surface morphology12 [92, 116]. The
electromagnetic field of the surface plasmon is stronger than the incident light field
and therefore increases the intensity of the Raman scattered light [92]. The excitation
of the surface plasmon greatly increases the local field experienced by the molecule
(Figure 11). The enhanced field depends on the optical conductivity of the metal,
while optical conductivity depends on the wavelength used and the size and shape of
the particle [87, 92, 103, 105].

12 For scattering there needs to be an oscillation perpendicular to the surface, which requires a
roughened surface.

Figure 11 Illustration of the excitation of the localized surface plasmon resonance of a spherical
nanoparticle by incident electromagnetic field. Reproduced with permission from [117].

The plasmon resonance condition of individual particles is limited to a small distance
range. Signal enhancement is greater at the point between touching particles or in
clusters of particles compared to isolated particles. When metal nanoparticles are in
contact (Figure 12), the contact points show very active electric fields which lead to
an enhanced Raman signal. The particle size, shape and the arrangement into clusters
all contribute to the SERS enhancement [92, 118].

Figure 12 Schematic illustration of the electromagnetic field generated between adjacent
nanospheres upon incident irradiation. The opposing nanosphere sides have opposite
polarization charges, leading to the highly dipolar environment. Reproduced with permission
from [119].

1.6.2.1.2 Chemical Enhancement
Adsorption of the analyte onto the nanostructured metal surface leads to changes in
the molecular orbitals and electron distributions across both the analyte and the metal
surface. This increases the polarizability of the analyte molecule [87, 92, 120]. It is
believed that chemical enhancement is related to the new electron state belonging to
the bond formed between the analyte and the metal surface; the new electron states
are resonant intermediates [92]. When an incident photon excites an electron from the
metal surface into an adsorbed molecule, it creates a negatively charged excited
molecule. The molecular geometry of this excited molecule differs from that of the
neutral species. This allows for charge transfer to occur from the metal to the analyte.
The signal enhancement will take place when the excited electron of the charge
transfer becomes resonant with the incident light [105]. The incident photon is
absorbed by the metal nanoparticles, and the associated charge transfer induces a
nuclear relaxation within the excited molecule. This results in the return of the
electron to the metal surface, the creation of a neutral molecule, and the emission of a
Raman scattered photon [92, 107].
1.6.2.2 Substrates
SERS spectra are obtained after molecules interact or are adsorbed onto certain
nanostructured metal surfaces. Many different types of surfaces can be used for
SERS; examples include aggregated colloid suspensions [121-123], roughened
electrodes [121, 124, 125], metal films (such as silver island films) [126-128], and
silver coated beads [129-131]. Silver, gold and copper are the most commonly used
substrate materials, with silver being the most widely used. The choice of SERS
substrates is based on the wavelength of the surface plasmon band (e.g. λmax in the UV
or visible spectrum). This is a function of the material used and its size. Both silver
and gold surface plasmons oscillate at frequencies in the visible region making them
suitable for use with the visible and NIR excitation wavelengths typically used in
Raman spectroscopy. Silver has a broad excitation range from the UV to IR while
gold is limited to the red and IR ranges because of band transitions. Silver is less
reliant on the excitation wavelength compared to other SERS active metals like gold
and copper due to its favourable dielectric function [92, 132]. Silver colloids are used
in applied techniques (i.e. silver island films) because silver is a more efficient optical
material, giving a SERS signal 10–100-fold higher than gold. Gold colloids are,
however, used in studies of living organisms because of their chemical stability, better
control of size and shape and higher biocompatibility [118, 133].
1.6.2.2.1 Colloids
A metal colloid is the suspension of metal nanoparticles in a solvent. Silver or gold
colloidal suspensions may be formed by chemical reduction of metal salts. Silver
colloids can be prepared by the sodium borohydride or citrate reduction methods
[134, 135]. Colloids that differ in particle size, shape and dielectric constant show differences in
plasmon resonance and wavelength dependence. The nanoparticle sizes range from 10
to ~200 nm and depend on the method of preparation [136, 137]. UV-visible
absorbance measurements indicate the size of the particles present in the colloid.
Larger particles produce broader peaks at longer wavelengths. For silver
nanoparticles, an absorbance maximum ranging from 395–405 nm indicates a particle
size of 10–14 nm in diameter, absorbance around 420 nm indicates a size ranging
from 35–59 nm, and absorbance around 438 nm points to a size of 60–80 nm [138].
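
The relationship between the absorbance maximum and the particle size quoted above can be expressed as a simple lookup. The sketch below is illustrative only: the function names and the ±5 nm windows placed around the "around 420 nm" and "around 438 nm" values are assumptions, and maxima that fall between the quoted ranges are deliberately left unassigned rather than interpolated.

```python
# Size ranges for silver colloids as quoted in the text [138].
# Each entry: ((lambda_max lower, upper) in nm, (diameter lower, upper) in nm).
SILVER_SIZE_BY_LAMBDA_MAX = [
    ((395.0, 405.0), (10, 14)),
    ((415.0, 425.0), (35, 59)),   # "around 420 nm": +/-5 nm window is an assumption
    ((433.0, 443.0), (60, 80)),   # "around 438 nm": +/-5 nm window is an assumption
]

def lambda_max(wavelengths_nm, absorbance):
    """Return the wavelength at which the measured absorbance is highest."""
    idx = max(range(len(absorbance)), key=absorbance.__getitem__)
    return wavelengths_nm[idx]

def estimate_size_range(lambda_max_nm):
    """Return the quoted diameter range (nm) for an absorbance maximum, or None."""
    for (lower, upper), size_range in SILVER_SIZE_BY_LAMBDA_MAX:
        if lower <= lambda_max_nm <= upper:
            return size_range
    return None  # outside the quoted ranges: no estimate is made

# Hypothetical UV-visible scan of a silver colloid.
scan_nm = [380, 390, 400, 410, 420, 430]
scan_abs = [0.41, 0.62, 0.80, 0.66, 0.48, 0.35]
peak_nm = lambda_max(scan_nm, scan_abs)
size_nm = estimate_size_range(peak_nm)
print(peak_nm, size_nm)
```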

When the particles in a colloid are too small to exhibit large field enhancement, an
aggregating agent can be added to produce clusters of particles. The most commonly
used aggregating agent is sodium chloride (NaCl). Nanoparticles are kept in
suspension by repulsive electrostatic forces between the particles. The addition of
NaCl buffers the charges allowing the particles to clump together and form
aggregates. The resulting silver particle clusters provide a much higher surface-
enhancement signal [87, 138]. Metal colloids may either be added directly to the
analyte solution or immobilised on a mounting substrate before exposure to the
analyte.
1.6.2.3 Sensitivity of SERS
In comparison to normal Raman spectroscopy, SERS can deliver enhancements of 10⁶
or greater. The sensitivity of the SERS signal makes it possible to observe a spectrum
from a single molecule. Etchegoin et al. showed the possibility of chemically
identifying Rhodamine 6G in solution down to a concentration of 10⁻¹⁸ M. At the
single molecule level, detection suffers from fluctuations and is not easily
reproducible. Reproducible spectra can be obtained from samples containing 50–100
analyte molecules [139, 140].

SERS is sensitive to structural differences between equal mass isomers. In a study by
Dressler et al., the use of SERS for characterising the different geometric orientations
of three pyridine compounds – para, meta, and ortho-pyridine carboxylic acid – was
investigated. The SERS spectra were compared to the Raman spectra of the
crystalline form of each isomer (Figure 13). The results showed different profiles for
each isomer, since each interaction with the silver colloid varied depending on the
isomer present [141-143].

Figure 13 Bulk-Raman (black) and SERS (grey) spectra for (left) para-pyridinecarboxylic acid
and (right) meta-pyridinecarboxylic acid. Reproduced with permission from [141].

The profiles of amino acids in solution were studied using SERS spectroscopy [144-
146]. Nineteen different amino acids and their adsorption route onto the metal surface
were examined. Identification of amino acids was possible because of their spectral
differences. The side chain groups influenced the spectra through their interaction
with the metal surface. SERS studies showed that interactions between the silver
surface and the amino acids occur through their deprotonated carboxylate group. The
sulphur containing amino acids also showed a strong interaction between the surface
and the sulphur containing functional groups [146, 147]. The size of the amino acids
can also contribute to the signal strength. In some cases, the signal strength was
weakened when a large molecule size caused limited interaction. The adsorption of
the aromatic and sulphur containing amino acids resulted in stronger band intensity
and was more favourable than the rest [123, 148]. The detection limits of the aromatic
and sulphur containing amino acids were as low as 10⁻¹⁰ M, and the remaining amino
acids were detectable at 10⁻⁹ M [144].

HPLC is the standard method for quantification of melamine13. However, it is labour
intensive and time consuming. A faster SERS protocol was developed for detection
and quantification of melamine in foodstuffs. When the SERS method was compared
to HPLC, the qualitative results showed that SERS was capable of detecting trace
amounts of melamine. The limit of detection for melamine with the SERS Klarite14
substrate was estimated at 0.033 µg/L. The HPLC limit of detection for melamine
standard solution was 1.0 µg/L. In a quantitative assessment of the methods,
HPLC outperformed the SERS method, with an r² value of 0.99 compared to 0.90. The SERS
method showed high sensitivity (LOD ~0.033 µg/L); however, its accuracy and
precision were lower than those of the HPLC method. As a result, SERS could potentially be
used as a preliminary screening method for the large sample sets, as it is faster, less
labour intensive and simpler than HPLC. Verification of suspect samples could then
be carried out using the more precise HPLC method [149-151].
1.6.2.4 Concerns relating to SERS Analysis
When conducting a SERS experiment, SERS signals are very sensitive to a wide
range of factors, including substrate, environmental (pH, temperature and solvent),
and compositional changes (analyte concentration and matrix interferents). Therefore
care needs to be taken to generate reproducible SERS spectra for quantitative analysis
[152, 153].
1.6.2.4.1 pH Effects
Under different pH conditions, the surface charges of the nanoparticles change,
affecting the SERS performance [154]. The effect of pH variation on SERS behaviour
can be utilised to study different molecular species. For example, the SERS analysis
of thiamine at different pH discriminated the protonated from un-protonated species
(Figure 14). At low pH, the spectra featured peaks related to the protonated species at 1657 and
1550 cm⁻¹. As the pH increased, the degree of protonation decreased, resulting in the
emergence of peaks related to the un-protonated species at 1590 and 1373 cm⁻¹. At a pH of 9 and
above, the thiamine was destroyed and the SERS spectrum could no longer be recorded
[155].

13 Melamine is a nitrogen rich compound that is banned in foodstuffs.
14 The Klarite substrate consists of a silicon surface which has been patterned with a square array of
micrometre-sized square-based pyramidal pits coated with a 300-nm layer of gold.

Figure 14 SERS spectra of thiamine in colloidal gold sol (10⁻⁵ mol/L) at different pH values.
Reproduced with permission from [155].
1.6.2.4.2 Colloid Sample Ratio
For colloid based SERS substrates the ratio between colloid and sample concentration
is critical. Spectral fluctuations can arise from the sample colloid ratio used as a result
of:
• Competitive binding between different analytes and the number of hotspots.
• Differences in the local SERS enhancement.
• Charge-transfer between molecule and surface interactions.
• Movement of the adsorbed molecule on the surface.
The impact of increasing colloid concentration for a B. megaterium sample showed
that as the concentration increased, the baseline offset decreased and signal intensity
increased (Figure 15). Increased colloid concentration had a definite impact on the
reproducibility and the spectral features. Significant improvements were seen with up
to 4 times the initial colloid concentration. Above the fourfold concentration only
marginal improvements were observed, as the colloid underwent aggregation causing
changes in the proximity of the colloid to the sample. The percentage variance for the
ten B. megaterium spectra was calculated. A drop from 54.7% to 16.3% was observed
as the concentration of colloid increased eightfold. The more reproducible SERS
spectrum was achieved by increasing the colloid concentration. This was the result of
the high concentration of nanoparticles and aggregates that remain close to the sample
surface for a stable signal [154, 156].

Figure 15 Improvement in the reproducibility of SERS spectra as the concentration of the
colloidal solution increases from (a) 2X colloid to (b) 4X colloid, and (c) 8X colloid [154].
1.6.2.4.3 Colloid Variation
Different colloid preparation methods are available and can produce different SERS
spectra for the same analyte. Figure 16 shows the effect of different silver colloids for
the same sample where one is produced by sodium citrate reduction and the other by
sodium borohydride reduction. The different preparation methods produce
nanoparticles of varying size, shape and surface charges15 which cause the variances
seen in the spectra [154].

15 The surface charges impact the SERS signal via the strength of interaction and the proximity of the
nanoparticle to the analyte.

Figure 16 Comparison of SERS spectra of E. coli obtained using borohydride reduced and
citrate-reduced silver nanoparticles. [154]

Another limitation in SERS spectroscopy is the batch to batch variation of the SERS
substrate. Therefore all possible sources of variance should be identified and
controlled in the preparation of SERS substrates. There may be changes in the particle
size, shape of roughened surfaces and the distribution of the particles into clusters
after aggregation of a colloid [92, 157].
1.6.2.5 SERS Analysis of Cell Culture Media
In most SERS studies involving cell culture media, the identification of micro-
organisms was the principal concern rather than the analysis of the fermentation
medium [74]. Various studies discussed the SERS signal from growth media [158-
160]. These studies indicated that screening for discrimination, variability and
consistency for media was possible as was monitoring for degradation. Given that
these media generated a SERS response, it is feasible that SERS could be used as an
analysis method for blend and formulation analysis of prepared media.

The Marotta study proposed that the SERS signal seen for bacteria samples was the
residual signal from growth media. The SERS spectra for bacteria cells (EC, BC and
AH) and nutrient broth (NB) showed similarities (Figure 17). The spectra were
collected from dilute bacteria culture samples, rather than the specialised preparation
method of repeated cycles of washing and centrifugation needed for bacteria samples.
Therefore the bacteria samples tested contained both bacteria cells and a diluted
growth medium, which was shown to give a strong SERS response (Figure 17b). The
stock medium spectrum was dominated by a strong fluorescence background signal
while the diluted medium spectrum had well-defined peaks for components in cell
culture medium (Figure 17b) [158]. The Marotta study was followed up by the
Premasiri study [159], in which the SERS signals for bacterial cells and growth media
were shown to be different at a spectroscopic level. Comparison of the SERS spectra
for bacteria and growth media (Figure 18) revealed a common peak observed in
several spectra at 725–730 cm⁻¹. The intensity of the peak varied with each particular
sample. This peak was assigned to the adenine signal from the FAD component found
in both bacterial cells and growth media [159].

Figure 17 (Left) SERS spectra of diluted Nutrient Broth (NB) and the three different bacterial
cells - (EC) E. coli, (BC) B. cereus, and (AH) A. histidinolovorans - prepared in diluted nutrient
broth. (Right) SERS spectrum of Nutrient Broth (a) concentrated stock, and (b) diluted 1:100
(v/v) [158].

Because of the subtle differences observed in the bacteria and media spectra, principal
component analysis (PCA) was needed to confirm the difference between growth
media and bacteria cells. The PCA grouped the data based on their variance and
showed that the SERS signal for growth media was different to properly washed
bacterial cells. In the scores plot (Figure 19a) for the different washes, nutrient broth,
no wash culture and the spun culture, the signal from bacteria and media were
separated. The spun culture and first wash groupings were standalone but the
subsequent three washes overlapped. The nutrient broth was clustered with the no
wash culture as they both contained large amounts of broth. In order to show that
medium had no effect on the bacterial signal, two different growth media (TSB and
LB broth) were used to grow the bacteria E. faecalis (Figure 19b). The PCA results
showed the washed bacteria samples from both fermentations were clustered together
and well separated from either growth medium. Also in Figure 19b, the effect of a
growth medium (TSB) on three different cultures was examined. TSB medium had no
effect as the PCA results showed the TSB medium cluster was well separated from
the bacteria cells clusters [159].
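
The way PCA groups spectra according to their variance can be shown with a small numerical example. The sketch below computes PCA scores by singular value decomposition of mean-centred data; the function name pca_scores and the two synthetic sample groups (labelled "medium" and "cells" purely for illustration) are assumptions and do not reproduce the data or software of [159].

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return PCA scores and explained variance ratios for spectra in rows of X."""
    Xc = X - X.mean(axis=0)                      # mean-centre each variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)          # fraction of total variance per PC
    return scores, explained[:n_components]

# Synthetic spectra: two groups that differ in band position and intensity.
rng = np.random.default_rng(1)
axis = np.linspace(0.0, 1.0, 100)

def band(centre, width=0.05):
    return np.exp(-((axis - centre) / width) ** 2)

medium = band(0.3) + 0.02 * rng.normal(size=(20, axis.size))
cells = 0.5 * band(0.3) + band(0.6) + 0.02 * rng.normal(size=(20, axis.size))

X = np.vstack([medium, cells])
scores, explained = pca_scores(X, n_components=2)
pc1_medium, pc1_cells = scores[:20, 0], scores[20:, 0]
print(pc1_medium.mean(), pc1_cells.mean(), explained)
```

In this synthetic case the two groups fall on opposite sides of the first principal component, which is the kind of separation a scores plot is used to reveal; the sign of each component is arbitrary.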

Figure 18 The SERS spectra of the bacterial species with their strain and growth media noted
and on the right, the SERS spectra of the various growth media [159].

Figure 19 PCA analysis results for (A) the sample preparation and (B) the separation seen
amongst the different bacterial cell and growth media [159].

The application of SERS to media analysis has potential considering the SERS
activity of many media components. SERS can then be used as a screening method
for cell culture media and its components. Changes in the sample composition cause
spectral differences [5]. For example, SERS analysis of yeast extracts gave spectra
with deviating band positions and intensities for different batches. This helped to
characterise and discriminate the yeast extracts based on identity, origin and source.
Since SERS observed changes in medium composition, it was also used in a medium
degradation study [101]. SERS was able to detect storage induced changes in
chemically defined cell culture medium. PCA of the SERS data revealed a change
within the media samples during dark storage conditions. The observed change in the
SERS spectra was caused by cysteine oxidation. This was identified as the key event,
resulting in the formation of cystine, which does not promote cell growth, unlike
cysteine. This showed that SERS can be used to detect compositional changes such as
cysteine oxidation that have an impact on sustaining optimal cell growth.
1.6.3 Fluorescence Spectroscopy
1.6.3.1 Mechanism of Fluorescence Emission
Fluorescence is light emission that arises from a transition between electronic states of the same
multiplicity - usually from the first excited singlet state to the ground state. The
emission rates of fluorescence are typically of the order of 10⁸ s⁻¹. The lifetime of a fluorophore is the
time it occupies the excited state - this can be very short, ranging from 0.5 to 100 ns
[161]. Fluorescence can be summarised as a three-stage event (excitation, excited-
state transitions and emission) and is best illustrated by a Jablonski diagram (Figure
20). The lowest horizontal lines (S0) represent the ground state energy of the
molecule, which is typically a singlet state. Each electronic state has numerous
vibrational energy levels, represented by the multiple lines.

In the ground electronic state, molecules can occupy a variety of vibrational energy
levels. At ambient temperature, most molecules are in the lowest vibrational state of
the ground energy. Occupancy of the upper vibrational states depends on the
temperature and the Boltzmann distribution [84, 162-164]. Absorption of light causes
the elevation of molecules from the ground state into electronically excited states.
The energy of the absorbed photon determines which electronic level (S1 or S2)
becomes populated. In the excited state, collisions cause excited molecules to lose
energy until they reach the lowest vibrational level of the excited electronic state (S2
and S1). An excited molecule exists only for a limited time (~10⁻⁸ s) as a result of
these energy reducing processes. An excited molecule can return to its ground state
(S0) through different steps. The preferred route is the one that minimises the lifetime
of the excited state [162, 163].
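
The statement that most molecules occupy the lowest vibrational level at ambient temperature follows from the Boltzmann distribution. As a rough illustration, assuming a single non-degenerate harmonic mode, the population ratio between the first excited and the lowest vibrational level is exp(-hcν̃/kT). The function name and the example wavenumbers below are illustrative assumptions.

```python
import math

# Second radiation constant hc/k expressed in cm*K, so wavenumbers can be used directly.
HC_OVER_K_CM_K = 1.438777

def excited_fraction(wavenumber_cm1, temperature_k=298.15):
    """Population ratio N(v=1)/N(v=0) for a mode of the given wavenumber."""
    return math.exp(-HC_OVER_K_CM_K * wavenumber_cm1 / temperature_k)

# Illustrative modes: a low-frequency mode and a typical fingerprint-region mode.
ratio_200 = excited_fraction(200.0)
ratio_1000 = excited_fraction(1000.0)
print(f"200 cm-1: {ratio_200:.3f}, 1000 cm-1: {ratio_1000:.4f}")
```

For a 1000 cm⁻¹ mode the ratio at room temperature is below 1%, whereas low-frequency modes retain an appreciable excited population, which is why occupancy of the upper levels depends on both the temperature and the energy spacing.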

Figure 20 Jablonski Energy Diagram for photoluminescent systems. The lowest heavy horizontal
line (S0) represents the ground state of the molecule. The upper lines represent excited electronic
states. S1 and S2 are the first and second electronic singlet states. T1 is the first electronic triplet
state. Each electronic state has numerous vibrational energy levels [165], adapted from [84, 162].

Figure 20 shows the various routes taken back to the ground state. De-excitation
directly back to the ground state (S0) can occur by fluorescence or internal conversion.
Fluorescence returns the excited molecule to the ground state accompanied by the
emission of light at a longer wavelength. Internal conversions are transitions between
electronic states that allow the return of an excited state to S0 without light emission.
Internal conversion is a non-radiative transition where there is no change in spin [84,
162, 163, 166]. Intersystem crossing is a non-radiative transition involving a change
in spin, for example from the singlet excited state (S1) to the triplet excited state (T1).
From the triplet excited state (T1), the de-excitation of the molecule occurs by internal
conversion or phosphorescence. Phosphorescence is the transition of an excited
molecule from a triplet excited state (T1) to the ground state (S0) with the emission of
light. Intersystem crossing is promoted in molecules containing iodine [167] or bromine
[168], or in the presence of molecular oxygen [84, 162, 163, 169, 170].

1.6.3.2 Stokes Shift and Mirror Image
The energy associated with fluorescence emission is typically less than that of
absorption. The emitted photons have less energy and are shifted to longer
wavelengths. The Stokes shift is the difference between the wavelengths of maximum
absorbance and maximum emission [171]. This Stokes shift arises from the loss
of energy from the excited species through various processes such as excited-state
reactions, energy transfer, solvent effects, and complex formations. The size of the
shift varies with environment, but can range from a few nanometres to over several
hundred nanometres.
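
As a brief worked example, the Stokes shift can be computed both in wavelength and in wavenumber units, the latter being proportional to the energy lost. The function name and the illustrative maxima of 350 nm (absorbance) and 450 nm (emission) are assumptions chosen for this sketch.

```python
def stokes_shift(abs_max_nm, em_max_nm):
    """Return the Stokes shift as (shift in nm, shift in cm^-1)."""
    shift_nm = em_max_nm - abs_max_nm
    # Convert each wavelength (nm) to wavenumber (cm^-1) before subtracting.
    shift_cm1 = 1.0e7 / abs_max_nm - 1.0e7 / em_max_nm
    return shift_nm, shift_cm1

shift_nm, shift_cm1 = stokes_shift(350.0, 450.0)
print(f"{shift_nm:.0f} nm, {shift_cm1:.0f} cm-1")
```

Because wavelength and energy are inversely related, the same shift in nanometres corresponds to a different energy loss depending on where in the spectrum it occurs, so the wavenumber form is the more suitable one for comparing fluorophores.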

Figure 21 Graphical representation of the absorption and emission transitions and normalized
absorption spectra (in dimethylformamide) and fluorescence spectra of quinine sulfate dication
in (a) cyclohexane, (b) diethylether and (c) dimethylformamide. Reproduced with permission
from [172].

The peaks in the absorption spectrum correspond to transitions from the lowest
ground state energy levels to different vibrational levels of the electronic excited state.
Meanwhile, the peaks observed in the fluorescence spectrum arise from transitions
from the lowest vibrational level of the excited electronic state to the different
vibrational levels of the ground state. Following absorption (see Figure 21), an excited
fluorophore quickly undergoes relaxation (yellow arrows) to the lowest vibrational
energy level of the excited state (S1). All subsequent relaxation processes –
fluorescence, radiationless relaxation, and intersystem crossing – will therefore
originate from the lowest vibrational level of the excited state (S1). Thus, the
excitation wavelength should not influence the emission spectrum. Under ideal
conditions for a single fluorophore, the mirror image effect between the emission and
absorption spectra can be observed. In terms of Figure 21, the resulting emission
spectrum at λem = 450 nm strongly resembles the absorption spectrum at Aλ = 350 nm
from the ground state (S0) to the first excited transition state (S1), but not of the entire
absorption spectrum, which may include transitions to higher energy levels (S2) at Aλ
= 320 nm [162, 163].

In complex cell culture media, there are usually multiple fluorophores and thus the
observed emission is a combination of the emission from multiple fluorophores. This
means that the mirror image rule does not hold and that the emission is very sensitive
to the excitation wavelength. Thus in multi-fluorophoric mixtures like cell culture
media, multi-dimensional methods are commonly used to collect the maximum
information [1, 2, 4, 7, 8, 160].
1.6.3.3 Multi-Dimensional Fluorescence Scan Modes
An Excitation Emission Matrix (EEM) provides a total intensity profile of the sample
over a range of excitation and emission wavelengths, revealing all the fluorescent
constituents over a given range [173]. Total Synchronous Fluorescence Scan (TSFS)
is another multi-dimensional scan mode. TSFS involves the emission and excitation
monochromators being set to scan simultaneously in such a way that a constant delta
wavelength interval is kept between emission and excitation wavelengths. A TSFS
spectrum plots the fluorescence intensity as a combined function of excitation
wavelength and delta wavelength intervals [174, 175]. In TSFS, the data is collected
along the diagonal (λem = λex + ∆λ), whereas for the EEM it is collected in lines (λex
= constant). When comparing TSFS with the EEM spectra, TSFS avoids Rayleigh
scattering that diagonally bisects the EEM profile and has a shorter acquisition time.
With EEM data, the Rayleigh scatter has to be removed computationally prior to
analysis as scatter peaks can interfere with data analysis if not effectively removed
[176]. EEMs, however, contain more information than TSFS or conventional
emission and excitation spectra [177-179].
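
The geometric relationship between the two scan modes (λem = λex + ∆λ) can be made concrete by resampling an EEM onto TSFS coordinates. The sketch below is illustrative only: the function name eem_to_tsfs, the wavelength grids and the single synthetic fluorophore are assumptions, and linear interpolation along the emission axis is used purely for simplicity. It does not describe how a spectrometer acquires TSFS data.

```python
import numpy as np

def eem_to_tsfs(eem, ex_nm, em_nm, deltas_nm):
    """Resample eem[i, j] (excitation ex_nm[i], emission em_nm[j]) onto
    TSFS coordinates: tsfs[i, k] is the intensity at excitation ex_nm[i] and
    emission ex_nm[i] + deltas_nm[k]. Points outside the emission range are NaN.
    em_nm must be in increasing order."""
    deltas = np.asarray(deltas_nm, dtype=float)
    tsfs = np.full((len(ex_nm), len(deltas)), np.nan)
    for i, ex in enumerate(ex_nm):
        target = ex + deltas                      # emission wavelengths on the diagonal
        inside = (target >= em_nm[0]) & (target <= em_nm[-1])
        tsfs[i, inside] = np.interp(target[inside], em_nm, eem[i])
    return tsfs

# Synthetic EEM with one fluorophore centred at 350 nm excitation / 450 nm emission.
ex_nm = np.arange(300.0, 401.0, 10.0)
em_nm = np.arange(320.0, 601.0, 5.0)
deltas_nm = np.arange(10.0, 201.0, 10.0)
EX, EM = np.meshgrid(ex_nm, em_nm, indexing="ij")
eem = np.exp(-((EX - 350.0) / 20.0) ** 2 - ((EM - 450.0) / 30.0) ** 2)

tsfs = eem_to_tsfs(eem, ex_nm, em_nm, deltas_nm)
i, k = np.unravel_index(np.nanargmax(tsfs), tsfs.shape)
print(ex_nm[i], deltas_nm[k])
```

Since every ∆λ in this grid is greater than zero, the line λem = λex, where first order Rayleigh scatter appears in an EEM, is never sampled.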

Multi-component mixtures are better analysed using multi-dimensional data like EEM
[180, 181]. EEM and 2D synchronous (SFS) methods were compared for analysing
beer. The EEM data exhibited three bands: one at λex/λem of 250/350 nm, a second at
350/420 nm, and a third at 450/520 nm. These peaks were assigned to the aromatic
amino acids tryptophan, tyrosine and phenylalanine, as well as the vitamin riboflavin.
The 2D synchronous spectra were collected at ∆λ 30 nm and ∆λ 60 nm. Several bands
were observed in the synchronous fluorescence spectra taken at ∆λ = 30 nm. The
sharp and intense short-wavelength emission was attributed to amino acids, while the
longest-wavelength emission band belonged to riboflavin. The fluorescence was
measured directly from the beer and the data was used to quantify the amino acids and
riboflavin content [178]. Both EEM and SFS gave the same quantitative results for
riboflavin and tryptophan corresponding to RMSECV of 14% and 4% respectively.
EEM outperformed SFS for tyrosine and phenylalanine, with RMSECV of 4% and
16% compared to 6% and 31% respectively. The better performance reflects
the full sample profile captured by EEM, whereas the SFS is only a cross section of the data.
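
The RMSECV figures quoted above are cross-validation errors. As a minimal sketch of how such a value is obtained, the code below applies leave-one-out cross-validation to a univariate straight-line calibration. This is a deliberately simplified stand-in: the multivariate models used in [178] are not reproduced here, and the function name, the example data and the expression of RMSECV as a percentage of the mean reference value are all assumptions.

```python
import numpy as np

def rmsecv_loo(x, y):
    """Leave-one-out RMSECV for a straight-line calibration y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i            # leave sample i out of the fit
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        errors.append(slope * x[i] + intercept - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical calibration set: fluorescence signal versus reference concentration.
signal = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
conc = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

rmsecv = rmsecv_loo(signal, conc)
rmsecv_percent = 100.0 * rmsecv / conc.mean()
print(f"RMSECV = {rmsecv:.3f} ({rmsecv_percent:.1f}% of the mean concentration)")
```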

Figure 22 Three-dimensional plot of EEM for a beer sample studied along with the synchronous
fluorescence scan of beers 1 and 2 at ∆λ = 30 nm. Reproduced with permission from [178].
1.6.3.4 Rayleigh Scatter Elimination in EEM data
The Rayleigh scatter bears no relevant chemical information relating to the EEM
sample data. It is a by-product of light passing through the sample and interacting
with particles. Rayleigh scatter is increased when samples are opaque or not
completely dissolved. Rayleigh scatter reflects the clarity of the sample but not the
composition.
Three types of scatter can be encountered [176, 182]:
• Tyndall - from large suspended particles. This can be overcome by filtering
the sample prior to analysis.
• Rayleigh - from all molecules. The first order scatter is the most prominent,
while a second order scatter can be seen at a wavelength double that
of the exciting light and is generally weak.
• Raman - from all molecules. The scattered light is shifted to a longer
wavelength than the exciting light.

For the EEM data the scatter appears as a series of sharp peaks along the
diagonal of the matrix (Figure 23a). Light scatter artefact peaks can cause
problems with chemometric analysis of EEM data by interfering with qualitative and
quantitative analysis and swamping the signal. In order to avoid these problems, the
scatter peaks are eliminated as part of the pre-processing (Figure 23b) [183].
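One simple way to picture scatter elimination is to mark the points that lie on the scatter lines as missing before modelling. The Python sketch below is only an illustration under stated assumptions: the masking approach, the function name `mask_rayleigh` and the 10 nm half-width are hypothetical choices, and the text does not specify which algorithm was used for Figure 23b.

```python
def mask_rayleigh(eem, ex_wavelengths, em_wavelengths, half_width=10.0):
    """Return a copy of the EEM with Rayleigh scatter bands set to None (missing)."""
    cleaned = [row[:] for row in eem]
    for i, ex in enumerate(ex_wavelengths):
        for j, em in enumerate(em_wavelengths):
            first_order = abs(em - ex) <= half_width        # emission = excitation
            second_order = abs(em - 2 * ex) <= half_width   # emission = 2 x excitation
            if first_order or second_order:
                cleaned[i][j] = None
    return cleaned


# Toy EEM: rows are excitation wavelengths, columns are emission wavelengths.
ex_wavelengths = [250, 300]
em_wavelengths = [250, 300, 400, 500, 600]
eem = [
    [90.0, 12.0, 8.0, 40.0, 3.0],
    [2.0, 95.0, 20.0, 15.0, 35.0],
]

cleaned = mask_rayleigh(eem, ex_wavelengths, em_wavelengths)
for row in cleaned:
    print(row)
```

Whether the masked points are then left as missing values or filled in by interpolation is a separate design choice that depends on the chemometric method applied afterwards.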

Figure 23 (a) EEM landscapes of M5eRDF and (b) the scatter corrected spectrum for the same
M5eRDF sample from this body of work.
1.6.3.5 Factors which influence fluorescence emission
1.6.3.5.1 Quenching
Quenching (collisional or static) is the process by which fluorescence intensity
decreases. Collisional quenching occurs when the excited-state fluorophores are de-
excited by contact with some other molecule. This molecule is called a quencher and
its presence results in the return of the fluorophore to the ground state. The molecules
are not chemically altered in the process. A wide variety of molecules can act as
collisional quenchers. Examples include oxygen, halogens, amines and electron
deficient molecules like acrylamide [162]. Flavins like lumiflavin, riboflavin and
FMN are collisionally quenched by iodide ions, which exert their quenching
effect through spin-orbit perturbation, giving rise to increased intersystem crossing [184].

Fluorophores can sometimes form non-fluorescent complexes with quenchers. This
process is referred to as static quenching since it occurs in the ground state and does
not rely on diffusion or molecular collisions [162]. Static quenching also occurs with
riboflavin following interaction with methionine or cysteine. Riboflavin and
methionine static quenching are assumed to be the result of a non-fluorescent pair
formation of a riboflavin anion and a protonated methionine cation. For cysteine, no
static quenching is observed at low pH (pH = 4). At neutral pH (pH = 7), however, the
deprotonated thiolate form of cysteine interacts with neutral riboflavin to form a
riboflavin anion by the reduction of thiolate to the thiol form. This causes both static
and dynamic quenching [185].
1.6.3.5.2 Inner-Filter Effects
Another cause of reduction in fluorescence intensity is inner-filter effects (IFEs). This
is the attenuation of the incident light by the fluorophore itself or by other absorbing
species. At high concentrations, a spectrum may be affected by IFE and
intermolecular energy transfer, causing a decrease in the fluorescence signal. Any
interfering species that absorbs at the same wavelength as the analyte decreases the
light available to excite the analyte. Also, when an interfering species absorbs at the
emission wavelength, it diminishes the number of emitted photons that reach the
detector [83]. The influence of IFE can be reduced by sample dilution or use of a
shorter excitation path-length.
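Besides dilution or a shorter path-length, inner-filter effects are sometimes corrected numerically from measured absorbances. The Python sketch below uses a widely quoted approximation for right-angle cuvette measurements, F_corr = F_obs × 10^((A_ex + A_em)/2). This correction is not described in the text above and is included only as an illustrative assumption; the function name `correct_inner_filter` and the example values are hypothetical.

```python
def correct_inner_filter(f_observed, a_excitation, a_emission):
    """Approximate IFE correction from absorbances at the excitation and emission wavelengths."""
    return f_observed * 10 ** ((a_excitation + a_emission) / 2.0)


# Example: an observed intensity of 100 with an absorbance of 0.1 at both wavelengths.
corrected = correct_inner_filter(100.0, 0.1, 0.1)
print(round(corrected, 2))
```

The approximation is generally considered reasonable only for modest absorbances, which is one reason dilution remains the more direct remedy.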
1.6.3.5.3 Environmental Effects
Fluorescence is a very sensitive measurement method and thus may be affected by
several environmental factors such as pH, temperature and solvent effects [186]. In
some cases, changes in pH will radically affect the intensity and spectral profile.
Buffered solutions are recommended for increased environmental control. Fluorescent
compounds with acid and basic forms are usually dependent on the pH, because the
ionized and non-ionized fractions can have different absorption wavelengths and
emission intensities. For example, riboflavin exists in three different forms (cationic,
anionic and neutral) depending on the pH of the solution. The cationic form is
non-fluorescent, the anionic form is weakly fluorescent and the neutral form fluoresces.
The absorption spectrum differs between the forms, with each form dominating over a
different pH range [187, 188].

Figure 24 The cationic, neutral, and anionic structures of riboflavin species, R (ribityl side chain)
= -CH2(CHOH)3CH2OH, reproduced with permission from [188].

An increase in temperature generally results in a decrease in fluorescence because
collisional quenching increases with higher temperatures. This leads to more
radiationless decay processes (internal conversion/intersystem crossing). Other factors
contributing to the decline in fluorescence intensity at high temperature are the loss of
planar configuration for some molecules and the dissociation of molecular complexes
on heating [189, 190]. To control the temperature, most instruments are fitted with a
temperature-controlled cell holder.

Solvent molecules can also interact with excited state molecules, thus lowering their
energy. Solvent effects relate to the chemical properties of the fluorophores, solvent
and surrounding molecules. The degree of interaction increases with solvent polarity.
Polar fluorophores are highly sensitive to solvent polarity while non-polar
fluorophores remain unaffected by solvent polarity. A large spectral shift for a small
change in the solvent composition generally indicates specific solvent effects. These
effects include hydrogen bonding, preferential solvation, charge transfer interactions
and acid and base chemical reactions. Solvents containing –Br, –I or –NO2 are
undesirable because they promote fluorescence quenching with increased triplet
formation [162, 187].
1.6.3.6 Fluorophores
In biological and fermentation samples, many intrinsic biological fluorophores exist.
Examples include aromatic amino acids (tryptophan, tyrosine and phenylalanine),
vitamins (riboflavin and pyridoxine) and coenzymes (NADH, NADPH, FMN and
FAD) [80, 191]. In cell culture media, the range of fluorophores is narrower. For example,
in the chemically defined basal medium (eRDF), five significant fluorophores were
identified: tryptophan, tyrosine, pyridoxine, folic acid and riboflavin [1]. The
interactions between all of the various components play a large part in determining
the shape and intensity of obtained spectra. This produces a unique fluorescence 3D
profile which can be used to characterize cell culture media. Small changes in media
composition can cause variances in the spectral profiles observed [1, 2].

Figure 25 Typical Biological fluorophores that can be detected with the use of an excitation-
emission matrix (EEM). Reproduced with permission from [191].
1.6.3.6.1 Fluorescent Amino Acids
The aromatic amino acids L-tryptophan (Trp), L-phenylalanine (Phe) and L-tyrosine
(Tyr) exhibit intrinsic fluorescence. The absorption maximum for phenylalanine is
260 nm and for tyrosine and tryptophan, it is 280 nm [192]. Phenylalanine in proteins
typically has a quantum yield (see footnote 16) of 0.03, so its emission is very weak and is
only observed in the absence of tryptophan and tyrosine. The fluorescence maxima of tryptophan and
tyrosine occur at 350 nm and 310 nm, respectively. Tryptophan and tyrosine have
similar fluorescence efficiencies in water but the higher extinction coefficient of
tryptophan results in its stronger fluorescence. Tryptophan can be selectively excited
at 295 – 305 nm, which minimises the tyrosine signal and thus the spectral overlap.
In some samples, tryptophan fluorescence dominates because of its large extinction
coefficient and absorbance at a longer wavelength [162, 193, 194].

16 The quantum yield or efficiency (QE) for fluorescence is the ratio of the number of photons
emitted to the number of photons absorbed.

1.6.3.6.2 Vitamins and Co-enzymes
Riboflavin (Vitamin B2) is a precursor in the biosynthesis of two co-enzymes flavin
mononucleotide (FMN) and flavin adenine dinucleotide (FAD). Riboflavin is used in
energy metabolism via FMN and FAD [195]. Riboflavin, FMN and FAD absorb light
from the visible region at around 450 nm with a fluorescence emission maximum at
~520 nm. For riboflavin, the quantum yield is 0.26 at pH 7. The fluorescence yield is
the same for FMN but lower for FAD. FAD fluorescence is weakened by the
presence of adenine [162]. Other fluorescent vitamins include Vitamin B1 (thiamine),
Vitamin B6 (pyridoxine) and Vitamin B9 (folic acid). Thiamine needs to be oxidised
to its fluorescent by-product thiochrome to enable detection. Thiochrome can be
excited at ~360 nm and emits at ~450 nm. Pyridoxine fluoresces at ~395 nm after
excitation at ~320 nm. For folic acid, excitation occurs at ~370 nm and emission is
seen at ~430 nm [179, 196, 197].

Reduced nicotinamide adenine dinucleotide (NADH) and reduced nicotinamide
adenine dinucleotide phosphate (NADPH) both fluoresce at 460 nm after excitation at
360 nm. NADP, the phosphate ester of NAD, oxidises alcohol to aldehydes or
ketones. The reduced form, NADPH, reduces carbonyl compounds to alcohol [162,
198]. Under anaerobic conditions, the NADPH signal increases owing to the build-up of
fluorescent NADH, as the NADH is not being oxidised to form the
non-fluorescent NAD+.
1.6.3.7 Bioprocesses Monitoring and Control using Fluorescence Spectroscopy
The goals of any fermentation process are high product yield, constant product
quality, optimisation and control. Fermentations are currently monitored using oxygen
and carbon dioxide levels, pH, redox potential and conductivity probes [199, 200].
However, in the rapidly expanding bioprocess industry, there is a need for more real
time compositional information. Online monitoring of fermentation processes
involves performing real time monitoring of biomass, substrate and product levels.
Optical sensor measurements such as fluorescence are ideal for bioprocess
monitoring. They are non-invasive, and online, in situ sampling reduces the risk
of contamination and erroneous measurements during sampling [191, 201-204]. One
drawback is that complex mixtures with multiple fluorophores are poorly represented by
single emission spectra. However, the use of multi-wavelength fluorescence gives a
three dimensional landscape for each sample allowing several different fluorophores
to be monitored [199, 205, 206]. Fluorescence monitoring based on the biological
fluorophores from the cell culture medium means that the physiological state, cell
mass and the production level of biological products can be followed online and non-
invasively. Different fluorophores can be correlated to different parameters within the
cell culture medium. The fluorescence profile of a fermentation process can be
measured and used to verify subsequent batches and identify different phases [165,
207].

The main challenges facing in-line measurements are the physical attributes of the
fermentation process. Stirring rates and aeration procedures generate variations in
signals. The stirring rates can decrease the signal to noise ratio (SNR), while air
bubbles interfere with measurements. In general, signals from in-line analysis exhibit
a lower SNR than off-line measurements. In-line spectral data thus requires
smoothing to reduce the impact of noise [206]. The time requirements for data
analysis may affect the real-time output as the information rich spectra require
chemometrics to interpret and evaluate data. This is part of the model development to
be carried out prior to setting up automated analysis [208].
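The smoothing step mentioned above can be sketched in a few lines of Python. The text does not state which smoothing method was used in [206]; the centred moving average below, the function name `moving_average` and the example signal are assumptions chosen purely for illustration.

```python
def moving_average(signal, window=3):
    """Smooth a 1D signal with a centred moving average (window shrinks at the edges)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for k in range(len(signal)):
        lo = max(0, k - half)
        hi = min(len(signal), k + half + 1)
        segment = signal[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical noisy in-line intensities alternating around a constant level.
noisy = [10.0, 13.0, 10.0, 13.0, 10.0]
smooth = moving_average(noisy, window=3)
print(smooth)
```

A wider window suppresses more noise but also broadens genuine spectral features, so the window size is a trade-off that would need to be checked against the real data.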
1.6.3.8 Monitoring Cell Concentration and Process Variables using Single Component and Multiple Component Measurements
Initial fluorescence monitoring of bioprocesses was through the measurement of
NADPH fluorescence [162, 209, 210]. In bioprocess broth analysis, a correlation was
observed between the fluorescence signal of NADH and the status of the
fermentation. The fluorescence signals from the NADH component as well as the
biomass signal were relatively constant during the exponential growth phase. NADH
is a good indicator of cell density and the signal can be correlated to the metabolic
switches, substrate conditions and oxygen limitations. In Figure 26 the behaviour of
NADH (excitation/emission wavelength (λex/λem) of 350/450 nm) can be seen through
changes in the intensity. The NADH signal is sensitive to the dissolved oxygen
concentration. The NADH fluorescence responds to changes in oxygen concentration more
rapidly than the dissolved oxygen electrode, with the fluorescence signal reaching
90% response in 1 s while the dissolved oxygen electrode required 1 min [80, 209,
211].

Figure 26 (a) Spectrum of a starved Yeast suspension at aerobic conditions; (b) spectrum of a
starved Yeast suspension at anaerobic conditions; (c) difference spectrum of a starved Yeast
suspension (anaerobic–aerobic). Reproduced with permission from [211].

The limitation of single compound monitoring arises when other fluorophores overlap
with the signal of interest making signal interpretation difficult. For example,
excitation at 365 nm generates NADPH and riboflavin emission overlap. This overlap
can be avoided with specific excitation wavelength selection. For NADPH, excitation
at 334 nm eliminates the riboflavin emission; meanwhile, for riboflavin, excitation at
404 nm eliminates the NADPH signal [205, 208]. Alternatively, using EEM or TSFS
allows one to simultaneously monitor several biogenic fluorophores such as vitamins,
coenzymes and amino acids, in order to give a clearer picture of the cellular activities
[206]. Different fluorophores can be more sensitive to different process aspects [199,
205, 209]. The most appropriate fluorophore to use for monitoring product
production, nutrient depletion and by-product build up will depend on the particular
fermentation [205].

For example, Li et al. studied fluorophoric behaviour of NADPH, tryptophan,
pyridoxine and riboflavin in three different yeast fermentations. The fluorescence
signal was recorded and correlated to different cellular processes. In the baker’s yeast
on glucose based medium, the tryptophan signal proved optimal for monitoring cell
concentration, while the pyridoxine signal closely followed the cellular activity. In the
Candida utilis on ethanol based medium, the tryptophan signal again proved optimal
for monitoring cell concentration, while the NADPH was a good indicator for cellular
activity. In the S. cerevisiae RTY110/pRB58 fermentation growing on a glucose-
nitrogen based medium, both pyridoxine and NADPH tracked cell concentration but
the pyridoxine signal was stronger. For all systems, riboflavin gave a weak signal
[205].

Another way of analysing cell culture fluorescence was to examine the culture signal
as a whole. Multicomponent fluorescence analysis was performed on P. pastoris
batch culture, which contained NADPH, tryptophan and riboflavin. Prediction models
for biomass concentration were built using the fluorescence signal of the combined
three fluorophores. The combined fluorescence signal offered a more robust measure
than a single fluorophore. The strength of the model was dependent on the overall
signal and the interplay of fluorophores with the process variable [75, 206, 210, 212].
1.6.3.9 Fluorescence Analysis of Cell Culture Media
Raw materials are supplied with a composition that falls within specification; changes to
the composition are nevertheless important to note for process reliability as these changes
may have a negative impact on the process. Media screening is therefore a large area of
potential research.
The ability to determine the efficacy of media before use improves process efficiency
by ensuring the use of reliable starting material leading to consistent product yield. It
can prevent financial losses by identifying poorly performing media.

From previous work on cell culture media using fluorescence [1, 2, 4, 213], EEM data
has proven a suitable method for media screening and evaluation (i.e. rapidly
identifying different types of media and determination of the sample quality). EEM
data for seven different types of media were collected and the different media lots
were easily classified based on the spectroscopic profiles (Figure 27) using an NPLS-DA
scores plot. Scores describe the variation between samples, giving a visual
representation of how the samples relate to one another. Any significant
compositional changes which cause measurable differences in the spectroscopic data
are represented by changes in scores. This meant that different media samples were easy to identify despite
appearing visually similar. Even identical media samples that were changed by
different CHO based productions (A and L process) were separated based on their
slightly different in-process media compositions [213].

Figure 27 Scores plot of LV3 versus LV1 showing NPLS-DA discrimination of CD-A1, CD-A2,
CD-S2, insulin, eRDF, yeastolate, and phytone sample solutions. Reproduced with permission [4].

Changes in the composition within batches of media samples can impact the process;
therefore, it is necessary to determine the quality of the media samples. Chemometric
methods like MROBPCA and MANOVA can be used to identify outliers and define
class variance [4, 213]. MROBPCA identifies outlying samples based on
compositional changes. For example, 22 yeastolate samples were tested in triplicate by
EEM. The MROBPCA outlier map identified 14 major and 5 minor outliers. The
spectra of the major outliers displayed either a higher or lower than normal
fluorescence compared to the main body of samples. The minor outliers were singular
events due to experimental error [5].

In conjunction with MROBPCA, MANOVA was used to calculate the class variance
of groups identified. The use of MANOVA can be useful for comparing media over
time. Ryan et al. observed changes to the class variance of media samples with time
indicating that minor compositional changes were occurring within the media during
storage. This can be an issue if the changes become significant and impact the
performance of the media [213].

Media consistency is critical for production efficiency in industrial biotechnology.
Identification of unwanted media changes is thus very important and fluorescence
EEM can be used to identify these changes. Medium degradation such as photo-damage
from improper storage (see footnote 17) can reduce the efficacy of medium. Photo-damage of
medium occurs when light sensitive components react, degrade or form photo-products
(e.g. riboflavin degradation leading to the formation of lumichrome). This
causes changes in the EEM profile and these changes correlate to the photo-degradation
of specific media components and the formation of photo-induced by-products.
Comparison of a chemically defined eRDF medium stored under different
environmental conditions (light/dark) over 30 days is shown in Figure 28. When
exposed to light, a signal decrease in the EEM profiles of tryptophan, tyrosine, and
pyridoxine was observed whereas an increase was seen in the signal associated with
photo-induced by-product. When stored in the dark the scores for fluorescence
components remained at a constant level over the testing period indicating no change
in the media [2].

17 Media can be stored in transparent bioreactors, media storage vessels, or single-use disposable
bioreactors.

Figure 28 The PARAFAC scores are shown for the two different storage conditions: (left) RT-L
and (right) RT-D. Components 1 to 4 are represented by blue squares (Trp), green inverted
triangles (Tyr), red circles (Py), and cyan upright triangles (FA/Rf and/or photo-products)
respectively. Reproduced with permission from [2].

EEM profiles are sensitive to compositional changes in the media. Therefore, the
EEM data was also tested for quantitative purposes by modelling the changes incurred
by varying analyte concentrations in order to predict the quantity of that analyte
within test samples. Calvet et al. developed a modified standard addition method for
the determination of tryptophan and tyrosine in eRDF media solutions [1] which was
later expanded to pyridoxine, riboflavin and folic acid [2]. eRDF samples produced
relatively complex spectra with strong fluorescence from tyrosine and tryptophan and
weaker contributions from pyridoxine, riboflavin and folic acid. Apart from
quantifying the media components within prepared medium, the best performing
models (tyrosine, tryptophan and riboflavin models) were applied to quantify these
analytes in stored eRDF media samples as they degraded. The tyrosine model failed
due to the dynamic changes seen in the stored media samples. However, the
tryptophan and riboflavin models compared well with the equivalent HPLC result,
which also confirmed changes in the tryptophan and riboflavin concentrations [1, 2].
These quantitative results coupled with the effective qualitative analysis show that
EEM fluorescence is a potential method for a wide range of analytical tests for cell
culture media.
1.7 Project Objectives
This project sought to use spectroscopy (Raman, SERS, and Fluorescence) for
analysis (qualitative and quantitative) of complex cell culture media components in a
liquid environment. Different spectroscopic methods were used, and the correlation
between the spectroscopic signals and ingredient concentrations was established, to aid
in the development of quality control methods for medium formulation analysis:
• By providing robust analytical methods for the accurate quantification of
ingredients in prepared cell culture medium.
• By providing a quality assurance tool in biotechnology with spectroscopic
variance analysis in conjunction with ingredient quantification.
Media was prepared based on an industrial recipe for the formulation of basal
medium. It was a five component medium of D-glucose, L-glutamine, D-galactose,
yeastolate and eRDF and was examined for quantification of D-glucose, yeastolate
and eRDF. The assessment of medium was based on multivariate analysis of different
types of spectroscopic data - Raman, SERS, and Fluorescence.

2 Chemometrics, Materials and Methods
2.1 Chemometrics
During the 1970s the development of chemometrics coincided with the emergence of
the personal computer and the increased use of computers in chemistry. Modern
instrumentation generates vast amounts of numerical data and the examination of such
data was limited until the introduction of computer based analysis [214]. A definition
of chemometrics is “the chemical discipline that uses mathematical and statistical
methods (a) to design or select optimal measurement procedures and experiments and
(b) to provide maximum chemical information by analysing chemical data”. In other
words, chemometrics is the computer based statistical analysis of chemical data [215].
2.2 Qualitative and Quantitative analysis
Chemometrics involves the establishment of relationships between different variables
and the development of suitable mathematical models for descriptive and predictive
purposes [216]. What analyte is present and how much? These types of questions can
be answered using qualitative and quantitative analysis methods. Qualitative analysis
is the identification of an analyte or analytes in a sample. Quantitative analysis is
commonly the determination of the amount of an analyte or analytes in a given sample
[217]. In this work, spectral and sample variance were evaluated using Principal
Component Analysis (PCA) for 2D and 3D data, while Robust Principal Component
Analysis (ROBPCA) and Parallel Factor Analysis (PARAFAC) were utilised for the
3D data only. For quantifying components in the data, the Partial Least Squares (PLS)
regression method was used for 2D data and Unfolded Partial Least Squares (UPLS)
regression method was used for the 3D data.
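The unfolding step implied by UPLS can be sketched as follows: each sample's excitation-by-emission matrix is flattened into a single row so that the 3D data set becomes an ordinary 2D matrix suitable for a standard PLS routine. This is a minimal Python illustration; the function name `unfold`, the row-major ordering and the toy values are assumptions, since the text does not state how the unfolding was implemented.

```python
def unfold(eem_stack):
    """Flatten each sample's (excitation x emission) matrix into one row."""
    return [[value for row in eem for value in row] for eem in eem_stack]


# Two hypothetical samples, each a 2 x 3 EEM (excitation x emission).
eem_stack = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

unfolded = unfold(eem_stack)
print(unfolded)
```

The result has one row per sample and (excitation × emission) columns; the same ordering has to be used for calibration and prediction samples so that the columns remain comparable.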
2.3 Calibration Modelling
The objective of calibration modelling is to develop a statistical model that can be
used for the prediction of dependent variables from the numerical values generated by
at least one analytical measurement. A simple univariate calibration model utilises a
single independent response variable X, such as an intensity or absorbance at a single
wavelength to predict the dependent variable Y [218].
The simplest form of a linear calibration model is
Equation 2-1
$$y_i = b_1 x_i + e_i$$
Where $y_i$ represents the concentration of the ith calibration sample, $x_i$ refers to the
corresponding instrument measurement, $b_1$ stands for the regression coefficient (the
slope of the line), and $e_i$ stands for the error associated with the ith
calibration sample (the error is assumed to be normally distributed) [218, 219].
Values in $y_i$ and $x_i$ are used to estimate the model parameter $b_1$ by the least squares
procedure. The least-squares estimate of $b_1$, written $\hat{b}_1$ (“b-hat”), is calculated by
Equation 2-2
$$\hat{b}_1 = (X^T X)^{-1} X^T Y$$
where $X$ and $Y$ hold the $x_i$ and $y_i$ values of all calibration samples. The resulting
linear calibration model, developed using Equation 2-1, is used to predict the concentration
of analyte for an unknown sample, $\hat{y}_{unk}$.
Equation 2-3
$$\hat{y}_{unk} = X_{unk} \hat{b}_1$$
Where $X_{unk}$ denotes the response signal for the unknown sample, measured at the
calibrated wavelength [220].
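For the single-variable model of Equation 2-1 (no intercept term), the least-squares estimate of Equation 2-2 reduces to the sum of x·y divided by the sum of x². A minimal Python sketch of Equations 2-2 and 2-3 follows; the function names and the calibration values are hypothetical and serve only to illustrate the arithmetic.

```python
def fit_slope(x, y):
    """Least-squares estimate of b1 for y = b1 * x + e (Equation 2-2 with one variable)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


def predict(x_unknown, b1_hat):
    """Predicted concentration for an unknown sample (Equation 2-3)."""
    return x_unknown * b1_hat


# Hypothetical calibration set: instrument response (x) and known concentration (y).
intensities = [1.0, 2.0, 3.0, 4.0]
concentrations = [2.1, 3.9, 6.2, 7.8]

b1_hat = fit_slope(intensities, concentrations)
y_unknown = predict(2.5, b1_hat)
print(round(b1_hat, 3), round(y_unknown, 3))
```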
A disadvantage of univariate calibration is that a single independent (X) variable can
be insufficient to explain the variation in the dependent variable (Y). This problem can
be overcome by the use of multivariate calibration. Multivariate calibration utilises
several explanatory variables, such as the intensities across a spectrum, to predict
dependent variables. Multivariate models therefore give more stable estimates of the
model parameters, and they can also correct for the effects of interfering species.
A multivariate calibration model can be expressed in a linear form as:
Equation 2-4
$$Y = X\beta + e$$
Where $Y$ is a vector of the measured responses for $I$ objects, $X$ is an ($I \times K$) matrix of
measured spectra for the $I$ objects, $\beta$ is a vector of regression coefficients and $e$ is a
vector of the residuals of the linear regression model [214, 220, 221].
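As an illustration of Equation 2-4, the Python sketch below estimates β by ordinary least squares, solving the normal equations (XᵀX)β = XᵀY with Gaussian elimination. This is only one possible estimator and is shown under stated assumptions: the helper names and the two-wavelength data are hypothetical, and the approach works only when there are more samples than variables and XᵀX is invertible. For full spectra this usually does not hold, which is one motivation for methods such as PLS.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_beta(X, y):
    """Least-squares estimate of beta in Y = X beta + e."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Hypothetical noise-free data: 4 samples (I = 4) measured at 2 wavelengths (K = 2).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]

beta = fit_beta(X, y)
print(beta)
```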
2.4 Figures of Merit for Modelling
It is necessary to accurately judge the ability of models to predict unknown samples.
To assess the overall quality of multivariate models, one evaluates the correlation
coefficient or the model’s associated error. Every measurement has an error and the
estimated parameters show the deviations from the true value [218, 222]. The most
common figures of merit for estimating error in chemometrics are the root mean
square error of calibration (RMSEC), root mean square error of prediction (RMSEP)
and root mean square error of cross validation (RMSECV) [216].
2.4.1 Correlation Coefficient (r²)
The correlation coefficient is a measure of the strength and direction of the
relationship between the measured and predicted variables. The squared correlation
coefficient, $r^2$, is defined as:
Equation 2-5
$$r^2 = \frac{\left(\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})\right)^2}{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}$$

Where $\bar{y}$ is the mean of the known y values and $\bar{\hat{y}}$ is the mean of the model estimated
y values. The correlation coefficient $r$ ranges from -1.00 to +1.00 (see footnote 18), so $r^2$
lies between 0 and 1.00 [220, 223].
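A short Python sketch of the calculation behind Equation 2-5 is given below. It computes the Pearson correlation coefficient r between known and model-estimated values and then squares it; the function name `pearson_r` and the example values are hypothetical.

```python
from math import sqrt


def pearson_r(y_true, y_pred):
    """Pearson correlation between known values and model-estimated values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    cov = sum((p - mean_pred) * (t - mean_true) for p, t in zip(y_pred, y_true))
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    ss_true = sum((t - mean_true) ** 2 for t in y_true)
    return cov / sqrt(ss_pred * ss_true)


# Hypothetical known and predicted concentrations.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

r = pearson_r(y_true, y_pred)
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))
```

Because r is squared, r² cannot distinguish a positive from a negative relationship; the sign has to be read from r itself.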

18 A correlation coefficient of +1.00 indicates a perfect positive relationship, i.e. the obtained and
predicted variables follow the same pattern. A correlation coefficient of zero represents no
relationship between the obtained and the predicted values. A correlation coefficient of -1.00 indicates a
perfect negative relationship, in which an increase in one variable is associated with a
decrease in the other.

2.4.2 Root Mean Square Error of Calibration (RMSEC)
The RMSEC is a measure of the uncertainty between the values estimated
for the calibration samples and the accepted true values of the calibration samples
used to obtain the model parameters in $y = x b_1 + e$ according to:
Equation 2-6
$$RMSEC = \sqrt{\frac{1}{n - m - 1}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
Because estimating the model parameters uses $m + 1$ degrees of freedom, the
remaining $n - m - 1$ degrees of freedom are used to estimate RMSEC. Typically
RMSEC generates overly optimistic values. This is a result of the internal error
estimation: the calibration samples themselves are used to calculate the error; therefore
measurement noise is also modelled in the estimated parameters [218, 224].
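The RMSEC calculation can be sketched in Python as below, taking n − m − 1 as the degrees of freedom that remain once m + 1 have been used for the model parameters. The function name `rmsec`, the interpretation of m (supplied by the caller, for example the number of model variables) and the example values are assumptions for illustration.

```python
from math import sqrt


def rmsec(y_true, y_fitted, m):
    """Root mean square error of calibration with n - m - 1 degrees of freedom."""
    n = len(y_true)
    dof = n - m - 1
    if dof <= 0:
        raise ValueError("not enough calibration samples for the model size")
    sse = sum((yt - yf) ** 2 for yt, yf in zip(y_true, y_fitted))
    return sqrt(sse / dof)


# Hypothetical calibration samples and the values fitted by a one-variable model.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_fitted = [1.1, 1.9, 3.1, 3.9, 5.1]

error = rmsec(y_true, y_fitted, m=1)
print(round(error, 4))
```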
2.4.3 Root Mean Square Error of Cross Validation (RMSECV)
Cross validation can be used to estimate the predictive ability of a calibration model.
It is based on systematic re-sampling of all data present for estimating optimal model
choice and associated error:
Equation 2-7
$$RMSECV_{LOO} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i^{LOO} - y_i\right)^2}$$

The leave one out cross validation (LOOCV) and Monte Carlo cross validation
(MCCV) methods are two different strategies used for error estimation. LOOCV is
performed by generating n calibration models, where each of the n samples is left out
one at a time. In each case the omitted sample is predicted by the model built from the
remaining samples. The squared prediction errors are averaged (Equation 2-7), giving the
estimate of the prediction ability. A pitfall
of LOOCV is the internal nature of the prediction, leading to an overly optimistic
outcome. The LOOCV approach is nevertheless often necessary when only small numbers
of calibration samples are available. When multiple samples are removed at a time,
the resulting validation is more accurate. With MCCV, the sample set is randomly
split many times into training (calibration) and validation sets. For each split,
validation is performed, ultimately giving an averaged MCCV value from a large
number of random splits [8, 218, 224].
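The two resampling strategies can be compared with a minimal Python sketch. It uses a one-variable, no-intercept calibration model as a stand-in for whatever model is being validated; the function names, the 30% validation fraction, the number of splits, the fixed random seed and the data are all assumptions made for illustration.

```python
import random
from math import sqrt


def fit_slope(x, y):
    """Least-squares slope for y = b1 * x (stand-in calibration model)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def rmsecv_loo(x, y):
    """Leave-one-out cross validation: n models, each predicting the omitted sample."""
    n = len(x)
    squared_errors = []
    for i in range(n):
        b1 = fit_slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        squared_errors.append((b1 * x[i] - y[i]) ** 2)
    return sqrt(sum(squared_errors) / n)


def rmsecv_mc(x, y, n_splits=200, test_fraction=0.3, seed=0):
    """Monte Carlo cross validation: average error over many random splits."""
    rng = random.Random(seed)
    n = len(x)
    n_test = max(1, int(round(test_fraction * n)))
    split_errors = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        b1 = fit_slope([x[k] for k in train], [y[k] for k in train])
        sse = sum((b1 * x[k] - y[k]) ** 2 for k in test)
        split_errors.append(sqrt(sse / n_test))
    return sum(split_errors) / n_splits


# Hypothetical calibration data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

loo_error = rmsecv_loo(x, y)
mc_error = rmsecv_mc(x, y)
print(round(loo_error, 4), round(mc_error, 4))
```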
2.4.4 Root Mean Square Error of Prediction (RMSEP)
Validation is performed by either internal or external sampling. Internal validation test
sets are prepared by setting aside some of the available samples and using them to
estimate the model performance. Cross validation is another form of internal
validation where systematic resampling of the available data is performed to test the
model. External validation is carried out using a second independent test set that has a
similar range to the current calibration set. An external validation set provides an
independent assessment of the predictive power since the data used for validation is
different than the one used to build the calibration model [225].

The RMSEP is also known as root mean square error of validation (RMSEV). It is a
measure of the uncertainty that can be expected in future predictions. With RMSEP, a
set of validation samples (test set) are prepared and measured independently from the
calibration samples. The number of validation samples p should be sufficiently large
so that the estimated prediction error accurately reflects all sources of variability
within the calibration method [218, 226]. The RMSEP is computed by:
Equation 2-8
$$RMSEP = \sqrt{\frac{1}{p}\sum_{i=1}^{p}(y_i - \hat{y}_i)^2}$$

The RMSEP judges the prediction ability of the model and indicates if the number of
latent variables used is correct. The RMSEP has the same units as the validation
samples [110].
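Equation 2-8 translates directly into code. In the Python sketch below the divisor is simply p, the number of independent validation samples, because none of them were used to estimate the model. The function name `rmsep` and the validation values are hypothetical.

```python
from math import sqrt


def rmsep(y_reference, y_predicted):
    """Root mean square error of prediction over p independent validation samples."""
    p = len(y_reference)
    sse = sum((yr - yp) ** 2 for yr, yp in zip(y_reference, y_predicted))
    return sqrt(sse / p)


# Hypothetical external validation set: reference values and model predictions.
y_reference = [2.0, 4.0, 6.0]
y_predicted = [2.5, 3.5, 6.0]

prediction_error = rmsep(y_reference, y_predicted)
print(round(prediction_error, 4))
```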
2.5 Multivariate Analysis
Multivariate analysis methods like PCA and PLS have proven useful in
biopharmaceutical analysis as they handle multidimensional data and variation (such
as experimental error and noise) caused by the changing environment [78, 227]. PCA
can be used to characterise raw materials and analyse the variability within samples,
batches and process variables, while PLS can be used to investigate the correlation
between spectral data and quantitative properties such as product yield [228]. Using
multiple analysis methods allows for a more detailed exploration of the process to be
carried out [229].
2.5.1 Variance Analysis
2.5.1.1 Principal Component Analysis
Principal component analysis (PCA) is a statistical technique that linearly transforms
an original set of variables into a substantially smaller set of uncorrelated variables
that represent the information in the original variables. Its goal is to reduce the
dimensionality of the original dataset, making it easier to understand [230]. One of the
main reasons for the use of PCA resides in the enormous amount of data that is
generated by modern techniques. For example, a typical Raman spectrum contains
500 to 2000 data points [218, 231].

The PCA algorithm reduces the number of variables and the information is projected
onto a smaller number of significant variables, the so-called principal components
(PCs). The principal components are linear combinations of the original variables and
are selected so that the first principal component covers as much of the variation in
the data as possible. The second principal component is orthogonal to the first and
covers as much of the remaining variation as possible and so forth [206].
The mathematical model for the PCA method is as follows:
Equation 2-9
$$X = T_A P_A^{t} + \varepsilon$$
Where T is an N by A matrix containing the scores of the PCs and P is an M-by-A
matrix containing the loadings of the PCs and the ɛ matrix contains unexplained
variance. The scores are the intensities of each of the new compressed variables for all
of the samples and contain information on how the samples relate to each other. The
loadings are the distributions of the new variables in terms of the original variables
and include information on how variables relate to one another.

Figure 29 PCA plot in which the blue circles represent the sample scores after PCA
analysis. The major axis of the ellipse represents the first principal component, PC1, and its
minor axis the second principal component, PC2.

Orthogonality can be described as two vectors being completely uncorrelated with
one another. The scores are orthogonal to each other, for example the scores of PC1
are unrelated to the scores of PC2. A consequence of orthogonality of the principal
components is that the issue of correlation between X variables is completely
eliminated if one chooses to use principal components instead of original X variables.
If the number of principal components examined is the same as the number of original
X variables, then 100% of the variance in the data is explained. Data compression
occurs when the user chooses a number of principal components that is much lower
than the number of original variables. This necessarily involves ignoring a small
fraction of variation in the original X data [218, 231-233].
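
The decomposition into scores, loadings and residuals can be illustrated with a short sketch. It assumes mean-centred data and uses a singular value decomposition, which is one common way of computing PCA and not necessarily the routine used in this work. The function name `pca_svd` and the random test matrix are hypothetical.

```python
import numpy as np

def pca_svd(X, n_components):
    """Mean-centre X, then return scores T (N x A), loadings P (M x A),
    residuals E and the fraction of variance explained by each PC."""
    Xc = X - X.mean(axis=0)                        # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]     # scores
    P = Vt[:n_components].T                        # loadings
    E = Xc - T @ P.T                               # variation left unexplained
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P, E, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))      # synthetic data: 20 "spectra", 50 variables
T, P, E, explained = pca_svd(X, n_components=3)

# Scores of different PCs are orthogonal: off-diagonal entries are ~0
print(np.round(T.T @ T, 6))
print(explained)
```

Keeping all components reproduces the centred data exactly, whereas truncating to a few components leaves the discarded variation in E.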
2.5.1.2 Robust Principal Component Analysis (ROBPCA)
The goal of robust PCA methods is to obtain principal components that are not
influenced by outliers. The ROBPCA method, developed by Hubert et al. in 2005, combines ideas of projection pursuit
(PP) and robust covariance estimation [234].

For high dimensional data, where the number of variables is greater than the number
of samples, ROBPCA proceeds as follows. The X data are pre-processed by reducing
the data space to a linearly transformed subspace using singular value decomposition.
The dimension of this subspace is at most N-1, where N is the number of samples. A
measure of “outlyingness” is then computed for each data point in the new
data space by projecting the high dimensional data points onto many univariate directions. For
every direction a robust centre and scale are computed for the projected data points.
Each data point is scored by its largest standardised distance from the robust centre over all directions, its “outlyingness”:
Equation 2-10
$$outl(X_i) = \max_{v} \frac{\left| X_i^{t}v - m_{MCD}(X_j^{t}v) \right|}{s_{MCD}(X_j^{t}v)}$$

Where $X_i^{t}v$ is the projection of data point $X_i$ onto the direction $v$, the
location $m_{MCD}$ and scale $s_{MCD}$ are the univariate minimum covariance determinant
(MCD) estimators computed from the projections of all data points $X_j$, and the maximum is taken over the univariate directions $v$. Using the data points
with the smallest “outlyingness” to form a covariance matrix, the final number of
principal components K is determined. The data points are projected onto a K
dimensional subspace, of which the centre and shape are determined by means of a
reweighted MCD estimate. From this the robust principal components are known and
the robust centre is the MCD location estimate [234-236].
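
The projection step behind the “outlyingness” measure can be sketched as follows. This is a simplified illustration and not ROBPCA itself: the univariate MCD estimators are replaced by the median and the scaled median absolute deviation (an assumption made to keep the example short), and directions are drawn through random pairs of data points. The function name `outlyingness` and the synthetic data are hypothetical.

```python
import numpy as np

def outlyingness(X, n_directions=250, seed=0):
    """Largest robustly standardised distance of each point over many directions.
    Median/MAD stand in for the univariate MCD location and scale."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    out = np.zeros(n)
    for _ in range(n_directions):
        i, j = rng.choice(n, size=2, replace=False)
        v = X[i] - X[j]                       # direction through two data points
        norm = np.linalg.norm(v)
        if norm == 0:
            continue
        proj = X @ (v / norm)                 # univariate projection of all points
        centre = np.median(proj)
        scale = 1.4826 * np.median(np.abs(proj - centre))
        if scale == 0:
            continue
        out = np.maximum(out, np.abs(proj - centre) / scale)
    return out

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
X[0] += 20.0                                  # one deliberately shifted sample
scores = outlyingness(X)
print(int(np.argmax(scores)))                 # index of the most outlying sample
```

In the full method, the points with the smallest values would then be used to build the robust covariance matrix.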
2.5.2 Regression
2.5.2.1 Partial Least Squares Regression
Partial Least Squares regression (PLS) was developed in the 1960s by H. Wold [231].
It has become a highly utilised regression method in the chemometric toolbox. PLS is
a standard method for building multivariate regression models to predict different
parameters from complex samples. The reasons for the success of PLS are its
applicability to various types of data, its ability to handle non-linear data and the
development of software which has aided the interpretation and visualisation of
the PLS results. There are many different PLS algorithms, including
PLS1, PLS2, Moving Window Partial Least Squares (MWPLS) and Unfolded PLS
(U-PLS). In PLS1, a separate calibration model is built for each column in the Y data.
In PLS2, a single calibration model is constructed for all columns of the Y data
simultaneously. MWPLS and U-PLS are explained later in the text [237,
238].

PLS regression uses the same mathematical model for the compression of the X and Y
data. The data matrix X is decomposed into a matrix of scores T and loadings P, and
the response matrix Y is likewise split into a matrix of scores U and loadings Q:
Equation 2-11
$$X = TP^{T} + E$$
Equation 2-12
$$Y = UQ^{T} + F$$
The goal of PLS regression is to model the variables within X and Y such that
the error in the X block, E, and the error in the Y block, F, are minimised. A least squares
regression is performed between U and T. An internal correlation is built that relates
the scores of the X block to the scores of the Y block in terms of the maximum
covariance between X and Y:
Equation 2-13
$$U = TW$$
This is followed by the overall regression step where the decomposition of X is used
to predict y.
Equation 2-14
$$\hat{y} = X\hat{b}$$
The regression coefficients are given by
Equation 2-15
$$\hat{b} = P(P^{T}P)^{-1}WQ^{T}$$
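
For a single response (PLS1), the score, loading and regression steps can be sketched with the NIPALS algorithm. This is an illustrative implementation under stated assumptions, not the software used in this work. Note that in the code `weights` holds the X-block weight vectors, and the regression vector is assembled with the expression commonly used for NIPALS PLS1, b = weights (loadings^T weights)^-1 q, which is written differently from Equation 2-15. The function name `pls1_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; return the regression vector b and intercept b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean               # mean-centred working copies
    n_vars = X.shape[1]
    weights = np.zeros((n_vars, n_components))    # X weights, one column per LV
    loadings = np.zeros((n_vars, n_components))   # X loadings
    q = np.zeros(n_components)                    # y loadings
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)                    # direction of maximum covariance with y
        t = Xk @ w                                # X scores for this latent variable
        tt = t @ t
        p = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p)                  # deflate X
        yk = yk - q[a] * t                        # deflate y
        weights[:, a], loadings[:, a] = w, p
    b = weights @ np.linalg.solve(loadings.T @ weights, q)
    b0 = y_mean - x_mean @ b
    return b, b0

rng = np.random.default_rng(0)
scores = rng.normal(size=(20, 2))                 # two hidden factors
X = scores @ rng.normal(size=(2, 10))             # 20 synthetic "spectra", 10 variables
y = scores @ np.array([1.5, -0.7]) + 4.0          # response driven by the same factors
b, b0 = pls1_fit(X, y, n_components=2)
y_hat = X @ b + b0
print(np.max(np.abs(y_hat - y)))
```

The synthetic data are noise-free and contain exactly two underlying factors, so two latent variables are enough here. With real spectra the number of latent variables would be chosen by validation, for example from the RMSEP.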
2.5.2.2 Unfolded Partial Least Squares (U-PLS)
Unfolded Partial Least Squares (U-PLS) involves the application of PLS to multi-way data
which have been unfolded into a two-way structure. U-PLS is useful in fluorescence
spectroscopy, where EEM and TSFS are forms of 3D data (KxIxJ) [239, 240].

Figure 30 The unfolding scheme for multi-dimensional array into KxIJ slices, adapted from [241]
with permission to reproduce.

Before performing PLS regression, it is possible to unfold the 3D data into a 2D
matrix. Unfolding is the conversion of the three-way data matrix into a stack of two-way
data where simpler mathematical models can be applied. During the unfolding
process, one of the directions remains unchanged while the other two are arranged
slice by slice to give a row vector. A cube (KxIxJ) can be unfolded in three different
directions: row wise (KxIJ), column wise (IxKJ), and tube wise (JxKI). After
unfolding an EEM matrix, the 2D matrix will have the following dimensions Kx(I*J)
where K is the number of samples and I is the number of excitation wavelengths and J
the number of emission wavelengths. PLS regression analysis is performed on the
rearranged two-way data [218, 219, 242-244].
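
Row-wise unfolding of a K x I x J array into a K x (I*J) matrix can be sketched with a single reshape. The array sizes below are arbitrary placeholders, and the ordering of the I and J variables within each row follows NumPy's default (the emission index varies fastest), which is an assumption of this sketch.

```python
import numpy as np

K, I, J = 4, 3, 5                      # samples, excitation and emission wavelengths
eem = np.arange(K * I * J, dtype=float).reshape(K, I, J)   # placeholder EEM stack

unfolded = eem.reshape(K, I * J)       # one row vector per sample
restored = unfolded.reshape(K, I, J)   # unfolding is reversible

print(unfolded.shape)
# Element (k, i, j) of the cube sits in column i * J + j of row k
print(unfolded[1, 2 * J + 3] == eem[1, 2, 3])
```

The unfolded matrix can then be passed to any two-way method, such as PLS, with each sample occupying one row.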
2.5.3 Factor Analysis
2.5.3.1 Multivariate Curve Resolution Alternating Least Squares (MCR-ALS)
The goal of curve resolution and factor analysis is to mathematically decompose
sample signals into the underlying profiles of each component. The multivariate curve
resolution method describes the bilinear decomposition of the matrix D. The MCR-ALS
model can be written as
Equation 2-16
$$D = CS^{T} + E$$
It decomposes a bilinear spectral data matrix, 𝐷(I×J), into two matrices; 𝐶(I×K),
which contains the relative concentration profile of each component in different
samples and 𝑆(J×K), which contains the true spectral profile of each component,
where I is the number of samples, J is the number of wavelengths (i.e. the wavelength
range over which the spectra were collected), and K is the number of components or
factors.

Figure 31 Scheme of steps for the resolution process in MCR-ALS method, adapted from [245]
with permission to reproduce.

In MCR, to start the iterative ALS process, an initial estimate of the factors present
in the spectral profiles for each sample is obtained by methods like PCA. With the
initial estimate, the least squares solutions for $C$ and $S^{T}$ can be
computed in an alternating cycle, with each iteration giving a new $C$ or $S^{T}$ matrix.
The calculation of $C$ and $S^{T}$ is repeated until an optimal solution is obtained or
convergence is achieved. Constraints may be imposed on the profiles, for example
non-negativity, where the spectra or concentration values cannot be negative. The
MCR-ALS method works with trilinear and non-trilinear data sets. A trilinear
structure can be set as an optional constraint in the MCR-ALS method for the C
matrix [218, 245-247].
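
The alternating cycle can be sketched in a few lines. This is a simplified illustration: non-negativity is imposed by clipping negative values to zero after each unconstrained least squares step, which is a crude stand-in for a proper non-negative least squares solver, and convergence is judged from the change in the residual sum of squares. The function name `mcr_als`, the tolerance and the synthetic two-component data are assumptions of this sketch.

```python
import numpy as np

def mcr_als(D, S_init, n_iter=200, tol=1e-10):
    """Alternate least squares updates of C (I x K) and S (J x K) for D = C S^T + E."""
    S = S_init.copy()
    previous = np.inf
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S.T), 0, None)     # solve for C given S
        S = np.clip((np.linalg.pinv(C) @ D).T, 0, None)   # solve for S given C
        ssr = np.sum((D - C @ S.T) ** 2)                  # residual sum of squares
        if abs(previous - ssr) < tol:                     # stop when the fit stalls
            break
        previous = ssr
    return C, S, ssr

rng = np.random.default_rng(0)
C_true = rng.random((15, 2))            # 15 samples, 2 components
S_true = rng.random((40, 2))            # 40 wavelengths
D = C_true @ S_true.T                   # noise-free bilinear data
S_guess = S_true + 0.05 * rng.random(S_true.shape)   # perturbed initial estimate
C, S, ssr = mcr_als(D, S_guess)
print(ssr)
```

Here the initial estimate is a perturbed copy of the true spectra. In practice it would come from a method such as PCA, as described above.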

2.5.3.2 Parallel Factor Analysis (PARAFAC)
For the analysis of the EEM data, one can also use PARAFAC as a decomposition
method in order to resolve the fluorescence landscape into a number of trilinear
components f, which, in theory, could represent the excitation and emission spectra of
the constituent fluorophores. In PARAFAC, multi-way data are decomposed into sets
of scores and loadings with the same number of factors identified by the model. The
numbers of factors or latent variables are much lower than the number of original
variables making visualisation of the data possible. PARAFAC uses all the original
variables to determine the set of latent variables [200, 248, 249].
The objective of PARAFAC is to build a model that minimises the sum of squares of the
residuals $r_{ijk}$:
Equation 2-17
$$X_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} + r_{ijk}$$
Where $X_{ijk}$ is an element of the three-way data and i, j, and k are the indices of this
element on the sample, excitation and emission modes. The fluorescence landscape is
decomposed into sample scores, $a_{if}$, excitation loadings, $b_{jf}$, and emission loadings,
$c_{kf}$, for each factor f. The residual $r_{ijk}$ contains the variation not captured by the
PARAFAC model [248, 250-252].
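
The trilinear model of Equation 2-17 can be written as a single tensor contraction. The sketch below only rebuilds a three-way array from given scores and loadings and computes the residuals; it does not fit a PARAFAC model. The array sizes, the random loadings and the noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_ex, n_em, F = 5, 6, 7, 2

A = rng.random((n_samples, F))   # sample scores a_if
B = rng.random((n_ex, F))        # excitation loadings b_jf
C = rng.random((n_em, F))        # emission loadings c_kf

# Sum over factors f of a_if * b_jf * c_kf for every (i, j, k)
X_model = np.einsum('if,jf,kf->ijk', A, B, C)

X = X_model + 0.01 * rng.normal(size=X_model.shape)   # "measured" data with noise
R = X - X_model                                       # residuals r_ijk
print(X_model.shape, np.sum(R ** 2))
```

A fitting routine would adjust A, B and C to make the sum of squared residuals as small as possible.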

Figure 32 Graphical representation of a two component PARAFAC model. A three-way array
$X_{ijk}$ is expressed as the sum of two trilinear components and a three-way residual array $r_{ijk}$.
Reproduced with permission from [250].

The core consistency gauge is a method for finding the correct number of components
to use in PARAFAC modelling [253]. In an ideal PARAFAC model, the core array
has ones on the super-diagonal, and zeros elsewhere, implying that no interactions
occur between the components from different modes for the PARAFAC model. A
core array is calculated from the loadings for each component in the model and
compared with the ideal PARAFAC core array of zeros and ones. The optimum
model is obtained when the chosen number of components gives an
acceptable core array. Core consistency can be increased by lowering the number of
components [254]. The core consistency is the relative sum of squared differences
between the core array and the ideal super-diagonal array of ones. Core consistency
provides a quantitative measure of how well the loadings represent variation within
the data. It is generally expressed as a percentage; if the percentage is close to 100,
the model gives an appropriate description of the data. A low core
consistency percentage indicates that the model is not describing the data [255, 256].
2.6 Data Pre-Processing
The purpose of pre-processing is to remove or minimise unwanted variation which is
not related to the analyte of interest. Spectral variations like light scattering, baseline
offset and suppressed analyte signal can be corrected by pre-processing. The correct
selection and implementation of data pre-processing can result in more accurate and
robust chemometric models. Listed below are some of the most commonly
implemented methods and the reasons why they might be used [220, 232, 257, 258].
2.6.1 Mean Centring
Mean centring focuses on the variation in responses by removing the absolute
intensity information (mean response) from each variable. Mean centring involves
calculating the average response for each variable in a dataset and then subtracting this
average from each variable. The pre-processed data can be transformed
back into the original data by adding the mean response back to the data [220].
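
A minimal sketch of mean centring and its reversal is given below, with each row treated as one spectrum and each column as one variable. The function names and the three short example rows are hypothetical.

```python
def mean_centre(rows):
    """Subtract the column (variable) mean from every row (sample)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    return centred, means

def restore(centred, means):
    """Add the stored means back to recover the original data."""
    return [[x + m for x, m in zip(row, means)] for row in centred]

spectra = [[1.0, 10.0, 5.0],
           [3.0, 14.0, 5.0],
           [5.0, 12.0, 5.0]]
centred, means = mean_centre(spectra)
print(means)     # mean response of each variable
print(centred)   # variation about the mean; the constant third variable becomes zero
```

The stored means must be kept alongside the model so that new samples can be centred with the calibration means rather than their own.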
2.6.2 Derivatives
Derivatives act as a frequency scaling tool and high pass filter. Derivative pre-
processing minimises lower frequency features such as sloping baseline and retains
the high frequency aspects of the original data like the Raman peaks (Figure 33). A
drawback to derivative pre-processing comes from the frequency response function
used in polynomial smoothing, which can introduce distortions into the data. Also, the
filtering nature removes substantial amounts of signal, producing a lower signal to
noise ratio in the data [218]. The first order derivative effectively removes the
baseline offset variation in the spectral profiles and is useful where the samples
exhibit a baseline shift. The second derivative removes differences in baseline offset
and baseline slope. In the case of a complex spectrum, the use of a second order
derivative can make spectral interpretation more difficult. However, for a low signal
spectrum, the second order derivative enhances the signal [220, 232, 259, 260]. In this
body of work, the Savitzky-Golay derivative algorithm (footnote 20) is used [261]. The S-G
algorithm fits individual polynomials to filter windows around each point across the
spectrum. This continues until it reaches the end of the spectrum. It requires the
selection of the size of the window, the order of the polynomial and the order of
derivative [220, 232, 259, 260].
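
The windowed polynomial fit can be sketched as follows. For clarity the sketch refits a polynomial in every window with `numpy.polyfit` rather than using precomputed convolution coefficients, and it leaves the first and last half-window of points undefined (NaN). Both are simplifying assumptions and not a description of the software used in this work. The function name `savgol_derivative` and its default parameters are hypothetical.

```python
import numpy as np

def savgol_derivative(y, window=7, polyorder=2, deriv=1, delta=1.0):
    """Derivative of y at each point from a local polynomial fit (edges left as NaN)."""
    if window % 2 == 0 or window <= polyorder or deriv > polyorder:
        raise ValueError("need odd window > polyorder >= deriv")
    y = np.asarray(y, dtype=float)
    half = window // 2
    x = np.arange(-half, half + 1) * delta          # window positions centred on 0
    out = np.full(len(y), np.nan)
    for i in range(half, len(y) - half):
        coeffs = np.polyfit(x, y[i - half:i + half + 1], polyorder)
        out[i] = np.polyval(np.polyder(coeffs, deriv), 0.0)   # derivative at the centre
    return out

x_axis = np.arange(20.0)
signal = 3.0 + 2.0 * x_axis                          # sloping line with an offset of 3
first = savgol_derivative(signal, window=7, polyorder=2, deriv=1)
print(first[3:-3])                                   # constant slope; the offset is gone
```

The example shows the baseline offset vanishing in the first derivative, while the slope survives as a constant that a second derivative would also remove.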

Figure 33 (Left) Raman spectra of M1GLU data (this work) and (Right) the first order derivative
following Savitzky-Golay smoothing of M1GLU data (this work).
2.6.3 Multiplicative Scatter Correction (MSC)
MSC aims to eliminate the additive and/or multiplicative effects; this can include
differences in baseline offsets and slope changes and non-linearity in the spectra
[262]. It removes effects caused by sampling variation such as sample thickness,
sample packing, focal depth, sample temperature and possible water evaporation.
Figure 34 shows the before and after spectra where baseline offset has been removed.
The MSC method is performed using a linear regression model of each spectrum
against a reference spectrum. The mean spectrum for the dataset is generally used as
the reference spectrum. The least squares coefficients are calculated and then these
values are used to calculate the MSC corrected spectrum [218, 220, 232, 263].
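
The regression and correction steps of MSC can be sketched as below. Each spectrum is regressed on the reference spectrum, and the fitted intercept and slope are then removed. The function name `msc` and the synthetic spectra are hypothetical.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction against the mean (or a supplied) spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        slope, intercept = np.polyfit(ref, x, 1)   # least squares fit: x = intercept + slope * ref
        corrected[i] = (x - intercept) / slope     # remove additive and multiplicative effects
    return corrected, ref

base = np.array([0.2, 0.5, 1.0, 0.7, 0.3, 0.1])    # synthetic underlying spectrum
raw = np.vstack([1.0 * base + 0.0,
                 1.5 * base + 2.0,                 # scaled and offset copies
                 0.7 * base - 1.0])
corrected, ref = msc(raw)
print(np.round(corrected, 6))
```

Because the synthetic spectra differ only by an offset and a scale factor, all three collapse onto the reference after correction. Real spectra retain their chemical differences.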

20 There are alternative derivative methods such as finite differences and Norris-Williams (NW)
derivation. The former is sensitive to high frequency noise while NW is less applicable to
spectroscopic data than Savitzky-Golay.
[Figure 33 plots: Intensity versus Wavenumber (cm-1); left panel raw spectra, right panel first order derivative]

Figure 34 (A) Un-processed NIR spectra and (B) after multiplicative scatter correction of wheat
samples [264].
2.6.4 Normalisation
Normalisation methods attempt to overcome changes in the data caused by fluctuations in
absolute intensity due to instrument or measurement factors. For example, light source variation
may be overcome by identifying a feature present in the sample/spectra that should be
constant from one sample to the next and scaling the variables based on
this characteristic. A good normalisation method minimises the variance between
spectra and maximises the signal for classification/discrimination purposes. Full
spectrum normalisation captures the general characteristics of the data, for example by scaling
to the area under the curve. On the other hand, local normalisation methods polarise
the spectra, for example by scaling to a known peak, which may be useful with varying noise
levels [265-267].

Below are the equations for the normalisation methods used in this study [218, 232].
In each case, $X_{i,norm}$ is the normalised spectrum; $X_i$ is the spectrum of the ith sample;
$X_i^{*}$ is the vector of observed values for the given normalisation; j is the variable
number and n is the total number of variables.
In area normalisation (Norm1), each spectrum is scaled so that the area under the curve
is equal to one. This is achieved by dividing each variable by the sum of the absolute
values of all variables for the given sample.
Equation 2-18
$$X_{i,norm1} = \frac{X_i}{\sum_{j=1}^{n} \left| X_i^{*} \right|}$$

With Norm2 normalisation, each variable is divided by the sum of the squared values
of all variables for the given sample. Norm2 returns a vector of unit length (length
equal to one). It is a form of weighted normalisation where larger values are weighted
more heavily in the scaling.
Equation 2-19
$$X_{i,norm2} = \frac{X_i}{\sum_{j=1}^{n} X_i^{*2}}$$

For the infinity normalisation mode (maximum norm, NormINF), each variable is
divided by the highest peak observed across all variables of a given sample, giving a
vector scaled to a maximum value equal to one. All variables are therefore weighted
against the largest value.
Equation 2-20
$$X_{i,norminf} = \frac{X_i}{\max(X_i^{*})}$$
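
The three normalisation modes can be sketched for a single spectrum as follows. One point needs care: Equation 2-19 as printed divides by the sum of squares, whereas a vector of unit length is obtained by dividing by the square root of that sum. Both variants are shown, and which one was applied in this work is not assumed here. All function names and the three-point example are hypothetical.

```python
import math

def norm1(x):
    """Area normalisation: divide by the sum of absolute values."""
    total = sum(abs(v) for v in x)
    return [v / total for v in x]

def norm2_as_printed(x):
    """Divide by the sum of squared values, as Equation 2-19 is written."""
    total = sum(v * v for v in x)
    return [v / total for v in x]

def norm2_unit_length(x):
    """Divide by the Euclidean length, which gives a vector of length one."""
    length = math.sqrt(sum(v * v for v in x))
    return [v / length for v in x]

def norm_inf(x):
    """Maximum normalisation: divide by the largest value."""
    peak = max(x)
    return [v / peak for v in x]

spectrum = [3.0, 4.0, 0.0]
print(norm1(spectrum))              # absolute values sum to one
print(norm2_as_printed(spectrum))   # divided by 25
print(norm2_unit_length(spectrum))  # divided by 5, so the length is one
print(norm_inf(spectrum))           # largest value becomes one
```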

Figure 35 Illustration of different methods of normalisation, (a) Untreated Raman spectra of
M1GLU data (b) Norm 1 Raman Spectra, (c) Norm 2 Raman Spectra and (d) Norm INF Raman
Spectra.
Water is the main component of all samples studied in this work. The water bands can
act as an internal standard, as vibrational OH bands are evident above 3000 cm–1 and
at 1640 cm–1, and these non-overlapping bands can be used as internal references. By
normalising to the OH bending vibration we reduce the impact of the absolute
intensity variation of the water signal. Water is present in such excess that the variance
observed in its signal is large and arises from measurement error, but because of that excess we
can assume that in reality it is an invariant signal. In other words, as the
media composition changes, the water signal should remain unchanged and is thus suitable as an
internal standard. The ratio of the analyte band to the water band therefore gives a good
estimate of the analyte signal.
2.7 Variable/Wavelength Selection
Another way to improve the precision, accuracy and robustness of a calibration model
is variable selection. Selecting the right variables removes irrelevant data (footnote 21)
whilst retaining the more informative areas of the spectra. Wavelength selection
involves choosing subsets for modelling that give a lower error level and better stability,
and removing minor data, which permits easier interpretation of the relationship
between the samples and the model.
Variable selection methods can be organised into three main classes: filter, wrapper
and embedded.
• The filter method involves two steps. A PLS model is fitted to the data. The
threshold is determined from the response with respect to the variable
identified by the fitted PLS model. Variable selection is then carried out based
on the threshold. Examples include the loading weights method or variable
importance in projection (VIP) method [268].
• The wrapper method acts in an iterative way where each search extracts a
subset, and for each subset the model performance is measured. Selection
variables are sorted based on filter constraints like selected frequencies. There
are numerous wrapper methods including genetic algorithm PLS, backward
variable elimination PLS and interval PLS [269].
• Embedded methods combine variable selection and modelling in a single step.
The variable selection is performed within the PLS algorithm. Embedded
methods include interactive variable selection, soft threshold PLS, sparse PLS
and power PLS [270].

21 Full spectrum calibration models can be misled by contrived correlations from system drift or
co-variation amongst constituents, resulting in over-fitting of the model.

2.7.1 Moving Window Partial Least Squares (MWPLS)
Moving Window Partial Least Squares (MWPLS) is a wrapper variable selection
method and was the primary variable selection method used in this body of work. The
objective of MWPLS is to find the informative spectral regions within complex
spectra. Informative regions hold the most relevant information for the PLS analysis
to yield better performing models. MWPLS uses a fixed-size
window to build a series of PLS models across the whole spectral region. The
informative regions are assigned by examining the complexity and the error
level of the PLS models. For each PLS model built along the spectrum, the sum of
squared residuals (SSR) is calculated. In Figure 36, the SSR versus window position
plot displays the residue lines, which are useful in identifying informative regions.
Residue lines show downward-facing bands, which correspond to a particular
wavenumber range. The informative ranges have a low SSR value compared to the
insignificant ranges [271-273].
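
The moving window idea can be sketched with a deliberately reduced model. Each window is fitted with a one-component PLS1 model and only a single residue line is produced, whereas MWPLS as described builds models of varying complexity. The function names, the window width and the synthetic data (informative channels placed at indices 20 to 29) are assumptions made for illustration.

```python
import numpy as np

def one_component_pls_ssr(X, y):
    """Sum of squared y residuals after a single PLS1 latent variable."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = Xc.T @ yc
    norm = np.linalg.norm(w)
    if norm == 0:                       # window carries no covariance with y
        return float(yc @ yc)
    w /= norm
    t = Xc @ w                          # scores of the window
    q = (t @ yc) / (t @ t)
    r = yc - q * t                      # y residuals
    return float(r @ r)

def moving_window_ssr(X, y, window):
    """SSR of a PLS model built in each window position along the spectrum."""
    n_vars = X.shape[1]
    return np.array([one_component_pls_ssr(X[:, s:s + window], y)
                     for s in range(n_vars - window + 1)])

rng = np.random.default_rng(0)
n_samples, n_vars, window = 30, 60, 5
y = rng.uniform(0.0, 10.0, size=n_samples)
X = rng.normal(size=(n_samples, n_vars))                 # uninformative channels
X[:, 20:30] = np.outer(y, np.ones(10)) + 0.01 * rng.normal(size=(n_samples, 10))

ssr = moving_window_ssr(X, y, window)
best_start = int(np.argmin(ssr))
print(best_start, ssr[best_start])
```

Plotting `ssr` against the window position gives one residue line, and the dip in that line marks the informative region.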

Figure 36 MWPLS Residue lines obtained from Raman spectra of M1GLU data collected in this
work for the calibration samples. Each colour represents a different residue line.
2.8 Outliers
An outlier is an observation which does not obey the pattern of the majority of the
data. There are two types of outliers: firstly odd measurements, where one of the
replicate measurements is different, possibly due to measurement error; and secondly
odd samples, where samples may be compositionally different to each other. Outliers
should be flagged and discarded as they will have unfavourable repercussions in
further analysis. Outlier analysis techniques have the ability to detect the artefacts
(like spectral distortions and deviation from offset) that cannot be seen in the spectra.
If a model describes the variance expected from the calibration, test and unknown
samples then one should not encounter outliers. When samples do not fit the model it
is an indicator that either,
a) the sample is different, or
b) the model does not adequately describe the variance space.
In addition to computing principal components, both PCA and ROBPCA also flag
outliers. Outliers are identified on a scores plot if they fall outside the 95%
confidence limit or are identified as outliers in the corresponding Hotelling’s T2/Q
residual model. There are different categories of outliers which are encountered in a
ROBPCA outlier map (Figure 37): (1) good leverage points, (2) orthogonal outliers,
and (3) bad leverage points. Good leverage points lie close to the PCA subspace but
far from the regular observations. Orthogonal outliers have a large
orthogonal distance (footnote 22) to the PCA subspace, yet their projection lies within the regular data in the PCA
space. Bad leverage points have a large orthogonal distance and their projection on the
PCA space is far from the regular data [234, 235, 274, 275].
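
The two quantities plotted on an outlier map can be sketched as follows. The sketch uses the classical mean and an ordinary SVD, so it illustrates only the definitions of the score and orthogonal distances and is not the robust ROBPCA estimate. The function name `outlier_map_distances` and the synthetic data, with one sample displaced perpendicular to the main plane, are hypothetical.

```python
import numpy as np

def outlier_map_distances(X, k):
    """Score distance (within the k-dimensional PCA subspace) and orthogonal
    distance (to that subspace) for every sample, using classical PCA."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                   # loadings of the first k PCs
    T = Xc @ P                                     # scores
    eigvals = s[:k] ** 2 / (X.shape[0] - 1)        # variance along each PC
    score_dist = np.sqrt(np.sum(T ** 2 / eigvals, axis=1))
    orth_dist = np.linalg.norm(Xc - T @ P.T, axis=1)
    return score_dist, orth_dist

rng = np.random.default_rng(0)
X = 0.05 * rng.normal(size=(40, 6))                # small noise in all variables
X[:, 0] += 5.0 * rng.normal(size=40)               # main variation in two variables
X[:, 1] += 3.0 * rng.normal(size=40)
X[0, 2] += 2.0                                     # sample 0 displaced off the plane

sd, od = outlier_map_distances(X, k=2)
print(int(np.argmax(od)), od[0])                   # sample 0 has the largest orthogonal distance
```

Plotting `od` against `sd` and adding cut-off lines on both axes would give the layout of the outlier map.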

Figure 37: An outlier map plots the orthogonal distance versus the score distance, with
orthogonal outliers and good and bad leverage points.

22 The orthogonal distance is the distance between an observation and its projection in the
k-dimensional subspace.

Materials and Methods
2.9 Materials
2.9.1 Sample Materials
D-glucose (ACS Reagent), L-glutamine (>99%) and D-galactose (min. 98%) were
purchased from Sigma Aldrich and used as received. Isopropanol (Laboratory Grade
Reagent) and nitric acid (Laboratory Grade Reagent, SG 1.42, 70%) were purchased
from Fisher Scientific. All media samples were prepared with purified Millipore water
(18 MΩ resistance). Yeastolate Difco TC UF was purchased from Becton Dickinson
and eRDF basal medium was purchased from Kyokuto, Tokyo. In order to minimise
variation for the complex media components, only one batch of each was purchased
and used over the course of this project.
2.9.2 Colloid Materials
Nitric acid (Laboratory Grade Reagent, SG 1.42, 70%) and hydrochloric acid
(Laboratory Grade Reagent) were purchased from Fisher Scientific. Silver nitrate
(99.99% trace metals basis); sodium citrate (ACS reagent, ≥99.0%) and sodium
bicarbonate (ACS reagent, 99.7-100.3%) were purchased from Sigma Aldrich.
2.10 Workflow Description
Model media based on a typical formulation of feed medium used in industry were
prepared to provide a platform for testing spectroscopic methods without compromising
intellectual property. The goal was to develop successful analytical methods
that could be adapted for a formulation assay, where the medium was tested for the
correct component levels in the ready to use (liquid) state.

The conventional Raman study covered the development of the analysis method from
the initial data collection setup to data handling and chemometric analysis. It involved
the changing of the sample composition and the progression from single analyte
component samples to five component model medium samples. The chemometric
methods were applied to both the simple and complex samples and, with increasing
complexity, the degree of accuracy from the model performance was studied. The
Raman study was undertaken firstly to develop a method for measuring D-glucose
concentration in the M5Glu media. Two more sample sets, M5eRDF and M5Ye (see
below), were model media with very complex compositions (amino acids,
carbohydrates, minerals, etc.). For these media the goal was to quantify the amount of
multi-component ingredients (e.g. eRDF and yeastolate) added to each media blend.
For the complex ingredient quantification, more sensitive methods like SERS and
EEM were investigated. Qualitative and quantitative models were built for the eRDF
and yeastolate samples using Raman, SERS, EEM and TSFS.
2.11 Sample Preparation and Handling
All samples were prepared under sterile conditions in a Laminar Flow Hood (LFH) to
prevent microbial contamination. The LFH was wiped down with 70% isopropanol
before and after use to maintain a sterile environment for sample preparation. The
glassware used to prepare the media was carefully cleaned using the following
procedure: 1) wash thoroughly with a soap solution 2) rinse with distilled water 3)
soak overnight in a 10% nitric acid solution and 4) final thorough rinse with Millipore
water. Once prepared the samples were divided into sterile vials for storage (1 mL and
15 mL aliquots). All samples were stored at –70 °C to ensure sample integrity. For
testing, the samples were removed from storage at –70 °C in batches. The sample
vials were transferred to a refrigerator set at 2–8 °C where they were allowed to
defrost overnight. Prior to sampling, the samples were homogenised using a test tube
shaker and the outside of the sample vial was wiped down with 70% isopropanol
before being transferred to the LFH. Shaking of the samples created bubbles;
therefore the samples were allowed to stand until all bubbles had dissipated. After
sampling was complete, the sample vial was returned to the freezer at –70 °C for long
term storage.
2.12 Datasets
2.12.1 Model Media Samples:
The datasets created provide a model media system for analysis and allow for changes
in media to be characterised. Five ingredients were used for the model media: D-glucose,
D-galactose, L-glutamine, yeastolate and eRDF. D-glucose is a
monosaccharide sugar that is the main carbon source for cells [276]. D-galactose is a
monosaccharide sugar that works as a transcription promoter as well as a carbon
source [277]. L-glutamine is an amino acid that functions as an intermediate in energy
metabolism, acts as an acid-base balance regulator and is used in the detoxification of
ammonia [278, 279]. TC yeastolate products are animal-free, water-soluble
portions of autolyzed Saccharomyces cerevisiae. TC yeastolate contains peptides,
amino acids, vitamins, and simple and complex carbohydrates, and is used as a nutritional
supplement in cell culturing [280, 281]. eRDF is a basal medium used in cell culture; it
is a complex mixture of inorganic salts, amino acids, vitamins and other components
that promote cell growth [55, 57, 58].
2.12.2 M1Glu Media Dataset
A simple single analyte (D-glucose) in an aqueous solution was used as a starting
point for uncovering the important experimental parameters. Thirty-one different
solutions of varying concentration of D-glucose in the range 1.6 g/L to 49.6 g/L were
prepared by dissolving a known weight of D-glucose in 10 mL Millipore water. A
detailed description of the composition of each sample is tabulated below (Table 2).

Table 2 Composition of the M1Glu samples.
Sample    D-glucose (g/L)    Sample    D-glucose (g/L)    Sample    D-glucose (g/L)
M1Glu01   1.6                M1Glu11   17.6               M1Glu21   33.6
M1Glu02   3.2                M1Glu12   19.2               M1Glu22   35.2
M1Glu03   4.8                M1Glu13   20.8               M1Glu23   36.8
M1Glu04   6.4                M1Glu14   22.4               M1Glu24   38.4
M1Glu05   8.0                M1Glu15   24.0               M1Glu25   40.0
M1Glu06   9.6                M1Glu16   25.6               M1Glu26   41.6
M1Glu07   11.2               M1Glu17   27.2               M1Glu27   43.2
M1Glu08   12.8               M1Glu18   28.8               M1Glu28   44.8
M1Glu09   14.4               M1Glu19   30.4               M1Glu29   46.4
M1Glu10   16.0               M1Glu20   32.0               M1Glu30   48.0
                                                          M1Glu31   49.6

2.12.3 M3Glu Media Dataset
A 3-component model system (D-glucose, L-glutamine and D-galactose) was
generated to determine the potential for quantifying closely related analytes in dilute
aqueous solution. D-glucose concentration was varied throughout the sample set and
L-glutamine and D-galactose concentrations were kept at either a high or low level as
a background influence (Table 3). Chemometric modelling was performed on the
Raman data collected to predict the D-glucose concentration within these samples as it
was giving the strongest signal of the media components used.
Table 3 Composition of the M3Glu model media solutions.
# Sample D-glucose (g/L) D-galactose (g/L) L-glutamine (g/L)
M3Glu01 0.00 1.3 0.44
M3Glu02 0.32 3.7 1.16
M3Glu03 0.64 1.3 0.44
M3Glu04 0.96 3.7 1.16
M3Glu05 1.28 1.3 0.44
M3Glu06 1.60 3.7 1.16
M3Glu07 1.92 1.3 0.44
M3Glu08 2.24 3.7 1.16
M3Glu09 2.56 1.3 0.44
M3Glu10 2.88 3.7 1.16
M3Glu11 3.20 1.3 0.44
M3Glu12 3.52 3.7 1.16
M3Glu13 3.84 1.3 0.44
M3Glu14 4.16 3.7 1.16
M3Glu15 4.48 1.3 0.44
M3Glu16 4.80 3.7 1.16
M3Glu17 5.12 1.3 0.44
M3Glu18 5.44 3.7 1.16
M3Glu19 5.76 1.3 0.44
M3Glu20 6.08 3.7 1.16
M3Glu21 6.40 1.3 0.44
M3Glu22 6.72 3.7 1.16
M3Glu23 7.04 1.3 0.44
M3Glu24 7.36 3.7 1.16
M3Glu25 7.68 1.3 0.44
M3Glu26 8.00 3.7 1.16
M3Glu27 8.32 1.3 0.44
M3Glu28 8.64 3.7 1.16
M3Glu29 8.96 1.3 0.44
M3Glu30 9.28 3.7 1.16
M3Glu31 9.60 1.3 0.44
M3Glu32 9.92 3.7 1.16

The thirty-one different solutions of D-glucose prepared in the M1Glu dataset were
used as the stock solutions for D-glucose in the M3Glu dataset. Two stock solutions
were prepared for L-glutamine at 2.2 g/L and 5.8 g/L. Two stock solutions were
prepared for D-galactose at 6.5 g/L and 18.5 g/L. All stock solutions were prepared
with Millipore water. The M3Glu dataset consisted of 32 samples. Sample one was
prepared by pipetting 6 mL of Millipore water, 2 mL of L-glutamine and 2 mL of D-
galactose at the low concentration giving a sample volume of 10 mL. For sample two
to sample thirty two, a 2 mL aliquot of the specified D-glucose solution was pipetted
into a sample vial together with 2 mL of L-glutamine and 2 mL of D-galactose (at
high concentration for the even numbered samples and at low concentration for the
odd numbered samples). The sample volume was made up to 10 mL with 4 mL of
Millipore water.
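
The final concentrations in Table 3 follow from the dilution of each 2 mL stock aliquot to a 10 mL sample volume. The short sketch below reproduces that arithmetic; the function name `diluted_concentration` is hypothetical, and the stock concentrations are those stated in the text.

```python
def diluted_concentration(stock_g_per_l, aliquot_ml, final_ml=10.0):
    """Concentration after diluting an aliquot of stock to the final volume."""
    return stock_g_per_l * aliquot_ml / final_ml

stocks = {
    "L-glutamine (low)": 2.2,
    "L-glutamine (high)": 5.8,
    "D-galactose (low)": 6.5,
    "D-galactose (high)": 18.5,
    "D-glucose (M1Glu31 stock)": 49.6,
}
for name, stock in stocks.items():
    print(f"{name}: {diluted_concentration(stock, 2.0):.2f} g/L")
```

Each 2 mL aliquot is diluted five-fold, so the 2.2 g/L and 5.8 g/L L-glutamine stocks give the 0.44 g/L and 1.16 g/L levels in Table 3, and the highest D-glucose stock gives 9.92 g/L.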
2.12.4 M5Glu Media Dataset
The M5Glu dataset (Table 4) was generated as a model media system based on
media formulations used within the biopharmaceutical industry. The development of a
calibration model for D-glucose quantification in media involved a set of samples
containing fixed concentrations of eRDF, yeastolate, D-galactose and L-glutamine
together with a varying concentration of D-glucose. The thirty-one different
solutions of D-glucose prepared in the M1Glu dataset were used as the stock solutions
for D-glucose in the M5Glu dataset. Stock solutions of 4 g/L of L-glutamine, 12.5 g/L
of D-galactose, 5 g/L of yeastolate and 17 g/L of eRDF were prepared. All stock
solutions were prepared with Millipore water. The M5Glu samples were prepared by
pipetting 2 mL of the specified D-glucose stock solution, 2 mL of L-glutamine, 2 mL
of D-galactose, 2 mL of yeastolate and 2 mL of eRDF to give a 10 mL sample.

Media can also contain complex ingredients that themselves contain glucose. Yeastolate and
eRDF are complex mixtures that have D-glucose as a component; for example, eRDF
has a glucose concentration of 0.019 mg/g. The same issue arises with yeastolate, for
which there is no accurate compositional data available. The amount of glucose
contained in yeastolate and eRDF is so small that it should not have a significant
impact on the model. When quantitative analysis of D-glucose on the M5Glu data was
performed, the concentration of D-glucose in yeastolate and eRDF was not taken into
account.

Table 4 The composition of the M5Glu samples with the additive contribution of D-glucose from
eRDF and yeastolate giving a new range of D-glucose from 0.0 g/L to 9.92 g/L.
Sample No D-glucose (g/L) eRDF (g/L) Yeastolate (g/L) D-galactose (g/L) L-glutamine (g/L)
M5Glu01 0.00 3.4 1 2.5 0.8
M5Glu02 0.32 3.4 1 2.5 0.8
M5Glu03 0.64 3.4 1 2.5 0.8
M5Glu04 0.96 3.4 1 2.5 0.8
M5Glu05 1.28 3.4 1 2.5 0.8
M5Glu06 1.60 3.4 1 2.5 0.8
M5Glu07 1.92 3.4 1 2.5 0.8
M5Glu08 2.24 3.4 1 2.5 0.8
M5Glu09 2.56 3.4 1 2.5 0.8
M5Glu10 2.88 3.4 1 2.5 0.8
M5Glu11 3.20 3.4 1 2.5 0.8
M5Glu12 3.52 3.4 1 2.5 0.8
M5Glu13 3.84 3.4 1 2.5 0.8
M5Glu14 4.16 3.4 1 2.5 0.8
M5Glu15 4.48 3.4 1 2.5 0.8
M5Glu16 4.80 3.4 1 2.5 0.8
M5Glu17 5.12 3.4 1 2.5 0.8
M5Glu18 5.44 3.4 1 2.5 0.8
M5Glu19 5.76 3.4 1 2.5 0.8
M5Glu20 6.08 3.4 1 2.5 0.8
M5Glu21 6.40 3.4 1 2.5 0.8
M5Glu22 6.72 3.4 1 2.5 0.8
M5Glu23 7.04 3.4 1 2.5 0.8
M5Glu24 7.36 3.4 1 2.5 0.8
M5Glu25 7.68 3.4 1 2.5 0.8
M5Glu26 8.00 3.4 1 2.5 0.8
M5Glu27 8.32 3.4 1 2.5 0.8
M5Glu28 8.64 3.4 1 2.5 0.8
M5Glu29 8.96 3.4 1 2.5 0.8
M5Glu30 9.28 3.4 1 2.5 0.8
M5Glu31 9.60 3.4 1 2.5 0.8
M5Glu32 9.92 3.4 1 2.5 0.8

2.12.5 T5 Test Dataset
Ten different stock solutions of D-glucose ranging from 4 g/L to 44.5 g/L were
prepared. Stock solutions with 4 g/L of L-glutamine, 12.5 g/L of D-galactose, 5 g/L of
yeastolate and 17 g/L of eRDF were also prepared. All stock solutions were made
with Millipore water. The T5 samples were assembled by pipetting 2 mL of the
specified D-glucose stock solution, 2 mL of L-glutamine, 2 mL of D-galactose, 2 mL
of yeastolate and 2 mL of eRDF to give a final sample volume of 10 mL.

Table 5 D-glucose sample composition for Raman testing with the amount of D-glucose per
sample changing at a rate of 0.9 g/L while the other components remain at one concentration.
Sample No eRDF (g/L) D-glucose (g/L) L-glutamine (g/L) D-galactose (g/L) Yeastolate (g/L)
T5Glu01 3.4 1.70 0.8 2.5 1
T5Glu02 3.4 2.60 0.8 2.5 1
T5Glu03 3.4 3.50 0.8 2.5 1
T5Glu04 3.4 4.40 0.8 2.5 1
T5Glu05 3.4 5.30 0.8 2.5 1
T5Glu06 3.4 6.20 0.8 2.5 1
T5Glu07 3.4 7.10 0.8 2.5 1
T5Glu08 3.4 8.00 0.8 2.5 1
T5Glu09 3.4 8.90 0.8 2.5 1
T5Glu10 3.4 9.80 0.8 2.5 1

2.13 Complex Media Components Experiments:
Of the five media ingredients, eRDF and yeastolate showed a fluorescence and SERS
response. These components were therefore eligible for testing the efficacy of EEM, TSFS and
SERS for the quantification of complex components as a single unit within media.
2.13.1 eRDF Media Dataset (M5eRDF)
Ten different stock solutions of eRDF were prepared with the concentration ranging
from 5 g/L to 32 g/L. Stock solutions of 31 g/L of D-glucose, 4 g/L of L-glutamine,
12.5 g/L of D-galactose and 5 g/L of yeastolate were also prepared. A 2 mL aliquot from
the specified eRDF stock solution and 2 mL aliquots from the D-glucose, L-glutamine,
D-galactose and yeastolate stock solutions were added together to prepare a 10 mL
eRDF sample.

Table 6 M5eRDF sample compositions with the amount of eRDF per sample changing at a rate of
0.6 g/L while the other components have a fixed concentration.
Sample No eRDF (g/L) D-glucose (g/L) L-glutamine (g/L) D-galactose (g/L) Yeastolate (g/L)
M5eRDF01 1.0 6.2 0.8 2.5 1
M5eRDF02 1.6 6.2 0.8 2.5 1
M5eRDF03 2.2 6.2 0.8 2.5 1
M5eRDF04 2.8 6.2 0.8 2.5 1
M5eRDF05 3.4 6.2 0.8 2.5 1
M5eRDF06 4.0 6.2 0.8 2.5 1
M5eRDF07 4.6 6.2 0.8 2.5 1
M5eRDF08 5.2 6.2 0.8 2.5 1
M5eRDF09 5.8 6.2 0.8 2.5 1
M5eRDF10 6.4 6.2 0.8 2.5 1

2.13.2 Yeastolate Media Dataset (M5Ye)
The M5Ye samples were prepared in a similar way to the M5eRDF samples. Ten
different stock solutions of yeastolate ranging in concentration from 0.5 g/L to 8.6
g/L and stock solutions for eRDF (17 g/L), D-glucose (31 g/L), L-glutamine (4 g/L)
and D-galactose (12.5 g/L) were prepared. The yeastolate samples (10 mL) were made
up by taking a 2 mL aliquot from the particular yeastolate stock solution and adding 2
mL aliquots from eRDF, D-glucose, L-glutamine and D-galactose stock solutions.

Table 7 Yeastolate sample composition with the amount of yeastolate per sample changing at a
rate of 0.18 g/L while the other components have a fixed concentration.
Sample No eRDF (g/L) D-glucose (g/L) L-glutamine (g/L) D-galactose (g/L) Yeastolate (g/L)
M5Ye01 3.4 6.2 0.8 2.5 0.10
M5Ye02 3.4 6.2 0.8 2.5 0.28
M5Ye03 3.4 6.2 0.8 2.5 0.46
M5Ye04 3.4 6.2 0.8 2.5 0.64
M5Ye05 3.4 6.2 0.8 2.5 0.82
M5Ye06 3.4 6.2 0.8 2.5 1.00
M5Ye07 3.4 6.2 0.8 2.5 1.18
M5Ye08 3.4 6.2 0.8 2.5 1.36
M5Ye09 3.4 6.2 0.8 2.5 1.54
M5Ye10 3.4 6.2 0.8 2.5 1.72

2.14 Measurement Techniques
2.14.1 Raman Spectroscopy and SERS
The Raman spectra were recorded using a Raman Station Model-Raman 400 (Avalon
instruments now Perkin Elmer) equipped with a 785 nm diode laser and a
thermoelectrically cooled charge coupled device (CCD) detector. The laser power
was set to 100% which equates to 80 mW. Spectra were collected over the 250 –
3311 cm–1 range at a resolution of 8 cm–1.

The instrumental setup allowed for two different scanning modes: line scanning and
mapping. The line scanning mode was originally used but later the mapping mode
was preferred as it gave a better sample representation. With line scanning or single
point data collection, possible sample inhomogeneity could have led to the collection
of erroneous data. The change from line scanning to mapping was also coordinated
with the sample holder change from aluminium crucibles to a multi-well plate.

For the M1GluR1²³, M3GluR1 and M5GluR1 datasets, the line scanning mode used a
3×10 s exposure time and multiple spectra were collected using a three-point line scan
with 0.05 mm spacing. A total of nine spectra were collected per sample and these
were averaged for data analysis. For the second and third data collections of M1Glu,
M3Glu and M5Glu, mapping was used. For the mapping of samples a 2×10 s
exposure time was used with a 3×3 grid with 0.05 mm spacing to give multiple
spectra, which were averaged prior to data analysis. All samples were analysed at
room temperature. For data collection of the SERS spectra, single point data
collection with a 2×10 s exposure time was used to give a single spectrum per sample.

23 R1 denotes the first collection of these datasets, i.e. Run 1.

2.14.1.1 Preparation of Silver Colloid
The silver colloid was prepared using the Lee and Meisel method [282]. All
glassware used for the preparation of the colloid was washed with a soap solution,
rinsed with water and then cleaned with Aqua Regia (HNO3:HCl, 1:3 v/v) by
filling/immersing the glassware with/in the solution for 24 hrs. After treatment with
Aqua Regia²⁴ the glassware was thoroughly rinsed with Millipore water to remove all
traces of acid [160, 283]. For the colloid preparation, 250 mL of Millipore water,
0.045 g of silver nitrate and a Teflon-coated magnetic stirring bar were put into a
round-bottom flask. Stirring was performed for the duration of the reaction. A reflux
reaction was set up to prepare the colloid using an oil bath to maintain a constant
temperature and to protect the solution from light (as silver nitrate is light sensitive).
When the solution started to boil, 5 mL of a 1% sodium citrate solution²⁵ was added
drop-wise. The reaction flask and the oil bath were then wrapped with aluminium foil
to maintain a constant temperature and the reaction was left to reflux for one hour. A
colour change from colourless to yellow to green to olive green was observed
10–15 mins after the addition of sodium citrate. The colour change was indicative of
the colloid quality. After 1 hour of refluxing, the colloid was allowed to cool to room
temperature. The absorption spectrum of the colloid was then recorded on a Shimadzu
UV-1601 UV-Visible spectrophotometer to determine the plasmon band maximum
(λmax).
2.14.1.2 Cosmic Ray Artefacts
In some Raman and SERS spectra, sharp artefacts arising from cosmic rays were
noted (Figure 38). A cosmic ray spike is caused by high energy radiation
producing a large signal in one or a few pixels, which then appears as a sharp spike in
the spectrum [284]. When the samples were re-measured the sharp peak was either
absent or present at a different location. To overcome this issue, samples were
re-measured to obtain spectra free of cosmic ray spikes. In cases
of persistent cosmic peaks, since nine spectra were collected per sample, the traces with
cosmic rays were omitted and the remaining spectra were averaged for data analysis.
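
The omit-and-average step can be sketched in Python as below. This is a minimal illustration, not the procedure used in this work: the text does not state how spiked traces were identified, so the detection rule here (a point rising above the point-wise median of the replicates by more than a multiple of the typical deviation) and the names `average_without_spikes` and `threshold` are assumptions.

```python
from statistics import median


def average_without_spikes(replicates, threshold=5.0):
    """Average replicate spectra, omitting any replicate that contains a spike.

    replicates: list of equal-length lists of intensities, one per spectrum.
    A replicate is omitted when any of its points exceeds the point-wise
    median of all replicates by more than `threshold` times the typical
    absolute deviation (hypothetical rule, chosen for illustration).
    Returns (mean_spectrum, number_of_replicates_kept).
    """
    n_points = len(replicates[0])
    # Point-wise median across replicates is robust to a spike in one trace.
    med = [median(r[i] for r in replicates) for i in range(n_points)]
    # Typical noise level: median of all absolute deviations from that median.
    deviations = [abs(r[i] - med[i]) for r in replicates for i in range(n_points)]
    scale = median(deviations) or 1e-12
    # Cosmic ray spikes are positive, so only upward excursions are tested.
    kept = [
        r for r in replicates
        if all(r[i] - med[i] <= threshold * scale for i in range(n_points))
    ]
    if not kept:
        raise ValueError("every replicate was flagged; re-measure the sample")
    mean = [sum(r[i] for r in kept) / len(kept) for i in range(n_points)]
    return mean, len(kept)
```

Because the median across nine replicates is hardly affected by a spike in a single trace, the rule flags only the contaminated replicate and the remaining eight are averaged.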

24 Disposal of the Aqua Regia was performed by first diluting the acid to 10% of its original volume.
The acid was then neutralised by the addition of sodium bicarbonate in small quantities until the
effervescence stopped. The solution was tested with pH paper to check for neutral pH before disposal.

25 A 1% solution of Sodium Citrate was prepared by the addition of 0.05 g to 5 mL of Millipore water.

Figure 38 Raman spectra of a media sample with sharp artefact peaks (red and turquoise peaks)
due to the presence of cosmic rays.
2.14.2 Fluorescence Spectroscopy
Steady state fluorescence spectra were recorded using a Cary Eclipse Varian
spectrophotometer, with two different scan modes, Excitation Emission Matrix
(EEM) and Total Synchronous Fluorescence Scan (TSFS). The EEMs were measured
by scanning the emission spectra from 270–600 nm with a 5 nm step and by varying
the excitation from 230–520 nm also with a 5 nm step. The scan settings were a scan
rate of 3000 nm/min, with an averaging time of 0.10 s. The TSFS was measured by
scanning the excitation range between 230 and 520 nm with the excitation and the
emission slits set at 5 nm. The delta acquisition interval was set at 5 nm, with the delta
stop set at 200 nm. The Cary Eclipse was equipped with a Peltier temperature
controlled multi-well sample holder²⁶ set to 25 °C. The cuvettes were inserted into
the sample holder with a 4 mm path-length orientation for excitation.

26 It allows a maximum of four samples to be analysed sequentially.
[Figure 38 plot: Intensity versus Wavenumber (cm-1); annotation: Cosmic Ray Signal Spikes.]

2.15 Sample Holders
The aluminium crucibles²⁷ used as the sample holder for Raman testing had a 50 µl
volume and a 2 mm depth. Prior to use, the aluminium crucibles were rinsed with
distilled water, followed by three washes with 70% isopropanol, and finally were
thoroughly rinsed with Millipore water. The crucibles were then thoroughly dried
with cotton buds wrapped in lens tissue. Sampling was carried out in the LFH, where
40 µl of sample was pipetted into the crucible. The crucible was then placed on the
Raman sample stage for testing.

The testing procedure was altered during the Raman experiments to improve the
sampling process. The first run for each dataset was carried out using aluminium
crucibles and the second and third data collections were carried out using the stainless
steel 96 well plate sample holder. The changes to the measurement setup were noted
as this affected the spectral data. This change also altered the sample volume and the
sampling speed. The reason for the development of the stainless steel plate was to
improve the sampling method capability for high throughput screening, as it allows
for multiple samples to be tested in quick succession. The 96-well stainless steel plate
increased the number of samples for consecutive analysis, leading to a greater number
of samples being tested per day. The maximum sample volume per well was 200 µl
and for the analysis a sample volume of 100 µl was used. [8]

In-house development of an electropolished stainless steel 96-well plate facilitated the
replacement of the aluminium crucible for sampling.²⁸ [285] Prior to use, the plate
was rinsed with distilled water, followed by washing with 70% isopropanol and a final
rinse with Millipore water, after which it was dried with cotton buds wrapped in
lens tissue. Each well of the plate was airbrushed using a vacuum pump to remove
any residual fibres. A 100 µl aliquot of sample was pipetted into a well. The well
plate was then placed into the Raman stage for analysis.

27 Aluminium crucibles were supplied by Thorn Scientific Services Ltd UK.
28 The Aluminium crucibles were designed for single use and were subject to damage during cleaning
due to the light structure. Problems occurred with the sides of the crucibles, which caved in if too much
pressure was applied.

For SERS analysis 50 µl of sample solution and 50 µl of silver colloid were pipetted
into a well. The sample was mixed five times before testing by re-suspending the
sample colloid mixture.

Quartz cuvettes²⁹ were used for the fluorescence analysis. The cuvettes were rinsed
with Millipore water, followed by five washes with 70% isopropanol before
thoroughly rinsing with Millipore water. Cuvettes were dried in an oven set at 60 °C
and allowed to cool in the LFH before sampling. The sample solution was added to
the cuvette. The cuvette was stoppered and parafilm was used to secure the stopper in
place. The outside of the cuvette was wiped down with lens tissue before
measurement. At the end of each week of testing, the cuvettes and stoppers were
rinsed with Millipore water and left to soak in 30% nitric acid over the weekend (~2.5
days). This ensured a thoroughly clean quartz surface.
2.16 Specific Chemometric Procedures
2.16.1 Baseline Offset Correction
A Matlab routine utilizing an auto-level method was used for baseline correction of
Raman spectra. This was a weighted least squares method that automatically
determined the points that represented baseline alone, allowing for removal of the baseline
offset. It worked by repeatedly fitting a baseline to each spectrum, with the variables
divided into groups above and below the baseline. The points below the
baseline were weighted as more important in establishing the baseline, as the points above the
baseline represented the sample signal. [232]
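
The idea of asymmetric weighting can be illustrated with the Python sketch below. It is not the Matlab routine used in this work: it assumes the simplest case of a constant (zero-order) baseline, and the function names and the weight parameter `p` are hypothetical choices made for illustration.

```python
def baseline_offset(spectrum, p=0.01, n_iter=50):
    """Estimate a constant baseline offset by asymmetric weighted least squares.

    Points above the current baseline estimate (assumed to be sample signal)
    receive the small weight p; points at or below it receive 1 - p, so the
    fitted offset is pulled towards the baseline points.
    """
    weights = [1.0] * len(spectrum)
    offset = 0.0
    for _ in range(n_iter):
        # Weighted least squares fit of a constant is the weighted mean.
        offset = sum(w * y for w, y in zip(weights, spectrum)) / sum(weights)
        weights = [p if y > offset else 1.0 - p for y in spectrum]
    return offset


def correct_offset(spectrum, p=0.01, n_iter=50):
    """Return the spectrum with the estimated baseline offset removed."""
    offset = baseline_offset(spectrum, p=p, n_iter=n_iter)
    return [y - offset for y in spectrum]
```

With a flat background of 100 counts and a single peak, the estimate settles close to 100 because the peak points are strongly down-weighted after the first iteration. A constant fit of this kind removes an offset only and would leave any sloping background in place.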
2.16.2 Water Elimination
The Raman data was dominated by a large water signal. Water formed a major
contribution to the background which interfered with the visibility of the analyte
signal. After baseline correction of the Raman data, the next step was to reduce the
high background signal. This was done using an in-house developed Matlab method
to subtract the water signal from each spectrum in the dataset. [82, 286]
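
A minimal Python sketch of water subtraction is given below. It is not the in-house Matlab method: the text states only that the water signal was subtracted from each spectrum. The optional scaling factor, estimated by least squares over a region assumed to contain water signal only, is an added assumption, and all names are hypothetical.

```python
def water_scale(sample, water, region):
    """Least-squares factor matching the water spectrum to the sample.

    region: list of indices assumed (for illustration) to contain water
    signal only, so that the analyte does not bias the estimate.
    """
    num = sum(sample[i] * water[i] for i in region)
    den = sum(water[i] ** 2 for i in region)
    if den == 0:
        raise ValueError("water spectrum is zero over the chosen region")
    return num / den


def subtract_water(sample, water, scale=1.0):
    """Subtract the (optionally scaled) water spectrum from a sample spectrum."""
    if len(sample) != len(water):
        raise ValueError("spectra must have the same number of points")
    return [s - scale * w for s, w in zip(sample, water)]
```

If the water contribution in a sample differs from that in the reference water spectrum, an unscaled subtraction leaves negative residuals where the water bands are strongest.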

29 Quartz cuvettes were supplied by Lightpath Optical (UK) Ltd.

2.16.3 Water to Analyte Ratio
The water-to-analyte ratio (WAR) was calculated using an in-house Matlab routine to
reflect the water contribution per sample. This code calculated the WAR per spectrum based on
the residuals generated from the difference between the Savitzky–Golay smoothed
version of a spectrum and the test spectrum. Since this function worked on a
spectrum-by-spectrum basis, the WAR of a full dataset was taken as the mean of the
individual WAR values.
2.16.4 Model Evaluation Settings
The following factors were assessed to express the performance of the various
models: correlation co-efficient, associated error, and number of LVs. The
combination of parameters provided a better guide to the linearity and strength of the
model. It also prevented over-fitting³⁰. The correlation co-efficient provided
information on the strength of the correlation between the spectral data and the
analyte and was used as an identifier for strong models. Associated error was evaluated
using a combination of parameters including: root mean square error of calibration
(RMSEC), root mean square error of cross validation (RMSECV), percentage error
and the ratio of RMSECV to RMSEC.
• The number of latent variables was selected from the captured variance in the
RMSEC and RMSECV (Figure 39). When RMSEC and RMSECV reached
the first local minimum, the number of variables at this point was selected.

30 Over-fitting occurred when unnecessary latent variables were used to overly explain variance and
noise amongst the spectra. This resulted in a restricted calibration model.

Figure 39 Plot of Variance captured per latent variable versus RMSEC and RMSECV.

• The relative error of prediction (REP%) for the calibration models was calculated from
RMSECV and the mean value of the analyte concentration (ȳconc).
Equation 2-21

\[ \mathrm{REP\%} = \frac{\mathrm{RMSECV}}{\bar{y}_{\mathrm{conc}}} \times 100 \]

• If the ratio of SECV to SEC was above 3, the data was deviating from the
model and being over-fitted. [6]

It is the right balance of these elements that gives models the potential to be useful in
the prediction of other samples.
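
The evaluation criteria above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in this work: the function names are hypothetical, the number of latent variables is chosen from the RMSECV curve alone, and the threshold of 3 quoted for SECV/SEC is applied here to the ratio of RMSECV to RMSEC.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rep_percent(rmsecv, y_conc):
    """Relative error of prediction (Equation 2-21): 100 * RMSECV / mean(y)."""
    return 100.0 * rmsecv / (sum(y_conc) / len(y_conc))


def first_local_minimum(errors):
    """Number of latent variables (1-based) at the first local minimum.

    errors[k] is the error obtained with k + 1 latent variables.
    """
    for i in range(len(errors) - 1):
        if errors[i + 1] > errors[i]:
            return i + 1
    return len(errors)


def is_overfitted(rmsecv, rmsec, limit=3.0):
    """Flag a model whose cross-validation error far exceeds its calibration error."""
    return rmsecv / rmsec > limit
```

For instance, an RMSECV of 0.5 g/L on concentrations with a mean of 5 g/L corresponds to a REP% of 10.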

[Figure 39 plot: RMSECV and RMSEC versus Latent Variable Number.]

2.16.5 Chemometric Workflow Overview
Spectroscopic methods provide a large amount of data. In order to interpret this data,
the chemometric methods covered in this chapter were utilised. Both qualitative and
quantitative assessment of the data was performed.

For Raman, SERS and fluorescence data, the PCA method helped visualise the data
for reproducibility testing and outlier detection. The fluorescence data was further
examined for outliers using ROBPCA, but this method proved to be too sensitive for
the small sample number used here.

The fluorescence emission in the EEM data was generated by multiple, different
fluorophores. MCR and PARAFAC were used to identify the fluorophores in the
samples and assess how their emission varied as composition was changed. The MCR
method was better suited than PARAFAC to the EEM data from these complex media
samples because of IFE introducing non-linearities into the data.

The development of quantitative models to predict component concentration was
performed after the data was qualitatively assessed and judged to be reproducible and
outliers removed. PLS regression was used and pre-processing was performed to
enhance the analyte signal and remove variations in the data that were not relevant to
the analyte signal. Quantitative models were built for D-glucose (Raman), yeastolate
(Raman, SERS and Fluorescence) and eRDF (Raman, SERS and Fluorescence). The
performance of the models varied according to the analyte signal quality for each
technique and sample type.

3 Development using Raman Spectroscopy for the Analysis of Cell Culture Media Components
This chapter covers the investigative work carried out to demonstrate the feasibility of
using Raman spectroscopy for the qualitative and quantitative analysis of complex
aqueous cell culture media. Raman spectroscopy was used to quantify the main source
of carbon and largest media component, D-glucose, as well as the more complex
media components (eRDF and yeastolate) in a model media system. This is of interest
in bioprocess monitoring because it is important to track the concentration of media
components, as this directly affects the metabolism of cells and influences production
yield.
3.1 Spectral Analysis
When compared to the other cell culture media components made up at their working
concentration in water, D-glucose had the strongest and most defined Raman spectrum
(Figure 40). It was typically present in media at the highest concentration (~44% of
the solid formulation weight). However, water had the biggest influence on the
spectrum for all of the sample sets. This was indicated by two broad bands at 1364
and 1640 cm–1 and an intense band above 3100 cm–1. It was not surprising since water
represented ~93% of the overall quantity of matter in the various samples. As for the
other components, eRDF had only trace levels of detail that offered some quantitative
information while the Raman spectra of yeastolate, L-glutamine and D-galactose were
too similar to the water spectrum to relay any quantitative information (Figure 40).
D-glucose gave the only signal with a significant level of detail because of its
intrinsically high working concentration in media. Therefore it was logical to assume
its quantification in media using Raman spectroscopy should be straightforward.
Exploiting this information led to the D-glucose based datasets made up for the
development of a Raman method; see Table 2, Table 3 and Table 4 in Chapter 2,
Section 2.12.

Figure 40 An overlay of the Raman spectra of aqueous solutions of eRDF (17 g/L), D-glucose (31
g/L), D-galactose (12.5 g/L), L-glutamine (4 g/L) and Yeastolate (5 g/L), the components used to
formulate the cell culture media. The concentrations are those used in the final media
formulation. The spectrum has been enlarged to highlight the weaker peaks.
[Figure 40 plot: Intensity versus Wavenumber (cm-1); traces: (a) D-Glucose, (b) eRDF, (c) Yeastolate, (d) L-Glutamine, (e) D-Galactose.]

Figure 41 Raman spectra of an aqueous solution of D-glucose (49.6 g/L).

Figure 41 shows the peaks contained in the spectrum of a concentrated M1Glu sample
(49.6 g/L). The top of the OH stretching band beyond 3000 cm–1 was omitted. The
other OH band present is the strong OH bending band at 1640 cm–1, which obscures
the carbonyl (C=O) group that would be seen at 1620–1680 cm–1. Several peaks
for D-glucose can be assigned. The low wavenumber peaks at 426 cm–1 and 514 cm–1
result from skeletal deformation by exo- and endocyclic CCO, CCC, COC and OCO
bending modes. The peaks seen at the high wavenumbers 2902 cm–1 and 2950 cm–1 are
the asymmetric and symmetric stretching of CH2 and CH3 groups respectively. In the
fingerprint region the strongest peak at 1123 cm–1 can be assigned to a C–C stretching
vibration, along with the peak at 1067 cm–1, which is a C–C stretching of either the ring
or the molecular backbone. CH bending in the form of CH3, CH2 and CH
deformation gives rise to the 1460 cm–1 peak. The symmetric CH3 deformation (CH
twisting) results in the peak at 1373 cm–1, which is stronger than the other broad OH
band seen at 1364 cm–1. The peaks at 843 cm–1 and 915 cm–1 can be assigned to the
vibrations of the glycosidic bonds and sugar linkages [287-290]. As the datasets get
more complex (M1Glu to M5Glu), the bands appear weaker due to lower overall
concentration. In all cases multivariate analysis is required to extract information from
the Raman spectra with a strong water signal.
3.1.1 Averaged Aqueous D-glucose (M1Glu) Data
For each sample, nine spectra were collected and averaged to form the raw data.
Figure 42 shows the averaged spectra of the triplicate measurements (M1GluR1,
M1GluR2 and M1GluR3) of the M1Glu dataset (aqueous solution of D-glucose with
concentration ranging from 1.6 g/L to 49.6 g/L). Above 3000 cm–1 was the OH
stretching band of water which varied in intensity because of the different sample
concentration and experimental setup. The M1Glu Raman spectra also showed a
sloping baseline from 300–2500 cm–1. The spectra of the M1GluR1 data had a greater
baseline offset in comparison to the M1GluR2 and M1GluR3 data. This was
attributed to two changes in the experimental setup:
a) The M1GluR1 data was collected using aluminium crucibles instead of the 96-
well stainless steel plate used for the M1GluR2 and M1GluR3 data.
b) For the M1GluR1 samples, line scan data collection setup was used. The
scanning mode was changed from line scanning to mapping for the latter
collections as it reduced baseline offset.

Figure 42 Averaged Raw Raman spectra of the (a) M1GluR1, (b) M1GluR2 and (c) M1GluR3
data (Table 2).
[Figure 42 plots: three panels (a), (b) and (c), each showing Intensity versus Wavenumber (cm-1).]

The baseline variation resulted from the different depths and surface finishes of the
sample containers. It affected the performance of the data when used for qualitative and
quantitative analysis until corrective action was taken. In order to ensure consistent
data collection the collimation of the radiation from the samples required focusing. If
the radiation was diffusely scattered, the level of collimation decreased and this led to
an increase in the stray light, resulting in more scatter in the spectra. A way to prevent
scatter was to have the focal depth centred over the middle of the sample. This would
help to avoid scatter from the container surface, which if present would contribute to
the baseline offset [291].
3.1.2 Baseline Offset Correction of the Aqueous D-glucose (M1Glu) Data
Figure 43 shows the results of baseline offset correction on the averaged M1Glu
spectra. The extensive difference observed between Figure 43a and Figure 42a
confirmed that the aluminium crucibles used for data collection were poor sample
containers and caused considerable background offset. Smaller spectral changes were
seen with the baseline improved M1GluR2 and M1GluR3 data after offset correction.
The baseline correction worked reasonably well; however, it only removed the offset
and a gradient effect was still visible below ~1000 cm–1. Other methods are able to
deal with this type of gradient but they were not suitable in this case because of the
weak signal and the strong water band. Here, the removal of the offset was more
important since it was an unwanted source of variation. The spectral region above
3000 cm–1 was a considerable source of variance as it was dominated by the OH band.
Factors such as shot noise, detector quantum yield variation, sample placement and
concentration effects caused the large variance in this region [3]. This spectral
variance was not removed by the baseline offset correction; therefore further action
was required.

Figure 43 Baseline Corrected Raman spectra of the (a) M1GluR1, (b) M1GluR2 and (c)
M1GluR3 data.
[Figure 43 plots: three panels (a), (b) and (c), each showing Intensity versus Wavenumber (cm-1).]
3.1.3 Water Background Elimination of the Aqueous D-glucose (M1Glu) Data
Spectral interference such as a varying background caused by fluorescence, or
instrumental noise such as dark and shot noise, can hinder qualitative and quantitative analysis
of chemical information. In the Raman spectra, water was the major signal as it was
the largest component in cell culture media. This, however, had adverse effects on
modelling weaker signals. The water bending bands (1636 and 1364 cm–1)
overshadowed the signal in a large portion of the fingerprint region, making specific
analyte identification impossible. It was assumed that if the water signal was removed
then the analyte signal should become clearer. However, after subtracting the pure
water signal from the sample spectra, the resulting residual spectra contained noise as
well as the D-glucose signal [292, 293].

Figure 44 (a) Raman spectra of the M1GluR1 (49.6 g/L D-glucose) sample, water and the subtracted
spectrum. (b) M1GluR1, (c) M1GluR2 and (d) M1GluR3 are the water eliminated Raman spectra
for the different datasets.
[Figure 44 plots: four panels (a) to (d), each showing Intensity versus Wavenumber (cm-1); panel (a) legend: M1R1 Sample, Water, Difference.]

The M1Glu and water spectra were both baseline corrected prior to water elimination.
After water elimination (Figure 44), the analyte Raman bands were clearly visible and
easier to identify and interpret. This was particularly important when analysing
samples which displayed subtle changes. A drawback of water elimination was the
introduction of artefacts such as negative peaks, baseline shift and enhanced noise.
This was due to the variance amongst the spectra of the samples and water. It was also
evident that each data collection series (R1, R2 and R3) was affected differently by
water elimination. For example, above 3000 cm–1, the noise varied between replicate
runs. The M1GluR1 and M1GluR3 samples displayed similar artefacts. For instance,
at 600–900 cm–1 they both showed downward facing bands, and in the 1500–2000
cm–1 region, where the OH bending band was, the spectra showed increased levels of
noise and baseline offset. For M1GluR2, water elimination introduced an increased
baseline offset across the data.
3.2 Reproducibility of Raman Data Collection
When the data collection is optimal, the samples should plot as a straight line along a
single principal component representing the variance caused by the D-glucose
concentration gradient. Changes and deviations in sampling represent sources of
variation (experimental setup, power fluctuations, noise and sample preparation faults,
etc.). This can have significant impact on the data quality and reproducibility.

PCA was performed to provide a simple overview of the variance within the data. The
averaged raw M1Glu spectra from the replicate measurements were amalgamated for
comparison in order to assess data reproducibility. Sample grouping and data
collection reproducibility were evaluated (Figure 45). During this study, changes in
the experimental setup (sample container and scanning mode) led to differences in
data collection. The scores plots (Figure 45), highlighted that the second and third
runs - which were collected under the same conditions - overlapped, indicating
reproducible data collection. The first run proved anomalous, however, due to the
different data collection setup used. Sample variance was an issue with these media
samples. It was clear from the scores plots that changing the setup minimised the
measurements variance as seen by the tighter grouping of the M1R2 and M1R3
samples [51, 66, 294].
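
The expectation that an ideal dataset plots along a single principal component can be illustrated with the Python sketch below. It is not the software used for the analysis in this work: it extracts only the first principal component, by power iteration on mean-centred spectra, and the function name and iteration count are hypothetical choices.

```python
from math import sqrt


def first_principal_component(spectra, n_iter=100):
    """Return (loading, scores, explained_fraction) for PC1.

    spectra: list of equal-length lists of intensities, one per sample.
    The data are mean-centred, then PC1 is found by power iteration.
    """
    n, m = len(spectra), len(spectra[0])
    mean = [sum(row[j] for row in spectra) / n for j in range(m)]
    X = [[row[j] - mean[j] for j in range(m)] for row in spectra]
    total = sum(c * c for row in X for c in row)
    if total == 0:
        raise ValueError("the spectra contain no variance")
    # Start from the centred spectrum with the largest norm.
    v = list(max(X, key=lambda row: sum(c * c for c in row)))
    for _ in range(n_iter):
        t = [sum(x[j] * v[j] for j in range(m)) for x in X]           # X v
        w = [sum(X[i][j] * t[i] for i in range(n)) for j in range(m)]  # X^T X v
        norm = sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    scores = [sum(x[j] * v[j] for j in range(m)) for x in X]
    explained = sum(s * s for s in scores) / total
    return v, scores, explained
```

For synthetic spectra built as a fixed background plus a concentration multiplied by one analyte band shape, PC1 captures essentially all of the variance and the scores are evenly spaced for evenly spaced concentrations. The sign of the loading and scores is arbitrary.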

Figure 45 Scores plots and loadings (L1, L2 and L3) of PC1, PC2 and PC3 for amalgamated
averaged raw M1Glu data. The black circles represent run1, red triangles represent run2 and
the green asterisks represent run3. The blue circle represents 95 % confidence level of explained
variance.

In Figure 45, the scores along PC3 for all the data collection runs showed the
expected near linear variation as a result of the changing analyte concentration.
Indeed the corresponding third loading contained peaks relating to the D-glucose. The
other two loadings represented the water signal and the baseline offset, respectively.
The large variability of run one measurements along PC2 was caused by the use of
aluminium crucibles which resulted in a strong baseline shift. Moreover, the
instrumental effects associated with line scanning were greater than those of the
mapping mode (this was the reason why the measurement protocol was changed from
line scanning to mapping). When baseline offset correction was performed and PCA
was repeated, the number of components was reduced to two with the first
representing the water signal and the second showing the analyte signal. Even after
pre-processing, line scanning and mapping samples did not overlap. This made them
incompatible when combined for modelling (data not shown but it was evident from
the PCA results).
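
The decomposition into scores, loadings and explained variance described above can be sketched with a singular value decomposition. This is a minimal illustration on synthetic data, not the software used in this work: the function name `pca_svd`, the band shapes and the noise level are hypothetical, and the data are left uncentred by default only because the first loading reported here resembles the water signal (whether the original analysis mean-centred the spectra is not stated in this section).

```python
import numpy as np

def pca_svd(X, n_components=3, center=False):
    """PCA by singular value decomposition.

    Returns scores, loadings and the percentage of the total sum of squares
    captured by each retained component.
    """
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = 100.0 * s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

# Synthetic stand-in for one run: a strong constant "water" band plus a weak
# "glucose" band whose height follows the concentration (hypothetical shapes).
rng = np.random.default_rng(0)
wavenumber = np.linspace(250, 3311, 400)
water = np.exp(-((wavenumber - 3200) / 150.0) ** 2)
glucose = np.exp(-((wavenumber - 1125) / 30.0) ** 2)
conc = np.linspace(1.6, 49.6, 31)  # g/L
X = 1000.0 * water + np.outer(conc, glucose) + rng.normal(0.0, 0.05, (31, 400))

scores, loadings, explained = pca_svd(X, n_components=2)
r = np.corrcoef(scores[:, 1], conc)[0, 1]
print("explained (%):", np.round(explained, 3))
print("|correlation| of PC2 scores with concentration:", round(abs(r), 4))
```

In this toy case the first component is dominated by the constant band and the scores on the second component vary almost linearly with concentration, which mirrors the pattern described for the analyte-related component above.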

[Figure 45 panels: scores on PC 1 (98.21%) versus PC 2 (1.74%); scores on PC 1 (98.21%) versus PC 3 (0.04%); scores on PC 2 (1.74%) versus PC 3 (0.04%); loadings L1 (98.21%), L2 (1.74%) and L3 (0.04%), intensity versus wavenumber (cm-1).]

A full data collection of line scanning samples was carried out on M1Glu, M3Glu and
M5Glu, prior to the switch to the new experimental setup. The advantage of line
scanning was the faster collection time. However, from the PCA results, the
M1GluR1 data should be treated separately as:
a) it did not match the R2/R3 data
b) the variance of M1GluR1 samples was much greater as seen by the scores
pattern in Figure 45.
3.3 Evaluation of Spectral Range
Variable selection was used to improve quantitative modelling. It was thought that it
may lead to better results due to the removal of areas that contain irrelevant
information31, therefore placing the focus on the more informative regions of the data.
Eliminating these irrelevant regions can lead to improved calibration models and
smaller residual levels. The selection of spectral ranges by the operator is subjective
and highly dependent on expertise. One may inadvertently discard a region with
useful information [225, 295]. Numerical methods for variable selection are also
available. One such method is Moving Window Partial Least Squares (MWPLS),
which can highlight the regions in the spectra that contribute to prediction accuracy and
eliminate areas with high levels of uncertainty [271-273].
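
The moving-window idea can be sketched as follows: a PLS model is fitted in each window position and the sum of squared residuals (SSR) is recorded for each number of latent variables. This is a rough sketch under stated assumptions rather than the routine used in this work: the window width of 21 points, the use of calibration (not cross-validated) residuals, and the helper names `pls1_ssr` and `mwpls` are all hypothetical choices.

```python
import numpy as np

def pls1_ssr(X, y, max_lv):
    """Calibration SSR of y for PLS1 (NIPALS) models with 1..max_lv latent variables."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    ssr = np.empty(max_lv)
    for a in range(max_lv):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:               # nothing left to model
            ssr[a:] = yr @ yr
            break
        w = w / norm
        t = Xr @ w                     # scores
        tt = t @ t
        p = Xr.T @ t / tt              # X loadings
        q = (yr @ t) / tt              # y loading
        Xr = Xr - np.outer(t, p)       # deflate X
        yr = yr - q * t                # deflate y
        ssr[a] = yr @ yr
    return ssr

def mwpls(X, y, window=21, max_lv=5):
    """Return window centres (column indices) and log10(SSR) per window and LV count."""
    half = window // 2
    centres = np.arange(half, X.shape[1] - half)
    log_ssr = np.empty((centres.size, max_lv))
    for k, c in enumerate(centres):
        log_ssr[k] = np.log10(pls1_ssr(X[:, c - half:c + half + 1], y, max_lv))
    return centres, log_ssr

# Synthetic example: one analyte band at column 100, noise everywhere else.
rng = np.random.default_rng(1)
n_samples, n_vars = 31, 200
conc = np.linspace(1.6, 49.6, n_samples)
band = np.exp(-((np.arange(n_vars) - 100) / 5.0) ** 2)
X = np.outer(conc, band) + rng.normal(0.0, 0.5, (n_samples, n_vars))

centres, log_ssr = mwpls(X, conc, window=21, max_lv=3)
best = centres[np.argmin(log_ssr[:, 0])]
print("window centre with the lowest 1-LV residual:", best)
```

Plotting each column of `log_ssr` against `centres` gives one residual line per latent variable, with downward bands where the window covers informative variables.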

From the MWPLS results, a residual line plot of downward facing bands was
observed (Figure 46). The bands represented areas rich in signal variation while the
flat sections showed the areas of limited information. The five residual lines for the
five principal components were used to determine the error level. MWPLS was
performed on the M1Glu, M3Glu and M5Glu sample sets. The residual lines were
clearer in the M1Glu dataset due to the stronger signal and the simple sample
composition. In the region below 650 cm–1, there were scatter contributions that
appeared as large sloping baseline variances. This region was compromised by
Rayleigh light leakage from the filters, therefore the 250–600 cm–1 region was
omitted. For the M1Glu data, the first informative band selected by MWPLS was at

31 Areas that contain interference effects - from the diffuse light scatter and noise from the hardware
used - are removed.

818–1676 cm–1. This represented several groups from CH3 and CH2 deformations,
C–O and O–O groups, C–N, C–C, and C=C stretching bands etc. The second informative
band at 2774–3159 cm–1 represented C–H and O–H stretching modes. There were
similar informative areas for M3Glu (802–1612 cm–1, 2798–3151 cm–1) and M5Glu
data (826–1596 cm–1, 2814–3167 cm–1). Since the MWPLS results showed similar
downward bands for all three datasets, the regions 800–1680 cm–1 and 2770–3170 cm–1
were chosen for data analysis. In previous research by the NBL laboratory on cell
culture media by Raman spectroscopy [3, 6, 8], spectral regions were selected for
chemometric analysis and the most significant bands were observed in the 707–1853
cm–1 region (Figure 47). The expected bands associated with the media components
were observed in this region and it was also selected for data analysis.
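
As a worked illustration of the region choice, the sketch below takes the band limits quoted above for the three datasets, forms their envelope, and rounds it outward to the nearest 10 cm-1. The rounding rule is an assumption that happens to reproduce the chosen ranges; the text does not state how the final limits were set. The function names and the integer wavenumber axis are hypothetical.

```python
import numpy as np

# Informative bands reported by MWPLS for each dataset (cm-1, from the text).
mwpls_bands = {
    "M1Glu": [(818, 1676), (2774, 3159)],
    "M3Glu": [(802, 1612), (2798, 3151)],
    "M5Glu": [(826, 1596), (2814, 3167)],
}

def common_regions(bands, step=10):
    """Envelope of each band across datasets, rounded outward to a multiple of `step`."""
    n_bands = len(next(iter(bands.values())))
    regions = []
    for k in range(n_bands):
        low = min(b[k][0] for b in bands.values())
        high = max(b[k][1] for b in bands.values())
        regions.append(((low // step) * step, -(-high // step) * step))
    return regions

def select_regions(wavenumber, spectra, regions):
    """Keep only the columns whose wavenumber lies inside any (low, high) region."""
    wavenumber = np.asarray(wavenumber)
    mask = np.zeros(wavenumber.shape, dtype=bool)
    for low, high in regions:
        mask |= (wavenumber >= low) & (wavenumber <= high)
    return wavenumber[mask], np.asarray(spectra)[:, mask]

regions = common_regions(mwpls_bands)
print(regions)  # [(800, 1680), (2770, 3170)]

# Hypothetical axis with one point per cm-1 over the full range.
wavenumber = np.arange(250, 3312)
spectra = np.zeros((3, wavenumber.size))
kept_wavenumber, kept_spectra = select_regions(wavenumber, spectra, regions)
print(kept_spectra.shape)
```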

Figure 46 MWPLS Error Plot of Log (SSR) versus Raman Shift (cm–1) of the (A) M1Glu
samples, (B) M3Glu samples and (C) M5Glu samples.

In the preliminary data analysis, the 2774–3174 cm–1 spectral range was modelled for
the M1Glu (Table 47) and M3Glu (Table 48) sample sets. The performance with the
M1Glu data showed good correlation to D-glucose concentration and performed at a
similar level to the other two reduced region models. Whilst for the M3Glu sample
set, the models were weaker. The other reduced regions (800–1680 cm–1 and 707–
1853 cm–1) of M3Glu data outperformed the 2774–3174 cm–1 region. The reasons for
poor performance came from the weak analyte signal together with the strong water
band bordering this region which diminished the correlation to D-glucose seen in the
high concentration M1Glu samples. Another cause of the poor performance of the
2774–3174 cm–1 region was the high levels of variance caused by instrumental factors
such as lower detector sensitivity above 1000 nm and higher detector noise. Thus,
after preliminary modelling, the 2774–3174 cm–1 region was omitted from further
analysis.

Figure 47 Raman spectra of the 49.6 g/L D-glucose aqueous solution over the full spectral range
(250–3311 cm–1) and inset is the selected region of interest, 707–1853 cm–1.
3.4 Calibration Modelling
The objective of calibration modelling was to establish a relationship between the
known concentration of the analyte, e.g. D-glucose, and the spectral data. The reason
for evaluating multiple calibration models was to establish the optimal linear
relationship between the Raman spectra and the D-glucose concentration. With the
correct calibration model, it should be possible to predict the concentration of
unknown media samples with high accuracy. The calibration models for the M1Glu
sample set were assessed using the averaged data, baseline corrected data and water
eliminated data over the full region and in the reduced spectral ranges (707–1853 cm–1
and 800–1680 cm–1). The reduced regions focused on the fingerprint region, which
contained a wide range of vibrational modes that were useful for media component
analysis. More models were constructed following pre-processing to further improve
the calibration results.
Table 8 Spectral areas selected for model generation using the Raman Data.
Region ID               Wavenumber Region (cm–1)
Full                    250–3300
Reduced Region (ROI)    707–1853
MWPLS Region            800–1680

All preliminary models built with the averaged raw data were able to correlate the
Raman spectra to the D-glucose concentration (Figure 48). Comparison of M1GluR1
and M1GluR2 revealed how the changes in sampling affected the reproducibility of
the data. The new experimental setup improved the RMSECV for the M1GluR2 data.
There was an improvement of 37% for averaged data, 47% for the baseline corrected
and 27% for water eliminated data over the M1GluR1 data. In the raw data calibration
model of M1GluR3 (Figure 48c), a weaker performance was seen compared to the
M1GluR1 and M1GluR2 models. When the raw data was investigated, the first 18
samples collected were separated by a spectral offset from the remaining 14 samples
that were collected later that day. The samples were tested in a random order and
variation was therefore seen randomly throughout the dataset. This variation was most
likely due to minor changes during sampling, such as the temperature, as these
samples were kept at room temperature for longer before measurement. This type of
change was only observed in this dataset, but it can have a major impact on data
quality and therefore model performance. In this case the spectral offset variance was
removed by pre-processing.

Figure 48 Predicted versus Measured D-glucose concentration plots for the calibration models for
the averaged data using full range for (a) M1GluR1, (b) M1GluR2 and (c) M1GluR3 sample sets.

[Figure 48 panels: predicted versus measured D-glucose concentration (g/L).
(a) R2 = 0.986, 3 latent variables, RMSEC = 1.7048, RMSECV = 1.8654
(b) R2 = 0.996, 3 latent variables, RMSEC = 0.95139, RMSECV = 1.142
(c) R2 = 0.979, 3 latent variables, RMSEC = 2.0663, RMSECV = 2.3858]

For the M1GluR2 sample set, the preliminary models are shown in Table 9. For the
averaged data, the performance of all three regions was similar but the reduced range
used a smaller number of variables compared to the full dataset. The calibration
models improved with baseline correction and marginally worsened with water
elimination due to artefacts introduced by the signal elimination process. By using
reduced regions with water eliminated data, some of the artefacts were avoided and
model performance improved. Water eliminated data produced reasonable models in
the preliminary study, but the process also introduced artefacts into the spectra.
Further analysis was required to determine the severity of these artefacts.
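
The REP% column of Table 9 can be checked with a short calculation. The sketch assumes that REP% is the RMSECV expressed as a percentage of the mean reference concentration, and that the 31 samples span 1.6 to 49.6 g/L in equal 1.6 g/L steps; neither is stated in this section, but together they reproduce the tabulated values to within rounding.

```python
def rep_percent(rmsecv, mean_conc):
    """Relative error of prediction as a percentage of the mean reference value."""
    return 100.0 * rmsecv / mean_conc

# Assumed design: 31 samples, 1.6 to 49.6 g/L in equal 1.6 g/L steps.
concentrations = [1.6 * k for k in range(1, 32)]
mean_conc = sum(concentrations) / len(concentrations)  # about 25.6 g/L

# (RMSECV in g/L, reported REP%) taken from Table 9.
table9 = {
    "Avg Full": (1.14, 4.45),
    "Avg ROI": (1.23, 4.80),
    "Avg MWPLS": (1.11, 4.33),
    "BC Full": (1.02, 3.98),
    "BC ROI": (0.86, 3.35),
    "BC MWPLS": (0.72, 2.81),
    "WE Full": (1.09, 4.25),
    "WE ROI": (0.74, 2.89),
    "WE MWPLS": (0.76, 2.96),
}

for name, (rmsecv, reported) in table9.items():
    computed = rep_percent(rmsecv, mean_conc)
    print(f"{name:10s} computed {computed:.2f}  reported {reported:.2f}")
```

The small differences in the last digit are consistent with the RMSECV values having been rounded to two decimals before tabulation.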

The PLS loadings from the calibration model generated using baseline corrected
M1GluR2 data are shown in Figure 49. The first loading resembled the pure water
signal and represented 99.88% of the explained variance in the data. The second and
third loadings showed the baseline offset seen in the data. They both represented the
D-glucose signal. Together they accounted for 0.11% of the explained variance but
differed at 1450 cm–1 and 1367 cm–1; these bands originated from the asymmetric and
symmetric stretching of CH3 respectively. The offset intensity in the second and third
loadings suggested a high water signal and a low water signal as a result of
hydrophobic interactions. The second loading reflected the high concentration
samples while the third loading represented the low concentration samples and
background noise in an aqueous environment [296, 297].

Figure 49 Loadings plot from the calibration model for baseline corrected M1GLUR2 data over
the full range.

[Figure 49 plot: loadings versus wavenumber (cm-1); L1 (99.88%), L2 (0.10%), L3 (0.01%).]

Table 9 Models generated from M1GluR2 data after preliminary pre-processing.
M1GluR2                      LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full (250–3311 cm–1)     3    0.996   0.95          1.14           4.45
Avg ROI (707–1853 cm–1)      2    0.994   1.13          1.23           4.80
Avg MWPLS (800–1680 cm–1)    2    0.995   1.02          1.11           4.33

BC Full (250–3311 cm–1)      3    0.996   0.89          1.02           3.98
BC ROI (707–1853 cm–1)       3    0.997   0.75          0.86           3.35
BC MWPLS (800–1680 cm–1)     3    0.998   0.62          0.72           2.81

WE Full (250–3311 cm–1)      2    0.996   0.97          1.09           4.25
WE ROI (707–1853 cm–1)       3    0.998   0.62          0.74           2.89
WE MWPLS (800–1680 cm–1)     3    0.998   0.64          0.76           2.96

3.5 Spectral Pre-Processing of M1Glu Sample Set
With preliminary data treatment (offset correction and water elimination), good
models were obtained. With further pre-processing it was possible to increase the
accuracy to improve the models. Different pre-processing methods and their
combinations were applied. Multiplicative scatter correction (MSC) and normalisation
were used to correct intensity differences due to sampling. The first order derivative
(FD) was used to remove baseline differences and clarify analyte signal. The
M1GluR2 sample set was chosen to show the effects of using the different pre-
processing methods. Also the influence of spectral region selection on pre-processing
was investigated.
3.5.1 Pre-area and Post-area Selection for Spectral Pre-Processing
Firstly, the question of whether to carry out region selection before or after
pre-processing was evaluated. Both approaches were modelled and the results indicated
that region selection should be performed after pre-processing. It was preferable to
apply spectral pre-processing to the full spectral region and then truncate the dataset
into smaller regions, which excluded any end of range artefacts. Figure 50 shows
end of range artefacts following derivative pre-processing where the first and last data
points of the spectra were altered.
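
The origin of end of range artefacts can be illustrated with a simple moving-window derivative. In this sketch the window is padded with zeros at the ends of whatever data it is given, which is a deliberate simplification chosen to make the effect visible; the edge handling of the software used in this work may differ. The function name `window_slope` and the synthetic spectrum are hypothetical.

```python
import numpy as np

def window_slope(y, half_width=5):
    """First derivative from a least-squares straight line fitted in a moving window.

    np.correlate pads the ends with zeros, so the first and last `half_width`
    points of the output are distorted.
    """
    z = np.arange(-half_width, half_width + 1, dtype=float)
    weights = z / np.sum(z**2)
    return np.correlate(y, weights, mode="same")

wavenumber = np.arange(250.0, 3312.0)
spectrum = 1000.0 + 500.0 * np.exp(-((wavenumber - 1125.0) / 30.0) ** 2)
roi = (wavenumber >= 707) & (wavenumber <= 1853)

pre_area = window_slope(spectrum[roi])    # select the region, then differentiate
post_area = window_slope(spectrum)[roi]   # differentiate the full range, then select

m = 5
print("max interior difference:", np.max(np.abs(pre_area[m:-m] - post_area[m:-m])))
print("first points, pre-area :", np.round(pre_area[:3], 1))
print("first points, post-area:", np.round(post_area[:3], 1))
```

Away from the ends the two routes give the same values; only the first and last few points of the pre-area result are altered, because the filter window there runs off the end of the truncated data.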

Figure 50 Overlay of the first derivative pre-processing of M1Glu samples with end of range
artefacts. The blue traces used post area selection and the black traces used pre-area selection.
3.5.2 Multiplicative Scatter Correction of M1GluR2 Data

Figure 51 Effects of MSC on spectra from M1GluR2 data in the ROI range (707–1853 cm–1),
(a) Averaged Raw spectra, (b) after MSC.

The Raman spectra of the M1Glu before and after MSC pre-processing are displayed
in Figure 51. The additive/multiplicative effects observed in the raw data were
reduced by MSC and the peaks were clarified. Table 10 outlines the calibration
performance of M1GluR2 data after MSC for the D-glucose modelling. Similar results
were obtained for the models using data before and after baseline correction. The best
MSC model was built on averaged data using the 707–1853 cm–1 region. It used two
LVs and gave an accuracy of roughly 1.53% REP. This model demonstrated an
improved correlation for D-glucose concentration compared to the preliminary pre-
processing models in Table 9.
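
For reference, the usual form of MSC regresses each spectrum on a reference spectrum and then removes the fitted offset and slope. The sketch below uses the mean spectrum as the reference, which is the common default but is an assumption here; the function name `msc` and the synthetic data are hypothetical.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction of the rows of `spectra`."""
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        # Least-squares fit x ~ intercept + slope * ref
        slope, intercept = np.polyfit(ref, x, 1)
        corrected[i] = (x - intercept) / slope
    return corrected

# Five copies of one band shape with different additive and multiplicative effects.
grid = np.linspace(707.0, 1853.0, 300)
shape = np.exp(-((grid - 1125.0) / 30.0) ** 2)
offsets = np.array([0.0, 50.0, 100.0, 150.0, 200.0])[:, None]
gains = np.array([0.8, 0.9, 1.0, 1.1, 1.2])[:, None]
raw = offsets + gains * 1000.0 * shape

corrected = msc(raw)
print("spread before:", np.ptp(raw, axis=0).max())
print("spread after :", np.ptp(corrected, axis=0).max())
```

When the spectra differ only by an offset and a gain, as in this toy case, every corrected spectrum collapses onto the reference. Real spectra also differ in analyte content, which is the variation MSC is meant to leave in place.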

The MSC water eliminated (WE) models were generally poor as seen in Figure 52a.
There were two distinct regions in the predicted versus expected plot for the high and
low concentration samples. This indicated a big difference in how the glucose
interacted with water at high and low concentrations. In the Hotelling’s T2 vs Q
residuals plot (Figure 52c), there were several samples outside the 95% confidence
limit. Low concentration water eliminated samples were excessively modified by
MSC and were cast as outliers. This behaviour suggests the involvement of
hydrophobic interactions due to grouping of the high and low water samples [296,
297]. The PC1 vs PC2 scores plot gave a linear response with decreasing water
content, confirming the different hydrophobic behaviours seen in the samples (Figure
52d). Another explanation was that as glucose concentration increased the density and
the refractive index of the sample changed. This could affect the Raman spectra, as
the low concentration samples behaved differently to the high concentration samples
after water elimination. A conclusion was drawn that accurate linear glucose models
can only be constructed over a small concentration range.
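
The two diagnostics used above can be sketched for a PCA model as follows. Hotelling's T2 measures how far a sample lies from the centre within the model plane, while the Q residual measures what the model leaves unexplained. This is a generic sketch, not the routine used in this work: the confidence limits are omitted, and the function name `t2_and_q` and the synthetic data are hypothetical.

```python
import numpy as np

def t2_and_q(X, n_components):
    """Hotelling's T2 and Q residuals of each row of X for a mean-centred PCA model."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # loadings
    T = Xc @ P                                    # scores
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    t2 = np.sum(T**2 / score_var, axis=1)
    q = np.sum((Xc - T @ P.T) ** 2, axis=1)
    return t2, q

# Two systematic components plus noise; sample 0 gets a feature outside the model.
rng = np.random.default_rng(2)
n, n_vars = 31, 100
grid = np.linspace(0.0, 2.0 * np.pi, n_vars)
X = (np.outer(rng.normal(0, 5, n), np.sin(grid))
     + np.outer(rng.normal(0, 5, n), np.cos(grid))
     + rng.normal(0, 0.01, (n, n_vars)))
X[0, 40] += 3.0

t2, q = t2_and_q(X, n_components=2)
print("sample with the largest Q residual:", int(np.argmax(q)))
```

A sample with a large Q residual, such as the modified sample here, contains variation that the retained components do not describe, which is the sense in which the low concentration water eliminated samples were flagged.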

Further analysis of the MSC WE PLS loadings (Figure 52b) showed that the major
signal was from D-glucose, with the first loading containing peaks attributed to D-
glucose and describing 85.5% of the explained variance. With MSC, the signal was
corrected to have a reduced level of scatter. Therefore the MSC WE data contained
the enhanced analyte peaks as well as the water removal artefacts. This inadvertently
introduced more noise into the data, with the second and third loadings accounting for
almost 10% of the signal variance and mainly describing the water artefacts. The
second loading (8.97%) represented the water removal artefacts present in the 1200–
2000 cm–1 region and the region beyond 3000 cm–1. The third loading described
0.95% of variance and its main contribution was the noise present beyond 3000 cm–1.

Figure 52 (a) Relationship between expected and predicted D-glucose content for WE MSC pre-
treated M1Glu sample set, (b) the loadings plots of the three PCs, (c) Hotelling’s T2 vs Q
residuals for water eliminated M1GluR2 data after MSC, and (d) the PC1 vs PC2 scores plot
with the high concentration samples in black and the low concentration samples in red.

Table 10 Calibration Evaluation for Multiplicative Scatter Correction on the M1GLUR2 data.32
M1GluR2          LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full         2    0.998   0.56          0.60           2.34
Avg ROI          2    0.997   0.71          0.75           2.92
Avg ROI (1)      3    0.999   0.35          0.39           1.52
Avg MWPLS        2    0.998   0.68          0.71           2.77
Avg MWPLS (1)    2    0.997   0.77          0.82           3.20

BC Full          2    0.998   0.55          0.59           2.30
BC ROI           2    0.998   0.55          0.60           2.34
BC ROI (1)       2    0.998   0.55          0.60           2.34
BC MWPLS         2    0.998   0.60          0.65           2.53
BC MWPLS (1)     2    0.998   0.60          0.65           2.53

WE Full          3    0.619   8.83          11.98          46.79
WE ROI           2    0.245   12.56         14.31          55.89
WE ROI (1)       3    0.494   10.17         12.57          49.10
WE MWPLS         2    0.244   12.55         14.26          55.70
WE MWPLS (1)     3    0.454   10.57         12.30          48.04

32 ROI (1) and MWPLS (1) signify that pre-processing was carried out on the reduced area of the
spectra. No bracket represents area selection on data pre-processed on the full range. For ease of
interpretation of the table, the best models will be highlighted in grey.

[Figure 52 panels: (a) predicted versus expected D-glucose concentration (g/L), R2 = 0.619, 3 latent variables, RMSEC = 8.8302, RMSECV = 11.982; (b) loadings versus wavenumber (cm-1), L1 (85.5%), L2 (8.97%), L3 (0.95%); (c) Hotelling's T2 versus Q residuals; (d) scores on PC 1 (85.65%) versus PC 2 (8.91%).]

3.5.3 Normalisation of M1GluR2 Data
Normalisation corrects for variation in signal intensity due to experimental setup.
Four different modes of normalisation were assessed here for enhancement of
calibration performance over the full and reduced ranges. Figure 53 shows the
spectral profiles of the normalised data using the different methods (Norm1, Norm2,
NormINF and NormOH; see section 2.6.4).
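
A minimal sketch of three of these modes is given below, assuming that Norm1, Norm2 and NormINF correspond to dividing each spectrum by its 1-norm (unit area), 2-norm (unit length) and infinity-norm (unit maximum) respectively. The definitions belong to section 2.6.4 and are not repeated here, so this mapping is an assumption; NormOH is left out because its definition cannot be inferred from this section. The function name `normalise` is hypothetical.

```python
import numpy as np

def normalise(spectra, order):
    """Divide each spectrum (row) by its vector norm of the given order."""
    X = np.asarray(spectra, dtype=float)
    return X / np.linalg.norm(X, ord=order, axis=1, keepdims=True)

# The second spectrum is the first one recorded at twice the intensity.
spectra = np.array([[1.0, 2.0, 3.0, 4.0],
                    [2.0, 4.0, 6.0, 8.0]])

norm1 = normalise(spectra, 1)            # unit sum of absolute intensities
norm2 = normalise(spectra, 2)            # unit vector length
norm_inf = normalise(spectra, np.inf)    # unit maximum absolute intensity
print(norm_inf)
```

After any of the three modes the two rows are identical, which is the intended effect: a pure intensity difference between measurements is removed while the band shape is kept.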

Figure 53 Effects of normalisation on M1GluR2 data (a) after NORM1 pre-processing, (b) after
NORM2 pre-processing, (c) after NORMINF pre-processing, and (d) after NORMOH pre-
processing.

Normalisation improved the models built using the averaged raw data (Table 9). The
calibration results (Table 11-Table 14) showed that the best model was obtained using
Norm2 which gave a correlation co-efficient of 0.999. Model performances only
varied slightly with the different normalisation methods, with Norm1 being the
weakest. Figure 54a showed the PLS loadings for the raw (averaged) M1GluR2 data
after Norm2 pre-processing; similar loadings resulted from the other normalisation
methods. The first loading revealed, as expected, that the water signal was the
dominant feature in the data while the second loading was the D-glucose signal and
described less than 0.1% of the total variance.

Figure 54 (a) shows the loadings plots of the two principal components from raw (averaged)
M1GluR2 with Norm2 pre-processing and (b) the Water Eliminated M1GluR2 spectra with
Norm 2 pre-processing.

Figure 54b showed the water eliminated spectra with Norm2 pre-processing. The
drawback of normalising water eliminated data was the increased noise artefacts and
baseline offset. These had adverse implications for the calibration modelling. The
normalisation methods used were not able to handle the water eliminated data (Figure
55a). After water elimination, three components described the data, see Figure 55b.
The first and second loadings represented the D-glucose signal. The difference in
these loadings showed the different levels of interaction between the water and the D-
glucose at high (12.8–49.6 g/L) and low (1.6–19.2 g/L) concentrations. The
normalised and MSC data behaved in a similar fashion. All the WE spectra suffered
from an offset due to the changing concentrations and differing optical properties of
the sample (Figure 54b). The third loading represented the signal from the low
concentration samples and noise. Because the water signal was so strong in the low
concentration samples, the water elimination had a bigger impact and the resulting
spectra had more noise than detail compared to the high concentration samples. This
was seen in the data where the first four samples were noisy and the D-glucose signal
came through in the fifth sample (Figure 56a). In the Hotelling’s T2 vs Q residuals
plot (Figure 55c), the low concentration samples showed more unexplained variance
than the high concentration samples. The variance amongst the samples was also seen
in the scores plot (Figure 55d), where the high concentration (black) samples were
clustered together while the low concentration (red) samples were dispersed due to
their increased variability. It may then be concluded that normalisation was adequate
when the peak positions remained relatively constant across the samples. However,
when specific changes affected the spectra, such as the noise laden spectra of the low
concentration samples, normalisation failed.

[Figure 54 plots: (a) loadings versus wavenumber (cm-1), L1 (99.89%) and L2 (0.09%); (b) intensity versus wavenumber (cm-1).]

Figure 55 (a) Relationship between expected and predicted D-glucose content for WE Norm2 pre-
treated M1Glu sample set, (b) the loadings spectrum of the three PCs, (c) Hotelling’s T2 vs Q
residuals for H2O Eliminated M1GluR2 data after Norm2, (d) the scores plot for PC1 vs PC2,
with the high concentration samples in black and the low concentration samples in red.

Figure 56 Water Eliminated M1GluR2 spectra with Norm2 pre-processing (a) first five samples
and (b) a selection of samples from M1GluS01 to M1GluS31.

[Figure 55 panels: (a) predicted versus expected D-glucose concentration (g/L), R2 = 0.702, 3 latent variables, RMSEC = 7.8161, RMSECV = 9.7818; (b) loadings versus wavenumber (cm-1), L1 (90.99%), L2 (5.82%), L3 (0.98%); (c) Hotelling's T2 versus Q residuals; (d) scores on PC 1 (91.01%) versus PC 2 (5.95%).]

[Figure 56 panels: intensity versus wavenumber (cm-1); (a) samples M1S01 to M1S05; (b) samples M1S01, M1S05, M1S10, M1S15, M1S20 and M1S30.]

Table 11 Calibration Evaluation for Normalisation on the M1GluR2 data using Norm1.32
M1GluR2                LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full Norm1         2    0.995   0.99          1.09           4.25
Avg ROI Norm1          2    0.998   0.60          0.63           2.46
Avg ROI Norm1 (1)      2    0.996   0.85          0.91           3.55
Avg MWPLS Norm1        2    0.997   0.74          0.78           3.04
Avg MWPLS Norm1 (1)    2    0.994   1.12          1.20           4.68

BC Full Norm1          3    0.997   0.78          0.90           3.51
BC ROI Norm1           3    0.997   0.84          0.96           3.75
BC ROI Norm1 (1)       3    0.934   3.69          4.15           16.21
BC MWPLS Norm1         3    0.997   0.80          0.91           3.55
BC MWPLS Norm1 (1)     3    0.925   3.93          4.41           17.22

WE Full Norm1          3    0.692   7.94          9.87           38.55
WE ROI Norm1           4    0.726   7.49          9.64           37.65
WE ROI Norm1 (1)       4    0.664   8.29          9.51           37.14
WE MWPLS Norm1         5    0.826   5.97          10.26          40.07
WE MWPLS Norm1 (1)     4    0.684   8.04          10.43          40.74

Table 12 Calibration Evaluation for Normalisation on the M1GluR2 data using NormOH. 32
M1GluR2                 LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full NormOH         2    0.996   0.84          0.93           3.63
Avg ROI NormOH          2    0.998   0.62          0.65           2.53
Avg ROI NormOH (1)      2    0.998   0.62          0.65           2.53
Avg MWPLS NormOH        3    0.999   0.44          0.49           1.91
Avg MWPLS NormOH (1)    3    0.999   0.44          0.49           1.91

BC Full NormOH          3    0.994   1.06          1.22           4.76
BC ROI NormOH           3    0.995   0.98          1.13           4.41
BC ROI NormOH (1)       3    0.995   0.98          1.13           4.41
BC MWPLS NormOH         3    0.995   0.97          1.12           4.37
BC MWPLS NormOH (1)     3    0.995   0.97          1.12           4.37

Table 13 Calibration Evaluation for Normalisation on the M1GluR2 data using Norm2. 32
M1GluR2                LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full Norm2         2    0.995   1.02          1.12           4.37
Avg ROI Norm2          2    0.999   0.43          0.45           1.75
Avg ROI Norm2 (1)      2    0.996   0.85          0.91           3.55
Avg MWPLS Norm2        2    0.999   0.53          0.56           2.18
Avg MWPLS Norm2 (1)    2    0.994   1.12          1.21           4.72

BC Full Norm2          3    0.999   0.42          0.49           1.91
BC ROI Norm2           3    0.999   0.45          0.50           1.95
BC ROI Norm2 (1)       3    0.966   2.65          3.00           11.71
BC MWPLS Norm2         3    0.999   0.37          0.41           1.60
BC MWPLS Norm2 (1)     3    0.965   2.69          3.05           11.91

WE Full Norm2          4    0.822   5.45          7.22           28.20
WE ROI Norm2           3    0.664   7.49          9.28           36.25
WE ROI Norm2 (1)       5    0.822   5.44          8.57           33.47
WE MWPLS Norm2         5    0.842   5.14          7.41           28.94
WE MWPLS Norm2 (1)     5    0.834   5.25          8.02           31.32

Table 14 Calibration Evaluation for Normalisation on the M1GluR2 data using Norm INF. 32
M1GluR2                  LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full NormINF         2    0.995   1.03          1.13           4.41
Avg ROI NormINF          2    0.997   0.71          0.75           2.92
Avg ROI NormINF (1)      2    0.996   0.89          0.96           3.75
Avg MWPLS NormINF        2    0.999   0.53          0.56           2.18
Avg MWPLS NormINF (1)    2    0.995   1.05          1.13           4.41

BC Full NormINF          3    0.999   0.41          0.47           1.83
BC ROI NormINF           3    0.999   0.47          0.52           2.03
BC ROI NormINF (1)       3    0.995   0.98          1.12           4.37
BC MWPLS NormINF         3    0.999   0.36          0.40           1.56
BC MWPLS NormINF (1)     3    0.995   0.97          1.12           4.37

WE Full NormINF          4    0.781   6.05          9.32           36.40
WE ROI NormINF           8    0.860   4.83          8.54           33.35
WE ROI NormINF (1)       6    0.896   4.16          8.47           33.08
WE MWPLS NormINF         5    0.819   5.50          9.42           36.79
WE MWPLS NormINF (1)     5    0.797   5.82          9.07           35.42

3.5.4 Derivative Pre-Processing of M1GluR2 Data
The first order derivative smoothed and resolved peaks in complex spectral profiles. It
also caused the spectral effects of baseline offset and slopes to diminish. Here
derivative pre-processing of the data was performed using the Savitzky–Golay
algorithm. The settings chosen were first order derivative with a filter width of eleven
and a polynomial order of three. In Figure 57 the averaged Raman spectra of the
M1Glu sample and the spectral profiles after first order derivative pre-processing of
the different regions (full, ROI and MWPLS) are compared. The first order derivative
spectra of M1GluR2 contained a large peak above 2900 cm–1, with multiple smaller
positive and negative peaks in the fingerprint region (400–1800 cm–1), indicative of
the D-glucose signal.
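
The Savitzky–Golay first derivative with the settings quoted above (filter width eleven, polynomial order three) can be sketched directly from its definition: a cubic is fitted by least squares in each eleven-point window and its slope at the window centre is taken. The sketch below is not the routine used in this work. It returns only the points where the full window fits, so the first and last five points are dropped rather than extrapolated, and the function names are hypothetical. SciPy offers an equivalent library routine, `scipy.signal.savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=spacing)`, which additionally handles the ends.

```python
import numpy as np

def savgol_derivative_weights(width=11, polyorder=3):
    """Filter weights giving the slope at the centre of a least-squares polynomial fit."""
    half = width // 2
    z = np.arange(-half, half + 1, dtype=float)
    A = np.vander(z, polyorder + 1, increasing=True)   # columns 1, z, z^2, z^3
    # Row 1 of the pseudo-inverse maps the window values to the linear coefficient.
    return np.linalg.pinv(A)[1]

def savgol_first_derivative(y, width=11, polyorder=3, spacing=1.0):
    """First derivative of y on an evenly spaced axis; the ends are dropped."""
    weights = savgol_derivative_weights(width, polyorder)
    return np.correlate(np.asarray(y, dtype=float), weights, mode="valid") / spacing

# Check on a cubic, for which the windowed fit is exact.
x = np.arange(50) * 0.5
y = x**3 - 2.0 * x
d = savgol_first_derivative(y, width=11, polyorder=3, spacing=0.5)
expected = 3.0 * x[5:-5] ** 2 - 2.0
print("max error on a cubic:", np.max(np.abs(d - expected)))

# A constant baseline offset has no effect on the derivative.
d_offset = savgol_first_derivative(y + 1000.0, width=11, polyorder=3, spacing=0.5)
print("max change from a 1000-count offset:", np.max(np.abs(d - d_offset)))
```

The second check illustrates why derivative pre-processing removes baseline offsets: the filter weights sum to zero, so any constant added to the spectrum is cancelled.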

Figure 57 Effects of first order derivative pre-processing on M1GluR2 data, (a) Raw spectra, (b)
after processing in the full range (c) in the ROI range and (d) in the MW range.

Thus far the WE data only worked without further pre-processing. MSC and
normalisation increased artefacts produced by the water elimination dividing the
dataset into two populations of samples (high and low concentration). However the
first order derivative handled the water eliminated spectra as the baseline offset was
corrected and the scatter was reduced. When comparing the first derivative of the
averaged data (Figure 57b) to the water eliminated data (Figure 58), the OH bending
band (1640 cm–1) was clearly removed. In addition, the OH stretch above 3000 cm–1
left some noise, but region selection avoided interference from that spectral region.

The first order derivative data generated reasonable PLS models, however the models
generated using MSC and Norm2 pre-processing were better. Table 15 shows
consistent values for the PLS models; models for the region selection were virtually
the same and showed a slight improvement on the full range models.33

Table 15 Calibration Evaluation for first order derivative (FD) on the M1GluR2 data 32
M1GluR2                LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg FD13 Full          2    0.994   1.06          1.17           4.57
Avg FD13 ROI           3    0.997   0.84          0.98           3.82
Avg FD13 ROI (1)       3    0.997   0.84          0.99           3.86
Avg FD13 MWPLS         3    0.996   0.84          0.99           3.86
Avg FD13 MWPLS (1)     3    0.996   0.85          0.99           3.86

BC FD13 Full           2    0.994   1.06          1.16           4.53
BC FD13 ROI            3    0.997   0.84          1.00           3.90
BC FD13 ROI (1)        3    0.997   0.84          0.99           3.86
BC FD13 MWPLS          3    0.996   0.85          1.00           3.90
BC FD13 MWPLS (1)      3    0.996   0.85          1.00           3.90

WE FD13 Full           5    0.999   0.54          0.92           3.59
WE FD13 ROI            3    0.998   0.62          0.97           3.78
WE FD13 ROI (1)        3    0.998   0.63          0.99           3.86
WE FD13 MWPLS          4    0.999   0.51          0.99           3.86
WE FD13 MWPLS (1)      3    0.998   0.61          0.98           3.82

Figure 58 First order derivative pre-processing on the water eliminated M1GluR2 Raman
spectra.

33 The settings used in Figure 50 are not the same as those for FD modelling and the end of range
effects are less severe with the FD setting of filterwidth of eleven and polynomial order of three.

3.5.5 MSC-FD and FD-MSC Pre-Processing of M1GluR2 Data
The next logical step was to consider the combination of multiple pre-processing
methods and how these might improve model quality in terms of RMSEC/RMSECV.
The pre-processing combination was based on the performance of the individual
method models. The combination of MSC and FD was investigated as their individual
models could be improved34 and also these methods complement each other. MSC is a
signal and scatter correction method and FD is a signal correction method. Together
they can remove baseline shift and additive effects. Considering the large impact that
baseline had on the data, the use of first derivative pre-processing alone eliminated the
baseline. However, a drawback of first derivative was that the artefacts generated by
smoothing and filtering had increased the noise. In order to prevent this, MSC was
used to increase the signal to noise ratio following derivative pre-processing.
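
The FD-then-MSC sequence described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the implementation used for the models in
this chapter: the derivative is a plain central difference (no smoothing filter), the MSC
reference is assumed to be the mean spectrum, and the function names and toy spectra
are hypothetical.

```python
from statistics import mean


def first_derivative(spectrum):
    """Central-difference first derivative (one-sided at the two end points)."""
    n = len(spectrum)
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((spectrum[hi] - spectrum[lo]) / (hi - lo))
    return deriv


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference).

    Each spectrum x is regressed on the reference r (x ~ a + b*r) and
    corrected as (x - a) / b.
    """
    n_points = len(spectra[0])
    reference = [mean(s[j] for s in spectra) for j in range(n_points)]
    r_bar = mean(reference)
    ss_r = sum((r - r_bar) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_bar = mean(x)
        b = sum((r - r_bar) * (xi - x_bar) for r, xi in zip(reference, x)) / ss_r
        a = x_bar - b * r_bar
        corrected.append([(xi - a) / b for xi in x])
    return corrected


def fd_msc(spectra):
    """First derivative followed by MSC."""
    return msc([first_derivative(s) for s in spectra])


# Toy data: one band shape with different scale factors and constant offsets.
band = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
raw = [[2.0 * v + 100.0 for v in band], [0.5 * v + 300.0 for v in band]]
processed = fd_msc(raw)
```

In this toy case the derivative removes the constant offsets and MSC removes the
remaining scale difference, so the two processed spectra coincide.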

Figure 59 Effects of FD-MSC (left) and MSC-FD (right) pre-processing on the M1GluR2 data in
the 707–1853 cm–1 region, where red is pre-processing after area selection and blue is
pre-processing before area selection.
[Figure 59 plots: Intensity versus Wavenumber (cm–1); legend entries FSTMSCROI(1) and
FSTMSCROI (left), MSCFSTROI(1) and MSCFSTROI (right).]

Both sequences (MSC-FD and FD-MSC) of pre-processing were investigated (see
Table 16 and Table 17). FD-MSC outperformed MSC-FD, leading to the best model.
Therefore, for all remaining data analysis, FD-MSC was used. The two sequences
responded differently to region selection: FD-MSC performed poorly when applied
after area selection, while MSC-FD was less affected by area selection. In the
FD-MSC spectra (Figure 59), the OH bending band (1640 cm–1) was the main
difference between pre-processing before and after area selection. With
pre-processing after area selection, more variance caused by this band
resulted in the weaker calibration models, due to the strong influence that water had
on the data.

34 The best model thus far was from Norm2 averaged data in the region 707–1853 cm–1.

The WE process proved to be problematic with FD-MSC and MSC-FD (Table 16 and
Table 17). As in the case of the MSC and normalisation methods, after WE, the signal
was inundated with too much noise to be useful, since the low concentration samples
were largely composed of water. The low concentration samples were altered making
them different from high concentration samples (Figure 60 a and c). From the spectra
for M1GluS01 and M1GluS31 and the loadings (Figure 60 b and d), it was clear that
the low concentration sample signal was laden with noise while the higher
concentration sample reflected the D-glucose signal matching the first loadings signal.
The first loading explained 66.95% of the explained variance; this was much lower
compared to good correlation models (e.g. Figure 54a, with 99.99%). Overall the
noise artefacts generated in the low concentration samples prevented a correlation
between the Raman signal and D-glucose concentration.

Figure 60 (a) Hotelling’s T2 vs Q residuals, (b) M1GluS01 and M1GluS31 spectra for WE
M1GluR2 data after FD-MSC, (c) the scores plot for PC1 vs PC3, with the high concentration
samples marked black and the low concentration samples marked red, and (d) the loadings
spectra for the four PCs.
[Figure 60 plots: (a) Hotelling T^2 (93.64%) versus Q Residuals (6.36%); (b) Intensity versus
Wavenumber (cm–1) for M1S01 and M1S31; (c) Scores on PC 3 (8.77%) versus Scores on PC 1
(67.48%); (d) Loadings versus Wavenumber (cm–1), with L1 (66.95%), L2 (17.15%), L3 (1.71%)
and L4 (7.84%).]

Table 16 Calibration Evaluation for MSC-FD on the M1GluR2 data.32
M1GluR2                 LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg MSCFD Full          2    0.999   0.48          0.53           2.07
Avg MSCFD ROI           2    0.999   0.38          0.40           1.56
Avg MSCFD ROI (1)       2    0.999   0.39          0.42           1.64
Avg MSCFD MWPLS         2    0.999   0.38          0.40           1.56
Avg MSCFD MWPLS (1)     3    0.998   0.59          0.69           2.69
BC MSCFD Full           2    0.999   0.44          0.49           1.91
BC MSCFD ROI            2    0.999   0.35          0.38           1.48
BC MSCFD ROI (1)        2    0.998   0.68          0.72           2.81
BC MSCFD MWPLS          2    0.999   0.35          0.38           1.48
BC MSCFD MWPLS (1)      2    0.997   0.76          0.80           3.12
WE MSCFD Full           5    0.773   6.81          12.98          50.70
WE MSCFD ROI            3    0.647   8.50          11.28          44.06
WE MSCFD ROI (1)        3    0.518   9.85          11.71          45.74
WE MSCFD MWPLS          2    0.404   11.04         11.74          45.85
WE MSCFD MWPLS (1)      2    0.353   11.53         12.56          49.06

Table 17 Calibration Evaluation for FD-MSC on the M1GluR2 data.32
M1GluR2                 LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg FDMSC Full          2    0.999   0.38          0.42           1.64
Avg FDMSC ROI           2    0.999   0.32          0.35           1.36
Avg FDMSC ROI (1)       3    0.971   2.44          2.89           11.28
Avg FDMSC MWPLS         2    0.999   0.32          0.34           1.32
Avg FDMSC MWPLS (1)     2    0.944   3.38          3.70           14.45
BC FDMSC Full           2    0.999   0.38          0.42           1.64
BC FDMSC ROI            2    0.999   0.32          0.35           1.36
BC FDMSC ROI (1)        3    0.971   2.44          2.89           11.28
BC FDMSC MWPLS          2    0.999   0.32          0.35           1.36
BC FDMSC MWPLS (1)      2    0.944   3.38          3.70           14.45
WE FDMSC Full           4    0.745   7.22          12.71          49.64
WE FDMSC ROI            2    0.391   11.16         11.99          46.83
WE FDMSC ROI (1)        4    0.723   7.53          10.79          42.14
WE FDMSC MWPLS          2    0.391   11.16         11.95          46.67
WE FDMSC MWPLS (1)      2    0.359   11.45         12.07          47.14

3.6 Outcomes from M1Glu Data Analysis
The glucose in water model was used to evaluate which data collection, data
pre-processing and PLS modelling conditions were most important for accurate
prediction of concentration using Raman spectroscopy. This investigative study
showed that it was possible to precisely quantify D-glucose in water at concentrations
typical of those used in cell culture (REP < 1.5%). The two main issues to overcome in
order to achieve this result were the strong water band and the baseline offset in the
Raman signal.
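
As a worked check on how the REP figures relate to RMSECV, the sketch below assumes
that REP is the RMSECV expressed as a percentage of the mean reference concentration.
That definition is an assumption here (it is not restated in this section), but it is
consistent with the tabulated values: in Table 15 the ratio RMSECV/REP implies a mean
of about 25.6 g/L for the M1GluR2 set. The function name is hypothetical.

```python
def rep_percent(rmsecv, mean_concentration):
    """Relative error of prediction as a percentage of the mean reference value."""
    return 100.0 * rmsecv / mean_concentration


# Back-calculate the mean concentration implied by one Table 15 row
# (Avg FD13 Full: RMSECV = 1.17 g/L, REP = 4.57%).
implied_mean = 100.0 * 1.17 / 4.57

# Apply it to another row (Avg FD13 ROI: RMSECV = 0.98 g/L, tabulated REP = 3.82%).
rep_roi = rep_percent(0.98, implied_mean)
```

Under this assumption the second row is reproduced to within rounding of the tabulated
figures.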

To deal with the strong water signal, WE was implemented and worked up to a point.
However, the WE method also introduced artefacts which highlighted differences
between high and low concentration samples. These artefacts were eliminated with a
simple first derivative, but not with MSC or normalisation, for which two linear
ranges emerged for high and low concentration samples: the first range (samples 1 to
12) covered 1.6 g/L to 19.0 g/L and the second range started from sample 8, covering
12 g/L to 50 g/L.

The baseline offset had a negative impact on calibration modelling. Through changes
to the experimental setup and pre-processing, a reduction in spectral variance and an
improved D-glucose signal were observed. The best pre-processing was FD-MSC.
The FD step removed the baseline offset and also smoothed and resolved the
peaks. MSC was then applied to correct the remaining offset, scatter and
baseline shift, as well as the derivative artefacts.

The best performing models are shown in Table 18. It was clear that for each model
the performance was improved by region selection, which removed the influence of
the large OH band. Overall, the best calibration model was built on the averaged data,
with further pre-processing by FD and MSC, before finally being reduced to the best
performing region of 800–1680 cm–1. This model was referred to as M1Glu AVG FD-
MSC MW.
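
Region selection of this kind amounts to keeping only the spectral points inside a
wavenumber window. The sketch below is illustrative only: the function name is
hypothetical, and the 10 cm–1 axis spacing and dummy intensities are assumptions chosen
for the example rather than the instrument's actual sampling.

```python
def select_region(wavenumbers, intensities, low, high):
    """Keep only the points whose wavenumber lies in [low, high] (cm-1)."""
    kept = [(w, y) for w, y in zip(wavenumbers, intensities) if low <= w <= high]
    return [w for w, _ in kept], [y for _, y in kept]


# Hypothetical axis: 250 to 3310 cm-1 in 10 cm-1 steps, with dummy intensities.
wavenumbers = list(range(250, 3311, 10))
intensities = [float(i) for i in range(len(wavenumbers))]

# Reduce to the 800-1680 cm-1 window, which excludes the OH band above 3000 cm-1.
roi_w, roi_y = select_region(wavenumbers, intensities, 800, 1680)
```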


Table 18 The optimal M1GluR2 models generated after the different pre-processing methods.
M1GluR2                                          LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
BC MW (800–1680 cm–1)                            3    0.998   0.62          0.72           2.81
AVG ROI (707–1853 cm–1), MSC pre-processing      3    0.999   0.35          0.39           1.52
AVG ROI (707–1853 cm–1), Norm2 pre-processing    2    0.999   0.43          0.45           1.75
AVG ROI (707–1853 cm–1), FD pre-processing       3    0.997   0.84          0.98           3.82
AVG MW (800–1680 cm–1), FD-MSC pre-processing    2    0.999   0.32          0.34           1.32

3.7 Quantification of D-glucose in a ternary mixture (M3Glu-Data)
The feasibility of using Raman with dilute solutions to extract relevant information
with an acceptable level of accuracy (~2% REP) was studied [4]. For the M3Glu
sample set35, the total dissolved analyte concentration ranged from 1.74 to 15.06 g/L
and was thus a dilute cell culture medium. The rationale for using a dilute cell culture
medium was that dilute solutions may generate better quality spectra for both
fluorescence and SERS measurements in quantification of the complex ingredients
(eRDF and YE). It was therefore decided to investigate whether it was possible to
accurately quantify the glucose content in these dilute media. In practice, the goal
might be to take a medium sample, dilute it, and then perform fluorescence,
conventional Raman and SERS all on the same sample. The second goal of the
M3Glu study was to assess the effect of spectral overlap when trying to calibrate the
D-glucose concentration in the presence of a strong water background signal and
multiple similar components, e.g. L-glutamine and D-galactose (Figure 61).

35 Table 3 Composition of the M3Glu


Figure 61 Overlay of the Raman spectra of solid D-glucose, L-glutamine and D-galactose (λex 785
nm). These spectra were collected as single scan (10 second exposure) from 250–3311 cm–1 with 8
cm–1 resolution.
3.7.1 Spectral Analysis of M3Glu Data
The averaged, baseline corrected and water eliminated M3GluR2 spectra are
displayed with the averaged M1GluR2 spectrum for comparison (Figure 62). With
these low concentration M3Glu samples, the analyte signal was weak. Any overlap
between the analytes (glucose, galactose and glutamine) was eclipsed by the water
signal in the data. A water analyte ratio (WAR) of 11.37 for the M3Glu data was
observed compared to the WAR of 9.35 in the M1Glu data. The higher WAR
signified the larger water signal within the weaker M3Glu dataset.

[Figure 61 plot: Intensity versus Wavenumber (cm–1); legend: D-Glucose, D-Galactose,
L-Glutamine.]

Figure 62 (a) Averaged Raman spectra of the M1GluR2 data, (b) averaged Raman spectra of the
M3GluR2 data, (c) baseline corrected Raman spectra of the M3GluR2 data, and (d) water
eliminated Raman spectra of the M3GluR2 data

Similar to the M1Glu data, a sloping baseline, baseline offset and large water signal
were characteristic of the M3Glu data. The baseline offset effects were increased by
the low analyte concentrations of the M3Glu samples as seen in Figure 62a and b.
After water elimination, a more detailed spectrum was generated with distinguishable
bands and water elimination artefacts. The artefacts were seen as the increased
baseline offset (1200–2100 cm–1) and the large noise signal seen above 3000 cm–1.
The M1Glu data analysis showed that these artefacts had a negative impact on the
modelling ability and limited useful spectral ranges. Water elimination was then
tested for M3Glu samples to verify if the same outcome occurred with more complex
data.
3.7.2 Reproducibility
When PCA was performed on the individual data collections, it did not
reveal any outliers and only two components were needed to describe the M3Glu data.
The first component represented the water signal and the second component contained
analyte signal. The M3Glu data showed a similar pattern to the M1Glu data for the
amalgamated sample sets (Figure 63). The second and third data collections were
close together while the first was separated due to the sampling setup change.

[Figure 62 plots: Intensity versus Wavenumber (cm–1); panels (a) M1 Raw Data, (b) M3 Raw
Data, (c) M3 BC Data, (d) M3 WE Data.]

Figure 63 PCA Scores and Loadings plots for triplicate measurements (a)/(b) for averaged raw
M3Glu samples and (c)/(d) for the FDMSC M3Glu samples. Run 1 is black, Run 2 is red, and
Run 3 is green.

The impact of water and spectral offset was evident in the loadings (Figure 63b). For
example, in the M3GluL3 data, there was a severe baseline slope, a large downward
peak at the water bending band (1640 cm–1) and small analyte peaks. Their visibility
was hampered by the low analyte concentration, baseline slope and the large water
signal. After FD-MSC pre-processing (Figure 63c and d), the deviations caused by
scatter effects were dealt with and the number of loadings describing the media was
reduced to two, where the second loading described the D-glucose signal. This
showed that the data can be adequately corrected for quantitative analysis by
chemometric pre-processing.

[Figure 63 plots: (a) M3 Scores, Scores on PC 2 (0.30%) versus Scores on PC 1 (99.42%);
(b) M3 Loadings versus Wavenumber (cm–1), with L1 (99.42%) and L2 (0.30%); (c) M3_FDMSC
Scores, Scores on PC 3 (0.01%) versus Scores on PC 2 (3.28%); (d) M3_FDMSC Loadings versus
Wavenumber (cm–1), with L1 (96.7%), L2 (3.28%) and L3 (0.01%).]

3.7.3 Quantitative Analysis: Calibrating D-glucose in M3Glu Data
To determine which model best estimated the D-glucose content in media samples, a
wide variety of models were assessed (Table 65–Table 69). The best M3Glu models
for the different pre-processing methods are shown in Table 19.

Table 19 The best performing M3Glu models generated after the different pre-processing
methods.
M3Glu Data                                         LV   Correlation Coefficient   RMSEC (g/L)   RMSECV (g/L)   REP%
WE Data MW (800–1680 cm–1)                         4    0.990                     0.28          0.38           7.65
BC MW (800–1680 cm–1), MSC pre-processing          4    0.993                     0.25          0.30           6.04
BC MW (800–1680 cm–1), Norm2 pre-processing        4    0.993                     0.25          0.32           6.44
BC FD MW (800–1680 cm–1), FD pre-processing        3    0.993                     0.25          0.29           5.84
BCFDMSCMW (800–1680 cm–1), FD MSC pre-processing   3    0.995                     0.20          0.23           4.63

When compared with the best M1Glu models (Table 18), there was a decline in the
calibration model quality. This was the result of the lower analyte concentrations in
these samples. Overall, the baseline corrected spectra suited the calibration models
and the best model also used FD-MSC pre-processing, which aided the model by
resolving the peaks and removing baseline offset to improve the analyte signal. As with
the M1Glu data, water elimination on the M3Glu data only worked with first derivative
pre-processing.

For each pre-processing method, the best models used the 800–1680 cm–1 range as the
water bending band at 1640 cm–1 acted like an internal reference. The 1640 cm–1
water band remained steady, unlike the strong OH band above 3000 cm–1 which was
affected by detector limitations, shot noise and sampling effects, causing greater
variation. In these solutions, which were very dilute compared to the M1Glu data, the
strong water signal was a benefit. The water signal showed little change and thus acted
as an internal standard. This region also contained several peaks (915, 1059, 1123, 1372
and 1460 cm–1) related to the D-glucose signal. The variation in the D-glucose signal was
self-referenced to the stable water signal, allowing for estimation of the D-glucose
concentration [98, 131].
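
The self-referencing idea can be illustrated with a simple band ratio. This is only a
univariate sketch of the principle: the models in this chapter are multivariate PLS
models over the whole region, not explicit ratios, and the function name and the
intensity values below are hypothetical.

```python
def band_ratio(wavenumbers, intensities, analyte_band, reference_band):
    """Ratio of the intensity nearest analyte_band to that nearest reference_band."""
    def nearest(target):
        idx = min(range(len(wavenumbers)), key=lambda i: abs(wavenumbers[i] - target))
        return intensities[idx]
    return nearest(analyte_band) / nearest(reference_band)


# Hypothetical three-point spectra: the same sample measured at two overall
# signal levels (for example, after a drift in laser power).
wavenumbers = [1123, 1372, 1640]
run_a = [120.0, 80.0, 1000.0]
run_b = [60.0, 40.0, 500.0]

# Glucose band at 1123 cm-1 referenced to the water bending band at 1640 cm-1.
ratio_a = band_ratio(wavenumbers, run_a, 1123, 1640)
ratio_b = band_ratio(wavenumbers, run_b, 1123, 1640)
```

Because both bands scale together, the ratio is unchanged by the overall intensity
change, which is the sense in which a stable water band can act as an internal
reference.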

The best overall model was obtained after FD-MSC was applied to baseline corrected
data and when the spectral range was reduced to 800–1680 cm–1. Three latent
variables were necessary to model the data. The first loading (98.53%) represented the
large water signal within the M3Glu data (Figure 64). Figure 65 compares the
water spectrum before and after FDMSC pre-processing. The second loading
accounted for 1.16% of the explained variance and was the analyte signal. The third
loading (0.10%) was unresolved analyte signal and spectral noise.

Figure 64 The BC M3Glu calibration model is built using FDMSC pre-treated data in the 800-
1680cm–1 region, (left) the predicted versus expected plot and (right) the three latent variables
loadings used in the calibration model.

This model showed that for the more complex and dilute media such as M3Glu, one
could use Raman together with chemometrics to estimate the D-glucose concentration
with reasonable accuracy (REP of 4.63%). The same pre-processing and region
selection as for the M1Glu samples was used. The next step was to determine if the
same methodology worked on the more complex M5Glu Media which contains five
media components.

[Figure 64 plots: (left) Predicted (g/L) versus Expected (g/L), annotated R2 = 0.995,
3 Latent Variables, RMSEC = 0.20568, RMSECV = 0.23945; (right) loadings versus
Wavenumber (cm–1), with L1 (98.53%), L2 (1.16%) and L3 (0.10%).]

Figure 65 Raman spectra of water before and after FDMSC pre-processing; the inset is the
707–1853 cm–1 region of the FDMSC spectrum.
3.8 Quantification of D-glucose in a quinary mixture (M5Glu-Data)
Using a recipe for media deployed within industry, a set of samples containing a fixed
concentration of eRDF, yeastolate, D-galactose and L-glutamine and a varying
concentration of D-glucose (0.0–9.92 g/L) was prepared (see section 2.12.4).
3.8.1 Spectral Analysis and Reproducibility of M5Glu Data
The M5Glu spectra showed a strong baseline offset with large water signal obscuring
a lot of the analyte signal. During the analysis of the M1Glu, M3Glu and M5Glu data,
baseline offset increased with the change to low concentration samples. The WAR for
M5Glu dataset was 9.71, which was lower than the M3Glu dataset (11.37), as the
M5Glu had a more complex sample makeup with a higher percentage of dissolved
solids.

[Figure 65 plots: Intensity versus Wavenumber (cm–1) for H2O and FDMSC-H2O, with an
inset of FDMSC-H2O over the reduced region.]

Figure 66 Averaged raw Raman spectra of the M5GluR1 data (a) and M5GluR2 (b).

For the first data collection, the averaged spectra (Figure 66a) displayed a severe
baseline offset. After baseline correction, however, the offset was removed. When
PCA was conducted on the individual data collections for M5Glu, the first data
collection showed no outliers. However, in the M5GluR2 data (Figure 66b) one sample
displayed a lower intensity than the other samples; this sample was identified as
M5GluR2S12. PCA (Figure 67) confirmed the outlier. When M5GluS12 was
measured as part of the third data collection, no outliers were present. This
outlier was therefore the result of an odd measurement due to experimental error. After
M5GluR2S12 was removed, the repeated PCA of the raw averaged
M5GluR2 data revealed that the remaining samples were within the 95% confidence
limit.

Figure 67: PCA Scores plots for averaged M5GluR2 data before and after the outlier removal
and the loadings of the PC1 and PC2 for Run 2 data after outlier has been removed.

The M5Glu PCA loadings (Figure 67) illustrated a trend similar to the one seen in all
the media data analysed so far. The first loading was dominated by the water signal
(99.98% of the explained variance) and the second revealed small peaks (426, 522,
1066, 1123, 1362 and 2898 cm–1) for only ~0.02% explained variance. This was a
result of the low analyte signal intensity compared to the water signal within the
spectra. Pre-processing of the data increased the explained variance for PC2 to 0.33%,
by enhancing the analyte signal.

[Figure 67 plots: Scores on PC 2 (0.01%) versus Scores on PC 1 (99.98%) before and after
outlier removal; loadings versus Wavenumber (cm–1), with L1 (99.98%) and L2 (0.01%).]

In the third data collection for the M5Glu spectra (Figure 68), the raw data
comprised two groups of spectra as a result of being collected over 2 days. This
grouping was caused by instrumental variation of the Raman Station, a single channel
instrument which has no internal calibration to compensate for possible power
fluctuations, which could have caused the shift in the M5Glu data. This variation was
corrected by FD-MSC pre-processing and when the FD-MSC corrected data was overlaid
for Day 1 and Day 2 (Figure 68b), the samples were distributed according to their
concentrations.

Figure 68 Averaged Raman spectra of the M5GluR3 data (left) and PCA Scores plots for
M5GluR3 data first derivative and multiplicative scatter correction (right). Red refers to day one
and black is day two.
[Figure 68 plots: (a) Intensity versus Wavenumber (cm–1); (b) Scores on PC 2 (1.04%)
versus Scores on PC 1 (98.38%), annotated Low and High.]

Figure 69 PCA Scores and loadings plots for triplicate measurements of M5Glu sample sets. Run
1 is black, Run 2 is red and Run 3 is green.

As with the M1Glu and M3Glu data, PCA on the M5Glu data showed that
the samples from the second and third runs were close together while the first data
collection was separated due to the sampling setup issue (Figure 69). Pre-processing
was unable to correct for this variance. The strong water signal was observed in the
first loading, the second loading depicted the analyte signal and the third loading
represented the offset effects caused by water at 1640 cm–1 and above 3000 cm–1.
After FD-MSC pre-processing of the M5GluR3 data, the loadings were reduced to
two: one for the water signal and the other for the analyte, since the interferences seen
in the raw data were removed.

[Figure 69 plots: (a) M5 Scores, Scores on PC 3 (0.01%) versus Scores on PC 2 (0.12%);
(b) M5 Loadings versus Wavenumber (cm–1), with L1 (99.87%), L2 (0.12%) and L3 (0.01%);
(c) M5_FDMSC Scores, Scores on PC 2 (0.86%) versus Scores on PC 1 (98.74%);
(d) M5_FDMSC Loadings versus Wavenumber (cm–1), with L1 (98.74%) and L2 (0.86%).]

3.8.2 Quantification: Glucose in M5Glu Data
Several models were built for the M5Glu data (Table 70 to Table 74) and the best
models for estimating the D-glucose concentration are listed in Table 20. From the
M1Glu and M3Glu datasets, it was observed that the reduced region of 800–1680 cm–1 and
FD-MSC pre-processing gave the best results. The same held true for the M5Glu data
(Figure 70). The best model used two latent variables: the first represented the water
signal and the second the analyte signal.

Figure 70 Predicted versus expected plots for the calibration of BC FDMSC M5Glu data in the
800-1680 cm–1 region and the loadings showing the components represented in the calibration
model.

Table 20 The best M5Glu models generated after the different pre-processing methods.
M5Glu Data                                            LV   Correlation Coefficient   RMSEC (g/L)   RMSECV (g/L)   REP (%)
BC Data (Entire region), Preliminary pre-processing   4    0.983                     0.38          0.50           8.63
BC MW (800–1680 cm–1), MSC pre-processing             4    0.990                     0.29          0.35           6.04
AVG MW (800–1680 cm–1), Norm2 pre-processing          4    0.992                     0.26          0.34           5.87
BC MW (800–1680 cm–1), FD pre-processing              3    0.985                     0.36          0.44           7.59
BC MW (800–1680 cm–1), FD-MSC pre-processing          2    0.993                     0.25          0.27           4.66

The chemometric modelling ability of Raman data to quantify D-glucose in
increasingly complex media was dependent on media complexity and concentration.
The M1Glu and M3Glu data modelling showed that the estimation of D-glucose was
possible at both high and low concentrations. The low analyte concentration reduced
the PLS model accuracy but the overall performance was still reasonable. The
quantification of glucose in M5Glu equalled that obtained for the simpler M3Glu
samples. This indicated that increasing complexity did not adversely affect the
spectral data quality and that glucose quantification should always be possible in
complex media as long as it is present in relatively high concentration. If more
samples and more replicates were used in the calibration set and if a smaller, more
appropriate (to the designed formulation) D-glucose concentration range was
employed, then it should be feasible to get a much lower REP in the 1–2% range.

[Figure 70 plots: Predicted (g/L) versus Expected (g/L), annotated R2 = 0.993, 2 Latent
Variables, RMSEC = 0.25039, RMSECV = 0.27805; loadings versus Wavenumber (cm–1), with
L1 (99.78%) and L2 (0.92%).]
3.9 Quantification of eRDF and Yeastolate in quinary mixtures (M5eRDF and M5Ye)
It was possible to use Raman spectroscopy to estimate a single simple component (D-glucose)
to a reasonable level of accuracy. It was therefore also desirable to know if
the same was possible with complex media ingredients as a whole unit within the
media formulation. The goal of this section is therefore to ascertain whether Raman
can be used to determine if the correct amount of eRDF or yeastolate was added to a
medium. Two sets of samples were prepared. Both contained D-galactose, D-glucose,
L-glutamine, eRDF and yeastolate, but for the M5eRDF sample set the concentration of
eRDF was varied, while for the M5Ye samples the yeastolate concentration was varied.
The other components were kept at a constant level.
3.9.1 Spectra Analysis of M5eRDF and M5Ye Data

Figure 71 Averaged raw Raman spectra of (a) M5Ye (0.1-1.72 g/L) and (b) M5eRDF (1-6.4g/L)
and, in red, the spectra after multiplicative scatter correction.

The raw Raman spectra for M5eRDF and M5Ye resembled water spectra with strong
baseline offset effects (Figure 71). At the low concentrations used in these samples,
the Raman signal was weak and there was little difference observed in the spectra for
M5Ye, M5eRDF and M5Glu samples (Figure 66). The WARs were similar for
M5eRDF (14.93) and M5Ye (14.64) while the WAR for M5Glu data was 9.71. The
M5Glu samples were a more concentrated sample set compared to the M5eRDF and
M5Ye samples.

[Figure 71 plots: Intensity versus Wavenumber (cm–1); panels (a) M5Ye and (b) M5eRDF.]

As a result of the weak analyte signal, multivariate analysis was required to extract
relevant analyte information. MSC pre-processing was used to remove spectral offset
and noise (Figure 71) and for the quantitative analysis the same spectral regions used
with M5Glu data were used (250–3311 cm–1, 707–1853 cm–1 and 800–1680 cm–1).
3.9.2 Quantification: eRDF in M5eRDF
The best PLS calibration models built with the M5eRDF Raman data are summarized
in Table 21. A full account of each model is available in the appendix; see section
8.3.4. The Raman data for M5eRDF was similar to the M1Glu, M3Glu and M5Glu
datasets. The M5eRDF spectra suffered from baseline offset and a strong water signal.
The same pre-processing methods worked well for the M5eRDF samples; FDMSC
removed the baseline offset efficiently and resolved the analyte peaks within the data.

Table 21 Summary of Calibration models for the M5eRDF samples using averaged Raman data.
M5eRDF                                             LV   Correlation Coefficient   RMSEC (g/L)   RMSECV (g/L)   REP%
BC Data MW (800–1680 cm–1)                         5    0.980                     0.24          0.70           18.91
Avg ROI (707–1853 cm–1), MSC pre-processing        5    0.995                     0.12          0.62           16.94
BC ROI (707–1853 cm–1), NINF pre-processing        5    0.988                     0.18          0.72           19.62
BC MW (800–1680 cm–1), Norm2 pre-processing        4    0.974                     0.27          0.77           20.81
BC ROI (707–1853 cm–1), FST11 pre-processing       3    0.975                     0.27          0.62           16.75
BC ROI (707–1853 cm–1), FST11MSC pre-processing    4    0.993                     0.14          0.59           15.94

The pre-processed M5eRDF spectra and the best calibration model are shown in
Figure 72. The best model was built using the reduced range of 707–1853 cm–1; this
eliminated the strong OH band and the sloping baseline seen below 700 cm–1 (visible
in Figure 71). The model used four latent variables; the second loading correlated with
the analyte signal and its scores showed a noisy linear correlation of 0.911 with
increasing concentration (Figure 73). The first loading represented the water signal
with 99% of the explained variance in the spectra. The remaining two loadings
described less than 1% of the explained variance (analyte signal and noise). When
compared to the second loading, these represented ~38% and ~23% of the analyte
signal and noise. The overwhelming water signal reduced their influence on the
model.
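
The ~38% and ~23% figures follow from the explained-variance values annotated on the
loadings plot for this model (LPC1 99.69%, LPC2 0.13%, LPC3 0.05%, LPC4 0.03%). The short
check below simply expresses the third and fourth loadings relative to the second; the
variable names are illustrative.

```python
# Explained variance (%) per loading, as annotated on the loadings plot.
explained = {"LPC1": 99.69, "LPC2": 0.13, "LPC3": 0.05, "LPC4": 0.03}

# Express the minor loadings relative to the second (analyte) loading.
relative_to_lpc2 = {
    name: 100.0 * explained[name] / explained["LPC2"]
    for name in ("LPC3", "LPC4")
}
```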

Figure 72 The pre-treated spectra of BC FDMSC M5eRDF data and the predicted versus
expected eRDF concentration plot for the calibration model for M5eRDF in the region of 707-
1853 cm–1.

Figure 73 M5eRDF loadings (left) and scores (right) of the second component for BC FDMSC
ROI calibration model.

Compared to the M5Glu calibration models (Table 20), where the best REP was ~5%,
the M5eRDF model had significantly lower accuracy with a REP of ~16%. The
difference in performance was the result of the high WAR of the M5eRDF compared
to the M5Glu sample set and the different numbers of samples used per sample set.36
M5Glu comprised 32 samples spanning 0.0 g/L to 9.92 g/L, while only 10 samples
were used for M5eRDF, covering the 1.0 g/L to 6.4 g/L range. The performance of the
M5eRDF model could therefore be improved by increasing the number of samples and
increasing the concentration of eRDF within the samples.

36 Measurement precision increases with more samples.

[Figure 72 plots: Intensity versus Wavenumber (cm–1); Predicted (g/L) versus Expected (g/L),
annotated R2 = 0.993, 4 Latent Variables, RMSEC = 0.14718, RMSECV = 0.59011.]
[Figure 73 plot: Loadings versus Wavenumber (cm–1), with LPC1 (99.69%), LPC2 (0.13%),
LPC3 (0.05%) and LPC4 (0.03%).]

When comparing the performance of the M5eRDF model to the M5Glu one,
over-fitting37 was evident within the data, as shown by the low RMSEC and high RMSECV.
The M5eRDF models used more latent variables on average. For the best M5Glu
models the average ratio of SECV to SEC was 1.2, while for the M5eRDF models the
average ratio was 3.5. This was another indicator that the M5eRDF data was over-fitted.
This was likely because the spectra did not change enough to be correlated with the
increase in eRDF concentration. Since eRDF is a mixture, the concentration change was
not seen as a single signal but as the product of multiple individual components
changing. This resulted in a more complex but less intense change in the Raman signal.
The Raman method worked for the M5Glu dataset not only because a single
component signal was changing, but also because the M5Glu dataset had
more samples, allowing for greater precision.
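
The ratios quoted above can be checked approximately from the tabulated values. The
sketch below assumes that "SECV to SEC" corresponds to RMSECV/RMSEC and that the
averages were taken over the models listed in Table 20 and Table 21; both points are
assumptions, and the function name is hypothetical.

```python
from statistics import mean

# (RMSEC, RMSECV) pairs in g/L, copied from Table 20 (M5Glu) and Table 21 (M5eRDF).
m5glu = [(0.38, 0.50), (0.29, 0.35), (0.26, 0.34), (0.36, 0.44), (0.25, 0.27)]
m5erdf = [(0.24, 0.70), (0.12, 0.62), (0.18, 0.72),
          (0.27, 0.77), (0.27, 0.62), (0.14, 0.59)]


def mean_cv_ratio(pairs):
    """Average RMSECV/RMSEC ratio; values well above 1 suggest over-fitting."""
    return mean(cv / cal for cal, cv in pairs)


ratio_glu = mean_cv_ratio(m5glu)
ratio_erdf = mean_cv_ratio(m5erdf)
```

With the rounded table entries the averages come out close to the quoted 1.2 and 3.5;
small differences are expected because the tables report two decimal places.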
3.9.3 Quantification: Yeastolate in M5Ye
For the quantification of yeastolate, the M5Ye calibration models generated were very weak. Raman was not sensitive enough to the low concentration changes occurring in the data (0.1–1.72 g/L). The high WAR value of 14.64 was also indicative of the very weak analyte signal for yeastolate. Some models are shown in Table 22 (footnote 38), including the best calibration model, which was not acceptable with a REP of ~38%. A model with a REP above 20–30% is unusable, and this indicated that the Raman data could not be modelled in terms of yeastolate concentration.
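
As a rough illustration of how a REP value relates to the cross-validation error, the sketch below assumes that REP is the RMSECV expressed as a percentage of the mean reference concentration. That definition is not restated in this section, and the mean of 0.91 g/L is an assumption taken as the midpoint of the 0.1–1.72 g/L range. The helper name `rep_percent` is hypothetical.

```python
def rep_percent(rmsecv: float, mean_concentration: float) -> float:
    """RMSECV expressed as a percentage of the mean reference concentration."""
    if mean_concentration <= 0:
        raise ValueError("mean concentration must be positive")
    return 100.0 * rmsecv / mean_concentration


# Assumed mean concentration: midpoint of the M5Ye range (0.1-1.72 g/L).
mean_ye = (0.1 + 1.72) / 2
for rmsecv in (0.35, 0.43):
    print(f"RMSECV = {rmsecv:.2f} g/L -> REP = {rep_percent(rmsecv, mean_ye):.2f}%")
```

Under these assumptions, RMSECV values of 0.35 and 0.43 g/L give REP values of 38.46% and 47.25%, in line with the corresponding rows of Table 22.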

Figure 74 shows a linear correlation. However, there was too much scatter and this prevented generation of an accurate model. This calibration model used three latent variables to describe the system. The first loading explained 99.65 % of variance and matched the water signal after pre-processing shown in Figure 65. The first loading for the M5Ye data matched the first loading for the M5eRDF data, as water was the major component in these media samples. After the water signal was described, only 0.24 % of variance was left to be explained by the second and third loadings. Together they represented the noisy analyte signal buried beneath the water signal, as well as shot noise. Model accuracy may be improved if a higher concentration range of yeastolate were studied and if more samples and replicates were used in the calibration set.

37 When too many latent variables are used the model essentially fits noise which is specific to the calibration set. Over-fitting is then characterised by the large RMSECV which results from the prediction of samples with their own noise pattern.
38 A full listing of the calibration models is available in the appendix, see section 8.3.5.

Table 22 Summary of model performance for the M5Ye samples using averaged Raman data
M5GLUYe                                           LV   Correlation Coefficient   RMSEC (g/L)   RMSECV (g/L)   REP%
BC Data Full (250–3311 cm–1)                      4    0.935                     0.13          0.36           39.67
BC Full (250–3311 cm–1), MSC Pre-processing       4    0.944                     0.12          0.36           39.78
BC Full (250–3311 cm–1), NINF Pre-processing      4    0.938                     0.12          0.36           40.32
BC Full (250–3311 cm–1), Norm2 Pre-processing     4    0.938                     0.12          0.36           39.56
Avg ROI (707–1853 cm–1), FD11 Pre-processing      3    0.941                     0.12          0.35           38.46
Avg ROI (707–1853 cm–1), FD11MSC Pre-processing   3    0.929                     0.13          0.43           47.25

Figure 74 Predicted versus expected concentration plot and loadings of the calibration model for FD M5Ye data in the region of 707–1853 cm–1.
[Plot annotations: R2 = 0.941, 3 latent variables, RMSEC = 0.12588, RMSECV = 0.35503; loadings LPC1 (99.65%), LPC2 (0.20%), LPC3 (0.04%).]

3.10 Model Validation
During calibration modelling, the model was internally validated using leave-one-out cross validation. However, further validation was required in order to determine whether the model was robust. Two validation methods were implemented. First, as the number of test samples was limited, a common semi-external validation was performed by splitting the available M5Glu data into training and test sets. A limitation of sample splitting validation was the similarity between the training and test sets, where common spectral features were modelled. The second and preferred validation method was external validation via an independent test set. It was more relevant than cross validation or sample splitting because the results had higher significance in predicting new samples and testing the model's robustness. [298]
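
The difference between the calibration error (RMSEC) and the leave-one-out cross-validation error (RMSECV) can be illustrated with a minimal Python sketch. It is only a stand-in: the thesis models use several latent variables on full spectra, whereas this sketch fits a straight line to a single hypothetical band intensity, and both errors are computed with a plain 1/n average. All names and data values are illustrative assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(errors):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rmsec_and_rmsecv(x, y):
    """Calibration error on all samples and leave-one-out cross-validation error."""
    intercept, slope = fit_line(x, y)
    cal_errors = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    cv_errors = []
    for i in range(len(x)):
        # Refit without sample i, then predict the sample that was left out.
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        cv_errors.append(y[i] - (b0 + b1 * x[i]))
    return rmse(cal_errors), rmse(cv_errors)


# Hypothetical data: a band intensity (arbitrary units) and a concentration (g/L).
signal = [0.9, 2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9]
conc = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rmsec, rmsecv = rmsec_and_rmsecv(signal, conc)
print(f"RMSEC = {rmsec:.3f} g/L, RMSECV = {rmsecv:.3f} g/L")
```

Because each left-out sample is predicted by a model that never saw it, the cross-validated error is at least as large as the calibration error in this least-squares setting.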
3.10.1 Prediction Performance by Sample Splitting into a Training and Test Set
The M5Glu sample set was randomly split into training and test sets of 42 and 21 samples using MATLAB. This was done 10 times and, for each new training and test combination, a new calibration model was generated, which was then used to predict the relevant test set. Each training and test combination had a slightly different concentration range. The results are summarised in Table 23. The performance of the Subset_A09 training and test subsets was highlighted in grey and the calibration and prediction plot for this model is shown in Figure 75. The steady REP values (stdev: 0.87%) showed the reproducible nature of the models. The slight changes in the concentration range of the training and test datasets were handled by the models. This indicated the reliability of the model based on internal samples.
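
The splitting procedure can be sketched as follows. The original work used MATLAB. This Python version is only an illustration of the same idea, and the function name, the fixed seed and the use of sample indices are assumptions.

```python
import random


def repeated_random_splits(n_samples, n_train, n_repeats, seed=0):
    """Return n_repeats random (train, test) partitions of the sample indices."""
    rng = random.Random(seed)  # fixed seed so the splits can be reproduced
    indices = list(range(n_samples))
    splits = []
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        train = sorted(shuffled[:n_train])
        test = sorted(shuffled[n_train:])
        splits.append((train, test))
    return splits


# 63 samples split into 42 training and 21 test samples, repeated 10 times.
splits = repeated_random_splits(n_samples=63, n_train=42, n_repeats=10)
for number, (train, test) in enumerate(splits, start=1):
    print(f"Subset {number:02d}: {len(train)} training, {len(test)} test samples")
```

Each pair of index lists would then be used to build one calibration model and to predict its own test set.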

In comparison to the M5GluR2 model (Figure 70, REP 4.66%), the error level was
higher for the validation models (avg. REP 7.10%). For these validation tests, the
samples were subject to more day-to-day variation, which did not affect the M5GluR2
model. To overcome this source of error and improve the results, the data should be
normalised prior to validation.


Figure 75 Predicted versus expected prediction plot with the calibration samples for BC FDMSC
M5Glu data using the 800–1680 cm–1 range.

Table 23 Results for the Internal Validation on 10 different subsets for the BC FDMSC M5Glu data in the 800–1680 cm–1 region.
Dataset       LV   Correlation   RMSEC (g/L)   RMSECV (g/L)   RMSEP (g/L)   REP%
Subset_A01C   3    0.991         0.28          0.34           -             6.26
Subset_A01P   3    0.984         0.28          0.34           0.38          6.99
Subset_A02C   3    0.987         0.33          0.38           -             6.99
Subset_A02P   3    0.987         0.33          0.38           0.31          5.70
Subset_A03C   3    0.990         0.28          0.33           -             6.21
Subset_A03P   3    0.991         0.28          0.33           0.44          8.28
Subset_A04C   3    0.992         0.26          0.32           -             7.40
Subset_A04P   3    0.986         0.26          0.32           0.39          9.02
Subset_A05C   3    0.986         0.32          0.37           -             7.82
Subset_A05P   3    0.991         0.32          0.37           0.32          6.76
Subset_A06C   3    0.988         0.32          0.38           -             8.29
Subset_A06P   3    0.984         0.32          0.38           0.35          7.64
Subset_A07C   3    0.989         0.30          0.35           -             6.90
Subset_A07P   3    0.984         0.30          0.35           0.43          8.48
Subset_A08C   3    0.991         0.27          0.33           -             6.58
Subset_A08P   3    0.984         0.27          0.33           0.37          7.38
Subset_A09C   3    0.987         0.30          0.37           -             7.50
Subset_A09P   3    0.989         0.30          0.37           0.30          5.89
Subset_A10C   3    0.990         0.30          0.35           -             7.11
Subset_A10P   3    0.985         0.30          0.35           0.37          7.52

[Figure 75 plot annotations: predicted versus measured concentration (g/L); R2 = 0.989, 3 latent variables, RMSEC = 0.30786, RMSECV = 0.37503, RMSEP = 0.30607.]

3.10.2 Independent Test Set Prediction
A new sample set (T5) was collected to determine the capability of the larger M5Glu dataset for the prediction of unknown samples. The samples were prepared in the same way as the M5Glu samples but at a different time and date. The composition of the samples can be seen in Materials and Methods (Table 5). The D-glucose concentration in the T5 samples ranged from 1.7 g/L to 9.8 g/L. Calibration was performed using the amalgamated data of the second and third M5Glu data collections (62 samples; see footnote 39). Modelling was also conducted on the averaged replicate data (32 samples). In addition to the amalgamated and averaged M5Glu sample sets, normalised versions of both were tested in order to minimise the intensity offset between the different data collections.
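
The normalisation used here is not specified in this section. As a hedged illustration only, the sketch below shows two common vector normalisations, scaling to unit Euclidean length and scaling to unit maximum, which may correspond to the Norm2 and NINF labels used in the tables. That mapping is an assumption, as are all names and data values.

```python
import math


def normalise(spectrum, kind="2-norm"):
    """Scale a spectrum to unit Euclidean length or to unit maximum intensity."""
    if kind == "2-norm":
        scale = math.sqrt(sum(v * v for v in spectrum))
    elif kind == "inf-norm":
        scale = max(abs(v) for v in spectrum)
    else:
        raise ValueError(f"unknown normalisation: {kind}")
    if scale == 0:
        raise ValueError("cannot normalise an all-zero spectrum")
    return [v / scale for v in spectrum]


# Two hypothetical spectra with the same shape but different overall intensity,
# as might be recorded on different days.
day1 = [1.0, 2.0, 4.0, 2.0, 1.0]
day2 = [1.5, 3.0, 6.0, 3.0, 1.5]
n1 = normalise(day1)
n2 = normalise(day2)
print(all(abs(a - b) < 1e-12 for a, b in zip(n1, n2)))
```

Normalisation of this kind removes a multiplicative intensity difference between collections. It does not remove an additive baseline offset, which would need a separate correction.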

PCA was used to evaluate the closeness between the T5/M5Glu data. The closer the
calibration set was to the prediction set, the better it was for the model and
quantitative performance. The scores plot showed that there was an overlap between
M5Glu/T5 datasets and the T5 samples were spread across the M5Glu sample set
(Figure 76).

Figure 76 Scores plot for the PCA comparison of T5 and M5Glu Data after BC FDMSC MW pre-processing.
[Axes: scores on PC 1 (98.50%) versus scores on PC 2 (1.00%); legend: M5, T5.]

39 Samples M5R2S12 and M5R2S23 were removed as outliers.

Table 24 Calibration and prediction models generated by BC FDMSC M5Glu and BC FDMSC T5 samples using the 800–1680 cm–1 range.
# of Samples / Pre-processing   Correlation Coefficient   LV   RMSEC   RMSECV   RMSEP   REP%
62                              0.982                     3    0.40    0.45     -       9.07
10                              0.980                     3    0.40    0.45     0.57    9.93
62_Normalised                   0.971                     2    0.50    0.52     -       10.48
10_Normalised                   0.990                     2    0.50    0.52     0.52    9.05

32                              0.992                     2    0.26    0.29     -       5.84
10                              0.988                     2    0.26    0.29     0.47    8.18
32_Normalised                   0.991                     2    0.27    0.29     -       5.84
10_Normalised                   0.990                     2    0.27    0.29     0.34    5.92

The external validation of the M5Glu data (Table 24) showed that the models were capable of performing predictions and produced a prediction performance equivalent to the validation model generated by sample splitting. Averaging the M5Glu data improved the model error level, while normalisation gave only a slight improvement in the averaged and amalgamated data. The averaged sample sets performed better as the variance within the data was reduced compared to the amalgamated sample sets. The lower correlation in the validation models was due to the increased variation between the calibration and prediction sets as a result of the different sample make-up and concentration ranges.

Figure 77 Predicted versus expected prediction plot for normalised BC FDMSC T5 with the M5Glu calibration samples data using the 800–1680 cm–1 range.
[Plot annotations: R2 = 0.990, 2 latent variables, RMSEC = 0.2727, RMSECV = 0.29923, RMSEP = 0.34207.]

Table 25 Prediction results for the T5 data based on the normalised BC FDMSC M5Glu model in the 800–1680 cm–1 region.
Sample No   Expected (g/L)   Predicted (g/L)   Difference (g/L)
T5Glu01     1.7              2.07              +0.37
T5Glu02     2.6              3.02              +0.42
T5Glu03     3.5              3.96              +0.46
T5Glu04     4.4              4.58              +0.18
T5Glu05     5.3              5.55              +0.25
T5Glu06     6.2              6.67              +0.47
T5Glu07     7.1              7.29              +0.19
T5Glu08     8.0              8.24              +0.24
T5Glu09     8.9              8.55              -0.35
T5Glu10     9.8              9.49              -0.31

Table 25 displays the prediction results for the T5 samples. The reasonable
performance from the M5Glu/T5 model indicated the potential of Raman
spectroscopy for modelling high concentration samples. These results showed that
prediction of the low concentration samples was worse than the higher concentration
ones. This was consistent with previously modelled M1Glu data, where better
calibration models were obtained as a result of higher concentration and stronger
signal. Therefore, adapting the Raman procedure to higher concentration ranges
would lead to an analytical tool with better accuracy for quantifying D-glucose
concentration in media.
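
The prediction error and the concentration dependence discussed above can be recomputed directly from the values in Table 25. The sketch below is illustrative. The variable names are assumptions, and the RMSEP is taken as the root mean square of the ten differences.

```python
import math

# Expected and predicted D-glucose concentrations (g/L) from Table 25.
expected = [1.7, 2.6, 3.5, 4.4, 5.3, 6.2, 7.1, 8.0, 8.9, 9.8]
predicted = [2.07, 3.02, 3.96, 4.58, 5.55, 6.67, 7.29, 8.24, 8.55, 9.49]

differences = [p - e for p, e in zip(predicted, expected)]
rmsep = math.sqrt(sum(d * d for d in differences) / len(differences))
# Absolute error as a percentage of the expected concentration.
relative_errors = [100.0 * abs(d) / e for d, e in zip(differences, expected)]

print(f"RMSEP = {rmsep:.2f} g/L")
for e, r in zip(expected, relative_errors):
    print(f"{e:4.1f} g/L: relative error = {r:.1f}%")
```

The RMSEP obtained this way rounds to 0.34 g/L, in agreement with the value reported for the normalised averaged model. The relative errors of the three lowest concentrations are all larger than those of the remaining samples, which is the trend described in the text.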
3.11 General Conclusions: Raman Analysis
The use of Raman spectroscopy was investigated as an analytical tool for the
measurement of cell culture media components (D-glucose, eRDF and yeastolate) in a
model aqueous media, as it offers rapid, non-destructive analysis with little sample
preparation. The determination of D-glucose in three different model media with
increasing complexity (M1Glu, M3Glu and M5Glu) was investigated. This was
carried out in a stepwise fashion to cover the different factors affecting Raman spectra
quality, the required pre-processing, and the correlation of the signal with
compositional changes.

The major issue with these Raman datasets was the large water signal compared to the weak analyte signal, as seen in Figure 78. The water signal dominated the first loading for the different sample sets. M1Glu was a simple system where the analyte was at higher concentrations, giving a stronger performing model. For this reason, the first loading of the model selected for M1Glu had small peaks from the D-glucose beneath the water signal. However, for the M3Glu and M5Glu data, the first loadings were the same as the water signal with varying baselines. Elimination of the water signal from the data by simple subtraction was possible, though it produced spectra containing artefacts. These artefacts occurred when the variance amongst the spectra was caused by more than the weak analyte signal and its varying concentration. Unresolved issues such as baseline offset, sloping baseline and noise also contributed to the spectral variance. Further pre-processing prior to water elimination may prevent the appearance of some artefacts. The efficacy of the water elimination (WE) method as part of a series of pre-processing steps was shown by Li et al. when performed on relatively low concentration samples (1–2% dissolved solids). [3] However, in this study, the samples (M3Glu/M5Glu) had a lower concentration of dissolved solids (~1%) and WE was not a suitable method.

The second loading for the M1Glu, M3Glu and M5Glu data revealed the impact of
lower D-glucose concentration and increasing sample complexity (Figure 78). For
M1Glu (blue), the analyte signal was clear but became less defined with M3Glu
(green) and M5Glu (red). The noise level was elevated with the M3Glu and M5Glu
data. M3Glu was severely affected by the water bending band at 1640 cm–1 as it had
the lowest level of dissolved solids.

Figure 78 (Left) First loadings and water spectra and (Right) second loading for M1Glu, M3Glu and M5Glu from equivalent (footnote 40) PCA models.
[Legend, left panel: L1M1 (99.91%), L1M3 (99.99%), L1M5 (99.98%), H2O; right panel: L2M1 (0.08%), L2M3 (0.03%), L2M5 (0.01%). Axes: intensity versus wavenumber (cm-1).]

40 Same data treatment.

The calibration methods were used on high (M1Glu) and low concentration (M3Glu, M5Glu, M5eRDF and M5Ye) samples. The performance of the various models reflected the different concentration magnitudes and dataset complexity. However, the same pre-processing and region selection were found to be adequate for all the different D-glucose sample sets. The M5Glu model developed here performed well in the 0–9.92 g/L range, giving a REP of ~5%. This was well within the typical concentration range of glucose in media samples for mammalian cell culture (typically 1–10 g/L). [23, 299, 300] The glucose concentration required for plant, yeast and bacterial cell lines is higher (20–30 g/L). [24, 25, 301] The Raman method worked to a limited level of accuracy for the D-glucose samples as a result of the low concentrations studied. An improvement in model performance would be expected with a higher concentration range, where the signal would be stronger. For the quantification of the more complex components, Raman analysis did not perform well. No acceptable model was found for the weak M5Ye samples and the M5eRDF model was three times weaker than the M5Glu model. These results could be improved by a larger number of samples but were intrinsically weakened by the low concentration ranges.

This study has shown that (1) Raman analysis is not sensitive enough for the dilute
solutions tested here, (2) the analyte signal is obscured by the water signal and the
associated shot noise, and (3) the water elimination correction method added more
interfering information rather than being of any benefit to the analyte signal. For
these reasons the next logical step was to use SERS to enhance the analyte signal
while limiting the impact of the water signal. SERS can however only be applied to
yeastolate and eRDF quantification as D-glucose is not a SERS active molecule.


4 Surface Enhanced Raman Spectroscopy (SERS) Analysis of Complex Media Components
4.1 Rationale for Quantitative Analysis using SERS
The Raman method worked for the determination of D-glucose in M5Glu samples with a REP of ~5%, but the method was much less effective or did not work at all for the determination of eRDF and yeastolate. D-glucose was relatively easily quantifiable because it was present in reasonable concentrations. However, yeastolate and eRDF are complex mixtures and therefore the individual analyte concentrations were much lower. For example, the pyridoxine concentration in the M5eRDF samples varied from 0.058 to 0.376 mg/g. It was also not feasible to measure each individual component, as eRDF is composed of over 30 components and yeastolate is even more complex.

SERS offered greater sensitivity and was applied to the qualitative analysis of
yeastolate and other media components [5, 159]. The use of SERS for monitoring
changes in yeastolate showed that the SERS signal from complex media components
was a useful qualitative tool for detecting batch to batch variations and storage
changes [5]. With this in mind, we propose to use SERS to quantify the eRDF and
yeastolate concentrations in cell culture media. Specifically we want to quantify the
global concentration of eRDF/yeastolate and not individual constituents.
This chapter focuses on two topics:
(a) The investigation of SERS signals from complex media components eRDF
and yeastolate; and
(b) The quantification of eRDF and yeastolate in media by SERS.

SERS measurements were carried out on the same M5eRDF and M5Ye sample sets as for conventional Raman (Figure 79). Comparison of the SERS spectra obtained using colloidal silver nanoparticles with the corresponding Raman spectra revealed a significant enhancement of the Raman signal (Figure 80).

Figure 79 (Left) SERS spectra of M5eRDF samples (1–6.4 g/L), and (Right) SERS spectra of M5Ye samples (0.1–1.72 g/L).
[Axes: intensity versus Raman shift (cm-1).]

Figure 80 Conventional Raman spectra, SERS spectra and difference for the M5eRDF samples (Left) and M5Ye samples (Right).
[Legend, left panel: eRDF AvgSERS, eRDF AVGCR, Difference; right panel: Ye AvgSERS, Ye AVGCR, Difference.]
4.2 Experimental Considerations for SERS Analysis
The silver colloid was prepared by reduction of silver nitrate with sodium citrate using the Lee and Meisel method. Silver colloids made by this method have been shown to be stable and to display activity for several (~6) months [302]. The relative intensity of the SERS signal depended on various experimental parameters such as the quality of the silver colloid, the ratio of the colloid to sample, the use of aggregating agents and the incubation time. No aggregating agents were used in this study as eRDF already contained two common aggregating agents in its formulation, sodium chloride and magnesium sulphate. In the absence of additional external aggregating agents, colloid mixed with M5 media samples gave detectable bands [303].
4.2.1 The Absorption Spectrum (λmax and FWHM)
Batches of silver colloid were repeatedly prepared in order to achieve consistent colloids and were assessed based on their UV-Vis spectrum (Figure 81). The UV-Vis spectrum gave information on the size (λmax) and the size distribution (FWHM) of the colloid particles. For optimum SERS enhancement when using 785 nm excitation, good quality silver colloids showed λmax values close to 400 nm and a FWHM of <60 nm [304-306]. An increase in the FWHM value indicated an increasing particle size variation [307]. From our preparation of silver colloids, good batches had an absorption maximum (λmax) of ~406 nm with a full width at half maximum of 80 nm. Acceptable Raman spectra were also achieved from colloids with a λmax as high as 412 nm; however, colloids with a λmax of ~430 nm generated poor Raman spectra (Figure 81).
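
As an illustration of how λmax and FWHM can be read from a digitised absorption spectrum, the sketch below locates the maximum and interpolates the two half-maximum crossings. It assumes a single, baseline-corrected band. The function name and the synthetic Gaussian band (centred at 406 nm with a width of 80 nm, the values quoted above for good batches) are assumptions made for the example, and real colloid spectra are generally not Gaussian.

```python
import math


def lambda_max_and_fwhm(wavelengths, absorbance):
    """Return (lambda_max, FWHM) for a single-peaked spectrum on an ascending grid."""
    peak = max(range(len(absorbance)), key=absorbance.__getitem__)
    half = absorbance[peak] / 2.0

    def crossing(step):
        # Walk away from the peak while the curve stays at or above half maximum,
        # then interpolate linearly between the two bracketing points.
        i = peak
        while 0 <= i + step < len(absorbance) and absorbance[i + step] >= half:
            i += step
        j = i + step
        if not 0 <= j < len(absorbance):
            raise ValueError("spectrum does not fall to half maximum within range")
        frac = (absorbance[i] - half) / (absorbance[i] - absorbance[j])
        return wavelengths[i] + frac * (wavelengths[j] - wavelengths[i])

    return wavelengths[peak], crossing(+1) - crossing(-1)


# Synthetic Gaussian band on a 1 nm grid from 300 to 850 nm.
sigma = 80.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))
wavelengths = list(range(300, 851))
absorbance = [math.exp(-((w - 406) ** 2) / (2.0 * sigma ** 2)) for w in wavelengths]
lam, fwhm = lambda_max_and_fwhm(wavelengths, absorbance)
print(f"lambda_max = {lam} nm, FWHM = {fwhm:.1f} nm")
```

For measured spectra, a baseline correction and some smoothing would normally be needed before this calculation, since noise near the half-maximum level shifts the interpolated crossings.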

Figure 81 Normalised UV-Vis absorption spectra of ten different batches of silver colloid, where the optimal colloids are the solid lines while the poor performing colloids are represented by the dashed lines.
[Legend: B1 408 nm, B2 430 nm, B3 404 nm, B4 406 nm, B5 412 nm, B6 404 nm, B7 404 nm, B8 434 nm, B9 404 nm, B10 404 nm. Axes: intensity versus wavelength (nm).]

In order to overcome batch variation based on colloid particle size, several batches of good quality SERS colloids were mixed together to form a single colloid. Mixing batches minimised batch-to-batch variation, which would otherwise have adversely affected spectral reproducibility [306].
4.2.2 Sampling Time
Variation in the time between the addition of silver colloid to the sample and the measurement of the spectra can affect the intensity of the SERS signal. The intensity of the SERS signals will rise, stabilise and decline with different incubation times. After mixing the sample and colloid, a number of competing effects happen. Firstly, as particles aggregate they form junctions, and the structure of these junctions generates highly localised plasmon fields. These result in a dramatic enhancement if SERS active analytes are present. These hot spots generate very intense SERS signals. Secondly, as the nanoparticles aggregate to form bigger particles, these can simply precipitate out of solution, decreasing the signal [5, 233, 308-310].

It was previously seen in an eRDF solution (18 g/L) using a 1:4 sample to colloid
ratio that the SERS signal steadily increased for about 6 minutes before levelling off
[101]. Therefore, data collection was performed within minutes of colloid addition
before levelling off could occur. The colloid was added, the solution was mixed five
times and the spectra were then measured. The incubation times for all samples were
kept close and as short as possible.

Figure 82 (a) Plot of SERS spectra for M5Ye sample (1.54 g/L) versus time (sixteen
measurements taken over an hour), (b) Intensity profiles for selected peaks showing the
increasing intensity and (c) the intensity ratio for the selected peaks against the water peak at
1604 cm–1. The SERS spectra were measured using a single point collection with an exposure
time of 2×10 s. A sample to colloid ratio of 1:1 was used.

[Figure 82, panel (a): intensity versus wavenumber (cm-1), spectra from 0 mins to 60 mins.]

In order to investigate incubation time effects on the M5Ye (yeastolate at 1.54 g/L) and M5eRDF (eRDF at 3.4 g/L) samples, a series of SERS spectra were taken over an hour using a 1:1 sample to colloid ratio. Sixteen measurements were made with no re-suspension to show the evolving spectra for M5Ye and M5eRDF (Figure 82a and Figure 83a). The spectra displayed a steady increase in baseline intensity and enhancement with time, indicating a reasonably stable sample colloid mixture with no need of re-suspension. These results differed from the SERS testing of the eRDF (17.7 g/L) solution with the 1:4 sample to colloid ratio. In that case, aggregation of the nanoparticles was induced at a higher rate compared to the less concentrated M5Ye and M5eRDF samples [101]. The use of dilute media samples required a smaller quantity of colloid, which resulted in less aggregation and provided a steadily increasing signal for testing.

In the M5Ye sample (Figure 82b), the 730 cm–1 peak exhibited the greatest intensity increase followed by the 1332 cm–1 peak, while for the M5eRDF sample (Figure 83b), it was the 650 cm–1 peak along with the 1388 cm–1 peak that displayed the greatest intensity. We speculated that the 730 cm–1 peak was related to the adenine signal in yeastolate, while the 650 cm–1 peak was a result of the L-cysteine hydrochloride monohydrate present in eRDF, and that the 1332 cm–1 and 1388 cm–1 peaks signified the amino acid portion of the M5Ye and M5eRDF samples, respectively. In terms of variance compared to the 1604 cm–1 band (OH bending band), there were two trends: the low-wavenumber bands (650 cm–1 and 730 cm–1) showed a relative increase and the high-wavenumber bands remained stable (see Figure 82c and Figure 83c). This could be a result of the high-wavenumber bands relating to stretching vibrations while the low-wavenumber bands involved bending vibrations (since more energy is required to stretch a group than to bend a group) [311, 312]. Other factors such as the surface orientation of individual analytes may also be the source of differences in the band intensities. As the micro-environment of the sample was continually changing because of the aggregation of the colloid, more hotspots were forming, resulting in an increasing signal from molecules (footnote 41) closer to the hotspots [311].

41 Molecules that are perpendicular to the surface are more significantly enhanced than those parallel to
the surface.


For these samples, using a 1:1 ratio of sample to colloid gave a stable mixture without
re-suspension of the sample as the signal increased or remained constant during the
testing period, indicating that precipitation did not occur and the SERS signal was
steady.

Figure 83 (a) Plot of SERS spectra for M5eRDF sample (3.4 g/L) versus time (sixteen measurements taken over an hour), (b) Intensity profiles for selected peaks showing the increasing intensity and (c) the intensity ratio for the selected peaks against the water peak at 1604 cm–1. The SERS spectra were measured using a single point collection with an exposure time of 2×10 s. A sample to colloid ratio of 1:1 was used.
[Figure 83, panel (a): intensity versus wavenumber (cm-1), spectra from 0 mins to 60 mins.]
4.2.3 Reproducibility
Reproducibility was a major issue with SERS measurements. If the sample was left to stand, the colloid was liable to aggregate and precipitate out of solution before testing; therefore, immediate testing of the sample was preferable. PCA scores (Figure 84) of the replicate runs (footnote 42) for the M5eRDF and M5Ye measurements demonstrated the class variance for the different data collections. M5Ye samples (Figure 84b) were more stable with overlapping ellipsoids, while the M5eRDF samples showed greater variability, especially amongst the low concentration samples. For example, M5eRDFS01 from data collections one (#1) and three (#21) were close, but for data collection two (#11) it differed, with a high PC2 score. This may be a low concentration effect where matrix effects were causing more fluctuations to occur, given that the high concentration samples M5eRDFS10 (#10, #20, #30) were grouped together in the centre of the scores plot.

42 The PCA scores for the individual runs are shown in section 8.4.1.

Figure 84 Scores plots for the three raw data collections for M5eRDF (a) and M5Ye (b), with the replicate runs outlined. The ellipsoids represent the PCA subspace generated for each dataset.
[Axes: (a) scores on PC 1 (99.73%) versus scores on PC 2 (0.14%); (b) scores on PC 1 (99.86%) versus scores on PC 2 (0.06%). Legend: Run 1, Run 2, Run 3.]
4.3 Spectral Analysis of M5eRDF and M5Ye Data
While the conventional Raman spectra of the aqueous solutions of eRDF (17 g/L) and yeastolate (5 g/L) displayed very weak analyte bands with a strong water signal, SERS gave detailed spectra with multiple peaks visible over the original water signal. The SERS spectra of eRDF and yeastolate were visually different (Figure 85 and Figure 86). It was not possible to specifically assign vibrational modes for peaks within complex mixtures, but one may speculate as to the origin of certain peaks. The SERS spectrum depended on the Raman response of the media components capable of binding to the colloid surface. For example, within the eRDF and yeastolate samples, some compounds were strongly SERS active and some were not. There were similarities between the two materials in that they both contained amino acids as well as a wide range of other biochemical compounds. It was known that some of these compounds were strongly SERS active; for example, compounds containing nitrogen and sulphur. Common peak positions visible at 730, 802 and 955 cm–1 suggested that the same materials within eRDF and yeastolate were binding [5]. As previously mentioned, both of these have high amino acid concentrations and therefore it may be hypothesised that the signals observed in the SERS spectra most likely originated from the amino acid constituents of the media. The peak at 655–666 cm–1 may be assigned to the C–S stretching vibration of cysteine, as L-cysteine hydrochloride monohydrate was present in eRDF. [313, 314] Previous amino acid studies [315-318] suggest that the bands at 1332–1396 cm–1 were related to enhancement of the C–COO− stretching vibration and the COO− symmetric stretching vibration. This demonstrated binding to the silver surface through the carboxylic group. The broad band at 1644 cm–1 was the Raman signal from the water solvent (OH bending).

In previous work on yeast extracts, the strongest SERS peak at 730 cm–1 was associated with the adenine ring breathing mode, as adenine produces a strong SERS signal. [5, 319] Adenine was not, however, listed in the formulations for either eRDF (Table 45) or yeastolate (Table 46). Despite this, since yeastolate was not chemically defined, adenine may be present but may not have been tested for in the yeastolate processing. Previous studies have shown adenine to be present in yeast extract because of its cellular role as a direct or indirect building block. Eleven lots of yeast extracts were tested using pre-column derivatisation with reverse phase HPLC and fluorescence detection. The adenine content recorded was an average of 1.16 ± 0.71 mg/g. [320]

Figure 85 SERS and Raman spectra for an aqueous solution of yeastolate (5 g/L).
[Legend: Yeastolate, Ye SERS. Labelled peaks (cm-1): 658, 730, 802, 955, 1027, 1332, 1388, 1644, 2934. Axes: intensity versus wavenumber (cm-1).]

Figure 86 SERS and Raman spectra for an aqueous solution of eRDF (17 g/L).
4.4 Region Selection for Quantitative Analysis
Since the aim was to quantify the eRDF and yeastolate content in the M5 media,
spectra were collected at three spatial points per sample and averaged. The averages
of the replicate measurements were used to generate the calibration model in order to
get a more representative signal. This stage involved determining the most
informative spectral data. MWPLS was tested, but was inconclusive, so instead five
regions were manually selected (Table 26).
Table 26 Spectral areas selected for calibration modelling of the SERS spectra.
Region ID              Wavenumber Region (cm–1)
Full (F)               250–3300
Reduced Region (ROI)   707–1853
Region A (A)           602–995
Region B (B)           1260–1444
Region A and B (AB)    602–995 and 1260–1444
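
Selecting one of these regions from a spectrum amounts to keeping the points whose wavenumber falls inside the listed window or windows. A minimal sketch is given below. The region limits are those of Table 26, while the function name, the 1 cm-1 sampling grid and the flat intensities are assumptions for illustration.

```python
# Wavenumber windows (cm-1) from Table 26; "AB" combines regions A and B.
REGIONS = {
    "F": [(250, 3300)],
    "ROI": [(707, 1853)],
    "A": [(602, 995)],
    "B": [(1260, 1444)],
    "AB": [(602, 995), (1260, 1444)],
}


def select_region(wavenumbers, intensities, region_id):
    """Keep only the points whose wavenumber lies inside the named region."""
    windows = REGIONS[region_id]
    kept = [
        (w, y)
        for w, y in zip(wavenumbers, intensities)
        if any(low <= w <= high for low, high in windows)
    ]
    return [w for w, _ in kept], [y for _, y in kept]


# Hypothetical spectrum sampled every 1 cm-1 with a flat intensity.
wavenumbers = list(range(250, 3301))
intensities = [1.0] * len(wavenumbers)
w_ab, y_ab = select_region(wavenumbers, intensities, "AB")
print(f"Region AB keeps {len(w_ab)} of {len(wavenumbers)} points")
```

The same selection would be applied to every spectrum in a dataset before calibration modelling, so that all models for a given region use identical variables.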
[Figure 86 plot labels: legend eRDF, eRDF SERS; labelled peaks (cm-1): 666, 730, 802, 899, 955, 1035, 1340, 1396, 1612, 2942.]
4.5 Quantitative Analysis of Yeastolate in M5Ye SERS Data
By using SERS, the aim was to improve on the calibration modelling that was achieved using conventional Raman. The replicate measurements were also modelled separately. The values for those models are shown in the appendix, see section 8.4.3. Comparison of the models confirmed that averaging the data improved the models' performance for the M5Ye data. Table 27 shows the best results of the calibration models for determining yeastolate concentration using different spectral pre-processing methods. Using SERS data it was possible to generate a better correlation between the yeastolate concentration and the M5Ye spectra. The conventional Raman models gave an error of 38–47% compared to an error of 10–18% for these SERS models. However, the error levels were still high and accurate predictions were difficult to obtain reliably.

Compared to the other pre-processing methods (Table 27), FDMSC offered the best performance; the calibration model had a REP of ~12% with three latent variables. Also seen in Table 27 was a high correlation of 0.998 with a lower REP of ~10% for the NormINF model. This model appeared better than the FDMSC model but it was subject to over-fitting (footnote 43), as indicated by the higher number of latent variables used and the large difference between the RMSEC and RMSECV values. The reduced region (1260–1444 cm–1) of the spectra used for the FDMSC model (Figure 87) contributed to the correlation observed. From previous studies [95, 145], the bands from 1300 cm–1 to 1400 cm–1 were attributed to CH2/CH3 bending and deformation in amino acids. Since there were multiple amino acids within the M5Ye model media, it was not surprising that the amino acid components influenced the calibration model.
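
The MSC step of the FDMSC pre-processing can be illustrated with a short sketch of multiplicative scatter correction in its standard form, where each spectrum is regressed against the mean spectrum and the fitted offset and slope are removed. The first-derivative step is omitted, the function name and data are hypothetical, and the exact implementation used in this work is not specified here.

```python
def msc(spectra):
    """Multiplicative scatter correction of each spectrum against the mean spectrum."""
    n_points = len(spectra[0])
    reference = [sum(s[k] for s in spectra) / len(spectra) for k in range(n_points)]
    ref_mean = sum(reference) / n_points
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_points
        # Least-squares fit of s = offset + slope * reference.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(reference, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected


# Hypothetical spectra: one underlying shape with different offsets and scale factors.
shape = [0.0, 1.0, 3.0, 1.5, 0.5]
spectra = [[a + b * v for v in shape] for a, b in [(0.0, 1.0), (0.2, 1.3), (-0.1, 0.8)]]
corrected = msc(spectra)
for row in corrected:
    print([round(v, 3) for v in row])
```

In this idealised case the three corrected spectra coincide, because the only differences between them were an additive offset and a multiplicative factor. With real data, the chemical differences between samples remain after the correction.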

Figure 87 SERS spectra for M5Ye of the 1260–1444 cm–1 range after FD-MSC pre-processing with the predicted versus expected plot for the resulting calibration model.
[Plot annotations: R2 = 0.979, 3 latent variables, RMSEC = 0.075789, RMSECV = 0.11168.]

43 Over-fitting leads to poor prediction results as the model is too specific to the calibration samples.

Table 27 The best results for calibration modelling of yeastolate in the M5Ye SERS data
M5Ye                              LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
WE Data (250–3311 cm–1)           4    0.966   0.09          0.16           17.58
WE MSC (250–3311 cm–1)            4    0.973   0.08          0.15           16.48
WE NINF A (602–995 cm–1)          5    0.998   0.02          0.10           10.98
WE Norm2A (602–995 cm–1)          4    0.985   0.06          0.14           15.38
BC FST11B (1260–1444 cm–1)        3    0.951   0.11          0.15           16.48
BC FST11MSC B (1260–1444 cm–1)    3    0.979   0.07          0.11           12.08

The loadings and scores (Figure 88) explained more about the behaviour of the model.
The first loading was representative of the average FDMSC signal with peaks at 1310
cm–1 and 1380 cm–1, while the corresponding first score plot confirmed the increasing
signal for an increasing yeastolate concentration.

Figure 88 The loadings and scores versus samples from the M5Ye SERS calibration model with
three components: the first component in blue, the second in green and the third in red.

[Figure 88, loadings panel: loadings versus wavenumber (cm–1), 1260–1444 cm–1; LPC1 (88.90%), LPC2 (10.18%), LPC3 (0.64%).]

The second loading dealt with the 1310 cm–1 and 1420 cm–1 peaks while the third
loading covered the peaks at 1270 cm–1, 1330 cm–1, 1380 cm–1 and 1440 cm–1. From
the loadings, it was clear that M5YeS06 was different from the neighbouring
samples. However, in the calibration M5YeS06 was on the regression line and was not
an outlying sample. In the PCA results (Figure 116), M5YeS06 was also grouped
with the other samples, so it seemed to be an anomaly. The fault in this sample may have been
in the sample preparation, given that its spectrum was similar to those of
M5YeS01 and M5YeS02. In the prediction testing
of eRDF concentration within the M5YeS06 sample as part of the model evaluation
for M5eRDF, the predicted eRDF concentration (5.60 g/L) was far higher than the
expected 3.4 g/L (Table 29). This gave
merit to the hypothesis of incorrect sample preparation and also gave a reason as to
why the M5YeS06 spectrum matched the M5YeS01 and M5YeS02 samples: they
shared a common trait, namely that their ratio of eRDF to yeastolate was large compared to the
high concentration samples. Another factor linking the M5YeS06 sample to a strong
eRDF signal was the proximity of the 1375 cm–1 peak to the 1388 cm–1 peak, which
was a significant peak in the M5eRDF spectra (Figure 83).
4.6 Quantitative Analysis of eRDF in M5eRDF SERS
Data
Here SERS was used to predict eRDF concentration in the M5eRDF samples. Various
pre-processing methods were surveyed for the calibration modelling (see section
8.4.2). The top calibration models are listed in Table 28 with the best M5eRDF model
shown in Figure 89. The best model was obtained by using water eliminated,
normalised data. The water eliminated data enhanced the peaks in the fingerprint
region of 600–1700 cm–1 and at 2900 cm–1. Even though water elimination artefacts
were a problem, they were not as significant as those seen in the
M1Glu/M3Glu/M5Glu data (Figure 89). The largest artefact was from the removal of
the large OH band, which left a large negative peak above 3000 cm–1. The other
artefacts were spectral offset and sloping baseline. However, normalisation removed a
lot of the measurement error associated with absolute intensity fluctuations, thereby
decreasing the impact of these latter artefacts (offset and baseline effects).

Table 28 Comparison between best calibration models after different pre-processing methods
from the M5eRDF SERS data
M5eRDF                             LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
AvgDataB (1260–1444 cm–1)          3    0.919   0.49          0.63           17.02
WE MSC (250–3311 cm–1)             2    0.908   0.52          0.63           17.02
WE NINF (250–3311 cm–1)            2    0.922   0.48          0.59           15.94
WE Norm2ROI (707–1853 cm–1)        4    0.943   0.41          0.61           16.48
Avg FST11ROI (707–1853 cm–1)       4    0.960   0.34          0.74           20.00
Avg FST11MSCROI (707–1853 cm–1)    4    0.965   0.32          0.74           20.00

Figure 89 Water eliminated SERS spectra after normalising for the M5eRDF over the entire
range (250–3311 cm–1) and the predicted versus expected plot for corresponding calibration
model.

The REP for this model was ~16%, similar to that achieved with the conventional
Raman data. In addition, only two latent variables were used and a low SECV/SEC
ratio was noted, unlike the conventional Raman model. Both scores showed increasing
linear trends for increasing eRDF concentration per sample; see Figure 90. When
comparing the scores versus concentration plots, the R2 value for L2 was 0.92
compared to 0.71 for L1. The first loading was representative of the average
spectrum after water elimination. This led to a weaker correlation from the signal for
the changing eRDF concentration as the largest water elimination artefact was also
included in the model. The second variable was better correlated to the increase in
eRDF concentration as its fingerprint region was highly detailed and the level of noise
from the water elimination was significantly less.

Figure 90 The loadings and scores versus samples for the calibration model of M5eRDF SERS
with the first component in blue and the second in green.

[Figure 89, left panel: intensity versus wavenumber (cm–1), 250–3311 cm–1. Right panel: predicted versus expected eRDF concentration (g/L); R2 = 0.922, 2 latent variables, RMSEC = 0.4811, RMSECV = 0.5958.]
[Figure 90, loadings panel: loadings versus wavenumber (cm–1); LPC1 (95.78%), LPC2 (2.87%).]

4.7 Model Evaluation
The following procedure was used in order to perform the test set evaluation of the
best M5eRDF and M5Ye models.
• For the yeastolate models, the M5eRDF sample set was used to attempt to
predict its yeastolate concentration.
• For the eRDF models, the M5Ye sample set was used to attempt to predict its
eRDF concentration.

The advantage of doing this was that test sets with a significant in-built variability
were being used while also removing the need to build a new test set. These samples
were used for the prediction of the stable analyte concentration and also to see how
spectral fluctuations may impact the prediction of analyte concentrations.

Table 29 Prediction results based on the M5Ye prediction of yeastolate concentration in M5eRDF
and M5eRDF prediction of eRDF concentration in M5Ye.
Sample ID    Expected Ye    SERS predicted   Sample ID   Expected eRDF   SERS predicted   Raman predicted
             conc. (g/L)    Ye conc.                     conc. (g/L)     eRDF             eRDF
M5eRDFS01    1              0.98             M5YeS01     3.4             3.98             3.21
M5eRDFS02    1              1.60             M5YeS02     3.4             3.27             3.06
M5eRDFS03    1              1.81             M5YeS03     3.4             3.26             3.03
M5eRDFS04    1              1.22             M5YeS04     3.4             3.40             3.61
M5eRDFS05    1              1.00             M5YeS05     3.4             3.34             3.23
M5eRDFS06    1              0.95             M5YeS06     3.4             5.60             3.10
M5eRDFS07    1              0.78             M5YeS07     3.4             3.28             2.91
M5eRDFS08    1              0.59             M5YeS08     3.4             3.38             3.80
M5eRDFS09    1              0.58             M5YeS09     3.4             3.59             3.24
M5eRDFS10    1              0.52             M5YeS10     3.4             3.51             3.79

As the REP values for the SERS and conventional Raman M5eRDF models were the same,
both models were tested for the prediction of eRDF. Yeastolate prediction used the
best M5Ye SERS model and the results showed that prediction was possible (Table
29). The closer the validation samples were to the model samples, the better the
prediction was (as seen with M5YeS04 and M5YeS05 for eRDF and samples
M5eRDFS05 and M5eRDFS06 for yeastolate). In the design of the experiment, the
concentrations of the samples overlapped in the mid-point range of the samples; see
Table 6 and Table 7.

The M5Ye SERS model predicted only three samples accurately. For the
yeastolate prediction, the influence of the eRDF concentration in the M5eRDF
samples was evident: the prediction accuracy decreased with increasing
eRDF concentration. The prediction of the eRDF concentration was not affected by
the varying concentration of yeastolate within the M5Ye samples. Moreover, the
SERS model was more accurate than the Raman model. Seven of the ten validation
samples were within 10% of the target concentration while only four of the Raman
predictions were within this limit.
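
The yeastolate result above can be checked directly against Table 29. The sketch below counts how many SERS predictions fall within a relative tolerance of the expected 1 g/L. The 10% tolerance and the function name are assumptions made for illustration, since the text does not define what counted as an accurate yeastolate prediction.

```python
def count_within(predicted, expected, tolerance=0.10):
    """Count predictions whose relative error is at most `tolerance`."""
    return sum(abs(p - expected) / expected <= tolerance for p in predicted)


# SERS-predicted yeastolate concentrations (g/L) for M5eRDFS01 to M5eRDFS10 (Table 29).
ye_predicted = [0.98, 1.60, 1.81, 1.22, 1.00, 0.95, 0.78, 0.59, 0.58, 0.52]

# Expected yeastolate concentration is 1 g/L for every M5eRDF sample.
n_accurate = count_within(ye_predicted, expected=1.0)
print(n_accurate)
```

With this assumed tolerance, the samples counted as accurate are M5eRDFS01, M5eRDFS05 and M5eRDFS06, all from the low to mid-range of the eRDF concentration span.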
4.8 General Conclusions: SERS Analysis of M5eRDF
and M5YE
SERS was investigated as an analytical tool to improve upon the previous
measurements of eRDF and yeastolate in a media environment using conventional
Raman. The SERS method showed that it was possible to enhance the analyte signal
sufficiently to undertake quantitative ingredient analysis.

When comparing the SERS and conventional Raman models for eRDF and yeastolate
the following was observed:
• For eRDF, the best models gave equivalent prediction errors of ~16%. The
SERS model was better than the Raman model, however, because it did not
show any over-fitting44, which can lead to inaccurate predictions.
• For yeastolate, the SERS model was much improved with a percentage error
of ~12% versus ~38% for Raman. The low concentration of the M5Ye sample set
with a strong water signal hampered the Raman model. The SERS method
gave the signal enhancement needed for M5Ye samples to compensate for the
strong water signal, which led to a good R2 correlation of 0.979.

44 The Raman model has a high SECV/SEC ratio of 3.5 compared to 1.22 for the SERS model.

SERS showed some promise at quantifying the complex media components as a
whole but the error levels were still too high. A point to note was that the
concentration range used here was greater than the ±10% variation typically expected in
industrial use. The predictions showed that the best results were closest to the mid-point
of the concentration range for the media samples; thus a reduction of the
concentration span would improve the quantification. In the prediction of yeastolate,
the method only worked for the low to mid-range M5eRDF samples as another SERS
active molecule contained in the eRDF component impacted the result. Therefore,
using the current setup it would not be feasible to quantify both eRDF and yeastolate
simultaneously due to spectral overlap, as there were too few samples and the
experimental design was not fully optimised. It may be more feasible once a better
sampling and model setup are implemented. This could include: a calibration sample
set of more than 60 samples, a reduced range of ±20% of concentration specification
and a greater number (i.e. >3) of replicate measurements per calibration sample.
These steps should result in more reproducible data collection and yield a more
reliable calibration model on which to base accurate predictions.

The overall goal of this work is the development of a robust quantitative method for
complex ingredient analysis, and so another approach was investigated. As both
eRDF and yeastolate contain fluorophores that will produce a distinctive fluorescence
spectrum, fluorescence may be capable of quantifying the complex media components
as a whole.

5 Fluorescence Spectroscopy Analysis of
Complex Media Components
The work in this chapter examined the use of multi-dimensional fluorescence
spectroscopy for the quantitative analysis of complex media components. The
excitation emission matrices (EEM) and total synchronous fluorescence scans (TSFS)
were collected for the M5eRDF and M5Ye sample sets. Both EEM and TSFS data
had a reasonable signal from the fluorophores in the M5eRDF and M5Ye samples.
This suggested that it may be possible to quantify yeastolate and eRDF using
fluorescence and thus produce an analytical method that is non-destructive, sensitive
and selective.

Table 30 lists the primary fluorophores in eRDF and yeastolate and their
concentration ranges in the prepared samples. The fluorescence emission profiles of
eRDF and yeastolate were similar because they both contained many of the same
fluorophores (tryptophan, tyrosine and phenylalanine). The overall signals nevertheless
differed because the overall chemical compositions were different. This gave rise to
the individuality within the EEM/TSFS spectra, and allowed for differentiation and
quantification of yeastolate and eRDF in a media formulation.

Table 30 Summary of the fluorophores present in eRDF and yeastolate.45
Fluorophore     eRDF (mg/g)   Yeastolate            M5eRDF (mg/g)   M5Ye (mg/g)
Tryptophan      1.08          0.5% w/w (5 mg/g)     6.08–11.92      17.5–25.6
Tyrosine        5.11          0.8% w/w (8 mg/g)     13.11–40.75     28–40.96
Phenylalanine   4.37          3.6% w/w (36 mg/g)    40.37–63.97     126–184.32
Riboflavin      0.011         Not listed            0.011–0.075     0.04
Pyridoxine      0.058         Not listed            0.058–0.376     0.197
Folic acid      0.517         Not listed            0.517–3.308     1.757
Phenol Red      0.294         Not listed            0.294–1.882     0.999

45 Formulation compositions provided by the manufacturers, see 8.2.1 and 8.2.2.

5.1 The EEM/TSFS Analytical Procedure
Multi-dimensional fluorescence data provided information about chemical
composition because both the peak intensity and shape of the signal were sensitive to
individual and global concentration changes in the analytes present [191, 206, 211,
321]. In this work, EEM/TSFS measurements were used to see if it was feasible to
quantify the yeastolate and eRDF concentration in the model media. The NBL group
has already demonstrated that EEM can be used for the quantification of individual
components [1], media variance and identification applications [4], and media
degradation [2].

Fluorescence data is information rich and can be analysed in multiple ways to extract
both qualitative and quantitative results. The outline of the fluorescence workflow
was:
• Spectral Overview
  o Identify peaks in EEM/TSFS data
• PARAFAC/MCR Analysis
  o Identification of the fluorophores
  o Profile changes for the fluorophores in relation to concentration
• Variance analysis and outlier detection
  o Investigate what causes changes in the spectra by PCA
  o Identify abnormal samples using ROBPCA
• Quantification: UPLS modelling of the media components (eRDF and
yeastolate) in the M5eRDF and M5Ye sample sets.

5.2 Spectral Overview of Media Samples (M5eRDF
and M5Ye)
Numerous peaks were seen in the EEM spectra obtained for the M5eRDF and M5Ye
samples (Figure 91 and Figure 92). The samples from both sample sets had similar
peaks, indicating the presence of multiple fluorophores (i.e. the fluorescent amino
acids and vitamins) in eRDF and yeastolate. Previous studies in this lab had identified
the key fluorophores in both eRDF and yeastolate. The peak locations of the
M5eRDF/M5Ye samples in this study were similar to those of the chemically
defined media samples described by Calvet et al. and the yeastolate samples described by
Li et al. [1, 2, 4, 7]. In their studies, five peaks were identified at λex/λem = 275/310 nm,
λex/λem = 260–285/355 nm, λex/λem = 320/390 nm, λex/λem = 365/520 nm and λex/λem =
355/445 nm. These were due to the fluorescence of amino acids (tyrosine and
tryptophan) and vitamins (pyridoxine, riboflavin, and folic acid).

Figure 91 EEM landscape plots of (left) an M5eRDF sample (1 g/L eRDF) and (right) an M5Ye
sample (0.1 g/L Ye). The Rayleigh scatter was removed from the spectra.

Figure 92 EEM contour profiles46 for (a) M5eRDF S01(1 g/L eRDF), (b) M5eRDF S10(6.4 g/L
eRDF), (c) M5Ye S01(0.1 g/L Ye ) and (d) M5Ye S10 (1.72 g/L Ye).

For the EEM landscape plots (Figure 91), the peak of maximum intensity for the
M5eRDF and M5Ye data was located at excitation/emission wavelengths (λex/λem) of
285/355 nm. The secondary peaks were located at λex/λem of 280/305 nm, 230/305 nm
and 230/360 nm. Second order bands started to appear at λex/λem of 280/595 nm and
230/595 nm. The contour plots (Figure 92) showed the changes in the signals with
increasing concentration. For both the M5eRDF and M5Ye samples, the tryptophan
signal peak at 285/355 nm dominated. In the case of M5eRDF samples, the tyrosine
peaks at 280/305 nm and 230/305 nm were weak but observable (even after the
increase in concentration from 1 g/L to 6.4 g/L). Similarly the low concentration
M5YE sample displayed the same peaks indicative of tyrosine (at 280/305 nm and
230/305 nm). Tyrosine fluorescence can be difficult to observe clearly due to overlap
with the tryptophan emission band and the occurrence of radiative energy transfer
(RET) from tyrosine to tryptophan. With increasing sample concentration, the
tyrosine signal decreased as the tryptophan signal increased. In both sample sets, the
much weaker emission from pyridoxine (325/395 nm), riboflavin (455/520 nm), and
folic acid (350/445 nm) only became visible at high concentrations of eRDF and
yeastolate.

46 300 contour lines were used with 0.83 spacing starting with 0.90.

TSFS spectra provided an alternative way of measuring the total emission of complex
mixtures. When comparing TSFS with the EEM spectra, the output plot and data were
orientated differently. In TSFS, the peaks were viewed by plotting the excitation
wavelength against the delta wavelength offset (Δλ=λem−λex) with the intensity along
the z-axis. The TSFS landscape plots for both M5eRDF and M5Ye (Figure 93)
displayed the tyrosine signal at 230 nm excitation while the tryptophan signal was
visible at 285 nm for the M5eRDF sample and at 290 nm for M5Ye.

Figure 93 TSFS landscape plots for (left) M5eRDF media sample (1 g/L eRDF) and (right)
M5Ye media sample (0.1 g/L Ye).

In order to compare EEM and TSFS data more easily, the TSFS spectra were re-plotted
after being mathematically transformed into EEM spectra. The transformation
involved diagonally stacking the collected data and filling in zero for empty areas.
When comparing the contour plots of the TSFS and EEM data (Figure 94c and d), the
signal intensities for the samples were different but the peak positions were constant.
From the fluorescence profiles of the media samples, it was possible to see that these
samples had the same underlying components. The peaks were visible in both the
EEM and TSFS contour plots, at the following wavelengths: 285/355 nm (tryptophan,
1), 280/305 nm (tyrosine, 2), 325/395 nm (pyridoxine, 3), 455/520 nm (riboflavin, 4)
and weakly at 350/445 nm (folic acid, 5) [1].
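
The TSFS to EEM rearrangement described above (diagonal stacking with zero filling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the function name, the small wavelength grids and the intensity values are hypothetical, and wavelengths are assumed to lie on a whole-nanometre grid so that each TSFS point maps exactly onto an emission wavelength λem = λex + Δλ.

```python
import numpy as np


def tsfs_to_eem(tsfs, ex, delta, em):
    """Rearrange a TSFS matrix (excitation x delta) into EEM layout (excitation x emission).

    Each TSFS point (ex, delta) is moved to emission = ex + delta. EEM cells that
    were never measured stay at zero. Wavelengths are ASSUMED to be whole nm.
    """
    eem = np.zeros((len(ex), len(em)))
    em_index = {int(w): k for k, w in enumerate(em)}
    for i, lam_ex in enumerate(ex):
        for j, d in enumerate(delta):
            k = em_index.get(int(lam_ex) + int(d))
            if k is not None:  # skip points that fall outside the emission grid
                eem[i, k] = tsfs[i, j]
    return eem


# Hypothetical toy grid: 3 excitation wavelengths, 2 delta offsets.
ex = np.array([250, 255, 260])
delta = np.array([20, 25])
em = np.arange(270, 290, 5)  # 270, 275, 280, 285 nm
tsfs = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

eem = tsfs_to_eem(tsfs, ex, delta, em)
print(eem)
```

In the toy example each excitation row is shifted one emission step further along, so the measured values lie on a diagonal band and the remaining cells are zero.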

Figure 94 (a) TSFS landscape for M5YeS05 media sample (0.82 g/L) and (b) TSFS contour
profile, (c) rearranged TSFS profile into EEM format and (d) EEM contour profile for M5YeS05
media sample.46

Figure 95 (a) TSFS landscape plot for M5eRDFS05 media sample (3.4 g/L), (b) TSFS contour
profile, (c) rearranged TSFS contour profile into EEM format and (d) EEM contour profile for
M5eRDFS05 media sample. 46
[Figures 94 and 95, panels: (a) TSFS landscape, intensity versus delta wavelength (nm) and excitation wavelength (nm); (b) TSFS contour, delta wavelength (nm) versus excitation wavelength (nm); (c) and (d) contour plots of excitation wavelength (nm) versus emission wavelength (nm), with the peaks labelled (1) to (5).]

5.3 Assessing Fluorophore Contributions from Media
Fluorescence
In the model media, the fluorophores which contributed most
significantly to the EEM and TSFS profiles were tentatively identified by visual
inspection. These findings were then corroborated with literature references [1, 2, 4,
7, 8]. While this was acceptable as an initial inspection, a more
precise identification of the fluorophores was needed. A more rigorous approach was to use
a mathematical, factor-based chemometric method such as PARAFAC or MCR in
order to identify fluorophores and to analyse emission changes [250, 322-324].

PARAFAC decomposed the fluorescence data into the individual excitation and
emission profiles for the fluorophores in the sample. The PARAFAC model gave
loadings to help determine the fluorophores present and additionally generated a score
associated with each component (fluorophore). Changes in each component's
contribution to the EEM spectrum were thus quantified as sample composition varied.
PARAFAC scores may not correlate with fluorophore concentration because of
non-linearities caused by IFE, RET and similar effects. TSFS data was not suitable for PARAFAC
analysis as it was not tri-linear [246]. PARAFAC was optimized to work with
tri-linear data, which was characterised by the fact that each component displayed the
same pattern (profile) for the different samples in both excitation and emission modes.
Within the M5eRDF/M5Ye samples, the complex sample matrix affected the
fluorescence profile, leading to non-linear data. This was one of the challenges with
these complex media where the emission and excitation spectra of many fluorophores
overlapped. In the PARAFAC results, the unique profiles obtained were not the true
profiles of the components, as the data was not tri-linear47. Since the PARAFAC
results were found to be lacking, another factor analysis method (MCR) was utilised
to help determine and better understand the behaviour of the underlying components.
MCR worked better with non-trilinear data for the evaluation of the fluorescent
components by bilinear decomposition [325]. MCR was applied to both EEM and
TSFS data to solve for the underlying components. For the M5eRDF and M5Ye
samples, the resolution of the components improved with the use of MCR (Table 31).

47 For the data to be tri-linear, the same sample profile must hold for different samples but can be
scaled differently as a result of changing concentration.

Table 31 The number of fluorophores/components determined by PARAFAC and MCR for
M5eRDF and M5Ye
Dataset        Model Type   No of Components
M5Ye EEM       PARAFAC      3
M5eRDF EEM     PARAFAC      2
M5Ye EEM       MCR          5
M5Ye TSFS      MCR          6
M5eRDF EEM     MCR          5
M5eRDF TSFS    MCR          6

5.3.1 Fluorophore Identification and Profile Changes by
PARAFAC
PARAFAC generated two types of plots: (1) the loadings for the M5eRDF and M5Ye
models (Figure 96 and Figure 98), which approximated the emission or excitation
spectra and (2) the scores, which showed the relative contribution of each
factor/spectrum. From the PARAFAC scores the degree of spectral change could be
approximated (Figure 97) and the possibility of a correlation between scores data and
yeastolate/eRDF concentration was investigated. The scores allowed for visualisation
of how each individual component (or mixtures of components) varied in terms of
contribution as the yeastolate or eRDF concentration changed. The differences in the
intensity values of the EEM spectra (Figure 99 and Figure 100) showed the changing
composition between low and high concentration samples. In samples of higher
concentration the loss of tyrosine signal in favour of the tryptophan signal was evident
and the lesser fluorophores (pyridoxine and riboflavin) became more defined.

Figure 96 PARAFAC loadings excitation (left) and emission (right) for M5eRDF for the replicate
EEM data collections.
[Figure 96 panels: loadings versus excitation wavelength (nm) (left) and emission wavelength (nm) (right); legend entries L1R1, L2R1, L1R2, L2R2, L1R3, L2R3.]

The M5eRDF EEM data used a two-component PARAFAC model (Figure 96);
however, the components were not well resolved. In the PARAFAC emission
loadings, the first loading was clearly tryptophan with a contribution from tyrosine
(visible as a shoulder at ~310 nm). The second emission loading was a
composite emission from multiple fluorophores (folic acid, vitamin B6 and its
derivatives, and riboflavin) [326-328]. The concentration change between M5eRDFS01 and
M5eRDFS10 was less visible in the contour plots (Figure 99) as these samples were
more concentrated than the low concentration M5Ye samples and were
therefore subject to more IFEs. The PARAFAC score for component one showed an
increase for each sample with increasing eRDF concentration, while the second score
showed a stable signal with minor changes for each sample.

Figure 97 PARAFAC scores results for M5eRDF (left) and M5Ye (right).

Figure 98 PARAFAC loadings excitation (left) and emission (right) of M5Ye for the replicate
EEM data collections.

[Figure 97 panels: PARAFAC scores versus samples (1 to 10). M5eRDF: Component 1 (72.52%), Component 2 (27.47%). M5Ye: Component 1 (53.00%), Component 2 (39.41%), Component 3 (7.58%).]
[Figure 98 panels: loadings versus excitation wavelength (nm) (left) and emission wavelength (nm) (right); legend entries L1R1, L2R1, L3R1, L1R2, L2R2, L3R2, L1R3, L2R3, L3R3.]

PARAFAC of the M5Ye EEM data revealed three components (Figure 98). The first
component was clearly tryptophan. The second component featured two unresolved
bands from tyrosine with a shoulder peak indicative of tryptophan. The third
component represented an amalgamated peak for multiple fluorophores (pyridoxine,
folic acid and a shoulder for riboflavin at 520 nm). From Figure 100, the peak
intensity showed a difference between the M5YeS01 and M5YeS10 samples, as the
fluorophore contributions evolved with the changing concentration. This was in agreement with
the PARAFAC scores, where components one and three increased with
increasing yeastolate concentration. In contrast, component two, which related to
tyrosine, decreased as the tryptophan signal became more dominant at the higher
yeastolate concentration. For M5Ye the third component represented a merged signal
of pyridoxine, folic acid and riboflavin; this signal only became visible at higher
concentration, and the corresponding PARAFAC scores therefore increased from
sample to sample. This PARAFAC result indicated that there was more variance in the
M5Ye samples compared to the M5eRDF samples. The changes in concentration for the
M5Ye samples were more significant, as larger changes in the profile and the intensity
were seen. While the M5eRDF sample set had a higher overall concentration, it was
also more susceptible to IFE/ET and quenching, which resulted in less dynamic
changes in this complex medium. The PARAFAC scores, however, showed that there
was some correlation with the change in fluorophores as the eRDF concentration
increased.

Figure 99 Comparison of EEM contour plots for the low concentration M5eRDFS01 (1 g/L) and
the high concentration M5eRDFS10 (6.4 g/L) samples; the low to high concentration is based on
the added eRDF.46
[Figure 99 panels: contour plots of excitation wavelength (nm) versus emission wavelength (nm).]

Figure 100 Comparison of EEM contour plots for the low concentration M5YeS01 (0.1 g/L) and
the high concentration M5YeS10 (1.72 g/L) samples; the low to high concentration is based on
the added yeastolate.46

PARAFAC was able to resolve some components or groups of components as a result
of the responses to the eRDF/yeastolate concentration increase. In previous
PARAFAC studies on complex mixtures, for example with polycyclic aromatic
hydrocarbons (PAHs), it was shown that components tend to be grouped into classes
based on similar spectral and quenching characteristics [329]. This was also observed
here with the M5eRDF and M5Ye samples, where grouping of amino acids and
vitamins was visible in the PARAFAC loadings. This was due to the non-linearity
experienced in the matrix where different regions of the EEM (and thus different
fluorophores) were affected in different ways. This grouping confirmed that
PARAFAC was not very good at resolving the individual fluorophores in these
complex cell culture media.

The PARAFAC results obtained were different from previous studies on cell culture
media; in the study by Calvet et al., it was feasible to resolve more than two or three
components [1, 2]. The main difference came from the co-linearity in the variation of
several fluorophores increasing together within the yeastolate and eRDF. This
behaviour was not modelled by PARAFAC because it was not able to determine if the
fluorophores were different components, and also because PARAFAC assumes
constant profiles in all dimensions48, which was not the case. The complex M5eRDF
and M5Ye samples gave rise to non-linearity in the EEM data. This non-linearity
could be resolved by significantly diluting the samples so that the interactions (energy
transfer/quenching/IFE) between components were minimised. This was not,
however, an ideal solution for media analysis where minimal sample handling was
desired.

48 For successful PARAFAC, the profile response should be the same.

Since PARAFAC did not effectively resolve the individual fluorophores, another
factor based analysis method (MCR) was used with the hope of better elucidation of
the fluorophores of M5eRDF and M5YE data.
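
The tri-linear structure assumed by PARAFAC, and the effect of co-linear fluorophore concentrations discussed above, can be illustrated with synthetic data. The sketch below is an assumption-laden toy example, not an analysis of the M5eRDF or M5Ye data: the Gaussian profiles, wavelength grids, score values and function names are all hypothetical. It builds an EEM cube from fixed excitation and emission profiles and then compares the rank of the unfolded data when the two components vary independently and when they increase together.

```python
import numpy as np


def trilinear_cube(scores, ex_profiles, em_profiles):
    """X[i, j, k] = sum_f scores[i, f] * ex_profiles[j, f] * em_profiles[k, f]."""
    return np.einsum("if,jf,kf->ijk", scores, ex_profiles, em_profiles)


def gaussian(axis, centre, width):
    return np.exp(-((axis - centre) / width) ** 2)


ex = np.arange(230, 521, 10)  # hypothetical excitation grid (nm)
em = np.arange(270, 601, 10)  # hypothetical emission grid (nm)

# Two hypothetical fluorophores whose profiles are the same in every sample.
ex_profiles = np.column_stack([gaussian(ex, 280, 20), gaussian(ex, 350, 30)])
em_profiles = np.column_stack([gaussian(em, 350, 30), gaussian(em, 450, 40)])

# Scores (relative contributions) for three samples.
independent = np.array([[1.0, 0.5], [2.0, 0.2], [3.0, 1.5]])  # vary separately
collinear = np.array([[1.0, 0.5], [2.0, 1.0], [3.0, 1.5]])    # increase together


def unfolded_rank(scores):
    """Rank of the samples x (excitation * emission) matrix."""
    cube = trilinear_cube(scores, ex_profiles, em_profiles)
    return np.linalg.matrix_rank(cube.reshape(cube.shape[0], -1))


rank_independent = unfolded_rank(independent)
rank_collinear = unfolded_rank(collinear)
print(rank_independent, rank_collinear)
```

When the two scores columns are proportional, the sample mode carries only one independent direction, so the two fluorophores behave as a single merged component. This mirrors the grouping of fluorophores seen in the PARAFAC loadings.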
5.3.2 Fluorophore Identification by MCR Analysis
Spectral profiles were obtained by MCR for EEM (Figure 101) and for TSFS (Figure 102),
with the data separated into five- and six-component models respectively.

Figure 101 Resolved emission profiles of the M5eRDF (left) and M5Ye (right) for EEM MCR
models.

Figure 102 Resolved Delta Profiles for M5eRDF (left) and M5Ye (right) from the TSFS data.

[Figure 101 panels: MCR scores versus emission wavelength (nm). Left: S1 66.6%, S2 16.6%, S3 1.07%, S4 10.7%, S5 4.9%. Right: S1 68.7%, S2 14.4%, S3 0.99%, S4 109% [sic], S5 4.81%.]
[Figure 102 panels: MCR scores versus delta wavelength (nm). Left: S1 41.78%, S2 18.81%, S3 5.30%, S4 18.63%, S5 12.79%, S6 2.45%. Right: S1 34.69%, S2 22.45%, S3 2.70%, S4 24.16%, S5 14.06%, S6 1.77%.]

The EEM and TSFS spectra were collected for five pure fluorophores (tyrosine,
tryptophan, riboflavin, pyridoxine and folic acid) that were listed in the eRDF
formulation (Table 45). The excitation, emission and delta profiles were then resolved
by MCR for comparison to the unknown fluorophore profiles recovered from the
M5eRDF and M5Ye samples. The extracted MCR profiles for the media samples
were a reflection of the relative fluorescence emission between components and their
scores indicated their changes in the media environment.

When the recovered emission profiles for the EEM data were compared with the pure
spectra of tryptophan, tyrosine, pyridoxine and riboflavin, very close agreement was
achieved (Figure 103). There were shifts seen in the maximum band position between
the pure and recovered profiles as the extracted profiles were affected by the sample
complexity. The emission profiles recovered for the same fluorophore in the different
media environments (M5eRDF/M5Ye) were in closer agreement with each other than
with the pure fluorophore spectra. These spectral shifts were the result of energy
transfer/quenching/IFE that occurred within the media samples. Tyrosine was red
shifted by ~10 nm, caused by energy transfer as its emission overlapped with the absorption of
tryptophan.

There was a large difference between the final recovered component and pure folic
acid spectrum; this indicated that component five was not folic acid. It was difficult to
determine the number of fluorophores present above the 375 nm emission region
because of the lower signal intensity. The 450 nm peak could be a secondary
excitation band for riboflavin [325]. In a proposed assignment of the unknown
component five, it was noted that the fluorescence behaviour was similar to the
biogenic fluorophores NADH and NADPH. NADPH fluorescence at 360/460 nm was
seen (Figure 26) in the analysis of yeast samples and, because yeastolate is a digest of
yeast, NADPH may be a component of yeastolate [211]. However, without
comprehensive compositional information about the yeastolate, it could not be
confirmed that NADPH was present. The compositional information for
yeastolate (listed in Table 46) was limited mainly to the amino acid and mineral
content.

Figure 103 Emission spectra resolved by MCR from the EEM data for the pure components
(solid line), M5Ye (dotted) and M5eRDF (dashed). The spectra were normalised to area equal to
one.
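
The area normalisation mentioned in the caption of Figure 103 can be illustrated with a minimal sketch. It assumes a spectrum sampled on a wavelength grid and integrates it with the trapezoidal rule; the function names and the toy profile are hypothetical, and the thesis does not state how the area was computed.

```python
def trapezoid_area(wavelengths, intensities):
    """Area under a sampled curve by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(wavelengths)):
        step = wavelengths[i] - wavelengths[i - 1]
        area += 0.5 * step * (intensities[i] + intensities[i - 1])
    return area


def normalise_to_unit_area(wavelengths, intensities):
    """Scale a spectrum so that its integrated area equals one."""
    area = trapezoid_area(wavelengths, intensities)
    if area == 0:
        raise ValueError("spectrum has zero area and cannot be normalised")
    return [value / area for value in intensities]


# Hypothetical emission profile on a coarse 50 nm grid (illustrative values only).
emission_nm = [300, 350, 400, 450, 500]
profile = [0.0, 8.0, 4.0, 1.0, 0.0]
unit_profile = normalise_to_unit_area(emission_nm, profile)
```

Normalising in this way removes differences in overall intensity, so that only the band shapes and positions of the pure and recovered profiles are compared.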

For the TSFS data, both the excitation and delta profiles (Figure 104 and Figure 105)
were considered. Six components were recovered for both M5eRDF and M5Ye, but
they did not align well with the pure component profiles of the suspected
components, so the TSFS components could not be assigned with confidence. For
example, in both the M5eRDF and M5Ye data, the excitation profiles showed that all
bands were excited in the 280 nm to 305 nm range (Table 32), which covered the
tyrosine and tryptophan excitation range. Pyridoxine, riboflavin and folic acid were
not clearly indicated by the excitation profiles, showing that they were minor
contributors to the overall sample fluorescence. The dynamic environment allowed
molecules to interact with other media components, resulting in various micro-states,
and TSFS proved to be too sensitive to these changes. The elucidation of
components from the TSFS data was therefore not as clear as from the EEM data, and
the EEM emission profiles were easier to interpret than the TSFS delta profiles.
Since both collection methods analysed the same samples, this may be a consequence
of the different ways in which the two data types display the information. In
Figure 104 and Figure 105, the excitation profiles for the fluorophores were very
close together. One of the reasons for this was that only a 5 nm step was used
between each excitation and delta profile. The offset measurement approach did not
allow for clear
resolution of the peaks but the EEM method did. Therefore, for complicated media
samples the EEM MCR approach clearly offers the best method for identifying the
specific components.

Figure 104 TSFS delta and excitation profiles for M5eRDF compared to the pure component
profiles (coloured traces) resolved by MCR. The spectra were normalised.

Figure 105 TSFS delta and excitation profiles for M5Ye compared to the pure component
profiles (coloured traces) resolved by MCR. The spectra were normalised.

Table 32 Excitation, Emission and % Fit values for MCR TSFS models using 1–6 factors. The
emission wavelength (λem) was obtained by adding the delta (∆λ) to the excitation wavelength
(λex).

Pure Standard              M5Ye                                 M5eRDF
      λex (nm)  λem (nm)         λex (nm)  λem (nm)  % Fit            λex (nm)  λem (nm)  % Fit
Trp   270       360        C1    285       360       34.6       C1    285       360       41.7
Tyr   270       320        C2    290       345       22.4       C2    290       345       18.8
Py    355       400        C3    280       310       2.7        C3    280       310       5.3
Rb    400       530        C4    305       435       24.1       C4    305       430       18.6
FA    440       490        C5    285       385       14.0       C5    285       300       12.7
                           C6    285       300       1.7        C6    285       385       2.4
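
The relation stated in the caption of Table 32, λem = λex + ∆λ, can be sketched as follows. Only the relation itself comes from the text; the function names and the example grid are illustrative assumptions, and the 75 nm offset is simply the difference between the tabulated 360 nm and 285 nm values for component C1.

```python
def tsfs_to_emission(excitation_nm, delta_nm):
    """Emission wavelength of a TSFS point: lambda_em = lambda_ex + delta."""
    return excitation_nm + delta_nm


def tsfs_grid_to_eem_coords(excitation_grid, delta_grid):
    """Map every (lambda_ex, delta) pair of a TSFS scan to (lambda_ex, lambda_em)."""
    return [
        (ex, tsfs_to_emission(ex, delta))
        for ex in excitation_grid
        for delta in delta_grid
    ]


# Component C1 in Table 32: excitation 285 nm, emission 360 nm, i.e. a 75 nm offset.
c1_emission = tsfs_to_emission(285, 75)

# Hypothetical two-by-two grid to show the coordinate mapping.
coords = tsfs_grid_to_eem_coords([280, 285], [20, 25])
```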

The EEM scores plot (Figure 106) showed an increase of two components (#2 and #3)
from M5eRDF sample 1 to sample 10 while the other components showed little
change. The TSFS scores plot showed that all components (bar component four)
displayed an increase from M5eRDF sample 1 to sample 10. Component four gave a
stable signal across the scores plot.

Figure 106 M5eRDF MCR scores for EEM model (left) and TSFS model (right).

The scores (Figure 107) for both the EEM and TSFS M5Ye data showed a similar
trend. The scores of the components emitting at shorter wavelengths decreased, the
intermediate components were stable, and the scores of the components emitting at
longer wavelengths increased. This pattern was clearer in the TSFS scores. The short
wavelength components were subject to more IFE than the long wavelength components
(as a result of the higher absorbance in these regions). The M5Ye scores also
revealed that M5YeS01 deviated from the other score points. This observation was
confirmed by the ROBPCA results, where the M5YeS01 sample was seen as different
from the other samples. The altered profile was attributed to the low sample
concentration, which gave rise to a more dilute spectral profile for the M5YeS01
sample than for the other samples.

Figure 107 M5Ye MCR scores plots for EEM model (left) and TSFS model (right).

The MCR scores for all components varied as eRDF or yeastolate concentration
increased. It was therefore appropriate that the full or reduced spectral area be used
for the quantitative modelling. If one wanted to quantify individual fluorophores, one
could of course look at a more restricted emission range [1].
5.4 Variance Analysis
5.4.1 PCA Analysis
To assess the degree of spectral variation of the M5 sample sets, PCA was performed
individually on the M5eRDF and M5Ye data and then on the combined
(M5eRDF+M5Ye) sample set. PCA of the combined sample sets clarified the size of
the variance caused by changing each component as well as revealing which
components contribute to changes seen in the EEM signal (see footnote 49).

Figure 108 Graphic results for the PCA analysis of M5eRDF/M5YE comparison, the arrows
show the samples going from low to high concentration. (Top Left) PC1 vs PC2 Scores plot, (Top
Right) Loadings for PC1 (97.15%), (Bottom Left) Loadings for PC2 (2.56%), (Bottom Right)
Loadings for PC3 (0.17%).

Footnote 49: The same trends occurred with the TSFS data and these results will not be shown.

It was not possible to fully segregate the M5eRDF and M5Ye samples in the scores
plots as they partially overlapped due to the common fluorophores. From the analysis
of the loadings plots, it was proposed that PC1 was largely tryptophan signal; PC2
was largely tyrosine signal, while PC3 was an unresolved tryptophan and pyridoxine
peak. From Figure 108 (Top Left), it was clear that changes in the M5Ye were greater
along both PC1 and PC2 than for M5eRDF. This was attributed to the decrease in the
tyrosine signal and increase in the tryptophan signal with yeastolate concentration
changes (Figure 100 and Figure 97). The M5eRDF PCA result was less susceptible to
variation as a result of the higher concentration of these samples. This led to less
dynamic change being observed in the fluorescence signal with increasing
concentration as seen with the MCR scores (Figure 106).

In the PCA analysis of the individual datasets, only two principal components were
used to describe the variance in each sample set. For the individual M5Ye PCA model
the first component described 95.53% of the explained variance while the second
component described 4.23%. The first component showed a peak with emission at
355 nm i.e. tryptophan, which was the dominant feature in both the M5eRDF and
M5Ye emission. The second component of the individual M5Ye PCA model matched
the second component of the combined model i.e. tyrosine. Besides the peak for
tyrosine in PC2 there was also a negative portion which was representative of the
interaction with the tryptophan as the concentration changed [330]. With the
increasing tryptophan and tyrosine concentration, tryptophan emission became
dominant and radiative energy transfer potentially occurred between tyrosine and
tryptophan fluorophores leading to a reduced signal from the tyrosine peak [331]. For
the individual M5eRDF PCA model, the first component (emission peak at 355 nm)
described 99.68% of the explained variance, while the second component (0.3%)
matched the third component of the combined PCA model. Considering the strength
of the tryptophan signal in the M5eRDF, the observed variation may be attributed to
IFE and energy transfer interactions that occur with tryptophan within M5eRDF
samples.

This variance - created by the differing fluorophores, IFE, energy transfer
interactions, and quenching occurring within each sample - will be used to correlate
the eRDF and yeastolate concentrations to the gross fluorescence signal.
5.4.2 ROBPCA Analysis
While the PCA modelling provided a good insight into the magnitude of the
fluorescence variance within the M5eRDF and M5Ye samples, it was less sensitive to
subtle differences than robust PCA [236]. ROBPCA flagged samples that
differed from the majority of the data as outliers. For ROBPCA modelling, the
fluorescence data was unfolded from a three-way data cube (sample vs. excitation
wavelength vs. emission wavelength) to a two-way data array (sample vs.
excitation/emission pair). An unfolded landscape is shown in Figure 109.
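
The unfolding step described above can be sketched in a few lines. This is a minimal illustration on a toy cube stored as nested lists; the function names and wavelength grids are hypothetical, and the software used in the study is not implied.

```python
def unfold_cube(cube):
    """Unfold a sample x excitation x emission cube into sample x (ex*em) rows."""
    unfolded = []
    for sample in cube:
        row = []
        for emission_spectrum in sample:  # one emission spectrum per excitation
            row.extend(emission_spectrum)
        unfolded.append(row)
    return unfolded


def column_labels(excitation_nm, emission_nm):
    """Label each unfolded column with its excitation/emission pair."""
    return [(ex, em) for ex in excitation_nm for em in emission_nm]


# Toy cube: 2 samples, 2 excitation wavelengths, 3 emission wavelengths.
cube = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
matrix = unfold_cube(cube)
labels = column_labels([280, 285], [340, 350, 360])
```

Each row of `matrix` is one sample and each column one excitation/emission pair, which is the two-way layout that PCA, ROBPCA and unfolded PLS operate on.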

Figure 109 EEM landscape spectra of a single M5eRDF solution (5.8 g/L) and unfolded spectra of
M5eRDF.

Figure 110 ROBPCA Scores plot for PC1 versus PC2 of M5eRDF/M5YE comparison, EEM
Scores (left) and TSFS scores (right). The arrows indicate the direction of the changing
concentration from low to high.

The ROBPCA result showed that the sample concentration influenced sample
fluorescence as the sample scores followed the increasing concentration (Figure 110).
As the concentration changed in the media, so did the photo-physical behaviour as
IFE/quenching/energy transfer increased. This caused differences in the spectra, so the
samples at either end of the concentration range tended to be very different. The
ROBPCA subspaces for EEM and TSFS were orientated differently due to the
difference in the way the data was presented. When comparing the ROBPCA scores
(Figure 110) to the PCA scores (Figure 108, top left), the same pattern of crossing
over was observed as the samples go from low to high concentration.

Table 33 Outlying observations from the EEM and TSFS datasets for M5eRDF, with the outlying
samples being identified by ROBPCA.

Dataset           PCs   Outliers Identified   Outlier Type
M5eRDF EEM R1     3     1, 10                 Good Leverage (1), Bad Leverage (10)
M5eRDF EEM R2     3     1, 9                  Bad Leverage (1), Good Leverage (9)
M5eRDF EEM R3     2     1, 2, 9               Orthogonal (1), Bad Leverage (2), Good Leverage (9)
M5eRDF EEM 30     3     8, 10, 16, 21         Good Leverage (8), Bad Leverage (10, 21), Orthogonal (16)
M5eRDF TSFS R1    3     1, 8                  Good Leverage (1), Bad Leverage (8)
M5eRDF TSFS R2    3     8, 10                 Bad Leverage (10), Good Leverage (8)
M5eRDF TSFS R3    2     1, 9                  Orthogonal (1), Bad Leverage (9)
M5eRDF TSFS 30    4     8, 9, 16, 21          Good Leverage (8, 16), Bad Leverage (9, 21)

Samples with either low or high additive concentration were identified as outliers.
The samples at the edges of the data distribution were still valid samples, but they
were easily flagged as outliers because of the small sample sets. It was, however,
unwise to disregard these since the variance was a result of the variability associated
with the limited linear range. The majority of outliers in Table 33 and Table 34 were
either good or bad leverage points, indicating that they only deviated along a single
aspect of the spectra. In addition, the orthogonal outliers occurred when there was a
significant change from the mean data profile. A tentative assumption was made here
that good or bad leverage points were representative of a spike in fluorophore
intensity within a sample, while the orthogonal outliers were more frequently seen
with the low concentration samples. This was due to a low tryptophan signal within
the low concentration samples, which set them apart from the mean data. The PCA
results showed that the primary component (mean signal) in both the M5eRDF and
M5Ye was an emission peak at 355 nm, which was indicative of tryptophan.
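
The outlier types listed in Table 33 and Table 34 follow the usual ROBPCA outlier map, in which each sample has a score distance (measured within the PCA subspace) and an orthogonal distance (measured from the sample to that subspace). A minimal sketch of the classification rule is given below. The cut-offs and distances are hypothetical placeholders rather than values from this study; in practice the cut-offs are derived from the distributions of the two distances.

```python
def classify_observation(score_distance, orthogonal_distance, sd_cutoff, od_cutoff):
    """Classify a sample on the ROBPCA outlier map."""
    far_within_subspace = score_distance > sd_cutoff
    far_from_subspace = orthogonal_distance > od_cutoff
    if far_within_subspace and far_from_subspace:
        return "bad leverage"
    if far_within_subspace:
        return "good leverage"
    if far_from_subspace:
        return "orthogonal"
    return "regular"


# Hypothetical cut-offs and (score distance, orthogonal distance) pairs.
SD_CUTOFF, OD_CUTOFF = 2.5, 1.0
samples = {
    "S01": (0.8, 1.9),
    "S05": (1.1, 0.4),
    "S09": (3.2, 0.6),
    "S10": (3.6, 1.7),
}
labels = {
    name: classify_observation(sd, od, SD_CUTOFF, OD_CUTOFF)
    for name, (sd, od) in samples.items()
}
```

A good leverage point lies far from the centre of the data but still within the model subspace, whereas orthogonal outliers and bad leverage points are not well described by the subspace.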

Table 34 Outlying observations from the EEM and TSFS datasets for M5Ye, with the outlying
samples being identified by ROBPCA.

Dataset         PCs   Outliers Identified      Outlier Type
M5YE EEM R1     3     2                        Orthogonal
M5YE EEM R2     3     1                        Orthogonal
M5YE EEM R3     2     1, 2                     Orthogonal
M5YE EEM 30     4     10, 11, 17, 21, 20, 30   Good Leverage (11, 21), Bad Leverage (10, 17, 20, 30)
M5YE TSFS R1    2     1, 2                     Orthogonal
M5YE TSFS R2    3     1, 9                     Bad Leverage (1), Good Leverage (9)
M5YE TSFS R3    2     1, 2                     Orthogonal
M5YE TSFS 30    4     10, 11, 14, 17, 21, 27   Good Leverage (10), Bad Leverage (14, 27), Orthogonal (11, 17, 21)

For the M5eRDF and M5Ye datasets, the outliers identified using ROBPCA were not
seen in the SERS and Raman data because only PCA was carried out for these
methods. The PCA results gave no outliers for the Raman or SERS data. In the SERS
modelling of the M5Ye, the PLS scores revealed that M5YeS06 deviated along the
second and third components of the model. This type of behaviour would have been
picked up by a ROBPCA model. This is because the ROBPCA results for the
fluorescence data highlighted the end-range samples as deviating from the mean,
whereas the PCA results gave no outlier. The concentration range of these sample sets
was too wide, so the limits of linearity were exceeded. As a result, excluding end-range
samples to reduce the concentration range may be necessary in order to achieve
accurate quantification. In essence, this indicated that the fluorescence method was
more sensitive to matrix effects and that smaller concentration ranges should be used. A
similar situation was observed for single component quantification modelling [1].

5.5 Quantitative Analysis of M5eRDF and M5Ye
The PCA and ROBPCA scores plots (Figure 108 and Figure 110) revealed a clear
linear trend with increasing concentrations of yeastolate and
eRDF in the M5Ye and M5eRDF data respectively. This trend indicated that the
fluorescence data was suitable for developing linear PLS models for yeastolate and
eRDF.

The data was unfolded prior to quantitative analysis (PLS) of EEM and TSFS (Figure
109). The use of unfolded PLS was chosen because it takes into consideration the
analyte-background interactions when describing the calibration data as part of the
number of latent variables (components) selected [332].
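
To make the idea of unfolded PLS concrete, the sketch below fits a single-response PLS model (PLS1, NIPALS algorithm) to unfolded spectra and predicts a new sample. It is a minimal illustration, not the implementation used in this work: the function names, the two pure profiles and the concentrations are all hypothetical, and the toy data are noise-free.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_components):
    """Fit a PLS1 model by NIPALS on mean-centred data."""
    n_rows, n_cols = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    y_mean = sum(y) / n_rows
    E = [[row[j] - x_mean[j] for j in range(n_cols)] for row in X]  # X residuals
    f = [value - y_mean for value in y]                              # y residuals
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximum covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left in X that covaries with y
            break
        w = [value / norm for value in w]
        t = [dot(row, w) for row in E]  # scores of this latent variable
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n_rows)) / tt for j in range(n_cols)]
        q = dot(f, t) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * p[j] for j in range(n_cols)] for i in range(n_rows)]
        f = [f[i] - q * t[i] for i in range(n_rows)]
        components.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def predict_pls1(model, x):
    """Predict y for one unfolded spectrum by replaying the deflation steps."""
    e = [value - mean for value, mean in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = dot(e, w)
        y_hat += q * t
        e = [value - t * load for value, load in zip(e, p)]
    return y_hat


# Hypothetical two-fluorophore mixtures over four unfolded channels.
profile_1 = [1.0, 2.0, 1.0, 0.0]
profile_2 = [0.0, 1.0, 2.0, 1.0]
conc_1 = [0.2, 0.4, 0.6, 0.8, 1.0]  # analyte of interest
conc_2 = [1.0, 0.3, 0.7, 0.1, 0.5]  # varying background component
X = [[c1 * a + c2 * b for a, b in zip(profile_1, profile_2)]
     for c1, c2 in zip(conc_1, conc_2)]

model = fit_pls1(X, conc_1, n_components=2)
new_spectrum = [0.5 * a + 0.4 * b for a, b in zip(profile_1, profile_2)]
predicted = predict_pls1(model, new_spectrum)
```

In this toy case the second latent variable absorbs the background contribution, which mirrors the point made above that analyte-background interactions are accounted for through the number of latent variables selected.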

During the development of fluorescence based calibration models, the full region and
two reduced regions were selected to focus on the strong analyte signal. The contour
plots (Figure 111) for the EEM of the full and the two reduced regions showed the
varying levels of information in the different regions.

Figure 111 Contour plots for the Full Range (λex/λem 230–520/270–600 nm), Reduced Area A
(λex/λem 230–315/270–435 nm) and Reduced Area B (λex/λem 250–360/285–425 nm) of a single
M5eRDF solution.46

• For the EEM data the selected regions were:
  o Reduced region A - excitation 230–315 nm and emission 270–435 nm
  o Reduced region B - excitation 250–360 nm and emission 285–425 nm

• For the TSFS data the selected regions were:
  o Reduced region A - excitation 230–400 nm and ∆λ 10–180 nm
  o Reduced region B - excitation 250–310 nm and ∆λ 10–140 nm
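
Selecting a reduced region amounts to cropping each landscape to the stated wavelength limits before unfolding. The sketch below shows this for the EEM regions listed above; the region limits come from the text, while the function name, the coarse wavelength grid and the intensity values are hypothetical.

```python
def select_region(eem, excitation_nm, emission_nm, ex_range, em_range):
    """Crop an EEM (rows = excitation, columns = emission) to the given limits."""
    ex_idx = [i for i, ex in enumerate(excitation_nm)
              if ex_range[0] <= ex <= ex_range[1]]
    em_idx = [j for j, em in enumerate(emission_nm)
              if em_range[0] <= em <= em_range[1]]
    cropped = [[eem[i][j] for j in em_idx] for i in ex_idx]
    kept_ex = [excitation_nm[i] for i in ex_idx]
    kept_em = [emission_nm[j] for j in em_idx]
    return cropped, kept_ex, kept_em


# Region limits (nm) quoted in the text for the EEM data.
REDUCED_REGION_A = {"ex": (230, 315), "em": (270, 435)}
REDUCED_REGION_B = {"ex": (250, 360), "em": (285, 425)}

# Hypothetical coarse grid and placeholder intensities.
excitation_nm = [230, 300, 370, 440, 510]
emission_nm = [270, 350, 430, 510, 590]
eem = [[ex + em for em in emission_nm] for ex in excitation_nm]

region_a, ex_a, em_a = select_region(
    eem, excitation_nm, emission_nm, REDUCED_REGION_A["ex"], REDUCED_REGION_A["em"])
region_b, ex_b, em_b = select_region(
    eem, excitation_nm, emission_nm, REDUCED_REGION_B["ex"], REDUCED_REGION_B["em"])
```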

Unfolded PLS was performed on the individual data collections for EEM and TSFS
datasets with the inclusion of all the samples (Table 137 to Table 140). Some
anomalies were noted, however, and so unfolded PLS was repeated on the averaged
data of the M5eRDF and M5Ye sample sets (Table 35 to Table 38). When unfolded
PLS modelling was performed on the individual runs for M5eRDF data, the sample
M5eRDFS09R1 deviated from the expected measurement line. As a result, this
sample was removed when generating the averaged M5eRDF sample set. The
ROBPCA results were used as a guide for erroneous samples, but with so few
samples over a large range, the ROBPCA results were hypersensitive to deviations
from linearity.

For the M5Ye samples, the unfolded PLS on the individual runs revealed two
anomalies. First, the sample M5YeS01R1 was an odd measurement; its spectrum was
removed from the averaged dataset (Figure 112). The second anomaly was the
non-linear behaviour of the high concentration samples. In the calibration models of
the individual runs, there was a dramatic shift between samples M5YeS07 and
M5YeS08. When the spectra were investigated, the biggest change was seen in the
tryptophan peak. The intensity of the tryptophan peak was plotted (Figure 112); it
showed a linear increase in signal up to sample M5YeS06, after which the data became
non-linear. In order to improve the calibration model for the M5Ye data, the sample
set was reduced to only six samples covering a linear range of 0.1–1.0 g/L yeastolate.
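
One simple way to locate the end of a linear range like the one described above is to fit a straight line to the leading samples and stop when the next point departs from it by more than a tolerance. The sketch below illustrates this idea only; it is not the procedure used in the study, and the concentrations, intensities and tolerance are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def linear_range_end(x, y, rel_tol=0.02, min_points=3):
    """Number of leading points that are consistent with one straight line."""
    n_linear = min_points
    while n_linear < len(x):
        slope, intercept = fit_line(x[:n_linear], y[:n_linear])
        predicted = slope * x[n_linear] + intercept
        if abs(y[n_linear] - predicted) > rel_tol * abs(predicted):
            break
        n_linear += 1
    return n_linear


# Hypothetical peak intensities: linear for six samples, then levelling off.
concentration = [0.1, 0.28, 0.46, 0.64, 0.82, 1.0, 1.2, 1.4, 1.6, 1.8]
intensity = [225.0, 234.0, 243.0, 252.0, 261.0, 270.0, 273.0, 275.0, 276.0, 276.0]
n_linear = linear_range_end(concentration, intensity)
```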

Figure 112 Predicted versus expected plots of the EEM calibration model for all 30 M5Ye
samples (Left) and the intensity changes of the tryptophan peak (285/355 nm) per sample
(Right).
5.5.1 Correlation of eRDF Concentration to the M5eRDF Fluorescence Data
In the calibration modelling of the averaged M5eRDF data (using 10 samples), a good
correlation between the fluorescence signal and the concentration of eRDF was found
for all of the PLS models (EEM/TSFS). The best EEM model was constructed from
the reduced region B (λex/λem 250–360/285–425 nm) with no pre-processing while the
TSFS model used the full spectral area and no pre-processing (Figure 113). The
TSFS data models built with no pre-processing exceeded the performance of the
models generated after MSC and normalisation. A similar trend was evident for the
EEM data in the case of the full region and the reduced region B. Pre-processing
proved useful, as the second best calibration model for the EEM data was produced
using the reduced region A model after normalisation. Overall pre-processing and
region selection did not have a major influence on the M5eRDF model performances
(Table 35 and Table 36). There was little difference between the TSFS and EEM
models, with the EEM being marginally better in terms of R2 and REP (7.29% versus
7.83%).
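
The REP values in Table 35 and Table 36 are consistent with the relative error of prediction being calculated as 100 × RMSECV divided by the mean concentration of the calibration samples. The sketch below assumes that definition and a mean eRDF concentration of 3.7 g/L; this mean is inferred from the tabulated RMSECV and REP pairs and is not stated in the text.

```python
def relative_error_of_prediction(rmsecv, mean_concentration):
    """REP% = 100 * RMSECV / mean concentration of the calibration samples."""
    return 100.0 * rmsecv / mean_concentration


ASSUMED_MEAN_ERDF = 3.7  # g/L, inferred from the tables (assumption)

# RMSECV values (g/L) of the best EEM and TSFS models in Table 35 and Table 36.
rep_eem = relative_error_of_prediction(0.27, ASSUMED_MEAN_ERDF)
rep_tsfs = relative_error_of_prediction(0.29, ASSUMED_MEAN_ERDF)
```

Because REP scales the cross-validation error by the mean concentration, it allows models built over different concentration ranges, such as the M5eRDF and M5Ye models, to be compared on a common basis.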

[Figure 112 annotations. Predicted versus expected plot (g/L): R2 = 0.835, 3 latent variables, RMSEC = 0.20977, RMSECV = 0.23678. Intensity plot legend: M5YeR1, M5YeR2, M5YeR3.]

Figure 113 Predicted versus expected plots for M5eRDF of the EEM calibration model (Left)
and the TSFS calibration model (Right).

Table 35 Calibration models using unfolded PLS regression for averaged M5eRDF EEM data.

Model                        LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Full Range (λex 230–520 nm, λem 270–600 nm)
M5eRDF EEM Unfolded          3    0.990   0.17          0.34           9.18
M5eRDF EEM Unfolded MSC      3    0.978   0.25          0.48           12.97
M5eRDF EEM Unfolded Norm     3    0.985   0.21          0.42           11.35
Reduced Region A (λex 230–315 nm, λem 270–435 nm)
M5eRDF EEM Unfolded          3    0.988   0.19          0.34           9.18
M5eRDF EEM Unfolded MSC      4    0.998   0.08          0.41           11.08
M5eRDF EEM Unfolded Norm     4    0.998   0.08          0.32           8.64
Reduced Region B (λex 250–360 nm, λem 285–425 nm)
M5eRDF EEM Unfolded          3    0.993   0.14          0.27           7.29
M5eRDF EEM Unfolded MSC      2    0.962   0.33          0.47           12.70
M5eRDF EEM Unfolded Norm     2    0.974   0.27          0.38           10.27

Table 36 Calibration models using unfolded PLS regression for averaged M5eRDF TSFS data.

Model                        LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Full Range (λex 230–520 nm, ∆λ 10–200 nm)
M5eRDF TSFS Unfolded         3    0.991   0.15          0.29           7.83
M5eRDF TSFS Unfolded MSC     3    0.979   0.24          0.46           12.43
M5eRDF TSFS Unfolded Norm    3    0.986   0.20          0.38           10.27
Reduced Region A (λex 230–310 nm, ∆λ 10–190 nm)
M5eRDF TSFS Unfolded         3    0.988   0.19          0.34           9.18
M5eRDF TSFS Unfolded MSC     4    0.995   0.12          0.47           12.70
M5eRDF TSFS Unfolded Norm    4    0.997   0.08          0.35           9.45
Reduced Region B (λex 250–310 nm, ∆λ 10–140 nm)
M5eRDF TSFS Unfolded         2    0.978   0.25          0.34           9.18
M5eRDF TSFS Unfolded MSC     2    0.961   0.34          0.47           12.70
M5eRDF TSFS Unfolded Norm    2    0.972   0.28          0.39           10.54

[Figure 113 annotations (axes: expected vs. predicted, g/L). First panel: R2 = 0.993, 3 latent variables, RMSEC = 0.14858, RMSECV = 0.27636. Second panel: R2 = 0.991, 3 latent variables, RMSEC = 0.15924, RMSECV = 0.29901.]

5.5.2 Correlation of Ye Concentration to the M5Ye Fluorescence Data
The PCA showed that the spectral variation caused by changing yeastolate
concentration was much larger than for eRDF (Figure 108). One consequence of this
was that at the higher yeastolate concentrations the spectral changes became
non-linear. The concentration range was therefore reduced by removing the highest
concentration samples to obtain a satisfactory quantitative model. As a result, the
M5Ye calibration models were stronger than the M5eRDF models and showed a good
linear performance for both EEM and TSFS data (Figure 114). The best models were
built using the reduced region B, with MSC pre-processing for EEM and with no
pre-processing for the TSFS. The model performance improved when the reduced
region B was used rather than the full range or reduced region A. Pre-processing
improved the model results, as it minimized spectral variations that were not caused
by changes in the analyte concentration. The best TSFS calibration model was
reasonable, with a REP of 7.27%, but was weaker than the best EEM model, which
gave a REP of 5.45%.
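
As an illustration of how a pre-processing step can remove variation that is unrelated to the analyte, the sketch below implements multiplicative scatter correction (MSC) in its common form: each unfolded spectrum is regressed on a reference spectrum (here the mean spectrum) and the fitted offset and slope are removed. The function names and toy spectra are hypothetical, and the exact MSC settings used in the study are not given in the text.

```python
def mean_spectrum(spectra):
    """Channel-wise mean of a list of equal-length spectra."""
    n = len(spectra)
    return [sum(s[j] for s in spectra) / n for j in range(len(spectra[0]))]


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = reference if reference is not None else mean_spectrum(spectra)
    ref_mean = sum(ref) / len(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / len(s)
        # Least-squares fit of s = intercept + slope * ref.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


# Hypothetical spectra: one band shape with different offsets and scalings.
base = [1.0, 3.0, 2.0, 0.5]
spectra = [
    [0.10 + 0.8 * b for b in base],
    [-0.20 + 1.3 * b for b in base],
    [0.05 + 1.0 * b for b in base],
]
corrected = msc(spectra)
```

In this toy case all three spectra collapse onto the mean spectrum, because they differ only by an additive offset and a multiplicative factor.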

Figure 114 Predicted versus expected for M5Ye with the EEM calibration model (left) and the
TSFS calibration model (right).

[Figure 114 annotations (axes: expected vs. predicted, g/L). First panel: R2 = 0.998, 3 latent variables, RMSEC = 0.014723, RMSECV = 0.034587. Second panel: R2 = 0.994, 3 latent variables, RMSEC = 0.0246, RMSECV = 0.047416.]

Table 37 Unfolded PLS calibration model for the averaged M5Ye EEM data using 6 samples.

Model                      LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Full Range (λex 230–520 nm, λem 270–600 nm)
M5YeEEM Unfolded           2    0.962   0.05          0.10           18.18
M5YeEEM Unfolded MSC       2    0.986   0.03          0.06           10.90
M5YeEEM Unfolded Norm      2    0.990   0.03          0.05           9.09
Reduced Area A (λex 230–315 nm, λem 270–435 nm)
M5YeEEM Unfolded           2    0.955   0.06          0.11           20.00
M5YeEEM Unfolded MSC       2    0.980   0.04          0.07           12.72
M5YeEEM Unfolded Norm      2    0.987   0.03          0.06           10.90
Reduced Area B (λex 250–360 nm, λem 285–425 nm)
M5YeEEM Unfolded           3    0.997   0.01          0.04           7.27
M5YeEEM Unfolded MSC       3    0.998   0.01          0.03           5.45
M5YeEEM Unfolded Norm      3    0.995   0.02          0.06           10.90

Table 38 Unfolded PLS calibration model for the averaged M5Ye TSFS data using 6 samples.

Model                      LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Full Range (λex 230–520 nm, ∆λ 10–200 nm)
M5YeTSFS Unfolded          2    0.962   0.05          0.10           18.18
M5YeTSFS Unfolded MSC      2    0.982   0.04          0.06           10.90
M5YeTSFS Unfolded Norm     2    0.987   0.03          0.05           9.09
Reduced Area A (λex 230–310 nm, ∆λ 10–190 nm)
M5YeTSFS Unfolded          2    0.954   0.06          0.11           20.00
M5YeTSFS Unfolded MSC      2    0.977   0.04          0.08           14.54
M5YeTSFS Unfolded Norm     2    0.983   0.03          0.06           10.90
Reduced Area B (λex 250–310 nm, ∆λ 10–140 nm)
M5YeTSFS Unfolded          3    0.994   0.02          0.04           7.27
M5YeTSFS Unfolded MSC      3    0.994   0.02          0.04           7.27
M5YeTSFS Unfolded Norm     3    0.991   0.02          0.06           10.90

When comparing the EEM and TSFS models, the differences in performance were
minor for the M5eRDF samples. However, the EEM model outperformed the TSFS
model for the M5Ye data (Table 37 and Table 38). Overall, the EEM method was
better in the analysis of M5Ye and M5eRDF data. This was shown by the resolution
of the fluorophores in the MCR results as well as by the good calibration models
formed with the M5eRDF/M5Ye data.

5.5.3 Model Evaluation
The prediction ability of the best fluorescence models was evaluated in the same way
as in the SERS study:
• For the M5eRDF calibration model, the M5Ye samples were taken to see if
  their eRDF concentrations could be correctly predicted.
• For the M5Ye calibration model, yeastolate concentrations were predicted on
  the M5eRDF data.

Table 39 Prediction results of eRDF concentration in M5Ye, from the EEM and TSFS calibration
models.

Sample ID   Expected Concentration (g/L)   Predicted Conc. from EEM Model (g/L)   Predicted Conc. from TSFS Model (g/L)
M5YeS01     3.4                            8.83                                   9.43
M5YeS02     3.4                            5.98                                   6.83
M5YeS03     3.4                            4.15                                   4.97
M5YeS04     3.4                            3.46                                   3.93
M5YeS05     3.4                            3.46                                   3.65
M5YeS06     3.4                            3.39                                   3.54
M5YeS07     3.4                            3.29                                   2.79
M5YeS09     3.4                            4.90                                   4.03
M5YeS10     3.4                            4.25                                   3.69

The predictions of the eRDF concentration from the M5Ye sample set showed
reasonable results for the mid- to high-concentration samples (Table 39). It was seen
in the PCA scores that the overlap between the M5eRDF and M5Ye was small due to
the different media changes within each sample set. The low concentration samples
were very poorly predicted (with the predicted values for S01 and S02 approximately
double the expected 3.4 g/L). The primary reason for the relatively low model
accuracy and overestimation of the low concentration samples was the large spectral
difference between the test set and the M5eRDF calibration sample set.
The variance was due to the different matrix environment given the varying yeastolate
or eRDF concentration. It was obvious from the test set EEM contour plot (Figure
119) that there was a large change between the lowest and the highest prediction sample
(the main fluorescence signal changed from 275/310 nm to 285/355 nm). This
corresponded to the decrease in tyrosine signal as the tryptophan signal increased. In
the calibration EEM contour plots (Figure 120) the main fluorescence signal remained
at 285/355 nm from low to high concentration samples.
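
The size of the over-prediction for each test sample can be checked directly from Table 39. The sketch below computes the signed relative error of the EEM and TSFS predictions against the expected 3.4 g/L; the values are copied from the table, while the function and variable names are illustrative.

```python
EXPECTED_ERDF = 3.4  # g/L, expected eRDF concentration in every M5Ye sample

# (sample, EEM prediction, TSFS prediction) in g/L, copied from Table 39.
table_39 = [
    ("M5YeS01", 8.83, 9.43),
    ("M5YeS02", 5.98, 6.83),
    ("M5YeS03", 4.15, 4.97),
    ("M5YeS04", 3.46, 3.93),
    ("M5YeS05", 3.46, 3.65),
    ("M5YeS06", 3.39, 3.54),
    ("M5YeS07", 3.29, 2.79),
    ("M5YeS09", 4.90, 4.03),
    ("M5YeS10", 4.25, 3.69),
]


def relative_error_percent(predicted, expected):
    """Signed prediction error as a percentage of the expected value."""
    return 100.0 * (predicted - expected) / expected


eem_errors = {name: relative_error_percent(eem, EXPECTED_ERDF)
              for name, eem, _ in table_39}
tsfs_errors = {name: relative_error_percent(tsfs, EXPECTED_ERDF)
               for name, _, tsfs in table_39}
worst_eem_sample = max(eem_errors, key=lambda name: abs(eem_errors[name]))
```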

Table 40 Prediction results of M5Ye EEM and TSFS calibration models for yeastolate
concentration in M5eRDF.

Sample ID    Expected Concentration (g/L)   Predicted Conc. from EEM Model (g/L)   Predicted Conc. from TSFS Model (g/L)
M5eRDFS01    1.0                            1.00                                   0.91
M5eRDFS02    1.0                            1.01                                   0.96
M5eRDFS03    1.0                            1.04                                   0.98
M5eRDFS04    1.0                            1.03                                   1.00
M5eRDFS05    1.0                            0.98                                   0.97
M5eRDFS06    1.0                            1.02                                   1.02
M5eRDFS07    1.0                            1.06                                   1.06
M5eRDFS08    1.0                            1.09                                   1.10
M5eRDFS09    1.0                            1.05                                   1.08
M5eRDFS10    1.0                            1.04                                   1.10

The prediction ability of the M5Ye calibration model was better than that of the M5eRDF
model (Table 40). The better predictability came from the similarity of the M5eRDF
and M5Ye samples, which shared the same first principal component (i.e. the
tryptophan emission) in the PCA results (Figure 108). Therefore,
in predicting the yeastolate concentration of M5eRDF samples, the correlation
between the calibration and prediction was based on the tryptophan emission band.
Also, no dramatic changes were observed in the M5eRDF test set; this can be seen in both
the PARAFAC and MCR scores (Figure 97, Figure 106 and Figure 107).
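
One way to summarise the predictions in Table 40 in a single figure is the root mean square error of prediction (RMSEP). This metric is not reported in the text and is used here only as an illustration; the predicted values are copied from Table 40 and the function name is hypothetical.

```python
import math


def rmsep(predicted, expected):
    """Root mean square error of prediction."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, expected)]
    return math.sqrt(sum(squared) / len(squared))


# Yeastolate predictions (g/L) for M5eRDFS01 to M5eRDFS10, copied from Table 40.
expected = [1.0] * 10
eem_predicted = [1.00, 1.01, 1.04, 1.03, 0.98, 1.02, 1.06, 1.09, 1.05, 1.04]
tsfs_predicted = [0.91, 0.96, 0.98, 1.00, 0.97, 1.02, 1.06, 1.10, 1.08, 1.10]

rmsep_eem = rmsep(eem_predicted, expected)
rmsep_tsfs = rmsep(tsfs_predicted, expected)
```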

Quantitative models for the more complex media components, eRDF and yeastolate,
were developed. Both EEM and TSFS calibration and prediction models worked, but
the EEM method outperformed TSFS with its results and ease of interpretation. These
EEM/TSFS methods were part of the holistic approach to cell culture media analysis
and this quantification method complemented previous work developed for the
quantification of specific fluorophores [1, 2]. Calvet et al. showed that specific
analytes could be quantified in chemically defined media. The prediction of
tryptophan, tyrosine, pyridoxine, riboflavin and folic acid in eRDF media solutions
worked using NPLS analysis (samples were prepared using a standard addition
method (SAM)). These models used narrower spectral ranges from EEM data centred
on each analyte emission. The analytes were predicted with the following error levels:
tryptophan (4.5%), tyrosine (5.5%), pyridoxine (4.6%), riboflavin (2.3%), and folic
acid (8.7%) [2].

However, changing the concentration of a complex ingredient like
eRDF or yeastolate resulted in large changes in the concentration of multiple
fluorophores. This led to large matrix changes which reduced the potential linear
ranges for quantification. For the M5eRDF data, there were large fluctuations in the
test set that limited the correlation performance of the M5eRDF calibration model.
This could be seen in the poor prediction of the eRDF
concentration in the test set. The most accurate results were close to the mean
concentration, which had a stable tryptophan signal. These results indicated that eRDF
could potentially be modelled within a specified range. For the M5Ye, fluctuations
and non-linearity made it difficult to accurately correlate the changes in yeastolate
concentration over the full concentration range, but once the range was reduced the
correlation improved and the prediction worked well.

This study showed that it was feasible to quantify complex ingredients of cell culture
media. However, the method does require further refinements, such as:
• Setting the ingredient concentration range to a more realistic range of X g/L ±
25%, so if we were to redo the M5eRDF sample set a more appropriate
concentration range might be 3.4 g/L ± 0.85 g/L.
• Using more samples in the calibration modelling. Li et al. used a ratio of 1 test
sample to 4 or 5 calibration samples [7]. The minimum sample number should
be set to 20, as the use of 10 samples in this model was not enough. The
replicate measurement is usually set at three, but doubling that may minimize
day-to-day variation better.
• Designing more suitable test sets using risk analysis of the media, which would
be better for assessing model accuracy in real terms. These test sets would also
inform the design of the calibration sample set so that there was correct
overlap between the PCA subspaces of the samples.
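
The first two refinements can be illustrated with a short sketch. It assumes a nominal
concentration of 3.4 g/L (the value used in the example above), a ±25% window, 20
calibration levels spaced evenly across that window, and one test level for every four
calibration levels. The function name and the even spacing are illustrative choices,
not the design used in this work.

```python
def design_levels(nominal, rel_range=0.25, n_cal=20, cal_per_test=4):
    """Return (calibration, test) concentration levels in g/L."""
    low = nominal * (1 - rel_range)
    high = nominal * (1 + rel_range)
    # Calibration levels: evenly spaced, including both ends of the window
    step = (high - low) / (n_cal - 1)
    cal = [round(low + i * step, 4) for i in range(n_cal)]
    # Test levels: evenly spaced, strictly inside the calibration window
    n_test = n_cal // cal_per_test
    t_step = (high - low) / (n_test + 1)
    test = [round(low + (j + 1) * t_step, 4) for j in range(n_test)]
    return cal, test


cal, test = design_levels(3.4)
print(f"window: {cal[0]:.2f} to {cal[-1]:.2f} g/L")
print(f"{len(cal)} calibration levels, {len(test)} test levels")
```

Keeping the test levels inside the calibration window avoids asking the model to
extrapolate, which is one way of ensuring the overlap described in the last point.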
5.6 General Conclusions: Fluorescence Study
Fluorescence spectroscopy performed reasonably well for the analysis of media
components, their behaviour and analyte quantification in cell culture media. Both
data types (EEM and TSFS) were information rich and easy to collect. As both scan
types were collected on the same instrument with a short measurement time (less than
15 minutes), preliminary studies could use both scan types before deciding which was
optimal for a particular process. For the samples used in this study, the EEM
measurement performed better for yeastolate and eRDF analysis and was also easier
to interpret in the qualitative analysis. The TSFS data were less comparable to
standards, as the excitation and Δλ profiles from the MCR results showed, which made
TSFS data analysis more challenging. The TSFS data are, however, adaptable and can
be converted into an EEM format (Figure 94); their main advantage is that they avoid
the Rayleigh scatter that transects the EEM data.
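
The conversion of TSFS data into an EEM layout relies on the relationship
λem = λex + Δλ. A minimal sketch of that remapping is given below. It assumes the TSFS
intensities are held in a dictionary keyed by (λex, Δλ) in nm, which is a hypothetical
layout rather than the instrument's export format, and the intensities are arbitrary.

```python
def tsfs_to_eem(tsfs):
    """Re-index TSFS intensities {(ex, delta): I} as EEM points {(ex, em): I}."""
    eem = {}
    for (ex, delta), intensity in tsfs.items():
        em = ex + delta  # synchronous scan: emission = excitation + offset
        eem[(ex, em)] = intensity
    return eem


# Toy data: two excitation wavelengths and two offsets (arbitrary intensities)
tsfs = {(250, 10): 0.8, (250, 20): 1.5, (260, 10): 0.9, (260, 20): 1.7}
eem = tsfs_to_eem(tsfs)
print(sorted(eem))
```

Because every offset is greater than zero, no remapped point falls on the line
λem = λex, which is where first-order Rayleigh scatter appears in an EEM.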

The linearity of the calibration plots indicated good performances in correlating the
eRDF and yeastolate concentrations with the EEM and TSFS models. Using unfolded
PLS it was possible to quantify the eRDF concentration with a 7.2% error level for the
EEM data and a 7.8% error level for the TSFS data. For yeastolate concentration,
calibration model errors were 5.4% and 7.2% using EEM and TSFS respectively.
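
The error levels quoted here correspond to the REP% values listed in Table 41. The
sketch below assumes the commonly used definition of the relative error of prediction,
namely the root mean square error divided by the mean reference concentration and
expressed as a percentage. The concentrations are invented for illustration and are
not data from this study.

```python
import math


def rep_percent(y_true, y_pred):
    """Relative error of prediction: 100 * RMSE / mean(reference values)."""
    n = len(y_true)
    rmse = math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return 100.0 * rmse / (sum(y_true) / n)


# Hypothetical reference and predicted concentrations (g/L)
y_true = [2.0, 3.0, 4.0, 5.0]
y_pred = [2.2, 2.8, 4.2, 4.8]
print(f"REP = {rep_percent(y_true, y_pred):.2f}%")
```

Normalising by the mean reference value makes the error comparable between models
built over different concentration ranges, such as the eRDF and yeastolate sets.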

The M5Ye calibration model worked, and its prediction ability was good. The
prediction results for yeastolate indicated that it was possible to quantify yeastolate
from the gross analyte signal of test media samples. The M5eRDF samples generated
good calibration models but had weaker prediction results. Comparing the M5Ye
prediction results with those for M5eRDF highlighted the importance of the test set.
The test set for the M5eRDF model deviated too much from the calibration samples, as
there was a rapidly varying peak intensity from the IFEs at the edges of the
concentration range of the test set samples. The M5Ye test set, which was devoid of
major fluctuations, was more typical of the deviation expected in practice.

Comparing the results from fluorescence, Raman and SERS revealed how the sample set
under investigation changed the performance of the model. The fluorescence method
improved the calibration for the M5eRDF data, but the test set was undergoing too many
matrix fluctuations for effective prediction. For this reason the prediction results of
the Raman and SERS data outperformed the fluorescence data (see Table 29). In the case
of M5Ye samples, the fluorescence method outperformed Raman and SERS in terms of both
the calibration and prediction performance. The Raman data was not able to measure the
weak yeastolate signal. The SERS method improved the correlation with the enhancement
of the signal, but its prediction performance suffered from spectral overlap from other
components, which led to poor prediction results. Overall, quantification of yeastolate
in the M5Ye samples was best achieved using fluorescence measurements.

Within every fluorescence landscape, part of the acquired signal contained little or no
fluorescence; these areas were eliminated to improve results. In the quantitative
analysis, reduced area selections were favoured with the EEM and TSFS data for
M5Ye. This finding was in agreement with other studies in which selecting the variables
for the most prominent excitation and emission combinations improved prediction
capability [7, 333, 334]. In previous studies with industrial cell media, Li et al.
correlated the fluorescence signal in EEM data to the glycoprotein yields [7]. The use
of the full spectrum resulted in weak calibration, but when variable selection methods
were applied the model performance improved. The R² value went from 0.2 to 0.94 and
the REP from ~8.94% to ~3.62%, depending on the process stage being tested. This study
showed that correlation was only dependent on high intensity fluorescence bands and
emission properties of the analyte of interest. For multicomponent analytes like eRDF
and yeastolate, specific emission ranges span multiple emission bands. As a result,
precise area selection could not be used to pinpoint the fluorescent analytes for
multicomponent mixtures, but mathematical variable selection methods (see footnote 50),
which take into account the full area, could be applied to improve model
performance [7, 335-338].
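
Area selection followed by unfolding can be sketched as follows. The example assumes
an EEM stored as a nested list indexed [excitation][emission] with known wavelength
axes, and uses the λex 250–360 nm and λem 285–425 nm window listed for the EEM models
in Table 41. The helper name, the coarse wavelength axes and the placeholder
intensities are hypothetical.

```python
def select_and_unfold(eem, ex_axis, em_axis, ex_range, em_range):
    """Keep intensities inside the window and flatten them row by row."""
    ex_lo, ex_hi = ex_range
    em_lo, em_hi = em_range
    vector = []
    for i, ex in enumerate(ex_axis):
        if not ex_lo <= ex <= ex_hi:
            continue  # excitation wavelength outside the selected area
        for j, em in enumerate(em_axis):
            if em_lo <= em <= em_hi:
                vector.append(eem[i][j])
    return vector


ex_axis = [230, 250, 300, 360, 400]  # nm
em_axis = [270, 285, 350, 425, 500]  # nm
eem = [[ex + em for em in em_axis] for ex in ex_axis]  # placeholder intensities
x = select_and_unfold(eem, ex_axis, em_axis, (250, 360), (285, 425))
print(len(x))
```

Each sample then contributes one such vector as a row of the matrix passed to an
unfolded PLS model, so discarding low-signal areas directly reduces the number of
variables the model has to weigh.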

In industrial settings, large quantities of powders are mixed together to produce the
media; the variance seen would be within specification of industrial limits. Therefore
no large fluctuations would be expected unless an outlier was present. Thus the
method developed in this work could be adapted for industrial use where the variance
in samples and the concentration ranges used are smaller.

50 Methods like competitive adaptive reweighted sampling (CARS) and ant colony optimization
(ACO) based on mathematical evaluation of each wavelength importance would be better for the
multiple component analytes.

6 Conclusions and Future Work
The FDA and the biopharmaceutical industry want to better regulate and understand
bio-processes through the use of quality by design (QbD) and PAT. One area where
QbD and PAT can be applied for better control is media formulation. Prior to use,
media are tested in order to determine whether they are fit for purpose. This can
include small scale performance testing, but this is time consuming and expensive.
Variability in media can have a large impact on product quality and process
performance [77, 339-341]. The objective of this thesis was to develop rapid
spectroscopic methods for quantifying certain components in media which could then
be used for media formulation analysis. Raman, SERS and Fluorescence
spectroscopic methods offer the possibility to carry out non-destructive qualitative
and quantitative analysis of cell culture media in near real time.
6.1 Spectroscopic Conclusions
In Chapter 3, the use of Raman spectroscopy for quantifying D-glucose, eRDF and
yeastolate in cell culture media was investigated. Raman is a fast technique and
allows for high throughput screening with the use of an in-house developed stainless
steel 96 well-plate [285]. The common features of the Raman spectra were the large
water signal, baseline offset and weak analyte signal. Water elimination was
investigated because of the large water signal, but this led to spectral artefacts.
However, the majority of the baseline offset was removed by changing the
experimental setup. For the remainder, various pre-processing methods were used to
account for the baseline offset and spectral variance in order to improve the
performance of the data for quantification. Data pre-processing methods also
enhanced the analyte signal and removed scatter, while region selection further
improved the quantification performance. Using Raman for quantitative analysis of
media samples enabled D-glucose determination with ~4.7% error, but eRDF and
yeastolate gave larger errors of ~16% and ~38% respectively. The D-glucose model
worked better than an in-line Raman method for D-glucose (15.3% error) and was
close to the reference method (Bioprofile 400 analyser, 4% error) [100]. It was,
however, harder to get a good calibration with the more complicated media analytes.

The complex nature of eRDF and yeastolate meant that they were open to analysis by
all of the spectroscopic methods being investigated (Table 41). Chapters 4 and 5
covered the quantification of eRDF and yeastolate by SERS and fluorescence
respectively. SERS gave the signal enhancement required to compensate for the
strong water signal, but the preparation of colloid made it the most labour intensive
method. The results showed some promise in quantifying the complex media
components as a whole, but the error levels were too high to be useful. SERS gave an
improved yeastolate model (12% error), while the eRDF model that it produced
(~16% error) matched the Raman model. Further development is required to make the
SERS method optimal. Areas to improve upon are: control over the sample to colloid
ratio; increased sample numbers; a reduced linear range, as the data exceeded ± 25%;
incubation time; sample re-suspension; whether to use an aggregating agent; and
improved reproducibility testing, by taking ten replicate measurements.

Table 41 Summary of the best calibration models for the media components generated from the
different methods.
Dataset  Method  Sample Number  Range                           Pre-processing  REP%
M5Glu    Raman   32             800–1680 cm–1                   BC FD MSC       4.66
M5eRDF   Raman   10             707–1853 cm–1                   BC FST11 MSC    15.94
M5eRDF   SERS    10             250–3311 cm–1                   WE NINF         15.94
M5eRDF   EEM     10             λex 250–360 nm, λem 285–425 nm  Unfolded        7.29
M5eRDF   TSFS    10             λex 230–520 nm, ∆λ 10–200 nm    Unfolded        7.83
M5Ye     Raman   10             707–1853 cm–1                   Avg FD11        38.46
M5Ye     SERS    10             1260–1444 cm–1                  BC FST11 MSC    12.08
M5Ye     EEM     6              λex 250–360 nm, λem 285–425 nm  Unfolded MSC    5.45
M5Ye     TSFS    6              λex 250–310 nm, ∆λ 10–140 nm    Unfolded        7.27

Fluorescence data were informative and easy to collect and the measurements were
reproducible. The EEM and TSFS of yeastolate and eRDF shared common
fluorophores due to their biogenic components (amino acids, peptides and vitamins).
Prior to quantitative analysis, an extra assessment of the data was conducted using
ROBPCA for outlier detection. ROBPCA indicated that the samples at the ends of the
concentration range deviated most; in other words, the matrix (in photophysical terms)
had changed very significantly. This indicated that the linear range was too large for
accurate quantification. For practical operational use, these methods would be better if
they were developed with a more restricted concentration range (i.e. a range that
varied by ± 25% of the set concentration value), like the M5Ye set after the sample
numbers were reduced. This would limit the extent of matrix variations and generate
accurate quantitative methods. Using unfolded PLS, it was possible to quantify
yeastolate concentration with 5.4% error level for the EEM data and 7.2% for the
TSFS data; for eRDF, the error levels were 7.2% and 7.8% using EEM and TSFS
respectively. EEM outperformed TSFS in both the calibration and prediction
performance for both M5eRDF and M5Ye. It also gave better resolved bands for
identification of underlying fluorophores.
6.2 Future Studies and Solutions
This thesis is an initial attempt at developing a protocol for the quantitative analysis of
media formulations using spectroscopic methods. The first issue in developing any
robust quantitative protocol is to determine which spectroscopic methods are optimal
for the different ingredient types in a complex medium. The second issue was to assess
the best methods of extracting the quantitative information; in this case we used
chemometrics. We have determined that Raman spectroscopy is only suitable for
relatively high concentration single components (e.g. glucose) and that EEM is best
suited for the complex ingredients where the individual measurable analytes are
present in low concentration. In order to take these methods further, a revised
experimental plan is needed, inspired by these findings and based more closely on the
maximum concentration variances expected (±10%). The first change to be made
should be the sample number, as the ultimate accuracy of any chemometric method is
related to sample number. Here we used relatively small sample sets to determine
feasibility, but in practice, sample sets would be 3-4 times larger.
The second major change to be implemented in the experimental design would be to
restrict the concentration variance of any ingredient to ±15% of the nominal set value
for that ingredient in the media. This value is a truer reflection of the ingredient
variance that industry is likely to accept, thus there is no need to look at the wider
ranges tested here.
The third change would be to use independent test set validation throughout, and the
test set samples should span a similar range to the calibration set with a 1 to 5 ratio of
test samples to calibration samples. They should be evenly spread throughout the
PCA subspace.
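
One simple way to obtain such a split is sketched below. It assumes each sample has a
single ordering value, for example its score on the first principal component or its
nominal concentration, and reserves one sample in every block of six for the test set.
The function name, the choice of ordering value and the toy scores are hypothetical,
and this is not the validation scheme used in this work.

```python
def split_calibration_test(scores, cal_per_test=5):
    """Return (calibration_idx, test_idx) with test samples spread along `scores`."""
    # Sample indices ordered from lowest to highest score
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    block = cal_per_test + 1
    # Take the middle sample of each block so the extremes stay in calibration
    test = [idx for pos, idx in enumerate(order) if pos % block == block // 2]
    test_set = set(test)
    cal = [idx for idx in order if idx not in test_set]
    return cal, test


# Toy scores for 24 samples, deliberately stored in descending order
scores = [0.1 * i for i in reversed(range(24))]
cal, test = split_calibration_test(scores)
print(len(cal), len(test))
```

Because the lowest and highest scoring samples always remain in the calibration set,
the test samples span a range no wider than the one the model was trained on.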

7 References
1. Calvet A, Li B, Ryder AG. Rapid quantification of tryptophan and tyrosine in
chemically defined cell culture media using fluorescence spectroscopy. Journal of
Pharmaceutical and Biomedical Analysis. 2012;71(0):89-98.
2. Calvet A, Li B, Ryder AG. A rapid fluorescence based method for the
quantitative analysis of cell culture media photo-degradation. Analytica Chimica
Acta. 2014;807(0):111-9.
3. Li B, Ryan PW, Ray BH, Leister KJ, Sirimuthu N, Ryder AG. Rapid
characterization and quality control of complex cell culture media solutions using
raman spectroscopy and chemometrics. Biotechnology and bioengineering.
2010;107(2):290-301.
4. Li B, Ryan PW, Shanahan M, Leister KJ, Ryder AG. Fluorescence Excitation–
Emission Matrix (EEM) Spectroscopy for Rapid Identification and Quality Evaluation
of Cell Culture Media Components. Applied spectroscopy. 2011;65(11):1240-9.
5. Li B, Sirimuthu NMS, Ray BH, Ryder AG. Using surface-enhanced Raman
scattering (SERS) and fluorescence spectroscopy for screening yeast extracts, a
complex component of cell culture media. Journal of Raman Spectroscopy.
2012;43(8):1074-82.
6. Li B, Ray BH, Leister KJ, Ryder AG. Performance Monitoring of a
Mammalian Cell Based Bioprocess using Raman Spectroscopy. Analytica Chimica
Acta. 2013.
7. Li B, Shanahan M, Calvet A, Leister K, Ryder AG. Comprehensive,
quantitative bioprocess productivity monitoring using fluorescence EEM
spectroscopy and chemometrics. Analyst. 2014.
8. Ryan PW, Li B, Shanahan M, Leister KJ, Ryder AG. Prediction of cell
culture media performance using fluorescence spectroscopy. Analytical Chemistry.
2010;82(4):1311-7.
9. Walsh G. Current status of biopharmaceuticals: Approved products and trends
in approvals. Knäblein J, editor 2005. 1–34 p.
10. Walsh G. Second-generation biopharmaceuticals. European Journal of
Pharmaceutics and Biopharmaceutics. 2004;58(2):185-96.
11. Macdougall IC, Eckardt K-U. Novel strategies for stimulating erythropoiesis
and potential new treatments for anaemia. The Lancet. 2006;368(9539):947-53.
12. Rader R. FDA Biopharmaceutical Product Approvals and Trends.
Biotechnology Information Institute, www biopharma com/approvals_2011 html,
accessed Jan. 2012;16.
13. Zhu J. Mammalian cell protein expression for biopharmaceutical production.
Biotechnology Advances. 2011.
14. Cartwright T. Animal cells as bioreactors: Cambridge Univ Pr; 1994.
15. Hossler P, Khattak SF, Li ZJ. Optimal and consistent protein glycosylation in
mammalian cell culture. Glycobiology. 2009;19(9):936-49.
16. Glassey J, Gernaey KV, Clemens C, Schulz TW, Oliveira R, Striedner G, et al.
Process analytical technology (PAT) for biopharmaceuticals. Biotechnology Journal.
2011.
17. Langer E. Advances in Large-Scale Biopharmaceutical Manufacturing and
Scale-Up Production: ASM Press and BioPlan Associates, Inc., Washington, DC;
2007.

18. Sanchez S, Demain A. Special issue on the production of recombinant
proteins. Biotechnology Advances. 2012;30(5):1100-1.
19. Walsh G. Biopharmaceuticals: biochemistry and biotechnology: Wiley-
Blackwell; 2003.
20. Masters JRW. Animal cell culture: A practical approach: Oxford University
Press Oxford; 2000.
21. Butler M. Animal cell cultures: recent achievements and perspectives in the
production of biopharmaceuticals. Applied Microbiology and Biotechnology.
2005;68(3):283-91.
22. Rhiel M, Cohen MB, Murhammer DW, Arnold MA. Nondestructive near‐
infrared spectroscopic measurement of multiple analytes in undiluted samples of
serum‐based cell culture media. Biotechnology and Bioengineering. 2002;77(1):73-
82.
23. Riley MR, Crider HM, Nite ME, Garcia RA, Woo J, Wegge RM.
Simultaneous measurement of 19 components in serum‐containing animal cell culture
media by Fourier transform near‐infrared spectroscopy. Biotechnology progress.
2001;17(2):376-8.
24. Sivakesava S, Irudayaraj J, Ali D. Simultaneous determination of multiple
components in lactic acid fermentation using FT-MIR, NIR, and FT-Raman
spectroscopic techniques. Process Biochemistry. 2001;37(4):371-8.
25. Sivakesava S, Irudayaraj J, Demirci A. Monitoring a bioprocess for ethanol
production using FT-MIR and FT-Raman spectroscopy. Journal of Industrial
Microbiology and Biotechnology. 2001;26(4):185-90.
26. Perez-Garcia O, Escalante FME, de-Bashan LE, Bashan Y. Heterotrophic
cultures of microalgae: Metabolism and potential products. Water Research.
2011;45(1):11-36.
27. Nielsen LK. The encyclopedia of cell technology: Wiley New York; 2000.
28. Hauser H, Wagner R. Mammalian cell biotechnology in protein production:
Walter De Gruyter Inc; 1997.
29. Altamirano C, Paredes C, Cairo J, Godia F. Improvement of CHO cell culture
medium formulation: simultaneous substitution of glucose and glutamine.
Biotechnology progress. 2000;16(1):69-75.
30. Barngrover D, Thomas J, Thilly W. High density mammalian cell growth in
Leibovitz bicarbonate-free medium: effects of fructose and galactose on culture
biochemistry. Journal of cell science. 1985;78(1):173-89.
31. Altamirano C, Cairo J, Godia F. Decoupling cell growth and product
formation in Chinese hamster ovary cells through metabolic control. Biotechnology
and Bioengineering. 2001;76(4):351-60.
32. Coster J, McCauley R, Hall J. Glutamine: metabolism and application in
nutrition support. Asia Pacific journal of clinical nutrition. 2004;13(1):25-31.
33. van der Valk J, Brunner D, De Smet K, Fex Svenningsen Å, Honegger P,
Knudsen LE, et al. Optimization of chemically defined cell culture media – Replacing
fetal bovine serum in mammalian in vitro methods. Toxicology in Vitro.
2010;24(4):1053-63.
34. Bashor MM. Dispersion and disruption of tissues. Methods in enzymology.
1979;58:119.
35. Williamson J, Cox P. Use of a new buffer in the culture of animal cells.
Journal of General Virology. 1968;2(2):309-12.

36. Atlanta Biologicals Incorporated. Product Data Sheet HEPES Buffer Solution (1M) 2010.
Available from: http://www.atlantabio.com/assets/PDS/File/PDS%20-%20HEPES%20Buffer%20Solution.pdf.
37. Luo S, Pal D, Shah SJ, Kwatra D, Paturi KD, Mitra AK. Effect of HEPES
buffer on the uptake and transport of P-glycoprotein substrates and large neutral
amino acids. Molecular pharmaceutics. 2010;7(2):412-20.
38. Delmouly K, Belondrade M, Casanova D, Milhavet O, Lehmann S. HEPES
inhibits the conversion of prion protein in cell culture. Journal of General Virology.
2011;92(5):1244-50.
39. Denry Sato J, Kan, M. Media for culture of mammalian cells. Current
Protocols in Cell Biology. 1998;1:1-.2.
40. Langer E. Trends to Watch in the Biopharmaceutical Industry: The Economy,
Approvals, Contamination, and Going Animal-Free. 2010.
41. Cleland D, Jastrzembski K, Stamenova E, Benson J, Catranis C, Emerson D,
et al. Growth characteristics of microorganisms on commercially available animal-
free alternatives to tryptic soy medium. Journal of microbiological methods.
2007;69(2):345-52.
42. Jayme D, Watanabe T, Shimada T. Basal medium development for serum-free
culture: a historical perspective. Cytotechnology. 1997;23(1):95-101.
43. Kim DY, Lee JC, Chang HN, Oh DJ. Development of serum-free media for a
recombinant CHO cell line producing recombinant antibody. Enzyme and microbial
technology. 2006;39(3):426-33.
44. Sung Y, Lim S, Chung J, Lee G. Yeast hydrolysate as a low-cost additive to
serum-free medium for the production of human thrombopoietin in suspension
cultures of Chinese hamster ovary cells. Applied Microbiology and Biotechnology.
2004;63(5):527-36.
45. Franěk F, Hohenwarter O, Katinger H. Plant protein hydrolysates: preparation
of defined peptide fractions promoting growth and production in animal cells cultures.
Biotechnology progress. 2000;16(5):688-92.
46. Michiels J-F, Barbau J, De Boel S, Dessy S, Agathos S, Schneider Y-J.
Characterisation of beneficial and detrimental effects of a soy peptone, as an additive
for CHO cell cultivation. Process Biochemistry. 2011;46(3):671-81.
47. Schlaeger E-J. The protein hydrolysate, Primatone RL, is a cost-effective
multiple growth promoter of mammalian cell culture in serum-containing and serum-
free media and displays anti-apoptosis properties. Journal of immunological methods.
1996;194(2):191-9.
48. Burteau CC, Verhoeye FR, Molsl JF, Ballez J-S, Agathos SN, Schneider Y-J.
Fortification of a protein-free cell culture medium with plant peptones improves
cultivation and productivity of an interferon-γ-producing CHO cell line. In Vitro
Cellular & Developmental Biology-Animal. 2003;39(7):291-6.
49. Heidemann R, Zhang C, Qi H, Rule JL, Rozales C, Park S, et al. The use of
peptones as medium additives for the production of a recombinant therapeutic protein
in high density perfusion cultures of mammalian cells. Cytotechnology.
2000;32(2):157-67.
50. Dick LW, Kakaley JA, Mahon D, Qiu D, Cheng KC. Investigation of proteins
and peptides from yeastolate and subsequent impurity testing of drug product.
Biotechnology progress. 2009;25(2):570-7.
51. Mosser M, Chevalot I, Olmos E, Blanchard F, Kapel R, Oriol E, et al.
Combination of yeast hydrolysates to improve CHO cell growth and IgG production.
Cytotechnology. 2012:1-13.
52. Even MS, Sandusky CB, Barnard ND. Serum-free hybridoma culture: ethical,
scientific and safety considerations. Trends in Biotechnology. 2006;24(3):105-8.
53. Sarkar A. Stem Cell Culture: Discovery Publishing House.
54. Ham RG. Clonal growth of mammalian cells in a chemically defined,
synthetic medium. Proceedings of the National Academy of Sciences of the United
States of America. 1965;53(2):288.
55. Gstraunthaler G. Alternatives to the use of fetal bovine serum: serum-free cell
culture. Altex. 2003;20(4):275-81.
56. Murakami H, Masui H, Sato GH, Sueoka N, Chow TP, Kano-Sueoka T.
Growth of hybridoma cells in serum-free medium: ethanolamine is an essential
component. Proceedings of the National Academy of Sciences. 1982;79(4):1158.
57. Kong ZLM, M. Murakami, H. Shinohara, K. Establishment of a
macrophagelike cell line derived from U-937, human histiocytic lymphoma, grown
serum-free. In Vitro Cellular & Developmental Biology-Plant. 1990;26(10):949-54.
58. Kawahara MN, A. Terada, S. Kato, K. Tsumoto, K. Kumagai, I. Miki, M.
Mahoney, W. Ueda, H. Nagamune, T. Replacing factor-dependency with that for
lysozyme: affordable culture of IL-6-dependent hybridoma by transfecting artificial
cell surface receptor. Biotechnology and Bioengineering. 2001;74(5):416-23.
59. Chua F, Oh SKW, Yap M, Teo WK. Enhanced IgG production in eRDF media
with and without serum: A comparative study. Journal of immunological methods.
1994;167(1-2):109-19.
60. Garnick R, Solli N, Papa P. The role of quality control in biotechnology: An
analytical perspective. Analytical Chemistry. 1988;60(23):2546-57.
61. Hanko VP, Rohrer JS. Determination of carbohydrates, sugar alcohols, and
glycols in cell cultures and fermentation broths using high-performance anion-
exchange chromatography with pulsed amperometric detection. Analytical
Biochemistry. 2000;283(2):192-9.
62. Hanko VP, Rohrer JS. Determination of amino acids in cell culture and
fermentation broth media using anion-exchange chromatography with integrated
pulsed amperometric detection. Analytical Biochemistry. 2004;324(1):29-38.
63. Fa Y, Yang H, Ji C, Cui H, Zhu X, Du J, et al. Simultaneous determination of
amino acids and carbohydrates in culture media of Clostridium thermocellum
by valve-switching ion chromatography. Analytica Chimica Acta. 2013;798:97-102.
64. Buha SM, Panchal A, Panchal H, Chambhare R, Patel PR, Kumar S, et al.
HPLC-FLD for the Simultaneous Determination of Primary and Secondary Amino
Acids from Complex Biological Sample by Pre-column Derivatization. Journal of
chromatographic science. 2011;49(2):118-23.
65. Genzel Y, König S, Reichl U. Amino acid analysis in mammalian cell culture
media containing serum and high glucose concentrations by anion exchange
chromatography and integrated pulsed amperometric detection. Analytical
Biochemistry. 2004;335(1):119-25.
66. Potvin J, Fonchy E, Conway J, Champagne CP. An automatic turbidimetric
method to screen yeast extracts as fermentation nutrient ingredients. Journal of
microbiological methods. 1997;29(3):153-60.

67. Pohlscheidt M, Charaniya S, Bork C, Jenzsch M, Noetzel TL, Luebbert A.
Bioprocess and fermentation monitoring. Encyclopedia of Industrial Biotechnology.
2012.
68. Sun Y-t, Zhao L, Ye Z, Fan L, Liu X-p, Tan W-S. Development of a fed-batch
cultivation for antibody-producing cells based on combined feeding strategy of
glucose and galactose. Biochemical Engineering Journal. 2013;81:126-35.
69. Food, Administration D. Guidance for Industry: PAT—a framework for
innovative pharmaceutical development, manufacturing, and quality assurance.
Rockville, MD. 2004.
70. De Beer T, Allesø M, Goethals F, Coppens A, Vander Heyden Y, De Diego
HL, et al. Implementation of a process analytical technology system in a freeze-drying
process using Raman spectroscopy for in-line process monitoring. Analytical
Chemistry. 2007;79(21):7992-8003.
71. De Beer T, Burggraeve A, Fonteyne M, Saerens L, Remon JP, Vervaet C.
Near infrared and Raman spectroscopy for the in-process monitoring of
pharmaceutical production processes. International Journal of Pharmaceutics.
2011;417(1):32-47.
72. De Beer TRM, Bodson C, Dejaegher B, Walczak B, Vercruysse P, Burggraeve
A, et al. Raman spectroscopy as a process analytical technology (PAT) tool for the in-
line monitoring and understanding of a powder blending process. Journal of
Pharmaceutical and Biomedical Analysis. 2008;48(3):772.
73. Johansson J, Pettersson S, Folestad S. Characterization of different laser
irradiation methods for quantitative Raman tablet assessment. Journal of
Pharmaceutical and Biomedical Analysis. 2005;39(3-4):510.
74. Clarke SJ, Littleford RE, Smith WE, Goodacre R. Rapid monitoring of
antibiotics using Raman and surface enhanced Raman spectroscopy. Analyst.
2005;130(7):1019-26.
75. Jain G, Jayaraman G, Kökpinar Ö, Rinas U, Hitzmann B. On-line monitoring
of recombinant bacterial cultures using multi-wavelength fluorescence spectroscopy.
Biochemical Engineering Journal. 2011;58–59(0):133-9.
76. Lee HLT, Boccazzi P, Gorret N, Ram RJ, Sinskey AJ. In situ bioprocess
monitoring of Escherichia coli bioreactions using Raman spectroscopy. Vibrational
Spectroscopy. 2004;35(1-2):131.
77. Lee HW, Christie A, Yoon S. Characterization of Raw Material Influence on
Mammalian Cell Culture Performance: Chemometric Based Data Fusion Approach.
78. Lourenço N, Lopes J, Almeida C, Sarraguça M, Pinheiro H. Bioreactor
monitoring with spectroscopy and chemometrics: a review. Analytical and
bioanalytical chemistry. 2012;404(4):1211-37.
79. Macaloney G, Draper I, Preston J, Anderson K, Rollins M, Thompson B, et al.
At-Line Control and Fault Analysis In an Industrial High Cell Density Escherichia
Coli Fermentation, Using NIR Spectroscopy. Food and Bioproducts Processing.
1996;74(4):212-20.
80. Marose S, Lindemann C, Scheper T. Two-Dimensional Fluorescence
Spectroscopy: A New Tool for On-Line Bioprocess Monitoring. Biotechnology
progress. 1998;14(1):63.
81. Triadaphillou S, Martin E, Montague G, Norden A, Jeffkins P, Stimpson S.
Fermentation process tracking through enhanced spectral calibration modeling.
Biotechnology and Bioengineering. 2007;97(3):554-67.

82. Ryder AGV, John De Li, Boyan Ryan, Paul W. Sirimuthu, Narayana M. S.
Leister, Kirk J. A stainless steel multi-well plate (SS-MWP) for high-throughput
Raman analysis of dilute solutions. Journal of Raman Spectroscopy.
2010;41(10):1266-75.
83. Settle FA. Handbook of Instrumental Techniques for Analytical Chemistry.
Journal of Liquid Chromatography Related Technologies. 1998;21(19):3072-6.
84. Ewing GW. Analytical instrumentation handbook: CRC Press; 1997.
85. Willard HH, Merritt Jr LL, Dean JA. Instrumental methods of analysis. Settle
Jr FA, editor: Wadsworth Pub. Co.; 1988.
86. Turrell G, Corset J. Raman microscopy: developments and applications:
Access Online via Elsevier; 1996.
87. McCreery R. Raman spectroscopy for chemical analysis: Wiley-Interscience;
2000.
88. Hollas M. Modern Spectroscopy. 1987. New York: John Wiley & Sons.
89. Straughan B, Walker S. Spectroscopy: Chapman and Hall London; 1976.
90. Collette TW, Williams TL. The role of Raman spectroscopy in the analytical
chemistry of potable water. Journal of Environmental Monitoring. 2002;4(1):27-34.
91. Egawa T, Yeh S-R. Structural and functional properties of hemoglobins from
unicellular organisms as revealed by resonance Raman spectroscopy. Journal of
Inorganic Biochemistry. 2005;99(1):72-96.
92. Smith E, Dent G. Modern Raman spectroscopy: a practical approach: Wiley;
2005.
93. Sebastian R, Petra R, Marion AS, Dorothea B, Malgorzata B, Hartwig S, et al.
Nondestructive analysis of single rapeseeds by means of Raman spectroscopy. Journal
of Raman Spectroscopy. 2007;38(3):301-8.
94. Ortiz C, Zhang D, Xie Y, Davisson VJ, Ben-Amotz D. Identification of insulin
variants using Raman spectroscopy. Analytical Biochemistry. 2004;332(2):245.
95. Zhu G, Zhu X, Fan Q, Wan X. Raman spectra of amino acids and their
aqueous solutions. Spectrochimica Acta Part A: Molecular and Biomolecular
Spectroscopy. 2011;78(3):1187-95.
96. Kang J, Yuan X, Dong X, Gu H, editors. The effect of aqueous solution in
Raman spectroscopy. Photonics and Optoelectronics Meetings 2009; 2009:
International Society for Optics and Photonics.
97. Cannizzaro C, Rhiel
