Development of Rapid Spectroscopic Methods for the Analysis of Cell Culture Media

By Bridget Kissane, B.Sc.

Thesis presented for the degree of Ph.D. of the National University of Ireland

Submitted: November 2014
Supervisor: Dr. A.G. Ryder
Head of Department: Prof. Paul Murphy
School of Chemistry, National University of Ireland, Galway

Table of Contents

Declaration
List of abbreviations
Abstract
1 Introduction
  1.1 Biopharmaceuticals – Next Generation Drugs
  1.2 Cell Lines and Expression systems
  1.3 Culture Requirements
    1.3.1 Feed Strategy
    1.3.2 Cell Culture Media
    1.3.3 Media Advances
  1.4 Analysis of Cell Culture Media
  1.5 Process Analytical Technology (PAT) Principles
  1.6 Spectroscopic Methods Suitable for Cell Culture Media Analysis
    1.6.1 Raman Spectroscopy
    1.6.2 Surface Enhanced Raman Spectroscopy (SERS)
    1.6.3 Fluorescence Spectroscopy
  1.7 Project Objectives
2 Chemometrics, Materials and Methods
  2.1 Chemometrics
  2.2 Qualitative and Quantitative analysis
  2.3 Calibration Modelling
  2.4 Figures of Merit for Modelling
    2.4.1 Correlation Coefficient (r²)
    2.4.2 Root Mean Square Error of Calibration (RMSEC)
    2.4.3 Root Mean Square Error of Cross Validation (RMSECV)
    2.4.4 Root Mean Square Error of Prediction (RMSEP)
  2.5 Multivariate Analysis
    2.5.1 Variance Analysis
    2.5.2 Regression
    2.5.3 Factor Analysis
  2.6 Data Pre-Processing
    2.6.1 Mean Centring
    2.6.2 Derivatives
    2.6.3 Multiplicative Scatter Correction (MSC)
    2.6.4 Normalisation
  2.7 Variable/Wavelength Selection
    2.7.1 Moving Window Partial Least Squares (MWPLS)
  2.8 Outliers
  Materials and Methods
  2.9 Materials
    2.9.1 Sample Materials
    2.9.2 Colloid Materials
  2.10 Workflow Description
  2.11 Sample Preparation and Handling
  2.12 Datasets
    2.12.1 Model Media Samples
    2.12.2 M1Glu Media Dataset
    2.12.3 M3Glu Media Dataset
    2.12.4 M5Glu Media Dataset
    2.12.5 T5 Test Dataset
  2.13 Complex Media Components Experiments
    2.13.1 eRDF Media Dataset (M5eRDF)
    2.13.2 Yeastolate Media Dataset (M5Ye)
  2.14 Measurement Techniques
    2.14.1 Raman Spectroscopy and SERS
    2.14.2 Fluorescence Spectroscopy
  2.15 Sample Holders
  2.16 Specific Chemometric Procedures
    2.16.1 Baseline Offset Correction
    2.16.2 Water Elimination
    2.16.3 Water to Analyte Ratio
    2.16.4 Model Evaluation Settings
3 Development using Raman Spectroscopy for the Analysis of Cell Culture Media Components
  3.1 Spectral Analysis
    3.1.1 Averaged Aqueous D-glucose (M1Glu) Data
    3.1.2 Baseline Offset Correction of the Aqueous D-glucose (M1Glu) Data
    3.1.3 Water Background Elimination of the Aqueous D-glucose (M1Glu) Data
  3.2 Reproducibility of Raman Data Collection
  3.3 Evaluation of Spectral Range
  3.4 Calibration Modelling
  3.5 Spectral Pre-Processing of M1Glu Sample Set
    3.5.1 Pre-area and Post-area Selection for Spectral Pre-Processing
    3.5.2 Multiplicative Scatter Correction of M1GluR2 Data
    3.5.3 Normalisation of M1GluR2 Data
    3.5.4 Derivative Pre-Processing of M1GluR2 Data
    3.5.5 MSC-FD and FD-MSC Pre-Processing of M1GluR2 Data
  3.6 Outcomes from M1Glu Data Analysis
  3.7 Quantification of D-glucose in a ternary mixture (M3Glu-Data)
    3.7.1 Spectral Analysis of M3Glu Data
    3.7.2 Reproducibility
    3.7.3 Quantitative Analysis: Calibrating D-glucose in M3Glu Data
  3.8 Quantification of D-glucose in a quinary mixture (M5Glu-Data)
    3.8.1 Spectral Analysis and Reproducibility of M5GLU Data
    3.8.2 Quantification: Glucose in M5Glu Data
  3.9 Quantification of eRDF and Yeastolate in quinary mixtures (M5eRDF and M5Ye)
    3.9.1 Spectra Analysis of M5eRDF and M5Ye Data
    3.9.2 Quantification: eRDF in M5eRDF
    3.9.3 Quantification: Yeastolate in M5Ye
  3.10 Model Validation
    3.10.1 Prediction Performance by Sample Splitting into a Training and Test set
    3.10.2 Independent Test Set Prediction
  3.11 General Conclusions: Raman Analysis
4 Surface Enhanced Raman Spectroscopy (SERS) Analysis of Complex Media Components
  4.1 Rationale for Quantitative Analysis using SERS
  4.2 Experimental Considerations for SERS Analysis
    4.2.1 The Absorption Spectrum (λmax and FWHM)
    4.2.2 Sampling Time
    4.2.3 Reproducibility
  4.3 Spectral Analysis of M5eRDF and M5Ye Data
  4.4 Region Selection for Quantitative Analysis
  4.5 Quantitative Analysis of Yeastolate in M5Ye SERS Data
  4.6 Quantitative Analysis of eRDF in M5eRDF SERS Data
  4.7 Model Evaluation
  4.8 General Conclusions: SERS Analysis of M5eRDF and M5YE
5 Fluorescence Spectroscopy Analysis of Complex Media Components
  5.1 The EEM/TSFS Analytical Procedure
  5.2 Spectral Overview of Media Samples (M5eRDF and M5Ye)
  5.3 Assessing Fluorophore Contributions from Media Fluorescence
    5.3.1 Fluorophore Identification and Profile Changes by PARAFAC
    5.3.2 Fluorophore Identification by MCR Analysis
  5.4 Variance Analysis
    5.4.1 PCA Analysis
    5.4.2 ROBPCA Analysis
  5.5 Quantitative Analysis of M5eRDF and M5Ye
    5.5.1 Correlation of eRDF Concentration to the M5eRDF Fluorescence Data
    5.5.2 Correlation of Ye Concentration to the M5Ye Fluorescence Data
    5.5.3 Model Evaluation
  5.6 General Conclusions: Fluorescence Study
6 Conclusions and Future Work
  6.1 Spectroscopic Conclusions
  6.2 Future Studies and Solutions
7 References
8 Appendix
  8.1 Supplementary Information for Chapter One
  8.2 Supplementary Information for Chapter Two
    8.2.1 Kyokuto eRDF Medium
    8.2.2 Difco TC Yeastolate UF
  8.3 Supplementary Information for Chapter Three
    8.3.1 Calibration Models for M1Glu Data (Replicate Models)
    8.3.2 Calibration Models For M3Glu Data (Replicate Models)
    8.3.3 Calibration Models For M5Glu Data (Replicate models)
    8.3.4 Calibration Models For M5eRDF Data for the Conventional Raman Data
    8.3.5 Calibration Models for M5Ye Data for the Conventional Raman
  8.4 Supplementary Information for Chapter Four
    8.4.1 Reproducibility of the Replicate SERS Data using PCA
    8.4.2 Calibration Models for M5eRDF for the SERS Data
    8.4.3 Calibration Models for M5Ye for the SERS Data
  8.5 Supplementary Information for Chapter Five
    8.5.1 Spectral Overview
    8.5.2 PARAFAC
    8.5.3 Calibration Models for the Fluorescence Data (Replicate Runs)

Declaration

I declare that the work included in this thesis is my own work and has not been previously submitted for a degree to this or any other academic institution.

Bridget Kissane.

List of abbreviations

ACO – Ant Colony Optimization
AE-IPAD – Anion-Exchange Chromatography - Integrated Pulsed Amperometric Detection
API – Active Pharmaceutical Ingredient
ATP – Adenosine Triphosphate
AVG – Averaged
BC – Baseline Correction
BSS – Balanced Salt Solutions
CCD – Charge Coupled Device
CD – Chemically Defined
CHO – Chinese Hamster Ovary
CIP – Cleaned In Place
CoAdReS – Competitive Adaptive Reweighted Sampling
DMEM – Dulbecco's Modified Eagle Medium
E. coli – Escherichia coli
ED – Electrochemical Detection
EEM – Excitation Emission Matrix
eRDF – Enhanced RPMI/DMEM/F12
F12/F10 – Ham’s Nutrient Mixture medium
FAD – Flavin Adenine Dinucleotide
FD – First Derivative
FDA – Food and Drug Administration
FDMSC – First Derivative Multiplicative Scatter Correction
FMN – Flavin Mononucleotide
FSH – Follicle-Stimulating Hormone
HEPES – N-2-hydroxyethylpiperazine-N’-2-ethanesulfonic Acid
HPLC – High Performance Liquid Chromatography
IC – Ion Chromatography
IFE – Inner Filter Effect
IR – Infrared
LB – Lysogeny Broth
LC-MS – Liquid Chromatography–Mass Spectrometry
LFH – Laminar Flow Hood
LH – Luteinizing Hormone
LOD – Limit of Detection
LOOCV – Leave One Out Cross Validation
MCCV – Monte Carlo Cross Validation
MCD – Minimum Covariance Determinant
MDCK – Madin Darby Canine Kidney cell lines
MEM – Minimal Essential Medium
MIR – Mid Infrared
MSC – Multiplicative Scatter Correction
MWPLSR – Moving Window Partial Least Squares Regression
NAD – Nicotinamide Adenine Dinucleotide
NADH – Reduced Nicotinamide Adenine Dinucleotide
NADP – Nicotinamide Adenine Dinucleotide Phosphate
NADPH – Reduced Nicotinamide Adenine Dinucleotide Phosphate
NIR – Near Infrared
NMR – Nuclear Magnetic Resonance
Norm – Normalisation
PARAFAC – Parallel Factor Analysis
PAT – Process Analytical Technology
PCA – Principal Component Analysis
Phe – Phenylalanine
PLS – Partial Least Squares
QE – Quantum Efficiency
RDF – RPMI/DMEM/F12 2:1:1 Mixture
REP – Relative Error of Prediction
RET – Radiative Energy Transfer
RMSEC – Root Mean Square Error of Calibration
RMSECV – Root Mean Square Error of Cross Validation
RMSEP – Root Mean Square Error of Prediction
RNA – Ribonucleic Acid
ROBPCA – Robust Principal Component Analysis
RPMI – Roswell Park Memorial Institute medium
SERS – Surface Enhanced Raman Spectroscopy
SFS – Synchronous Fluorescence Scan
SIMCA – Soft Independent Modelling by Class Analogy
SSR – Sum of Squared Residuals
TChr – Thiochrome
Trp – Tryptophan
TSB – Tryptic Soy Broth
TSFS – Total Synchronous Fluorescence Scan
Tyr – Tyrosine
UPLS – Unfolded Partial Least Squares Regression
UV – Ultraviolet
VIP – Variable Importance in Projection
WE – Water Elimination
YPD – Yeast Extract-Peptone-Dextrose broth

Abstract

Industrial scale cell culture is used for the production of many therapeutic agents such as proteins and vaccines. Cell culture medium is a vital raw material used in these production processes. Formulation analysis of the medium is thus an essential task of any bioprocess. The medium is a critical aspect of the process because it has to supply all of the necessary nutrients and other factors to ensure growth and productivity. Small variations in medium composition can alter cell metabolism, thereby changing process efficiency and productivity. There is an ongoing need for analytical methods to ensure reproducible medium formulations; therefore, real-time qualitative and quantitative analysis of medium components by spectroscopic methods in combination with chemometrics has the potential to be adapted as a PAT tool in bioprocesses.

This thesis investigates the spectroscopic analysis and quantification of three medium components (D-glucose, eRDF and yeastolate) in model medium formulations by Raman, Surface Enhanced Raman Scattering (SERS) and two fluorescence approaches (Excitation Emission Matrix (EEM) and Total Synchronous Fluorescence Scan (TSFS)).
These methods were used in conjunction with chemometrics to provide a wealth of information about medium composition: qualitative assessment and outlier detection through principal component analysis and robust principal component analysis, fluorophore detection and identification using parallel factor analysis and multivariate curve resolution, and quantitative analysis achieved with partial least squares. These studies complement previous studies in this laboratory where specific component quantification [1, 2] and variance analysis were used for characterising, screening [3-5] and quantifying the performances of cell culture media by spectroscopic methods [6-8].

The advantages of spectroscopic methods are that they require little to no sample preparation and they give spectra with rich information content suitable for the discrimination of subtle chemical and physical effects. The goal of this work was to determine whether these spectroscopic methods could be used to accurately quantify medium components, both simple (glucose) and complex (yeastolate and eRDF). The end-use application was to develop a quality assurance method for correct medium preparation/formulation.

Quantitative accuracy varied between the methods due to various experimental factors. Several pre-processing techniques were used to minimise unwanted spectral effects such as noise, intensity and baseline differences. With Raman, quantification of D-glucose, eRDF and yeastolate was achieved with an error of ~5%, ~16% and ~38% respectively. The SERS models gave errors of 16% for eRDF and 12% for yeastolate, while the best fluorescence models gave errors of 5.4% for yeastolate and 7.2% for eRDF. These models show the potential of these spectroscopic methods for the measurement/identification of individual medium components within complex cell culture medium.
However, the error levels obtained suggest that improvement could be achieved through modification of the current experimental setup, which would then lead to more accurate prediction of component concentrations.

1 Introduction

1.1 Biopharmaceuticals – Next Generation Drugs

Originating in the 1990s, the term biopharmaceuticals represents a class of therapeutics produced by modern biotechnology techniques. These include protein based products (produced by genetic engineering) and monoclonal antibodies (produced by hybridoma technology). During the 1990s the concept of nucleic acid based drugs was developed for use in gene therapy and anti-sense technology. Such products, as well as interfering RNAs and decoy oligonucleotides, are also considered to be biopharmaceuticals [9]. The first biopharmaceutical to gain marketing approval, in 1982, was Humulin¹, developed by Genentech in collaboration with Eli Lilly and produced in E. coli. This marked the true beginning of the biopharmaceutical industry [10, 11].

¹ Recombinant Human Insulin

Figure 1 The number of Food and Drug Administration (FDA) approvals for new biopharmaceutical products by year since the first biopharmaceutical in 1982 [12].

Since Humulin was first approved, the FDA has approved more than 100 new recombinant protein therapeutics and more than 300 non-recombinant biopharmaceuticals [13]. In 2012, 18 products received approval from the FDA; the majority of these products were bio-better, me-too, or follow-on products, nine of which were considered to be new biopharmaceutical entities [12].

1.2 Cell Lines and Expression systems

Cell lines are the hosts for the production of biopharmaceuticals due to their ability to produce proteins that can be used for medical treatments. The choice of a cell culture expression system depends on the product, the product yield and the timeframe required for both the growth phase and the purification stage.
Generally, microbial systems grow quickly on simple, inexpensive culture media. However, they are incapable of post-translational modifications and the rate of incorrect folding of proteins is higher than with mammalian cell lines [14]. The majority of approved biopharmaceuticals are expressed in mammalian cell lines, mainly Chinese hamster ovary (CHO). However, expression in mammalian cell lines is more technically complex and expensive when compared to E. coli based systems.

Eukaryotic cell lines, unlike prokaryotic cell lines, are capable of carrying out post-translational modifications such as glycosylation. Many important biopharmaceuticals are naturally glycosylated, such as erythropoietin, blood factor VIII and hormones (follicle-stimulating hormone (FSH), luteinizing hormone (LH)). Glycosylation may be required for biological activity, to increase serum half-life or protein stability, or to reduce immunological problems. In some cases unglycosylated versions of a naturally glycosylated protein retain the therapeutic properties of the native protein. While expression in lower eukaryotic systems such as Saccharomyces cerevisiae is possible, glycosylation patterns are more similar to those of the native human protein if it is expressed in an animal cell line [13, 15, 16].

Mammalian cells share many metabolic processes and similar characteristics such as protein expression; however, some replica strains of cell lines may differ in their metabolic requirements and production performance. These differences originate from genomic changes that occur between transfection of the parental line and cultivation of the cell line [17].
Current statistics of biopharmaceutical cell expression systems reveal the following production figures [18]:

- 45% originates from mammalian cell lines: CHO is dominant at 35%, while other cell lines produce 10% of products.
- 40% originates from bacterial cell lines: 39% in Escherichia coli and 1% in other bacteria.
- The remaining 15% comes from yeast based fermentations.

1.3 Culture Requirements

Bioprocesses involve the cultivation of cell lines within a bioreactor² for the production of a desired product. The cultivation and production of target biological products depends on bioreactor conditions (such as oxygen, pH, temperature and feed strategy), nutrient supply and cell culture medium.

² A bioreactor or fermentor is a reaction vessel containing a liquid medium to support cell growth. Fermentor refers to the vessel in which the fermentation of single-celled organisms occurs, while a bioreactor is the vessel for the culture of animal cells.

1.3.1 Feed Strategy

High concentrations of cells and long fermentation times would ideally result in a higher product yield. However, growth and production can be constrained by the accumulation of by-products and the depletion of vital nutrients. Several feed strategies are available and are shown in Figure 2. The choice of feed strategy for the bioreactor is critical: to obtain the optimum product yield at the end of a fermentation cycle, the strategy must be matched to the specific cell line [19, 20].

Figure 2 Different types of feed strategies for fermentation and cell culturing³ [19-21].

1.3.2 Cell Culture Media

Fermentations are controlled by their supply of nutrients, irrespective of the culture method; hence a critical aspect of any fermentation is the medium.
Firstly, the medium has to supply all of the necessary nutrients and other materials to maintain all the different processes in the cell, which include the synthesis of new cells and cellular products and the consumption of substrates for energy metabolism. It also supplies vitamins and minerals to act as catalysts, and bulk inorganic ions which function as both catalytic and physiological factors [20]. Its secondary functions are to minimise adverse pH changes, minimise toxic by-product formation and maintain homeostasis. A medium therefore comprises a basal medium⁴ and other nutrient supplements such as insulin, cholesterol and lipids. A basal medium contains amino acids, minerals, sugars, inorganic salts, vitamins, organic acids and buffers. The basic composition of a basal medium allows a wide variety of supplements to be added to enhance growth and productivity. The requirements vary among cell lines and these differences have led to the development of an extensive collection of medium formulations [14, 17].

³ Growth inhibitors are substances that hinder growth by interfering with metabolism and the uptake of nutrients. In cell culture systems growth inhibition can be caused by a build-up of metabolites such as lactate, pyruvate, succinate, propionate, isobutyrate, and acetate.

Formulation analysis is a vital task in cell culture medium analysis, and pre-formulation analysis highlights compositional faults prior to starting the culture. Various spectroscopic techniques (NIR, MIR and Raman) have been applied for monitoring nutrients during fermentation to ensure on-going process quality [22-25].

1.3.2.1 Energy Sources

Glucose is a primary fuel for heterotrophs⁵ and D-glucose is the natural form used by animal cells [26]. Energy derived from glucose is stored in the high-energy phosphate bonds in ATP or other nucleotide triphosphates. It is also stored in energy-rich hydrogen atoms associated with the co-enzymes NADP and NAD.
Animal cells need a source of both carbohydrates and the amino acid glutamine to ensure the production of high energy metabolites (ATP and NADPH). Glucose is readily consumed by cells; it is, however, subject to glycolysis at high concentrations. It is therefore better to supplement the medium with glucose, thus avoiding the formation of the pyruvate by-product [20, 27, 28].

Glucose is metabolized by cells at a faster rate than other carbon sources (galactose and fructose). Glucose and galactose use the same transporter into the cell but glucose has a greater affinity and a higher uptake rate than galactose [29]. Fructose is another carbon source that can be used. Fructose and galactose both result in reduced formation of lactic acid, but also give a slower cell growth rate [20]. For Vero and MDCK cell lines, fructose is used as the carbohydrate source as it helps maintain the lactate/pyruvate ratio and a stable pH in high density cultures [30]. Galactose has been used as a carbon source for CHO TF 70R cells; it is a suitable substrate that gives an acceptable growth rate and minimizes the generation of toxic by-products [31].

⁴ Minimal Essential Medium (MEM) and other basal media supply the basic needs for cellular metabolism.

⁵ A heterotrophic organism utilizes organic compounds to obtain carbon that is essential for growth and development. Examples of such organisms are animals, which are not capable of manufacturing food from inorganic sources but must consume organic substrates for nutrition.

1.3.2.2 Amino Acids

Cells cannot synthesise all the essential amino acids and vitamins that they require. Therefore, these nutrients have to be provided by the cell culture medium. There are thirteen amino acids that are considered to be crucial for cultured cells: arginine, cysteine, glutamine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, tyrosine and valine [10, 12, 13].
L-Glutamine is not stable in solution and for this reason it is generally added as a separate component to the medium.⁶ The breakdown and metabolism of glutamine produces ammonia, which is toxic to the cells [32, 33].

⁶ Glutamine in solution undergoes cyclisation to form a toxin, pyroglutamate (5-oxoproline, 5-pyrrolidone-2-carboxylic acid). This reaction occurs at room temperature and is accelerated by heat.

1.3.2.3 Vitamins and Minerals

Cell culture medium contains different vitamins or precursors; most are water soluble but some are fat soluble. Examples include biotin, folic acid, niacinamide, pantothenic acid, pyridoxine, riboflavin, thiamine, vitamin B12, and ascorbic acid. All can be used to optimize cell growth and productivity, depending on the requirements of the culture. The B vitamins serve as functional group carriers for enzymes in various metabolic pathways for all cell types. Other vitamins regulate the cell cycle and redox potential of specific cell lines [27, 28].

Cells require sodium, potassium, calcium, magnesium, chlorides and phosphates for proliferation. Balanced Salt Solutions (BSS) of bulk ions supply the electrolytes needed for physiological roles. Ions are important for the maintenance of osmotic pressure, control of the membrane potential, and coordination of the transport channels in and out of the cells. Ions also participate in oxidation-reduction reactions, and are used in energy production (Krebs cycle) [27, 28]. Phenol red is added to the BSS as a visual indicator of pH. BSS may or may not be buffered with bicarbonate, depending on the culture setup [34].

1.3.2.4 Buffers

Buffers are added to cell culture media to maintain pH and avoid adverse changes. The most common buffering system used in mammalian cell cultures is the bicarbonate/carbon dioxide system as it mimics the buffering system of blood.
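
The buffering behaviour discussed in this section can be made concrete with the Henderson-Hasselbalch relation, pH = pKa + log10([base]/[acid]). The short Python sketch below is illustrative only and is not part of the thesis workflow: it uses the pKa values quoted in this section for 37 °C (6.3 for bicarbonate/carbon dioxide and 7.3 for HEPES), assumes a target pH of 7.4, and treats each buffer as a simple conjugate acid/base pair, ignoring CO2 gas exchange and ionic strength effects. The function names are hypothetical.

```python
# Illustrative sketch (assumptions: ideal acid/base pair, no CO2 gas exchange,
# target pH of 7.4 chosen as a typical physiological value).

def base_to_acid_ratio(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch rearranged: [base]/[acid] = 10**(pH - pKa)."""
    return 10 ** (ph - pka)


def fraction_as_base(ph: float, pka: float) -> float:
    """Fraction of the buffer present in its conjugate-base form."""
    ratio = base_to_acid_ratio(ph, pka)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    target_ph = 7.4  # assumed culture pH, not taken from the text
    # pKa values at 37 degrees C as quoted in this section
    buffers = {"bicarbonate/CO2": 6.3, "HEPES": 7.3}
    for name, pka in buffers.items():
        ratio = base_to_acid_ratio(target_ph, pka)
        frac = fraction_as_base(target_ph, pka)
        print(f"{name}: [base]/[acid] = {ratio:.2f}, base fraction = {frac:.2f}")
```

A buffer resists pH change best when the acid and base forms are present in comparable amounts, that is, when the pH is close to the pKa. Under these simplifying assumptions the HEPES pair is close to balanced at pH 7.4, whereas the bicarbonate pair is mostly in its base form, which is consistent with the use of HEPES as a supplementary buffer in the physiological range.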
The bicarbonate/carbon dioxide buffer has a pKa of 6.3 at 37 °C and requires the use of a closed culture system as a result of the gaseous nature of carbon dioxide [30, 35]. HEPES (N-2-hydroxyethylpiperazine-N’-2-ethanesulfonic acid) is a zwitterionic buffer with a pKa of 7.3 at 37 °C that is sometimes added to cell culture media for more effective buffering in the physiological pH range. HEPES is used in conjunction with sodium bicarbonate, as bicarbonate also provides some nutritional value. HEPES is added in a concentration range of approximately 10 mM to 25 mM to maintain pH stability [36-38].

1.3.2.5 Serum in Cell Culture Media

Serum is an undefined biological fluid which is often added to basal medium as a growth supplement. Serum contains a variety of factors⁷ that are required for cell proliferation and expression. It also protects cells against stress induced by shear forces within the bioreactor [39]. However, it can also contain adventitious agents such as bacterial endotoxins or immunogenic contaminants [40, 41]. Serum is an expensive element of cell culture, as the cost of screening is increasing. Commercial sera are obtained from a number of different animal sources, such as fetal, bovine, and equine [21, 33, 42]. The choice of serum is based on purity, cost, availability and ease of storage. Vendors test sera for their ability to support the growth of cell lines and also for their purity, based on the bioburden as well as endotoxin and haemoglobin levels [42, 43].

⁷ Carrier protein; attachment regulators; defence molecules; growth factors; hormones; enzymes and their regulators.

1.3.3 Media Advances

1.3.3.1 Serum Free Media

Serum free media arose from the desire to minimise lot to lot variation, cost, foreign antigens, unplanned viral contaminants and interference at the purification stage. The
development of serum free media has advanced existing culture methodologies to facilitate bioprocesses without serum proteins and endogenous serum substances such as hormones or natural antibodies [42]. A growing number of alternatives to animal sera exist for cell lines and primary cultures [21, 33, 40, 42].

Hydrolysates are enzymatic or acid digests of biological materials such as animal tissues (meat digest), milk products (casein), microorganisms (yeast) and vegetables (soy, wheat gluten, rice). Hydrolysates are relatively low-cost medium additives used to provide nutrients and growth factors to cell cultures in order to partly or fully replace serum. They are poorly defined complex mixtures of peptides, free amino acids, lipids, polysaccharides, phenolics, vitamins, nucleic acids, and minerals. Their relatively low cost makes them well suited to large scale production. Some hydrolysates show anti-apoptotic activity, which can extend the fermentation lifetime [44-46]. For a better defined production of therapeutic proteins, there is a move towards animal free hydrolysates originating from yeast and vegetables (soy, rice, wheat gluten, rapeseed and chickpeas) [47-49]. Yeast hydrolysate⁸ or yeastolate is a medium supplement that is cost-effective, non-animal derived, and has been shown to have a significant positive effect on cell growth. Yeastolate is a complex mixture known to contain free amino acids, peptides, vitamins, minerals, and carbohydrates, but it also contains a significant amount of unknown material [43, 50, 51].

1.3.3.2 Chemically Defined Media

The development of chemically defined supplements for media is a gradual process. Every cell type and fermentation process has its own specific requirements for growth. Jayme’s review provides a good overview of the development of chemically defined media [42]. The road to chemically defined cultivations started with Eagle’s basal medium.
This is an isotonic, pH balanced mixture of salts, amino acids, vitamins and other essential nutrients [52]. Eagle’s is a simple medium that is fortified with additional supplements such as serum to support a wide range of mammalian cells. The compositional information for various widely available media is listed in Table 42 to Table 44.

⁸ Yeastolate is produced by culturing yeast to a certain volume. Once this volume is reached, the process is stopped with heat shock. The cells are digested to produce unrefined hydrolysates. The hydrolysate is then filtered, concentrated, ultra-filtered, and spray dried.

The development of serum free and chemically defined media occurred through a series of steps starting with serum based medium. Eagle’s medium was altered by increasing the amino acid content to form Eagle’s Minimal Essential Medium (MEM). This modified version still required serum for cell growth [33]. MEM was further adjusted by Dulbecco to form DMEM, which contained a fourfold increase in nutrient concentration [53]. Another media series, Ham’s nutrient mixtures F12 and F10, was shown to support growth and maintenance of different cell types [54]. The merger of DMEM and F12 by Sato [55] gave a fortified basal medium. This amalgamated medium still needed to be supplemented with serum to support bioreactor production. Testing showed that under serum-free conditions, transferrin, insulin, ethanolamine, linoleic acid, ascorbic acid, hydrocortisone and certain trace element compounds stimulated hybridoma growth. The supplementation of DMEM/F12 with insulin, transferrin, selenium and ethanolamine provided the additional nutrients required to facilitate serum free cultivations [33, 56]. The RPMI media series uses an increased level of nutrients while maintaining a constant salt content. The combination of RPMI with supplemented DMEM/F12 in a 1:1 ratio produced a formulation (RDF) that performed better than DMEM/F12 alone [57-59].
Further enhancement of basal RDF medium with a three-fold increase in the level of amino acids and glucose gave the enriched RDF medium (eRDF) [57-59]. eRDF is a chemically defined basal medium used in the production of therapeutic proteins, and each formulation is proprietary to an individual manufacturer. eRDF comprises over 50 compounds including inorganic salts, amino acids, vitamins, HEPES buffer, glucose, and various others.

1.4 Analysis of Cell Culture Media

In large scale manufacturing, most operational parameters are set. Medium formulations change with manufacturer, cell line and product type. Extensive use testing⁹ is required to select the high performing lots that support cell culture performance in large scale production [60]. The formulation of medium involves selecting and blending various components, resulting in a complex medium matrix. For effective and reproducible culturing, the correct medium formulation and blending is essential. Changes to the medium can affect growth rate, product yield, and quality [17]. Therefore, there is an ongoing need for new or improved analytical methods to ensure reproducible medium formulation. Comprehensive and detailed analysis of cell culture medium composition and variance can help control and understand the very complex cell culture based manufacturing process.

In-depth analytical methods for cell culture medium can be time consuming, multistep, challenging and expensive. Medium samples are centrifuged or filtered to remove particulates, then diluted and derivatized (if necessary) before testing. Detailed method development is required to address all analytes present, and also to determine analytes of interest, analyte concentration and overall analysis time. The analysis method is dependent on the medium and should be determined individually for each medium to ensure accurate measurement [61, 62].
Exact quantification of nutrients in cell culture medium is desirable, if not necessary, in order to meet cell line requirements. It is advantageous to characterise the medium ingredients (including inorganic ions, carbohydrates, alcohols, and aliphatic carboxylic and amino acids) because the presence or absence of specific components may impact the yield of the desired products. Most of these nutrients and metabolites are ionic or polar in nature, and do not have the chromophores necessary for analysis by absorption measurements. Ion chromatography (IC) with electrochemical detection (ED) is a suitable technique for the determination of these components. Carbohydrate analysis uses liquid chromatography with refractive index detection [63]. The common method for amino acid detection involves liquid chromatography (reverse phase or cation exchange) with detection of derivatives [62-64]. Pre- or post-column derivatization based methods are limited to a specific range of amino acids. High operating costs and the inability to detect multiple components, such as certain amino acids and carbohydrates, render LC methods unattractive [62, 65]. In order to advance the chromatographic measurements beyond derivative detection and toward multiple component analysis, anion chromatography with integrated pulsed amperometric detection was developed [65].

⁹ When testing a new basal medium, a scaled down production process is used. In the test, the new material and the reference material are cultured side by side. The results compare cell growth, product yield, nutrient usage and by-product formation for the reference versus new material.

Hanko et al.
[61, 62] have developed a method using anion exchange chromatography – integrated pulsed amperometric detection (AE–IPAD) technology that allows for simultaneous detection of amino acids and carbohydrates in four media formulations (YPD broth, LB broth, MEM and serum free-protein free hybridoma medium).

Identifying lot to lot variability in raw material is another aspect of media analysis. Variability can arise from several sources: the producer, the raw material used, and also the aging of the material. Variability in chemical composition and culture performance may depend on the extraction material (i.e. whether extraction is from yeast, malt or protein digests). Medium supplements such as hydrolysates are heterogeneous in terms of molecular size and chemical diversity. The exact concentration varies between manufacturers and it is impossible to identify and quantify every individual component. Yeastolate is inexpensive, and lot to lot variation is known to occur. When six different lots were tested, the free amino acid content varied from 45% to 78%, resulting in different biomass levels and growth rates [66].

Various methods are used to analyse the different components in bioprocess studies. Commonly, glucose, lactate, glutamine, and glutamate are measured using enzyme-based biosensors. The biosensors are amperometric electrodes that have immobilized enzymes in their membranes and work by converting glucose or other substrates to hydrogen peroxide, which is oxidized to produce an amperometric signal proportional to substrate concentration [67]. For example, Sun et al. used six different tests to quantify nutrients, metabolites, product and by-product formation, because a single method was not able to quantify all of the different components [68]. These included a BioProfile analyser for glucose, lactose and ammonium; two different enzyme based assays for galactose and ATP; and three different variations of HPLC for quantifying amino acids, vitamins and antibody formation.
Most analytical approaches used for cell culture medium analysis are multistep and time consuming in nature. As a result, there is a need for a rapid, sensitive and inexpensive technique that is capable of monitoring multiple components in a single measurement. Spectroscopic methods seem to be ideal candidates and in this work, three spectroscopic methods (Raman, SERS and Fluorescence) were used to study cell culture medium for formulation analysis.

1.5 Process Analytical Technology (PAT) Principles

The Food and Drug Administration’s (FDA) guidelines recommend that Process Analytical Technology (PAT) should be adopted as a regulatory framework that will encourage growth and innovation in pharmaceutical development, manufacturing, and quality assurance [69]. The FDA considers PAT to be “a system for designing, analysing, and controlling manufacturing through timely measurements (i.e. in process) of critical quality and performance attributes of raw and in-process materials and processes”, with the objective of guaranteeing the final product quality. Under the PAT approach, the intention is to design and develop well understood and efficient processes that will consistently ensure a predefined quality at the end of manufacturing. The main challenge for process control is that the dynamic conditions are difficult to predict or simulate. This makes robustness, together with high precision measurements, vital to the overall process control. No specific technologies are mentioned in the FDA guidance document, thus allowing for various methods to be implemented. Spectroscopic methods offer fast, non-destructive analysis with minimal sample preparation. Furthermore, they can be adapted for process monitoring and have in-situ capabilities. For example, Raman spectroscopy has been shown to be effective as a PAT tool for in-line and real time monitoring of processes such as freeze drying, blending, active pharmaceutical ingredient (API) monitoring and endpoint analysis [70-72].
Chemometric methods for multivariate data analysis were applied to extract the relevant information from complex datasets, resulting in more robust models with lower prediction error than manual data evaluation [73].

1.6 Spectroscopic Methods Suitable for Cell Culture Media Analysis

Spectroscopic methods (such as NIR, Raman and Fluorescence) offer many advantages: they are fast, easy to use, suitable for automation, require little sample preparation and have lower setup costs. The development of spectroscopic methods for media analysis is based on the non-destructive, rapid, reliable, and robust nature of the measurements [74-81]. In the Nanoscale Biophotonics Laboratory (NBL), the primary focus is developing rapid spectroscopic methods (Raman, surface enhanced Raman spectroscopy (SERS) and fluorescence) for the analysis of cell culture media and its components, and targeting their implementation in the biopharmaceutical industry [3, 6, 82]. The principles and applications of the three spectroscopic methods of interest will be discussed in detail in the following sections.

1.6.1 Raman Spectroscopy

Raman spectroscopy is an optical method that makes use of inelastically scattered light to measure molecular vibrations [83]. When a sample is irradiated using a monochromatic light source (e.g. a laser), some light is scattered. When the scattered light is studied spectroscopically, the majority of this light has the same frequency as the incident light while a very small fraction is observed at different frequencies. The scattered light with the same frequency as the incident light is known as Rayleigh or elastic scattering, while the scattered light at a different frequency is Raman (inelastic) scattering. Raman shifts¹⁰ are independent of the exciting frequency and are characteristic of the species giving rise to the scattering.
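Because a Raman shift is a frequency difference rather than an absolute frequency, the same band appears at different absolute wavelengths for different lasers. The short sketch below illustrates this under stated assumptions: the excitation wavelengths (532 nm and 785 nm), the 1000 cm⁻¹ band and the function name `scattered_wavelength_nm` are all hypothetical choices made for illustration and are not taken from this work.

```python
def scattered_wavelength_nm(excitation_nm: float, shift_cm1: float, stokes: bool = True) -> float:
    """Wavelength (nm) of Raman-scattered light for a given Raman shift (cm^-1).

    stokes=True: the photon loses energy to the molecule (longer wavelength).
    stokes=False: the photon gains energy (anti-Stokes, shorter wavelength).
    """
    excitation_cm1 = 1e7 / excitation_nm      # convert nm to absolute wavenumber
    sign = -1.0 if stokes else 1.0
    return 1e7 / (excitation_cm1 + sign * shift_cm1)


SHIFT_CM1 = 1000.0  # hypothetical vibrational band

for laser_nm in (532.0, 785.0):  # hypothetical excitation wavelengths
    stokes_nm = scattered_wavelength_nm(laser_nm, SHIFT_CM1, stokes=True)
    anti_nm = scattered_wavelength_nm(laser_nm, SHIFT_CM1, stokes=False)
    print(f"{laser_nm:.0f} nm laser: Stokes {stokes_nm:.1f} nm, anti-Stokes {anti_nm:.1f} nm")
```

The shift in wavenumbers is identical for both lasers, while the absolute scattered wavelengths differ, which is why Raman spectra are reported on a shift axis.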
There are two types of Raman transitions: photons may lose some of their energy (Stokes radiation) or photons may gain some energy (anti-Stokes radiation) [83-85]. The intensity ratio of the Rayleigh line is about 10⁻³ with respect to the incident excitation, while the Raman lines are at most 10⁻⁶. Rayleigh scattering can be 10⁴ to 10⁶ times stronger than Raman scattering [86, 87].

¹⁰ Difference between the incident and scattered beam frequencies.

When a molecule enters an electric field of strength E, a dipole moment P is induced in the molecule. The magnitude of the induced dipole moment is P = αE, where α is the polarizability of the molecule. If the molecule encounters electromagnetic radiation of frequency ν₀, it experiences a varying electric field E. This in turn induces a varying electric dipole moment, which causes an emission of light identical in frequency to the incident radiation. This is elastic or Rayleigh scattering. If there is a change in the polarizability of a bond during a rotation or vibration through interaction with electromagnetic radiation, then the vibrational mode is Raman active and the emitted light differs in frequency from the incident radiation [87-89].

Figure 3: (Left) A schematic illustrating the scattering of incident light as it interacts with a molecule, giving off Rayleigh and Raman scatter and (Right) energy level diagram depicting scattering processes of Rayleigh and Raman. E0 and E1 are the ground and first excited electronic energy levels, respectively. Reproduced with permission from [90, 91].

The quantum theory for the scattering process treats monochromatic light of frequency ν₀ as a stream of photons having energy hν₀, where h is Planck’s constant. With Rayleigh scattering, the incident photons interact with a molecule and are scattered without a change in frequency (elastic scattering).
However, in the Raman effect, the photon interacts with the vibrational energy levels of the molecule and the scattered radiation has a different frequency, shifted by νᵥ through either loss or gain of energy from the incident light (inelastic scattering). A molecule undergoing a vibrational transition from the ground vibrational energy level (v = 0) to the first excited vibrational energy level (v′ = 1) will have a corresponding frequency of νᵥ, and the scattered photon will be diminished in energy by the amount hνᵥ. The energy of the scattered photon will be h(ν₀ − νᵥ). This is known as Stokes scattering. In contrast, if the molecule is already in an excited vibrational state when the photon interacts with it, the transition v′ → v″ may be induced and the photon will be scattered with an enhanced energy, producing anti-Stokes Raman lines. At room temperature, in accordance with the Boltzmann distribution, the population of molecules in the ground vibrational state is always much greater than that in the excited vibrational states. As a result, the intensities of anti-Stokes lines will always be much weaker than those of the Stokes lines [91, 92].

For a molecule which possesses a centre of symmetry, such as CO2, there is a useful rule: the rule of mutual exclusion. This states that for molecules with a centre of symmetry, fundamental transitions which are active in infrared (IR) absorption spectroscopy are forbidden in Raman and vice versa. Together, Raman and IR absorption spectroscopy provide a complete picture of the different vibrational frequencies in a molecule. Groups which lack strong features in Raman may exhibit intense bands in IR and vice versa. In molecules with symmetry elements other than a centre of symmetry, certain bonds may be Raman active, IR active, both or neither. For complex molecules with no symmetry, all normal modes are allowed in both IR and Raman. The methods are complementary to one another.
[87-89]

1.6.1.1 Bioprocess Monitoring using Raman spectroscopy

Raman spectroscopy provides a non-destructive method for gathering both macroscopic and microscopic information about biological molecules in cells, tissues, media and plants [93]. The Raman fingerprint region is very sensitive to changes in chemical composition, bonding and conformation. Raman offers several advantages over other spectroscopic methods: detailed chemical/structural information content, a relatively weak water signal (sample dependent) and minimal sample preparation [94]. However, in many biological samples, a strong fluorescence signal can obscure the Raman scattered light completely. The best excitation wavelength region for biological studies is between 780 and 1064 nm. However, highly sensitive CCD cameras (that deliver spectra with a good signal to noise ratio in a short time period) only work well within a 780-850 nm excitation range. This wavelength range also reduces fluorescence to an acceptable level [93].

As water is the principal constituent of cell culture media, its impact is important. Comparative studies show that the difference between the spectra of solid amino acids and those of their aqueous solutions is significant (Figure 4). The spectra of solid amino acids are complex and detailed compared to those of their aqueous solutions. The aqueous solutions are low concentration samples, and as the signal intensity is proportional to concentration, this leads to a weaker signal and loss of spectral detail [95, 96].

Figure 4 Raman spectra of solid and aqueous solutions of Phe (0.3 g/L), Trp (0.11 g/L) and Tyr (0.004 g/L). Reproduced with permission from [95].

Even though water should have a weak Raman signal, it can be a significant issue when looking at very low concentrations in aqueous solution.
The strong impact of the water signal on the amino acid signal is significant, since the samples prepared as part of this study are low concentration aqueous solutions. The Raman method may not be able to detect single analytes within an aqueous solution, but this work tests its ability to detect the gross signal changes for a whole component (D-glucose, eRDF and yeastolate).

Raman analysis can add supplementary information to current proteomic diagnostic methods (chromatographic and mass spectrometry analysis). Raman spectra can give insight into changes in protein and amino acid interactions, as information on the microenvironment of aromatic amino acids can be reflected by intensity variations. If binding or exposure to environmental changes occurs in the presence of aromatic amino acids, the spectrum will highlight the change. For example, the Raman spectra of insulin variants demonstrate that the method is capable of providing chemical information to distinguish proteins of similar structure in biomedical testing (see Figure 5). Using high quality Raman spectra of low concentration protein solutions along with multivariate analysis techniques, small spectral differences associated with insulin variants, as well as subtle differences among individual proteins, peptides and mixtures, were identified [94].

Figure 5 Average Raman spectra of (a) human, (b) bovine, and (c) porcine insulin on the left and difference spectra between human and porcine insulin on the right. Reproduced with permission from [94].

There are several applications of Raman spectroscopy in bioprocess monitoring. The value of Raman spectroscopy to bioprocess monitoring is that it is rapid, non-invasive, adaptable to on-line measurements, and easy to operate and maintain. In addition, bench top instruments are available [3]. Raman analysis has been utilised in a variety of applications e.g.
from carotenoid production, where a single compound was monitored [97], to observing the biotransformation of glucose into ethanol in yeast fermentation [98]. It has also been used to simultaneously measure the changing concentrations of glucose (30–80 g/L) and lactic acid during lactic acid fermentation by L. casei, with error values of 2.5 g/L for glucose and 0.74 g/L for lactic acid [24]. For more complex industrial bioprocesses such as the production of gibberellic acid (GA3), the Raman results showed that it is possible to quantify the GA3 product from the spectral data of unprocessed samples [99].

The flexibility of the Raman instrumentation increased with the use of a fiber optic probe as the delivery and collection system. In-situ Raman spectra were measured for an E. coli fermentation producing phenylalanine, for simultaneous estimation of glucose, acetate, formate, lactate, and phenylalanine [76]. The substrates were modelled based on the Raman spectra and the HPLC reference method data. The error levels for the Raman models for a production run were glucose (4.16%), acetate (4.67%), formate (5.5%), lactate (5.39%), and phenylalanine (not detected). The Raman estimates for glucose consistently underestimated the reference method; the estimates for acetate, formate and lactate showed qualitative agreement with error, while phenylalanine was not detected by the Raman model. The results showed potential despite the errors introduced by the physical environment of the bioreactor [76].

The application of Raman analysis to fermentations also has potential for tracking culture parameters. In-line Raman monitoring of a mammalian cell culture bioreactor was applied for prediction of various media components (glutamine, glutamate, glucose, lactate, and ammonium) and compared to the standard reference measurements using a BioProfile 400 Analyzer [100].
Table 1 and Figure 6 show the model predictions and accuracy based on the Raman spectra throughout the culture. The predictions follow the overall expected trend. The Raman models accurately predict decreases in nutrient levels (glutamine, glutamate, glucose) and increases in metabolite levels (lactate and ammonium). The error levels for glutamine are similar for the two methods: both the Raman and the reference method show low accuracy because of the low concentration of glutamine. The error levels for glutamate and lactate compare better, as the models accurately predict the behaviour of these analytes. The models for glucose and ammonium did follow the process; however, their error levels are poor, reducing their model accuracy. The performances demonstrate that the Raman method is comparable to the reference method, and therefore Raman spectroscopy provides an attractive approach for monitoring mammalian cell culture processes.

Figure 6 Comparison of measured nutrient and metabolite concentrations (solid diamonds) and the predictions (solid lines) from the modelling of Raman data for (a) glutamine, (b) glutamate, (c) glucose, (d) lactate, and (e) ammonium. Dashed lines indicate the standard deviation measured for the reference method. Reproduced with permission from [100].

Table 1 Results for predictions of nutrient and metabolite concentrations using in-line Raman measurements and standard reference measurements from the BioProfile 400 Analyzer. Reproduced with permission from [100].

Media Component    Calibration Range    Raman % error    Reference % error
Glutamine (mM)     0.66–4.26            30.3             22.0
Glutamate (mM)     2.21–5.72            12.0             17.0
Glucose (g/L)      2.07–6.22            15.3             4.0
Lactate (g/L)      0.23–5.21            12.9             10.0
Ammonium (mM)      2.01–8.51            11.4             4.0

Previous work on cell culture media analysis by Li et al. showed rapid identification, characterisation and quality assessment of media components used in industrial cell culturing [1-8, 101].
Raman was used to identify the different media types and as a sample quality testing method. Chemometric analysis (PCA and SIMCA¹¹) was used for sample evaluation. Five different chemically defined (CD) commercial media components (Figure 7) were investigated. Each of these components was used in Chinese Hamster Ovary (CHO) based manufacturing of recombinant proteins. The Raman spectra differed sufficiently to identify the different media types, and outlier analysis allowed for identification of suspect samples. The “normal” samples were selected for the routine identification and quality evaluation of the different media components. Five distinct classes were obtained through SIMCA classification (Figure 7b), where each medium type was grouped according to its spectral differences [3, 102]. This study clearly showed that the identification and classification of incoming materials was possible using the Raman method.

¹¹ SIMCA (Soft independent modelling of class analogy) is a classification method that outperforms PCA which is based on total variance. In order to build a reliable model, it uses a series of PCA models where samples are identified as class members by describing relevant spectral variance. The classification is based on significance tests where the distance from the model center (leverage) and the distance to the model space (residuals) are examined.

Figure 7 Spectra of five different chemically defined (CD) commercial media and the SIMCA classification of the 336 sample measurements using the pre-processed Raman spectra of CD–A1, CD–A2, CD–S1, CD–S2, and eRDF samples. Reproduced with permission from [3].

When Raman was applied to the analysis of bioprocess samples (sample components may include cells, fresh media, spent media and product proteins), these samples gave a strong water signal and weak signals for the media components. Li et al.
investigated the correlation of the Raman spectra with the glycoprotein yield from 9 different time points. The generated models used the full region (400–1053 cm⁻¹) and two variable selection methods (CoAdReS and ACO). The full range model gave a poor performance, while variable selection greatly improved the prediction ability. The CoAdReS and ACO models were equally matched, but the ACO method had a very long run time. This was not acceptable when the goal was the development of a rapid analysis method. However, using Raman spectra with CoAdReS variable selection generated an accurate prediction of the glycoprotein yield in a timelier manner. This opened up the possibility of developing the Raman method for bioprocess evaluation of product yield during the process in order to ensure consistent yields and prevent losses [6].

1.6.2 Surface Enhanced Raman Spectroscopy (SERS)

An enhanced Raman spectrum, like that in Figure 8, was observed for pyridine adsorbed on electrochemically roughened silver. The initial conclusion by Fleischmann et al. was that the roughened electrode surface area caused a local increase in pyridine concentration, leading to a stronger signal [103]. This was disproved by D. L. Jeanmaire and R. P. Van Duyne, who showed that the signal increase was caused by a dramatic increase (an estimated 10⁵ fold enhancement) in the Raman scattering cross section [92, 104, 105].

Figure 8 (a) SERS of the pyridine at silver films and (b) Raman spectrum of the aqueous solution of 0.01 M pyridine in 0.1 M KCl. Reproduced with permission from [106].

SERS is a form of Raman spectroscopy involving the interaction of molecules with nanostructured colloids or nanostructured metal surfaces, generally silver or gold. The adsorption of molecules at or near nano-roughened metal surfaces can enhance the Raman scattering efficiency by a factor of 10³ to 10⁶ compared to normal Raman scattering.
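To give a feel for what an enhancement factor in this range means, the sketch below uses one commonly used definition, the signal per contributing molecule in the SERS measurement divided by the signal per molecule in the normal Raman measurement. This is an illustrative assumption: the text does not state how the cited factors were determined, and the function name and all intensities and molecule counts below are invented purely for the example.

```python
import math


def enhancement_factor(i_sers: float, n_sers: float, i_raman: float, n_raman: float) -> float:
    """EF = (I_SERS / N_SERS) / (I_Raman / N_Raman).

    i_*: band intensity (arbitrary but identical units for both measurements)
    n_*: number of molecules contributing to each signal
    """
    return (i_sers / n_sers) / (i_raman / n_raman)


# Hypothetical values, chosen only to illustrate the arithmetic.
ef = enhancement_factor(i_sers=5.0e4, n_sers=1.0e6, i_raman=1.0e3, n_raman=1.0e10)
print(f"EF = {ef:.1e} (10^{math.log10(ef):.1f})")
```

In practice the difficult part of such an estimate is usually the number of adsorbed molecules contributing to the SERS signal, so reported factors should be read as order-of-magnitude values.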
The signal enhancement of SERS combines the structural information of vibrational spectroscopy with extreme sensitivity. This sensitivity allows for detection of a species at very low concentrations, even down to single molecule levels. The SERS effect also quenches the fluorescence background signal from adsorbed species (Figure 9) [107-111].

Figure 9 (A) Raman spectrum of 0.05 g/L Acebutolol solution displaying fluorescence interference and (B) surface enhanced Raman spectrum of 0.05 g/L Acebutolol showing reduced background, adapted from and reproduced with permission from [112].

1.6.2.1 Mechanisms for SERS

SERS enhancement is the result of two different processes: electromagnetic and chemical enhancement. Electromagnetic enhancement is the dominant phenomenon and occurs when nanostructured metal particles or roughened metal surfaces are exposed to light of an appropriate wavelength. Chemical enhancement is the result of an interaction between the adsorbate molecule and the metal, usually involving electronic effects such as charge transfer. A SERS enhancement factor of 10⁶ can generally be broken down into an electromagnetic enhancement factor of 10⁴ and a chemical enhancement factor of 10² [87]. Electromagnetic field enhancement is a long range (0 to ~30 nm) effect, whereas chemical enhancement is very short range, confined to molecules that have direct contact or monolayer coverage with the metal surface [92, 105, 113]. In chemical adsorption, changes in the adsorbate are not evident unless the adsorbate is conjugated with the chemical bonds directly. Chemical adsorption is common in functional groups such as S–H and N–H [114].

SERS signal strength is also dependent on the orientation of the adsorbed species. In the case of pefloxacin (Figure 10), at high concentrations a more perpendicular orientation of the adsorbed species occurs, whereas a flatter alignment is observed with a lower concentration.
At 10⁻⁶ M, the intensity of the 229 cm⁻¹ band increases, corresponding to a carboxylate group interacting with the Ag surface, in accordance with the adsorbed molecules lying flat on the surface. At 10⁻⁴ M, the intensity of the 210 cm⁻¹ and 1656 cm⁻¹ bands increases, reflecting a more tilted orientation that results from local steric hindrance, with increased surface coverage and/or repulsive electrostatic forces between the adsorbed species [115].

Figure 10 (a) SERS spectra of pefloxacin at three concentrations (10⁻⁴, 10⁻⁵, and 10⁻⁶ M) and a representation of possible orientations for pefloxacin adsorbed on silver colloid; (b) for pefloxacin concentration of 10⁻⁶ M; (c) 10⁻⁵ M; (d) 10⁻⁴ M. Reproduced with permission from [115].

1.6.2.1.1 Electromagnetic Enhancement

The electromagnetic enhancement mechanism relates to the amplitude of the electric field of the light and results from the nanostructured, roughened metal surface and the interaction of the adsorbed molecules with surface plasmons [107]. The intensity of an electromagnetic field is dependent on the number of excited electrons and on the volume of nanostructures. When a beam of incident light interacts with a nano-roughened metal surface, free electron-like behaviour is exhibited. When photons interact with these electrons they begin to oscillate as a collective group across the surface; these oscillations are termed surface plasmons. Surface plasmons have a resonance frequency at which they absorb and scatter light most efficiently. The frequency depends on the metal and surface morphology¹² [92, 116]. The electromagnetic field of the surface plasmon is stronger than the incident light field and therefore increases the intensity of the Raman scattered light [92]. The excitation of the surface plasmon greatly increases the local field experienced by the molecule (Figure 11).

¹² For scattering there needs to be an oscillation perpendicular to the surface, which requires a roughened surface.
The enhanced field depends on the optical conductivity of the metal, while optical conductivity depends on the wavelength used and the size and shape of the particle [87, 92, 103, 105].

Figure 11 Illustration of the excitation of the localized surface plasmon resonance of a spherical nanoparticle by incident electromagnetic field. Reproduced with permission from [117].

The plasmon resonance condition of individual particles is limited to a small distance range. Signal enhancement is greater at the point between touching particles or in clusters of particles compared to isolated particles. When metal nanoparticles are in contact (Figure 12), the contact points show very active electric fields which lead to an enhanced Raman signal. The particle size, shape and the arrangement into clusters all contribute to the SERS enhancement [92, 118].

Figure 12 Schematic illustration of the electromagnetic field generated between adjacent nanospheres upon incident irradiation. The opposing nanosphere sides have opposite polarization charges, leading to the highly dipolar environment. Reproduced with permission from [119].

1.6.2.1.2 Chemical Enhancement

Adsorption of the analyte onto the nanostructured metal surface leads to changes in the molecular orbitals and electron distributions across both the analyte and the metal surface. This increases the polarizability of the analyte molecule [87, 92, 120]. It is believed that chemical enhancement is related to the new electron state belonging to the bond formed between the analyte and the metal surface; the new electron states are resonant intermediates [92]. When an incident photon excites an electron from the metal surface into an adsorbed molecule, it creates a negatively charged excited molecule. The molecular geometry of this excited molecule differs from that of the neutral species. This allows for charge transfer to occur from the metal to the analyte.
The signal enhancement will take place when the excited electron of the charge transfer becomes resonant with the incident light [105]. The incident photon is absorbed by the metal nanoparticles, and the associated charge transfer induces a nuclear relaxation within the excited molecule. This results in the return of the electron to the metal surface, the creation of a neutral molecule, and the emission of a Raman scattered photon [92, 107].

1.6.2.2 Substrates

SERS spectra are obtained after molecules interact with or are adsorbed onto certain nanostructured metal surfaces. Many different types of surfaces can be used for SERS; examples include aggregated colloid suspensions [121-123], roughened electrodes [121, 124, 125], metal films (such as silver island films) [126-128], and silver coated beads [129-131]. Silver, gold and copper are the most commonly used substrate materials, with silver being the most widely used. The choice of SERS substrate is based on the wavelength of the surface plasmon band (e.g. λmax in the UV or visible spectrum). This is a function of the material used and its size. Both silver and gold surface plasmons oscillate at frequencies in the visible region, making them suitable for use with the visible and NIR excitation wavelengths typically used in Raman spectroscopy. Silver has a broad excitation range from the UV to IR, while gold is limited to the red and IR ranges because of band transitions. Silver is less reliant on the excitation wavelength compared to other SERS active metals like gold and copper due to its favourable dielectric function [92, 132]. Silver colloids are used in applied techniques (i.e. silver island films) because silver is a more efficient optical material, giving a SERS signal 10–100 fold higher than gold. Gold colloids are, however, used in studies of living organisms because of their chemical stability, better control of size and shape and higher biocompatibility [118, 133].
1.6.2.2.1 Colloids

A metal colloid is a suspension of metal nanoparticles in a solvent. Silver or gold colloidal suspensions may be formed by chemical reduction of metal salts. Silver colloids can be prepared by the sodium borohydride or citrate reduction methods [134, 135]. Colloids that differ in particle size, shape and dielectric constant exhibit differences in plasmon resonance and wavelength dependence. The nanoparticle sizes range from 10 to ~200 nm and depend on the method of preparation [136, 137]. UV-visible absorbance measurements indicate the size of the particles present in the colloid. Larger particles produce broader peaks at longer wavelengths. For silver nanoparticles, an absorbance maximum ranging from 395–405 nm indicates a particle size of 10–14 nm in diameter, absorbance around 420 nm indicates a size ranging from 35–59 nm, and absorbance around 438 nm points to a size of 60–80 nm [138]. When the particles in a colloid are too small to exhibit large field enhancement, an aggregating agent can be added to produce clusters of particles. The most commonly used aggregating agent is sodium chloride (NaCl). Nanoparticles are kept in suspension by repulsive electrostatic forces between the particles. The addition of NaCl screens these charges, allowing the particles to clump together and form aggregates. The resulting silver particle clusters provide a much higher surface-enhancement signal [87, 138]. Metal colloids may either be added directly to the analyte solution or immobilised on a mounting substrate before exposure to the analyte.

1.6.2.3 Sensitivity of SERS

In comparison to normal Raman spectroscopy, SERS can deliver enhancements of 10⁶ or greater. The sensitivity of the SERS signal makes it possible to observe a spectrum from a single molecule. Etchegoin et al. showed the possibility of chemically identifying Rhodamine 6G in solution down to a concentration of 10⁻¹⁸ M.
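As a rough sense of scale for such a concentration, the number of analyte molecules in a given volume follows from N = c · V · N_A. The sketch below is illustrative only: the function name and the 1 µL volume are assumptions chosen for the arithmetic and are not values taken from the cited study.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_in_volume(conc_molar, volume_litres):
    """Expected number of molecules for a molar concentration and a volume."""
    return conc_molar * volume_litres * AVOGADRO


# 10^-18 M, as quoted in the text
per_litre = molecules_in_volume(1e-18, 1.0)        # about 6.0e5 molecules
# Hypothetical 1 microlitre aliquot (assumed value, for illustration only)
per_microlitre = molecules_in_volume(1e-18, 1e-6)  # about 0.6 molecules

print(f"molecules per litre:      {per_litre:.3g}")
print(f"molecules per microlitre: {per_microlitre:.3g}")
```

On average, a microlitre aliquot at this concentration would contain fewer than one molecule, which is consistent with describing detection at this level as single molecule detection.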
At the single molecule level, detection suffers from fluctuations and is not easily reproducible. Reproducible spectra can be obtained from samples containing 50–100 analyte molecules [139, 140]. SERS is sensitive to structural differences between equal mass isomers. In a study by Dressler et al., the use of SERS for characterising the different geometric orientations of three pyridine compounds – para, meta, and ortho-pyridine carboxylic acid – was investigated. The SERS spectra were compared to the Raman spectra of the crystalline form of each isomer (Figure 13). The results showed different profiles for each isomer, since each interaction with the silver colloid varied depending on the isomer present [141-143].

Figure 13 Bulk-Raman (black) and SERS (grey) spectra for (left) para-pyridinecarboxylic acid and (right) meta-pyridinecarboxylic acid. Reproduced with permission from [141].

The profiles of amino acids in solution were studied using SERS spectroscopy [144-146]. Nineteen different amino acids and their adsorption route onto the metal surface were examined. Identification of amino acids was possible because of their spectral differences. The side chain groups influenced the spectra through their interaction with the metal surface. SERS studies showed that interactions between the silver surface and the amino acids occur through their deprotonated carboxylate group. The sulphur containing amino acids also showed a strong interaction between the surface and the sulphur containing functional groups [146, 147]. The size of the amino acids can also contribute to the signal strength. In some cases, the signal strength was weakened when a large molecule size caused limited interaction. The adsorption of the aromatic and sulphur containing amino acids resulted in stronger band intensity and was more favourable than the rest [123, 148].
The detection limits of the aromatic and sulphur containing amino acids were as low as 10⁻¹⁰ M, and the remaining amino acids were detectable at 10⁻⁹ M [144].

HPLC is the standard method for quantification of melamine¹³. However, it is labour intensive and time consuming. A faster SERS protocol was developed for detection and quantification of melamine in foodstuffs. When the SERS method was compared to HPLC, the qualitative results showed that SERS was capable of detecting trace amounts of melamine. The limit of detection for melamine with the SERS Klarite¹⁴ substrate was estimated at 0.033 µg/L. The HPLC limit of detection for melamine standard solution was 1.0 µg/L. In a quantitative assessment of the methods, HPLC outperformed the SERS method, with r² values of 0.99 and 0.90 respectively. The SERS method showed high sensitivity (L.O.D. ~0.033 µg/L); however, its accuracy and precision were lower than those of the HPLC method. As a result, SERS could potentially be used as a preliminary screening method for large sample sets, as it is faster, less labour intensive and simpler than HPLC. Verification of suspect samples could then be carried out using the more precise HPLC method [149-151].

1.6.2.4 Concerns relating to SERS Analysis

SERS signals are very sensitive to a wide range of factors, including substrate, environmental (pH, temperature and solvent), and compositional changes (analyte concentration and matrix interferents). Therefore care needs to be taken to generate reproducible SERS spectra for quantitative analysis [152, 153].

1.6.2.4.1 pH Effects

Under different pH conditions, the surface charges of the nanoparticles change, affecting the SERS performance [154]. The effect of pH variation on SERS behaviour can be utilised to study different molecular species. For example, the SERS analysis of thiamine at different pH discriminated the protonated from un-protonated species (Figure 14).
At low pH, spectra featured the protonated related peaks at 1657 and 1550 cm⁻¹. As the pH increased, the degree of protonation decreased, resulting in the emergence of un-protonated related peaks at 1590 and 1373 cm⁻¹. At a pH of 9 and above, the thiamine was destroyed and the SERS spectrum was no longer recorded [155].

¹³ Melamine is a nitrogen rich compound that is banned in foodstuffs.
¹⁴ The Klarite substrate consists of a silicon surface which has been patterned with a square array of micrometre-sized square-based pyramidal pits coated with a 300-nm layer of gold.

Figure 14 SERS spectra of thiamine in colloidal gold sol (10⁻⁵ mol/L) at different pH values. Reproduced with permission from [155].

1.6.2.4.2 Colloid Sample Ratio

For colloid based SERS substrates the ratio between colloid and sample concentration is critical. Spectral fluctuations can arise from the sample colloid ratio used as a result of:
• Competitive binding between different analytes and the number of hotspots.
• Differences in the local SERS enhancement.
• Charge-transfer between molecule and surface interactions.
• Movement of the adsorbed molecule on the surface.

The impact of increasing colloid concentration for a B. megaterium sample showed that as the concentration increased, the baseline offset decreased and signal intensity increased (Figure 15). Increased colloid concentration had a definite impact on the reproducibility and the spectral features. Significant improvements were seen with up to 4 times the initial colloid concentration. Above the fourfold concentration only marginal improvements were observed, as the colloid underwent aggregation, causing changes in the proximity of the colloid to the sample. The percentage variance for the ten B. megaterium spectra was calculated. A drop from 54.7% to 16.3% was observed as the concentration of colloid increased eightfold. A more reproducible SERS spectrum was achieved by increasing the colloid concentration.
This was the result of the high concentration of nanoparticles and aggregates that remain close to the sample surface for a stable signal [154, 156].

Figure 15 Improvement in the reproducibility of SERS spectra as the concentration of the colloidal solution increases from (a) 2X colloid to (b) 4X colloid, and (c) 8X colloid [154].

1.6.2.4.3 Colloid Variation

Different colloid preparation methods are available and can produce different SERS spectra for the same analyte. Figure 16 shows the effect of different silver colloids for the same sample, where one is produced by sodium citrate reduction and the other by sodium borohydride reduction. The different preparation methods produce nanoparticles of varying size, shape and surface charges¹⁵, which cause the variances seen in the spectra [154].

¹⁵ The surface charges impact the SERS signal via the strength of interaction and the proximity of the nanoparticle to the analyte.

Figure 16 Comparison of SERS spectra of E. coli obtained using borohydride-reduced and citrate-reduced silver nanoparticles [154].

Another limitation in SERS spectroscopy is the batch to batch variation of the SERS substrate. Therefore all possible sources of variance should be identified and controlled in the preparation of SERS substrates. There may be changes in the particle size, shape of roughened surfaces and the distribution of the particles into clusters after aggregation of a colloid [92, 157].

1.6.2.5 SERS Analysis of Cell Culture Media

In most SERS studies involving cell culture media, the identification of micro-organisms was the principal concern rather than the analysis of the fermentation medium [74]. Various studies discussed the SERS signal from growth media [158-160]. These studies indicated that screening for discrimination, variability and consistency for media was possible, as was monitoring for degradation.
Given that these media generated a SERS response, it is feasible that SERS could be used as an analysis method for blend and formulation analysis of prepared media. The Marotta study proposed that the SERS signal seen for bacteria samples was the residual signal from growth media. The SERS spectra for bacteria cells (EC, BC and AH) and nutrient broth (NB) showed similarities (Figure 17). The spectra were collected from dilute bacteria culture samples, rather than the specialised preparation method of repeated cycles of washing and centrifugation needed for bacteria samples. Therefore the bacteria samples tested contained both bacteria cells and a diluted growth medium, which was shown to give a strong SERS response (Figure 17b). The stock medium spectrum was dominated by a strong fluorescence background signal, while the diluted medium spectrum had well-defined peaks for components in cell culture medium (Figure 17b) [158]. The Marotta study was followed up by the Premasiri study [159], in which the SERS signals for bacterial cells and growth media were shown to be different at a spectroscopic level. Comparison of the SERS spectra for bacteria and growth media (Figure 18) revealed a common peak observed in several spectra at 725–730 cm⁻¹. The intensity of the peak varied with each particular sample. This peak was assigned to the adenine signal from the FAD component found in both bacterial cells and growth media [159].

Figure 17 (Left) SERS spectra of diluted Nutrient Broth (NB) and the three different bacterial cells - (EC) E. coli, (BC) B. cereus, and (AH) A. histidinolovorans - prepared in diluted nutrient broth. (Right) SERS spectrum of Nutrient Broth (a) concentrated stock, and (b) diluted 1:100 (v/v) [158].

Because of the subtle differences observed in the bacteria and media spectra, principal component analysis (PCA) was needed to confirm the difference between growth media and bacteria cells.
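The role PCA plays in such a comparison can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the band positions and heights, the noise level, the two group labels and the helper names are hypothetical, and the first principal component is estimated by simple power iteration in plain Python rather than with the software used in the cited study.

```python
import math
import random


def gaussian_band(x, centre, width=10.0):
    """Unit-height Gaussian band centred at `centre` (cm^-1)."""
    return math.exp(-0.5 * ((x - centre) / width) ** 2)


def synthetic_spectrum(shifts, bands, rng, noise=0.02):
    """Sum of Gaussian bands given as (centre, height), plus Gaussian noise."""
    return [
        sum(height * gaussian_band(x, centre) for centre, height in bands)
        + rng.gauss(0.0, noise)
        for x in shifts
    ]


def first_pc_scores(spectra, n_iter=100):
    """Mean-centre the spectra and return their scores on PC1.

    PC1 is estimated by power iteration on X^T X, where X is the
    mean-centred data matrix (rows = spectra, columns = Raman shifts).
    """
    n, p = len(spectra), len(spectra[0])
    mean = [sum(row[j] for row in spectra) / n for j in range(p)]
    centred = [[row[j] - mean[j] for j in range(p)] for row in spectra]
    loading = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, loading)) for row in centred]
        loading = [
            sum(scores[i] * centred[i][j] for i in range(n)) for j in range(p)
        ]
        norm = math.sqrt(sum(w * w for w in loading))
        loading = [w / norm for w in loading]
    return [sum(x * w for x, w in zip(row, loading)) for row in centred]


rng = random.Random(0)                  # fixed seed so the run is repeatable
shifts = list(range(400, 1801, 5))      # hypothetical Raman shift axis, cm^-1
media_bands = [(728, 1.0)]              # hypothetical "medium" spectrum
cell_bands = [(728, 0.6), (1000, 0.8)]  # hypothetical "washed cells" spectrum
spectra = [synthetic_spectrum(shifts, media_bands, rng) for _ in range(5)]
spectra += [synthetic_spectrum(shifts, cell_bands, rng) for _ in range(5)]

scores = first_pc_scores(spectra)
media_scores, cell_scores = scores[:5], scores[5:]
print("media PC1 scores:", [round(s, 2) for s in media_scores])
print("cells PC1 scores:", [round(s, 2) for s in cell_scores])
```

In this toy case the two groups share a band near 728 cm⁻¹ and differ only in band heights and one extra band, yet their PC1 scores fall into two non-overlapping clusters. This is the kind of grouping a scores plot is used to reveal.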
The PCA grouped the data based on their variance and showed that the SERS signal for growth media was different to that of properly washed bacterial cells. In the scores plot (Figure 19a) for the different washes, nutrient broth, no wash culture and the spun culture, the signals from bacteria and media were separated. The spun culture and first wash groupings were standalone, but the subsequent three washes overlapped. The nutrient broth was clustered with the no wash culture as they both contained large amounts of broth. In order to show that the medium had no effect on the bacterial signal, two different growth media (TSB and LB broth) were used to grow the bacteria E. faecalis (Figure 19b). The PCA results showed the washed bacteria samples from both fermentations were clustered together and well separated from either growth medium. Also in Figure 19b, the effect of a growth medium (TSB) on three different cultures was examined. TSB medium had no effect, as the PCA results showed the TSB medium cluster was well separated from the bacteria cell clusters [159].

Figure 18 The SERS spectra of the bacterial species with their strain and growth media noted and, on the right, the SERS spectra of the various growth media [159].

Figure 19 PCA results for (A) the sample preparation and (B) the separation seen amongst the different bacterial cells and growth media [159].

The application of SERS to media analysis has potential considering the SERS activity of many media components. SERS can then be used as a screening method for cell culture media and its components. Changes in the sample composition cause spectral differences [5]. For example, SERS analysis of yeast extracts gave spectra with deviating band positions and intensities for different batches. This helped to characterise and discriminate the yeast extracts based on identity, origin and source. Since SERS detected changes in medium composition, it was also used in a medium degradation study [101].
SERS was able to detect storage induced changes in chemically defined cell culture medium. PCA of the SERS data revealed a change within the media samples during dark storage conditions. The observed change in the SERS spectra was caused by cysteine oxidation. This was identified as the key event, resulting in the formation of cystine, which does not promote cell growth, unlike cysteine. This showed that SERS can be used to detect compositional changes such as cysteine oxidation that have an impact on sustaining optimal cell growth.

1.6.3 Fluorescence Spectroscopy

1.6.3.1 Mechanism of Fluorescence Emission

Fluorescence is light emission that originates from electronic states of the same multiplicity, usually from the first excited singlet state to the ground state. Fluorescence emission typically occurs on a timescale of about 10⁻⁸ s. The lifetime of a fluorophore is the time it occupies the excited state; this can be very short, ranging from 0.5 to 100 ns [161]. Fluorescence can be summarised as a three-stage event (excitation, excited-state transitions and emission) and is best illustrated by a Jablonski diagram (Figure 20). The lowest horizontal lines (S0) represent the ground state energy of the molecule, which is typically a singlet state. Each electronic state has numerous vibrational energy levels, represented by the multiple lines. In the ground electronic state, molecules can occupy a variety of vibrational energy levels. At ambient temperature, most molecules are in the lowest vibrational level of the ground state. Occupancy of the upper vibrational states depends on the temperature and the Boltzmann distribution [84, 162-164]. Absorption of light causes the elevation of molecules from the ground state into electronically excited states. The energy of the absorbed light determines which electronic level (S1 or S2) becomes populated.
In the excited state, collisions cause excited molecules to lose energy until they reach the lowest vibrational level of the excited electronic state (S2 or S1). An excited molecule exists only for a limited time (~10⁻⁸ s) as a result of these energy reducing processes. An excited molecule can return to its ground state (S0) through different steps. The preferred route is the one that minimises the lifetime of the excited state [162, 163].

Figure 20 Jablonski Energy Diagram for photoluminescent systems. The lowest heavy horizontal line (S0) represents the ground state of the molecule. The upper lines represent excited electronic states. S1 and S2 are the first and second electronic singlet states. T1 is the first electronic triplet state. Each electronic state has numerous vibrational energy levels [165], adapted from [84, 162].

Figure 20 shows the various routes taken back to the ground state. De-excitation directly back to the ground state (S0) can occur by fluorescence or internal conversion. Fluorescence returns the excited molecule to the ground state accompanied by the emission of light at a longer wavelength. Internal conversions are transitions between electronic states that allow the return of an excited state to S0 without light emission. Internal conversion is a non-radiative transition where there is no change in spin [84, 162, 163, 166]. Intersystem crossing is a non-radiative transition involving a change in spin, for example from the singlet excited state (S1) to the triplet excited state (T1). From the triplet excited state (T1), the de-excitation of the molecule occurs by internal conversion or phosphorescence. Phosphorescence is the transition of an excited molecule from a triplet excited state (T1) to the ground state (S0) with the emission of light. Intersystem crossing is stimulated in molecules containing iodine [167] or bromine [168], or in the presence of molecular oxygen [84, 162, 163, 169, 170].
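The statement earlier in this section that most molecules occupy the lowest vibrational level at ambient temperature can be checked with a short Boltzmann calculation. The sketch below is a minimal illustration, assuming two non-degenerate levels; the 1000 cm⁻¹ spacing and both temperatures are hypothetical values chosen for the example, and the function name is not taken from any source.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
C_CM = 2.99792458e10  # speed of light, cm/s


def boltzmann_ratio(spacing_cm, temperature_k):
    """Population ratio N_upper / N_lower for two non-degenerate levels
    separated by `spacing_cm` (in cm^-1) at `temperature_k` (in K)."""
    energy_j = H * C_CM * spacing_cm
    return math.exp(-energy_j / (K_B * temperature_k))


# Hypothetical vibrational spacing of 1000 cm^-1 (assumed, for illustration)
ratio_room = boltzmann_ratio(1000.0, 298.15)  # ambient temperature
ratio_warm = boltzmann_ratio(1000.0, 350.0)   # a higher temperature

print(f"N1/N0 at 298.15 K: {ratio_room:.4f}")
print(f"N1/N0 at 350.00 K: {ratio_warm:.4f}")
```

For this assumed spacing, fewer than one molecule in a hundred occupies the first excited vibrational level at ambient temperature, and the fraction rises with temperature.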
1.6.3.2 Stokes Shift and Mirror Image

The energy associated with fluorescence emission is typically less than that of absorption. The emitted photons have less energy and are shifted to longer wavelengths. The Stokes shift is a measure of the difference between the wavelengths of maximum absorbance and maximum emission [171]. This Stokes shift arises from the loss of energy from the excited species through various processes such as excited-state reactions, energy transfer, solvent effects, and complex formation. The size of the shift varies with environment, but can range from a few nanometres to several hundred nanometres.

Figure 21 Graphical representation of the absorption and emission transitions and normalized absorption spectra (in dimethylformamide) and fluorescence spectra of quinine sulfate dication in (a) cyclohexane, (b) diethylether and (c) dimethylformamide. Reproduced with permission from [172].

The peaks in the absorption spectrum correspond to transitions from the lowest ground state energy levels to different vibrational levels of the electronic excited state. Meanwhile, the peaks observed in the fluorescence spectrum arise from transitions from the lowest vibrational level of the excited electronic state to the different vibrational levels of the ground state. Following absorption (see Figure 21), an excited fluorophore quickly undergoes relaxation (yellow arrows) to the lowest vibrational energy level of the excited state (S1). All subsequent relaxation processes – fluorescence, radiationless relaxation, and intersystem crossing – will therefore originate from the lowest vibrational level of the excited state (S1). Thus, the excitation wavelength should not influence the emission spectrum. Under ideal conditions for a single fluorophore, the mirror image effect between the emission and absorption spectra can be observed.
In terms of Figure 21, the emission spectrum at λem = 450 nm strongly resembles, as a mirror image, the absorption band at Aλ = 350 nm that arises from the transition from the ground state (S0) to the first excited state (S1), but not the entire absorption spectrum, which may include transitions to higher energy levels (S2) at Aλ = 320 nm [162, 163]. In complex cell culture media, there are usually multiple fluorophores and thus the observed emission is a combination of the emission from multiple fluorophores. This means that the mirror image rule does not hold and that the emission is very sensitive to the excitation wavelength. Thus in multi-fluorophoric mixtures like cell culture media, multi-dimensional methods are commonly used to collect the maximum information [1, 2, 4, 7, 8, 160].

1.6.3.3 Multi-Dimensional Fluorescence Scan Modes

An Excitation Emission Matrix (EEM) provides a total intensity profile of the sample over a range of excitation and emission wavelengths, revealing all the fluorescent constituents over a given range [173]. Total Synchronous Fluorescence Scan (TSFS) is another multi-dimensional scan mode. In TSFS, the emission and excitation monochromators are set to scan simultaneously in such a way that a constant wavelength interval (∆λ) is kept between the emission and excitation wavelengths. A TSFS spectrum plots the fluorescence intensity as a combined function of excitation wavelength and ∆λ [174, 175]. In TSFS, the data is collected along the diagonal (λem = λex + ∆λ), whereas for the EEM it is collected in lines (λex = constant). Compared with EEM spectra, TSFS avoids the Rayleigh scattering that diagonally bisects the EEM profile and has a shorter acquisition time. With EEM data, the Rayleigh scatter has to be removed computationally prior to analysis, as scatter peaks can interfere with data analysis if not effectively removed [176].
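The geometry of the two scan modes can be sketched on a small synthetic grid. The example below is only an illustration under stated assumptions: the wavelength grids, the 10 nm scatter band half-width, the intensity values and both function names are hypothetical, and masking the band around λem = λex is just one simple way of handling first-order Rayleigh scatter, not necessarily the procedure used in this work.

```python
def rayleigh_mask(ex_nm, em_nm, half_width_nm=10):
    """True for EEM cells within `half_width_nm` of the first-order
    Rayleigh line, i.e. where emission wavelength ~ excitation wavelength."""
    return [[abs(em - ex) <= half_width_nm for em in em_nm] for ex in ex_nm]


def synchronous_cut(eem, ex_nm, em_nm, delta_nm):
    """Read intensities along em = ex + delta from an EEM whose rows are
    excitation wavelengths and whose columns are emission wavelengths.
    Returns (excitation wavelength, intensity) pairs found on the grid."""
    em_index = {em: j for j, em in enumerate(em_nm)}
    cut = []
    for i, ex in enumerate(ex_nm):
        j = em_index.get(ex + delta_nm)
        if j is not None:
            cut.append((ex, eem[i][j]))
    return cut


# Hypothetical grids (nm) and a synthetic EEM: a flat signal of 1.0 with a
# strong scatter ridge of 1000.0 exactly on the em == ex diagonal.
ex_nm = list(range(300, 401, 10))
em_nm = list(range(300, 501, 10))
eem = [[1000.0 if em == ex else 1.0 for em in em_nm] for ex in ex_nm]

# EEM route: flag the scatter band, then blank the flagged cells.
mask = rayleigh_mask(ex_nm, em_nm)
cleaned = [
    [None if mask[i][j] else eem[i][j] for j in range(len(em_nm))]
    for i in range(len(ex_nm))
]

# Synchronous route: a cut at an offset of 30 nm never touches the diagonal.
cut30 = synchronous_cut(eem, ex_nm, em_nm, 30)
print(cut30[:3])
```

Because every point of the synchronous cut lies 30 nm away from the diagonal, it contains none of the ridge values, whereas the EEM needs the explicit masking step before further analysis.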
EEMs, however, contain more information than TSFS or conventional emission and excitation spectra [177-179]. Multi-component mixtures are better analysed using multi-dimensional data like EEM [180, 181]. EEM and 2D synchronous (SFS) methods were compared for analysing beer. The EEM data exhibited three bands: one at λex/λem of 250/350 nm, a second at 350/420 nm, and a third at 450/520 nm. These peaks were assigned to the aromatic amino acids tryptophan, tyrosine and phenylalanine, as well as the vitamin riboflavin. The 2D synchronous spectra were collected at ∆λ = 30 nm and ∆λ = 60 nm. Several bands were observed in the synchronous fluorescence spectra taken at ∆λ = 30 nm. The sharp and intense short-wavelength emission was attributed to amino acids, while the longest-wavelength emission band belonged to riboflavin. The fluorescence was measured directly from the beer and the data was used to quantify the amino acids and riboflavin content [178]. Both EEM and SFS gave the same quantitative results for riboflavin and tryptophan, corresponding to RMSECV of 14% and 4% respectively. EEM outperformed SFS for tyrosine and phenylalanine, with RMSECV of 4% and 16% compared to 6% and 31% respectively. The better performance was the result of the full sample profile provided by EEM, while the SFS was only a cross section of the data.

Figure 22 Three-dimensional plot of EEM for a beer sample studied along with the synchronous fluorescence scan of beers 1 and 2 at ∆λ = 30 nm. Reproduced with permission from [178].

1.6.3.4 Rayleigh Scatter Elimination in EEM data

The Rayleigh scatter bears no relevant chemical information relating to the EEM sample data. It is a by-product of light passing through the sample and interacting with particles. Rayleigh scatter is increased when samples are opaque or not completely dissolved. Rayleigh scatter reflects the clarity of the sample but not the composition.
Three types of scatter can be encountered [176, 182]:
• Tyndall - from large suspended particles. This can be overcome by filtering the sample prior to analysis.
• Rayleigh - from all molecules. The first order scatter is the most prominent, while a second order scatter can be seen at a wavelength double that of the exciting light and is generally weak.
• Raman - from all molecules. The scattered light is shifted to a longer wavelength.

For the EEM data the scatter follows a series of multiple sharp peaks along the diagonal line in the matrix (Figure 23a). Light scatter artefact peaks can cause problems with chemometric analysis of EEM data by interfering with qualitative and quantitative analysis and swamping the signal. In order to avoid these problems, the scatter peaks are eliminated as part of the pre-processing (Figure 23b) [183].

Figure 23 (a) EEM landscapes of M5eRDF and (b) the scatter corrected spectrum for the same M5eRDF sample from this body of work. [Axes: excitation wavelength (nm), emission wavelength (nm), intensity.]

1.6.3.5 Factors which influence fluorescence emission

1.6.3.5.1 Quenching

Quenching (collisional or static) is the process by which fluorescence intensity decreases. Collisional quenching occurs when the excited-state fluorophores are de-excited by contact with some other molecule. This molecule is called a quencher and its presence results in the return of the fluorophore to the ground state. The molecules are not chemically altered in the process. A wide variety of molecules can act as collisional quenchers. Examples include oxygen, halogens, amines and electron deficient molecules like acrylamide [162].
Flavins like lumiflavin, riboflavin and FMN undergo collisional quenching by iodide ions, which exert their quenching effect by spin orbit perturbation, giving rise to increased intersystem crossing [184]. Fluorophores can sometimes form non-fluorescent complexes with quenchers. This process is referred to as static quenching, since it occurs in the ground state and does not rely on diffusion or molecular collisions [162]. Static quenching also occurs with riboflavin following interaction with methionine or cysteine. Static quenching of riboflavin by methionine is assumed to be the result of the formation of a non-fluorescent pair of a riboflavin anion and a protonated methionine cation. For cysteine, no static quenching is observed at low pH (pH = 4). At neutral pH (pH = 7), however, the deprotonated thiolate form of cysteine interacts with neutral riboflavin to form a riboflavin anion by the reduction of thiolate to the thiol form. This causes both static and dynamic quenching [185].

1.6.3.5.2 Inner-Filter Effects

Another cause of reduction in fluorescence intensity is inner-filter effects (IFEs). This is the attenuation of the incident light by the fluorophore itself or by other absorbing species. At high concentrations, a spectrum may be affected by IFE and intermolecular energy transfer, causing a decrease in the fluorescence signal. Any interfering species that absorbs at the same wavelength as the analyte decreases the light available to excite the analyte. Also, when an interfering species absorbs at the emission wavelength, it diminishes the number of emitted photons that reach the detector [83]. The influence of IFE can be reduced by sample dilution or use of a shorter excitation path-length.
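Why dilution and a shorter path-length both help can be seen from the Beer-Lambert relation, A = ε · c · l, with the transmitted fraction of the incident light given by 10^(−A). The sketch below is a minimal illustration of that relation only: the molar absorptivity, concentrations and path-lengths are hypothetical values, and the function names are not taken from any source.

```python
def absorbance(epsilon, conc_molar, path_cm):
    """Beer-Lambert absorbance A = epsilon * c * l."""
    return epsilon * conc_molar * path_cm


def transmitted_fraction(a):
    """Fraction of incident light remaining after absorbance `a`."""
    return 10.0 ** (-a)


EPSILON = 1.0e4  # L mol^-1 cm^-1, hypothetical absorber (assumed value)

# Hypothetical starting point: 1e-4 M in a 1 cm cell gives A of about 1.0
t_start = transmitted_fraction(absorbance(EPSILON, 1.0e-4, 1.0))
# Tenfold dilution in the same cell gives A of about 0.1
t_diluted = transmitted_fraction(absorbance(EPSILON, 1.0e-5, 1.0))
# Original concentration in a 0.2 cm cell gives A of about 0.2
t_short_path = transmitted_fraction(absorbance(EPSILON, 1.0e-4, 0.2))

print(f"undiluted, 1 cm:   {t_start:.3f}")
print(f"diluted 10x, 1 cm: {t_diluted:.3f}")
print(f"undiluted, 0.2 cm: {t_short_path:.3f}")
```

With these assumed values only about a tenth of the excitation light survives the full path in the undiluted sample, compared with roughly 79% after tenfold dilution and roughly 63% in the shorter cell.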
1.6.3.5.3 Environmental Effects
Fluorescence is a very sensitive measurement method and thus may be affected by several environmental factors such as pH, temperature and solvent effects [186]. In some cases, changes in pH will radically affect the intensity and spectral profile. Buffered solutions are recommended for increased environmental control. The fluorescence of compounds with acid and basic forms is usually pH dependent, because the ionized and non-ionized fractions can have different absorption wavelengths and emission intensities. For example, riboflavin exists in three different forms: cationic, anionic and neutral, depending on the pH of the solution. The cationic form is non-fluorescent, the anionic form is weakly fluorescent and the neutral form fluoresces. The absorption spectrum differs between the forms, each of which dominates over a different pH range [187, 188].
Figure 24 The cationic, neutral, and anionic structures of riboflavin species, R (ribityl side chain) = -CH2(CHOH)3CH2OH, reproduced with permission from [188].
An increase in temperature generally results in a decrease in fluorescence because collisional quenching increases at higher temperatures. This leads to more radiationless decay processes (internal conversion/intersystem crossing). Other factors contributing to the decline in fluorescence intensity at high temperature are the loss of planar configuration for some molecules and the dissociation of molecular complexes on heating [189, 190]. To control the temperature, most instruments are fitted with a temperature-controlled cell holder.
Solvent molecules can also interact with excited state molecules, thus lowering their energy. Solvent effects relate to the chemical properties of the fluorophores, solvent and surrounding molecules. The degree of interaction increases with solvent polarity. Polar fluorophores are highly sensitive to solvent polarity while non-polar fluorophores remain unaffected by solvent polarity.
A large spectral shift for a small change in the solvent composition generally indicates specific solvent effects. These effects include hydrogen bonding, preferential solvation, charge transfer interactions and acid and base chemical reactions. Solvents containing –Br, –I or –NO2 are undesirable because they promote fluorescence quenching with increased triplet formation [162, 187].
1.6.3.6 Fluorophores
In biological and fermentation samples, many intrinsic biological fluorophores exist. Examples include aromatic amino acids (tryptophan, tyrosine and phenylalanine), vitamins (riboflavin and pyridoxine) and coenzymes (NADH, NADPH, FMN and FAD) [80, 191]. In cell culture media, the range of fluorophores is narrower. For example, in the chemically defined basal medium (eRDF), five significant fluorophores were identified: tryptophan, tyrosine, pyridoxine, folic acid and riboflavin [1]. The interactions between all of the various components play a large part in determining the shape and intensity of the spectra obtained. This produces a unique 3D fluorescence profile which can be used to characterize cell culture media. Small changes in media composition can cause variations in the spectral profiles observed [1, 2].
Figure 25 Typical biological fluorophores that can be detected with the use of an excitation-emission matrix (EEM). Reproduced with permission from [191].
1.6.3.6.1 Fluorescent Amino Acids
The aromatic amino acids L-tryptophan (Trp), L-phenylalanine (Phe) and L-tyrosine (Tyr) exhibit intrinsic fluorescence. The absorption maximum for phenylalanine is 260 nm and for tyrosine and tryptophan, it is 280 nm [192]. Phenylalanine in proteins typically has a quantum yield (footnote 16) of 0.03, so its emission is very weak and is only observed in the absence of tryptophan and tyrosine. The fluorescence maxima of tryptophan and tyrosine occur at 350 nm and 310 nm, respectively.
Tryptophan and tyrosine have similar fluorescence efficiencies in water but the higher extinction coefficient of tryptophan results in its stronger fluorescence. Tryptophan can be selectively excited at 295 – 305 nm, which minimises the tyrosine signal and thus spectral overlap. In some samples, tryptophan fluorescence dominates because of its large extinction coefficient and absorbance at a longer wavelength [162, 193, 194].
Footnote 16: The quantum yield or efficiency (QE) for fluorescence is the ratio of the number of photons emitted to the number of photons absorbed.
1.6.3.6.2 Vitamins and Co-enzymes
Riboflavin (Vitamin B2) is a precursor in the biosynthesis of two co-enzymes, flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD). Riboflavin is used in energy metabolism via FMN and FAD [195]. Riboflavin, FMN and FAD absorb light in the visible region at around 450 nm with a fluorescence emission maximum at ~520 nm. For riboflavin, the quantum yield is 0.26 at pH 7. The fluorescence yield is the same for FMN but lower for FAD. FAD fluorescence is weakened by the presence of adenine [162]. Other fluorescent vitamins include Vitamin B1 (thiamine), Vitamin B6 (pyridoxine) and Vitamin B9 (folic acid). Thiamine needs to be oxidised to its fluorescent by-product thiochrome to enable detection. Thiochrome can be excited at ~360 nm and emits at ~450 nm. Pyridoxine fluoresces at ~395 nm after excitation at ~320 nm. For folic acid, excitation occurs at ~370 nm and emission is seen at ~430 nm [179, 196, 197].
Reduced nicotinamide adenine dinucleotide (NADH) and reduced nicotinamide adenine dinucleotide phosphate (NADPH) both fluoresce at 460 nm after excitation at 360 nm. NADP, the phosphate ester of NAD, oxidises alcohols to aldehydes or ketones. The reduced form, NADPH, reduces carbonyl compounds to alcohols [162, 198].
Under anaerobic conditions, the NADPH signal increases from the build-up of the fluorescent NADH as the NADH compound is not being oxidised to form the non-fluorescent NAD+.
1.6.3.7 Bioprocesses Monitoring and Control using Fluorescence Spectroscopy
The goals of any fermentation process are high product yield, constant product quality, optimisation and control. Fermentations are currently monitored using oxygen and carbon dioxide levels, pH, redox potential and conductivity probes [199, 200]. However, in the rapidly expanding bioprocess industry, there is a need for more real time compositional information. Online monitoring of fermentation processes involves performing real time monitoring of biomass, substrate and product levels. Optical sensor measurements such as fluorescence are ideal for bioprocess monitoring. They are non-invasive, and online and in situ sampling reduces the risk of contamination and of erroneous measurements during sampling [191, 201-204]. One drawback is that complex mixtures with multiple fluorophores are misrepresented by single emission spectra. However, the use of multi-wavelength fluorescence gives a three dimensional landscape for each sample, allowing several different fluorophores to be monitored [199, 205, 206]. Fluorescence monitoring based on the biological fluorophores from the cell culture medium means that the physiological state, cell mass and the production level of biological products can be followed online and non-invasively. Different fluorophores can be correlated to different parameters within the cell culture medium. The fluorescence profile of a fermentation process can be measured and used to verify subsequent batches and identify different phases [165, 207].
The main challenges facing in-line measurements are the physical attributes of the fermentation process. Stirring rates and aeration procedures generate variations in signals.
The stirring rates can decrease the signal to noise ratio (SNR), while air bubbles interfere with measurements. In general, signals from in-line analysis exhibit a lower SNR than off-line measurements. In-line spectral data thus require smoothing to reduce the impact of noise [206]. The time requirements for data analysis may affect the real-time output as the information rich spectra require chemometrics to interpret and evaluate data. This is part of the model development to be carried out prior to setting up automated analysis [208].
1.6.3.8 Monitoring Cell Concentration and Process Variables using Single Component and Multiple Component Measurements
Initial fluorescence monitoring of bioprocesses was through the measurement of NADPH fluorescence [162, 209, 210]. In bioprocess broth analysis, a correlation was observed between the fluorescence signal of NADH and the status of the fermentation. The fluorescence signals from the NADH component as well as the biomass signal were relatively constant during the exponential growth phase. NADH is a good indicator of cell density and the signal can be correlated to the metabolic switches, substrate conditions and oxygen limitations. In Figure 26 the behaviour of NADH (excitation/emission wavelength (λex/λem) of 350/450 nm) can be seen through changes in the intensity. The NADH signal is sensitive to the dissolved oxygen concentration. The NADH fluorescence responds to changes in oxygen concentration faster than the dissolved oxygen electrode, with the fluorescence signal reaching 90% response in 1 s while the dissolved oxygen electrode required 1 min [80, 209, 211].
Figure 26 (a) Spectrum of a starved yeast suspension at aerobic conditions; (b) spectrum of a starved yeast suspension at anaerobic conditions; (c) difference spectrum of a starved yeast suspension (anaerobic–aerobic). Reproduced with permission from [211].
The limitation of single compound monitoring arises when other fluorophores overlap with the signal of interest, making signal interpretation difficult. For example, excitation at 365 nm produces overlapping NADPH and riboflavin emission. This overlap can be avoided with specific excitation wavelength selection. For NADPH, excitation at 334 nm eliminates the riboflavin emission; meanwhile, for riboflavin, excitation at 404 nm eliminates the NADPH signal [205, 208]. Alternatively, using EEM or TSFS allows one to simultaneously monitor several biogenic fluorophores such as vitamins, coenzymes and amino acids, in order to give a clearer picture of the cellular activities [206].
Different fluorophores can be more sensitive to different process aspects [199, 205, 209]. The most appropriate fluorophore to use for monitoring product production, nutrient depletion and by-product build up will depend on the particular fermentation [205]. For example, Li et al. studied the fluorophoric behaviour of NADPH, tryptophan, pyridoxine and riboflavin in three different yeast fermentations. The fluorescence signal was recorded and correlated to different cellular processes. In the baker’s yeast on glucose based medium, the tryptophan signal proved optimal for monitoring cell concentration, while the pyridoxine signal closely followed the cellular activity. In the Candida utilis on ethanol based medium, the tryptophan signal again proved optimal for monitoring cell concentration, while NADPH was a good indicator of cellular activity. In the S. cerevisiae RTY110/pRB58 fermentation growing on a glucose-nitrogen based medium, both pyridoxine and NADPH tracked cell concentration but the pyridoxine signal was stronger. For all systems, riboflavin gave a weak signal [205].
Another way of analysing cell culture fluorescence was to examine the culture signal as a whole. Multicomponent fluorescence analysis was performed on P.
pastoris batch culture, which contained NADPH, tryptophan and riboflavin. Prediction models for biomass concentration were built using the fluorescence signal of the combined three fluorophores. The combined fluorescence signal offered a more robust measure than a single fluorophore. The strength of the model was dependent on the overall signal and the interplay of fluorophores with the process variable [75, 206, 210, 212].
1.6.3.9 Fluorescence Analysis of Cell Culture Media
Raw materials have a composition that lies within specification; changes to that composition are nevertheless important to note for process reliability, as these changes may have a negative impact on the process. Media screening is therefore a huge area of potential research. The ability to determine the efficacy of media before use improves process efficiency by ensuring the use of reliable starting material, leading to consistent product yield. It can prevent financial losses by identifying poorly performing media.
From previous work on cell culture media using fluorescence [1, 2, 4, 213], EEM data has proven suitable for media screening and evaluation (i.e. rapidly identifying different types of media and determining the sample quality). EEM data for seven different types of media were collected and the different media lots were easily classified based on the spectroscopic profiles (Figure 27) using an NPLS-DA scores plot. Scores describe the variation between samples, giving a visual representation of how the samples relate to one another. Any significant compositional changes which cause measurable differences in the spectroscopic data are represented by changes in scores. Scores plots allow one to easily visualize sample differences. This meant that different media samples were easy to identify despite appearing visually similar.
Even identical media samples that were changed by different CHO based productions (A and L process) were separated based on their slightly different in-process media compositions [213].
Figure 27 Scores plot of LV3 versus LV1 showing NPLS-DA discrimination of CD-A1, CD-A2, CD-S2, insulin, eRDF, yeastolate, and phytone sample solutions. Reproduced with permission from [4].
Changes in the composition within batches of media samples can impact the process; therefore, it is necessary to determine the quality of the media samples. Chemometric methods like MROBPCA and MANOVA can be used to identify outliers and define class variance [4, 213]. MROBPCA identifies outlying samples based on compositional changes. For example, 22 yeastolate samples were tested in triplicate by EEM. The MROBPCA outlier map identified 14 major and 5 minor outliers. The spectra of the major outliers displayed either a higher or lower than normal fluorescence compared to the main body of samples. The minor outliers were singular events due to experimental error [5].
In conjunction with MROBPCA, MANOVA was used to calculate the class variance of the groups identified. MANOVA can be useful for comparing media over time. Ryan et al. observed changes to the class variance of media samples with time, indicating that minor compositional changes were occurring within the media during storage. This can be an issue if the changes become significant and impact the performance of the media [213].
Media consistency is critical for production efficiency in industrial biotechnology. Identification of unwanted media changes is thus very important and fluorescence EEM can be used to identify these changes. Medium degradation such as photo-damage from improper storage (footnote 17) can reduce the efficacy of the medium. Photo-damage of medium occurs when light sensitive components react, degrade or form photo products (e.g. riboflavin degradation leading to the formation of lumichrome).
This causes changes in the EEM profile and these changes correlate to the photo-degradation of specific media components and the formation of photo-induced by-products. A comparison of a chemically defined eRDF medium stored under different environmental conditions (light/dark) over 30 days is shown in Figure 28. When the medium was exposed to light, a signal decrease in the EEM profiles of tryptophan, tyrosine, and pyridoxine was observed, whereas an increase was seen in the signal associated with the photo-induced by-products. When the medium was stored in the dark, the scores for the fluorescence components remained at a constant level over the testing period, indicating no change in the media [2].
Footnote 17: Media can be stored in transparent bioreactors, media storage vessels, or single-use disposable bioreactors.
Figure 28 The PARAFAC scores are shown for the two different storage conditions: (left) RT-L and (right) RT-D. Components 1 to 4 are represented by blue squares (Trp), green inverted triangles (Tyr), red circles (Py), and cyan upright triangles (FA/Rf and/or photo-products) respectively. Reproduced with permission from [2].
EEM profiles are sensitive to compositional changes in the media. Therefore, the EEM data was also tested for quantitative purposes by modelling the changes incurred by varying analyte concentrations in order to predict the quantity of that analyte within test samples. Calvet et al. developed a modified standard addition method for the determination of tryptophan and tyrosine in eRDF media solutions [1], which was later expanded to pyridoxine, riboflavin and folic acid [2]. eRDF samples produced relatively complex spectra with strong fluorescence from tyrosine and tryptophan and weaker contributions from pyridoxine, riboflavin and folic acid. Apart from quantifying the media components within prepared medium, the best performing models (tyrosine, tryptophan and riboflavin models) were applied to quantify these analytes in stored eRDF media samples as they degraded.
The tyrosine model failed due to the dynamic changes seen in the stored media samples. However, the tryptophan and riboflavin models compared well with the equivalent HPLC results, which also confirmed changes in the tryptophan and riboflavin concentrations [1, 2]. These quantitative results, coupled with the effective qualitative analysis, show that EEM fluorescence is a potential method for a wide range of analytical tests for cell culture media.
1.7 Project Objectives
This project sought to use spectroscopy (Raman, SERS, and Fluorescence) for analysis (qualitative and quantitative) of complex cell culture media components in a liquid environment. Different spectroscopic methods were used, and correlations between the spectroscopic signals and ingredient concentrations were established, to aid in the development of quality control methods for medium formulation analysis:
By providing robust analytical methods for the accurate quantification of ingredients in prepared cell culture medium.
By providing a quality assurance tool in biotechnology with spectroscopic variance analysis in conjunction with ingredient quantification.
Media was prepared based on an industrial recipe for the formulation of basal medium. It was a five component medium of D-glucose, L-glutamine, D-galactose, yeastolate and eRDF and was examined for quantification of D-glucose, yeastolate and eRDF. The assessment of medium was based on multivariate analysis of different types of spectroscopic data - Raman, SERS, and Fluorescence.
2 Chemometrics, Materials and Methods
2.1 Chemometrics
During the 1970s the development of chemometrics coincided with the emergence of the personal computer and the increased use of computers in chemistry. Modern instrumentation generates vast amounts of numerical data and the examination of such data was limited until the introduction of computer based analysis [214].
A definition of chemometrics is “the chemical discipline that uses mathematical and statistical methods (a) to design or select optimal measurement procedures and experiments and (b) to provide maximum chemical information by analysing chemical data”. In other words, it is the computer based statistical analysis of chemical data [215].
2.2 Qualitative and Quantitative analysis
Chemometrics involves the establishment of relationships between different variables and the development of suitable mathematical models for descriptive and predictive purposes [216]. What analyte is present and how much? These types of questions can be answered using qualitative and quantitative analysis methods. Qualitative analysis is the identification of an analyte or analytes in a sample. Quantitative analysis is commonly the determination of the amount of an analyte or analytes in a given sample [217].
In this work, spectral and sample variance were evaluated using Principal Component Analysis (PCA) for 2D and 3D data, while Robust Principal Component Analysis (ROBPCA) and Parallel Factor Analysis (PARAFAC) were utilised for the 3D data only. For quantifying components in the data, the Partial Least Squares (PLS) regression method was used for 2D data and the Unfolded Partial Least Squares (UPLS) regression method was used for the 3D data.
2.3 Calibration Modelling
The objective of calibration modelling is to develop a statistical model that can be used for the prediction of dependent variables from the numerical values generated by at least one analytical measurement. A simple univariate calibration model utilises a single independent response variable X, such as an intensity or absorbance at a single wavelength, to predict the dependent variable Y [218].
The simplest form of a linear calibration model is
Equation 2-1
\[ y_i = b_1 x_i + e_i \]
Where $y_i$ represents the concentration of the ith calibration sample, $x_i$ refers to the corresponding instrument measurement, $b_1$ stands for the regression coefficient (the slope of the line), and $e_i$ stands for the error associated with the ith calibration sample (the error is assumed to be normally distributed) [218, 219]. Values in $y_i$ and $x_i$ are used to estimate the model parameter $b_1$ by the least squares procedure. The least-squares estimate of $b_1$ ($\hat{b}_1$) is calculated by
Equation 2-2
\[ \hat{b}_1 = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} Y \]
The “b-hat” character $\hat{b}_1$ signifies the estimate of $b_1$. The resulting linear calibration model, developed using Equation 2-1, is used to predict the concentration of analyte for an unknown sample, $\hat{y}_{unk}$.
Equation 2-3
\[ \hat{y}_{unk} = X_{unk} \hat{b}_1 \]
Where $X_{unk}$ denotes the response signal for the unknown sample, measured at the calibrated wavelength [220].
A disadvantage of univariate calibration is that a single independent (X) variable can be insufficient to explain the variation in the dependent variable (Y). This problem can be overcome by the use of multivariate calibration. Multivariate calibration utilises several explanatory variables, such as the intensities across a spectrum, to predict dependent variables. Thus multivariate models have increased stability in the prediction of model parameters. Multivariate calibration also corrects for the effects of interfering species. A multivariate calibration model can be expressed in a linear form as:
Equation 2-4
\[ Y = X\beta + e \]
Where Y is a vector of the measured responses for I objects, X is an (I x K) matrix of measured spectra for the I objects, $\beta$ is a vector of regression coefficients and $e$ is a vector of the residuals of the linear regression model [214, 220, 221].
2.4 Figures of Merit for Modelling
It is necessary to accurately judge the ability of models to predict unknown samples.
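Before turning to the individual figures of merit, the univariate model of Equations 2-1 to 2-3 can be sketched in a few lines. This is a minimal illustration rather than the procedure used in this work: the signal and concentration values are hypothetical, the function names are introduced only for the example, and for a single X variable without an intercept the matrix expression in Equation 2-2 reduces to the ratio of two sums.

```python
def fit_slope(x, y):
    """Least-squares estimate of b1 for y = b1 * x + e (Equation 2-2, single variable)."""
    sum_xx = sum(xi * xi for xi in x)              # X^T X for a single column
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # X^T Y for a single column
    return sum_xy / sum_xx


def predict(x_unknown, b1_hat):
    """Predicted concentration for an unknown sample (Equation 2-3)."""
    return x_unknown * b1_hat


# Hypothetical calibration set: instrument signal (x) and known concentration (y).
signal = [1.0, 2.0, 3.0, 4.0]
concentration = [0.52, 0.98, 1.53, 1.97]

b1_hat = fit_slope(signal, concentration)
y_unknown = predict(2.5, b1_hat)
print(f"b1_hat = {b1_hat:.4f}, predicted concentration = {y_unknown:.4f}")
```

The quality of such a prediction is what the figures of merit in this section are intended to quantify.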
To assess the overall quality of multivariate models, one evaluates the correlation coefficient or the model’s associated error. Every measurement has an error and the estimated parameters show the deviations from the true value [218, 222]. The most common figures of merit for estimating error in chemometrics are the root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP) and root mean square error of cross validation (RMSECV) [216].
2.4.1 Correlation Coefficient (r2)
The correlation coefficient is a measure of the strength and direction of the relationship between the measured and predicted variables. Its square, r2, is defined as:
Equation 2-5
\[ r^2 = \frac{\left[ \sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y}) \right]^2}{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2} \]
Where $\bar{y}$ is the mean of the known y values and $\bar{\hat{y}}$ is the mean of the model estimated y values. The correlation coefficient r ranges from -1.00 to +1.00 (footnote 18), so r2 ranges from 0 to 1.00 [220, 223].
Footnote 18: A correlation coefficient of 1.00 indicates a positive fit i.e. the relationship between obtained and predicted variables follows a similar pattern. A correlation coefficient of zero represents no relationship between the obtained and the predicted values. A correlation coefficient of -1.00 indicates a negative fit between the variables. It occurs when an increase in one variable is associated with a decrease in another variable.
2.4.2 Root Mean Square Error of Calibration (RMSEC)
The RMSEC describes the uncertainty between the estimates obtained for the calibration samples and the accepted true values of the calibration samples used to obtain the model parameters in y = xb1 + e according to:
Equation 2-6
\[ RMSEC = \sqrt{\frac{1}{n - m - 1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \]
Because estimating the model parameters uses m+1 degrees of freedom, the remaining $n - m - 1$ degrees of freedom are used to estimate RMSEC. Typically RMSEC generates overly optimistic values. This is a result of the internal error estimation.
The samples themselves are used to calculate the error; therefore measurement noise is also modelled in the estimated parameters [218, 224].
2.4.3 Root Mean Square Error of Cross Validation (RMSECV)
Cross validation can be used to estimate the predictive ability of a calibration model. It is based on systematic re-sampling of all the data present in order to estimate the optimal model choice and the associated error:
Equation 2-7
\[ RMSECV_{LOO} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{i,LOO} - y_i)^2} \]
The leave one out cross validation (LOOCV) and Monte Carlo cross validation (MCCV) methods are two different strategies used for error estimation. LOOCV is performed by generating n calibration models, where each of the n samples is left out one at a time. In each case the omitted sample is analysed by the model. The prediction errors are averaged, giving the estimate of the prediction ability. A pitfall of LOOCV is the internal nature of the prediction, leading to an overly optimistic outcome. The LOOCV approach is nevertheless often necessary when only a small number of calibration samples is available. When multiple samples are removed at a time, however, the resulting validation is more accurate. With MCCV, the sample set is randomly split many times into training (calibration) and validation sets. For each split, validation is performed, ultimately giving an averaged MCCV value from a large number of random splits [8, 218, 224].
2.4.4 Root Mean Square Error of Prediction (RMSEP)
Validation is performed by either internal or external sampling. Internal validation test sets are prepared by setting aside some of the available samples and using them to estimate the model performance. Cross validation is another form of internal validation where systematic resampling of the available data is performed to test the model. External validation is carried out using a second independent test set that has a similar range to the current calibration set.
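The two internal error estimates described in Sections 2.4.2 and 2.4.3 can be compared with a short sketch. It is illustrative only: the data are hypothetical, the function names are introduced for the example, and a straight line with an intercept is assumed as the calibration model so that one variable (m = 1) plus the intercept uses m + 1 degrees of freedom, leaving n - m - 1 for the RMSEC.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmsec(x, y, m=1):
    """Calibration error using the n - m - 1 remaining degrees of freedom."""
    slope, intercept = fit_line(x, y)
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - m - 1))


def rmsecv_loo(x, y):
    """Leave-one-out cross validation error: refit n times, predict the omitted sample."""
    n = len(x)
    squared_errors = []
    for i in range(n):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        squared_errors.append((slope * x[i] + intercept - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / n)


# Hypothetical calibration set: instrument signal and reference concentration.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
concentration = [0.52, 0.98, 1.53, 1.97, 2.55, 2.96]

print(f"RMSEC  = {rmsec(signal, concentration):.4f}")
print(f"RMSECV = {rmsecv_loo(signal, concentration):.4f}")
```

Both values are computed from the calibration samples alone, which is why an independent test set is still needed to estimate the error expected for future samples.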
An external validation set provides an independent assessment of the predictive power since the data used for validation are different from those used to build the calibration model [225].
The RMSEP is also known as the root mean square error of validation (RMSEV). It is a measure of the uncertainty that can be expected in future predictions. With RMSEP, a set of validation samples (test set) is prepared and measured independently from the calibration samples. The number of validation samples p should be sufficiently large so that the estimated prediction error accurately reflects all sources of variability within the calibration method [218, 226]. The RMSEP is computed by:
Equation 2-8
\[ RMSEP = \sqrt{\frac{1}{p} \sum_{i=1}^{p} (y_i - \hat{y}_i)^2} \]
The RMSEP judges the prediction ability of the model and indicates if the number of latent variables used is correct. The RMSEP has the same units as the reference values of the validation samples [110].
2.5 Multivariate Analysis
Multivariate analysis methods like PCA and PLS have proven useful in biopharmaceutical analysis as they handle multidimensional data and variation (such as experimental error and noise) caused by the changing environment [78, 227]. PCA can be used to characterise raw materials and analyse the variability within samples, batches and process variables, while PLS can be used to investigate the correlation between spectral data and quantitative properties such as product yield [228]. Using multiple analysis methods allows for a more detailed exploration of the process to be carried out [229].
2.5.1 Variance Analysis
2.5.1.1 Principal Component Analysis
Principal component analysis (PCA) is a statistical technique that linearly transforms an original set of variables into a substantially smaller set of uncorrelated variables that represent the information in the original variables. Its goal is to reduce the dimensionality of the original dataset, making it easier to understand [230].
One of the main reasons for the use of PCA resides in the enormous amount of data that is generated by modern techniques. For example, a typical Raman spectrum contains 500 to 2000 data points [218, 231]. The PCA algorithm reduces the number of variables and the information is projected onto a smaller number of significant variables, the so-called principal components (PCs). The principal components are linear combinations of the original variables and are selected so that the first principal component covers as much of the variation in the data as possible. The second principal component is orthogonal to the first and covers as much of the remaining variation as possible and so forth [206]. The mathematical model for the PCA method is as follows:
Equation 2-9
\[ X = T_A P_A^{\mathsf{T}} + \varepsilon \]
Where $T_A$ is an N-by-A matrix containing the scores of the PCs, $P_A$ is an M-by-A matrix containing the loadings of the PCs and the $\varepsilon$ matrix contains the unexplained variance. The scores are the intensities of each of the new compressed variables for all of the samples and contain information on how the samples relate to each other. The loadings are the distributions of the new variables in terms of the original variables and include information on how variables relate to one another.
Figure 29 PCA plot in which the blue circles represent the scores of the samples after PCA analysis. The major axis of the ellipse represents the first principal component, PC1, and its minor axis the second principal component, PC2.
Orthogonality can be described as two vectors being completely uncorrelated with one another. The scores are orthogonal to each other; for example, the scores of PC1 are unrelated to the scores of PC2. A consequence of orthogonality of the principal components is that the issue of correlation between X variables is completely eliminated if one chooses to use principal components instead of original X variables.
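The orthogonality of the scores can be checked numerically on a small example. The sketch below is a minimal illustration under stated assumptions: it handles only two original variables, for which the loadings of the 2 x 2 covariance matrix can be written in closed form as a rotation angle, and the data values and function name are hypothetical. Real spectral data would require a general eigenvalue or singular value decomposition routine.

```python
import math


def pca_two_variables(data):
    """PCA of (x1, x2) samples: returns mean-centred data, loadings (p1, p2) and scores (t1, t2)."""
    n = len(data)
    mean1 = sum(row[0] for row in data) / n
    mean2 = sum(row[1] for row in data) / n
    centred = [(row[0] - mean1, row[1] - mean2) for row in data]

    # Elements of the 2 x 2 sample covariance matrix.
    var1 = sum(r[0] * r[0] for r in centred) / (n - 1)
    var2 = sum(r[1] * r[1] for r in centred) / (n - 1)
    cov12 = sum(r[0] * r[1] for r in centred) / (n - 1)

    # Rotation angle of the principal axis of a symmetric 2 x 2 matrix.
    theta = 0.5 * math.atan2(2.0 * cov12, var1 - var2)
    p1 = (math.cos(theta), math.sin(theta))    # loading vector of PC1
    p2 = (-math.sin(theta), math.cos(theta))   # loading vector of PC2, orthogonal to p1

    t1 = [r[0] * p1[0] + r[1] * p1[1] for r in centred]  # scores on PC1
    t2 = [r[0] * p2[0] + r[1] * p2[1] for r in centred]  # scores on PC2
    return centred, (p1, p2), (t1, t2)


# Hypothetical data: two strongly correlated variables measured on five samples.
samples = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
centred, (p1, p2), (t1, t2) = pca_two_variables(samples)

score_dot_product = sum(a * b for a, b in zip(t1, t2))
ss1 = sum(a * a for a in t1)
ss2 = sum(b * b for b in t2)
print(f"t1 . t2 = {score_dot_product:.2e}")
print(f"variance captured by PC1 = {100.0 * ss1 / (ss1 + ss2):.1f}%")
```

The dot product of the two score vectors is zero to within rounding error even though the original variables are strongly correlated, and almost all of the variance falls on PC1, so for data of this kind the second component could be discarded with little loss.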
If the number of principal components examined is the same as the number of original X variables, then 100% of the variance in the data is explained. Data compression occurs when the user chooses a number of principal components that is much lower than the number of original variables. This necessarily involves ignoring a small fraction of the variation in the original X data [218, 231-233].
2.5.1.2 Robust Principal Component Analysis (ROBPCA)
The goal of robust PCA methods is to obtain principal components that are not influenced by outliers. The ROBPCA method (footnote 19) combines ideas of projection pursuit (PP) and robust covariance estimation [234]. For high dimensional data, where the number of variables is greater than the number of samples, ROBPCA proceeds as follows. The X data is pre-processed by reducing its data space to a linearly transformed subspace using singular value decomposition. The dimension of this subspace is at most N-1, where N is the number of samples.
Footnote 19: Developed by Hubert et al. in 2005.
A measure of “outlyingness” is computed for each data point within the new data space by projecting the high dimensional data points onto many univariate directions. For every direction a robust centre and scale are computed for the projected data points. Each data point is scored by its largest standardised distance over all directions, its “outlyingness”:
Equation 2-10
\[ outl(X_i) = \max_{v} \frac{\left| X_i^{\mathsf{T}} v - m_{MCD}(X_i^{\mathsf{T}} v) \right|}{s_{MCD}} \]
Where $X_i^{\mathsf{T}} v$ is the projection of data point $X_i$ onto the direction $v$, the location $m_{MCD}$ and scale $s_{MCD}$ are the univariate minimum covariance determinant (MCD) estimators computed for the projected data, and the maximum is taken over all of the univariate directions $v$ considered. Using the data points with the smallest “outlyingness” to form a covariance matrix, the final number of principal components K is determined. The data points are projected onto a K dimensional subspace, of which the centre and shape are determined by means of a reweighted MCD estimate.
From this the robust principal components are known and the robust centre is the MCD location estimate [234-236].

2.5.2 Regression

2.5.2.1 Partial Least Squares Regression

Partial Least Squares regression (PLS) was developed in the 1960s by H. Wold [231]. It has become a highly utilised regression method in the chemometric toolbox. PLS is a standard method for building multivariate regression models to predict different parameters from complex samples. The reasons for the success of PLS are its applicability to various types of data, its ability to handle non-linear data and the development of software which has aided the interpretation and visualisation of PLS results. There are many different types of PLS algorithm, including PLS1, PLS2, Moving Window Partial Least Squares (MWPLS) and Unfolded PLS (U-PLS). In PLS1, a separate calibration model is built for each column in the Y data. In PLS2 mode a single calibration model is constructed for all columns of the Y data simultaneously. MWPLS and U-PLS are explained later in the text [237, 238].

PLS regression uses exactly the same mathematical model for compression of the X and Y data. The data matrix X is decomposed into a matrix of scores T and loadings P, and the response matrix Y is also split into a matrix of scores U and loadings Q:

Equation 2-11
$$X = TP^{T} + E$$

Equation 2-12
$$Y = UQ^{T} + F$$

The goal of PLS regression is to model all the variables within X and Y such that the error in the X block, E, and the error in the Y block, F, are minimised. The least squares regression is performed between U and T. An internal correlation is built that relates the scores of the X block to the scores of the Y block in terms of the maximum covariance between X and Y:

Equation 2-13
$$U = TW$$

This is followed by the overall regression step where the decomposition of X is used to predict Y.
Equation 2-14
$$\hat{Y} = X\hat{B}$$

The regression coefficients are given by

Equation 2-15
$$\hat{B} = P(P^{T}P)^{-1}WQ^{T}$$

2.5.2.2 Unfolded Partial Least Squares (U-PLS)

Unfolded Partial Least Squares (U-PLS) involves the application of PLS to matrices which have been unfolded into a two-way structure. U-PLS is useful in fluorescence spectroscopy, where EEM and TSFS are forms of 3D data (K×I×J) [239, 240].

Figure 30 The unfolding scheme for a multi-dimensional array into K×IJ slices, adapted from [241] with permission to reproduce.

Before performing PLS regression, it is possible to unfold the 3D data into a 2D matrix. Unfolding is the conversion of the three-way data array into a stack of two-way data to which simpler mathematical models can be applied. During the unfolding process, one of the directions remains unchanged while the other two are arranged slice by slice to give a row vector. A cube (K×I×J) can be unfolded in three different directions: row wise (K×IJ), column wise (I×KJ), and tube wise (J×KI). After unfolding an EEM array, the 2D matrix has the dimensions K×(I*J), where K is the number of samples, I is the number of excitation wavelengths and J is the number of emission wavelengths. PLS regression analysis is performed on the rearranged two-way data [218, 219, 242-244].

2.5.3 Factor Analysis

2.5.3.1 Multivariate Curve Resolution Alternating Least Squares (MCR-ALS)

The goal of curve resolution and factor analysis is to mathematically decompose sample signals into the underlying profiles of each component. The multivariate curve resolution method describes the bilinear decomposition of the matrix D.
The MCR-ALS model can be written as

Equation 2-16
$$D = CS^{T} + E$$

It decomposes a bilinear spectral data matrix, D (I×J), into two matrices: C (I×K), which contains the relative concentration profile of each component in the different samples, and S (J×K), which contains the true spectral profile of each component, where I is the number of samples, J is the number of wavelengths (i.e. the wavelength range over which the spectra were collected), and K is the number of components or factors.

Figure 31 Scheme of steps for the resolution process in the MCR-ALS method, adapted from [245] with permission to reproduce.

In MCR, to start the iterative ALS process, an initial estimate of the factors present in the spectral profiles of each sample is obtained by methods like PCA. Starting from this initial estimate, least squares solutions for C and S^T are computed in an alternating cycle, with each iteration giving a new C or S^T matrix. The calculation of C and S^T is repeated until an optimal solution is obtained or convergence is achieved. Constraints may be imposed on the profiles, for example non-negativity, where the spectra or concentration values cannot be negative. The MCR-ALS method works with trilinear and non-trilinear data sets. A trilinear structure can be set as an optional constraint on the C matrix in the MCR-ALS method [218, 245-247].

2.5.3.2 Parallel Factor Analysis (PARAFAC)

For the analysis of the EEM data, one can also use PARAFAC as a decomposition method in order to resolve the fluorescence landscape into a number, F, of trilinear components which, in theory, could represent the excitation and emission spectra of the constituent fluorophores. In PARAFAC, multi-way data are decomposed into sets of scores and loadings with the same number of factors identified by the model. The number of factors or latent variables is much lower than the number of original variables, making visualisation of the data possible.
PARAFAC uses all the original variables to determine the set of latent variables [200, 248, 249]. The objective of PARAFAC is to build a model that minimises the sum of squares of the residuals $r_{ijk}$:

Equation 2-17
$$X_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + r_{ijk}$$

Where $X_{ijk}$ is an element of the three-way data and i, j, and k are the indices of this element on the sample, excitation and emission modes. The fluorescence landscape is decomposed into sample scores, $a_{if}$, excitation loadings, $b_{jf}$, and emission loadings, $c_{kf}$, for each factor f. The residual $r_{ijk}$ contains the variation not captured by the PARAFAC model [248, 250-252].

Figure 32 Graphical representation of a two component PARAFAC model. A three-way array $X_{ijk}$ is expressed as the sum of two trilinear components and a three-way residual array $r_{ijk}$. Reproduced with permission from [250].

The core consistency gauge is a method for finding the correct number of components to use in PARAFAC modelling [253]. In an ideal PARAFAC model, the core array has ones on the super-diagonal and zeros elsewhere, implying that no interactions occur between the components from different modes. A core array is calculated from the loadings for each component in the model and compared with the ideal PARAFAC core array of zeros and ones. The optimum model is the one for which the number of components still gives an acceptable core array. Core consistency can be increased by lowering the number of components [254]. The core consistency is based on the relative sum of squared differences between the calculated core array and the ideal super-diagonal array of ones. Core consistency provides a quantitative measure of how well the loadings represent variation within the data. It is generally expressed as a percentage; if the percentage is close to 100, the model gives an appropriate description of the data. If the core consistency percentage is low, the model does not describe the data adequately [255, 256].
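The trilinear model of Equation 2-17, and its link to the row-wise unfolding used for U-PLS, can be sketched numerically. The example below is illustrative only and assumes small synthetic, noise-free data; the array sizes and names are hypothetical, and the index convention follows Equation 2-17 (i for samples, j for excitation, k for emission). It builds a three-way array from known factors, unfolds it, and performs a single alternating least squares update for the sample scores with the two loading matrices held fixed. It is not a complete PARAFAC algorithm (there is no iteration, initialisation strategy or core consistency calculation).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: I samples, J excitation and K emission wavelengths, F factors.
I, J, K, F = 5, 6, 7, 2
A = rng.uniform(0.1, 1.0, size=(I, F))   # sample scores a_if
B = rng.uniform(0.1, 1.0, size=(J, F))   # excitation loadings b_jf
C = rng.uniform(0.1, 1.0, size=(K, F))   # emission loadings c_kf

# Equation 2-17 without the residual term: x_ijk = sum_f a_if * b_jf * c_kf
X = np.einsum("if,jf,kf->ijk", A, B, C)

# Row-wise unfolding: each sample's J x K landscape becomes one row of length J*K.
X_unf = X.reshape(I, J * K)


def khatri_rao(B, C):
    """Column-wise Kronecker product of B (J x F) and C (K x F), shape (J*K, F)."""
    return np.einsum("jf,kf->jkf", B, C).reshape(-1, B.shape[1])


Z = khatri_rao(B, C)

# In unfolded form the trilinear model reads X_unf = A Z^T.
# One ALS update: with B and C fixed, the scores are a least squares solution.
A_hat = np.linalg.lstsq(Z, X_unf.T, rcond=None)[0].T
residual = X_unf - A_hat @ Z.T

print("largest error in recovered scores:", np.abs(A_hat - A).max())
print("sum of squared residuals:", np.sum(residual**2))
```

Because the synthetic array is exactly trilinear and the true loadings are supplied, the single update recovers the scores to within rounding error. With real EEM data the loadings are unknown, so the three modes are updated in turn until the sum of squared residuals stops decreasing.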
2.6 Data Pre-Processing

The purpose of pre-processing is to remove or minimise unwanted variation which is not related to the analyte of interest. Spectral variations like light scattering, baseline offset and suppressed analyte signal can be corrected by pre-processing. The correct selection and implementation of data pre-processing can result in more accurate and robust chemometric models. Listed below are some of the most commonly implemented methods and the reasons why they might be used [220, 232, 257, 258].

2.6.1 Mean Centring

Mean centring focuses on the variation in responses by removing the absolute intensity information (mean response) from each variable. Mean centring involves calculating the average response for each variable in a dataset and then subtracting this averaged response from each variable. The pre-processed data can be transformed back into the original data by adding the mean response to the data [220].

2.6.2 Derivatives

Derivatives act as a frequency scaling tool and high pass filter. Derivative pre-processing minimises lower frequency features such as a sloping baseline and retains the high frequency aspects of the original data like the Raman peaks (Figure 33). A drawback of derivative pre-processing comes from the frequency response function used in polynomial smoothing, which can introduce distortions into the data. The filtering also removes substantial amounts of signal, producing a lower signal to noise ratio in the data [218]. The first order derivative effectively removes the baseline offset variation in the spectral profiles and is useful where the samples exhibit a baseline shift. The second derivative removes differences in baseline offset and baseline slope. In the case of a complex spectrum, the use of a second order derivative can make spectral interpretation more difficult. However, for a low signal spectrum, the second order derivative enhances the signal [220, 232, 259, 260].
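One common way of computing such derivatives is a Savitzky-Golay type filter, in which a low order polynomial is fitted by least squares to a moving window and the derivative of the fitted polynomial is evaluated at the window centre. The sketch below is a minimal NumPy illustration of that idea under stated assumptions: the function names, the synthetic Gaussian "peak", the window size and the polynomial order are all hypothetical choices and are not the settings or software used in this work.

```python
import math

import numpy as np


def savgol_weights(window, polyorder, deriv=0, delta=1.0):
    """Filter weights obtained from a least squares polynomial fit in one window."""
    if window % 2 == 0 or polyorder >= window or deriv > polyorder:
        raise ValueError("need an odd window with window > polyorder >= deriv")
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)
    # Vandermonde matrix: column j holds z**j.
    V = np.vander(z, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse returns the fitted coefficient c_deriv;
    # the deriv-th derivative at the window centre is deriv! * c_deriv / delta**deriv.
    return math.factorial(deriv) * np.linalg.pinv(V)[deriv] / delta**deriv


def savgol_derivative(y, window, polyorder, deriv=1, delta=1.0):
    """Apply the filter; points without a full window at either edge are dropped."""
    w = savgol_weights(window, polyorder, deriv, delta)
    return np.correlate(y, w, mode="valid")


# Hypothetical example: the same peak recorded with two different baseline offsets.
x = np.arange(0.0, 50.0, 0.5)
peak = np.exp(-0.5 * ((x - 25.0) / 2.0) ** 2)
spectrum_a = peak + 3.0
spectrum_b = peak + 7.0

d_a = savgol_derivative(spectrum_a, window=7, polyorder=2, delta=0.5)
d_b = savgol_derivative(spectrum_b, window=7, polyorder=2, delta=0.5)
print("first derivatives agree despite the offset:", np.allclose(d_a, d_b))
```

The two first derivative traces coincide because a constant offset has zero slope, which is the baseline offset removal described above. The window size and polynomial order control the trade-off between noise suppression and peak distortion.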
In this body of work, the Savitzky-Golay derivative algorithm²⁰ is used [261]. The S-G algorithm fits individual polynomials to filter windows around each point across the spectrum. This continues until it reaches the end of the spectrum. It requires the selection of the window size, the order of the polynomial and the order of the derivative [220, 232, 259, 260].

Figure 33 (Left) Raman spectra of M1Glu data (this work) and (Right) the first order derivative following Savitzky-Golay smoothing of M1Glu data (this work).

2.6.3 Multiplicative Scatter Correction (MSC)

MSC aims to eliminate additive and/or multiplicative effects; these can include differences in baseline offset, slope changes and non-linearity in the spectra [262]. It removes effects caused by sampling variation such as sample thickness, sample packing, focal depth, sample temperature and possible water evaporation. Figure 34 shows the before and after spectra where the baseline offset has been removed. The MSC method is performed using a linear regression of each spectrum against a reference spectrum. The mean spectrum of the dataset is generally used as the reference spectrum. The least squares coefficients are calculated and these values are then used to calculate the MSC corrected spectrum [218, 220, 232, 263].

²⁰ There are alternative derivative methods such as finite differences and Norris-Williams (NW) derivation. The former is sensitive to high frequency noise while NW is less applicable to spectroscopic data than Savitzky-Golay.
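The MSC procedure described in Section 2.6.3 (regress each spectrum on a reference, then remove the fitted offset and slope) can be sketched as follows. This is a minimal illustration rather than the implementation used in this work; the function name `msc`, the synthetic spectra and the offset and scaling values are assumptions made for the example.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative scatter correction of spectra stored as rows."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # The mean spectrum of the dataset is used as the reference.
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        # Least squares fit of x = offset + slope * reference.
        slope, offset = np.polyfit(reference, x, deg=1)
        corrected[i] = (x - offset) / slope
    return corrected


# Hypothetical example: one underlying band seen with different offsets and scalings.
axis = np.linspace(0.0, 1.0, 200)
band = np.exp(-0.5 * ((axis - 0.5) / 0.05) ** 2)
offsets = np.array([0.0, 0.3, 0.8])
scales = np.array([1.0, 1.5, 0.7])
spectra = offsets[:, None] + scales[:, None] * band

corrected = msc(spectra)
print("spread between raw spectra:", np.ptp(spectra, axis=0).max())
print("spread between corrected spectra:", np.ptp(corrected, axis=0).max())
```

In this idealised case the spectra differ only by an additive offset and a multiplicative factor, so after correction they collapse onto the mean spectrum. Real spectra also differ chemically, and it is this remaining difference that the subsequent chemometric model uses.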
[Figure 33 plots: two panels, each showing Intensity against Wavenumber (cm⁻¹).]

Figure 34 (A) Un-processed NIR spectra and (B) after multiplicative scatter correction of wheat samples [264].

2.6.4 Normalisation

Normalising methods attempt to overcome changes in the data caused by fluctuations in absolute intensity arising from instrument or measurement factors. For example, light source variation may be overcome by identifying a feature in the sample/spectra that should be constant from one sample to the next and scaling the variables based on this characteristic. A good normalisation method minimises the variance between spectra and maximises the signal for classification/discrimination purposes. Full spectrum normalisation, such as scaling to the area under the curve, captures the general characteristics of the data. Local normalisation methods, on the other hand, scale the spectra to a specific feature such as a known peak, which may be useful with varying noise levels [265-267].

Below are the equations for the normalisation methods used in this study [218, 232]. In each case, $X_{i,norm}$ is the normalised spectrum; $X_i$ is the spectrum of the ith sample; $X_i^{*}$ is the vector of observed values for the given normalisation; j is the variable number and n is the total number of variables.

In area normalisation (Norm1), each spectrum is scaled so that the area under the curve is equal to one. This is achieved by dividing each variable by the sum of the absolute values of all variables for the given sample.

Equation 2-18
$$X_{i,norm1} = \frac{X_i}{\sum_{j=1}^{n} \left| X_i^{*} \right|}$$

With Norm2 normalisation, each variable is divided by the square root of the sum of the squared values of all variables for the given sample. Norm2 returns a vector of unit length (length equal to one). It is a form of weighted normalisation where larger values are weighted more heavily in the scaling.
Equation 2-19
$$X_{i,norm2} = \frac{X_i}{\sqrt{\sum_{j=1}^{n} X_i^{*2}}}$$

For the infinity normalisation mode (maximum norm, NormINF), each variable is divided by the maximum value observed across all variables of a given sample, giving a vector scaled to a maximum value equal to one. All variables are therefore weighted against the largest value.

Equation 2-20
$$X_{i,norminf} = \frac{X_i}{\max(X_i^{*})}$$

Figure 35 Illustration of different methods of normalisation: (a) untreated Raman spectra of M1Glu data, (b) Norm 1 Raman spectra, (c) Norm 2 Raman spectra and (d) Norm INF Raman spectra. [Four panels, each showing Intensity against Wavenumber (cm⁻¹).]

Water is the main component of all samples studied in this work. The water bands can act as an internal standard because vibrational OH bands are evident above 3000 cm⁻¹ and at 1640 cm⁻¹, and these non-overlapping bands can be used as internal references. By normalising to the OH bending vibration we reduce the impact of the absolute intensity variation of the water signal. The water signal is in such excess that the observed variance in this signal is largely due to measurement error; since water is present in excess, we can assume that in reality its signal is invariant. In other words, as the media composition changes, the water signal should remain unchanged and is thus suitable as an internal standard. Considering the analyte-to-water band ratios, the intensity of the analyte band relative to the water band gives a good estimation of the analyte signal.

2.7 Variable/Wavelength Selection

Another way to improve the precision, accuracy and robustness of a calibration model is variable selection.
With the right variables, the selection removes irrelevant data²¹ whilst retaining the more informative areas of the spectra. Wavelength selection involves choosing subsets for modelling that give a lower error level, better stability and the removal of minor data, permitting easier interpretation of the relationship between the samples and the model.

Variable selection methods can be organised into three main classes: filter, wrapper and embedded. The filter method involves two steps. A PLS model is fitted to the data, and a threshold is determined from the response with respect to the variables identified by the fitted PLS model. Variable selection is then carried out based on this threshold. Examples include the loading weights method and the variable importance in projection (VIP) method [268]. The wrapper method acts in an iterative way where each search extracts a subset, and for each subset the model performance is measured. Selection variables are sorted based on filter constraints like selected frequencies. There are numerous wrapper methods including genetic algorithm PLS, backward variable elimination PLS and interval PLS [269]. Embedded methods combine variable selection and modelling in a single step; the variable selection is performed within the PLS algorithm. Embedded methods include interactive variable selection, soft threshold PLS, sparse PLS and power PLS [270].

²¹ Full spectrum calibration models can be misled by contrived correlations from system drift or co-variation amongst constituents, resulting in over-fitting of the model.

2.7.1 Moving Window Partial Least Squares (MWPLS)

Moving Window Partial Least Squares (MWPLS) is a wrapper variable selection method and was the primary variable selection method used in this body of work. The objective of MWPLS is to find the informative spectral regions within complex spectra. Informative regions hold the most relevant information for the PLS analysis and yield better performing models.
MWPLS uses a fixed size window to build a series of PLS models across the whole spectral region. The informative regions are assigned by examining the complexity and the error level of the PLS models. For each PLS model built along the spectrum, the sum of squared residuals (SSR) is calculated. In Figure 36, the plot of SSR versus window position displays the residue lines that are useful in identifying informative regions. Residue lines show downward-facing bands, each of which corresponds to a particular wavenumber range. The informative ranges have a low SSR value compared to the insignificant ranges [271-273].

Figure 36 MWPLS residue lines obtained from Raman spectra of M1Glu data collected in this work for the calibration samples. Each colour represents a different residue line.

2.8 Outliers

An outlier is an observation which does not obey the pattern of the majority of the data. There are two types of outlier: firstly, odd measurements, where one of the replicate measurements is different, possibly due to measurement error; and secondly, odd samples, where samples may be compositionally different from each other. Outliers should be flagged and discarded because they adversely affect further analysis. Outlier analysis techniques are able to detect artefacts (like spectral distortions and deviation from offset) that cannot be seen in the spectra. If a model describes the variance expected from the calibration, test and unknown samples, then one should not encounter outliers. When samples do not fit the model, it is an indicator that either a) the sample is different, or b) the model does not adequately describe the variance space.

In addition to computing principal components, both PCA and ROBPCA also flag outliers. Outliers are identified on a scores plot if they fall outside the 95% confidence limit, or are identified as outliers in the corresponding Hotelling’s T²/Q residual model.
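The two diagnostics named above can be sketched with classical PCA: Hotelling's T² measures how far a sample lies from the centre within the model plane, and the Q residual measures how far it lies from the plane. The example below is an illustration under stated assumptions only. It uses synthetic data with one deliberately planted odd sample, classical (not robust) PCA, and it ranks samples by Q rather than computing confidence limits; all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 30 samples, 10 variables, 2 real sources of variation.
n, m, A = 30, 10, 2
basis = np.linalg.qr(rng.normal(size=(m, m)))[0]   # orthonormal directions
scores_true = rng.normal(scale=5.0, size=(n, A))
X = scores_true @ basis[:, :A].T + 0.01 * rng.normal(size=(n, m))

# Plant one odd sample by pushing sample 7 off the model plane.
X[7] += 1.0 * basis[:, A]

# Classical PCA model with A components.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U[:, :A] * s[:A]                       # scores
P = Vt[:A].T                               # loadings
eigvals = s[:A] ** 2 / (n - 1)             # variance of each score vector

t2 = np.sum(T**2 / eigvals, axis=1)        # Hotelling's T2 for each sample
E = Xc - T @ P.T                           # part of the data not modelled
q = np.sum(E**2, axis=1)                   # Q residual for each sample

suspect = int(np.argmax(q))
print("sample with the largest Q residual:", suspect)
```

Here the planted sample has an unremarkable T² but by far the largest Q residual, because its unusual variation lies outside the two-component model. In practice each statistic would be compared with a confidence limit, and a robust method such as ROBPCA would be preferred when outliers are large enough to distort the model itself.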
Three categories of outlier are encountered in a ROBPCA outlier map (Figure 37): (1) good leverage points, (2) orthogonal outliers, and (3) bad leverage points. Good leverage points lie close to the PCA subspace but far from the regular observations. Orthogonal outliers have a large orthogonal distance²² to the PCA subspace, yet their projection lies within the range of the regular data. Bad leverage points have a large orthogonal distance and their projection on the PCA subspace is also far from the regular data [234, 235, 274, 275].

Figure 37 An outlier map plots the orthogonal distance versus the score distance, with orthogonal outliers and good and bad leverage points.

²² The orthogonal distance is the distance between an observation and its projection in the k-dimensional subspace.

Materials and Methods

2.9 Materials

2.9.1 Sample Materials

D-glucose (ACS Reagent), L-glutamine (>99%) and D-galactose (min. 98%) were purchased from Sigma Aldrich and used as received. Isopropanol (Laboratory Grade Reagent) and nitric acid (Laboratory Grade Reagent, SG 1.42, 70%) were purchased from Fisher Scientific. All media samples were prepared with purified Millipore water (18 MΩ resistance). Yeastolate Difco TC UF was purchased from Becton Dickinson and eRDF basal medium was purchased from Kyokuto, Tokyo. In order to minimise variation in the complex media components, only one batch of each was purchased and used over the course of this project.

2.9.2 Colloid Materials

Nitric acid (Laboratory Grade Reagent, SG 1.42, 70%) and hydrochloric acid (Laboratory Grade Reagent) were purchased from Fisher Scientific. Silver nitrate (99.99% trace metals basis), sodium citrate (ACS reagent, ≥99.0%) and sodium bicarbonate (ACS reagent, 99.7-100.3%) were purchased from Sigma Aldrich.
2.10 Workflow Description

Model media based on a typical formulation of feed medium used in industry were prepared to provide a platform for testing spectroscopic methods without compromising intellectual property. The goal was to develop successful analytical methods that could be adapted for a formulation assay, where the medium is tested for the correct component levels in the ready to use (liquid) state.

The conventional Raman study covered the development of the analysis method from the initial data collection setup to data handling and chemometric analysis. It involved changing the sample composition and progressing from single analyte samples to five component model medium samples. The chemometric methods were applied to both the simple and the complex samples, and the accuracy of the model performance was studied as the complexity increased. The Raman study was undertaken firstly to develop a method for measuring D-glucose concentration in the M5Glu media. Two more sample sets, M5eRDF and M5Ye (see below), were model media with very complex compositions (amino acids, carbohydrates, minerals, etc.). For these media the goal was to quantify the amount of multi-component ingredients (e.g. eRDF and yeastolate) added to each media blend. For the complex ingredient quantification, more sensitive methods like SERS and EEM were investigated. Qualitative and quantitative models were built for the eRDF and yeastolate samples using Raman, SERS, EEM and TSFS.

2.11 Sample Preparation and Handling

All samples were prepared under sterile conditions in a Laminar Flow Hood (LFH) to prevent microbial contamination. The LFH was wiped down with 70% isopropanol before and after use to maintain a sterile environment for sample preparation.
The glassware used to prepare the media was carefully cleaned using the following procedure: 1) wash thoroughly with a soap solution, 2) rinse with distilled water, 3) soak overnight in a 10% nitric acid solution and 4) rinse thoroughly with Millipore water. Once prepared, the samples were divided into sterile vials for storage (1 mL and 15 mL aliquots). All samples were stored at –70 °C to ensure sample integrity.

For testing, the samples were removed from storage at –70 °C in batches. The sample vials were transferred to a refrigerator set at 2–8 °C where they were allowed to defrost overnight. Prior to sampling, the samples were homogenised using a test tube shaker and the outside of each sample vial was wiped down with 70% isopropanol before being transferred to the LFH. Shaking of the samples created bubbles; the samples were therefore allowed to stand until all bubbles had dissipated. After sampling was complete, the sample vial was returned to the freezer at –70 °C for long term storage.

2.12 Datasets

2.12.1 Model Media Samples

The datasets created provide a model media system for analysis and allow changes in media to be characterised. Five ingredients were used for the model media (D-glucose, D-galactose, L-glutamine, yeastolate and eRDF). D-glucose is a monosaccharide sugar that is the main carbon source for cells [276]. D-galactose is a monosaccharide sugar that works as a transcription promoter as well as a carbon source [277]. L-glutamine is an amino acid that functions as an intermediate in energy metabolism, acts as an acid-base balance regulator and is used in the detoxification of ammonia [278, 279]. TC yeastolate products are animal-free, water-soluble portions of autolysed Saccharomyces cerevisiae. TC yeastolate contains peptides, amino acids, vitamins and simple and complex carbohydrates, and is used as a nutritional supplement in cell culturing [280, 281].
eRDF is a basal medium used in cell culture; it is a complex mixture of inorganic salts, amino acids, vitamins and other components that promote cell growth [55, 57, 58].

2.12.2 M1Glu Media Dataset

A simple single analyte (D-glucose) in aqueous solution was used as a starting point for uncovering the important experimental parameters. Thirty one solutions of varying D-glucose concentration in the range 1.6 g/L to 49.6 g/L were prepared by dissolving a known weight of D-glucose in 10 mL of Millipore water. The composition of each sample is tabulated below (Table 2).

Table 2 Composition of the M1Glu samples.

Sample    D-glucose (g/L)    Sample    D-glucose (g/L)    Sample    D-glucose (g/L)
M1Glu01   1.6                M1Glu11   17.6               M1Glu21   33.6
M1Glu02   3.2                M1Glu12   19.2               M1Glu22   35.2
M1Glu03   4.8                M1Glu13   20.8               M1Glu23   36.8
M1Glu04   6.4                M1Glu14   22.4               M1Glu24   38.4
M1Glu05   8.0                M1Glu15   24.0               M1Glu25   40.0
M1Glu06   9.6                M1Glu16   25.6               M1Glu26   41.6
M1Glu07   11.2               M1Glu17   27.2               M1Glu27   43.2
M1Glu08   12.8               M1Glu18   28.8               M1Glu28   44.8
M1Glu09   14.4               M1Glu19   30.4               M1Glu29   46.4
M1Glu10   16.0               M1Glu20   32.0               M1Glu30   48.0
                                                          M1Glu31   49.6

2.12.3 M3Glu Media Dataset

A 3-component model system (D-glucose, L-glutamine and D-galactose) was generated to determine the potential for quantifying closely related analytes in dilute aqueous solution. The D-glucose concentration was varied throughout the sample set, and the L-glutamine and D-galactose concentrations were kept at either a high or a low level as a background influence (Table 3). Chemometric modelling was performed on the Raman data collected to predict the D-glucose concentration within these samples, because D-glucose gave the strongest signal of the media components used.

Table 3 Composition of the M3Glu model media solutions.
Sample    D-glucose (g/L)    D-galactose (g/L)    L-glutamine (g/L)
M3Glu01   0.00               1.3                  0.44
M3Glu02   0.32               3.7                  1.16
M3Glu03   0.64               1.3                  0.44
M3Glu04   0.96               3.7                  1.16
M3Glu05   1.28               1.3                  0.44
M3Glu06   1.60               3.7                  1.16
M3Glu07   1.92               1.3                  0.44
M3Glu08   2.24               3.7                  1.16
M3Glu09   2.56               1.3                  0.44
M3Glu10   2.88               3.7                  1.16
M3Glu11   3.20               1.3                  0.44
M3Glu12   3.52               3.7                  1.16
M3Glu13   3.84               1.3                  0.44
M3Glu14   4.16               3.7                  1.16
M3Glu15   4.48               1.3                  0.44
M3Glu16   4.80               3.7                  1.16
M3Glu17   5.12               1.3                  0.44
M3Glu18   5.44               3.7                  1.16
M3Glu19   5.76               1.3                  0.44
M3Glu20   6.08               3.7                  1.16
M3Glu21   6.40               1.3                  0.44
M3Glu22   6.72               3.7                  1.16
M3Glu23   7.04               1.3                  0.44
M3Glu24   7.36               3.7                  1.16
M3Glu25   7.68               1.3                  0.44
M3Glu26   8.00               3.7                  1.16
M3Glu27   8.32               1.3                  0.44
M3Glu28   8.64               3.7                  1.16
M3Glu29   8.96               1.3                  0.44
M3Glu30   9.28               3.7                  1.16
M3Glu31   9.60               1.3                  0.44
M3Glu32   9.92               3.7                  1.16

The thirty one different solutions of D-glucose prepared in the M1Glu dataset were used as the stock solutions for D-glucose in the M3Glu dataset. Two stock solutions were prepared for L-glutamine at 2.2 g/L and 5.8 g/L. Two stock solutions were prepared for D-galactose at 6.5 g/L and 18.5 g/L. All stock solutions were prepared with Millipore water. The M3Glu dataset consisted of 32 samples. Sample one was prepared by pipetting 6 mL of Millipore water, 2 mL of L-glutamine and 2 mL of D-galactose at the low concentration, giving a sample volume of 10 mL. For sample two to sample thirty two, a 2 mL aliquot of the specified D-glucose solution was pipetted into a sample vial together with 2 mL of L-glutamine and 2 mL of D-galactose (at high concentration for the even numbered samples and at low concentration for the odd numbered samples). The sample volume was made up to 10 mL with 4 mL of Millipore water.

2.12.4 M5Glu Media Dataset

The M5Glu dataset (Table 4) was generated as a model media system based on media formulations used within the biopharmaceutical industry.
The development of a calibration model for D-glucose quantification in media involved a set of samples containing fixed concentrations of eRDF, yeastolate, D-galactose and L-glutamine together with a varying concentration of D-glucose. The thirty one different solutions of D-glucose prepared in the M1Glu dataset were used as the stock solutions for D-glucose in the M5Glu dataset. Stock solutions of 4 g/L of L-glutamine, 12.5 g/L of D-galactose, 5 g/L of yeastolate and 17 g/L of eRDF were prepared. All stock solutions were prepared with Millipore water. The M5Glu samples were prepared by pipetting 2 mL of the specified D-glucose stock solution, 2 mL of L-glutamine, 2 mL of D-galactose, 2 mL of yeastolate and 2 mL of eRDF to give a 10 mL sample.

Media can contain complex ingredients that themselves contain glucose. Yeastolate and eRDF are complex mixtures that have D-glucose as a component; for example, eRDF has a glucose concentration of 0.019 mg/g. The same issue arises with yeastolate, for which no accurate compositional data are available. The amount of glucose contained in yeastolate and eRDF is so small that it should not have a significant impact on the model. When quantitative analysis of D-glucose was performed on the M5Glu data, the concentration of D-glucose in yeastolate and eRDF was not taken into account.

Table 4 The composition of the M5Glu samples with the additive contribution of D-glucose from eRDF and yeastolate giving a new range of D-glucose from 0.0 g/L to 9.92 g/L.
Sample No   D-glucose (g/L)   eRDF (g/L)   Yeastolate (g/L)   D-galactose (g/L)   L-glutamine (g/L)
M5Glu01     0.00              3.4          1                  2.5                 0.8
M5Glu02     0.32              3.4          1                  2.5                 0.8
M5Glu03     0.64              3.4          1                  2.5                 0.8
M5Glu04     0.96              3.4          1                  2.5                 0.8
M5Glu05     1.28              3.4          1                  2.5                 0.8
M5Glu06     1.60              3.4          1                  2.5                 0.8
M5Glu07     1.92              3.4          1                  2.5                 0.8
M5Glu08     2.24              3.4          1                  2.5                 0.8
M5Glu09     2.56              3.4          1                  2.5                 0.8
M5Glu10     2.88              3.4          1                  2.5                 0.8
M5Glu11     3.20              3.4          1                  2.5                 0.8
M5Glu12     3.52              3.4          1                  2.5                 0.8
M5Glu13     3.84              3.4          1                  2.5                 0.8
M5Glu14     4.16              3.4          1                  2.5                 0.8
M5Glu15     4.48              3.4          1                  2.5                 0.8
M5Glu16     4.80              3.4          1                  2.5                 0.8
M5Glu17     5.12              3.4          1                  2.5                 0.8
M5Glu18     5.44              3.4          1                  2.5                 0.8
M5Glu19     5.76              3.4          1                  2.5                 0.8
M5Glu20     6.08              3.4          1                  2.5                 0.8
M5Glu21     6.40              3.4          1                  2.5                 0.8
M5Glu22     6.72              3.4          1                  2.5                 0.8
M5Glu23     7.04              3.4          1                  2.5                 0.8
M5Glu24     7.36              3.4          1                  2.5                 0.8
M5Glu25     7.68              3.4          1                  2.5                 0.8
M5Glu26     8.00              3.4          1                  2.5                 0.8
M5Glu27     8.32              3.4          1                  2.5                 0.8
M5Glu28     8.64              3.4          1                  2.5                 0.8
M5Glu29     8.96              3.4          1                  2.5                 0.8
M5Glu30     9.28              3.4          1                  2.5                 0.8
M5Glu31     9.60              3.4          1                  2.5                 0.8
M5Glu32     9.92              3.4          1                  2.5                 0.8

2.12.5 T5 Test Dataset

Ten different stock solutions of D-glucose ranging from 4 g/L to 44.5 g/L were prepared. Stock solutions with 4 g/L of L-glutamine, 12.5 g/L of D-galactose, 5 g/L of yeastolate and 17 g/L of eRDF were also prepared. All stock solutions were made with Millipore water. The T5 samples were assembled by pipetting 2 mL of the specified D-glucose stock solution, 2 mL of L-glutamine, 2 mL of D-galactose, 2 mL of yeastolate and 2 mL of eRDF to give a final sample volume of 10 mL.

Table 5 D-glucose sample composition for Raman testing with the amount of D-glucose per sample changing at a rate of 0.9 g/L while the other components remain at one concentration.
Sample No   eRDF (g/L)   D-glucose (g/L)   L-glutamine (g/L)   D-galactose (g/L)   Yeastolate (g/L)
T5Glu01     3.4          1.70              0.8                 2.5                 1
T5Glu02     3.4          2.60              0.8                 2.5                 1
T5Glu03     3.4          3.50              0.8                 2.5                 1
T5Glu04     3.4          4.40              0.8                 2.5                 1
T5Glu05     3.4          5.30              0.8                 2.5                 1
T5Glu06     3.4          6.20              0.8                 2.5                 1
T5Glu07     3.4          7.10              0.8                 2.5                 1
T5Glu08     3.4          8.00              0.8                 2.5                 1
T5Glu09     3.4          8.90              0.8                 2.5                 1
T5Glu10     3.4          9.80              0.8                 2.5                 1

2.13 Complex Media Components Experiments

Of the five media ingredients, eRDF and yeastolate showed a fluorescence and SERS response. These components were therefore suitable for testing the efficacy of EEM, TSFS and SERS for the quantification of complex components as a single unit within media.

2.13.1 eRDF Media Dataset (M5eRDF)

Ten different stock solutions of eRDF were prepared with concentrations ranging from 5 g/L to 32 g/L. Stock solutions of 31 g/L of D-glucose, 4 g/L of L-glutamine, 12.5 g/L of D-galactose and 5 g/L of yeastolate were also prepared. A 2 mL aliquot of the specified eRDF stock solution and 2 mL aliquots of the D-glucose, L-glutamine, D-galactose and yeastolate stock solutions were added together to prepare a 10 mL eRDF sample.

Table 6 M5eRDF sample compositions with the amount of eRDF per sample changing at a rate of 0.6 g/L while the other components have a fixed concentration.

Sample No   eRDF (g/L)   D-glucose (g/L)   L-glutamine (g/L)   D-galactose (g/L)   Yeastolate (g/L)
M5eRDF01    1.0          6.2               0.8                 2.5                 1
M5eRDF02    1.6          6.2               0.8                 2.5                 1
M5eRDF03    2.2          6.2               0.8                 2.5                 1
M5eRDF04    2.8          6.2               0.8                 2.5                 1
M5eRDF05    3.4          6.2               0.8                 2.5                 1
M5eRDF06    4.0          6.2               0.8                 2.5                 1
M5eRDF07    4.6          6.2               0.8                 2.5                 1
M5eRDF08    5.2          6.2               0.8                 2.5                 1
M5eRDF09    5.8          6.2               0.8                 2.5                 1
M5eRDF10    6.4          6.2               0.8                 2.5                 1

2.13.2 Yeastolate Media Dataset (M5Ye)

The M5Ye samples were prepared in a similar way to the M5eRDF samples. Ten different stock solutions of yeastolate ranging in concentration from 0.5 g/L to 8.6 g/L and stock solutions of eRDF (17 g/L), D-glucose (31 g/L), L-glutamine (4 g/L) and D-galactose (12.5 g/L) were prepared.
The yeastolate samples (10 mL) were made up by taking a 2 mL aliquot from the particular yeastolate stock solution and adding 2 mL aliquots from the eRDF, D-glucose, L-glutamine and D-galactose stock solutions.

Table 7 Yeastolate sample composition with the amount of yeastolate per sample changing at a rate of 0.18 g/L while the other components have a fixed concentration.

Sample No   eRDF (g/L)   D-glucose (g/L)   L-glutamine (g/L)   D-galactose (g/L)   Yeastolate (g/L)
M5Ye01      3.4          6.2               0.8                 2.5                 0.10
M5Ye02      3.4          6.2               0.8                 2.5                 0.28
M5Ye03      3.4          6.2               0.8                 2.5                 0.46
M5Ye04      3.4          6.2               0.8                 2.5                 0.64
M5Ye05      3.4          6.2               0.8                 2.5                 0.82
M5Ye06      3.4          6.2               0.8                 2.5                 1.00
M5Ye07      3.4          6.2               0.8                 2.5                 1.18
M5Ye08      3.4          6.2               0.8                 2.5                 1.36
M5Ye09      3.4          6.2               0.8                 2.5                 1.54
M5Ye10      3.4          6.2               0.8                 2.5                 1.72

2.14 Measurement Techniques

2.14.1 Raman Spectroscopy and SERS

The Raman spectra were recorded using a Raman Station Model-Raman 400 (Avalon Instruments, now Perkin Elmer) equipped with a 785 nm diode laser and a thermoelectrically cooled charge coupled device (CCD) detector. The laser power was set to 100%, which equates to 80 mW. Spectra were collected over the 250–3311 cm⁻¹ range at a resolution of 8 cm⁻¹. The instrumental setup allowed for two different scanning modes: line scanning and mapping. The line scanning mode was originally used but later the mapping mode was preferred as it gave a better sample representation. With line scanning or single point data collection, possible sample inhomogeneity could have led to the collection of erroneous data. The change from line scanning to mapping was also coordinated with the change of sample holder from aluminium crucibles to a multi-well plate.

For the M1GluR1 (footnote 23), M3GluR1 and M5GluR1 datasets, the line scanning mode used a 3×10 s exposure time and multiple spectra were collected using a three point line scan with 0.05 mm spacing. A total of nine spectra were collected per sample and these were averaged for data analysis.
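The averaging of the nine spectra collected per sample can be sketched as follows. This is a minimal illustration in plain Python, not the routine used in this work. The function name and the `omit` argument (for leaving out individual traces, as is done for cosmic-ray-affected traces in Section 2.14.1.2) are assumptions, and the toy intensities are invented purely to show the call.

```python
def average_spectra(spectra, omit=()):
    """Point-by-point mean of equal-length spectra, skipping the indices in `omit`."""
    skipped = set(omit)
    kept = [s for i, s in enumerate(spectra) if i not in skipped]
    if not kept:
        raise ValueError("no spectra left to average")
    n_points = len(kept[0])
    if any(len(s) != n_points for s in kept):
        raise ValueError("all spectra must share the same wavenumber axis")
    # zip(*kept) walks the spectra one wavenumber channel at a time.
    return [sum(channel) / len(kept) for channel in zip(*kept)]


# Toy data (hypothetical): nine 4-point traces; trace 4 carries an artificial spike.
traces = [[10.0, 20.0, 30.0, 40.0] for _ in range(9)]
traces[4] = [10.0, 20.0, 5000.0, 40.0]

mean_all = average_spectra(traces)               # spike leaks into the mean
mean_clean = average_spectra(traces, omit=[4])   # spiked trace left out

print(mean_all)
print(mean_clean)
```

Leaving out a single trace still gives an average over eight spectra, which is why omitting an affected trace is a workable alternative to re-measuring the sample.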
For the second and third data collections of M1Glu, M3Glu and M5Glu, mapping was used. For the mapping of samples a 2×10 s exposure time was used with a 3×3 grid with 0.05 mm spacing to give multiple spectra, which were averaged prior to data analysis. All samples were analysed at room temperature. For data collection of the SERS spectra, single point data collection with a 2×10 s exposure time was used to give a single spectrum per sample.

Footnote 23: R1 denotes the first collection of these datasets, i.e. Run 1.

2.14.1.1 Preparation of Silver Colloid

The silver colloid was prepared using the Lee and Meisel method [282]. All glassware used for the preparation of the colloid was washed with a soap solution, rinsed with water and then cleaned with Aqua Regia (HNO3:HCl, 1:3 v/v) by filling the glassware with, or immersing it in, the solution for 24 h. After treatment with Aqua Regia (footnote 24) the glassware was thoroughly rinsed with Millipore water to remove all traces of acid [160, 283].

For the colloid preparation, 250 mL of Millipore water, 0.045 g of silver nitrate and a Teflon coated magnetic stirring bar were put into a round bottom flask. Stirring was performed for the duration of the reaction. A reflux reaction was set up to prepare the colloid using an oil bath to maintain a constant temperature and to protect the solution from light (as silver nitrate is light sensitive). When the solution started to boil, 5 mL of a 1% sodium citrate solution (footnote 25) was added drop-wise. The reaction flask and the oil bath were then wrapped with aluminium foil to maintain a constant temperature and the reaction was left to reflux for one hour. A colour change from colourless to yellow to green to olive green was observed 10–15 min after the addition of sodium citrate. The colour change was indicative of the colloid quality. After 1 hour of refluxing, the colloid was allowed to cool to room temperature.
The absorption spectrum of the colloid was then recorded on a Shimadzu UV-1601 UV-Visible spectrophotometer to determine the plasmon band maximum (λmax).

2.14.1.2 Cosmic Ray Artefacts

In some Raman and SERS spectra, sharp artefacts arising from cosmic rays were noted (Figure 38). A sharp cosmic ray spike is caused by high energy radiation resulting in a large signal in one or a few pixels, which then appears as a large spike in the spectrum [284]. When the samples were re-measured the sharp peak was either absent or present at a different location. To overcome this issue, samples were re-measured to obtain spectra without the interfering cosmic rays being present. In cases of persistent cosmic peaks, since 9 spectra were collected per sample, the traces with cosmic rays were omitted and the rest of the spectra were averaged for data analysis.

Footnote 24: Disposal of the Aqua Regia was performed by first diluting the acid to 10% of its original volume. The acid was then neutralised by the addition of sodium bicarbonate in small quantities until the effervescence stopped. The solution was tested with pH paper to check for neutral pH before disposal.

Footnote 25: A 1% solution of sodium citrate was prepared by the addition of 0.05 g to 5 mL of Millipore water.

Figure 38 Raman spectra of a media sample with sharp artefact peaks (red and turquoise peaks) due to the presence of cosmic rays.

2.14.2 Fluorescence Spectroscopy

Steady state fluorescence spectra were recorded using a Cary Eclipse Varian spectrophotometer, with two different scan modes, Excitation Emission Matrix (EEM) and Total Synchronous Fluorescence Scan (TSFS). The EEMs were measured by scanning the emission spectra from 270–600 nm with a 5 nm step and by varying the excitation from 230–520 nm, also with a 5 nm step. The scan settings were a scan rate of 3000 nm/min, with an averaging time of 0.10 s.
The TSFS was measured by scanning the excitation range between 230 and 520 nm with the excitation and the emission slits set at 5 nm. The delta acquisition interval was set at 5 nm, with the delta stop set at 200 nm. The Cary Eclipse was equipped with a Peltier temperature controlled multi-well sample holder (footnote 26) set to 25 °C. The cuvettes were inserted into the sample holder with a 4 mm path-length orientation for excitation.

Footnote 26: It allows a maximum of four samples to be analysed sequentially.

[Figure 38 plot: Intensity versus Wavenumber (cm⁻¹); annotation: Cosmic Ray Signal Spikes.]

2.15 Sample Holders

The aluminium crucibles (footnote 27) used as the sample holder for Raman testing had a 50 µl volume and a 2 mm depth. Prior to use the aluminium crucibles were rinsed with distilled water, followed by three washes with 70% isopropanol, and finally were thoroughly rinsed with Millipore water. The crucibles were then thoroughly dried with cotton buds wrapped in lens tissue. Sampling was carried out in the LFH, where 40 µl of sample was pipetted into the crucible. The crucible was then placed on the Raman sample stage for testing.

The testing procedure was altered during the Raman experiments to improve the sampling process. The first run for each dataset was carried out using aluminium crucibles and the second and third data collections were carried out using the stainless steel 96-well plate sample holder. The changes to the measurement setup were noted as they affected the spectral data. This change also altered the sample volume and the sampling speed. The stainless steel plate was developed to improve the capability of the sampling method for high throughput screening, as it allows multiple samples to be tested in quick succession. The 96-well stainless steel plate increased the number of samples available for consecutive analysis, leading to a greater number of samples being tested per day.
The maximum sample volume per well was 200 µl and for the analysis a sample volume of 100 µl was used [8]. In-house development of an electropolished stainless steel 96-well plate facilitated the replacement of the aluminium crucible for sampling (footnote 28) [285]. Prior to use the plate was rinsed with distilled water, followed by washing with 70% isopropanol and a final rinse with Millipore water, after which it was dried with cotton buds wrapped in lens tissue. Each well of the plate was airbrushed using a vacuum pump to remove any residual fibres. A 100 µl aliquot of sample was pipetted into a well. The well plate was then placed into the Raman stage for analysis.

Footnote 27: Aluminium crucibles were supplied by Thorn Scientific Services Ltd UK.

Footnote 28: The aluminium crucibles were designed for single use and were subject to damage during cleaning due to the light structure. Problems occurred with the sides of the crucibles, which caved in if too much pressure was applied.

For SERS analysis 50 µl of sample solution and 50 µl of silver colloid were pipetted into a well. The sample was mixed five times before testing by re-suspending the sample colloid mixture.

Quartz cuvettes (footnote 29) were used for the fluorescence analysis. Quartz cuvettes were rinsed with Millipore water, followed by five washes with 70% isopropanol before thoroughly rinsing with Millipore water. Cuvettes were dried in an oven set at 60 °C and allowed to cool in the LFH before sampling. The sample solution was added to the cuvette. The cuvette was stoppered and parafilm was used to secure the stopper in place. The outside of the cuvette was wiped down with lens tissue before measurement. At the end of each week of testing, the cuvettes and stoppers were rinsed with Millipore water and left to soak in 30% nitric acid over the weekend (~2.5 days). This ensured a thoroughly clean quartz surface.
2.16 Specific Chemometric Procedures

2.16.1 Baseline Offset Correction

A Matlab routine utilizing an auto-level method was used for baseline correction of Raman spectra. This was a weighted least squares method. It automatically determined points that represented baseline alone, allowing for removal of the baseline offset. It worked by repeatedly fitting a baseline to each spectrum, with the variables divided into groups above and below the baseline. The points below the baseline were given more weight in establishing the baseline, as the points above the baseline represented the sample signal [232].

2.16.2 Water Elimination

The Raman data was dominated by a large water signal. Water formed a major contribution to the background, which interfered with the visibility of the analyte signal. After baseline correction of the Raman data, the next step was to reduce the high background signal. This was done using an in-house developed Matlab method to subtract the water signal from each spectrum in the dataset [82, 286].

Footnote 29: Quartz cuvettes were supplied by Lightpath Optical (UK) Ltd.

2.16.3 Water to Analyte Ratio

The water-to-analyte ratio (WAR) was calculated using an in-house Matlab routine to reflect the water per sample. This code calculated the WAR per spectrum based on the residuals generated from the difference between the Savitzky–Golay smoothed version of a spectrum and the test spectrum. Since this function worked on a spectrum-by-spectrum basis, the WAR of a full dataset was taken as the mean of the individual WAR values.

2.16.4 Model Evaluation Settings

The following factors were assessed to express the performance of the various models: correlation co-efficient, associated error, and number of LVs. The combination of parameters provided a better guide to the linearity and strength of the model. It also prevented over-fitting (footnote 30).
The correlation co-efficient provided information on the strength of the correlation between the spectral data and the analyte and is used as an identifier for strong models. Associated error was evaluated using a combination of parameters including: root mean square error of calibration (RMSEC), root mean square error of cross validation (RMSECV), percentage error and the ratio of RMSECV to RMSEC. The number of latent variables was selected from the captured variance in the RMSEC and RMSECV (Figure 39). When RMSEC and RMSECV reached the first local minimum, the number of variables at this point was selected.

Footnote 30: Over-fitting occurred when unnecessary latent variables were used to overly explain variance and noise amongst the spectra. This resulted in a restricted calibration model.

Figure 39 Plot of Variance captured per latent variable versus RMSEC and RMSECV. [Plot axes: RMSECV and RMSEC versus Latent Variable Number; legend: RMSECV, RMSEC.]

The relative error of prediction for the calibration models was calculated from the RMSECV and the mean value of the analyte concentration (ȳconc).

$$\mathrm{REP}\% = \frac{\mathrm{RMSECV}}{\bar{y}_{\mathrm{conc}}} \times 100 \qquad \text{(Equation 2-21)}$$

If the ratio of RMSECV to RMSEC was above 3, the data was deviating from the model and being over-fitted [6]. It is the right balance of these elements that gives models the potential to be useful in the prediction of other samples.

2.16.5 Chemometric Workflow Overview

Spectroscopic methods provide a large amount of data, and the chemometric methods covered in this chapter were used to interpret it. Both qualitative and quantitative assessment of the data was performed. For Raman, SERS and fluorescence data, the PCA method helped visualise the data for reproducibility testing and outlier detection. The fluorescence data was further examined for outliers using ROBPCA. However, this method proved to be too sensitive for the number of samples used here.
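The figures of merit from Section 2.16.4 (Equation 2-21, the RMSECV to RMSEC ratio and the first-local-minimum rule for choosing latent variables) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and all numerical values are hypothetical, and applying the local-minimum rule to the RMSECV curve alone is an assumption made to keep the example short.

```python
def relative_error_of_prediction(rmsecv, mean_concentration):
    """REP% = RMSECV / mean analyte concentration x 100 (Equation 2-21)."""
    if mean_concentration <= 0:
        raise ValueError("mean concentration must be positive")
    return rmsecv / mean_concentration * 100.0


def is_overfitted(rmsecv, rmsec, threshold=3.0):
    """Flag a model whose RMSECV/RMSEC ratio exceeds the threshold (3 in the text)."""
    return rmsecv / rmsec > threshold


def first_local_minimum(errors):
    """1-based latent variable number at the first local minimum of an error curve."""
    for i in range(len(errors) - 1):
        if errors[i] < errors[i + 1]:  # the error starts to rise after this point
            return i + 1
    return len(errors)  # curve never rises: take the last latent variable


# Hypothetical values, chosen only to demonstrate the calls.
rep = relative_error_of_prediction(rmsecv=1.25, mean_concentration=25.0)
flag = is_overfitted(rmsecv=1.25, rmsec=1.0)
n_lv = first_local_minimum([5.0, 2.0, 1.2, 1.3, 1.1])

print(rep, flag, n_lv)
```

With these made-up numbers the relative error is about 5%, the ratio of 1.25 stays well below the threshold of 3, and the error curve first turns upward after the third latent variable.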
The fluorescence emission in the EEM data was generated by multiple, different fluorophores. MCR and PARAFAC were used to identify the fluorophores in the samples and assess how their emission varied as composition was changed. The MCR method was better suited than PARAFAC to the EEM data from these complex media samples because of IFE introducing non-linearities into the data.

The development of quantitative models to predict component concentration was performed after the data was qualitatively assessed, judged to be reproducible and the outliers removed. PLS regression was used, and pre-processing was performed to enhance the analyte signal and remove variations in the data that were not relevant to the analyte signal. Quantitative models were built for D-glucose (Raman), yeastolate (Raman, SERS and Fluorescence) and eRDF (Raman, SERS and Fluorescence). The performance of the models varied according to the analyte signal quality for each technique and sample type.

3 Development using Raman Spectroscopy for the Analysis of Cell Culture Media Components

This chapter covers the investigative work carried out to demonstrate the feasibility of using Raman spectroscopy for the qualitative and quantitative analysis of complex aqueous cell culture media. Raman spectroscopy was used to quantify the main source of carbon and largest media component, D-glucose, as well as the more complex media components (eRDF and yeastolate) in a model media system. This is of interest in bioprocess monitoring because it is important to track the concentration of media components, as this directly affects the metabolism of cells and influences production yield.

3.1 Spectral Analysis

When compared to the other cell culture media components made up at their working concentration in water, D-glucose had the strongest and most defined Raman spectrum (Figure 40). It was typically present in media at the highest concentration (~44% of the solid formulation weight).
However, water had the biggest influence on the spectrum for all of the sample sets. This was indicated by two broad bands at 1364 and 1640 cm⁻¹ and an intense band above 3100 cm⁻¹. It was not surprising since water represented ~93% of the overall quantity of matter in the various samples. As for the other components, eRDF had only trace levels of detail that offered some quantitative information, while the Raman spectra of yeastolate, L-glutamine and D-galactose were too similar to the water spectrum to relay any quantitative information (Figure 40). D-glucose gave the only signal with a significant level of detail because of its intrinsically high working concentration in media. Therefore it was logical to assume its quantification in media using Raman spectroscopy should be straightforward. Exploiting this information led to the D-glucose based datasets made up for the development of a Raman method; see Table 2, Table 3 and Table 4 in Chapter 2, Section 2.12.

Figure 40 An overlay of the Raman spectra of aqueous solutions of eRDF (17 g/L), D-glucose (31 g/L), D-galactose (12.5 g/L), L-glutamine (4 g/L) and Yeastolate (5 g/L), the components used to formulate the cell culture media. The concentrations are those used in the final media formulation. The spectrum has been enlarged to highlight the weaker peaks. [Plot axes: Intensity versus Wavenumber (cm⁻¹); legend: (a) D-Glucose, (b) eRDF, (c) Yeastolate, (d) L-Glutamine, (e) D-Galactose.]

Figure 41 Raman spectra of an aqueous solution of D-glucose (49.6 g/L).

Figure 41 shows the peaks contained in a concentrated M1Glu sample (49.6 g/L) spectrum. The top of the OH stretching band beyond 3000 cm⁻¹ was omitted. The other OH band present is the strong OH bending band at 1640 cm⁻¹, which obscures the carbonyl (C=O) group which would be seen at 1620–1680 cm⁻¹. Several peaks for D-glucose can be assigned.
The low wavenumber peaks at 426 cm⁻¹ and 514 cm⁻¹ result from skeletal deformation by exo- and endocyclic CCO, CCC, COC and OCO bending modes. The peaks seen at high wavenumber, 2902 cm⁻¹ and 2950 cm⁻¹, are the asymmetric and symmetric stretching of CH2 and CH3 groups respectively. In the fingerprint region the strongest peak at 1123 cm⁻¹ can be assigned to a C–C stretching vibration, along with the peak at 1067 cm⁻¹, which is a C–C stretching vibration of either the ring or the molecular backbone. CH bending in the form of CH3, CH2 and CH deformation gives rise to the 1460 cm⁻¹ peak. The symmetric CH3 deformation (CH twisting) results in the peak at 1373 cm⁻¹, which is stronger than the other broad OH band seen at 1364 cm⁻¹. The peaks at 843 cm⁻¹ and 915 cm⁻¹ can be assigned to the vibrations of the glycosidic bonds and sugar linkages [287-290]. As the datasets get more complex (M1Glu to M5Glu), the bands appear weaker due to the lower overall concentration. In all cases multivariate analysis is required to extract information from Raman spectra with a strong water signal.

3.1.1 Averaged Aqueous D-glucose (M1Glu) Data

For each sample, nine spectra were collected and averaged to form the raw data. Figure 42 shows the averaged spectra of the triplicate measurements (M1GluR1, M1GluR2 and M1GluR3) of the M1Glu dataset (aqueous solution of D-glucose with concentration ranging from 1.6 g/L to 49.6 g/L). Above 3000 cm⁻¹ was the OH stretching band of water, which varied in intensity because of the different sample concentration and experimental setup. The M1Glu Raman spectra also showed a sloping baseline from 300–2500 cm⁻¹. The spectra of the M1GluR1 data had a greater baseline offset in comparison to the M1GluR2 and M1GluR3 data. This was attributed to two changes in the experimental setup:

a) The M1GluR1 data was collected using aluminium crucibles instead of the 96-well stainless steel plate used for the M1GluR2 and M1GluR3 data.
b) For the M1GluR1 samples, the line scan data collection setup was used. The scanning mode was changed from line scanning to mapping for the later collections as it reduced the baseline offset.

Figure 42 Averaged Raw Raman spectra of the (a) M1GluR1, (b) M1GluR2 and (c) M1GluR3 data (Table 2).

The baseline variation resulted from the different depths and surface finishes of the sample containers. It affected the performance of the data when used for qualitative and quantitative analysis until corrective action was taken. In order to ensure consistent data collection, the collimation of the radiation from the samples required focusing. If the radiation was diffusely scattered, the level of collimation decreased and this led to an increase in the stray light, resulting in more scatter in the spectra. A way to prevent scatter was to have the focal depth centred over the middle of the sample. This would help to avoid scatter from the container surface, which if present would contribute to the baseline offset [291].

3.1.2 Baseline Offset Correction of the Aqueous D-glucose (M1Glu) Data

Figure 43 shows the results of baseline offset correction on the averaged M1Glu spectra. The extensive difference observed between Figure 43a and Figure 42a confirmed that the aluminium crucibles used for data collection were poor sample containers and caused considerable background offset. Smaller spectral changes were seen with the M1GluR2 and M1GluR3 data, which already had a better baseline, after offset correction. The baseline correction worked reasonably well; however, it only removed the offset and a gradient effect was still visible below ~1000 cm⁻¹.
Other methods are able to deal with this type of gradient but they were not suitable in this case because of the weak signal and the strong water band. Here, the removal of the offset was more important since it was an unwanted source of variation. The spectral region above 3000 cm⁻¹ was a considerable source of variance as it was dominated by the OH band. Factors such as shot noise, detector quantum yield variation, sample placement and concentration effects caused the large variance in this region [3]. This spectral variance was not removed by the baseline offset correction; therefore, further action was required.

Figure 43 Baseline Corrected Raman spectra of the (a) M1GluR1, (b) M1GluR2 and (c) M1GluR3 data.

3.1.3 Water Background Elimination of the Aqueous D-glucose (M1Glu) Data

Spectral interference, such as a varying background caused by fluorescence or instrumental noise such as dark noise and shot noise, can hinder qualitative and quantitative analysis of chemical information. In the Raman spectra, water was the major signal as it was the largest component in cell culture media. This, however, had adverse effects on modelling weaker signals. The water bending bands (1636 and 1364 cm⁻¹) overshadowed the signal in a large portion of the fingerprint region, making specific analyte identification impossible. It was assumed that if the water signal was removed then the analyte signal should become clearer. However, after subtracting the pure water signal from the sample spectra, the resulting residual spectra contained noise as well as the D-glucose signal [292, 293].

Figure 44 (a) Raman spectra of M1GluR1 (49.6 g/L D-glucose) sample, water and a subtracted spectrum.
(b) M1GluR1, (c) M1GluR2 and (d) M1GluR3 are the water eliminated Raman spectra for the different datasets. The M1Glu and water spectra were both baseline corrected prior to water elimination. [Plot axes: Intensity versus Wavenumber (cm⁻¹); legend for panel (a): M1R1 Sample, Water, Difference.]

After water elimination (Figure 44), the analyte Raman bands were clearly visible and easier to identify and interpret. This was particularly important when analysing samples which displayed subtle changes. A drawback of water elimination was the introduction of artefacts such as negative peaks, baseline shift and enhanced noise. This was due to the variance amongst the spectra of the samples and water. It was also evident that each data collection series (R1, R2 and R3) was affected differently by water elimination. For example, above 3000 cm⁻¹, the noise varied between replicate runs. The M1GluR1 and M1GluR3 samples displayed similar artefacts. For instance, at 600–900 cm⁻¹ they both showed downward facing bands, and in the 1500–2000 cm⁻¹ region, where the OH bending band was, the spectra showed increased levels of noise and baseline offset. For M1GluR2, water elimination introduced an increased baseline offset across the data.

3.2 Reproducibility of Raman Data Collection

When the data collection is optimal, the samples should plot as a straight line along a single principal component representing the variance caused by the D-glucose concentration gradient. Changes and deviations in sampling represent sources of variation (experimental setup, power fluctuations, noise and sample preparation faults, etc.).
This can have a significant impact on the data quality and reproducibility. PCA was performed to provide a simple overview of the variance within the data. The averaged raw M1Glu spectra from the replicate measurements were amalgamated for comparison in order to assess data reproducibility. Sample grouping and data collection reproducibility were evaluated (Figure 45). During this study, changes in the experimental setup (sample container and scanning mode) led to differences in data collection. The scores plots (Figure 45) highlighted that the second and third runs, which were collected under the same conditions, overlapped, indicating reproducible data collection. The first run proved anomalous, however, due to the different data collection setup used. Sample variance was an issue with these media samples. It was clear from the scores plots that changing the setup minimised the measurement variance, as seen by the tighter grouping of the M1R2 and M1R3 samples [51, 66, 294].

Figure 45 Scores plots and loadings (L1, L2 and L3) of PC1, PC2 and PC3 for amalgamated averaged raw M1Glu data. The black circles represent run 1, red triangles represent run 2 and the green asterisks represent run 3. The blue circle represents the 95% confidence level of explained variance.

In Figure 45, the scores along PC3 for all the data collection runs showed the expected near linear variation as a result of the changing analyte concentration. Indeed the corresponding third loading contained peaks relating to the D-glucose. The other two loadings represented the water signal and the baseline offset, respectively. The large variability of the run one measurements along PC2 was caused by the use of aluminium crucibles, which resulted in a strong baseline shift. Moreover, the instrumental effects associated with line scanning were greater than those of the mapping mode (this was the reason why the measurement protocol was changed from line scanning to mapping).
When baseline offset correction was performed and PCA was repeated, the number of components was reduced to two, with the first representing the water signal and the second showing the analyte signal. Even after pre-processing, line scanning and mapping samples did not overlap. This made them incompatible when combined for modelling (data not shown but it was evident from the PCA results).

[Figure 45 panels: Scores on PC 1 (98.21%) versus Scores on PC 2 (1.74%); Scores on PC 1 (98.21%) versus Scores on PC 3 (0.04%); Scores on PC 2 (1.74%) versus Scores on PC 3 (0.04%); loadings L1 (98.21%), L2 (1.74%) and L3 (0.04%) plotted as Intensity versus Wavenumber (cm⁻¹).]

A full data collection of line scanning samples was carried out on M1Glu, M3Glu and M5Glu, prior to the switch to the new experimental setup. The advantage of line scanning was the faster collection time. However, from the PCA results, the M1GluR1 data should be treated separately as:

a) it did not match the R2/R3 data
b) the variance of M1GluR1 samples was much greater, as seen by the scores pattern in Figure 45.

3.3 Evaluation of Spectral Range

Variable selection was used to improve quantitative modelling. It was thought that it may lead to better results due to the removal of areas that contain irrelevant information (footnote 31), therefore placing the focus on the more informative regions of the data. Eliminating these irrelevant regions can lead to improved calibration models and smaller residual levels. The selection of spectral ranges by the operator is subjective and highly dependent on expertise. One may inadvertently discard a region with useful information [225, 295].
Numerical based methods for variable selection are available. One such method is Moving Window Partial Least Squares (MWPLS), which highlights the regions in the spectra that contribute to prediction accuracy and eliminates areas with high levels of uncertainty [271-273]. From the MWPLS results, a residual line plot of downward facing bands was observed (Figure 46). The bands represented areas rich in signal variation while the flat sections showed the areas of limited information. The five residual lines for the five principal components were used to determine the error level. MWPLS was performed on the M1Glu, M3Glu and M5Glu sample sets. The residual lines were clearer in the M1Glu dataset due to the stronger signal and the simple sample composition. In the region below 650 cm⁻¹, there were scatter contributions that appeared as large sloping baseline variances. This region was compromised by Rayleigh light leakage from the filters; therefore, the 250–600 cm⁻¹ region was omitted. For the M1Glu data, the first informative band selected by MWPLS was at 818–1676 cm⁻¹. This represented several groups from CH3 and CH2 deformations, C–O and O–O groups, C–N, C–C, and C=C stretching bands etc. The second informative band at 2774–3159 cm⁻¹ represented C–H and O–H stretching modes. There were similar informative areas for the M3Glu (802–1612 cm⁻¹, 2798–3151 cm⁻¹) and M5Glu data (826–1596 cm⁻¹, 2814–3167 cm⁻¹). Since the MWPLS results showed similar downward bands for all three datasets, the regions 800–1680 cm⁻¹ and 2770–3170 cm⁻¹ were chosen for data analysis.

Footnote 31: Areas that contain interference effects, from the diffuse light scatter and noise from the hardware used, are removed.

In previous research by the NBL laboratory on cell culture media by Raman spectroscopy [3, 6, 8], spectral regions were selected for chemometric analysis and the most significant bands were observed in the 707–1853 cm⁻¹ region (Figure 47).
The expected bands associated with the media components were observed in this region and it was also selected for data analysis.

Figure 46 MWPLS Error Plot of Log (SSR) versus Raman Shift (cm⁻¹) of the (A) M1Glu samples, (B) M3Glu samples and (C) M5Glu samples.

In the preliminary data analysis, the 2774–3174 cm⁻¹ spectral range was modelled for the M1Glu (Table 47) and M3Glu (Table 48) sample sets. The models for the M1Glu data showed good correlation to D-glucose concentration and performed at a similar level to the other two reduced region models. For the M3Glu sample set, however, the models were weaker. The other reduced regions (800–1680 cm⁻¹ and 707–1853 cm⁻¹) of the M3Glu data outperformed the 2774–3174 cm⁻¹ region. The poor performance came from the weak analyte signal together with the strong water band bordering this region, which diminished the correlation to D-glucose seen in the high concentration M1Glu samples. Another cause of the poor performance of the 2774–3174 cm⁻¹ region was the high level of variance caused by instrumental factors such as lower detector sensitivity above 1000 nm and higher detector noise. Thus, after preliminary modelling, the 2774–3174 cm⁻¹ region was omitted from further analysis.

Figure 47 Raman spectra of the 49.6 g/L D-glucose aqueous solution over the full spectral range (250–3311 cm⁻¹) and inset is the selected region of interest, 707–1853 cm⁻¹.

3.4 Calibration Modelling

The objective of calibration modelling was to establish a relationship between the known concentration of the analyte, e.g. D-glucose, and the spectral data.
The reason for evaluating multiple calibration models was to establish the optimal linear relationship between the Raman spectra and the D-glucose concentration. With the correct calibration model, it should be possible to predict the concentration of unknown media samples with high accuracy. The calibration models for the M1Glu sample set were assessed using the averaged data, baseline corrected data and water eliminated data over the full region and in the reduced spectral ranges (707–1853 cm–1 and 800–1680 cm–1). The reduced regions focused on the fingerprint region, which contained a wide range of vibrational modes that were useful for media component analysis. More models were constructed following pre-processing to further improve the calibration results.

Table 8 Spectral areas selected for model generation using the Raman data.

Region ID               Wavenumber region (cm–1)
Full                    250–3300
Reduced Region (ROI)    707–1853
MWPLS Region            800–1680

All preliminary models built with the averaged raw data were able to correlate the Raman spectra to the D-glucose concentration (Figure 48). Comparison of M1GluR1 and M1GluR2 revealed how the changes in sampling affected the reproducibility of the data. The new experimental setup improved the RMSECV for the M1GluR2 data. Relative to the M1GluR1 data, there was an improvement of 37% for the averaged data, 47% for the baseline corrected data and 27% for the water eliminated data. The raw data calibration model of M1GluR3 (Figure 48c) performed more weakly than the M1GluR1 and M1GluR2 models. When the raw data were investigated, the first 18 samples collected were separated by a spectral offset from the remaining 14 samples that were collected later that day. The samples were tested in a random order and the variation was therefore seen randomly throughout the dataset.
This variation was most likely due to minor changes during sampling, such as temperature, as these samples were kept at room temperature for longer before measurement. This type of change was only observed in this dataset, but it can have a major impact on data quality and therefore model performance. In this case the spectral offset variance was removed by pre-processing.

Figure 48 Predicted versus measured D-glucose concentration plots for the calibration models for the averaged data using the full range for (a) M1GluR1, (b) M1GluR2 and (c) M1GluR3 sample sets. [Axes: measured D-Glu conc (g/L) versus predicted D-Glu conc (g/L). Panel annotations: (a) R2 = 0.986, 3 latent variables, RMSEC = 1.7048, RMSECV = 1.8654; (b) R2 = 0.996, 3 latent variables, RMSEC = 0.95139, RMSECV = 1.142; (c) R2 = 0.979, 3 latent variables, RMSEC = 2.0663, RMSECV = 2.3858.]

For the M1GluR2 sample set, the preliminary models are shown in Table 9. For the averaged data, the performance of all three regions was similar, but the reduced ranges used fewer variables than the full dataset. The calibration models improved with baseline correction and marginally worsened with water elimination due to artefacts introduced by the signal elimination process. By using reduced regions with water eliminated data, some of the artefacts were avoided and model performance improved.
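As a worked illustration of the figures of merit reported in the calibration tables, the short sketch below computes a root mean square error and a relative error of prediction (REP%). Two assumptions are made here that are not stated in this section: REP% is taken as the RMSECV expressed as a percentage of the mean reference concentration, and the M1Glu design is taken as 31 evenly spaced D-glucose levels from 1.6 to 49.6 g/L. Under these assumptions the REP% values listed in Table 9 are reproduced from the tabulated RMSECV values. The function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def rep_percent(rmse_value, y_true):
    """RMSE expressed as a percentage of the mean reference value (assumed definition)."""
    return 100.0 * rmse_value / (sum(y_true) / len(y_true))


# Assumed design: 31 evenly spaced D-glucose levels from 1.6 to 49.6 g/L.
conc = [1.6 * k for k in range(1, 32)]

# RMSECV values taken from Table 9 (Avg Full and BC MWPLS models).
rep_avg_full = rep_percent(1.14, conc)
rep_bc_mwpls = rep_percent(0.72, conc)
print(round(rep_avg_full, 2), round(rep_bc_mwpls, 2))  # prints 4.45 2.81
```

The same `rmse` function gives RMSEC when applied to calibration predictions and RMSECV when applied to predictions collected from the cross-validation segments.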
Water eliminated data produced reasonable models in the preliminary study, but the process also introduced artefacts into the spectra. Further analysis was required to determine the severity of these artefacts.

The PLS loadings from the calibration model generated using baseline corrected M1GluR2 data are shown in Figure 49. The first loading resembled the pure water signal and represented 99.88% of the explained variance in the data. The second and third loadings showed the baseline offset seen in the data. They both represented the D-glucose signal. Together they accounted for 0.11% of the explained variance but differed at 1450 cm–1 and 1367 cm–1; these bands originated from the asymmetric and symmetric stretching of CH3 respectively. The offset intensity in the second and third loadings suggested a high water signal and a low water signal as a result of hydrophobic interactions. The second loading reflected the high concentration samples, while the third loading represented the low concentration samples and background noise in an aqueous environment [296, 297].

Figure 49 Loadings plot from the calibration model for baseline corrected M1GluR2 data over the full range. [Legend: L1 (99.88%), L2 (0.10%), L3 (0.01%).]

Table 9 Models generated from M1GluR2 data after preliminary pre-processing.
M1GluR2                       LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full (250–3311 cm–1)      3    0.996   0.95          1.14           4.45
Avg ROI (707–1853 cm–1)       2    0.994   1.13          1.23           4.80
Avg MWPLS (800–1680 cm–1)     2    0.995   1.02          1.11           4.33
BC Full (250–3311 cm–1)       3    0.996   0.89          1.02           3.98
BC ROI (707–1853 cm–1)        3    0.997   0.75          0.86           3.35
BC MWPLS (800–1680 cm–1)      3    0.998   0.62          0.72           2.81
WE Full (250–3311 cm–1)       2    0.996   0.97          1.09           4.25
WE ROI (707–1853 cm–1)        3    0.998   0.62          0.74           2.89
WE MWPLS (800–1680 cm–1)      3    0.998   0.64          0.76           2.96

3.5 Spectral Pre-Processing of M1Glu Sample Set

With preliminary data treatment (offset correction and water elimination), good models were obtained. Further pre-processing made it possible to improve the accuracy of the models. Different pre-processing methods and their combinations were applied. Multiplicative scatter correction (MSC) and normalisation were used to correct intensity differences due to sampling. The first order derivative (FD) was used to remove baseline differences and clarify the analyte signal. The M1GluR2 sample set was chosen to show the effects of the different pre-processing methods. The influence of spectral region selection on pre-processing was also investigated.

3.5.1 Pre-area and Post-area Selection for Spectral Pre-Processing

Firstly, the question of whether region selection should be carried out before or after pre-processing was evaluated. Both approaches were modelled and the results indicated that region selection should be performed after pre-processing. When undertaking spectral pre-processing, it was preferable to use the full spectral region. The dataset was then truncated into smaller regions to exclude any end of range artefacts. Figure 50 shows end of range artefacts following derivative pre-processing, where the first and last data points of the spectra were altered.

Figure 50 Overlay of the first derivative pre-processing of M1Glu samples with end of range artefacts.
The blue traces used post-area selection and the black traces used pre-area selection.

3.5.2 Multiplicative Scatter Correction of M1GluR2 Data

Figure 51 Effects of MSC on spectra from M1GluR2 data in the ROI range (707–1853 cm–1), (a) averaged raw spectra, (b) after MSC.

The Raman spectra of the M1Glu samples before and after MSC pre-processing are displayed in Figure 51. The additive/multiplicative effects observed in the raw data were reduced by MSC and the peaks were clarified. Table 10 outlines the calibration performance of the M1GluR2 data after MSC for the D-glucose modelling. Similar results were obtained for the models using data before and after baseline correction. The best MSC model was built on averaged data using the 707–1853 cm–1 region. It used three LVs and gave an REP of 1.52% (Table 10). This model demonstrated an improved correlation for D-glucose concentration compared to the preliminary pre-processing models in Table 9.

The MSC water eliminated (WE) models were generally poor, as seen in Figure 52a. There were two distinct regions in the predicted versus expected plot for the high and low concentration samples. This indicated a marked difference in how the glucose interacted with water at high and low concentrations. In the Hotelling’s T2 vs Q residuals plot (Figure 52c), there were several samples outside the 95% confidence limit. Low concentration water eliminated samples were excessively modified by MSC and were cast as outliers.
This behaviour suggests the involvement of hydrophobic interactions, given the grouping of the high and low water samples [296, 297]. The PC1 vs PC2 scores plot gave a linear response with decreasing water content, confirming the different hydrophobic behaviours seen in the samples (Figure 52d). Another explanation was that, as glucose concentration increased, the density and the refractive index of the sample changed. This could affect the Raman spectra, as the low concentration samples behaved differently to the high concentration samples after water elimination. It was concluded that accurate linear glucose models can only be constructed over a small concentration range.

Further analysis of the MSC WE PLS loadings (Figure 52b) showed that the major signal was from D-glucose, with the first loading containing peaks attributed to D-glucose and describing 85.5% of the explained variance. With MSC, the signal was corrected to have a reduced level of scatter. Therefore the MSC WE data contained the enhanced analyte peaks as well as the water removal artefacts. This inadvertently introduced more noise into the data, with the second and third loadings accounting for almost 10% of the signal variance and mainly describing the water artefacts. The second loading (8.97%) represented the water removal artefacts present in the 1200–2000 cm–1 region and the region beyond 3000 cm–1. The third loading described 0.95% of the variance and its main contribution was the noise present beyond 3000 cm–1.

Figure 52 (a) Relationship between expected and predicted D-glucose content for the WE MSC pre-treated M1Glu sample set, (b) the loadings plots of the three PCs, (c) Hotelling’s T2 vs Q residuals for water eliminated M1GluR2 data after MSC, and (d) the PC1 vs PC2 scores plot with the high concentration samples in black and the low concentration samples in red.
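The MSC step discussed in this section can be sketched as follows. This is a minimal illustration under assumptions, not the software routine used in this work: the mean spectrum is assumed as the reference, each spectrum is regressed on it by ordinary least squares, and the fitted offset and slope are then removed. The function name `msc` and the toy data are hypothetical.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    n, p = len(spectra), len(spectra[0])
    ref = [sum(row[j] for row in spectra) / n for j in range(p)]
    ref_mean = sum(ref) / p
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for row in spectra:
        row_mean = sum(row) / p
        # Least-squares fit of row = offset + slope * ref.
        slope = sum((r - ref_mean) * (x - row_mean) for r, x in zip(ref, row)) / ref_ss
        offset = row_mean - slope * ref_mean
        # Remove the additive offset and the multiplicative scaling.
        corrected.append([(x - offset) / slope for x in row])
    return corrected


# Toy data: one band shape recorded with different offsets and scale factors.
shape = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
raw = [[a + b * s for s in shape] for a, b in [(10.0, 1.0), (25.0, 1.5), (5.0, 0.5)]]
fixed = msc(raw)
```

In this idealised case the three corrected spectra coincide with the mean spectrum, because they differ only by an offset and a scale factor. Because each spectrum is divided by its fitted slope, a spectrum that correlates poorly with the reference could be strongly rescaled, which may be one way to picture why noisy water eliminated spectra were modified so heavily.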
Table 10 Calibration Evaluation for Multiplicative Scatter Correction on the M1GluR2 data (footnote 32).

M1GluR2           LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full          2    0.998   0.56          0.60           2.34
Avg ROI           2    0.997   0.71          0.75           2.92
Avg ROI (1)       3    0.999   0.35          0.39           1.52
Avg MWPLS         2    0.998   0.68          0.71           2.77
Avg MWPLS (1)     2    0.997   0.77          0.82           3.20
BC Full           2    0.998   0.55          0.59           2.30
BC ROI            2    0.998   0.55          0.60           2.34
BC ROI (1)        2    0.998   0.55          0.60           2.34
BC MWPLS          2    0.998   0.60          0.65           2.53
BC MWPLS (1)      2    0.998   0.60          0.65           2.53
WE Full           3    0.619   8.83          11.98          46.79
WE ROI            2    0.245   12.56         14.31          55.89
WE ROI (1)        3    0.494   10.17         12.57          49.10
WE MWPLS          2    0.244   12.55         14.26          55.70
WE MWPLS (1)      3    0.454   10.57         12.30          48.04

Footnote 32: ROI (1) and MWPLS (1) signify that pre-processing was carried out on the reduced area of the spectra. No bracket represents area selection on data pre-processed on the full range. For ease of interpretation of the table, the best models will be highlighted in grey.

[Figure 52 plot annotations: (a) expected (g/L) versus predicted (g/L), R2 = 0.619, 3 latent variables, RMSEC = 8.8302, RMSECV = 11.982; (b) loadings L1 (85.5%), L2 (8.97%), L3 (0.95%) versus wavenumber (cm–1); (c) Q residuals versus Hotelling T2; (d) scores on PC 1 (85.65%) versus scores on PC 2 (8.91%).]

3.5.3 Normalisation of M1GluR2 Data

Normalisation corrects for variation in signal intensity due to experimental setup. Four different modes of normalisation were assessed here for enhancement of calibration performance over the full and reduced ranges. Figure 53 shows the spectral profiles of the normalised data using the different methods (Norm1, Norm2, NormINF, and NormOH, see section 2.6.4).

Figure 53 Effects of normalisation on M1GluR2 data (a) after Norm1 pre-processing, (b) after Norm2 pre-processing, (c) after NormINF pre-processing, and (d) after NormOH pre-processing.

Normalisation improved the models built using the averaged raw data (Table 9). The calibration results (Table 11-Table 14) showed that the best model was obtained using Norm2, which gave a correlation coefficient of 0.999. Model performances only varied slightly with the different normalisation methods, with Norm1 being the weakest. Figure 54a showed the PLS loadings for the raw (averaged) M1GluR2 data after Norm2 pre-processing; similar loadings resulted from the other normalisation methods. The first loading revealed, as expected, that the water signal was the dominant feature in the data, while the second loading was the D-glucose signal and described less than 0.1% of the total variance.
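Three of the normalisation modes assessed in this section can be illustrated with a short sketch. It assumes that Norm1, Norm2 and NormINF correspond to scaling each spectrum to unit 1-norm (total absolute intensity), unit 2-norm (Euclidean length) and unit infinity-norm (maximum absolute intensity); the definitions themselves are given in section 2.6.4 and are not reproduced here. NormOH is not covered by the sketch. The function name and mode labels are hypothetical.

```python
import math


def normalise(spectrum, mode):
    """Scale one spectrum to unit 1-norm, 2-norm or infinity-norm (assumed definitions)."""
    if mode == "norm1":
        scale = sum(abs(v) for v in spectrum)
    elif mode == "norm2":
        scale = math.sqrt(sum(v * v for v in spectrum))
    elif mode == "norminf":
        scale = max(abs(v) for v in spectrum)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [v / scale for v in spectrum]


# Toy three-channel spectrum: 1-norm = 19, 2-norm = 13, infinity-norm = 12.
spectrum = [3.0, 4.0, 12.0]
n1 = normalise(spectrum, "norm1")
n2 = normalise(spectrum, "norm2")
ninf = normalise(spectrum, "norminf")
```

All three modes divide the whole spectrum by a single scalar, so band positions and relative band heights are unchanged; only the overall intensity scale differs between modes.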
Figure 54 (a) Loadings plots of the two principal components from raw (averaged) M1GluR2 data with Norm2 pre-processing and (b) the water eliminated M1GluR2 spectra with Norm2 pre-processing.

Figure 54b showed the water eliminated spectra with Norm2 pre-processing. The drawback of normalising water eliminated data was the increase in noise artefacts and baseline offset. These had adverse implications for the calibration modelling. The normalisation methods used were not able to handle the water eliminated data (Figure 55a). After water elimination, three components described the data (Figure 55b). The first and second loadings represented the D-glucose signal. The difference between these loadings showed the different levels of interaction between the water and the D-glucose at high (12.8–49.6 g/L) and low (1.6–19.2 g/L) concentrations. The normalised and MSC data behaved in a similar fashion. All the WE spectra suffered from an offset due to the changing concentrations and differing optical properties of the sample (Figure 54b). The third loading represented the signal from the low concentration samples and noise. Because the water signal was so strong in the low concentration samples, the water elimination had a bigger impact and the resulting spectra had more noise than detail compared to the high concentration samples. This was seen in the data, where the first four samples were noisy and the D-glucose signal came through in the fifth sample (Figure 56a).
In the Hotelling’s T2 vs Q residuals plot (Figure 55c), the low concentration samples showed more unexplained variance than the high concentration samples. The variance amongst the samples was also seen in the scores plot (Figure 55d), where the high concentration (black) samples were clustered together while the low concentration (red) samples were dispersed due to their increased variability. It may then be concluded that normalisation was adequate when the peak positions remained relatively constant across the samples. However, when specific changes affected the spectra, such as the noise laden spectra of the low concentration samples, normalisation failed.

[Figure 54 plot annotations: (a) loadings L1 (99.89%), L2 (0.09%) versus wavenumber (cm–1); (b) intensity versus wavenumber (cm–1).]

Figure 55 (a) Relationship between expected and predicted D-glucose content for the WE Norm2 pre-treated M1Glu sample set, (b) the loadings spectrum of the three PCs, (c) Hotelling’s T2 vs Q residuals for H2O eliminated M1GluR2 data after Norm2, (d) the scores plot for PC1 vs PC2, with the high concentration samples in black and the low concentration samples in red.

Figure 56 Water eliminated M1GluR2 spectra with Norm2 pre-processing (a) first five samples and (b) a selection of samples from M1GluS01 to M1GluS31.
[Figure 55 plot annotations: (a) expected (g/L) versus predicted (g/L), R2 = 0.702, 3 latent variables, RMSEC = 7.8161, RMSECV = 9.7818; (b) loadings L1 (90.99%), L2 (5.82%), L3 (0.98%) versus wavenumber (cm–1); (c) Q residuals versus Hotelling T2; (d) scores on PC 1 (91.01%) versus scores on PC 2 (5.95%). Figure 56 legends: (a) M1S01, M1S02, M1S03, M1S04, M1S05; (b) M1S01, M1S05, M1S10, M1S15, M1S20, M1S30.]

Table 11 Calibration Evaluation for Normalisation on the M1GluR2 data using Norm1 (footnote 32).

M1GluR2                   LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full Norm1            2    0.995   0.99          1.09           4.25
Avg ROI Norm1             2    0.998   0.60          0.63           2.46
Avg ROI Norm1 (1)         2    0.996   0.85          0.91           3.55
Avg MWPLS Norm1           2    0.997   0.74          0.78           3.04
Avg MWPLS Norm1 (1)       2    0.994   1.12          1.20           4.68
BC Full Norm1             3    0.997   0.78          0.90           3.51
BC ROI Norm1              3    0.997   0.84          0.96           3.75
BC ROI Norm1 (1)          3    0.934   3.69          4.15           16.21
BC MWPLS Norm1            3    0.997   0.80          0.91           3.55
BC MWPLS Norm1 (1)        3    0.925   3.93          4.41           17.22
WE Full Norm1             3    0.692   7.94          9.87           38.55
WE ROI Norm1              4    0.726   7.49          9.64           37.65
WE ROI Norm1 (1)          4    0.664   8.29          9.51           37.14
WE MWPLS Norm1            5    0.826   5.97          10.26          40.07
WE MWPLS Norm1 (1)        4    0.684   8.04          10.43          40.74

Table 12 Calibration Evaluation for Normalisation on the M1GluR2 data using NormOH (footnote 32).

M1GluR2                   LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full NormOH           2    0.996   0.84          0.93           3.63
Avg ROI NormOH            2    0.998   0.62          0.65           2.53
Avg ROI NormOH (1)        2    0.998   0.62          0.65           2.53
Avg MWPLS NormOH          3    0.999   0.44          0.49           1.91
Avg MWPLS NormOH (1)      3    0.999   0.44          0.49           1.91
BC Full NormOH            3    0.994   1.06          1.22           4.76
BC ROI NormOH             3    0.995   0.98          1.13           4.41
BC ROI NormOH (1)         3    0.995   0.98          1.13           4.41
BC MWPLS NormOH           3    0.995   0.97          1.12           4.37
BC MWPLS NormOH (1)       3    0.995   0.97          1.12           4.37

Table 13 Calibration Evaluation for Normalisation on the M1GluR2 data using Norm2 (footnote 32).

M1GluR2                   LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full Norm2            2    0.995   1.02          1.12           4.37
Avg ROI Norm2             2    0.999   0.43          0.45           1.75
Avg ROI Norm2 (1)         2    0.996   0.85          0.91           3.55
Avg MWPLS Norm2           2    0.999   0.53          0.56           2.18
Avg MWPLS Norm2 (1)       2    0.994   1.12          1.21           4.72
BC Full Norm2             3    0.999   0.42          0.49           1.91
BC ROI Norm2              3    0.999   0.45          0.50           1.95
BC ROI Norm2 (1)          3    0.966   2.65          3.00           11.71
BC MWPLS Norm2            3    0.999   0.37          0.41           1.60
BC MWPLS Norm2 (1)        3    0.965   2.69          3.05           11.91
WE Full Norm2             4    0.822   5.45          7.22           28.20
WE ROI Norm2              3    0.664   7.49          9.28           36.25
WE ROI Norm2 (1)          5    0.822   5.44          8.57           33.47
WE MWPLS Norm2            5    0.842   5.14          7.41           28.94
WE MWPLS Norm2 (1)        5    0.834   5.25          8.02           31.32

Table 14 Calibration Evaluation for Normalisation on the M1GluR2 data using NormINF (footnote 32).

M1GluR2                   LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg Full NormINF          2    0.995   1.03          1.13           4.41
Avg ROI NormINF           2    0.997   0.71          0.75           2.92
Avg ROI NormINF (1)       2    0.996   0.89          0.96           3.75
Avg MWPLS NormINF         2    0.999   0.53          0.56           2.18
Avg MWPLS NormINF (1)     2    0.995   1.05          1.13           4.41
BC Full NormINF           3    0.999   0.41          0.47           1.83
BC ROI NormINF            3    0.999   0.47          0.52           2.03
BC ROI NormINF (1)        3    0.995   0.98          1.12           4.37
BC MWPLS NormINF          3    0.999   0.36          0.40           1.56
BC MWPLS NormINF (1)      3    0.995   0.97          1.12           4.37
WE Full NormINF           4    0.781   6.05          9.32           36.40
WE ROI NormINF            8    0.860   4.83          8.54           33.35
WE ROI NormINF (1)        6    0.896   4.16          8.47           33.08
WE MWPLS NormINF          5    0.819   5.50          9.42           36.79
WE MWPLS NormINF (1)      5    0.797   5.82          9.07           35.42

3.5.4 Derivative Pre-Processing of M1GluR2 Data

The first order derivative smoothed and resolved peaks in complex spectral profiles. It also caused the spectral effects of baseline offset and slopes to diminish. Here derivative pre-processing of the data was performed using the Savitzky Golay algorithm. The settings chosen were first order derivative with a filter width of eleven and a polynomial order of three. In Figure 57 the averaged Raman spectra of the M1Glu sample and the spectral profiles after first order derivative pre-processing of the different regions (full, ROI and MWPLS) are compared. The first order derivative spectra of M1GluR2 contained a large peak above 2900 cm–1, with multiple smaller positive and negative peaks in the fingerprint region (400–1800 cm–1), indicative of the D-glucose signal.

Figure 57 Effects of first order derivative pre-processing on M1GluR2 data, (a) raw spectra, (b) after processing in the full range, (c) in the ROI range and (d) in the MW range.

Thus far the WE data only worked without further pre-processing. MSC and normalisation increased artefacts produced by the water elimination, dividing the dataset into two populations of samples (high and low concentration).
However, the first order derivative handled the water eliminated spectra, as the baseline offset was corrected and the scatter was reduced. When comparing the first derivative of the averaged data (Figure 57b) to that of the water eliminated data (Figure 58), the OH bending band (1640 cm–1) was clearly removed. In addition, the OH stretch above 3000 cm–1 left some noise, but region selection avoided interference from that spectral region. The first order derivative data generated reasonable PLS models; however, the models generated using MSC and Norm2 pre-processing were better.
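The Savitzky Golay settings quoted in this section (first order derivative, filter width of eleven, polynomial order of three) can be illustrated with the sketch below. It is written from the least-squares definition rather than taken from the software used in this work: for a symmetric window, the slope of a cubic fit at the window centre reduces to a fixed set of convolution weights. The function name is hypothetical, unit channel spacing is assumed, and only interior points are returned.

```python
def savgol_first_derivative(signal, half_width=5):
    """First derivative from a cubic least-squares fit in a sliding window.

    Returns values for interior points only; the first and last
    `half_width` points have no full window and are omitted.
    """
    offsets = range(-half_width, half_width + 1)
    # Even power sums of the window positions (odd sums vanish by symmetry).
    s2 = sum(i ** 2 for i in offsets)
    s4 = sum(i ** 4 for i in offsets)
    s6 = sum(i ** 6 for i in offsets)
    det = s2 * s6 - s4 * s4
    # Linear-term coefficient of the cubic fit, written as convolution weights.
    weights = [(s6 * i - s4 * i ** 3) / det for i in offsets]
    out = []
    for k in range(half_width, len(signal) - half_width):
        window = signal[k - half_width:k + half_width + 1]
        out.append(sum(w * v for w, v in zip(weights, window)))
    return out


# A cubic is fitted exactly, so the result should match 1.5*x**2 - 2.
x = list(range(30))
y = [0.5 * v ** 3 - 2.0 * v + 1.0 for v in x]
dy = savgol_first_derivative(y)  # half_width=5 gives a filter width of eleven
```

The sketch drops the five points at each end of the spectrum, where no complete window exists. Implementations that return a full-length result must extrapolate or pad these points, which is one plausible source of the end of range artefacts discussed in section 3.5.1. For a derivative per wavenumber rather than per channel, the result would be divided by the channel spacing.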
Table 15 shows consistent values for the PLS models; the models for the selected regions were virtually the same and showed a slight improvement on the full range models (footnote 33).

Table 15 Calibration Evaluation for first order derivative (FD) on the M1GluR2 data (footnote 32).

M1GluR2                   LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg FD13 Full             2    0.994   1.06          1.17           4.57
Avg FD13 ROI              3    0.997   0.84          0.98           3.82
Avg FD13 ROI (1)          3    0.997   0.84          0.99           3.86
Avg FD13 MWPLS            3    0.996   0.84          0.99           3.86
Avg FD13 MWPLS (1)        3    0.996   0.85          0.99           3.86
BC FD13 Full              2    0.994   1.06          1.16           4.53
BC FD13 ROI               3    0.997   0.84          1.00           3.90
BC FD13 ROI (1)           3    0.997   0.84          0.99           3.86
BC FD13 MWPLS             3    0.996   0.85          1.00           3.90
BC FD13 MWPLS (1)         3    0.996   0.85          1.00           3.90
WE FD13 Full              5    0.999   0.54          0.92           3.59
WE FD13 ROI               3    0.998   0.62          0.97           3.78
WE FD13 ROI (1)           3    0.998   0.63          0.99           3.86
WE FD13 MWPLS             4    0.999   0.51          0.99           3.86
WE FD13 MWPLS (1)         3    0.998   0.61          0.98           3.82

Figure 58 First order derivative pre-processing on the water eliminated M1GluR2 Raman spectra.

Footnote 33: The settings used in Figure 50 are not the same as those for FD modelling and the end of range effects are less severe with the FD setting of a filter width of eleven and a polynomial order of three.

3.5.5 MSC-FD and FD-MSC Pre-Processing of M1GluR2 Data

The next logical step was to consider the combination of multiple pre-processing methods and how these might improve model quality in terms of RMSEC/RMSECV. The pre-processing combination was based on the performance of the individual method models. The combination of MSC and FD was investigated because their individual models could be improved (footnote 34) and because these methods complement each other. MSC is a signal and scatter correction method and FD is a signal correction method. Together they can remove baseline shift and additive effects.
Considering the large impact that the baseline had on the data, the use of first derivative pre-processing alone eliminated the baseline. However, a drawback of the first derivative was that the artefacts generated by smoothing and filtering increased the noise. To counter this, MSC was used to increase the signal to noise ratio following derivative pre-processing.

Figure 59 Effects of FD-MSC (left) and MSC-FD (right) pre-processing on the M1GluR2 data in the 707–1853 cm–1 region, where red represents pre-processing after area selection and blue represents pre-processing before area selection. [Legends: FSTMSCROI(1), FSTMSCROI (left); MSCFSTROI(1), MSCFSTROI (right).]

Both sequences (MSC-FD and FD-MSC) of pre-processing were investigated (see Table 16 and Table 17). FD-MSC outperformed MSC-FD, leading to the best model. Therefore, for all remaining data analysis, FD-MSC was used. The different arrangements generated different results for region selection. FD-MSC led to poor performance when performed after area selection, while MSC-FD was less affected by area selection. In the FD-MSC spectra (Figure 59), the OH bending band (1640 cm–1) was the main difference between pre-processing before and after area selection. With pre-processing after area selection, more variance caused by this band resulted in the weaker calibration models, due to the strong influence that water had on the data.

Footnote 34: The best model thus far was from Norm2 averaged data in the region 707–1853 cm–1.

The WE process proved to be problematic with FD-MSC and MSC-FD (Table 16 and Table 17).
As in the case of the MSC and normalisation methods, after WE the signal was inundated with too much noise to be useful, since the low concentration samples were largely composed of water. The low concentration samples were altered, making them different from the high concentration samples (Figure 60 a and c). From the spectra for M1GluS01 and M1GluS31 and the loadings (Figure 60 b and d), it was clear that the low concentration sample signal was laden with noise, while the higher concentration sample reflected the D-glucose signal, matching the first loading. The first loading accounted for 66.95% of the explained variance; this was much lower than in good correlation models (e.g. Figure 54a, with 99.99%). Overall, the noise artefacts generated in the low concentration samples prevented a correlation between the Raman signal and D-glucose concentration.

Figure 60 (a) Hotelling’s T2 vs Q residuals, (b) M1GluS01 and M1GluS31 spectra for WE M1GluR2 data after FD-MSC, (c) the scores plot for PC1 vs PC3, with the high concentration samples marked black and the low concentration samples marked red, and (d) the loadings spectra for the four PCs.
[Figure 60 plot annotations: (a) Q residuals (6.36%) versus Hotelling T2 (93.64%); (b) intensity versus wavenumber (cm–1) for M1S01 and M1S31; (c) scores on PC 1 (67.48%) versus scores on PC 3 (8.77%); (d) loadings L1 (66.95%), L2 (17.15%), L3 (1.71%), L4 (7.84%) versus wavenumber (cm–1).]

Table 16 Calibration Evaluation for MSC-FD on the M1GluR2 data (footnote 32).

M1GluR2                   LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg MSCFD Full            2    0.999   0.48          0.53           2.07
Avg MSCFD ROI             2    0.999   0.38          0.40           1.56
Avg MSCFD ROI (1)         2    0.999   0.39          0.42           1.64
Avg MSCFD MWPLS           2    0.999   0.38          0.40           1.56
Avg MSCFD MWPLS (1)       3    0.998   0.59          0.69           2.69
BC MSCFD Full             2    0.999   0.44          0.49           1.91
BC MSCFD ROI              2    0.999   0.35          0.38           1.48
BC MSCFD ROI (1)          2    0.998   0.68          0.72           2.81
BC MSCFD MWPLS            2    0.999   0.35          0.38           1.48
BC MSCFD MWPLS (1)        2    0.997   0.76          0.80           3.12
WE MSCFD Full             5    0.773   6.81          12.98          50.70
WE MSCFD ROI              3    0.647   8.50          11.28          44.06
WE MSCFD ROI (1)          3    0.518   9.85          11.71          45.74
WE MSCFD MWPLS            2    0.404   11.04         11.74          45.85
WE MSCFD MWPLS (1)        2    0.353   11.53         12.56          49.06

Table 17 Calibration Evaluation for FD-MSC on the M1GluR2 data (footnote 32).

M1GluR2                   LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
Avg FDMSC Full            2    0.999   0.38          0.42           1.64
Avg FDMSC ROI             2    0.999   0.32          0.35           1.36
Avg FDMSC ROI (1)         3    0.971   2.44          2.89           11.28
Avg FDMSC MWPLS           2    0.999   0.32          0.34           1.32
Avg FDMSC MWPLS (1)       2    0.944   3.38          3.70           14.45
BC FDMSC Full             2    0.999   0.38          0.42           1.64
BC FDMSC ROI              2    0.999   0.32          0.35           1.36
BC FDMSC ROI (1)          3    0.971   2.44          2.89           11.28
BC FDMSC MWPLS            2    0.999   0.32          0.35           1.36
BC FDMSC MWPLS (1)        2    0.944   3.38          3.70           14.45
WE FDMSC Full             4    0.745   7.22          12.71          49.64
WE FDMSC ROI              2    0.391   11.16         11.99          46.83
WE FDMSC ROI (1)          4    0.723   7.53          10.79          42.14
WE FDMSC MWPLS            2    0.391   11.16         11.95          46.67
WE FDMSC MWPLS (1)        2    0.359   11.45         12.07          47.14

3.6 Outcomes from M1Glu Data Analysis

The glucose in water model was used to evaluate which data collection, data pre-processing and PLS modelling conditions were most important for accurate prediction of concentration using Raman spectroscopy. This investigative study showed that it was possible to precisely quantify D-glucose in water at concentrations typical of those used in cell culture (REP < 1.5%). The two main issues to overcome in order to achieve this result were the strong water band and the baseline offset in the Raman signal. To deal with the strong water signal, WE was implemented and worked up to a point. However, the WE method also introduced artefacts which highlighted differences between high and low concentration samples. These artefacts were eliminated with a simple first derivative, but not with MSC or normalisation, which saw two linear ranges emerge for high and low concentration samples: the first range of samples (1 to 12) covered 1.6 g/L to 19.2 g/L and the second range started from sample 8, covering 12.8 g/L to 49.6 g/L.

The baseline offset had a negative impact on calibration modelling. Through changes to the experimental setup and pre-processing, a reduction in spectral variance and an improved D-glucose signal were observed. The best pre-processing was FD-MSC. The FD pre-processing removed baseline offset and also smoothed and resolved peaks. MSC was then applied to correct the remaining offset, scatter and baseline shift, as well as derivative artefacts. The best performing models are shown in Table 18. It was clear that for each model the performance was improved by region selection, which removed the influence of the large OH band.
Overall, the best calibration model was built on the averaged data, with further pre-processing by FD and MSC, before finally being reduced to the best performing region of 800–1680 cm–1. This model was referred to as M1Glu AVG FD-MSC MW.

Table 18 The optimal M1GluR2 models generated after the different pre-processing methods.

M1GluR2 | LV | R2 | RMSEC (g/L) | RMSECV (g/L) | REP%
BC MW (800–1680 cm–1) | 3 | 0.998 | 0.62 | 0.72 | 2.81
AVG ROI (707–1853 cm–1), MSC pre-processing | 3 | 0.999 | 0.35 | 0.39 | 1.52
AVG ROI (707–1853 cm–1), Norm2 pre-processing | 2 | 0.999 | 0.43 | 0.45 | 1.75
AVG ROI (707–1853 cm–1), FD pre-processing | 3 | 0.997 | 0.84 | 0.98 | 3.82
AVG MW (800–1680 cm–1), FD-MSC pre-processing | 2 | 0.999 | 0.32 | 0.34 | 1.32

3.7 Quantification of D-glucose in a ternary mixture (M3Glu-Data)

The feasibility of using Raman with dilute solutions to extract relevant information with an acceptable level of accuracy (~2% REP) was studied [4]. For the M3Glu sample set (footnote 35), the total dissolved analyte concentration ranged from 1.74 to 15.06 g/L and the samples thus represented a dilute cell culture medium. The rationale for using a dilute cell culture medium was that dilute solutions may generate better quality spectra for both fluorescence and SERS measurements in quantification of the complex ingredients (eRDF and YE). It was therefore decided to investigate whether the glucose content in these dilute media could be quantified accurately. In practice, the goal might be to take a medium sample, dilute it, and then perform fluorescence, conventional Raman and SERS all on the same sample.

The second goal of the M3Glu study was to assess the effect of spectral overlap when trying to calibrate the D-glucose concentration in the presence of a strong water background signal and multiple similar components, e.g. L-glutamine and D-galactose (Figure 61).

Footnote 35: Table 3, Composition of the M3Glu

Figure 61 Overlay of the Raman spectra of solid D-glucose, L-glutamine and D-galactose (λex 785 nm).
These spectra were collected as a single scan (10 second exposure) from 250–3311 cm–1 with 8 cm–1 resolution.

3.7.1 Spectral Analysis of M3Glu Data

The averaged, baseline corrected and water eliminated M3GluR2 spectra are displayed with the averaged M1GluR2 spectrum for comparison (Figure 62). With these low concentration M3Glu samples, the analyte signal was weak. Any overlap between the analytes (glucose, galactose and glutamine) was eclipsed by the water signal in the data. A water analyte ratio (WAR) of 11.37 was observed for the M3Glu data, compared to a WAR of 9.35 for the M1Glu data. The higher WAR reflected the larger water signal relative to the weaker analyte signal in the M3Glu dataset.

Figure 62 (a) Averaged Raman spectra of the M1GluR2 data, (b) averaged Raman spectra of the M3GluR2 data, (c) baseline corrected Raman spectra of the M3GluR2 data, and (d) water eliminated Raman spectra of the M3GluR2 data.

Similar to the M1Glu data, a sloping baseline, baseline offset and large water signal were characteristic of the M3Glu data. The baseline offset effects were increased by the low analyte concentrations of the M3Glu samples, as seen in Figure 62a and b. After water elimination, a more detailed spectrum was generated with distinguishable bands and water elimination artefacts. The artefacts were seen as the increased baseline offset (1200–2100 cm–1) and the large noise signal above 3000 cm–1. The M1Glu data analysis showed that these artefacts had a negative impact on the modelling ability and limited the useful spectral ranges. Water elimination was then tested for the M3Glu samples to verify whether the same outcome occurred with more complex data.

3.7.2 Reproducibility

When PCA was performed on the individual data collections, it did not reveal any outliers and only two components were needed to describe the M3Glu data.
The first component represented the water signal and the second component contained analyte signal. The M3Glu data showed a similar pattern to the M1Glu data for the amalgamated sample sets (Figure 63). The second and third data collections were close together while the first was separated due to the sampling setup change.

Figure 63 PCA scores and loadings plots for triplicate measurements: (a)/(b) for averaged raw M3Glu samples and (c)/(d) for the FDMSC M3Glu samples. Run 1 is black, Run 2 is red, and Run 3 is green.

The impact of water and spectral offset was evident in the loadings (Figure 63b). For example, in the M3GluL3 data, there was a severe baseline slope, a large downward peak at the water bending band (1640 cm–1) and small analyte peaks. The visibility of the analyte peaks was hampered by the low analyte concentration, baseline slope and the large water signal. After FD-MSC pre-processing (Figure 63c and d), the deviations caused by scatter effects were corrected and the number of loadings describing the media was reduced to two, where the second loading described the D-glucose signal. This showed that the data could be adequately corrected for quantitative analysis by chemometric pre-processing.
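The FD-MSC sequence referred to above (a first derivative followed by multiplicative scatter correction) can be sketched in a few lines of Python. This is a minimal illustration, not the processing used in this work: the derivative is a plain central difference standing in for whatever derivative filter was actually applied, MSC is referenced to the mean spectrum of the set, and the function names `first_derivative`, `msc` and `fd_msc` are hypothetical.

```python
from statistics import mean


def first_derivative(spectrum):
    """Central-difference first derivative of one spectrum (a list of floats).

    Assumption: a simple finite difference is used here for illustration;
    the end points use one-sided differences.
    """
    n = len(spectrum)
    deriv = [0.0] * n
    deriv[0] = spectrum[1] - spectrum[0]
    deriv[-1] = spectrum[-1] - spectrum[-2]
    for i in range(1, n - 1):
        deriv[i] = (spectrum[i + 1] - spectrum[i - 1]) / 2.0
    return deriv


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is regressed on the reference r (x = a + b*r) and then
    corrected as (x - a) / b.
    """
    n_points = len(spectra[0])
    ref = [mean(s[i] for s in spectra) for i in range(n_points)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def fd_msc(spectra):
    """First derivative of every spectrum, followed by MSC on the derivatives."""
    return msc([first_derivative(s) for s in spectra])
```

Applied to two synthetic spectra that differ only by an additive offset and a multiplicative factor, `fd_msc` returns matching traces: the derivative removes the offset and MSC removes the scaling.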
[Figure 63 panels: (a) M3 scores, PC 1 (99.42%) against PC 2 (0.30%); (b) M3 loadings, L1 (99.42%) and L2 (0.30%); (c) M3_FDMSC scores, PC 2 (3.28%) against PC 3 (0.01%); (d) M3_FDMSC loadings, L1 (96.7%), L2 (3.28%) and L3 (0.01%).]

3.7.3 Quantitative Analysis: Calibrating D-glucose in M3Glu Data

To determine which model best estimated the D-glucose content in media samples, a wide variety of models were assessed (Table 65–Table 69). The best M3Glu models for the different pre-processing methods are shown in Table 19.

Table 19 The best performing M3Glu models generated after the different pre-processing methods.

M3Glu Data | LV | Correlation Coefficient | RMSEC (g/L) | RMSECV (g/L) | REP%
WE Data MW (800–1680 cm–1) | 4 | 0.990 | 0.28 | 0.38 | 7.65
BC MW (800–1680 cm–1), MSC pre-processing | 4 | 0.993 | 0.25 | 0.30 | 6.04
BC MW (800–1680 cm–1), Norm2 pre-processing | 4 | 0.993 | 0.25 | 0.32 | 6.44
BC FD MW (800–1680 cm–1), FD pre-processing | 3 | 0.993 | 0.25 | 0.29 | 5.84
BC FDMSC MW (800–1680 cm–1), FD-MSC pre-processing | 3 | 0.995 | 0.20 | 0.23 | 4.63

When compared with the best M1Glu models (Table 18), there was a decline in the calibration model quality. This was the result of the lower analyte concentrations in these samples. Overall, the baseline corrected spectra suited the calibration models and the best model also used FD-MSC pre-processing, which aided the model by resolving the peaks and removing baseline offset to improve the analyte signal. As with the M1Glu data, water elimination on the M3Glu data only worked with first derivative pre-processing.
For each pre-processing method, the best models used the 800–1680 cm–1 range, as the water bending band at 1640 cm–1 acted like an internal reference. The 1640 cm–1 water band remained steady, unlike the strong OH band above 3000 cm–1, which was affected by detector limitations, shot noise and sampling effects, causing greater variation. In these solutions, which were very dilute compared to the M1Glu samples, the strong water signal was a benefit: it showed little change and thus acted as an internal standard. This region also contained several peaks (915, 1059, 1123, 1372 and 1460 cm–1) related to the D-glucose signal. The variation in the D-glucose signal was self-referenced to the stable water signal, allowing for estimation of the D-glucose concentration [98, 131].

The best overall model was obtained after FD-MSC was applied to baseline corrected data and when the spectral range was reduced to 800–1680 cm–1. Three latent variables were necessary to model the data. The first loading (98.53%) represented the large water signal within the M3Glu data (Figure 64). Figure 65 shows a comparison of the water spectrum before and after FDMSC pre-processing. The second loading accounted for 1.16% of the explained variance and was the analyte signal. The third loading (0.10%) was unresolved analyte signal and spectral noise.

Figure 64 The BC M3Glu calibration model built using FDMSC pre-treated data in the 800–1680 cm–1 region: (left) the predicted versus expected plot and (right) the loadings of the three latent variables used in the calibration model.

This model showed that for more complex and dilute media such as M3Glu, Raman could be used together with chemometrics to estimate the D-glucose concentration with reasonable accuracy (REP of 4.63%). The same pre-processing and region selection as for the M1Glu samples was used. The next step was to determine whether the same methodology worked on the more complex M5Glu media, which contain five media components.
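The RMSEC and RMSECV figures quoted for these models can be illustrated with a small leave-one-out example (section 3.10 notes that leave-one-out cross validation was used during calibration). The sketch below is a stand-in under stated assumptions: it fits a univariate least-squares line rather than a PLS model, it uses the plain root-mean-square form of both errors with no degrees-of-freedom correction, and the function names are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def rmse(errors):
    """Root mean square of a sequence of residuals."""
    errors = list(errors)
    return sqrt(sum(e * e for e in errors) / len(errors))


def rmsec(x, y):
    """Calibration error: residuals of the model fitted to all samples."""
    a, b = fit_line(x, y)
    return rmse(a + b * xi - yi for xi, yi in zip(x, y))


def rmsecv_loo(x, y):
    """Leave-one-out error: each sample is predicted by a model fitted
    to all the other samples (x and y are plain lists)."""
    errors = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(a + b * x[i] - y[i])
    return rmse(errors)
```

For a least-squares fit each left-out residual is at least as large as the corresponding calibration residual, so RMSECV is never smaller than RMSEC. A large gap between the two is the over-fitting symptom discussed in section 3.9.2.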
[Figure 64 plot annotations: R2 = 0.995, 3 latent variables, RMSEC = 0.20568, RMSECV = 0.23945; loadings L1 (98.53%), L2 (1.16%) and L3 (0.10%).]

Figure 65 Raman spectra of water before and after FDMSC pre-processing; the inset shows the FDMSC spectrum in the 707–1853 cm–1 region.

3.8 Quantification of D-glucose in a quinary mixture (M5Glu-Data)

Using a recipe for media deployed within industry, a set of samples containing fixed concentrations of eRDF, yeastolate, D-galactose and L-glutamine and a varying concentration of D-glucose (0.0–9.92 g/L) was prepared (see section 2.12.4).

3.8.1 Spectral Analysis and Reproducibility of M5Glu Data

The M5Glu spectra showed a strong baseline offset, with the large water signal obscuring much of the analyte signal. Across the M1Glu, M3Glu and M5Glu data, baseline offset increased with the change to low concentration samples. The WAR for the M5Glu dataset was 9.71, which was lower than that of the M3Glu dataset (11.37), as the M5Glu samples had a more complex make-up with a higher percentage of dissolved solids.

Figure 66 Averaged raw Raman spectra of the M5GluR1 data (a) and the M5GluR2 data (b).

For the first data collection, the averaged spectra (Figure 66a) displayed a severe baseline offset. After baseline correction, however, the offset was removed. When PCA was conducted on the individual data collections for M5Glu, the first data collection showed no outliers.
However, in the M5GluR2 data (Figure 66b) one sample displayed a lower intensity than the other samples; this sample was identified as M5GluR2S12. PCA (Figure 67) confirmed the outlier. When M5GluS12 was measured as part of the third data collection, no outliers were present. This outlier was therefore attributed to an anomalous measurement caused by experimental error. After M5GluR2S12 was removed, the repeated PCA of the raw averaged M5GluR2 data revealed that the remaining samples were within the 95% confidence limit.

Figure 67 PCA scores plots for averaged M5GluR2 data before and after the outlier removal, and the loadings of PC1 and PC2 for the Run 2 data after the outlier was removed.

[Figure 67 plot annotations: scores on PC 1 (99.98%) against PC 2 (0.01%); loadings L1 (99.98%) and L2 (0.01%).]

The M5Glu PCA loadings (Figure 67) illustrated a trend similar to the one seen in all the media data analysed so far. The first loading was dominated by the water signal (99.98% of the explained variance) and the second revealed small peaks (426, 522, 1066, 1123, 1362 and 2898 cm–1) for only ~0.02% explained variance. This was a result of the low analyte signal intensity compared to the water signal within the spectra. Pre-processing of the data increased the explained variance for PC2 to 0.33% by enhancing the analyte signal.
Looking at the third data collection for the M5Glu spectra (Figure 68), the raw data comprised two groups of spectra as a result of being collected over 2 days. This grouping was caused by instrumental variation of the Raman Station, a single channel instrument which has no internal calibration to correct for possible power fluctuations; such fluctuations could have caused the shift in the M5Glu data. This variation was corrected by FD-MSC pre-processing: when the FD-MSC corrected data were overlaid for Day 1 and Day 2 (Figure 68b), the samples were distributed according to their concentrations.

Figure 68 Averaged Raman spectra of the M5GluR3 data (left) and PCA scores plot for the M5GluR3 data after first derivative and multiplicative scatter correction (right). Red refers to day one and black is day two.

[Figure 68 plot annotations: scores on PC 1 (98.38%) against PC 2 (1.04%), with the low and high concentration ends of the distribution marked.]

Figure 69 PCA scores and loadings plots for triplicate measurements of the M5Glu sample sets. Run 1 is black, Run 2 is red and Run 3 is green.

As with the M1Glu and M3Glu data, PCA on the M5Glu data showed that the samples from the second and third runs were close together while the first data collection was separated due to the sampling setup issue (Figure 69). Pre-processing was unable to correct for this variance. The strong water signal was observed in the first loading, the second loading depicted the analyte signal and the third loading represented the offset effects caused by water at 1640 cm–1 and above 3000 cm–1. After FD-MSC pre-processing of the M5GluR3 data, the loadings were reduced to two, one for the water signal and the other for the analyte, since the interferences seen in the raw data were removed.
[Figure 69 panels: (a) M5 scores, PC 2 (0.12%) against PC 3 (0.01%); (b) M5 loadings, L1 (99.87%), L2 (0.12%) and L3 (0.01%); (c) M5_FDMSC scores, PC 1 (98.74%) against PC 2 (0.86%); (d) M5_FDMSC loadings, L1 (98.74%) and L2 (0.86%).]

3.8.2 Quantification: Glucose in M5Glu Data

Several models were built for the M5Glu data (Table 70 to Table 74) and the best models for estimating the D-glucose concentration are listed in Table 20. From the M1Glu and M3Glu datasets, it was observed that the reduced region of 800–1680 cm–1 and FD-MSC pre-processing gave the best results. The same held true for the M5Glu data (Figure 70). The best model used two latent variables: the first represented the water signal and the second the analyte signal.

Figure 70 Predicted versus expected plot for the calibration of BC FDMSC M5Glu data in the 800–1680 cm–1 region, and the loadings showing the components represented in the calibration model.

Table 20 The best M5Glu models generated after the different pre-processing methods.

M5Glu Data | LV | Correlation Coefficient | RMSEC (g/L) | RMSECV (g/L) | REP (%)
BC Data (entire region), preliminary pre-processing | 4 | 0.983 | 0.38 | 0.50 | 8.63
BC MW (800–1680 cm–1), MSC pre-processing | 4 | 0.990 | 0.29 | 0.35 | 6.04
AVG MW (800–1680 cm–1), Norm2 pre-processing | 4 | 0.992 | 0.26 | 0.34 | 5.87
BC MW (800–1680 cm–1), FD pre-processing | 3 | 0.985 | 0.36 | 0.44 | 7.59
BC MW (800–1680 cm–1), FD-MSC pre-processing | 2 | 0.993 | 0.25 | 0.27 | 4.66

The ability of chemometric models built on Raman data to quantify D-glucose in increasingly complex media depended on media complexity and concentration.
The M1Glu/M3Glu data modelling showed that the estimation of D-glucose was possible at both high and low concentrations. The low analyte concentration reduced the PLS model accuracy but the overall performance was still reasonable. The quantification of glucose in M5Glu equalled that obtained for the simpler M3Glu samples. This indicated that increasing complexity did not adversely affect the spectral data quality and that glucose quantification should always be possible in complex media as long as glucose is present in relatively high concentration. If more samples and more replicates were used in the calibration set, and if a smaller D-glucose concentration range more appropriate to the designed formulation was employed, then it should be feasible to achieve a much lower REP in the 1–2% range.

[Figure 70 plot annotations: R2 = 0.993, 2 latent variables, RMSEC = 0.25039, RMSECV = 0.27805; loadings L1 (99.78%) and L2 (0.92%).]

3.9 Quantification of eRDF and Yeastolate in quinary mixtures (M5eRDF and M5Ye)

It was possible to use Raman spectroscopy to estimate a single simple component (D-glucose) to a reasonable level of accuracy. It was therefore also desirable to know whether the same was possible with complex media ingredients treated as a whole unit within the media formulation. The goal of this section is to ascertain whether Raman can be used to determine if the correct amount of eRDF or yeastolate was added to a medium.

Two sets of samples were prepared. Both contained D-galactose, D-glucose, L-glutamine, eRDF and yeastolate, but for the M5eRDF sample set the concentration of eRDF was varied, while for the M5Ye samples the yeastolate concentration was varied. The other components were kept at a constant level.
3.9.1 Spectral Analysis of M5eRDF and M5Ye Data

Figure 71 Averaged raw Raman spectra of (a) M5Ye (0.1–1.72 g/L) and (b) M5eRDF (1–6.4 g/L) and, in red, the spectra after multiplicative scatter correction.

The raw Raman spectra for M5eRDF and M5Ye resembled water spectra with strong baseline offset effects (Figure 71). At the low concentrations used in these samples, the Raman signal was weak and there was little difference observed between the spectra for the M5Ye, M5eRDF and M5Glu samples (Figure 66). The WARs were similar for M5eRDF (14.93) and M5Ye (14.64), while the WAR for the M5Glu data was 9.71. The M5Glu samples were a more concentrated sample set than the M5eRDF and M5Ye samples. As a result of the weak analyte signal, multivariate analysis was required to extract relevant analyte information. MSC pre-processing was used to remove spectral offset and noise (Figure 71), and the quantitative analysis used the same spectral regions as for the M5Glu data (250–3311 cm–1, 707–1853 cm–1 and 800–1680 cm–1).

3.9.2 Quantification: eRDF in M5eRDF

The best PLS calibration models built with the M5eRDF Raman data are summarised in Table 21. A full account of each model is available in the appendix; see section 8.3.4. The Raman data for M5eRDF were similar to the M1Glu, M3Glu and M5Glu datasets. The M5eRDF spectra suffered from baseline offset and a strong water signal. The same pre-processing methods worked well for the M5eRDF samples; FDMSC removed the baseline offset efficiently and resolved the analyte peaks within the data.

Table 21 Summary of calibration models for the M5eRDF samples using averaged Raman data.
M5eRDF | LV | Correlation Coefficient | RMSEC (g/L) | RMSECV (g/L) | REP%
BC Data MW (800–1680 cm–1) | 5 | 0.980 | 0.24 | 0.70 | 18.91
Avg ROI (707–1853 cm–1), MSC pre-processing | 5 | 0.995 | 0.12 | 0.62 | 16.94
BC ROI (707–1853 cm–1), NINF pre-processing | 5 | 0.988 | 0.18 | 0.72 | 19.62
BC MW (800–1680 cm–1), Norm2 pre-processing | 4 | 0.974 | 0.27 | 0.77 | 20.81
BC ROI (707–1853 cm–1), FST11 pre-processing | 3 | 0.975 | 0.27 | 0.62 | 16.75
BC ROI (707–1853 cm–1), FST11MSC pre-processing | 4 | 0.993 | 0.14 | 0.59 | 15.94

The pre-processed M5eRDF spectra and the best calibration model are shown in Figure 72. The best model was built using the reduced range of 707–1853 cm–1; this eliminated the strong OH band and the sloping baseline seen below 700 cm–1 (visible in Figure 71). The model used four latent variables. The second loading correlated with the analyte signal and its scores showed a noisy linear correlation of 0.911 with increasing concentration (Figure 73). The first loading represented the water signal with 99% of the explained variance in the spectra. The remaining two loadings described less than 1% of the explained variance (analyte signal and noise). When compared to the second loading, these represented ~38% and ~23% of the analyte signal and noise. The overwhelming water signal reduced their influence on the model.

Figure 72 The pre-treated spectra of the BC FDMSC M5eRDF data and the predicted versus expected eRDF concentration plot for the calibration model for M5eRDF in the region of 707–1853 cm–1.

Figure 73 M5eRDF loadings (left) and scores (right) of the second component for the BC FDMSC ROI calibration model.

Compared to the M5Glu calibration models (Table 20), where the best REP was ~5%, the M5eRDF model had significantly lower accuracy with a REP of ~16%.
The difference in performance was the result of the high WAR of the M5eRDF sample set compared to the M5Glu sample set and the different numbers of samples used per sample set (footnote 36). M5Glu comprised 32 samples spanning 0.0 g/L to 9.92 g/L, while only 10 samples were used for M5eRDF, covering the 1.0 g/L to 6.4 g/L range. The performance of the M5eRDF model could therefore be improved by increasing the number of samples and increasing the concentration of eRDF within the samples.

Footnote 36: Measurement precision increases with more samples.

[Figure 72 and Figure 73 plot annotations: R2 = 0.993, 4 latent variables, RMSEC = 0.14718, RMSECV = 0.59011; loadings LPC1 (99.69%), LPC2 (0.13%), LPC3 (0.05%) and LPC4 (0.03%).]

When comparing the performance of the M5eRDF model to the M5Glu one, over-fitting (footnote 37) was evident within the data, as noted by the low RMSEC and high RMSECV. The M5eRDF models used more latent variables on average. For the best M5Glu models the average ratio of SECV to SEC was 1.2, while for the M5eRDF models the average ratio was 3.5. This was another indicator that the M5eRDF data were over-fitted. This was likely due to the samples not changing enough to be correlated with the increase in eRDF concentration. Because eRDF is a mixture, the concentration change was not seen as a single signal but as the product of multiple individual components changing. This resulted in a more complex but less intense change in the Raman signal. The Raman method worked for the M5Glu dataset not only because a single component signal was changing, but also because the M5Glu dataset benefitted from having more samples, allowing for greater precision.

3.9.3 Quantification: Yeastolate in M5Ye

For the quantification of yeastolate, the M5Ye calibration models generated were very weak.
Raman was not sensitive enough to the low concentration changes occurring in the data (0.1–1.72 g/L). The high WAR value of 14.64 was also indicative of the very weak analyte signal for yeastolate. Some models are shown in Table 22 (footnote 38), including the best calibration model, which was not acceptable with a REP level of ~38%. A model with a REP above 20–30% is unusable, and this result indicated that the Raman data could not be modelled in terms of yeastolate concentration.

Figure 74 shows a linear correlation. However, there was too much scatter and this prevented generation of an accurate model. This calibration model used three latent variables to describe the system. The first loading explained 99.65% of the variance and matched the water signal after pre-processing shown in Figure 65. The first loading for the M5Ye data matched the first loading for the M5eRDF data, as water was the major component in these media samples. After the water signal was described, only 0.24% of the variance was left to be explained by the second and third loadings. Together they represented the noisy analyte signal buried beneath the water signal, as well as shot noise. Model accuracy may be improved if a higher concentration range of yeastolate was studied and if more samples and replicates were used in the calibration set.

Footnote 37: When too many latent variables are used, the model essentially fits noise which is specific to the calibration set. Over-fitting is then characterised by the large RMSECV which results from the prediction of samples with their own noise pattern.

Footnote 38: A full listing of the calibration models is available in the appendix; see section 8.3.5.
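The over-fitting indicator used in section 3.9.2, the ratio of cross-validation error to calibration error, can be recomputed from the rounded RMSEC and RMSECV entries in Table 20 (M5Glu) and Table 21 (M5eRDF). The following is a sketch only: it assumes that SECV and SEC correspond to the tabulated RMSECV and RMSEC, it averages over the tabulated best models, and the names `M5GLU`, `M5ERDF` and `mean_cv_ratio` are hypothetical.

```python
from statistics import mean

# (RMSEC, RMSECV) pairs in g/L, transcribed from Table 20 (M5Glu models).
M5GLU = [(0.38, 0.50), (0.29, 0.35), (0.26, 0.34), (0.36, 0.44), (0.25, 0.27)]

# (RMSEC, RMSECV) pairs in g/L, transcribed from Table 21 (M5eRDF models).
M5ERDF = [(0.24, 0.70), (0.12, 0.62), (0.18, 0.72),
          (0.27, 0.77), (0.27, 0.62), (0.14, 0.59)]


def mean_cv_ratio(pairs):
    """Average RMSECV/RMSEC ratio over a list of (RMSEC, RMSECV) pairs.

    A ratio close to 1 means the cross-validated error is similar to the
    calibration error; a much larger ratio points towards over-fitting.
    """
    return mean(cv / cal for cal, cv in pairs)


m5glu_ratio = mean_cv_ratio(M5GLU)
m5erdf_ratio = mean_cv_ratio(M5ERDF)
```

With these rounded inputs the mean ratio comes out near 1.2 for M5Glu and near 3.6 for M5eRDF, in line with the values of 1.2 and 3.5 quoted in the text. The small difference for M5eRDF is plausibly due to rounding in the tables.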
Table 22 Summary of model performance for the M5Ye samples using averaged Raman data.

M5GLUYe | LV | Correlation Coefficient | RMSEC (g/L) | RMSECV (g/L) | REP%
BC Data Full (250–3311 cm–1) | 4 | 0.935 | 0.13 | 0.36 | 39.67
BC Full (250–3311 cm–1), MSC pre-processing | 4 | 0.944 | 0.12 | 0.36 | 39.78
BC Full (250–3311 cm–1), NINF pre-processing | 4 | 0.938 | 0.12 | 0.36 | 40.32
BC Full (250–3311 cm–1), Norm2 pre-processing | 4 | 0.938 | 0.12 | 0.36 | 39.56
Avg ROI (707–1853 cm–1), FD11 pre-processing | 3 | 0.941 | 0.12 | 0.35 | 38.46
Avg ROI (707–1853 cm–1), FD11MSC pre-processing | 3 | 0.929 | 0.13 | 0.43 | 47.25

Figure 74 Predicted versus expected concentration plot and loadings of the calibration model for FD M5Ye data in the region of 707–1853 cm–1.

[Figure 74 plot annotations: R2 = 0.941, 3 latent variables, RMSEC = 0.12588, RMSECV = 0.35503; loadings LPC1 (99.65%), LPC2 (0.20%) and LPC3 (0.04%).]

3.10 Model Validation

During calibration modelling, the model was internally validated using leave-one-out cross validation. However, further validation was required in order to determine if the model was robust. Two validation methods were implemented. First, as the number of test samples was limited, a common semi-external validation was performed by splitting the available M5Glu data into training and test sets. A limitation of sample splitting validation was the similarity between the training and test sets, where common spectral features were modelled. The second and preferred validation method was external validation via an independent test set. It was more relevant than cross validation or sample splitting because the results had higher significance in predicting new samples and testing the model's robustness.
[298]

3.10.1 Prediction Performance by Sample Splitting into a Training and Test Set

The M5Glu sample set was split randomly into training and test sets of 42 and 21 samples using Matlab. This was done 10 times and, for each new training and test combination, a new calibration model was generated which was then used to predict the relevant test set. Each training and test combination had a slightly different concentration range. The results are summarised in Table 23. The Subset_A09 training and test subsets (rows Subset_A09C and Subset_A09P in Table 23) are the example shown in Figure 75, which gives the calibration and prediction plot for this model. The steady REP values (stdev: 0.87%) showed the reproducible nature of the models. The slight changes in the concentration of the training and test datasets were handled by the models. This indicated the reliability of the model based on internal samples.

In comparison to the M5GluR2 model (Figure 70, REP 4.66%), the error level was higher for the validation models (avg. REP 7.10%). For these validation tests, the samples were subject to more day-to-day variation, which did not affect the M5GluR2 model. To overcome this source of error and improve the results, the data should be normalised prior to validation.

Figure 75 Predicted versus expected prediction plot with the calibration samples for BC FDMSC M5Glu data using the 800–1680 cm–1 range.

Table 23 Results for the internal validation on 10 different subsets for the BC FDMSC M5Glu data in the 800–1680 cm–1 region.
Dataset | LV | Correlation | RMSEC (g/L) | RMSECV (g/L) | RMSEP (g/L) | REP%
Subset_A01C | 3 | 0.991 | 0.28 | 0.34 | - | 6.26
Subset_A01P | 3 | 0.984 | 0.28 | 0.34 | 0.38 | 6.99
Subset_A02C | 3 | 0.987 | 0.33 | 0.38 | - | 6.99
Subset_A02P | 3 | 0.987 | 0.33 | 0.38 | 0.31 | 5.70
Subset_A03C | 3 | 0.990 | 0.28 | 0.33 | - | 6.21
Subset_A03P | 3 | 0.991 | 0.28 | 0.33 | 0.44 | 8.28
Subset_A04C | 3 | 0.992 | 0.26 | 0.32 | - | 7.40
Subset_A04P | 3 | 0.986 | 0.26 | 0.32 | 0.39 | 9.02
Subset_A05C | 3 | 0.986 | 0.32 | 0.37 | - | 7.82
Subset_A05P | 3 | 0.991 | 0.32 | 0.37 | 0.32 | 6.76
Subset_A06C | 3 | 0.988 | 0.32 | 0.38 | - | 8.29
Subset_A06P | 3 | 0.984 | 0.32 | 0.38 | 0.35 | 7.64
Subset_A07C | 3 | 0.989 | 0.30 | 0.35 | - | 6.90
Subset_A07P | 3 | 0.984 | 0.30 | 0.35 | 0.43 | 8.48
Subset_A08C | 3 | 0.991 | 0.27 | 0.33 | - | 6.58
Subset_A08P | 3 | 0.984 | 0.27 | 0.33 | 0.37 | 7.38
Subset_A09C | 3 | 0.987 | 0.30 | 0.37 | - | 7.50
Subset_A09P | 3 | 0.989 | 0.30 | 0.37 | 0.30 | 5.89
Subset_A10C | 3 | 0.990 | 0.30 | 0.35 | - | 7.11
Subset_A10P | 3 | 0.985 | 0.30 | 0.35 | 0.37 | 7.52

[Figure 75 plot annotations: R2 = 0.989, 3 latent variables, RMSEC = 0.30786, RMSECV = 0.37503, RMSEP = 0.30607.]

3.10.2 Independent Test Set Prediction

A new sample set (T5) was collected to determine the capability of the larger M5Glu dataset for the prediction of unknown samples. The samples were prepared in the same way as the M5Glu samples but at a different time and date. The composition of the samples can be seen in Materials and Methods (Table 5). The D-glucose concentration in the T5 samples ranged from 1.7 g/L to 9.8 g/L. Calibration was performed using the amalgamated data of the second and third M5Glu data collections (62 samples; footnote 39). Modelling was also conducted on the averaged replicate data (32 samples). As well as the amalgamated and the averaged M5Glu sample sets, normalised data for these were also tested in order to minimise the intensity offset between the different data collections. PCA was used to evaluate the closeness between the T5 and M5Glu data. The closer the calibration set was to the prediction set, the better it was for the model and quantitative performance.
The scores plot showed that there was an overlap between the M5Glu and T5 datasets and that the T5 samples were spread across the M5Glu sample set (Figure 76).

Figure 76 Scores plot for the PCA comparison of T5 and M5Glu data after BC FDMSC MW pre-processing.

[Figure 76 plot annotations: scores on PC 1 (98.50%) against PC 2 (1.00%); legend: M5, T5.]

Footnote 39: Samples M5R2S12 and M5R2S23 were removed as outliers.

Table 24 Calibration and prediction models generated by BC FDMSC M5Glu and BC FDMSC T5 samples using the 800–1680 cm–1 range.

# of Samples / Pre-processing | Correlation Coefficient | LV | RMSEC | RMSECV | RMSEP | REP%
62 | 0.982 | 3 | 0.40 | 0.45 | - | 9.07
10 | 0.980 | 3 | 0.40 | 0.45 | 0.57 | 9.93
62_Normalised | 0.971 | 2 | 0.50 | 0.52 | - | 10.48
10_Normalised | 0.990 | 2 | 0.50 | 0.52 | 0.52 | 9.05
32 | 0.992 | 2 | 0.26 | 0.29 | - | 5.84
10 | 0.988 | 2 | 0.26 | 0.29 | 0.47 | 8.18
32_Normalised | 0.991 | 2 | 0.27 | 0.29 | - | 5.84
10_Normalised | 0.990 | 2 | 0.27 | 0.29 | 0.34 | 5.92

The external validation of the M5Glu data (Table 24) showed that the models were capable of performing predictions and produced a prediction performance equivalent to the validation model generated by sample splitting. Averaging the M5Glu data improved the model error level, while normalisation gave only a slight improvement in the averaged and amalgamated data. The averaged sample sets performed better as the variance within the data was reduced compared to the amalgamated sample sets. The lower correlation in the validation models was due to the increased variation between the calibration and prediction sets as a result of the different sample make-up and concentration ranges.

Figure 77 Predicted versus expected prediction plot for normalised BC FDMSC T5 with the M5Glu calibration samples data using the 800–1680 cm–1 range.
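The RMSEP and REP reported in Table 24 for the normalised, averaged model can be cross-checked against the per-sample T5 predictions listed in Table 25. The sketch below assumes that REP is the RMSEP expressed as a percentage of the mean expected concentration. That assumption is consistent with the tabulated values but is not stated in this section, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean

# Expected and predicted D-glucose concentrations (g/L), samples
# T5Glu01 to T5Glu10, transcribed from Table 25.
EXPECTED = [1.7, 2.6, 3.5, 4.4, 5.3, 6.2, 7.1, 8.0, 8.9, 9.8]
PREDICTED = [2.07, 3.02, 3.96, 4.58, 5.55, 6.67, 7.29, 8.24, 8.55, 9.49]


def rmsep(expected, predicted):
    """Root mean square error of prediction over paired values."""
    return sqrt(mean((p - e) ** 2 for e, p in zip(expected, predicted)))


def rep_percent(expected, predicted):
    """Relative error of prediction: RMSEP as a percentage of the mean
    expected concentration (assumed definition, see lead-in)."""
    return 100.0 * rmsep(expected, predicted) / mean(expected)


t5_rmsep = rmsep(EXPECTED, PREDICTED)
t5_rep = rep_percent(EXPECTED, PREDICTED)
```

Using the rounded Table 25 entries, this gives an RMSEP of about 0.34 g/L and a REP of about 5.9%, matching the 0.34 and 5.92% in Table 24 to within rounding.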
[Figure 77 plot: predicted (g/L) versus expected (g/L); R2 = 0.990, 2 latent variables, RMSEC = 0.2727, RMSECV = 0.29923, RMSEP = 0.34207]

Table 25 Prediction results of T5 data based on the normalised BC FDMSC data in the 800–1680 cm–1 region model from the M5Glu data.

Sample No   Expected (g/L)   Predicted (g/L)   Difference (g/L)
T5Glu01     1.7              2.07              +0.37
T5Glu02     2.6              3.02              +0.42
T5Glu03     3.5              3.96              +0.46
T5Glu04     4.4              4.58              +0.18
T5Glu05     5.3              5.55              +0.25
T5Glu06     6.2              6.67              +0.47
T5Glu07     7.1              7.29              +0.19
T5Glu08     8.0              8.24              +0.24
T5Glu09     8.9              8.55              -0.35
T5Glu10     9.8              9.49              -0.31

Table 25 displays the prediction results for the T5 samples. The reasonable performance from the M5Glu/T5 model indicated the potential of Raman spectroscopy for modelling high concentration samples. These results showed that prediction of the low concentration samples was worse than the higher concentration ones. This was consistent with previously modelled M1Glu data, where better calibration models were obtained as a result of higher concentration and stronger signal. Therefore, adapting the Raman procedure to higher concentration ranges would lead to an analytical tool with better accuracy for quantifying D-glucose concentration in media.

3.11 General Conclusions: Raman Analysis

The use of Raman spectroscopy was investigated as an analytical tool for the measurement of cell culture media components (D-glucose, eRDF and yeastolate) in a model aqueous media, as it offers rapid, non-destructive analysis with little sample preparation. The determination of D-glucose in three different model media with increasing complexity (M1Glu, M3Glu and M5Glu) was investigated. This was carried out in a stepwise fashion to cover the different factors affecting Raman spectra quality, the required pre-processing, and the correlation of the signal with compositional changes. The major issue with these Raman datasets was the large water signal compared to weak analyte signal as seen in Figure 78.
The water signal dominated the first loading for the different sample sets. M1Glu was a simple system where the analyte was at higher concentrations, giving a stronger performing model. For this reason, the first loading of the model selected for M1Glu had small peaks from the D-glucose beneath the water signal. However, for the M3Glu and M5Glu data, the first loadings were the same as the water signal with varying baselines. Elimination of the water signal from the data by simple subtraction was possible though it produced spectra containing artefacts. These artefacts occurred when the variance amongst the spectra was caused by more than the weak analyte signal and its varying concentration. Unresolved issues such as baseline offset, sloping baseline and noise also contributed to the spectral variance. Further pre-processing prior to water elimination may prevent the appearance of some artefacts. The efficacy of the water elimination method as part of a series of pre-processing steps was shown by Li et al. when performed on relatively low concentration samples (1–2% dissolved solids). [3] However, in this study, the samples (M3Glu/M5Glu) had a lower concentration of dissolved solids (~1%) and WE was not a suitable method.

The second loading for the M1Glu, M3Glu and M5Glu data revealed the impact of lower D-glucose concentration and increasing sample complexity (Figure 78). For M1Glu (blue), the analyte signal was clear but became less defined with M3Glu (green) and M5Glu (red). The noise level was elevated with the M3Glu and M5Glu data. M3Glu was severely affected by the water bending band at 1640 cm–1 as it had the lowest level of dissolved solids.

Figure 78 (Left) First loadings and water spectra and (Right) second loading for M1Glu, M3Glu and M5Glu from equivalent40 PCA models.

40 Same data treatment.
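The way a simple subtraction can leave artefacts when something other than the analyte varies can be illustrated with a toy calculation. The sketch below is not the WE procedure used in this work: it assumes a least-squares scale factor for the water spectrum, the function name is hypothetical, and the five-point "spectra" are invented values chosen only to show the effect of a constant baseline offset.

```python
def subtract_water(sample, water):
    """Subtract a least-squares-scaled water spectrum from a sample spectrum.

    Returns the scale factor and the residual spectrum.
    """
    scale = sum(s * w for s, w in zip(sample, water)) / sum(w * w for w in water)
    residual = [s - scale * w for s, w in zip(sample, water)]
    return scale, residual


# Synthetic five-point spectra (illustrative values only, not thesis data).
water = [1.0, 2.0, 4.0, 2.0, 1.0]
water_only = [3.0 * w for w in water]            # pure water, three times as intense
with_offset = [3.0 * w + 0.5 for w in water]     # same signal plus a baseline offset

scale_clean, residual_clean = subtract_water(water_only, water)
scale_offset, residual_offset = subtract_water(with_offset, water)

print(scale_clean, residual_clean)
print(round(scale_offset, 3), [round(r, 3) for r in residual_offset])
```

In the clean case the residual is zero everywhere. With the offset, the fitted scale factor is pulled above 3 and the residual shows positive wings and a negative dip under the water band, even though no analyte is present. This is the kind of subtraction artefact described above.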
[Figure 78 plots: intensity versus wavenumber (cm-1). Left legend: L1M1 (99.91%), L1M3 (99.99%), L1M5 (99.98%), H2O. Right legend: L2M1 (0.08%), L2M3 (0.03%), L2M5 (0.01%)]

The calibration methods were used on high (M1Glu) and low concentration (M3Glu, M5Glu, M5eRDF and M5Ye) samples. The performance of the various models reflected the different concentration magnitudes and dataset complexity. However, the same pre-processing and region selection were found to be adequate for all the different D-glucose sample sets. The M5Glu model developed here performed well in the 0–9.92 g/L range giving a REP of ~5%. This was well within the typical concentration range of glucose in media samples for mammalian cell culture (which is typically 1–10 g/L). [23, 299, 300] The glucose concentration required for plant, yeast and bacterium cell lines is higher (20–30 g/L). [24, 25, 301] The Raman method worked to a limited level of accuracy for the D-glucose samples as a result of the low concentrations studied. An improvement of the model performance would be expected with a higher concentration range where the signal would be stronger.

For the quantification of the more complex components, Raman analysis did not perform well. No acceptable model was found for the weak M5Ye samples and the M5eRDF model was three times weaker than the M5Glu model. These results could be improved by a larger number of samples but were intrinsically weakened by the low concentration ranges.

This study has shown that (1) Raman analysis is not sensitive enough for the dilute solutions tested here, (2) the analyte signal is obscured by the water signal and the associated shot noise, and (3) the water elimination correction method added more interfering information rather than being of any benefit to the analyte signal.
For these reasons the next logical step was to use SERS to enhance the analyte signal while limiting the impact of the water signal. SERS can, however, only be applied to yeastolate and eRDF quantification as D-glucose is not a SERS active molecule.

4 Surface Enhanced Raman Spectroscopy (SERS) Analysis of Complex Media Components

4.1 Rationale for Quantitative Analysis using SERS

The Raman method worked for the determination of D-glucose in M5Glu samples with a REP of ~5%, but the method was much less effective or did not work at all for the determination of eRDF and yeastolate. D-glucose was relatively easily quantifiable because it was present in reasonable concentrations. However, yeastolate and eRDF are complex mixtures and therefore the individual analyte concentrations were much lower. For example, the pyridoxine concentration in the M5eRDF samples varied from 0.058–0.376 mg/g. It was also not feasible to measure each individual component as eRDF is composed of over 30 components and yeastolate is even more complex. SERS offered greater sensitivity and was applied to the qualitative analysis of yeastolate and other media components [5, 159]. The use of SERS for monitoring changes in yeastolate showed that the SERS signal from complex media components was a useful qualitative tool for detecting batch to batch variations and storage changes [5]. With this in mind, we propose to use SERS to quantify the eRDF and yeastolate concentrations in cell culture media. Specifically, we want to quantify the global concentration of eRDF/yeastolate and not individual constituents.

This chapter focuses on two topics: (a) the investigation of SERS signals from complex media components eRDF and yeastolate; and (b) the quantification of eRDF and yeastolate in media by SERS. SERS measurements were carried out on the same M5eRDF and M5Ye sample sets as for conventional Raman (Figure 79).
The comparison of SERS spectra obtained by using colloidal silver nanoparticles with the corresponding Raman spectrum revealed a significant enhancement of the Raman signal (Figure 80).

Figure 79 (Left) SERS spectra of M5eRDF samples (1–6.4 g/L), and (Right) SERS spectra of M5Ye samples (0.1–1.72 g/L).

[Figure 79 plots: intensity versus Raman shift (cm-1)]

Figure 80 Conventional Raman spectra, SERS spectra and difference for the M5eRDF samples (Left) and M5Ye samples (Right).

[Figure 80 plots: intensity versus wavenumber (cm-1). Left legend: eRDF AvgSERS, eRDF AVGCR, Difference. Right legend: Ye AvgSERS, Ye AVGCR, Difference]

4.2 Experimental Considerations for SERS Analysis

The silver colloid was prepared by reduction of silver nitrate with sodium citrate using the Lee and Meisel method. Silver colloids made by this method have been shown to be stable and to display activity for several (~6) months [302]. The relative intensity of the SERS signal depended on various experimental parameters such as the quality of the silver colloid, the ratio of the colloid to sample, the use of aggregating agents and the incubation time. No aggregating agents were used in this study as eRDF already contained two common aggregating agents in its formulation, sodium chloride and magnesium sulphate. In the absence of additional external aggregating agents, colloid mixed with M5 media samples gave detectable bands [303].

4.2.1 The Absorption Spectrum (λmax and FWHM)

Batches of silver colloid were repeatedly prepared in order to achieve consistent colloids and were assessed based on their UV-Vis spectrum (Figure 81). The UV-Vis spectrum gave information on the size (λmax) and the size distribution (fwhm) of colloid particles.
For optimum SERS enhancement when using 785 nm excitation, the good quality silver colloids showed λmax values close to ~400 nm and a fwhm of <60 nm [304-306]. An increase in the fwhm value indicated an increasing particle size variation [307]. From our preparation of silver colloids, good batches had an absorption maximum (λmax) of ~406 nm with a full width half maximum of 80 nm. Acceptable Raman spectra were also achieved from colloids with a λmax as high as 412 nm; however, colloids with a λmax of ~430 nm generated poor Raman spectra (Figure 81).

Figure 81 Normalised UV-Vis absorption spectra of ten different batches of silver colloid, where the optimal colloids are the solid lines while the poor performing colloids are represented by the dashed lines.

[Figure 81 plot: intensity versus wavelength (nm). Legend (batch, λmax): B1 408 nm, B2 430 nm, B3 404 nm, B4 406 nm, B5 412 nm, B6 404 nm, B7 404 nm, B8 434 nm, B9 404 nm, B10 404 nm]

In order to overcome batch variation based on colloid particle size, several batches of good quality SERS colloids were mixed together to form a single colloid. Mixing batches minimised batch-to-batch variation which would otherwise have adversely affected spectral reproducibility [306].

4.2.2 Sampling Time

Variation in the time between the addition of silver colloid to the sample and the measurement of the spectra can affect the intensity of the SERS signal. The intensity of the SERS signals will rise, stabilise and decline with different incubation times. After mixing the sample and colloid, a number of competing effects happen. Firstly, as particles aggregate they form junctions and the structure of these junctions generates highly localised plasmon fields. These result in a dramatic enhancement if SERS active analytes are present. These hot spots generate very intense SERS signals.
Secondly, as the nanoparticles aggregate to form bigger particles, these can simply precipitate out of solution, decreasing the signal [5, 233, 308-310]. It was previously seen in an eRDF solution (18 g/L) using a 1:4 sample to colloid ratio that the SERS signal steadily increased for about 6 minutes before levelling off [101]. Therefore, data collection was performed within minutes of colloid addition before levelling off could occur. The colloid was added, the solution was mixed five times and the spectra were then measured. The incubation times for all samples were kept close and as short as possible.

Figure 82 (a) Plot of SERS spectra for M5Ye sample (1.54 g/L) versus time (sixteen measurements taken over an hour), (b) Intensity profiles for selected peaks showing the increasing intensity and (c) the intensity ratio for the selected peaks against the water peak at 1604 cm–1. The SERS spectra were measured using a single point collection with an exposure time of 2×10 s. A sample to colloid ratio of 1:1 was used.

[Figure 82 plot (a): intensity versus wavenumber (cm-1), spectra labelled from 0 mins to 60 mins; panels (b) and (c) not recoverable]

In order to investigate incubation time effects on the M5Ye (yeastolate at 1.54 g/L) and M5eRDF (eRDF at 3.4 g/L) samples, a series of SERS spectra were taken over an hour using a 1:1 sample colloid ratio. Sixteen measurements were made with no re-suspension to show the evolving spectra for M5Ye and M5eRDF (Figure 82a and Figure 83a). The spectra displayed a steady increase in baseline intensity and enhancement with time, indicating a reasonably stable sample colloid mixture with no need of re-suspension. These results differed from the SERS testing of eRDF (17.7 g/L) solution with the 1:4 sample to colloid ratio. In that case, aggregation of the nanoparticles was induced at a higher rate compared to the less concentrated M5Ye and M5eRDF samples [101].
The use of dilute media samples required a smaller quantity of colloid which saw less aggregation occurring and provided a steadily increasing signal for testing. In the M5Ye sample (Figure 82b), the 730 cm–1 peak exhibited the greatest intensity increase followed by the 1332 cm–1 peak, while for the M5eRDF sample (Figure 83b), it was the 650 cm–1 peak along with the 1388 cm–1 that displayed the greatest intensity. We speculated that the 730 cm–1 peak was related to the adenine signal in yeastolate, while the 650 cm–1 peak was a result of the L-cysteine hydrochloride monohydrate present in eRDF and the 1332 cm–1 and 1388 cm–1 signified the amino acid portion of the M5eRDF and M5Ye samples, respectively. In terms of variance compared to the 1604 cm–1 band (OH bending band), there were two trends: the low-wavenumber bands (650 cm–1 and 730 cm–1) showed a relative increase and the high-wavenumber bands remained stable (see Figure 82c and Figure 83c). This could be a result of the high-wavenumber bands relating to the stretching vibrations while the low-wavenumber bands involved bending vibrations (since more energy is required to stretch a group than to bend a group) [311, 312]. Other factors such as the surface orientation of individual analytes may also be the source of differences in the intensity bands. As the micro-environment of the sample was continually changing because of the aggregation of the colloid, more hotspots were forming, resulting in an increasing signal from molecules41 closer to the hotspots [311].

41 Molecules that are perpendicular to the surface are more significantly enhanced than those parallel to the surface.

For these samples, using a 1:1 ratio of sample to colloid gave a stable mixture without re-suspension of the sample as the signal increased or remained constant during the testing period, indicating that precipitation did not occur and the SERS signal was steady.
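The intensity-ratio profiles of the kind plotted in Figure 82c and Figure 83c amount to dividing the intensity of a selected band by the intensity of the 1604 cm–1 reference band at each time point. A minimal sketch of that calculation is given below. The function names are hypothetical, the nearest-channel lookup is an assumption (no peak fitting or baseline correction is attempted), and the spectra are invented numbers chosen to mimic the two trends described above, not measured data.

```python
def band_intensity(wavenumbers, intensities, band):
    """Return the intensity at the wavenumber channel closest to `band`."""
    nearest = min(range(len(wavenumbers)), key=lambda i: abs(wavenumbers[i] - band))
    return intensities[nearest]


def band_ratio(wavenumbers, intensities, band, reference=1604.0):
    """Ratio of a band intensity to the reference (water) band intensity."""
    return (band_intensity(wavenumbers, intensities, band)
            / band_intensity(wavenumbers, intensities, reference))


# Synthetic spectra sampled at five band positions (illustrative values only).
wavenumbers = [650.0, 730.0, 1332.0, 1388.0, 1604.0]
spectra_by_minute = {
    0: [1000.0, 1200.0, 1500.0, 1400.0, 2000.0],
    30: [1400.0, 1800.0, 1875.0, 1750.0, 2500.0],
    60: [1800.0, 2400.0, 2250.0, 2100.0, 3000.0],
}

ratio_730 = {t: band_ratio(wavenumbers, s, 730.0) for t, s in spectra_by_minute.items()}
ratio_1332 = {t: band_ratio(wavenumbers, s, 1332.0) for t, s in spectra_by_minute.items()}

for minute in sorted(spectra_by_minute):
    print(minute, round(ratio_730[minute], 2), round(ratio_1332[minute], 2))
```

In this invented example every band grows in absolute intensity, but only the 730 cm–1 ratio rises (0.60 to 0.80) while the 1332 cm–1 ratio stays at 0.75, which is the pattern of a relative increase for a low-wavenumber band and a stable high-wavenumber band.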
Figure 83 (a) Plot of SERS spectra for M5eRDF sample (3.4 g/L) versus time (sixteen measurements taken over an hour), (b) Intensity profiles for selected peaks showing the increasing intensity and (c) the intensity ratio for the selected peaks against the water peak at 1604 cm–1. The SERS spectra were measured using a single point collection with an exposure time of 2×10 s. A sample to colloid ratio of 1:1 was used.

[Figure 83 plot (a): intensity versus wavenumber (cm-1), spectra labelled from 0 mins to 60 mins; panels (b) and (c) not recoverable]

4.2.3 Reproducibility

Reproducibility was a major issue with SERS measurements. If the sample was left to stand, the colloid was liable to aggregate and precipitate out of solution before testing; therefore, immediate testing of the sample was preferable. PCA scores (Figure 84) of the replicate runs42 for the M5eRDF and M5Ye measurements demonstrated the class variance for the different data collections. M5Ye samples (Figure 84b) were more stable with overlapping ellipsoids, while the M5eRDF samples showed greater variability, especially amongst the low concentration samples. For example, M5eRDFS01 from data collection one (#1) and three (#21) were close but for data collection two (#11), it differed with its high PC2 reading. This may be a low concentration effect where matrix effects were causing more fluctuations to occur, given that the high concentration samples M5eRDFS10 (#10, #20, #30) were grouped together in the centre of the scores plot.

42 The PCA scores for the individual runs are shown in 8.4.1

Figure 84 Scores plots for the three raw data collections for M5eRDF (a) and M5Ye (b), with the replicate runs outlined. The ellipsoids represent the PCA subspace generated for each dataset.
[Figure 84 plots: scores on PC 2 versus scores on PC 1, with Run 1, Run 2 and Run 3 outlined. (a) PC 1 (99.73%), PC 2 (0.14%), points labelled by sample number; (b) PC 1 (99.86%), PC 2 (0.06%)]

4.3 Spectral Analysis of M5eRDF and M5Ye Data

While the conventional Raman spectra of the aqueous solutions of eRDF (17 g/L) and yeastolate (5 g/L) displayed very weak analyte bands with a strong water signal, SERS gave detailed spectra with multiple peaks visible over the original water signal. The SERS spectra of eRDF and yeastolate were visually different (Figure 85 and Figure 86). It was not possible to specifically assign vibrational modes for peaks within complex mixtures but one may speculate as to the origin of certain peaks. The SERS spectrum depended on the Raman response of the media components capable of binding to the colloid surface. For example, within the eRDF and yeastolate samples, some compounds were strongly SERS active and some were not. There were similarities between the two materials in that they both contained amino acids as well as a wide range of other biochemical compounds. It was known that some of these compounds were strongly SERS active; for example, compounds containing nitrogen and sulphur. Common peak positions visible at 730, 802 and 955 cm–1 suggested that the same materials within eRDF and yeastolate were binding [5]. As previously mentioned, both of these have high amino acid concentrations and therefore it may be hypothesised that what we observed in the SERS spectra were most likely signals originating from the amino acid constituents of the media. The peak at 655–666 cm–1 may be assigned to the C–S stretching vibration of cysteine, as L-cysteine hydrochloride monohydrate was present in eRDF.
[313, 314] Previous amino acid studies [315-318] suggest that the bands at 1332–1396 cm–1 were related to the C–COO− stretching vibration and COO− symmetric stretching vibration enhancements. This demonstrated binding to the silver surface through the carboxylic group. The broad band at 1644 cm–1 was the Raman signal from the water solvent (OH bending). In previous work on yeast extracts, the strongest SERS peak at 730 cm–1 was associated with the adenine ring breathing mode as adenine produces a strong SERS signal. [5, 319] Adenine was not, however, listed in the formulations for either eRDF (Table 45) or yeastolate (Table 46). Despite this, since yeastolate was not chemically defined, adenine may be present but may not have been tested in the yeastolate processing. Previous studies have shown adenine to be present in yeast extract because of its cellular role as a direct or indirect building block. Eleven lots of yeast extracts were tested using pre-column derivatisation with reverse phase HPLC and fluorescence detection. The adenine content recorded was an average of 1.16 mg/g ± 0.71 mg/g. [320]

Figure 85 SERS and Raman spectra for an aqueous solution of yeastolate (5 g/L).

[Figure 85 plot: intensity versus wavenumber (cm-1); legend: Yeastolate, Ye SERS; labelled peaks (cm-1): 658, 730, 802, 955, 1027, 1332, 1388, 1644, 2934]

Figure 86 SERS and Raman spectra for an aqueous solution of eRDF (17 g/L).

4.4 Region Selection for Quantitative Analysis

Since the aim was to quantify the eRDF and yeastolate content in the M5 media, spectra were collected at three spatial points per sample and averaged. The averages of the replicate measurements were used to generate the calibration model in order to get a more representative signal. This stage involved determining the most informative spectral data. MWPLS was tested, but was inconclusive, so instead five regions were manually selected (Table 26).
Table 26 Spectral areas selected for calibration modelling of the SERS spectra.

Region ID               Wavenumber Region (cm–1)
Full (F)                250–3300
Reduced Region (ROI)    707–1853
Region A (A)            602–995
Region B (B)            1260–1444
Region A and B (AB)     602–995 & 1260–1444

[Figure 86 plot: intensity versus wavenumber (cm-1); legend: eRDF, eRDF SERS; labelled peaks (cm-1): 666, 730, 802, 899, 955, 1035, 1340, 1396, 1612, 2942]

4.5 Quantitative Analysis of Yeastolate in M5Ye SERS Data

By using SERS, the aim was to improve the calibration modelling that was achieved using conventional Raman. The replicate measurements were also modelled separately. The values for those models are shown in the appendix, see section 8.4.3. Comparison of the models confirmed that averaging the data improved the model performance for the M5Ye data. Table 27 shows the best results of the calibration models for determining yeastolate concentration using different spectral pre-processing methods. Using SERS data it was possible to generate a better correlation between the yeastolate concentration and the M5Ye spectra. The model performance of the conventional Raman data gave an error of 38–47% compared to 10–18% error for these SERS models. However, the error levels were still high and accurate predictions were difficult to reliably obtain.

Compared to the other pre-processing methods (Table 27), the result for FDMSC offered the best performance; the calibration model had a REP of ~12% with three latent variables. Also seen in Table 27 was a high correlation of 0.998 with a lower REP of ~10% for the NormINF model. This model appeared better than the FDMSC model but it was subject to over-fitting43 as indicated by the higher number of latent variables used and the large difference between the RMSEC and RMSECV values. The reduced region (1260–1444 cm–1) of the spectra used for the FDMSC model (Figure 87) contributed to the correlation observed.
From previous studies [95, 145], the bands from 1300 cm–1 to 1400 cm–1 were attributed to CH/CH3 bending and deformation in amino acids. Since there were multiple amino acids within the M5Ye model media, it was not surprising that the amino acid components influenced the calibration model.

Figure 87 SERS spectra for M5Ye of the 1260–1444 cm–1 range after FD-MSC pre-processing with the predicted versus expected plot for the resulting calibration model.

[Figure 87 plots: (left) intensity versus wavenumber (cm-1); (right) predicted (g/L) versus expected (g/L), R2 = 0.979, 3 latent variables, RMSEC = 0.075789, RMSECV = 0.11168]

43 Over-fitting leads to poor prediction results as the model is too specific to the calibration samples.

Table 27 The best results for calibration modelling of yeastolate in the M5Ye SERS data

M5Ye                              LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
WE Data (250–3311 cm–1)           4    0.966   0.09          0.16           17.58
WE MSC (250–3311 cm–1)            4    0.973   0.08          0.15           16.48
WE NINF A (602–995 cm–1)          5    0.998   0.02          0.10           10.98
WE Norm2A (602–995 cm–1)          4    0.985   0.06          0.14           15.38
BC FST11B (1260–1444 cm–1)        3    0.951   0.11          0.15           16.48
BC FST11MSC B (1260–1444 cm–1)    3    0.979   0.07          0.11           12.08

The loadings and scores (Figure 88) explained more about the behaviour of the model. The first loading was representative of the average FDMSC signal with peaks at 1310 cm–1 and 1380 cm–1, while the corresponding first score plot confirmed the increasing signal for an increasing yeastolate concentration.

Figure 88 The loadings and scores versus samples from the M5Ye SERS calibration model with three components: the first component in blue, the second in green and the third in red.

The second loading dealt with the 1310 cm–1 and 1420 cm–1 peaks while the third loading covered the peaks at 1270 cm–1, 1330 cm–1, 1380 cm–1 and 1440 cm–1.
[Figure 88 loadings plot: loadings versus wavenumber (cm-1); legend: LPC1 (88.90%), LPC2 (10.18%), LPC3 (0.64%)]

From the loadings, it was clear that the M5YeS06 was different from the neighbouring samples. However, in the calibration M5YeS06 was on the regression line and was not an outlying sample. Also in the PCA results (Figure 116), M5YeS06 was grouped with the other samples. It seemed to be an anomaly. The fault in this sample may be in the sample preparation, given that when it was compared with the other samples, it displayed a similar spectrum to M5YeS01 and M5YeS02. In the prediction testing of eRDF concentration within the M5YeS06 sample as part of the model evaluation for M5eRDF, the eRDF concentration was double what was expected. This gave merit to the hypothesis of incorrect sample preparation and also gave a reason as to why the M5YeS06 spectrum matched the M5YeS01 and M5YeS02 samples. They shared a common trait: their ratio of eRDF to yeastolate was large compared to the high concentration samples. Another factor linking this M5YeS06 sample to a strong eRDF signal was the proximity of the 1375 cm–1 peak to the 1388 cm–1 peak, which was a significant peak in the M5eRDF spectra (Figure 83).

4.6 Quantitative Analysis of eRDF in M5eRDF SERS Data

Here SERS was used to predict eRDF concentration in the M5eRDF samples. Various pre-processing methods were surveyed for the calibration modelling (see section 8.4.2). The top calibration models are listed in Table 28 with the best M5eRDF model shown in Figure 89. The best model was obtained by using water eliminated, normalised data. The water eliminated data enhanced the peaks in the fingerprint region of 600–1700 cm–1 and at 2900 cm–1. Even though water elimination artefacts were a problem, they were not as significant as those seen in the M1Glu/M3Glu/M5Glu data (Figure 89).
The largest artefact was from the removal of the large OH band which left a large negative peak above 3000 cm–1. The other artefacts were spectral offset and sloping baseline. However, normalisation removed a lot of the measurement error associated with absolute intensity fluctuations, therefore decreasing the impact of these latter artefacts (offset and baseline effects).

Table 28 Comparison between best calibration models after different pre-processing methods from the M5eRDF SERS data

M5eRDF                               LV   R2      RMSEC (g/L)   RMSECV (g/L)   REP%
AvgDataB (1260–1444 cm–1)            3    0.919   0.49          0.63           17.02
WE MSC (250–3311 cm–1)               2    0.908   0.52          0.63           17.02
WE NINF (250–3311 cm–1)              2    0.922   0.48          0.59           15.94
WE Norm2ROI (707–1853 cm–1)          4    0.943   0.41          0.61           16.48
Avg FST11ROI (707–1853 cm–1)         4    0.960   0.34          0.74           20.00
Avg FST11MSCROI (707–1853 cm–1)      4    0.965   0.32          0.74           20.00

Figure 89 Water eliminated SERS spectra after normalising for the M5eRDF over the entire range (250–3311 cm–1) and the predicted versus expected plot for corresponding calibration model.

The REP for this model was ~16%, similar to that achieved with the conventional Raman data. In addition, only two latent variables were used and a low SECV/SEC ratio was noted, unlike the conventional Raman model. Both scores show increasing linear trends for increasing eRDF concentration per sample; see Figure 90. When comparing the scores versus concentration plots, the R2 value for L2 was 0.92 compared to 0.71 for L1. The first loading was representative of the average spectrum after water elimination. This led to a weaker correlation from the signal for the changing eRDF concentration as the largest water elimination artefact was also included in the model. The second variable was better correlated to the increase in eRDF concentration as its fingerprint region was highly detailed and the level of noise from the water elimination was significantly less.
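The SECV/SEC comparison used above as an informal over-fitting indicator can be reproduced from the tabulated errors. The sketch below simply divides RMSECV by RMSEC for three of the SERS models, using the rounded values from Table 27 and Table 28. The function name and dictionary labels are illustrative, and no acceptance threshold is implied.

```python
def cv_to_calibration_ratio(rmsec, rmsecv):
    """Ratio of cross-validation error to calibration error (SECV/SEC)."""
    return rmsecv / rmsec


# (RMSEC, RMSECV) in g/L, rounded values taken from Table 27 and Table 28.
models = {
    "M5eRDF WE NINF, 2 LV": (0.48, 0.59),
    "M5Ye BC FST11MSC B, 3 LV": (0.07, 0.11),
    "M5Ye WE NINF A, 5 LV": (0.02, 0.10),
}

ratios = {name: cv_to_calibration_ratio(*errors) for name, errors in models.items()}

for name, ratio in ratios.items():
    print(f"{name}: SECV/SEC = {ratio:.2f}")
```

The two-variable M5eRDF model gives a ratio of about 1.2, whereas the five-variable NINF yeastolate model gives a ratio of about 5, which is consistent with the over-fitting noted for that model in section 4.5.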
Figure 90 The loadings and scores versus samples for the calibration model of M5eRDF SERS with the first component in blue and the second in green.

[Figure 89 plots: (left) intensity versus wavenumber (cm-1); (right) predicted (g/L) versus expected (g/L), R2 = 0.922, 2 latent variables, RMSEC = 0.4811, RMSECV = 0.5958]

[Figure 90 loadings plot: loadings versus wavenumber (cm-1); legend: LPC1 (95.78%), LPC2 (2.87%)]

4.7 Model Evaluation

The following procedure was used in order to perform the test set evaluation of the best M5eRDF and M5Ye models. For the yeastolate models, the M5eRDF sample set was used to attempt to predict its yeastolate concentration. For the eRDF models, the M5Ye sample set was used to attempt to predict its eRDF concentration. The advantage of doing this was that test sets with a significant in-built variability were being used while also removing the need to build a new test set. These samples were used for the prediction of the stable analyte concentration and also to see how spectral fluctuations may impact the prediction of analyte concentrations.

Table 29 Prediction results based on the M5Ye prediction of yeastolate concentration in M5eRDF and M5eRDF prediction of eRDF concentration in M5Ye.

Sample ID    Expected Ye Conc. (g/L)   SERS Predicted Ye Conc.   Sample ID   Expected eRDF Conc. (g/L)   SERS Predicted eRDF   Raman Predicted eRDF
M5eRDFS01    1                         0.98                      M5YeS01     3.4                         3.98                  3.21
M5eRDFS02    1                         1.60                      M5YeS02     3.4                         3.27                  3.06
M5eRDFS03    1                         1.81                      M5YeS03     3.4                         3.26                  3.03
M5eRDFS04    1                         1.22                      M5YeS04     3.4                         3.40                  3.61
M5eRDFS05    1                         1.00                      M5YeS05     3.4                         3.34                  3.23
M5eRDFS06    1                         0.95                      M5YeS06     3.4                         5.60                  3.10
M5eRDFS07    1                         0.78                      M5YeS07     3.4                         3.28                  2.91
M5eRDFS08    1                         0.59                      M5YeS08     3.4                         3.38                  3.80
M5eRDFS09    1                         0.58                      M5YeS09     3.4                         3.59                  3.24
M5eRDFS10    1                         0.52                      M5YeS10     3.4                         3.51                  3.79

As the REP for the SERS and conventional Raman M5eRDF models were the same, both models were tested for the prediction of eRDF.
Yeastolate prediction used the best M5Ye SERS model and the results showed that prediction was possible (Table 29). The closer the validation samples were to the model samples the better the prediction was (as seen with the M5YeS04 and M5YeS05 for eRDF and samples M5eRDFS05 and M5eRDFS06 for yeastolate). In the design of the experiment, the concentration of the samples overlapped in the mid-point range of the samples, see Table 6 and Table 7. The M5Ye SERS model only predicted three samples accurately. However, for the yeastolate prediction, the influence of the eRDF concentration in the M5eRDF samples was evident. The prediction ability of the samples decreased with increasing eRDF concentration. The prediction of the eRDF concentration was not affected by the varying concentration of yeastolate within the M5Ye samples. Moreover, the SERS model was more accurate than the Raman model. Seven of the ten validation samples were within 10% of the target concentration while only four of the Raman predictions were within this limit.

4.8 General Conclusions: SERS Analysis of M5eRDF and M5YE

SERS was investigated as an analytical tool to improve upon the previous measurements of eRDF and yeastolate in a media environment using conventional Raman. The SERS method showed that it was possible to enhance the analyte signal sufficiently to undertake quantitative ingredient analysis. When comparing the SERS and conventional Raman models for eRDF and yeastolate the following was observed:

For eRDF, the best models gave equivalent prediction errors of ~16%. The SERS model was better than the Raman model, however, because it did not show any over-fitting44, which can lead to inaccurate predictions.

For yeastolate, the SERS model was much improved with a percentage error of ~12% versus ~38% for Raman. The low concentration of the M5Ye sample set with a strong water signal hampered the Raman model.
The SERS method gave the signal enhancement needed for the M5Ye samples to compensate for the strong water signal, which led to a good r2 correlation of 0.979.

44 The Raman model has a high SECV/SEC ratio of 3.5 compared to 1.22 for the SERS model.

SERS showed some promise for quantifying the complex media components as a whole, but the error levels were still too high. A point to note was that the concentration range used here was greater than the ±10% variation typically expected in industrial use. The predictions showed that the best results were closest to the mid-point of the concentration range for the media samples, so a reduction of the concentration span would improve the quantification. In the prediction of yeastolate, the method only worked for the low to mid-range M5eRDF samples, as another SERS-active molecule contained in the eRDF component affected the result. Therefore, using the current setup it would not be feasible to quantify both eRDF and yeastolate simultaneously due to spectral overlap, as there were too few samples and the experimental design was not fully optimised. It may be more feasible once a better sampling and model setup are implemented. This could include: a calibration sample set of more than 60 samples, a reduced range of ±20% of the concentration specification, and a greater number (i.e. more than three) of replicate measurements per calibration sample. These steps should result in more reproducible data collection and yield a more reliable calibration model on which to base accurate predictions.

The overall goal of this work is the development of a robust quantitative method for complex ingredient analysis, and so another approach was investigated. As both eRDF and yeastolate contain fluorophores that produce a distinctive fluorescence spectrum, fluorescence may be capable of quantifying the complex media components as a whole.
5 Fluorescence Spectroscopy Analysis of Complex Media Components

The work in this chapter examined the use of multi-dimensional fluorescence spectroscopy for the quantitative analysis of complex media components. The excitation emission matrices (EEM) and total synchronous fluorescence scans (TSFS) were collected for the M5eRDF and M5Ye sample sets. Both EEM and TSFS data had a reasonable signal from the fluorophores in the M5eRDF and M5Ye samples. This suggested that it may be possible to quantify yeastolate and eRDF using fluorescence and thus produce an analytical method that is non-destructive, sensitive and selective.

Table 30 lists the primary fluorophores in eRDF and yeastolate and their concentration ranges in the prepared samples. The fluorescence emission profiles of eRDF and yeastolate were similar because they both contained many of the same fluorophores (tryptophan, tyrosine and phenylalanine). The overall signals nevertheless differed because the two ingredients have different overall chemical compositions. This gave rise to the individuality within the EEM/TSFS spectra, and allowed for differentiation and quantification of yeastolate and eRDF in a media formulation.

Table 30 Summary of the fluorophores present in eRDF and yeastolate.⁴⁵

| Fluorophore | eRDF (mg/g) | Yeastolate | M5eRDF (mg/g) | M5Ye (mg/g) |
|---|---|---|---|---|
| Tryptophan | 1.08 | 0.5% w/w (5 mg/g) | 6.08–11.92 | 17.5–25.6 |
| Tyrosine | 5.11 | 0.8% w/w (8 mg/g) | 13.11–40.75 | 28–40.96 |
| Phenylalanine | 4.37 | 3.6% w/w (36 mg/g) | 40.37–63.97 | 126–184.32 |
| Riboflavin | 0.011 | Not listed | 0.011–0.075 | 0.04 |
| Pyridoxine | 0.058 | Not listed | 0.058–0.376 | 0.197 |
| Folic acid | 0.517 | Not listed | 0.517–3.308 | 1.757 |
| Phenol Red | 0.294 | Not listed | 0.294–1.882 | 0.999 |

45 Formulation compositions provided by the manufacturers, see 8.2.1 and 8.2.2.
5.1 The EEM/TSFS Analytical Procedure

Multi-dimensional fluorescence data provided information about chemical composition because both the peak intensity and shape of the signal were sensitive to individual and global concentration changes in the analytes present [191, 206, 211, 321]. In this work, EEM/TSFS measurements were used to see if it was feasible to quantify the yeastolate and eRDF concentration in the model media. The NBL group has already demonstrated that EEM can be used for the quantification of individual components [1], media variance and identification applications [4], and media degradation [2]. Fluorescence data is information rich and can be analysed in multiple ways to extract both qualitative and quantitative results. The outline of the fluorescence workflow was:

• Spectral Overview
  o Identify peaks in EEM/TSFS data
• PARAFAC/MCR Analysis
  o Identification of the fluorophores
  o Profile changes for the fluorophores in relation to concentration
• Variance analysis and outlier detection
  o Investigate what causes changes in the spectra by PCA
  o Identify abnormal samples using ROBPCA
• Quantification: UPLS modelling of the media components (eRDF and yeastolate) in the M5eRDF and M5Ye sample sets.

5.2 Spectral Overview of Media Samples (M5eRDF and M5Ye)

Numerous peaks were seen in the EEM spectra obtained for the M5eRDF and M5Ye samples (Figure 91 and Figure 92). The samples from both sample sets had similar peaks, indicating the presence of multiple fluorophores (i.e. the fluorescent amino acids and vitamins) in eRDF and yeastolate. Previous studies in this lab had identified the key fluorophores in both eRDF and yeastolate. The peak locations of the M5eRDF/M5Ye samples in this study were similar to those of the chemically defined media samples described by Calvet et al. and yeastolate samples described by Li et al.
[1, 2, 4, 7]. In their studies, five peaks were identified at λex/λem = 275/310 nm, 260–285/355 nm, 320/390 nm, 365/520 nm and 355/445 nm. These were due to the fluorescence of amino acids (tyrosine and tryptophan) and vitamins (pyridoxine, riboflavin, and folic acid).

Figure 91 EEM landscape plots of (left) an M5eRDF sample (1 g/L eRDF) and (right) an M5Ye sample (0.1 g/L Ye). The Rayleigh scatter was removed from the spectra.

Figure 92 EEM contour profiles⁴⁶ for (a) M5eRDF S01 (1 g/L eRDF), (b) M5eRDF S10 (6.4 g/L eRDF), (c) M5Ye S01 (0.1 g/L Ye) and (d) M5Ye S10 (1.72 g/L Ye).
[Figure 92 axes, panels (a) to (d): excitation wavelength (nm) versus emission wavelength (nm).]

For the EEM landscape plots (Figure 91), the peak of maximum intensity for the M5eRDF and M5Ye data was located at excitation/emission wavelengths (λex/λem) of 285/355 nm. The secondary peaks were located at λex/λem of 280/305 nm, 230/305 nm and 230/360 nm. Second order bands started to appear at λex/λem of 280/595 nm and 230/595 nm. The contour plots (Figure 92) showed the changes in the signals with increasing concentration. For both the M5eRDF and M5Ye samples, the tryptophan signal peak at 285/355 nm dominated. In the case of the M5eRDF samples, the tyrosine peaks at 280/305 nm and 230/305 nm were weak but observable (even after the increase in concentration from 1 g/L to 6.4 g/L). Similarly, the low concentration M5Ye sample displayed the same peaks indicative of tyrosine (at 280/305 nm and 230/305 nm). Tyrosine fluorescence can be difficult to observe clearly due to overlap with the tryptophan emission band and the occurrence of radiative energy transfer (RET) from tyrosine to tryptophan. With increasing sample concentration, the tyrosine signal decreased as the tryptophan signal increased. In both sample sets, the much weaker emission from pyridoxine (325/395 nm), riboflavin (455/520 nm), and folic acid (350/445 nm) only became visible at high concentrations of eRDF and yeastolate.

46 300 contour lines were used with 0.83 spacing starting with 0.90.

TSFS spectra provided an alternative way of measuring the total emission of complex mixtures. When comparing TSFS with the EEM spectra, the output plot and data were orientated differently. In TSFS, the peaks were viewed by plotting the excitation wavelength against the delta wavelength offset (Δλ = λem − λex) with the intensity along the z-axis. The TSFS landscape plots for both M5eRDF and M5Ye (Figure 93) displayed the tyrosine signal at 230 nm excitation, while the tryptophan signal was visible at 285 nm for the M5eRDF sample and at 290 nm for M5Ye.

Figure 93 TSFS landscape plots for (left) M5eRDF media sample (1 g/L eRDF) and (right) M5Ye media sample (0.1 g/L Ye).

In order to compare EEM and TSFS data more easily, the TSFS spectra were replotted after being mathematically transformed into EEM spectra. The transformation involved diagonally stacking the collected data and filling in zero for empty areas. When comparing the contour plots of the TSFS and EEM data (Figure 94c and d), the signal intensities for the samples were different but the peak positions were constant. From the fluorescence profiles of the media samples, it was possible to see that these samples had the same underlying components.
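The rearrangement of TSFS data into EEM format described above can be sketched as follows. This is a minimal illustration, assuming the TSFS matrix is indexed as excitation by delta wavelength and that both grids share a common step; the function name `tsfs_to_eem` and the small wavelength grids are hypothetical and are not taken from the thesis.

```python
import numpy as np

def tsfs_to_eem(tsfs, ex_wl, delta_wl, em_wl):
    """Place each TSFS point (excitation, delta) at (excitation, emission = excitation + delta).

    Cells of the EEM grid that were never measured in the TSFS scan stay at zero.
    """
    ex_wl = np.asarray(ex_wl, dtype=float)
    delta_wl = np.asarray(delta_wl, dtype=float)
    em_wl = np.asarray(em_wl, dtype=float)
    eem = np.zeros((ex_wl.size, em_wl.size))
    for i, ex in enumerate(ex_wl):
        for j, delta in enumerate(delta_wl):
            em = ex + delta  # emission wavelength implied by the offset
            match = np.flatnonzero(np.isclose(em_wl, em))
            if match.size:  # skip points that fall outside the emission grid
                eem[i, match[0]] = tsfs[i, j]
    return eem

# Hypothetical 5 nm grids, for illustration only
ex = np.arange(230, 255, 5)       # 230 ... 250 nm
delta = np.arange(20, 45, 5)      # 20 ... 40 nm
em = np.arange(250, 295, 5)       # 250 ... 290 nm
tsfs = np.arange(1, 26, dtype=float).reshape(5, 5)
eem = tsfs_to_eem(tsfs, ex, delta, em)
```

Each row of the TSFS matrix is shifted along the emission axis by its excitation wavelength, which produces the diagonal band of data with zeros elsewhere.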
The peaks were visible in both the EEM and TSFS contour plots, at the following wavelengths: 285/355 nm (tryptophan, 1), 280/305 nm (tyrosine, 2), 325/395 nm (pyridoxine, 3), 455/520 nm (riboflavin, 4) and weakly at 350/445 nm (folic acid, 5) [1].

Figure 94 (a) TSFS landscape for M5YeS05 media sample (0.82 g/L) and (b) TSFS contour profile, (c) rearranged TSFS profile into EEM format and (d) EEM contour profile for M5YeS05 media sample.⁴⁶

Figure 95 (a) TSFS landscape plot for M5eRDFS05 media sample (3.4 g/L), (b) TSFS contour profile, (c) rearranged TSFS contour profile into EEM format and (d) EEM contour profile for M5eRDFS05 media sample.⁴⁶
[Figures 94 and 95 axes: (a) intensity versus delta wavelength (nm) and excitation wavelength (nm); (b) delta wavelength (nm) versus excitation wavelength (nm); (c) and (d) excitation wavelength (nm) versus emission wavelength (nm), with peaks labelled (1) to (5).]

5.3 Assessing Fluorophore Contributions from Media Fluorescence

In the model media, the fluorophores which contributed most significantly to the EEM and TSFS profiles were tentatively identified by visual inspection. These findings were then corroborated with literature references [1, 2, 4, 7, 8].
While this was acceptable as an initial inspection, a more precise identification of the fluorophores was needed. A more rigorous approach was to use a mathematical, factor-based chemometric method such as PARAFAC or MCR in order to identify fluorophores and to analyse emission changes [250, 322-324]. PARAFAC decomposed the fluorescence data into the individual excitation and emission profiles for the fluorophores in the sample. The PARAFAC model gave loadings to help determine the fluorophores present and additionally generated a score associated with each component (fluorophore). Changes in each component's contribution to the EEM spectrum were thus quantified as sample composition varied. PARAFAC scores may not correlate with fluorophore concentration because of non-linearities caused by IFE, RET, etc.

TSFS data was not suitable for PARAFAC analysis as it was not tri-linear [246]. PARAFAC was optimized to work with tri-linear data, which was characterised by the fact that each component displayed the same pattern (profile) for the different samples in both excitation and emission modes. Within the M5eRDF/M5Ye samples, the complex sample matrix affected the fluorescence profile, leading to non-linear data. This was one of the challenges with these complex media, where the emission and excitation spectra of many fluorophores overlapped. In the PARAFAC results, the unique profiles obtained were not the true profiles of the components, as the data was not tri-linear⁴⁷. Since the PARAFAC results were found to be lacking, another factor analysis method (MCR) was utilised to help determine and better understand the behaviour of the underlying components. MCR worked better with non-trilinear data for the evaluation of the fluorescent components by bilinear decomposition [325].

47 For the data to be tri-linear, the same sample profile must hold for different samples but can be scaled differently as a result of changing concentration.
MCR was applied to both EEM and TSFS data to solve for the underlying components. For the M5eRDF and M5Ye samples, the resolution of the components improved with the use of MCR (Table 31).

Table 31 The number of fluorophores/components determined by PARAFAC and MCR for M5eRDF and M5Ye

| Dataset | Model Type | No. of Components |
|---|---|---|
| M5Ye EEM | PARAFAC | 3 |
| M5eRDF EEM | PARAFAC | 2 |
| M5Ye EEM | MCR | 5 |
| M5Ye TSFS | MCR | 6 |
| M5eRDF EEM | MCR | 5 |
| M5eRDF TSFS | MCR | 6 |

5.3.1 Fluorophore Identification and Profile Changes by PARAFAC

PARAFAC generated two types of plots: (1) the loadings for the M5eRDF and M5Ye models (Figure 96 and Figure 98), which approximated the emission or excitation spectra, and (2) the scores, which showed the relative contribution of each factor/spectrum. From the PARAFAC scores the degree of spectral change can be approximated (Figure 97), and the possibility of a correlation between scores data and yeastolate/eRDF concentration was investigated. The scores allowed for visualisation of how each individual component (or mixture of components) varied in terms of contribution as the yeastolate or eRDF concentration changed. The differences in the intensity values of the EEM spectra (Figure 99 and Figure 100) showed the changing composition between low and high concentration samples. In samples of higher concentration, the loss of tyrosine signal in favour of the tryptophan signal was evident and the lesser fluorophores (pyridoxine and riboflavin) became more defined.

Figure 96 PARAFAC loadings excitation (left) and emission (right) for M5eRDF for the replicate EEM data collections.
[Figure 96 axes: loadings versus excitation wavelength (nm) and versus emission wavelength (nm); traces L1 and L2 for replicates R1 to R3.]

The M5eRDF EEM data used a two component PARAFAC model (Figure 96); however, the components were not well resolved.
In the PARAFAC emission loadings, the first loading was clearly tryptophan with a contribution from tyrosine (visible as a shoulder at ~310 nm). The second emission loading was a composite emission from multiple fluorophores (folic acid, vitamin B6 and its derivatives, and riboflavin) [326-328]. The concentration change between M5eRDFS01 and M5eRDFS10 was less visible in the contour plots (Figure 99) as these samples were more concentrated than the low concentration M5Ye samples. They were therefore subject to more IFEs. The PARAFAC score for component one showed an increase for each sample with increasing eRDF concentration, while the second score showed a stable signal with minor changes for each sample.

Figure 97 PARAFAC scores results for M5eRDF (left) and M5Ye (right).
[Figure 97 axes: scores versus samples 1 to 10. M5eRDF: Component 1 (72.52%), Component 2 (27.47%). M5Ye: Component 1 (53.00%), Component 2 (39.41%), Component 3 (7.58%).]

Figure 98 PARAFAC loadings excitation (left) and emission (right) of M5Ye for the replicate EEM data collections.
[Figure 98 axes: loadings versus excitation wavelength (nm) and versus emission wavelength (nm); traces L1 to L3 for replicates R1 to R3.]

PARAFAC of the M5Ye EEM data revealed three components (Figure 98). The first component was clearly tryptophan. The second component featured two unresolved bands from tyrosine with a shoulder peak indicative of tryptophan. The third component represented an amalgamated peak for multiple fluorophores (pyridoxine, folic acid and a shoulder for riboflavin at 520 nm). From Figure 100, the peak intensity differed between the M5YeS01 and M5YeS10 samples, as the fluorophore contributions evolved with the changing concentration.
This was in agreement with the PARAFAC scores, where components one and three increased with increasing yeastolate concentration. In contrast, component two (which related to tyrosine) decreased as the tryptophan signal became more dominant at the higher yeastolate concentrations. For M5Ye the third component represented a merged signal of pyridoxine, folic acid and riboflavin; this signal only became visible at higher concentration. The corresponding PARAFAC scores therefore increased from sample to sample.

This PARAFAC result indicated that there was more variance in the M5Ye samples than in the M5eRDF samples. The changes in concentration for the M5Ye samples were more significant, as larger changes in the profile and the intensity were seen. While the M5eRDF sample set had a higher overall concentration, it was also more susceptible to IFE, energy transfer and quenching, which resulted in less dynamic changes in this complex medium. The PARAFAC scores, however, showed that there was some correlation with the change in fluorophores as the eRDF concentration increased.

Figure 99 Comparison of EEM contour plots for the low concentration M5eRDFS01 (1 g/L) and the high concentration M5eRDFS10 (6.4 g/L) samples; the low to high concentration is based on the added eRDF.⁴⁶

Figure 100 Comparison of EEM contour plots for the low concentration M5YeS01 (0.1 g/L) and the high concentration M5YeS10 (1.72 g/L) samples; the low to high concentration is based on the added yeastolate.⁴⁶
[Figures 99 and 100 axes: excitation wavelength (nm) versus emission wavelength (nm).]

PARAFAC was able to resolve some components or groups of components as a result of the responses to the eRDF/yeastolate concentration increase.
In previous PARAFAC studies on complex mixtures, for example with polycyclic aromatic hydrocarbons (PAHs), it was shown that components tend to be grouped into classes based on similar spectral and quenching characteristics [329]. This was also observed here with the M5eRDF and M5Ye samples, where grouping of amino acids and vitamins was visible in the PARAFAC loadings. This was due to the non-linearity experienced in the matrix, where different regions of the EEM (and thus different fluorophores) were affected in different ways. This grouping confirmed that PARAFAC was not very good at resolving the individual fluorophores in these complex cell culture media.

The PARAFAC results obtained were different from previous studies on cell culture media; in the study by Calvet et al., it was feasible to resolve more than two or three components [1, 2]. The main difference came from the co-linearity in the variation of several fluorophores increasing together within the yeastolate and eRDF. This behaviour was not modelled by PARAFAC because it was not able to determine if the fluorophores were different components, and also because PARAFAC assumes constant profiles in all dimensions⁴⁸, which was not the case. The complex M5eRDF and M5Ye samples gave rise to non-linearity in the EEM data. This non-linearity could be resolved by significantly diluting the samples so that the interactions (energy transfer/quenching/IFE) between components were minimised. This was not, however, an ideal solution for media analysis where minimal sample handling was desired.

48 For successful PARAFAC, the profile response should be the same.
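The tri-linearity assumption discussed above can be made concrete with a small numerical sketch. It builds an ideal three-way cube from sample scores and fixed excitation and emission profiles, which is the structure PARAFAC assumes. The function name `trilinear_cube`, the Gaussian band shapes, band widths and score values are all illustrative assumptions, not data from this work.

```python
import numpy as np

def trilinear_cube(scores, ex_profiles, em_profiles):
    """Build X[i, j, k] = sum_f scores[i, f] * ex_profiles[j, f] * em_profiles[k, f]."""
    return np.einsum("if,jf,kf->ijk", scores, ex_profiles, em_profiles)

def gaussian_band(axis, centre, width=15.0):
    # Illustrative band shape only; real fluorophore profiles are not Gaussian
    return np.exp(-0.5 * ((axis - centre) / width) ** 2)

ex_axis = np.arange(230, 505, 5, dtype=float)   # excitation grid (nm), assumed
em_axis = np.arange(250, 605, 5, dtype=float)   # emission grid (nm), assumed

# Two hypothetical components placed near the tryptophan and tyrosine peak positions
ex_p = np.column_stack([gaussian_band(ex_axis, 285), gaussian_band(ex_axis, 280)])
em_p = np.column_stack([gaussian_band(em_axis, 355), gaussian_band(em_axis, 305)])
scores = np.array([[1.0, 0.5], [2.0, 0.6], [3.0, 0.7]])  # three samples, made-up values

cube = trilinear_cube(scores, ex_p, em_p)

# With one component, every sample slab is the same landscape scaled by its score
cube_one = trilinear_cube(scores[:, :1], ex_p[:, :1], em_p[:, :1])
```

In real media samples, IFE, quenching and energy transfer change the shape of each slab with concentration, so the slabs are no longer scaled copies of fixed profiles and the model above no longer holds exactly.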
Since PARAFAC did not effectively resolve the individual fluorophores, another factor-based analysis method (MCR) was used in the hope of better elucidating the fluorophores in the M5eRDF and M5Ye data.

5.3.2 Fluorophore Identification by MCR Analysis

Spectral profiles were obtained by MCR for the EEM data (Figure 101) and for the TSFS data (Figure 102), with the data separated into five and six component models respectively.

Figure 101 Resolved emission profiles of the M5eRDF (left) and M5Ye (right) for EEM MCR models.
[Figure 101 axes: scores versus emission wavelength (nm), components S1 to S5.]

Figure 102 Resolved Delta Profiles for M5eRDF (left) and M5Ye (right) from the TSFS data.
[Figure 102 axes: scores versus delta wavelength (nm), components S1 to S6.]

The EEM and TSFS spectra were collected for five pure fluorophores (tyrosine, tryptophan, riboflavin, pyridoxine and folic acid) that were listed in the eRDF formulation, Table 45. The excitation, emission and delta profiles were then resolved by MCR for comparison to the unknown fluorophore profiles recovered from the M5eRDF and M5Ye samples. The extracted MCR profiles for the media samples reflected the relative fluorescence emission between components, and their scores indicated their changes in the media environment. When the recovered emission profiles for the EEM data were compared with the pure spectra of tryptophan, tyrosine, pyridoxine and riboflavin, very close agreement was achieved (Figure 103).
There were shifts in the maximum band position between the pure and recovered profiles, as the extracted profiles were affected by the sample complexity. The emission profiles recovered for the same fluorophore in the different media environments (M5eRDF/M5Ye) were in closer agreement with each other than with the pure fluorophore spectra. These spectral shifts were the result of energy transfer/quenching/IFE that occurred within the media samples. Tyrosine was red shifted by ~10 nm, caused by energy transfer as its emission overlapped with the absorption of tryptophan. There was a large difference between the final recovered component and the pure folic acid spectrum; this indicated that component five was not folic acid. It was difficult to determine the number of fluorophores present above the 375 nm emission region because of the lower signal intensity. The 450 nm peak could be a secondary excitation band for riboflavin [325].

In a proposed assignment of the unknown component five, it was noted that the fluorescence behaviour was similar to that of the biogenic fluorophores NADH and NADPH. NADPH fluorescence at 360/460 nm was seen (Figure 26) in the analysis of yeast samples and, because yeastolate is a digest of yeast, NADPH may be a component of yeastolate [211]. However, without comprehensive compositional information about the yeastolate, it could not be confirmed whether NADPH was present. The compositional information for yeastolate (listed in Table 46) was limited mainly to the amino acid and mineral content.

Figure 103 Emission spectra resolved by MCR from the EEM data for the pure components (solid line), M5Ye (dotted) and M5eRDF (dashed). The spectra were normalised to area equal to one.

For TSFS data, both the excitation and delta profiles (Figure 104 and Figure 105) were considered. Six components were recovered for both M5eRDF and M5Ye, and when compared to the pure component profiles, they did not align well for the suspected components.
From these plots, the TSFS components could not be clearly assigned to the pure component spectra. For example, in both the M5eRDF and M5Ye data, the excitation profiles showed that all bands were excited in the 280 nm to 305 nm range (Table 32). This covered the tyrosine and tryptophan excitation range. However, pyridoxine, riboflavin and folic acid were not clearly indicated by the excitation profiles, showing that they were minor contributors to the overall sample fluorescence. The dynamic environment allowed molecules to undergo interactions with other media components, resulting in various micro-states. TSFS proved to be too sensitive to these changes. Therefore the elucidation of components from TSFS data was not as clear as from the EEM data.

It was easier to interpret the emission profiles for the EEM data than the delta profiles for the TSFS data. Since both collection methods analysed the same samples, this difference may be caused by the different ways in which the two data types present the information. From Figure 104 and Figure 105, the excitation profiles for the fluorophores were very close. One of the reasons for this was that only a 5 nm step was used between each excitation and delta profile. The offset measurement approach did not allow for clear resolution of the peaks but the EEM method did. Therefore, for complicated media samples, the EEM MCR approach offered the best method for identifying the specific components.

[Figure 103 axes: intensity versus emission wavelength (nm); pure component excitation profiles, intensity (AU) versus excitation wavelength (nm), for Trp, Tyr, Py, Rb and FA.]

Figure 104 TSFS delta and excitation profiles for M5eRDF compared to the pure component profiles (coloured traces) resolved by MCR. The spectra were normalised.
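The bilinear decomposition behind MCR, D ≈ C·S, can be sketched with a bare alternating least squares loop. This is a simplified illustration under stated assumptions: non-negativity is imposed by clipping, the data are synthetic and noise-free, and the initial spectral guess is a mild mixture of the true profiles. It is not the MCR implementation or the constraint set used in this work, and `mcr_als` is a hypothetical name.

```python
import numpy as np

def mcr_als(D, S_init, n_iter=50):
    """Alternate least-squares updates of C (samples x components) and S (components x channels).

    Non-negativity is imposed crudely by clipping after each update (an assumption of this sketch).
    """
    S = np.clip(np.asarray(S_init, dtype=float), 0.0, None)
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S), 0.0, None)   # contributions given spectra
        S = np.clip(np.linalg.pinv(C) @ D, 0.0, None)   # spectra given contributions
    return C, S

# Synthetic two-component data (illustrative only)
channels = np.arange(300, 605, 5, dtype=float)
S_true = np.vstack([
    np.exp(-0.5 * ((channels - 355) / 20.0) ** 2),
    np.exp(-0.5 * ((channels - 445) / 25.0) ** 2),
])
C_true = np.column_stack([np.linspace(1.0, 2.0, 6), np.linspace(2.0, 1.0, 6)])
D = C_true @ S_true

# Initial guess: true spectra slightly mixed together
mixing = np.array([[1.0, 0.05], [0.05, 1.0]])
C_hat, S_hat = mcr_als(D, mixing @ S_true)
relative_residual = np.linalg.norm(D - C_hat @ S_hat) / np.linalg.norm(D)
```

The fit reproduces D closely even though the recovered spectra remain mixtures of the true ones. This rotational ambiguity is a general property of bilinear models, which is why resolved profiles are usually checked against pure component spectra.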
[Figure 104 axes: intensity versus excitation wavelength (nm) and versus delta wavelength (nm); traces C1 to C5, FA, Py, Rb, Tyr and Trp.]

Figure 105 TSFS delta and excitation profiles for M5Ye compared to the pure component profiles (coloured traces) resolved by MCR. The spectra were normalised.
[Figure 105 axes: intensity versus delta wavelength (nm) and versus excitation wavelength (nm); traces C1 to C5, FA, Py, Rb, Tyr and Trp.]

Table 32 Excitation, Emission and % Fit values for MCR TSFS models using 1–6 factors. The emission wavelength (λem) was obtained by adding the delta (Δλ) to the excitation wavelength (λex).

| Pure Standard | λex (nm) | λem (nm) | M5Ye | λex (nm) | λem (nm) | % Fit | M5eRDF | λex (nm) | λem (nm) | % Fit |
|---|---|---|---|---|---|---|---|---|---|---|
| Trp | 270 | 360 | C1 | 285 | 360 | 34.6 | C1 | 285 | 360 | 41.7 |
| Tyr | 270 | 320 | C2 | 290 | 345 | 22.4 | C2 | 290 | 345 | 18.8 |
| Py | 355 | 400 | C3 | 280 | 310 | 2.7 | C3 | 280 | 310 | 5.3 |
| Rb | 400 | 530 | C4 | 305 | 435 | 24.1 | C4 | 305 | 430 | 18.6 |
| FA | 440 | 490 | C5 | 285 | 385 | 14.0 | C5 | 285 | 300 | 12.7 |
| | | | C6 | 285 | 300 | 1.7 | C6 | 285 | 385 | 2.4 |

The EEM scores plot (Figure 106) showed an increase of two components (#2 and #3) from M5eRDF sample 1 to sample 10, while the other components showed little change. The TSFS scores plot showed that all components (bar component four) displayed an increase from M5eRDF sample 1 to sample 10. Component four gave a stable signal across the scores plot.

Figure 106 M5eRDF MCR scores for EEM model (left) and TSFS model (right).

The scores (Figure 107) for both the EEM and TSFS M5Ye data showed a similar trend. The scores of the components emitting at shorter wavelengths decreased, the intermediate components were stable, and the scores of the components emitting at longer wavelengths increased. This pattern was clearer in the TSFS scores.
The short wavelength components were subject to more IFE than the long wavelength components (as a result of the higher absorbance that occurred in these regions). The M5Ye scores also revealed that M5YeS01 deviated from the other score points. This observation was confirmed by the ROBPCA results, where the M5YeS01 sample was seen as different from the other samples. The reason for the altered profile was the low sample concentration, which gave rise to a more dilute spectral profile for M5YeS01 than for the other samples.

Figure 107 M5Ye MCR scores plots for EEM model (left) and TSFS model (right).
[Figures 106 and 107 axes: scores versus samples 1 to 10; components C1 to C5 for EEM and C1 to C6 for TSFS.]

The MCR scores for all components varied as eRDF or yeastolate concentration increased. It was therefore appropriate that the full or reduced spectral area be used for the quantitative modelling. To quantify individual fluorophores, a more restricted emission range could be used instead [1].

5.4 Variance Analysis

5.4.1 PCA Analysis

To assess the degree of spectral variation of the M5 sample sets, PCA was performed individually on the M5eRDF and M5Ye data and then on the combined (M5eRDF+M5Ye) sample set. PCA of the combined sample sets clarified the size of the variance caused by changing each component, and revealed which components contributed to the changes seen in the EEM signal.⁴⁹

Figure 108 Graphic results for the PCA analysis of M5eRDF/M5Ye comparison, the arrows show the samples going from low to high concentration. (Top Left) PC1 vs PC2 Scores plot, (Top Right) Loadings for PC1 (97.15%), (Bottom Left) Loadings for PC2 (2.56%), (Bottom Right) Loadings for PC3 (0.17%).
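The PCA step described above can be sketched with a singular value decomposition of a two-way data matrix (samples by spectral variables). This is a generic illustration rather than the software or settings used in this work: mean centring is an assumption exposed through the `center` flag, `pca_svd` is a hypothetical helper name, and the test data are synthetic.

```python
import numpy as np

def pca_svd(X, n_components=2, center=True):
    """Return scores, loadings and explained-variance fractions from an SVD of X."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0) if center else np.zeros(X.shape[1])
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)                    # fraction of variance per component
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates
    loadings = Vt[:n_components]                       # spectral directions
    return scores, loadings, explained[:n_components], mean

# Synthetic data: one spectral pattern whose amplitude changes from sample to sample
amplitude = np.linspace(-1.0, 1.0, 10)
pattern = np.array([1.0, 2.0, 3.0])
X_demo = 5.0 + np.outer(amplitude, pattern)

scores, loadings, explained, mean = pca_svd(X_demo, n_components=2)
X_rebuilt = scores @ loadings + mean
```

Plotting the first two columns of `scores` against each other gives a scores plot of the kind shown in Figure 108, while each row of `loadings` can be plotted against wavelength.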
49 The same trends occurred with the TSFS data and these results are not shown.

[Figure 108 axes: loadings on PC1 (97.15%), PC2 (2.56%) and PC3 (0.17%) versus emission wavelength (nm); scores on PC2 (2.56%) versus scores on PC1 (97.15%) for the M5eRDF and M5Ye samples.]

It was not possible to fully segregate the M5eRDF and M5Ye samples in the scores plots, as they partially overlapped due to the common fluorophores. From the analysis of the loadings plots, it was proposed that PC1 was largely tryptophan signal and PC2 was largely tyrosine signal, while PC3 was an unresolved tryptophan and pyridoxine peak. From Figure 108 (Top Left), it was clear that changes in the M5Ye samples were greater along both PC1 and PC2 than for M5eRDF. This was attributed to the decrease in the tyrosine signal and increase in the tryptophan signal with yeastolate concentration changes (Figure 100 and Figure 97). The M5eRDF PCA result was less susceptible to variation as a result of the higher concentration of these samples. This led to less dynamic change being observed in the fluorescence signal with increasing concentration, as seen with the MCR scores (Figure 106).

In the PCA analysis of the individual datasets, only two principal components were used to describe the variance in each sample set. For the individual M5Ye PCA model, the first component described 95.53% of the explained variance while the second component described 4.23%. The first component showed a peak with emission at 355 nm, i.e. tryptophan, which was the dominant feature in both the M5eRDF and M5Ye emission.
The second component of the individual M5Ye PCA model matched the second component of the combined model, i.e. tyrosine. Besides the peak for tyrosine, PC2 also had a negative portion, which was representative of the interaction with tryptophan as the concentration changed [330]. With increasing tryptophan and tyrosine concentration, tryptophan emission became dominant and radiative energy transfer potentially occurred between the tyrosine and tryptophan fluorophores, leading to a reduced signal from the tyrosine peak [331]. For the individual M5eRDF PCA model, the first component (emission peak at 355 nm) described 99.68% of the explained variance, while the second component (0.3%) matched the third component of the combined PCA model. Considering the strength of the tryptophan signal in the M5eRDF, the observed variation may be attributed to IFE and energy transfer interactions that occur with tryptophan within the M5eRDF samples.

This variance, created by the differing fluorophores, IFE, energy transfer interactions, and quenching occurring within each sample, will be used to correlate the eRDF and yeastolate concentrations to the gross fluorescence signal.

5.4.2 ROBPCA Analysis
While the PCA modelling provided a good insight into the magnitude of the fluorescence variance within the M5eRDF and M5Ye samples, it was less sensitive to subtle differences than robust PCA [236]. ROBPCA flagged samples that differed from the majority of the data as outliers. For ROBPCA modelling, the fluorescence data was unfolded from a three-way data cube (sample vs. excitation wavelength vs. emission wavelength) to a two-way data array (sample vs. excitation/emission pair). An unfolded landscape is shown in Figure 109.

Figure 109 EEM landscape spectra of a single M5eRDF solution (5.8 g/L) and unfolded spectra of M5eRDF.

Figure 110 ROBPCA Scores plot for PC1 versus PC2 of M5eRDF/M5YE comparison, EEM Scores (left) and TSFS scores (right).
The arrows indicate the direction of the changing concentration from low to high.
[Plot axes for Figure 109: intensity versus emission and excitation wavelength (nm); intensity versus data points. Plot axes for Figure 110: ROBPCA scores plots with axis labels (69.22%) and (22.60%) in one panel and (73.68%) and (22.10%) in the other; legend M5eRDF and M5Ye.]

The ROBPCA result showed that the sample concentration influenced sample fluorescence, as the sample scores followed the increasing concentration (Figure 110). As the concentration changed in the media, so did the photophysical behaviour, as IFE, quenching and energy transfer increased. This caused differences in the spectra, so the samples at either end of the concentration range tended to be very different. The ROBPCA subspaces for EEM and TSFS were orientated differently because of the difference in the way the data was presented. When comparing the ROBPCA scores (Figure 110) to the PCA scores (Figure 108, top left), the same crossing-over pattern was observed as the samples went from low to high concentration.

Table 33 Outlying observations from the EEM and TSFS datasets for M5eRDF, with the outlying samples being identified by ROBPCA.
Dataset | PCs | Outliers Identified | Outlier Type
M5eRDF EEM R1 | 3 | 1, 10 | Good Leverage (1), Bad Leverage (10)
M5eRDF EEM R2 | 3 | 1, 9 | Bad Leverage (1), Good Leverage (9)
M5eRDF EEM R3 | 2 | 1, 2, 9 | Orthogonal (1), Bad Leverage (2), Good Leverage (9)
M5eRDF EEM 30 | 3 | 8, 10, 16, 21 | Good Leverage (8), Bad Leverage (10, 21), Orthogonal (16)
M5eRDF TSFS R1 | 3 | 1, 8 | Good Leverage (1), Bad Leverage (8)
M5eRDF TSFS R2 | 3 | 8, 10 | Bad Leverage (10), Good Leverage (8)
M5eRDF TSFS R3 | 2 | 1, 9 | Orthogonal (1), Bad Leverage (9)
M5eRDF TSFS 30 | 4 | 8, 9, 16, 21 | Good Leverage (8, 16), Bad Leverage (9, 21)

Samples with either low or high additive concentration were identified as outliers. The samples at the edges of the data distribution were still valid samples, but they were easily flagged as outliers because of the small sample sets. It was, however, unwise to disregard them, since the variance was a result of the variability associated with the limited linear range. The majority of outliers in Table 33 and Table 34 were either good or bad leverage points, indicating that they only deviated along a single aspect of the spectra. The orthogonal outliers, in contrast, occurred when there was a significant change from the mean data profile. A tentative assumption was made here that good or bad leverage points were representative of a spike in fluorophore intensity within a sample, while the orthogonal outliers were more frequently seen with the low concentration samples. This was due to the low tryptophan signal within the low concentration samples, which set them apart from the mean data. The PCA results showed that the primary component (mean signal) in both the M5eRDF and M5Ye was an emission peak at 355 nm, which was indicative of tryptophan.

Table 34 Outlying observations from the EEM and TSFS datasets for M5Ye, with the outlying samples being identified by ROBPCA.
Dataset | PCs | Outliers Identified | Outlier Type
M5YE EEM R1 | 3 | 2 | Orthogonal
M5YE EEM R2 | 3 | 1 | Orthogonal
M5YE EEM R3 | 2 | 1, 2 | Orthogonal
M5YE EEM 30 | 4 | 10, 11, 17, 21, 20, 30 | Good Leverage (11, 21), Bad Leverage (10, 17, 20, 30)
M5YE TSFS R1 | 2 | 1, 2 | Orthogonal
M5YE TSFS R2 | 3 | 1, 9 | Bad Leverage (1), Good Leverage (9)
M5YE TSFS R3 | 2 | 1, 2 | Orthogonal
M5YE TSFS 30 | 4 | 10, 11, 14, 17, 21, 27 | Good Leverage (10), Bad Leverage (14, 27), Orthogonal (11, 17, 21)

For the M5eRDF and M5Ye datasets, the outliers identified using ROBPCA were not seen in the SERS and Raman data because only PCA was carried out for these methods. The PCA results gave no outliers for the Raman or SERS data. In the SERS modelling of the M5Ye, the PLS scores revealed that M5YeS06 deviated along the second and third components of the model. This type of behaviour would probably have been picked up by a ROBPCA model, since for the fluorescence data ROBPCA highlighted the end-range samples as deviating from the mean whereas PCA gave no outliers. The concentration range of these sample sets was too wide and exceeded the linear range. As a result, excluding end-range samples to reduce the concentration range may be necessary in order to achieve accurate quantification. In essence, this indicated that the fluorescence method was more sensitive to matrix effects and that smaller concentration ranges should be used. A similar situation was observed for single component quantification modelling [1].

5.5 Quantitative Analysis of M5eRDF and M5Ye
The PCA and ROBPCA scores plots (Figure 108 and Figure 110) revealed a clear linear trend with increasing concentrations of yeastolate and eRDF in the M5Ye and M5eRDF data respectively. This trend indicated that the fluorescence data was suitable for developing linear PLS models for yeastolate and eRDF. The EEM and TSFS data was unfolded prior to quantitative analysis by PLS (Figure 109).
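The unfolding step described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the software used in this work; the function name `unfold_cube` and the toy cube dimensions are assumptions made for the example. Each sample's excitation by emission matrix is flattened row by row, so that every column of the resulting two-way array corresponds to one excitation/emission pair.

```python
from typing import List

Cube = List[List[List[float]]]  # indexed as [sample][excitation][emission]


def unfold_cube(cube: Cube) -> List[List[float]]:
    """Unfold a three-way EEM cube into a two-way (sample x ex/em pair) array."""
    unfolded = []
    for landscape in cube:
        n_em = len(landscape[0])
        # Guard against ragged landscapes, which cannot be unfolded consistently.
        if any(len(row) != n_em for row in landscape):
            raise ValueError("every excitation row needs the same emission length")
        # Concatenate the emission spectra, one excitation wavelength after another.
        unfolded.append([value for row in landscape for value in row])
    return unfolded


# Toy cube (hypothetical numbers): 2 samples, 3 excitation and 4 emission wavelengths.
toy_cube = [
    [[s * 100 + ex * 10 + em for em in range(4)] for ex in range(3)]
    for s in range(2)
]
X = unfold_cube(toy_cube)
print(len(X), len(X[0]))  # number of samples, number of excitation/emission pairs
```

The unfolded array has one row per sample, which is the layout that two-way methods such as PCA, ROBPCA and PLS expect.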
The use of unfolded PLS was chosen because it takes the analyte-background interactions into consideration when describing the calibration data, as part of the number of latent variables (components) selected [332]. During the development of the fluorescence-based calibration models, the full region and two reduced regions were selected to focus on the strong analyte signal. The contour plots (Figure 111) for the EEM of the full and the two reduced regions showed the varying levels of information in the different regions.

Figure 111 Contour plots for the Full Range (λex/λem 230–520/270–600 nm), Reduced Area A (λex/λem 230–315/270–435 nm) and Reduced Area B (λex/λem 250–360/285–425 nm) of a single M5eRDF solution.46
[Contour plot axes: excitation wavelength (nm) versus emission wavelength (nm); panels (a), (b) and (c).]

For the EEM data the selected regions were:
o Reduced region A - excitation 230–315 nm and emission 270–435 nm
o Reduced region B - excitation 250–360 nm and emission 285–425 nm
For the TSFS data the selected regions were:
o Reduced region A - excitation 230–400 nm and ∆λ 10–180 nm
o Reduced region B - excitation 250–310 nm and ∆λ 10–140 nm

Unfolded PLS was performed on the individual data collections for the EEM and TSFS datasets with the inclusion of all the samples (Table 137 to Table 140). Some anomalies were noted, however, and so unfolded PLS was repeated on the averaged data of the M5eRDF and M5Ye sample sets (Table 35 to Table 38). When unfolded PLS modelling was performed on the individual runs for the M5eRDF data, the sample M5eRDFS09R1 deviated from the expected measurement line.
As a result, this sample was removed when generating the averaged M5eRDF sample set. The ROBPCA results were used as a guide for erroneous samples but, with so few samples over a large range, the ROBPCA results were hypersensitive to linear deviations.

For the M5Ye samples, the unfolded PLS on the individual runs revealed two anomalies. First, the sample M5YeS01R1 was an odd measurement, and its spectrum was removed from the averaged dataset (Figure 112). The second anomaly was the non-linear behaviour of the high concentration samples. In the calibration models of the individual runs, there was a dramatic shift between samples M5YeS07 and M5YeS08. When the spectra were investigated, the biggest change was seen in the tryptophan peak. The intensity of the tryptophan peak was plotted per sample (Figure 112): it showed a linear increase in signal up to sample M5YeS06, after which the data became non-linear. In order to improve the calibration model for the M5Ye data, the sample set was reduced to only six samples covering a linear range of 0.1–1.0 g/L yeastolate.

Figure 112 Predicted versus expected plots of the EEM calibration model for all 30 M5Ye samples (Left) and the intensity changes of the tryptophan peak (285/355 nm) per sample (Right).

5.5.1 Correlation of eRDF Concentration to the M5eRDF Fluorescence Data
In the calibration modelling of the averaged M5eRDF data (using 10 samples), a good correlation between the fluorescence signal and the concentration of eRDF was found for all of the PLS models (EEM/TSFS). The best EEM model was constructed from the reduced region B (λex/λem 250–360/285–425 nm) with no pre-processing, while the best TSFS model used the full spectral area and no pre-processing (Figure 113). The TSFS models built with no pre-processing outperformed the models generated after MSC and normalisation. A similar trend was evident for the EEM data in the case of the full region and the reduced region B.
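For readers unfamiliar with MSC, the sketch below shows the usual textbook form of the correction: each spectrum is regressed against a reference spectrum and the fitted offset and slope are then removed. Using the mean of the set as the reference is an assumption here, since the reference used in this work is not restated in this section, and the function name and toy spectra are purely illustrative.

```python
from typing import List


def msc(spectra: List[List[float]]) -> List[List[float]]:
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n_points = len(spectra[0])
    # Reference spectrum: mean over all samples at each variable.
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    ref_mean = sum(ref) / n_points
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)

    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_points
        # Least-squares fit of s = offset + slope * ref.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        # Remove the additive and multiplicative effects (assumes slope != 0).
        corrected.append([(v - offset) / slope for v in s])
    return corrected


# Toy spectra (hypothetical): the second is the first scaled by 2 and offset by 1.
base = [1.0, 2.0, 4.0, 3.0]
scaled = [2.0 * v + 1.0 for v in base]
corrected = msc([base, scaled])
for row in corrected:
    print([round(v, 6) for v in row])
```

After correction the two toy spectra coincide, because they differed only by an offset and a scale factor, which is the kind of variation MSC is designed to remove. This also illustrates why MSC can hurt a calibration when intensity scaling itself carries the concentration information.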
Pre-processing was nevertheless useful in some cases, as the second best calibration model for the EEM data was produced using the reduced region A after normalisation. Overall, pre-processing and region selection did not have a major influence on the M5eRDF model performances (Table 35 and Table 36). There was little difference between the TSFS and EEM models, with the EEM being marginally better in terms of R2 and REP (7.29% versus 7.83%).

[Plot annotations for Figure 112: predicted (g/L) versus expected (g/L), R2 = 0.835, 3 latent variables, RMSEC = 0.20977, RMSECV = 0.23678; tryptophan peak intensity versus sample number (1–10), legend M5YeR1, M5YeR2, M5YeR3.]

Figure 113 Predicted versus expected plots for M5eRDF of the EEM calibration model (Left) and the TSFS calibration model (Right).

Table 35 Calibration models using unfolded PLS regression for averaged M5eRDF EEM data.
Model | LV | R2 | RMSEC (g/L) | RMSECV (g/L) | REP (%)
Full Range (λex 230–520 nm, λem 270–600 nm)
M5eRDF EEM Unfolded | 3 | 0.990 | 0.17 | 0.34 | 9.18
M5eRDF EEM Unfolded MSC | 3 | 0.978 | 0.25 | 0.48 | 12.97
M5eRDF EEM Unfolded Norm | 3 | 0.985 | 0.21 | 0.42 | 11.35
Reduced Region A (λex 230–315 nm, λem 270–435 nm)
M5eRDF EEM Unfolded | 3 | 0.988 | 0.19 | 0.34 | 9.18
M5eRDF EEM Unfolded MSC | 4 | 0.998 | 0.08 | 0.41 | 11.08
M5eRDF EEM Unfolded Norm | 4 | 0.998 | 0.08 | 0.32 | 8.64
Reduced Region B (λex 250–360 nm, λem 285–425 nm)
M5eRDF EEM Unfolded | 3 | 0.993 | 0.14 | 0.27 | 7.29
M5eRDF EEM Unfolded MSC | 2 | 0.962 | 0.33 | 0.47 | 12.70
M5eRDF EEM Unfolded Norm | 2 | 0.974 | 0.27 | 0.38 | 10.27

Table 36 Calibration models using unfolded PLS regression for averaged M5eRDF TSFS data.
Model | LV | R2 | RMSEC (g/L) | RMSECV (g/L) | REP (%)
Full Range (λex 230–520 nm, ∆λ 10–200 nm)
M5eRDF TSFS Unfolded | 3 | 0.991 | 0.15 | 0.29 | 7.83
M5eRDF TSFS Unfolded MSC | 3 | 0.979 | 0.24 | 0.46 | 12.43
M5eRDF TSFS Unfolded Norm | 3 | 0.986 | 0.20 | 0.38 | 10.27
Reduced Region A (λex 230–310 nm, ∆λ 10–190 nm)
M5eRDF TSFS Unfolded | 3 | 0.988 | 0.19 | 0.34 | 9.18
M5eRDF TSFS Unfolded MSC | 4 | 0.995 | 0.12 | 0.47 | 12.70
M5eRDF TSFS Unfolded Norm | 4 | 0.997 | 0.08 | 0.35 | 9.45
Reduced Region B (λex 250–310 nm, ∆λ 10–140 nm)
M5eRDF TSFS Unfolded | 2 | 0.978 | 0.25 | 0.34 | 9.18
M5eRDF TSFS Unfolded MSC | 2 | 0.961 | 0.34 | 0.47 | 12.70
M5eRDF TSFS Unfolded Norm | 2 | 0.972 | 0.28 | 0.39 | 10.54

[Plot annotations for Figure 113: predicted (g/L) versus expected (g/L). EEM model: R2 = 0.993, 3 latent variables, RMSEC = 0.14858, RMSECV = 0.27636. TSFS model: R2 = 0.991, 3 latent variables, RMSEC = 0.15924, RMSECV = 0.29901.]

5.5.2 Correlation of Ye Concentration to the M5Ye Fluorescence Data
The PCA showed that the spectral variation caused by changing yeastolate concentration was much larger than for eRDF (Figure 108). One consequence of this was that, at the higher yeastolate concentrations, the spectral changes became non-linear. The concentration range was therefore reduced by removing the highest concentration samples in order to obtain a reasonable quantitative model. As a result, the M5Ye calibration models were stronger than the M5eRDF models and showed a good linear performance for both EEM and TSFS data (Figure 114). The best models were built using the reduced region B, with MSC pre-processing for EEM and with no pre-processing for TSFS. The model performance improved when the reduced region B was used rather than the full region or the reduced region A. Pre-processing generally improved the model results, as it minimised spectral variations that were not caused by changes in the analyte concentration. The best TSFS calibration model was reasonable, with a REP of 7.27%, but was weaker than the best EEM model, which gave a REP of 5.45%.
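The REP values quoted above can be reproduced from the RMSECV if REP is taken as the RMSECV expressed as a percentage of the mean reference concentration. That definition is an assumption in this sketch. It is consistent with the tabulated values if the mean concentrations are about 0.55 g/L for the six M5Ye samples and about 3.70 g/L for the ten M5eRDF samples; both means are back-calculated here rather than taken from the sample tables.

```python
def rep_percent(rmsecv: float, mean_concentration: float) -> float:
    """Relative error of prediction: RMSECV as a percentage of the mean concentration."""
    if mean_concentration <= 0:
        raise ValueError("mean concentration must be positive")
    return 100.0 * rmsecv / mean_concentration


# Assumed (back-calculated) mean concentrations in g/L; these are not measured values.
MEAN_M5YE = 0.55
MEAN_M5ERDF = 3.70

rep_ye_eem = rep_percent(0.03, MEAN_M5YE)       # best M5Ye EEM model (RMSECV 0.03 g/L)
rep_ye_tsfs = rep_percent(0.04, MEAN_M5YE)      # best M5Ye TSFS model (RMSECV 0.04 g/L)
rep_erdf_eem = rep_percent(0.27, MEAN_M5ERDF)   # best M5eRDF EEM model (RMSECV 0.27 g/L)

for label, value in [("M5Ye EEM", rep_ye_eem),
                     ("M5Ye TSFS", rep_ye_tsfs),
                     ("M5eRDF EEM", rep_erdf_eem)]:
    print(f"{label}: REP = {value:.2f}%")
```

Because the RMSECV values in the tables are rounded to two decimals, a change of 0.01 g/L in RMSECV shifts the M5Ye REP by nearly two percentage points, so small differences between the M5Ye models should be read with some caution.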
Figure 114 Predicted versus expected for M5Ye with the EEM calibration model (left) and the TSFS calibration model (right).
[Plot annotations for Figure 114: predicted (g/L) versus expected (g/L). EEM model: R2 = 0.998, 3 latent variables, RMSEC = 0.014723, RMSECV = 0.034587. TSFS model: R2 = 0.994, 3 latent variables, RMSEC = 0.0246, RMSECV = 0.047416.]

Table 37 Unfolded PLS calibration model for the averaged M5Ye EEM data using 6 samples.
Model | LV | R2 | RMSEC (g/L) | RMSECV (g/L) | REP (%)
Full Range (λex 230–520 nm, λem 270–600 nm)
M5Ye EEM Unfolded | 2 | 0.962 | 0.05 | 0.10 | 18.18
M5Ye EEM Unfolded MSC | 2 | 0.986 | 0.03 | 0.06 | 10.90
M5Ye EEM Unfolded Norm | 2 | 0.990 | 0.03 | 0.05 | 9.09
Reduced Area A (λex 230–315 nm, λem 270–435 nm)
M5Ye EEM Unfolded | 2 | 0.955 | 0.06 | 0.11 | 20.00
M5Ye EEM Unfolded MSC | 2 | 0.980 | 0.04 | 0.07 | 12.72
M5Ye EEM Unfolded Norm | 2 | 0.987 | 0.03 | 0.06 | 10.90
Reduced Area B (λex 250–360 nm, λem 285–425 nm)
M5Ye EEM Unfolded | 3 | 0.997 | 0.01 | 0.04 | 7.27
M5Ye EEM Unfolded MSC | 3 | 0.998 | 0.01 | 0.03 | 5.45
M5Ye EEM Unfolded Norm | 3 | 0.995 | 0.02 | 0.06 | 10.90

Table 38 Unfolded PLS calibration model for the averaged M5Ye TSFS data using 6 samples.
Model | LV | R2 | RMSEC (g/L) | RMSECV (g/L) | REP (%)
Full Range (λex 230–520 nm, ∆λ 10–200 nm)
M5Ye TSFS Unfolded | 2 | 0.962 | 0.05 | 0.10 | 18.18
M5Ye TSFS Unfolded MSC | 2 | 0.982 | 0.04 | 0.06 | 10.90
M5Ye TSFS Unfolded Norm | 2 | 0.987 | 0.03 | 0.05 | 9.09
Reduced Area A (λex 230–310 nm, ∆λ 10–190 nm)
M5Ye TSFS Unfolded | 2 | 0.954 | 0.06 | 0.11 | 20.00
M5Ye TSFS Unfolded MSC | 2 | 0.977 | 0.04 | 0.08 | 14.54
M5Ye TSFS Unfolded Norm | 2 | 0.983 | 0.03 | 0.06 | 10.90
Reduced Area B (λex 250–310 nm, ∆λ 10–140 nm)
M5Ye TSFS Unfolded | 3 | 0.994 | 0.02 | 0.04 | 7.27
M5Ye TSFS Unfolded MSC | 3 | 0.994 | 0.02 | 0.04 | 7.27
M5Ye TSFS Unfolded Norm | 3 | 0.991 | 0.02 | 0.06 | 10.90

When comparing the EEM and TSFS models, the differences in performance were minor for the M5eRDF samples.
However, the EEM model outperformed the TSFS model for the M5Ye data (Table 37 and Table 38). Overall, the EEM method was better in the analysis of the M5Ye and M5eRDF data. This was shown by the resolution of the fluorophores in the MCR results as well as by the good calibration models formed with the M5eRDF/M5Ye data.

5.5.3 Model Evaluation
The prediction ability of the fluorescence models was evaluated in the same way as for the best models in the SERS study. The M5eRDF calibration model was applied to the M5Ye samples to see whether their eRDF concentrations could be correctly predicted, and the M5Ye calibration model was used to predict the yeastolate concentrations of the M5eRDF samples.

Table 39 Prediction results of eRDF concentration in M5Ye, from the EEM and TSFS calibration models.
Sample ID | Expected Concentration (g/L) | Predicted Conc. from EEM Model (g/L) | Predicted Conc. from TSFS Model (g/L)
M5YeS01 | 3.4 | 8.83 | 9.43
M5YeS02 | 3.4 | 5.98 | 6.83
M5YeS03 | 3.4 | 4.15 | 4.97
M5YeS04 | 3.4 | 3.46 | 3.93
M5YeS05 | 3.4 | 3.46 | 3.65
M5YeS06 | 3.4 | 3.39 | 3.54
M5YeS07 | 3.4 | 3.29 | 2.79
M5YeS09 | 3.4 | 4.90 | 4.03
M5YeS10 | 3.4 | 4.25 | 3.69

The predictions of the eRDF concentration from the M5Ye sample set showed reasonable results for the mid- to high-concentration samples (Table 39). The PCA scores had shown that the overlap between the M5eRDF and M5Ye samples was small, because of the different media changes within each sample set. The low concentration samples were very poorly predicted, with the prediction values for S01 and S02 approximately double the expected 3.4 g/L. The primary reason for the relatively low model accuracy and the overestimation of the low concentration samples was the large spectral difference between the test set and the M5eRDF calibration sample set. The variance was due to the different matrix environment produced by the varying yeastolate or eRDF concentration.
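To make the size of the over-prediction concrete, the EEM predictions in Table 39 can be expressed as a percentage deviation from the expected 3.4 g/L. The short sketch below only re-expresses the tabulated values; the function and variable names are illustrative.

```python
EXPECTED_G_PER_L = 3.4

# EEM-model predictions of eRDF concentration (g/L), copied from Table 39.
eem_predictions = {
    "M5YeS01": 8.83, "M5YeS02": 5.98, "M5YeS03": 4.15,
    "M5YeS04": 3.46, "M5YeS05": 3.46, "M5YeS06": 3.39,
    "M5YeS07": 3.29, "M5YeS09": 4.90, "M5YeS10": 4.25,
}


def relative_error_percent(predicted: float, expected: float) -> float:
    """Signed deviation of a prediction from the expected value, in percent."""
    return 100.0 * (predicted - expected) / expected


errors = {
    sample: relative_error_percent(value, EXPECTED_G_PER_L)
    for sample, value in eem_predictions.items()
}
for sample, err in errors.items():
    print(f"{sample}: {err:+.1f}%")
```

On these numbers, samples S04 to S07 fall within about 4% of the expected value, whereas S01 deviates by roughly 160%, which is why the mid-range samples are described as reasonably predicted and the low concentration samples as poorly predicted.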
It was obvious from the test set EEM contour plot (Figure 119) that there was a large change from the lowest to the highest prediction sample (the main fluorescence signal changed from 275/310 nm to 285/355 nm). This corresponded to the decrease in the tyrosine signal as the tryptophan signal increased. In the calibration EEM contour plots (Figure 120), the main fluorescence signal remained at 285/355 nm from the low to the high concentration samples.

Table 40 Prediction results of M5Ye EEM and TSFS calibration models for yeastolate concentration in M5eRDF.
Sample ID | Expected Concentration (g/L) | Predicted Conc. from EEM Model (g/L) | Predicted Conc. from TSFS Model (g/L)
M5eRDFS01 | 1.0 | 1.00 | 0.91
M5eRDFS02 | 1.0 | 1.01 | 0.96
M5eRDFS03 | 1.0 | 1.04 | 0.98
M5eRDFS04 | 1.0 | 1.03 | 1.00
M5eRDFS05 | 1.0 | 0.98 | 0.97
M5eRDFS06 | 1.0 | 1.02 | 1.02
M5eRDFS07 | 1.0 | 1.06 | 1.06
M5eRDFS08 | 1.0 | 1.09 | 1.10
M5eRDFS09 | 1.0 | 1.05 | 1.08
M5eRDFS10 | 1.0 | 1.04 | 1.10

The prediction ability of the M5Ye calibration model was better than that of the M5eRDF model (Table 40). The better predictability came from the similarity of the M5eRDF and M5Ye samples, which shared the same first principal component (i.e. the tryptophan emission), as observed in the PCA results (Figure 108). Therefore, in predicting the yeastolate concentration of the M5eRDF samples, the correlation between the calibration and prediction was based on the tryptophan emission band. In addition, no dramatic changes were observed in the M5eRDF test set; this can be seen in both the PARAFAC and MCR scores (Figure 97, Figure 106 and Figure 107).

Quantitative models for the more complex media components, eRDF and yeastolate, were developed. Both EEM and TSFS calibration and prediction models worked, but the EEM method outperformed TSFS in both its results and its ease of interpretation. These EEM/TSFS methods were part of the holistic approach to cell culture media analysis, and this quantification method complemented previous work on the quantification of specific fluorophores [1, 2]. Calvet et al.
showed that specific analytes could be quantified in chemically defined media. The prediction of tryptophan, tyrosine, pyridoxine, riboflavin and folic acid in eRDF media solutions worked using NPLS analysis, with samples prepared using a standard addition method (SAM). These models used narrower spectral ranges from the EEM data, centred on each analyte emission. The analytes were predicted with the following error levels: tryptophan (4.5%), tyrosine (5.5%), pyridoxine (4.6%), riboflavin (2.3%), and folic acid (8.7%) [2].

However, changing the concentration of a complex ingredient like eRDF or yeastolate resulted in large changes in the concentration of multiple fluorophores. This led to large matrix changes, which reduced the potential linear ranges for quantification. For the M5eRDF data, there were large fluctuations in the test set that limited the correlation performance of the M5eRDF calibration model with respect to changes in eRDF concentration. This could be seen in the poor prediction of the eRDF concentration in the test set. The most accurate results were close to the mean concentration, which had a stable tryptophan signal. These results indicated that eRDF could potentially be modelled within a specified range. For the M5Ye, fluctuations and non-linearity made it difficult to accurately correlate the changes in yeastolate concentration over the full concentration range, but once the range was reduced the correlation improved and the prediction worked well.

This study showed that it was feasible to quantify complex ingredients of cell culture media. However, the method does require further refinements, such as:
o Setting the ingredient concentration range to a more realistic X g/L ± 25%. If the M5eRDF sample set were redone, a more appropriate concentration range might be 3.4 g/L ± 0.85 g/L.
o Using more samples in the calibration modelling. Li et al.
used a ratio of 1 test sample to 4 or 5 calibration samples [7]. The minimum sample number should be set to 20, as the 10 samples used in this model were not enough. Replicate measurements are usually made in triplicate, but doubling that number may better minimise day-to-day variation.
o More suitable test sets, designed using risk analysis of the media, would be better for assessing model accuracy in real terms. These test sets would also inform the design of the calibration sample set, so that there is correct overlap between the PCA subspaces of the samples.

5.6 General Conclusions: Fluorescence Study
Fluorescence spectroscopy performed reasonably well for the analysis of media components, their behaviour and analyte quantification in cell culture media. Both data types (EEM and TSFS) were information rich and easy to collect. As both scan types were collected on the same instrument with a short measurement time (less than 15 minutes), preliminary studies could use both scan types before deciding which was optimal for a particular process. For the samples used in this study, the EEM measurement performed better for yeastolate and eRDF analysis and was also easier to interpret in the qualitative analysis. The TSFS data was less comparable to standards, as shown by the excitation and ∆λ profiles in the MCR results, which made TSFS data analysis more challenging. The TSFS data is, however, adaptable and can be converted into an EEM format (Figure 94); its main advantage is that it avoids the Rayleigh scatter that transects the EEM data.

The linearity of the calibration plots indicated good performance in correlating the eRDF and yeastolate concentrations with the EEM and TSFS models. Using unfolded PLS, it was possible to quantify the eRDF concentration with a 7.2% error level for the EEM data and a 7.8% error level for the TSFS data. For yeastolate concentration, calibration model errors were 5.4% and 7.2% using EEM and TSFS respectively.
The M5Ye calibration model worked, and its prediction ability was good. The prediction results for yeastolate indicated that it was possible to quantify yeastolate from the gross analyte signal of test media samples. The M5eRDF samples generated good calibration models but had weaker prediction results. Comparing the M5Ye prediction results with those for M5eRDF highlighted the importance of the test set. The test set for the M5eRDF model deviated too much from the calibration samples, as there was a rapidly varying peak intensity from the IFEs at the edges of the concentration range of the test set samples. The test set for the M5Ye calibration was more representative of the deviation expected in practice, as it was devoid of major fluctuations.

When the results from fluorescence, Raman and SERS were compared, they revealed how the sample set under investigation changed the performance of the model. The fluorescence method improved the calibration for the M5eRDF data, but the test set was undergoing too many matrix fluctuations for effective prediction. For this reason the prediction results of the Raman and SERS data outperformed those of the fluorescence data (see Table 29). In the case of the M5Ye samples, the fluorescence method outperformed Raman and SERS in terms of both the calibration and prediction performance. The Raman data was not able to measure the weak yeastolate signal. The SERS method improved the correlation through the enhancement of the signal, but its prediction performance suffered from spectral overlap from other components, which led to poor prediction results. The overall quantification of yeastolate in the M5Ye samples was best achieved using fluorescence measurements.

Within every fluorescence landscape, part of the acquired signal contained little or no fluorescence; these areas were eliminated to improve results. In the quantitative analysis, reduced area selections were favoured with the EEM and TSFS data for M5Ye.
This finding was in agreement with other studies where variable selection of the most prominent excitation and emission combinations led to an improvement in prediction capability [7, 333, 334]. In previous studies with industrial cell media, Li et al. correlated the fluorescence signal in EEM data to the glycoprotein yields [7]. The use of the full spectrum resulted in weak calibration, but when variable selection methods were applied the model performance improved. The R2 value went from 0.2 to 0.94 and the REP from ~8.94% to ~3.62%, depending on the process stage being tested. This study showed that correlation was only dependent on high intensity fluorescence bands and the emission properties of the analyte of interest. For multicomponent analytes like eRDF and yeastolate, specific emission ranges span multiple emission bands. As a result, precise area selection could not be used to pinpoint the fluorescent analytes in multicomponent mixtures, but mathematical variable selection (footnote 50), which takes the full area into account, could be applied to improve model performance [7, 335-338].

In industrial settings, large quantities of powders are mixed together to produce the media; the variance seen would be within the specification of industrial limits. Therefore no large fluctuations would be expected unless an outlier was present. Thus the method developed in this work could be adapted for industrial use, where the variance in samples and the concentration ranges used are smaller.

Footnote 50: Methods like competitive adaptive reweighted sampling (CARS) and ant colony optimization (ACO), based on mathematical evaluation of the importance of each wavelength, would be better for the multiple component analytes.

6 Conclusions and Future Work
The FDA and the biopharmaceutical industry want to better regulate and understand bio-processes through the use of quality by design (QbD) and PAT. One area where QbD and PAT can be applied for better control is media formulation.
Prior to use, media are tested in order to determine whether they are fit for purpose. This can include small scale performance testing, but this is time consuming and expensive. Variability in media can have a large impact on product quality and process performance [77, 339-341]. The objective of this thesis was to develop rapid spectroscopic methods for quantifying certain components in media, which could then be used for media formulation analysis. Raman, SERS and fluorescence spectroscopic methods offer the possibility of carrying out non-destructive qualitative and quantitative analysis of cell culture media in near real time.

6.1 Spectroscopic Conclusions
In Chapter 3, the use of Raman spectroscopy for quantifying D-glucose, eRDF and yeastolate in cell culture media was investigated. Raman is a fast technique and allows for high throughput screening with the use of an in-house developed stainless steel 96 well-plate [285]. The common features of the Raman spectra were the large water signal, baseline offset and weak analyte signal. Water elimination was investigated because of the large water signal, but this led to spectral artefacts. However, the majority of the baseline offset was removed by changing the experimental setup. For the remainder, various pre-processing methods were used to account for the baseline offset and spectral variance in order to improve the performance of the data for quantification. Data pre-processing methods also enhanced the analyte signal and removed scatter, while region selection further improved the quantification performances. Using Raman for quantitative analysis of media samples enabled D-glucose determination with ~4.7% error, but eRDF and yeastolate determination had larger errors of ~16% and ~38% respectively. The D-glucose model worked better than an in-line Raman method for D-glucose with 15.3% error and was close to the reference method (Bioprofile 400 analyser) with 4% [100].
It was, however, harder to get a good calibration with the more complicated media analytes.

The complex nature of eRDF and yeastolate meant that they were open to analysis by all of the spectroscopic methods being investigated (Table 41). Chapters 4 and 5 covered the quantification of eRDF and yeastolate by SERS and fluorescence respectively. SERS gave the signal enhancement required to compensate for the strong water signal, but the preparation of colloid made it the most labour intensive method. The results showed some promise in quantifying the complex media components as a whole, but the error levels were too high to be useful. SERS gave an improved yeastolate model (12% error), while the eRDF model that it produced (~16% error) matched the Raman model. The SERS method can be improved with more control over the sample to colloid ratio, increased sample numbers, a reduced linear range (as the data exceeded ± 25%), incubation time, sample re-suspension, and the use of an aggregating agent. In order to make the SERS method optimal, further development is required. Areas to improve upon are: the sample to colloid ratio; whether to use an aggregating agent; and improved reproducibility testing, by taking ten replicate measurements.

Table 41 Summary of the best calibration models for the media components generated from the different methods.
Dataset | Method | Sample Number | Range | Pre-processing | REP (%)
M5Glu | Raman | 32 | 800–1680 cm–1 | BC FD MSC | 4.66
M5eRDF | Raman | 10 | 707–1853 cm–1 | BC FST11MSC | 15.94
M5eRDF | SERS | 10 | 250–3311 cm–1 | WE NINF | 15.94
M5eRDF | EEM | 10 | λex 250–360 nm, λem 285–425 nm | Unfolded | 7.29
M5eRDF | TSFS | 10 | λex 230–520 nm, ∆λ 10–200 nm | Unfolded | 7.83
M5Ye | Raman | 10 | 707–1853 cm–1 | Avg FD11 | 38.46
M5Ye | SERS | 10 | 1260–1444 cm–1 | BC FST11MSC | 12.08
M5Ye | EEM | 6 | λex 250–360 nm, λem 285–425 nm | Unfolded MSC | 5.45
M5Ye | TSFS | 6 | λex 250–310 nm, ∆λ 10–140 nm | Unfolded | 7.27

Fluorescence data was informative and easy to collect and the measurements were reproducible.
The EEM and TSFS of yeastolate and eRDF shared common fluorophores due to their biogenic components (amino acids, peptides and vitamins). Prior to quantitative analysis, an extra assessment of the data was conducted using ROBPCA for outlier detection. ROBPCA indicated that the samples at the ends of the concentration range deviated most; in other words, the matrix (in photophysical terms) had changed very significantly. This indicated that the linear range was too large for accurate quantification. For practical operational use, these methods would be better developed with a more restricted concentration range (i.e. a range that varied by ±25% of the set concentration value), like the M5Ye dataset after the sample numbers were reduced. This would limit the extent of matrix variations and generate more accurate quantitative methods. Using unfolded PLS, it was possible to quantify yeastolate concentration with a 5.4% error level for the EEM data and 7.2% for the TSFS data; for eRDF, the error levels were 7.2% and 7.8% using EEM and TSFS respectively. EEM outperformed TSFS in both calibration and prediction performance for both M5eRDF and M5Ye. It also gave better resolved bands for identification of the underlying fluorophores.

6.2 Future Studies and Solutions

This thesis is an initial attempt at developing a protocol for the quantitative analysis of media formulations using spectroscopic methods. The first issue in developing any robust quantitative protocol is to determine which spectroscopic methods are optimal for the different ingredient types in a complex medium. The second issue is to assess the best methods of extracting the quantitative information; in this case we used chemometrics. We have determined that Raman spectroscopy is only suitable for relatively high concentration single components (e.g. glucose) and that EEM is best suited to the complex ingredients, where the individual measurable analytes are present in low concentration.
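As an illustration of the "unfolded" data handling and the percentage error figures discussed in this chapter, the following is a minimal Python sketch and not the code used in this work. It shows only how a stack of EEM landscapes can be unfolded into one row per sample before a two-way regression such as PLS, and how a relative error of prediction might be computed. The function names `unfold_eem` and `rep_percent`, the toy data, and the REP definition (100 × RMSE divided by the mean reference value) are all assumptions made for the example; the PLS step itself is omitted.

```python
from math import sqrt


def unfold_eem(cube):
    """Flatten each sample's (excitation x emission) matrix into one row.

    cube: list of samples; each sample is a list of excitation rows,
    each row a list of emission intensities.
    Returns a samples x (n_excitation * n_emission) matrix.
    """
    return [[value for ex_row in sample for value in ex_row] for sample in cube]


def rep_percent(y_true, y_pred):
    """Relative error of prediction, assumed here to be 100 * RMSE / mean(y_true)."""
    n = len(y_true)
    rmse = sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return 100.0 * rmse / (sum(y_true) / n)


# Hypothetical toy data: 2 samples, 2 excitation x 3 emission wavelengths.
cube = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[2.0, 4.0, 6.0], [8.0, 10.0, 12.0]],
]
X = unfold_eem(cube)  # each row now holds 6 variables

# Hypothetical reference and predicted concentrations (arbitrary units).
y_ref = [10.0, 20.0, 30.0]
y_hat = [11.0, 19.0, 30.0]
rep = rep_percent(y_ref, y_hat)
print(f"Unfolded row length: {len(X[0])}, REP = {rep:.2f}%")
```

The unfolded matrix `X` is what a standard PLS routine would take as its predictor block; normalising the error by the mean reference value is one common convention and may differ from the definition given in Chapter 2.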
In order to take these methods further, a revised experimental plan is needed, informed by these findings and based more closely on the expected maximum concentration variances (±10%). The first change should be to the sample number, as the ultimate accuracy of any chemometric method is related to sample number. Here we used relatively small sample sets to determine feasibility, but in practice sample sets would be 3-4 times larger. The second major change to the experimental design would be to restrict the concentration variance of any ingredient to ±15% of the nominal set value for that ingredient in the media. This value is a truer reflection of the ingredient variance that industry is likely to accept, thus there is no need to look at the wider ranges tested here. The third change would be to use independent test set validation throughout; the test set samples should span a similar range to the calibration set, with a 1 to 5 ratio of test samples to calibration samples, and should be evenly spread throughout the PCA subspace.

7 References

1. Calvet A, Li B, Ryder AG. Rapid quantification of tryptophan and tyrosine in chemically defined cell culture media using fluorescence spectroscopy. Journal of Pharmaceutical and Biomedical Analysis. 2012;71(0):89-98.
2. Calvet A, Li B, Ryder AG. A rapid fluorescence based method for the quantitative analysis of cell culture media photo-degradation. Analytica Chimica Acta. 2014;807(0):111-9.
3. Li B, Ryan PW, Ray BH, Leister KJ, Sirimuthu N, Ryder AG. Rapid characterization and quality control of complex cell culture media solutions using Raman spectroscopy and chemometrics. Biotechnology and bioengineering. 2010;107(2):290-301.
4. Li B, Ryan PW, Shanahan M, Leister KJ, Ryder AG. Fluorescence Excitation–Emission Matrix (EEM) Spectroscopy for Rapid Identification and Quality Evaluation of Cell Culture Media Components. Applied spectroscopy. 2011;65(11):1240-9.
5.
Li B, Sirimuthu NMS, Ray BH, Ryder AG. Using surface-enhanced Raman scattering (SERS) and fluorescence spectroscopy for screening yeast extracts, a complex component of cell culture media. Journal of Raman Spectroscopy. 2012;43(8):1074-82.
6. Li B, Ray BH, Leister KJ, Ryder AG. Performance Monitoring of a Mammalian Cell Based Bioprocess using Raman Spectroscopy. Analytica Chimica Acta. 2013.
7. Li B, Shanahan M, Calvet A, Leister K, Ryder AG. Comprehensive, quantitative bioprocess productivity monitoring using fluorescence EEM spectroscopy and chemometrics. Analyst. 2014.
8. Ryan PW, Li B, Shanahan M, Leister KJ, Ryder AG. Prediction of cell culture media performance using fluorescence spectroscopy. Analytical Chemistry. 2010;82(4):1311-7.
9. Walsh G. Current status of biopharmaceuticals: Approved products and trends in approvals. Knäblein J, editor 2005. 1–34 p.
10. Walsh G. Second-generation biopharmaceuticals. European Journal of Pharmaceutics and Biopharmaceutics. 2004;58(2):185-96.
11. Macdougall IC, Eckardt K-U. Novel strategies for stimulating erythropoiesis and potential new treatments for anaemia. The Lancet. 2006;368(9539):947-53.
12. Rader R. FDA Biopharmaceutical Product Approvals and Trends. Biotechnology Information Institute, www biopharma com/approvals_2011 html, accessed Jan. 2012;16.
13. Zhu J. Mammalian cell protein expression for biopharmaceutical production. Biotechnology Advances. 2011.
14. Cartwright T. Animal cells as bioreactors: Cambridge Univ Pr; 1994.
15. Hossler P, Khattak SF, Li ZJ. Optimal and consistent protein glycosylation in mammalian cell culture. Glycobiology. 2009;19(9):936-49.
16. Glassey J, Gernaey KV, Clemens C, Schulz TW, Oliveira R, Striedner G, et al. Process analytical technology (PAT) for biopharmaceuticals. Biotechnology Journal. 2011.
17. Langer E. Advances in Large-Scale Biopharmaceutical Manufacturing and Scale-Up Production: ASM Press and BioPlan Associates, Inc., Washington, DC; 2007.
18.
Sanchez S, Demain A. Special issue on the production of recombinant proteins. Biotechnology Advances. 2012;30(5):1100-1.
19. Walsh G. Biopharmaceuticals: biochemistry and biotechnology: Wiley-Blackwell; 2003.
20. Masters JRW. Animal cell culture: A practical approach: Oxford University Press Oxford; 2000.
21. Butler M. Animal cell cultures: recent achievements and perspectives in the production of biopharmaceuticals. Applied Microbiology and Biotechnology. 2005;68(3):283-91.
22. Rhiel M, Cohen MB, Murhammer DW, Arnold MA. Nondestructive near‐infrared spectroscopic measurement of multiple analytes in undiluted samples of serum‐based cell culture media. Biotechnology and Bioengineering. 2002;77(1):73-82.
23. Riley MR, Crider HM, Nite ME, Garcia RA, Woo J, Wegge RM. Simultaneous measurement of 19 components in serum‐containing animal cell culture media by Fourier transform near‐infrared spectroscopy. Biotechnology progress. 2001;17(2):376-8.
24. Sivakesava S, Irudayaraj J, Ali D. Simultaneous determination of multiple components in lactic acid fermentation using FT-MIR, NIR, and FT-Raman spectroscopic techniques. Process Biochemistry. 2001;37(4):371-8.
25. Sivakesava S, Irudayaraj J, Demirci A. Monitoring a bioprocess for ethanol production using FT-MIR and FT-Raman spectroscopy. Journal of Industrial Microbiology and Biotechnology. 2001;26(4):185-90.
26. Perez-Garcia O, Escalante FME, de-Bashan LE, Bashan Y. Heterotrophic cultures of microalgae: Metabolism and potential products. Water Research. 2011;45(1):11-36.
27. Nielsen LK. The encyclopedia of cell technology: Wiley New York; 2000.
28. Hauser H, Wagner R. Mammalian cell biotechnology in protein production: Walter De Gruyter Inc; 1997.
29. Altamirano C, Paredes C, Cairo J, Godia F. Improvement of CHO cell culture medium formulation: simultaneous substitution of glucose and glutamine. Biotechnology progress. 2000;16(1):69-75.
30. Barngrover D, Thomas J, Thilly W.
High density mammalian cell growth in Leibovitz bicarbonate-free medium: effects of fructose and galactose on culture biochemistry. Journal of cell science. 1985;78(1):173-89.
31. Altamirano C, Cairo J, Godia F. Decoupling cell growth and product formation in Chinese hamster ovary cells through metabolic control. Biotechnology and Bioengineering. 2001;76(4):351-60.
32. Coster J, McCauley R, Hall J. Glutamine: metabolism and application in nutrition support. Asia Pacific journal of clinical nutrition. 2004;13(1):25-31.
33. van der Valk J, Brunner D, De Smet K, Fex Svenningsen Å, Honegger P, Knudsen LE, et al. Optimization of chemically defined cell culture media – Replacing fetal bovine serum in mammalian in vitro methods. Toxicology in Vitro. 2010;24(4):1053-63.
34. Bashor MM. Dispersion and disruption of tissues. Methods in enzymology. 1979;58:119.
35. Williamson J, Cox P. Use of a new buffer in the culture of animal cells. Journal of General Virology. 1968;2(2):309-12.
36. Incorporated AB. Product Data Sheet HEPES Buffer Solution (1M) 2010. Available from: http://www.atlantabio.com/assets/PDS/File/PDS%20-%20HEPES%20Buffer%20Solution.pdf.
37. Luo S, Pal D, Shah SJ, Kwatra D, Paturi KD, Mitra AK. Effect of HEPES buffer on the uptake and transport of P-glycoprotein substrates and large neutral amino acids. Molecular pharmaceutics. 2010;7(2):412-20.
38. Delmouly K, Belondrade M, Casanova D, Milhavet O, Lehmann S. HEPES inhibits the conversion of prion protein in cell culture. Journal of General Virology. 2011;92(5):1244-50.
39. Denry Sato J, Kan, M. Media for culture of mammalian cells. Current Protocols in Cell Biology. 1998;1:1-.2.
40. Langer E. Trends to Watch in the Biopharmaceutical Industry: The Economy, Approvals, Contamination, and Going Animal-Free. 2010.
41. Cleland D, Jastrzembski K, Stamenova E, Benson J, Catranis C, Emerson D, et al. Growth characteristics of microorganisms on commercially available animal-free alternatives to tryptic soy medium.
Journal of microbiological methods. 2007;69(2):345-52.
42. Jayme D, Watanabe T, Shimada T. Basal medium development for serum-free culture: a historical perspective. Cytotechnology. 1997;23(1):95-101.
43. Kim DY, Lee JC, Chang HN, Oh DJ. Development of serum-free media for a recombinant CHO cell line producing recombinant antibody. Enzyme and microbial technology. 2006;39(3):426-33.
44. Sung Y, Lim S, Chung J, Lee G. Yeast hydrolysate as a low-cost additive to serum-free medium for the production of human thrombopoietin in suspension cultures of Chinese hamster ovary cells. Applied Microbiology and Biotechnology. 2004;63(5):527-36.
45. Franěk F, Hohenwarter O, Katinger H. Plant protein hydrolysates: preparation of defined peptide fractions promoting growth and production in animal cells cultures. Biotechnology progress. 2000;16(5):688-92.
46. Michiels J-F, Barbau J, De Boel S, Dessy S, Agathos S, Schneider Y-J. Characterisation of beneficial and detrimental effects of a soy peptone, as an additive for CHO cell cultivation. Process Biochemistry. 2011;46(3):671-81.
47. Schlaeger E-J. The protein hydrolysate, Primatone RL, is a cost-effective multiple growth promoter of mammalian cell culture in serum-containing and serum-free media and displays anti-apoptosis properties. Journal of immunological methods. 1996;194(2):191-9.
48. Burteau CC, Verhoeye FR, Molsl JF, Ballez J-S, Agathos SN, Schneider Y-J. Fortification of a protein-free cell culture medium with plant peptones improves cultivation and productivity of an interferon-γ-producing CHO cell line. In Vitro Cellular & Developmental Biology-Animal. 2003;39(7):291-6.
49. Heidemann R, Zhang C, Qi H, Rule JL, Rozales C, Park S, et al. The use of peptones as medium additives for the production of a recombinant therapeutic protein in high density perfusion cultures of mammalian cells. Cytotechnology. 2000;32(2):157-67.
50. Dick LW, Kakaley JA, Mahon D, Qiu D, Cheng KC.
Investigation of proteins and peptides from yeastolate and subsequent impurity testing of drug product. Biotechnology progress. 2009;25(2):570-7.
51. Mosser M, Chevalot I, Olmos E, Blanchard F, Kapel R, Oriol E, et al. Combination of yeast hydrolysates to improve CHO cell growth and IgG production. Cytotechnology. 2012:1-13.
52. Even MS, Sandusky CB, Barnard ND. Serum-free hybridoma culture: ethical, scientific and safety considerations. Trends in Biotechnology. 2006;24(3):105-8.
53. Sarkar A. Stem Cell Culture: Discovery Publishing House.
54. Ham RG. Clonal growth of mammalian cells in a chemically defined, synthetic medium. Proceedings of the National Academy of Sciences of the United States of America. 1965;53(2):288.
55. Gstraunthaler G. Alternatives to the use of fetal bovine serum: serum-free cell culture. Altex. 2003;20(4):275-81.
56. Murakami H, Masui H, Sato GH, Sueoka N, Chow TP, Kano-Sueoka T. Growth of hybridoma cells in serum-free medium: ethanolamine is an essential component. Proceedings of the National Academy of Sciences. 1982;79(4):1158.
57. Kong ZLM, M. Murakami, H. Shinohara, K. Establishment of a macrophagelike cell line derived from U-937, human histiocytic lymphoma, grown serum-free. In Vitro Cellular & Developmental Biology-Plant. 1990;26(10):949-54.
58. Kawahara MN, A. Terada, S. Kato, K. Tsumoto, K. Kumagai, I. Miki, M. Mahoney, W. Ueda, H. Nagamune, T. Replacing factor-dependency with that for lysozyme: affordable culture of IL-6-dependent hybridoma by transfecting artificial cell surface receptor. Biotechnology and Bioengineering. 2001;74(5):416-23.
59. Chua F, Oh SKW, Yap M, Teo WK. Enhanced IgG production in eRDF media with and without serum: A comparative study. Journal of immunological methods. 1994;167(1-2):109-19.
60. Garnick R, Solli N, Papa P.
The role of quality control in biotechnology: An analytical perspective. Analytical Chemistry. 1988;60(23):2546-57.
61. Hanko VP, Rohrer JS. Determination of carbohydrates, sugar alcohols, and glycols in cell cultures and fermentation broths using high-performance anion-exchange chromatography with pulsed amperometric detection. Analytical Biochemistry. 2000;283(2):192-9.
62. Hanko VP, Rohrer JS. Determination of amino acids in cell culture and fermentation broth media using anion-exchange chromatography with integrated pulsed amperometric detection. Analytical Biochemistry. 2004;324(1):29-38.
63. Fa Y, Yang H, Ji C, Cui H, Zhu X, Du J, et al. Simultaneous determination of amino acids and carbohydrates in culture media of Clostridium thermocellum by valve-switching ion chromatography. Analytica Chimica Acta. 2013;798:97-102.
64. Buha SM, Panchal A, Panchal H, Chambhare R, Patel PR, Kumar S, et al. HPLC-FLD for the Simultaneous Determination of Primary and Secondary Amino Acids from Complex Biological Sample by Pre-column Derivatization. Journal of chromatographic science. 2011;49(2):118-23.
65. Genzel Y, König S, Reichl U. Amino acid analysis in mammalian cell culture media containing serum and high glucose concentrations by anion exchange chromatography and integrated pulsed amperometric detection. Analytical Biochemistry. 2004;335(1):119-25.
66. Potvin J, Fonchy E, Conway J, Champagne CP. An automatic turbidimetric method to screen yeast extracts as fermentation nutrient ingredients. Journal of microbiological methods. 1997;29(3):153-60.
67. Pohlscheidt M, Charaniya S, Bork C, Jenzsch M, Noetzel TL, Luebbert A. Bioprocess and fermentation monitoring. Encyclopedia of Industrial Biotechnology. 2012.
68. Sun Y-t, Zhao L, Ye Z, Fan L, Liu X-p, Tan W-S. Development of a fed-batch cultivation for antibody-producing cells based on combined feeding strategy of glucose and galactose. Biochemical Engineering Journal. 2013;81:126-35.
69. Food, Administration D.
Guidance for Industry: PAT - a framework for innovative pharmaceutical development, manufacturing, and quality assurance. Rockville, MD. 2004.
70. De Beer T, Allesø M, Goethals F, Coppens A, Vander Heyden Y, De Diego HL, et al. Implementation of a process analytical technology system in a freeze-drying process using Raman spectroscopy for in-line process monitoring. Analytical Chemistry. 2007;79(21):7992-8003.
71. De Beer T, Burggraeve A, Fonteyne M, Saerens L, Remon JP, Vervaet C. Near infrared and Raman spectroscopy for the in-process monitoring of pharmaceutical production processes. International Journal of Pharmaceutics. 2011;417(1):32-47.
72. De Beer TRM, Bodson C, Dejaegher B, Walczak B, Vercruysse P, Burggraeve A, et al. Raman spectroscopy as a process analytical technology (PAT) tool for the in-line monitoring and understanding of a powder blending process. Journal of Pharmaceutical and Biomedical Analysis. 2008;48(3):772.
73. Johansson J, Pettersson S, Folestad S. Characterization of different laser irradiation methods for quantitative Raman tablet assessment. Journal of Pharmaceutical and Biomedical Analysis. 2005;39(3-4):510.
74. Clarke SJ, Littleford RE, Smith WE, Goodacre R. Rapid monitoring of antibiotics using Raman and surface enhanced Raman spectroscopy. Analyst. 2005;130(7):1019-26.
75. Jain G, Jayaraman G, Kökpinar Ö, Rinas U, Hitzmann B. On-line monitoring of recombinant bacterial cultures using multi-wavelength fluorescence spectroscopy. Biochemical Engineering Journal. 2011;58–59(0):133-9.
76. Lee HLT, Boccazzi P, Gorret N, Ram RJ, Sinskey AJ. In situ bioprocess monitoring of Escherichia coli bioreactions using Raman spectroscopy. Vibrational Spectroscopy. 2004;35(1-2):131.
77. Lee HW, Christie A, Yoon S. Characterization of Raw Material Influence on Mammalian Cell Culture Performance: Chemometric Based Data Fusion Approach.
78. Lourenço N, Lopes J, Almeida C, Sarraguça M, Pinheiro H.
Bioreactor monitoring with spectroscopy and chemometrics: a review. Analytical and bioanalytical chemistry. 2012;404(4):1211-37.
79. Macaloney G, Draper I, Preston J, Anderson K, Rollins M, Thompson B, et al. At-Line Control and Fault Analysis In an Industrial High Cell Density Escherichia Coli Fermentation, Using NIR Spectroscopy. Food and Bioproducts Processing. 1996;74(4):212-20.
80. Marose S, Lindemann C, Scheper T. Two-Dimensional Fluorescence Spectroscopy: A New Tool for On-Line Bioprocess Monitoring. Biotechnology progress. 1998;14(1):63.
81. Triadaphillou S, Martin E, Montague G, Norden A, Jeffkins P, Stimpson S. Fermentation process tracking through enhanced spectral calibration modeling. Biotechnology and Bioengineering. 2007;97(3):554-67.
82. Ryder AGV, John De Li, Boyan Ryan, Paul W. Sirimuthu, Narayana M. S. Leister, Kirk J. A stainless steel multi-well plate (SS-MWP) for high-throughput Raman analysis of dilute solutions. Journal of Raman Spectroscopy. 2010;41(10):1266-75.
83. Settle FA. Handbook of Instrumental Techniques for Analytical Chemistry. Journal of Liquid Chromatography Related Technologies. 1998;21(19):3072-6.
84. Ewing GW. Analytical instrumentation handbook: CRC Press; 1997.
85. Willard HH, Merritt Jr LL, Dean JA. Instrumental methods of analysis. Settle Jr FA, editor: Wadsworth Pub. Co.; 1988.
86. Turrell G, Corset J. Raman microscopy: developments and applications: Access Online via Elsevier; 1996.
87. McCreery R. Raman spectroscopy for chemical analysis: Wiley-Interscience; 2000.
88. Hollas M. Modern Spectroscopy. 1987. New York: John Wiley & Sons.
89. Straughan B, Walker S. Spectroscopy: Chapman and Hall London; 1976.
90. Collette TW, Williams TL. The role of Raman spectroscopy in the analytical chemistry of potable water. Journal of Environmental Monitoring. 2002;4(1):27-34.
91. Egawa T, Yeh S-R. Structural and functional properties of hemoglobins from unicellular organisms as revealed by resonance Raman spectroscopy.
Journal of Inorganic Biochemistry. 2005;99(1):72-96.
92. Smith E, Dent G. Modern Raman spectroscopy: a practical approach: Wiley; 2005.
93. Sebastian R, Petra R, Marion AS, Dorothea B, Malgorzata B, Hartwig S, et al. Nondestructive analysis of single rapeseeds by means of Raman spectroscopy. Journal of Raman Spectroscopy. 2007;38(3):301-8.
94. Ortiz C, Zhang D, Xie Y, Davisson VJ, Ben-Amotz D. Identification of insulin variants using Raman spectroscopy. Analytical Biochemistry. 2004;332(2):245.
95. Zhu G, Zhu X, Fan Q, Wan X. Raman spectra of amino acids and their aqueous solutions. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy. 2011;78(3):1187-95.
96. Kang J, Yuan X, Dong X, Gu H, editors. The effect of aqueous solution in Raman spectroscopy. Photonics and Optoelectronics Meetings 2009; 2009: International Society for Optics and Photonics.
97. Cannizzaro C, Rhiel