See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/305175113
UC Irvine CEE-290: Topic 1 (Introduction and linear/nonlinear regression)
Note: An animated version of this presentation can be found at faculty.sites.uci.edu/jasper/teaching
Presentation, July 2016. DOI: 10.13140/RG.2.1.2154.6487
Author: Jasper Vrugt, University of California, Irvine
All content following this page was uploaded by Jasper Vrugt on 11 July 2016.

TOPIC 1: LINEAR/NONLINEAR REGRESSION. MERGING MODELS WITH DATA
JASPER A. VRUGT, ENGINEERING TOWER 516B, jasper@uci.edu, CEE 290 @ UCI

THE NETHERLANDS
NETHERLANDS: RED-WHITE-BLUE ( ALSO ORANGE )
[ map of the Netherlands, 50 km scale, Amsterdam marked ]
ANTEATER AGAINST DUTCH LION: WHO WILL WIN?

JOHAN CRUIJFF: DUTCH SOCCER WISDOM
1. "PLAYING FOOTBALL IS VERY SIMPLE, BUT PLAYING SIMPLE FOOTBALL IS THE HARDEST THING THERE IS"
2. "TO WIN YOU HAVE TO SCORE ONE MORE GOAL THAN YOUR OPPONENT"
3. "IF YOU PLAY ON BALL POSSESSION, YOU DON'T HAVE TO DEFEND, BECAUSE THERE'S ONLY ONE BALL"
4. "I'VE NEVER SEEN A BAG OF MONEY SCORE A GOAL"
5. "IF YOU CAN'T WIN, MAKE SURE YOU DON'T LOSE"
6. "IT IS BETTER TO GO DOWN WITH YOUR OWN VISION THAN SOMEONE ELSE'S VISION"
7. "YOU WILL START TO SEE IT ONCE YOU UNDERSTAND IT"
8. "IF I START RUNNING SLIGHTLY EARLIER THAN SOMEONE ELSE, I SEEM FASTER"
9. "EVERY DISADVANTAGE HAS ITS ADVANTAGE"
10. "IF I WANTED YOU TO UNDERSTAND IT, I WOULD HAVE EXPLAINED IT BETTER"
11. "QUALITY WITHOUT RESULTS IS POINTLESS. RESULTS WITHOUT QUALITY IS BORING"
12. "WE MUST MAKE SURE THEIR WORST PLAYERS GET THE BALL THE MOST. YOU'LL GET IT BACK IN NO TIME"
13. "YOU HAVE GOT TO SHOOT, OR YOU CAN'T SCORE"

THE ITERATIVE RESEARCH CYCLE
HYPOTHESIS FORMULATION → EXPERIMENTAL DESIGN → DATA COLLECTION → MODEL-DATA ANALYSIS → ( REPEAT )

CEE-290 ( INTRODUCTION )

ENVIRONMENTAL MODELING
PHYSICALLY-BASED MODEL ( SCALE UP KNOWN PHYSICS )
CONCEPTUAL MODEL ( STATE VARIABLES AND MASS BALANCE ENFORCED, BUT CONCEPTUAL FLUX EQUATIONS )
BLACK BOX MODEL ( REGRESSION FUNCTION; E.G. A NEURAL NETWORK WITH INPUTS I1–I4, WEIGHTS 𝐰𝑖,𝑗, NEURON BIAS, INPUT LAYER, HIDDEN LAYER, OUTPUT LAYER, OUTPUT 𝐘 )
NO MATTER HOW SPATIALLY EXPLICIT AND/OR DETAILED, ALL MODELS CONTAIN PARAMETERS (CONSTANTS) THAT ARE SYSTEM DEPENDENT

CONCEPTUAL PARAMETERS
PARAMETERS: SYSTEM PROPERTIES THAT ARE TIME INVARIANT (FIXED)
CONCEPTUAL PARAMETER: A PROPERTY THAT MIGHT HAVE A PHYSICAL INTERPRETATION BUT IS VIRTUALLY IMPOSSIBLE TO MEASURE DIRECTLY IN THE FIELD OR LAB
EXAMPLE RESERVOIR MODEL: 𝑄𝑡 = 𝑎𝑆𝑡, WITH P RAINFALL (INPUT), EP EVAPOTRANSPIRATION (FLUX), 𝑆 SOIL WATER STORAGE (STATE), 𝑄 SUBSURFACE FLOW (FLUX), 𝑆max SOIL WATER STORAGE CAPACITY, 𝑎 RESERVOIR CONSTANT
CONCEPTUAL MODELS CAN USE STATE VARIABLES AND MASS BALANCE LAWS, BUT THE FLUX EQUATIONS DEFINING FLOWS INTO AND OUT OF THE CONTROL VOLUME(S) ARE TYPICALLY CONCEPTUAL RATHER THAN PHYSICS-BASED

PHYSICAL PARAMETERS
PHYSICAL PARAMETER: A PROPERTY THAT HAS AN EXPLICIT PHYSICAL SIGNIFICANCE AND IS THEREFORE DIRECTLY MEASURABLE OR DERIVABLE FROM PHYSICAL LAWS
LET'S LOOK AT THE FORCES IN A CAPILLARY:
𝐹down = 𝑚𝑙𝑔 = 𝜌𝑙𝑉𝑙𝑔 = 𝜌𝑙𝜋𝑟²ℎ𝑙𝑔 AND 𝐹up = 𝐴∆𝑃 = 𝐴 · 2𝜎/𝑅 = 𝜋𝑟² · 2𝜎cos(𝛼)/𝑟
AT THE EQUILIBRIUM WATER LEVEL, 𝐹down = 𝐹up:
𝜌𝑙𝜋𝑟²ℎ𝑙𝑔 = 𝜋𝑟² · 2𝜎cos(𝛼)/𝑟 ⇒ ℎ𝑙 = 2𝜎cos(𝛼) / ( 𝑟𝜌𝑙𝑔 ) ( LAPLACE EQUATION )
𝜎 SURFACE TENSION ( N/m ): PHYSICAL PARAMETER
WE CAN MEASURE THE SURFACE TENSION OF A FLUID DIRECTLY USING THE LAPLACE EQUATION ( THIS IS AN INTRINSIC FLUID PROPERTY )
OTHER PARAMETERS
FITTING PARAMETER: A PROPERTY THAT PROBABLY HAS NO EXPLICIT INTERPRETATION OTHER THAN THAT IT HELPS TO MIMIC THE EXPERIMENTAL DATA
BLACK BOX MODELS ( NEURAL NETWORKS, REGRESSION MODELS, ETC. ) DO NOT USE EQUATIONS THAT ARE BASED ON PHYSICAL PRINCIPLES. INSTEAD THEY USE FLEXIBLE FUNCTIONS WITH FITTING COEFFICIENTS

HETEROGENEITY OF SYSTEM PROPERTIES
PROBLEM: MOST SYSTEMS ARE HIGHLY HETEROGENEOUS, WITH SYSTEM PROPERTIES THAT VARY A LOT IN SPACE
IT IS DIFFICULT TO CHARACTERIZE ADEQUATELY EVEN THE MOST PHYSICAL PARAMETERS WITHIN THE MODEL APPLICATION DOMAIN OF INTEREST
FOR EXAMPLE: STUDY OF ROOT WATER UPTAKE AND SOIL WATER FLOW
[ figures: soil water retention curve, moisture content (cm³/cm³) versus −pressure head, h (cm); hydraulic conductivity, K (cm/day), versus −pressure head, h (cm) ]
𝜃(ℎ) = 𝜃r + ( 𝜃s − 𝜃r ) [ 1 + (𝛼ℎ)ⁿ ]^(−𝑚)
𝐾(ℎ) = 𝐾s { 1 − (𝛼ℎ)^(𝑛−1) [ 1 + (𝛼ℎ)ⁿ ]^(−𝑚) }² / [ 1 + (𝛼ℎ)ⁿ ]^(𝑚𝜆)
PARAMETERS: 𝜃s, 𝜃r, 𝛼, 𝑛, 𝐾s, 𝜆
IT IS HARD TO CHARACTERIZE THE SPATIAL VARIABILITY OF MATERIAL PROPERTIES ( TIME CONSUMING, EXPENSIVE, DESTRUCTIVE )
THE ONLY WAY WE CAN DERIVE THE VALUES OF THE PARAMETERS IS THROUGH MODEL CALIBRATION USING DATA OF THE OBSERVED SYSTEM BEHAVIOR

VERY SHORT SUMMARY OF CEE-290
ENVIRONMENTAL MODELING FRAMEWORK: TRUE INPUT → SYSTEM → TRUE RESPONSE; OBSERVED INPUT → MODEL 𝑦 = 𝑓(𝛉) → SIMULATED RESPONSE; COMPARE WITH THE OBSERVED RESPONSE (MEASUREMENT) AND OPTIMIZE THE PARAMETERS, USING PRIOR INFO
THE BEHAVIOR OF A REAL-WORLD SYSTEM IS DESCRIBED WITH A COMPUTER MODEL
COMMON ASSUMPTION: THE INPUT TO THIS MODEL ( BOUNDARY CONDITIONS ) AND THE MODEL ITSELF ( EQUATIONS, PROCESS PHYSICS ) ARE "PERFECT"
NOW TUNE THE PARAMETERS SO THAT THE MODEL MIMICS, AS CLOSELY AND CONSISTENTLY AS POSSIBLE, THE OBSERVED SYSTEM RESPONSE

LAB EXPERIMENT: AN EXAMPLE
APPLY PNEUMATIC PRESSURE, THEN INCREASE THE PNEUMATIC PRESSURE IN SUCCESSIVE STEPS ( MULTI-STEP OUTFLOW EXPERIMENT )

MEASURED DATA AND NUMERICAL MODEL
SOIL WATER RETENTION FUNCTION: 𝑆e(ℎ) = ( 𝜃(ℎ) − 𝜃r ) / ( 𝜃s − 𝜃r ) = [ 1 + (𝛼ℎ)ⁿ ]^(−𝑚)
SOIL HYDRAULIC CONDUCTIVITY FUNCTION: 𝐾(𝑆e) = 𝐾s 𝑆e^𝜆 [ 1 − ( 1 − 𝑆e^(1/𝑚) )^𝑚 ]²
GOVERNING EQUATION: 𝐶w(ℎ) ∂ℎ/∂𝑡 = ∂/∂𝑧 [ 𝐾(ℎ) ( ∂ℎ/∂𝑧 + 1 ) ]
[ figure: outflow (cm) versus time (hours), with pressure increments and data points ]

PARAMETER FITTING (INVERSE MODELING)
𝜃r    𝜃s(∗)  𝑛     𝛼      𝐾s(∗)  𝜆
0.040  0.40  1.40  0.0110  0.604  0.30
0.050  0.40  1.35  0.0100  0.604  0.50
0.059  0.40  1.36  0.0097  0.604  0.70
0.040  0.40  1.42  0.0080  0.604  0.80
∗ ASSUME INDEPENDENTLY MEASURED VALUE

OUTCOME: SOIL HYDRAULIC FUNCTIONS
[ figures: fitted soil water retention and hydraulic conductivity functions ]

REALITY IS MORE COMPLEX
MODEL: 𝐘 = 𝑓(𝛉, 𝐱0, 𝐈), WITH SYSTEM INVARIANT(S) 𝛉 ( PARAMETERS ), INITIAL STATE(S) 𝐱0 ( PROGNOSTIC VARIABLES ), FORCING DATA 𝐈 ( INPUT VARIABLES; ALSO CALLED BOUNDARY CONDITIONS, AS THE WORDING `MODEL INPUT' IS NOT PRECISE ), AND MODEL OUTPUT(S) 𝐘 ( DIAGNOSTIC VARIABLES )
EACH CARRIES UNCERTAINTY: 𝑃(𝛉), 𝑃(𝐱0), 𝑃(𝐈), 𝑃(𝐘), PLUS MODEL STRUCTURAL ERROR ( = EPISTEMIC ERROR )

THE MODEL-DATA FUSION PROBLEM
DERIVE THE JOINT PDF OF THE PARAMETERS, INITIAL STATES, FORCING DATA, MODEL OUTPUT AND OBSERVATION(S) ( CALIBRATION DATA, 𝑃(𝐘̃) )
IMAGINE THAT WE CONSIDER OUR MODEL STRUCTURE AND PARAMETER VALUES TO BE EQUIVALENT TO A HYPOTHESIS, AND REFER TO THIS AS THE ENTITY 𝑨: KEY TO MODEL IMPROVEMENT → BAYESIAN INFERENCE

EXAMPLE APPLICATIONS OF WHAT WE WILL LEARN
VADOSE ZONE HYDROLOGY ( Wohling and Vrugt, WRR, 2011 )
GLOBAL-SCALE HYDROLOGIC MODELING ( Sperna-Weiland et al., JOH, 2015 )
GEOPHYSICS: GROUND PENETRATING RADAR ( Laloy et al., WRR, 2012 )
ECOHYDROLOGY: OPTIMALITY CONCEPT ( Dekker et al., EH, 2012 )
HYDROGEOPHYSICS: VALUE OF DATA ( Huisman et al., JOH, 2010; Hinnell et al., WRR, 2010 )
GEOSTATISTICS: SPATIAL UNCERTAINTY ( Minasny et al., GD, 2011 )
HYDROGEOLOGY: CPU-INTENSIVE MODELS, 241 PARAMETERS ( Keating et al., WRR, 2010 )
LONG DISTANCE BIRD MIGRATION ( Vrugt et al., JAB, 2007 )
PLANKTON BIOGEOCHEMISTRY ( Martiny et al., NG, 2013; Flombaum et al., PNAS, 2013 )
AEROSOL-CLOUD INTERACTIONS ( Partridge et al., ACP, 2011, 2012 )
STRUCTURAL ENGINEERING: VOID DETECTION ( Qin et al., ATM, 2016 )
GEOMORPHOLOGY: DEPTH TO BEDROCK ( Gomes et al., WRR, 2016 )

SOME BASIC CONCEPTS: MATHEMATICAL NOTATION
𝑦 = 𝑓(𝛉)
𝑎 SCALAR ( SINGLE VALUE )
𝐚 = ( 𝑎1, …, 𝑎𝑛 ) VECTOR OF 𝑛
VALUES ( ALSO CALLED 𝑛-VECTOR )
𝐀 = [ 𝑎11 𝑎12 ; 𝑎21 𝑎22 ] 2 × 2 MATRIX ( 2 ROWS AND 2 COLUMNS )
𝜃 PARAMETER VALUE ( SCALAR )
𝛉 = ( 𝜃1, …, 𝜃𝑝 ) PARAMETER VECTOR ( 𝑝 VALUES )
𝐱0 = ( 𝑥1, …, 𝑥𝑑 ) VECTOR WITH INITIAL STATES ( 𝑑 VALUES )
𝐈 = [ 𝑖11 ⋯ 𝑖1𝑚 ; ⋮ ⋱ ⋮ ; 𝑖𝑛1 ⋯ 𝑖𝑛𝑚 ] 𝑛 × 𝑚 MATRIX WITH FORCING DATA ( TRANSIENT MODEL INPUTS )
𝐘 = 𝑓(𝛉, 𝐱0, 𝐈) NUMERICAL MODEL ( RETURNS SIMULATED OUTPUT )
NOTE: THE TILDE, ~, IS USED TO DENOTE MEASURED QUANTITIES
THIS IS THE COMMON MATHEMATICAL NOTATION I USE HEREIN

MODEL-DATA FITTING: RESIDUALS
ASSUME THE MODEL 𝑦 = 𝑓(𝑥) = 𝑎 + 𝑏exp(−𝑥), THUS 𝛉 = ( 𝑎, 𝑏 ) AND 𝑦 = 𝜃1 + 𝜃2exp(−𝑥)
𝛆(𝛉, 𝐱0, 𝐈) = 𝐘̃ − 𝑓(𝛉, 𝐱0, 𝐈)
FOR NOW, LET'S ASSUME WE IGNORE INITIAL STATE AND INPUT DATA ERROR
⇒ 𝛆(𝛉) = 𝐘̃ − 𝑓(𝛉) = 𝐘̃ − 𝐘(𝛉) = ( 𝜀1(𝛉), …, 𝜀𝑛(𝛉) )
THE ERROR RESIDUAL, 𝜀(·), IS EQUIVALENT TO THE DISTANCE BETWEEN A MODEL PREDICTION (SIMULATION) AND THE CORRESPONDING OBSERVATION
[ figure: 𝑦 versus 𝑥, with residuals 𝜀1, …, 𝜀9 ]

MODEL-DATA FITTING: LEAST SQUARES
MINIMIZE THE SUM OF SQUARED RESIDUALS ( GAUSS, LEGENDRE )
argmin over 𝛉 = ( 𝑎, 𝑏 ) ∈ ℝ² of 𝐹(𝛉) = Σᵢ₌₁ⁿ ( 𝑦̃ᵢ − 𝑓(𝑥̃ᵢ, 𝛉) )² = Σᵢ₌₁ⁿ 𝜀ᵢ(𝛉)², WITH 𝑛 = 9 IN THE PRESENT EXAMPLE
ASSUMPTIONS ABOUT THE RESIDUALS: (1) THEY REPRESENT MEASUREMENT ERRORS (2) THEY EXHIBIT A GAUSSIAN DISTRIBUTION (3) CONSTANT VARIANCE (4) UNCORRELATED
LEAST SQUARES IS USED WIDELY TO FIT ALL TYPES OF MODELS TO DATA

MODEL-DATA FITTING: RESPONSE SURFACE
min over 𝛉 ∈ 𝚯 of 𝐹(𝛉) = Σᵢ₌₁ⁿ 𝜀ᵢ(𝛉)²
[ figure: response surface of 𝐹(𝛉) ]

WHICH METHODS WILL WE DISCUSS?
METHODS: SINGLE-OBJECTIVE OPTIMIZATION
𝛉̂ = argmin over 𝛉 ∈ 𝚯 ⊆ ℝᵖ of 𝐹(𝛉)
[ figure: 𝐹(𝛉) with a global minimum, a local minimum, and starting points 𝛉(0) ]

METHODS: MULTI-CRITERIA OPTIMIZATION
𝛉̂ = argmin over 𝛉 ∈ 𝚯 ⊆ ℝᵖ of { 𝐹1(𝛉), 𝐹2(𝛉) }
[ figure: Pareto front and Pareto solutions in the ( 𝐹1, 𝐹2 ) plane ]

METHODS: BAYESIAN INFERENCE
𝑃(𝛉|𝐃̃) = 𝑃(𝛉) 𝐿(𝛉|𝐃̃) / 𝑃(𝐃̃) ∝ 𝑃(𝛉) 𝐿(𝛉|𝐃̃)
𝑃(𝛉|𝐃̃) POSTERIOR DISTRIBUTION, 𝑃(𝛉) PRIOR DISTRIBUTION, 𝐿(𝛉|𝐃̃) LIKELIHOOD FUNCTION, 𝑃(𝐃̃) EVIDENCE

METHODS: DATA ASSIMILATION ( Vrugt et al., WRR, 2003 )
STATE-SPACE FORMULATION: 𝑆𝑡₊₁ = 𝑓(𝛉, 𝑆𝑡, 𝐈̃𝑡) + 𝑤𝑡₊₁, WITH FORECAST 𝑆𝑡₊₁ᶠ AND ANALYSIS 𝑆𝑡₊₁ᵃ
[ figure: storage 𝐒 (mm) and forcing data 𝐏, 𝐄0 (mm/d) versus time (days), without and with data assimilation ]

METHODS: MODEL AVERAGING
COMBINE THE OUTPUT OF SEVERAL MODELS ( MODEL 1, 2, 3 ): POINT PREDICTION → WEIGHTED FORECAST DENSITY
[ figure: model outputs versus time ]

LINEAR REGRESSION: THEORY AND CASE STUDY

CASE STUDY: MODEL LINEAR IN PARAMETERS
BULLET FIRED FROM A GUN: HEIGHT OF THE BULLET VERSUS TIME
𝑑(𝑡) = 𝑎 + 𝑏𝑡 − ½𝑐𝑡²
[ figure: altitude ASL, 𝑑 (m), versus time, 𝑡 (s) ]
UNIT ANALYSIS! ALWAYS CHECK
HOW DO WE DETERMINE THE PARAMETER VALUES, 𝛉 = ( 𝑎, 𝑏, 𝑐 )?
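As a numerical preview of the normal-equation solution developed on the next slides, here is a minimal plain-Python sketch using the case-study measurements ( 𝑡, 𝑑̃ ) = (1, 113.9), (3, 285.2), (6, 453.4), (10, 520.3). The `solve` Gaussian-elimination helper and all variable names are illustrative, not part of the deck:

```python
# Ordinary least squares for d(t) = a + b*t - 0.5*c*t^2 (linear in a, b, c).
t = [1.0, 3.0, 6.0, 10.0]
d = [113.9, 285.2, 453.4, 520.3]

# Design (sensitivity) matrix: one row [1, t_i, -0.5*t_i^2] per observation.
X = [[1.0, ti, -0.5 * ti**2] for ti in t]

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [bi] for row, bi in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

# Normal equations: (X^T X) theta = X^T d
XtX = [[sum(X[i][r] * X[i][c] for i in range(4)) for c in range(3)] for r in range(3)]
Xtd = [sum(X[i][r] * d[i] for i in range(4)) for r in range(3)]
theta = solve(XtX, Xtd)
print(theta)  # close to (12.08, 107.90, 11.42), as reported on the slides
```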
ASSUME THE MODEL ABOVE, WITH 𝑎: INITIAL ALTITUDE ( m ), 𝑏: INITIAL VELOCITY ( m/s ), 𝑐: GRAVITATIONAL ACCELERATION ( m/s² )

LINEAR: SYSTEM OF EQUATIONS
EXPERIMENTAL DATA ( 𝑡, 𝑑̃ ): (1, 113.9), (3, 285.2), (6, 453.4), (10, 520.3)
MODEL IN MATRIX NOTATION: 𝐗𝛉 = 𝐝̃, WITH
𝐗 = [ 1 𝑡1 −½𝑡1² ; 1 𝑡2 −½𝑡2² ; 1 𝑡3 −½𝑡3² ; 1 𝑡4 −½𝑡4² ], 𝛉 = ( 𝜃1, 𝜃2, 𝜃3 )ᵀ, 𝐝̃ = ( 𝑑̃1, 𝑑̃2, 𝑑̃3, 𝑑̃4 )ᵀ
SUM OF SQUARED RESIDUALS ( ᵀ DENOTES TRANSPOSE; THIS IS AN INNER PRODUCT )
𝐹(𝛉) = Σᵢ₌₁ⁿ 𝜀ᵢ(𝛉)² = 𝛆(𝛉)ᵀ𝛆(𝛉) = ( 𝐝̃ − 𝐗𝛉 )ᵀ( 𝐝̃ − 𝐗𝛉 )

NOW SOLVE THE SYSTEM OF EQUATIONS
𝐹(𝛉) = 𝐝̃ᵀ𝐝̃ − 2𝛉ᵀ𝐗ᵀ𝐝̃ + ( 𝐗𝛉 )ᵀ( 𝐗𝛉 )
SET THE DERIVATIVE TO ZERO ( = MINIMUM ): ∇𝐹(𝛉) = −2𝐗ᵀ𝐝̃ + 2𝐗ᵀ𝐗𝛉 = 𝟎 ⇒ 𝛉̂ = ( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ𝐝̃
THIS ANALYTIC SOLUTION EXISTS FOR MODELS THAT ARE LINEAR IN THEIR PARAMETERS AND RESULTS IN THE LOWEST VALUE OF THE L2 NORM ( = SQUARED ) OF THE RESIDUALS

AND A COVARIANCE MATRIX FOR UNCERTAINTY
THE PARAMETER ESTIMATES, 𝛉̂, ARE AN UNBIASED ESTIMATE OF THE TRUE PARAMETER VALUES, SAY 𝛍
𝐝̃ = 𝐗𝛍 + 𝛆 ⇒ 𝛉̂ = ( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ( 𝐗𝛍 + 𝛆 ) = ( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ𝐗𝛍 + ( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ𝛆 = 𝛍 + ( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ𝛆
LET'S NOW DERIVE THE COVARIANCE MATRIX AT THE OPTIMAL PARAMETER VALUES, MAKING THE COMMON LEAST SQUARES ASSUMPTIONS ABOUT THE ERROR RESIDUALS, 𝛆 ( ZERO MEAN, UNCORRELATED, CONSTANT VARIANCE )
𝐂(𝛉) = 𝔼[ ( 𝛉̂ − 𝛍 )( 𝛉̂ − 𝛍 )ᵀ ] = ( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ 𝔼[𝛆𝛆ᵀ] 𝐗( 𝐗ᵀ𝐗 )⁻¹ = 𝜎²( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ𝐗( 𝐗ᵀ𝐗 )⁻¹ = 𝜎²( 𝐗ᵀ𝐗 )⁻¹ ≈ [ SSR/(𝑛 − 𝑝) ] ( 𝐗ᵀ𝐗 )⁻¹
𝜎² VARIANCE OF THE RESIDUALS, SSR SUM OF SQUARED RESIDUALS, 𝑛 NUMBER OF DATA POINTS, 𝑝 NUMBER OF PARAMETERS
NOTE: ( ( 𝐗ᵀ𝐗 )⁻¹ )ᵀ = ( 𝐗ᵀ𝐗 )⁻¹ ( SYMMETRIC MATRIX! )
HERE WE USED 𝛉̂ − 𝛍 = ( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ𝛆, SINCE ( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ𝐗 = 𝟏

CASE STUDY: MODEL LINEAR IN PARAMETERS
𝑑(𝑡) = 𝑎 + 𝑏𝑡 − ½𝑐𝑡², DATA ( 𝑡, 𝑑̃ ): (1, 113.9), (3, 285.2), (6, 453.4), (10, 520.3)
SENSITIVITY MATRIX: 𝐗 = [ ∂𝑑ᵢ(𝑡)/∂𝜃ⱼ ], WITH ROWS [ 1, 𝑡ᵢ, −0.5𝑡ᵢ² ]:
𝐗 = [ 1 1 −0.5 ; 1 3 −4.5 ; 1 6 −18 ; 1 10 −50 ]
𝛉̂ = ( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ𝐝̃ ⇒ ( MATLAB ) 𝛉̂ = ( 12.08, 107.90, 11.42 )
𝑎: INITIAL ALTITUDE 12.08 ( m ), 𝑏: INITIAL VELOCITY 107.90 ( m/s ), 𝑐: GRAVITATIONAL ACCELERATION 11.42 ( m/s² )
FITTED MODEL: 𝑑(𝑡) = 12.08 + 107.90𝑡 − ½ · 11.42𝑡²

CALIBRATION ( = IN SAMPLE ) VERSUS PREDICTION ( = OUT OF SAMPLE )
THE MODEL GIVES WRONG PREDICTIONS? WHY?
ANSWER ( WHY THE PREDICTIONS ARE NOT ACCURATE )
1. MEASUREMENT DATA ERRORS AFFECT THE PARAMETER ESTIMATES
2. ONLY A FEW DATA POINTS ARE USED FOR MODEL CALIBRATION: DEGREES OF FREEDOM df = 𝑛 − 𝑝 = 1, VERY LOW
3. THE CALIBRATION DATA POINTS FALL ONLY IN THE EARLY PERIOD WITH A RISING BULLET ( NOT ALL PARAMETERS ARE THEN SENSITIVE TO THE MODEL OUTPUT; COMPARE THE COLUMNS ∂𝐝(𝑡)/∂𝜃2 AND ∂𝐝(𝑡)/∂𝜃3 OF 𝐗 )
4. WE ALWAYS NEED TO LOOK BEYOND A DETERMINISTIC FORECAST ( UNCERTAINTY QUANTIFICATION OF PARAMETERS + MODEL OUTPUT )

CASE STUDY: PARAMETER UNCERTAINTY
𝑛 = 4 ( NUMBER OF DATA POINTS ), 𝑝 = 3 ( NUMBER OF PARAMETERS ), SSR = 276.96 ( SUM OF SQUARED RESIDUALS )
𝐂(𝛉) = [ SSR/(𝑛 − 𝑝) ] ( 𝐗ᵀ𝐗 )⁻¹ = [ 559.77 −226.25 −35.11 ; −226.25 119.17 20.25 ; −35.11 20.25 3.63 ] ( VARIANCES ON THE DIAGONAL, COVARIANCES OFF THE DIAGONAL; 𝛔 = ( 𝜎1, …, 𝜎𝑝 ) )
CL95% = 𝛉̂ ± 𝑡ₙ₋ₚ,₀.₉₇₅ × √diag 𝐂(𝛉), WITH 𝑡1,0.975 = 12.71 ( COMPARE 𝑡1000,0.975 = 1.9623 AND 𝑡1000,0.840 = 0.9950 )
⇒ 𝑑(𝑡) = ( 12.08 ± 300.62 ) + ( 107.90 ± 138.71 )𝑡 − ½( 11.42 ± 24.21 )𝑡²

L2-NORM SENSITIVE TO OUTLIERS
[ figure: in-sample fits obtained with the L2 and L1 norms ]
THE L2 NORM IS VERY SENSITIVE TO OUTLIERS ( THIS AFFECTS THE PARAMETER ESTIMATES )

VALID/INVALID BASIS FUNCTIONS
𝐗 = [ ∂𝑑ᵢ(𝑡)/∂𝜃ⱼ ]: THE 𝑗-TH COLUMN IS THE 𝑗-TH BASIS FUNCTION
∂𝐝(𝑡)/∂𝜃ⱼ = 𝑓(𝑡): VALID BASIS FUNCTION ( SENSITIVITY DEPENDENT ONLY ON THE MODEL INPUT )
∂𝐝(𝑡)/∂𝜃ⱼ = 𝑓(𝑡, 𝛉): INVALID BASIS FUNCTION ( DEPENDENT ON ONE OR MORE OTHER PARAMETERS )
IF 𝐗 HAS EVEN ONE INVALID BASIS FUNCTION, THERE IS NO DIRECT SOLUTION FOR THE PARAMETERS, 𝛉
EXAMPLES
𝑦 = 𝑎𝑥 + 𝑏 VALID ( DIRECT SOLUTION )
𝑦 = 𝑎𝑥² + 𝑏𝑥 + 𝑐 VALID ( DIRECT SOLUTION )
𝑦 = 𝑎𝑥³ + 𝑏exp(−𝑥) + 𝑐 VALID ( DIRECT SOLUTION )
𝑦 = 𝑎sin(𝑥³) − 𝑏𝑥 + 𝑐 VALID ( DIRECT SOLUTION )
𝑦 = 𝑎sin(𝑏𝑥³) − 𝑐cos(𝑥) + 𝑑 INVALID ( ITERATIVE SOLUTION ): THE FIRST COLUMN OF 𝐗, sin(𝑏𝑥ᵢ³), DEPENDS ON 𝑏, AND ∂𝑦/∂𝑏 DEPENDS ON 𝑎 AND 𝑏
𝑦 = 𝑎exp(𝑏𝑥) VALID AFTER TRANSFORMATION: log 𝑦 = log 𝑎 + 𝑏𝑥 ( DIRECT SOLUTION )

INVALID BASIS FUNCTIONS: ITERATIVE SOLUTION
IF WE CANNOT SET UP 𝐗𝛉 = 𝐝̃ ( = A SYSTEM OF EQUATIONS ), AN ITERATIVE SOLUTION IS NEEDED TO FIND THE SSR MINIMUM

NONLINEAR REGRESSION: NEWTON METHOD

INSPIRATION: ROOT FINDING ( NEWTON METHOD )
ASSUME THE MODEL 𝑦 = 𝑓(𝑥, 𝑎, 𝑏) = 𝑎𝑥 + 𝑏. HOW DO WE FIND THE ROOT OF THIS FUNCTION?
LINEAR FUNCTION: DIRECT ROOT. NONLINEAR FUNCTION: ITERATE. THE DERIVATIVE TELLS US WHICH DIRECTION TO GO
𝑓′(𝑥(0)) = ∆𝑦/∆𝑥 = ( 𝑦(0) − 𝑦(1) ) / ( 𝑥(0) − 𝑥(1) ) ⇒ 𝑥(0) − 𝑥(R) = ( 𝑦(0) − 0 ) / 𝑓′(𝑥(0)) ⇒ 𝑥(R) = 𝑥(0) − 𝑓(𝑥(0)) / 𝑓′(𝑥(0))
NEWTON METHOD: 𝑥(𝑖+1) = 𝑥(𝑖) − 𝑓(𝑥(𝑖)) / 𝑓′(𝑥(𝑖))
EXAMPLE: 𝑦 = 𝑓(𝑥) = 𝑥³/100 − 1, WITH ROOT 𝑥(R) = ∛100 ≈ 4.64
IN JUST A FEW ITERATIONS WE CAN APPROXIMATE THE ROOT, 𝑥(R), FOR 𝑦 = 0

LOOKING FOR THE MINIMUM OF A FUNCTION, NOT A ROOT!
NEWTON'S METHOD GIVES A ROOT, BUT THE SSR WILL SELDOM ATTAIN A VALUE OF ZERO ( ONLY FOR A PERFECT FIT! )
WE HAVE TO LOOK FOR WHERE THE DERIVATIVE OF THE SSR IS ZERO!
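Before moving on to minimization, the root-finding iteration 𝑥(𝑖+1) = 𝑥(𝑖) − 𝑓(𝑥(𝑖))/𝑓′(𝑥(𝑖)) can be sketched in a few lines of plain Python for the example 𝑦 = 𝑥³/100 − 1 (function names and the starting point are illustrative):

```python
# Newton's method for f(x) = x^3/100 - 1, whose root is 100**(1/3) ~ 4.64.
def f(x):  return x**3 / 100.0 - 1.0
def fp(x): return 3.0 * x**2 / 100.0   # derivative f'(x)

x = 10.0             # starting point x(0)
for _ in range(50):  # a handful of iterations suffice; 50 is generous
    step = f(x) / fp(x)
    x -= step
    if abs(step) < 1e-12:
        break
print(x)  # approximately 4.6416, i.e. the cube root of 100
```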
FOR A SCALAR PARAMETER THIS MEANS APPLYING NEWTON'S METHOD TO THE DERIVATIVE: 𝑥(𝑖+1) = 𝑥(𝑖) − 𝑓′(𝑥(𝑖)) / 𝑓′′(𝑥(𝑖)), THUS, FOR THE COST FUNCTION 𝐹(𝜃) = SSR, 𝜃(𝑖+1) = 𝜃(𝑖) − 𝐹′(𝜃(𝑖)) / 𝐹′′(𝜃(𝑖)) ( THE CASE 𝑝 = 1 )

NEWTON METHOD: NONLINEAR REGRESSION
NEWTON METHOD ( 𝑝 = 1 ): 𝜃(𝑖+1) = 𝜃(𝑖) − 𝐹′(𝜃(𝑖)) / 𝐹′′(𝜃(𝑖)), WITH 𝐹(𝛉) = Σᵢ₌₁ⁿ 𝜀ᵢ(𝛉)² AND 𝛉(0) CHOSEN AT RANDOM
GRADIENT VECTOR ( 𝑝 × 1 ): ∇𝐹(𝛉) = ∂𝐹(𝛉)/∂𝛉 = ( ∂𝐹/∂𝜃1, …, ∂𝐹/∂𝜃𝑝 )ᵀ
HESSIAN MATRIX ( 𝑝 × 𝑝 ): ∇²𝐹(𝛉) = ∂²𝐹(𝛉)/∂𝛉², WITH ENTRIES ∂²𝐹(𝛉)/∂𝜃ₐ∂𝜃ᵦ
MULTIVARIATE NEWTON METHOD: 𝛉(𝑖+1) = 𝛉(𝑖) − [ ∇²𝐹(𝛉(𝑖)) ]⁻¹ ∇𝐹(𝛉(𝑖))
⇒ ∆𝛉(𝑖) = −𝐇(𝛉(𝑖))⁻¹ 𝐠(𝛉(𝑖)), WITH 𝐠 = ∇𝐹 AND 𝐇 = ∇²𝐹

NEWTON METHOD: NUMERICAL RECIPE
ALGORITHM 1: NEWTON METHOD
input 𝑓: ℝᵖ → ℝⁿ a function such that 𝐹(𝛉) = Σᵢ₌₁ⁿ ( 𝑑̃ᵢ − 𝑓ᵢ(𝛉) )², where all the 𝑓ᵢ are twice differentiable; 𝛉(0) an initial solution
output 𝛉(∗) a local minimum of the SSR (cost) function, 𝐹(𝛉)
begin
  𝑖 ← 0
  while not converged and ( 𝑖 < 𝑖max ) do
    𝛉(𝑖+1) ← 𝛉(𝑖) + ∆𝛉(𝑖) with ∆𝛉(𝑖) = −𝐇(𝛉(𝑖))⁻¹ 𝐠(𝛉(𝑖))
    𝑖 ← 𝑖 + 1
  end while
  return 𝛉(𝑖)
end

NONLINEAR REGRESSION: ROSENBROCK CASE STUDY
ROSENBROCK FUNCTION: 𝐹(𝑥, 𝑦) = ( 1 − 𝑥 )² + 𝛼( 𝑦 − 𝑥² )², HERE WITH 𝛼 = 10, 𝑛 = 2, 𝑝 = 2
𝐹(𝛉) = ( 1 − 𝜃1 )² + 10( 𝜃2 − 𝜃1² )², WITH RESIDUALS 𝛆(𝛉) = ( 1 − 𝜃1, √10 ( 𝜃2 − 𝜃1² ) )ᵀ SO THAT 𝐹(𝛉) = Σᵢ₌₁² 𝜀ᵢ(𝛉)²
WE CAN DERIVE THE GRADIENT VECTOR AND HESSIAN MATRIX ANALYTICALLY
𝐠(𝛉) = ( 2𝜃1 − 40𝜃1( 𝜃2 − 𝜃1² ) − 2, 20( 𝜃2 − 𝜃1² ) )ᵀ
𝐇(𝛉) = [ 120𝜃1² − 40𝜃2 + 2, −40𝜃1 ; −40𝜃1, 20 ]

ROSENBROCK FUNCTION: NEWTON METHOD
[ figure: Newton iterates on the Rosenbrock response surface ]
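The Newton update ∆𝛉 = −𝐇⁻¹𝐠 with the analytic gradient and Hessian above can be sketched in plain Python; the 2 × 2 inverse is written out by hand, and the starting point is illustrative:

```python
# Newton iterations on the Rosenbrock function with alpha = 10,
# using the analytic gradient and Hessian from the slides.
def F(t1, t2): return (1 - t1)**2 + 10 * (t2 - t1**2)**2
def grad(t1, t2): return [2*t1 - 40*t1*(t2 - t1**2) - 2, 20*(t2 - t1**2)]
def hess(t1, t2): return [[120*t1**2 - 40*t2 + 2, -40*t1], [-40*t1, 20.0]]

t1, t2 = 0.0, 0.0  # theta(0)
for _ in range(20):
    g = grad(t1, t2)
    H = hess(t1, t2)
    det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
    # delta = -H^{-1} g, with the 2x2 inverse written out
    d1 = -( H[1][1]*g[0] - H[0][1]*g[1]) / det
    d2 = -(-H[1][0]*g[0] + H[0][0]*g[1]) / det
    t1, t2 = t1 + d1, t2 + d2
print(t1, t2)  # -> 1.0 1.0, the global minimum of the Rosenbrock function
```

From (0, 0) this particular run reaches (1, 1) in two steps; in general Newton's method only converges locally and can be attracted to saddle points.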
NONLINEAR REGRESSION: GAUSS-NEWTON METHOD

NONLINEAR REGRESSION: JACOBIAN MATRIX
NEWTON METHOD: ∆𝛉(𝑖) = −𝐇(𝛉(𝑖))⁻¹ 𝐠(𝛉(𝑖))
JACOBIAN OF THE MODEL OUTPUT ( 𝑛 × 𝑝 MATRIX ): 𝐉𝑓(𝛉) = ∂𝐝(𝛉)/∂𝛉, WITH ENTRIES ∂𝑑ᵢ(𝛉)/∂𝜃ⱼ
JACOBIAN OF THE ERROR RESIDUALS ( 𝑛 × 𝑝 MATRIX ): 𝐉𝜀(𝛉) = ∂𝛆(𝛉)/∂𝛉, WITH ENTRIES ∂𝜀ᵢ(𝛉)/∂𝜃ⱼ

THE GRADIENT VECTOR CAN BE DERIVED FROM THE JACOBIAN MATRIX OF THE RESIDUALS:
𝐠(𝛉) = 2𝐉𝜀(𝛉)ᵀ 𝛆(𝛉), I.E. 𝑔ₐ(𝛉) = ∂𝐹(𝛉)/∂𝜃ₐ = 2 ( ∂𝛆(𝛉)/∂𝜃ₐ )ᵀ 𝛆(𝛉)
WE CAN ALSO DERIVE THE HESSIAN FROM THE JACOBIAN OF THE RESIDUALS:
𝐇(𝛉) = 2𝐉𝜀(𝛉)ᵀ𝐉𝜀(𝛉) + 2𝐐, I.E. 𝐻ₐ,ᵦ(𝛉) = ∂²𝐹(𝛉)/∂𝜃ₐ∂𝜃ᵦ = 2 ( ∂𝛆/∂𝜃ₐ )ᵀ( ∂𝛆/∂𝜃ᵦ ) + SECOND-DERIVATIVE TERMS ( COLLECTED IN 𝐐 )
UNIT ANALYSIS! ALWAYS CHECK

NONLINEAR REGRESSION: GAUSS-NEWTON
NEGLECTING 𝐐 IN THE HESSIAN, THE NEWTON UPDATE ∆𝛉(𝑖) = −𝐇⁻¹𝐠 BECOMES
∆𝛉(𝑖) = −( 2𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) )⁻¹ 2𝐉𝜀(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖))
⇒ ∆𝛉(𝑖) = −( 𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) )⁻¹ 𝐉𝜀(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖)) ( GAUSS-NEWTON METHOD; GAUSS )
⇒ 𝛉(𝑖+1) = 𝛉(𝑖) − ( 𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) )⁻¹ 𝐉𝜀(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖)), WITH 𝛆(𝛉) = 𝐝̃ − 𝑓(𝛉)

NONLINEAR REGRESSION: GAUSS-NEWTON
GREAT SIMILARITY BETWEEN THE DIRECT AND INDIRECT SOLUTIONS
LINEAR REGRESSION: 𝛉̂ = ( 𝐗ᵀ𝐗 )⁻¹𝐗ᵀ𝐝̃, WITH 𝐗 THE SENSITIVITY MATRIX OF THE MODEL OUTPUT TO THE PARAMETERS
NONLINEAR REGRESSION: 𝐉𝜀(𝛉(𝑖)) IS THE SENSITIVITY MATRIX OF THE RESIDUALS TO THE PARAMETERS; THE JACOBIAN DEPENDS ON THE ACTUAL PARAMETER VALUES
UNIT ANALYSIS! ALWAYS CHECK
ONE CAN ALSO WRITE ∆𝛉(𝑖) = ( 𝐉𝑓(𝛉(𝑖))ᵀ𝐉𝑓(𝛉(𝑖)) )⁻¹ 𝐉𝑓(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖))
WHY IS THIS THE SAME?
ANSWER: 𝛆(𝛉) = 𝐝̃ − 𝑓(𝛉) ⇒ 𝐉𝜀(𝛉) = −𝐉𝑓(𝛉), AND THE TWO SIGN CHANGES CANCEL

GAUSS-NEWTON: NUMERICAL RECIPE
ALGORITHM 2: GAUSS-NEWTON METHOD
input 𝑓: ℝᵖ → ℝⁿ a function such that 𝐹(𝛉) = Σᵢ₌₁ⁿ ( 𝑑̃ᵢ − 𝑓ᵢ(𝛉) )², where all the 𝑓ᵢ are differentiable; 𝛉(0) an initial solution
output 𝛉(∗) a local minimum of the SSR (cost) function, 𝐹(𝛉)
begin
  𝑖 ← 0
  while not converged and ( 𝑖 < 𝑖max ) do
    𝛉(𝑖+1) ← 𝛉(𝑖) + ∆𝛉(𝑖) with ∆𝛉(𝑖) = −( 𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) )⁻¹ 𝐉𝜀(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖))
    𝑖 ← 𝑖 + 1
  end while
  return 𝛉(𝑖)
end

NONLINEAR REGRESSION: ROSENBROCK CASE STUDY
ROSENBROCK FUNCTION: 𝐹(𝛉) = ( 1 − 𝜃1 )² + 10( 𝜃2 − 𝜃1² )², WITH 𝛼 = 10, 𝑛 = 2, 𝑝 = 2
RESIDUALS: 𝛆(𝛉) = ( 1 − 𝜃1, √10 ( 𝜃2 − 𝜃1² ) )ᵀ, SO THAT 𝐹(𝛉) = Σᵢ₌₁² 𝜀ᵢ(𝛉)²
WE CAN DERIVE THE JACOBIAN MATRICES ANALYTICALLY
𝐉𝜀(𝛉) = ∂𝛆(𝛉)/∂𝛉 = [ −1, 0 ; −2√10 𝜃1, √10 ], AND 𝐉𝑓(𝛉) = −𝐉𝜀(𝛉)

ROSENBROCK FUNCTION: GAUSS-NEWTON
[ figure: Gauss-Newton iterates on the Rosenbrock response surface ]
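The Gauss-Newton update above can be sketched in plain Python using the analytic residual Jacobian; the normal equations of the 2 × 2 system are solved by hand, and the starting point is illustrative:

```python
import math

S10 = math.sqrt(10.0)

# Residuals and Jacobian for F = (1-t1)^2 + 10*(t2-t1^2)^2, written as
# eps = (1 - t1, sqrt(10)*(t2 - t1^2)) so that F = sum(eps_i^2).
def eps(t1, t2): return [1.0 - t1, S10 * (t2 - t1**2)]
def jac(t1, t2): return [[-1.0, 0.0], [-2.0 * S10 * t1, S10]]

t1, t2 = -1.0, 2.0  # theta(0)
for _ in range(30):
    e = eps(t1, t2)
    J = jac(t1, t2)
    # Normal equations (J^T J) delta = -J^T e, solved for the 2x2 case
    A = [[sum(J[i][r]*J[i][c] for i in range(2)) for c in range(2)] for r in range(2)]
    b = [-sum(J[i][r]*e[i] for i in range(2)) for r in range(2)]
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    d1 = (A[1][1]*b[0] - A[0][1]*b[1]) / det
    d2 = (A[0][0]*b[1] - A[1][0]*b[0]) / det
    t1, t2 = t1 + d1, t2 + d2
print(t1, t2)  # -> 1.0 1.0
```

Because the residuals vanish at the optimum, Gauss-Newton behaves like full Newton here and converges in a couple of steps.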
NONLINEAR REGRESSION: GRADIENT DESCENT
LET'S LOOK AT A CONTOUR PLOT OF THE COST FUNCTION
[ figure: contours of 𝐹(𝛉) with the minimum, steps 𝛉(0) → 𝛉(1) → 𝛉(2) → 𝛉(3); the steepest gradient is orthogonal to the contour lines ]
GRADIENT DESCENT: ∆𝛉(𝑖) = −𝛼 𝐠(𝛉(𝑖))
WE USE THE STEEPEST GRADIENT AT 𝛉(𝑖), DERIVED FROM THE ( 𝑝 × 1 ) VECTOR 𝐠(𝛉(𝑖)), AND OPTIMIZE THE STEP SIZE, 𝛼𝐠(𝛉(𝑖)), USING A LINE SEARCH METHOD:
𝛼(𝑖) = argmin over 𝛼 ∈ ℝ⁺ of 𝐹( 𝛉(𝑖) − 𝛼 𝐠(𝛉(𝑖)) )

GRADIENT-DESCENT: NUMERICAL RECIPE
ALGORITHM 3: GRADIENT-DESCENT METHOD
input 𝑓: ℝᵖ → ℝⁿ a function such that 𝐹(𝛉) = Σᵢ₌₁ⁿ ( 𝑑̃ᵢ − 𝑓ᵢ(𝛉) )², where all the 𝑓ᵢ are differentiable; 𝛉(0) an initial solution
output 𝛉(∗) a local minimum of the SSR (cost) function, 𝐹(𝛉)
begin
  𝑖 ← 0
  while not converged and ( 𝑖 < 𝑖max ) do
    𝛉(𝑖+1) ← 𝛉(𝑖) − 𝛼(𝑖) 𝐠(𝛉(𝑖)) with 𝛼(𝑖) = argmin over 𝛼 ∈ ℝ⁺ of 𝐹( 𝛉(𝑖) − 𝛼 𝐠(𝛉(𝑖)) )
    𝑖 ← 𝑖 + 1
  end while
  return 𝛉(𝑖)
end

NONLINEAR REGRESSION: ROSENBROCK CASE STUDY
ROSENBROCK FUNCTION: GRADIENT DESCENT
[ figure: gradient-descent iterates on the Rosenbrock response surface ]

MODIFIED GAUSS-NEWTON
NONLINEAR REGRESSION: MODIFIED GAUSS-NEWTON
GAUSS-NEWTON METHOD: ∆𝛉(𝑖) = −( 𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) )⁻¹ 𝐉𝜀(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖))
THE SSR MIGHT NOT DECREASE AT EVERY ITERATION WITH GAUSS-NEWTON, BECAUSE THE STEP SIZE ∆𝛉(𝑖) MAY BE TOO LARGE ( OR TOO SMALL )
A MULTIPLIER 𝛿(𝑖) ∈ [0, 1] CAN BE USED TO ADAPT THE UPDATE AS FOLLOWS:
𝛉(𝑖+1) = 𝛉(𝑖) + 𝛿(𝑖) ∆𝛉(𝑖) ( MODIFIED GAUSS-NEWTON METHOD )
A GOOD VALUE FOR 𝛿(𝑖) CAN BE DERIVED IN EACH ITERATION USING A LINE SEARCH:
𝛿(𝑖) = argmin over 𝛿 ∈ ℝ⁺ of 𝐹( 𝛉(𝑖) + 𝛿 ∆𝛉(𝑖) )

MODIFIED GAUSS-NEWTON: NUMERICAL RECIPE
ALGORITHM 4: MODIFIED GAUSS-NEWTON METHOD
input 𝑓: ℝᵖ → ℝⁿ a function such that 𝐹(𝛉) = Σᵢ₌₁ⁿ ( 𝑑̃ᵢ − 𝑓ᵢ(𝛉) )², where all the 𝑓ᵢ are differentiable; 𝛉(0) an initial solution
output 𝛉(∗) a local minimum of the SSR (cost) function, 𝐹(𝛉)
begin
  𝑖 ← 0
  while not converged and ( 𝑖 < 𝑖max ) do
    𝛉(𝑖+1) ← 𝛉(𝑖) + 𝛿(𝑖) ∆𝛉(𝑖) with 𝛿(𝑖) = argmin over 𝛿 ∈ ℝ⁺ of 𝐹( 𝛉(𝑖) + 𝛿 ∆𝛉(𝑖) )
    𝑖 ← 𝑖 + 1
  end while
  return 𝛉(𝑖)
end

NONLINEAR REGRESSION: ROSENBROCK CASE STUDY
ROSENBROCK FUNCTION: MODIFIED GAUSS-NEWTON
[ figure: modified Gauss-Newton iterates on the Rosenbrock response surface ]
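The damped update 𝛉(𝑖+1) = 𝛉(𝑖) + 𝛿(𝑖)∆𝛉(𝑖) can be sketched in plain Python; here simple step halving stands in for the line search over 𝛿 (an assumption of this sketch, the slides leave the line search method open):

```python
import math

S10 = math.sqrt(10.0)

# Modified Gauss-Newton on the Rosenbrock function (alpha = 10): take the
# Gauss-Newton step and damp it with a multiplier delta in (0, 1], found by
# halving delta until the SSR decreases (a stand-in for a full line search).
def F(t1, t2): return (1 - t1)**2 + 10 * (t2 - t1**2)**2
def eps(t1, t2): return [1.0 - t1, S10 * (t2 - t1**2)]
def jac(t1, t2): return [[-1.0, 0.0], [-2.0 * S10 * t1, S10]]

t1, t2 = 0.0, 0.0  # theta(0)
for _ in range(50):
    e = eps(t1, t2)
    J = jac(t1, t2)
    A = [[sum(J[i][r]*J[i][c] for i in range(2)) for c in range(2)] for r in range(2)]
    b = [-sum(J[i][r]*e[i] for i in range(2)) for r in range(2)]
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    d1 = (A[1][1]*b[0] - A[0][1]*b[1]) / det
    d2 = (A[0][0]*b[1] - A[1][0]*b[0]) / det
    delta = 1.0
    while delta > 1e-12 and F(t1 + delta*d1, t2 + delta*d2) >= F(t1, t2):
        delta *= 0.5  # halve the multiplier until the SSR decreases
    t1, t2 = t1 + delta*d1, t2 + delta*d2
print(t1, t2)  # -> close to (1, 1)
```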
LEVENBERG-MARQUARDT ALGORITHM
NONLINEAR REGRESSION: LEVENBERG-MARQUARDT
LEVENBERG CREATED A DAMPED VERSION OF GAUSS-NEWTON:
∆𝛉(𝑖) = −( 𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) + 𝜆𝐈 )⁻¹ 𝐉𝜀(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖))
WHERE THE VARIABLE 𝜆 IS ALSO REFERRED TO AS A DAMPING FACTOR
IF 𝜆 IS LARGE, SAY 𝜆 = 10⁴: ∆𝛉(𝑖) ≅ −(1/𝜆) 𝐉𝜀(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖)) ∝ −𝐠(𝛉(𝑖)): STEEPEST DESCENT
IF 𝜆 IS SMALL, SAY 𝜆 → 0: ∆𝛉(𝑖) ≅ −( 𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) )⁻¹ 𝐉𝜀(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖)): GAUSS-NEWTON
MARQUARDT ADAPTED THE LEVENBERG VARIANT TO YIELD
∆𝛉(𝑖) = −( 𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) + 𝜆 diag( 𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) ) )⁻¹ 𝐉𝜀(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖))

LEVENBERG-MARQUARDT: NUMERICAL RECIPE
ALGORITHM 5: LEVENBERG-MARQUARDT METHOD
input 𝑓: ℝᵖ → ℝⁿ a function such that 𝐹(𝛉) = Σᵢ₌₁ⁿ ( 𝑑̃ᵢ − 𝑓ᵢ(𝛉) )², where all the 𝑓ᵢ are differentiable; 𝛉(0) an initial solution
output 𝛉(∗) a local minimum of the SSR (cost) function, 𝐹(𝛉)
begin
  𝑖 ← 0
  𝜆 ← max diag( 𝐉𝑓(𝛉(0))ᵀ𝐉𝑓(𝛉(0)) )
  while not converged and ( 𝑖 < 𝑖max ) do
    Compute ∆𝛉(𝑖) = −( 𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) + 𝜆 diag( 𝐉𝜀(𝛉(𝑖))ᵀ𝐉𝜀(𝛉(𝑖)) ) )⁻¹ 𝐉𝜀(𝛉(𝑖))ᵀ 𝛆(𝛉(𝑖))
    if 𝐹( 𝛉(𝑖) + ∆𝛉(𝑖) ) < 𝐹( 𝛉(𝑖) ) then
      𝛉(𝑖+1) ← 𝛉(𝑖) + ∆𝛉(𝑖)
      𝜆 ← 𝜆/𝜈
    else
      𝜆 ← 𝜈𝜆
    end if
    𝑖 ← 𝑖 + 1
  end while
  return 𝛉(𝑖)
end

NONLINEAR REGRESSION: ROSENBROCK CASE STUDY
ROSENBROCK FUNCTION: LEVENBERG-MARQUARDT
[ figure: Levenberg-Marquardt iterates on the Rosenbrock response surface ]

NELDER-MEAD (DOWNHILL) SIMPLEX METHOD
SIMPLEX METHOD: DIRECTION FROM 𝑚 POINTS
THE SIMPLEX METHOD DOES NOT USE THE GRADIENT VECTOR, NOR THE JACOBIAN AND HESSIAN MATRICES. INSTEAD IT DETERMINES THE SEARCH DIRECTION FROM THE VALUE OF THE COST FUNCTION, 𝐹(𝛉), AT ONLY 𝑚 = 𝑝 + 1 POINTS ( THE VERTICES OF A SIMPLEX )
LET'S LOOK AT A CONTOUR PLOT OF THE COST FUNCTION ( DARKER = LARGER )
[ figure: contour plot with starting points, rays, vertices of the simplex, and the minimum ]
1. SORT THE POINTS OF THE SIMPLEX IN ASCENDING ORDER, SO THAT 𝐹(𝛉(1)) ≤ 𝐹(𝛉(2)) ≤ 𝐹(𝛉(𝑚))
2. COMPUTE THE CENTROID OF THE SIMPLEX, 𝛉(0), BUT WITHOUT 𝛉(𝑚) ( THE WORST POINT; 𝛉(3) FOR 𝑚 = 3 )
WHAT TO DO NEXT?
NOTE: THE SIMPLEX METHOD IS EASY TO VISUALIZE FOR A MODEL WITH 𝑝 = 2 PARAMETERS ( BUT THE APPROACH IS SIMILAR FOR 𝑝 > 2 )

SIMPLEX METHOD: SUBSEQUENT STEPS
(1) REFLECTION STEP: 𝛉(R) = 𝛉(0) + 𝛼( 𝛉(0) − 𝛉(𝑚) ), 𝛼 = 1
(2) EXPANSION STEP: 𝛉(E) = 𝛉(0) + 𝛾( 𝛉(0) − 𝛉(𝑚) ), 𝛾 = 2
(3) CONTRACTION STEP: 𝛉(C) = 𝛉(0) + 𝜌( 𝛉(0) − 𝛉(𝑚) ), 𝜌 = 0.5
(4) REDUCTION STEP: 𝛉(𝑖) = 𝛉(1) + 𝜎( 𝛉(𝑖) − 𝛉(1) ), 𝜎 = 0.5

SIMPLEX METHOD ( EXPLAINED IN STEPS ), WITH 𝑚 = 3 IN THE PRESENT CASE
WE START WITH AN INITIAL SIMPLEX ( 𝑚 = 𝑝 + 1 POINTS DRAWN RANDOMLY ) AND CALCULATE FOR EACH POINT THE COST FUNCTION, 𝐹(𝛉), FOLLOWED BY RANKING: 𝛉(1) LOWEST POINT, 𝛉(𝑚) HIGHEST POINT
FIRST CALCULATE THE CENTROID, 𝛉(0), OF THE 𝑚 − 1 BEST SOLUTIONS
RECIPE FOR IMPROVEMENT
(1) REFLECTION STEP: 𝛉(R) = 𝛉(0) + 𝛼( 𝛉(0) − 𝛉(𝑚) ), 𝛼 = 1
IF 𝐹(𝛉(1)) ≤ 𝐹(𝛉(R)) < 𝐹(𝛉(𝑚−1)) THEN SET 𝛉(𝑚) = 𝛉(R) AND GO BACK TO (1); OTHERWISE DO THE EXPANSION STEP

DOWNHILL SIMPLEX METHOD (NELDER-MEAD)
IF 𝐹(𝛉(R)) < 𝐹(𝛉(1)) THEN PROCEED WITH THE EXPANSION STEP
(2) EXPANSION STEP: 𝛉(E) = 𝛉(0) + 𝛾( 𝛉(0) − 𝛉(𝑚) ), 𝛾 = 2
IF 𝐹(𝛉(E)) < 𝐹(𝛉(R)) THEN SET 𝛉(𝑚) = 𝛉(E); OTHERWISE SET 𝛉(𝑚) = 𝛉(R). GO BACK TO STEP (1)

DOWNHILL SIMPLEX METHOD (NELDER-MEAD)
THUS THE PREVIOUS STEP HAS TOLD US THAT 𝛉(𝑚) = 𝛉(R).
LET'S DO THIS, AND REDEFINE THE SIMPLEX WITH A NEW CENTROID AND A NEW RANK FOR EACH POINT

DOWNHILL SIMPLEX METHOD (NELDER-MEAD)
NOW LET'S PROCEED AGAIN WITH A REFLECTION STEP: 𝛉(R) = 𝛉(0) + 𝛼( 𝛉(0) − 𝛉(𝑚) ), 𝛼 = 1
IF 𝐹(𝛉(1)) ≤ 𝐹(𝛉(R)) < 𝐹(𝛉(𝑚−1)) THEN SET 𝛉(𝑚) = 𝛉(R) AND GO BACK TO (1); OTHERWISE DO THE EXPANSION STEP

DOWNHILL SIMPLEX METHOD (NELDER-MEAD)
LET'S GO BACK TO THE FIRST REFLECTION STEP, AND IMAGINE THAT IT TURNED OUT THAT 𝐹(𝛉(R)) > 𝐹(𝛉(𝑚−1)). IN THAT CASE WE HAVE TO DO A CONTRACTION STEP
(3) CONTRACTION STEP: 𝛉(C) = 𝛉(0) + 𝜌( 𝛉(0) − 𝛉(𝑚) ), 𝜌 = 1/2
IF 𝐹(𝛉(C)) < 𝐹(𝛉(𝑚)) THEN SET 𝛉(𝑚) = 𝛉(C) AND GO BACK TO STEP (1); OTHERWISE DO A REDUCTION STEP

DOWNHILL SIMPLEX METHOD (NELDER-MEAD)
ASSUME THAT THE PREVIOUS CONTRACTION STEP WOULD BE FOLLOWED BY A REDUCTION STEP INSTEAD.
LET'S LOOK AT THIS LAST STEP NOW
(4) REDUCTION STEP: 𝛉(𝑖) = 𝛉(1) + 𝜎( 𝛉(𝑖) − 𝛉(1) ) FOR ALL 𝑖 = 2, …, 𝑚 ( THUS NOT THE BEST POINT ), 𝜎 = 1/2; THEN GO BACK TO STEP (1)

SIMPLEX METHOD: NUMERICAL RECIPE
ALGORITHM 6: SIMPLEX METHOD
input The cost function 𝐹: ℝᵖ → ℝ; an initial simplex consisting of 𝑚 vertices, 𝛉(1), …, 𝛉(𝑚)
output 𝛉(1) a local minimum of the SSR (cost) function, 𝐹(𝛉)
begin
  𝑖 ← 0 and 𝑚 ← 𝑝 + 1
  while not converged and ( 𝑖 < 𝑖max ) do
    Order the 𝑚 points according to the cost function values, 𝐹(𝛉(1)) ≤ 𝐹(𝛉(2)) ≤ … ≤ 𝐹(𝛉(𝑚))
    Compute 𝛉(0), the centroid (mean) of all points except 𝛉(𝑚)
    Compute reflection step 𝛉(R) ← 𝛉(0) + 𝛼( 𝛉(0) − 𝛉(𝑚) ), where 𝛼 > 0 is the reflection coefficient
    if 𝐹(𝛉(1)) ≤ 𝐹(𝛉(R)) < 𝐹(𝛉(𝑚−1)) then
      𝛉(𝑚) ← 𝛉(R)
    else if 𝐹(𝛉(R)) ≤ 𝐹(𝛉(1)) then
      Compute expansion step 𝛉(E) ← 𝛉(0) + 𝛾( 𝛉(0) − 𝛉(𝑚) ), where 𝛾 > 1 is the expansion coefficient
      if 𝐹(𝛉(E)) < 𝐹(𝛉(R)) then 𝛉(𝑚) ← 𝛉(E) else 𝛉(𝑚) ← 𝛉(R) end if
    else ( at this point it is certain that 𝐹(𝛉(R)) ≥ 𝐹(𝛉(𝑚−1)) )
      Compute contraction step 𝛉(C) ← 𝛉(0) + 𝜌( 𝛉(0) − 𝛉(𝑚) ), where 0 < 𝜌 ≤ 0.5 is the contraction coefficient
      if 𝐹(𝛉(C)) < 𝐹(𝛉(𝑚)) then
        𝛉(𝑚) ← 𝛉(C)
      else
        Compute reduction step 𝛉(𝑖) ← 𝛉(1) + 𝜎( 𝛉(𝑖) − 𝛉(1) ), ∀𝑖 ∈ [2, 𝑚], where 0 < 𝜎 < 1 is the reduction coefficient
      end if
    end if
    𝑖 ← 𝑖 + 1
  end while
  return 𝛉(1)
end

NONLINEAR REGRESSION: ROSENBROCK CASE STUDY
ROSENBROCK FUNCTION: SIMPLEX METHOD
[ figure: Nelder-Mead simplex iterates on the Rosenbrock response surface ]

GAUSS-NEWTON: BIOLOGICAL CASE STUDY
CASE STUDY: NONLINEAR REGRESSION
LET'S USE GAUSS-NEWTON!
THE MODEL IS NOT LINEAR IN ITS PARAMETERS ( NO VALID BASIS FUNCTIONS ), THUS WE HAVE TO USE AN ITERATIVE SOLVER

IN A BIOLOGY EXPERIMENT THE RELATION WAS MEASURED BETWEEN SUBSTRATE CONCENTRATION, S, AND REACTION RATE, R ( ENZYME-MEDIATED REACTION )

EXP  1      2      3      4      5      6      7
S    0.038  0.194  0.425  0.626  1.253  2.500  3.740
R    0.050  0.127  0.094  0.212  0.273  0.267  0.331

MODEL FORMULATION: R(S, a, b) = a S / ( b + S )

NOW LET'S PLOT THE DATA
[Figure: measured reaction rate R versus substrate concentration S]

WE NEED THE JACOBIAN OF THE RESIDUALS. WITH θ = (θ1, θ2) = (a, b):

ε(θ) = d − f(θ) = d − θ1 S / ( θ2 + S )

JACOBIAN OF RESIDUALS, DERIVED ANALYTICALLY ( ROW i, FOR i = 1, …, n ):
∂εi(θ)/∂θ1 = − Si / ( θ2 + Si )
∂εi(θ)/∂θ2 = θ1 Si / ( θ2 + Si )²

GAUSS-NEWTON METHOD:
Δθ(i) = − ( Jε(θ(i))ᵀ Jε(θ(i)) )⁻¹ Jε(θ(i))ᵀ ε(θ(i))

STARTING VALUES OF THE PARAMETERS: θ(0) = ( 0.9, 0.2 )

JACOBIAN MATRIX:
Jε(θ(0)) = [ −0.0405 0.6038 ; −0.1773 1.1247 ; −0.3208 0.9792 ; −0.4102 0.8258 ; −0.5820 0.5341 ; −0.7353 0.3086 ; −0.8060 0.2168 ]

FUNCTION VALUES:
f(θ(0)) = ( 0.1437, 0.4431, 0.6120, 0.6821, 0.7761, 0.8333, 0.8543 )

GAUSS-NEWTON UPDATE:
Δθ(0) = ( −0.6506, 0.2246 ),  θ(1) = ( 0.2494, 0.4246 )

Jε(θ(1)) = [ −0.1322 0.0443 ; −0.4375 0.1264 ; −0.6302 0.1469 ; −0.7151 0.1415 ; −0.8340 0.1110 ; −0.9093 0.0729 ; −0.9375 0.0538 ]
f(θ(1)) = ( 0.0205, 0.0782, 0.1248, 0.1486, 0.1863, 0.2132, 0.2240 )
Δθ(1) = ( 0.1110, 0.2533 ),  θ(2) = ( 0.3604, 0.6779 )

Jε(θ(2)) = [ −0.0954 0.0267 ; −0.3499 0.0920 ; −0.5411 0.1259 ; −0.6346 0.1327 ; −0.7766 0.1211 ; −0.8740 0.0892 ; −0.9121 0.0691 ]
f(θ(2)) = ( 0.0191, 0.0802, 0.1389,
0.1730, 0.2339, 0.2835, 0.3051 )
Δθ(2) = ( 0.0021, −0.1263 ),  θ(3) = ( 0.3625, 0.5515 )

Jε(θ(3)) = [ −0.0949 0.0396 ; −0.3486 0.1265 ; −0.5397 0.1615 ; −0.6333 0.1636 ; −0.7756 0.1395 ; −0.8734 0.0973 ; −0.9116 0.0736 ]
f(θ(3)) = ( 0.0234, 0.0943, 0.1578, 0.1927, 0.2517, 0.2970, 0.3159 )
Δθ(3) = ( −0.0004, 0.0053 ),  θ(4) = ( 0.3621, 0.5569 )

Jε(θ(4)) = [ −0.0950 0.0389 ; −0.3489 0.1246 ; −0.5400 0.1596 ; −0.6336 0.1620 ; −0.7758 0.1385 ; −0.8735 0.0969 ; −0.9117 0.0733 ]
f(θ(4)) = ( 0.0231, 0.0935, 0.1567, 0.1916, 0.2507, 0.2961, 0.3151 )
Δθ(4) = ( 0.0001, 0.0008 ),  θ(5) = ( 0.3622, 0.5577 )

AFTER ABOUT FIVE ITERATIONS THE MINIMUM SSR IS FOUND:
f(θ(5)) = ( 0.0231, 0.0935, 0.1566, 0.1915, 0.2506, 0.2961, 0.3152 )
ε(θ(5)) = ( −0.0269, −0.0335, 0.0626, −0.0207, 0.0223, 0.0296, −0.0165 )
F(θ(5)) = Σi εi(θ(5))² = 0.0078  ( SUM OVER i = 1, …, n )

GAUSS-NEWTON: STEPS FROM RANDOM STARTING POINT
[Figure: Gauss-Newton iterates plotted on contours of the SSR surface]

NOW LET'S PLOT THE MODEL:
R(S) = 0.3622 S / ( 0.5577 + S )
[Figure: fitted curve through the (S, R) data]

GAUSS-NEWTON: PARAMETER UNCERTAINTY

NONLINEAR REGRESSION:
C(θ) = SSR / ( n − p ) × ( Jf(θ)ᵀ Jf(θ) )⁻¹

Jf(θ(5)) = [ 0.0950 −0.0388 ; 0.3488 −0.1243 ; 0.5399 −0.1594 ; 0.6335 −0.1618 ; 0.7758 −0.1384 ; 0.8735 −0.0968 ; 0.9117 −0.0733 ]

SSR = 0.0078 ;  n = 7 ;  p = 2

C(θ(5)) = [ 0.0025 0.0119 ; 0.0119 0.0726 ]

CL95% = θ ± t(n−p, 0.975) × √diag( C(θ) ),  WITH t(5, 0.975) = 2.57

R(S) = ( 0.3622 ± 0.1276 ) S / ( ( 0.5577 ±
0.6924 ) + S )

FOR COMPARISON, LINEAR REGRESSION: C(θ) = SSR / ( n − p ) × ( Xᵀ X )⁻¹

[Figure: fitted curve with 95% confidence bounds on the (S, R) data]

GAUSS-NEWTON: PARAMETER CORRELATION

C(θ(5)) = [ 0.0025 0.0119 ; 0.0119 0.0726 ]

ρ(X, Y) = cov(X, Y) / ( √var(X) × √var(Y) )

CORRELATION VALUES:
ρ(θ1, θ2) = C(θ1, θ2) / ( √C(θ1, θ1) × √C(θ2, θ2) ) = 0.0119 / ( √0.0025 × √0.0726 ) ≈ 0.89

THE PARAMETERS ARE HIGHLY CORRELATED

R(S) = ( 0.3622 ± 0.1276 ) S / ( ( 0.5577 ± 0.6924 ) + S )

FIRST-ORDER PARAMETER UNCERTAINTY

NONLINEAR REGRESSION: PARAMETER UNCERTAINTY
[Figure: contours of the SSR surface around the optimum ( × )]
IF THE PARAMETERS ARE NONLINEARLY CORRELATED, A FIRST-ORDER APPROXIMATION AROUND THE OPTIMUM CAN BE UNRELIABLE

REGULARIZATION ( PRIOR INFORMATION )

DEALING WITH MANY PARAMETERS
WHEN THE NUMBER OF MODEL PARAMETERS INCREASES, THERE MIGHT NOT BE ENOUGH INFORMATION IN THE DATA TO CONSTRAIN ALL THE PARAMETERS. WE CAN STABILIZE/CONSTRAIN THE PARAMETERS BY USING A PENALTY ( = REGULARIZATION ) TERM:

F(θ) = ( d − Xθ )ᵀ( d − Xθ ) + w ( θ − θ̄ )ᵀ( θ − θ̄ )

WHERE THE FIRST TERM IS THE SUM OF SQUARED RESIDUALS, THE SECOND TERM IS THE REGULARIZATION TERM WITH WEIGHT w, AND θ̄ IS AN A PRIORI PARAMETER VECTOR. THE REGULARIZATION TERM CAN MEASURE, FOR EXAMPLE:
– DEVIATION FROM AN A PRIORI MODEL ( AS IN THE EQUATION ABOVE )
– VARIATION OF THE MODEL
– SMOOTHNESS OF THE MODEL ( FIRST DERIVATIVE OF θ )
– ROUGHNESS OF THE MODEL ( SECOND DERIVATIVE OF θ )

STABILIZE THE SOLUTION USING REGULARIZATION
[Figure: trade-off curve of the SSR term versus the regularization term]
QUESTIONS:
1. HOW TO PICK THE BEST SOLUTION?
2. HOW DO WE CREATE THIS TRADE-OFF SOLUTION SET?

PROBLEMS WITH LOCAL SEARCH METHODS

GRADIENT METHODS: FAST CONVERGENCE, BUT …
[Figure: |MODEL - DATA| surface with a global and a local minimum and two starting points θ(0)]
GRADIENT METHODS ARE PRONE TO PREMATURE CONVERGENCE IN LOCAL MINIMA. ONE CAN PICK THE BEST SOLUTION FROM MULTIPLE TRIALS, BUT THERE IS NO GUARANTEE THAT THE OPTIMUM HAS BEEN FOUND ( AND THIS IS RATHER INEFFICIENT; MORE LATER )
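To close the loop on the enzyme case study above, here is a compact Python sketch of the whole pipeline: residuals, analytical Jacobian, the Gauss-Newton update Δθ = −(JᵀJ)⁻¹Jᵀε, and the first-order covariance, confidence-interval, and correlation computations. The function and variable names are my own, the critical t-value is hard-coded from a t-table, and the computed numbers can differ in the last decimals from the slides depending on the Jacobian and converged values used:

```python
import numpy as np

# Data from the enzyme case study: substrate concentration S, measured rate d
S = np.array([0.038, 0.194, 0.425, 0.626, 1.253, 2.500, 3.740])
d = np.array([0.050, 0.127, 0.094, 0.212, 0.273, 0.267, 0.331])

def residuals(theta):
    """eps(theta) = d - theta1*S/(theta2 + S)."""
    t1, t2 = theta
    return d - t1 * S / (t2 + S)

def jacobian_residuals(theta):
    """Analytical n x 2 Jacobian of the residuals:
    d eps/d theta1 = -S/(theta2+S), d eps/d theta2 = theta1*S/(theta2+S)**2."""
    t1, t2 = theta
    return np.column_stack([-S / (t2 + S), t1 * S / (t2 + S)**2])

def gauss_newton(theta, n_iter=10):
    """Iterate Delta = -(J'J)^-1 J' eps from a starting guess."""
    theta = np.asarray(theta, dtype=float)
    for _ in range(n_iter):
        J, eps = jacobian_residuals(theta), residuals(theta)
        theta = theta - np.linalg.solve(J.T @ J, J.T @ eps)
    return theta

theta_hat = gauss_newton([0.9, 0.2])         # same starting values as the slides
eps = residuals(theta_hat)
n, p = len(S), 2
ssr = float(eps @ eps)                       # minimum sum of squared residuals

# First-order parameter uncertainty: C = SSR/(n-p) * (Jf' Jf)^-1,
# with Jf the Jacobian of the model, i.e. Jf = -J_eps
Jf = -jacobian_residuals(theta_hat)
C = ssr / (n - p) * np.linalg.inv(Jf.T @ Jf)
se = np.sqrt(np.diag(C))                     # standard errors
ci95 = 2.57 * se                             # t_{5,0.975} = 2.57 from a t-table
rho = C[0, 1] / (se[0] * se[1])              # parameter correlation
```

Starting from θ(0) = (0.9, 0.2), the loop converges in a handful of iterations to roughly (0.362, 0.557) with SSR ≈ 0.0078 and a strong positive correlation between the two parameters, in line with the walkthrough above.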
DERIVATIVE-FREE METHODS (SIMPLEX): SAME ISSUE
[Figure: four simplex trials; trials #1 and #2 reach the global minimum, trials #3 and #4 a local minimum]
GRADIENT-FREE METHODS ( SIMPLEX ) ARE ALSO SENSITIVE TO PREMATURE CONVERGENCE IN LOCAL MINIMA. THUS LOCAL OPTIMIZATION METHODS MIGHT NOT GIVE US RELIABLE SOLUTIONS
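The multiple-trials strategy, and its lack of a guarantee, is easy to demonstrate. Below is a Python sketch on a made-up one-dimensional cost with one global and one local minimum; the cost function, step size, and starting points are all illustrative choices of mine:

```python
# Illustrative 1-D cost with a global minimum near x = -1.04 and a
# local minimum near x = +0.96 (invented for demonstration)
F = lambda x: (x**2 - 1.0)**2 + 0.3 * x
dF = lambda x: 4.0 * x * (x**2 - 1.0) + 0.3    # analytical gradient

def gradient_descent(x, lr=0.05, n_iter=300):
    """Plain gradient descent: fast, but a strictly local search."""
    for _ in range(n_iter):
        x = x - lr * dF(x)
    return x

# Multiple trials from different starting points theta(0); keep the best.
# Trials started on the right slope end in the local minimum, and nothing
# guarantees that the global minimum is among the solutions found.
starts = [-2.0, -0.5, 0.5, 2.0]
minima = [gradient_descent(x0) for x0 in starts]
best = min(minima, key=F)
```

Half of these trials get stuck near x ≈ +0.96; only the multi-start bookkeeping recovers the global minimum near x ≈ −1.04, which motivates the global search strategies discussed later in the course.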