SGeMS User’s Guide
Nicolas Remy, Alexandre Boucher & Jianbing Wu
April 13, 2006

Chapter 1: General Overview

Section ?? indicated how GsTL can be integrated into already existing software. SGeMS, the Geostatistical Earth Modeling Software, is an example of software built from scratch using GsTL. The source code of SGeMS serves as an example of how to use GsTL facilities.

SGeMS was designed with two aims in mind. The first, geared toward the end user, is to provide user-friendly software offering a large range of geostatistics tools: the most common geostatistics algorithms are implemented, in addition to more recent developments such as multiple-point statistics simulation. The user-friendliness of SGeMS comes mainly from its non-obtrusive graphical user interface and the possibility to directly visualize data sets and results in a full 3-D interactive environment. The second objective was to design software whose functionalities could conveniently be augmented. New features can be added to SGeMS through a system of plug-ins, i.e. pieces of software which cannot be run by themselves but complement a main software. In SGeMS, plug-ins can be used to add new (geostatistics) tools, add new grid data structures (faulted stratigraphic grids for example) or define new import/export filters.

1.1 First Steps with SGeMS

1.1.1 A quick tour of the graphical user interface

The graphical user interface (GUI) of SGeMS is divided into 3 main parts, see Fig. 1.1:

Figure 1.1: SGeMS’s graphical interface. The three main panels are highlighted in red. The top-right panel is the Algorithm Panel, the top-left is the Visualization Panel and the bottom is the Command Panel.

The Algorithm Panel. The user selects in this panel which geostatistics tool to use and inputs the required parameters (see Fig. 1.2). The top part of that panel shows a list of available algorithms, e.g. kriging, sequential Gaussian simulation.
When an algorithm from that list is selected, a form containing the corresponding input parameters appears below the tools list.

Figure 1.2: The 3 parts of the Algorithm Panel highlighted in red. The top part displays the list of available tools. The middle part is where the input parameters for the selected tool are entered.

The Visualization Panel. One or multiple objects can be displayed in this panel, e.g. a Cartesian grid and a set of points, in an interactive 3-D environment. Visualization options such as color-maps are also set in the Visualization Panel. The Visualization Panel is shown in more detail in Fig. 1.3.

The Command Panel. This panel is not shown by default when SGeMS is started. It gives the possibility to control the software from a command line rather than from the GUI. It displays a history of all commands executed so far and provides an input field where new commands can be typed (see Fig. 1.4). See tutorial 1.1.2 for more details about the Command Panel.

1.1.2 A simple tutorial

This short tutorial describes a SGeMS session in which ordinary kriging is performed on a 100 × 130 × 30 Cartesian grid. The data consist of a set of 400 points in 3-D space, with a rock porosity value associated with each data point. This point-set object is called porosity data. The steps involved are the following:

1. Load the data set
2. Create a Cartesian grid
3. Select the kriging tool and enter the necessary parameters
4. Display and save the result

Figure 1.3: The Visualization Panel. The left-hand side (highlighted in red) controls which objects (e.g. grids) are visible. It is also used to set display options, such as which color-map to use.

Figure 1.4: The Command Panel

Loading the data set

The first step is to load the data set into the objects database of SGeMS. Click Objects | Load Object and select the file containing the data. Refer to section 1.2.1 for a description of the available data file formats.
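Section 1.2.1 describes the available file formats in detail. As an illustration only, the following Python sketch writes a small point-set file in the GSLIB ASCII layout described there (title line, property count, one property name per line, then one row of values per point); the coordinates and porosity values are made up:

```python
# Write a minimal GSLIB-style ASCII file (see section 1.2.1):
# title, number of properties, property names, then data rows.
# The points below are made-up example values, not the tutorial data.
points = [
    (10.5, 22.0, 3.0, 0.21),
    (48.0, 81.5, 7.0, 0.16),
    (75.2, 14.3, 12.0, 0.24),
]

with open("porosity_data.dat", "w") as f:
    f.write("porosity data\n")          # title line
    f.write("4\n")                      # number of properties
    for name in ("X", "Y", "Z", "porosity"):
        f.write(name + "\n")            # one property name per line
    for row in points:
        f.write(" ".join(str(v) for v in row) + "\n")
```

A file written this way can then be loaded through Objects | Load Object as described above.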
When the object is loaded a new entry called porosity data appears in the Objects section of the Visualization Panel, as shown in Fig. 1.5.

Figure 1.5: Object list after the data set is loaded

Click in the square before the point-set name to display it. Displayed objects have a little eye painted inside the rectangle before their name. The plus sign before the square indicates that the object contains properties. Click on the plus sign to display the list of properties. Click in the square before the property name to paint the object with the corresponding property (see Fig. 1.6).

Figure 1.6: Showing/hiding an object or a property

Creating a grid

The next step is to create the grid on which kriging will be performed. The grid we will create is a 3-D Cartesian grid with 100 × 130 × 30 cells. Click Objects | New Cartesian Grid to open the grid creation dialog. Enter the dimensions of the grid, the coordinates of the origin of the grid, i.e. the lower left corner (the grid node with smallest x, y, z coordinates), and the dimensions of each grid cell. Provide a name for the new grid, working_grid for example. Click Create Grid to create the grid. A new entry called working_grid appears in the Objects panel of the Visualization Panel, as shown in Fig. 1.7.

Figure 1.7: Object list after the Cartesian grid is created

The object database now contains two objects: a point-set with the rock porosity property, and a Cartesian grid with no property attached to it yet. We can proceed to the kriging run.

Running the kriging algorithm

Select the kriging tool from the list in the Algorithm Panel. A form prompting for the kriging parameters appears below the algorithms list. You can either type in the parameters or load them from a file. Fig. 1.8 shows an example of a parameter file for kriging (refer to section 1.2.2 for a description of the parameter file format and to section 5.0.2 for details on kriging parameters). Using the parameters of Fig.
1.8, ordinary kriging is performed with an isotropic search ellipsoid of radius 50 (section ?? describes how to specify a 3-D ellipsoid in SGeMS) and an isotropic spherical variogram of range 30, sill 1, and a nugget effect of 0.1 (section ?? explains how to specify a variogram). Once all parameters have been entered, click the Run Algorithm button. If some parameters are not correctly set, they are highlighted in red and a description of the error appears if the mouse is left a few seconds on the offending parameter.

<parameters>
  <algorithm name="kriging" />
  <Grid_Name value="working_grid" />
  <Property_Name value="krig_porosity" />
  <Hard_Data grid="porosity data" property="porosity" />
  <Kriging_Type type="Ordinary Kriging (OK)" >
    <parameters />
  </Kriging_Type>
  <Max_Conditioning_Data value="12" />
  <Search_Ellipsoid value="50 50 50 0 0 0" />
  <Variogram nugget="0.1" structures_count="1" >
    <structure_1 contribution="0.9" type="Spherical" >
      <ranges max="30" medium="30" min="30" />
      <angles x="0" y="0" z="0" />
    </structure_1>
  </Variogram>
</parameters>

Figure 1.8: Kriging parameter file

If kriging was run with the parameters shown in Fig. 1.8, the grid named working_grid now contains a new property called krig_porosity.

Displaying and saving the result

The algorithm Kriging created a new property krig_porosity in the grid working_grid. Click on the plus sign before the working_grid entry in the objects list to show the list of properties attached to the grid, and click in the square before the newly created property to display it. To save the results, click Object | Save Object to open the Save Object dialog. Provide a file name, the name of the object to save, e.g. working_grid, and the file format to use (see section 1.2.1). It is also possible to save all the objects at once by saving the project (File | Save Project).

1.1.3 Automating tasks in SGeMS

Tutorial 1.1.2 showed how to perform a single run of the kriging algorithm.
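Since the parameter file of Fig. 1.8 is plain XML, it can also be generated programmatically instead of typed by hand. A minimal Python sketch using the standard xml.etree module (element and attribute names are copied from Fig. 1.8; the output file name is arbitrary):

```python
import xml.etree.ElementTree as ET

# Rebuild the kriging parameter file of Fig. 1.8 programmatically.
root = ET.Element("parameters")
ET.SubElement(root, "algorithm", name="kriging")
ET.SubElement(root, "Grid_Name", value="working_grid")
ET.SubElement(root, "Property_Name", value="krig_porosity")
ET.SubElement(root, "Hard_Data", grid="porosity data", property="porosity")
kt = ET.SubElement(root, "Kriging_Type", type="Ordinary Kriging (OK)")
ET.SubElement(kt, "parameters")
ET.SubElement(root, "Max_Conditioning_Data", value="12")
ET.SubElement(root, "Search_Ellipsoid", value="50 50 50 0 0 0")
vg = ET.SubElement(root, "Variogram", nugget="0.1", structures_count="1")
s1 = ET.SubElement(vg, "structure_1", contribution="0.9", type="Spherical")
ET.SubElement(s1, "ranges", max="30", medium="30", min="30")
ET.SubElement(s1, "angles", x="0", y="0", z="0")
ET.ElementTree(root).write("kriging.par")
```

The resulting file can be loaded from the kriging parameter form like any hand-written parameter file.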
Next, one would like to study the sensitivity of the algorithm to the parameter Max Conditioning Data, the maximum number of conditioning data retained for each kriging. The user would like to vary that number from 1 to 50 in increments of 1. It would be very impractical to perform such a study in successive sequences as explained in Tutorial 1.1.2. SGeMS provides a solution to this problem through its command line interface. Many actions in SGeMS can either be performed with mouse clicks or by typing a command in the Command Panel. For example, loading the data set in step 1 of Tutorial 1.1.2 could have been achieved by typing the following command:

LoadObjectFromFile /home/user/data_file.dat:All

Each command has the following format:

• the name of the command, e.g. LoadObjectFromFile
• a list of parameters, separated by a colon “:”. In the previous example two parameters were supplied: the name of the file to load, and the file format All (meaning that every available file format should be considered).

Every command performed in SGeMS, either typed or resulting from mouse clicks, is recorded to both the “Commands History” section of the Command Panel and to a file called gstlappli history.log. Hence if one does not know a command name, one can use the GUI to perform the corresponding action and check the command name and syntax in the command history. It is possible to combine several commands into a single file and have SGeMS execute them all sequentially. For the sensitivity study example, one would have to write a “script” file containing 50 kriging commands, each time changing the Max Conditioning Data parameter. Note that there are no control structures, e.g. “for” loops, in SGeMS script files. However there are many programming tools available (Perl, Awk, Shell, . . . ) that can be used to generate such a script file.
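For instance, the 50 runs of the sensitivity study can be prepared by templating the parameter file of Fig. 1.8. The Python sketch below writes 50 parameter files, varying Max_Conditioning_Data from 1 to 50; the file names are hypothetical, and the command needed to execute each run should be looked up in the Commands History as explained above:

```python
# Generate 50 kriging parameter files for the sensitivity study,
# varying Max_Conditioning_Data from 1 to 50. The XML template is
# the parameter file of Fig. 1.8 with two {n} placeholders.
TEMPLATE = """<parameters>
  <algorithm name="kriging" />
  <Grid_Name value="working_grid" />
  <Property_Name value="krig_porosity_{n}" />
  <Hard_Data grid="porosity data" property="porosity" />
  <Kriging_Type type="Ordinary Kriging (OK)"> <parameters /> </Kriging_Type>
  <Max_Conditioning_Data value="{n}" />
  <Search_Ellipsoid value="50 50 50 0 0 0" />
  <Variogram nugget="0.1" structures_count="1">
    <structure_1 contribution="0.9" type="Spherical">
      <ranges max="30" medium="30" min="30" />
      <angles x="0" y="0" z="0" />
    </structure_1>
  </Variogram>
</parameters>
"""

for n in range(1, 51):
    with open(f"kriging_{n:02d}.par", "w") as f:
        f.write(TEMPLATE.format(n=n))
```

Each run writes its result to a distinct property (krig_porosity_1, ..., krig_porosity_50), so the 50 outputs can be compared side by side in the same grid.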
1.2 File Formats

1.2.1 Objects Files

SGeMS supports two file formats by default to describe grids and sets of points: the GSLIB format and the SGeMS binary format.

The GSLIB file format

It is a simple ASCII format used by the GSLIB software (ref ??). It is organized by lines:

• the first line gives the title.
• the second line is a single number n indicating the number of properties in the object.
• the n following lines contain the names of each property (one property name per line).
• each remaining line contains the values of each property (n values per line) separated by spaces or tabulations. The order of the property values along each line is given by the order in which the property names were entered.

Note that the same format is used for describing both point sets and Cartesian grids. When a GSLIB file is loaded into SGeMS the user has to supply all the information that is not provided in the file itself, e.g. the name of the object, or the number of cells in each direction if it is a Cartesian grid.

The SGeMS binary file format

SGeMS can use an uncompressed binary file format to store objects. Binary formats have two main advantages over ASCII files: they occupy less disk space and they can be loaded and saved faster. The drawback is a lack of portability between platforms. The SGeMS binary format is self-contained, hence the user need not provide any additional information when loading such a file.

1.2.2 Parameter Files

When an algorithm is selected from the Algorithm Panel (see step 3 of Tutorial 1.1.2), several parameters are called for. It is possible to save those parameters to a file and later retrieve them. The format of a parameter file in SGeMS is based on the Extensible Markup Language (XML), a standard formatting language of the World Wide Web Consortium (www.w3.org). Fig. 1.8 shows an example of such a parameter file. In a parameter file, each parameter is represented by an XML element. An element consists of an opening and a closing tag, e.g.
<tag> and </tag>, and one or several attributes. Following is an example of an element called algorithm which contains a single attribute “name”:

<algorithm name="kriging"> </algorithm>

Elements can themselves contain other elements:

<Variogram nugget="0.1" structures_count="1" >
  <structure_1 contribution="0.9" type="Spherical" >
    <ranges max="30" medium="30" min="30"> </ranges>
    <angles x="0" y="0" z="0"> </angles>
  </structure_1>
</Variogram>

Here the element Variogram contains an element structure_1, which itself contains two elements: ranges and angles. Each of these elements has attributes. Note that if an element only contains attributes the closing tag can be abbreviated: in the previous example, element ranges only contains attributes and could have been written:

<ranges max="30" medium="30" min="30" />

The /> sequence indicates the end of the element.

A SGeMS parameter file always has the two following elements:

• element parameters. It is the root element: it contains all other elements.
• element algorithm. It has a single attribute name which gives the name of the algorithm for which parameters are specified.

All other elements are algorithm-dependent and are described in sections 5 and ??. Such an XML-formatted parameter file has several advantages:

• Elements can be entered in any order.
• Comments can be inserted anywhere in the file. A comment block starts with <!-- and ends with -->. Comments can span multiple lines. For example:

<!-- An example of a comment block
     spanning multiple lines -->
<parameters>
  <algorithm name="kriging" />
  <!-- the name of the working grid -->
  <Grid_Name value="working_grid" />
</parameters>

Chapter 2: Data Sets & SGeMS EDA Tools

This chapter presents the data sets which will be used to demonstrate the geostatistics algorithms in the following chapters. It also provides an introduction to the exploratory data analysis (EDA) tools of the SGeMS software. Section 2.1 presents the two data sets: one in 2D and one in 3D.
The smaller 2D data set will be used for most of the examples of running geostatistics algorithms. The 3D data set, which mimics a large deltaic channel reservoir, will be used to demonstrate the practice of these algorithms on real and large 3D applications. Section 2.2 introduces some basic EDA tools, such as the histogram, the Q-Q (quantile to quantile) plot, the P-P (probability to probability) plot and the scatter plot. These EDA tools are very useful to check and compare, both visually and statistically, any given data sets.

2.1 The Data Sets

The application example given for each algorithm should be short yet informative, allowing the user to check his understanding of the algorithm and its input-output parameters. The results of any application based on a specific data set should NOT be extended into a general conclusion. Although the SGeMS software was initially designed for reservoir modeling, the application examples should stress the generality of the algorithms and their software implementation.

2.1.1 The 2D data

To be done ...

2.1.2 The 3D data

The 3D data set retained in this book is extracted from a layer of Stanford VI, a synthetic fluvial channel reservoir (Castro et al., 2005). The corresponding SGeMS project is located at ‘data set/stanford6.prj’. This project contains three SGeMS objects: well, grid and container.

• The well object contains the well data set. There are a total of 26 wells (21 vertical wells, 4 deviated wells and 1 horizontal well). The properties associated with the wells are density, binary facies (sand channel or mud floodplain), P-wave impedance, P-wave velocity, permeability and porosity. These data can be used as hard or soft conditioning data for the geostatistics algorithms. Fig. 2.1 shows the well locations and the porosity distribution along the wells.

• The grid object is a Cartesian reservoir grid (see its rectangular boundary on Fig.
2.1), with:

- grid size: 150 × 200 × 80,
- origin point at (0, 0, 0),
- unit cell size in each x/y/z direction.

This reservoir grid holds the petrophysical properties; currently it contains:

1. Seismic data. The original Stanford VI contains a cube of P-wave seismic impedance, which is the product of density and P-wave velocity. However, for most geostatistics algorithms such seismic data cannot be used directly as is; it must be calibrated into, e.g., facies probability distributions. Here, two sand probability cubes (properties P(sand|seis) and P(sand|seis) 2) are provided: one displaying sharp channel boundaries (best quality data, see Fig. 2.2a); and a second displaying more fuzzy channel boundaries with noise (poor quality data, see Fig. 2.2b). These probability data will be used as soft data to constrain the facies modeling.

2. Region code. Typically a large reservoir would be divided into different regions, with each individual region having its own characteristics, for instance different channel orientations or channel thickness. The regions associated with the Stanford VI reservoir are rotation regions (property angle) corresponding to different channel orientations (Fig. 2.3), and affinity regions (property affinity) corresponding to different channel thickness (Fig. 2.4). Each rotation region is labeled with an indicator number, and assigned an angle value, see Table 2.1. The affinity indicators and the attached affinity values are given in Table 2.2. An affinity value must be assigned for each x/y/z direction; the smaller the affinity value, the thicker the channel in that direction.

Angle category:         0    1    2    3    4   5    6    7    8    9
Angle value (degree):  -63  -49  -35  -21  -7   7   21   35   49   63

Table 2.1: Rotation region indicators

Affinity category:          0                1            2
Affinity value ([x,y,z]):  [0.5, 0.5, 0.5]  [1, 1, 1]    [2, 2, 2]

Table 2.2: Affinity region indicators

This grid object is used to demonstrate the application examples.
• The container object is composed of all the reservoir nodes which are located inside the channels, hence it is a point set with (x, y, z) coordinates. The user can perform geostatistics on this channel container, for example to obtain the within-channel petrophysical properties. In Fig. 2.5 the channel container is represented by all nodes with value 1.

Although this 3D data set is taken from a reservoir model, it could represent any 3D spatially-distributed attribute and applications other than reservoir modeling. For example, one can interpret each layer of the seismic data cube as satellite measurements defined over the same area but recorded at different times. The application could then be modeling landscape change in both space and time.

2.2 The SGeMS EDA Tools

The SGeMS software provides some useful exploratory data analysis (EDA) tools, such as histogram, quantile (Q-Q) plot, probability (P-P) plot, scatter plot, and variogram and cross-variogram calculation and fitting. In this chapter, the first four elementary tools are presented; the two latter tools are presented in the next chapter.

Figure 2.1: Well locations and the porosity distribution along the wells.
Figure 2.2: Sand probability cube calibrated from seismic impedance. (a) Good quality probability. (b) Poor quality probability.
Figure 2.3: Angle indicator cube.
Figure 2.4: Affinity indicator cube.
Figure 2.5: Channel container (nodes with value 1).

All the EDA tools can be invoked through the Data Analysis menu from the main SGeMS graphical interface. Once a specific tool is selected, the corresponding SGeMS window pops up. The EDA tool window is independent of the main SGeMS interface, and the user can have multiple windows for each EDA tool.

2.2.1 Common Parameters

The main interface for any of the EDA tools presented in this chapter has 3 main panels (see Fig. 2.6, Fig. 2.7 and Fig. 2.8):

A.
Parameter Panel. The user selects in this panel the properties to be analyzed and the display options. This panel has two pages: ‘Data’ and ‘Display Options’, the latter being common to all EDA tools;

B. Visualization Panel. This panel shows the graphic result of the selected statistics;

C. Statistics Panel. This panel displays some relevant statistical summaries.

In the lower part of the main interface, there are two buttons: Save as Image and Close. The Save as Image button is used to save a graphical visualization (for example a histogram) into an image file in either ‘png’ or ‘bmp’ format. The user can also write the statistical summaries and/or paint the grid into the data file by selecting the corresponding options in the figure saving dialog box. The Close button is used to close the current EDA tool interface.

Parameters Description

The parameters of the ‘Display Options’ page are listed as follows:

• X Axis: The X axis for variable 1. Only the property values between ‘Min’ and ‘Max’ are displayed in the plot; values less than ‘Min’ or greater than ‘Max’ still contribute to the statistical summaries. The default values of ‘Min’ and ‘Max’ are the minimum and maximum of the selected Property. The X Axis can be set to a logarithmic scale by marking the corresponding check box.

• Y Axis: The Y axis for variable 2. The previous remarks made for the X Axis apply.

The user can modify the parameters through either the keyboard or the mouse. Any modification through the mouse is instantly reflected in the visualization or the statistical summaries, while a change through the keyboard must be confirmed by pressing the ‘Enter’ key.

2.2.2 Histogram

The histogram tool creates a visual output of the frequency distribution, and displays some statistical summaries, such as the mean and variance of the selected variable. The histogram tool is activated by clicking Data Analysis | Histogram.
Although the program will automatically scale the histogram, the user can set the histogram limits in the Parameter Panel. The main histogram interface is given in Fig. 2.6, and the parameters of the ‘Data’ page are listed below.

Parameters Description

• Object: A Cartesian grid or a point set containing variables.

• Property: The values or distribution of a certain variable listed in the Object above.

• bins: The number of classes. The user can change this number through the keyboard, or by clicking the scroll bar. Any value change is instantly reflected on the histogram display.

• Clipping Values: Statistical calculation settings. All values less than ‘Min’ and greater than ‘Max’ are ignored, and any change of ‘Min’ and ‘Max’ will affect the statistics calculation. The default values of ‘Min’ and ‘Max’ are the minimum and maximum of the selected Property. After modifying ‘Min’ and/or ‘Max’, the user can go back to the default setting by clicking ‘Reset’.

2.2.3 Q-Q plot and P-P plot

The Q-Q plot compares the equal p-value quantiles of two variables; the P-P plot compares the cumulative probability distributions of two variables. The two variables need not be in the same Object or have the same number of data. The Q-Q plot and P-P plot are combined into one program, which can be invoked from Data Analysis | QQ-plot. This EDA tool generates both a graph in the Visualization Panel and some statistical summaries (mean and variance for each variable) in the Statistics Panel, see Fig. 2.7. The parameters in the ‘Data’ page are listed below.

Parameters Description

• Analysis Type: Algorithm selection. The user can choose either a Q-Q plot or a P-P plot.

• Variable 1: The variable selection for the X axis. The user must first choose an Object, then the Property name.

• Clipping Values for Variable 1: All values strictly less than ‘Min’ and strictly greater than ‘Max’ are ignored; any change of ‘Min’ and ‘Max’ will affect the statistics calculation.
The user can go back to the default setting by clicking ‘Reset’.

• Variable 2: The variable selection for the Y axis. The user must first choose an Object, then the Property name. Note that Variable 2 and Variable 1 might be from different objects.

• Clipping Values for Variable 2: Remarks similar to those for Clipping Values for Variable 1.

2.2.4 Scatter Plot

The scatterplot tool (executed by clicking Data Analysis | Scatter-plot) is used to compare a pair of variables by displaying their bivariate scatter plot and some statistics. All available data pairs are used to compute the summary statistics, such as the correlation coefficient and the mean and variance of each variable (see part [C] in Fig. 2.8). To avoid a crowded figure in the Visualization Panel, only up to 10,000 data pairs are shown in the scatter plot. The parameters in the ‘Data’ page are listed below.

Parameters Description

• Object: A Cartesian grid or a point set containing variables. This Object must contain at least two properties.

• Variable 1: The variable property listed in the Object above. This variable is associated with the X axis.

• Clipping Values for Variable 1: All values strictly less than ‘Min’ and strictly greater than ‘Max’ are ignored, hence any change of ‘Min’ and ‘Max’ will affect the statistics calculation. The user can go back to the default setting by clicking ‘Reset’. If Variable 1 has more than 10,000 data, then the ‘Reset’ button can be used to generate a new scatter plot with a new set of data pairs containing up to 10,000 data.

• Variable 2: The variable property listed in the upper Object. This variable is associated with the Y axis. Variable 2 must have the same number of data as Variable 1.

• Clipping Values for Variable 2: Remarks similar to those for Variable 1.

• Options: The choice of visualizing the least square fit line in the scatter plot. The slope and the intercept are always given below the check box ‘Show Least Square Fit’.

Figure 2.6: Histogram interface. [A]: parameter panel; [B]: visualization panel; [C]: statistics panel.

Figure 2.7: Q-Q plot interface. [A]: parameter panel; [B]: visualization panel; [C]: statistics panel.

Figure 2.8: Scatter plot interface. [A]: parameter panel; [B]: visualization panel; [C]: statistics panel.

Chapter 3: Utilities

3.1 Non-parametric distribution

In SGeMS a non-parametric cumulative distribution function, cdf F(z), is inferred from a set of values z_1 ≤ . . . ≤ z_L which can either be read from a file or from a property. F(z) is built such that F(z_1) = 1/L and F(z_L) = 1 − 1/L. This requires modeling of the tails of the distribution F(z) if the values z_1 and z_L are not the minimum and maximum values. The lower tail extrapolation function provides the shape of the distribution between the minimum z_min and the first value z_1. The options for the lower tail are:

• Z is bounded: F(z_min) = 0. The lower tail of F is then modeled with a power model:

(F(z_1) − F(z)) / F(z_1) = ((z_1 − z) / (z_1 − z_min))^ω, ∀ z ∈ ]z_min, z_1[    (3.1)

The parameter ω controls the decrease of the function, with the constraint ω ≥ 1. The greater ω, the less likely are low values close to z_min. For ω = 1, all values between z_min and z_1 are equally likely.

• Z is not bounded: the lower tail is modeled with an exponential function:

F(z) = F(z_1) exp(−(z − z_1)²), ∀ z < z_1    (3.2)

The options for the upper tail extrapolation function are similar but applied to the interval [z_L, z_max].

• Z is bounded: F(z_max) = 1. The upper tail of F is then modeled with a power model:

(F(z) − F(z_L)) / (1 − F(z_L)) = ((z − z_L) / (z_max − z_L))^ω, ∀ z ∈ ]z_L, z_max[    (3.3)

The parameter ω controls the decrease of the function, with the constraint ω ∈ [0, 1]. The lower the ω value, the less likely are extreme values close to z_max. For ω = 1, all values between z_L and z_max are equally likely.
• Z is not bounded: the upper tail is modeled by a hyperbolic model:

(1 − F(z)) / (1 − F(z_L)) = z_L / z, ∀ z > z_L    (3.4)

All L − 1 intermediary intervals [z_i, z_{i+1}] for i = 1, ..., L − 1 are interpolated linearly, corresponding to a power model with ω = 1.

Note: When the z_min and z_max values are set to z_1 and z_L, the functional shape of the tail extrapolation function becomes irrelevant.

Parameters description

1. Reference Distribution: Read the data either from the file [ref on file] or on the grid [ref on grid].

2. File with reference distribution [filename]: File containing the reference distribution in one column without header. Required if [ref on file] is selected.

3. Property with reference distribution: The grid [grid] and property [property] contain the values for the non-parametric distributions.

4. Lower Tail Extrapolation: Parametrization of the lower tail. The type of extrapolation function is selected with [LTI function]. If the power model is selected, the minimum value [LTI min] and the parameter ω [LTI omega] must be specified. Note that the minimum [LTI min] must be less than or equal to the minimum datum value as entered in the reference distribution, and the power ω [LTI omega] must be greater than or equal to 1. The exponential model does not require any parameter.

5. Upper Tail Extrapolation: Parametrization of the upper tail. The type of extrapolation function is selected with [UTI function]. If the power model is selected, the maximum value [UTI max] and the parameter ω [UTI omega] must be specified. Note that the maximum [UTI max] must be greater than or equal to the maximum datum value as entered in the reference distribution, and the power ω [UTI omega] must be less than or equal to 1. The hyperbolic model only requires the parameter ω [UTI omega]; the upper tail is unbounded.

Histogram Transformation

The algorithm TRANS allows transforming any histogram into any other one.
For example, the Gaussian simulation algorithms (SGSIM and COSGSIM), described in Sections 6.1.1 and 6.1.2, assume Gaussian variables. If the attribute of interest is not Gaussian, it is possible to transform the marginal distribution of that attribute into a Gaussian distribution, then work on the transformed variable. The algorithm TRANS transforms a property following a source distribution into a new variable following a target distribution. The transformation of a variable Z with a source cdf F_Z into a variable Y with target cdf F_Y is written:

Y = F_Y^{-1}(F_Z(Z))    (3.5)

Note: The transformation of a source distribution to a Gaussian distribution does not ensure that Y is multivariate Gaussian, only that its marginal distribution is. One should check that the multivariate (or at least bi-variate) Gaussian hypothesis holds for Y before performing Gaussian simulation. If the hypothesis is not appropriate, other algorithms that do not require Gaussianity, e.g. sequential indicator simulation (SISIM), should be considered.

Histogram transformation with conditioning data

It is possible to apply a weighting factor to control how much specific values are transformed:

y = z − ω(z − F_Y^{-1}(F_Z(z)))

When ω = 0, y = z: there is no transformation. When ω = 1, y = F_Y^{-1}(F_Z(z)), which is the standard rank transform. The weight ω can be set equal to the standardized kriging variance. At data locations that kriging variance is zero, hence there is no transform and the datum value is unchanged. Away from data locations the kriging variance increases, allowing for a larger transform. That option is to be used for slight adjustment of the marginal distribution. When weights are used the transform is not rank preserving anymore and would only approximately match the target distribution.
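As an illustration, the plain and weighted rank transforms can be sketched with empirical cdfs built from samples. This is a minimal numpy version; the function and variable names are mine, not TRANS keywords, and the empirical cdf construction is one simple choice among several:

```python
import numpy as np

def rank_transform(z, source_samples, target_samples, omega=1.0):
    """Weighted rank transform y = z - omega * (z - F_Y^{-1}(F_Z(z))).
    omega = 1 gives the plain rank transform; omega = 0 leaves z unchanged."""
    src = np.sort(np.asarray(source_samples, dtype=float))
    tgt = np.sort(np.asarray(target_samples, dtype=float))
    # Empirical cdf p = F_Z(z): rank of z among the source samples.
    p = np.searchsorted(src, z, side="right") / (len(src) + 1)
    # Quantile F_Y^{-1}(p): linear interpolation through the target samples.
    q = np.interp(p, np.arange(1, len(tgt) + 1) / (len(tgt) + 1), tgt)
    return z - omega * (z - q)
```

With omega set to the standardized kriging variance at each location, data locations (omega = 0) keep their values while locations far from the data are transformed fully, as described above.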
The histogram transformation algorithm is:

Algorithm 3.1 Histogram transformation
1: for each value z_k to be rank transformed do
2:   Get the quantile from the source histogram associated with z_k: p_k = F_Z(z_k)
3:   Get the value y_k from the target distribution associated with p_k: y_k = F_Y^{-1}(p_k)
4:   if weighted transformation then
5:     Apply weighting to the transform.
6:   end if
7: end for

Parameters description

The TRANS algorithm is activated from Utilities | trans in the algorithm panel. The TRANS interface contains 3 pages: Data, Source and Target (see Figure 3.2). The text inside [ ] is the corresponding keyword in the TRANS parameter file.

1. Object Name [grid]: Selection of the grid containing the properties to be transformed.

2. Properties [props]: Properties to be transformed.

3. Suffix for output [out suffix]: The name for each output property consists of the original name plus the suffix entered here.

4. Local Conditioning [is cond]: Enables the use of weights for the histogram transformation.

5. Weight Property [cond prop]: Property with weights for transformation of the histogram conditional to data, see Eq. 3.1. The standardized kriging variance is a good weighting option. Only required if Local Conditioning [is cond] is selected.

6. Control Parameter [weight factor]: Value between 0 and 1 adjusting the weights. Only required if Local Conditioning [is cond] is selected.

7. Source histogram [ref type source]: Defines the type of the source histogram. The histogram may be either non-parametric, Gaussian, LogNormal, or Uniform. Each has its own interface for parameters.

8. Gaussian parameters: The mean is given in Mean [G mean source] and the variance in Variance [G variance source].

9. LogNormal parameters: The mean is given in Mean [LN mean source] and the variance in Variance [LN variance source].

10. Uniform parameters: The minimum value is given in Min [Unif min source] and the maximum in Max [Unif max source].

11.
Non Parametric: The non parametric distribution is entered in [nonParamCdf source], see Section 3.1
12. Target histogram [ref type target]: Defines the type of the target histogram. The histogram may be either non parametric, Gaussian, LogNormal, or Uniform. Each type has its own parameter interface
13. Gaussian parameters: The mean is given in Mean [G mean target] and the variance in Variance [G variance target]
14. LogNormal parameters: The mean is given in Mean [LN mean target] and the variance in Variance [LN variance target]
15. Uniform parameters: The minimum value is given in Min [Unif min target] and the maximum in Max [Unif max target]
16. Non Parametric: The non parametric distribution is entered in [nonParamCdf target], see Section 3.1

Figure 3.1: Widget for non parametric distribution
Figure 3.2: User interface for TRANS. (a) Data tab, (b) Source tab, (c) Target tab

Chapter 4

Variograms and variography

Variogram models are used by many geostatistical algorithms; SGeMS allows four basic models and any positive linear combination of them, that is, a linear model of regionalization. Linear models of coregionalization are also possible, but the software does not check the permissibility of the model; that is the user's responsibility. The four basic models are the nugget effect model, the spherical model, the exponential model and the Gaussian model.

Nugget effect model:

γ(h) = 0 if ||h|| = 0, 1 otherwise    (4.1)

A pure nugget effect model for a variable Z(u) expresses a lack of (linear) dependence between variables Z(u) and Z(u + h).

Spherical model with actual range a:

γ(h) = (3/2)(||h||/a) − (1/2)(||h||/a)^3 if ||h|| ≤ a, 1 otherwise    (4.2)

Exponential model with practical range a:

γ(h) = 1 − exp(−3||h||/a)    (4.3)

Gaussian model with practical range a:

γ(h) = 1 − exp(−3||h||^2/a^2)    (4.4)

Notes: Gaussian models are not permissible for binary indicator variables.
The term variogram is used to designate what is, strictly speaking, a semi-variogram γ(h); depending on the context, it may also refer to a cross semi-variogram. All the above models are permissible in 3-D and have a covariance counterpart:

C(h) = C(0) − γ(h)

In SGeMS a variogram model

γ(h) = c_0 γ^{(0)}(h) + sum_{l=1}^{L} c_l γ^{(l)}(h)

is characterized by the following parameters:

• a nugget effect c_0 γ^{(0)} with nugget constant c_0 ≥ 0
• the number L of nested structures. Each structure γ^{(l)}(h) is then defined by:
  – a variance contribution c_l ≥ 0
  – the type of the variogram: spherical, exponential or Gaussian
  – six parameters: the three practical ranges along the main directions of the anisotropy ellipsoid, and the three corresponding rotation angles defining that ellipsoid. Note that each nested structure can have a different anisotropy.

Example: Consider the variogram model γ(h) = 0.3 γ^{(0)}(h) + 0.4 γ^{(1)}(h) + 0.3 γ^{(2)}(h), with:

• γ^{(0)}(h) a nugget effect with sill 0.3
• γ^{(1)}(h) an anisotropic spherical variogram with major range 40, medium range 20 and minor range 5, and angles α = 45°, β = 0, θ = 0
• γ^{(2)}(h) an isotropic Gaussian variogram of range 200

The corresponding XML parameter file snippet would be:

<[Parameter_name] nugget="0.3" structures_count="2" >
  <structure_1 contribution="0.4" type="Spherical" >
    <ranges max="40" medium="20" min="5" />
    <angles x="45" y="0" z="0" />
  </structure_1>
  <structure_2 contribution="0.3" type="Gaussian" >
    <ranges max="200" medium="200" min="200" />
    <angles x="0" y="0" z="0" />
  </structure_2>
</[Parameter_name]>

where [Parameter_name] is the name of the parameter.

4.1 Modeling variogram in SGeMS

SGeMS offers the computation and modeling of experimental variograms for data both on regular grids and on point sets. The variogram module is accessible from the Data Analysis—Variogram menu. In addition to the variogram, the experimental covariance and correlogram can also be calculated. SGeMS only fits permissible models to variograms.
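The four basic models and their nested combination can be sketched as follows. This is an illustrative Python sketch, not part of SGeMS; it handles the isotropic case only, with anisotropy ellipsoids and rotation angles omitted.

```python
import numpy as np

def spherical(h, a):
    """Spherical model with actual range a (Eq. 4.2)."""
    h = np.minimum(np.abs(h), a)          # gamma = 1 beyond the range
    return 1.5 * h / a - 0.5 * (h / a) ** 3

def exponential(h, a):
    """Exponential model with practical range a (Eq. 4.3)."""
    return 1.0 - np.exp(-3.0 * np.abs(h) / a)

def gaussian(h, a):
    """Gaussian model with practical range a (Eq. 4.4)."""
    return 1.0 - np.exp(-3.0 * h ** 2 / a ** 2)

def nested(h, nugget, structures):
    """Linear model of regionalization: nugget plus nested structures.
    structures: list of (contribution, model_fn, range) tuples."""
    g = np.where(np.abs(h) > 0, nugget, 0.0)  # nugget vanishes at h = 0
    for contribution, model_fn, a in structures:
        g = g + contribution * model_fn(h, a)
    return g
```

For the example model above (nugget 0.3, spherical contribution 0.4 with range 40, Gaussian contribution 0.3 with range 200), `nested` reaches the total sill of 1.0 at large separation distances.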
The variography in SGeMS is done in three steps: 1) choosing the grid and properties, 2) calculating the experimental variograms, and 3) fitting a model. Step 2) differs depending on whether the properties reside on a point set or on a grid. For all steps, the next window is accessed by clicking Next.

The parameters for the variogram computation are entered in the second window. That window takes two forms depending on whether the property resides on a point set, shown in Figure 4.1(b), or on a cartesian grid, Figure 4.1(c).

Selecting properties

The selection of the properties on which to perform the variography is done in the first window, shown in Figure 4.1(a). The experimental variogram can either be loaded from a previous session or computed from scratch. When the computation is required, the user has to specify two properties, termed head and tail. Specifying different properties for head and tail results in computing their cross-variogram. If the experimental variogram is retrieved from a previous session, the next window is the modeling window of Figure 4.1(d); the parametrization window is skipped.

1. Experimental variogram: Either compute the experimental variogram from scratch or retrieve it from a previous modeling session
2. Grid and Properties: Select the properties for the experimental variograms. Specifying different properties for head and tail results in computing a cross-variogram; using the same property for head and tail produces a univariate experimental variogram

Experimental variogram on point set

Calculating experimental variograms on a point set is challenging due to the irregularity of data locations. For a fixed lag and direction, it is unlikely that enough pairs, if any, would be found to calculate the corresponding variogram. Thus, tolerances on the distance and angles are introduced such that enough data pairs can be found on a point set. Parameters 3 to 13, shown in Figure 4.1(b), are required when working on a point set.
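The tolerance-based pairing of point-set data can be sketched as follows. This is an illustrative 2-D sketch with a hypothetical function name, not SGeMS code; it bins pairs by lag distance and direction-angle tolerance only, ignoring dip and bandwidth.

```python
import numpy as np

def pair_lags(coords, n_lags, lag_sep, lag_tol, azimuth_deg, angle_tol_deg):
    """Collect data pairs per lag for one direction on a 2-D point set.
    A pair (i, j) enters lag k when its separation distance is within
    lag_sep*(k+1) +/- lag_tol and the pair vector deviates from the
    azimuth by less than the half-window angle tolerance."""
    coords = np.asarray(coords, dtype=float)
    # azimuth measured clockwise from North: North is +y, East is +x
    direction = np.array([np.sin(np.radians(azimuth_deg)),
                          np.cos(np.radians(azimuth_deg))])
    bins = [[] for _ in range(n_lags)]
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            sep = coords[j] - coords[i]
            dist = np.hypot(*sep)
            if dist == 0.0:
                continue
            # angle between the pair vector and the direction
            # (a pair counts in both orientations, hence the abs)
            cos_a = abs(np.dot(sep, direction)) / dist
            angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
            if angle > angle_tol_deg:
                continue
            k = int(round(dist / lag_sep)) - 1  # nearest lag index
            if 0 <= k < n_lags and abs(dist - lag_sep * (k + 1)) <= lag_tol:
                bins[k].append((i, j))
    return bins
```

Each bin then yields one experimental variogram value, computed from the squared differences of the head and tail values of its pairs.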
Parameters entered in this window can be saved for future use.

3. Number of lags: Number of variogram lags to compute
4. Lag separation: Distance between two lags. The experimental variogram is computed from h = 0 to h = (Number of lags) × (Lag separation)
5. Lag tolerance: Tolerance around the lag separation. All pairs of data separated by Lag separation ± Lag tolerance are assigned to the same lag
6. Number of directions: Number of directions along which to compute the experimental variograms, each with the same number of lags. Each direction is parametrized by the next four items
7. Azimuth: Angle in degrees from North, measured clockwise in the horizontal plane
8. Dip: Angle in degrees from the horizontal plane, measured clockwise
9. Tolerance: Half-window angle tolerance on the direction used to pair data
10. Bandwidth: Maximum acceptable deviation from the direction vector
11. Measure type: Choice of the metric measuring bivariate spatial patterns. The options are: variogram, indicator variogram, covariance and correlogram. The modeling capability of SGeMS only permits fitting a model to a variogram or an indicator variogram
12. Head indicator cutoff: Threshold defining the indicator variable. The value of the head continuous property is coded one if it is below that threshold, and zero otherwise. This field is used only for indicator variograms
13. Tail indicator cutoff: Threshold defining the indicator variable. The value of the tail continuous property is coded one if it is below that threshold, and zero otherwise. This field is used only for indicator variograms. Different head and tail thresholds lead to a cross-indicator variogram if the tail and head refer to the same property, or to a cross-indicator cross-variogram if the head and tail properties are different

Experimental variogram on cartesian grid

Pairs of data are easier to find on a cartesian grid.
Experimental variograms on a cartesian grid do not require tolerances. The computation takes advantage of the grid by finding pairs along a discrete direction vector increment (∆x, ∆y, ∆z) specified by the user. For example, a direction vector of (1, 0, 0) is horizontal East-West, (1, 1, 0) corresponds to azimuth 45 degrees, while (0, 0, 1) is vertical. Parameters 14 to 18, shown in Figure 4.1(c), are required when the selected head and tail properties are located on a cartesian grid. Parameters entered in this window can be saved for future use. The parameters without reference numbers in Figure 4.1(c) have already been described for Figure 4.1(b).

14. Number of lags: Number of variogram lags to consider
15. Number of directions: Number of discretized direction vectors
16. x: East-West increment
17. y: North-South increment
18. z: Vertical increment

Fitting a permissible model

The tools to fit a model to the experimental variogram are shown in Figure 4.1(d). The plots can be reorganized in the window either manually or by using the Window menu. The axes of one or all plots can be changed from the Settings menu. The File menu allows saving or loading the variogram model in XML format. The experimental variograms can also be saved to a file from that menu. The last option is to take a snapshot of all, or of a particular, variogram display.

Right-clicking on any variogram display toggles the display of the number of pairs next to each experimental value. When fitting, points with few pairs should be given less consideration.

19. Nugget Effect: Contribution of the nugget effect to the sill
20. Nb of structures: Number of structures for the linear model of regionalization
21. Sill Contribution: Contribution of that structure to the sill
22. Type: Type of variogram for that structure. There are three variogram types: spherical, exponential and Gaussian
23. Ranges: Ranges of the variogram. A range can be changed either by manually entering the value or by sliding the bar
24.
Angles: Angles defining the anisotropy

Figure 4.1: User interface for variogram modeling. (a) Property selection, (b) Parameters for point set, (c) Parameters for cartesian grid, (d) Model fitting

Chapter 5

Estimation Algorithms

SGeMS provides several tools for estimating a spatially distributed variable Z(u) from a limited number of samples. All algorithms rely on the linear, least-square estimation paradigm called kriging. The kriging estimator can be extended to deal with a variable whose mean is not stationary, and to the case of multiple covariates (cokriging).

5.0.1 Common Parameters

All estimation algorithms in SGeMS work on a 3-D object, loosely called the estimation grid: in the current version, that object can either be a cartesian grid or an unstructured set of points. When an estimation algorithm is run on an object, a new property containing the result of the estimation is attached to that object. All estimation algorithms require the following two parameters:

Parameters description

Grid Name: The grid (or, more generally, the object) on which the estimation is to be performed
Property Name: The name of the new property resulting from the estimation

5.0.2 Kriging

Kriging is a 3-D estimation program for variables defined on a constant volume support. Estimation can be performed by simple kriging (SK), ordinary kriging (OK), kriging with a polynomial trend (KT) or simple kriging with a locally varying mean (LVM). SK, OK and KT were introduced in ??, p. ??. LVM is another variant of kriging, in which the mean m(u) is assumed not constant but known for every u. The kriging problem consists of finding the weights {λ_α}, α = 1, ..., n such that:

Var( sum_{α=1}^{n} λ_α [Z(u_α) − m(u_α)] − [Z(u) − m(u)] )  is minimum

Example: Simple kriging is used to estimate porosity in the second layer of Stanford V. Porosity is modeled by a stationary random function with an anisotropic spherical variogram. Its expected value, inferred from the well data, is 0.17.
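The weights minimizing this error variance solve a linear system built from the covariance model. A minimal 1-D simple kriging sketch, for illustration only (not SGeMS code; an exponential covariance and unit sill are assumed):

```python
import numpy as np

def cov_exp(h, sill=1.0, a=10.0):
    """Exponential covariance with practical range a: C(h) = sill*exp(-3h/a)."""
    return sill * np.exp(-3.0 * np.abs(h) / a)

def simple_kriging(coords, values, mean, u, cov=cov_exp):
    """Simple kriging estimate and variance at location u (1-D coordinates).
    Solves K lambda = k, where K is the data-to-data covariance matrix
    and k the data-to-target covariance vector."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    K = cov(np.abs(coords[:, None] - coords[None, :]))
    k = cov(np.abs(coords - u))
    lam = np.linalg.solve(K, k)
    estimate = mean + lam @ (values - mean)   # SK: residuals from known mean
    variance = cov(0.0) - lam @ k             # SK (kriging) variance
    return estimate, variance
```

At a datum location the estimate honors the datum exactly with zero variance; far from all data it falls back to the SK mean with the full sill as variance.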
The input parameters for algorithm Kriging are reproduced in Fig. 5.1. Fig. 5.2 shows the estimated porosity field.

<parameters>
  <algorithm name="kriging" />
  <Grid_Name value="layer 2" />
  <Property_Name value="krig" />
  <Hard_Data grid="layer 2 well data" property="porosity" />
  <Kriging_Type type="Simple Kriging (SK)" >
    <parameters mean="0.17" />
  </Kriging_Type>
  <Max_Conditioning_Data value="200" />
  <Search_Ellipsoid value="100 100 5 0 0 0 " />
  <Variogram nugget="0.2" structures_count="1" >
    <structure_1 contribution="0.8" type="Spherical" >
      <ranges max="80" medium="40" min="2" />
      <angles x="30" y="0" z="0" />
    </structure_1>
  </Variogram>
</parameters>

Figure 5.1: Simple kriging parameters

Figure 5.2: Porosity estimated by simple kriging. (a) 3D view, (b) Cross-section 1, (c) Cross-section 2

Parameters description

1. Hard Data [Hard Data]: The grid and the name of the property containing the hard data
2. Kriging Type [Kriging Type]: Possible types of kriging include: Simple Kriging (SK), Ordinary Kriging (OK), Kriging with Trend (KT) and Simple Kriging with Locally Varying Mean (LVM)
3. Trend components [Trend]: The trend components. Possible components are, with u = (x, y, z): t1(u) = x, t2(u) = y, t3(u) = z, t4(u) = x·y, t5(u) = x·z, t6(u) = y·z, t7(u) = x², t8(u) = y², t9(u) = z². The trend is coded by a string of 9 flags separated by spaces; for example the string "0 1 0 0 0 1 0 1 0" corresponds to a trend with components t2, t6 and t8, i.e. T(u) = α y + β yz + δ y²
4. Local Mean Property [Local Mean Property]: The property of the simulation grid containing the non-stationary mean. A mean value must be available at each location to be estimated
5. Max Conditioning Data [Max Conditioning Data]: The maximum number of conditioning data to be used for kriging at any location
6. Search Ellipsoid [Search Ellipsoid]: The ranges and angles defining the search ellipsoid
7.
Variogram [Variogram]: The variogram of the variable to be estimated by kriging

5.0.3 Indicator Kriging

Let Z(u) be a continuous random variable, and I(u, z_k) the binary indicator function defined at cutoff z_k:

I(u, z_k) = 1 if Z(u) ≤ z_k, 0 otherwise

The aim of indicator kriging is to estimate the conditional cumulative distribution function (ccdf) at any cutoff z_k, conditional to the data (n):

I*(u, z_k) = E*( I(u, z_k) | (n) ) = Prob*( Z(u) ≤ z_k | (n) )    (5.1)

I*(u, z_k) is estimated by the simple kriging estimator:

I*(u, z_k) − E{I(u, z_k)} = sum_{α=1}^{n} λ_α ( I(u_α, z_k) − E{I(u_α, z_k)} )

Estimating I*(u, z_k) for different cutoffs z_k, k = 1, ..., K, yields a discrete estimate of the conditional cumulative distribution function at the threshold values z_1, ..., z_K. The algorithm Indicator Kriging assumes that the marginal probabilities E{I(u_α, z_k)} are constant and known for all u_α and z_k (simple indicator kriging, ref ??).

Median IK

Estimating a conditional cumulative distribution function at a given location u requires the knowledge of the variogram of each of the indicator variables I(·, z_k), k = 1, ..., K. Inference of these K variograms can be a daunting task. Moreover, a kriging system has to be solved for each indicator variable. The inference and computational burden can be alleviated if two conditions are met:

• the K indicator variables are intrinsically correlated (ref ??):

γ_Z(h) = γ_I(h, z_k) = γ_I(h, z_k, z_k')  for all z_k, z_k'

where γ_Z(h) is the variogram of variable Z(u) and γ_I(h, z_k, z_k') is the cross-variogram between indicator variables I(·, z_k) and I(·, z_k'). All these variograms are standardized to a unit sill
• all vectors of hard indicator data are complete: there are no missing values, as could result from inequality constraints on the Z value.
When those two conditions are met, it is only necessary to infer one variogram γ_Z(h), and only one kriging system has to be solved for all thresholds z_k (indeed, the data locations and the variogram are the same for all thresholds). This simplification is called median indicator kriging.

Ensuring the validity of the estimated cdf

Since kriging is a non-convex estimator, the cdf values Prob*( Z(u) ≤ z_k ), k = 1, ..., K estimated by indicator kriging may not satisfy the properties of a cdf F, that is:

F(z_k) = Prob( Z(u) ≤ z_k ) ∈ [0, 1]  for all k ∈ {1, ..., K}    (5.2)
F(z_k) ≥ F(z_k')  for all z_k ≥ z_k'    (5.3)

If properties 5.2 and 5.3 are not verified, the program Indicator Kriging modifies the kriged values so as to ensure the validity of the ccdf. The corrections are performed in 3 steps:

1. Upward correction:
• Loop through all the cutoffs z_1, ..., z_K, starting with the lowest cutoff z_1.
• At cutoff z_k, if the estimated probability I*(u, z_k) is not in [I*(u, z_{k−1}), 1], reset it to the closest bound.
2. Downward correction:
• Loop through all the cutoffs z_1, ..., z_K, starting with the highest cutoff z_K.
• At cutoff z_k, if the estimated probability I*(u, z_k) is not in [0, I*(u, z_{k+1})], reset it to the closest bound.
3. The final corrected probability values are the averages of the upward and downward corrections.

Categorical Variables

Indicator Kriging can also be applied to categorical variables, i.e. variables that take a discrete, finite number of values (also called classes or categories): z(u) ∈ {0, ..., K − 1}. The indicator variable for class k is then defined as:

I(u, k) = 1 if Z(u) = k, 0 otherwise

and the probability I*(u, k) of Z(u) belonging to class k is estimated by simple kriging:

I*(u, k) − E{I(u, k)} = sum_{α=1}^{n} λ_α ( I(u_α, k) − E{I(u_α, k)} )

In the case of categorical variables, the estimated probabilities must all be in [0, 1] and verify:

sum_{k=0}^{K−1} Prob( Z(u) = k | (n) ) = 1    (5.4)

If not, they are corrected as follows:

1.
If I*(u, k) ∉ [0, 1], reset it to the closest bound. If all the probability values are lesser than or equal to 0, no correction is made and a warning is issued.
2. Standardize the values so that they sum to 1:

I*_corrected(u, k) = I*(u, k) / sum_{i=0}^{K−1} I*(u, i)

Parameters description

1. Hard Data Grid [Hard Data Grid]: The grid containing the hard data
2. Hard Data Property [Hard Data Property]: The names of the properties containing the hard data. If there are K indicators, K properties must be specified. Each property contains the indicator values I(u, k), k = 1, ..., K
3. Categorical Variable Flag [Categorical Variable Flag]: A flag indicating whether the variable to be kriged is categorical or continuous. If the flag is on (equal to 1), the variable is assumed categorical
4. Marginal Probabilities [Marginal Probabilities]: The marginal probabilities of each indicator, separated by one or more spaces. The first probability corresponds to the first indicator, and so on
5. Max Conditioning Data [Max Conditioning Data]: The maximum number of conditioning data to be used for each kriging
6. Search Ellipsoid [Search Ellipsoid]: The ranges and angles defining the search ellipsoid
7. Median IK Flag [Median Ik Flag]: A flag indicating whether median IK should be performed
8. Full IK Flag [Full Ik Flag]: A flag indicating whether full IK should be performed. If Median Ik Flag is on/off, Full Ik Flag is off/on
9. Median IK Variogram [Variogram Median Ik]: The variogram of the indicator variables used for median IK. If Median Ik Flag is off, this parameter is ignored
10. Full IK Variogram [Variogram Full Ik]: The variograms of each indicator variable. If Median Ik Flag is on, this parameter is ignored

5.0.4 CoKriging

The aim of cokriging is to estimate a variable Z1(u) accounting for data on related variables Z2, ..., Z_{J+1}.
The cokriging estimator is given by:

Z_1*(u) − m_1(u) = sum_{α=1}^{n} λ_α [ Z_1(u_α) − m_1(u_α) ] + sum_{j=2}^{J+1} sum_{β_j=1}^{N_j} λ_{β_j} [ Z_j(u_{β_j}) − m_j(u_{β_j}) ]    (5.5)

Algorithm CoKriging distinguishes between the two cases of simple and ordinary cokriging:

• the means m_1(u), ..., m_{J+1}(u) are known and constant: simple cokriging
• the means are locally constant but unknown: ordinary cokriging. The weights λ_α and λ_{β_j} must then satisfy:

sum_{α} λ_α = 1    (5.6)
sum_{β_j} λ_{β_j} = 0  for all j ∈ {2, ..., J+1}    (5.7)

Solving the cokriging system calls for the inference and modeling of multiple variograms: the variogram of each variable and all cross-variograms between any two variables. In order to alleviate the burden of modeling all these variograms, two models have been proposed: the Markov Model 1 and the Markov Model 2 (ref ??). For text clarity, we will only consider the case of a single secondary variable (J = 1).

Markov Model 1

The Markov Model 1 (MM1) considers the following Markov-type screening hypothesis:

E( Z_2(u) | Z_1(u), Z_1(u+h) ) = E( Z_2(u) | Z_1(u) )    (5.8)

i.e. the dependence of the secondary variable on the primary is limited to the co-located primary variable. The cross-variogram (or cross-covariance) is then proportional to the auto-variogram (auto-covariance) of the primary variable (??):

C_12(h) = ( C_12(0) / C_11(0) ) C_11(h)    (5.9)

where C_12 is the covariance between Z_1 and Z_2 and C_11 is the covariance of Z_1. Solving the cokriging system under the MM1 model only calls for knowledge of C_11, hence the inference and modeling effort is the same as for univariate kriging. Although very congenial, the MM1 model should not be used when the support of the secondary variable Z_2 is larger than that of Z_1, lest the variance of Z_1 be underestimated. It is better to use the Markov Model 2 in that case.

Markov Model 2

The Markov Model 2 (MM2) was developed for the case where the volume support of the secondary variable is larger than that of the primary variable (ref ??). This is often the case with remote sensing and seismic-related data.
The Markov-type hypothesis is now:

E( Z_1(u) | Z_2(u), Z_2(u+h) ) = E( Z_1(u) | Z_2(u) )    (5.10)

i.e. the dependence of the primary variable on the secondary is limited to the co-located secondary variable. The cross-covariance is now proportional to the covariance of the secondary variable:

C_12(h) = ( C_12(0) / C_22(0) ) C_22(h)    (5.11)

In order for all three covariances C_11, C_12 and C_22 to be consistent, ref(??) proposed to model C_11 as a linear combination of C_22 and any permissible residual covariance C_R. Expressed in terms of correlograms (C_11(h) = C_11(0) ρ_11(h)), this is written:

ρ_11(h) = ρ_12² ρ_22(h) + (1 − ρ_12²) ρ_R(h)    (5.12)

Parameters description

1. Primary Hard Data Grid [Primary Harddata Grid]: The grid containing the primary hard data
2. Primary Variable [Primary Variable]: The name of the property containing the primary hard data
3. Assign Hard Data [Assign Hard Data]: A flag specifying whether the hard data should be re-located onto the simulation grid. See ??
4. Secondary Hard Data [Secondary Harddata Grid]: The grid containing the secondary hard data
5. Secondary Variable [Secondary Variable]: The name of the property containing the secondary hard data
6. Kriging Type [Kriging Type]: Possible types of cokriging include: Simple Kriging (SK) and Ordinary Kriging (OK). Note that selecting OK while retaining only a single secondary datum (e.g. when doing co-located cokriging) amounts to completely ignoring the secondary information: from Eq. (5.7), the weight associated with the single secondary datum will be 0
7. SK Means [SK Means]: The means of the primary and secondary variables. If the selected kriging type is Ordinary Kriging, this parameter is ignored
8. Co-kriging Type [Cokriging Type]: The cokriging option. Available options are: Full Cokriging, Co-located Cokriging with Markov Model 1 (MM1), and Co-located Cokriging with Markov Model 2 (MM2)
9.
Max Conditioning Data [Max Conditioning Data]: The maximum number of primary conditioning data to be used for each cokriging
10. Search Ellipsoid - Primary Variable [Search Ellipsoid 1]: The ranges and angles defining the search ellipsoid used to find the primary conditioning data
11. Primary Variable Variogram [Variogram C11]: The variogram model for the primary variable
12. Max Conditioning Data [Max Conditioning Data 2]: The maximum number of secondary conditioning data used for each cokriging. This parameter is only required if Cokriging Type is set to Full Cokriging
13. Search Ellipsoid - Secondary Variable [Search Ellipsoid 2]: The ranges and angles defining the search ellipsoid used to find the secondary conditioning data. This parameter is only required if Cokriging Type is set to Full Cokriging
14. Cross-Variogram [Variogram C12]: The cross-variogram between the primary and secondary variables. This parameter is only required if Cokriging Type is set to Full Cokriging
15. Secondary Variable Variogram [Variogram C22]: The variogram of the secondary variable. This parameter is only required if Cokriging Type is set to Full Cokriging
16. Coefficient of correlation [Correl Z1Z2]: The coefficient of correlation between the primary and secondary variables. This parameter is only required if Cokriging Type is set to MM1
17. Coefficient of correlation [MM2 Correl Z1Z2]: The coefficient of correlation between the primary and secondary variables. This parameter is only required if Cokriging Type is set to MM2
18. Secondary Variable Variogram [MM2 Variogram C22]: The variogram of the secondary variable. This parameter is only required if Cokriging Type is set to MM2. Note that the variograms of the secondary and primary variables must verify condition (5.12)

Chapter 6

Simulation Algorithms

This chapter presents the collection of stochastic simulation algorithms provided by SGeMS.
Section 6.1 presents the traditional variogram-based (two-point) algorithms: SGSIM (sequential Gaussian simulation), SISIM (sequential indicator simulation), COSGSIM (sequential Gaussian co-simulation) and COSISIM (sequential indicator co-simulation). SGSIM and COSGSIM are the algorithms of choice for most applications with continuous variables, while SISIM and COSISIM are the choice for categorical variables.

Section 6.2 gives detailed descriptions of the two multiple-point simulation (mps) algorithms: SNESIM (single normal equation simulation) and FILTERSIM (filter-based simulation). SNESIM only works for categorical variables, as involved in facies distributions, while FILTERSIM is suited to both continuous and categorical variables.

Each simulation algorithm presented in this chapter is demonstrated with some example runs.

6.1 Variogram-based Simulations

This section covers the variogram-based sequential simulation algorithms implemented in SGeMS. The stochastic realizations from any of these algorithms draw their spatial patterns from the input variogram models. Variogram-based algorithms are to be used when the critical patterns are reasonably amorphous (high entropy) and can be represented by bivariate statistics. Cases where variograms fail to reproduce critical spatial patterns should lead to using the multiple-point geostatistics algorithms described in the next section.

Variogram-based sequential simulations are the most popular stochastic simulation algorithms, mostly owing to their robustness and ease of conditioning, both to hard and soft data. Moreover, they do not require a rasterized (regular or cartesian) grid and allow simulation on irregular grids such as point sets. SGeMS takes full advantage of this flexibility: all the simulation algorithms in this section work both on point sets and on cartesian grids. Moreover, the conditioning data may or may not be on the simulation grid.
However, working on a point set induces a performance penalty, as the search for neighboring data is significantly more costly than on a regular (cartesian) grid. When the simulation grid is cartesian, all algorithms have the option to relocate the conditioning data to the nearest grid node for increased execution speed. The re-allocation strategy is to move each datum to its closest grid node; if two data share the same closest grid node, the farthest one is ignored.

This section first presents the simulation algorithms requiring some Gaussian assumption: the sequential Gaussian simulation SGSIM, and the sequential Gaussian co-simulation COSGSIM for integration of secondary information through a coregionalization model. Next, the direct sequential simulation DSSIM is presented; DSSIM does not require any Gaussian assumption, but may not reproduce the target distribution. Finally, the implementations of the indicator simulation algorithms SISIM and COSISIM are discussed. These last two algorithms rely on the decomposition of the cumulative distribution function by a set of thresholds: at any location, the probability of exceeding each threshold is estimated by kriging, and these probabilities are combined to construct the local conditional cumulative distribution function (ccdf) from which simulation is performed.

6.1.1 SGSIM: sequential Gaussian simulation

Let Y(u) be a multivariate Gaussian random function with zero mean, unit variance, and a given variogram model γ(h). Realizations of Y(u) conditioned to the data (n) can be generated by Algorithm 6.1.

A non-Gaussian random function Z(u) must first be transformed into a Gaussian random function Y(u): Z(u) ↦ Y(u). If no analytical model is available, a normal score transform may be applied; Algorithm 6.1 then becomes Algorithm 6.2.

Using the normal score transform involves more than just transforming and back-transforming the data at simulation time.
Algorithm 6.1 Sequential Gaussian Simulation
1: Define a random path visiting each node of the grid
2: for each node u along the path do
3:   Get the conditioning data, consisting of both neighboring original data (n) and previously simulated values
4:   Estimate the local conditional cdf as a Gaussian distribution, with mean given by a form of kriging and variance given by the kriging variance
5:   Draw a value from that Gaussian ccdf and add the simulated value to the data set
6: end for
7: Repeat for another realization

Algorithm 6.2 Sequential Gaussian Simulation with normal score transform
1: Transform the data into normal score space: Z(u) ↦ Y(u)
2: Perform Algorithm 6.1
3: Back-transform the simulated Gaussian field into the data space: Y(u) ↦ Z(u)

First, the algorithm calls for the variogram of the normal scores, not of the original data. Only that normal score variogram is guaranteed to be reproduced, not the original Z-value variogram. However, in many cases the back transform does not adversely affect the reproduction of the Z-value variogram model. If required, the SGeMS implementation of SGSIM can automatically perform the normal score transform of the original hard data and back-transform the simulated realizations. The user must still independently perform the normal score transform of the hard data with TRANS (see Algorithm 3.1) in order to calculate and model the normal score variogram.

At each node along the path, the algorithm can solve either a simple kriging, ordinary kriging, kriging with a local mean, or kriging with a trend system. Theory only guarantees reproduction of the variogram with the simple kriging system.

SGSIM with local varying mean

In most cases, the locally varying mean z_m(u) = E{Z(u)} is given as a Z-value and must be converted into Gaussian space, such that y_m(u) = E{Y(u)}. Transforming z_m(u) into y_m(u) using F_Z and the rank-preserving transform y_m(u) = F_Y^{-1}( F_Z( z_m(u) ) ) would not, in all generality, ensure that y_m(u) = E{Y(u)}.
A better alternative, when possible, is to obtain y_m(u) directly by calibration of the secondary information with the normal scores y of the primary attribute z.

A note on the Gaussian spatial law

Gaussian random functions have very specific and consequential spatial structure and distribution laws. Median values are maximally correlated, while extreme values are increasingly less correlated; that property is known as destructuration. If the phenomenon under study is known to have well-correlated extreme values, a Gaussian-related simulation algorithm is not appropriate.

Example: Figure 6.6 shows two conditional realizations on a point set using simple kriging and a normal score transform of the hard data.

Parameters description

The SGSIM algorithm is activated from Simulation—sgsim in the algorithm panel. The main SGSIM interface contains 3 pages: General, Data and Variogram (see Figure 6.1). The text inside '[ ]' is the corresponding keyword in the SGSIM parameter file.

1. Simulation Grid Name [Grid Name]: Name of the simulation grid
2. Property Name Prefix [Property Name]: Prefix for the simulation output. The suffix real# is added for each realization
3. # of realizations [Nb Realizations]: Number of simulations to generate
4. Seed [Seed]: Seed for the random number generator
5. Kriging Type [Kriging Type]: The type of kriging system to be solved at each node along the random path. The simple kriging (SK) mean is set to zero
6. Hard Data—Object [Hard Data.grid]: Name of the grid containing the conditioning data. If no grid is selected, the realizations are unconditional
7. Hard Data—Property [Hard Data.property]: Property for the conditioning data. Only required if a grid has been selected in Hard Data—Object [Hard Data.grid]
8. Assign hard data to simulation grid [Assign Hard Data]: If selected, the hard data are copied onto the simulation grid. The program does not proceed if the copying fails.
This option significantly increases execution speed.
9. Max Conditioning data [Max Conditioning Data]: Maximum number of data to be retained in the search neighborhood.
10. Search Ellipsoid Geometry [Search Ellipsoid]: Parametrization of the search ellipsoid.
11. Target Histogram: If used, the data are normal score transformed prior to simulation and the simulated field is transformed back to the original space. Check the Use Target Histogram [Use Target Histogram] flag to enable the normal score transform; [nonParamCdf] parametrizes the target histogram (see section 3.1).
12. Variogram [Variogram]: Parametrization of the normal score variogram.

6.1.2 COSGSIM: Sequential Gaussian co-simulation

COSGSIM simulates a Gaussian variable accounting for secondary information. Let Y1(u) and Y2(u) be two correlated multiGaussian random variables; Y1(u) is called the primary variable, and Y2(u) the secondary variable. The COSGSIM simulation of the primary variable Y1, conditioned to both primary and secondary data, is described in Algorithm 6.3.

Algorithm 6.3 Sequential Gaussian Cosimulation
1: Define a path visiting each node u of the grid
2: for each node u do
3:    Get the conditioning data, consisting of neighboring original data, previously simulated values and secondary data
4:    Get the local Gaussian ccdf for the primary attribute, with its mean given by a cokriging estimate and its variance by the cokriging variance
5:    Draw a value from that Gaussian ccdf and add it to the data set
6: end for

If the primary and secondary variables are not Gaussian, make sure that the transformed variables Y1 and Y2 are at least biGaussian. If they are not, another simulation algorithm should be considered, for example COSISIM (see section 6.1.5). If no analytical model is available for such a transformation, a normal score transform may be applied to both variables. The TRANS algorithm, however, only ensures that the respective marginals are Gaussian.
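The limits of that guarantee are easy to demonstrate numerically. The following sketch (illustrative only, not SGeMS code) builds two variables that are each exactly standard normal marginally, yet are not biGaussian:

```python
import numpy as np

# Marginally Gaussian but not biGaussian: a classic sign-flip construction.
rng = np.random.default_rng(0)
y1 = rng.standard_normal(100_000)
flip = np.where(np.abs(y1) < 1.54, 1.0, -1.0)   # flip only the extreme values
y2 = flip * y1      # y2 is still standard normal marginally (by symmetry)

# Both marginals pass a normality check and the pair is nearly uncorrelated,
# yet (y1, y2) is NOT bivariate Gaussian: their extremes are anti-correlated,
# so the correlation coefficient does not describe their joint behavior.
corr = np.corrcoef(y1, y2)[0, 1]
```

Here |y2| equals |y1| exactly, so the two variables are strongly dependent even though their linear correlation is nearly zero; a coregionalization model based on correlation alone would miss this dependence entirely.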
Algorithm 6.3 then becomes Algorithm 6.4.

Algorithm 6.4 Sequential Gaussian Cosimulation for non-Gaussian variables
1: Transform Z1 and Z2 into Gaussian variables Y1 and Y2, according to Eq. (3.5)
2: Perform Algorithm 6.3
3: Back-transform the simulated values y1,1, ..., y1,N into z1,1, ..., z1,N

Example: Figure 6.11 shows two conditional realizations with an MM1-type coregionalization and simple cokriging. Both the primary and secondary information are normal score transformed.

Parameters description

The COSGSIM algorithm is activated from Simulation—cosgsim in the Algorithm Panel. The COSGSIM interface contains 5 pages: General, Primary Data, Secondary Data, Primary Variogram and Secondary Variogram (see Figure 6.2). The text inside ‘[ ]’ is the corresponding keyword in the COSGSIM parameter file.

1. Simulation Grid Name [Grid Name]: Name of the simulation grid.
2. Property Name Prefix [Property Name]: Prefix for the simulation output. The suffix real# is added for each realization.
3. # of realizations [Nb Realizations]: Number of simulations to generate.
4. Seed [Seed]: Seed for the random number generator.
5. Kriging Type [Kriging Type]: Type of kriging system to be solved at each node along the random path.
6. Cokriging Option [Cokriging Type]: Type of coregionalization model: LMC, MM1 or MM2.
7. Primary Hard Data Grid [Primary Harddata Grid]: Grid containing the primary variable. If no grid is selected, the realizations are unconditional.
8. Primary Property [Primary Variable]: Hard data property for the primary variable.
9. Assign hard data to simulation grid [Assign Hard Data]: If selected, the hard data are copied onto the simulation grid. The program does not proceed if the copying fails. This option significantly increases execution speed.
10. Primary Max Conditioning data [Max Conditioning Data 1]: Maximum number of primary data to be retained in the search neighborhood.
11. Primary Search Ellipsoid Geometry [Search Ellipsoid 1]: Parametrization of the search ellipsoid for the primary variable.
12. Target Histogram: If used, the primary data are normal score transformed prior to the simulation and the simulated field is transformed back to the original space. Check the Transform Primary Variable [Transform Primary Variable] flag to enable the normal score transform; [nonParamCdf primary] parametrizes the primary variable target histogram (see section 3.1).
13. Secondary Data Grid [Secondary Harddata Grid]: Grid containing the secondary variable.
14. Secondary Property [Secondary Variable]: Data property for the secondary variable.
15. Secondary Max Conditioning data [Max Conditioning Data 2]: Maximum number of secondary data to be retained in the search neighborhood.
16. Secondary Search Ellipsoid Geometry [Search Ellipsoid 2]: Parametrization of the search ellipsoid for the secondary variable.
17. Target Histogram: If used, the secondary data are normal score transformed prior to the simulation and the simulated field is transformed back to the original space. Check the Transform Secondary Variable [Transform Secondary Variable] flag to enable the normal score transform; [nonParamCdf secondary] parametrizes the secondary variable target histogram (see section 3.1).
18. Variogram for primary variable [Variogram C11]: Parametrization of the normal score variogram for the primary variable.
19. Cross-variogram between primary and secondary variables [Variogram C12]: Parametrization of the cross-variogram for the normal score primary and secondary variables. Required if Cokriging Option [Cokriging Type] is set to Full Cokriging.
20. Variogram for secondary variable [Variogram C22]: Parametrization of the normal score variogram for the secondary variable. Required if Cokriging Option [Cokriging Type] is set to Full Cokriging.
21. Coef. Correlation Z1,Z2: Coefficient of correlation between the primary and secondary variables.
Only required if Cokriging Option [Cokriging Type] is set to MM1 or MM2. The correlation keyword is [Correl Z1Z2] for the MM1 coregionalization, and [MM2 Correl Z1Z2] for the MM2 coregionalization.
22. Covariance for secondary variable [MM2 Variogram C22]: Parametrization of the normal score variogram for the secondary variable. Required if Cokriging Option [Cokriging Type] is set to MM2.

6.1.3 DSSIM: Direct Sequential Simulation

The direct sequential simulation algorithm DSSIM simulates continuous attributes without prior indicator coding or Gaussian transform. It can be shown that the only condition for the model variogram to be reproduced is that the ccdf has as its mean and variance the simple kriging estimate and variance. The shape of the ccdf does not matter; it need not even be the same at each node. The drawback is that there is no guarantee that the marginal distribution is reproduced.

One solution is to post-process the simulated realizations with a rank transform algorithm to identify the target histogram (see algorithm TRANS in section 3.1); this may affect variogram reproduction. The second alternative is to set the shape of the local ccdf, at all locations along the path, such that the marginal distribution is approximated at the end of each realization.

SGeMS offers two distribution types for the first alternative: the ccdf is either a uniform distribution or a lognormal distribution. Neither of these distributions would produce realizations with uniform or lognormal marginal distributions, so some rank transformation may be required to identify the target marginal histogram. For the second alternative, the method proposed by Soares (2001) is implemented: the ccdf is sampled from the data marginal distribution, modified to be centered on the kriging estimate with spread equal to the kriging variance. The resulting shape of each local ccdf differs from location to location.
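The re-centering idea can be mimicked with a short sketch. This is a simplified illustration under assumed names, not the SGeMS implementation: the global marginal sample is standardized, rescaled to the kriging mean and variance, and then sampled uniformly.

```python
import numpy as np

def soares_local_draw(global_sample, sk_mean, sk_var, rng):
    """Hedged sketch of a Soares (2001)-style local draw: standardize the
    global marginal sample, recenter it on the kriging estimate with spread
    given by the kriging variance, then draw from that local distribution."""
    g = np.asarray(global_sample, dtype=float)
    z = (g - g.mean()) / g.std()              # shape of the global histogram
    local = sk_mean + np.sqrt(sk_var) * z     # recentered, rescaled ccdf
    return rng.choice(local)                  # sample the local ccdf

rng = np.random.default_rng(1)
global_sample = rng.lognormal(0.0, 0.5, 5000)   # a skewed global marginal
draws = np.array([soares_local_draw(global_sample, 2.0, 0.25, rng)
                  for _ in range(4000)])
# draws are centered on the kriging mean (2.0) with the kriging variance
# (0.25), while keeping the skewed shape of the global histogram
```

Note how the local distribution keeps the shape of the global one; only its center and spread change from location to location.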
The method gives reasonable results for a symmetric (even multi-modal) target distribution, but less so for a highly skewed one; in this latter case the first option of a lognormal ccdf type followed by a rank transform may give better results.

Algorithm 6.5 Direct Sequential Simulation
1: Define a random path visiting each node u of the grid
2: for each location u along the path do
3:    Get the conditioning data, consisting of both neighboring original data and previously simulated values
4:    Define the local ccdf with its mean and variance given by the kriging estimate and variance
5:    Draw a value from that ccdf and add the simulated value to the data set
6: end for

Example: Figure 6.8 shows two conditional realizations on a point set using the Soares method and simple kriging. A local varying mean with the Soares method is used in Figure 6.9.

Parameters description

The DSSIM algorithm is activated from Simulation—dssim in the upper part of the Algorithm Panel. The main DSSIM interface contains 3 pages: General, Data and Variogram (see Figure 6.3). The text inside ‘[ ]’ is the corresponding keyword in the DSSIM parameter file.

1. Simulation Grid Name [Grid Name]: Name of the simulation grid.
2. Property Name Prefix [Property Name]: Prefix for the simulation output. The suffix real# is added for each realization.
3. # of realizations [Nb Realizations]: Number of simulations to generate.
4. Seed [Seed]: Seed for the random number generator.
5. Kriging Type [Kriging Type]: Form of kriging system to be solved at each node along the random path.
6. SK mean [SK mean]: Mean of the attribute. Only required if Kriging Type [Kriging Type] is set to Simple Kriging (SK).
7. Hard Data—Object [Hard Data.grid]: Name of the grid containing the conditioning data. If no grid is selected, the realizations are unconditional.
8. Hard Data—Property [Hard Data.property]: Property for the conditioning data.
Only required if a grid has been selected in Hard Data—Object [Hard Data.grid].
9. Assign hard data to simulation grid [Assign Hard Data]: If selected, the hard data are copied onto the simulation grid. The program does not proceed if the copying fails. This option significantly increases execution speed.
10. Max Conditioning data [Max Conditioning Data]: Maximum number of data to be retained in the search neighborhood.
11. Search Ellipsoid Geometry [Search Ellipsoid]: Parametrization of the search ellipsoid.
12. Distribution type [cdf type]: Type of cdf to be built at each location along the random path.
13. LogNormal parameters: Only activated if Distribution type [cdf type] is set to LogNormal. Parametrization of the global lognormal distribution: the mean is specified by Mean [LN mean] and the variance by Variance [LN variance].
14. Uniform parameters: Only activated if Distribution type [cdf type] is set to Uniform. Parametrization of the global uniform distribution: the minimum is specified by Min [U min] and the maximum by Max [U max].
15. Soares Distribution [nonParamCdf]: Only activated if Distribution type [cdf type] is set to Soares. Parametrization of the global distribution from which the local distributions are sampled (see section 3.1).
16. Variogram [Variogram]: Parametrization of the variogram. For this algorithm, the sill of the variogram is critical information and should not be standardized to 1.

6.1.4 SISIM: sequential indicator simulation

Sequential indicator simulation SISIM combines the indicator formalism