Simulation

Simulation in IBM® SPSS® Statistics refers to simulating input data to predictive models using the Monte Carlo method and evaluating the model based on the simulated data. The distribution of predicted target values can then be used to evaluate the likelihood of various outcomes.

Each input in the predictive model is specified as simulated or fixed. Values of simulated inputs are generated by drawing from a specified probability distribution, allowing you to account for uncertainty in the values of those inputs. Fixed inputs are those whose values are known and held fixed at the known values.

The predictive model is evaluated using the generated values of the simulated inputs and the specified values of the fixed inputs to calculate the target (or targets) of the model. The process is repeated many times (typically tens of thousands or hundreds of thousands of times), resulting in a distribution of target values. Each repetition of the process generates a separate data record that consists of the values of the inputs and the predicted target (or targets) of the model.
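
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how SPSS Statistics implements it: the `model` function, the normal distribution for the simulated input, the fixed value 5.0 and the threshold 40.0 are all hypothetical choices made for the example.

```python
import random
import statistics

def model(simulated_input, fixed_input):
    """Hypothetical predictive model: a stand-in, not an SPSS model file."""
    return 2.0 * simulated_input + 3.0 * fixed_input

def run_simulation(n_cases, fixed_value, seed=12345):
    """Return one record (inputs plus predicted target) per repetition."""
    rng = random.Random(seed)  # seeded so that reruns give the same records
    records = []
    for _ in range(n_cases):
        # Simulated input: drawn from an assumed normal(mean=10, sd=2).
        x = rng.gauss(10.0, 2.0)
        # Fixed input: held at its known value in every repetition.
        target = model(x, fixed_value)
        records.append({"simulated": x, "fixed": fixed_value, "target": target})
    return records

records = run_simulation(n_cases=20_000, fixed_value=5.0)
targets = [r["target"] for r in records]

# Use the distribution of targets to estimate the likelihood of an outcome.
threshold = 40.0
share_above = sum(t > threshold for t in targets) / len(targets)
print(f"mean target: {statistics.mean(targets):.2f}")
print(f"estimated P(target > {threshold}): {share_above:.3f}")
```

Each dictionary in `records` plays the role of one simulated data record: the input values together with the predicted target.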

Two interfaces are available for working with simulations:

  • Simulation Builder. This is an advanced interface for users who are designing and running simulations. It provides the full set of capabilities for designing a simulation, saving the specifications to a simulation plan file, specifying output and running the simulation. You can build a simulation based on an IBM SPSS model file, or on a set of custom equations that you define in the Simulation Builder. The case study on modeling treatment costs for diabetes uses the Simulation Builder.
  • Run Simulation dialog. The Run Simulation dialog is designed for users who have a simulation plan and primarily want to run the simulation. It allows you to modify settings that enable you to run the simulation under different conditions, but does not provide the full capabilities of the Simulation Builder for designing simulations. The case study on risk in a net present value calculation uses the Run Simulation dialog to do sensitivity analysis.

Running the simulation

  1. Open the data file diabetes_costs.sav.
  2. From the menus choose:

    Analyze > Simulation...

    Figure 1. Simulation: Model Source dialog

  3. Select the Select SPSS Model File option and click Continue.
  4. In the Select SPSS Model File dialog, browse to the Samples directory (under the installation directory) and open the file diabetes_costs.xml.

    Figure 2. Simulation Builder Simulated Fields

    The Simulated Fields panel lists all of the fields that are inputs in the predictive model. The model used in this example contains the following fields: age is the age of the individual covered by the policy; glucose is the individual's average glycated hemoglobin level, which indicates the blood glucose level over prolonged periods; income is the household income of the individual.

    To run a simulation, you must specify a distribution for each input in the model, or specify the input as fixed and provide the fixed value. In this example, all model inputs will be simulated, so a distribution must be specified for each of the three inputs. When the data used to build the model are available, as in this example, you can automatically find the distribution that most closely fits the data for each of the inputs.

  5. Click Fit All to automatically fit distributions to the data.

    Figure 3. Simulation Builder Simulated Fields

    The results are shown in the Distribution column. The name of the distribution that most closely fits the data for each input is displayed, along with the distribution parameters. For example, the data for glucose are most closely fit by a lognormal distribution with parameters a=7.55 and b=0.19. The chart adjacent to the distribution specifications shows the distribution function superimposed on a histogram of the data for that input from the active dataset.
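
To get a feel for what a lognormal distribution with a=7.55 and b=0.19 looks like, the sketch below draws values from it with Python's standard library. It assumes that a is the scale parameter (the median of the distribution) and b is the shape parameter (the standard deviation of the logarithm of the values); the text does not spell out this parameterization, so treat it as an assumption. The seed and the number of draws are arbitrary.

```python
import math
import random
import statistics

A_SCALE = 7.55  # parameter a from the text; ASSUMED to be the median (scale)
B_SHAPE = 0.19  # parameter b from the text; ASSUMED to be the sd of ln(x)

rng = random.Random(2017)  # arbitrary seed, for reproducibility only

# random.lognormvariate takes the mean and sd of ln(x), so the assumed
# median a is converted to mu = ln(a).
draws = [rng.lognormvariate(math.log(A_SCALE), B_SHAPE) for _ in range(50_000)]

sample_median = statistics.median(draws)
sample_mean = statistics.mean(draws)
# Under this parameterization the theoretical mean is a * exp(b^2 / 2).
theoretical_mean = A_SCALE * math.exp(B_SHAPE ** 2 / 2)

print(f"sample median:    {sample_median:.2f}")
print(f"sample mean:      {sample_mean:.2f}")
print(f"theoretical mean: {theoretical_mean:.2f}")
```

If the assumption holds, the sample median should land close to 7.55, with the mean slightly higher because the lognormal distribution is skewed to the right.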

    You can examine all distributions that were considered when fitting a particular input, along with the goodness of fit statistics for each fitted distribution.

  6. Click the row for age in the Simulated Fields grid and then click Fit Details.

    Figure 4. Fit Details dialog

    For continuous inputs, such as age, the Anderson-Darling test is used by default to find the distribution that most closely fits the data. The distribution with the smallest value of the Anderson-Darling statistic is the one that provides the closest fit to the data. The Anderson-Darling statistic is given by the value of A in the Fit Statistics column and has the value 1.29 for the triangular distribution. Values for the alternative Kolmogorov-Smirnov statistic are denoted by K in the Fit Statistics column. In this example, the values for the Kolmogorov-Smirnov statistic are displayed but are not used to rank the distributions. A setting on the Advanced Options panel allows you to choose which test statistic will be used to rank the distributions.

    When examining fit statistics for distributions, larger p-values indicate a closer fit, that is, less evidence against the hypothesis that the data follow the fitted distribution. As a general rule, p-values less than 0.05 indicate that the distribution may not provide a close fit to the data. This is true for both the Anderson-Darling statistic and the Kolmogorov-Smirnov statistic. Notice that in this example, the p-value (given by the value of P in the Fit Statistics column) is missing for the triangular distribution. This is because the p-value is not available for the triangular distribution (it is also not available for the beta distribution). From visual inspection, however, the triangular distribution does appear to provide the closest fit to the data.
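
The ranking by Anderson-Darling statistic can be illustrated with the textbook formula A² = -n - (1/n) Σ (2i - 1)[ln F(x(i)) + ln(1 - F(x(n+1-i)))], where x(1) ≤ ... ≤ x(n) are the sorted data and F is the candidate distribution function. The sketch below is an assumption-laden illustration rather than the SPSS procedure: the data are synthetic, the candidate parameters are hard-coded instead of estimated from the data, and the triangular mode of 40 is hypothetical (only the minimum 13 and maximum 65 come from the tutorial).

```python
import math
import random

def anderson_darling(data, cdf, eps=1e-12):
    """Anderson-Darling statistic of `data` against a fully specified CDF."""
    xs = sorted(data)
    n = len(xs)
    # Clamp CDF values away from 0 and 1 so the logarithms stay finite.
    f = [min(max(cdf(x), eps), 1.0 - eps) for x in xs]
    total = 0.0
    for i in range(n):  # i is 0-based, so the weight (2i - 1) becomes (2i + 1)
        total += (2 * i + 1) * (math.log(f[i]) + math.log(1.0 - f[n - 1 - i]))
    return -n - total / n

def triangular_cdf(x, low, high, mode):
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    if x <= mode:
        return (x - low) ** 2 / ((high - low) * (mode - low))
    return 1.0 - (high - x) ** 2 / ((high - low) * (high - mode))

def uniform_cdf(x, low, high):
    return min(max((x - low) / (high - low), 0.0), 1.0)

rng = random.Random(42)
# Synthetic stand-in for the age data (hypothetical mode of 40).
ages = [rng.triangular(13, 65, 40) for _ in range(2_000)]

candidates = {
    "triangular": lambda x: triangular_cdf(x, 13, 65, 40),
    "uniform": lambda x: uniform_cdf(x, 13, 65),
}
scores = {name: anderson_darling(ages, cdf) for name, cdf in candidates.items()}
best = min(scores, key=scores.get)  # smallest statistic = closest fit

for name, a2 in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} A = {a2:.2f}")
print("closest fit:", best)
```

The statistic weights discrepancies in the tails more heavily than the Kolmogorov-Smirnov statistic does, which is one reason the two tests can rank candidate distributions differently.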

  7. Click Cancel.

    When data are simulated for glucose from the associated lognormal distribution, arbitrarily small positive values may be generated because the range of the lognormal distribution includes all positive values (the value 0 is excluded). However, arbitrarily small positive values for the glycated hemoglobin level do not occur in practice. In cases such as this where the range of the distribution is beyond the range of reasonable values for an input, you can specify the minimum or maximum value (or both) that will be simulated. When data are simulated for such an input, the associated distribution is sampled until a value within the specified range is obtained.

    Glycated hemoglobin levels for people with diabetes are typically found in the range from 5 to 14.
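
Sampling "until a value within the specified range is obtained" is a form of rejection sampling. A minimal sketch follows, using the range 5 to 14 given above. The function name `sample_truncated`, the retry limit and the seed are hypothetical, and the lognormal parameters are interpreted as median 7.55 and log-scale standard deviation 0.19, which is an assumption about the parameterization.

```python
import math
import random

def sample_truncated(draw, lower, upper, max_tries=10_000):
    """Redraw from `draw()` until the value lies in [lower, upper]."""
    for _ in range(max_tries):
        value = draw()
        if lower <= value <= upper:
            return value
    # Guard against ranges that the distribution almost never reaches.
    raise RuntimeError("no draw fell inside the requested range")

rng = random.Random(7)  # arbitrary seed, for reproducibility only

def draw_glucose():
    # ASSUMED parameterization: a = 7.55 is the median, b = 0.19 the log-sd.
    return rng.lognormvariate(math.log(7.55), 0.19)

glucose_values = [sample_truncated(draw_glucose, 5.0, 14.0) for _ in range(10_000)]
print(f"min simulated value: {min(glucose_values):.2f}")
print(f"max simulated value: {max(glucose_values):.2f}")
```

Note that discarding out-of-range draws changes the distribution slightly: the retained values follow the original distribution conditioned on the range, so the more probability mass lies outside the limits, the more the result departs from the fitted distribution.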

  8. Enter 5 in the Min field for glucose and enter 14 in the associated Max field.

    For this example, we will only consider the income range from $20,000 to $100,000.

  9. Enter 20000 in the Min field for income and enter 100000 in the associated Max field.

    The triangular distribution for age specifies a minimum value of 13 and a maximum value of 65, which are the minimum and maximum values for the variable age in the historical dataset diabetes_costs.sav. Given that adults over the age of 65 will typically no longer be policy holders, this is a reasonable maximum to use for the simulation. Likewise, given the increased occurrence of type II diabetes in teenagers, the minimum value of 13 will ensure that this age group will be represented in the simulation.

  10. Click Correlations in the Select an Item list on the Simulation tab.

    Figure 5. Simulation Builder Correlations

    The Correlations panel displays the Pearson correlations between the simulated inputs, as calculated from the variables associated with those inputs in the active dataset. Known correlations between simulated inputs are preserved when simulating data for those inputs.

    Correlations are calculated when you click Fit All or Fit on the Simulated Fields panel. You can modify any of the correlation values by typing a value in the associated cell of the Correlations table. For this example, we will accept the correlations as calculated from the data.
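
The text does not say how correlations are preserved internally. One common technique, shown here purely as an illustration, is to draw correlated standard normal values and then transform each one to its target distribution (a Gaussian copula). The correlation of 0.4 between age and glucose, the triangular mode of 40 and the interpretation of the lognormal parameters (median 7.55, log-scale standard deviation 0.19) are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its CDF

def triangular_inv_cdf(u, low, high, mode):
    """Inverse CDF of a triangular distribution for u in [0, 1]."""
    cut = (mode - low) / (high - low)
    if u < cut:
        return low + math.sqrt(u * (high - low) * (mode - low))
    return high - math.sqrt((1.0 - u) * (high - low) * (high - mode))

def draw_correlated_pair(rng, rho):
    """Draw (age, glucose) whose underlying normals have correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Map each normal value to its target distribution.
    age = triangular_inv_cdf(STD_NORMAL.cdf(z1), 13.0, 65.0, 40.0)
    glucose = 7.55 * math.exp(0.19 * z2)  # lognormal via exp of a normal
    return age, glucose

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

RHO = 0.4  # hypothetical target correlation
rng = random.Random(99)
pairs = [draw_correlated_pair(rng, RHO) for _ in range(20_000)]
ages = [p[0] for p in pairs]
glucose = [p[1] for p in pairs]
observed = pearson(ages, glucose)
print(f"target correlation:   {RHO}")
print(f"observed correlation: {observed:.3f}")
```

With this approach the Pearson correlation of the transformed values is close to, but generally not exactly equal to, the correlation imposed on the underlying normal values, because the transformations are nonlinear.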

  11. Click Output in the Select an Item list on the Simulation tab.

    Figure 6. Simulation Builder Output

  12. In the Display Formats grid, change the formats for cost and income to Dollar and change the associated number of decimals to 0. Also, set the number of decimals to 0 for age.
  13. Click Save in the Select an Item list on the Simulation tab.

    Figure 7. Simulation Builder Save

    The Save the plan file for this simulation box is checked, indicating that the specifications for the simulation will be saved to a simulation plan file. You can open a simulation plan file in the Simulation Builder or the Run Simulation dialog, optionally make modifications and re-run the simulation without having to re-enter all of the specifications. You can also share the simulation plan file with other users, who can then run the simulation.

  14. Click Browse to navigate to where you want to save the simulation plan file and enter a name for the file.
  15. Click Run.
Category: Articles | Added by: Vahik (2017-08-07)