# Build Code

This document provides a comprehensive reference for understanding and configuring builds for the PetroAI pipeline. Suggested values and ranges are also included.

# MySQL Data Tables

GridAttributeData
GridAttributeHeader
GridStructureData
GridStructureHeader
InventoryWells
MicroseismicEvent
MonthlyProduction
ReservoirStressWellLogRecord
StressOrientationMeasure
Well
WellDirectionalSurveyPoint
WellExtra
WellLookup

These data tables are the sources for any input data used in the PetroAI pipeline.

# Products

products:
  - core1
  - core2
  - core3
  - raw
  - diag
  - grid
  - inv

Defines the set of modules ("products") for which outputs are generated.

core1 - PDP forecast parameters and volumes. Linear regression modeling is used to fit historical production for forecasts.
core2 - PDP feature attribution and analysis. Geologic grids are sampled to each well and feature correlations are calculated to support training feature optimization.
core3 - Predictions for oil and gas production based on chosen input features. Regression Tree modeling quantifies the impact of each feature and uses analogous groupings to predict oil and gas production.
raw - Pipeline for how wells are grouped for the decline curve analysis.
diag - Diagnostic data for quantifying model quality and reliability. Compares model predictions against actuals and quantifies feature importance and impact using Shapley analysis.
grid - Predictive forecasts of undeveloped wells using a generic grid. Multiple scenarios may be defined for differing engineering designs and geologic attribution is based on provided input grids.
inv - Discrete predictive forecasts of undeveloped wells loaded into the InventoryWells table. Explicitly defined well locations, designs, and timing are ingested to generate forecasts based on the given inputs and features within the model.

# Phases Configuration

# Shared Configurations

Suggested defaults provided below:

phases:
  shared:
    midas_project_options:
      batch_size_drain: 3
      batch_size_features: 100
      batch_size_frac: 10
      batch_size_plot: 3
      batch_num_tiles_earth: 1
      buffer_ft: 4000
      crs_proj4: +proj=utm +zone=13 +datum=NAD27 +units=m +no_defs
      generate_stage_method: fixed
      fixed_stage_count: 1
      frac_height_ft: 3000
      frac_width_ft: 3000
      frac_value_high: 1
      frac_value_low: 0.05
      max_drainage_distance_ft: 1500
      max_frac_distance_ft: 1500
      max_frac_horiz_distance_ft: 1500
      max_frac_vertical_distance_ft: 1500
      max_parallelism: 10
      min_lateral_overlap_ft: 1000
      number_threads_drain: 1
      number_threads_earth: 10
      number_threads_feature: 5
      number_threads_frac: 10
      number_threads_plot: 5
      sibling_days: 180
      stage_length_ft: 1000
      use_existing_stages: false
      use_frac_penny: true
      use_ortho_stress: false
      run_drainage: true
      create_vertical_well_segments: false
      max_offset_direction_difference_deg: 180

Defaults applied globally unless overridden in sub-phases.
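
The override behavior described above can be pictured as a merge in which sub-phase keys win over the shared defaults. The following is a minimal sketch under that assumption; `resolve_options` is a hypothetical helper, not part of the pipeline, and the actual merge rules may differ.

```python
# Hypothetical sketch: sub-phase values take precedence over shared defaults.
shared_defaults = {
    "buffer_ft": 4000,
    "sibling_days": 180,
    "run_drainage": True,
}

def resolve_options(shared, overrides):
    """Return a copy of the shared defaults updated with sub-phase overrides.

    Assumption: None (YAML null) means "no overrides, use shared defaults".
    """
    resolved = dict(shared)          # copy so the shared defaults stay untouched
    resolved.update(overrides or {})
    return resolved

inv_options = resolve_options(shared_defaults, {"sibling_days": 90})
print(inv_options)
```

Here only `sibling_days` changes for the sub-phase; every other key falls back to the shared value.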

# Midas Project

Midas project is the named module for the earth model pipeline.

crs_proj4 - Explicitly define the coordinate reference system to be used for any geo-spatial analysis
generate_stage_method - How the horizontal section is discretized, either using a fixed number of segments (e.g. 3), or using pre-loaded stage depths (unusual). Should normally be fixed.
fixed_stage_count - Will use the specified number of stages to segment the lateral length for spacing, landing, and drainage calculations.
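
To illustrate fixed_stage_count, the sketch below splits a lateral into a fixed number of segments. It assumes equal-length segments defined by measured depth; the function name and the heel/toe arguments are hypothetical and only serve the example.

```python
def segment_lateral(heel_md_ft, toe_md_ft, fixed_stage_count):
    """Split the lateral between heel and toe into equal-length segments.

    Assumption: segments are equal in length and described by measured depth.
    """
    if fixed_stage_count < 1:
        raise ValueError("fixed_stage_count must be at least 1")
    length = (toe_md_ft - heel_md_ft) / fixed_stage_count
    return [
        (heel_md_ft + i * length, heel_md_ft + (i + 1) * length)
        for i in range(fixed_stage_count)
    ]

# A 9,000 ft lateral discretized into three segments of 3,000 ft each.
segments = segment_lateral(10000.0, 19000.0, 3)
print(segments)
```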

Drainage vs Fracture: Drainage indicates the total volume from which a well may be producing, including matrix contribution. Fracture indicates a maximum theoretical distance where two wells may be in communication.

frac_height_ft - The total height for any modeled fracture growth. Suggested ranges vary depending on basin, faults, and depositional geomechanics [100, 3000]. This setting is only applied when use_frac_penny is enabled.
frac_width_ft - The total width for any modeled fracture growth. Suggested ranges vary depending on basin, faults, and depositional geomechanics [100, 5000]. This setting is only applied when use_frac_penny is enabled.
frac_value_high and frac_value_low - control the amount of co-stimulation that is allowed. Setting the low value closer to 1 reduces the amount of co-stimulation.
max_drainage_distance_ft - The maximum distance for any modeled stimulated rock volume drainage. Suggested ranges vary depending on basin, well designs, and depositional geomechanics [100, 3000].
max_frac_distance_ft - The maximum distance in any directional vector for any fracture dimension. Suggested ranges vary depending on basin, well designs, and depositional geomechanics [100, 5000].
max_frac_horiz_distance_ft - The maximum horizontal distance a fracture will be modeled. Suggested ranges vary depending on basin, well designs, and depositional geomechanics [100, 3000].
max_frac_vertical_distance_ft - The maximum vertical distance a fracture will be modeled. Suggested ranges vary depending on basin, well designs, and depositional geomechanics [100, 3000].
sibling_days - The number of days an existing well must have been producing before the subject well for the pair to be treated as a parent-infill relation rather than co-developed. For example, if sibling_days = 180 and Well A has been producing for 200 days before Well B, then Well A is a parent and Well B is an infill well. If Well A has been producing for 120 days before Well B, then Well A and Well B are co-developed. Suggested ranges vary depending on basin operational strategies [60, 270].
stage_length_ft - Sets the dimension for the length of fracturing stages to be used in geomechanical modeling. Does not apply when generate_stage_method = fixed.
use_existing_stages - If stage lengths are provided in input data then setting to true will use the provided data instead of the above default in stage_length_ft. Does not apply when generate_stage_method = fixed. Should usually be set to false.
use_frac_penny - When enabled it will model 3D fractures as an ellipse with the provided dimensions. If false will use the earth model set in the geo phase with geo: frac_geometry_model_file.
use_ortho_stress - When enabled, spacing and drainage calculations will be performed orthogonal to the lateral of the wellbore as opposed to along the orientation of the maximum horizontal stress (which is the default behavior).
run_drainage - KEY FEATURE. When enabled it will use PetroAI's proprietary geomechanical earth model to derive each well's total drainage area. This is a fundamental feature for capturing the effects of well spacing and parent-infill interactions.
create_vertical_well_segments - When enabled, will process both vertical and horizontal wells to calculate penetration depths through the structure grids.
max_offset_direction_difference_deg - Sets the threshold angle for offset well orientation to be considered a neighbor when performing geometric spacing calculations.
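
The sibling_days rule can be sketched as a simple date comparison. This is an illustration only: `classify_relation` is a hypothetical name, and how a lead time exactly equal to sibling_days is handled is an assumption.

```python
from datetime import date, timedelta

def classify_relation(existing_first_prod, subject_first_prod, sibling_days=180):
    """Classify an existing well relative to a subject well that starts later."""
    lead_days = (subject_first_prod - existing_first_prod).days
    if lead_days < 0:
        raise ValueError("the existing well must start producing first")
    # Assumption: a lead time exactly equal to sibling_days counts as co-developed.
    return "parent-infill" if lead_days > sibling_days else "co-developed"

well_b_start = date(2021, 7, 1)
# Well A producing 200 days before Well B: A is a parent, B is an infill.
print(classify_relation(well_b_start - timedelta(days=200), well_b_start))
# Well A producing 120 days before Well B: the wells are co-developed.
print(classify_relation(well_b_start - timedelta(days=120), well_b_start))
```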

# Geo Phase

phases:
  geo:
    frac_geometry_model_file: apollo_1.json

frac_geometry_model_file - Specifies the JSON configuration file that contains the geomechanical model to be used in the build.

# Well Phase

phases:
  well:
    interval_alias_mapping:
      WOLFCAMP_A : WCA
      WOLFCAMP_B : WCB
      ...
    skip_well_extras: false

Set configurations specific for well data and features.

interval_alias_mapping - Each row provides an alias for any named values in interval.
skip_well_extras - If enabled, then the WellExtra table will not be ingested
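
As a rough picture of interval_alias_mapping, the sketch below treats each row as a lookup from the interval name in the data to its alias. Passing unmapped names through unchanged is an assumption, and `apply_alias` and the `BONE_SPRING_2` name are hypothetical.

```python
interval_alias_mapping = {"WOLFCAMP_A": "WCA", "WOLFCAMP_B": "WCB"}

def apply_alias(interval, mapping):
    """Return the alias for an interval name, or the name itself if unmapped."""
    return mapping.get(interval, interval)

print(apply_alias("WOLFCAMP_A", interval_alias_mapping))
# Hypothetical interval with no alias row: returned unchanged (assumption).
print(apply_alias("BONE_SPRING_2", interval_alias_mapping))
```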

# Features Phase

# Forecasting

phases:
  features:
    forecasting:
      wells_per_scenario: 750
      ignore_wells_without_forecast_summary: true
      early_life_well_forecast_options:
        max_radius_in_meters: 8000.0
      forecast_options:
        arps:
          qi: [1.1, 1.3]
          b: [0.5, 1.2] 
          de: [0.5, 0.99]
          dmin: [0.06, 0.06]
        normalize:
          startMode: peakRate
          peakFluid: auto
          gorThreshold: 10
          eol:
            enabled: false
        forecast:
          years: 40
          frequency: monthly
          minProductionCount: 3

Defines how forecast scenarios are configured, including Arps parameters and normalization rules.

wells_per_scenario - Sets the number of wells to group in a batch for forecasting for efficiency and grouping analogs.
ignore_wells_without_forecast_summary - If enabled, will skip any wells that did not pass the earlier pipeline step that processes production data for forecasting.

# Early Life Well Forecast Options

max_radius_in_meters - When a well has been producing for fewer months than specified in minProductionCount, this parameter sets the radius of investigation for referencing analogous wells for setting forecast parameters.
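
A radius of investigation like this can be sketched as a distance test in projected coordinates. The example assumes one (x, y) point per well in a meter-based CRS; which point on the wellbore the pipeline uses is not stated here, and `wells_within_radius` is a hypothetical helper.

```python
import math

def wells_within_radius(subject_xy, candidates, max_radius_in_meters=8000.0):
    """Return ids of candidate wells located within the radius of the subject.

    Assumption: coordinates are projected (x, y) pairs in meters.
    """
    sx, sy = subject_xy
    return [
        well_id
        for well_id, (x, y) in candidates.items()
        if math.hypot(x - sx, y - sy) <= max_radius_in_meters
    ]

candidates = {
    "A": (503000.0, 3504000.0),  # 5,000 m from the subject well
    "B": (512000.0, 3500000.0),  # 12,000 m from the subject well
}
analogs = wells_within_radius((500000.0, 3500000.0), candidates)
print(analogs)
```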

# Forecast Options

# Arps - Multi-segment Hyperbolic Parameterization

Reference: https://www.phdwin.com/wp-content/uploads/2017/05/About-Arps-Equations.pdf

qi - The ranges of initial starting rates for best fitting a curve to production data. Units are ratios to peak production. Suggested ranges: [0.8, 1.3].
b - The ranges for fitting the b-factor in the hyperbolic segment. The b-factor is dimensionless. Suggested ranges: [0.5, 1.2].
de - The ranges for fitting the secant effective instantaneous decline rate. Units are a fraction that must be less than 1. Suggested ranges: [0.5, 0.99].
dmin - The ranges for appending the exponential decline segment of a forecast. Units are a fraction that must be less than 1. Suggested ranges: [0.05, 0.12]. Set upper and lower bound to same number to assert a fixed dmin.
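
For orientation, the sketch below evaluates one hyperbolic curve with a terminal exponential segment for a single set of parameters. It is not the pipeline's fitting code: it assumes de is an annual secant effective decline, dmin is an annual effective decline converted with the tangent convention, and time is in years. `arps_rate` is a hypothetical name.

```python
import math

def arps_rate(t_years, qi, b, de, dmin):
    """Rate at time t for a hyperbolic decline that switches to exponential.

    Assumptions: de and dmin are annual fractions below 1, and b > 0.
    """
    # Secant effective decline converted to nominal initial decline (1/year).
    di = ((1.0 - de) ** (-b) - 1.0) / b
    # Terminal effective decline converted to nominal decline (1/year).
    d_term = -math.log(1.0 - dmin)
    # Time at which the hyperbolic instantaneous decline has fallen to d_term.
    t_switch = max((di / d_term - 1.0) / (b * di), 0.0)
    if t_years <= t_switch:
        return qi / (1.0 + b * di * t_years) ** (1.0 / b)
    q_switch = qi / (1.0 + b * di * t_switch) ** (1.0 / b)
    return q_switch * math.exp(-d_term * (t_years - t_switch))

# qi is expressed as a ratio to peak production, as in the configuration.
yearly_rates = [arps_rate(t, qi=1.2, b=1.0, de=0.7, dmin=0.06) for t in range(41)]
print(yearly_rates[:3])
```

With a secant effective decline, the rate after one year equals qi multiplied by (1 - de) regardless of b, which gives a quick sanity check on the conversion.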

# Normalization of production data for fitting

startMode - Setting the starting point for fitting the production profile. Recommendation is to use peakRate for optimized curve fitting and to apply "ramp up" segments after forecast generation. Can be: start, peakRate, manual, or localPeakRate.
peakFluid - Explicitly define or allow the system to determine the best fluid for setting the starting point in the time-series for fitting forecast parameters. Recommendation is to use auto. Can be: oil, gas, or auto.
gorThreshold - Set a maximum limit for the gas-oil-ratio while fitting the different stream parameters. Units are mcf/bbl. Suggested ranges: [6, 20].
eol - "End of life" fitting options:
- enabled - When set to true, the forecast is fit to only the last yearsEnd years of production data, for wells that have been producing for at least yearsOn years.
- yearsOn - If eol is enabled, it only applies to wells that have been producing for this many years or more.
- yearsEnd - If eol is enabled, the decline curve is fit through this many of the most recent years of production.
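
The eol options can be read as a rule for choosing which production records are used in the fit. The sketch below assumes one record per producing month; `select_fit_window` is a hypothetical helper and the pipeline's own handling may differ.

```python
def select_fit_window(monthly_rates, eol_enabled, years_on, years_end):
    """Return the production records used for fitting.

    With eol enabled and at least years_on years of history, only the last
    years_end years are kept; otherwise the full history is returned.
    Assumption: one record per producing month.
    """
    months_on = int(years_on * 12)
    months_end = int(years_end * 12)
    if eol_enabled and months_end > 0 and len(monthly_rates) >= months_on:
        return monthly_rates[-months_end:]
    return monthly_rates

ten_year_history = list(range(120))  # placeholder values, one per month
window = select_fit_window(ten_year_history, True, years_on=8, years_end=3)
print(len(window))
```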

# Forecast generation settings

years - Sets the total number of years for the well forecast which limits the EUR and generates production volumes.
frequency - Sets the segments for generating forecast volumes. Options: yearly, monthly, and daily. Warning: daily will significantly increase compute time.
minProductionCount - Sets the minimum number of production records required to generate a forecast.
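
The effect of frequency on output size follows from simple arithmetic, sketched below. The 365-day year is an assumption that ignores leap days, and `forecast_period_count` is a hypothetical helper.

```python
# Number of forecast periods per year for each frequency option.
PERIODS_PER_YEAR = {"yearly": 1, "monthly": 12, "daily": 365}  # 365 ignores leap days

def forecast_period_count(years, frequency):
    """Return how many forecast volumes one well produces per stream."""
    return years * PERIODS_PER_YEAR[frequency]

for frequency in ("yearly", "monthly", "daily"):
    print(frequency, forecast_period_count(40, frequency))
```

For a 40-year forecast, daily output carries roughly thirty times as many periods per well as monthly output, which is consistent with the compute-time warning above.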

# Feature Building

phases:
  features:
    feature_building:
      num_processing_jobs: 60
      make_plots: false

Controls parallelization and optional plotting of feature sets.

num_processing_jobs - Defines how many parallel compute machines will be utilized. It is recommended to start with 4.
make_plots - If enabled generates gunbarrel images for evaluating and visualizing well spacing and drainage. Warning: if enabled will increase compute time significantly.

# Model Phase

For more details on the phases > model configuration and best practices, visit Model Configuration.

phases:
  model:
    model_configs:
      bundle1:
        training_filter: ...
        evaluation_filter: ...
        model_features:
          - lateralLength
          - totalDrainage
          ...
      bundle2:

Model bundles may be explicitly defined for different subset grouping of wells. Different feature sets may be used between different model bundles; however, it is recommended to maintain the same features across model bundles for reliable and meaningful interpretation and comparisons. Each model may be named by the user.

# Training Filter

The training_filter configuration passes a list of filters to select a subset of the data for training the machine learning models. It is recommended to use these filters to remove any erroneous data from the model. The filter is a single string that names variables and combines comparisons against lists or values using logical operators. The named variables should be fields from the CORE_well_feature tables.

For example:

model:
  model_configs:
    bundle1:
      training_filter: "interval in ['zone1', 'zone2', 'zone3'] & completionYear >=2010 & lateralLength > 3000"
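
To make the logic of that filter string concrete, here is the same condition written as a plain Python predicate over a few made-up wells. This does not show how the pipeline parses the string; the well records and `passes_training_filter` are hypothetical.

```python
# Made-up wells for illustration only.
wells = [
    {"wellId": "w1", "interval": "zone1", "completionYear": 2015, "lateralLength": 7500},
    {"wellId": "w2", "interval": "zone4", "completionYear": 2018, "lateralLength": 9800},
    {"wellId": "w3", "interval": "zone2", "completionYear": 2008, "lateralLength": 5000},
    {"wellId": "w4", "interval": "zone3", "completionYear": 2012, "lateralLength": 2500},
]

def passes_training_filter(well):
    """All three conditions must hold for a well to be used in training."""
    return (
        well["interval"] in ["zone1", "zone2", "zone3"]
        and well["completionYear"] >= 2010
        and well["lateralLength"] > 3000
    )

training_ids = [w["wellId"] for w in wells if passes_training_filter(w)]
print(training_ids)
```

Only w1 satisfies every condition: w2 is in an excluded interval, w3 was completed before 2010, and w4 has a lateral shorter than the limit.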

# Evaluation Filter

The evaluation_filter should generally use the same conditions as the training_filter, except for the production data filtering. For example, the minimum production filter may be removed so that predictions are made for early-life wells.

# Model Features

Provide a list of features from the CORE_well_features table to be used for the model training.

# Product Phase

# Raw

product:
  raw:
    dca_fs_batch_size: 100
    prod_batch_size: 250

Parameters to optimize downloading and publishing PDP & DCA time series data.

# Core

product:
  core:
    forecast:
      dca_included_well_fields_to_types:
        wellId: "str"
        ...
    model:
      num_years_to_forecast: 40

Specifies included fields for DCA and how long PDP models forecast into the future.

# Inventory (inv)

inv:
  include_pdps: true
  max_nearby_pdp_distance_miles: 3
  valid_pdps_group_name : ""
  num_years_to_forecast: 40
  include_timeseries_monthly: true
  make_plots: true
  inventory_options:
      crs_proj4: "+proj=utm +zone=13 +datum=NAD27 +units=m +no_defs"
  midas_project_options: null
  partition_options: null
  well_features_transformations:
      # one or more of these
      - type: add_uniform_value_column
        column_name: totalProppantByPerfLength
        column_dtype: float
        column_value: 2500
        overwrite: true
      - type: add_uniform_value_column
        column_name: totalFluidByPerfLength
        column_dtype: float
        column_value: 2200
        overwrite: true
  sensitivity_features:
      sampled_features:
          - feature: totalProppantByPerfLength
            low: 1500
            high: 3000
            step: 500
      linked_features:
          - feature: totalFluidByPerfLength
            value_expr: "row['totalProppantByPerfLength']"

Inventory prediction controls including batch size, transformations, and sensitivity analysis.
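
The sensitivity_features block above can be pictured as expanding each well into one case per sampled value, with linked features computed from the sampled row. The sketch below assumes the high value is inclusive and that value_expr is evaluated with only `row` in scope; both are assumptions, and the helper names are hypothetical.

```python
def sampled_values(low, high, step):
    """Values from low to high in increments of step (high assumed inclusive)."""
    count = int((high - low) // step) + 1
    return [low + i * step for i in range(count)]

def expand_well(well, sampled, linked):
    """Yield one copy of the well per sampled value, then fill the linked feature."""
    for value in sampled_values(sampled["low"], sampled["high"], sampled["step"]):
        row = dict(well, **{sampled["feature"]: value})
        # Assumption: value_expr sees only `row`; builtins are disabled here.
        row[linked["feature"]] = eval(
            linked["value_expr"], {"__builtins__": {}}, {"row": row}
        )
        yield row

sampled = {"feature": "totalProppantByPerfLength", "low": 1500, "high": 3000, "step": 500}
linked = {"feature": "totalFluidByPerfLength", "value_expr": "row['totalProppantByPerfLength']"}
# Starting values match the add_uniform_value_column transformations above.
well = {"wellId": "inv1", "totalProppantByPerfLength": 2500, "totalFluidByPerfLength": 2200}

cases = list(expand_well(well, sampled, linked))
for case in cases:
    print(case)
```

In this reading, each well yields four cases (1500, 2000, 2500, 3000), and the fluid loading follows the proppant loading in every case.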

# Grid

grid:
  num_years_to_forecast: 40
  grid_spec:
    workGroupPrefix: 4WPS
    scenarios:
      - name: 4WPS_04_BS2S
        wells:
          - name: 04_BS2S_w1
            lateralLength:
              value: 10000

Defines the layout for grid-based simulations including type curves and spacing.

# PDP2

pdp2:
  sensitivity_features:
    sampled_features:
      - feature: lateralLength
        low: 7500
        high: 15000

Controls for predicting PDP wells with different engineering parameters (e.g. simulating a different frac size).

# Additional Notes

midas_project_options are overridable at most levels
well_features_transformations ensure completeness of feature data
sensitivity_features allow for parameter sweeps (e.g. predict at proppant ranging from 1500-2000 lb/ft at 500 lb/ft increments)
