# Build Overview

This document provides an overview of creating and launching builds in PetroAI, including the mechanics of navigating the site, versioning, and best practices.


# Compute Pipeline

Launching a build triggers a series of automated compute steps known collectively as the Compute Pipeline. Each major stage in this pipeline represents a key phase of the workflow, such as importing data, processing features, training models, and generating insights.

Each stage consists of smaller tasks that run in sequence or in parallel. For example, the INV stage (Inventory) includes several underlying steps like:

  • Generating inventory well locations
  • Extracting subsurface and operational features
  • Running predictions
  • Computing SHAP values for interpretability

The specific stages that run depend on your build configuration. If certain steps have already been completed in a previous build, the system will skip them to save time. For example, if well features were already processed, the Feature Engineering stage (FEAT) will be skipped, and the pipeline will start at the Modeling stage (MODL).
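The skip behavior can be pictured as a filter over an ordered list of stages. The sketch below is illustrative only: the stage order, the `stages_to_run` helper, and the idea of tracking completed stages as a set are assumptions made for this example, not a description of how PetroAI implements the pipeline.

```python
# Hypothetical stage order; only the stage names themselves come from this guide.
PIPELINE_ORDER = ["FEAT", "MODL", "DIAG", "GRID", "INV"]


def stages_to_run(configured, completed):
    """Return configured stages, in pipeline order, that are not already complete."""
    return [
        stage
        for stage in PIPELINE_ORDER
        if stage in configured and stage not in completed
    ]


# Well features were processed in a previous build, so FEAT is skipped.
plan = stages_to_run(configured={"FEAT", "MODL", "DIAG"}, completed={"FEAT"})
print(plan)  # ['MODL', 'DIAG']
```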

# Compute Pipeline Diagram

The interactive compute pipeline diagram shows all phases of the pipeline and their dependencies.


# Launch a Build

  1. Select the repository
    Begin by selecting the appropriate repository where the build will be configured and launched. This ensures you are working with the correct source data.

  2. Navigate to the Builds page
    In the PetroAI navigation menu, go to the Builds section. This is where all past and current builds are listed, and where you can initiate a new one.

  3. Click the "Add Build" icon
    Look for the Add Build icon, located at the top-right of the Builds page. Click it to open the fly-out menu to start configuring a new build.

  4. Name the build
    Enter a descriptive name for the new build. Choose a name that clearly communicates the purpose of the build (e.g., Midland South Inventory Prediction).

  5. Select a parent build using the "From Build" dropdown
    Use the From Build dropdown to select an existing build that this new build will inherit from. Inheriting lets the new build reuse configurations and data products, which avoids unnecessary compute.

  6. Add build notes
    In the Build Notes section, briefly describe the build's purpose and list its key configuration parameters. For example:
    Predict inventory on Midland South using 1320' spacing. Completion design: 1500-3000 lb/ft at 250 lb/ft increments. Model has been trained on 2015+ wells with laterals ranging 5000-15000'.
    These notes will help your team quickly understand the build's objective.

  7. Save the Build
    Once all required fields are filled in, click the Save button in the lower-right corner to initialize the new build. You will then be taken to the build's overview page, where you can make any adjustments.

  8. Navigate to the Code section
    Go to the Code tab within the new build. This is where you will define the build’s behavior and data products using a configuration file. See the Build Configuration documentation for more details.

  9. Paste in the configuration file
    Paste your configuration file into the provided editor. It's recommended to copy the config from a prior build or locally stored file. Keep in mind the following:

    • Supported formats: Configuration files are natively stored in JSON, but can also be converted from YAML in the editor.
    • Include only the sections you want to run: The system will only execute the specified portions. For example, if you're training a new model for evaluation, but don't want to run downstream grid or inventory predictions, exclude the sections for 'grid' and 'inv'.
    • Configuration parameters: Refer to the Build Configuration section of the documentation for detailed parameter descriptions.
    • More guidance: See the How-To Guides for examples and best practices on running specific build types (e.g. Generate Well Features, Model Training, Inventory).

  10. Navigate to the Compute section and click "Launch Build"
    In the Compute tab, click Launch Build. The build will be submitted to the compute pipeline.

    • Once launched, the Compute page will display a progress tracker showing each stage of the build. Click Refresh to see the latest progress.
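As an illustration of step 9, the sketch below trims a configuration copied from a prior build down to the sections you want to run before pasting it into the editor. The top-level keys and their contents are placeholders, not the real schema (only `grid` and `inv` are named in this guide), and `without_sections` is a hypothetical helper. See the Build Configuration documentation for the actual structure.

```python
import json

# Placeholder configuration copied from a prior build; keys and values are hypothetical.
prior_config = {
    "modl": {"note": "model training settings"},
    "grid": {"note": "grid prediction settings"},
    "inv": {"note": "inventory prediction settings"},
}


def without_sections(config, excluded):
    """Return a copy of config with the excluded top-level sections removed."""
    return {name: body for name, body in config.items() if name not in excluded}


# Train a model for evaluation only: drop the downstream grid and inventory sections.
training_only = without_sections(prior_config, excluded={"grid", "inv"})
print(json.dumps(training_only, indent=2))
```

The original dictionary is left untouched, so the full configuration remains available for a later build that does run the downstream sections.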

# Versioning

Each build is assigned a version based on the parent build selected in step 5 above and on the options included in the phases configuration. Select a parent build from the same generation to maintain data consistency and avoid unnecessary rebuilding.

  • X.0.0.0.0 - Tracks changes to structure grids or geologic attribute grids.
  • 0.X.0.0.0 - Tracks changes to well data (e.g., header, production, surveys, or WellExtras).
  • 0.0.X.0.0 - Tracks changes to the earth model configuration (e.g., fracture dimensions or spacing).
  • 0.0.0.X.0 - Tracks changes to model training configuration (e.g., features, training wells, or partitioning).
  • 0.0.0.0.X - Tracks changes to undeveloped well predictions (e.g., scenario updates for grid or inventory).
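To make the five-part scheme concrete, the sketch below parses two version strings and reports the left-most component that differs. The helper names and the example version numbers are invented for illustration. The build server assigns versions itself, and this guide does not specify how lower components behave when a higher one changes, so the sketch only compares versions and never computes a new one.

```python
# What each position tracks, taken from the list above.
COMPONENTS = (
    "structure or geologic attribute grids",
    "well data",
    "earth model configuration",
    "model training configuration",
    "undeveloped well predictions",
)


def parse_version(text):
    """Split a version such as '2.1.0.3.1' into a tuple of five integers."""
    parts = tuple(int(p) for p in text.split("."))
    if len(parts) != len(COMPONENTS):
        raise ValueError(f"expected 5 components, got {len(parts)}: {text!r}")
    return parts


def first_change(parent, child):
    """Describe the left-most component that differs, or None if identical."""
    for label, old, new in zip(COMPONENTS, parse_version(parent), parse_version(child)):
        if old != new:
            return label
    return None


# Hypothetical parent and child versions.
print(first_change("2.1.0.3.1", "2.1.0.4.0"))  # model training configuration
```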

# Best Practices

Following these best practices will help ensure builds are clearly documented, reproducible, and easy for teammates to interpret later.

# 1. Use Clear Naming and Detailed Notes

A descriptive name and detailed notes make it easier to identify builds and understand their purpose later:

  • Use a concise, descriptive name that reflects the build’s intent
    Example: Model update with new porosity feature
  • In the Notes field, include:
    • What was changed (e.g. features added, filters used, model type)
    • Key configuration parameters or flags
    • Any limitations or assumptions
    • References to related builds (if applicable)
    • Description of GRID or INV scenarios produced

This is especially important when troubleshooting, reviewing history, or sharing work with others.


# 2. Coordinate When Incrementing Major Data Products

If your build will increment either of these major data products, contact the PetroAI team first to ensure proper data QC and compute provisioning:

  • WALL (Well data - header, surveys, production)
  • GALL (Structure / attribute grids, logs)


# 3. Streamline Model Experimentation

When running model experiments or testing new features:

  • Consider stopping the build pipeline at the DIAG phase (Diagnostics)
    • This lets you quickly analyze model quality (e.g. accuracy, SHAP values) without producing final outputs
    • Saves compute time and reduces confusion

You can adjust this in the config file by disabling downstream stages (GRID, INV, PDP2).

  • Take advantage of the Feature Experiments section in CORE2 to systematically test feature sets:
    • This configuration runs multiple randomized trials using tree-based models to identify high-performing subsets of features.
    • Use it to guide feature engineering or justify changes in your modeling workflow.
    • You can tune the number of trials and estimator count to balance precision vs speed.
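The idea behind randomized feature trials can be sketched in a few lines. This is not the CORE2 implementation: the `run_trials` helper, its parameters, the feature names, and the toy scoring function are all hypothetical. A real run would score each subset by fitting a tree-based model rather than using the stand-in below.

```python
import random


def run_trials(features, score_fn, n_trials=20, subset_size=3, seed=0):
    """Score random feature subsets and return (score, subset) pairs, best first."""
    rng = random.Random(seed)  # fixed seed keeps the trials reproducible
    results = []
    for _ in range(n_trials):
        subset = tuple(sorted(rng.sample(features, subset_size)))
        results.append((score_fn(subset), subset))
    return sorted(results, reverse=True)


# Hypothetical feature names and a toy score that simply rewards two of them.
FEATURES = ["porosity", "lateral_length", "proppant_per_ft", "spacing", "thickness"]
USEFUL = {"porosity", "proppant_per_ft"}


def toy_score(subset):
    """Stand-in for model quality: count how many 'useful' features are present."""
    return len(USEFUL.intersection(subset))


ranked = run_trials(FEATURES, toy_score, n_trials=10)
best_score, best_subset = ranked[0]
print(best_score, best_subset)
```

Raising `n_trials` explores more subsets at the cost of more compute, which mirrors the precision versus speed trade-off described above.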

# 4. Hide Unused Insight Dashboards

To avoid confusion for yourself and future users:

  • Hide insight cards for data products that are not generated by the build
    Example: If the build skips GRID or INV, those insight cards can be hidden from the page.

This helps maintain a clean and relevant build interface.


# 5. Version Control & Traceability

Build versioning is automatically handled by the build server. The version is determined based on:

  • The selected parent build
  • The configuration blocks included in the build

To ensure the correct version increment:

  • Carefully select the appropriate parent build
  • Review the configuration to confirm only the intended blocks are included

Incorrect settings may result in an improper version chain or failed dependency resolution.

You do not need to manually assign version numbers. Just ensure your inputs and selections reflect the build’s true scope.
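One way to review a configuration before launch is to compare its top-level blocks against the ones you meant to include. The sketch below assumes the configuration has been loaded as a dictionary. The block names and the `unexpected_blocks` helper are hypothetical and not part of PetroAI.

```python
def unexpected_blocks(config, intended):
    """Return top-level config blocks that are present but were not intended."""
    return sorted(set(config) - set(intended))


# Hypothetical config: the inventory block was left in by mistake.
config = {"modl": {}, "diag": {}, "inv": {}}
extra = unexpected_blocks(config, intended={"modl", "diag"})
print(extra)  # ['inv']
```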