Configuration file

This page documents settings in the IMI configuration file (config.yml).

Important

The config.yml file included with the IMI is set up for running the IMI on AWS.

If you want to run the IMI elsewhere, you will need to create your own environment and configuration files. See Running the IMI on a local cluster for more information.

General

RunName

Name for this inversion; will be used for directory names and prefixes.

Species

String defining the species to use for the inversion. The currently supported option is "CH4". Support for "CO2" is in development and will be available in an upcoming version.

SchedulerType

String defining the type of scheduler used to run the IMI. Currently supported options are “slurm”, “PBS”, or “tmux”. Select “tmux” to run the IMI with ./run_imi.sh instead of PBS or slurm.

SafeMode

Boolean for running in safe mode to prevent overwriting existing files.

S3Upload

Boolean for uploading the output directory to S3. If true, the S3UploadPath and S3UploadFiles settings must be set.

S3UploadPath

S3 path to upload files to (e.g. s3://imi-output-dir/example-output/). Only used if S3Upload is true.

S3UploadFiles

Files to upload from the IMI output directory (e.g. ["*"] will upload everything). Only used if S3Upload is true.

UseGCHP

Boolean for using GEOS-Chem High Performance (GCHP) for the forward model. Default is false, in which case GEOS-Chem Classic will be used.
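As an illustration, the general settings for a CH4 inversion submitted through slurm might look like the fragment below. The run name is hypothetical and the S3 path is the placeholder from the description above. The "*" entry is quoted because an unquoted * begins a YAML alias.

```yaml
# Illustrative general settings; RunName is hypothetical.
RunName: "Test_Permian_1week"
Species: "CH4"
SchedulerType: "slurm"
SafeMode: true
S3Upload: true
S3UploadPath: "s3://imi-output-dir/example-output/"
S3UploadFiles: ["*"]   # quoted so YAML does not read * as an alias
UseGCHP: false         # false selects GEOS-Chem Classic
```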

Period of interest

StartDate

Beginning of the inversion period in YYYYMMDD format (this date is included in the inversion, 0-24h UTC).

EndDate

End of the inversion period in YYYYMMDD format (this date is excluded from the inversion, 0-24h UTC).

SpinupMonths

Number of months for the spinup simulation.
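Because StartDate is included and EndDate is excluded, an inversion covering 1 to 7 May 2018 sets EndDate to the following day. A minimal sketch with illustrative dates, written as unquoted YYYYMMDD values:

```yaml
# Inverts 1-7 May 2018; 20180508 itself is excluded.
StartDate: 20180501
EndDate: 20180508
SpinupMonths: 1
```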

Hierarchical species settings

The following keys are now section-scoped and must be defined under CH4: or CO2: (not at top level):

  • SatelliteProduct

  • UseWaterObs

  • OptimizeOH

  • OptimizeSoil

  • PriorErrorOH

  • AdditionalDiagnostics

SatelliteProduct

Product string under CH4: or CO2:. For CH4, common values are "BlendedTROPOMI", "TROPOMI", or "Other". For CO2, "OCO2" is supported.

UseWaterObs

Boolean under CH4: or CO2: for whether to use observations over water (true) or not (false). Warning: if true, users should inspect the data for potential artifacts.

OptimizeOH

Boolean under CH4: or CO2: to optimize OH during the inversion. Must also include PerturbValueOH and PriorErrorOH. Default value is false.

OptimizeSoil

Boolean under CH4: to optimize soil absorption during the inversion. Default value is true.

PriorErrorOH

Vector under CH4: or CO2: of errors in the OH estimates (relative fraction). Default is [0.1] (10%) error.

AdditionalDiagnostics

Optional list under CH4: or CO2: to enable extra diagnostics (e.g., ["ObsPack"], ["TCCON"], ["PLANEFLIGHT"]).

The IMI reads the section that matches the Species setting. For example:

```yaml
Species: "CH4"
CH4:
  SatelliteProduct: "BlendedTROPOMI"
  UseWaterObs: false
  OptimizeOH: false
  OptimizeSoil: true
  PriorErrorOH: [0.1]
  AdditionalDiagnostics: ["ObsPack"]
CO2:
  SatelliteProduct: "OCO2"
  UseWaterObs: false
  AdditionalDiagnostics: ["ObsPack"]
```

Region of interest

isRegional

Boolean for using the GEOS-Chem regional simulation. This should be set to false for global inversions. Default value is true.

RegionID

Two-character region ID for using pre-cropped meteorology fields. Options are "AF" (Africa), "AS" (Asia), "EU" (Europe), "ME" (Middle East), "NA" (North America), "OC" (Oceania), "RU" (Russia), or "SA" (South America). To use global meteorology fields, set this option to "". See the GEOS-Chem horizontal grids documentation for details about the available regional domains.

LonMin

Minimum longitude edge of the region of interest (only used if CreateAutomaticRectilinearStateVectorFile is true, otherwise lat/lon bounds are determined from StateVectorFile).

LonMax

Maximum longitude edge of the region of interest (only used if CreateAutomaticRectilinearStateVectorFile is true, otherwise lat/lon bounds are determined from StateVectorFile).

LatMin

Minimum latitude edge of the region of interest (only used if CreateAutomaticRectilinearStateVectorFile is true, otherwise lat/lon bounds are determined from StateVectorFile).

LatMax

Maximum latitude edge of the region of interest (only used if CreateAutomaticRectilinearStateVectorFile is true, otherwise lat/lon bounds are determined from StateVectorFile).
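For a regional inversion with an automatically generated rectilinear state vector, the region settings might look like the fragment below. The bounds are hypothetical values chosen only for illustration; they lie inside the North America ("NA") domain.

```yaml
isRegional: true
RegionID: "NA"                                   # pre-cropped North America meteorology
CreateAutomaticRectilinearStateVectorFile: true  # required for the bounds below to be used
LonMin: -105
LonMax: -103
LatMin: 31
LatMax: 33
```

For a global inversion, isRegional would instead be false and RegionID set to "".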

Meteorology

Met

Meteorology to use for the inversion. Options are "GEOSFP" or "MERRA2". Default value is GEOSFP.

Resolution

Res

Horizontal grid resolution for the inversion. Options are "0.125x0.15625" (GEOS-FP only), "0.25x0.3125" (GEOS-FP only), "0.5x0.625", "2.0x2.5", or "4.0x5.0". Default value is 0.25x0.3125.
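Met and Res must be compatible: the two finest resolutions are only available with GEOS-FP. For example, a MERRA-2 inversion could use this illustrative pairing:

```yaml
Met: "MERRA2"
Res: "0.5x0.625"   # 0.25x0.3125 and 0.125x0.15625 are GEOS-FP only
```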

Grid settings for GCHP

CS_RES

Cubed-sphere (CS) horizontal grid resolution for GCHP simulations. This is an integer representing the number of grid cells per cubed-sphere face side. Common options are 24, 30, 48, 90, 180, 360, or 720. See GCHP horizontal grids for more details.

STRETCH_GRID

Boolean to use the GCHP stretched grid option.

STRETCH_FACTOR

Parameter controlling the degree of stretching on the target face of the cubed-sphere grid. Minimum STRETCH_FACTOR value is 1.0001.

TARGET_LAT

Latitude defining the center point for the target face of the GCHP grid.

TARGET_LON

Longitude defining the center point for the target face of the GCHP grid. Must be in the range of [-180, 180].
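A sketch of a stretched-grid GCHP configuration, assuming a target face centered on an arbitrary point in west Texas. CS_RES, STRETCH_FACTOR, and the target coordinates are illustrative values, not recommendations.

```yaml
UseGCHP: true
CS_RES: 90            # grid cells per cubed-sphere face side
STRETCH_GRID: true
STRETCH_FACTOR: 2.0   # must be at least 1.0001
TARGET_LAT: 32.0
TARGET_LON: -104.0    # must lie in [-180, 180]
```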

Kalman filter options

KalmanMode

Boolean for running the IMI using a Kalman filter for continuous updates (true) or using a single inversion (false). See more details about Kalman Mode in the Kalman filter documentation. Default value is false.

UpdateFreqDays

Number of days in each Kalman filter update cycle (e.g. 7 days).

NudgeFactor

Fraction of original prior emissions to use in the prior for each Kalman filter update (e.g. 0.1). See Kalman mode documentation for more details.

MakePeriodsCSV

Option to automatically create periods.csv based on the constant number of days in UpdateFreqDays. Default is true. If false, a custom periods.csv will be used instead.

CustomPeriodsCSV

Path to custom periods.csv with user-defined start and end dates for each Kalman filter update period.

FirstPeriod

Optional variable to specify which Kalman period to start on, if restarting an inversion. Default is 1.
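For example, a Kalman filter run with weekly updates and automatically generated periods might use the settings below. The values follow the examples given in the descriptions above and are illustrative only; CustomPeriodsCSV is left as an empty placeholder because it is not used when MakePeriodsCSV is true.

```yaml
KalmanMode: true
UpdateFreqDays: 7
NudgeFactor: 0.1
MakePeriodsCSV: true     # set to false to use CustomPeriodsCSV instead
CustomPeriodsCSV: ""     # placeholder for a user-defined periods.csv path
FirstPeriod: 1           # raise this to restart from a later period
```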

State vector

CreateAutomaticRectilinearStateVectorFile

Boolean for whether the IMI should automatically create a rectilinear state vector for the inversion. If false, a custom or pre-generated state vector netCDF file must be specified with StateVectorFile.

nBufferClusters

Number of buffer elements (clusters of GEOS-Chem grid cells lying outside the region of interest) to add to the state vector of emissions being optimized in the inversion. Default value is 8.

BufferDeg

Width of the buffer elements, in degrees; will not be used if CreateAutomaticRectilinearStateVectorFile is false. Default is 5 (~500 km).

EmisThreshold

GEOS-Chem grid cells with emissions above this threshold will be included in the state vector. Default value is 1.e-12.

OptimizeBCs

Boolean to optimize boundary conditions during the inversion. Must also include PerturbValueBCs and PriorErrorBCs. Default value is true.
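A minimal state vector block using the documented defaults, assuming the IMI builds the rectilinear state vector itself. Because OptimizeBCs is true, PerturbValueBCs and PriorErrorBCs are included with their documented defaults.

```yaml
CreateAutomaticRectilinearStateVectorFile: true
nBufferClusters: 8
BufferDeg: 5           # ~500 km buffer
EmisThreshold: 1.e-12
OptimizeBCs: true
PerturbValueBCs: 10.0  # ppb
PriorErrorBCs: [10]    # ppb
```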

Point source datasets

PointSourceDatasets

Optional list of public datasets to use for visualization of point sources to be included in state vector clustering. Current options are ["SRON"], ["CarbonMapper"], and ["IMEO"].

Clustering Options

For more information on using the clustering options take a look at the clustering options page.

ReducedDimensionStateVector

Boolean for whether to reduce the dimension of the state vector from the native-resolution version by clustering elements. If false, the native state vector is used with no dimension reduction.

DynamicKFClustering

Boolean for whether to update the state vector clustering with each Kalman filter update. Note: KalmanMode must be set to true.

ClusteringMethod

Clustering method to use for state vector reduction (e.g. "kmeans" or "mini-batch-kmeans").

ClusteringThreshold

Optional value for aggregate DOFS that a cluster must have before being added to the grid. Making this value higher will smooth out the clustering. Default value is Estimated_DOFS / NumberOfElements.

NumberOfElements

Number of elements in the reduced dimension state vector. This is only used if ReducedDimensionStateVector is true.

ForcedNativeResolutionElements

YAML list of coordinates, each given as [lat, lon], that you would like to force as native-resolution state vector elements. This is useful for ensuring hotspot locations are at the highest available resolution.

EmissionRateFilter

Emissions rate filter in kg/hour. Grid cells with mean emissions less than this value are not included. Specifying a value of 0 means all plumes will be included.

PlumeCountFilter

Grid cells with plume count less than this value are not included. Specifying a value of 0 means no filtering will be applied and only EmissionRateFilter will be used.

GroupByCountry

Boolean for whether to use grid cell’s country as k-means clustering feature. Set to true to avoid clusters that cross country boundaries.
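The fragment below sketches a reduced-dimension state vector with one forced native-resolution element. The element count, coordinates, and filter thresholds are hypothetical values chosen for illustration; ClusteringThreshold is omitted so that its default applies.

```yaml
ReducedDimensionStateVector: true
DynamicKFClustering: false      # true requires KalmanMode: true
ClusteringMethod: "kmeans"
NumberOfElements: 45
ForcedNativeResolutionElements:
  - [31.5, -104.0]              # [lat, lon]
EmissionRateFilter: 2500        # kg/hour
PlumeCountFilter: 50
GroupByCountry: false
```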

Custom/pre-generated state vector

These settings are only used if CreateAutomaticRectilinearStateVectorFile is false. Use them to create a custom state vector file from a shapefile in conjunction with the statevector_from_shapefile.ipynb Jupyter notebook located at:

$ /home/ubuntu/integrated_methane_inversion/src/notebooks/statevector_from_shapefile.ipynb

StateVectorFile

Path to the custom or pre-generated state vector netCDF file. The file will be saved here if it is generated from a shapefile.

ShapeFile

Path to a shapefile for use in creating a custom state vector file. This file is also used in determining bounds for inclusion of emission plumes.

Note: To set up a remote Jupyter notebook, see the "Visualize results with Python" section of the quick start guide.
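When supplying your own state vector, the automatic option is switched off and the paths are given explicitly. Both paths below are hypothetical placeholders.

```yaml
CreateAutomaticRectilinearStateVectorFile: false
StateVectorFile: "/path/to/StateVector.nc"   # written here if generated from the shapefile
ShapeFile: "/path/to/region.shp"             # hypothetical shapefile
```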

Inversion

LognormalErrors

Boolean for whether to use a lognormal error distribution when calculating emissions in the domain of interest. Note: normal errors are used for buffer elements and boundary condition optimization.

PriorError

Vector of errors in the prior estimates (1-sigma; relative). Default is [0.5] (50%) error.

PriorErrorBCs

Vector of errors in the prior boundary condition estimates (ppb). Default is [10] ppb error.

PriorErrorBufferElements

Vector of errors in the prior estimates for buffer elements (1-sigma; relative). Default is [0.5] (50%) error. Note: only used if LognormalErrors is true.

ObsError

Vector of observational errors (1-sigma; absolute; ppb). Default value is [15] ppb error.

Gamma

Vector of regularization parameters; typically between 0 and 1. Default value is [1.0].

PrecomputedJacobian

Boolean for whether the Jacobian matrix has already been computed (true) or not (false). Default value is false.

OffDiagonalPriorCov

Boolean for whether to build and use a prior error covariance matrix with off-diagonal terms during the inversion. Default value is false.

LengthScalePriorCov

Spatial length scale in km used when building the off-diagonal prior covariance matrix. Only used if OffDiagonalPriorCov is true. Default value is 25.

ReferenceRunDir

Path to the reference run directory containing previously generated Jacobian. Only used if PrecomputedJacobian is true.
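A sketch of the inversion block using the defaults documented above, with single-element lists for the vector-valued settings. ReferenceRunDir is an empty placeholder because PrecomputedJacobian is false.

```yaml
LognormalErrors: false
PriorError: [0.5]               # 50% relative 1-sigma
PriorErrorBufferElements: [0.5] # only used when LognormalErrors is true
ObsError: [15]                  # ppb
Gamma: [1.0]
PrecomputedJacobian: false
OffDiagonalPriorCov: false
LengthScalePriorCov: 25         # km; only used when OffDiagonalPriorCov is true
ReferenceRunDir: ""             # only used when PrecomputedJacobian is true
```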

Setup modules

These settings turn on/off (true / false) different steps for setting up the IMI.

RunSetup

Boolean to run the setup script (setup_imi.sh), including selected setup modules below.

SetupTemplateRundir

Boolean to create a GEOS-Chem run directory and modify it with settings from config.yml.

SetupSpinupRun

Boolean to set up a run directory for the spinup simulation by copying the template run directory and modifying the start/end dates, restart file, and diagnostics.

SetupJacobianRuns

Boolean to set up run directories for N+1 simulations (one reference simulation, plus N sensitivity simulations for the N state vector elements) by copying the template run directory and modifying the start/end dates, restart file, and diagnostics. Output from these simulations will be used to construct the Jacobian.

SetupInversion

Boolean to set up the inversion directory containing scripts needed to perform the inverse analysis; inversion results will be saved here.

SetupPosteriorRun

Boolean to set up the run directory for the posterior simulation by copying the template run directory and modifying the start/end dates, restart file, and diagnostics.
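For a first run from scratch, one reasonable choice is to enable every setup module, as in this illustrative fragment:

```yaml
RunSetup: true
SetupTemplateRundir: true
SetupSpinupRun: true
SetupJacobianRuns: true
SetupInversion: true
SetupPosteriorRun: true
```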

Run modules

These settings turn on/off (true / false) different steps for running the inversion.

DoHemcoPriorEmis

Boolean to run a HEMCO standalone simulation to generate the prior emissions.

DoSpinup

Boolean to run a spin-up simulation to generate a new restart file for initializing species concentrations in the Jacobian simulations.

DoJacobian

Boolean to run the reference and sensitivity forward model simulations.

ReDoJacobian

Boolean to re-run only the sensitivity simulations that have not yet completed successfully (true). This is useful for resuming an interrupted inversion. If false, all sensitivity simulations are re-run.

DoInversion

Boolean to run the inverse analysis code.

DoPosterior

Boolean to run the posterior simulation and execute the visualization notebook summarizing the IMI results. These results are also saved in inversion/output/.
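Similarly, an end-to-end run enables every run module. In this illustrative fragment ReDoJacobian is false, so all sensitivity simulations are run.

```yaml
DoHemcoPriorEmis: true
DoSpinup: true
DoJacobian: true
ReDoJacobian: false   # true re-runs only incomplete sensitivity simulations
DoInversion: true
DoPosterior: true
```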

IMI preview

DoPreview

Boolean to run the IMI preview (true) or not (false).

DOFSThreshold

Threshold for estimated DOFS below which the IMI should automatically exit with a warning after performing the preview. The default value of 0 disables this check.
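For example, to run the preview and stop automatically when the estimated DOFS are low, DOFSThreshold can be raised above 0. The threshold of 2 below is an arbitrary illustrative value.

```yaml
DoPreview: true
DOFSThreshold: 2   # exit after the preview if estimated DOFS fall below 2
```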

Job Resource Allocation

These settings allocate resources (CPUs and memory) to the different simulations needed to run the inversion. Note: some Python scripts are also deployed using slurm and default to the RequestedCPUs and RequestedMemory settings. If the inversion step requires more resources than the rest of the IMI workflow, the optional InversionCPUs and InversionMemory variables can be convenient.

RequestedCPUs

Number of cores to allocate to slurm jobs.

RequestedMemory

Amount of memory to allocate to each simulation job (e.g. “10gb”).

InversionCPUs

Optional. Number of cores to allocate to the inversion job if different from RequestedCPUs.

InversionMemory

Optional. Amount of memory to allocate to the inversion sbatch job (e.g. “32gb”) if different from RequestedMemory.

RequestedTime

Maximum wall time to allocate to each sbatch job (e.g. “0-6:00”).

SchedulerPartition

Name of the partition(s) you would like all slurm jobs to run on (e.g. “debug”). Partition names will vary depending on the cluster used.

MaxSimultaneousRuns

The maximum number of Jacobian simulations to run simultaneously. The default is -1 (no limit), which submits all Jacobian simulations at once. If the value is greater than zero, the sbatch array statement will be modified to include the “%” separator, limiting the number of simultaneously running tasks from the job array to the specified value.

NumJacobianTracers

The number of tracers to use for each Jacobian simulation. A value of 1 will create and submit a Jacobian run for each state vector element. Specifying a value greater than 1 will combine state vector elements into fewer runs. The default value is 5 tracers per simulation.
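An illustrative resource block for a slurm cluster follows. Every value is a placeholder to adapt to your cluster. With MaxSimultaneousRuns set to 50, the Jacobian job array gets a “%50” suffix, so slurm runs at most 50 of its tasks at a time.

```yaml
RequestedCPUs: 8
RequestedMemory: "10gb"
InversionCPUs: 32          # optional
InversionMemory: "32gb"    # optional
RequestedTime: "0-6:00"    # days-hours:minutes
SchedulerPartition: "debug"
MaxSimultaneousRuns: 50    # -1 submits all Jacobian runs at once
NumJacobianTracers: 5
```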

Advanced settings: Observing System Simulation Experiment (OSSE)

These settings are intended for advanced users who wish to run an OSSE. This effectively runs the inversion using simulated pseudo-observations with a known prior emissions field. The IMI will generate synthetic observations by randomly perturbing the prior emissions and adding noise to the generated observations based on user specification.

EnableOSSE

Boolean to enable running the IMI with pseudo-observations. Default value is false.

DoOSSE

Boolean to run the simulation from which the pseudo-observations will be generated. This should be run after the spinup simulation. Default value is false.

EmisPerturbationOSSE

Amount of random perturbation to apply to the prior emissions to generate synthetic observations. Perturbations are drawn from a Gaussian distribution, or from a lognormal distribution if LognormalErrors is true. Default value is 0.5 (50%).

ObsErrorOSSE

Amount of random Gaussian error to apply to the observations sampled from the OSSE simulation. Default value is 15 ppb.

CreateAutomaticScaleFactorFileOSSE

Boolean to create a scale factor file for the OSSE simulation. This file will be used to define the “true emissions” scaling from the prior emissions. Default value is true.

ScaleFactorFileOSSE

Path to the scale factor file for the OSSE simulation. This file will be used to define the “true emissions” scaling from the prior emissions. Only used if CreateAutomaticScaleFactorFileOSSE is false.
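A minimal OSSE sketch using the documented default perturbation and noise levels. The scale factor file is generated automatically here, so ScaleFactorFileOSSE is left as an empty placeholder.

```yaml
EnableOSSE: true
DoOSSE: true                              # run after the spinup simulation
EmisPerturbationOSSE: 0.5                 # 50%
ObsErrorOSSE: 15                          # ppb
CreateAutomaticScaleFactorFileOSSE: true
ScaleFactorFileOSSE: ""                   # only used when the option above is false
```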

Advanced settings: GEOS-Chem options

These settings are intended for advanced users who wish to modify additional GEOS-Chem options.

PerturbValue

Target perturbation amount on the emissions in each sensitivity simulation. Default value is 1, corresponding to a 1e-8 kg/m2/s perturbation.

PerturbValueOH

Value to perturb OH by if using OptimizeOH. Default value is 1.1.

PerturbValueBCs

Number of ppb by which to perturb the boundary conditions at each domain edge (North, South, East, West) if using OptimizeBCs. Default value is 10.0 ppb.

HourlySpecies

Boolean to save out hourly diagnostics from GEOS-Chem. This output is used in satellite operators via post-processing. Default value is true.

PLANEFLIGHT

Boolean to save out the planeflight diagnostic in GEOS-Chem. This output may be used to compare GEOS-Chem against planeflight data. The path to those data must be specified in geoschem_config.yml. See the planeflight diagnostic documentation for details. Default value is false.

DoObsPack

Boolean to save out the ObsPack diagnostic in GEOS-Chem. This output may be used to compare GEOS-Chem against NOAA ObsPack data. The path to those data must be specified in geoschem_config.yml. See the ObsPack diagnostic documentation for details. Default value is false. A sample Python notebook for plotting GEOS-Chem against ObsPack can be found at src/notebooks/NOAA_ObsPack_MBL_compare.ipynb.

GOSAT

Boolean to turn on the GOSAT observation operator in GEOS-Chem. This will save out text files comparing GEOS-Chem to observations, but has to be manually incorporated into the IMI. Default value is false.

TCCON

Boolean to turn on the TCCON observation operator in GEOS-Chem. This will save out text files comparing GEOS-Chem to observations, but has to be manually incorporated into the IMI. Default value is false.

AIRS

Boolean to turn on the AIRS observation operator in GEOS-Chem. This will save out text files comparing GEOS-Chem to observations, but has to be manually incorporated into the IMI. Default value is false.

UseBCsForRestart

Boolean for using global boundary condition files for initial conditions.
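The fragment below uses the documented defaults but switches on ObsPack output, as one might do to compare GEOS-Chem against NOAA ObsPack data. UseBCsForRestart has no documented default and is shown as false for illustration only. The ObsPack data path itself is still set in geoschem_config.yml.

```yaml
PerturbValue: 1.0       # corresponds to 1e-8 kg/m2/s
PerturbValueOH: 1.1     # used only with OptimizeOH
HourlySpecies: true
PLANEFLIGHT: false
DoObsPack: true
GOSAT: false
TCCON: false
AIRS: false
UseBCsForRestart: false
```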

Advanced settings: Local cluster

These settings are intended for advanced users who wish to run the IMI on a local cluster.

OutputPath

Path for IMI runs and output.

DataPath

Path to GEOS-Chem input data.

DataPathObs

Path to satellite input data.

GEOSChemEnv

Path to the file that activates the GEOS-Chem environment (with Fortran compiler, netCDF libraries, etc.).

PythonEnv

Path to file that activates the Python environment.

RestartDownload

Boolean for downloading an initial restart file from AWS S3. Default value is true.

RestartFilePrefix

Path to initial GEOS-Chem restart file plus file prefix (e.g. GEOSChem.BoundaryConditions. or GEOSChem.Restart.). The date string and file extension (YYYYMMDD_0000z.nc4) will be appended. This file will be used to initialize the spinup simulation.

BCpath

Path to GEOS-Chem boundary condition files (for regional simulations).

BCversion

Version of TROPOMI smoothed boundary conditions to use (e.g. v2025-06). Note: this will be appended onto BCpath as a subdirectory.

HemcoPriorEmisDryRun

Boolean to download missing GEOS-Chem data for the HEMCO prior emissions run. Default value is true.

SpinupDryRun

Boolean to download missing GEOS-Chem data for the spinup simulation. Default value is true.

ProductionDryRun

Boolean to download missing GEOS-Chem data for the production (i.e. Jacobian) simulations. Default value is true.

PosteriorDryRun

Boolean to download missing GEOS-Chem data for the posterior simulation. Default value is true.

BCDryRun

Boolean to download missing GEOS-Chem data for the preview run. Default value is true.
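A sketch of the local-cluster block in which all paths are hypothetical placeholders. RestartDownload is set to false on the assumption that the restart file already exists locally. With these values, boundary conditions would be read from /path/to/boundary_conditions/v2025-06, since BCversion is appended to BCpath as a subdirectory.

```yaml
OutputPath: "/path/to/imi_output"
DataPath: "/path/to/ExtData"                 # GEOS-Chem input data
DataPathObs: "/path/to/satellite_data"
GEOSChemEnv: "/path/to/geoschem.env"         # hypothetical environment file
PythonEnv: "/path/to/python.env"             # hypothetical environment file
RestartDownload: false
RestartFilePrefix: "/path/to/restarts/GEOSChem.BoundaryConditions."  # YYYYMMDD_0000z.nc4 is appended
BCpath: "/path/to/boundary_conditions"
BCversion: "v2025-06"
HemcoPriorEmisDryRun: true
SpinupDryRun: true
ProductionDryRun: true
PosteriorDryRun: true
BCDryRun: true
```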