Configuration file
==================
This page documents settings in the IMI configuration file (``config.yml``).

.. important::
   The ``config.yml`` file included with the IMI is set up for running the IMI on AWS. If you want to run the IMI elsewhere, you will need to create your own environment and configuration files. See :doc:`Running the IMI on a local cluster <../advanced/local-cluster>` for more information.

General
~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``RunName``
     - Name for this inversion; will be used for directory names and prefixes.
   * - ``Species``
     - String defining the species to use for the inversion. The currently supported option is ``"CH4"``. Development for ``"CO2"`` is in progress and will be available in an upcoming version.
   * - ``SchedulerType``
     - String defining the type of scheduler used to run the IMI. Currently supported options are ``"slurm"``, ``"PBS"``, or ``"tmux"``. Select ``"tmux"`` to run the IMI with ``./run_imi.sh`` instead of PBS or slurm.
   * - ``SafeMode``
     - Boolean for running in safe mode to prevent overwriting existing files.
   * - ``S3Upload``
     - Boolean for uploading the output directory to S3. If ``true``, the ``S3UploadPath`` and ``S3UploadFiles`` settings must be set.
   * - ``S3UploadPath``
     - S3 path to upload files to (e.g. ``s3://imi-output-dir/example-output/``). Only used if ``S3Upload`` is ``true``.
   * - ``S3UploadFiles``
     - Files to upload from the IMI output directory (e.g. ``[*]`` will upload everything). Only used if ``S3Upload`` is ``true``.
   * - ``UseGCHP``
     - Boolean for using GEOS-Chem High Performance (GCHP) for the forward model. Default is ``false``, in which case GEOS-Chem Classic will be used.

Period of interest
~~~~~~~~~~~~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``StartDate``
     - Beginning of the inversion period in ``YYYYMMDD`` format (this date is included in the inversion, 0-24h UTC).
   * - ``EndDate``
     - End of the inversion period in ``YYYYMMDD`` format (this date is excluded from the inversion, 0-24h UTC).
   * - ``SpinupMonths``
     - Number of months for the spinup simulation.

Hierarchical species settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following keys are now **section-scoped** and must be defined under ``CH4:`` or ``CO2:`` (not at top level):

- ``SatelliteProduct``
- ``UseWaterObs``
- ``OptimizeOH``
- ``OptimizeSoil``
- ``PriorErrorOH``
- ``AdditionalDiagnostics``

.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``SatelliteProduct``
     - Product string under ``CH4:`` or ``CO2:``. For CH4, common values are ``"BlendedTROPOMI"``, ``"TROPOMI"``, or ``"Other"``. For CO2, ``"OCO2"`` is supported.
   * - ``UseWaterObs``
     - Boolean under ``CH4:`` or ``CO2:`` for whether to use observations over water (``true``) or not (``false``). Warning: if ``true``, inspect the data for potential artifacts.
   * - ``OptimizeOH``
     - Boolean under ``CH4:`` or ``CO2:`` to optimize OH during the inversion. Must also include ``PerturbValueOH`` and ``PriorErrorOH``. Default value is ``false``.
   * - ``OptimizeSoil``
     - Boolean under ``CH4:`` to optimize soil absorption during the inversion. Default value is ``true``.
   * - ``PriorErrorOH``
     - Vector under ``CH4:`` or ``CO2:`` of errors in the OH estimates (relative). Default is ``[0.1]`` (10%) error.
   * - ``AdditionalDiagnostics``
     - Optional list under ``CH4:`` or ``CO2:`` to enable extra diagnostics (e.g. ``["ObsPack"]``, ``["TCCON"]``, ``["PLANEFLIGHT"]``).

The active section is selected according to the ``Species`` setting. For example:

.. code-block:: yaml

   Species: "CH4"

   CH4:
     SatelliteProduct: "BlendedTROPOMI"
     UseWaterObs: false
     OptimizeOH: false
     OptimizeSoil: true
     PriorErrorOH: [0.1]
     AdditionalDiagnostics: ["ObsPack"]

   CO2:
     SatelliteProduct: "OCO2"
     UseWaterObs: false
     AdditionalDiagnostics: ["ObsPack"]

Region of interest
~~~~~~~~~~~~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``isRegional``
     - Boolean for using the GEOS-Chem regional simulation. This should be set to ``false`` for global inversions. Default value is ``true``.
   * - ``RegionID``
     - Two-character region ID for using pre-cropped meteorology fields. Options are ``"AF"`` (Africa), ``"AS"`` (Asia), ``"EU"`` (Europe), ``"ME"`` (Middle East), ``"NA"`` (North America), ``"OC"`` (Oceania), ``"RU"`` (Russia), or ``"SA"`` (South America). To use global meteorology fields, set this option to ``""``. See the GEOS-Chem horizontal grids documentation for details about the available regional domains.
   * - ``LonMin``
     - Minimum longitude edge of the region of interest (only used if ``CreateAutomaticRectilinearStateVectorFile`` is ``true``, otherwise lat/lon bounds are determined from ``StateVectorFile``).
   * - ``LonMax``
     - Maximum longitude edge of the region of interest (only used if ``CreateAutomaticRectilinearStateVectorFile`` is ``true``, otherwise lat/lon bounds are determined from ``StateVectorFile``).
   * - ``LatMin``
     - Minimum latitude edge of the region of interest (only used if ``CreateAutomaticRectilinearStateVectorFile`` is ``true``, otherwise lat/lon bounds are determined from ``StateVectorFile``).
   * - ``LatMax``
     - Maximum latitude edge of the region of interest (only used if ``CreateAutomaticRectilinearStateVectorFile`` is ``true``, otherwise lat/lon bounds are determined from ``StateVectorFile``).

Meteorology
~~~~~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``Met``
     - Meteorology to use for the inversion. Options are ``"GEOSFP"`` or ``"MERRA2"``. Default value is ``"GEOSFP"``.

Resolution
~~~~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``Res``
     - Horizontal grid resolution for the inversion. Options are ``"0.125x0.15625"`` (GEOS-FP only), ``"0.25x0.3125"`` (GEOS-FP only), ``"0.5x0.625"``, ``"2.0x2.5"``, or ``"4.0x5.0"``. Default value is ``"0.25x0.3125"``.

Grid settings for GCHP
~~~~~~~~~~~~~~~~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``CS_RES``
     - Cubed-sphere (CS) horizontal grid resolution for GCHP simulations. This is an integer representing the number of grid cells per cubed-sphere face side. Common options are ``24``, ``30``, ``48``, ``90``, ``180``, ``360``, or ``720``. See the GCHP horizontal grids documentation for more details.
   * - ``STRETCH_GRID``
     - Boolean to use the GCHP stretched grid option.
   * - ``STRETCH_FACTOR``
     - Parameter controlling the degree of stretching on the target face of the cubed-sphere grid. The minimum ``STRETCH_FACTOR`` value is 1.0001.
   * - ``TARGET_LAT``
     - Latitude defining the center point for the target face of the GCHP grid.
   * - ``TARGET_LON``
     - Longitude defining the center point for the target face of the GCHP grid. Must be in the range [-180, 180].

Kalman filter options
~~~~~~~~~~~~~~~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``KalmanMode``
     - Boolean for running the IMI using a Kalman filter for continuous updates (``true``) or using a single inversion (``false``). See more details about Kalman mode in the `Kalman filter documentation <../advanced/kalman-filter-mode.html>`_. Default value is ``false``.
   * - ``UpdateFreqDays``
     - Number of days in each Kalman filter update cycle (e.g. ``7`` days).
   * - ``NudgeFactor``
     - Fraction of original prior emissions to use in the prior for each Kalman filter update (e.g. ``0.1``). See the Kalman mode documentation for more details.
   * - ``MakePeriodsCSV``
     - Option to automatically create ``periods.csv`` based on the constant number of days in ``UpdateFreqDays``. Default is ``true``. If ``false``, a custom ``periods.csv`` will be used instead.
   * - ``CustomPeriodsCSV``
     - Path to a custom ``periods.csv`` with user-defined start and end dates for each Kalman filter update period.
   * - ``FirstPeriod``
     - Optional variable to specify which Kalman period to start on, if restarting an inversion. Default is ``1``.

State vector
~~~~~~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``CreateAutomaticRectilinearStateVectorFile``
     - Boolean for whether the IMI should automatically create a rectilinear state vector for the inversion. If ``false``, a custom/pre-generated state vector netCDF file must be provided under ``StateVectorFile``.
   * - ``nBufferClusters``
     - Number of buffer elements (clusters of GEOS-Chem grid cells lying outside the region of interest) to add to the state vector of emissions being optimized in the inversion. Default value is ``8``.
   * - ``BufferDeg``
     - Width of the buffer elements, in degrees; will not be used if ``CreateAutomaticRectilinearStateVectorFile`` is ``false``. Default is ``5`` (~500 km).
   * - ``EmisThreshold``
     - GEOS-Chem grid cells with emissions above this threshold will be included in the state vector. Default value is ``1.e-12``.
   * - ``OptimizeBCs``
     - Boolean to optimize boundary conditions during the inversion. Must also include ``PerturbValueBCs`` and ``PriorErrorBCs``. Default value is ``true``.

Point source datasets
~~~~~~~~~~~~~~~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``PointSourceDatasets``
     - Optional list of public datasets to use for visualization of point sources to be included in state vector clustering. Current options are ``["SRON"]``, ``["CarbonMapper"]``, and ``["IMEO"]``.

Clustering Options
~~~~~~~~~~~~~~~~~~
For more information on using the clustering options, take a look at the `clustering options page <../advanced/using-clustering-options.html>`__.

.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``ReducedDimensionStateVector``
     - Boolean for whether to reduce the dimension of the state vector from the native resolution version by clustering elements. If ``false``, the native state vector is used with no dimension reduction.
   * - ``DynamicKFClustering``
     - Boolean for whether to update the state vector clustering with each Kalman filter update. Note: ``KalmanMode`` must be set to ``true``.
   * - ``ClusteringMethod``
     - Clustering method to use for state vector reduction (e.g. ``"kmeans"`` or ``"mini-batch-kmeans"``).
   * - ``ClusteringThreshold``
     - Optional value for the aggregate DOFS that a cluster must have before being added to the grid. Making this value higher will smooth out the clustering. Default value is ``Estimated_DOFS / NumberOfElements``.
   * - ``NumberOfElements``
     - Number of elements in the reduced dimension state vector. This is only used if ``ReducedDimensionStateVector`` is ``true``.
   * - ``ForcedNativeResolutionElements``
     - YAML list of coordinates ``[lat, lon]`` that you would like to force as native resolution state vector elements. This is useful for ensuring hotspot locations are at the highest available resolution.
   * - ``EmissionRateFilter``
     - Emission rate filter in kg/hour. Grid cells with mean emissions less than this value are not included. Specifying a value of 0 means all plumes will be included.
   * - ``PlumeCountFilter``
     - Grid cells with a plume count less than this value are not included. Specifying a value of 0 means no filtering will be applied and only ``EmissionRateFilter`` will be used.
   * - ``GroupByCountry``
     - Boolean for whether to use each grid cell's country as a k-means clustering feature. Set to ``true`` to avoid clusters that cross country boundaries.

Custom/pre-generated state vector
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These settings are only used if ``CreateAutomaticRectilinearStateVectorFile`` is ``false``. Use them to :doc:`create a custom state vector file <../advanced/custom-state-vector>` from a shapefile in conjunction with the ``statevector_from_shapefile.ipynb`` Jupyter notebook located at::

  $ /home/ubuntu/integrated_methane_inversion/src/notebooks/statevector_from_shapefile.ipynb

.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``StateVectorFile``
     - Path to the custom or pre-generated state vector netCDF file. The file will be saved here if generating it from a shapefile.
   * - ``ShapeFile``
     - Path to a shapefile for use in `creating a custom state vector file <../advanced/custom-state-vector.html>`__. This file is also used in determining bounds for inclusion of emission plumes.

Note: To set up a remote Jupyter notebook, check out the quick start guide `visualize results with python <../getting-started/quick-start.html#visualize-results-with-python>`__ section.

Inversion
~~~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``LognormalErrors``
     - Boolean for whether to use a lognormal error distribution for calculating emissions in the domain of interest. Note: normal errors are used for buffer elements and boundary condition optimization.
   * - ``PriorError``
     - Vector of errors in the prior estimates (1-sigma; relative). Default is ``[0.5]`` (50%) error.
   * - ``PriorErrorBCs``
     - Vector of errors in the prior boundary condition estimates (in ppb). Default is ``[10]`` ppb error.
   * - ``PriorErrorBufferElements``
     - Vector of errors in the prior estimates for buffer elements (1-sigma; relative). Default is ``[0.5]`` (50%) error. Note: only used if ``LognormalErrors`` is ``true``.
   * - ``ObsError``
     - Vector of observational errors (1-sigma; absolute; ppb). Default value is ``[15]`` ppb error.
   * - ``Gamma``
     - Vector of regularization parameters; typically between 0 and 1. Default value is ``[1.0]``.
   * - ``PrecomputedJacobian``
     - Boolean for whether the Jacobian matrix has already been computed (``true``) or not (``false``). Default value is ``false``.
   * - ``OffDiagonalPriorCov``
     - Boolean for whether to build and use a prior error covariance matrix with off-diagonal terms during the inversion. Default value is ``false``.
   * - ``LengthScalePriorCov``
     - Spatial length scale in km used when building the off-diagonal prior covariance matrix. Only used if ``OffDiagonalPriorCov`` is ``true``. Default value is ``25``.
   * - ``ReferenceRunDir``
     - Path to the reference run directory containing a previously generated Jacobian. Only used if ``PrecomputedJacobian`` is ``true``.

Setup modules
~~~~~~~~~~~~~
These settings turn on/off (``true`` / ``false``) different steps for setting up the IMI.

.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``RunSetup``
     - Boolean to run the setup script (``setup_imi.sh``), including the selected setup modules below.
   * - ``SetupTemplateRundir``
     - Boolean to create a GEOS-Chem run directory and modify it with settings from ``config.yml``.
   * - ``SetupSpinupRun``
     - Boolean to set up a run directory for the spinup simulation by copying the template run directory and modifying the start/end dates, restart file, and diagnostics.
   * - ``SetupJacobianRuns``
     - Boolean to set up run directories for N+1 simulations (one reference simulation, plus N sensitivity simulations for the N state vector elements) by copying the template run directory and modifying the start/end dates, restart file, and diagnostics. Output from these simulations will be used to construct the Jacobian.
   * - ``SetupInversion``
     - Boolean to set up the inversion directory containing scripts needed to perform the inverse analysis; inversion results will be saved here.
   * - ``SetupPosteriorRun``
     - Boolean to set up the run directory for the posterior simulation by copying the template run directory and modifying the start/end dates, restart file, and diagnostics.

Run modules
~~~~~~~~~~~
These settings turn on/off (``true`` / ``false``) different steps for running the inversion.

.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``DoHemcoPriorEmis``
     - Boolean to run a HEMCO standalone simulation to generate the prior emissions.
   * - ``DoSpinup``
     - Boolean to run a spinup simulation to generate a new restart file for initializing species concentrations in the Jacobian simulations.
   * - ``DoJacobian``
     - Boolean to run the reference and sensitivity forward model simulations.
   * - ``ReDoJacobian``
     - Boolean to only re-run sensitivity simulations that have not yet completed successfully. This is useful for resuming an interrupted inversion. Setting this to ``false`` will re-run all sensitivity simulations.
   * - ``DoInversion``
     - Boolean to run the inverse analysis code.
   * - ``DoPosterior``
     - Boolean to run the posterior simulation and execute the visualization notebook summarizing the IMI results. These results are also saved in ``inversion/output/``.

IMI preview
~~~~~~~~~~~
.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``DoPreview``
     - Boolean to run the IMI preview (``true``) or not (``false``).
   * - ``DOFSThreshold``
     - Threshold for estimated DOFS below which the IMI should automatically exit with a warning after performing the preview. The default value of ``0`` prevents the exit.

Job Resource Allocation
~~~~~~~~~~~~~~~~~~~~~~~~~
These settings are used to allocate resources (CPUs and memory) to the different simulations needed to run the inversion. Note: some Python scripts are also deployed using slurm and default to using the ``RequestedCPUs`` and ``RequestedMemory`` settings. If the inversion step requires more resources than the rest of the IMI workflow, using the optional ``InversionCPUs`` and ``InversionMemory`` variables can be convenient.

.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``RequestedCPUs``
     - Number of cores to allocate to slurm jobs.
   * - ``RequestedMemory``
     - Amount of memory to allocate to each in-series simulation (e.g. "10gb").
   * - ``InversionCPUs``
     - Optional variable. Number of cores to allocate to the inversion job if different from ``RequestedCPUs``.
   * - ``InversionMemory``
     - Optional variable. Amount of memory to allocate to the inversion sbatch job (e.g. "32gb") if different from ``RequestedMemory``.
   * - ``RequestedTime``
     - Maximum amount of time to allocate to each sbatch job (e.g. "0-6:00").
   * - ``SchedulerPartition``
     - Name of the partition(s) you would like all slurm jobs to run on (e.g. "debug"). Partition names will vary depending on the cluster used.
   * - ``MaxSimultaneousRuns``
     - The maximum number of Jacobian simulations to run simultaneously. The default is ``-1`` (no limit), which will submit all Jacobian simulations at once. If the value is greater than zero, the sbatch array statement will be modified to include the "%" separator and will limit the number of simultaneously running tasks from the job array to the specified value.
   * - ``NumJacobianTracers``
     - The number of tracers to use for each Jacobian simulation. A value of 1 will create and submit a Jacobian run for each state vector element. Specifying a value greater than 1 will combine state vector elements into fewer runs. The default value is 5 tracers per simulation.

Advanced settings: Observing System Simulation Experiment (OSSE)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These settings are intended for advanced users who wish to run an OSSE. This effectively runs the inversion using simulated pseudo-observations with a known prior emissions field. The IMI will generate synthetic observations by randomly perturbing the prior emissions and adding noise to the generated observations based on user specification.

.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``EnableOSSE``
     - Boolean to enable running the IMI with pseudo-observations. Default value is ``false``.
   * - ``DoOSSE``
     - Boolean to run the simulation from which pseudo-observations will be generated. This should be run after the spinup simulation. Default value is ``false``.
   * - ``EmisPerturbationOSSE``
     - Amount of random perturbation to apply to the prior emissions to generate synthetic observations. Perturbations are drawn from a Gaussian distribution unless ``LognormalErrors`` is set to ``true``, in which case a lognormal distribution is used. Default value is ``0.5`` (50%).
   * - ``ObsErrorOSSE``
     - Amount of random Gaussian error to apply to the observations sampled from the OSSE simulation. Default value is ``15`` ppb.
   * - ``CreateAutomaticScaleFactorFileOSSE``
     - Boolean to create a scale factor file for the OSSE simulation. This file will be used to define the "true emissions" scaling from the prior emissions. Default value is ``true``.
   * - ``ScaleFactorFileOSSE``
     - Path to the scale factor file for the OSSE simulation. This file will be used to define the "true emissions" scaling from the prior emissions. Only used if ``CreateAutomaticScaleFactorFileOSSE`` is ``false``.

Advanced settings: GEOS-Chem options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These settings are intended for advanced users who wish to modify additional GEOS-Chem options.

.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``PerturbValue``
     - Target perturbation amount on the emissions in each sensitivity simulation. Default value is ``1``, corresponding to a 1e-8 kg/m2/s perturbation.
   * - ``PerturbValueOH``
     - Value to perturb OH by if using ``OptimizeOH``. Default value is ``1.1``.
   * - ``PerturbValueBCs``
     - Perturbation, in ppb, applied at the domain edges (North, South, East, West) if using ``OptimizeBCs``. Default value is ``10.0`` ppb.
   * - ``HourlySpecies``
     - Boolean to save out hourly diagnostics from GEOS-Chem. This output is used in satellite operators via post-processing. Default value is ``true``.
   * - ``PLANEFLIGHT``
     - Boolean to save out the planeflight diagnostic in GEOS-Chem. This output may be used to compare GEOS-Chem against planeflight data. The path to those data must be specified in ``geoschem_config.yml``. See the planeflight diagnostic documentation for details. Default value is ``false``.
   * - ``DoObsPack``
     - Boolean to save out the ObsPack diagnostic in GEOS-Chem. This output may be used to compare GEOS-Chem against NOAA ObsPack data. The path to those data must be specified in ``geoschem_config.yml``. See the ObsPack diagnostic documentation for details. Default value is ``false``. A sample Python notebook for plotting GEOS-Chem against ObsPack can be found at ``src/notebooks/NOAA_ObsPack_MBL_compare.ipynb``.
   * - ``GOSAT``
     - Boolean to turn on the GOSAT observation operator in GEOS-Chem. This will save out text files comparing GEOS-Chem to observations, but has to be manually incorporated into the IMI. Default value is ``false``.
   * - ``TCCON``
     - Boolean to turn on the TCCON observation operator in GEOS-Chem. This will save out text files comparing GEOS-Chem to observations, but has to be manually incorporated into the IMI. Default value is ``false``.
   * - ``AIRS``
     - Boolean to turn on the AIRS observation operator in GEOS-Chem. This will save out text files comparing GEOS-Chem to observations, but has to be manually incorporated into the IMI. Default value is ``false``.
   * - ``UseBCsForRestart``
     - Boolean for using global boundary condition files for initial conditions.

Advanced settings: Local cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These settings are intended for advanced users who wish to :doc:`run the IMI on a local cluster <../advanced/local-cluster>`.

.. list-table::
   :widths: 30, 70
   :class: tight-table

   * - ``OutputPath``
     - Path for IMI runs and output.
   * - ``DataPath``
     - Path to GEOS-Chem input data.
   * - ``DataPathObs``
     - Path to satellite input data.
   * - ``GEOSChemEnv``
     - Path to the file that activates the GEOS-Chem environment (with Fortran compiler, netCDF libraries, etc.).
   * - ``PythonEnv``
     - Path to the file that activates the Python environment.
   * - ``RestartDownload``
     - Boolean for downloading an initial restart file from AWS S3. Default value is ``true``.
   * - ``RestartFilePrefix``
     - Path to the initial GEOS-Chem restart file plus file prefix (e.g. ``GEOSChem.BoundaryConditions.`` or ``GEOSChem.Restart.``). The date string and file extension (``YYYYMMDD_0000z.nc4``) will be appended. This file will be used to initialize the spinup simulation.
   * - ``BCpath``
     - Path to GEOS-Chem boundary condition files (for regional simulations).
   * - ``BCversion``
     - Version of TROPOMI smoothed boundary conditions to use (e.g. ``v2025-06``). Note: this will be appended onto ``BCpath`` as a subdirectory.
   * - ``HemcoPriorEmisDryRun``
     - Boolean to download missing GEOS-Chem data for the HEMCO prior emissions run. Default value is ``true``.
   * - ``SpinupDryRun``
     - Boolean to download missing GEOS-Chem data for the spinup simulation. Default value is ``true``.
   * - ``ProductionDryRun``
     - Boolean to download missing GEOS-Chem data for the production (i.e. Jacobian) simulations. Default value is ``true``.
   * - ``PosteriorDryRun``
     - Boolean to download missing GEOS-Chem data for the posterior simulation. Default value is ``true``.
   * - ``BCDryRun``
     - Boolean to download missing GEOS-Chem data for the preview run. Default value is ``true``.
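
As a rough illustration of how the local cluster settings might look together in ``config.yml``, the fragment below is a minimal sketch rather than a recommended configuration. Every path and file name in it is a hypothetical placeholder and must be replaced with locations on your own system. Only the key names, the ``v2025-06`` example version, the ``GEOSChem.Restart.`` prefix, and the ``true`` defaults are taken from the tables on this page.

.. code-block:: yaml

   # Hypothetical directory layout; none of these paths are IMI defaults.
   OutputPath: "/path/to/imi_output"
   DataPath: "/path/to/ExtData"
   DataPathObs: "/path/to/satellite_data"

   # Hypothetical environment files that activate GEOS-Chem and Python.
   GEOSChemEnv: "/path/to/envs/geoschem.env"
   PythonEnv: "/path/to/envs/python.env"

   # Restart file: the date string and extension (YYYYMMDD_0000z.nc4)
   # are appended to the prefix.
   RestartDownload: true
   RestartFilePrefix: "/path/to/restarts/GEOSChem.Restart."

   # Boundary conditions: BCversion is appended to BCpath as a subdirectory.
   BCpath: "/path/to/boundary_conditions"
   BCversion: "v2025-06"

   # Download missing GEOS-Chem input data for each stage (documented defaults).
   HemcoPriorEmisDryRun: true
   SpinupDryRun: true
   ProductionDryRun: true
   PosteriorDryRun: true
   BCDryRun: true

With this layout, the boundary condition files would be expected under ``/path/to/boundary_conditions/v2025-06``, following the note on ``BCversion`` above.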