Integrated Methane Inversion (IMI)
Important
Contributions (e.g., suggestions, edits, revisions) would be greatly appreciated. See editing this guide and our contributing guidelines. If you find something hard to understand, let us know!
The Integrated Methane Inversion (IMI) workflow is a cloud-computing tool for quantifying methane emissions by inversion of satellite observations from the TROPOspheric Monitoring Instrument (TROPOMI). It uses GEOS-Chem as the forward model for the inversion and infers methane emissions at 25 × 25 km² resolution.
This site provides instructions for using the IMI, including launching an AWS compute instance, configuring and running an inversion, and analyzing the results with a ready-made Jupyter notebook.
Some instructions are specific to the Amazon Web Services (AWS) cloud, but the IMI can also be run on a local compute cluster, either by manually building the environment or by using a Docker container.
Quick start guide
1. Register with us
We encourage new users to email the IMI team at integrated-methane-inversion@g.harvard.edu with a description of your project and the organization you are affiliated with. Knowing our user base helps us to prioritize new features and updates to the IMI. Additionally, registered users can contact us for support and will receive notifications of any critical bugfixes or new releases/features added to the IMI.
Template introduction email:
Hello IMI Team!
My name is <insert name here> and I am affiliated with <insert-organization>.
We work on <research-interests> and are interested in using the IMI to <insert-application>.
Here is the link to our research page: <insert-link>. Please send us updates on any future
releases or critical bugfixes to the IMI.
2. Create an Amazon Web Services (AWS) account
If you do not already have an AWS account, you’ll need to sign up for one. Go to http://aws.amazon.com and click on “Create an AWS Account” in the upper-right corner:

You’ll need to enter some basic personal information and a credit card number.
Running the IMI is relatively inexpensive (usually on the order of USD $10-$100). The cost depends on the length of the inversion period, the size of the inversion domain, how long you retain your compute instance after completing the inversion, and how you store the final results.
For more information on costs, see Tips for Minimizing AWS costs.
Note
Students can check out subsidized educational credits at https://aws.amazon.com/education/awseducate/.
3. Add S3 user permissions
Default input data for the IMI are stored in the Amazon Simple Storage Service (S3). These include TROPOMI methane data, default prior emission estimates, GEOS-Chem meteorological data, and boundary condition data.
The IMI will automatically fetch the data needed for your inversion, but to enable this data transfer you’ll need to add S3 user permissions to your AWS account.
The easiest way to do this is to grant S3 access to an IAM role. Attaching the IAM role to a compute instance on the AWS Elastic Compute Cloud (EC2; Amazon’s basic computing service) will give the EC2 instance full access to S3.
Instructions to create an IAM role with full S3 access are provided in the GEOS-Chem Cloud Documentation. For more information on IAM roles, check out the AWS Documentation.
4. Launch an instance with the IMI
Once you’ve set up S3 permissions on your AWS account, log in to the AWS console and go to the AWS Marketplace IMI listing (listed for free). This image contains the latest version of the IMI, including all required software dependencies, on an Amazon Machine Image (AMI). An AMI fully specifies the software side of your virtual system, including the operating system, software libraries, and default data files.
On the listing page click “Continue to Subscribe”.

On the following page click “Continue to Configuration”.

Select the desired region and IMI version and click “Continue to Launch”. Choosing a region closer to your physical location will improve your network connectivity, but may result in increased costs compared to using the region where GEOS-Chem data are hosted (us-east-1, N. Virginia).

On the launch screen select “Launch through EC2” and then click “Launch”.

Now it’s time to specify the hardware for running your system. Hardware choices differ primarily in CPU and RAM counts.
You can select from a large number of instance types in the “Instance Type” section. The IMI will run more quickly with a higher number of CPUs.
Choose the c5.9xlarge instance type, which includes 36 CPU cores and 72 GB of RAM. Depending on your use case, you may choose a different instance type with more or fewer cores and more or less memory.

Note
New AWS users may encounter a limit on the number of CPUs they can allocate. To request a limit increase, follow the steps outlined in the AWS docs on how to calculate a vCPU limit increase.
In the next section, create a new ssh key pair or select an existing one. The key pair plays the same role as the password you enter to ssh to your local server.
Click “Create new key pair”. In the dialog box give your key pair a name (e.g., imi_testing) and click “Create key pair”.
In the future, you can simply select your existing key pair from the dropdown menu.

The “Network Settings” section can be left as the defaults. Proceed to “Configure Storage” and select the size of your storage volume.

Note
Your storage needs will depend on the length of the inversion period, the size of the inversion domain, and the inversion resolution. 100 GB is generally sufficient for a 1-week inversion (such as for the Permian Basin), and 5 TB will likely be enough for a 1-year inversion.
Storage costs typically amount to USD $100 per month per TB of provisioned space. See our advice on selecting storage volume size to help minimize storage fees. And when your inversion is complete, consider copying output data to S3 and terminating your EC2 instance to avoid continued storage fees.
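As a rough illustration of the storage figure quoted above, the following Python sketch estimates the storage bill for a provisioned volume. Only the rate of USD $100 per month per TB comes from the text. The function name, the 30-day month, and the 1 TB = 1000 GB conversion are assumptions made for illustration. Actual AWS pricing depends on region and volume type.

```python
def estimate_storage_cost_usd(provisioned_gb, days_retained, usd_per_tb_month=100.0):
    """Rough storage cost for a provisioned volume.

    Assumptions (not from AWS pricing tables): a 30-day month,
    1 TB = 1000 GB, and the ~USD 100 per TB-month rate quoted in this guide.
    """
    if provisioned_gb < 0 or days_retained < 0:
        raise ValueError("provisioned_gb and days_retained must be non-negative")
    terabytes = provisioned_gb / 1000.0
    months = days_retained / 30.0
    return terabytes * months * usd_per_tb_month


if __name__ == "__main__":
    # A 100 GB volume kept for one week versus a 5 TB volume kept for a month.
    print(f"100 GB for 7 days: ${estimate_storage_cost_usd(100, 7):.2f}")
    print(f"5 TB for 30 days:  ${estimate_storage_cost_usd(5000, 30):.2f}")
```

Because the fee scales with provisioned (not used) space, the estimate takes the volume size rather than the amount of data written.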
Expand the “Advanced Details” section and select the IAM role you created in step 3 under “IAM Instance Profile”. This ensures that your EC2 instance has access to S3 (for downloading TROPOMI data and GEOS-Chem input data). All other config settings in “Advanced Details” can be left as the defaults.

Then, after reviewing the summary, click on the “Launch Instance” button. Once launched, you can monitor the instance in the EC2-Instance console as shown below. Within one minute of initialization, “Instance State” should show “running” (refresh the page if the status remains “pending”):

You now have your own system running on the cloud! Note that you will be charged continuously while the instance is running, so if you need to pause your work, make sure to follow the tutorial step on shutting down the instance to avoid unnecessary compute charges.
5. Log in to your instance
Select your instance and click on the “Connect” button (shown in the figure above) near the blue “Launch Instance” button to show this instruction page:

On Mac or Linux, use the ssh -i ... command under “Example” to connect to the server in the terminal. Some minor changes are needed:
cd to the directory where your key pair is stored. People often put the key in ~/.ssh/ but any directory will do.
Use chmod 400 your-key-name.pem to change the key pair’s permissions (also mentioned in the above figure; this only needs to be done once).
Change the user name in the command from root to ubuntu so that the full command looks like:
$ ssh -i "your-key-name.pem" ubuntu@ec2-##-###-##-##.compute-1.amazonaws.com
On Windows, you can install Git-BASH to emulate a Linux terminal. Simply accept all default options during installation, as the goal here is just to use Bash, not Git. Alternatively, you can use MobaXterm, PuTTY, Windows Subsystem for Linux (WSL), or PowerShell with OpenSSH. The Git-BASH solution should be the most painless, but these other options can work as well. Note: there is a bug on older versions of WSL that can prevent the chmod command from functioning.
Once you’ve followed the above instructions, you should see a “Welcome to Ubuntu” message indicating you’ve logged into your new EC2 instance.
6. Configure the IMI
Navigate to the IMI setup directory:
$ cd ~/integrated_methane_inversion
Open the config.yml file with vim (vi) or emacs:
$ emacs config.yml
This configuration file contains many settings that you can modify to suit your needs. See the IMI configuration file page for information on the different settings/options. Also see the common configurations page.
7. Run the IMI
After editing the configuration file, you can run the IMI by executing the following command:
$ sbatch run_imi.sh
The sbatch command runs the IMI and writes to the imi_output.log output file. You can track its progress by using:
$ tail --follow imi_output.log
The IMI can take minutes to days to complete, depending on the configuration and EC2 instance type. You can safely disconnect from your instance during this time, but the instance must remain active in the AWS console.
Alternatively, you can run the IMI with tmux to obtain a small to moderate speed-up.
Note
We strongly recommend using the IMI preview feature before running an inversion.
8. Visualize results with Python
When your inversion is complete, you can use the visualization notebook provided with the IMI to quickly inspect the results.
First navigate to the inversion directory:
$ cd /home/ubuntu/imi_output_dir/{YourRunName}/inversion
You can use the ls command to view the contents of the directory, which will include several scripts, data directories, and netCDF output files, along with visualization_notebook.ipynb. For more information on the contents, see Contents of the inversion directory.
To set up and connect to a Jupyter notebook server on AWS, follow these short instructions.
Once connected to the server, open visualization_notebook.ipynb and run its contents to display key inversion results including the state vector, prior and posterior emissions, TROPOMI data for the region/period of interest, averaging kernel sensitivities, and more.
9. Shut down the instance
When you are ready to end your session, right-click on the instance in the AWS EC2 console to get this menu:

There are two options for ending the session: “Stop” (temporary shutdown) or “Terminate” (permanent deletion):
“Stop” will make the system inactive. You won’t be charged for CPU time, but you will be charged a disk storage fee for the number of GB provisioned on your EC2 instance. You can restart the instance at any time and all files will be preserved. When an instance is stopped, you can also change its hardware type (right click on the instance -> “Instance Settings” -> “Change Instance Type”).
“Terminate” will completely delete the instance so you will incur no further charges. Unless you save the contents of your instance as an AMI or transfer the data to another storage service (like S3), you will lose all your data and software.
10. Store data on S3
S3 is our preferred cloud storage platform due to cost and ease of access.
You can use the aws s3 cp command to copy your output files to an S3 bucket for long-term storage:
$ aws s3 cp </path/to/output/files> s3://<bucket-name> --recursive
For more information on using S3, check out our tips for exporting data to S3.
IMI configuration file
This page documents settings in the IMI configuration file (config.yml).
General
|
Name for this inversion; will be used for directory names and prefixes. |
|
Boolean for running the IMI on AWS ( |
|
Boolean for running the IMI as a batch job with |
|
Boolean for running in safe mode to prevent overwriting existing files. |
|
Boolean for uploading output directory to S3. If |
|
S3 path to upload files to (eg. |
|
Files to upload from the IMI Output directory (eg. |
|
Files to upload from the IMI Output directory (eg. |
Period of interest
|
Beginning of the inversion period in |
|
End of the inversion period in |
|
Number of months for the spinup simulation. |
TROPOMI data type
|
Boolean for if the Blended TROPOMI+GOSAT data should be used ( |
Region of interest
|
Minimum longitude edge of the region of interest (only used if |
|
Maximum longitude edge of the region of interest (only used if |
|
Minimum latitude edge of the region of interest (only used if |
|
Maximum latitude edge of the region of interest (only used if |
|
Boolean for using the GEOS-Chem nested grid simulation. Must be
|
|
Nesting domain for the inversion. Select |
State vector
|
Boolean for whether the IMI should automatically create a rectilinear state vector for the inversion. If |
|
Number of buffer elements (clusters of GEOS-Chem grid cells lying outside the region of interest) to add to the state vector of emissions being optimized in the inversion. Default value is |
|
Width of the buffer elements, in degrees; will not be used if |
|
Land-cover fraction below which to exclude GEOS-Chem grid cells from the state vector when creating the state vector file. Default value is |
|
Offshore GEOS-Chem grid cells with oil/gas emissions above this threshold will be included in the state vector. Default value is |
|
Boolean to optimize boundary conditions during the inversion. Must also include |
Clustering Options
For more information on using the clustering options take a look at the clustering options page.
|
Boolean for whether to reduce the dimension of the statevector from the native resolution version by clustering elements. If |
|
Boolean for whether to update the statevector clustering with each Kalman Filter update. Note: |
|
Clustering method to use for state vector reduction. (eg. “kmeans” or “mini-batch-kmeans”) |
|
Number of elements in the reduced dimension state vector. This is only used if |
|
YAML list of coordinates ([lat, lon]) that you would like to force as native-resolution state vector elements. This is useful for ensuring hotspot locations are at the highest available resolution. |
Custom/pre-generated state vector
These settings are only used if CreateAutomaticRectilinearStateVectorFile is false. Use them to create a custom state vector file from a shapefile in conjunction with the statevector_from_shapefile.ipynb Jupyter notebook located at:
$ /home/ubuntu/integrated_methane_inversion/src/notebooks/statevector_from_shapefile.ipynb
|
Path to the custom or pre-generated state vector netcdf file. File will be saved here if generating it from a shapefile. |
|
Path to the shapefile. |
Note: To set up a remote Jupyter notebook, check out the “Visualize results with Python” section of the quick start guide.
Inversion
|
Error in the prior estimates (1-sigma; relative). Default is |
|
Error in the prior estimates (using ppb). Default is |
|
Observational error (1-sigma; absolute; ppb). Default value is |
|
Regularization parameter; typically between 0 and 1. Default value is |
|
Boolean for whether the Jacobian matrix has already been computed ( |
|
Path to IMI run directory with previously run jacobian simulations |
Grid
|
Resolution for inversion. Options are |
|
Meteorology to use for the inversion. Options are |
Setup modules
These settings turn on/off (true / false) different steps for setting up the IMI.
|
Boolean to create a GEOS-Chem run directory and modify it with settings from |
|
Boolean to set up a run directory for the spinup-simulation by copying the template run directory and modifying the start/end dates, restart file, and diagnostics. |
|
Boolean to set up run directories for N+1 simulations (one reference simulation, plus N sensitivity simulations for the N state vector elements) by copying the template run directory and modifying the start/end dates, restart file, and diagnostics. Output from these simulations will be used to construct the Jacobian. |
|
Boolean to set up the inversion directory containing scripts needed to perform the inverse analysis; inversion results will be saved here. |
|
Boolean to set up the run directory for the posterior simulation by copying the template run directory and modifying the start/end dates, restart file, and diagnostics. |
Run modules
These settings turn on/off (true / false) different steps for running the inversion.
|
Boolean to run the setup script ( |
|
Boolean to run the spin-up simulation. |
|
Boolean to run the reference and sensitivity simulations. |
|
Boolean to run the inverse analysis code. |
|
Boolean to run the posterior simulation. |
SLURM Resource Allocation
These settings are used to allocate resources (CPUs and Memory) to the different simulations needed to run the inversion.
Note: some Python scripts are also deployed using SLURM and default to using the SimulationCPUs and SimulationMemory settings.
|
Max amount of time to allocate to each sbatch job (eg. “0-6:00”) |
|
Number of cores to allocate to each in series simulation. |
|
Amount of memory to allocate to each in series simulation (in MB). |
|
Number of cores to allocate to each jacobian simulation (run in parallel). |
|
Amount of memory to allocate to each jacobian simulation (in MB). |
|
Name of the partition(s) you would like all slurm jobs to run on (eg. “debug,huce_intel,seas_compute,etc”). |
IMI preview
|
Boolean to run the IMI preview ( |
|
Threshold for estimated DOFS below which the IMI should automatically exit with a warning after performing the preview.
Default value |
Advanced settings: GEOS-Chem options
These settings are intended for advanced users who wish to modify additional GEOS-Chem options.
|
Value to perturb emissions by in each sensitivity simulation. Default value is |
|
Number of ppb to perturb emissions by for domain edges (North, South, East, West) if using OptimizeBCs. Default value is |
|
Boolean to apply emissions scale factors derived from a previous inversion. This file should be provided as a netCDF file and specified in HEMCO_Config.rc. Default value is |
|
Boolean to apply OH scale factors derived from a previous inversion. This file should be provided as a netCDF file and specified in HEMCO_Config.rc. Default value is |
|
Boolean to save out hourly diagnostics from GEOS-Chem. This output is used in satellite operators via post-processing. Default value is |
|
Boolean to save out the planeflight diagnostic in GEOS-Chem. This output may be used to compare GEOS-Chem against planeflight data. The path to those data must be specified in input.geos. See the planeflight diagnostic documentation for details. Default value is |
|
Boolean to turn on the GOSAT observation operator in GEOS-Chem. This will save out text files comparing GEOS-Chem to observations, but has to be manually incorporated into the IMI. Default value is |
|
Boolean to turn on the TCCON observation operator in GEOS-Chem. This will save out text files comparing GEOS-Chem to observations, but has to be manually incorporated into the IMI. Default value is |
|
Boolean to turn on the AIRS observation operator in GEOS-Chem. This will save out text files comparing GEOS-Chem to observations, but has to be manually incorporated into the IMI. Default value is |
Advanced settings: Local cluster
These settings are intended for advanced users who wish to run the IMI on a local cluster.
|
Path for IMI runs and output. |
|
Path to GEOS-Chem input data. |
|
Path to TROPOMI input data. |
|
Path to file containing Conda environment settings. |
|
Name of conda environment. |
|
Boolean for downloading an initial restart file from AWS S3. Default value is |
|
Path to initial GEOS-Chem restart file plus file prefix (e.g. |
|
Path to initial GEOS-Chem restart file plus file prefix (e.g. |
|
Path to GEOS-Chem boundary condition files (for nested grid simulations). |
|
Version of TROPOMI smoothed boundary conditions to use (e.g. |
|
Boolean to download missing GEOS-Chem data for the preview run. Default value is |
|
Boolean to download missing GEOS-Chem data for the spinup simulation. Default value is |
|
Boolean to download missing GEOS-Chem data for the production (i.e. Jacobian) simulations. Default value is |
|
Boolean to download missing GEOS-Chem data for the posterior simulation. Default value is |
|
Boolean to download missing GEOS-Chem data for the preview run. Default value is |
|
Boolean to download missing GEOS-Chem boundary condition files. Default value is |
Note for *DryRun options: If you are running on AWS, you will be charged if your EC2 instance is not in the us-east-1 region. If running on a local cluster, you must have the AWS CLI enabled, or you can modify the ./download_data.py commands in setup_imi.sh to use washu instead of aws. See the GEOS-Chem documentation for more details.
IMI preview
The IMI preview feature allows users to estimate the quality and information content of a proposed inversion without actually performing the inversion.
Under the default configuration, the IMI performs only the preview and then stops. This is to prevent accidental initiation of low-quality but potentially expensive inversions. For more details about the preview (default) configuration, see the Common configurations page.
To run the preview after selecting a region and time period of interest in the configuration file (and modifying any other configurable settings), simply run the IMI with the DoPreview configuration option set to true.
The IMI preview provides the following information for users to assess their proposed inversion (as it is described in the configuration file):
Map of mean TROPOMI observations for the region and period of interest
Map of prior emission estimates to be used in the inversion
Map of observation density for the region and period of interest
Map of mean SWIR albedo for the region and period of interest
Total number of observations available in the region of interest during the inversion period
Rough estimate of degrees of freedom for signal (DOFS) for the inversion
Rough estimate of USD financial cost of the inversion
This information is generated as a .txt file and a collection of .png files in the preview directory, which is located at:
/home/ubuntu/imi_output_dir/{YourRunName}/preview_run/
The .txt file can be viewed directly in the terminal. To view the .png files, first download them from EC2 to your local computer using:
$ scp -i /local/path/to/my-key-pair.pem ubuntu@my-instance-public-dns-name:/path/to/my-file.png /local/path/to/my-file.png
For more information on this command, see the AWS Documentation.
Tips for interpreting the IMI preview results:
Inspect the maps of XCH4 observations and observation density to evaluate TROPOMI coverage for the region and period of interest.
Compare the maps of XCH4 observations and prior emission estimates to evaluate spatial correspondence between the two datasets.
Compare the maps of XCH4 observations and SWIR albedo to confirm that there are no obvious albedo-related artifacts in the methane retrieval field, which would be diagnosed by high spatial correlation between XCH4 and albedo.
DOFS > 1 is a bare minimum to achieve any solid information about emissions.
DOFS < 2 is marginal for most applications.
If there is an obvious mismatch between the XCH4 observations and the prior emissions (for example due to severe bias in the prior inventory) OR if the expected DOFS is low, consider: (a) using an improved prior inventory, (b) increasing the inversion period to incorporate more observations, and/or (c) increasing the prior error estimate.
If there is indication of albedo-related artifacts in the XCH4 field, consider removing the affected observations. This can be done by modifying the TROPOMI data filters via the filter_tropomi() function in /home/ubuntu/integrated_methane_inversion/src/inversion_scripts/utils.py.
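The DOFS guidance in the tips above can be sketched as a small helper. This is only an illustration of the two thresholds stated in the text (DOFS > 1 as a bare minimum, DOFS < 2 as marginal). The function name and the returned labels are hypothetical and are not part of the IMI.

```python
def interpret_dofs(dofs):
    """Classify a preview DOFS estimate using the thresholds quoted in this guide.

    The labels are illustrative assumptions, not IMI output:
      dofs <= 1      -> below the bare minimum for any solid information
      1 < dofs < 2   -> marginal for most applications
      dofs >= 2      -> above the marginal range
    """
    if dofs < 0:
        raise ValueError("DOFS cannot be negative")
    if dofs <= 1.0:
        return "insufficient"
    if dofs < 2.0:
        return "marginal"
    return "above marginal"


if __name__ == "__main__":
    for value in (0.4, 1.5, 6.2):
        print(value, interpret_dofs(value))
```

A result in the first two categories suggests revisiting the options listed above (improved prior inventory, longer inversion period, or larger prior error) before launching the full inversion.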
Tips for minimizing AWS costs
Switching instance types
The IMI can be very efficient when run on an EC2 instance with significant compute power. But if you only wish to run the IMI preview or analyze output data, then much of this compute power will be wasted.
Thankfully, it is possible to switch the instance type of an existing instance if you expect to be doing less compute-heavy work. See the AWS Documentation on how to change your instance type for more information.
Spot instances
Spot instances take advantage of unused compute capacity on the AWS cloud, allowing users to launch instances at a 70-90% reduction in price compared to on-demand instances.
However, this reduced pricing comes with the understanding that AWS can take back this extra capacity at any time, so your instance may be interrupted (with a 2-minute warning), causing the IMI to crash. Interruptions are generally rare (~5% of instances get interrupted), and once interrupted your instance will be either Stopped or Terminated depending on the EC2 configuration.
We recommend using spot instances for inversions that take hours (not days) as it can greatly reduce your EC2 costs. For information on how to launch a spot instance see Create a Spot Instance Request. For more information on how to avoid and handle interruptions check out this post on Best Practices.
Selecting storage volume size
AWS charges continuous fees for the storage volume provisioned to an EC2 instance. These fees can become significant if you retain the volume for long periods of time (weeks/months).
It is best to provision only the amount of storage needed, and to delete your volume once finished with it to minimize costs.
You can always add storage space after launching an EC2 instance, but it is very difficult to retroactively reduce storage space; see the AWS Documentation for details.
Note
When unsure of the storage needs for an inversion, we recommend starting small. A good starting point is ~100 GB.
To determine your true storage needs, first ssh into the instance and run a 1-week inversion for your region of interest. When the 1-week inversion is complete, check how much storage has been used. From there, you can scale up the storage according to your actual period of interest. Consider that the AMI itself takes about 20 GB of storage.
For example, if after the 1-week inversion you find that 75/100 GB are occupied, then you should budget 75 - 20 = 55 GB per inversion week. If you want to perform a 1-year inversion, then increasing the storage to 3.5 TB will leave you with about 500 GB of additional space to work with once the inversion is complete.
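The scaling arithmetic in the example above can be written as a short Python sketch. The 20 GB AMI footprint and the 75 GB test result come from the text. The function name and the assumption that storage grows linearly with the number of inversion weeks are illustrative only.

```python
def estimate_storage_gb(used_gb_after_test, test_weeks, target_weeks, ami_gb=20.0):
    """Scale storage measured after a short test inversion to a longer period.

    Assumes (for illustration) that inversion output grows linearly with the
    number of weeks, on top of a fixed AMI footprint of about 20 GB.
    """
    if test_weeks <= 0 or target_weeks <= 0:
        raise ValueError("test_weeks and target_weeks must be positive")
    per_week_gb = (used_gb_after_test - ami_gb) / test_weeks
    if per_week_gb < 0:
        raise ValueError("used storage is smaller than the AMI footprint")
    return ami_gb + per_week_gb * target_weeks


if __name__ == "__main__":
    # 75 GB used after a 1-week test -> 55 GB per inversion week.
    needed = estimate_storage_gb(75, test_weeks=1, target_weeks=52)
    print(f"Estimated need for a 1-year inversion: {needed:.0f} GB")
```

For the numbers in the example this gives roughly 2.9 TB, so a 3.5 TB volume leaves several hundred GB of headroom.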
Exporting data to S3
Storing data in EBS volumes is more expensive than storing data in Amazon S3. Additionally, with S3 you are only charged for the amount of space you use, whereas EBS volumes charge you for the amount of space provisioned.
For these reasons, after running the IMI, we recommend pushing your output data to an S3 bucket for long term storage, rather than retaining the entire EBS volume. Resources for creating an S3 bucket and pushing data to it can be found here:
Running the IMI with tmux
The IMI can be run with tmux as an alternative to sbatch. Like sbatch, tmux allows you to run a program on your EC2 instance, disconnect, and then reconnect later to check progress.
Because of the way the IMI is parallelized, using tmux can grant a small to moderate speed-up.
Note
Before running the IMI with tmux, make sure the UseSlurm option in the configuration file is set to false.
Using tmux
tmux comes preinstalled on the AMI. To start tmux run the following:
$ tmux
This enters a tmux shell. From there you can run the inversion script:
$ ./run_imi.sh > imi_output.log
This will start the workflow. To keep it running in the background, press ctrl-b, then press d (without holding ctrl) to detach the tmux shell and get back to the original terminal.
At this point you can disconnect from ssh and the IMI will continue to run in the background.
To check back in on the IMI, ssh back onto the EC2 instance and run the following to attach to the active tmux session:
$ tmux attach-session -t 0
The IMI Kalman Filter mode
What is a Kalman Filter Inversion?
A Kalman filter is a mathematical algorithm, developed by Rudolf Kalman, that estimates the state of a system by combining measurements and predictions while considering uncertainties. It operates recursively, continuously updating its estimate of the system state based on new measurements.
Kalman filters can be applied in atmospheric inversions by dividing an inversion period into smaller time intervals, such as weekly chunks. An inversion is sequentially run for each interval, estimating the emissions for that specific period based on measurements and predictions. The resulting optimized emissions are then used as prior emissions for the next interval, allowing the prior emissions of each successive week to be informed by the previous weeks.

Why use Kalman Filter Mode?
This approach enables tracking of how emissions change over time and provides insights into their distribution throughout the inversion period. By using the Kalman filter mode in the inversion, users can calculate intermediate emissions at the desired update frequency, such as weekly, revealing the temporal evolution of emissions.
How to use the Kalman Filter mode
The IMI Kalman filter mode can be enabled simply by setting the KalmanMode config variable to true. This will enable the Kalman filter mode using the specified update frequency, nudge factor, and first period.
Example Kalman filter config variables:
## Kalman filter options
KalmanMode: true
UpdateFreqDays: 7
NudgeFactor: 0.1
FirstPeriod: 1
UpdateFreqDays
The update frequency (UpdateFreqDays) is the number of days in each chunked inversion time interval when running the Kalman filter. Selecting a shorter update frequency will result in more inversion chunks and a longer inversion run time. However, if the observation density within the chosen update interval is too sparse, the inversion will not constrain emissions effectively. Thus, the optimal update frequency will depend on the region of interest and the observation density. Typically, areas with dense TROPOMI coverage can be updated on a weekly basis.
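To make the chunking concrete, here is a minimal Python sketch of how an inversion period could be divided into intervals of UpdateFreqDays days. It is not the IMI's own code: the function name is hypothetical, and the choice to truncate a trailing partial interval at the end date is an assumption, since the text does not say how the IMI handles periods that do not divide evenly.

```python
from datetime import date, timedelta


def split_into_periods(start, end, update_freq_days):
    """Split [start, end) into consecutive intervals of update_freq_days days.

    Assumption: if the period does not divide evenly, the final interval is
    shortened to stop at `end`. The IMI may treat this case differently.
    """
    if update_freq_days <= 0:
        raise ValueError("update_freq_days must be positive")
    step = timedelta(days=update_freq_days)
    periods = []
    current = start
    while current < end:
        periods.append((current, min(current + step, end)))
        current += step
    return periods


if __name__ == "__main__":
    # Four weekly intervals covering 2019-01-01 through 2019-01-28.
    for number, (p_start, p_end) in enumerate(
        split_into_periods(date(2019, 1, 1), date(2019, 1, 29), 7), start=1
    ):
        print(number, p_start, p_end)
```

Halving the update frequency doubles the number of intervals, which is why shorter update frequencies lengthen the run time.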
The NudgeFactor
A true Kalman Filter would use the posterior emissions from the previous interval as the prior emissions for the next interval. However, in practice, a direct substitution of the posterior emissions as the prior in the subsequent interval can lead to some emission elements getting locked at very low values. Retaining some information from the prior emissions can help to avoid this issue (Varon et al., 2023). The Kalman filter mode in the IMI allows users to specify a nudge factor, which is the fraction of the original emissions inventory that is retained in the prior for the next iteration. The rest of the emissions (1 - NudgeFactor) come from the posterior emissions of the previous iteration.
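The nudging described above amounts to a weighted average, which the following Python sketch illustrates for a list of state vector elements. The function and argument names are hypothetical, and whether the IMI applies the blend to emissions or to scale factors internally is not specified here. The sketch simply follows the description in the text.

```python
def next_prior(original_prior, previous_posterior, nudge_factor):
    """Blend the original inventory with the previous posterior, element by element.

    prior_next = nudge_factor * original_prior
                 + (1 - nudge_factor) * previous_posterior

    Illustrative only: names and data layout are assumptions, not IMI code.
    """
    if not 0.0 <= nudge_factor <= 1.0:
        raise ValueError("nudge_factor must lie between 0 and 1")
    if len(original_prior) != len(previous_posterior):
        raise ValueError("inputs must have the same number of elements")
    return [
        nudge_factor * original + (1.0 - nudge_factor) * posterior
        for original, posterior in zip(original_prior, previous_posterior)
    ]


if __name__ == "__main__":
    # With NudgeFactor = 0.1, an element whose posterior collapsed to zero
    # still retains 10% of its original inventory value in the next prior.
    print(next_prior([10.0, 4.0], [0.0, 8.0], nudge_factor=0.1))
```

Setting the nudge factor to 0 recovers the pure Kalman filter behavior, while 1 discards the posterior and reuses the original inventory in every interval.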
FirstPeriod
The FirstPeriod config variable allows a user to select which chunked interval the Kalman Filter should start on. This is most useful when some periods have already completed successfully; for example, if 5 out of 8 inversion intervals succeeded, you can restart the Kalman Filter on the 6th period by setting FirstPeriod to 6. FirstPeriod is a convenience variable and is not required to run the Kalman Filter mode. It is set to 1 by default, which means the Kalman Filter will start on the first inversion time interval.
Running the Kalman Filter mode
The Kalman Filter mode can be run in the same way as the standard IMI inversion mode. Each step of the inversion can be toggled on or off using the config variable toggles (e.g., DoSetup, DoSpinup). However, in Kalman mode, DoJacobian, DoInversion, and DoPosterior must all be toggled on or off together because the Jacobian, inversion, and posterior steps depend on each other for each inversion interval. The IMI will print an error message if these variables are not toggled in tandem.
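The tandem requirement can be checked before submitting a run. The sketch below is a hypothetical pre-flight check, not the IMI's own validation code. It assumes the configuration has already been loaded into a Python dictionary keyed by the setting names used in this guide.

```python
def check_kalman_toggles(config):
    """Raise ValueError if the Kalman-mode toggles are not set in tandem.

    `config` is assumed to be a dict of settings (e.g. parsed from config.yml).
    This mirrors the rule described in the text; it is not IMI source code.
    """
    if not config.get("KalmanMode", False):
        return  # The rule only applies when Kalman filter mode is enabled.
    keys = ("DoJacobian", "DoInversion", "DoPosterior")
    values = {key: bool(config.get(key, False)) for key in keys}
    if len(set(values.values())) > 1:
        raise ValueError(
            "In Kalman mode, DoJacobian, DoInversion, and DoPosterior must "
            f"all be true or all be false; got {values}"
        )


if __name__ == "__main__":
    check_kalman_toggles(
        {"KalmanMode": True, "DoJacobian": True, "DoInversion": True, "DoPosterior": True}
    )
    print("toggles are consistent")
```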
Clustering in Kalman Filter mode
Clustering the state vector in Kalman Filter mode is the same as clustering in standard IMI mode, but with one optional, additional feature. By setting the config variable DynamicKFClustering to true, the state vector will be updated at each iteration of the Kalman Filter. This is recommended for areas with large seasonal differences in observation density to ensure that the clustering algorithm allocates high resolution state vector elements to areas with enough observations to constrain them. Generated state vectors at each iteration will be archived in the <imi-run-dir>/archive_sv directory.
For more information on clustering, see the Clustering options page.
Visualizing the results of the Kalman Filter
The results of each chunked inversion time interval can be visualized using the standard visualization
notebook located in <imi-run-dir>/kf_inversions/period<period_number>/visualization_notebook.ipynb
.
Additionally, we include another visualization notebook that can be used to visualize the results of
the time series of varying emissions for the entire inversion period. This notebook is located in
<imi-run-dir>/kf_inversions/kf_notebook.ipynb
.

Setting up Jupyter on EC2
The IMI relies on Jupyter notebooks to visualize the results of inversions. However, in order to view and run Jupyter notebooks you will need to set up a jupyter server on your EC2 instance and securely access it from your local browser. You can do this using a few different methods:
Using an SSL certificate (AWS Recommended)
Follow these short instructions to set up and connect to a jupyter notebook server on AWS using a self-signed SSL certificate. Note: If you are using Git-BASH, or similar software with ssh, you can follow the Configure a Linux or macOS Client section, which provides a simpler setup than the Windows instructions.
Using an automatically generated authentication token
If the above AWS recommended method is causing trouble you can also use the following method to create and connect to a jupyter server using an authentication token. The authentication token is a randomly generated hash code appended to the jupyter server url. The token, similar to a password, verifies you have permission to access the server.
To set up a jupyter notebook server on your ec2 instance, run the following command on your remote/EC2 terminal:
$ jupyter notebook --no-browser --port 8080
This will start a jupyter server on port 8080 and will print out a link with an authentication token, eg:
$ jupyter notebook --no-browser --port 8080
....
http://localhost:8080/?token=7a7ae708966c68e631bc76ba9eae7b1d287e4747cf7072e7
Then in a new local terminal (or GIT-Bash) window run the following command:
$ ssh -NL 8080:localhost:8080 -i /path/to/private_key ubuntu@my-instance-public-dns-name
This creates an ssh tunnel between your local computer and your EC2 instance over port 8080, which allows you to view your jupyter notebooks from your local browser. Go to the link printed by the jupyter notebook command on your EC2 instance (e.g. http://localhost:8080/?token=7a7ae708966c68e631bc76ba9eae7b1d287e4747cf7072e7).
Creating a custom state vector file
By default the IMI uses latitude/longitude bounds to automatically create a gridded state vector file for a rectilinear region of interest with surrounding buffer elements.
The state vector file is located at
/home/ubuntu/imi_output_dir/{YourRunName}/StateVector.nc
It contains the state variable labels for every grid cell in the inversion domain. For example, if the region of interest contains 200 emission elements and the IMI is configured to use 8 additional buffer elements, then the total number of state variables is 208 and the state vector file will assign a number between 1 and 208 to every grid cell in the inversion domain.
Instead of the default rectilinear region of interest, you may want to use an irregular region as was done for the Permian Basin by Varon et al. (2022). To do so you will need to generate the state vector file yourself.
The easiest way to do this is by using a shapefile for the region of interest in conjunction with the
statevector_from_shapefile.ipynb
jupyter notebook. The notebook is located at
/home/ubuntu/integrated_methane_inversion/src/notebooks/statevector_from_shapefile.ipynb
First upload a shapefile for the custom region of interest to your EC2 instance:
$ scp -i /local/path/to/my-key-pair.pem /local/path/to/my-shapefile ubuntu@my-instance-public-dns-name:/path/to/my-shapefile
Next, open the configuration file and insert the path to your shapefile in the custom state vector section. Also provide the latitude/longitude bounds for the desired inversion domain, which will include both the irregular region of interest and the additional coarse buffer elements.
Next, follow these short instructions to set up and connect to
a jupyter notebook server on AWS. Once connected to the server, open statevector_from_shapefile.ipynb
and run its contents to
generate a state vector file from your shapefile.
If no shapefile is available, you will need to construct the custom state vector file manually. You may want to start from an automatically generated rectilinear state vector file.
Using the IMI Clustering Options
Why use the clustering options?
The main computational cost of the IMI is running the perturbation simulations necessary to construct the Jacobian. This requires running a GEOS-Chem Jacobian simulation for each state vector element. The default state vector generated by the IMI has elements at native resolution, meaning each element corresponds to one GEOS-Chem grid cell (0.25° or 0.5° resolution). If your state vector has a large number of elements, this can make running the IMI infeasible, either because of prohibitively high AWS costs or because of compute time. Clustering reduces the number of state vector elements by aggregating elements together.
Using the IMI clustering config options
To enable the IMI clustering options in the imi config file set
ReducedDimensionStateVector: true
. This enables the clustering component of the IMI.
Once enabled the IMI uses your specified NumberOfElements
to aggregate native resolution state vector elements
within your domain of interest using the specified ClusteringMethod
. eg:
ReducedDimensionStateVector: true
ClusteringMethod: "kmeans"
NumberOfElements: 39
This automatically generates a state vector with 39 elements (including buffer elements) in the
domain of interest. This is done by creating a set of information content informed clustering pairs (eg. [[1, 15], [2, 24]]).
Note: As you reduce the dimension of your state vector, you should also correspondingly decrease the
value of your regularization factor Gamma
. It can be scaled by the ratio of reduced number of
elements over the original number of elements (eg. len(new_elements)/len(orig_elements
)).
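The scaling suggested in the note can be expressed as a one-line calculation. The sketch below assumes a simple linear scaling by the ratio of element counts, as stated above; scale_gamma and the sample numbers are hypothetical.

```python
def scale_gamma(gamma, n_reduced, n_original):
    """Scale the regularization factor by the ratio of reduced to original elements."""
    if n_reduced <= 0 or n_original <= 0:
        raise ValueError("element counts must be positive")
    return gamma * n_reduced / n_original


# Hypothetical case: Gamma of 1.0 tuned for 78 native elements, reduced to 39.
scaled = scale_gamma(1.0, n_reduced=39, n_original=78)
print(scaled)  # 0.5
```

Treat the result as a starting point for Gamma rather than a final value.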
Each clustering pair consists of the aggregation level and the number of elements allocated at that aggregation level. In the above example, the user requests 39 total state vector elements and the algorithm determines the information content informed pattern to be 15 native resolution state vector elements and 24 state vector elements that each aggregate two native cells. Any additional cells that have not been allocated are then aggregated into a single element. Using the above clustering pairs, if the domain of interest has 63 elements in the original state vector, 15 of the elements would maintain the original resolution and 48 of the elements would be aggregated into 24 2-gridcell elements. If the original state vector has 75 elements in the domain of interest, then the remaining 12 unallocated elements are aggregated into a single element, netting a new state vector with 40 elements in the domain of interest.
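The bookkeeping in this example can be reproduced with a few lines of Python. This is a sketch of the counting only, not of the clustering algorithm; reduced_element_count is a hypothetical helper.

```python
def reduced_element_count(n_native_cells, cluster_pairs):
    """Count state vector elements after applying [aggregation_level, n_elements] pairs.

    Cells left over after all pairs are applied are merged into one extra element.
    """
    n_elements = 0
    n_cells_used = 0
    for level, count in cluster_pairs:
        n_elements += count
        n_cells_used += level * count
    if n_cells_used > n_native_cells:
        raise ValueError("cluster pairs allocate more cells than are available")
    if n_cells_used < n_native_cells:
        n_elements += 1  # leftover cells become a single aggregated element
    return n_elements


pairs = [[1, 15], [2, 24]]
print(reduced_element_count(63, pairs))  # 39
print(reduced_element_count(75, pairs))  # 40
```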
The cluster pairings are generated by aggregating elements until they reach a threshold in the
estimated DOFS per cluster, which is a measure of information content. We find using the threshold of
total_DOFs / num_state_vector_elements
provides a reasonable result.
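To make the threshold rule concrete, the sketch below groups cells greedily until each group's summed DOFS reaches the threshold. This is a deliberately simplified illustration and not the IMI's k-means based implementation: it assumes that per-cell DOFS estimates can simply be summed, that cells are merged in list order, and that num_state_vector_elements is the requested number of elements. The function name and the sample values are hypothetical.

```python
def group_by_dofs(dofs_per_cell, num_state_vector_elements):
    """Greedily merge consecutive cells until each group reaches the DOFS threshold."""
    threshold = sum(dofs_per_cell) / num_state_vector_elements
    groups, current, current_dofs = [], [], 0.0
    for index, dofs in enumerate(dofs_per_cell):
        current.append(index)
        current_dofs += dofs
        if current_dofs >= threshold:
            groups.append(current)
            current, current_dofs = [], 0.0
    if current:
        groups.append(current)  # leftover low-information cells form a final group
    return threshold, groups


# Hypothetical per-cell DOFS estimates for seven native resolution cells.
cell_dofs = [0.5, 0.1, 0.1, 0.3, 0.05, 0.05, 0.4]
threshold, groups = group_by_dofs(cell_dofs, num_state_vector_elements=4)
print(groups)  # [[0], [1, 2, 3], [4, 5, 6]]
```

The cell with high information content keeps its own element, while low-information cells are merged until the group carries enough DOFS.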
The ClusteringMethod
specifies which clustering method to use for state vector reduction. Currently
kmeans
or mini-batch-kmeans
are valid options. mini-batch-kmeans
is very similar to kmeans
,
but can be less accurate. It is best used for very large state vectors to speed up state vector reduction.
Note: The IMI preserves the original state vector file as NativeStateVector.nc in your run directory.
Incorporating point source information
If you have prior information of specific locations that you would like to maintain high resolution
(eg. point source detections) you can ensure the clustering algorithm preserves these locations by
using the ForcedNativeResolutionElements
config variable. This variable takes a list of lat/lon
locations, given either as a YAML list or as a path to a csv file.
For instance, if the user suspects a location to be an emission hotspot they can specify the
lat/lon coordinates as in the examples below and the clustering algorithm will ensure that the
native resolution element is preserved during the aggregation. In order for the IMI to
preserve the element, you must have enough NumberOfElements
specified to accommodate the
number of gridcells you would like to force to be native resolution.
Additionally, the PointSourceDatasets
config variable can be used to automatically scrape emission
hotspots from external point source datasets. Currently, the only supported dataset is the "SRON"
weekly plumes dataset.
yaml list example:
PointSourceDatasets: ["SRON"]
ForcedNativeResolutionElements:
- [31.5, -104]
- [32.5, -103.5]
csv file example:
PointSourceDatasets: ["SRON"]
ForcedNativeResolutionElements: "/path/to/point_source_locations.csv"
The csv file should have a header row with the column names lat
and lon
using lowercase letters.
The csv file can have additional columns, but they will be ignored.
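The expected csv layout can be checked with a short script before running the IMI. The sketch below uses only the Python standard library; read_point_sources is a hypothetical helper and the sample rows are made up for illustration.

```python
import csv
import io


def read_point_sources(csv_file):
    """Read [lat, lon] pairs from a csv with lowercase lat and lon columns.

    Any additional columns are ignored.
    """
    reader = csv.DictReader(csv_file)
    missing = {"lat", "lon"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError("csv is missing required columns: " + str(sorted(missing)))
    return [[float(row["lat"]), float(row["lon"])] for row in reader]


# In-memory stand-in for /path/to/point_source_locations.csv
example = io.StringIO("lat,lon,name\n31.5,-104,site_a\n32.5,-103.5,site_b\n")
locations = read_point_sources(example)
print(locations)  # [[31.5, -104.0], [32.5, -103.5]]
```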
Dynamic Kalman Filter clustering
When running the IMI in Kalman Filter mode, users can dynamically adjust clusters at each Kalman iteration
to best reflect the available information content by setting the DynamicKFClustering
variable to
true
. See the Kalman Filter IMI documentation for more details.
IMI clustering scheme
The IMI clustering algorithm uses a k-means based method similar to the one described in Nesser et al. (2021) to maintain native resolution in areas with high information content (high prior emissions, high observation density), while aggregating cells with low information content.
Reducing computational cost while maintaining inversion quality
While clustering is an effective method for alleviating computational constraints when running inversions at high resolution for large regions, it can introduce aggregation error and degrade the quality of your inversion (Turner and Jacob, 2014). Therefore, it is important to weigh the computational benefits of reducing your state vector against the loss in inversion quality. This can be done by iteratively tuning the cluster pairings and running the IMI preview to assess the estimated DOFS. Ideally, you should find a middle ground where the estimated DOFS and the computational cost are both at an acceptable level before proceeding with the inversion.
Modifying prior emission estimates
To modify the default prior emission inventories, first generate the template run directory following the configuration instructions for modifying emissions.
Once the template run directory is ready, you will need to modify the emission inventories via HEMCO.
Start by transferring your custom emission inventory to EC2:
$ scp -i /local/path/to/my-key-pair.pem /local/path/to/my-inventory.nc ubuntu@my-instance-public-dns-name:/path/to/my-inventory.nc
The emissions need to be defined in a netcdf file formatted as HEMCO expects. See the HEMCO documentation for preparing data files for details on how to format your custom emission inventory for use with HEMCO.
Once your inventory has been properly formatted, you can include it as an emission field via HEMCO. To do this, navigate to the template
run directory and open the HEMCO configuration file with vim (vi
) or emacs:
$ cd /home/ubuntu/imi_output_dir/{YourRunName}/template_run
$ emacs HEMCO_Config.rc
Follow instructions in the HEMCO User’s Guide to add a new emission field.
You can run the IMI preview to quickly check that the updated emissions are working as expected.
Using custom regions with the IMI
The IMI supports regions within the following domains:
Africa: 37°S-40°N, 20°W-53°E
Asia: 11°S-55°N, 60°E-150°E
Europe: 33°N-61°N, 30°W-70°E
Middle East: 12°N-44°N, 20°W-70°E
North America: 10°N-70°N, 140°W-40°W
Oceania: 50°S-5°N, 110°E-180°E
Russia: 41°N-83°N, 19°E-180°E
South America: 59°S-16°N, 88°W-31°W
These are the nested-grid windows used in GEOS-Chem for which pre-cut meteorological files are available. You may apply the IMI to other regions, but this requires either using global meteorological fields which can be computationally expensive (not recommended) or cropping global meteorological fields via a pre-processing step.
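Before configuring a run, it can help to check whether your region of interest fits inside one of the windows listed above. The sketch below encodes those bounds (south and west as negative degrees) and is only an illustration; containing_domains is a hypothetical helper and the sample bounds are made up.

```python
# (lat_min, lat_max, lon_min, lon_max) for each nested-grid window listed above.
NESTED_DOMAINS = {
    "Africa": (-37, 40, -20, 53),
    "Asia": (-11, 55, 60, 150),
    "Europe": (33, 61, -30, 70),
    "Middle East": (12, 44, -20, 70),
    "North America": (10, 70, -140, -40),
    "Oceania": (-50, 5, 110, 180),
    "Russia": (41, 83, 19, 180),
    "South America": (-59, 16, -88, -31),
}


def containing_domains(lat_min, lat_max, lon_min, lon_max):
    """Return the names of all windows that fully contain the given bounds."""
    return [
        name
        for name, (south, north, west, east) in NESTED_DOMAINS.items()
        if south <= lat_min and lat_max <= north and west <= lon_min and lon_max <= east
    ]


# Hypothetical inversion domain in the southwestern United States.
matches = containing_domains(31, 33, -104, -101)
print(matches)  # ['North America']
```

An empty list means that the region is not covered by a pre-cut window, so cropped or global meteorological fields would be needed as described above.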
To facilitate cropping global meteorological fields, a sample script (crop_met.sh) has been included with the IMI. This script utilizes the Climate Data Operators (CDO). It also includes an option to first download global meteorological fields at 0.25° x 0.3125° resolution. The global files are large (approx. 300 GB per month), so when using that option it is recommended that you process short periods at a time and delete the global files before processing additional periods.
In a text editor, modify the user settings section in crop_met.sh
. The region
defined in crop_met.sh
should be the same or larger than the domain defined for your
IMI in config.yml.
|
Two-letter string to identify region (e.g. |
|
Minimum longitude edge of the region of interest. |
|
Maximum longitude edge of the region of interest. |
|
Minimum latitude edge of the region of interest. |
|
Maximum latitude edge of the region of interest. |
|
Boolean for downloading global 0.25° x 0.3125° meteorology fields for cropping. Default is |
|
Boolean for deleting global meteorology files after cropping. Default is |
|
Directory containing the global high-resolution meteorology fields. |
|
Directory where the cropped meteorology files will be placed. We recommend specifying this as |
The cropped meteorology files can be generated by then executing ./crop_met.sh
at the
command line or submitting the script to your cluster’s scheduler if available. Headers for the
SLURM scheduler are included at the top of the script, but you can modify or remove those as needed.
To utilize the cropped meteorology files in the IMI, you will need to create a new IMI directory. Modify config.yml
so that NestedRegion
matches the value set in crop_met.sh
. This will automatically
add the region ID string in the appropriate locations in the HEMCO_Config.rc
files utilized
by GEOS-Chem.
If you have regional emissions that you would like to use, please see modifying prior emission estimates.
Finally, you can run the IMI preview to quickly check that the IMI is working as expected for your custom region.
Constructing an inversion ensemble
After performing an inversion, you can use the IMI to create a low-cost ensemble of sensitivity inversions with different inversion parameters. This is because the Jacobian matrix computed in the first inversion can easily be reused.
See the Common configurations page
for instructions on how to re-configure the IMI to use a pre-computed Jacobian. Then modify
the values of PriorError
, ObsError
, and/or Gamma
in the configuration file and re-run the inversion.
Note
Make sure to archive the final results of the original inversion (inversion_result.nc
and gridded_posterior.nc
)
before running the sensitivity inversion. Those files will be overwritten.
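One way to organize an ensemble is to enumerate the parameter combinations up front and run one sensitivity inversion per combination. The sketch below only builds the list of settings; the parameter values are hypothetical placeholders, and writing each combination into the configuration file and re-running the inversion is left to the user.

```python
from itertools import product

# Hypothetical candidate values for each inversion parameter.
prior_errors = [0.5, 0.75]
obs_errors = [10, 15]
gammas = [0.25, 1.0]

# One dictionary of config overrides per ensemble member.
ensemble = [
    {"PriorError": prior_error, "ObsError": obs_error, "Gamma": gamma}
    for prior_error, obs_error, gamma in product(prior_errors, obs_errors, gammas)
]

for member_id, settings in enumerate(ensemble, start=1):
    print(member_id, settings)
```

With two values per parameter this yields eight ensemble members, all of which can reuse the same pre-computed Jacobian.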
If you want to run a sensitivity inversion with updated prior emission inventories, the pre-computed Jacobian needs to be scaled according to the differences between the original and updated inventories. Instructions for this to come.
Running the IMI on a local cluster
The IMI is setup to run on AWS by default. However, if you have a local cluster available to you, you may choose to run the IMI there. This option requires some manual changes and is therefore only recommended for advanced users.
You must first ensure you have the proper hardware and software requirements for running GEOS-Chem.
When logged onto your local cluster, navigate to the path where you want to download the IMI repository and type the following command:
$ git clone https://github.com/geoschem/integrated_methane_inversion.git
This will clone the IMI code into a local folder named integrated_methane_inversion
.
Tip
If you wish, you can clone the IMI repository into a different local folder by supplying the name of the folder at the end of the git clone command. For example:
git clone https://github.com/geoschem/integrated_methane_inversion.git imi-v1.0
Next, download the GEOS-Chem source code and its submodules within the IMI folder using these commands:
$ cd integrated_methane_inversion
$ git clone https://github.com/geoschem/GCClassic.git
$ cd GCClassic
$ git submodule update --init --recursive
See Downloading the GEOS-Chem source code for more details.
Navigate back to the top-level IMI folder and view the contents:
$ cd ..
$ ls
config.yml envs/ LICENSE.md resources/ setup_imi.sh*
docs/ GCClassic/ README.md run_imi.sh* src/
Within the IMI is a subfolder called envs
that contains files for
running the IMI on different systems. By default, files are provided
for AWS and Harvard’s Cannon cluster.
$ ls envs/*
envs/aws:
conda_env.yml slurm/ spack_env.env
envs/Harvard-Cannon:
ch4_inv.yml gcclassic.rocky+gnu10.minimal.env* gcclassic.rocky+gnu10.env*
config.harvard-cannon.yml gcclassic.rocky+gnu12.minimal.env* README
We recommend you add a subfolder within envs
for your own system
to easily access your customized files needed for the IMI. In this
directory, we recommend storing any environment files needed to load
the libraries for GEOS-Chem (e.g. fortran compiler, netcdf, openmpi,
cmake), a conda environment file, and a copy of the IMI configuration file
modified for your system. See the files in envs/Harvard-Cannon
for examples.
We recommend basing your config file off of config.harvard-cannon.yml
.
Within the copied IMI configuration file, you will need to modify the
settings in the section labeled “Settings for running on your local
cluster.” If you already have the GEOS-Chem input data on your system,
you may set the *DryRun
options to false
.
It is recommended that you set up and run the IMI in stages when
running on a local cluster to ensure that each stage works
properly. You can do this by modifying the settings under “Setup
modules” and “Run modules” and turning them on one or a few at a
time. You may find that you need to manually edit some files. For
example, after creating the template run directory, but before
creating your spinup, Jacobian, and posterior run directories, you should open
ch4_run.template
in a text editor and modify as needed for your
system (by default this script is set up to submit to a SLURM
scheduler).
Once you have finished customizing the IMI settings for your cluster,
you can run the IMI by executing run_imi.sh
and passing an
argument for the location of your IMI configuration file. For example:
$ ./run_imi.sh config.harvard-cannon.yml
If you do not pass a configuration file, config.yml
in
the top-level IMI directory will be used. That file is set up for
running the IMI on AWS by default.
You can also run the IMI with slurm if your local cluster supports this by running:
$ sbatch -p <partition-name> -c <num-cores> --mem <amount-mem> -t <time-limit> ./run_imi.sh config.harvard-cannon.yml
Using the IMI Docker container
What is a container?
A Docker container is a lightweight, standalone, and executable software package that encapsulates an application and all its dependencies, including libraries, frameworks, and system tools. Docker containers provide a consistent and reproducible environment, ensuring that an application can run consistently across different systems, such as local clusters, cloud servers, and even local computers.
Why use the IMI Docker container?
Aside from providing a consistent environment, using the IMI container can significantly ease installation of the IMI on a new system. This is because the container has all the necessary dependencies and source code for running the IMI preinstalled and preloaded. This equals easier setup for you.
Additionally, Docker containers lend themselves very well to automated workflows, so using a docker container version of the IMI can make it easier to set up scheduled inversions of the IMI.
How to use the IMI Docker container
Prerequisites
To use the IMI Docker container, you must have Docker installed on your system. Docker can be installed on Windows, Mac, and Linux systems. For instructions on how to install Docker on your system, see the Docker documentation.
Additionally, configuring the container for your particular application is much easier if you have the docker compose plugin installed as well. For instructions on how to install docker compose, see the Docker documentation.
Note: if your cluster does not support Docker, you can also use Singularity as an alternative to Docker. See the section on Using Singularity instead of Docker for more information.
Pulling the image
To run the container you will first need to pull the image from our cloud repository:
$ docker pull public.ecr.aws/w1q7j9l2/imi-docker-image:latest
Setting up the compose.yml file
The IMI needs access to both input data and personalized configuration variables for running the inversion for your desired region and period of interest. In order to supply these settings we use a docker compose.yml file. The compose file allows you to input environment variables and mount files/directories from your local system into the container. This allows you to more easily configure the IMI and save the output directory to your local system.
IMI input data
The IMI needs input data in order to run the inversion. If you do not have the necessary input data available
locally then you will need to give the IMI container access to S3 on AWS, where the input data is available. This
can be done by specifying your
aws credentials in
the environment
section of the compose.yml file, e.g.:
environment:
- AWS_ACCESS_KEY_ID=your_access_key_id
- AWS_SECRET_ACCESS_KEY=your_secret_access_key
- AWS_DEFAULT_REGION=us-east-1
Note: these credentials are sensitive, so do not post them publicly in any repository.
If you already have the necessary input data available locally, then you can mount it to the IMI container in the volumes section of the compose.yml file without setting your aws credentials, e.g.:
volumes:
- /local/input/data:/home/al2/ExtData # mount input data directory
Storing the output data
In order to access the files from the inversion it is best to mount a volume from your local system onto the docker container. This allows the results of the inversion to persist after the container exits. We recommend making a dedicated IMI output directory using mkdir:
volumes:
- /local/output/dir/imi_output:/home/al2/imi_output_dir # mount output directory
- /local/container/config.yml:/home/al2/integrated_methane_inversion/config.yml # mount desired config file
Updating the config.yml file
The config.yml file configures the IMI to run according to your specific inversion requirements. There are two mechanisms to update the config.yml file:
If you would only like to update specific variables you can pass them in as environment variables:
All environment variables matching the pattern IMI_<config-variable-name>
will update their corresponding config.yml
variable. For example:
environment:
- IMI_StartDate=20200501
- IMI_EndDate=20200601
will replace the StartDate
and EndDate
in the IMI config.yml file.
Replace the entire config.yml file with one from the host system:
To apply a config.yml file from your local system to the docker container, specify it in your compose.yml file as a
volume. Then set the IMI_CONFIG_PATH
environment variable to point to that path, e.g.:
volumes:
- /local/path/to/config.yml:/home/al2/integrated_methane_inversion/config.yml # mount desired config file
environment:
- IMI_CONFIG_PATH=/home/al2/integrated_methane_inversion/config.yml # should point to the path in the container
Note: any env variables matching the pattern specified in option 1 will overwrite the corresponding config vars in IMI_CONFIG_PATH.
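The override behavior described above can be pictured with the following sketch. It is an illustration of the IMI_<config-variable-name> pattern only, not the container's actual implementation; apply_env_overrides is a hypothetical helper, and it leaves the overriding values as strings because environment variables carry no type information.

```python
import os


def apply_env_overrides(config, environ=None, prefix="IMI_"):
    """Return a copy of config with values replaced by matching IMI_<name> variables."""
    environ = os.environ if environ is None else environ
    updated = dict(config)
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        if name in updated:  # only existing config variables are overridden
            updated[name] = value
    return updated


# Hypothetical config contents and container environment.
config = {"StartDate": 20200101, "EndDate": 20200201}
environment = {
    "IMI_StartDate": "20200501",
    "IMI_CONFIG_PATH": "/home/al2/integrated_methane_inversion/config.yml",
    "HOME": "/home/al2",
}
overridden = apply_env_overrides(config, environment)
print(overridden)  # {'StartDate': '20200501', 'EndDate': 20200201}
```

IMI_CONFIG_PATH shares the prefix but does not correspond to a config variable, so in this sketch it is skipped.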
Example compose.yml file
This is an example of what a fully filled out compose.yml file looks like:
# IMI Docker Compose File
# This file is used to run the IMI Docker image
# and define important parameters for the container
services:
imi:
image: public.ecr.aws/w1q7j9l2/imi-docker-image:latest
volumes:
# comment out any volume mounts you do not need for your system
- /local/container/config.yml:/home/al2/integrated_methane_inversion/config.yml # mount desired config file
- /local/input/data:/home/al2/ExtData # mount input data directory
- /local/output/dir/imi_output:/home/al2/imi_output_dir # mount output directory
environment:
# comment out any environment vars you do not need for your system
- IMI_CONFIG_PATH=config.yml # path starts from /home/al2/integrated_methane_inversion
## ***** DO NOT push aws credentials to any public repositories *****
- AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
- AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
- AWS_DEFAULT_REGION=us-east-1
Running the IMI
Once you have configured the compose.yml file, you can run the IMI by running:
$ docker compose up
from the same directory as your compose.yml
file. This will start the IMI container and run the inversion.
The output will be saved to the directory you specified in the compose.yml file.
Alternatively, if you chose not to install docker compose
you should be able to run the IMI using the
docker run command, but this requires specifying
all env variables and volumes via flags.
Using Singularity instead of Docker
We use Docker to containerize the IMI, but the Docker containers can also be run using Singularity. Singularity is a container engine designed to run on HPC systems and local clusters, as some clusters do not allow Docker to be installed. Note: using Singularity to run the IMI is untested and may not work as expected.
First pull the image:
$ singularity pull docker://public.ecr.aws/w1q7j9l2/imi-docker-image:latest
Then run the image:
$ singularity run imi-docker-image_latest.sif
Common IMI configurations
This page provides examples of how to configure the IMI setup modules and run modules to accomplish some common tasks.
Default (preview) configuration
By default the IMI will download the TROPOMI data for the period of interest, set up the template GEOS-Chem run directory, run the preview, and then stop.
## Setup modules
SetupTemplateRundir: true
SetupSpinupRun: false
SetupJacobianRuns: false
SetupInversion: false
SetupPosteriorRun: false
## Run modules
RunSetup: true
DoSpinup: false
DoJacobian: false
DoInversion: false
DoPosterior: false
## IMI preview
DoPreview: true
If the results of the preview are satisfactory, you can try the next configuration on this page to run the inversion. If they are not satisfactory, modify the configuration file (e.g., the region and/or period of interest) and try again.
Running an inversion after the preview
If the preview is complete and the results are satisfactory, you can proceed with the inversion (without re-running the preview).
## Inversion
PrecomputedJacobian: false
## Setup modules
SetupTemplateRundir: false
SetupSpinupRun: true
SetupJacobianRuns: true
SetupInversion: true
SetupPosteriorRun: true
## Run modules
RunSetup: true
DoSpinup: true
DoJacobian: true
DoInversion: true
DoPosterior: true
## IMI preview
DoPreview: false
Running a sensitivity inversion
You’ve completed an initial inversion. Use the following configuration to run a new inversion with modified prior error (PriorError
),
observational error (ObsError
), or regularization parameter (Gamma
).
## Inversion
PrecomputedJacobian: true
ReferenceRunDir: "/path/to/your/run/dir"
## Setup modules
SetupTemplateRundir: false
SetupSpinupRun: false
SetupJacobianRuns: false
SetupInversion: false
SetupPosteriorRun: false
## Run modules
RunSetup: false
DoSpinup: false
DoJacobian: false
DoInversion: true
DoPosterior: false
## IMI preview
DoPreview: false
Note that the final results of the original inversion (inversion_result.nc
and gridded_posterior.nc
)
will be overwritten if not archived before running the sensitivity inversion.
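Archiving can be scripted so that it is not forgotten between sensitivity runs. The sketch below copies the two result files into an archive folder using the Python standard library; archive_results and the folder names are hypothetical, and the demonstration runs in a temporary directory with a placeholder file rather than real inversion output.

```python
import shutil
import tempfile
from pathlib import Path


def archive_results(inversion_dir, archive_dir,
                    filenames=("inversion_result.nc", "gridded_posterior.nc")):
    """Copy result files that exist in inversion_dir into archive_dir."""
    inversion_dir, archive_dir = Path(inversion_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in filenames:
        source = inversion_dir / name
        if source.exists():
            shutil.copy2(source, archive_dir / name)  # preserves timestamps
            copied.append(name)
    return copied


# Demonstration with a placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "inversion"
    run_dir.mkdir()
    (run_dir / "inversion_result.nc").write_text("placeholder")
    copied = archive_results(run_dir, Path(tmp) / "archive_original")
    print(copied)  # ['inversion_result.nc']
```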
Running an inversion without the preview
We generally don’t recommend doing this, but if you wish to perform an inversion without manually inspecting the results of the IMI preview, use the following configuration to run the IMI from end to end, with a threshold on the expected degrees of freedom for signal (DOFS) to cancel the inversion; if the expected DOFS are below the threshold, the IMI will exit with a warning.
## Setup modules
SetupTemplateRundir: true
SetupSpinupRun: true
SetupJacobianRuns: true
SetupInversion: true
SetupPosteriorRun: true
## Run modules
RunSetup: true
DoSpinup: true
DoJacobian: true
DoInversion: true
DoPosterior: true
## IMI preview
DoPreview: true
DOFSThreshold: {insert-threshold-value}
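The effect of the threshold can be summarized as a simple comparison. The sketch below is an illustration of the decision described above, not the IMI's own code; check_dofs and the sample values are hypothetical.

```python
import sys


def check_dofs(expected_dofs, dofs_threshold):
    """Return True if the inversion should proceed, False if it should be cancelled."""
    if expected_dofs < dofs_threshold:
        print(
            f"Warning: expected DOFS {expected_dofs} is below the threshold "
            f"{dofs_threshold}; cancelling the inversion.",
            file=sys.stderr,
        )
        return False
    return True


proceed_low = check_dofs(expected_dofs=0.4, dofs_threshold=0.5)
proceed_high = check_dofs(expected_dofs=2.0, dofs_threshold=0.5)
```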
Modifying prior emission estimates
Set up the template run directory
## Setup modules
SetupTemplateRundir: true
SetupSpinupRun: false
SetupJacobianRuns: false
SetupInversion: false
SetupPosteriorRun: false
## Run modules
RunSetup: true
DoSpinup: false
DoJacobian: false
DoInversion: false
DoPosterior: false
## IMI preview
DoPreview: false
Run the preview
After modifying the prior emission inventories, run the preview without setting up the template run directory.
## Setup modules
SetupTemplateRundir: false
SetupSpinupRun: false
SetupJacobianRuns: false
SetupInversion: false
SetupPosteriorRun: false
## Run modules
RunSetup: true
DoSpinup: false
DoJacobian: false
DoInversion: false
DoPosterior: false
## IMI preview
DoPreview: true
If satisfied with the preview results, continue with one of the above configurations to run the inversion.
IMI directory contents
This page describes the contents of various file directories generated and populated by the IMI in the course of an inversion.
Inversion directory
The inversion directory is where the IMI computes the Jacobian, obtains the optimal estimate of emissions, and saves the results.
It is located at /home/ubuntu/imi_output_dir/{YourRunName}/inversion
.
In addition to a shell script and several Python scripts used in the inversion, you will find the following items in the inversion directory after completing an inversion:
|
Directory of Python
.pkl files containing
for each TROPOMI orbit relevant to the inversion.
All quantities have been “converted” to 1D fields indexed by latitude and longitude.
|
|
Directory of Python
.pkl files containing
for each TROPOMI orbit relevant to the inversion.
All quantities have been “converted” to 1D fields indexed by latitude and longitude.
|
|
Directory of
.nc files containing daily GEOS-Chem SpeciesConc output from the reference simulation. These files are used to generate virtual TROPOMI observations for comparison with the true observations.
|
|
Directory of
.nc files containing daily GEOS-Chem SpeciesConc output from the posterior simulation. These files are used to generate virtual TROPOMI observations for comparison with the true observations.
|
|
Directory of
.nc files containing daily 4-D GEOS-Chem sensitivities to perturbations in the
state variables of the inversion (i.e., in the emission elements being optimized). The data have dimensions
(element, lev, lat, lon), where element is the emission element id
(state variable id) and lev is the vertical dimension. These files are used to compute the Jacobian matrix by application of the TROPOMI operator.
|
|
File containing the raw output of the inversion (
invert.py ) as vectors (posterior emission
estimate) and matrices (posterior error covariance matrix, averaging kernel matrix). |
|
File containing the posterior emission estimate, posterior error covariance matrix, and averaging
kernel matrix projected onto the 2-D inversion grid.
|
|
Jupyter notebook for quickly visualizing key results of the inversion.
|
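To get a quick overview of what an inversion produced, the directory can be inventoried by file type. The following is a minimal sketch using only the Python standard library. The function name `summarize_by_suffix` is a hypothetical helper, not part of the IMI, and the path is the location given above with `{YourRunName}` left as a placeholder that must be replaced.

```python
from collections import Counter
from pathlib import Path


def summarize_by_suffix(directory):
    """Count files under `directory` (recursively) by file extension."""
    counts = Counter()
    for path in Path(directory).rglob("*"):
        if path.is_file():
            counts[path.suffix or "(no extension)"] += 1
    return dict(counts)


if __name__ == "__main__":
    # Replace {YourRunName} with the name of your run before executing.
    inversion_dir = "/home/ubuntu/imi_output_dir/{YourRunName}/inversion"
    if Path(inversion_dir).is_dir():
        for suffix, n in sorted(summarize_by_suffix(inversion_dir).items()):
            print(f"{suffix}: {n} file(s)")
    else:
        print(f"Directory not found: {inversion_dir}")
```

After a completed inversion one would expect to see .pkl, .nc, and .ipynb entries among the counts, matching the items described above.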
AMI specifications
The Amazon Machine Image (AMI) for the IMI is available through the AWS Marketplace as a free product listing.
The latest AMI for the IMI Workflow contains the following software libraries:
GNU Compiler Collection 8.2.0
NetCDF-Fortran 4.5.3
Slurm 17.11.2
Python 3.9.7
GEOS-Chem Classic 13.3.3
TODO: add additional software dependencies
Known bugs
This page links to known bugs in the IMI. See the GitHub issues page for updates on their status.
Support
For support with the IMI workflow, please create an issue on the GitHub repository or email us at integrated-methane-inversion@g.harvard.edu, detailing the nature of the issue you are facing. Please attach your IMI config.yml file and any relevant log files, and state the version of the IMI you are using.
Example GitHub Issue Template
### What institution are you from?
Please tell us what institution you are from.
### Description of the problem
Describe your problem here. Describe the steps to reproduce the problem here, if possible.
### Description of troubleshooting performed
Describe any troubleshooting that you have already performed here. Also include any leads or suspicions here.
### IMI version
Enter your IMI version here.
### Description of modifications
Describe any modifications to the IMI here.
### Attach relevant files
- imi_output.log
- imi config.yml file
- any other relevant log files to your issue
Contributing
Contributions can be made in the form of pull requests to our GitHub repository <https://github.com/ACMG-CH4/CH4_inversion_workflow>.
Editing this User Guide
This user guide is generated with Sphinx.
Sphinx is an open-source Python project designed to make writing software documentation easier.
The documentation is written in reStructuredText (a markup language similar to Markdown), which Sphinx extends for software documentation.
The source for the documentation is the docs/source directory in the top level of the source code.
Quick start
To build this user guide on your local machine, you need to install Sphinx. Sphinx is a Python 3 package and is available via pip. This user guide uses the Read The Docs theme, so you will also need to install sphinx-rtd-theme. It also uses the sphinxcontrib-bibtex and recommonmark extensions, which you’ll need to install.
$ pip install sphinx sphinx-rtd-theme sphinxcontrib-bibtex recommonmark
To build this user guide locally, navigate to the docs/ directory and make the html target.
gcuser:~$ cd gcpy/docs
gcuser:~/gcpy/docs$ make html
This will build the user guide in docs/build/html, and you can open index.html in your web browser. The source files for the user guide are found in docs/source.
Note
You can clean the documentation with make clean.
Learning reST
Writing reST can be tricky at first. Whitespace matters, and some directives can be easily miswritten. Two important things you should know right away are:
Indents are 3 spaces
“Things” are separated by 1 blank line. For example, a list or code-block following a paragraph should be separated from the paragraph by 1 blank line.
You should keep these in mind when you’re first getting started. Dedicating an hour to learning reST will save you time in the long run. Below are some good resources for learning reST.
reStructuredText primer: (single best resource; however, it’s better read than skimmed)
Official reStructuredText reference (there is a lot of information here)
Presentation by Eric Holscher (co-founder of Read The Docs) at DjangoCon US 2015 (the entire presentation is good, but reST is described from 9:03 to 21:04)
A good starting point would be Eric Holscher’s presentation, followed by the reStructuredText primer.
Style guidelines
Important
This user guide is written in semantic markup. This is important so that the user guide remains maintainable. Before contributing to this documentation, please review our style guidelines (below). When editing the source, please refrain from using elements with the wrong semantic meaning for aesthetic reasons. Aesthetic issues can be addressed by changes to the theme.
For titles and headers:
Section headers should be underlined by # characters
Subsection headers should be underlined by - characters
Subsubsection headers should be underlined by ^ characters
Subsubsubsection headers should be avoided, but if necessary, they should be underlined by " characters
File paths (including directories) occurring in the text should use the :file: role.
Program names (e.g. cmake) occurring in the text should use the :program: role.
OS-level commands (e.g. rm) occurring in the text should use the :command: role.
Environment variables occurring in the text should use the :envvar: role.
Inline code or code variables occurring in the text should use the :code: role.
Code snippets should use the .. code-block:: <language> directive like so

.. code-block:: python

   import gcpy
   print("hello world")

The language can be “none” to omit syntax highlighting.
For command line instructions, the “console” language should be used. The $ should be used to denote the console’s prompt. If the current working directory is relevant to the instructions, a prompt like gcuser:~/path1/path2$ should be used.
Inline literals (e.g. the $ above) should use the :literal: role.
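The header underline conventions listed under “For titles and headers” can be sketched as a small helper that builds a heading with an underline of matching length. This is only an illustration of the convention: `rst_heading` and `UNDERLINE_CHARS` are hypothetical names and are not part of Sphinx or of this guide’s tooling.

```python
# Underline characters by heading level, as listed in the style guidelines.
UNDERLINE_CHARS = {1: "#", 2: "-", 3: "^", 4: '"'}


def rst_heading(title, level):
    """Return a reST heading: the title followed by an underline of equal length."""
    if level not in UNDERLINE_CHARS:
        raise ValueError(f"level must be one of {sorted(UNDERLINE_CHARS)}")
    return f"{title}\n{UNDERLINE_CHARS[level] * len(title)}"


# Level 2 corresponds to a subsection header.
print(rst_heading("Style guidelines", 2))
```

Matching the underline length to the title length keeps the heading valid, since reST rejects underlines that are shorter than the title text.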
Terminology (TODO: edit this page)
- absolute path
The full path to a file, e.g., /example/foo/bar.txt. An absolute path should always start with /. As opposed to a relative path.
- build
See compile.
- build directory
A directory where build configuration settings are stored, and where intermediate build files like object files, module files, and libraries are stored.
- checkpoint file
See restart file.
- compile
Generating an executable program from source code (which is in a plain-text format).
- gridded component
A formal model component. MAPL organizes model components with a tree structure, and facilitates component interconnections.
- HISTORY
The MAPL gridded component that handles model output. All GCHP output diagnostics are facilitated by HISTORY.
- relative path
The path to a file relative to the current working directory. For example, the relative path to /example/foo/bar.txt if your current working directory is /example is foo/bar.txt. As opposed to an absolute path.
- restart file
A NetCDF file with initial conditions for a simulation. Also called a checkpoint file in GCHP.
- run directory
The working directory for a GEOS-Chem simulation. A run directory houses the simulation’s configuration files, the output directory (OutputDir), and input files/links such as restart files or input data directories.
- stretched-grid
A cubed-sphere grid that is “stretched” to enhance the grid resolution in a region.
- target face
The face of a stretched-grid that is refined. The target face is centered on the target point.
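The distinction between the absolute path and relative path entries above can be checked with Python’s standard `posixpath` module, reusing the same example paths. This is a small illustrative sketch; the variable names are arbitrary.

```python
import posixpath

absolute = "/example/foo/bar.txt"
cwd = "/example"  # the current working directory used in the glossary example

# An absolute path starts with "/".
print(posixpath.isabs(absolute))       # True

# The same file expressed relative to the working directory.
relative = posixpath.relpath(absolute, start=cwd)
print(relative)                        # foo/bar.txt
print(posixpath.isabs(relative))       # False

# Joining the working directory and the relative path recovers the absolute path.
print(posixpath.join(cwd, relative))   # /example/foo/bar.txt
```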