heatwave setup

Project Heatwave is a custom Python program that accepts user-input criteria (latitude, longitude, solar zenith angle, quality, dust, cloud cover, etc.). It identifies and downloads the matching AIRS data files from NASA, stores them on a local hard drive, then reads the files, extracts the data points that match the input criteria, and produces monthly average radiances as an output .csv file.

installing a python environment

1. Install Anaconda

2. Start an Anaconda Prompt

3. Activate a unique environment

    1. conda create --name myenv

    2. conda activate myenv

4. Install dependencies:

    1. conda install -c conda-forge pyhdf

    2. conda install requests

    3. conda install pytest

    4. pip install asciimatics

    5. pip install pytest-randomly

5. Anaconda typically includes these two packages already, so you can likely skip them (or run the installs anyway to be sure):

    1. conda install matplotlib

    2. conda install pandas
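To confirm everything installed into the active environment, you can run a quick import check (a minimal sketch; the script name is arbitrary):

    # check_env.py -- run with: python check_env.py
    import importlib

    # Import names for each dependency (pytest-randomly imports as pytest_randomly)
    MODULES = ("pyhdf", "requests", "pytest", "asciimatics",
               "pytest_randomly", "matplotlib", "pandas")

    for name in MODULES:
        try:
            module = importlib.import_module(name)
            print(f"{name}: OK ({getattr(module, '__version__', 'version unknown')})")
        except ImportError as error:
            print(f"{name}: MISSING ({error})")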

installing heatwave

1. Clone or download the repository from https://github.com/rentcp/Heatwave

2. Set the data and results directories by editing the first lines of "example.json"
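Heatwave reads this file as ordinary JSON, so a stray comma or quote will stop a run before it starts. One way to confirm your edited file parses cleanly (a minimal sketch, independent of Heatwave's own code):

    import json

    # json.load raises a JSONDecodeError pointing at the exact typo, if any
    with open("example.json") as settings_file:
        settings = json.load(settings_file)
    print(settings)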

earthdata access

Each execution of Heatwave requires a login and password for the NASA Earthdata system. There is no cost, but access approval can take 1-2 days. If you are a new user:

  1. Go to https://urs.earthdata.nasa.gov, click "Register For A Profile," and complete registration & email verification.
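Heatwave prompts for these credentials at runtime and handles the downloads itself. For context, NASA's own Python examples authenticate downloads with a requests session that keeps credentials across the Earthdata (URS) login redirect, roughly like this (a sketch of the general pattern, not necessarily Heatwave's exact code):

    import requests

    class SessionWithHeaderRedirection(requests.Session):
        """Keep the Authorization header only for the Earthdata login host."""
        AUTH_HOST = "urs.earthdata.nasa.gov"

        def __init__(self, username, password):
            super().__init__()
            self.auth = (username, password)

        def rebuild_auth(self, prepared_request, response):
            headers = prepared_request.headers
            url = prepared_request.url
            if "Authorization" in headers:
                original = requests.utils.urlparse(response.request.url)
                redirect = requests.utils.urlparse(url)
                # Drop credentials when redirected away from the URS host
                if (original.hostname != redirect.hostname
                        and redirect.hostname != self.AUTH_HOST
                        and original.hostname != self.AUTH_HOST):
                    del headers["Authorization"]

    # Usage: session.get(granule_url), then write response.content to disk
    session = SessionWithHeaderRedirection("your_login", "your_password")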

how to run heatwave

1. From the Anaconda Prompt (with your environment activated), in the Heatwave directory, type:

python cli.py

2. The program will display the list of inputs required, their ranges, and their "<" versus "<=" distinctions.

3. Use a text editor to set your preferred parameters in example.json (you can name the file anything.json)

4. From the Anaconda Prompt, type:

python cli.py example.json

5. Enter your Earthdata login and password

6. The program will run according to the settings in the example.json file.

build list of accessible files

The \data\ folder contains zipped .csv files, arranged by year, listing every accessible granule (file) produced by AIRS since 2002. From time to time this dataset needs to be updated.
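pandas can read these zipped lists directly, which makes it easy to inspect what is available for a given year (a minimal sketch; the exact file name inside \data\ may differ):

    import pandas as pd

    # compression is inferred from the .zip extension; the path is illustrative
    granules = pd.read_csv("data/2002.zip")
    print(f"{len(granules)} granules listed for 2002")
    print(granules.head())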

how heatwave works

  1. cli.py, main()

    • The input JSON file is read in, and batches are created based on the date range and batching parameters. For each batch, the entire pipeline below is run, producing individual output CSVs.

  2. main_controller.py, MainController class, process()

    • If the requested granules aren't stored locally, they are downloaded now. The HDF filter for this batch is built from the parameters in the input JSON. aggregate_hdf_data() is called, which runs the extract_granule_dataset() function from hdf.py asynchronously across 10 threads (see the sketch after this list).

  3. hdf.py, extract_granule_dataset()

    • Data from the selected granules is aggregated into 6 dataframes holding the fields used for filtering and for separating data by latitude, plus 2 more holding radiance and radiance-quality information. All of this is then passed to the filter_dataset() function.

  4. hdf.py, filter_dataset()

    • Latitude buckets are defined, and the data points are filtered. After filtering, the data is divided into 18 separate buckets by latitude, and the points in each bucket are summed and counted (see the sketch after this list). The sums, counts, and filter statistics are returned to extract_granule_dataset(), which immediately passes the granule, radiance sum, radiance count, and stats back up to the HDFDataAggregator class' process() function. Once the asynchronous work is complete, the aggregated data is passed to calculate_averages_and_filter().

  5. hdf.py, calculate_averages_and_filter()

    • All radiance sum and count information is placed into a dataframe, separated into latitude buckets. The averages are computed as sum divided by count and added as an additional column; the sum and count columns are then dropped, though a copy of the count column is kept for writing the separate stats CSV. The averages dataframe, filter stats, and count information are returned to the HDFDataAggregator class' process() function, then up to aggregate_hdf_data(), and finally back to process() in the MainController class.

  6. main_controller.py, MainController class, process()

    • The output CSVs are written into the temp directory, and this batch is now complete. Filter stats from this batch are returned up to the main() function in cli.py.

  7. cli.py, main()

    • Filter stats from this batch are printed to the console, along with a readout of how long the batch took to process. Filter stats are aggregated together as each batch completes.
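Steps 2 through 5 amount to a fan-out/fan-in pattern: granules are processed in parallel threads, each returning per-bucket radiance sums and counts, and the averages are computed only once everything has been combined. A minimal sketch of that pattern (the function names, column names, and fake data are illustrative, not Heatwave's actual code):

    from concurrent.futures import ThreadPoolExecutor

    import numpy as np
    import pandas as pd

    # 18 latitude buckets of 10 degrees each, spanning -90 to +90
    LAT_EDGES = np.linspace(-90, 90, 19)

    def extract_granule(granule):
        """Stand-in for the per-granule work in hdf.py: bucket the
        filtered points by latitude, then sum and count radiances."""
        buckets = pd.cut(granule["latitude"], LAT_EDGES)
        grouped = granule.groupby(buckets, observed=False)["radiance"]
        return grouped.sum(), grouped.count()

    def aggregate(granules):
        # Step 2: fan out across 10 worker threads
        with ThreadPoolExecutor(max_workers=10) as pool:
            results = list(pool.map(extract_granule, granules))
        # Step 5: fan in -- combine sums and counts, then divide
        total_sum = sum(s for s, _ in results)
        total_count = sum(c for _, c in results)
        return (total_sum / total_count).rename("average_radiance")

    # Demo with two fake "granules"
    rng = np.random.default_rng(0)
    fake = [pd.DataFrame({"latitude": rng.uniform(-90, 90, 500),
                          "radiance": rng.uniform(0, 1, 500)})
            for _ in range(2)]
    print(aggregate(fake))

Combining sums and counts before dividing, rather than averaging per-granule averages, keeps the result correct when granules contribute different numbers of points to each bucket.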

The 'for each' loop from step 1 ends here, and now all batches are complete. Aggregated filter stats for the whole run are printed to the console. All of the CSVs from the temporary folder are read and concatenated together into the final output CSVs. The rows in both CSVs are sorted first by wavenumber, then by date.
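That concatenation step is plain pandas; a minimal sketch, with illustrative folder and column names:

    import glob

    import pandas as pd

    # Stack every per-batch CSV from the temporary folder
    parts = [pd.read_csv(path) for path in sorted(glob.glob("temp/*.csv"))]
    combined = pd.concat(parts, ignore_index=True)

    # Sort rows first by wavenumber, then by date
    combined = combined.sort_values(["wavenumber", "date"])
    combined.to_csv("final_output.csv", index=False)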

Finally, the temporary folder is deleted.

If you have read this far, then you probably downloaded this tool and used it. I spent countless late nights and $1550 making it work. If Heatwave worked for you, please consider simply sending me an email (rentcp[at google's mail suffix]) or tweeting and tagging me (link below) to let me know. I'd appreciate it.