diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/README.md b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/README.md index 1246be26..7c7c1f99 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/README.md +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/README.md @@ -15,56 +15,8 @@ To keep processing times low, this workflow has been parallelized. There are 4 v | WRF-Hydro | GWOUT | Calendar Year | hourly | nco_process_gwout.sh | gwout_nco.slurm | XXX | | WRF-Hydro | CHRTOUT | Calendar Year | hourly | nco_process_chrtout.sh | chrtout_nco.slurm | XXX | -## LDASOUT: -#### nco_process_ldasout.sh -##### Script Preparations: -You will need to specify three paths: - - The location of the 3-hour WRF-Hydro output LDASOUT files. - - The location of the static soil properties file. - - The location of where to save the monthly outputs. -##### Overview: --Process porosity & wilting point parameters --Process accumulated flux & state differences --Process mean states --Cleanup names - -## GWOUT: -#### nco_process_gwout.sh -##### Script Preparations: -You will need to specify two paths: - - The location of the hourly WRF-Hydro output GWOUT files. - - The location of where to save the monthly outputs. -*Note: this script has some additional lines of code to deal with filetypes in the depth variable. Renaming the variable seems to fix this bug. Another option is to use older version of the NCO module- this has not been explored yet. -##### Overview: --Process accumulated flux & state differences --Rename "depth" column to "bucket_depth" --Process sums and means --Process flow totals --Process depth average --Cleanup names - -## LDASIN: -#### nco_process_clim.sh -##### Script Preparations: -You will need to specify two paths: - - The location of the hourly CONUS404-BA output LDASIN files. - - The location of where to save the monthly outputs. -*Note: this script has some additional lines of code to deal with this data being organized by Water Year. -##### Overview: --Create totals and averages --Cleanup names - -## CHRTOUT: -#### nco_process_chrtout.sh -##### Script Preparations: -You will need to specify two paths: - - The location of the hourly WRF-Hydro output CHRTOUT files. - - The location of where to save the monthly outputs. -##### Overview: --Create totals and averages --Clean names - -## One Year at a Time: +## Set-Up +### One Year at a Time: Load netcdf operator ``` @@ -82,7 +34,7 @@ Run the shell script. ``` Repeat for other variables. -## Multiple Years at Once: +### Multiple Years at Once: Ensure paths in shell scripts and slurm files are correct. @@ -104,6 +56,908 @@ scancel ``` Repeat for other variables. +## Shell Scripts +
+LDASOUT: + +### [nco_process_ldasout.sh](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/nco_process_ldasout.sh) +#### Script Preparations: +You will need to specify three paths: + - The location of the 3-hour WRF-Hydro output LDASOUT files. + - The location of the static soil properties file. + - The location of where to save the monthly outputs. +#### Overview: + - Process porosity & wilting point parameters + - Process accumulated flux & state differences + - Process mean states + - Cleanup names + +``` +#!/bin/bash +# ########################################################################### +# Bash shell script to create monthly aggregates of WRF-Hydro LDASOUT files. +# Requirements: NCO (tested with version 5.2.9) +# https://nco.sourceforge.net/ +# Usage: Call shell script with a single argument specifying the 4-digit +# year to process +# e.g., ./nco_process_ldasout.sh 2009 +# Developed: 06/11/2024, A. Dugger +# Updated: 4/7/2025, L. Staub +# ########################################################################### + +# ########################################################################### +# USER-SPECIFIED INPUTS: + +# Specify WRF-Hydro output directories: +# indir_base="/path/to/input/files/" #LDASOUT files +# soilparm="/path/to/soil_properties_file.nc" #soil properties static files + +indir_base="/caldera/hovenweep/projects/usgs/water/impd/hytest/niwaa_wrfhydro_monthly_huc12_aggregations_sample_data/LDASOUT" +soilparm="/caldera/hovenweep/projects/usgs/water/impd/hytest/niwaa_wrfhydro_monthly_huc12_aggregations_sample_data/static_niwaa_wrf_hydro_files/WRFHydro_soil_properties_CONUS_1km_NIWAAv1.0.nc" + + +# Specify output directory where monthly files should be written to monthly folder: +# (output files will be named water_YYYYMM.nc) + +outdir="/path/to/monthly/output/files/monthly" + + +# Check if the folder exists/create one +if [ ! -d "$outdir" ]; then + # Create the folder + mkdir -p "$outdir" + echo "Folder created: $outdir" +else + echo "Folder already exists: $outdir" +fi + +# ########################################################################### + +# ########################################################################### +# MAIN CODE. Probably no need to update anything below here. +# ########################################################################### + +# Initial setup. +shopt -s nullglob +uniqid=`uuidgen` +tmpfile=tmp${uniqid}.nc +paramfile=params${uniqid}.nc + + +# Process porosity and wilting point parameters for use in soilsat calculations. +# These parameters are currently uniform over depth layers. +#the two lines below could not run because the tmpfile and paramfile do not exist?? +rm ${tmpfile} +rm ${paramfile} +ncks -A -v smcmax,smcwlt $soilparm ${paramfile} +ncrename -O -d south_north,y ${paramfile} ${paramfile} +ncrename -O -d west_east,x ${paramfile} ${paramfile} +ncrename -O -d Time,time ${paramfile} ${paramfile} +ncpdq -O -a time,y,soil_layers_stag,x ${paramfile} ${paramfile} + +# Get the year to process from the command line argument. +# This setup is useful for scripting loops by year. +yr=${1} +echo "Processing year ${yr}" +YYYY=`printf %04d ${yr}` + +# Loop through months +for mo in $(seq 1 1 12); do + echo " Processing month ${mo}" + MM=`printf %02d ${mo}` + + # Setup input directory and output filename. 
+ indir="${indir_base}/${yr}" + outfile="${outdir}/water_${YYYY}${MM}.nc" + rm $outfile + + # Grab the processing start time so we can track how long this takes. + start_time=`date -u +%s` + + # Processing accumulated flux and state diffs + # Adding one file so we can do a proper diff over accumulated terms + # Resets happen at 00Z on the first day of month every 3 months + # e.g., 197904010300.LDASOUT_DOMAIN1 to 197904302100.LDASOUT_DOMAIN1 + last_file_datetime=`date -d "${YYYY}${MM}01 + 1 month - 3 hour" +%Y%m%d%H` + firstfile=`echo ${indir}/${YYYY}${MM}010000.LDASOUT_DOMAIN1` + lastfile=`echo ${indir}/${last_file_datetime}00.LDASOUT_DOMAIN1` + + echo " $firstfile $lastfile" + + if [ -f "${firstfile}" -a -f "${lastfile}" ]; then + echo " Processing diffs" + echo " first $firstfile" + echo " last $lastfile" + echo " output $outfile" + ncdiff $lastfile $firstfile $tmpfile + # Calculate depth-mean soil moisture by averaging over column by layer depths: (0.1, 0.3, 0.6, 1.0) = 2.0 + ncap2 -O -F -s "deltaSOILM_depthmean=float((SOIL_M(:,:,1,:)*0.1+SOIL_M(:,:,2,:)*0.3+SOIL_M(:,:,3,:)*0.6+SOIL_M(:,:,4,:)*1.0)/2.0)" ${tmpfile} ${tmpfile} + if [ "${mo}" -eq 10 ]; then + ncks -h -A -v SOIL_M,SNEQV,deltaSOILM_depthmean ${tmpfile} ${outfile} + ncks -h -A -v ACCET,UGDRNOFF,ACSNOW ${lastfile} ${outfile} + else + ncks -h -A -v ACCET,UGDRNOFF,SOIL_M,SNEQV,ACSNOW,deltaSOILM_depthmean ${tmpfile} ${outfile} + fi + rm ${tmpfile} + ncrename -h -v ACCET,deltaACCET ${outfile} + ncrename -h -v ACSNOW,deltaACSNOW ${outfile} + ncrename -h -v UGDRNOFF,deltaUGDRNOFF ${outfile} + ncrename -h -v SOIL_M,deltaSOILM ${outfile} + ncrename -h -v SNEQV,deltaSNEQV ${outfile} + + # Processing mean states + # Averaging from 00Z of first day or month to 21Z of last day of month + # Compiling list of files + # e.g., 200506150500.LDASOUT_DOMAIN1 + infiles=(${indir}/${YYYY}${MM}*.LDASOUT_DOMAIN1) + infiles_list=`echo "${infiles[*]}"` + count=${#infiles[@]} + echo " Processing means" + echo " found $count files" + echo " first ${infiles[0]}" + echo " last ${infiles[-1]}" + ncra -O -y avg -v SOIL_M,SNEQV ${infiles_list} ${tmpfile} + # Calculate depth-mean soil moisture by averaging over column by layer depths: (0.1, 0.3, 0.6, 1.0) = 2.0 + ncap2 -O -F -s "avgSOILM_depthmean=float((SOIL_M(:,:,1,:)*0.1+SOIL_M(:,:,2,:)*0.3+SOIL_M(:,:,3,:)*0.6+SOIL_M(:,:,4,:)*1.0)/2.0)" ${tmpfile} ${tmpfile} + # Bring in porosity and calculate soilsat + # Note that porosity is uniform with depth, so it doesn't matter what layer we use + ncks -A -v smcmax ${paramfile} ${tmpfile} + ncap2 -O -F -s "avgSOILSAT=float(avgSOILM_depthmean/smcmax(:,:,1,:))" ${tmpfile} ${tmpfile} + ncrename -h -v SOIL_M,avgSOILM ${tmpfile} + ncrename -h -v SNEQV,avgSNEQV ${tmpfile} + # Calculate new wilting point adjusted variables requested by USGS + # Note that wilting point is uniform with depth, so it doesn't matter what layer we use + ncks -A -v smcwlt ${paramfile} ${tmpfile} + ncap2 -O -F -s "avgSOILM_wltadj_depthmean=float(avgSOILM_depthmean-smcwlt(:,:,1,:))" ${tmpfile} ${tmpfile} + ncap2 -O -F -s "avgSOILSAT_wltadj_top1=float((avgSOILM(:,:,1,:)-smcwlt(:,:,1,:))/(smcmax(:,:,1,:)-smcwlt(:,:,1,:)))" ${tmpfile} ${tmpfile} + # Combine average file with delta file + ncks -h -A -v avgSOILM,avgSNEQV,avgSOILM_depthmean,avgSOILSAT,avgSOILM_wltadj_depthmean,avgSOILSAT_wltadj_top1 ${tmpfile} ${outfile} + rm ${tmpfile} + + # Cleanup names and attributes. 
+ echo "Cleaning up attributes" + ncatted -O -h -a valid_range,,d,, ${outfile} ${outfile} + ncatted -O -h -a cell_methods,,d,, ${outfile} ${outfile} + ncatted -O -h -a long_name,deltaACCET,o,c,"Change in accumulated evapotranspiration (month end minus month start)" ${outfile} ${outfile} + ncatted -O -h -a long_name,deltaACSNOW,o,c,"Change in accumulated snowfall (month end minus month start)" ${outfile} ${outfile} + ncatted -O -h -a long_name,deltaSNEQV,o,c,"Change in snow water equivalent (month end minus month start)" ${outfile} ${outfile} + ncatted -O -h -a long_name,deltaSOILM,o,c,"Change in layer volumetric soil moisture, ratio of water volume to soil volume (month end minus month start)" ${outfile} ${outfile} + ncatted -O -h -a long_name,deltaUGDRNOFF,o,c,"Change in accumulated underground runoff (month end minus month start)" ${outfile} ${outfile} + ncatted -O -h -a long_name,deltaSOILM_depthmean,o,c,"Change in depth-mean volumetric soil moisture, ratio of water volume to soil volume (month end minus month start)" ${outfile} ${outfile} + ncatted -O -h -a long_name,avgSNEQV,o,c,"Average snow water equivalent over month" ${outfile} ${outfile} + ncatted -O -h -a long_name,avgSOILM,o,c,"Average layer volumetric soil moisture (ratio of water volume to soil volume) over month" ${outfile} ${outfile} + ncatted -O -h -a long_name,avgSOILM_depthmean,o,c,"Average depth-mean volumetric soil moisture (ratio of water volume to soil volume) over month" ${outfile} ${outfile} + ncatted -O -h -a long_name,avgSOILM_wltadj_depthmean,o,c,"Average depth-mean volumetric soil moisture (ratio of water volume to soil volume) minus wilting point over month" ${outfile} ${outfile} + ncatted -O -h -a long_name,avgSOILSAT,o,c,"Average fractional soil saturation (soil moisture divided by maximum water content) over month" ${outfile} ${outfile} + ncatted -O -h -a long_name,avgSOILSAT_wltadj_top1,o,c,"Average fractional soil saturation above wilting point (soil moisture minus wilting point divided by maximum water content minus wilting point) over top layer (top 10cm) over month" ${outfile} ${outfile} + ncatted -O -h -a units,avgSOILSAT,o,c,"fraction (0-1)" ${outfile} ${outfile} + ncatted -O -h -a units,avgSOILSAT_wltadj_top1,o,c,"fraction (0-1)" ${outfile} ${outfile} + + # Wrap up the month. + end_time=`date -u +%s` + elapsed=`echo "$end_time - $start_time" | bc` + echo " Done aggregating hourly values : "${YYYY}"-"${MM}" "$elapsed" seconds since start time." + + else + # We didn't find any files for this year+month. + echo " Missing files. Skipping month." + + fi + +done + +rm ${paramfile} +``` +
+ +
+ +GWOUT: + +### [nco_process_gwout.sh](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/nco_process_gwout.sh) +#### Script Preparations: +You will need to specify two paths: + - The location of the hourly WRF-Hydro output GWOUT files. + - The location of where to save the monthly outputs. + +*Note: this script has some additional lines of code to deal with filetypes in the depth variable. Renaming the variable seems to fix this bug. Another option is to use older version of the NCO module- this has not been explored yet. +#### Overview: + - Process accumulated flux & state differences + - Rename "depth" column to "bucket_depth" + - Process sums and means + - Process flow totals + - Process depth average + - Cleanup names + +``` +#!/bin/bash +# ########################################################################### +# Bash shell script to create monthly aggregates of WRF-Hydro GWOUT files. +# Requirements: NCO (tested with version 5.2.9) +# https://nco.sourceforge.net/ +# Usage: Call shell script with a single argument specifying the 4-digit +# year to process +# e.g., ./nco_process_gwout.sh 2009 +# Developed: 06/11/2024, A. Dugger +# Updated: 4/7/2025 L.Staub +# ########################################################################### + +# ########################################################################### +# USER-SPECIFIED INPUTS: + +# Specify WRF-Hydro output directory: +#indir_base="/path/to/input/files" +indir_base="/caldera/hovenweep/projects/usgs/water/impd/hytest/niwaa_wrfhydro_monthly_huc12_aggregations_sample_data/GWOUT" + +# Specify output directory where monthly files should be written to monthly folder: +# (output files will be named gw_YYYYMM.nc) + +outdir="/path/to/monthly/output/files/monthly" + + +# Check if the folder exists/create one +if [ ! -d "$outdir" ]; then + # Create the folder + mkdir -p "$outdir" + echo "Folder created: $outdir" +else + echo "Folder already exists: $outdir" +fi + +# ########################################################################### + +# ########################################################################### +# MAIN CODE. Probably no need to update anything below here. +# ########################################################################### + +# Initial setup. +shopt -s nullglob +uniqid=`uuidgen` +tmpfile=tmp${uniqid}.nc + +mkdir $outdir + +# Get the year to process from the command line argument. +# This setup is useful for scripting loops by year. +yr=${1} +echo "Processing year ${yr}" +YYYY=`printf %04d ${yr}` + +# Loop through months +for mo in $(seq 1 1 12); do + echo " Processing month ${mo}" + MM=`printf %02d ${mo}` + + # Calculate next year and month for diff calculations + next_yr=${yr} + next_mo=`echo "${mo} + 1" | bc` + if [ "${next_mo}" -eq 13 ]; then + next_mo=1 + next_yr=`echo "${yr} + 1" | bc` + fi + + # Setup some print variables + MM2=`printf %02d ${next_mo}` + YYYY2=`printf %04d ${next_yr}` + + # Setup input directory and output filename. + indir="${indir_base}/${yr}/" + indir_next="${indir_base}/${next_yr}/" + outfile="${outdir}/gw_${YYYY}${MM}.nc" + rm $outfile + + # Grab the processing start time so we can track how long this takes. 
+ start_time=`date -u +%s` + + # Processing accumulated flux and state diffs + firstfile=`echo ${indir}/${YYYY}${MM}010100.GWOUT_DOMAIN1` + lastfile=`echo ${indir_next}/${YYYY2}${MM2}010000.GWOUT_DOMAIN1` + + echo " $firstfile $lastfile" + + if [ -f "${firstfile}" -a -f "${lastfile}" ]; then + echo " Processing diffs" + echo " first $firstfile" + echo " last $lastfile" + echo " output $outfile" + ncdiff $lastfile $firstfile $tmpfile + ncks -h -A -v depth ${tmpfile} ${outfile} + rm ${tmpfile} + ncrename -h -v depth,deltaDepth ${outfile} + + # Compiling list of files + # e.g., 200506150500.GWOUT_DOMAIN1 + infiles=(${indir}/${YYYY}${MM}*.GWOUT_DOMAIN1) + infiles_list=`echo "${infiles[*]}"` + count=${#infiles[@]} + # Check and rename the variable "depth" to "bucket_depth" if not already renamed + echo " Checking and renaming 'depth' to 'bucket_depth' if necessary" + for infile in "${infiles[@]}"; do + # Check if "depth" variable exists using ncdump + if ncdump -h "${infile}" | grep -q 'depth'; then + # Rename the variable only if "depth" exists + echo " Renaming 'depth' to 'bucket_depth' in ${infile}" + ncrename -h -v .depth,bucket_depth "${infile}" + else + echo " 'depth' already renamed in ${infile}, skipping" + fi +done + + echo " Processing sums and means" + echo " found $count files" + echo " first ${infiles[0]}" + echo " last ${infiles[-1]}" + # Create totals and averages. + echo " Processing flow totals" + ncea -h -y ttl -v inflow,outflow ${infiles_list} ${tmpfile} + ncks -h -A -v inflow,outflow ${tmpfile} ${outfile} + rm ${tmpfile} + echo " Processing depth average" + #ncra -O -y avg -v depth ${infiles_list} tmpavg_gw.nc # does not work since no record dim + ncea -h -y avg -v bucket_depth ${infiles_list} ${tmpfile} + ncks -h -A -v bucket_depth ${tmpfile} ${outfile} + rm ${tmpfile} + ncap2 -O -s "inflow=float(inflow*3600)" ${outfile} ${outfile} + ncap2 -O -s "outflow=float(outflow*3600)" ${outfile} ${outfile} + ncrename -h -v inflow,totInflow ${outfile} + ncrename -h -v outflow,totOutflow ${outfile} + ncrename -h -v depth,bucket_depth ${outfile} + + # Cleanup names and attributes. + ncatted -O -h -a valid_range,,d,, ${outfile} ${outfile} + ncatted -O -h -a cell_methods,,d,, ${outfile} ${outfile} + ncatted -O -h -a long_name,totInflow,o,c,"Total inflow volume over momth" ${outfile} ${outfile} + ncatted -O -h -a long_name,totOutflow,o,c,"Total outflow volume over momth" ${outfile} ${outfile} + ncatted -O -h -a long_name,deltaDepth,o,c,"Change in baseflow bucket storage (month end minus month start)" ${outfile} ${outfile} + ncatted -O -h -a long_name,avgDepth,o,c,"Average baseflow bucket storage over month" ${outfile} ${outfile} + ncatted -O -h -a units,totInflow,m,c,"m^3" ${outfile} + ncatted -O -h -a units,totOutflow,m,c,"m^3" ${outfile} + ncatted -O -h -a units,deltaDepth,m,c,"mm" ${outfile} + ncatted -O -h -a units,bucket_depth,m,c,"mm" ${outfile} + + # Wrap up the month. + end_time=`date -u +%s` + elapsed=`echo "$end_time - $start_time" | bc` + echo " Done aggregating hourly values : "${YYYY}"-"${MM}" "$elapsed" seconds since start time." + + else + # We didn't find any files for this year+month. + echo " Missing files. Skipping month." + + fi + +done +``` + +
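+Because of the `depth`/`bucket_depth` issue noted above, it can be worth confirming which variable name is present in a sample hourly GWOUT file before launching a full year; the path and timestamp below are placeholders:
+
+```
+# The script renames depth -> bucket_depth file-by-file; this just checks one file up front
+ncdump -h /path/to/GWOUT/2011/201110010100.GWOUT_DOMAIN1 | grep depth
+```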
+ +
+LDASIN: + +### [nco_process_clim.sh](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/nco_process_clim.sh) +#### Script Preparations: +You will need to specify two paths: + - The location of the hourly CONUS404-BA output LDASIN files. + - The location of where to save the monthly outputs. + +*Note: this script has some additional lines of code to deal with this data being organized by Water Year. +#### Overview: + - Create totals and averages + - Cleanup names + +``` +#!/bin/bash +############################################################################ +# Bash shell script to create monthly aggregates of WRF-Hydro forcing files. +# Requirements: NCO (tested with version 5.2.9) +# https://nco.sourceforge.net/ +# Usage: Call shell script with a single argument specifying the 4-digit +# year to process +# e.g., ./nco_process_clim.sh 2009 +# Developed: 06/11/2024, A. Dugger +# Updated: 4/7/2025 L. Staub +############################################################################ + +############################################################################ +# USER-SPECIFIED INPUTS: + +# Specify input forcing directory: +# (assumes forcings are organized by water year) +#indir_base="/path/to/met/forcings/" + +indir_base="/caldera/hovenweep/projects/usgs/water/impd/hytest/niwaa_wrfhydro_monthly_huc12_aggregations_sample_data/LDASIN" + +# Specify output directory where monthly files should be written to monthly folder: +# (output files will be named clim_YYYYMM.nc) + +outdir="/path/to/monthly/output/files/monthly" + +############################################################################ + +############################################################################ +# MAIN CODE. Probably no need to update anything below here. +############################################################################ + +# Initial setup. +shopt -s nullglob +uniqid=`uuidgen` +tmpfile=tmp${uniqid}.nc + +mkdir $outdir + +# Get the year to process from the command line argument. +# This setup is useful for scripting loops by year. +yr=${1} +echo "Processing year ${yr}" +YYYY=`printf %04d ${yr}` + +# Loop through months +for mo in $(seq 1 1 12); do + echo " Processing month ${mo}" + MM=`printf %02d ${mo}` + + # Calculate water year for finding folder name. + wy_yr=${yr} + if [ "${mo}" -ge 10 ]; then + wy_yr=`echo "${wy_yr} + 1" | bc` + fi + + # Setup input directory and output filename. + indir="${indir_base}/WY${wy_yr}/" + outfile="${outdir}/clim_${YYYY}${MM}.nc" + + # Grab the processing start time so we can track how long this takes. + start_time=`date -u +%s` + + # Compiling list of files + # e.g., 200506150500.LDASIN_DOMAIN1 + infiles=(${indir}/${YYYY}${MM}*.LDASIN_DOMAIN1) + count=${#infiles[@]} + echo " Found $count files" + + # Check if we found files. Otherwise skip. + if [ ${count} -gt 0 ]; then + echo " Processing sums and means" + echo " first ${infiles[0]}" + echo " last ${infiles[-1]}" + echo " output $outfile" + + infiles_list=`echo "${infiles[*]}"` + + # Create totals and averages. + # Start with precip (sum) and temperature (average). + ncra -O -h -y ttl -v RAINRATE ${infiles_list} ${outfile} + ncap2 -O -s "RAINRATE=float(RAINRATE*3600)" ${outfile} ${outfile} + ncra -O -h -y avg -v T2D ${infiles_list} ${tmpfile} + ncks -h -A -v T2D ${tmpfile} ${outfile} + rm ${tmpfile} + # Some additional met variables. Remove comments if you want to include. 
+ #ncra -O -h -y avg -v Q2D ${infiles_list} ${tmpfile} + #ncks -h -A -v Q2D ${tmpfile} ${outfile} + #rm ${tmpfile} + #ncra -O -h -y avg -v SWDOWN ${infiles_list} ${tmpfile} + #ncks -h -A -v SWDOWN ${tmpfile} ${outfile} + #rm ${tmpfile} + #ncra -O -h -y avg -v LWDOWN ${infiles_list} ${tmpfile} + #ncks -h -A -v LWDOWN ${tmpfile} ${outfile} + #rm ${tmpfile} + #ncra -O -h -y avg -v U2D ${infiles_list} ${tmpfile} + #ncks -h -A -v U2D ${tmpfile} ${outfile} + #rm ${tmpfile} + #ncra -O -h -y avg -v V2D ${infiles_list} ${tmpfile} + #ncks -h -A -v V2D ${tmpfile} ${outfile} + #rm ${tmpfile} + #ncap2 -O -s "WND2D=float(sqrt(U2D^2 + V2D^2))" ${outfile} ${outfile} + #ncks -O -h -x -v V2D ${outfile} ${outfile} + #ncks -O -h -x -v U2D ${outfile} ${outfile} + + # Cleanup names and attributes. + # Remove the comments if you are including additional met variables. + ncrename -h -v RAINRATE,totPRECIP ${outfile} + ncrename -h -v T2D,avgT2D ${outfile} + #ncrename -h -v Q2D,avgQ2D ${outfile} + #ncrename -h -v SWDOWN,avgSWDOWN ${outfile} + #ncrename -h -v LWDOWN,avgLWDOWN ${outfile} + #ncrename -h -v WND2D,avgWND2D ${outfile} + ncatted -O -h -a units,totPRECIP,o,c,"mm" ${outfile} ${outfile} + ncatted -O -h -a long_name,totPRECIP,o,c,"Total precipitation over the month" ${outfile} ${outfile} + ncatted -O -h -a long_name,avgT2D,o,c,"Average 2-m air temperature over the month" ${outfile} ${outfile} + #ncatted -O -h -a long_name,avgQ2D,o,c,"Average 2-m specific humidity over the month" ${outfile} ${outfile} + #ncatted -O -h -a long_name,avgSWDOWN,o,c,"Average downward shortwave radiation over the month" ${outfile} ${outfile} + #ncatted -O -h -a long_name,avgLWDOWN,o,c,"Average downward longwave radiation over the month" ${outfile} ${outfile} + #ncatted -O -h -a long_name,avgWND2D,o,c,"Average 2-m net windspeed over the month" ${outfile} ${outfile} + + # Wrap up the month. + end_time=`date -u +%s` + elapsed=`echo "$end_time - $start_time" | bc` + echo " Done aggregating hourly values : "${YYYY}"-"${MM}" "$elapsed" seconds since start time." + + else + # We didn't find any files for this year+month. + echo " Missing files. Skipping month." + + fi + +done +``` +
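+The commented-out blocks above show the intended pattern for the optional meteorological variables. As an illustration only, the same steps applied to downward shortwave radiation for a single month would look like the sketch below (paths and the month are placeholders):
+
+```
+# Append monthly-mean downward shortwave radiation to an existing clim file
+ncra -O -h -y avg -v SWDOWN /path/to/met/forcings/WY2012/201206*.LDASIN_DOMAIN1 tmp_swdown.nc
+ncks -h -A -v SWDOWN tmp_swdown.nc /path/to/monthly/output/files/monthly/clim_201206.nc
+ncrename -h -v SWDOWN,avgSWDOWN /path/to/monthly/output/files/monthly/clim_201206.nc
+rm tmp_swdown.nc
+```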
+ +
+CHRTOUT: + +### [nco_process_chrtout.sh](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/nco_process_chrtout.sh) +#### Script Preparations: +You will need to specify two paths: + - The location of the hourly WRF-Hydro output CHRTOUT files. + - The location of where to save the monthly outputs. +#### Overview: + - Create totals and averages + - Clean names + +``` +#!/bin/bash +# ########################################################################### +# Bash shell script to create monthly aggregates of WRF-Hydro CHRTOUT files. +# Requirements: NCO (tested with version 5.2.9) +# https://nco.sourceforge.net/ +# Usage: Call shell script with a single argument specifying the 4-digit +# year to process +# e.g., ./nco_process_chrtout.sh 2009 +# Developed: 06/11/2024, A. Dugger +# Updated: 4/7/2025, L. Staub +# ########################################################################### + +# ########################################################################### +# USER-SPECIFIED INPUTS: + +# Specify WRF-Hydro output directory: +# (assumes files are organized by water year) +#indir_base="/path/to/input/files/" +indir_base="/caldera/hovenweep/projects/usgs/water/impd/hytest/niwaa_wrfhydro_monthly_huc12_aggregations_sample_data/CHRTOUT" + +# Specify output directory where monthly files should be written to monthly folder: +# (output files will be named chrt_YYYYMM.nc) +# Have all outputs saved to the same folder + +outdir="/path/to/monthly/output/files/monthly" + + +# Check if the folder exists/create one +if [ ! -d "$outdir" ]; then + # Create the folder + mkdir -p "$outdir" + echo "Folder created: $outdir" +else + echo "Folder already exists: $outdir" +fi + + +# ########################################################################### + +# ########################################################################### +# MAIN CODE. Probably no need to update anything below here. +# ########################################################################### + +# Initial setup. +shopt -s nullglob + + +# Get the year to process from the command line argument. +# This setup is useful for scripting loops by year. +yr=${1} +echo "Processing year ${yr}" +YYYY=`printf %04d ${yr}` + +# Loop through months +for mo in $(seq 1 1 12); do + echo " Processing month ${mo}" + MM=`printf %02d ${mo}` + + # Setup input directory and output filename. + indir="${indir_base}/${yr}/" + outfile="${outdir}/chrt_${YYYY}${MM}.nc" + + # Grab the processing start time so we can track how long this takes. + start_time=`date -u +%s` + + # Compiling list of files + # e.g., 200506150500.CHRTOUT_DOMAIN1 + infiles=(${indir}/${YYYY}${MM}*.CHRTOUT_DOMAIN1) + count=${#infiles[@]} + echo " Found $count files" + + # Check if we found files. Otherwise skip. + if [ ${count} -gt 0 ]; then + echo " Processing sums and means" + echo " first ${infiles[0]}" + echo " last ${infiles[-1]}" + echo " output $outfile" + + infiles_list=`echo "${infiles[*]}"` + + # Create totals and averages. 
+ ncea -h -y ttl -v streamflow,qSfcLatRunoff,qBucket ${infiles_list} ${outfile} + ncap2 -O -s "streamflow=float(streamflow*3600)" ${outfile} ${outfile} + ncap2 -O -s "qSfcLatRunoff=float(qSfcLatRunoff*3600)" ${outfile} ${outfile} + ncap2 -O -s "qBucket=float(qBucket*3600)" ${outfile} ${outfile} + ncrename -h -v streamflow,totStreamflow ${outfile} + ncrename -h -v qSfcLatRunoff,totqSfcLatRunoff ${outfile} + ncrename -h -v qBucket,totqBucket ${outfile} + + # Cleanup names and attributes. + ncatted -O -h -a valid_range,,d,, ${outfile} ${outfile} + ncatted -O -h -a cell_methods,,d,, ${outfile} ${outfile} + ncatted -O -h -a long_name,totStreamflow,m,c,"Total streamflow volume over momth" ${outfile} ${outfile} + ncatted -O -h -a long_name,totqSfcLatRunoff,m,c,"Total surface flow volume over momth" ${outfile} ${outfile} + ncatted -O -h -a long_name,totqBucket,m,c,"Total baseflow volume over month" ${outfile} ${outfile} + ncatted -O -h -a units,totStreamflow,m,c,"m^3" ${outfile} + ncatted -O -h -a units,totqSfcLatRunoff,m,c,"m^3" ${outfile} + ncatted -O -h -a units,totqBucket,m,c,"m^3" ${outfile} + + # Wrap up the month. + end_time=`date -u +%s` + elapsed=`echo "$end_time - $start_time" | bc` + echo " Done aggregating hourly values : "${YYYY}"-"${MM}" "$elapsed" seconds since start time." + + else + # We didn't find any files for this year+month. + echo " Missing files. Skipping month." + + fi + +done + +``` + +
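+A quick sanity check on one monthly CHRTOUT aggregate is to print the totals for a single reach; the `feature_id` dimension name, the index, and the path below are assumptions:
+
+```
+# Print total streamflow, surface runoff, and baseflow volumes (m^3) for the first reach
+ncks -H -d feature_id,0 -v totStreamflow,totqSfcLatRunoff,totqBucket /path/to/monthly/output/files/monthly/chrt_201206.nc
+```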
+ +## Slurm files +
+LDASOUT: + +### [ldasout_nco.slurm](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/ldasout_nco.slurm) +#### Script Preparations: +You will need to specify three paths: + - Set the --array variable to desirable time step + - The location of the WRF-Hydro output LDASOUT folder. + - The location of the shell script +#### Overview: +The slurm file sets up the parallel process. + +``` +#!/bin/bash +############################################################################ +# Parallelized slurm file: summarize hourly data into monthly files. +# +# Usage: Call shell script using associated slurm file +# e.g sbatch -o +# Developed: 1/25/25, L. Staub +# Updated: 4/7/25, L. Staub +############################################################################ + +############################################################################ + +#SBATCH -p cpu # set partition +#SBATCH -A impd # set account +#SBATCH --job-name=ldasout_nco # Job name +#SBATCH --nodes=1 # Number of nodes (adjust as needed) +#SBATCH --ntasks=1 # Number of tasks (one task per node/year) +#SBATCH --cpus-per-task=1 # CPUs per task (adjust as needed) +#SBATCH --time=05:00:00 # Time limit (adjust as needed) +#SBATCH --mail-type=ALL +#SBATCH --mail-user= # enter email +#SBATCH -o output_%A_%a.out # set path for job output files to be saved(A=main task a=subtask) +#SBATCH --array=2011-2013 # set time step to process + +# Set the source directory containing year folders +SOURCE_DIR="/path/to/LDASOUT" + +# Load necessary modules +module load nco + +# Record the start time +global_start=$(date +%s) +echo "Job started at $(date)" + +#Run the temporal aggregation + +srun /path/to/shell/script/nco_process_ldasout.sh $SLURM_ARRAY_TASK_ID + + +# Record the end time +global_end=$(date +%s) +global_elapsed=$((global_end - global_start)) +echo "Job finished at $(date)" +echo "Total job runtime: $global_elapsed seconds." + +``` +
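+The `--array` line controls which years are processed and how many run at once. A few examples of standard Slurm job-array syntax (the year ranges are illustrative):
+
+```
+#SBATCH --array=2011-2013      # one task per calendar year, 2011 through 2013
+#SBATCH --array=2011,2013      # selected years only
+#SBATCH --array=1979-2022%4    # a longer period, with at most 4 years running concurrently
+```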
+ +
+GWOUT: + +### [gwout_nco.slurm](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/gwout_nco.slurm) +#### Script Preparations: +You will need to specify three paths: + - Set the --array variable to desirable time step + - The location of the WRF-Hydro output GWOUT folder. + - The location of the shell script +#### Overview: +The slurm file sets up the parallel process. + +``` +#!/bin/bash +############################################################################ +# Parallelized slurm file: summarize hourly data into monthly files. +# +# Usage: Call shell script using associated slurm file +# e.g sbatch -o +# Developed: 1/25/25, L. Staub +# Updated: 4/7/25, L. Staub +############################################################################ + +############################################################################ + +#SBATCH -p cpu # set partition +#SBATCH -A impd # set account +#SBATCH --job-name=gwout_nco # Job name +#SBATCH --nodes=1 # Number of nodes (adjust as needed) +#SBATCH --ntasks=1 # Number of tasks (one task per node/year) +#SBATCH --cpus-per-task=1 # CPUs per task (adjust as needed) +#SBATCH --time=05:00:00 # Time limit (adjust as needed) +#SBATCH --mail-type=ALL +#SBATCH --mail-user= # enter email +#SBATCH -o output_%A_%a.out # set path for job output files to be saved(A=main task a=subtask) +#SBATCH --array=2011-2013 # set time step to process + +# Set the source directory containing year folders +SOURCE_DIR="/path/to/GWOUT" + +# Load necessary modules +module load nco + +# Record the start time +global_start=$(date +%s) +echo "Job started at $(date)" + +#Run the temporal aggregation + +srun /path/to/shell/script/nco_process_gwout.sh $SLURM_ARRAY_TASK_ID + +# Record the end time +global_end=$(date +%s) +global_elapsed=$((global_end - global_start)) +echo "Job finished at $(date)" +echo "Total job runtime: $global_elapsed seconds." + +``` +
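+Each array task writes its own log via `-o output_%A_%a.out`, where `%A` is the parent job ID and `%a` is the array index (here, the year). To follow a single year's progress (the job ID below is a placeholder):
+
+```
+ls output_*_2012.out             # locate the log for array task 2012
+tail -f output_1234567_2012.out
+```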
+ +
+LDASIN: + +### [ldasin_nco.slurm](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/ldasin_nco.slurm) +#### Script Preparations: +You will need to specify three paths: + - Set the --array variable to desirable time step + - The location of the WRF-Hydro output LDASIN folder. + - The location of the shell script +#### Overview: +The slurm file sets up the parallel process. + +``` +#!/bin/bash +############################################################################ +# Parallelized slurm file: summarize hourly data into monthly files. +# +# Usage: Call shell script using associated slurm file +# e.g sbatch -o +# Developed: 1/25/25, L. Staub +# Updated: 4/7/25, L. Staub +############################################################################ + +############################################################################ + +#SBATCH -p cpu # set partition +#SBATCH -A impd # set account +#SBATCH --job-name=ldasin_nco # Job name +#SBATCH --nodes=1 # Number of nodes (adjust as needed) +#SBATCH --ntasks=1 # Number of tasks (one task per node/year) +#SBATCH --cpus-per-task=1 # CPUs per task (adjust as needed) +#SBATCH --time=05:00:00 # Time limit (adjust as needed) +#SBATCH --mail-type=ALL +#SBATCH --mail-user= # enter email +#SBATCH -o output_%A_%a.out # set path for job output files to be saved(A=main task a=subtask) +#SBATCH --array=2011-2013 # set time step to process + +# Set the source directory containing year folders +SOURCE_DIR="/path/to/LDASIN" + +# Load necessary modules +module load nco + +# Record the start time +global_start=$(date +%s) +echo "Job started at $(date)" + +#Run the temporal aggregation + +srun /path/to/shell/script/nco_process_ldasin.sh $SLURM_ARRAY_TASK_ID + + +# Record the end time +global_end=$(date +%s) +global_elapsed=$((global_end - global_start)) +echo "Job finished at $(date)" +echo "Total job runtime: $global_elapsed seconds." + +``` + +
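+Note that the clim script maps each calendar-year array task onto water-year folders: months 1-9 are read from WY&lt;year&gt; and months 10-12 from WY&lt;year+1&gt;. A quick way to confirm the forcing folders are laid out as expected (the forcing path is a placeholder):
+
+```
+ls /path/to/met/forcings/WY2011/201101*.LDASIN_DOMAIN1 | head -3   # Jan 2011 files live in WY2011
+ls /path/to/met/forcings/WY2012/201110*.LDASIN_DOMAIN1 | head -3   # Oct 2011 files live in WY2012
+```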
+ +
+CHRTOUT:
+
+### [chrtout_nco.slurm](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/chrtout_nco.slurm)
+#### Script Preparations:
+You will need to specify three paths:
+ - Set the --array variable to desirable time step
+ - The location of the WRF-Hydro output CHRTOUT folder.
+ - The location of the shell script
+#### Overview:
+The slurm file sets up the parallel process.
+
+```
+#!/bin/bash
+############################################################################
+# Parallelized slurm file: summarize hourly data into monthly files.
+#
+# Usage: Call shell script using associated slurm file
+#        e.g sbatch -o
+# Developed: 1/25/25, L. Staub
+# Updated: 4/7/25, L. Staub
+############################################################################
+
+############################################################################
+
+#SBATCH -p cpu                        # set partition
+#SBATCH -A impd                       # set account
+#SBATCH --job-name=chrtout_nco        # Job name
+#SBATCH --nodes=1                     # Number of nodes (adjust as needed)
+#SBATCH --ntasks=1                    # Number of tasks (one task per node/year)
+#SBATCH --cpus-per-task=1             # CPUs per task (adjust as needed)
+#SBATCH --time=05:00:00               # Time limit (adjust as needed)
+#SBATCH --mail-type=ALL
+#SBATCH --mail-user=                  # enter email
+#SBATCH -o output_%A_%a.out           # set path for job output files to be saved(A=main task a=subtask)
+#SBATCH --array=2011-2013             # set time step to process
+
+# Set the source directory containing year folders
+SOURCE_DIR="/path/to/CHRTOUT"
+
+# Load necessary modules
+module load nco
+
+# Record the start time
+global_start=$(date +%s)
+echo "Job started at $(date)"
+
+#run the temporal aggregation script
+
+srun /path/to/shell/script/nco_process_chrtout.sh $SLURM_ARRAY_TASK_ID
+
+# Record the end time
+global_end=$(date +%s)
+global_elapsed=$((global_end - global_start))
+echo "Job finished at $(date)"
+echo "Total job runtime: $global_elapsed seconds."
+
+
+```
+
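+Once the array tasks for all four products have finished, a simple completeness check is to count the monthly files per product for each year (the directory and file prefixes follow the shell scripts above; the year is a placeholder):
+
+```
+cd /path/to/monthly/output/files/monthly
+for prefix in water gw clim chrt; do
+  echo "${prefix} 2012: $(ls ${prefix}_2012*.nc 2>/dev/null | wc -l) of 12 files"
+done
+```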
+ + ## Results The following metrics will be generated with these scripts: @@ -192,20 +1046,20 @@ The following metrics will be generated with these scripts: + - + - + - @@ -229,7 +1083,7 @@ The following metrics will be generated with these scripts: - + diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/chrtout_nco.slurm b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/chrtout_nco.slurm index d866c9f5..616e07e4 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/chrtout_nco.slurm +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/chrtout_nco.slurm @@ -23,7 +23,7 @@ #SBATCH --array=2011-2013 # set time step to process # Set the source directory containing year folders -SOURCE_DIR="/caldera/hovenweep/projects/usgs/water/impd/hytest/niwaa_wrfhydro_monthly_huc12_aggregations_sample_data/CHRTOUT" +SOURCE_DIR="/path/to/CHRTOUT" # Load necessary modules module load nco @@ -34,7 +34,7 @@ echo "Job started at $(date)" #run the temporal aggregation script -srun /path/to/repo/hytest/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/nco_process_chrtout.sh $SLURM_ARRAY_TASK_ID +srun /path/to/shell/script/nco_process_chrtout.sh $SLURM_ARRAY_TASK_ID # Record the end time global_end=$(date +%s) diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/gwout_nco.slurm b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/gwout_nco.slurm index d8651426..0106c527 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/gwout_nco.slurm +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/gwout_nco.slurm @@ -23,7 +23,7 @@ #SBATCH --array=2011-2013 # set time step to process # Set the source directory containing year folders -SOURCE_DIR="/caldera/hovenweep/projects/usgs/water/impd/hytest/niwaa_wrfhydro_monthly_huc12_aggregations_sample_data/GWOUT" +SOURCE_DIR="/path/to/GWOUT" # Load necessary modules module load nco @@ -34,7 +34,7 @@ echo "Job started at $(date)" #Run the temporal aggregation -srun /path/to/repo/hytest/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/nco_process_gwout.sh $SLURM_ARRAY_TASK_ID +srun /path/to/shell/script/nco_process_gwout.sh $SLURM_ARRAY_TASK_ID # Record the end time global_end=$(date +%s) diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/ldasin_nco.slurm b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/ldasin_nco.slurm index f5973234..bfd8afcd 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/ldasin_nco.slurm +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/ldasin_nco.slurm @@ -23,7 +23,7 @@ #SBATCH --array=2011-2013 # set time step to process # Set the source directory containing year folders -SOURCE_DIR="/caldera/hovenweep/projects/usgs/water/impd/hytest/niwaa_wrfhydro_monthly_huc12_aggregations_sample_data/LDASIN" +SOURCE_DIR="/path/to/LDASIN" # Load necessary modules module load nco @@ -34,7 +34,7 @@ echo "Job started at $(date)" #Run the temporal aggregation -srun /path/to/repo/hytest/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/nco_process_ldasin.sh $SLURM_ARRAY_TASK_ID +srun /path/to/shell/script/nco_process_ldasin.sh 
$SLURM_ARRAY_TASK_ID # Record the end time diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/ldasout_nco.slurm b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/ldasout_nco.slurm index 9729c51d..62ffb554 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/ldasout_nco.slurm +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/ldasout_nco.slurm @@ -23,7 +23,7 @@ #SBATCH --array=2011-2013 # set time step to process # Set the source directory containing year folders -SOURCE_DIR="/caldera/hovenweep/projects/usgs/water/impd/hytest/niwaa_wrfhydro_monthly_huc12_aggregations_sample_data/LDASOUT" +SOURCE_DIR="/path/to/LDASOUT" # Load necessary modules module load nco @@ -34,7 +34,7 @@ echo "Job started at $(date)" #Run the temporal aggregation -srun /path/to/repo/hytest/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/nco_process_ldasout.sh $SLURM_ARRAY_TASK_ID +srun /path/to/shell/script/nco_process_ldasout.sh $SLURM_ARRAY_TASK_ID # Record the end time diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/nco_process_chrtout.sh b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/nco_process_chrtout.sh index eef536ab..c10b9b45 100755 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/nco_process_chrtout.sh +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/nco_process_chrtout.sh @@ -7,7 +7,7 @@ # year to process # e.g., ./nco_process_chrtout.sh 2009 # Developed: 06/11/2024, A. Dugger -# Updated: 4/7/2025, L. Staub +# Updated: 7/15/2025, L. 
Staub # ########################################################################### # ########################################################################### diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/01_2D_spatial_aggregation.ipynb b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/01_2D_spatial_aggregation.ipynb index b6ae28b7..8e4330f2 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/01_2D_spatial_aggregation.ipynb +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/01_2D_spatial_aggregation.ipynb @@ -151,7 +151,7 @@ "con.print(f'outDir exists: {outDir.is_dir()}')\n", "\n", "# Basename for output files - extension will be applied later\n", - "output_pattern = 'CONUS_HUC12_2D_20111001_20120930'\n", + "output_pattern = 'CONUS_HUC12_2D_WY2011_2013'\n", "\n", "# Other variables to help with the file output naming convention\n", "write_CSV = True\n", diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/02_1D_spatial_aggregation.ipynb b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/02_1D_spatial_aggregation.ipynb index e1dea6b7..6fa5e2ad 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/02_1D_spatial_aggregation.ipynb +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/02_1D_spatial_aggregation.ipynb @@ -118,7 +118,7 @@ "outDir = r'/path/to/outputs/agg_out'\n", "\n", "# Output filename pattern\n", - "output_pattern = 'CONUS_HUC12_1D_2011001_20120930'\n", + "output_pattern = 'CONUS_HUC12_1D_WY2011_2013'\n", "\n", "# Select output formats\n", "write_NC = True # Output netCDF file\n", diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/README.md b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/README.md index 10a3fe9b..5288fdba 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/README.md +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/README.md @@ -11,12 +11,12 @@ Tracking computation times for a 3-year subset of WRF-Hydro modeling application | **Script** | **Description** | **Datasets processed** | **Dask** | **Completion Time** | **Output** | | ------ | ------ | ------ | ------ | ------ | ------ | -| 01_2D_spatial_aggregation | Aggregation to HUC12s of 2-Dimensional variables | monthly LDASOUT & LDASIN | Yes | 2 hours | CONUS_HUC12_2D_20111001_20120930.nc | -| 02_1D_spatial_aggregation | Aggregation to HUC12s of 1-Dimensional variables | monthly GWOUT & CHRTOUT | No | 2.5 hours | CONUS_HUC12_1D_2011001_20120930.nc | +| 01_2D_spatial_aggregation | Aggregation to HUC12s of 2-Dimensional variables | monthly LDASOUT & LDASIN | Yes | 2 hours | CONUS_HUC12_2D_WY2011_2013.nc | +| 02_1D_spatial_aggregation | Aggregation to HUC12s of 1-Dimensional variables | monthly GWOUT & CHRTOUT | No | 2.5 hours | CONUS_HUC12_1D_WY2011_2013.nc | | usgs_common | python script containg functions used in aggregation | --- | No | --- | --- | ## Compute Environment Needs -Users will need to create and activate a conda environment using the [wrfhydro_huc12_agg.yml](wrfhydro_huc12_agg.yml) file to run the python script and notebooks. 
For this environment to work, the latest version of Miniforge should be installed in the user area on Hovenweep. Miniconda may work, but has not been tested with this workflow. +Users will need to create and activate a conda environment using the [wrfhydro_huc12_agg.yml](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/wrfhydro_huc12_agg.yml) file to run the python script and notebooks. For this environment to work, the latest version of Miniforge should be installed in the user area on Hovenweep. Miniconda may work, but has not been tested with this workflow. #### Ensure Miniforge is installed ``` @@ -58,13 +58,13 @@ Since this portion of the workflow utilizes Dask, it is important that the corre ## Instructions ### 1. Set-up -Confirm that the [usgs_common.py](wrfhydro_huc12_agg.yml) python script has the correct paths to the WRF-Hydro modeling application output static files under the "Domain Files" section. The paths currently are set up to point to the HyTEST directory on hovenweep where the 3-year subset of the data is stored. This script has multiple functions that are called into the 1-D and 2-D aggregation jupyter notebooks. +Confirm that the [usgs_common.py](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/usgs_common.py) python script has the correct paths to the WRF-Hydro modeling application output static files under the "Domain Files" section. The paths currently are set up to point to the HyTEST directory on hovenweep where the 3-year subset of the data is stored. This script has multiple functions that are called into the 1-D and 2-D aggregation jupyter notebooks. ### 2. 2-D Aggregation -The [2-Dimensional Aggregation jupyter notebook](01_2D_spatial_aggregation.ipynb) aggregates the 2-Dimensional WRF-Hydro modeling application outputs LDASOUT (monthly outputs named water_YYYYMM.nc) and LDASIN (monthly outputs named clim_YYYYMM.nc) to HUC12 basins, using the 1000 m grid file. The file paths for the LDASOUT and LDASIN monthly data, the 1000 m HUC12 grid file, and the location for the 2D aggregated outputs to be stored will need to be specified. This script will spin up a dask cluster to parallelize the aggregation, a link to the dask dashboard is provided to monitor workers during calculations. Once this script has finished processing, the dask cluster will need to be spun down and closed. The product from this script will be 1 netCDF file containing the spatially aggregated outputs of the 2-Dimensional WRF-Hydro monthly modeling application outputs for the years 2011-2013. +The [2-Dimensional Aggregation jupyter notebook](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/01_2D_spatial_aggregation.ipynb) aggregates the 2-Dimensional WRF-Hydro modeling application outputs LDASOUT (monthly outputs named water_YYYYMM.nc) and LDASIN (monthly outputs named clim_YYYYMM.nc) to HUC12 basins, using the 1000 m grid file. The file paths for the LDASOUT and LDASIN monthly data, the 1000 m HUC12 grid file, and the location for the 2D aggregated outputs to be stored will need to be specified. This script will spin up a dask cluster to parallelize the aggregation, a link to the dask dashboard is provided to monitor workers during calculations. Once this script has finished processing, the dask cluster will need to be spun down and closed. 
The product from this script will be 1 netCDF file containing the spatially aggregated outputs of the 2-Dimensional WRF-Hydro monthly modeling application outputs for the years 2011-2013. ### 3. 1-D Aggregation -The [1-Dimensional Aggregation jupyter notebook](02_1D_spatial_aggregation.ipynb) aggregates the 1-Dimensional WRF-Hydro modeling application outputs GWOUT (monthly outputs named gw_YYYYMM.nc) and CHRTOUT (monthly outputs named chrtout_YYYYMM.nc) to HUC12 basins, using the crosswalk csv file. The file paths for the GWOUT and CHRTOUT monthly data, the HUC12 crosswalk file, and the location for the 1D aggregated outputs to be stored will need to be specified. The product from this script will be 1 netCDF file containing the spatially aggregated outputs of the 1-Dimensional WRF-Hydro monthly modeling application outputs for the years 2011-2013. +The [1-Dimensional Aggregation jupyter notebook](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/02_1D_spatial_aggregation.ipynb) aggregates the 1-Dimensional WRF-Hydro modeling application outputs GWOUT (monthly outputs named gw_YYYYMM.nc) and CHRTOUT (monthly outputs named chrtout_YYYYMM.nc) to HUC12 basins, using the crosswalk csv file. The file paths for the GWOUT and CHRTOUT monthly data, the HUC12 crosswalk file, and the location for the 1D aggregated outputs to be stored will need to be specified. The product from this script will be 1 netCDF file containing the spatially aggregated outputs of the 1-Dimensional WRF-Hydro monthly modeling application outputs for the years 2011-2013. ## Variable Table
GWOUT totOutflowTotal outflow Total outflow volume over month--- m3
totInflowTotal inflow Total inflow volume over month--- m3
deltaDepthBaseflow bucket storage change Change in baseflow bucket storage (month end minus month start)--- mm
totStreamflow---Streamflow Total streamflow volume over month m3
diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/wrfhydro_huc12_agg.yml b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/wrfhydro_huc12_agg.yml index 70055121..73ec8ad6 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/wrfhydro_huc12_agg.yml +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/wrfhydro_huc12_agg.yml @@ -53,6 +53,7 @@ dependencies: - shapely - gdal=3.5.3=py311hadb6153_11 - fiona + - s3fs diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/01_Merge_1D_and_2D_files.ipynb b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/01_Merge_1D_and_2D_files.ipynb index cabd1abf..075b5a55 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/01_Merge_1D_and_2D_files.ipynb +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/01_Merge_1D_and_2D_files.ipynb @@ -33,11 +33,11 @@ "source": [ "# Input files\n", "#Paths for 2D and 1D aggregated files\n", - "in_file1 = r'/path/to/outputs/agg_out/CONUS_HUC12_WB_2D_19791001_20220930_2.nc'\n", - "in_file2 = r'/path/to/outputs/CONUS_HUC12_WB_1D_19791001_20220930.nc'\n", + "in_file1 = r'/path/to/outputs/agg_out/CONUS_HUC12_2D_WY2011_2013.nc'\n", + "in_file2 = r'/path/to/outputs/CONUS_HUC12_1D_WY2011_2013.nc'\n", "\n", "# Output file\n", - "out_file = r'/path/to/outputs/agg_out/CONUS_HUC12_WB_combined_19791001_20220930.nc'\n", + "out_file = r'/path/to/outputs/agg_out/CONUS_HUC12_WB_combined_WY2011_2013.nc'\n", "\n", "# Name the zone coordinate that contains the HUC12 IDs\n", "zone_name = 'WBDHU12'\n", diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/02_Format.ipynb b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/02_Format.ipynb index 21b997e7..fef90ab1 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/02_Format.ipynb +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/02_Format.ipynb @@ -51,7 +51,7 @@ "metadata": {}, "outputs": [], "source": [ - "in_nc = r'/path/to/outputs/agg_out/CONUS_HUC12_WB_combined_19791001_20220930.nc'\n", + "in_nc = r'/path/to/outputs/agg_out/CONUS_HUC12_WB_combined_WY2011_2013.nc'\n", "\n", "# Output directory\n", "outDir = r'/path/to/outputs/agg_out/'\n", diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/README.md b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/README.md index c2d61609..39a6f7c7 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/README.md +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/README.md @@ -12,19 +12,19 @@ Tracking computation times for a 3-year subset of WRF-Hydro modeling application | **Script** | **Description** | **Datasets processed** | **Dask** | **Completion Time** | **Output** | | ------ | ------ | ------ | ------ | ------ | ------ | -| 01_Merge_1D_and_2D_files | Combine 1-Dimensional and 2-Dimensional aggregations into one netcdf file | CONUS_HUC12_2D_20111001_20120930.nc & CONUS_HUC12_1D_2011001_20120930.nc | No | 10 min | CONUS_HUC12_WB_combined_19791001_20220930.nc | -| 02_Format | Formatting | CONUS_HUC12_WB_combined_19791001_20220930.nc | No | 10 min | huc12_monthly_wb_iwaa_wrfhydro_WY2011_2013.nc | +| 01_Merge_1D_and_2D_files | Combine 
1-Dimensional and 2-Dimensional aggregations into one netcdf file | CONUS_HUC12_2D_WY2011_2013.nc & CONUS_HUC12_1D_WY2011_2013.nc | No | 10 min | CONUS_HUC12_WB_combined_WY2011_2013.nc | +| 02_Format | Formatting | CONUS_HUC12_WB_combined_WY2011_2013.nc | No | 10 min | huc12_monthly_wb_iwaa_wrfhydro_WY2011_2013.nc | ## Compute Environment Needs -Users will need to create and activate a conda environment using the [wrfhydro_huc12_agg.yml](02_Spatial_Aggregation/wrfhydro_huc12_agg.yml) file to run the python script and notebooks. For this environment to work, the latest version of Miniforge should be installed in the user area on Hovenweep. Miniconda may work, but has not been tested with this workflow. See the README documentation in the [Spatial Aggregation](02_Spatial_Aggregation/) folder for first time environment set up instructions. +Users will need to create and activate a conda environment using the [wrfhydro_huc12_agg.yml](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/wrfhydro_huc12_agg.yml) file to run the python script and notebooks. For this environment to work, the latest version of Miniforge should be installed in the user area on Hovenweep. Miniconda may work, but has not been tested with this workflow. See the README documentation in the [Spatial Aggregation](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation) folder for first time environment set up instructions. ## Instructions ### 1. Merge -The [Merge 1-D and 2-D jupyter notebook](01_Merge_1D_and_2D_files.ipynb) combines the spatially aggregated outputs of the monthly 1-Dimensional & 2-Dimensional WRF-Hydro modeling application outputs into 1 netCDF file. This script also contains plots that allow the user to explore the range in values for each variable. +The [Merge 1-D and 2-D jupyter notebook](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/01_Merge_1D_and_2D_files.ipynb) combines the spatially aggregated outputs of the monthly 1-Dimensional & 2-Dimensional WRF-Hydro modeling application outputs into 1 netCDF file. This script also contains plots that allow the user to explore the range in values for each variable. ### 2. Finalize -The [Format jupyter notebook](02_Format.ipynb) takes the merged output from step 1 and clarifies variable names, adds character HUCID's, and modifies data types. A 'yrmo' variable is added as a place for year/month information to be stored and to provide an efficient way for R users to access the final datasets. The output from this script is 1 netCDF file containing the monthly WRF-Hydro modeling application outputs aggregated to HUC12s for the years 2011-2013 that is comparable to the netCDF stored on this [Science Base](https://www.sciencebase.gov/catalog/item/6411fd40d34eb496d1cdc99d) page where the original outputs of this workflow are stored. +The [Format jupyter notebook](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/02_Format.ipynb) takes the merged output from step 1 and clarifies variable names, adds character HUCID's, and modifies data types. A 'yrmo' variable is added as a place for year/month information to be stored and to provide an efficient way for R users to access the final datasets. 
The output from this script is 1 netCDF file containing the monthly WRF-Hydro modeling application outputs aggregated to HUC12s for the years 2011-2013, which is comparable to the netCDF stored on this [ScienceBase](https://www.sciencebase.gov/catalog/item/6411fd40d34eb496d1cdc99d) page where the original outputs of this workflow are stored. ## Variable Table
diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/OSNpod.ipynb b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/OSNpod.ipynb new file mode 100644 index 00000000..3c3fe8dc --- /dev/null +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/OSNpod.ipynb @@ -0,0 +1,104 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 7, + "id": "0d3462d8-3173-4a9e-ab07-140bce848d55", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: fsspec in /home/lstaub/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages (2025.7.0)\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "#pip install fsspec\n", + "#pip install s3fs" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "26d72c12-7e94-461b-809b-381aaf7144fb", + "metadata": {}, + "outputs": [], + "source": [ + "import fsspec\n", + "import xarray as xr" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ffd8b69c-e06f-4596-b094-a13e475418fd", + "metadata": {}, + "outputs": [ + { + "ename": "KeyError", + "evalue": "'.zmetadata'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/fsspec/mapping.py:155\u001b[0m, in \u001b[0;36mFSMap.__getitem__\u001b[0;34m(self, key, default)\u001b[0m\n\u001b[1;32m 154\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 155\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfs\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcat\u001b[49m\u001b[43m(\u001b[49m\u001b[43mk\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 156\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmissing_exceptions \u001b[38;5;28;01mas\u001b[39;00m exc:\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/fsspec/asyn.py:118\u001b[0m, in \u001b[0;36msync_wrapper..wrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 117\u001b[0m \u001b[38;5;28mself\u001b[39m \u001b[38;5;241m=\u001b[39m obj \u001b[38;5;129;01mor\u001b[39;00m args[\u001b[38;5;241m0\u001b[39m]\n\u001b[0;32m--> 118\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43msync\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mloop\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mfunc\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/fsspec/asyn.py:103\u001b[0m, in \u001b[0;36msync\u001b[0;34m(loop, func, timeout, *args, **kwargs)\u001b[0m\n\u001b[1;32m 102\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(return_result, \u001b[38;5;167;01mBaseException\u001b[39;00m):\n\u001b[0;32m--> 103\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m return_result\n\u001b[1;32m 104\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n", + "File 
\u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/fsspec/asyn.py:56\u001b[0m, in \u001b[0;36m_runner\u001b[0;34m(event, coro, result, timeout)\u001b[0m\n\u001b[1;32m 55\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m---> 56\u001b[0m result[\u001b[38;5;241m0\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m coro\n\u001b[1;32m 57\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m ex:\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/fsspec/asyn.py:455\u001b[0m, in \u001b[0;36mAsyncFileSystem._cat\u001b[0;34m(self, path, recursive, on_error, batch_size, **kwargs)\u001b[0m\n\u001b[1;32m 452\u001b[0m \u001b[38;5;28;01masync\u001b[39;00m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21m_cat\u001b[39m(\n\u001b[1;32m 453\u001b[0m \u001b[38;5;28mself\u001b[39m, path, recursive\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m, on_error\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mraise\u001b[39m\u001b[38;5;124m\"\u001b[39m, batch_size\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs\n\u001b[1;32m 454\u001b[0m ):\n\u001b[0;32m--> 455\u001b[0m paths \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_expand_path(path, recursive\u001b[38;5;241m=\u001b[39mrecursive)\n\u001b[1;32m 456\u001b[0m coros \u001b[38;5;241m=\u001b[39m [\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_cat_file(path, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs) \u001b[38;5;28;01mfor\u001b[39;00m path \u001b[38;5;129;01min\u001b[39;00m paths]\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/fsspec/asyn.py:870\u001b[0m, in \u001b[0;36mAsyncFileSystem._expand_path\u001b[0;34m(self, path, recursive, maxdepth)\u001b[0m\n\u001b[1;32m 869\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(path, \u001b[38;5;28mstr\u001b[39m):\n\u001b[0;32m--> 870\u001b[0m out \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mawait\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_expand_path([path], recursive, maxdepth)\n\u001b[1;32m 871\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/fsspec/asyn.py:899\u001b[0m, in \u001b[0;36mAsyncFileSystem._expand_path\u001b[0;34m(self, path, recursive, maxdepth)\u001b[0m\n\u001b[1;32m 898\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m out:\n\u001b[0;32m--> 899\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mFileNotFoundError\u001b[39;00m(path)\n\u001b[1;32m 900\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28msorted\u001b[39m(out)\n", + "\u001b[0;31mFileNotFoundError\u001b[0m: ['hytest/tutorials/dataset_preprocessing/niwaa_wrfhydro_monthly_huc12_agg/*.zarr/.zmetadata']", + "\nThe above exception was the direct cause of the following exception:\n", + "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/zarr/storage.py:1446\u001b[0m, in \u001b[0;36mFSStore.__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1445\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m-> 
1446\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mmap\u001b[49m\u001b[43m[\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m]\u001b[49m\n\u001b[1;32m 1447\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mexceptions \u001b[38;5;28;01mas\u001b[39;00m e:\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/fsspec/mapping.py:159\u001b[0m, in \u001b[0;36mFSMap.__getitem__\u001b[0;34m(self, key, default)\u001b[0m\n\u001b[1;32m 158\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m default\n\u001b[0;32m--> 159\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(key) \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mexc\u001b[39;00m\n\u001b[1;32m 160\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m result\n", + "\u001b[0;31mKeyError\u001b[0m: '.zmetadata'", + "\nThe above exception was the direct cause of the following exception:\n", + "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[2], line 5\u001b[0m\n\u001b[1;32m 1\u001b[0m zarr_url \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124ms3://hytest/tutorials/dataset_preprocessing/niwaa_wrfhydro_monthly_huc12_agg/*.zarr\u001b[39m\u001b[38;5;124m'\u001b[39m\n\u001b[1;32m 3\u001b[0m fs \u001b[38;5;241m=\u001b[39m fsspec\u001b[38;5;241m.\u001b[39mfilesystem(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124ms3\u001b[39m\u001b[38;5;124m'\u001b[39m, anon\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m, endpoint_url\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mhttps://usgs.osn.mghpcc.org/\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m----> 5\u001b[0m ds \u001b[38;5;241m=\u001b[39m \u001b[43mxr\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mopen_dataset\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfs\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_mapper\u001b[49m\u001b[43m(\u001b[49m\u001b[43mzarr_url\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mengine\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mzarr\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\n\u001b[1;32m 6\u001b[0m \u001b[43m \u001b[49m\u001b[43mbackend_kwargs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m{\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mconsolidated\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m:\u001b[49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m}\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mchunks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m{\u001b[49m\u001b[43m}\u001b[49m\u001b[43m)\u001b[49m\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/xarray/backends/api.py:571\u001b[0m, in \u001b[0;36mopen_dataset\u001b[0;34m(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)\u001b[0m\n\u001b[1;32m 559\u001b[0m decoders \u001b[38;5;241m=\u001b[39m _resolve_decoders_kwargs(\n\u001b[1;32m 560\u001b[0m decode_cf,\n\u001b[1;32m 561\u001b[0m 
open_backend_dataset_parameters\u001b[38;5;241m=\u001b[39mbackend\u001b[38;5;241m.\u001b[39mopen_dataset_parameters,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 567\u001b[0m decode_coords\u001b[38;5;241m=\u001b[39mdecode_coords,\n\u001b[1;32m 568\u001b[0m )\n\u001b[1;32m 570\u001b[0m overwrite_encoded_chunks \u001b[38;5;241m=\u001b[39m kwargs\u001b[38;5;241m.\u001b[39mpop(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124moverwrite_encoded_chunks\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m)\n\u001b[0;32m--> 571\u001b[0m backend_ds \u001b[38;5;241m=\u001b[39m \u001b[43mbackend\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mopen_dataset\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 572\u001b[0m \u001b[43m \u001b[49m\u001b[43mfilename_or_obj\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 573\u001b[0m \u001b[43m \u001b[49m\u001b[43mdrop_variables\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdrop_variables\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 574\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mdecoders\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 575\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 576\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 577\u001b[0m ds \u001b[38;5;241m=\u001b[39m _dataset_from_backend_dataset(\n\u001b[1;32m 578\u001b[0m backend_ds,\n\u001b[1;32m 579\u001b[0m filename_or_obj,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 589\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs,\n\u001b[1;32m 590\u001b[0m )\n\u001b[1;32m 591\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m ds\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/xarray/backends/zarr.py:1170\u001b[0m, in \u001b[0;36mZarrBackendEntrypoint.open_dataset\u001b[0;34m(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, zarr_version)\u001b[0m\n\u001b[1;32m 1149\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21mopen_dataset\u001b[39m( \u001b[38;5;66;03m# type: ignore[override] # allow LSP violation, not supporting **kwargs\u001b[39;00m\n\u001b[1;32m 1150\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 1151\u001b[0m filename_or_obj: \u001b[38;5;28mstr\u001b[39m \u001b[38;5;241m|\u001b[39m os\u001b[38;5;241m.\u001b[39mPathLike[Any] \u001b[38;5;241m|\u001b[39m BufferedIOBase \u001b[38;5;241m|\u001b[39m AbstractDataStore,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1167\u001b[0m zarr_version\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 1168\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Dataset:\n\u001b[1;32m 1169\u001b[0m filename_or_obj \u001b[38;5;241m=\u001b[39m _normalize_path(filename_or_obj)\n\u001b[0;32m-> 1170\u001b[0m store \u001b[38;5;241m=\u001b[39m \u001b[43mZarrStore\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mopen_group\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1171\u001b[0m \u001b[43m \u001b[49m\u001b[43mfilename_or_obj\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1172\u001b[0m \u001b[43m \u001b[49m\u001b[43mgroup\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mgroup\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1173\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mmode\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmode\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1174\u001b[0m \u001b[43m \u001b[49m\u001b[43msynchronizer\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43msynchronizer\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1175\u001b[0m \u001b[43m \u001b[49m\u001b[43mconsolidated\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconsolidated\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1176\u001b[0m \u001b[43m \u001b[49m\u001b[43mconsolidate_on_close\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 1177\u001b[0m \u001b[43m \u001b[49m\u001b[43mchunk_store\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mchunk_store\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1178\u001b[0m \u001b[43m \u001b[49m\u001b[43mstorage_options\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstorage_options\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1179\u001b[0m \u001b[43m \u001b[49m\u001b[43mstacklevel\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstacklevel\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1180\u001b[0m \u001b[43m \u001b[49m\u001b[43mzarr_version\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mzarr_version\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 1181\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1183\u001b[0m store_entrypoint \u001b[38;5;241m=\u001b[39m StoreBackendEntrypoint()\n\u001b[1;32m 1184\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m close_on_error(store):\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/xarray/backends/zarr.py:498\u001b[0m, in \u001b[0;36mZarrStore.open_group\u001b[0;34m(cls, store, mode, synchronizer, group, consolidated, consolidate_on_close, chunk_store, storage_options, append_dim, write_region, safe_chunks, stacklevel, zarr_version, write_empty)\u001b[0m\n\u001b[1;32m 495\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mFileNotFoundError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mNo such file or directory: \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mstore\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 496\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m consolidated:\n\u001b[1;32m 497\u001b[0m \u001b[38;5;66;03m# TODO: an option to pass the metadata_key keyword\u001b[39;00m\n\u001b[0;32m--> 498\u001b[0m zarr_group \u001b[38;5;241m=\u001b[39m \u001b[43mzarr\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mopen_consolidated\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstore\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mopen_kwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 499\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 500\u001b[0m zarr_group \u001b[38;5;241m=\u001b[39m zarr\u001b[38;5;241m.\u001b[39mopen_group(store, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mopen_kwargs)\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/zarr/convenience.py:1360\u001b[0m, in \u001b[0;36mopen_consolidated\u001b[0;34m(store, metadata_key, mode, **kwargs)\u001b[0m\n\u001b[1;32m 1357\u001b[0m metadata_key \u001b[38;5;241m=\u001b[39m 
\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmeta/root/consolidated/\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m+\u001b[39m metadata_key\n\u001b[1;32m 1359\u001b[0m \u001b[38;5;66;03m# setup metadata store\u001b[39;00m\n\u001b[0;32m-> 1360\u001b[0m meta_store \u001b[38;5;241m=\u001b[39m \u001b[43mConsolidatedStoreClass\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstore\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmetadata_key\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mmetadata_key\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1362\u001b[0m \u001b[38;5;66;03m# pass through\u001b[39;00m\n\u001b[1;32m 1363\u001b[0m chunk_store \u001b[38;5;241m=\u001b[39m kwargs\u001b[38;5;241m.\u001b[39mpop(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mchunk_store\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m) \u001b[38;5;129;01mor\u001b[39;00m store\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/zarr/storage.py:3046\u001b[0m, in \u001b[0;36mConsolidatedMetadataStore.__init__\u001b[0;34m(self, store, metadata_key)\u001b[0m\n\u001b[1;32m 3043\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mstore \u001b[38;5;241m=\u001b[39m Store\u001b[38;5;241m.\u001b[39m_ensure_store(store)\n\u001b[1;32m 3045\u001b[0m \u001b[38;5;66;03m# retrieve consolidated metadata\u001b[39;00m\n\u001b[0;32m-> 3046\u001b[0m meta \u001b[38;5;241m=\u001b[39m json_loads(\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mstore\u001b[49m\u001b[43m[\u001b[49m\u001b[43mmetadata_key\u001b[49m\u001b[43m]\u001b[49m)\n\u001b[1;32m 3048\u001b[0m \u001b[38;5;66;03m# check format of consolidated metadata\u001b[39;00m\n\u001b[1;32m 3049\u001b[0m consolidated_format \u001b[38;5;241m=\u001b[39m meta\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mzarr_consolidated_format\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;28;01mNone\u001b[39;00m)\n", + "File \u001b[0;32m~/miniforge3/envs/wrfhydro_huc12_agg/lib/python3.11/site-packages/zarr/storage.py:1448\u001b[0m, in \u001b[0;36mFSStore.__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1446\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mmap[key]\n\u001b[1;32m 1447\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mexceptions \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[0;32m-> 1448\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mKeyError\u001b[39;00m(key) \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01me\u001b[39;00m\n", + "\u001b[0;31mKeyError\u001b[0m: '.zmetadata'" + ] + } + ], + "source": [ + "zarr_url = 's3://hytest/tutorials/dataset_preprocessing/niwaa_wrfhydro_monthly_huc12_agg/*.zarr'\n", + "\n", + "fs = fsspec.filesystem('s3', anon=True, endpoint_url='https://usgs.osn.mghpcc.org/')\n", + "\n", + "ds = xr.open_dataset(fs.get_mapper(zarr_url), engine='zarr', \n", + " backend_kwargs={'consolidated':True}, chunks={})\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.6" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git 
a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/README.md b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/README.md index ece63771..cf379c6a 100644 --- a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/README.md +++ b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/README.md @@ -80,17 +80,18 @@ The following input files are needed for this workflow. A 3-year subset of these

WRF-Hydro Background

-![Screenshot](images/wrf-hydro_logo.png) +![Screenshot](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/wrf-hydro_logo.png) The Weather Research and Forecasting Hydrological modeling system ([WRF-Hydro](https://ral.ucar.edu/projects/wrf_hydro)) provides water-budget estimates across space and time by linking process-based hydrologic and hydraulic routing models of the atmosphere and terrestrial hydrology. The image below shows WRF-Hydro output files organized by model physics component with the files used in this workflow highlighted. -![Screenshot](images/wrf-hydro_outputs2.png) +![Screenshot](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/wrf-hydro_outputs2.png) The image below shows a conceptual diagram created by Aubrey Dugger that shows how the WRF-Hydro National IWAA Configuration water budget was calculated. -![Screenshot](images/WRF-Hydro_WBM.png) +![Screenshot](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/WRF-Hydro_WBM.png) +*Image credit: Aubrey Dugger.* Want to learn more about the WRF-Hydro Modeling System? [These tutorial recordings](https://doimspp.sharepoint.com/sites/gs-wma-hytest/SitePages/WRF-Hydro-Modeling-System-Hands-on-Tutorial.aspx?xsdata=MDV8MDJ8fDRlMzY5NWMwMTU1MzRiYzEyZjNkMDhkZDcxMzA3YjVmfDA2OTNiNWJhNGIxODRkN2I5MzQxZjMyZjQwMGE1NDk0fDB8MHw2Mzg3OTExNzUxOTg2ODI5NDF8VW5rbm93bnxWR1ZoYlhOVFpXTjFjbWwwZVZObGNuWnBZMlY4ZXlKV0lqb2lNQzR3TGpBd01EQWlMQ0pRSWpvaVYybHVNeklpTENKQlRpSTZJazkwYUdWeUlpd2lWMVFpT2pFeGZRPT18MXxMMk5vWVhSekx6RTVPamcwTlRNME1EQmhMVEF5WldRdE5HVXpPUzFoTW1VMkxUZGhOMlJoWWpsak5UYzBaVjlsWWpVME1UazRNeTAwWVdSaUxUUTNZbU10WVRZeFpTMWhNR1V6WVdRMVl6a3hNV05BZFc1eExtZGliQzV6Y0dGalpYTXZiV1Z6YzJGblpYTXZNVGMwTXpVeU1EY3hPRGc1TWc9PXw1ZTFlYjM2NzA4MWQ0YjZiY2NkNjA4ZGQ3MTMwN2I1Y3w0OTYwODE5NzFjMmQ0ZWMyOTA5MmVlNmVhMzE1OWEyZA%3D%3D&sdata=UDZvaGNyMktQcXZic3pDdmI5NEpOUFhkdnhCNjZOVTlzYll3cmk1OTM4UT0%3D&ovuser=0693b5ba-4b18-4d7b-9341-f32f400a5494%2Clstaub%40usgs.gov&OR=Teams-HL&CT=1743697590477&clickparams=eyJBcHBOYW1lIjoiVGVhbXMtRGVza3RvcCIsIkFwcFZlcnNpb24iOiI0OS8yNTAzMTMyMTAxMiIsIkhhc0ZlZGVyYXRlZFVzZXIiOmZhbHNlfQ%3D%3D) are a great resource and [this document](https://ral.ucar.edu/sites/default/files/docs/water/wrf-hydro-v511-technical-description.pdf) provides even more technical details! This workflow uses the land model (LDASOUT), stream channel routing (CHRTOUT), and conceptual groundwater (GWOUT) outputs from a version of the WRF-Hydro Modeling System that is forced with CONUS404-BA. @@ -99,14 +100,15 @@ Want to learn more about the WRF-Hydro Modeling System? [These tutorial recordin [CONUS404](https://www.sciencebase.gov/catalog/item/6372cd09d34ed907bf6c6ab1) is a high-resolution hydro-climate dataset used for forcing hydrological models and covers 43 years of data at 4-kilometer resolution. Two separate fields (2-meter air temperature and precipitation) in this dataset had biases identified, leading to the development of a new product, [CONUS404-BA](https://www.sciencebase.gov/catalog/item/64f77acad34ed30c20544c18). This dataset downscales the CONUS404 dataset from 4 kilometers to 1 kilometer and bias adjusts the 2-meter air temperature and precipitation fields using Daymet version 3 as the background observational reference. 
This workflow uses the precipitation and rainrate fields from the CONUS404-BA output (LDASIN). The following image was provided by David Gochis: -![Screenshot](images/CONUS404-BA.png) +![Screenshot](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/CONUS404-BA.png) +*Image credit: Joe Grimm.*

WBD HUC12s Background

The [twelve-digit hydrologic unit codes (HUCs)](https://www.sciencebase.gov/catalog/item/63cb38b2d34e06fef14f40ad) used in this workflow are derived from the Watershed Boundary Dataset (WBD) and are part of a nested spatial unit system. Each drainage area is considered a Hydrologic Unit (HU) and is given a Hydrologic Unit Code (HUC), which serves as the unique identifier for the area. HUC 2s, 4s, 6s, 8s, 10s, & 12s define the drainage Regions, Subregions, Basins, Subbasins, Watersheds, and Subwatersheds, respectively, across the United States. Their boundaries are defined by hydrologic and topographic criteria that delineate the area of upstream land that drains to a specific point on a river. The United States Congress has directed the USGS, along with other Federal agencies, to assess national water availability every five years under the [SECURE Water Act](https://www.doi.gov/ocl/hearings/111/SECUREWaterAct_031610). The HUC12 spatial unit is of interest because it is the reporting unit used to perform this assessment through the [National Integrated Water Availability Assessments](https://pubs.usgs.gov/publication/pp1894A) (NIWAAs). -![Screenshot](images/WBD_HUC12.png) +![Screenshot](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/WBD_HUC12.png) ## Compute Environment Needs The 10-year WRF-Hydro Modeling Application forced with CONUS404-BA comprises 12 years of hourly data (2009-2011). There are 4 WRF-Hydro modeling application output file types used in this workflow: LDASOUT, LDASIN, CHRTOUT, and GWOUT. LDASIN, CHRTOUT, and GWOUT each have 1 netCDF file for each hour in a day (24 files per day), while LDASOUT has 1 netCDF file for every 3 hours (8 files per day). Additionally, there are three leap years during this time span (2012, 2016, and 2020). The following information was gathered to better understand computational needs: @@ -120,21 +122,23 @@ The 10-year WRF-Hydro Modeling Application forced with CONUS404-BA is comprised There are roughly 350,640 files used as inputs to this workflow, which will take up at least 70 TB of storage space (a back-of-the-envelope check of this estimate is sketched at the end of this section). Because of these file sizes, this workflow was developed using High Performance Computing (HPC) systems. If this is the first time HPC systems are being used, an account will need to be [requested](https://hpcportal.cr.usgs.gov/index.html). It is highly encouraged that new users attend [HPC101 Training](https://hpcportal.cr.usgs.gov/training/index.html) before beginning work on HPC systems. To save on storage space, a 3-year subset of these data was downloaded to the USGS HPC system, Hovenweep. The workflow in this repository is currently set up to run on this temporal subset of data (2011, 2012, and 2013) but can be modified to include a larger time span. -The temporal aggregation part of this workflow requires a module called Netcdf Operator (NCO). The spatial aggregation portion of this workflow requires a python environment yml file to be installed. +The temporal aggregation part of this workflow requires the NetCDF Operators (NCO) module. 
Instructions on NCO installation can be found in the Set-Up section [here](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation/README.md). The spatial aggregation portion of this workflow requires a python environment to be installed from a yml file, which can be found [here](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/wrfhydro_huc12_agg.yml). + +If you would like to run this tutorial on your own, you can find the shell scripts and jupyter notebooks used in this tutorial in our [GitHub repository](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg). If you find any issues or errors in this tutorial, please open an [issue in our GitHub repository](https://github.com/hytest-org/hytest/issues).
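
As a rough cross-check of the 350,640-file estimate above, the arithmetic below is a back-of-the-envelope sketch; it assumes the three hourly output streams (LDASIN, CHRTOUT, and GWOUT), the 3-hourly LDASOUT stream, and the 12-year hourly record with three leap years described in this section.

```
# Back-of-the-envelope check of the input file count (illustrative only).
hourly_streams = 3                       # LDASIN, CHRTOUT, GWOUT: 24 files per day each
files_per_day = hourly_streams * 24 + 8  # plus 3-hourly LDASOUT at 8 files per day -> 80
days = 12 * 365 + 3                      # 12 years of data with three leap years -> 4,383 days
print(files_per_day * days)              # 350,640 input files
```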

1. Temporal Aggregation

-The WRF-Hydro modeling application outputs LDASOUT, CHRTOUT, GWOUT and the CONUS404-BA forcing variable subset LDASIN are summarized from hourly to mothly time steps. There is 1 shell script for each variable to be processed, with each one utilizing the NCO module. The [01_Temporal_Aggregation](01_Temporal_Aggregation/) folder contains a README document with instructions for using NCO and running these scripts on the USGS HPC Hovenweep system. Each shell script can be run using the srun command for a single year, or each they can be called from within a slurm file to run multiple years at once. +The WRF-Hydro modeling application outputs LDASOUT, CHRTOUT, and GWOUT and the CONUS404-BA forcing variable subset LDASIN are summarized from hourly to monthly time steps. There is 1 shell script for each variable to be processed, with each one utilizing the NCO module. The [01_Temporal_Aggregation](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/01_Temporal_Aggregation) folder contains a README document with instructions for using NCO and running these scripts on the USGS HPC Hovenweep system. Each shell script can be run using the srun command for a single year, or they can each be called from within a slurm file to run multiple years at once.

2. Spatial Aggregation

-These scripts need the correct environment installed, found in the conda environment file titled [wrfhydro_huc12_agg.yml](02_Spatial_Aggregation/wrfhydro_huc12_agg.yml). Instructions for installing the environment can be found in the README documentation in the [02_Spatial_Aggregation](02_Spatial_Aggregation/) folder. There is a python aggregation script for each data type: [1-Dimensional](02_Spatial_Aggregation/01_2D_spatial_aggregation.ipynb) and [2-Dimensional](02_Spatial_Aggregation/02_1D_spatial_aggregation.ipynb). Due to the differing dimensions of the data, different spatial datasets are used. The 2-Dimensional data is aggregated using a 1000 m grid while the 1-Dimensional data is aggregated with a crosswalk table that contains spatial data for each HUC ID. Spatial aggregations are done using the [flox](https://flox.readthedocs.io/en/latest/aggregations.html) python package. The functions that utilize this package can be found in the [usgs_common.py](02_Spatial_Aggregation/usgs_common.py) python script. +These scripts need the correct environment installed, found in the conda environment file titled [wrfhydro_huc12_agg.yml](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/wrfhydro_huc12_agg.yml). Instructions for installing the environment can be found in the README documentation in the [02_Spatial_Aggregation](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/) folder. There is a python aggregation script for each data type: [1-Dimensional](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/01_2D_spatial_aggregation.ipynb) and [2-Dimensional](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/02_1D_spatial_aggregation.ipynb). Due to the differing dimensions of the data, different spatial datasets are used. The 2-Dimensional data is aggregated using a 1000 m grid while the 1-Dimensional data is aggregated with a crosswalk table that contains spatial data for each HUC ID. Spatial aggregations are done using the [flox](https://flox.readthedocs.io/en/latest/aggregations.html) python package. The functions that utilize this package can be found in the [usgs_common.py](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/02_Spatial_Aggregation/usgs_common.py) python script. ![Screenshot](images/1Dand2Daggregation.png)
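
As an illustration of the zonal reduction described above, the snippet below is a minimal sketch using flox's `xarray_reduce`; the variable name, zone labels, and grid are invented for the example and are not the actual code in usgs_common.py.

```
import numpy as np
import xarray as xr
from flox.xarray import xarray_reduce

# A small made-up monthly field and a matching grid of HUC12 zone labels.
data = xr.DataArray(np.random.rand(4, 5), dims=("y", "x"), name="avg_soil_moisture")
huc12 = xr.DataArray(
    np.array([[101, 101, 102, 102, 102],
              [101, 101, 102, 102, 102],
              [103, 103, 103, 102, 102],
              [103, 103, 103, 103, 102]]),
    dims=("y", "x"),
    name="WBDHU12",
)

# Group every grid cell by its HUC12 label and reduce to a zonal mean.
zonal_mean = xarray_reduce(data, huc12, func="mean")
print(zonal_mean)
```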

3. Merge and Format

-Once the aggregations are complete, the 1D and 2D outputs will need to be merged into 1 netcdf using the [xarray](https://docs.xarray.dev/en/stable/generated/xarray.merge.html) python package. This process also plots the different variables to see what the range of values looks like. This process includes 1 jupyter notebook titled [01_Merge_1D_and_2D_files.ipynb](03_Finalize/01_Merge_1D_and_2D_files.ipynb) that can be can be found within the [03_Finalize](03_Finalize/) folder. This process ensures the merged netCDF file is formatted by clarifying variable names, adding character HUCID's, and modifying data types. A 'yrmo' variable is added as a place for year/month information to be stored and to provide an efficient way for R users to access the final datasets. The formatting process includes 1 jupyter notebook titled [02_Format.ipynb](02_Spatial_Aggregation/02_Format.ipynb) and can be found in the [03_Finalize](03_Finalize/) folder.These scripts use the same environment requirements that are installed in the spatial aggregation portion of this workflow. +Once the aggregations are complete, the 1D and 2D outputs will need to be merged into 1 netCDF file using the [xarray](https://docs.xarray.dev/en/stable/generated/xarray.merge.html) python package. This process also plots the different variables to see what the range of values looks like. This process includes 1 jupyter notebook titled [01_Merge_1D_and_2D_files.ipynb](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/01_Merge_1D_and_2D_files.ipynb) that can be found within the [03_Finalize](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/) folder. The merged netCDF file is then formatted by clarifying variable names, adding character HUC IDs, and modifying data types. A 'yrmo' variable is added as a place for year/month information to be stored and to provide an efficient way for R users to access the final datasets. The formatting process includes 1 jupyter notebook titled [02_Format.ipynb](https://github.com/hytest-org/hytest/blob/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/02_Format.ipynb) and can be found in the [03_Finalize](https://github.com/hytest-org/hytest/tree/main/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/03_Finalize/) folder. These scripts use the same environment requirements that are installed in the spatial aggregation portion of this workflow. (A minimal illustrative sketch of this merge-and-format step appears at the end of this section.) diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/CONUS404-BA.png b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/CONUS404-BA.png index ee772b2e..5b0919db 100644 Binary files a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/CONUS404-BA.png and b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/CONUS404-BA.png differ diff --git a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/WRF-Hydro_WBM.png b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/WRF-Hydro_WBM.png index 8afe196a..bb126a75 100644 Binary files a/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/WRF-Hydro_WBM.png and b/dataset_processing/tutorials/niwaa_wrfhydro_monthly_huc12_agg/images/WRF-Hydro_WBM.png differ
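
For the merge-and-format step described in step 3 above, the snippet below is a minimal sketch using xarray; the file names, the HUC12 coordinate name, and the 'yrmo' construction are illustrative placeholders rather than the notebooks' exact code.

```
import xarray as xr

# Open the spatially aggregated 1D and 2D outputs (placeholder file names).
ds_2d = xr.open_dataset("CONUS_HUC12_2D_WY2011_2013.nc")
ds_1d = xr.open_dataset("CONUS_HUC12_1D_WY2011_2013.nc")

# Merge the two datasets on their shared HUC12 and time coordinates.
merged = xr.merge([ds_2d, ds_1d])

# Example formatting: store HUC12 IDs as character strings and add a
# 'yrmo' (YYYYMM) variable derived from the time coordinate.
merged["WBDHU12"] = merged["WBDHU12"].astype(str)
merged["yrmo"] = merged["time"].dt.strftime("%Y%m")

merged.to_netcdf("CONUS_HUC12_WB_combined_WY2011_2013.nc")
```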