Insect_Forest_Infestation

People

Zachary Bowyer
Marie Hoeger
Frances Scott-Weis
Valentina Staneva
Nicoleta Cristea
Benjamin Bright

Short form description

Capstone poster presentation: https://github.com/DSHydro/Insect_Forest_Infestation/blob/main/MSDS_Capstone_Poster_Final.pptx

Context

This project was sponsored by the UW eScience Institute.
The main goal of this work was to explore the viability of
using Planet satellite imagery to identify insect infestations in forests.

Description

The purpose of this repository is to explore research avenues proposed by the University of Washington's eScience Institute.
Specifically, the work explores the viability of identifying insect infestations in forests using satellite imagery from Planet.

Insect infestations are a leading cause of tree mortality. The impacts of increasing tree mortality are widespread and can be extremely harmful for forests and the greater environment. Current identification methods involve aerial and manual surveys and are very time intensive. This restricts the frequency at which these surveys can occur, leading to many unidentified outbreaks.

Specifically, this repository contains code used to pull data from sources such as Google Earth Engine and Planet.
It also contains code for training a random forest model on landcover data, Planet imagery, and hand-labeled red trees.

Work done so far:

  1. Get USFS polygons from 2018-2022 of Western Balsam Bark Beetle damage with a designation of 'moderate', 'severe', or 'very severe'.
  2. Within those USFS polygons, hand label red trees at the tree level using 2018-2023 Google Earth imagery.
  3. Using the polygon coordinates of the red tree labels, get Planet RGB basemaps of the areas.
  4. Using the polygon coordinates of the red tree labels, get a landcover dataset of the areas.
  5. Using the polygon coordinates of the red tree labels, get Planet RGB/NIR composites of the areas for 2019-2022.
  6. Calculate spectral indices of the composite files (NDVI, GNDVI, RGI).
  7. Combine the basemaps, annotations, and landcover datasets, and verify they are aligned.
  8. Train a random forest model on the composite data.
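The spectral indices in step 6 are simple band ratios. As a rough sketch (pure Python on individual pixel values; the actual pipeline computes these over .tif rasters, and the formulas here are the standard definitions, not necessarily this repository's exact implementation):

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - R) / (NIR + R)."""
    return (nir - red) / (nir + red)

def gndvi(nir, green):
    """Green NDVI: (NIR - G) / (NIR + G)."""
    return (nir - green) / (nir + green)

def rgi(red, green):
    """Red-Green Index: R / G; higher values suggest a redder (stressed) canopy."""
    return red / green

# Hypothetical pixel reflectance values for illustration:
print(round(ndvi(200.0, 90.0), 3))  # -> 0.379 (healthy vegetation skews high)
```

Healthy canopy tends toward high NDVI/GNDVI and low RGI, which is why red-tree detection benefits from all three.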

Next/potential steps (No order):

  1. CNN
  2. Once the red tree model is sufficient, backtrack red-tree areas to identify other stages of infestation such as the green/gray shift
  3. Time series model, instead of individual images
  4. Improve modular experiment design
  5. Add test suite and continuous integration
  6. Automatic upload/download of data to/from shared Google drive or server
  7. Tool to create large area over-time visualization/heatmap of our trained model's results
  8. Shell script to automatically initialize project (install conda and download data from drive/planet/GEE)

How to get running

Reproducing this work from scratch requires a few steps. In order:

  1. Clone repository into local folder
  2. Request Planet API key
  3. Create Google Earth Engine account
  4. Install anaconda
  5. Activate environment from environment.yml (Windows environment)
  6. Download data folders from google drive or DSHYDRO server
  7. Run code

1. Clone repository into local folder

This is a typical setup step; just make sure the storage medium you are cloning to has enough space to hold the datasets (~100 GB).
git clone https://github.com/DSHydro/Insect_Forest_Infestation.git (HTTPS)
git clone git@github.com:DSHydro/Insect_Forest_Infestation.git (SSH)

2. Request Planet API key

IMPORTANT - If you are planning to use non-commercial Planet data (free plans),
repeat this application process several times so that you can get as many
API keys as possible, as the download quota for each key is quite small.
For our work, we ended up with a total of six API keys, and still had reduced
efficiency at times because we had hit the quota limit.

Planet data is essentially daily/weekly/monthly/quarterly satellite imagery, and using it requires applying to one of Planet's data access programs. Approval takes roughly one to two weeks. https://www.planet.com/get-started/ lists ways to get data access, but our use case only fits the "Education and Research Program." There are three tiers of Education and Research plans; this section covers only the free plan.

Steps to apply to get access to planet data (Free research plan):

  1. Navigate to https://www.planet.com/markets/education-and-research/
  2. Click the ‘apply now’ button on the top of the page
  3. Fill out the information in the ‘Apply for a Basic Account’ section
  4. Specific information entered in each input field (these answers were not denied across 5+ keys):

    A. Please provide a link to online content related to project (e.g. a past manuscript, project or web page), or if you don't have one, simply say "not available"

     “not available”  
    

    B. Please provide a link to your university or department website

     https://www.washington.edu/datasciencemasters/  
    

    C. Please provide a link to more on your background (e.g. researchgate profile, LinkedIn profile), or if you don't have one, simply say "not available"

     https://www.linkedin.com/in/zachary-bowyer-834a80164/
    

    D. What best describes your role at the university?

     Graduate student  
    

    E. Describe the project you intend to investigate with Planet data. What questions do you hope to answer?

     Identifying forest insect infestation with temporal geospatial data
    

    F. I plan to use Planet data for:

     Research  
    

    G. Describe the geography you plan to investigate (you'll have download access to up to 5,000 square kilometers of data per month)

     Forests/mountainous regions  
    

    H. How do you plan to publish your results? Check all that apply

     Other  
    

    I. How did you hear about Planet's Education and Research Program? Check all that apply

     Other  
    

Once you’ve applied for access, you should get an email from Planet within a few days asking you to activate your account.
Open that email and click the ‘unique profile link’, then fill out the form.
In our case, we set the imagery usage level to ‘Beginner’ and the intended use to ‘Scientific research at a University’.
After you activate your account, you can log in to Planet.
Example timeline:
Applied 10/21/2023 at 8:28 PM
Activation email received 10/26/2023 at 4:22 AM

  1. After you receive your API key, put it on the first line of Credentials/Credentials.txt. (This is where subsequent code looks for your API key for authentication.)
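Reading the key back in code is then a one-liner. A minimal sketch (load_planet_api_key is a hypothetical helper name for illustration, not a function from this repository):

```python
from pathlib import Path

def load_planet_api_key(credentials_path="Credentials/Credentials.txt"):
    """Return the Planet API key stored on the first line of the credentials file."""
    first_line = Path(credentials_path).read_text().splitlines()[0]
    return first_line.strip()
```

Scripts can then read the key at startup rather than hard-coding it, which also keeps the key out of version control.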

3. Create Google Earth Engine account

Go to https://signup.earthengine.google.com/
using your university-affiliated email, and go through the steps:

  1. Click register a noncommercial or commercial cloud project
  2. Click "unpaid usage"
  3. Select project type as "Academia and research"
  4. Click "Create a new Google cloud project"
  5. Select "uw.edu" as organization
  6. Select project-id as whatever you want
  7. Set project name as whatever you want
  8. Click "Continue to summary"
  9. Click "Confirm"

From here, everything should be good to go, with one exception: you will have to
authenticate your Google Earth Engine project locally
on your machine by calling ee.Authenticate(). This opens a browser for you to log in with, so there's no need to store security information locally in a file. You can test this with the first few cells of https://github.com/DSHydro/Insect_Forest_Infestation/blob/main/Scratch_work/GoogleEarthEngineAPI.ipynb

4. Install anaconda

https://www.anaconda.com/download
The latest version should be fine.

5. Activate environment from environment.yml (Windows environment)

This is likely the most difficult step, because the environment file is hard-coded to the particular version of Windows the author used.
Our recommendation: if you are on Windows, attempt the environment recreation and fix issues manually as they pop up. If you are on Linux, go through the list of packages and install them manually into your own environment. It may be worthwhile to create an 'environments' folder with an environment.yml for each machine.

One common issue we ran into was missing DLLs; uninstalling the affected packages and reinstalling earlier versions fixed it. This is not a well-fleshed-out answer, but the problem has only come up once so far. If you run into it next, please document your steps for fixing it here.

6. Download data folders from google drive or DSHYDRO server

For this code to run, three folders need to be placed in the ../Data folder. The names of the folders are arbitrary, but we recommend they look something like:

  1. ../Data/Annotations
  2. ../Data/LabeledData
  3. ../Data/UnlabeledData

NOTE: Metadata files are made manually, maybe try to automate this in the future using data from USFS geojson files.

The Annotations folder will hold .kml annotation files and their associated metadata.
The LabeledData folder will contain composite and landcover .tif files, along with metadata files.

With that being said, head over to https://drive.google.com/drive/folders/14xWJfO4k8uwXLkXJqv5xm_j9zA6zQExI
and download "Final_Annotations", "LabeledData2", and "UnlabeledData1" as your three folders, then put them in the ../Data folder.

This data will be enough for you to pick up where we left off.
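Before running anything, it can help to confirm the layout is in place. A small sanity-check sketch (missing_data_dirs is a hypothetical helper using the suggested folder names above, not part of the repository):

```python
from pathlib import Path

EXPECTED_DIRS = ["Annotations", "LabeledData", "UnlabeledData"]

def missing_data_dirs(data_root="../Data"):
    """Return the names of expected data subfolders that are not present yet."""
    root = Path(data_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]

# An empty result means the ../Data layout matches what the scripts expect.
```

If your folders use different names (e.g. the Drive names "Final_Annotations", "LabeledData2", "UnlabeledData1"), adjust EXPECTED_DIRS or rename the folders accordingly.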

If you want to generate your own data, look into the code here:
https://github.com/DSHydro/Insect_Forest_Infestation/tree/main/Scratch_work/composite_scripts
https://github.com/DSHydro/Insect_Forest_Infestation/blob/main/Src/DownloadLandcoverFromTIF.py
https://github.com/DSHydro/Insect_Forest_Infestation/blob/main/Src/Modules/PlanetApiWrapper.py

If you intend to use other Planet products, you will probably need to modify existing code or create new functions for them.

7. Run code

You can now try running, from the ../Src directory:
python Train_RF.py
python RunRF.py
If you can get these working, you will be able to reproduce the work.

Data sources

RGB Basemaps:
We use planet.com as our source of satellite imagery/geospatial data.

Planet’s Visual Basemaps are 8-bit, time series mosaic
products which are optimized for visual consistency and
minimize the effects of clouds, haze, and other image
variability. They are ideal for use in visual backdrops
or machine learning to enable an understanding of change over time.

PlanetScope Visual Basemaps (Zoom Level 15 - 4.77 meter, Zoom Level 16 - 2.38
meter cell size at the equator) are generated with a proprietary "best scene on top"
algorithm which selects the highest quality imagery from Planet’s catalog over
specified time intervals, based on cloud cover and image sharpness. PlanetScope
Visual Basemaps can be purchased over custom areas of interest at a quarterly,
monthly, biweekly, or weekly cadence.
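The quoted cell sizes follow from the Web Mercator tiling scheme: ground resolution at the equator is Earth's equatorial circumference divided by the total pixel width at a given zoom level (256-pixel tiles, 2^zoom tiles across). A quick check:

```python
EARTH_CIRCUMFERENCE_M = 40_075_016.686  # equatorial circumference, meters

def cell_size_m(zoom, tile_px=256):
    """Ground resolution (meters/pixel) at the equator for a Web Mercator zoom level."""
    return EARTH_CIRCUMFERENCE_M / (tile_px * 2 ** zoom)

print(round(cell_size_m(15), 2))  # -> 4.78 (Planet quotes 4.77 m)
print(round(cell_size_m(16), 2))  # -> 2.39 (Planet quotes 2.38 m)
```

The small discrepancy against Planet's figures is just rounding direction; both come from the same tiling arithmetic.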

Composites:
Insert info here


Individual images:

The United States Department of Agriculture (USDA) is directed by Congress
to report annually on forest conditions for the United States through the
federal lands forest health protection restoration act.
The United States Forest Service (USFS), an organization within the USDA,
conducts annual aerial surveys to map forest health.
These surveys record estimates of insect damage, which we use to find AOIs.


An important aspect of this work is being able to exclude
non-forest pixels from model training and analysis.
Image segmentation, pixel clustering, and landcover datasets
are all ways to do this; for now we use a landcover dataset.
We chose this particular landcover dataset because its date
range overlaps that of our satellite imagery (2018-present).
Its resolution is 30 meters per pixel.


Hand labeled red trees

Our ground truth data. Since we cannot make field observations
and do not have access to very-high-resolution satellite imagery,
our process was to use the USFS annual aerial surveys to find general
insect-infestation AOIs, then hand label red trees on Google Earth's
high-resolution aerial imagery.
Our hand labels are polygons in the CRS84 format.
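To illustrate what consuming one of these annotations might look like, here is a minimal sketch using the stdlib XML parser (the KML fragment and coordinate values are made up for the example; real files exported from Google Earth may nest elements differently):

```python
import xml.etree.ElementTree as ET

def polygon_coords(kml_text):
    """Extract (lon, lat) rings from <coordinates> elements in a KML string."""
    root = ET.fromstring(kml_text)
    rings = []
    for coords in root.iter("{http://www.opengis.net/kml/2.2}coordinates"):
        ring = []
        for triple in coords.text.split():
            lon, lat = triple.split(",")[:2]  # KML order is lon,lat[,alt]
            ring.append((float(lon), float(lat)))
        rings.append(ring)
    return rings

# Tiny hand-made KML fragment with hypothetical coordinates:
sample = (
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document><Placemark>'
    "<Polygon><outerBoundaryIs><LinearRing><coordinates>"
    "-121.5,47.6,0 -121.4,47.7,0"
    "</coordinates></LinearRing></outerBoundaryIs></Polygon>"
    "</Placemark></Document></kml>"
)
print(polygon_coords(sample))  # -> [[(-121.5, 47.6), (-121.4, 47.7)]]
```

Note that KML stores longitude before latitude, which matches CRS84 axis order but is the reverse of the lat/lon convention used by many mapping tools.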


Description of main directories

Further documentation on specific files exists as a .md file in each main directory.

./Archive/        - Place to put code/data that you don't need but don't want to explicitly delete
./Credentials/    - Contains .txt with one line, where that one line is a planet API key
./Data/           - Contains all the data used by the project
./Images/         - Images to be used in README.md
./Models/         - Saved model weights
./Scratch_work/   - Contains random work, proof of concepts, etc.
./Src/            - Contains polished work

Commands

Environment file: https://github.com/DSHydro/Insect_Forest_Infestation/blob/main/environment.yml (Windows)

Create environment from yml file and activate it

conda env create -f environment.yml
conda activate ForestInfestation  
python -m ipykernel install --user --name=ForestInfestation  

Create conda environment from scratch

conda create --name=ForestInfestation python=3.9.5   
conda env export > environment.yml  

Delete current conda environment

conda remove --name ForestInfestation --all

About

Temporal analysis of geospatial data to identify insect infestation in trees.
