Draft
Commits
130 commits
462d20d
script to output segments detected by humans
kgarwoodsdzwa Mar 12, 2025
8fff329
add no creation of no bird label segments
kgarwoodsdzwa Mar 17, 2025
1f35b93
add way to create dataset of segments with more control
kgarwoodsdzwa Mar 18, 2025
42c3514
remove unnecessary lines
kgarwoodsdzwa Mar 18, 2025
e9c826f
adding pseudocode for creating dataset
kgarwoodsdzwa Apr 15, 2025
f3d93a5
make more functions work
kgarwoodsdzwa Apr 15, 2025
46f3881
change structure and names of some files
kgarwoodsdzwa Apr 16, 2025
76bc584
move older files out of the main section
kgarwoodsdzwa Apr 16, 2025
841a55b
made progress on parsing 2018 labels
kgarwoodsdzwa Apr 17, 2025
80029d0
made create segments function work with subset of data
kgarwoodsdzwa Apr 17, 2025
b673c54
begin adding functionality to creating noise segments
kgarwoodsdzwa Apr 21, 2025
770cfcd
working no_buow segment creation
kgarwoodsdzwa Apr 22, 2025
41cc829
more working
kgarwoodsdzwa Apr 22, 2025
719c432
fixed error handling for edge case
kgarwoodsdzwa Apr 24, 2025
ba6dda9
fix indexing on result csv
kgarwoodsdzwa Apr 24, 2025
35d2e5e
fixed some edge case error handling
kgarwoodsdzwa Apr 24, 2025
bea4cf4
starting to figure out the problem
kgarwoodsdzwa Apr 24, 2025
6fc0eca
optimized stratified group k-fold splitter
kgarwoodsdzwa Apr 25, 2025
16124e5
create new metadata csv with the fold number in a column
kgarwoodsdzwa Apr 25, 2025
a997618
fixed indexing issue and added print statements for debug
kgarwoodsdzwa Apr 28, 2025
37f72b8
added error handling and proper parsing of class list
kgarwoodsdzwa Apr 28, 2025
3018c15
remove extra blank line
kgarwoodsdzwa Apr 28, 2025
4d35019
added proper parsing of indexes in metadata csv
kgarwoodsdzwa Apr 28, 2025
7397fad
fix error of not dropping indexes
kgarwoodsdzwa Apr 28, 2025
1be173e
handling if a wav file for a detection is less than 3s
kgarwoodsdzwa Apr 28, 2025
ecd18b4
Add files to create perch embeddings
Apr 30, 2025
2051600
working version of code
kgarwoodsdzwa May 12, 2025
d19829d
Change output dataframe format for standardization
May 15, 2025
887df67
pylint and update doc strings
May 15, 2025
5f438f2
fixing flake8 and pylint
kgarwoodsdzwa May 15, 2025
c67b6ce
adding some docstrings
kgarwoodsdzwa May 15, 2025
14e0e3f
Update output to new format
May 28, 2025
6a7c285
Adds initial files for project
Sean1572 Jun 11, 2025
6f0528d
Add pyproject.toml
Sean1572 Jun 11, 2025
36e1db4
add documentation for install
Sean1572 Jun 11, 2025
fd8b1e8
Set python required to 3.10 and above
Sean1572 Jun 11, 2025
46d0f0e
Fixes bug from folder layout
Sean1572 Jun 11, 2025
87acb4d
Adds .gitignore for build artifacts
Sean1572 Jun 11, 2025
174b139
Write draft of abstract model class
Sean1572 Jun 11, 2025
a04a654
Updates documentation for venv
Sean1572 Jun 11, 2025
e71a341
Merge pull request #31 from conservationtechlab/30_package_management
kgarwoodsdzwa Jun 11, 2025
a75f031
Adds loss to ModelOutput
Sean1572 Jun 11, 2025
237fcd8
Merge branch 'dev' into 29_model_training_pipeline
Sean1572 Jun 11, 2025
f8d43fa
Update timm_model to include loss function
Sean1572 Jun 11, 2025
41473ed
Start building dataset handling
Sean1572 Jun 12, 2025
fd9027b
Adds pyha-analyzer as a subdependency
Sean1572 Jun 13, 2025
4df6591
Adds default format for datasets
Sean1572 Jun 13, 2025
aa9d594
Adds data preprocessing and model format
Sean1572 Jun 13, 2025
a4865a7
Adds the preprocessor for spectrograms
Sean1572 Jun 13, 2025
0fe10ba
Adds a demo train script
Sean1572 Jun 13, 2025
6e925cc
Lint
Sean1572 Jun 13, 2025
8f8fb18
feat: handle numpy conflict
Sean1572 Jun 13, 2025
f831e02
Moves files to src and fix bugs
Sean1572 Jun 13, 2025
2639ee5
condense older versions of creating buowset segments
kgarwoodsdzwa Jun 16, 2025
af0bb87
fixed pylint error
kgarwoodsdzwa Jun 16, 2025
1539b7a
fix flake8 and pylint and docstrings
kgarwoodsdzwa Jun 16, 2025
6e53425
flake8, pylint and docstrings
kgarwoodsdzwa Jun 16, 2025
e21482c
fixed pylint and flake8 and docstrings
kgarwoodsdzwa Jun 16, 2025
90c3bf7
fixed line too long for my comment
kgarwoodsdzwa Jun 16, 2025
cec5f46
fix some docstring stuff
kgarwoodsdzwa Jun 16, 2025
b7ffbc4
add description for including the class list
kgarwoodsdzwa Jun 17, 2025
4f3133d
Got training working!
Sean1572 Jun 18, 2025
3f0b204
add docstrings, lint, and remove unused functions
kgarwoodsdzwa Jun 18, 2025
941eb1d
forgot one docstring line
kgarwoodsdzwa Jun 18, 2025
920429a
Merge pull request #32 from conservationtechlab/3_create_dataset
kgarwoodsdzwa Jun 18, 2025
f0ccdb0
Clean up code
Sean1572 Jun 20, 2025
e25e508
Apply better fix for model input and output
Sean1572 Jun 20, 2025
4ac3ed0
Make config easier to manage
Sean1572 Jun 20, 2025
3a48a87
Lint, Spell Check, and Documentation
Sean1572 Jun 20, 2025
a39a944
Add environment set up for config
Sean1572 Jun 20, 2025
492495b
Add high level overview of repo
Sean1572 Jun 20, 2025
be03237
Add keep version in whoot/ __init__.py
Sean1572 Jun 20, 2025
1cfc200
Add leaderboard panel
Sean1572 Jun 20, 2025
9174cf6
Add demo of the supplement to the comet ml logging
Sean1572 Jun 20, 2025
b9711e9
Add better model checkpointing
Sean1572 Jun 27, 2025
bde9b12
Update gitignore to hide model_checkpoints
Sean1572 Jun 27, 2025
da30e61
Added pipeline for augmentations
Sean1572 Jun 27, 2025
ef0886e
Fixed missing run_name after adding custom logging system
Sean1572 Jun 27, 2025
b0eb37c
Merge pull request #38 from conservationtechlab/33_versioning_pyproject
kgarwoodsdzwa Jun 30, 2025
0c20090
Update dataframe format to store embeddings as list
Jun 30, 2025
1eb30c7
Update dataframe format to match birdnet
Jun 30, 2025
019ef96
Update to work with new standard embeddings format
Jun 30, 2025
962da26
fix pylint error of no columns
kgarwoodsdzwa Jul 1, 2025
52c7b89
Merge pull request #45 from conservationtechlab/43_birdnet_embeddings…
kgarwoodsdzwa Jul 1, 2025
52db610
Lint leaderboard.py
Sean1572 Jul 2, 2025
cb2531f
Add linter dev dependencies
Sean1572 Jul 2, 2025
c110816
Add linting docs
Sean1572 Jul 2, 2025
d881195
Added confusion matrix
Sean1572 Jul 2, 2025
67982cd
Linted
Sean1572 Jul 2, 2025
44dbb78
Linted
Sean1572 Jul 2, 2025
3e12498
Added last of the less destructive linting
Sean1572 Jul 4, 2025
515e61c
Fixed issue formatting model input
Sean1572 Jul 4, 2025
4e18962
Finalized rounds of linting, mvp for training done
Sean1572 Jul 4, 2025
22acd07
Add task logging
Sean1572 Jul 4, 2025
e4bf6b5
Added task filtering
Sean1572 Jul 4, 2025
7024de5
Fixed sorting
Sean1572 Jul 4, 2025
caad265
Linted after task update
Sean1572 Jul 4, 2025
351852e
Linted
Sean1572 Jul 4, 2025
0efd9c5
Clean code
Sean1572 Jul 4, 2025
bbbfde1
Cleaned code
Sean1572 Jul 4, 2025
3216415
Linted
Sean1572 Jul 4, 2025
1186e93
Add binary extractor for buowset (#39)
Sean1572 Jul 4, 2025
7f6d3e3
Lint the binary model and retest
Sean1572 Jul 4, 2025
9e3007d
Cleans up train script demos
Sean1572 Jul 4, 2025
e7115c9
Make it easier to add tasks to leaderboard
Sean1572 Jul 4, 2025
8a09aca
Lint
Sean1572 Jul 4, 2025
bab8398
Fixed install
Sean1572 Jul 9, 2025
a596f30
Fixed error handling
Sean1572 Jul 9, 2025
b2d58df
Update documentation for config.yml
Sean1572 Jul 9, 2025
fcaa6bb
Renamed extra deps from model_training to model-training
Sean1572 Jul 9, 2025
c11eb1c
Fixed install for pyha
Sean1572 Jul 9, 2025
2fa8207
Fix docstrings
Jul 10, 2025
d83fa95
Merge pull request #46 from conservationtechlab/19_make_perch_embeddings
kgarwoodsdzwa Jul 10, 2025
82f13cc
Add COMET_WORKSPACE config
Sean1572 Jul 11, 2025
64342b8
Fix missing comet link in readme.md
Sean1572 Jul 11, 2025
0578805
Cleaned config
Sean1572 Jul 11, 2025
d7f30a3
Renamed comment to offline to online
Sean1572 Jul 11, 2025
3a5a4bd
Reworked preprocessors to clarify inheritance
Sean1572 Jul 11, 2025
6211a33
Improved model saving with PretrainedModel
Sean1572 Aug 1, 2025
0318e63
Add google style docstrings, Simplify pylintrc
Sean1572 Aug 6, 2025
4eb928e
Remove docstring
Sean1572 Aug 6, 2025
c4e8b36
Fix misspelling of multilabel
Sean1572 Aug 13, 2025
f6fcbe4
60 esc extractor (#68)
kgarwoodsdzwa Aug 13, 2025
8dc307b
Swap test_fold and vaild_fold in buowset_extractor.py
Sean1572 Aug 13, 2025
99db502
Removed unneeded change for this PR
Sean1572 Aug 13, 2025
3c530f5
Merge pull request #41 from conservationtechlab/40_comet-ml-panels
kgarwoodsdzwa Aug 13, 2025
6415d6e
Merge pull request #48 from conservationtechlab/dev-deps
kgarwoodsdzwa Aug 13, 2025
5c3307e
Merge branch 'dev' into 29_model_training_pipeline
Sean1572 Aug 13, 2025
618da32
Linted and added more documentation
Sean1572 Aug 13, 2025
4fb1ad7
Merge pull request #37 from conservationtechlab/29_model_training_pip…
kgarwoodsdzwa Aug 19, 2025
3 changes: 3 additions & 0 deletions .flake8
@@ -0,0 +1,3 @@
[flake8]
exclude = .venv/*
docstring-convention=google
228 changes: 228 additions & 0 deletions .gitignore
@@ -0,0 +1,228 @@
# Created by https://www.toptal.com/developers/gitignore/api/python,venv,visualstudiocode
# Edit at https://www.toptal.com/developers/gitignore?templates=python,venv,visualstudiocode

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

### Python Patch ###
# Poetry local configuration file - https://python-poetry.org/docs/configuration/#local-configuration
poetry.toml

# ruff
.ruff_cache/

# LSP config files
pyrightconfig.json

### venv ###
# Virtualenv
# http://iamzed.com/2009/05/07/a-primer-on-virtualenv/
[Bb]in
[Ii]nclude
[Ll]ib
[Ll]ib64
[Ll]ocal
[Ss]cripts
pyvenv.cfg
pip-selfcheck.json

### VisualStudioCode ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets

# Local History for Visual Studio Code
.history/

# Built Visual Studio Code Extensions
*.vsix

### VisualStudioCode Patch ###
# Ignore all local history of files
.history
.ionide

# End of https://www.toptal.com/developers/gitignore/api/python,venv,visualstudiocode

uv.lock
.ruff_cache

# Data Folders
data

# Model Storage
model_checkpoints/*


# testing/debugging notebooks
test.ipynb
buowset.ipynb

# Question: do we want to commit vscode settings.json files?
settings.json

# Block all configs besides the example config
whoot_model_training/configs/*
!whoot_model_training/configs/config.yml
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.12
43 changes: 43 additions & 0 deletions README.md
@@ -1,2 +1,45 @@
# whoot
Tools for capturing, analyzing, and parsing audio data

# Installation Instructions

## Default Python Instructions
1) Install Python>=3.10
2) Create a virtual environment via `python -m venv .venv`
3) Activate the environment using an activate script:

- Windows: `.venv\Scripts\activate`
- macOS/Linux: `source .venv/bin/activate`

If this works, you should see `(whoot)` in your command line. If not, check https://docs.python.org/3/library/venv.html#how-venvs-work

4) In the project root, run `pip install -e .`

To install optional dependencies, run `pip install -e .[extra1,extra2,...]`

Currently supported optional dependency collections include:

- `cpu`: Installs torch and torchvision for CPU use only
- `cu128`: Installs torch and torchvision with CUDA 12.8 binaries
- `model-training`: Required for running scripts in `whoot/model_training`; make sure to also add either `cpu` or `cu128`
- `dev`: Installs linters pylint and flake8. MUST be used by developers of whoot

## Usage

Once the environment is activated, you should be able to run any of the whoot scripts with `python path/to/script.py`. If a script reports that a package is missing, you might not be using the virtual environment.
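
One quick way to confirm that a script is running inside the virtual environment is to compare `sys.prefix` with `sys.base_prefix`. The snippet below is a minimal sketch and is not part of whoot; the helper name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If this prints `False`, the interpreter on your `PATH` is probably the system Python, and packages installed with `pip install -e .` inside the environment will not be visible to it.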

# Developer Notes

## Creating a new Project

When adding a new package, like `assess_birdnet`, to the whoot toolkit, add your package name to the `[tool.setuptools]` section of `pyproject.toml`.

### Linting

Style guidelines are listed in `.flake8` and `pylintrc`. To use these tools, do the following:

1) Follow the Installation Instructions, but install with `pip install -e .[dev,extra1,extra2,...]`.
2) Activate the environment

To run the linters, run `python -m flake8` and `python -m pylint --recursive y PATH/TO/FILES.py`.
To contribute to whoot, both linters must pass.
2 changes: 2 additions & 0 deletions cfgs/params_segment_2017_data.yaml
@@ -0,0 +1,2 @@
#length added to beginning and end of detection segment (ms)
padding: 100
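
The script that consumes `padding` is not shown in this diff, so the following is only a sketch of how a 100 ms pad on each side of a detection might be applied. The function name `pad_segment`, its arguments, and the clamping to the clip boundaries are assumptions for illustration, not the project's actual behavior.

```python
def pad_segment(start_ms, end_ms, padding_ms=100, clip_length_ms=None):
    """Widen a detection by padding_ms on each side.

    Hypothetical helper: the start is clamped at 0 and, when the clip
    length is known, the end is clamped at the end of the clip.
    """
    if end_ms < start_ms:
        raise ValueError("end_ms must not be before start_ms")
    padded_start = max(0, start_ms - padding_ms)
    padded_end = end_ms + padding_ms
    if clip_length_ms is not None:
        padded_end = min(padded_end, clip_length_ms)
    return padded_start, padded_end


if __name__ == "__main__":
    # A detection from 1.0 s to 1.5 s with the configured 100 ms padding.
    print(pad_segment(1000, 1500, padding_ms=100))
```

Clamping matters for detections near the start or end of a recording, where the padded window would otherwise fall outside the audio.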
85 changes: 85 additions & 0 deletions comet_ml_panels/leaderboard.py
@@ -0,0 +1,85 @@
# """Creates the Leaderboard for Comet ML Panels

# This script queries from a given Comet ML project a DataFrame of
# model metrics at each step for each model in the project
# Then displays the top models.

# Example:
# This is not intended to be run locally. Please test on Comet-ML.

# For Developers:
# For more on adding to this see docs at
# https://www.comet.com/docs/v2/guides/comet-ui/experiment-management/visualizations/python-panel/

# Note that updating this file does not update comet-ml. Please
# go into the project to update after pushing to GitHub.

# Do not include Doc string in comet-ml... for some reason this
# is displayed in the comet-ml panel if copied directly
# """
from comet_ml import API, APIExperiment, ui
import pandas as pd
import numpy as np

# Initialize Comet API
api = API()

# Select the experiments and metrics to compare
available_metrics = ["train/valid_cMAP", "train/valid_ROCAUC"]
selected_metric = ui.dropdown("Select a metric:", available_metrics)

experiment_keys = api.get_panel_experiment_keys()
data = api.get_metrics_for_chart(
    experiment_keys, metrics=[selected_metric], parameters=["task"])

# Given all experiments, find all possible tasks to measure!
available_tasks = list(
    set(data[key]["params"]["task"]
        for key in data if "task" in data[key]["params"])
)
available_tasks.append(None)
selected_task = ui.dropdown("Select a Task:", available_tasks)

processed_data = []

for key in data:
    # Note, some of the early runs have no value for the task
    # The following code handles those cases
    TASK = None
    if "task" in data[key]["params"]:
        TASK = data[key]["params"]["task"]

    # Only display the leaderboard for tasks we want
    # This CAN include runs with no task (TASK is None)
    if TASK != selected_task:
        continue

    # Failed runs may not have metrics
    if len(data[key]["metrics"]) == 0:
        continue

    # Record the best value of the metric and the step it occurred at
    metric_values = data[key]["metrics"][0]["values"]
    max_index = np.argmax(metric_values)

    processed_data.append({
        "experiment_name": data[key]["experimentName"],
        "experiment_key": key,
        selected_metric: max(metric_values),
        "step": data[key]["metrics"][0]["steps"][max_index],
    })

leaderboard_df = pd.DataFrame(processed_data).sort_values(
    selected_metric, ascending=False)

leaderboard_df["users"] = leaderboard_df["experiment_key"].apply(
    lambda key: APIExperiment(previous_experiment=key).get_user()
)

col_order = [
    "experiment_name",
    selected_metric,
    "experiment_key",
    "step",
    "users"
]
ui.display(leaderboard_df[col_order])
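
Because the panel only runs inside Comet ML, the ranking logic can be hard to check locally. The sketch below reproduces the filter, best-step, and sort steps in plain Python on hand-written data. The dictionary layout mirrors the fields the script reads (`params`, `metrics`, `values`, `steps`, `experimentName`), but the sample values, the function name `build_leaderboard`, and the conversion of metric values to `float` are assumptions for illustration; it deliberately avoids `comet_ml`, pandas, and numpy.

```python
def build_leaderboard(data, metric, task=None):
    """Rank runs by their best value of `metric` for one task.

    Hypothetical stand-in for the panel logic: `data` maps experiment
    keys to dicts shaped like the ones the panel script indexes into.
    """
    rows = []
    for key, run in data.items():
        # Runs without a "task" parameter are treated as task=None.
        if run["params"].get("task") != task:
            continue
        # Failed runs may have no metrics at all.
        if not run["metrics"]:
            continue
        values = [float(v) for v in run["metrics"][0]["values"]]
        steps = run["metrics"][0]["steps"]
        # Index of the best (largest) metric value for this run.
        best = max(range(len(values)), key=values.__getitem__)
        rows.append({
            "experiment_name": run["experimentName"],
            "experiment_key": key,
            metric: values[best],
            "step": steps[best],
        })
    # Best run first.
    return sorted(rows, key=lambda row: row[metric], reverse=True)


sample = {
    "a1": {"experimentName": "run-a", "params": {"task": "binary"},
           "metrics": [{"values": [0.2, 0.7, 0.5], "steps": [10, 20, 30]}]},
    "b2": {"experimentName": "run-b", "params": {"task": "binary"},
           "metrics": [{"values": [0.9, 0.4], "steps": [10, 20]}]},
    "c3": {"experimentName": "run-c", "params": {}, "metrics": []},
}

if __name__ == "__main__":
    for row in build_leaderboard(sample, "train/valid_cMAP", task="binary"):
        print(row)
```

Returning an empty list when no run matches also sidesteps sorting an empty table by a column that does not exist, which is a case worth checking in the panel itself.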