Coral Associated Fishes and Invertebrates (CAFI) Communities in Coral Reef Ecosystems
This repository contains complete, publication-ready data from two field experiments and one observational survey investigating how cryptic invertebrate communities affect coral reef ecosystems in Mo'orea, French Polynesia (2019-2021).
What are CAFI? Coral Associated Fishes and Invertebrates are organisms (crabs, shrimp, worms, snails, fish) that live hidden within coral branches. These cryptic communities can significantly influence coral health and growth.
- ✓ Original data 100% preserved - All data as collected in the field
- ✓ Numeric columns added - For statistical analysis (see below)
- ✓ 23 data files - Covering 2 experiments and 1 survey
- ✓ Complete metadata - 24 metadata files (5 .txt, 8 .xlsx, 10 .csv)
- ✓ BCO-DMO compliant - Ready for data repository submission
- ✓ FAIR principles - Findable, Accessible, Interoperable, Reusable
- Adrian Stier, UC Santa Barbara (astier@ucsb.edu)
- Craig Osenberg, University of Georgia
- Joseph Curtis, Field Technician, UC Santa Barbara
- Alex Primo, Graduate Student Researcher, University of Georgia
- Dan Cryan, PhD Student, University of Georgia
- Molly Brzezinski, Lab Manager, UC Santa Barbara
- Kelsey Vaughn, PhD Student, University of Georgia
- Ninah Munk, MS Student, UC Santa Barbara
- Lily Zhao, PhD Student, UC Santa Barbara
- Kai Kopecky, PhD Student, UC Santa Barbara
- Christian Deneka, Undergraduate Researcher, University of Georgia
- NSF OCE-1851510 and OCE-1851032 (Ocean Sciences, 2019-2025)
moorea-cafi-data/
├── README.md                          # This file - START HERE
├── BCO_DMO_FILE_DESCRIPTIONS.csv      # BCO-DMO dataset organization
├── BCO_DMO_SUBMISSION_CHECKLIST.md    # Submission status and checklist
├── CLAUDE.md                          # Repository context for AI assistants
├── CITATION.cff                       # Citation information
├── DATA_DICTIONARY.md                 # Column descriptions for all files
├── DATA_INTRODUCTION.html             # Interactive data introduction
├── DOI_AND_VERSIONING.md              # DOI and versioning guide
├── GETTING_STARTED.md                 # Quick start guide
├── LICENSE                            # CC-BY-4.0 license
├── data/                              # 23 data files (CSV + Excel)
│   ├── maatea_size_*                  # Maatea Size experiment (8 files)
│   ├── moorea_survey_*                # Mo'orea Survey (5 files)
│   └── mrb_amount_*                   # MRB Amount experiment (10 files)
├── metadata/                          # 24 metadata files
│   ├── README_*_project_overview.txt  # Method overviews (5 .txt files)
│   ├── README_*_metadata_v*.xlsx      # Data dictionaries (8 .xlsx files)
│   ├── README_*_metadata_v*.csv       # BCO-DMO parameter definitions (8 .csv)
│   ├── site_locations.csv             # GPS coordinates
│   └── personnel.csv                  # Research team details
└── images/                            # Species photos and figures
- Start with GETTING_STARTED.md
  - Overview of the two experiments and the survey
  - Which files to use for your analysis
  - Common workflows
- Check DATA_DICTIONARY.md
  - Descriptions of every column in every file
  - Data types and units
  - Special codes and categories
- Choose your data files from data/
  - All files have clear, descriptive names
  - See File Naming Convention below
# Example (R): Load CAFI taxonomy data
cafi_data <- read.csv("data/maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv")
# Use original column for viewing
head(cafi_data$cafi_size_mm) # Shows: "5.2", "<5", "L", etc.
# Use _numeric column for analysis
mean(cafi_data$cafi_size_mm_numeric, na.rm=TRUE) # Calculates mean

# Example (Python): Load CAFI taxonomy data
import pandas as pd
cafi_data = pd.read_csv("data/maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv")
# Use original column for viewing
print(cafi_data['cafi_size_mm'].head()) # Shows: "5.2", "<5", "L", etc.
# Use _numeric column for analysis
cafi_data['cafi_size_mm_numeric'].mean() # Calculates mean

Question: Does coral colony size affect CAFI communities?
Location: Maatea backreef, Mo'orea (17.6°S, 149.8°W)
Years: 2019-2021
Coral colonies: 60 Pocillopora colonies
Treatments: Different colony sizes with/without CAFI removal
Key files:
- maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv - CAFI invertebrate data
- maatea_size_physiology_master_long_2019_2021_v3.csv - Coral health metrics
- maatea_size_fish_surveys_2019_2021_v1.csv - Fish community data
Question: Does coral habitat density affect CAFI community assembly?
Location: MRB north shore backreef, Mo'orea (17.5°S, 149.8°W)
Years: 2019-2021
Coral colonies: 54 Pocillopora colonies
Treatments: Low (solitary), Medium (groups of 3), High (groups of 6) density
Key files:
- mrb_amount_cafi_field_experiment_summer_2021_v4.csv - CAFI data
- mrb_amount_coral_growth_surface_area_change_v1.csv - Growth measurements
- mrb_amount_physiology_master_2019_2021_v5.csv - Coral physiology
Question: What are natural CAFI communities like across Mo'orea?
Location: Multiple sites around Mo'orea (17.5°S, 149.8°W)
Year: 2019
Coral colonies: 114 Pocillopora colonies surveyed
Time Zone: All data collected in Tahiti Time (UTC-10)
Key files:
- moorea_survey_cafi_taxonomy_summer_2019_v5.csv - CAFI biodiversity
- moorea_survey_coral_characteristics_merged_2019_v2.csv - Coral traits
- moorea_survey_physiology_master_2019_v3.csv - Physiological measurements
All files follow this pattern:
{experiment}_{datatype}_{temporal}_{version}.{ext}
Examples:
- maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv
  - maatea_size = Maatea Size experiment
  - cafi_taxonomy = CAFI species identification data
  - merged_2019_2021 = Combined data from both years
  - v2 = Version 2
- mrb_amount_physiology_master_2019_2021_v5.csv
  - mrb_amount = MRB Amount experiment
  - physiology_master = Complete physiological measurements
  - 2019_2021 = Data span 2019-2021
  - v5 = Version 5
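
The naming pattern can also be split programmatically. The following is a minimal Python sketch and not part of the repository: the function name `parse_data_filename` and the `EXPERIMENTS` tuple are hypothetical. Because some files omit the temporal component (for example `maatea_size_experimental_treatments_v1.csv`), the sketch does not try to separate datatype from temporal and returns them together as one `description` string.

```python
import re

# Experiment prefixes as they appear in the file listings of this README.
EXPERIMENTS = ("maatea_size", "moorea_survey", "mrb_amount")

# A trailing "_v<digits>.<ext>" carries the version and the extension.
VERSION_RE = re.compile(r"^(?P<stem>.+)_v(?P<version>\d+)\.(?P<ext>[A-Za-z0-9]+)$")


def parse_data_filename(name):
    """Split a data file name into experiment, description, version, ext."""
    match = VERSION_RE.match(name)
    if match is None:
        raise ValueError(f"no _v<N>.<ext> suffix in {name!r}")
    stem = match.group("stem")
    for prefix in EXPERIMENTS:
        if stem.startswith(prefix + "_"):
            return {
                "experiment": prefix,
                # datatype and temporal parts, left unsplit on purpose
                "description": stem[len(prefix) + 1:],
                "version": int(match.group("version")),
                "ext": match.group("ext"),
            }
    raise ValueError(f"unknown experiment prefix in {name!r}")


print(parse_data_filename("maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv"))
```

A helper like this could be used to group the files in `data/` by experiment or to pick the highest version of a file when several are present.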
Original columns (e.g., cafi_size_mm):
- Contains data exactly as recorded in the field
- May include: <5, <1, L, M, S, or numeric values
- Use for: Understanding what was recorded, data provenance
Numeric columns (e.g., cafi_size_mm_numeric):
- Contains only numeric values (non-numeric entries are left blank in the file and load as NA in R or NaN in pandas)
- Use for: Statistical analysis, calculations, plots
Example:
Row 1: cafi_size_mm = "8.5" cafi_size_mm_numeric = 8.5
Row 2: cafi_size_mm = "<5" cafi_size_mm_numeric = NA
Row 3: cafi_size_mm = "L" cafi_size_mm_numeric = NA
Row 4: cafi_size_mm = "12.3" cafi_size_mm_numeric = 12.3
- Original = Preserves field notes like "too small to measure (<5mm)"
- Numeric = Enables calculations without losing original information
- <5 = Less than 5mm (too small for precise measurement)
- <1 = Less than 1mm (very small larvae/juveniles)
- L = Large (size category, not measurement)
- M = Medium (size category, not measurement)
- S = Small (size category, not measurement)
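
For analyses that need more than the `_numeric` column (for example, counting how many individuals were recorded only as size categories), the original strings can be classified directly. This is a hedged Python sketch under the code definitions listed above; the function name `classify_size` and the returned labels are hypothetical, and how to treat below-limit values such as `<5` remains an analysis decision that this dataset does not prescribe.

```python
# Size-category codes as documented in this README.
SIZE_CATEGORIES = {"L": "large", "M": "medium", "S": "small"}


def classify_size(raw):
    """Return (kind, value) for one entry of an original size column.

    kind is one of: "missing", "category", "below_limit", "measured",
    or "unrecognized" for anything this README does not document.
    """
    text = (raw or "").strip()
    if not text:
        return ("missing", None)  # blank cell
    if text in SIZE_CATEGORIES:
        return ("category", SIZE_CATEGORIES[text])
    try:
        if text.startswith("<"):
            # "<5" means smaller than 5 mm; keep the limit, not a measurement
            return ("below_limit", float(text[1:]))
        return ("measured", float(text))
    except ValueError:
        return ("unrecognized", text)


for entry in ["8.5", "<5", "L", ""]:
    print(repr(entry), "->", classify_size(entry))
```

Only entries classified as `measured` correspond to values that appear in the `_numeric` column.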
All missing data is represented by blank/empty cells. This dataset does NOT use placeholder codes like NA, ., or - for missing values.
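
Because missing values are blank cells, a reader that returns raw strings needs to map empty strings to a missing marker before doing arithmetic. The sketch below uses only the Python standard library; the three sample rows are made up for illustration and are not taken from the dataset, and `read_rows` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample in the documented layout: blank cells mean "missing".
SAMPLE = (
    "coral_id,cafi_size_mm,cafi_size_mm_numeric\n"
    "MAT-POC01,8.5,8.5\n"
    "MAT-POC01,<5,\n"
    "MAT-POC02,,\n"
)


def read_rows(handle):
    """Read CSV rows as dicts, converting blank cells to None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({key: (value if value != "" else None)
                     for key, value in row.items()})
    return rows


rows = read_rows(io.StringIO(SAMPLE))

# Keep only the entries that have a numeric size.
sizes = [float(r["cafi_size_mm_numeric"]) for r in rows
         if r["cafi_size_mm_numeric"] is not None]
print(len(rows), "rows,", len(sizes), "with a numeric size")
```

With pandas, `pd.read_csv` already treats empty fields as missing by default, so no extra argument should be needed for these files.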
All dates and times are in Tahiti Time (UTC-10).
All dates are in ISO 8601 format (YYYY-MM-DD).
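
When combining these records with data stored in UTC, the fixed UTC-10 offset can be applied explicitly (Tahiti does not observe daylight saving time). This is a minimal sketch using the Python standard library; the `HH:MM` time format and the helper name `to_utc` are assumptions, so check DATA_DICTIONARY.md for the actual time columns before relying on it.

```python
from datetime import datetime, timedelta, timezone

# Tahiti Time is a fixed offset of UTC-10.
TAHITI = timezone(timedelta(hours=-10), name="TAHT")


def to_utc(date_text, time_text="00:00"):
    """Convert a Tahiti-local ISO date and an assumed HH:MM time to UTC."""
    local = datetime.fromisoformat(f"{date_text}T{time_text}")
    return local.replace(tzinfo=TAHITI).astimezone(timezone.utc)


print(to_utc("2019-08-15", "09:30").isoformat())  # 2019-08-15T19:30:00+00:00
```

Note that afternoon and evening observations fall on the next calendar day in UTC.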
- Standard: SITE-POC## (e.g., MAT-POC01, MRB-POC45)
  - MAT = Maatea site
  - MRB = MRB site
  - FE = Survey site code
  - HAU = Survey site code
  - POC = Pocillopora species
- Suffixes:
  - D = Dead colony (e.g., FE-POC16D)
  - A = Alternate sampling (e.g., FE-POC11A)
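
The ID format above can be split into site, colony number, and suffix with a regular expression. The sketch below is illustrative only: `parse_coral_id` is a hypothetical helper, and it accepts just the variants documented here, so IDs that follow some other convention in the files would raise an error rather than be guessed at.

```python
import re

# SITE-POC## with an optional D (dead) or A (alternate sampling) suffix.
CORAL_ID_RE = re.compile(r"^(?P<site>[A-Z]+)-POC(?P<number>\d+)(?P<suffix>[AD]?)$")


def parse_coral_id(coral_id):
    """Split a coral ID such as 'FE-POC16D' into its documented parts."""
    match = CORAL_ID_RE.match(coral_id.strip())
    if match is None:
        raise ValueError(f"unexpected coral ID format: {coral_id!r}")
    return {
        "site": match.group("site"),
        "number": int(match.group("number")),
        "suffix": match.group("suffix") or None,  # None when no suffix
    }


print(parse_coral_id("MAT-POC01"))
print(parse_coral_id("FE-POC16D"))
```

The `suffix` field makes it straightforward to exclude dead colonies (`D`) from an analysis.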
- Collection: Coral colonies wrapped in mesh bags, transported to lab
- Extraction: Clove oil anesthetization to expel invertebrates
- Identification: Sorted, measured, identified to lowest taxonomic level
- Size measurements (organism-specific):
- Fish: Standard length (snout to caudal peduncle) in millimeters
- Crustaceans: Body length (carapace length for crabs/shrimp) in millimeters
- Molluscs: Shell length or width (depending on species) in millimeters
- Polychaetes: Body length in millimeters
- Other invertebrates: Maximum body dimension in millimeters
- Note: All measurements exclude appendages, antennae, or tail fins
- Method: Structure-from-Motion (SfM) 3D reconstruction
- Software: Agisoft Metashape
- Measurements: Surface area (cm²), height (cm), volume (cm³)
- Height measurements:
- Measured relative to a horizontal reference plane placed at coral base
- Negative min heights indicate portions of the coral base extending below the reference plane (this is valid and expected for some colonies with irregular bases)
- Max height represents the highest point of the colony above the reference plane
- Tissue slurry: Airbrushed coral tissue homogenized
- Protein: Bradford assay (mg/cm²)
- Carbohydrates: Phenol-sulfuric acid assay (mg/cm²)
- Zooxanthellae: Hemocytometer cell counts (cells/cm²)
Maatea Size experiment (8 files):
- maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv - 8,965 rows
- maatea_size_colony_measurements_wide_2019_2021_v1.csv - 60 rows
- maatea_size_experimental_treatments_v1.csv - 945 rows
- maatea_size_fish_surveys_2019_2021_v1.csv - 4,581 rows
- maatea_size_genetic_samples_metadata_v3.xlsx - 60 rows
- maatea_size_photogrammetry_2019_2021_v1.csv - 117 rows (combined Dec 2019 + May 2021)
- maatea_size_photogrammetry_summer_2019_v1.xlsx - 60 rows (supplemental)
- maatea_size_physiology_master_long_2019_2021_v3.csv - 118 rows

Mo'orea Survey (5 files):
- moorea_survey_cafi_taxonomy_summer_2019_v5.csv - 3,989 rows
- moorea_survey_coral_characteristics_merged_2019_v2.csv - 114 rows
- moorea_survey_physiology_master_2019_v3.csv - 108 rows
- moorea_survey_tip_stump_comparison_dec_2019_v1.xlsx - 21 rows (supplemental)
- moorea_survey_tip_stump_zoox_counts_dec_2019_v1.xlsx - 108 rows (supplemental)

MRB Amount experiment (10 files):
- mrb_amount_cafi_field_experiment_summer_2021_v4.csv - 4,119 rows
- mrb_amount_coral_growth_surface_area_change_filtered_v1.csv - 44 rows (supplemental QC)
- mrb_amount_coral_growth_surface_area_change_v1.csv - 54 rows
- mrb_amount_coral_id_position_treatment_v1.csv - 54 rows (supplemental)
- mrb_amount_experimental_treatments_v1.csv - 54 rows
- mrb_amount_fish_surveys_may_2021_v1.csv - 999 rows
- mrb_amount_manual_colony_measurements_2019_2021_v1.xlsx - 54 rows (supplemental)
- mrb_amount_photogrammetry_200k_mesh_2019_2021_v1.csv - 264 rows (supplemental raw)
- mrb_amount_photogrammetry_measures_2019_2021_v1.csv - 108 rows
- mrb_amount_physiology_master_2019_2021_v5.csv - 53 rows

Total: 23 data files
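
The row counts listed above offer a quick way to confirm that a download is complete. The following Python sketch is one possible check, not a tool shipped with the repository: `count_data_rows` and `check_row_counts` are hypothetical names, only three of the CSV files are included as examples, and it assumes the listed counts exclude the header row.

```python
import csv
from pathlib import Path

# A few expected counts copied from the listing above (CSV files only).
EXPECTED_ROWS = {
    "maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv": 8965,
    "moorea_survey_cafi_taxonomy_summer_2019_v5.csv": 3989,
    "mrb_amount_cafi_field_experiment_summer_2021_v4.csv": 4119,
}


def count_data_rows(path):
    """Count CSV records, not counting the header row (assumption)."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header
        return sum(1 for _ in reader)


def check_row_counts(data_dir, expected=EXPECTED_ROWS):
    """Return {file name: problem description}; empty when all counts match."""
    problems = {}
    for name, want in expected.items():
        path = Path(data_dir) / name
        if not path.exists():
            problems[name] = "missing"
            continue
        got = count_data_rows(path)
        if got != want:
            problems[name] = f"expected {want} rows, found {got}"
    return problems
```

Calling `check_row_counts("data")` from the repository root would return an empty dictionary if the three files match the documented counts.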
# Load data (na.strings = "" so blank text cells are read as NA, not "")
cafi <- read.csv("data/maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv",
                 na.strings = c("", "NA"))

# Summarize by family
library(dplyr)
cafi %>%
  filter(!is.na(family)) %>%
  group_by(family) %>%
  summarize(
    count = n(),
    mean_size = mean(cafi_size_mm_numeric, na.rm=TRUE)
  ) %>%
  arrange(desc(count))

# Load data
growth <- read.csv("data/mrb_amount_coral_growth_surface_area_change_v1.csv")
treatments <- read.csv("data/mrb_amount_experimental_treatments_v1.csv")

# Merge and analyze
library(dplyr)
merged <- growth %>%
  left_join(treatments, by="coral_id") %>%
  group_by(treatment) %>%
  summarize(
    mean_growth = mean(delta_surface_area, na.rm=TRUE),
    # Divide by the number of non-missing values so the SE matches na.rm=TRUE
    se = sd(delta_surface_area, na.rm=TRUE)/sqrt(sum(!is.na(delta_surface_area)))
  )

# Load both datasets
library(dplyr) # Load before the first %>% call
cafi <- read.csv("data/maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv")
phys <- read.csv("data/maatea_size_physiology_master_long_2019_2021_v3.csv")

# Count CAFI per colony
cafi_counts <- cafi %>%
  group_by(coral_id, time_point) %>%
  summarize(cafi_count = n(), .groups = "drop")

# Merge with physiology
merged <- phys %>%
  left_join(cafi_counts, by=c("coral_id", "time_point"))

# Analyze relationship
cor.test(merged$cafi_count, merged$protein_mg_cm2)

- GETTING_STARTED.md - Detailed introduction for new users
- DATA_DICTIONARY.md - Complete column descriptions
- DATA_INTRODUCTION.html - Interactive data introduction
- To view: Download the file and open it in any web browser (Chrome, Firefox, Safari, etc.)
- Or view on GitHub: Click the file, then click "Download" or use GitHub's HTML preview
Plain text overviews (.txt files):
- README_amount_project_overview.txt - MRB Amount experiment methods
- README_size_project_overview.txt - Maatea Size experiment methods
- README_survey_project_overview.txt - Survey study methods
- README_photogrammetry_metadata_v2.txt - 3D photogrammetry methods and model types
- README_tip_stump_comparison_dec_2019.txt - Tip vs. stump comparison study
Data dictionaries (.xlsx and .csv files):
- 8 Excel files with detailed column descriptions for each dataset
- 8 CSV files with BCO-DMO compliant parameter definitions and units
If you use this data, please cite:
Stier, A.C. and Osenberg, C.W. (2025). Mo'orea Coral Reef CAFI Field
Experiments Data Package (2019-2021). Dataset.
https://github.com/stier-lab/moorea-cafi-data
Questions about the data?
- Adrian Stier: astier@ucsb.edu
- Craig Osenberg: osenberg@uga.edu
Technical issues with this repository?
- Open an issue on GitHub
This data is released under CC-BY-4.0 (Creative Commons Attribution 4.0 International)
You are free to:
- Share: copy and redistribute
- Adapt: remix, transform, and build upon
Under these terms:
- Attribution: cite the dataset
- No additional restrictions
- ✓ BCO-DMO compliant - Meets all data repository standards
- ✓ ISO 8601 dates - All dates in YYYY-MM-DD format
- ✓ Clean column names - Lowercase with underscores only
- ✓ Blank cells for missing data - No placeholder codes
- ✓ Decimal degrees - GPS coordinates properly formatted
- ✓ Original data preserved - Field data in original columns
- ✓ Numeric columns added - For statistical analysis
- ✓ Complete metadata - Detailed methods for every file in accessible formats
- ✓ Image inventory - BCO-DMO compliant image documentation
- ✓ FAIR compliant - Findable, Accessible, Interoperable, Reusable
- v2.6 (2025-01-03) - Repository cleanup and BCO-DMO finalization
  - Consolidated to 3 BCO-DMO datasets (Biological, Morphometry, Experimental Design)
  - Added 8 BCO-DMO parameter metadata CSV files
  - Consolidated Maatea photogrammetry files (24 → 23 files)
  - Fixed grant numbers and removed outdated documentation
  - Full BCO-DMO format compliance
- v2.3 (2025-01-02) - Full BCO-DMO compliance
  - Converted all dates to ISO 8601 format (YYYY-MM-DD)
  - Standardized column names (lowercase, underscores only)
  - Replaced NA values with blank cells per BCO-DMO standards
  - Added image inventory CSV and CLAUDE.md
- v2.2 (2024-11-11) - BCO-DMO submission preparation
  - Fixed funding information (NSF OCE-1851510 and OCE-1851032)
  - Added complete research team to personnel
  - Added DOI and versioning documentation
- v2.1 (2024-10-27) - Enhanced metadata release
  - Added 5 plain text (.txt) method overview files
  - Reformatted metadata files with clear structure
- v2.0 (2024-10-24) - NSF OCE and LTER/EDI compliant release
  - Initial public release with complete metadata and documentation
See DOI_AND_VERSIONING.md for information about:
- How DOIs work
- Updating data after DOI assignment
- Planned DOIs for this dataset (BCO-DMO, EDI/LTER, Zenodo)
Ready to submit to BCO-DMO?
See BCO_DMO_SUBMISSION_CHECKLIST.md for:
- Complete submission checklist
- Step-by-step instructions
- Required information and file list
- Contact information and timeline
Last Updated: 2025-01-03
Current Version: v2.6 (BCO-DMO Compliant)
Repository Maintained By: Stier Lab, UC Santa Barbara
