
Mo'orea Coral Reef CAFI Field Experiments (2019-2021)

Coral Associated Fishes and Invertebrates (CAFI) Communities in Coral Reef Ecosystems

Trapezia guard crab on coral



📖 About This Dataset

This repository contains complete, publication-ready data from two field experiments and one observational survey investigating how cryptic invertebrate communities affect coral reef ecosystems in Mo'orea, French Polynesia (2019-2021).

What are CAFI? Coral Associated Fishes and Invertebrates are organisms (crabs, shrimp, worms, snails, fish) that live hidden within coral branches. These cryptic communities can significantly influence coral health and growth.

Key Features:

  • ✅ Original data 100% preserved - All data as collected in the field
  • ✅ Numeric columns added - For statistical analysis (see below)
  • ✅ 23 data files - Covering 2 experiments and 1 survey
  • ✅ Complete metadata - 24 metadata files (5 .txt, 8 .xlsx, 10 .csv)
  • ✅ BCO-DMO compliant - Ready for data repository submission
  • ✅ FAIR principles - Findable, Accessible, Interoperable, Reusable

Principal Investigators:

  • Adrian Stier, UC Santa Barbara (astier@ucsb.edu)
  • Craig Osenberg, University of Georgia

Research Team:

  • Joseph Curtis, Field Technician, UC Santa Barbara
  • Alex Primo, Graduate Student Researcher, University of Georgia
  • Dan Cryan, PhD Student, University of Georgia
  • Molly Brzezinski, Lab Manager, UC Santa Barbara
  • Kelsey Vaughn, PhD Student, University of Georgia
  • Ninah Munk, MS Student, UC Santa Barbara
  • Lily Zhao, PhD Student, UC Santa Barbara
  • Kai Kopecky, PhD Student, UC Santa Barbara
  • Christian Deneka, Undergraduate Researcher, University of Georgia

Funding:

  • NSF OCE-1851510 and OCE-1851032 (Ocean Sciences, 2019-2025)

🗂️ Repository Structure

moorea-cafi-data/
├── README.md                          # This file - START HERE
├── BCO_DMO_FILE_DESCRIPTIONS.csv      # BCO-DMO dataset organization
├── BCO_DMO_SUBMISSION_CHECKLIST.md    # Submission status and checklist
├── CLAUDE.md                          # Repository context for AI assistants
├── CITATION.cff                       # Citation information
├── DATA_DICTIONARY.md                 # Column descriptions for all files
├── DATA_INTRODUCTION.html             # Interactive data introduction
├── DOI_AND_VERSIONING.md              # DOI and versioning guide
├── GETTING_STARTED.md                 # Quick start guide
├── LICENSE                            # CC-BY-4.0 license
├── data/                              # 23 data files (CSV + Excel)
│   ├── maatea_size_*                  # Maatea Size experiment (8 files)
│   ├── moorea_survey_*                # Mo'orea Survey (5 files)
│   └── mrb_amount_*                   # MRB Amount experiment (10 files)
├── metadata/                          # 24 metadata files
│   ├── README_*_project_overview.txt  # Method overviews (5 .txt files)
│   ├── README_*_metadata_v*.xlsx      # Data dictionaries (8 .xlsx files)
│   ├── README_*_metadata_v*.csv       # BCO-DMO parameter definitions (8 .csv)
│   ├── site_locations.csv             # GPS coordinates
│   └── personnel.csv                  # Research team details
└── images/                            # Species photos and figures

🚀 Quick Start

For First-Time Users:

  1. Start with GETTING_STARTED.md

    • Overview of the three experiments
    • Which files to use for your analysis
    • Common workflows
  2. Check DATA_DICTIONARY.md

    • Descriptions of every column in every file
    • Data types and units
    • Special codes and categories
  3. Choose your data files from data/

For R Users:

# Example: Load CAFI taxonomy data
cafi_data <- read.csv("data/maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv")

# Use original column for viewing
head(cafi_data$cafi_size_mm)  # Shows: "5.2", "<5", "L", etc.

# Use _numeric column for analysis
mean(cafi_data$cafi_size_mm_numeric, na.rm=TRUE)  # Calculates mean

For Python Users:

import pandas as pd

# Example: Load CAFI taxonomy data
cafi_data = pd.read_csv("data/maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv")

# Use original column for viewing
print(cafi_data['cafi_size_mm'].head())  # Shows: "5.2", "<5", "L", etc.

# Use _numeric column for analysis
print(cafi_data['cafi_size_mm_numeric'].mean())  # Calculates mean (missing values are skipped)

📊 Two Experiments + One Survey

1. Maatea Size Experiment (8 files)

Question: Does coral colony size affect CAFI communities?

Location: Maatea backreef, Mo'orea (17.6°S, 149.8°W)
Years: 2019-2021
Coral colonies: 60 Pocillopora colonies
Treatments: Different colony sizes with/without CAFI removal

Key files:

  • maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv - CAFI invertebrate data
  • maatea_size_physiology_master_long_2019_2021_v3.csv - Coral health metrics
  • maatea_size_fish_surveys_2019_2021_v1.csv - Fish community data

2. MRB Amount Experiment (10 files)

Question: Does coral habitat density affect CAFI community assembly?

Location: MRB north shore backreef, Mo'orea (17.5°S, 149.8°W)
Years: 2019-2021
Coral colonies: 54 Pocillopora colonies
Treatments: Low (solitary), Medium (groups of 3), High (groups of 6) density

Key files:

  • mrb_amount_cafi_field_experiment_summer_2021_v4.csv - CAFI data
  • mrb_amount_coral_growth_surface_area_change_v1.csv - Growth measurements
  • mrb_amount_physiology_master_2019_2021_v5.csv - Coral physiology

3. Mo'orea Survey (5 files)

Question: What are natural CAFI communities like across Mo'orea?

Location: Multiple sites around Mo'orea (17.5°S, 149.8°W)
Year: 2019
Coral colonies: 114 Pocillopora colonies surveyed
Time Zone: All data collected in Tahiti Time (UTC-10)

Key files:

  • moorea_survey_cafi_taxonomy_summer_2019_v5.csv - CAFI biodiversity
  • moorea_survey_coral_characteristics_merged_2019_v2.csv - Coral traits
  • moorea_survey_physiology_master_2019_v3.csv - Physiological measurements

File Naming Convention

All files follow this pattern:

{experiment}_{datatype}_{temporal}_{version}.{ext}

Examples:

  • maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv

    • maatea_size = Maatea Size experiment
    • cafi_taxonomy = CAFI species identification data
    • merged_2019_2021 = Combined data from both years
    • v2 = Version 2
  • mrb_amount_physiology_master_2019_2021_v5.csv

    • mrb_amount = MRB Amount experiment
    • physiology_master = Complete physiological measurements
    • 2019_2021 = Data span 2019-2021
    • v5 = Version 5
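
A minimal Python sketch of how a filename could be split according to this pattern. The regular expression and the `parse_data_filename` helper are illustrative assumptions, not part of the repository. Because the datatype and temporal parts both contain underscores, the sketch keeps them together as one `descriptor` rather than guessing where one ends and the other begins.

```python
import re

# The three experiment prefixes used in data/ (taken from the repository structure).
EXPERIMENTS = ("maatea_size", "moorea_survey", "mrb_amount")

# Hypothetical pattern: experiment prefix, free-form middle part, version, extension.
FILENAME_RE = re.compile(
    r"^(?P<experiment>" + "|".join(EXPERIMENTS) + r")_"
    r"(?P<descriptor>.+)_v(?P<version>\d+)\.(?P<ext>csv|xlsx)$"
)


def parse_data_filename(name):
    """Return the filename parts as a dict, or None if the name does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["version"] = int(parts["version"])  # "2" -> 2
    return parts


print(parse_data_filename("maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv"))
```

Returning `None` for non-matching names makes it easy to skip unrelated files when looping over a directory listing.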

⚠️ IMPORTANT: Understanding Columns

Many files have BOTH original and numeric columns:

Original columns (e.g., cafi_size_mm):

  • Contains data exactly as recorded in the field
  • May include: <5, <1, L, M, S, or numeric values
  • Use for: Understanding what was recorded, data provenance

Numeric columns (e.g., cafi_size_mm_numeric):

  • Contains only numeric values; non-numeric entries are left blank, which R reads as NA and pandas reads as NaN
  • Use for: Statistical analysis, calculations, plots

Example:

Row 1:  cafi_size_mm = "8.5"     cafi_size_mm_numeric = 8.5
Row 2:  cafi_size_mm = "<5"      cafi_size_mm_numeric = NA
Row 3:  cafi_size_mm = "L"       cafi_size_mm_numeric = NA
Row 4:  cafi_size_mm = "12.3"    cafi_size_mm_numeric = 12.3

Why both columns?

  • Original = Preserves field notes like "too small to measure (<5mm)"
  • Numeric = Enables calculations without losing original information
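
The relationship between the two columns can be sketched in Python with pandas. This only illustrates the behavior shown in the example rows, assuming a coercion-style conversion; it is not necessarily how the `_numeric` columns in this dataset were generated.

```python
import pandas as pd

# Field-style values copied from the example rows in this section.
original = pd.Series(["8.5", "<5", "L", "12.3"], name="cafi_size_mm")

# errors="coerce" turns anything that is not a plain number into NaN,
# which is how pandas represents a missing numeric value.
numeric = pd.to_numeric(original, errors="coerce").rename("cafi_size_mm_numeric")

print(pd.concat([original, numeric], axis=1))
```

The original strings are untouched, so codes such as `<5` and `L` remain available for provenance checks.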

🔍 Data Codes and Categories

Common CAFI Size Codes:

  • <5 = Less than 5mm (too small for precise measurement)
  • <1 = Less than 1mm (very small larvae/juveniles)
  • L = Large (size category, not measurement)
  • M = Medium (size category, not measurement)
  • S = Small (size category, not measurement)
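
To see how many records carry a code rather than a measurement, the original column can be grouped into categories. This is a hedged sketch: the `classify_size_entry` helper and its category labels are hypothetical, and the sample values are made up for illustration.

```python
import pandas as pd

# Size-category codes listed above.
SIZE_CATEGORY_CODES = {"L", "M", "S"}


def classify_size_entry(value):
    """Label one cafi_size_mm entry (hypothetical helper, not from the repository)."""
    if pd.isna(value) or str(value).strip() == "":
        return "missing"
    text = str(value).strip()
    if text.startswith("<"):
        return "below_threshold"  # e.g. "<5" or "<1"
    if text in SIZE_CATEGORY_CODES:
        return "size_category"  # L, M, or S
    try:
        float(text)
        return "measured"
    except ValueError:
        return "other"


# Made-up sample values, not real records.
sizes = pd.Series(["5.2", "<5", "L", "<1", "12.3", None, "S"])
print(sizes.map(classify_size_entry).value_counts())
```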

Missing Data (BCO-DMO Compliant):

All missing data is represented by blank/empty cells. This dataset does NOT use placeholder codes like NA, ., or - for missing values.
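
Blank cells need no special handling on import in pandas. A small sketch with an in-memory CSV (the rows are invented for illustration) showing that they are read as NaN:

```python
import io

import pandas as pd

# Invented rows that mimic the layout: some cells are left blank.
csv_text = (
    "coral_id,cafi_size_mm,cafi_size_mm_numeric\n"
    "MAT-POC01,8.5,8.5\n"
    "MAT-POC02,<5,\n"
    "MAT-POC03,,\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Blank cells arrive as NaN; count them per column.
missing_counts = df.isna().sum()
print(missing_counts)
```

In R, `read.csv()` reads blank numeric cells as NA by default, but blank text cells stay as empty strings unless `na.strings = ""` is passed.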

Timezone:

All dates and times are in Tahiti Time (UTC-10).

Date Format:

All dates are in ISO 8601 format (YYYY-MM-DD).
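
Because dates are ISO 8601 and times are local to Tahiti, timestamps can be made timezone-aware before comparing them with UTC-based sources. A minimal sketch using a fixed UTC-10 offset (Tahiti does not observe daylight saving time); the sample date and time are invented, and the `TAHITI` name is an assumption of this example.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed offset for Tahiti Time (UTC-10).
TAHITI = timezone(timedelta(hours=-10), name="TAHT")

# Invented example values in the dataset's date format.
survey_date = date.fromisoformat("2019-08-15")
local_dt = datetime.combine(survey_date, time(9, 30), tzinfo=TAHITI)

# Same instant expressed in UTC.
utc_dt = local_dt.astimezone(timezone.utc)

print(local_dt.isoformat())
print(utc_dt.isoformat())
```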

Coral ID Format:

  • Standard: SITE-POC## (e.g., MAT-POC01, MRB-POC45)
  • MAT = Maatea site
  • MRB = MRB site
  • FE = Survey site code
  • HAU = Survey site code
  • POC = Pocillopora species
  • Suffixes:
    • D = Dead colony (e.g., FE-POC16D)
    • A = Alternate sampling (e.g., FE-POC11A)
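
The ID format above can be checked and split with a regular expression. This is a sketch under the assumption that every ID is a site code of capital letters, `-POC`, a colony number, and an optional `D` or `A` suffix; the `parse_coral_id` helper is hypothetical, and IDs in the actual files may include variants not covered here.

```python
import re

# Hypothetical pattern for IDs such as MAT-POC01 or FE-POC16D.
CORAL_ID_RE = re.compile(r"^(?P<site>[A-Z]+)-POC(?P<number>\d+)(?P<suffix>[DA]?)$")

SUFFIX_MEANING = {"": None, "D": "dead colony", "A": "alternate sampling"}


def parse_coral_id(coral_id):
    """Split a coral ID into site, colony number, and suffix meaning."""
    match = CORAL_ID_RE.match(coral_id)
    if match is None:
        raise ValueError(f"Unrecognized coral ID: {coral_id!r}")
    return {
        "site": match["site"],
        "number": int(match["number"]),
        "note": SUFFIX_MEANING[match["suffix"]],
    }


for example in ("MAT-POC01", "MRB-POC45", "FE-POC16D", "FE-POC11A"):
    print(example, parse_coral_id(example))
```

Raising an error on unrecognized IDs surfaces unexpected formats early instead of silently dropping rows during a join.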

Measurement Methods

CAFI (Coral Associated Fishes and Invertebrates):

  • Collection: Coral colonies wrapped in mesh bags, transported to lab
  • Extraction: Clove oil anesthetization to expel invertebrates
  • Identification: Sorted, measured, identified to lowest taxonomic level
  • Size measurements (organism-specific):
    • Fish: Standard length (snout to caudal peduncle) in millimeters
    • Crustaceans: Body length (carapace length for crabs/shrimp) in millimeters
    • Molluscs: Shell length or width (depending on species) in millimeters
    • Polychaetes: Body length in millimeters
    • Other invertebrates: Maximum body dimension in millimeters
    • Note: All measurements exclude appendages, antennae, or tail fins

Coral Photogrammetry:

  • Method: Structure-from-Motion (SfM) 3D reconstruction
  • Software: Agisoft Metashape
  • Measurements: Surface area (cm²), height (cm), volume (cm³)
  • Height measurements:
    • Measured relative to a horizontal reference plane placed at coral base
    • Negative min heights indicate portions of the coral base extending below the reference plane (this is valid and expected for some colonies with irregular bases)
    • Max height represents the highest point of the colony above the reference plane

Coral Physiology:

  • Tissue slurry: Airbrushed coral tissue homogenized
  • Protein: Bradford assay (mg/cm²)
  • Carbohydrates: Phenol-sulfuric acid assay (mg/cm²)
  • Zooxanthellae: Hemocytometer cell counts (cells/cm²)

🗃️ Complete File List

Maatea Size Experiment (8 files):

  1. maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv - 8,965 rows
  2. maatea_size_colony_measurements_wide_2019_2021_v1.csv - 60 rows
  3. maatea_size_experimental_treatments_v1.csv - 945 rows
  4. maatea_size_fish_surveys_2019_2021_v1.csv - 4,581 rows
  5. maatea_size_genetic_samples_metadata_v3.xlsx - 60 rows
  6. maatea_size_photogrammetry_2019_2021_v1.csv - 117 rows (combined Dec 2019 + May 2021)
  7. maatea_size_photogrammetry_summer_2019_v1.xlsx - 60 rows (supplemental)
  8. maatea_size_physiology_master_long_2019_2021_v3.csv - 118 rows

Mo'orea Survey (5 files):

  1. moorea_survey_cafi_taxonomy_summer_2019_v5.csv - 3,989 rows
  2. moorea_survey_coral_characteristics_merged_2019_v2.csv - 114 rows
  3. moorea_survey_physiology_master_2019_v3.csv - 108 rows
  4. moorea_survey_tip_stump_comparison_dec_2019_v1.xlsx - 21 rows (supplemental)
  5. moorea_survey_tip_stump_zoox_counts_dec_2019_v1.xlsx - 108 rows (supplemental)

MRB Amount Experiment (10 files):

  1. mrb_amount_cafi_field_experiment_summer_2021_v4.csv - 4,119 rows
  2. mrb_amount_coral_growth_surface_area_change_filtered_v1.csv - 44 rows (supplemental QC)
  3. mrb_amount_coral_growth_surface_area_change_v1.csv - 54 rows
  4. mrb_amount_coral_id_position_treatment_v1.csv - 54 rows (supplemental)
  5. mrb_amount_experimental_treatments_v1.csv - 54 rows
  6. mrb_amount_fish_surveys_may_2021_v1.csv - 999 rows
  7. mrb_amount_manual_colony_measurements_2019_2021_v1.xlsx - 54 rows (supplemental)
  8. mrb_amount_photogrammetry_200k_mesh_2019_2021_v1.csv - 264 rows (supplemental raw)
  9. mrb_amount_photogrammetry_measures_2019_2021_v1.csv - 108 rows
  10. mrb_amount_physiology_master_2019_2021_v5.csv - 53 rows

Total: 23 data files


🎯 Common Analysis Workflows

1. Analyze CAFI Community Composition

# Load data
cafi <- read.csv("data/maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv")

# Summarize by family
library(dplyr)
cafi %>%
  filter(!is.na(family)) %>%
  group_by(family) %>%
  summarize(
    count = n(),
    mean_size = mean(cafi_size_mm_numeric, na.rm=TRUE)
  ) %>%
  arrange(desc(count))

2. Compare Coral Growth Between Treatments

# Load data
growth <- read.csv("data/mrb_amount_coral_growth_surface_area_change_v1.csv")
treatments <- read.csv("data/mrb_amount_experimental_treatments_v1.csv")

# Merge and analyze
library(dplyr)
merged <- growth %>%
  left_join(treatments, by="coral_id") %>%
  group_by(treatment) %>%
  summarize(
    mean_growth = mean(delta_surface_area, na.rm=TRUE),
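    # Standard error uses the number of non-missing values, not the group size
    se = sd(delta_surface_area, na.rm=TRUE)/sqrt(sum(!is.na(delta_surface_area)))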
  )
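
For Python users, the same comparison could look roughly like the sketch below. It uses a tiny invented table in place of the real files and assumes the column names from the R example (`coral_id`, `treatment`, `delta_surface_area`); check DATA_DICTIONARY.md for the actual names.

```python
import pandas as pd

# Invented values standing in for the growth and treatment files.
growth = pd.DataFrame({
    "coral_id": ["MRB-POC01", "MRB-POC02", "MRB-POC03", "MRB-POC04"],
    "delta_surface_area": [10.0, 14.0, 20.0, None],
})
treatments = pd.DataFrame({
    "coral_id": ["MRB-POC01", "MRB-POC02", "MRB-POC03", "MRB-POC04"],
    "treatment": ["Low", "Low", "High", "High"],
})

merged = growth.merge(treatments, on="coral_id", how="left")

# mean, count, and sem (standard error of the mean) all skip missing values.
summary = merged.groupby("treatment")["delta_surface_area"].agg(
    mean_growth="mean", n="count", se="sem"
)
print(summary)
```

Reporting `n` alongside the standard error shows how many colonies actually contributed to each estimate.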

3. Link CAFI to Coral Physiology

library(dplyr)

# Load both datasets
cafi <- read.csv("data/maatea_size_cafi_taxonomy_merged_2019_2021_v2.csv")
phys <- read.csv("data/maatea_size_physiology_master_long_2019_2021_v3.csv")

# Count CAFI per colony and time point
cafi_counts <- cafi %>%
  group_by(coral_id, time_point) %>%
  summarize(cafi_count = n(), .groups = "drop")

# Merge with physiology (colonies without CAFI records get NA counts)
merged <- phys %>%
  left_join(cafi_counts, by=c("coral_id", "time_point"))

# Analyze relationship
cor.test(merged$cafi_count, merged$protein_mg_cm2)
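
A rough pandas counterpart of this workflow, using invented rows and the column names from the R example (`coral_id`, `time_point`, `protein_mg_cm2`), which should be checked against DATA_DICTIONARY.md. It also makes explicit one decision the join forces: whether colonies with no CAFI records count as zero.

```python
import pandas as pd

# Invented rows: one row per CAFI individual, and one row per colony sample.
cafi = pd.DataFrame({
    "coral_id": ["MAT-POC01", "MAT-POC01", "MAT-POC02"],
    "time_point": ["2019", "2019", "2019"],
})
phys = pd.DataFrame({
    "coral_id": ["MAT-POC01", "MAT-POC02", "MAT-POC03"],
    "time_point": ["2019", "2019", "2019"],
    "protein_mg_cm2": [0.9, 0.7, 0.5],
})

# Count individuals per colony and time point.
cafi_counts = (
    cafi.groupby(["coral_id", "time_point"]).size().reset_index(name="cafi_count")
)

merged = phys.merge(cafi_counts, on=["coral_id", "time_point"], how="left")

# Colonies absent from the CAFI table come back as NaN after the left join.
# Treating them as zero is an assumption; a colony may simply not have been sampled.
merged["cafi_count_filled"] = merged["cafi_count"].fillna(0).astype(int)

correlation = merged["cafi_count_filled"].corr(merged["protein_mg_cm2"])
print(merged)
print(correlation)
```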

📚 Additional Documentation

Quick Reference Guides

  • GETTING_STARTED.md - Detailed introduction for new users
  • DATA_DICTIONARY.md - Complete column descriptions
  • DATA_INTRODUCTION.html - Interactive data introduction
    • To view: Download the file and open it in any web browser (Chrome, Firefox, Safari, etc.)
    • Or view on GitHub: Click the file, then click "Download" or use GitHub's HTML preview

Detailed Methods (metadata/ folder)

Plain text overviews (.txt files):

  • 5 method overview files (README_*_project_overview.txt)

Data dictionaries (.xlsx and .csv files):

  • 8 Excel files with detailed column descriptions for each dataset
  • 8 CSV files with BCO-DMO compliant parameter definitions and units

📄 Citation

If you use this data, please cite:

Stier, A.C. and Osenberg, C.W. (2025). Mo'orea Coral Reef CAFI Field
Experiments Data Package (2019-2021). Dataset.
https://github.com/stier-lab/moorea-cafi-data

📧 Contact

Questions about the data?

  • Contact Adrian Stier (astier@ucsb.edu)

Technical issues with this repository?

  • Open an issue on GitHub

📜 License

This data is released under CC-BY-4.0 (Creative Commons Attribution 4.0 International)

You are free to:

  • Share - copy and redistribute
  • Adapt - remix, transform, and build upon

Under these terms:

  • Attribution - cite the dataset
  • No additional restrictions

✅ Data Quality & BCO-DMO Compliance

  • ✅ BCO-DMO compliant - Meets all data repository standards
  • ✅ ISO 8601 dates - All dates in YYYY-MM-DD format
  • ✅ Clean column names - Lowercase with underscores only
  • ✅ Blank cells for missing data - No placeholder codes
  • ✅ Decimal degrees - GPS coordinates properly formatted
  • ✅ Original data preserved - Field data in original columns
  • ✅ Numeric columns added - For statistical analysis
  • ✅ Complete metadata - Detailed methods for every file in accessible formats
  • ✅ Image inventory - BCO-DMO compliant image documentation
  • ✅ FAIR compliant - Findable, Accessible, Interoperable, Reusable

🔄 Version History

  • v2.6 (2025-01-03) - Repository cleanup and BCO-DMO finalization

    • Consolidated to 3 BCO-DMO datasets (Biological, Morphometry, Experimental Design)
    • Added 8 BCO-DMO parameter metadata CSV files
    • Consolidated Maatea photogrammetry files (24 → 23 files)
    • Fixed grant numbers and removed outdated documentation
    • Full BCO-DMO format compliance
  • v2.3 (2025-01-02) - Full BCO-DMO compliance

    • Converted all dates to ISO 8601 format (YYYY-MM-DD)
    • Standardized column names (lowercase, underscores only)
    • Replaced NA values with blank cells per BCO-DMO standards
    • Added image inventory CSV and CLAUDE.md
  • v2.2 (2024-11-11) - BCO-DMO submission preparation

    • Fixed funding information (NSF OCE-1851510 and OCE-1851032)
    • Added complete research team to personnel
    • Added DOI and versioning documentation
  • v2.1 (2024-10-27) - Enhanced metadata release

    • Added 5 plain text (.txt) method overview files
    • Reformatted metadata files with clear structure
  • v2.0 (2024-10-24) - NSF OCE and LTER/EDI compliant release

    • Initial public release with complete metadata and documentation

📋 DOI and Versioning

For information about:

  • How DOIs work
  • Updating data after DOI assignment
  • Planned DOIs for this dataset (BCO-DMO, EDI/LTER, Zenodo)

See DOI_AND_VERSIONING.md


🚀 BCO-DMO Submission

Ready to submit to BCO-DMO?

See BCO_DMO_SUBMISSION_CHECKLIST.md for:

  • Complete submission checklist
  • Step-by-step instructions
  • Required information and file list
  • Contact information and timeline

Last Updated: 2025-01-03
Current Version: v2.6 (BCO-DMO Compliant)
Repository Maintained By: Stier Lab, UC Santa Barbara
