From 738cf6f515a55fd6e48189e13d27e5d4c880181c Mon Sep 17 00:00:00 2001 From: Birdmachine Date: Mon, 2 Feb 2026 13:27:44 -0500 Subject: [PATCH 1/4] Adds iCLAP and COMET pages and index rows, as well as missing(?) 4i page. --- docs/assays/metadata/4i.md | 37 +++++++++++++++++++++++ docs/assays/metadata/COMET.md | 37 +++++++++++++++++++++++ docs/assays/metadata/iCLAP.md | 34 +++++++++++++++++++++ docs/assays/metadata/index.md | 2 ++ docs/assays/metadata/new-metadata-test.md | 2 ++ 5 files changed, 112 insertions(+) create mode 100644 docs/assays/metadata/4i.md create mode 100644 docs/assays/metadata/COMET.md create mode 100644 docs/assays/metadata/iCLAP.md diff --git a/docs/assays/metadata/4i.md b/docs/assays/metadata/4i.md new file mode 100644 index 0000000..fe7bf3c --- /dev/null +++ b/docs/assays/metadata/4i.md @@ -0,0 +1,37 @@ +--- +layout: page +--- +# 4i (Iterative Indirect Immunofluorescence Imaging) + +
Version 2 (current) + +## Version 2 (current) + +| Attribute Name | Type | Description | Allowable Values | Required | +|---------------|------|-------------|------------------|----------| +| lab_id | Textfield | A locally assigned identifier provided by the data provider for the dataset. It is used to reference an external metadata record that may be maintained independently, enabling traceability and supporting provenance tracking. Example: Visium_9OLC_A4_S1 | | False | +| source_storage_duration_value | Numeric | The length of time the sample was stored prior to processing it. For assays performed on tissue sections, this refers to how long the tissue section (e.g., slide) was stored before the assay began (e.g., imaging). For assays performed on suspensions, such as sequencing, it refers to how long the suspension was stored before library construction started. Example: 12 | | True | +| time_since_acquisition_instrument_calibration_value | Numeric | The length of time since the acquisition instrument was last serviced or calibrated. This provides a metric for assessing drift in data capture. Example: 10 | | False | +| contributors_path | Textfield | The name of the file containing the ORCID IDs for all contributors to this dataset. Example: ./contributors.csv | | True | +| data_path | Textfield | The top-level directory containing the raw and/or processed data. For a single dataset upload, this might be represented as ".", whereas for a data upload containing multiple datasets, this would be the directory name for the respective dataset. For example, if the data is within a directory named "TEST001-RK", use the syntax "./TEST001-RK" for this field. If there are multiple directory levels, use the format "./TEST001-RK/Run1/Pass2", where "Pass2" is the subdirectory where the single dataset's data is stored. This is an internal metadata field used solely for data ingestion. 
Example: ./TEST001-RK | | True | +| number_of_antibodies | Numeric | The number of antibodies used in the assay. If no antibodies were utilized, enter 0. Example: 5 | | True | +| number_of_biomarker_imaging_rounds | Numeric | The number of imaging rounds required to capture the tagged biomarkers. For CODEX, a biomarker imaging round includes steps such as (1) oligo application, (2) fluor application, and (3) washes. For Cell DIVE, it involves (1) the staining of a biomarker via secondary detection or direct conjugate, followed by (2) dye inactivation. Example: 3 | | True | +| number_of_total_imaging_rounds | Numeric | The total number of imaging rounds performed using a microscope to collect either autofluorescence/background or stained signals, such as those used in histological analysis. Example: 5 | | True | +| slide_id | Textfield | The unique identifier assigned to each slide, enabling users to determine which tissue sections were processed together on the same slide. It is recommended that data providers prefix the ID with the center name to prevent overlapping values across different centers. Example: VAN0071-PA-1-1_AF | | True | +| dataset_type | Assigned Value | The specific type of dataset being produced. 
Example: RNAseq | ```Visium HD```, ```4i```, ```LC-MS```, ```Thick section Multiphoton MxIF```, ```Light Sheet```, ```ATACseq```, ```Resolve```, ```HiFi-Slide```, ```COMET```, ```MPLEx```, ```10X Multiome```, ```MALDI```, ```Histology```, ```Cell DIVE```, ```FACS```, ```MS Lipidomics```, ```Visium (no probes)```, ```MUSIC```, ```RNAseq```, ```GeoMx (NGS)```, ```GeoMx (nCounter)```, ```RNAseq (with probes)```, ```Singular Genomics G4X```, ```Molecular Cartography```, ```CosMx Transcriptomics```, ```MERFISH```, ```Pixel-seqV2```, ```2D Imaging Mass Cytometry```, ```Confocal```, ```seqFISH```, ```DART-FISH```, ```MIBI```, ```Olink```, ```Enhanced Stimulated Raman Spectroscopy (SRS)```, ```DESI```, ```Xenium```, ```CyCIF```, ```SNARE-seq2```, ```nanoSPLITS```, ```Stereo-seq```, ```Visium (with probes)```, ```SIMS```, ```Auto-fluorescence```, ```CyTOF```, ```CosMx Proteomics```, ```DBiT-seq```, ```PhenoCycler```, ```CODEX```, ```Second Harmonic Generation (SHG)```, ```Seq-Scope``` | True | +| analyte_class | Assigned Value | The analyte class which is the target molecule that the assay is measuring. Example: DNA | ```Nucleic acid + protein```, ```Lipid + metabolite```, ```Collagen```, ```RNA```, ```Fluorochrome```, ```DNA```, ```Metabolite```, ```DNA + RNA```, ```Saturated lipid```, ```Lipid```, ```Peptide```, ```Protein```, ```Unsaturated lipid```, ```Endogenous fluorophore```, ```Chromatin```, ```Polysaccharide``` | True | +| acquisition_instrument_vendor | Assigned Value | The company that manufactures or supplies the acquisition instrument. An acquisition instrument is a device equipped with signal detection hardware and signal processing software. It captures signals produced by assays, such as variations in light intensity or color, or signals corresponding to molecular mass. If the instrument was custom-built or developed internally, enter "In-House". 
Example: Illumina | ```Complete Genomics```, ```Cytek Biosciences```, ```Thermo Fisher Scientific```, ```Sciex```, ```Vizgen```, ```Leica Microsystems```, ```Akoya Biosciences```, ```Keyence```, ```Andor```, ```Standard BioTools (Fluidigm)```, ```Leica Biosystems```, ```Zeiss Microscopy```, ```Ionpath```, ```Motic```, ```In-House```, ```Evident Scientific (Olympus)```, ```GE Healthcare```, ```Element Biosciences```, ```Hamamatsu```, ```Bruker```, ```Illumina```, ```3DHISTECH```, ```Singular Genomics```, ```Huron Digital Pathology```, ```Resolve Biosciences```, ```NanoString```, ```Cytiva```, ```10x Genomics```, ```Microscopes International```, ```BGI Genomics``` | True | +| acquisition_instrument_model | Assigned Value | The specific model of the acquisition instrument, as manufacturers often offer various versions with differing features or sensitivities. These differences may be relevant to the processing or interpretation of the data. If the instrument was custom-built or developed internally, enter "In-House". If the model is unknown, enter "Unknown". 
Example: HiSeq 4000 | ```NovaSeq X```, ```NovaSeq X Plus```, ```Cytek Northern Lights```, ```Lightsheet 7```, ```Resolve Biosciences Molecular Cartography```, ```timsTOF HT```, ```timsTOF Pro 2```, ```timsTOF Pro```, ```timsTOF Ultra```, ```timsTOF Ultra 2```, ```timsTOF SCP```, ```Axio Scan.Z1```, ```MALDI timsTOF Flex Prototype```, ```CosMx Spatial Molecular Imager```, ```Unknown```, ```MERSCOPE Ultra```, ```Juno System```, ```timsTOF FleX```, ```Custom: Multiphoton```, ```CyTOF XT```, ```Helios```, ```EVOS M7000```, ```Aperio AT2```, ```Phenocycler-Fusion 2.0```, ```Axio Observer 5```, ```Axio Observer 7```, ```Axio Observer 3```, ```NanoZoomer-SQ```, ```NanoZoomer S210```, ```NanoZoomer S60```, ```NanoZoomer S360```, ```DM6 B```, ```MoticEasyScan One```, ```In-House```, ```NextSeq 500```, ```BZ-X710```, ```QTRAP 5500```, ```NextSeq 550```, ```HiSeq 2500```, ```HiSeq 4000```, ```NovaSeq 6000```, ```Q Exactive HF```, ```Orbitrap Fusion Lumos Tribrid```, ```Q Exactive```, ```VS200 Slide Scanner```, ```Not applicable```, ```Orbitrap Eclipse Tribrid```, ```MIBIscope```, ```IN Cell Analyzer 2200```, ```timsTOF FleX MALDI-2``` | True | +| source_storage_duration_unit | Assigned Value | The unit of measurement used to specify the source storage duration value. Example: hour | ```hour```, ```month```, ```day```, ```minute```, ```year``` | True | +| time_since_acquisition_instrument_calibration_unit | Assigned Value | The unit of measurement used to specify the time since acquisition instrument calibration value. Example: month | ```month```, ```day```, ```year``` | False | +| metadata_schema_id | Textfield | The unique string identifier for the metadata specification version, which is easily interpretable by computers for purposes of data validation and processing. 
Example: 22bc762a-5020-419d-b170-24253ed9e8d9 | | True | +| preparation_protocol_doi | Link | The DOI for the protocols.io page that details the assay or the procedures used for sample procurement and preparation. For example, in the case of an imaging assay, the protocol may start with tissue section staining and end with the generation of an OME-TIFF file. The documented protocol should also include any image processing steps involved in producing the final OME-TIFF. Example: https://dx.doi.org/10.17504/protocols.io.eq2lyno9qvx9/v1 | | True | +| is_targeted | Radio | Indicates whether a specific molecule or set of molecules is targeted for detection or measurement by the assay. Example: Yes | ```Yes```, ```No``` | True | +| antibodies_path | Textfield | The path to the antibodies.tsv file relative to the root directory of the upload structure. This path should start with "." and is typically formatted as "./extras/antibodies.tsv". Example: ./extras/antibodies.tsv | | True | +| parent_sample_id | Textfield | The unique identifier from HuBMAP or SenNet for the sample (such as a block, section, or suspension) used to perform the assay. For instance, in an RNAseq assay, the parent sample would be the suspension, while in imaging assays, it would be the tissue section. If the assay is derived from multiple parent samples, this field should contain a comma-separated list of identifiers. Example: HBM386.ZGKG.235, HBM672.MKPK.442 | | True | +| non_global_files | Textfield | Specifies a semicolon-separated list of non-global files that are to be included in the dataset. The file paths assume that the files are located in the "TOP/non-global/" directory. For instance, if the file is located at TOP/non-global/lab_processed/images/1-tissue-boundary.geojson, the value for this field would be "./lab_processed/images/1-tissue-boundary.geojson". Once ingested, these files will be copied to their appropriate locations within the respective dataset directory tree. 
This field is intended for internal HuBMAP processing. Examples for GeoMx and PhenoCycler are provided in the File Locations documentation: https://docs.google.com/document/d/1n2McSs9geA9Eli4QWQaB3c9R3wo5d5U1Xd57DWQfN5Q/edit#heading=h.1u82i4axggee Example: ./lab_processed/images/1-tissue-boundary.geojson | | False | +| cell_boundary_marker_or_stain | Textfield | The name of the marker or stain used to identify all cell boundaries in the tissue. This name must exactly match the antibody-targeted molecule marker or non-antibody targeted molecule stain as found in the imaging data. For example, in the case of using the PhenoCycler, ensure the name corresponds to the value in the XPD output file. If multiple markers or stains are employed, list them in a comma-separated format. Example: Pan-Cytokeratin, E-Cadherin | | False | +| nuclear_marker_or_stain | Textfield | The nuclear marker or stain used, which can be an antibody-targeted molecule present in or around the cell nucleus. For protein targets, use the protein or gene symbol that identifies the antibody target, ensuring it matches the antibody target from the panel used or custom panels. Preferably, if using a custom antibody marker, this symbol should be the HGNC symbol (https://www.genenames.org/). For non-protein targets, provide the stain name (e.g., DAPI) and, when applicable, include the associated staining kit and vendor. For the PhenoCycler, ensure the symbol matches the value found in the XPD output file. Example: DAPI | | False | +| number_of_channels | Numeric | The number of fluorescent channels that are imaged during each cycle. Example: 3 | | True | + +
\ No newline at end of file diff --git a/docs/assays/metadata/COMET.md b/docs/assays/metadata/COMET.md new file mode 100644 index 0000000..7a7348b --- /dev/null +++ b/docs/assays/metadata/COMET.md @@ -0,0 +1,37 @@ +--- +layout: page +--- +# COMET + +
Version 2.0 (use this one) + +## Version 2.0 (use this one) + +| Attribute Name | Type | Description | Allowable Values | Required | +|---------------|------|-------------|------------------|----------| +| lab_id | Textfield | A locally assigned identifier provided by the data provider for the dataset. It is used to reference an external metadata record that may be maintained independently, enabling traceability and supporting provenance tracking. Example: Visium_9OLC_A4_S1 | | False | +| source_storage_duration_value | Numeric | The length of time the sample was stored prior to processing it. For assays performed on tissue sections, this refers to how long the tissue section (e.g., slide) was stored before the assay began (e.g., imaging). For assays performed on suspensions, such as sequencing, it refers to how long the suspension was stored before library construction started. Example: 12 | | True | +| time_since_acquisition_instrument_calibration_value | Numeric | The length of time since the acquisition instrument was last serviced or calibrated. This provides a metric for assessing drift in data capture. Example: 10 | | False | +| contributors_path | Textfield | The name of the file containing the ORCID IDs for all contributors to this dataset. Example: ./contributors.csv | | True | +| data_path | Textfield | The top-level directory containing the raw and/or processed data. For a single dataset upload, this might be represented as ".", whereas for a data upload containing multiple datasets, this would be the directory name for the respective dataset. For example, if the data is within a directory named "TEST001-RK", use the syntax "./TEST001-RK" for this field. If there are multiple directory levels, use the format "./TEST001-RK/Run1/Pass2", where "Pass2" is the subdirectory where the single dataset's data is stored. This is an internal metadata field used solely for data ingestion. 
Example: ./TEST001-RK | | True | +| number_of_antibodies | Numeric | The number of antibodies used in the assay. If no antibodies were utilized, enter 0. Example: 5 | | True | +| number_of_biomarker_imaging_rounds | Numeric | The number of imaging rounds required to capture the tagged biomarkers. For CODEX, a biomarker imaging round includes steps such as (1) oligo application, (2) fluor application, and (3) washes. For Cell DIVE, it involves (1) the staining of a biomarker via secondary detection or direct conjugate, followed by (2) dye inactivation. Example: 3 | | True | +| number_of_total_imaging_rounds | Numeric | The total number of imaging rounds performed using a microscope to collect either autofluorescence/background or stained signals, such as those used in histological analysis. Example: 5 | | True | +| slide_id | Textfield | The unique identifier assigned to each slide, enabling users to determine which tissue sections were processed together on the same slide. It is recommended that data providers prefix the ID with the center name to prevent overlapping values across different centers. Example: VAN0071-PA-1-1_AF | | True | +| dataset_type | Assigned Value | The specific type of dataset being produced. 
Example: RNAseq | ```Visium HD```, ```4i```, ```LC-MS```, ```Thick section Multiphoton MxIF```, ```Light Sheet```, ```ATACseq```, ```Resolve```, ```HiFi-Slide```, ```COMET```, ```MPLEx```, ```10X Multiome```, ```MALDI```, ```Raman Imaging```, ```Histology```, ```Cell DIVE```, ```FACS```, ```MS Lipidomics```, ```Visium (no probes)```, ```MUSIC```, ```RNAseq```, ```GeoMx (NGS)```, ```GeoMx (nCounter)```, ```RNAseq (with probes)```, ```Singular Genomics G4X```, ```Molecular Cartography```, ```CosMx Transcriptomics```, ```MERFISH```, ```Pixel-seqV2```, ```2D Imaging Mass Cytometry```, ```Confocal```, ```seqFISH```, ```DART-FISH```, ```MIBI```, ```Olink```, ```Enhanced Stimulated Raman Spectroscopy (SRS)```, ```DESI```, ```Xenium```, ```iCLAP```, ```CyCIF```, ```SNARE-seq2```, ```nanoSPLITS```, ```STARmap```, ```Stereo-seq```, ```Visium (with probes)```, ```SIMS```, ```Auto-fluorescence```, ```CyTOF```, ```CosMx Proteomics```, ```Virtual Histology```, ```DBiT-seq``` | True | +| analyte_class | Assigned Value | The analyte class which is the target molecule that the assay is measuring. Example: DNA | ```Nucleic acid + protein```, ```Lipid + metabolite```, ```Collagen```, ```RNA```, ```Fluorochrome```, ```DNA```, ```Metabolite```, ```DNA + RNA```, ```Saturated lipid```, ```Lipid```, ```Lipid + metabolite + protein```, ```RNA + protein```, ```Peptide```, ```Protein```, ```Unsaturated lipid```, ```Endogenous fluorophore```, ```Chromatin```, ```Polysaccharide``` | True | +| acquisition_instrument_vendor | Assigned Value | The company that manufactures or supplies the acquisition instrument. An acquisition instrument is a device equipped with signal detection hardware and signal processing software. It captures signals produced by assays, such as variations in light intensity or color, or signals corresponding to molecular mass. If the instrument was custom-built or developed internally, enter "In-House". 
Example: Illumina | ```Complete Genomics```, ```Cytek Biosciences```, ```Thermo Fisher Scientific```, ```Sciex```, ```Vizgen```, ```Leica Microsystems```, ```Akoya Biosciences```, ```Keyence```, ```Andor```, ```Standard BioTools (Fluidigm)```, ```Leica Biosystems```, ```Zeiss Microscopy```, ```Ionpath```, ```Motic```, ```In-House```, ```Revvity```, ```Evident Scientific (Olympus)```, ```GE Healthcare```, ```Element Biosciences```, ```Hamamatsu```, ```Waters```, ```Bruker```, ```Illumina```, ```3DHISTECH```, ```Singular Genomics```, ```Huron Digital Pathology```, ```Resolve Biosciences```, ```NanoString```, ```Cytiva```, ```10x Genomics```, ```Microscopes International```, ```BGI Genomics``` | True | +| acquisition_instrument_model | Assigned Value | The specific model of the acquisition instrument, as manufacturers often offer various versions with differing features or sensitivities. These differences may be relevant to the processing or interpretation of the data. If the instrument was custom-built or developed internally, enter "In-House". If the model is unknown, enter "Unknown". 
Example: HiSeq 4000 | ```NovaSeq X```, ```NovaSeq X Plus```, ```Cytek Northern Lights```, ```Lightsheet 7```, ```Resolve Biosciences Molecular Cartography```, ```timsTOF HT```, ```timsTOF Pro 2```, ```timsTOF Pro```, ```timsTOF Ultra```, ```timsTOF Ultra 2```, ```timsTOF SCP```, ```Axio Scan.Z1```, ```MALDI timsTOF Flex Prototype```, ```CosMx Spatial Molecular Imager```, ```Unknown```, ```MERSCOPE Ultra```, ```Juno System```, ```timsTOF FleX```, ```Custom: Multiphoton```, ```CyTOF XT```, ```Helios```, ```EVOS M7000```, ```Aperio AT2```, ```Phenocycler-Fusion 2.0```, ```Axio Observer 5```, ```Axio Observer 7```, ```Axio Observer 3```, ```NanoZoomer-SQ```, ```NanoZoomer S210```, ```NanoZoomer S60```, ```NanoZoomer S360```, ```DM6 B```, ```MoticEasyScan One```, ```In-House```, ```NextSeq 500```, ```BZ-X710```, ```QTRAP 5500```, ```DMi8```, ```NextSeq 550```, ```HiSeq 2500```, ```HiSeq 4000```, ```NovaSeq 6000```, ```Opera Phenix Plus HCS```, ```SYNAPT G2-Si```, ```Q Exactive HF```, ```Orbitrap Fusion Tribrid```, ```Orbitrap Fusion Lumos Tribrid```, ```Q Exactive```, ```VS200 Slide Scanner```, ```Not applicable``` | True | +| source_storage_duration_unit | Assigned Value | The unit of measurement used to specify the source storage duration value. Example: hour | ```hour```, ```month```, ```day```, ```minute```, ```year``` | True | +| time_since_acquisition_instrument_calibration_unit | Assigned Value | The unit of measurement used to specify the time since acquisition instrument calibration value. Example: month | ```month```, ```day```, ```year``` | False | +| metadata_schema_id | Textfield | The unique string identifier for the metadata specification version, which is easily interpretable by computers for purposes of data validation and processing. Example: 22bc762a-5020-419d-b170-24253ed9e8d9 | | True | +| preparation_protocol_doi | Link | The DOI for the protocols.io page that details the assay or the procedures used for sample procurement and preparation. 
For example, in the case of an imaging assay, the protocol may start with tissue section staining and end with the generation of an OME-TIFF file. The documented protocol should also include any image processing steps involved in producing the final OME-TIFF. Example: https://dx.doi.org/10.17504/protocols.io.eq2lyno9qvx9/v1 | | True | +| is_targeted | Radio | Indicates whether a specific molecule or set of molecules is targeted for detection or measurement by the assay. Example: Yes | ```Yes```, ```No``` | True | +| antibodies_path | Textfield | The path to the antibodies.tsv file relative to the root directory of the upload structure. This path should start with "." and is typically formatted as "./extras/antibodies.tsv". Example: ./extras/antibodies.tsv | | True | +| parent_sample_id | Textfield | The unique identifier from HuBMAP or SenNet for the sample (such as a block, section, or suspension) used to perform the assay. For instance, in an RNAseq assay, the parent sample would be the suspension, while in imaging assays, it would be the tissue section. If the assay is derived from multiple parent samples, this field should contain a comma-separated list of identifiers. Example: HBM386.ZGKG.235, HBM672.MKPK.442 | | True | +| non_global_files | Textfield | Specifies a semicolon-separated list of non-global files that are to be included in the dataset. The file paths assume that the files are located in the "TOP/non-global/" directory. For instance, if the file is located at TOP/non-global/lab_processed/images/1-tissue-boundary.geojson, the value for this field would be "./lab_processed/images/1-tissue-boundary.geojson". Once ingested, these files will be copied to their appropriate locations within the respective dataset directory tree. This field is intended for internal HuBMAP processing. 
Examples for GeoMx and PhenoCycler are provided in the File Locations documentation: https://docs.google.com/document/d/1n2McSs9geA9Eli4QWQaB3c9R3wo5d5U1Xd57DWQfN5Q/edit#heading=h.1u82i4axggee Example: ./lab_processed/images/1-tissue-boundary.geojson | | False | +| cell_boundary_marker_or_stain | Textfield | The name of the marker or stain used to identify all cell boundaries in the tissue. This name must exactly match the antibody-targeted molecule marker or non-antibody targeted molecule stain as found in the imaging data. For example, in the case of using the PhenoCycler, ensure the name corresponds to the value in the XPD output file. If multiple markers or stains are employed, list them in a comma-separated format. Example: Pan-Cytokeratin, E-Cadherin | | False | +| nuclear_marker_or_stain | Textfield | The nuclear marker or stain used, which can be an antibody-targeted molecule present in or around the cell nucleus. For protein targets, use the protein or gene symbol that identifies the antibody target, ensuring it matches the antibody target from the panel used or custom panels. Preferably, if using a custom antibody marker, this symbol should be the HGNC symbol (https://www.genenames.org/). For non-protein targets, provide the stain name (e.g., DAPI) and, when applicable, include the associated staining kit and vendor. For the PhenoCycler, ensure the symbol matches the value found in the XPD output file. Example: DAPI | | False | +| number_of_channels | Numeric | The number of fluorescent channels that are imaged during each cycle. Example: 3 | | True | + +
\ No newline at end of file diff --git a/docs/assays/metadata/iCLAP.md b/docs/assays/metadata/iCLAP.md new file mode 100644 index 0000000..ce13e52 --- /dev/null +++ b/docs/assays/metadata/iCLAP.md @@ -0,0 +1,34 @@ +--- +layout: page +--- +# iCLAP + +
Version 2.0 (use this one) + +## Version 2.0 (use this one) + +| Attribute Name | Type | Description | Allowable Values | Required | +|---------------|------|-------------|------------------|----------| +| lab_id | Textfield | A locally assigned identifier provided by the data provider for the dataset. It is used to reference an external metadata record that may be maintained independently, enabling traceability and supporting provenance tracking. Example: Visium_9OLC_A4_S1 | | False | +| source_storage_duration_value | Numeric | The length of time the sample was stored prior to processing it. For assays performed on tissue sections, this refers to how long the tissue section (e.g., slide) was stored before the assay began (e.g., imaging). For assays performed on suspensions, such as sequencing, it refers to how long the suspension was stored before library construction started. Example: 12 | | True | +| time_since_acquisition_instrument_calibration_value | Numeric | The length of time since the acquisition instrument was last serviced or calibrated. This provides a metric for assessing drift in data capture. Example: 10 | | False | +| contributors_path | Textfield | The name of the file containing the ORCID IDs for all contributors to this dataset. Example: ./contributors.csv | | True | +| data_path | Textfield | The top-level directory containing the raw and/or processed data. For a single dataset upload, this might be represented as ".", whereas for a data upload containing multiple datasets, this would be the directory name for the respective dataset. For example, if the data is within a directory named "TEST001-RK", use the syntax "./TEST001-RK" for this field. If there are multiple directory levels, use the format "./TEST001-RK/Run1/Pass2", where "Pass2" is the subdirectory where the single dataset's data is stored. This is an internal metadata field used solely for data ingestion. 
Example: ./TEST001-RK | | True | +| number_of_antibodies | Numeric | The number of antibodies used in the assay. If no antibodies were utilized, enter 0. Example: 5 | | True | +| number_of_biomarker_imaging_rounds | Numeric | The number of imaging rounds required to capture the tagged biomarkers. For CODEX, a biomarker imaging round includes steps such as (1) oligo application, (2) fluor application, and (3) washes. For Cell DIVE, it involves (1) the staining of a biomarker via secondary detection or direct conjugate, followed by (2) dye inactivation. Example: 3 | | True | +| number_of_total_imaging_rounds | Numeric | The total number of imaging rounds performed using a microscope to collect either autofluorescence/background or stained signals, such as those used in histological analysis. Example: 5 | | True | +| slide_id | Textfield | The unique identifier assigned to each slide, enabling users to determine which tissue sections were processed together on the same slide. It is recommended that data providers prefix the ID with the center name to prevent overlapping values across different centers. Example: VAN0071-PA-1-1_AF | | True | +| dataset_type | Assigned Value | The specific type of dataset being produced. 
Example: RNAseq | ```Visium HD```, ```4i```, ```LC-MS```, ```Thick section Multiphoton MxIF```, ```Light Sheet```, ```ATACseq```, ```Resolve```, ```HiFi-Slide```, ```COMET```, ```MPLEx```, ```10X Multiome```, ```MALDI```, ```Raman Imaging```, ```Histology```, ```Cell DIVE```, ```FACS```, ```MS Lipidomics```, ```Visium (no probes)```, ```MUSIC```, ```RNAseq```, ```GeoMx (NGS)```, ```GeoMx (nCounter)```, ```RNAseq (with probes)```, ```Singular Genomics G4X```, ```Molecular Cartography```, ```CosMx Transcriptomics```, ```MERFISH```, ```Pixel-seqV2```, ```2D Imaging Mass Cytometry```, ```Confocal```, ```seqFISH```, ```DART-FISH```, ```MIBI```, ```Olink```, ```Enhanced Stimulated Raman Spectroscopy (SRS)```, ```DESI```, ```Xenium```, ```iCLAP```, ```CyCIF```, ```SNARE-seq2```, ```nanoSPLITS```, ```STARmap```, ```Stereo-seq```, ```Visium (with probes)```, ```SIMS```, ```Auto-fluorescence```, ```CyTOF```, ```CosMx Proteomics```, ```Virtual Histology```, ```DBiT-seq``` | True | +| analyte_class | Assigned Value | The analyte class which is the target molecule that the assay is measuring. Example: DNA | ```Nucleic acid + protein```, ```Lipid + metabolite```, ```Collagen```, ```RNA```, ```Fluorochrome```, ```DNA```, ```Metabolite```, ```DNA + RNA```, ```Saturated lipid```, ```Lipid```, ```Lipid + metabolite + protein```, ```RNA + protein```, ```Peptide```, ```Protein```, ```Unsaturated lipid```, ```Endogenous fluorophore```, ```Chromatin```, ```Polysaccharide``` | True | +| acquisition_instrument_vendor | Assigned Value | The company that manufactures or supplies the acquisition instrument. An acquisition instrument is a device equipped with signal detection hardware and signal processing software. It captures signals produced by assays, such as variations in light intensity or color, or signals corresponding to molecular mass. If the instrument was custom-built or developed internally, enter "In-House". 
Example: Illumina | ```Complete Genomics```, ```Cytek Biosciences```, ```Thermo Fisher Scientific```, ```Sciex```, ```Vizgen```, ```Leica Microsystems```, ```Akoya Biosciences```, ```Keyence```, ```Andor```, ```Standard BioTools (Fluidigm)```, ```Leica Biosystems```, ```Zeiss Microscopy```, ```Ionpath```, ```Motic```, ```In-House```, ```Revvity```, ```Evident Scientific (Olympus)```, ```GE Healthcare```, ```Element Biosciences```, ```Hamamatsu```, ```Waters```, ```Bruker```, ```Illumina```, ```3DHISTECH```, ```Singular Genomics```, ```Huron Digital Pathology```, ```Resolve Biosciences```, ```NanoString```, ```Cytiva```, ```10x Genomics```, ```Microscopes International```, ```BGI Genomics``` | True | +| acquisition_instrument_model | Assigned Value | The specific model of the acquisition instrument, as manufacturers often offer various versions with differing features or sensitivities. These differences may be relevant to the processing or interpretation of the data. If the instrument was custom-built or developed internally, enter "In-House". If the model is unknown, enter "Unknown". 
Example: HiSeq 4000 | ```NovaSeq X```, ```NovaSeq X Plus```, ```Cytek Northern Lights```, ```Lightsheet 7```, ```Resolve Biosciences Molecular Cartography```, ```timsTOF HT```, ```timsTOF Pro 2```, ```timsTOF Pro```, ```timsTOF Ultra```, ```timsTOF Ultra 2```, ```timsTOF SCP```, ```Axio Scan.Z1```, ```MALDI timsTOF Flex Prototype```, ```CosMx Spatial Molecular Imager```, ```Unknown```, ```MERSCOPE Ultra```, ```Juno System```, ```timsTOF FleX```, ```Custom: Multiphoton```, ```CyTOF XT```, ```Helios```, ```EVOS M7000```, ```Aperio AT2```, ```Phenocycler-Fusion 2.0```, ```Axio Observer 5```, ```Axio Observer 7```, ```Axio Observer 3```, ```NanoZoomer-SQ```, ```NanoZoomer S210```, ```NanoZoomer S60```, ```NanoZoomer S360```, ```DM6 B```, ```MoticEasyScan One```, ```In-House```, ```NextSeq 500```, ```BZ-X710```, ```QTRAP 5500```, ```DMi8```, ```NextSeq 550```, ```HiSeq 2500```, ```HiSeq 4000```, ```NovaSeq 6000```, ```Opera Phenix Plus HCS```, ```SYNAPT G2-Si```, ```Q Exactive HF```, ```Orbitrap Fusion Tribrid```, ```Orbitrap Fusion Lumos Tribrid```, ```Q Exactive```, ```VS200 Slide Scanner```, ```Not applicable``` | True | +| source_storage_duration_unit | Assigned Value | The unit of measurement used to specify the source storage duration value. Example: hour | ```hour```, ```month```, ```day```, ```minute```, ```year``` | True | +| time_since_acquisition_instrument_calibration_unit | Assigned Value | The unit of measurement used to specify the time since acquisition instrument calibration value. Example: month | ```month```, ```day```, ```year``` | False | +| metadata_schema_id | Textfield | The unique string identifier for the metadata specification version, which is easily interpretable by computers for purposes of data validation and processing. Example: 22bc762a-5020-419d-b170-24253ed9e8d9 | | True | +| preparation_protocol_doi | Link | The DOI for the protocols.io page that details the assay or the procedures used for sample procurement and preparation. 
For example, in the case of an imaging assay, the protocol may start with tissue section staining and end with the generation of an OME-TIFF file. The documented protocol should also include any image processing steps involved in producing the final OME-TIFF. Example: https://dx.doi.org/10.17504/protocols.io.eq2lyno9qvx9/v1 | | True | +| is_targeted | Radio | Indicates whether a specific molecule or set of molecules is targeted for detection or measurement by the assay. Example: Yes | ```Yes```, ```No``` | True | +| antibodies_path | Textfield | The path to the antibodies.tsv file relative to the root directory of the upload structure. This path should start with "." and is typically formatted as "./extras/antibodies.tsv". Example: ./extras/antibodies.tsv | | True | +| parent_sample_id | Textfield | The unique identifier from HuBMAP or SenNet for the sample (such as a block, section, or suspension) used to perform the assay. For instance, in an RNAseq assay, the parent sample would be the suspension, while in imaging assays, it would be the tissue section. If the assay is derived from multiple parent samples, this field should contain a comma-separated list of identifiers. Example: HBM386.ZGKG.235, HBM672.MKPK.442 | | True | +| number_of_channels | Numeric | The number of fluorescent channels that are imaged during each cycle. Example: 3 | | True | + +
\ No newline at end of file diff --git a/docs/assays/metadata/index.md b/docs/assays/metadata/index.md index 17336b1..973d251 100644 --- a/docs/assays/metadata/index.md +++ b/docs/assays/metadata/index.md @@ -11,6 +11,7 @@ This is a list of available dataset types (data types from multiple supported as | [Autofluorescence](https://docs.hubmapconsortium.org/assays/af) | [attributes](AutoFluorescence) | Exploits endogenous fluorescence in a biological tissue to capture an image. The image can then be used to integrate other images from multiple modalities and to align tissues within a 3D experiment. | | [ATACseq](https://docs.hubmapconsortium.org/assays/atacseq)| [attributes](ATACseq) | Identifies accessible DNA regions by probing open chromatin with hyperactive mutant Tn5 Transposase that inserts sequencing adapters into open regions of the genome. | | [CODEX](https://docs.hubmapconsortium.org/assays/codex) | [attributes](CODEX) | Strategy for generating highly multiplexed images of fluorescently-labeled antigens. | +| COMET | [attributes](COMET) | COMET is a technique used to measure DNA damage in individual cells. The name comes from the shape that damaged DNA fragments form when they migrate out of a cell's nucleus under an electric field, resembling a comet with a head and a tail. This assay is widely used in genetics research to study DNA damage from factors like radiation, chemicals, and environmental exposure. | | CosMx-Proteomics | [attributes](CosMx-Proteomics) | CosMx Proteomics is a technology that enables the high-resolution, spatial analysis of proteins within their native tissue environment. It is part of the CosMx Spatial Molecular Imager (SMI) platform, which provides single-cell and subcellular resolution to map protein expression, cell states, and cell-cell interactions in FFPE and fresh frozen tissue samples. 
| | CyCIF | [attributes](CyCIF) | CyCIF, or Cyclic Immunofluorescence, is a technique used in microscopy to image multiple protein markers within a single sample. It allows for highly multiplexed immunofluorescence imaging, meaning it can detect a large number of different proteins simultaneously. | | CyTOF | [attributes](CyTOF) | A type of mass cytometry that employs antibodies labeled with heavy metal isotopes and uses time-of-flight mass spectrometry to analyze single cells. | @@ -22,6 +23,7 @@ This is a list of available dataset types (data types from multiple supported as | Histology | [attributes](Histology) | | | [IMC](https://docs.hubmapconsortium.org/assays/imc) | [attributes](IMC) |Combines standard immunohistochemistry with CyTOF mass cytometry to resolve the cellular localization of up to 40 proteins in a tissue sample. | | [LC-MS](https://docs.hubmapconsortium.org/assays/lcms) | [attributes](LC-MS) | Coupling of liquid chromatography (LC) to mass spectrometry (MS) | +| iCLAP | [attributes](iCLAP) | iCLAP (individual-nucleotide resolution UV-crosslinking and affinity purification) is a specialized, high-stringency technique designed to map the specific RNA binding sites of RNA-binding proteins (RBPs) at the single-nucleotide level. | | Light Sheet | [attributes](LightSheet) | | | [MALDI-IMS](https://docs.hubmapconsortium.org/assays/maldi-ims) | [attributes](MALDI) | Matrix-assisted laser desorption/ionization (MALDI) imaging mass spectrometry (IMS) combines the sensitivity and molecular specificity of MS with the spatial fidelity of classical microscopy. 
| | MERFISH | [attributes](MERFISH) | | diff --git a/docs/assays/metadata/new-metadata-test.md b/docs/assays/metadata/new-metadata-test.md index 379d6d4..a194bf5 100644 --- a/docs/assays/metadata/new-metadata-test.md +++ b/docs/assays/metadata/new-metadata-test.md @@ -12,12 +12,14 @@ A list of available dataset types (data types from multiple supported assays), w | [Autofluorescence (AF)](https://docs.hubmapconsortium.org/assays/af) [](AutoFluorescence "Attribute description")| Exploits endogenous fluorescence in biological tissue to capture an image. The image can be used to integrate other images from multiple modalities and align tissues within a 3D experiment. Link to [AF directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/af/current/). | | [ATACseq](https://docs.hubmapconsortium.org/assays/atacseq) [](ATACseq "Attribute description")| Assay for Transposase-Accessible Chromatin using sequencing (ATACseq) identifies accessible DNA regions, probing open chromatin with hyperactive mutant Tn5 Transposase that inserts sequencing adapters into open regions of the genome. Link to [ATACsec directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/atacseq/current/). | | [CODEX](https://docs.hubmapconsortium.org/assays/codex) [](CODEX "Attribute description")| Co-detection by indexing (CODEX) is a strategy for generating highly multiplexed images of fluorescently-labeled antigens. Link to [CODEX directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/codex/current/). | +| COMET [](COMET "Attribute description") | COMET is a technique used to measure DNA damage in individual cells. The name comes from the shape that damaged DNA fragments form when they migrate out of a cell's nucleus under an electric field, resembling a comet with a head and a tail. This assay is widely used in genetics research to study DNA damage from factors like radiation, chemicals, and environmental exposure. 
| | [CosMx Proteomics](https://hubmapconsortium.github.io/ingest-validation-tools/cosmx-proteomics/current/) [](CosMx Proteomics "Attribute description")| CosMx Proteomics is a technology that enables the high-resolution, spatial analysis of proteins within their native tissue environment. It is part of the CosMx Spatial Molecular Imager (SMI) platform, which provides single-cell and subcellular resolution to map protein expression, cell states, and cell-cell interactions in FFPE and fresh frozen tissue samples. | | [DESI](https://pmc.ncbi.nlm.nih.gov/articles/PMC6053038/) [](DESI "Attribute description")| Desorption Electrospray Ionization (DESI), an ambient ionization technique that can be coupled to mass spectrometry (MS) for chemical analysis of samples at atmospheric conditions. Link to [DESI directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/desi/current/). | | [Enhanced SRS](https://www.nature.com/articles/s41467-019-13230-1) [](EnhancedSRS "Attribute description")| Refers to improvements made to Stimulated Raman Scattering (SRS), a technique used in microscopy and spectroscopy for chemical imaging and analysis. These enhancements aim to improve sensitivity, spatial resolution, and other capabilities of SRS. Link to [Enhanced SRS directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/enhanced-srs/current/). | | [GeoMx](https://nanostring.com/products/geomx-digital-spatial-profiler/spatial-multiomics-enabled-with-geomx-dsp/) [](GeoMx "Attribute description")| A platform for spatial biology that analyzes RNA and protein expression within tissue sections which allows for non-destructive, in situ profiling of gene expression and protein levels from specific regions of interest (ROIs) within a tissue. Link to [GeoMx directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/geomx-ngs/current/). 
| | [HiFi-Slide](https://www.researchgate.net/publication/370676672_HiFi-Slide_spatial_RNA-Sequencing_v2) [](HiFi "Attribute description")| High-Fidelity Spatial Transcriptomic Slide (HiFi-Slide) sequencing, a super-resolution spatial transcriptomics sequencing technology, captures and spatially resolves genome-wide RNA expression in a submicron resolution for fresh-frozen tissue. Link to [HiFi directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/hifi-slide/current/). | | [Histology](https://en.wikipedia.org/wiki/Histology) [](Histology "Attribute description")| The microscopic study of tissue composition and structure, often referred to as microscopic anatomy. It involves examining tissue samples, typically after they've been sectioned, stained, and placed under a microscope. Link to [Histology directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/histology/current/). | +| iCLAP [](iCLAP "Attribute description") | iCLAP (individual-nucleotide resolution UV-crosslinking and affinity purification) is a specialized, high-stringency technique designed to map the specific RNA binding sites of RNA-binding proteins (RBPs) at the single-nucleotide level. | | [IMC](https://docs.hubmapconsortium.org/assays/imc) [](IMC "Attribute description")| Combines standard immunohistochemistry with CyTOF mass cytometry to resolve the cellular localization of up to 40 proteins in a tissue sample. Link to [IMC directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/imc-2d/current/). | | [LC-MS](https://docs.hubmapconsortium.org/assays/lcms) [](LC-MS "Attribute description")| Coupling of liquid chromatography (LC) to mass spectrometry (MS). Link to [LC-MS directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/lcms/current/). 
| | [Light Sheet](https://en.wikipedia.org/wiki/Light_sheet_fluorescence_microscopy) [](LightSheet "Attribute description")| A fluorescence imaging technique that uses a thin sheet of laser light to illuminate a sample, allowing for high-resolution, 3D imaging with reduced photobleaching and phototoxicity; particularly useful for imaging large, thick, or delicate biological samples, like developing embryos or organoids. Link to [Light Sheet directory schema](https://hubmapconsortium.github.io/ingest-validation-tools/lightsheet/current/). | From 03e7dfd14955cd463bf567866b1038ef18abad3b Mon Sep 17 00:00:00 2001 From: Birdmachine Date: Mon, 2 Feb 2026 13:43:45 -0500 Subject: [PATCH 2/4] Include MetaBuilder Script --- .gitignore | 5 +- scripts/metaBuilder/README.md | 53 +++++ scripts/metaBuilder/assays.txt | 5 + scripts/metaBuilder/fetchMeta.py | 137 +++++++++++++ scripts/metaBuilder/generate_all_md.py | 268 +++++++++++++++++++++++++ scripts/metaBuilder/json_to_md.py | 57 ++++++ scripts/metaBuilder/pageLayout.md | 12 ++ 7 files changed, 536 insertions(+), 1 deletion(-) create mode 100644 scripts/metaBuilder/README.md create mode 100644 scripts/metaBuilder/assays.txt create mode 100644 scripts/metaBuilder/fetchMeta.py create mode 100644 scripts/metaBuilder/generate_all_md.py create mode 100644 scripts/metaBuilder/json_to_md.py create mode 100644 scripts/metaBuilder/pageLayout.md diff --git a/.gitignore b/.gitignore index 16a98a2..86bb580 100644 --- a/.gitignore +++ b/.gitignore @@ -2,4 +2,7 @@ .idea node_modules .vscode/* -scripts \ No newline at end of file +# scripts +scripts/metaBuilder/metaJSON/old +scripts/metaBuilder/toMD/old/ +scripts/metaBuilder/__pycache__/ diff --git a/scripts/metaBuilder/README.md b/scripts/metaBuilder/README.md new file mode 100644 index 0000000..66b17ab --- /dev/null +++ b/scripts/metaBuilder/README.md @@ -0,0 +1,53 @@ +# Assay Metadata Markdown Generator + +This directory contains scripts to fetch, process, and convert HuBMAP assay 
metadata templates into human-readable Markdown documentation. + +## Workflow Overview +1. **Edit assays.txt** + - List the URLs of the HuBMAP ingest-validation-tools assay metadata schema pages you want to process, one per line. Under each URL, include the intended description of the assay (also on a single line of its own, with no line breaks). + - Example: + ``` + https://hubmapconsortium.github.io/ingest-validation-tools/iclap/current/ + iCLAP (individual-nucleotide resolution UV-crosslinking and affinity purification) is a specialized...[etc] + + https://hubmapconsortium.github.io/ingest-validation-tools/comet/current/ + COMET is a technique used to measure DNA damage in individual cells...[etc] + ``` + +2. **Run the Pipeline** + - Use the provided script to generate all Markdown documentation in one step: + ```sh + python3 generate_all_md.py + ``` + - This will: + 1. Run `fetchMeta.py` to fetch the associated metadata information from CEDAR and generate JSON files in `metaJSON/`. + 2. Convert each JSON file in `metaJSON/` to a Markdown file in `toMD/` using `json_to_md.py`. + +3. **Output** + - JSON files: `metaJSON/` + - Markdown files: `toMD/` + + At the end of each run, the generated files are moved to `old/` subdirectories within each folder. (These subdirectories are listed in .gitignore.) + +## Script Descriptions + +- **fetchMeta.py**: Fetches and processes metadata templates from the CEDAR API using the template IDs. Outputs a JSON file for each assay. +- **json_to_md.py**: Converts a single JSON file to a Markdown table. Used by the batch script. +- **generate_all_md.py**: Runs the full pipeline: fetches all templates and generates Markdown for each. + +## Requirements +- Python 3 +- `requests` and `beautifulsoup4` libraries (`pip install requests beautifulsoup4`) + +## Notes +- The URLs used in the assays.txt file can be found [here](https://hubmapconsortium.github.io/ingest-validation-tools/current/). +- The scripts will create the `metaJSON/` and `toMD/` folders if they do not exist. 
+- Re-running the script adds another row to each index table on every run, but overwrites the existing assay metadata page file rather than creating a duplicate. + +## Example Usage + +```sh +python3 generate_all_md.py +``` + +This will produce Markdown documentation for all specified templates. diff --git a/scripts/metaBuilder/assays.txt b/scripts/metaBuilder/assays.txt new file mode 100644 index 0000000..3011e94 --- /dev/null +++ b/scripts/metaBuilder/assays.txt @@ -0,0 +1,5 @@ +https://hubmapconsortium.github.io/ingest-validation-tools/iclap/current/ +iCLAP (individual-nucleotide resolution UV-crosslinking and affinity purification) is a specialized, high-stringency technique designed to map the specific RNA binding sites of RNA-binding proteins (RBPs) at the single-nucleotide level. + +https://hubmapconsortium.github.io/ingest-validation-tools/comet/current/ +COMET is a technique used to measure DNA damage in individual cells. The name comes from the shape that damaged DNA fragments form when they migrate out of a cell's nucleus under an electric field, resembling a comet with a head and a tail. This assay is widely used in genetics research to study DNA damage from factors like radiation, chemicals, and environmental exposure. 
\ No newline at end of file diff --git a/scripts/metaBuilder/fetchMeta.py b/scripts/metaBuilder/fetchMeta.py new file mode 100644 index 0000000..57fa9eb --- /dev/null +++ b/scripts/metaBuilder/fetchMeta.py @@ -0,0 +1,137 @@ +import os +import json +import re +import requests +import argparse + +BASE_TEMPLATE_URL = "https://open.metadatacenter.org/templates/https%3A%2F%2Frepo.metadatacenter.org%2Ftemplates%2F{}" + +strip = [ + "@id", + "@type", + "schema:name", + "schema:description", +] + +def fetch_data(url): + response = requests.get(url) + response.raise_for_status() + return response.json() + +def process_assay(template_id): + url = BASE_TEMPLATE_URL.format(template_id) + assay = fetch_data(url) + props = assay.get("properties", {}) + assay_details = { + "assayName": assay.get("schema:name", ""), + "properties": [] + } + print(f" Captured {assay_details['assayName']} details...") + for property, prop_val in props.items(): + if property in strip: + continue + literal_values = [] + name = prop_val.get("skos:prefLabel", "") + shortcode = prop_val.get("schema:name", "") + type_ = prop_val.get("_ui", {}).get("inputType", "") + constraints = [] + literals = [] + value = "" + + value_constraints = prop_val.get("_valueConstraints", {}) + # Assigned Values + if value_constraints.get("branches"): + branches = value_constraints.get("branches") + + if isinstance(branches, list): + for branch in branches: + name = branch.get("name") + uri = branch.get("uri") + elif isinstance(branches, dict): + name = branches.get("name") + uri = branches.get("uri") + + constraints = branches + if isinstance(constraints, list) and constraints and isinstance(constraints[0], dict) and 'uri' in constraints[0]: + value_HRAVS = str(constraints[0]['uri']) + value = fetch_by_HRAVS(constraints) + type_ = "Assigned Value" + + # YES / NO + elif value_constraints.get("literals"): + literals = value_constraints["literals"] + for literal in literals: + if isinstance(literal, dict): + for key in 
("label", "value", "prefLabel"): + if key in literal: + literal_values.append(str(literal[key])) + break + else: + literal_values.append(str(literal)) + else: + literal_values.append(str(literal)) + type_ = "Radio" + # value = ",".join(literal_values) + value = literal_values + + else: + type_ = prop_val.get("_ui", {}).get("inputType", "") + value = "" + + if name: + assay_details["properties"].append({ + "attribute": shortcode, + "label": name, + "type": type_.title() if isinstance(type_, str) else type_, + "description": prop_val.get("schema:description", ""), + "value": value, + "required": value_constraints.get("requiredValue") + }) + + title = f"{assay.get('schema:name', '')}.json" + title = re.sub(r"[\s;]+", "-", title) + os.makedirs("metaJSON", exist_ok=True) + with open(os.path.join("metaJSON", title), "w") as f: + json.dump(assay_details["properties"], f) + +def fetch_by_HRAVS(details): + detail = details[0] + if detail.get("uri"): + value_HRAVS = str(detail.get("uri")) + + match = re.search(r'#HRAVS_(\d+)', value_HRAVS) + if match: + hravs_id = match.group(1) + valueSet = fetch_details(hravs_id) + return valueSet + +def fetch_details(url_id): + BASE_URL = "https://data.bioontology.org/ontologies/HRAVS/classes/https%3A%2F%2Fpurl.humanatlas.io%2Fvocab%2Fhravs%23HRAVS_{}" + url = BASE_URL.format(url_id) + "/children?apikey=ad1d9ae5-3781-48d2-a61b-ab243bea22ee&display=prefLabel&no_context=true" + headers = { + "User-Agent": ( + "Mozilla/5.0 (X11; Linux x86_64) " + "AppleWebKit/537.36 (KHTML, like Gecko) " + "Chrome/115.0.0.0 Safari/537.36" + ) + } + response = requests.get(url, headers=headers) + response.raise_for_status() + try: + data = json.loads(response.text) + collection = data.get("collection", []) + values = [item.get("prefLabel") for item in collection if "prefLabel" in item] + except Exception as e: + print(f"Error parsing JSON for {url_id}: {e}") + values = [] # inside the except branch: fall back to an empty list so the return below never hits an unbound name + return values + +def main(): + parser = argparse.ArgumentParser(description="Fetch and process 
multiple HuBMAP assay metadata templates.") + parser.add_argument('--templateID', action='append', required=True, help='Template ID to process (can be used multiple times)') + args = parser.parse_args() + for tid in args.templateID: + process_assay(tid) + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/scripts/metaBuilder/generate_all_md.py b/scripts/metaBuilder/generate_all_md.py new file mode 100644 index 0000000..f3961e5 --- /dev/null +++ b/scripts/metaBuilder/generate_all_md.py @@ -0,0 +1,268 @@ +import os +import subprocess +import glob +import threading +import itertools +import sys +import time +import requests +from bs4 import BeautifulSoup +import re + +def spinner(msg, stop_event): + for c in itertools.cycle('⠷⠯⠟⠻⠽⠾'): + if stop_event.is_set(): + break + sys.stdout.write(f'\r{msg} {c}') + sys.stdout.flush() + time.sleep(0.1) + sys.stdout.write('\r' + ' ' * (len(msg) + 2) + '\r') + +def get_assay_info(assay_url): + response = requests.get(assay_url) + response.raise_for_status() + soup = BeautifulSoup(response.text, 'html.parser') + + # 1. Assay name (try h1 without 'category' class, then fallback) + h1s = soup.find_all('h1') + assay_name = None + for h1 in h1s: + if 'category' not in (h1.get('class') or []): + assay_name = h1.text.strip() + break + if not assay_name: + assay_name = soup.title.text.strip() if soup.title else "Unknown" + + # 2. Versions (text of the b tag found within each summary tag) + versions = [] + current_version = None + for summary in soup.find_all('summary'): + b_tag = summary.find('b') + if b_tag: + version = b_tag.text.strip() + versions.append(version) + if "use this one" in summary.text.lower(): + current_version = version + + # 3. 
Metadata schema links + meta_links = [] + template_ids = [] + for link in soup.find_all('a', href=True): + href = link['href'] + if href.startswith('https://openview.metadatacenter.org/templates/https:%2F%2Frepo.metadatacenter.org%2Ftemplates%2F'): + meta_links.append(href) + # Extract template ID + tid = href.split('templates%2F')[-1] + tid = tid.split('?')[0] # Remove any query params + template_ids.append(tid) + + return { + "assay_name": assay_name, + "versions": versions, + "current_version": current_version, + "metadata_schema_links": meta_links, + "template_ids": template_ids + } + +# --------- MAIN FINALIZATION --------- + +# Paths +script_dir = os.path.dirname(__file__) +toMD_dir = os.path.join(script_dir, "toMD") +layout_template_path = os.path.join(script_dir, "pageLayout.md") +output_dir = os.path.abspath(os.path.join(script_dir, "../../docs/assays/metadata/")) +index_md_path = os.path.join(output_dir, "index.md") +new_index_md_path = os.path.join(output_dir, "new-metadata-test.md") +assays_txt_path = os.path.join(script_dir, "assays.txt") +meta_json_dir = os.path.join(script_dir, 'metaJSON') + + +# Step 1: Process assays.txt as link-description pairs +assay_infos = [] +all_template_ids = [] +assay_descriptions = {} +with open(assays_txt_path, "r") as f: + lines = [l.strip() for l in f if l.strip()] + for i in range(0, len(lines), 2): + url = lines[i] + description = lines[i+1] if i+1 < len(lines) else "" + info = get_assay_info(url) + assay_infos.append(info) + print(f"Processed: {info['assay_name']}") + all_template_ids.extend(info['template_ids']) + assay_descriptions[info['assay_name']] = description + +# Step 2: Call fetchMeta.py with all collected template IDs +if all_template_ids: + template_id_args = [] + for tid in all_template_ids: + template_id_args.extend(["--templateID", tid]) + stop_event = threading.Event() + spin_thread = threading.Thread(target=spinner, args=('Fetching metadata', stop_event)) + spin_thread.start() + 
subprocess.run(['python3', 'fetchMeta.py'] + template_id_args, check=True) + stop_event.set() + spin_thread.join() + print('Fetching... done.') +else: + print("No template IDs found to pass to fetchMeta.py") + +# Step 3: Find all JSON files in metaJSON and run json_to_md.py on each +json_files = glob.glob(os.path.join(meta_json_dir, '*.json')) +for json_file in json_files: + stop_event = threading.Event() + spin_thread = threading.Thread(target=spinner, args=(f'Converting {os.path.basename(json_file)}', stop_event)) + spin_thread.start() + subprocess.run(['python3', 'json_to_md.py', json_file], check=True) + stop_event.set() + spin_thread.join() + +print('All markdown files generated into ./toMD') + +# --------- FINAL STEP: Build new markdown file from pageLayout.md --------- + +def build_final_md(assay_name, current_version, table_md_path, layout_template_path, output_dir): + # Read the layout template + with open(layout_template_path, "r") as f: + layout = f.read() + # Read the generated table + with open(table_md_path, "r") as f: + table = f.read() + # Fill in the template + content = layout.replace("{AssayName}", assay_name) + content = content.replace("{Version NUMBER (current)}", current_version) + content = content.replace("{Table Here}", table) + # Write to the output directory + out_path = os.path.join(output_dir, f"{assay_name}.md") + with open(out_path, "w") as f: + f.write(content) + print(f"Saved: {out_path}") + return out_path + +def update_index_md(index_md_path, assay_name, description=None): + # Add a row to the table in index.md for the new assay, keeping alphabetical order + with open(index_md_path, "r") as f: + lines = f.readlines() + # Find all table rows (lines starting with "|", but skip the header and separator) + table_rows = [(i, line) for i, line in enumerate(lines) if line.strip().startswith("|")] + header_idx = table_rows[0][0] + separator_idx = table_rows[1][0] + data_rows = table_rows[2:] # actual data rows + + # Prepare the new row + 
link = f"[attributes]({assay_name})" + desc = description or "" + new_row = f"| {assay_name} | {link} | {desc} |\n" + + # Extract assay names from data rows for sorting + def get_row_assay_name(row): + # Assay name is between first and second '|' + parts = row.split('|') + return parts[1].strip().lower() + + # Insert the new row into the correct alphabetical position + inserted = False + for idx, (line_idx, row) in enumerate(data_rows): + existing_name = get_row_assay_name(row) + if assay_name.lower() < existing_name: + insert_at = line_idx + lines.insert(insert_at, new_row) + inserted = True + break + if not inserted: + # If not inserted, append at the end of the table + last_table_row = table_rows[-1][0] + lines.insert(last_table_row + 1, new_row) + + with open(index_md_path, "w") as f: + f.writelines(lines) + print(f"Updated: {index_md_path}") + +# --- NEW: Update new-metadata-test.md in alphabetical order --- +def update_new_index_md(new_index_md_path, assay_name, description=None, schema_url=None): + """ + Add a row to the table in new-metadata-test.md for the new assay, keeping alphabetical order. 
+ The row format is: + | Dataset Type | Description | + | [AssayName](schema_url) [](AssayName "Attribute description") | description | + """ + with open(new_index_md_path, "r") as f: + lines = f.readlines() + # Find all table rows (lines starting with '|', but skip the header and separator) + table_rows = [(i, line) for i, line in enumerate(lines) if line.strip().startswith("|")] + if len(table_rows) < 2: + print(f"Table not found in {new_index_md_path}") + return + header_idx = table_rows[0][0] + separator_idx = table_rows[1][0] + data_rows = table_rows[2:] # actual data rows + + # Prepare the new row + # Use schema_url if provided, else just the assay name + if schema_url: + assay_link = f"[{assay_name}]({schema_url}) []({assay_name} \"Attribute description\")" + else: + assay_link = f"{assay_name} []({assay_name} \"Attribute description\")" + desc = description or "" + new_row = f"| {assay_link} | {desc} |\n" + + # Extract assay names from data rows for sorting + def get_row_assay_name(row): + # Assay name is between first and second '|', but may contain markdown links + parts = row.split('|') + # Remove markdown link if present + name = parts[1].strip() + if name.startswith('['): + name = name.split(']')[0][1:] + return name.lower() + + # Insert the new row into the correct alphabetical position + inserted = False + for idx, (line_idx, row) in enumerate(data_rows): + existing_name = get_row_assay_name(row) + if assay_name.lower() < existing_name: + insert_at = line_idx + lines.insert(insert_at, new_row) + inserted = True + break + if not inserted: + # If not inserted, append at the end of the table + last_table_row = table_rows[-1][0] + lines.insert(last_table_row + 1, new_row) + + with open(new_index_md_path, "w") as f: + f.writelines(lines) + print(f"Updated: {new_index_md_path}") + + +assay_version_map = {info['assay_name']: info['current_version'] for info in assay_infos} + +# Find all generated markdown tables in toMD +md_tables = 
glob.glob(os.path.join(toMD_dir, "*.md")) +for table_md_path in md_tables: + assay_name = os.path.splitext(os.path.basename(table_md_path))[0] + # Use the actual current_version if available + current_version = assay_version_map.get(assay_name, "Version 2 (current)") + out_path = build_final_md(assay_name, current_version, table_md_path, layout_template_path, output_dir) + # Use the description from assays.txt for both index.md and new-metadata-test.md + description = assay_descriptions.get(assay_name, "") + update_index_md(index_md_path, assay_name, description=description) + update_new_index_md(new_index_md_path, assay_name, description=description) + +# After processing, move parsed JSON and Markdown files to /old + +# Ensure the /old directories exist +meta_json_old_dir = os.path.join(meta_json_dir, "old") +toMD_old_dir = os.path.join(toMD_dir, "old") +os.makedirs(meta_json_old_dir, exist_ok=True) +os.makedirs(toMD_old_dir, exist_ok=True) + +# Move JSON files +for json_file in json_files: + dest = os.path.join(meta_json_old_dir, os.path.basename(json_file)) + os.rename(json_file, dest) + +# Move Markdown files +for md_file in md_tables: + dest = os.path.join(toMD_old_dir, os.path.basename(md_file)) + os.rename(md_file, dest) \ No newline at end of file diff --git a/scripts/metaBuilder/json_to_md.py b/scripts/metaBuilder/json_to_md.py new file mode 100644 index 0000000..5e4f6a7 --- /dev/null +++ b/scripts/metaBuilder/json_to_md.py @@ -0,0 +1,57 @@ +import os +import json +import sys + +def json_to_markdown(json_path, md_path=None): + with open(json_path, 'r') as f: + data = json.load(f) + # Use the filename (without extension) as the header + base = os.path.basename(json_path) + header = os.path.splitext(base)[0] + # If the JSON has an 'assayName' field, prefer that for the header + if isinstance(data, dict) and 'assayName' in data: + header = data['assayName'] + properties = data.get('properties', []) + elif isinstance(data, dict) and 'properties' in data: + 
properties = data['properties'] + elif isinstance(data, list): + properties = data + else: + raise ValueError('Unrecognized JSON structure') + + md_lines = [] + md_lines.append("| Attribute Name | Type | Description | Allowable Values | Required |") + md_lines.append("|---------------|------|-------------|------------------|----------|") + for prop in properties: + attr = prop.get('attribute', '') + typ = prop.get('type', '') + desc = prop.get('description', '').replace('\n', ' ') + # Allowable Values: join list or show as string + val = prop.get('value', '') + if isinstance(val, list): + val = ', '.join(f'```{str(v)}```' for v in val) + elif val is None: + val = '' + else: + val = f'```{val}```' if val else '' + req = str(prop.get('required', '')) + md_lines.append(f"| {attr} | {typ} | {desc} | {val} | {req} |") + md_content = '\n'.join(md_lines) + if not md_path: + base = os.path.basename(json_path) + md_name = os.path.splitext(base)[0] + ".md" + md_dir = os.path.join(os.path.dirname(json_path), "../toMD") + md_dir = os.path.abspath(md_dir) + os.makedirs(md_dir, exist_ok=True) + md_path = os.path.join(md_dir, md_name) + with open(md_path, 'w') as f: + f.write(md_content) + # print(f"{header} markdown exported to ./toMD") + +if __name__ == "__main__": + if len(sys.argv) < 2: + print("Usage: python json_to_md.py JSON_FILE [output_md_file]") + sys.exit(1) + json_file = sys.argv[1] + md_file = sys.argv[2] if len(sys.argv) > 2 else None + json_to_markdown(json_file, md_file) diff --git a/scripts/metaBuilder/pageLayout.md b/scripts/metaBuilder/pageLayout.md new file mode 100644 index 0000000..33cd771 --- /dev/null +++ b/scripts/metaBuilder/pageLayout.md @@ -0,0 +1,12 @@ +--- +layout: page +--- +# {AssayName} + +
{Version NUMBER (current)} + +## {Version NUMBER (current)} + +{Table Here} + +
\ No newline at end of file From a6c1ab079cb80974d8dee2991664bff12c5621f8 Mon Sep 17 00:00:00 2001 From: Birdmachine Date: Mon, 2 Feb 2026 13:45:15 -0500 Subject: [PATCH 3/4] Remove duped 4i page (Was renamed) --- docs/assays/metadata/4i.md | 37 ------------------------------------- 1 file changed, 37 deletions(-) delete mode 100644 docs/assays/metadata/4i.md diff --git a/docs/assays/metadata/4i.md b/docs/assays/metadata/4i.md deleted file mode 100644 index fe7bf3c..0000000 --- a/docs/assays/metadata/4i.md +++ /dev/null @@ -1,37 +0,0 @@ ---- -layout: page ---- -# 4i (Iterative Indirect Immunofluorescence Imaging) - -
Version 2 (current) - -## Version 2 (current) - -| Attribute Name | Type | Description | Allowable Values | Required | -|---------------|------|-------------|------------------|----------| -| lab_id | Textfield | A locally assigned identifier provided by the data provider for the dataset. It is used to reference an external metadata record that may be maintained independently, enabling traceability and supporting provenance tracking. Example: Visium_9OLC_A4_S1 | | False | -| source_storage_duration_value | Numeric | The length of time the sample was stored prior to processing it. For assays performed on tissue sections, this refers to how long the tissue section (e.g., slide) was stored before the assay began (e.g., imaging). For assays performed on suspensions, such as sequencing, it refers to how long the suspension was stored before library construction started. Example: 12 | | True | -| time_since_acquisition_instrument_calibration_value | Numeric | The length of time since the acquisition instrument was last serviced or calibrated. This provides a metric for assessing drift in data capture. Example: 10 | | False | -| contributors_path | Textfield | The name of the file containing the ORCID IDs for all contributors to this dataset. Example: ./contributors.csv | | True | -| data_path | Textfield | The top-level directory containing the raw and/or processed data. For a single dataset upload, this might be represented as ".", whereas for a data upload containing multiple datasets, this would be the directory name for the respective dataset. For example, if the data is within a directory named "TEST001-RK", use the syntax "./TEST001-RK" for this field. If there are multiple directory levels, use the format "./TEST001-RK/Run1/Pass2", where "Pass2" is the subdirectory where the single dataset's data is stored. This is an internal metadata field used solely for data ingestion. 
Example: ./TEST001-RK | | True | -| number_of_antibodies | Numeric | The number of antibodies used in the assay. If no antibodies were utilized, enter 0. Example: 5 | | True | -| number_of_biomarker_imaging_rounds | Numeric | The number of imaging rounds required to capture the tagged biomarkers. For CODEX, a biomarker imaging round includes steps such as (1) oligo application, (2) fluor application, and (3) washes. For Cell DIVE, it involves (1) the staining of a biomarker via secondary detection or direct conjugate, followed by (2) dye inactivation. Example: 3 | | True | -| number_of_total_imaging_rounds | Numeric | The total number of imaging rounds performed using a microscope to collect either autofluorescence/background or stained signals, such as those used in histological analysis. Example: 5 | | True | -| slide_id | Textfield | The unique identifier assigned to each slide, enabling users to determine which tissue sections were processed together on the same slide. It is recommended that data providers prefix the ID with the center name to prevent overlapping values across different centers. Example: VAN0071-PA-1-1_AF | | True | -| dataset_type | Assigned Value | The specific type of dataset being produced. 
Example: RNAseq | ```Visium HD```, ```4i```, ```LC-MS```, ```Thick section Multiphoton MxIF```, ```Light Sheet```, ```ATACseq```, ```Resolve```, ```HiFi-Slide```, ```COMET```, ```MPLEx```, ```10X Multiome```, ```MALDI```, ```Histology```, ```Cell DIVE```, ```FACS```, ```MS Lipidomics```, ```Visium (no probes)```, ```MUSIC```, ```RNAseq```, ```GeoMx (NGS)```, ```GeoMx (nCounter)```, ```RNAseq (with probes)```, ```Singular Genomics G4X```, ```Molecular Cartography```, ```CosMx Transcriptomics```, ```MERFISH```, ```Pixel-seqV2```, ```2D Imaging Mass Cytometry```, ```Confocal```, ```seqFISH```, ```DART-FISH```, ```MIBI```, ```Olink```, ```Enhanced Stimulated Raman Spectroscopy (SRS)```, ```DESI```, ```Xenium```, ```CyCIF```, ```SNARE-seq2```, ```nanoSPLITS```, ```Stereo-seq```, ```Visium (with probes)```, ```SIMS```, ```Auto-fluorescence```, ```CyTOF```, ```CosMx Proteomics```, ```DBiT-seq```, ```PhenoCycler```, ```CODEX```, ```Second Harmonic Generation (SHG)```, ```Seq-Scope``` | True | -| analyte_class | Assigned Value | The analyte class which is the target molecule that the assay is measuring. Example: DNA | ```Nucleic acid + protein```, ```Lipid + metabolite```, ```Collagen```, ```RNA```, ```Fluorochrome```, ```DNA```, ```Metabolite```, ```DNA + RNA```, ```Saturated lipid```, ```Lipid```, ```Peptide```, ```Protein```, ```Unsaturated lipid```, ```Endogenous fluorophore```, ```Chromatin```, ```Polysaccharide``` | True | -| acquisition_instrument_vendor | Assigned Value | The company that manufactures or supplies the acquisition instrument. An acquisition instrument is a device equipped with signal detection hardware and signal processing software. It captures signals produced by assays, such as variations in light intensity or color, or signals corresponding to molecular mass. If the instrument was custom-built or developed internally, enter "In-House". 
Example: Illumina | ```Complete Genomics```, ```Cytek Biosciences```, ```Thermo Fisher Scientific```, ```Sciex```, ```Vizgen```, ```Leica Microsystems```, ```Akoya Biosciences```, ```Keyence```, ```Andor```, ```Standard BioTools (Fluidigm)```, ```Leica Biosystems```, ```Zeiss Microscopy```, ```Ionpath```, ```Motic```, ```In-House```, ```Evident Scientific (Olympus)```, ```GE Healthcare```, ```Element Biosciences```, ```Hamamatsu```, ```Bruker```, ```Illumina```, ```3DHISTECH```, ```Singular Genomics```, ```Huron Digital Pathology```, ```Resolve Biosciences```, ```NanoString```, ```Cytiva```, ```10x Genomics```, ```Microscopes International```, ```BGI Genomics``` | True | -| acquisition_instrument_model | Assigned Value | The specific model of the acquisition instrument, as manufacturers often offer various versions with differing features or sensitivities. These differences may be relevant to the processing or interpretation of the data. If the instrument was custom-built or developed internally, enter "In-House". If the model is unknown, enter "Unknown". 
Example: HiSeq 4000 | ```NovaSeq X```, ```NovaSeq X Plus```, ```Cytek Northern Lights```, ```Lightsheet 7```, ```Resolve Biosciences Molecular Cartography```, ```timsTOF HT```, ```timsTOF Pro 2```, ```timsTOF Pro```, ```timsTOF Ultra```, ```timsTOF Ultra 2```, ```timsTOF SCP```, ```Axio Scan.Z1```, ```MALDI timsTOF Flex Prototype```, ```CosMx Spatial Molecular Imager```, ```Unknown```, ```MERSCOPE Ultra```, ```Juno System```, ```timsTOF FleX```, ```Custom: Multiphoton```, ```CyTOF XT```, ```Helios```, ```EVOS M7000```, ```Aperio AT2```, ```Phenocycler-Fusion 2.0```, ```Axio Observer 5```, ```Axio Observer 7```, ```Axio Observer 3```, ```NanoZoomer-SQ```, ```NanoZoomer S210```, ```NanoZoomer S60```, ```NanoZoomer S360```, ```DM6 B```, ```MoticEasyScan One```, ```In-House```, ```NextSeq 500```, ```BZ-X710```, ```QTRAP 5500```, ```NextSeq 550```, ```HiSeq 2500```, ```HiSeq 4000```, ```NovaSeq 6000```, ```Q Exactive HF```, ```Orbitrap Fusion Lumos Tribrid```, ```Q Exactive```, ```VS200 Slide Scanner```, ```Not applicable```, ```Orbitrap Eclipse Tribrid```, ```MIBIscope```, ```IN Cell Analyzer 2200```, ```timsTOF FleX MALDI-2``` | True | -| source_storage_duration_unit | Assigned Value | The unit of measurement used to specify the source storage duration value. Example: hour | ```hour```, ```month```, ```day```, ```minute```, ```year``` | True | -| time_since_acquisition_instrument_calibration_unit | Assigned Value | The unit of measurement used to specify the time since acquisition instrument calibration value. Example: month | ```month```, ```day```, ```year``` | False | -| metadata_schema_id | Textfield | The unique string identifier for the metadata specification version, which is easily interpretable by computers for purposes of data validation and processing. 
Example: 22bc762a-5020-419d-b170-24253ed9e8d9 | | True | -| preparation_protocol_doi | Link | The DOI for the protocols.io page that details the assay or the procedures used for sample procurement and preparation. For example, in the case of an imaging assay, the protocol may start with tissue section staining and end with the generation of an OME-TIFF file. The documented protocol should also include any image processing steps involved in producing the final OME-TIFF. Example: https://dx.doi.org/10.17504/protocols.io.eq2lyno9qvx9/v1 | | True | -| is_targeted | Radio | Indicates whether a specific molecule or set of molecules is targeted for detection or measurement by the assay. Example: Yes | ```Yes```, ```No``` | True | -| antibodies_path | Textfield | The path to the antibodies.tsv file relative to the root directory of the upload structure. This path should start with "." and is typically formatted as "./extras/antibodies.tsv". Example: ./extras/antibodies.tsv | | True | -| parent_sample_id | Textfield | The unique identifier from HuBMAP or SenNet for the sample (such as a block, section, or suspension) used to perform the assay. For instance, in an RNAseq assay, the parent sample would be the suspension, while in imaging assays, it would be the tissue section. If the assay is derived from multiple parent samples, this field should contain a comma-separated list of identifiers. Example: HBM386.ZGKG.235, HBM672.MKPK.442 | | True | -| non_global_files | Textfield | Specifies a semicolon-separated list of non-global files that are to be included in the dataset. The file paths assume that the files are located in the "TOP/non-global/" directory. For instance, if the file is located at TOP/non-global/lab_processed/images/1-tissue-boundary.geojson, the value for this field would be "./lab_processed/images/1-tissue-boundary.geojson". Once ingested, these files will be copied to their appropriate locations within the respective dataset directory tree. 
This field is intended for internal HuBMAP processing. Examples for GeoMx and PhenoCycler are provided in the File Locations documentation: https://docs.google.com/document/d/1n2McSs9geA9Eli4QWQaB3c9R3wo5d5U1Xd57DWQfN5Q/edit#heading=h.1u82i4axggee Example: ./lab_processed/images/1-tissue-boundary.geojson | | False | -| cell_boundary_marker_or_stain | Textfield | The name of the marker or stain used to identify all cell boundaries in the tissue. This name must exactly match the antibody-targeted molecule marker or non-antibody targeted molecule stain as found in the imaging data. For example, in the case of using the PhenoCycler, ensure the name corresponds to the value in the XPD output file. If multiple markers or stains are employed, list them in a comma-separated format. Example: Pan-Cytokeratin, E-Cadherin | | False | -| nuclear_marker_or_stain | Textfield | The nuclear marker or stain used, which can be an antibody-targeted molecule present in or around the cell nucleus. For protein targets, use the protein or gene symbol that identifies the antibody target, ensuring it matches the antibody target from the panel used or custom panels. Preferably, if using a custom antibody marker, this symbol should be the HGNC symbol (https://www.genenames.org/). For non-protein targets, provide the stain name (e.g., DAPI) and, when applicable, include the associated staining kit and vendor. For the PhenoCycler, ensure the symbol matches the value found in the XPD output file. Example: DAPI | | False | -| number_of_channels | Numeric | The number of fluorescent channels that are imaged during each cycle. Example: 3 | | True | - -
\ No newline at end of file From 8a9972dd78bdd55a9dbca18011b34b17afc0ca1c Mon Sep 17 00:00:00 2001 From: Birdmachine Date: Mon, 2 Feb 2026 13:48:54 -0500 Subject: [PATCH 4/4] Script description correction --- scripts/metaBuilder/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/metaBuilder/README.md b/scripts/metaBuilder/README.md index 66b17ab..2b07f37 100644 --- a/scripts/metaBuilder/README.md +++ b/scripts/metaBuilder/README.md @@ -27,7 +27,7 @@ This directory contains scripts to fetch, process, and convert HuBMAP assay meta - JSON files: `metaJSON/` - Markdown files: `toMD/` - On Generation, prior files are moved to `old/` subdirectories within each folder. (This folder's included in .gitignore) + After generation, processed files are moved to `old/` subdirectories within each folder. (The `old/` folders are listed in `.gitignore`.) ## Script Descriptions