This is a hands-on session where you will learn how to map data to Senzing using AI. Each participant should come prepared so we can move quickly and focus on solving your real-world mapping challenges.
What to bring:
- Laptop: each participant needs their own laptop (Mac/Windows).
- AI account: a paid AI subscription (Claude, ChatGPT, GitHub Copilot, Cursor, Google Gemini, Amazon CodeWhisperer, Codeium, or another paid AI assistant). Let us know if you already use another provider and want to use it.
- Local development environment with AI: you'll need a way to work with AI locally on your machine. Options include an IDE with AI extension (VS Code + Claude Code/Copilot, JetBrains + AI plugin), an AI-native IDE (Cursor, Windsurf), or a command-line AI tool (Claude Code CLI). This local setup lets you access files directly, run code, execute the linter, and iterate on your mappings throughout the workshop.
- Create a working folder for workshop files (e.g., `~/bootcamp`) and pull this repository into it.
- Your data file: bring a real dataset you want to map (CSV, JSON, etc.). Aim for a representative sample that's safe to use in class. If you can't share production data, bring a small, sanitized sample and put it in the `~/bootcamp` directory.
- Python 3: needed to run the mapping/validation code the AI will generate.
  - Verify: `python3 --version` (or `python --version` on Windows).
- Senzing environment (for final validation): we will load your mapped JSON into Senzing.
- Install Docker Desktop (Mac/Windows/Linux) and complete the first-run setup.
- If you cannot install Docker, let us know in advance; we will provide alternatives during the session.
- Verify Docker is running: `docker --version` and `docker run hello-world`
- Ensure at least 4 GB RAM is allocated to Docker (Settings → Resources).
- Pull the workshop container image ahead of time (will be available one week before class):
docker pull senzing/summit-bootcamp-2025
- If you can, also run these two commands ahead of time to get a local AI model:
- docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
- docker exec -it ollama ollama pull mistral:7b-instruct-q4_K_M
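Once the container is up, you can sanity-check the local model from Python. This is a minimal sketch, assuming Ollama's default REST API on port 11434; the helper function is hypothetical, not part of the workshop repo:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port (matches the docker run above)
MODEL = "mistral:7b-instruct-q4_K_M"

def build_generate_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming request to Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# To verify the model responds (requires the ollama container to be running):
# with urllib.request.urlopen(build_generate_request("Say OK")) as resp:
#     print(json.load(resp)["response"])
```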
Notes
- We want you to solve a real problem. Bring a dataset and context so we can map to Senzing in a way that’s meaningful to your use case.
- Keep sensitive data safe. Prefer samples or de-identified subsets when possible.
The mapping documentation is maintained in the Senzing/mapper-ai repository:
- senzing_mapping_assistant_prompt.md: master mapping instructions/prompt with rules, templates, and examples.
- senzing_mapping_examples.md: curated reference examples that show correct Senzing JSON patterns.
- senzing_entity_specification.md: authoritative, AI-ready Senzing Entity Spec (mapper-ai repo is the source of truth).
- lint_senzing_json.py: JSON schema linter for validating generated Senzing JSON/JSONL.
- identifier_crosswalk.json: canonical identifier types, aliases, and mapping guidance.
- identifier_lookup_log.md: template to record curated identifier lookups (no PII).
- Path: `employee_data/`
- Contents:
  - `data/us-small-employee-raw.csv`: sample input data
  - `schema/us-small-employee-schema.csv`: inferred schema (from file_analyzer)
  - `byhand/*`: code and Senzing JSONL generated by hand (current expected result)
- Path: `voter_data/`
- Contents:
  - `data/`: sample voter dataset
  - `schema/`: inferred schema produced for the voter dataset
- Path: `company_data/`
- Contents:
  - `data/`: sample company dataset
  - `schema/`: inferred schema produced for the company dataset
- File Analyzer (profile files to derive schema and stats):
  - Path: `tools/file_analyzer.py`
  - Purpose: analyze CSV/JSON/Parquet when a schema doesn't exist; shows attribute name, inferred type, population %, uniqueness %, and top values.
  - Run: `python3 tools/file_analyzer.py path/to/data.csv -o path/to/schema.csv`
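The kind of statistics the File Analyzer reports can be sketched in a few lines. This is a simplified, hypothetical stand-in to show what the numbers mean, not the actual `tools/file_analyzer.py`:

```python
import csv
import io
from collections import Counter

def profile_csv(text: str, top_n: int = 3) -> list[dict]:
    """Per-column population %, uniqueness %, and top values (simplified sketch)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return []
    total = len(rows)
    stats = []
    for col in rows[0].keys():
        values = [r[col] for r in rows if r[col] not in (None, "")]
        counts = Counter(values)
        stats.append({
            "attribute": col,
            "population_pct": round(100 * len(values) / total, 1),
            "uniqueness_pct": round(100 * len(counts) / len(values), 1) if values else 0.0,
            "top_values": [v for v, _ in counts.most_common(top_n)],
        })
    return stats

sample = "name,city\nAnn,Tulsa\nBob,\nAnn,Tulsa\n"
for s in profile_csv(sample):
    print(s)
```

A column with high uniqueness (like an ID) and high population is a good RECORD_ID candidate; low-population columns may not be worth mapping.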
- Senzing JSON Linter (schema correctness check):
  - Path: `docs/lint_senzing_json.py` (local) or fetch from mapper-ai
  - Purpose: validates structure of Senzing JSON/JSONL.
  - Run (file): `python3 docs/lint_senzing_json.py path/to/output.jsonl`
  - Run (directory): `python3 docs/lint_senzing_json.py path/to/dir`
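To see what "structural validation" means here, a minimal sketch of the idea: each JSONL line must parse as a JSON object and carry a data source and record ID. The checks below are illustrative only; the authoritative rules live in `lint_senzing_json.py`:

```python
import json

def lint_jsonl(text: str) -> list[str]:
    """Minimal structural checks on Senzing JSONL (illustrative sketch)."""
    errors = []
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            rec = json.loads(line)
        except json.JSONDecodeError as e:
            errors.append(f"line {n}: not valid JSON ({e.msg})")
            continue
        if not isinstance(rec, dict):
            errors.append(f"line {n}: expected a JSON object")
            continue
        for field in ("DATA_SOURCE", "RECORD_ID"):
            if field not in rec:
                errors.append(f"line {n}: missing {field}")
    return errors
```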
- Senzing JSON Analyzer (validate mapped JSONL before loading):
  - Path: `tools/sz_json_analyzer.py`
  - Purpose: validates/inspects Senzing JSON/JSONL; highlights mapped vs unmapped attributes, uniqueness/population, warnings, and errors.
  - Run: `python3 tools/sz_json_analyzer.py path/to/output.jsonl -o path/to/report.csv`
  - Docs: https://github.com/senzing-garage/sz-json-analyzer
Data Handling Guidance
- Best practice: Use schema files, not raw data. Generate a schema with the File Analyzer and map from that. This uses fewer tokens, minimizes data exposure, and keeps your AI focused on the mapping logic.
- If working locally with your IDE: You can have the AI map directly from raw data files, but it's still recommended to use the File Analyzer first when possible.
- If the File Analyzer can't handle your file format: Either ask your AI to analyze the file and generate a schema, or write your own code to produce a schema document.
- Never upload full production datasets to web-based AI. Use schema extracts, field lists, small sanitized samples, or analyzer summaries instead.
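If you need to build a sanitized sample, one simple approach is to truncate the file and replace sensitive values with short, consistent hashes, so formats and cross-row matches survive but the real data does not. A sketch only; the `SENSITIVE` column names are assumptions you'd adjust for your own data:

```python
import csv
import hashlib
import io

SENSITIVE = {"ssn", "dob", "email"}  # hypothetical column names to mask

def sanitized_sample(text: str, n: int = 50) -> str:
    """Keep the first n rows; hash sensitive values so identical inputs
    still map to identical (but meaningless) outputs."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for i, row in enumerate(reader):
        if i >= n:
            break
        for col in row:
            if col.lower() in SENSITIVE and row[col]:
                row[col] = hashlib.sha256(row[col].encode()).hexdigest()[:8]
        writer.writerow(row)
    return out.getvalue()
```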
Tips for collaborating with an AI:
- Ask it questions if you don't understand something. One of my favorites is: "what does the senzing spec say about that?"
- If it gives you options, ask it for the pros and cons.
- Correct it when it gets something wrong. It will learn from you.
- Keep it on track: AIs hallucinate. See: ChatGPT Common Issues And Solutions
Above all: don't use it to replace your judgment or expertise. It's just your assistant. You are the decision maker.
Step 1: Create a project folder (if you haven't already)
- Make a working directory for your data (e.g., `~/bootcamp/my-source`).
- Put your dataset into it (e.g., a `data/` subfolder).
- No dataset? Copy the aiclass voter_data or company_data folder into your new working directory.
Step 2: Generate a schema (recommended approach)
- Preferred: Use the File Analyzer to generate a schema from your data:
  - Run: `python3 tools/file_analyzer.py path/to/data.csv -o path/to/schema.csv`
  - Place the output schema (e.g., `schema.csv`) in your project (e.g., a `schema/` subfolder).
  - Benefits: fewer tokens, less data exposure, better AI focus on mapping logic
- If you already have an official schema or data dictionary: use that instead and skip this step.
- If the File Analyzer can't handle your file format:
- Option A: Ask your AI to analyze the file and generate a schema document
- Option B: Write your own code to produce a schema
- Option C (local IDE only): Have the AI map directly from the raw data file
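For Option B, producing a usable schema document can be as simple as recording which JSON types each field takes across the file. A minimal sketch for JSONL input (hypothetical helper, not part of the repo):

```python
import json

def infer_schema(jsonl_text: str) -> dict:
    """Map each field name to the sorted set of JSON types observed for it."""
    schema: dict[str, set] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        for key, value in json.loads(line).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return {k: sorted(v) for k, v in schema.items()}
```

A field that shows mixed types (e.g., both `int` and `str` for an ID) is worth flagging to the AI before mapping.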
Step 3: Start your mapping session in your IDE
Recommended: Use your local IDE with AI assistant (VS Code with Claude/Copilot, Cursor, Windsurf, JetBrains with AI plugin, etc.)
This approach gives you direct file access, ability to execute the linter, generate and test code, handle complex multi-file schemas, and iterate on mapper implementations.
- Open your project folder in your local development environment
- Fetch the RAG files into your workspace (clone the mapper-ai repo or download them):
  - https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/senzing_mapping_assistant_prompt.md
  - https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/senzing_mapping_examples.md
  - https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/senzing_entity_specification.md
  - https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/lint_senzing_json.py
  - https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/identifier_crosswalk.json
  - https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/usage_type_crosswalk.json
- Configure your AI assistant to use these files as context/knowledge resources
- Use `senzing_mapping_assistant_prompt.md` as your system prompt or opening instruction
- Begin interactive work with your schema and data files
Alternative: Web-based AI chat (if you cannot use a local IDE):
- Open Senzing Mapping Assistant GPT - mapping docs are preloaded
- Or create a new project in your AI's web interface and upload the RAG files listed above
- Note: web-based approaches lack local linter execution and may struggle with complex multi-file schemas
Step 4: Map your schema through to code
- Provide your schema to the AI assistant and start the mapping process.
- Collaborate with the assistant to analyze your schema, agree on mappings, produce example JSON/JSONL, and generate a transformer script to emit Senzing JSONL.
- By the end of this step you should have code. Download it, run it to map your data, and then verify the output with the JSON analyzer in `tools` (`tools/sz_json_analyzer.py`).
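For reference, here is the general shape of the transformer you'll build with the assistant. The input column names (`emp_id`, `full_name`, `home_addr`) are hypothetical examples; verify every output attribute against `senzing_entity_specification.md` rather than treating this sketch as the spec:

```python
import csv
import io
import json

DATA_SOURCE = "EMPLOYEES"  # must match a source you register in sz_configtool

def transform(csv_text: str) -> str:
    """Emit one Senzing JSON record per CSV row (illustrative mapping)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {
            "DATA_SOURCE": DATA_SOURCE,
            "RECORD_ID": row["emp_id"],
            "NAME_FULL": row["full_name"],
            "ADDR_FULL": row["home_addr"],
        }
        # Drop empty values rather than emitting blank attributes
        lines.append(json.dumps({k: v for k, v in record.items() if v}))
    return "\n".join(lines)
```

In practice the generated script reads from and writes to files (as in the Step 5 example command) instead of strings.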
Step 5: Generate Senzing JSON output
- Run the transformer you built with the assistant to produce JSONL files.
- Example: `python3 transform_your_source.py --input path/to/source.csv --output path/to/output.jsonl`
- Lint for schema correctness:
  - Local file: `python3 docs/lint_senzing_json.py path/to/output.jsonl`
  - Raw URL (for remote use): https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/lint_senzing_json.py
Step 6: Load into Senzing
Note: this part depends on whether you are on Windows, Linux, or macOS, and whether you have Docker and/or Python 3 installed. If you have trouble with any of this, raise your hand and we will help you.
- Analyze with Senzing JSON Analyzer:
  - Local file: `python3 tools/sz_json_analyzer.py path/to/output.jsonl`
  - Raw URL (for remote use): https://raw.githubusercontent.com/jbutcher21/aiclass/main/tools/sz_json_analyzer.py
  - See the docs at https://github.com/senzing-garage/sz-json-analyzer
- Load your file into the Senzing instance (only if you have Docker):
  - Go into the Docker container
  - Add your data sources in sz_configtool
  - Load your JSON file with sz_file_loader
  - Take a snapshot with sz_snapshot
  - Explore your results with sz_explorer
Here is what you should type:
docker run --rm -it --user 0 -v .:/bootcamp senzing/summit-bootcamp-2025
root@89730121f88b:/# cd /bootcamp
root@89730121f88b:/bootcamp# sz_configtool
(szcfg) addDataSource EMPLOYEES
(szcfg) addDataSource EMPLOYERS
(szcfg) save
(szcfg) quit
sz_file_loader -f employees/output/employee_senzing.jsonl
sz_snapshot -o snap1
sz_explorer -s snap1.json