Senzing AI Mapping Workshop

This is a hands-on session where you will learn how to map data to Senzing using AI. Each participant should come prepared so we can move quickly and focus on solving your real-world mapping challenges.

Prerequisites

What to bring:

Laptop: each participant needs their own laptop (Mac/Windows).
AI account: a paid AI subscription (Claude, ChatGPT, GitHub Copilot, Cursor, Google Gemini, Amazon CodeWhisperer, Codeium, or another paid AI assistant). Let us know if you already use another provider and want to use it.
Local development environment with AI: you'll need a way to work with AI locally on your machine. Options include an IDE with AI extension (VS Code + Claude Code/Copilot, JetBrains + AI plugin), an AI-native IDE (Cursor, Windsurf), or a command-line AI tool (Claude Code CLI). This local setup lets you access files directly, run code, execute the linter, and iterate on your mappings throughout the workshop.
Create a working folder for workshop files (e.g., ~/bootcamp) and clone this repository into it.
Your data file: bring a real dataset you want to map (CSV, JSON, etc.). Aim for a representative sample that’s safe to use in class. If you can’t share production data, bring a small, sanitized sample and put it in the ~/bootcamp directory.
Python 3: needed to run the mapping/validation code the AI will generate.
- Verify: python3 --version (or python --version on Windows).
Senzing environment (for final validation): we will load your mapped JSON into Senzing.
- Install Docker Desktop (Mac/Windows/Linux) and complete the first-run setup.
  - If you cannot install Docker, let us know in advance; we will provide alternatives during the session.
- Verify Docker is running: docker --version and docker run hello-world
- Ensure at least 4 GB RAM is allocated to Docker (Settings → Resources).
- Pull the workshop container image ahead of time (it will be available one week before class):
  - docker pull senzing/summit-bootcamp-2025
- If you can, also run these two commands to set up a local AI model (Ollama serving Mistral 7B Instruct):
  - docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
  - docker exec -it ollama ollama pull mistral:7b-instruct-q4_K_M
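If you want to check the Python and Docker prerequisites in one go, the script below is a minimal sketch. It assumes Python 3.8 as the minimum (the workshop only says Python 3) and treats a successful `docker info` as "the daemon is running". It does not check the 4 GB memory setting or pull any images; `docker run hello-world` remains the real test.

```python
"""Pre-class check: Python version and Docker availability (a sketch)."""
import shutil
import subprocess
import sys


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum` (assumed 3.8)."""
    return sys.version_info[:2] >= minimum


def docker_on_path():
    """Return True if a `docker` executable is found on PATH."""
    return shutil.which("docker") is not None


def docker_running():
    """Return True if `docker info` succeeds, i.e. the daemon is reachable."""
    if not docker_on_path():
        return False
    try:
        result = subprocess.run(["docker", "info"], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def report():
    """Print a pass/fail line for each check and return the results."""
    checks = {
        "Python >= 3.8": python_ok(),
        "docker on PATH": docker_on_path(),
        "Docker daemon reachable": docker_running(),
    }
    for name, passed in checks.items():
        print(f"{'OK  ' if passed else 'FAIL'} {name}")
    return checks


if __name__ == "__main__":
    report()
```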

Notes

We want you to solve a real problem. Bring a dataset and context so we can map to Senzing in a way that’s meaningful to your use case.
Keep sensitive data safe. Prefer samples or de-identified subsets when possible.

What’s Inside

Documents folder

The mapping documentation is maintained in the Senzing/mapper-ai repository:

senzing_mapping_assistant_prompt.md: master mapping instructions/prompt with rules, templates, and examples.
senzing_mapping_examples.md: curated reference examples that show correct Senzing JSON patterns.
senzing_entity_specification.md: authoritative, AI-ready Senzing Entity Spec (mapper-ai repo is the source of truth).
lint_senzing_json.py: JSON schema linter for validating generated Senzing JSON/JSONL.
identifier_crosswalk.json: canonical identifier types, aliases, and mapping guidance.
identifier_lookup_log.md: template to record curated identifier lookups (no PII).

Employee Data (input and expected outputs)

Path: employee_data/
Contents:
- data/us-small-employee-raw.csv: sample input data
- schema/us-small-employee-schema.csv: inferred schema (from file_analyzer)
- byhand/*: code and Senzing JSONL generated by hand (current expected result)

Voter Data (input only)

Path: voter_data/
Contents:
- data/: sample voter dataset
- schema/: inferred schema produced for the voter dataset

Company Data (input only)

Path: company_data/
Contents:
- data/: sample company dataset
- schema/: inferred schema produced for the company dataset

Tools

File Analyzer (profile files to derive schema and stats):
- Path: tools/file_analyzer.py
- Purpose: analyze CSV/JSON/Parquet when a schema doesn’t exist; shows attribute name, inferred type, population %, uniqueness %, and top values.
- Run: python3 tools/file_analyzer.py path/to/data.csv -o path/to/schema.csv
Senzing JSON Linter (schema correctness check):
- Path: docs/lint_senzing_json.py (local) or fetch from mapper-ai
- Purpose: validates structure of Senzing JSON/JSONL.
- Run (file): python3 docs/lint_senzing_json.py path/to/output.jsonl
- Run (directory): python3 docs/lint_senzing_json.py path/to/dir
Senzing JSON Analyzer (validate mapped JSONL before loading):
- Path: tools/sz_json_analyzer.py
- Purpose: validates/inspects Senzing JSON/JSONL; highlights mapped vs unmapped attributes, uniqueness/population, warnings, and errors.
- Run: python3 tools/sz_json_analyzer.py path/to/output.jsonl -o path/to/report.csv
- Docs: https://github.com/senzing-garage/sz-json-analyzer
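To make the File Analyzer's columns concrete, here is a rough sketch of how population % and uniqueness % can be computed for a CSV with the standard library. It is not the tool's implementation: it assumes uniqueness means distinct populated values divided by populated values, and it treats empty or whitespace-only cells as unpopulated.

```python
"""Rough column profile for a CSV: population %, uniqueness %, top values.

A sketch of the kind of statistics File Analyzer reports, not its implementation.
"""
import csv
from collections import Counter


def profile_rows(rows, top_n=3):
    """Profile a list of dict rows (as produced by csv.DictReader)."""
    total = len(rows)
    columns = list(rows[0].keys()) if rows else []
    profile = {}
    for col in columns:
        # Treat missing, empty, or whitespace-only values as unpopulated.
        values = [(r.get(col) or "").strip() for r in rows]
        populated = [v for v in values if v]
        counts = Counter(populated)
        profile[col] = {
            "population_pct": round(100 * len(populated) / total, 1) if total else 0.0,
            # Assumed definition: distinct values / populated values.
            "uniqueness_pct": round(100 * len(counts) / len(populated), 1) if populated else 0.0,
            "top_values": counts.most_common(top_n),
        }
    return profile


def profile_csv(path, top_n=3):
    """Read a CSV file with a header row and profile every column."""
    with open(path, newline="", encoding="utf-8") as f:
        return profile_rows(list(csv.DictReader(f)), top_n)
```

For example, `profile_csv("employee_data/data/us-small-employee-raw.csv")` returns one entry per column. A low population % tells you an attribute is rarely usable for matching; a uniqueness % near 100 on a populated column often signals an identifier.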

Step-by-Step Guide (Senzing Mapping Assistant)

Data Handling Guidance

Best practice: Use schema files, not raw data. Generate a schema with the File Analyzer and map from that. This uses fewer tokens, minimizes data exposure, and keeps your AI focused on the mapping logic.
If working locally with your IDE: You can have the AI map directly from raw data files, but it's still recommended to use the File Analyzer first when possible.
If the File Analyzer can't handle your file format: Either ask your AI to analyze the file and generate a schema, or write your own code to produce a schema document.
Never upload full production datasets to web-based AI. Use schema extracts, field lists, small sanitized samples, or analyzer summaries instead.
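One way to follow this guidance is to cut a small sample and mask the sensitive columns before sharing it. The sketch below keeps the first N rows and replaces chosen columns with a short salted hash, so repeated values still line up. This is pseudonymization, not anonymization (short values such as SSNs can be brute-forced), so treat it as an extra safeguard on top of your organization's de-identification rules. Masking also hides value formats, so mask only columns whose content the AI does not need to see.

```python
"""Write a small, masked sample of a CSV for sharing with an AI (a sketch)."""
import csv
import hashlib


def mask(value, salt="workshop"):
    """Replace a value with a short, stable hash so repeats still match."""
    if not value:
        return value
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "MASK_" + digest[:8]


def sanitize_rows(rows, sensitive, limit=20):
    """Keep the first `limit` rows and mask the columns named in `sensitive`."""
    out = []
    for i, row in enumerate(rows):
        if i >= limit:
            break
        out.append({k: (mask(v) if k in sensitive else v) for k, v in row.items()})
    return out


def write_sample(src, dst, sensitive, limit=20):
    """Read `src` (CSV with header) and write a masked sample to `dst`."""
    with open(src, newline="", encoding="utf-8") as fin:
        reader = csv.DictReader(fin)
        rows = sanitize_rows(reader, sensitive, limit)
        fieldnames = reader.fieldnames or []
    with open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_sample("data/source.csv", "data/sample.csv", {"ssn", "phone"})`; the file and column names here are placeholders.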

Tips for collaborating with an AI:

Ask it questions if you don't understand something. One of my favorites is: "What does the Senzing spec say about that?"
If it gives you options, ask it for the pros and cons.
Correct it when it gets something wrong. It will apply your corrections for the rest of the session.
Keep it on track: AIs hallucinate. See: ChatGPT Common Issues And Solutions

Above all: Don't use it to replace your judgement or expertise. It's just your assistant. You are the decision maker.

Step 1: Create a project folder (if you haven't already)

Make a working directory for your data (e.g., ~/bootcamp/my-source).
Put your dataset into it (e.g., a data/ subfolder).
No dataset? Copy the voter_data or company_data folder from the aiclass repository into your new working directory.
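If you prefer to script this step, the sketch below creates the folders and optionally copies one of the sample datasets. The `output` folder and the assumption that the repository was cloned to a folder such as ~/bootcamp/aiclass are illustrative choices, not part of the workshop instructions.

```python
"""Create a project folder and optionally copy a sample dataset (a sketch)."""
import shutil
from pathlib import Path


def project_layout(root):
    """Return the folders to create under `root` (names are examples)."""
    root = Path(root).expanduser()
    return [root / "data", root / "schema", root / "output"]


def create_project(root):
    """Create the folders (no error if they already exist) and return them."""
    folders = project_layout(root)
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


def copy_sample(repo_dir, dataset, root):
    """Copy e.g. <repo_dir>/voter_data/data into <root>/data."""
    src = Path(repo_dir).expanduser() / dataset / "data"
    dst = Path(root).expanduser() / "data"
    shutil.copytree(src, dst, dirs_exist_ok=True)  # Python 3.8+
    return dst
```

For example: `create_project("~/bootcamp/my-source")`, then `copy_sample("~/bootcamp/aiclass", "voter_data", "~/bootcamp/my-source")` (the clone path is hypothetical).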

Step 2: Generate a schema (recommended approach)

Preferred: Use the File Analyzer to generate a schema from your data:
- Run: python3 tools/file_analyzer.py path/to/data.csv -o path/to/schema.csv
- Place the output schema (e.g., schema.csv) in your project (e.g., a schema/ subfolder).
- Benefits: fewer tokens, less data exposure, better AI focus on mapping logic
If you already have an official schema or data dictionary, use that instead and skip this step.
If the File Analyzer can't handle your file format:
- Option A: Ask your AI to analyze the file and generate a schema document
- Option B: Write your own code to produce a schema
- Option C (local IDE only): Have the AI map directly from the raw data file
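For Option B with nested JSON, a useful schema can be as simple as a list of field paths and how many records contain each one. The sketch below assumes JSON Lines input (one object per line); the dotted-path and `[]` notation is an illustrative choice, not a format the workshop tools expect.

```python
"""Derive a simple field list from a JSON Lines file (a sketch for Option B)."""
import json
from collections import Counter


def field_paths(obj, prefix=""):
    """Yield a dotted path for every leaf value; list items are marked with '[]'."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from field_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from field_paths(item, f"{prefix}[]")
    else:
        yield prefix


def jsonl_field_counts(lines):
    """Count how many records contain each field path (counted once per record)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts.update(set(field_paths(json.loads(line))))
    return counts
```

Usage: `with open("data/source.jsonl", encoding="utf-8") as f: print(jsonl_field_counts(f).most_common())`, then hand the resulting list (not the data) to your AI.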

Step 3: Start your mapping session in your IDE

Recommended: Use your local IDE with AI assistant (VS Code with Claude/Copilot, Cursor, Windsurf, JetBrains with AI plugin, etc.)

This approach gives you direct file access and the ability to run the linter, generate and test code, handle complex multi-file schemas, and iterate on mapper implementations.

Open your project folder in your local development environment

Fetch the RAG files into your workspace (clone the mapper-ai repo or download them):

https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/senzing_mapping_assistant_prompt.md
https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/senzing_mapping_examples.md
https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/senzing_entity_specification.md
https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/lint_senzing_json.py
https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/identifier_crosswalk.json
https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/usage_type_crosswalk.json
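If you would rather not clone the repository, a short script can pull the six files listed above. This is a sketch using only the standard library; the destination folder in the usage line is an example, and cloning Senzing/mapper-ai gives you the same files plus easy updates with git pull.

```python
"""Download the mapper-ai RAG files into a local folder (a sketch)."""
import urllib.request
from pathlib import Path

BASE_URL = "https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/"
RAG_FILES = [
    "senzing_mapping_assistant_prompt.md",
    "senzing_mapping_examples.md",
    "senzing_entity_specification.md",
    "lint_senzing_json.py",
    "identifier_crosswalk.json",
    "usage_type_crosswalk.json",
]


def download_plan(dest):
    """Return (url, local_path) pairs for each RAG file."""
    dest = Path(dest).expanduser()
    return [(BASE_URL + name, dest / name) for name in RAG_FILES]


def fetch_all(dest):
    """Download every file into `dest`, creating the folder if needed."""
    plan = download_plan(dest)
    Path(dest).expanduser().mkdir(parents=True, exist_ok=True)
    for url, path in plan:
        urllib.request.urlretrieve(url, path)
        print(f"saved {path}")
    return [path for _, path in plan]
```

Usage: `fetch_all("~/bootcamp/my-source/rag")`.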

Configure your AI assistant to use these files as context/knowledge resources
Use senzing_mapping_assistant_prompt.md as your system prompt or opening instruction
Begin interactive work with your schema and data files

Alternative: Web-based AI chat (if you cannot use a local IDE):

Open the Senzing Mapping Assistant GPT (mapping docs are preloaded)
Or create a new project in your AI's web interface and upload the RAG files listed above
Note: web-based approaches lack local linter execution and may struggle with complex multi-file schemas

Step 4: Map your schema through to code

Provide your schema to the AI assistant and start the mapping process.
Collaborate with the assistant to analyze your schema, agree on mappings, produce example JSON/JSONL, and generate a transformer script to emit Senzing JSONL.
By the end of this step you should have code. Save it to your project (or download it, if you used a web-based AI), run it to map your data, and then verify the output with the JSON analyzer (tools/sz_json_analyzer.py).
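To give a sense of what the generated transformer looks like, here is a deliberately small sketch. Everything specific in it is an assumption: the input column names (emp_id, first_name, last_name, dob, address) are hypothetical, and it uses a FEATURES-list layout with attributes such as NAME_FIRST, DATE_OF_BIRTH, and ADDR_FULL. Confirm the structure and attribute names against senzing_entity_specification.md; the mapper you build with the assistant will be more complete.

```python
"""Minimal CSV-to-Senzing-JSONL transformer (a sketch, not a finished mapper).

Column names (emp_id, first_name, last_name, dob, address) are hypothetical;
confirm attribute names and structure against senzing_entity_specification.md.
"""
import csv
import json

DATA_SOURCE = "EMPLOYEES"  # must match a data source added with sz_configtool


def transform_row(row):
    """Map one CSV row (a dict) to a Senzing record with a FEATURES list."""
    features = [{"RECORD_TYPE": "PERSON"}]
    # Only emit name parts that are actually populated.
    name = {attr: row[col]
            for attr, col in (("NAME_FIRST", "first_name"), ("NAME_LAST", "last_name"))
            if row.get(col)}
    if name:
        features.append({"NAME_TYPE": "PRIMARY", **name})
    if row.get("dob"):
        features.append({"DATE_OF_BIRTH": row["dob"]})
    if row.get("address"):
        features.append({"ADDR_TYPE": "HOME", "ADDR_FULL": row["address"]})
    return {"DATA_SOURCE": DATA_SOURCE, "RECORD_ID": row["emp_id"], "FEATURES": features}


def transform_file(input_path, output_path):
    """Convert a CSV file to JSONL (one record per line); return the record count."""
    count = 0
    with open(input_path, newline="", encoding="utf-8") as fin, \
            open(output_path, "w", encoding="utf-8") as fout:
        for row in csv.DictReader(fin):
            fout.write(json.dumps(transform_row(row), ensure_ascii=False) + "\n")
            count += 1
    return count
```

Wrapping `transform_file` in argparse gives you the `--input`/`--output` interface used in the Step 5 example. Skipping empty values keeps blank strings out of the output, which keeps analyzer and linter reports easier to read.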

Step 5: Generate Senzing JSON output

Run the transformer you built with the assistant to produce JSONL files.
Example: python3 transform_your_source.py --input path/to/source.csv --output path/to/output.jsonl
Lint for schema correctness:
- Local file: python3 docs/lint_senzing_json.py path/to/output.jsonl
- Raw URL (for remote use): https://raw.githubusercontent.com/Senzing/mapper-ai/main/rag/lint_senzing_json.py
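While iterating, a quick structural pre-check can catch the most common breakages before you run the real linter. The sketch below is not lint_senzing_json.py and does not validate attribute names; it only checks that each line is a JSON object with DATA_SOURCE and RECORD_ID and flags repeated DATA_SOURCE/RECORD_ID pairs. Senzing keys records on that pair, so a later record with the same pair replaces the earlier one, and repeats usually point to a mapping problem.

```python
"""Quick structural pre-check for Senzing JSONL (a sketch; still run the real linter)."""
import json


def precheck_lines(lines):
    """Return a list of (line_number, message) problems."""
    problems = []
    seen = set()
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            rec = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((n, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(rec, dict):
            problems.append((n, "line is not a JSON object"))
            continue
        for field in ("DATA_SOURCE", "RECORD_ID"):
            if not rec.get(field):
                problems.append((n, f"missing {field}"))
        ident = (rec.get("DATA_SOURCE"), rec.get("RECORD_ID"))
        if all(ident) and ident in seen:
            problems.append((n, f"duplicate RECORD_ID {ident[1]!r} in {ident[0]}"))
        seen.add(ident)
    return problems
```

Usage: `with open("path/to/output.jsonl", encoding="utf-8") as f: print(precheck_lines(f))`.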

Step 6: Load into Senzing

Note: this part depends on whether you are on Windows, Linux, or Mac, and whether you have Docker and/or Python 3 installed. If you have trouble with any of this, raise your hand and we will help you.

Analyze with Senzing JSON Analyzer:
- Local file: python3 tools/sz_json_analyzer.py path/to/output.jsonl
- Raw URL (for remote use): https://raw.githubusercontent.com/jbutcher21/aiclass/main/tools/sz_json_analyzer.py
  - see the docs at https://github.com/senzing-garage/sz-json-analyzer
Load your file into the Senzing instance (only if you have Docker):
- start the workshop container and open a shell in it
- add your data sources in sz_configtool
- load your json file with sz_file_loader
- take a snapshot with sz_snapshot
- explore your results with sz_explorer
See https://www.senzing.com/docs/tutorials/eda/ for more on these tools.
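Every DATA_SOURCE code in your JSONL must be registered with sz_configtool before loading. A small sketch like the one below lists the codes in your output file and prints matching addDataSource commands to paste at the (szcfg) prompt.

```python
"""List DATA_SOURCE codes in a JSONL file before running sz_configtool (a sketch)."""
import json
from collections import Counter


def data_source_counts(lines):
    """Count records per DATA_SOURCE; records without one are counted as '<missing>'."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line).get("DATA_SOURCE", "<missing>")] += 1
    return counts


def config_commands(counts):
    """Build one addDataSource command per code (sorted, skipping '<missing>')."""
    return [f"addDataSource {code}" for code in sorted(counts) if code != "<missing>"]
```

Usage: `with open("path/to/output.jsonl", encoding="utf-8") as f: print("\n".join(config_commands(data_source_counts(f))))`. A non-zero `<missing>` count means some records will be rejected at load time and should be fixed in the mapper first.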

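Here is what you should type (run the first command from your ~/bootcamp folder, which is mounted into the container as /bootcamp):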

docker run --rm -it --user 0 -v .:/bootcamp senzing/summit-bootcamp-2025

root@89730121f88b:/# cd /bootcamp
root@89730121f88b:/bootcamp# sz_configtool 

(szcfg) addDataSource EMPLOYEES
(szcfg) addDataSource EMPLOYERS
(szcfg) save
(szcfg) quit

sz_file_loader -f employees/output/employee_senzing.jsonl 

sz_snapshot -o snap1

sz_explorer -s snap1.json
