AI-Assisted Evaluation Evidence Mapping

Introduction

An Evidence Map is a structured, visual tool that organizes what we know, and don't know, about programs, policies, and interventions. Think of it as a research and evaluation landscape that shows how the available evidence is distributed against a reference framework (for example, the Sustainable Development Goals), answering questions such as:

  • Which interventions have been evaluated
  • Where they were implemented
  • What outcomes were observed
  • Where critical knowledge gaps remain

This approach is especially valuable for evidence-informed project design, particularly when time or resources limit the ability to read through hundreds of individual evaluation reports.

More than just tagging content, the system extracts the "why" behind program success: not just outcomes, but mechanisms, required conditions, and implementation factors. This supports better program design, faster proposal development, and fewer repeated mistakes, turning evaluations into a living knowledge base that actually gets used. The project is intended to increase the speed and quality of learning: instead of each organization reinventing the wheel or repeating the same mistakes, we would have:

  • Real-time access to “what worked where and why”
  • Faster identification of promising approaches to test
  • Better understanding of when and how to adapt programs

This tool can not only inform future evaluation design, guide strategic planning, and support the development of robust, evidence-based project proposals and strategic plans, but could also fundamentally change how evidence flows through the humanitarian system: from slow, siloed learning to rapid, networked knowledge sharing.


About this project

Evidence Maps are well-established tools that became common before the popularization of LLMs and of AI more broadly. Producing them generally entails a large number of staff hours spent reviewing documentation and "tagging" it, with more or less formalized arrangements for source selection and cross-verification of the human-led classification. Recent practice in evidence mapping focuses on automating different steps of the map production process with AI.

The goal of this exercise is to create an AI-powered, open-source application capable of accurately "tagging" evaluation reports against the IOM Strategic Results Framework (SRF) and the Global Compact for Migration (GCM), two important tools that guide the action of IOM and other entities in the humanitarian and development space.

Rather than a fully automated tagging tool, the instrument will be used within a structured and efficient human-machine collaboration protocol that leverages the relative strengths of each. This initiative aims to produce a fully reproducible solution that is not only scalable but also easy to repurpose for other applications in the humanitarian and development space.

The resulting map will inform future evaluation design, guide strategic planning, and support the development of robust, evidence-based project proposals and strategic plans.

In addition, this AI-enhanced approach helps manage evaluation output as "Living Evidence", treating knowledge synthesis as an ongoing rather than a static endeavor, which can improve the timeliness of recommendation updates and reduce the knowledge-to-practice gap.

Finally, the approach also tackles the Evidence Generalization Problem: the challenge of applying findings from one context to another. By systematically mapping evidence across diverse contexts, we can better understand how and why certain interventions work in specific settings, and identify the conditions under which they are most effective.


Approach

To ensure relevance, we begin by clarifying the scope and purpose of the mapping:

  • What types of interventions are we assessing? (IOM programs, policies, and strategies, e.g., cash-based interventions, health services, community engagement)

  • Who are the target populations? (e.g., migrants, displaced persons, host communities)

  • What outcomes matter most? (e.g., livelihood improvements, health outcomes, social integration, aligned with the IOM Strategic Results Framework)

  • Who is the audience for this map? (e.g., policymakers, funders, researchers, program managers, donors)

  • What are our key learning questions? (e.g., “What works best to maximize impact and effectiveness?”)

  • What level of evidence is required? (e.g., RCTs, quasi-experiments, observational studies)

  • What variables are we tracking? (e.g., intervention type, target group, outcomes, geography)


Step 1: Building the Knowledge Base

We have compiled a list of all publicly available IOM Evaluation Reports.

Each report will be analyzed to generate a structured metadata record, including:

  • What: Title, summary, full-text link, evaluation type (formative, summative, impact), scope (strategy, policy, thematic, program, or project), and geographic coverage
  • Who: Conducting entity (IOM internal vs. external evaluators)
  • How: Methodology, study design, sample size, and data collection techniques

These metadata records and the full text of each report will then be converted into an embeddings-based vector database, enabling fast, flexible, AI-enhanced retrieval with techniques such as hybrid search (combining keyword and semantic matching).
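
As an illustration only, the sketch below indexes report chunks and their metadata in a local Chroma collection using its default embedding model; the vector store, field names, identifiers, and chunking strategy shown are assumptions and may differ from the project's actual setup.

```python
# Minimal sketch, assuming the `chromadb` package and its default embedding model.
# All identifiers, fields, and text below are illustrative placeholders.
import chromadb

client = chromadb.PersistentClient(path="evidence_db")
collection = client.get_or_create_collection(name="iom_evaluations")

# One structured metadata record per report (illustrative fields and values)
report = {
    "report_id": "EVAL-2021-014",  # hypothetical identifier
    "title": "Evaluation of a livelihood programme",
    "evaluation_type": "summative",
    "scope": "project",
    "geography": "East Africa",
    "conducting_entity": "external",
    "methodology": "mixed methods",
}

# In practice each report would be split into many overlapping full-text chunks
chunks = [
    "The programme provided vocational training to returnees...",
    "Reported outcomes included improvements in household income...",
]

collection.add(
    ids=[f"{report['report_id']}-{i}" for i in range(len(chunks))],
    documents=chunks,
    metadatas=[report] * len(chunks),
)

# Semantic retrieval; a hybrid search would additionally combine keyword scores
results = collection.query(query_texts=["livelihood outcomes for returnees"], n_results=2)
print(results["documents"][0])
```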


Step 2: Structured Information Extraction

Rather than simply summarizing each evaluation report, we use AI to answer the same set of questions for every single evaluation. This creates comparable data across all studies.

We will create a set of plain-language questions reflecting the entire IOM Strategic Results Framework. Using AI tools, we will extract consistent and comparable data from each report (a minimal extraction sketch follows this list):

  • Program details (what was implemented)
  • Context (where and with whom)
  • Design (how it was studied)
  • Findings (what results were observed)
  • Strength of evidence (how reliable the findings are)
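
For illustration, here is a minimal extraction sketch. It assumes the `openai` Python client and a JSON-mode capable model; the question set, model name, and provider are placeholders rather than the project's actual choices.

```python
# Minimal sketch: extract comparable, structured fields from one report with an LLM.
# The questions, model, and prompt wording are illustrative assumptions.
import json
from openai import OpenAI

QUESTIONS = {
    "program_details": "What intervention was implemented?",
    "context": "Where was it implemented and with which population?",
    "design": "What study design and data collection methods were used?",
    "findings": "What results were observed?",
    "evidence_strength": "How strong is the evidence (e.g., RCT, quasi-experimental, observational)?",
}

def extract_record(report_text: str, model: str = "gpt-4o-mini") -> dict:
    """Ask the same plain-language questions of every report and return a JSON record."""
    prompt = (
        "Answer each question using only the evaluation report below. "
        f"Reply as a JSON object with exactly these keys: {list(QUESTIONS)}.\n\n"
        + "\n".join(f"- {key}: {question}" for key, question in QUESTIONS.items())
        + f"\n\nREPORT:\n{report_text}"
    )
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```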

Step 3: Cross-Evaluation Analysis

We will then run those questions against the vector database to generate answers grounded, first, in each individual evaluation.

This will allow the extracted data to be categorized within a structured framework (a retrieval sketch follows this list):

  1. By intervention type (e.g., skills training, psychosocial support)
  2. By measured outcome (e.g., employment, resilience, community cohesion)
  3. By population (e.g., migrants in transit, returnees, host communities)
  4. By evidence quality (e.g., robust vs. exploratory studies)
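
One possible shape for this per-evaluation step is sketched below, reusing the illustrative Chroma collection from the Step 1 sketch; the metadata filter on `report_id` keeps each answer grounded in a single report. All names and signatures are assumptions, not the project's actual code.

```python
# Minimal sketch: retrieve context for one framework question from ONE evaluation,
# using the illustrative `collection` from the Step 1 sketch.
def context_for_report(collection, question: str, report_id: str, n_chunks: int = 5) -> str:
    hits = collection.query(
        query_texts=[question],
        n_results=n_chunks,
        where={"report_id": report_id},  # restrict retrieval to a single report
    )
    # The retrieved excerpts would then be passed to an LLM prompt such as:
    # "Using only the excerpts below, answer: <question>\n\n<excerpts>"
    return "\n\n".join(hits["documents"][0])

excerpts = context_for_report(
    collection,
    question="Which outcomes were measured and what changes were observed?",
    report_id="EVAL-2021-014",  # hypothetical identifier from the Step 1 sketch
)
```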

Step 4: Generate Actionable and Generalizable Insights

One key challenge is how to generalize findings from an evaluation conducted in one place to another setting. The Generalizability Framework provides some guidance on how to do that.

We will then pose the same questions against the full corpus of Q&A pairs generated in the previous step, so that the answers account for all of the evidence gathered across all evaluations.

This will allow us to quickly assess the evidence base and identify key insights, such as: *“What types of interventions are most effective in improving livelihood outcomes for migrants in urban settings?”*
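
One possible way to implement this corpus-level step, sketched below under the assumption of the `openai` client and an illustrative prompt, is to feed all per-report answers into a single synthesis call that must cite report IDs and flag agreements, conflicts, and gaps.

```python
# Minimal sketch: synthesize one corpus-level answer from per-report answers.
# Model name, prompt wording, and data shapes are illustrative assumptions.
from openai import OpenAI

def synthesize_across_reports(question: str, per_report_answers: dict) -> str:
    """`per_report_answers` maps report_id -> that report's answer to `question`."""
    evidence = "\n".join(f"[{rid}] {answer}" for rid, answer in per_report_answers.items())
    prompt = (
        "Synthesize an overall answer to the question below. Cite report IDs, and note "
        "where findings agree, conflict, or where evidence is missing.\n\n"
        f"QUESTION: {question}\n\nPER-REPORT ANSWERS:\n{evidence}"
    )
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any sufficiently capable model would work
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```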


Step 5: Identify Patterns and Gaps

We’ll present the results across objectives, outcomes, populations, and geography using intuitive, interactive visualizations (a minimal static sketch follows this list):

  • Bubble maps (bubble size = number of studies or sample size)
  • Heatmaps (showing concentration of evidence by topic or geography)
  • Gap maps (highlighting under-researched areas)
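
As a rough illustration of the heatmap/gap-map idea, the sketch below counts evaluations by intervention type and outcome with pandas and renders a static matrix with matplotlib; the actual deliverable is an interactive online map, and the rows shown are placeholder values, not real extraction results.

```python
# Minimal sketch: a static evidence heatmap (interventions x outcomes).
import pandas as pd
import matplotlib.pyplot as plt

records = pd.DataFrame(
    [  # one row per (report, intervention, outcome) combination; placeholder rows only
        {"intervention": "skills training", "outcome": "employment"},
        {"intervention": "skills training", "outcome": "employment"},
        {"intervention": "psychosocial support", "outcome": "resilience"},
    ]
)

# Cells with a count of 0 are candidate evidence gaps
counts = records.groupby(["intervention", "outcome"]).size().unstack(fill_value=0)

fig, ax = plt.subplots()
im = ax.imshow(counts.values, cmap="Blues")
ax.set_xticks(range(len(counts.columns)))
ax.set_xticklabels(counts.columns, rotation=45, ha="right")
ax.set_yticks(range(len(counts.index)))
ax.set_yticklabels(counts.index)
fig.colorbar(im, ax=ax, label="Number of evaluations")
ax.set_title("Evidence heatmap: interventions x outcomes")
plt.tight_layout()
plt.show()
```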

The evidence system will allow us to quickly highlight:

  • ✅ Areas with strong, consistent evidence
  • ⚠️ Topics with mixed or conflicting findings
  • ❌ Critical gaps where no evidence exists

Example Insight:
> “Mentoring programs show consistent positive results for urban migrant youth, but there’s limited evidence for rural populations.”


Deliverables

The final deliverables will include:

  • A Q&A dataset that can be reused to answer questions across the full corpus of evaluations
  • A synthesis report identifying:
    • Research priorities
    • High-risk areas for intervention
    • Recommendations for future evaluations
  • A searchable online visual evidence map for ongoing use by IOM teams, allowing evidence to be browsed according to both the SRF and GCM frameworks.

Developer Guide

This module implements a Literate Programming approach, which means:

  • Documentation: Each notebook serves as comprehensive documentation.
  • Code Reference: The notebooks contain the actual implementation code.
  • Communication Tool: They facilitate discussions with data providers about discrepancies or inconsistencies.

If you are new to using nbdev, here are some useful pointers to get you started.

Install evaluation_knowledge in Development mode

# make sure evaluation_knowledge package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to evaluation_knowledge
$ nbdev_prepare

Usage

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/iom/evaluation_knowledge.git

or from conda

$ conda install -c iom evaluation_knowledge

or from pypi

$ pip install evaluation_knowledge

Documentation

Documentation is hosted on this GitHub repository’s pages. In addition, package-manager-specific guidelines are available on conda and pypi, respectively.

How to use

  1. Prepare the documentation of your library

  2. Run the module
