Dewey

Dewey is an AI-powered librarian that helps newsrooms make their archives easy to search, using LLMs to provide cited responses.

Archival research methods are cumbersome. They often rely on keyword searches and date-range filtering, which makes it difficult to surface topics without specific prior knowledge. Moreover, archives can be spread across disparate source systems and databases because content management systems evolve over time. Dewey aims to make archive research easier and more efficient for reporters by unifying these systems behind a single, modern search engine.

Acknowledgements

Special thanks to the Lenfest Institute AI Collaborative and Fellowship Program for making this project happen.

Patrick Kerkstra, Ross Maghielse - newsroom guidance, support, and tester recruiting
Tommy Rowan, Nick Vidala, Jennifer Friedman-Perez - Alpha testing users
Lenfest Institute - for providing and securing the grant that made this project possible
Microsoft - for co-funding the grant and jumpstarting our progress
OpenAI - for co-funding the grant and providing technical support

Installation

Clone the repository

git clone https://github.com/phillymedia/dewey-ai.git
cd dewey-ai

Create virtual environment

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Configuration

To run this project, add the environment variables listed below to your .env file.

Copy environment template

cp .env.template .env

Configure your .env file

Note: Most environment variables must point to preexisting resources and deployments, but AZURE_SEARCH_INDEX_NAME is the name you want to give your search index.

# Azure OpenAI Configuration
AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
EMBEDDING_DEPLOYMENT_NAME="your-embedding-deployment-name"
EMBEDDING_MODEL_NAME="text-embedding-3-large"
CHATGPT_DEPLOYMENT_NAME="your-chatgpt-deployment-name"
CHATGPT_MODEL_NAME="gpt-5"

# Azure AI Search Configuration
AZURE_SEARCH_ENDPOINT="https://your-search-service.search.windows.net"
AZURE_SEARCH_API_KEY="your-azure-search-api-key"
AZURE_SEARCH_INDEX_NAME="your-search-index-name"  

# Azure Blob Storage Configuration
AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=..."
AZURE_STORAGE_CONTAINER_NAME="your-storage-container-name"
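
Before running anything, it can help to confirm that none of these values were left blank. The following is a minimal sketch, not part of the project: the helper name missing_env_vars is hypothetical, and it checks the process environment, so it assumes the values from .env have already been exported or loaded.

```python
import os

# Variable names copied from the .env template above.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "EMBEDDING_DEPLOYMENT_NAME",
    "EMBEDDING_MODEL_NAME",
    "CHATGPT_DEPLOYMENT_NAME",
    "CHATGPT_MODEL_NAME",
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_API_KEY",
    "AZURE_SEARCH_INDEX_NAME",
    "AZURE_STORAGE_CONNECTION_STRING",
    "AZURE_STORAGE_CONTAINER_NAME",
]


def missing_env_vars(env=None):
    """Return required names that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing or empty variables: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```

This only checks that each value is present. It does not verify that the Azure resources behind them exist.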

Deployment

Prepare Your Documents

Any articles you wish to upload should be in JSON format in the data/ folder. Each document should follow this structure:

{
    "id": "unique-document-id",
    "headline": "Article Title",
    "content": "Full article content...",
    "url": "https://example.com/article",
    "authors": ["Author 1", "Author 2"],
    "publish_date": "2025-09-09T12:00:00Z"
}

id (optional) - A unique ID for your article. It will be autopopulated if you do not provide one
headline - The title of your article
content - The full text of your article. How you format it is up to you. This becomes your searchable field
url - A URL pointing to your article. This is critical for citation functionality
authors - A list of author names from the article's byline
publish_date - The date on which your article was published (ISO 8601 format)
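
To catch malformed files before uploading, a document can be checked against the fields above. This is a rough sketch under stated assumptions: normalize_article is a hypothetical helper, the project's own validation may differ, and filling a missing id with a random UUID is an assumption (the README only says the id is autopopulated).

```python
import json
import uuid
from datetime import datetime

REQUIRED_FIELDS = ("headline", "content", "url", "authors", "publish_date")


def normalize_article(doc):
    """Validate one article dict and return a copy with an id filled in.

    Hypothetical helper; the UUID fallback for "id" is an assumption.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in doc]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if not isinstance(doc["authors"], list):
        raise ValueError("authors must be a list of names")
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
    # so swap it for an explicit UTC offset before parsing.
    # An invalid date raises ValueError here.
    datetime.fromisoformat(doc["publish_date"].replace("Z", "+00:00"))
    normalized = dict(doc)
    normalized.setdefault("id", str(uuid.uuid4()))
    return normalized


raw = """
{
    "headline": "Article Title",
    "content": "Full article content...",
    "url": "https://example.com/article",
    "authors": ["Author 1", "Author 2"],
    "publish_date": "2025-09-09T12:00:00Z"
}
"""
article = normalize_article(json.loads(raw))
```

An existing id is left untouched, so documents that already carry stable identifiers keep them.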

Run Setup Script

python app/setup.py

This will:

Validate your configuration
Create Azure AI Search index with proper schema
Set up skillsets for document processing (chunking and embedding)
Create indexer for automated processing
Upload documents to blob storage
Process documents through AI Search pipeline
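
For readers unfamiliar with chunking, the idea is to split each article into overlapping pieces so that each piece can be embedded and retrieved on its own. In Dewey this happens inside the Azure AI Search skillset, so the snippet below is only a plain-Python illustration of the concept. The function name chunk_text, the character-based windows, and the size and overlap values are all assumptions, not the project's settings.

```python
def chunk_text(text, size=2000, overlap=200):
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window reaches the end of the text.
        if start + size >= len(text):
            break
    return chunks


pieces = chunk_text("abcdefghij", size=4, overlap=1)
# pieces == ["abcd", "defg", "ghij"]
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some duplicated text in the index.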

Launch Dewey

python main.py

The application will be available at http://localhost:7860. This project uses Gradio to create a user-friendly web interface for our machine learning model. You can learn more about Gradio at https://www.gradio.app/.

Usage

Open your browser to http://localhost:7860
Ask questions in natural language:
- "What articles did author X write about the election last month?"
- "Summarize the last decade of coverage on topic X."
- "Find me the best restaurants mentioned in 2024."
Expand Dewey's thought process to see what was searched and how many articles were retrieved

Contributing

All contributions are welcome! More details to come. In the meantime, please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request
