Dewey is an AI-powered librarian that helps newsrooms make their archives easy to search, using LLMs to provide cited responses.
Archival research methods are cumbersome. They often rely on keyword searches and date-range filtering, which makes it difficult to surface topics without specific preexisting knowledge. Moreover, archives can be spread across disparate source systems and databases because content management systems change over time. By unifying these systems behind a state-of-the-art search engine, Dewey aims to make archive research easier and more efficient for reporters.
Special thanks to the Lenfest Institute AI Collaborative and Fellowship Program for making this project happen.
- Patrick Kerkstra, Ross Maghielse - newsroom guidance, support, and tester recruiting
- Tommy Rowan, Nick Vidala, Jennifer Friedman-Perez - alpha testing
- Lenfest Institute - for providing and securing the grant that made this project possible
- Microsoft - for co-funding the grant and jumpstarting our progress
- OpenAI - for co-funding the grant and providing technical support
- Clone the repository
git clone https://github.com/phillymedia/dewey-ai.git
cd dewey-ai
- Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
To run this project, you will need to add the following environment variables to your .env file:
- Copy environment template
cp .env.template .env
- Configure your .env file
Note: While most environment variables must point to preexisting resources and deployments, AZURE_SEARCH_INDEX_NAME is the name you want to give your search index.
# Azure OpenAI Configuration
AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com"
AZURE_OPENAI_API_KEY="your-azure-openai-api-key"
EMBEDDING_DEPLOYMENT_NAME="your-embedding-deployment-name"
EMBEDDING_MODEL_NAME="text-embedding-3-large"
CHATGPT_DEPLOYMENT_NAME="your-chatgpt-deployment-name"
CHATGPT_MODEL_NAME="gpt-5"
# Azure AI Search Configuration
AZURE_SEARCH_ENDPOINT="https://your-search-service.search.windows.net"
AZURE_SEARCH_API_KEY="your-azure-search-api-key"
AZURE_SEARCH_INDEX_NAME="your-search-index-name"
# Azure Blob Storage Configuration
AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=..."
AZURE_STORAGE_CONTAINER_NAME="your-storage-container-name"
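Before running setup, it can help to confirm that none of the settings above were left blank. The following is a minimal sketch of such a check, not Dewey's own validation logic: `missing_env_vars` is a hypothetical helper, and it assumes the values from .env have already been loaded into the process environment (how the project loads them is not stated here).

```python
import os

# Variable names copied from the .env template above.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "EMBEDDING_DEPLOYMENT_NAME",
    "EMBEDDING_MODEL_NAME",
    "CHATGPT_DEPLOYMENT_NAME",
    "CHATGPT_MODEL_NAME",
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_API_KEY",
    "AZURE_SEARCH_INDEX_NAME",
    "AZURE_STORAGE_CONNECTION_STRING",
    "AZURE_STORAGE_CONTAINER_NAME",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try out without touching real credentials.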
- Prepare Your Documents
Any articles you wish to upload should be in JSON format in the data/ folder. Each document should follow this structure:
{
"id": "unique-document-id",
"headline": "Article Title",
"content": "Full article content...",
"url": "https://example.com/article",
"authors": ["Author 1", "Author 2"],
"publish_date": "2025-09-09T12:00:00Z"
}
- id (optional) - A unique ID for your article. It will be autopopulated if you do not provide one
- headline - The title of your article
- content - The full text of your article. How you format it is up to you. This will be your searchable field
- url - A URL pointing to your article. This is critical for citation functionality
- authors - A list of author names from the article's byline
- publish_date - The date on which your article was published (ISO 8601 format)
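To catch malformed articles before uploading, a document can be checked against the structure described above. This is a sketch under stated assumptions rather than the project's own loader: `prepare_article` is a hypothetical helper, and the random UUID is only a stand-in for whatever ID Dewey autopopulates.

```python
import json
import uuid
from datetime import datetime

# Every field except the optional "id".
REQUIRED_FIELDS = ("headline", "content", "url", "authors", "publish_date")


def prepare_article(raw):
    """Parse one article JSON string, check its fields, and fill in a missing id."""
    doc = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in doc]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if not isinstance(doc["authors"], list):
        raise ValueError("authors must be a list of names")
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    # An unparseable date raises ValueError here.
    datetime.fromisoformat(doc["publish_date"].replace("Z", "+00:00"))
    # Assumption: a random UUID stands in for the autopopulated id.
    doc.setdefault("id", str(uuid.uuid4()))
    return doc


sample = json.dumps({
    "headline": "Article Title",
    "content": "Full article content...",
    "url": "https://example.com/article",
    "authors": ["Author 1", "Author 2"],
    "publish_date": "2025-09-09T12:00:00Z",
})
article = prepare_article(sample)
print(article["id"])
```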
- Run Setup Script
python app/setup.py
This will:
- Validate your configuration
- Create Azure AI Search index with proper schema
- Set up skillsets for document processing (chunking and embedding)
- Create indexer for automated processing
- Upload documents to blob storage
- Process documents through AI Search pipeline
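For readers unfamiliar with the chunking step mentioned above, the idea is to split each article into overlapping pieces so that every piece can be embedded and searched on its own. In Dewey this happens inside the Azure AI Search skillset; the plain-Python sketch below only illustrates the concept. `chunk_text` and its sizes are hypothetical, and the settings the skillset actually uses are not stated here.

```python
def chunk_text(text, chunk_size=2000, overlap=200):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.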
- Launch Dewey
python main.py
The application will be available at http://localhost:7860. This project uses Gradio to create a user-friendly web interface for our machine learning model. You can learn more about Gradio at https://www.gradio.app/.
- Open your browser to http://localhost:7860
- Ask questions in natural language:
- "What articles did author X write about the election last month?"
- "Summarize the last decade of coverage on topic X."
- "Find me the best restaurants mentioned in 2024."
- Expand Dewey's thought process to see what was searched and how many articles were retrieved
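Because responses are cited, each retrieved article's headline, authors, publish_date, and url end up mattering at answer time. The sketch below shows one way a list of retrieved articles could be turned into a numbered source list. It is illustrative only: `format_citations` is a hypothetical helper, and the citation layout Dewey actually produces is not described here.

```python
def format_citations(articles):
    """Build a numbered source list from retrieved article records."""
    lines = []
    for number, article in enumerate(articles, start=1):
        authors = ", ".join(article.get("authors", [])) or "Unknown author"
        date = article.get("publish_date", "")[:10]  # keep only YYYY-MM-DD
        lines.append(
            f"[{number}] {article['headline']} ({authors}, {date}) {article['url']}"
        )
    return "\n".join(lines)


retrieved = [
    {
        "headline": "Article Title",
        "url": "https://example.com/article",
        "authors": ["Author 1", "Author 2"],
        "publish_date": "2025-09-09T12:00:00Z",
    }
]
print(format_citations(retrieved))
# prints [1] Article Title (Author 1, Author 2, 2025-09-09) https://example.com/article
```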
All contributions are welcome! More details to come. In the meantime, please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request