MediaWiki to Wiki.js Migration Tool

Python 3.9+ | License: MIT | Pandoc required

A comprehensive Python-based tool for migrating complete MediaWiki sites to Wiki.js, preserving page organization, internal links, images, and categories. Designed for reliability with checkpoint/resume functionality and automatic authentication management.

✨ Features

  • Complete Export: Extract all pages, images, and metadata from MediaWiki (v1.13.5+)
  • Content Transformation: Convert wikitext to markdown with automatic link and image reference updates
  • Intelligent Import: Create pages and upload images to Wiki.js via GraphQL API
  • Resumable Operations: Checkpoint every 10 pages for large wiki migrations
  • Flexible Authentication: Supports both public wikis (anonymous access) and private wikis (automatic reconnection on timeout)
  • Dry-Run Mode: Preview operations without making changes
  • Link Depth Control: Configurable BFS traversal for selective export
  • Error Recovery: Comprehensive logging with CSV error reports

Requirements

  • Python: 3.9 or higher (3.11+ recommended for performance)
  • Pandoc: System-level installation required for wikitext→markdown conversion
  • Network: HTTP/HTTPS access to both MediaWiki and Wiki.js instances

Install Pandoc

macOS (via Homebrew):

brew install pandoc

Linux (Debian/Ubuntu):

sudo apt-get install pandoc

Windows: Download from https://pandoc.org/installing.html

Installation

  1. Clone the repository:

    git clone https://github.com/sbonaime/mediawiki2wikijs.git
    cd mediawiki2wikijs
  2. Create virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure settings:

    cp .env.example .env
    # Edit .env with your MediaWiki URL and Wiki.js credentials
    # Note: MediaWiki username/password are optional (leave empty for public wikis)

Quick Start

1. Export from MediaWiki

python src/export_mediawiki.py

This will:

  • Connect to your MediaWiki instance
  • Export all pages to export_output/ directory
  • Download all images
  • Convert content to markdown
  • Save checkpoint every 10 pages
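The checkpoint behaviour listed above can be pictured with a minimal sketch. This is not the tool's actual implementation: the function names, the JSON layout of the checkpoint file, and the `export_page` callable are all assumptions made for illustration.

```python
import json
from pathlib import Path


def export_with_checkpoints(titles, export_page, checkpoint_path, frequency=10):
    """Export pages in order, recording progress every `frequency` pages.

    `export_page` is a caller-supplied callable (hypothetical here) that
    exports a single page by title.
    """
    checkpoint_path = Path(checkpoint_path)
    done = set()
    if checkpoint_path.exists():
        # Resume: skip titles recorded by a previous run.
        done = set(json.loads(checkpoint_path.read_text())["done"])

    since_save = 0
    for title in titles:
        if title in done:
            continue
        export_page(title)
        done.add(title)
        since_save += 1
        if since_save >= frequency:
            _save(checkpoint_path, done)
            since_save = 0

    _save(checkpoint_path, done)  # final save so the last partial batch is kept
    return done


def _save(path, done):
    # Write to a temp file, then rename, so a crash cannot leave a half-written file.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps({"done": sorted(done)}))
    tmp.replace(path)
```

With this shape, a second run over the same titles only exports pages that are not yet recorded, which is the idea behind resuming an interrupted export.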

2. Import to Wiki.js

python src/import_wikijs.py --source ./export_output

This will:

  • Connect to your Wiki.js instance
  • Create all pages with markdown content
  • Upload all images
  • Update internal links
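As a rough illustration of the page-creation step, the sketch below builds (but does not send) a GraphQL request for one page. The project itself uses the gql library; this example swaps in the standard library's `urllib` to stay dependency-free. The mutation follows the Wiki.js 2.x `pages.create` schema as commonly documented and should be checked against your instance; the function name and the fixed values (`locale: "en"`, `editor: "markdown"`, empty description and tags) are assumptions.

```python
import json
import urllib.request

# Assumed to match the Wiki.js 2.x schema; verify against your own instance.
CREATE_PAGE = """
mutation ($path: String!, $title: String!, $content: String!) {
  pages {
    create(
      path: $path, title: $title, content: $content,
      description: "", editor: "markdown", locale: "en",
      isPublished: true, isPrivate: false, tags: []
    ) {
      responseResult { succeeded message }
    }
  }
}
"""


def build_create_request(wikijs_url, api_key, path, title, content):
    """Return a ready-to-send POST request for creating one page."""
    body = json.dumps({
        "query": CREATE_PAGE,
        "variables": {"path": path, "title": title, "content": content},
    }).encode("utf-8")
    return urllib.request.Request(
        wikijs_url.rstrip("/") + "/graphql",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes a dry run straightforward.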

Configuration

Environment Variables

Edit your .env file with these settings:

# MediaWiki Configuration
MEDIAWIKI_URL=https://wiki.example.com

# Optional: For private wikis that require authentication
# Leave empty for public wikis
MEDIAWIKI_USERNAME=YourUsername
MEDIAWIKI_PASSWORD=YourPassword

# Wiki.js Configuration (required)
WIKIJS_URL=https://newwiki.example.com
WIKIJS_API_KEY=your-api-key-here

# Optional Settings
EXPORT_DIR=./export_output
CHECKPOINT_FREQUENCY=10
MAX_LINK_DEPTH=-1
LOG_LEVEL=INFO

Public vs Private Wikis:

  • Public wikis: Leave MEDIAWIKI_USERNAME and MEDIAWIKI_PASSWORD empty or remove them. The tool will connect anonymously.
  • Private wikis: Provide valid credentials. The tool will automatically handle authentication and reconnection on timeout.
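The public/private distinction above can be sketched as follows. The variable names come from the .env example; the `Settings` class and `load_settings` function are hypothetical, and the sketch assumes the .env file has already been loaded into the process environment by some other means.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    mediawiki_url: str
    wikijs_url: str
    wikijs_api_key: str
    mediawiki_username: Optional[str] = None
    mediawiki_password: Optional[str] = None

    @property
    def anonymous(self) -> bool:
        # A missing username or password means connecting without logging in.
        return not (self.mediawiki_username and self.mediawiki_password)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    required = ("MEDIAWIKI_URL", "WIKIJS_URL", "WIKIJS_API_KEY")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return Settings(
        mediawiki_url=env["MEDIAWIKI_URL"],
        wikijs_url=env["WIKIJS_URL"],
        wikijs_api_key=env["WIKIJS_API_KEY"],
        # Treat empty strings the same as unset variables.
        mediawiki_username=env.get("MEDIAWIKI_USERNAME") or None,
        mediawiki_password=env.get("MEDIAWIKI_PASSWORD") or None,
    )
```

Treating an empty value the same as a missing one matches the advice to either leave the credentials empty or remove them.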

Usage Examples

Export Options

# Dry run (preview without downloading)
python src/export_mediawiki.py --dry-run

# Limit link depth
python src/export_mediawiki.py --link-depth 2

# Resume interrupted export
python src/export_mediawiki.py --resume

# Export specific namespaces
python src/export_mediawiki.py --namespaces 0,2

# Verbose logging
python src/export_mediawiki.py --verbose
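To show what a link-depth limit means in a breadth-first traversal, here is a minimal sketch. It is not the tool's code: `get_links` is a hypothetical callable returning the titles a page links to, and treating a negative depth (as in `MAX_LINK_DEPTH=-1`) as "no limit" is an assumption.

```python
from collections import deque


def bfs_titles(start_titles, get_links, max_depth=-1):
    """Yield (title, depth) pairs in breadth-first order.

    A negative `max_depth` is assumed to mean "no limit".
    """
    start = list(dict.fromkeys(start_titles))  # drop duplicates, keep order
    seen = set(start)
    queue = deque((title, 0) for title in start)
    while queue:
        title, depth = queue.popleft()
        yield title, depth
        if 0 <= max_depth <= depth:
            continue  # at the limit: keep the page but do not follow its links
        for linked in get_links(title):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
```

Under these assumptions, a depth of 2 would cover the starting pages, the pages they link to, and the pages those link to.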

Import Options

# Dry run (preview without creating pages)
python src/import_wikijs.py --source ./export_output --dry-run

# Skip existing pages
python src/import_wikijs.py --source ./export_output --skip-existing

# Force overwrite existing pages
python src/import_wikijs.py --source ./export_output --force

# Resume interrupted import
python src/import_wikijs.py --source ./export_output --resume
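One way to picture how `--skip-existing` and `--force` interact is the small decision function below. It is a sketch under two assumptions that the flags' descriptions do not confirm: that the two flags are mutually exclusive, and that an existing page is reported as an error when neither flag is given. The `Action` enum and `decide` function are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    CREATE = "create"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def decide(page_exists: bool, skip_existing: bool = False, force: bool = False) -> Action:
    """Choose what to do with one page during import."""
    if skip_existing and force:
        # Assumption: the flags cannot be combined.
        raise ValueError("--skip-existing and --force are mutually exclusive")
    if not page_exists:
        return Action.CREATE
    if skip_existing:
        return Action.SKIP
    if force:
        return Action.OVERWRITE
    # Assumption: with neither flag, an existing page is an error.
    return Action.FAIL
```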

Documentation

Project Structure

src/
├── export_mediawiki.py        # Export CLI script
├── import_wikijs.py           # Import CLI script
├── lib/                       # Shared library code
│   ├── config.py              # Configuration management
│   ├── logger.py              # Logging utilities
│   ├── mediawiki_client.py    # MediaWiki API client
│   ├── wikijs_client.py       # Wiki.js GraphQL client
│   ├── content_transformer.py # Markup conversion
│   ├── storage_manager.py     # File system operations
│   ├── image_processor.py     # Image handling
│   └── auth_manager.py        # Authentication management
└── models/                    # Data models
    ├── wiki_page.py           # Page entity
    ├── image_asset.py         # Image entity
    ├── link_reference.py      # Link relationship
    ├── checkpoint.py          # Checkpoint state
    └── migration_report.py    # Migration report

tests/
├── unit/                      # Unit tests
├── integration/               # Integration tests
└── contract/                  # API contract tests

Troubleshooting

Authentication Timeout

If you see "Authentication failed: Session expired", no action is needed: the script reconnects automatically after 5 minutes of inactivity.
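The automatic reconnection can be thought of as a retry wrapper around each API call. The sketch below is illustrative only: `SessionExpired`, `call_with_relogin`, and the `request` and `login` callables are hypothetical names, not part of the tool or of mwclient.

```python
class SessionExpired(Exception):
    """Hypothetical error raised when the wiki rejects an expired session."""


def call_with_relogin(request, login, max_attempts=2):
    """Run `request()`; on SessionExpired, call `login()` and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except SessionExpired:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            login()
```

Capping the number of attempts keeps bad credentials from causing an endless login loop.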

Pandoc Not Found

Install Pandoc system-wide (see Requirements section above).
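To confirm that Pandoc is visible to Python before starting a long export, a quick check along these lines can help. The `require_tool` helper is a hypothetical name; `shutil.which` is the standard-library lookup for executables on `PATH`.

```python
import shutil


def require_tool(name="pandoc"):
    """Return the full path of an executable, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name} was not found on PATH; install it system-wide")
    return path
```

If this raises inside an activated virtual environment, the usual cause is that Pandoc was installed for a different user or shell rather than system-wide.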

Image Download Failures

Check that your MediaWiki bot user has read permissions on the File namespace.

Page Already Exists

Use --skip-existing flag to skip existing pages, or --force to overwrite.

🀝 Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes: Follow existing code style and add tests
  4. Commit: git commit -m 'Add amazing feature'
  5. Push: git push origin feature/amazing-feature
  6. Open a Pull Request

See tasks.md for planned features and open tasks.

Development Setup

# Install development dependencies
pip install -r requirements.txt

# Run tests
pytest tests/

# Run linting
ruff check src/

Changelog

See CHANGELOG.md for a detailed history of changes and releases.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

For issues and questions:

🙏 Acknowledgments

  • MediaWiki API: Powered by mwclient
  • Wiki.js GraphQL: Using gql
  • Pandoc: Universal document converter by John MacFarlane

📊 Status

Project Status: ✅ MVP Complete (70% of planned features implemented)

  • ✅ Phase 1-2: Setup and foundational infrastructure
  • ✅ Phase 3: MediaWiki export with BFS traversal
  • ✅ Phase 4: Content transformation (wikitext→markdown)
  • ✅ Phase 5: Wiki.js import with GraphQL
  • ⏳ Phase 6: Verification tools (planned)
  • ⏳ Phase 7: Testing and polish (planned)

See tasks.md for detailed progress tracking.
