This repository was archived by the owner on Jun 3, 2020. It is now read-only.
1 change: 1 addition & 0 deletions .gitignore
@@ -8,3 +8,4 @@ build/
include/
.vagrant/
.DS_Store
venv/
16 changes: 16 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,16 @@
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Python Debugger: Dirks",
"type": "debugpy",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal",
"args": "-v wordpress-xml/nerdblog.wp.xml"
}
]
}
674 changes: 674 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

81 changes: 81 additions & 0 deletions README.md
@@ -0,0 +1,81 @@
# ExitWP for Hugo

## Convert WordPress and Squarespace exports to the [Hugo static site generator](https://gohugo.io/)

This is an updated version of the ExitWP tool, originally created by Thomas Frössman for Jekyll and later adapted for Hugo by Arjan Wooning.

For a detailed guide and background information, visit [Arjan Wooning's website](https://arjan.wooning.cz/conversion-tools-from-wordpress-to-hugo/#final-solution-exitwp-for-hugo).

ExitWP simplifies migrating one or more WordPress blogs, or other blogs and websites exported to the WordPress XML format, to the [Hugo static site generator](https://gohugo.io/). It aims to convert as much information as possible from the WordPress export, with options to filter the converted data.
[Squarespace](https://squarespace.com/) also offers the option to [export your site as WordPress-formatted XML file(s)](https://support.squarespace.com/hc/en-us/articles/206566687-Exporting-your-site?platform=v6&websiteId=5974c4a71b631b9a769048c6).

## Features

- Converts WordPress export XML to Hugo-compatible Markdown or HTML
- Downloads and processes images within posts
- Supports inclusion of comments from WordPress posts
- Handles tags and categories for Hugo
- Flexible configuration options via `config.yaml`

Please refer to the [release notes](RELEASE_NOTES.md) for an overview of changes and updates.

## Getting Started

1. Clone the repository: `git clone https://github.com/wooni005/exitwp-for-hugo.git`
2. Export your WordPress blog(s) using the WordPress exporter (Tools > Export in WordPress admin). Other website hosting services, such as [Squarespace](https://squarespace.com/), also offer the option to export your site as WordPress-formatted XML file(s).
3. Place all WordPress XML files in the `wordpress-xml` directory
4. Configure the tool by editing `config.yaml`
5. Run the converter: `python3 exitwp.py`
6. Optionally, if the script runs into issues or the output does not look correct, run `xmllint` (part of [Libxml2](https://en.wikipedia.org/wiki/Libxml2)) on your export file(s) and fix any errors it reports.
7. Your converted blog(s) will be in separate directories under the build directory set by `build_dir` in `config.yaml` (`build` in the supplied configuration).
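
If `xmllint` is not at hand, a rough well-formedness check can also be done from Python's standard library. The sketch below is not part of `exitwp.py`; the function names `check_export` and `check_export_dir` are made up for illustration, and the `wordpress-xml` default simply mirrors the directory named in step 3.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_export(path):
    """Return None if the file parses as XML, else a short error message."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem.
        line, column = err.position
        return f"{path}: not well-formed at line {line}, column {column}: {err}"
    return None


def check_export_dir(directory="wordpress-xml"):
    """Check every .xml file in the directory; return a list of error messages."""
    problems = []
    for xml_file in sorted(Path(directory).glob("*.xml")):
        message = check_export(xml_file)
        if message is not None:
            problems.append(message)
    return problems
```

Calling `check_export_dir()` before running the converter returns an empty list when every export parses. This only checks well-formedness, so it is a weaker test than a full `xmllint` run.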

## Dependencies

- Python 3.x
- markdownify
- PyYAML
- Beautiful Soup 4

## Installing Dependencies

```bash
pip3 install -r requirements.txt
```

## Configuration

Refer to the `config.yaml` file for all configurable options. Key settings include:

- `wp_exports`: Directory containing WordPress export XML files
- `build_dir`: Target directory for output
- `download_images`: Whether to download and relocate images
- `include_comments`: Option to include comments in the exported content
- `target_format`: Choose between 'markdown' or 'html' output
- `image_settings`: Configure image processing behavior
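
As an illustration of how these keys fit together, here is a minimal sketch that merges a parsed configuration over fallback values and validates `target_format`. It is not taken from `exitwp.py`: `effective_settings` and `ASSUMED_DEFAULTS` are hypothetical names, and the fallback values are copied from the supplied `config.yaml`, so the defaults hard-coded in the script may differ.

```python
# Assumed fallbacks, copied from the values in the supplied config.yaml;
# the defaults hard-coded in exitwp.py may differ.
ASSUMED_DEFAULTS = {
    "wp_exports": "wordpress-xml",
    "build_dir": "build",
    "download_images": True,
    "include_comments": True,
    "target_format": "markdown",
    "image_settings": {},
}


def effective_settings(config):
    """Merge a parsed config dict over the assumed defaults and validate it."""
    settings = {**ASSUMED_DEFAULTS, **(config or {})}
    if settings["target_format"] not in ("markdown", "html"):
        raise ValueError(f"unsupported target_format: {settings['target_format']!r}")
    return settings
```

In practice the `config` argument would be the dictionary returned by PyYAML's `yaml.safe_load` when reading `config.yaml`.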

## Usage

Basic usage:

```bash
python3 exitwp.py
```

For verbose output:

```bash
python3 exitwp.py -v
```

## Known Issues and Limitations

- Potential issues with non-UTF-8 encoded WordPress dump files
- Image downloading may fail for some URLs due to various reasons (404 errors, timeouts, etc.)
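
To find out in advance whether an export file is affected by the encoding issue, something like the following sketch may help. It is not part of the tool, and `find_non_utf8` is a hypothetical helper name; it reads the whole file into memory, which should be acceptable for typical export sizes.

```python
def find_non_utf8(path):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None
```

A return value of `None` means the file decodes cleanly as UTF-8; any other value is the position to inspect in the export before converting.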

## Support

This tool is not actively maintained. For support or custom modifications, consider using AI chatbots like ChatGPT or Claude.

## Contributing

If you've made significant improvements to the tool, feel free to submit a pull request.
74 changes: 0 additions & 74 deletions README.rst

This file was deleted.

60 changes: 60 additions & 0 deletions RELEASE_NOTES.md
@@ -0,0 +1,60 @@
# Changelog and Release Notes

# August 2024

## exitwp.py

### Major Changes
- Replaced html2text_file with markdownify for HTML to Markdown conversion
- Added support for downloading and processing images within posts
- Implemented comment extraction and inclusion in the output
- Added support for tags and categories handling
- Improved error handling and logging

### New Features
- Image processing: Downloads images, saves them locally, and updates image URLs in the content
- Comment handling: Extracts and includes comments in the output markdown files
- Tags and categories: Properly handles WordPress tags and categories, mapping them to Hugo format
- Timezone handling: Added support for CET timezone

### Improvements
- Enhanced YAML header generation for Hugo compatibility
- Improved date parsing and handling
- Better error logging and verbose output options
- Refactored code for better readability and maintainability

### Bug Fixes
- Fixed issues with Unicode handling
- Addressed potential errors in parsing XML and HTML content

## config.yaml

### New Options
- Added `tags_label` option to specify the label for tags/categories in the output
- Introduced `include_comments` option to control whether comments are included in the export

### Changes
- Refined `taxonomies` configuration to better handle tags and categories
- Updated `body_replace` patterns for improved content transformation

### Improvements
- Added more detailed comments and explanations for configuration options

## Overall Improvements

1. Better Hugo Compatibility: The updated script now generates output more closely aligned with Hugo's expectations.
2. Enhanced Image Handling: Improved downloading and processing of images within posts.
3. Comment Support: Added the ability to include WordPress comments in the exported content.
4. Improved Taxonomy Handling: Better management of tags and categories for Hugo.
5. More Flexible Configuration: Additional options in config.yaml for finer control over the export process.

## Upgrade Notes

When upgrading to this new version:

1. Review the new configuration options in config.yaml and adjust as needed for your use case.
2. Be aware of the change from html2text to markdownify for HTML to Markdown conversion.
3. Test the script with a small subset of your content first to ensure compatibility with your specific WordPress export.
4. Pay attention to the new image handling and comment inclusion features, adjusting settings as necessary.

This update significantly improves the WordPress to Hugo migration process, offering more features and better compatibility with Hugo's content structure.
19 changes: 0 additions & 19 deletions Vagrantfile

This file was deleted.

74 changes: 67 additions & 7 deletions config.yaml
@@ -1,19 +1,71 @@
# Tell me what's going on.. can also pass command line argument -v
verbose: False

# The directory where exitwp looks for wordpress export xml files.
wp_exports: wordpress-xml

# The target directory where all output is saved.
build_dir: build

# Output format: primary choices are html or markdown.
# Some functions, like the inclusion of comments, only output markdown,
# and may not look as expected in html.
target_format: markdown

# The date format of the WordPress export file.
-# I'm not sure if this ever differs depending on wordpress localization.
-# Wordpress is often so full of strange quirks so I wouldnt rule it out.
+# I'm not sure if this ever differs depending on WordPress localization.
+# Wordpress is often so full of strange quirks so I wouldn't rule it out.
date_format: '%Y-%m-%d %H:%M:%S'

-# Try to download and reloacate all images locally to the blog.
-download_images: False
+# Try to download and relocate all images locally to the blog.
+download_images: True

# Image URL filtering
image_settings:
# URL parts to exclude when processing images
excluded_url_parts:
- 'tracking.pixel.com'
- 'http://www.assoc-amazon.com/'
# Domains to always include when processing images
included_domains:
- 'nerdblog.steinkopf.net'
# Default behavior for image validity when no other conditions are met
# Set to true to include images by default, false to exclude by default
#
# If set to true:
# - All images will be considered valid unless explicitly excluded
# - The 'included_domains' setting will have no effect
#
# If set to false:
# - Only images from 'included_domains' will be considered valid
# - All other images will be excluded unless explicitly included
# - This can be handy if you want to process only images from your old
# blog for example, but not download images from the public internet
# to your own (new) server.
#
# Examples:
# 1. To process all images except those from specific domains:
# default_image_validity: true
# excluded_url_parts:
# - 'ads.example.com'
# - 'tracking.example.com'
#
# 2. To process only images from specific domains:
# default_image_validity: false
# included_domains:
# - 'images.mysite.com'
# - 'cdn.mysite.com'
#
default_image_validity: false
# Icon to use when an image is not found. Make sure to put this file in
# the right place on your destination server manually.
# (This file is not supplied with exitwp, you have to pick one yourself.)
not_found_icon: '/icons/question-warning.svg'
# Default timeout (in seconds) for image downloads
download_timeout: 10

# Include old/existing comments with the post
include_comments: true

# Item types we don't want to import.
item_type_filter: {attachment, nav_menu_item}
@@ -22,21 +74,29 @@ item_type_filter: {attachment, nav_menu_item}
# By default, we're filtering based on field "status" set to "draft"
item_field_filter: {status: draft}

# Output label for categories or tags.
# NOTE: This overrides the name_mapping in the taxonomies below!
# Default will be tags_label: 'categories', as specified in the exitwp.py
# script, if not defined here.
# tags_label: 'tags'

taxonomies:
# Filter taxonomies.
filter: {}
# Filter taxonomies entries.
entry_filter: {category: Uncategorized}
# Rename taxonomies when writing jekyll output format.
# NOTE: categories label is overwritten by the tags_label above!!
name_mapping: {category: categories, post_tag: tags}

# Replace certain patterns in body
# Simply replace the key with its value
body_replace: {
# '\(/media/': '(/images/posts/',
# '<pre.*?lang="(.*?)".*?>': '\n{% codeblock \1 lang:\1 %}\n',
# '<pre.*?>': '\n{% codeblock %}\n',
# '</pre>': '\n{% endcodeblock %}\n',

-# '[python]': '{% codeblock lang:python %}',
-# '[/python]': '{% endcodeblock %}',
-}
+# '[python]': '{% codeblock lang:python %}',
+# '[/python]': '{% endcodeblock %}',
+}
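
The `image_settings` comments in this config.yaml describe a small decision rule, which can be sketched in Python as below. This is an illustration under assumptions, not the code in `exitwp.py`: the function name `is_valid_image` is hypothetical, and the order of the checks (exclusions first, then included domains, then the default) is a guess, since the comments do not say which setting wins when both match.

```python
from urllib.parse import urlparse


def is_valid_image(url, image_settings):
    """Decide whether an image URL should be processed (assumed rule order)."""
    excluded = image_settings.get("excluded_url_parts", [])
    included = image_settings.get("included_domains", [])
    default = image_settings.get("default_image_validity", False)

    # Assumption: an excluded URL part rejects the image before anything else.
    if any(part in url for part in excluded):
        return False

    # Images hosted on an explicitly included domain are accepted.
    host = urlparse(url).hostname or ""
    if host in included:
        return True

    # Otherwise fall back to the configured default.
    return bool(default)
```

With `default_image_validity: false`, as in this diff, only images hosted on `nerdblog.steinkopf.net` would pass. Setting it to `true` would accept everything except URLs containing one of the excluded parts.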