mmdbconvert

A command-line tool to merge multiple MaxMind MMDB databases and export to CSV, Parquet, or MMDB format.

Features

✅ Merge multiple MMDB databases - Combine GeoIP2 databases (e.g., Enterprise + Anonymous IP)
✅ Non-overlapping networks - Automatically resolves overlapping networks to smallest blocks
✅ Adjacent network merging - Combines adjacent networks with identical data for compact output
✅ Multiple output formats - Export to CSV, Parquet, or MMDB format
✅ Query-optimized Parquet - Integer columns enable 10-100x faster IP lookups than string comparisons
✅ Type-preserving MMDB output - Perfect type preservation for merged databases
✅ Flexible column mapping - Extract any fields from MMDB databases using JSON paths
✅ IPv4 and IPv6 support - Handle both IP versions seamlessly
✅ Type hints for Parquet - Native int64, float64, bool types for efficient storage

Installation

Binary Releases (Recommended)

Download pre-built binaries from the GitHub Releases page.

Architecture Guide:

- amd64 = x86-64 / x64 (most common for Intel/AMD processors)
- arm64 = ARM 64-bit (Apple Silicon, AWS Graviton, Raspberry Pi 4+)
- darwin = macOS

In the commands below, replace <VERSION> with the release version (e.g., 0.1.0) and <ARCH> with your architecture (e.g., amd64 or arm64).

Linux

Using .deb package (Debian/Ubuntu):

1. Download the .deb file for your architecture from the releases page.
2. Install using dpkg:

sudo dpkg -i mmdbconvert_<VERSION>_<ARCH>.deb

Using .rpm package (RedHat/CentOS/Fedora):

1. Download the .rpm file for your architecture from the releases page.
2. Install using rpm:

sudo rpm -i mmdbconvert_<VERSION>_<ARCH>.rpm

Using tar.gz archive:

1. Download the Linux tar.gz file for your architecture from the releases page.
2. Extract and install:

tar -xzf mmdbconvert_<VERSION>_linux_<ARCH>.tar.gz
sudo mv mmdbconvert/mmdbconvert /usr/local/bin/

macOS

1. Download the macOS tar.gz file for your architecture from the releases page:
   - darwin_arm64 for Apple Silicon (M1/M2/M3/M4)
   - darwin_amd64 for Intel Macs
2. Extract and install:

tar -xzf mmdbconvert_<VERSION>_darwin_<ARCH>.tar.gz
sudo mv mmdbconvert/mmdbconvert /usr/local/bin/

Windows

1. Download the Windows zip file for your architecture from the releases page.
2. Extract the zip file.
3. Add the mmdbconvert.exe binary to your PATH, or run it directly from the extracted location.

Using PowerShell:

# Extract (adjust filename to match your download)
Expand-Archive -Path mmdbconvert_<VERSION>_windows_<ARCH>.zip -DestinationPath .

# Run
.\mmdbconvert\mmdbconvert.exe --version

Note: ARM64 binaries are available for all platforms. Choose the appropriate architecture for your system.

From Source

go install github.com/maxmind/mmdbconvert/cmd/mmdbconvert@latest

Build Locally

git clone https://github.com/maxmind/mmdbconvert.git
cd mmdbconvert
go build -o mmdbconvert ./cmd/mmdbconvert

Quick Start

1. Create a Configuration File

Create config.toml:

[output]
format = "csv"
file = "output.csv"

[[databases]]
name = "city"
path = "/path/to/GeoIP2-City.mmdb"

[[columns]]
name = "country_code"
database = "city"
path = ["country", "iso_code"]

[[columns]]
name = "city_name"
database = "city"
path = ["city", "names", "en"]

2. Run the Tool

mmdbconvert config.toml

3. View the Output

head output.csv

network,country_code,city_name
1.0.0.0/24,AU,Sydney
1.0.1.0/24,CN,Beijing
1.0.4.0/22,AU,Melbourne

Note: The network column appears automatically because no [[network.columns]] sections were defined. By default, CSV output includes a CIDR column named network, while Parquet output includes start_int and end_int integer columns for faster IP lookups. You can customize network columns in the configuration.

Usage

# Basic usage
mmdbconvert config.toml

# Explicit config flag
mmdbconvert --config config.toml

# Suppress progress output
mmdbconvert --config config.toml --quiet

# Disable unmarshaler caching to reduce memory usage (several times slower)
mmdbconvert --config config.toml --disable-cache

# Show version
mmdbconvert --version

# Show help
mmdbconvert --help

Configuration

See docs/config.md for complete configuration reference.

CSV Output Example

[output]
format = "csv"
file = "geo.csv"

[output.csv]
delimiter = ","  # or "\t" for tab-delimited

[[network.columns]]
name = "network"
type = "cidr"

[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"

[[columns]]
name = "country"
database = "city"
path = ["country", "iso_code"]

Parquet Output Example

[output]
format = "parquet"
file = "geo.parquet"

[output.parquet]
compression = "snappy"
row_group_size = 500000

# Integer columns for fast queries
[[network.columns]]
name = "start_int"
type = "start_int"

[[network.columns]]
name = "end_int"
type = "end_int"

[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"

[[columns]]
name = "country"
database = "city"
path = ["country", "iso_code"]
type = "string"

[[columns]]
name = "latitude"
database = "city"
path = ["location", "latitude"]
type = "float64"

MMDB Output Example

[output]
format = "mmdb"
file = "merged.mmdb"

[output.mmdb]
database_type = "GeoIP2-City"
description = { en = "Merged GeoIP Database" }
record_size = 28

[[databases]]
name = "city"
path = "GeoIP2-City.mmdb"

# Use output_path to create nested structure
[[columns]]
name = "country_code"
database = "city"
path = ["country", "iso_code"]
output_path = ["country", "iso_code"]

[[columns]]
name = "city_name"
database = "city"
path = ["city", "names", "en"]
output_path = ["city", "names", "en"]

[[columns]]
name = "latitude"
database = "city"
path = ["location", "latitude"]
output_path = ["location", "latitude"]

[[columns]]
name = "longitude"
database = "city"
path = ["location", "longitude"]
output_path = ["location", "longitude"]

MMDB output features:

- Perfect type preservation from source databases
- Support for nested structures via output_path
- Compatible with all MMDB readers (libmaxminddb, etc.)
- Configurable record size (24, 28, or 32 bits)
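
How path and output_path relate can be illustrated with a short Python sketch. It is not mmdbconvert's implementation: get_path and set_path are hypothetical helpers, the sample record is made up to resemble a GeoIP2 City lookup result, and skipping missing values is an assumption. Each column reads a value by walking path through the source record and writes it at output_path in the output record.

```python
def get_path(record, path):
    """Walk a nested dict along path; return None if any key is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(record, path, value):
    """Store value in a nested dict, creating intermediate dicts as needed."""
    current = record
    for key in path[:-1]:
        current = current.setdefault(key, {})
    current[path[-1]] = value


# Made-up source record shaped like a GeoIP2 City result.
source = {
    "country": {"iso_code": "AU"},
    "city": {"names": {"en": "Sydney"}},
    "location": {"latitude": -33.86, "longitude": 151.21},
}

# (path, output_path) pairs mirroring the [[columns]] entries above.
columns = [
    (["country", "iso_code"], ["country", "iso_code"]),
    (["city", "names", "en"], ["city", "names", "en"]),
    (["location", "latitude"], ["location", "latitude"]),
    (["location", "longitude"], ["location", "longitude"]),
]

output = {}
for path, output_path in columns:
    value = get_path(source, path)
    if value is not None:  # assumption: missing values are simply skipped
        set_path(output, output_path, value)

print(output)
```

Because every output_path here equals its path, the rebuilt record has the same nesting as the source. Pointing output_path elsewhere would move the value to a different place in the output record.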

Querying Parquet Files

Parquet files generated with integer columns (start_int, end_int) support fast range-based IP lookups, 10-100x faster than string comparisons.

DuckDB Example

-- Lookup IP address 203.0.113.100 (integer: 3405803876)
SELECT * FROM read_parquet('geo.parquet')
WHERE start_int <= 3405803876 AND end_int >= 3405803876;

See docs/parquet-queries.md for comprehensive query examples and performance optimization guide.
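
The integer in the query above is the IPv4 address read as a 32-bit big-endian number. A minimal sketch of the conversion using Python's standard ipaddress module follows; the helper name ip_to_int is illustrative, and the query string simply reuses the DuckDB example with the computed value.

```python
import ipaddress


def ip_to_int(address):
    """Return the integer form of an IPv4 or IPv6 address string."""
    return int(ipaddress.ip_address(address))


lookup = ip_to_int("203.0.113.100")
print(lookup)  # 3405803876

# The same value computed by hand from the four octets.
assert lookup == (203 << 24) + (0 << 16) + (113 << 8) + 100

# The DuckDB query above, built from the computed integer.
query = (
    "SELECT * FROM read_parquet('geo.parquet') "
    f"WHERE start_int <= {lookup} AND end_int >= {lookup};"
)
print(query)
```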

Examples

Merging Multiple Databases

Combine GeoIP2 Enterprise with Anonymous IP data:

[output]
format = "parquet"
file = "merged.parquet"

[[network.columns]]
name = "start_int"
type = "start_int"

[[network.columns]]
name = "end_int"
type = "end_int"

[[databases]]
name = "enterprise"
path = "GeoIP2-Enterprise.mmdb"

[[databases]]
name = "anonymous"
path = "GeoIP2-Anonymous-IP.mmdb"

# Columns from Enterprise database
[[columns]]
name = "country_code"
database = "enterprise"
path = ["country", "iso_code"]

[[columns]]
name = "city_name"
database = "enterprise"
path = ["city", "names", "en"]

[[columns]]
name = "latitude"
database = "enterprise"
path = ["location", "latitude"]
type = "float64"

[[columns]]
name = "longitude"
database = "enterprise"
path = ["location", "longitude"]
type = "float64"

# Columns from Anonymous IP database
[[columns]]
name = "is_anonymous"
database = "anonymous"
path = ["is_anonymous"]
type = "bool"

[[columns]]
name = "is_anonymous_vpn"
database = "anonymous"
path = ["is_anonymous_vpn"]
type = "bool"

All Network Column Types

[[network.columns]]
name = "network"
type = "cidr"          # e.g., "203.0.113.0/24"

[[network.columns]]
name = "start_ip"
type = "start_ip"      # e.g., "203.0.113.0"

[[network.columns]]
name = "end_ip"
type = "end_ip"        # e.g., "203.0.113.255"

[[network.columns]]
name = "start_int"
type = "start_int"     # e.g., 3405803776 (IPv4 only)

[[network.columns]]
name = "end_int"
type = "end_int"       # e.g., 3405804031 (IPv4 only)

Default network columns: If you don't define any [[network.columns]], mmdbconvert automatically provides sensible defaults based on output format:

- CSV: Single network column (CIDR format) for human readability
- Parquet: start_int and end_int columns for 10-100x faster IP queries

Note: start_int and end_int only work with IPv4 addresses unless you split your output into separate IPv4/IPv6 files via output.ipv4_file and output.ipv6_file. For single-file outputs that include IPv6 data, use string columns (start_ip, end_ip, cidr).
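
The values of the five column types can be reproduced for any CIDR block with Python's standard ipaddress module, which is handy for spot-checking exported rows. This is a sketch for illustration only; network_columns is a hypothetical helper, not part of mmdbconvert.

```python
import ipaddress


def network_columns(cidr):
    """Compute the value of each network column type for one CIDR block."""
    net = ipaddress.ip_network(cidr)
    first = net.network_address    # lowest address in the block
    last = net.broadcast_address   # highest address in the block
    return {
        "cidr": str(net),
        "start_ip": str(first),
        "end_ip": str(last),
        "start_int": int(first),
        "end_int": int(last),
    }


v4 = network_columns("203.0.113.0/24")
print(v4)

# IPv6 addresses are 128 bits wide, so their integer form is too large
# for a signed 64-bit integer.
v6 = network_columns("2001:db8::/32")
print(v6["end_int"] > 2**63 - 1)  # True
```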

Data Type Hints

Parquet supports native types for efficient storage and queries:

[[columns]]
name = "population"
database = "city"
path = ["city", "population"]
type = "int64"          # Integer values

[[columns]]
name = "accuracy_radius"
database = "city"
path = ["location", "accuracy_radius"]
type = "int64"

[[columns]]
name = "latitude"
database = "city"
path = ["location", "latitude"]
type = "float64"        # Floating-point values

[[columns]]
name = "is_satellite"
database = "city"
path = ["traits", "is_satellite_provider"]
type = "bool"           # Boolean values

Use Cases

Merging Enterprise + Anonymous IP

Merge GeoIP2 Enterprise with GeoIP2 Anonymous IP to enrich traffic logs. The merged output provides:

- Geographic location data (country, city, coordinates)
- Anonymous IP detection (VPN, proxy, hosting provider)
- Single query-optimized Parquet file for fast lookups
- Non-overlapping networks for accurate IP matching

Creating Custom MMDB Databases

Merge multiple MMDB databases into a single custom database with perfect type preservation:

- Combine multiple data sources (GeoIP, ISP, ASN, etc.)
- Create application-specific databases with only needed fields
- Maintain exact data types from source databases
- Deploy merged databases with existing MMDB readers
- No performance overhead compared to original databases

Analytics Pipelines

Export MMDB databases to Parquet for use in analytics pipelines:

- DuckDB: Fast local queries on laptop/server
- Apache Spark: Distributed processing of billions of logs
- Trino/Presto: Query data in S3 without downloading
- BigQuery: Load Parquet files for SQL analysis

Data Warehouse Integration

Convert MMDB databases to CSV/Parquet for loading into data warehouses:

- Snowflake
- Redshift
- BigQuery
- Databricks

Architecture

Streaming Network Merge

mmdbconvert uses a streaming accumulator algorithm:

1. Nested iteration through all databases using NetworksWithin()
2. Smallest network selection - Always chooses the most specific network block
3. Data extraction from all databases for each network
4. Adjacent network merging - Combines adjacent networks with identical data
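
The adjacent network merging step can be approximated in a few lines of Python. This sketch is not the tool's streaming implementation: it holds all rows in memory, groups them by their data, and relies on ipaddress.collapse_addresses from the standard library. The function name merge_adjacent and the sample rows are illustrative.

```python
import ipaddress
from collections import defaultdict


def merge_adjacent(rows):
    """Merge networks that carry identical data into the fewest CIDR blocks.

    rows is an iterable of (cidr_string, data) pairs with hashable data,
    all of the same IP version.
    """
    by_data = defaultdict(list)
    for cidr, data in rows:
        by_data[data].append(ipaddress.ip_network(cidr))

    merged = []
    for data, networks in by_data.items():
        # collapse_addresses joins adjacent or overlapping blocks.
        for net in ipaddress.collapse_addresses(networks):
            merged.append((net, data))

    # Sort by network so the result follows the order of the IP space.
    merged.sort(key=lambda item: item[0])
    return [(str(net), data) for net, data in merged]


rows = [
    ("10.0.0.0/24", "AU"),
    ("10.0.1.0/24", "AU"),  # adjacent to the previous row, same data
    ("10.0.2.0/24", "CN"),
    ("10.0.3.0/24", "AU"),  # same data as the first two, but not adjacent
]
for cidr, data in merge_adjacent(rows):
    print(cidr, data)
```

The first two rows collapse into 10.0.0.0/23, while 10.0.3.0/24 stays separate because the CN block sits between it and the other AU networks.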

Non-Overlapping Networks

When databases have overlapping networks, mmdbconvert automatically splits them into non-overlapping blocks:

Example:

Database A: 10.0.0.0/16
Database B: 10.0.1.0/24

Output:
  10.0.0.0/24   (only in A)
  10.0.1.0/24   (in both A and B)
  10.0.2.0/23   (only in A)
  10.0.4.0/22   (only in A)
  10.0.8.0/21   (only in A)
  ... etc

This ensures accurate IP lookups with no ambiguity.
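
The split in the example above can be reproduced with Python's standard ipaddress module, whose address_exclude method returns the blocks of a network that remain after removing a subnet. This illustrates the resulting layout only; it is not mmdbconvert's algorithm, and the variable names are arbitrary.

```python
import ipaddress

a = ipaddress.ip_network("10.0.0.0/16")  # Database A
b = ipaddress.ip_network("10.0.1.0/24")  # Database B

# Blocks of A left over once B is carved out, plus B itself.
only_in_a = set(a.address_exclude(b))
blocks = sorted(only_in_a | {b})

for block in blocks:
    label = "in both A and B" if block == b else "only in A"
    print(f"{block}  ({label})")
```

The /16 breaks down into nine blocks in total: the /24 from Database B plus eight remainder blocks ranging from /24 up to /17.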

Documentation

- Configuration Reference (docs/config.md) - Complete config file documentation
- Parquet Query Guide (docs/parquet-queries.md) - Optimizing IP lookup queries

Requirements

- Go 1.25 or later
- MaxMind MMDB database files (GeoIP2, GeoLite2, etc.)

License

This project is licensed under either of:

- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Make your changes with tests
4. Run linters: golangci-lint run
5. Run tests: go test ./...
6. Commit your changes (git commit -m 'Add amazing feature')
7. Push to the branch (git push origin feature/amazing-feature)
8. Open a Pull Request

Acknowledgments

Built with:

- go-toml - TOML configuration parsing
- maxminddb-golang - MMDB database reading
- mmdbwriter - MMDB database writing
- parquet-go - Parquet file writing

Support

- Issues: GitHub Issues
- Documentation: docs/
- MaxMind Support: https://support.maxmind.com
