ArchiveBox

Self-Hosted

Open-source self-hosted web archiving tool for long-term digital preservation

Overview

ArchiveBox is an open-source, self-hosted tool for long-term web archiving and digital preservation. It captures web pages in multiple formats (HTML, PDF, images, video) using tools such as wget and headless Chrome. It can be deployed with Docker (recommended), with pip, or on bare metal, and it stores data in a human-readable directory structure. Features include scheduled archiving, import from bookmarks and RSS feeds, a web UI for browsing, and offline access. It suits individuals and organizations that want to preserve important web content without relying on third-party services.

Self-Hosting Resources


Below is a reference structure for docker-compose.yml. ⚠️ Do NOT run it as-is. Replace each placeholder (<OFFICIAL_IMAGE_NAME>, <APP_INTERNAL_PORT>) with the value given in the official documentation.

docker-compose.template.yml (template)


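# Optional: recent Docker Compose releases ignore the top-level version key.
version: '3'
services:
  archivebox:
    # Placeholder: use the image name published by the ArchiveBox project.
    image: <OFFICIAL_IMAGE_NAME>:latest
    container_name: archivebox
    ports:
      # Host port 8080 maps to the port the app listens on inside the container.
      - "8080:<APP_INTERNAL_PORT>"
    volumes:
      # Host ./data persists the archive; confirm the container-side path in the official docs.
      - ./data:/app/data
    restart: unless-stopped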
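Because the template must not be run with its placeholders still in place, a small pre-flight check can catch any that were missed. The following is a minimal Python sketch, not part of ArchiveBox or Docker: the function name find_placeholders and the convention that placeholders look like <UPPER_CASE_NAME> are assumptions taken from the template above, and the comment stripping is deliberately naive (it treats every # as the start of a comment).

```python
import re

# Assumption: placeholders follow the template's <UPPER_CASE_NAME> convention.
PLACEHOLDER = re.compile(r"<[A-Z][A-Z0-9_]*>")


def find_placeholders(compose_text: str) -> list[tuple[int, str]]:
    """Return (line_number, token) pairs for every unreplaced placeholder."""
    hits = []
    for number, line in enumerate(compose_text.splitlines(), start=1):
        # Naive comment stripping: everything after the first '#' is ignored,
        # so notes that mention a placeholder do not raise a false alarm.
        content = line.split("#", 1)[0]
        for token in PLACEHOLDER.findall(content):
            hits.append((number, token))
    return hits


# A shortened copy of the template, used only to demonstrate the check.
TEMPLATE = """\
services:
  archivebox:
    image: <OFFICIAL_IMAGE_NAME>:latest
    ports:
      - "8080:<APP_INTERNAL_PORT>"
"""

if __name__ == "__main__":
    for number, token in find_placeholders(TEMPLATE):
        print(f"line {number}: replace {token}")
```

An empty result means no placeholder tokens remain. It does not confirm that the values filled in are correct, so the file still needs to be checked against the official documentation.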

Key Features

Captures web content in multiple formats (HTML, PDFs, media)
Supports import from bookmarks, RSS feeds, and URLs
Web UI for browsing and searching archived content
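When importing a plain list of URLs, it can help to clean the list first so that blank lines, duplicates, and non-web links are not submitted. The following is a minimal Python sketch of that preparation step only. The helper name clean_url_list is hypothetical, and the sketch does not call ArchiveBox or model its import behavior.

```python
from urllib.parse import urlsplit


def clean_url_list(lines):
    """Return unique http(s) URLs in first-seen order.

    Blank lines, entries without a host, and other schemes are skipped.
    """
    seen = set()
    result = []
    for raw in lines:
        url = raw.strip()
        parts = urlsplit(url)
        # Keep only absolute web URLs.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


if __name__ == "__main__":
    sample = [
        "https://example.com/a",
        "",
        "https://example.com/a",
        "ftp://example.com/x",
        " http://example.org ",
    ]
    # Write one URL per line, a common plain-text list format.
    print("\n".join(clean_url_list(sample)))
```

Deduplication here is by exact string match after trimming whitespace, so two spellings of the same page (for example with and without a trailing slash) are kept as separate entries.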

Frequently Asked Questions

? Is ArchiveBox hard to install?

No. ArchiveBox can be deployed with Docker (recommended), with pip, or on bare metal. The Docker setup needs minimal configuration, and the official docs provide step-by-step guides for each method.

? Is it a good alternative to cloud-based archiving services like the Wayback Machine?

Yes. Unlike cloud services, ArchiveBox keeps your data on storage you control, which preserves privacy and long-term access without depending on a third party. It saves each page in several formats (HTML, PDF, media) and supports scheduled archiving on your own server.

? Is ArchiveBox completely free?

Yes. ArchiveBox is open source under the MIT License, so it is free to use, modify, and self-host, with no hidden fees or subscriptions.

Top Alternatives

Pocket Premium

Evernote Web Clipper

Tool Info

Pricing: Free/Open Source

Category: Archiving and Digital Preservation (DP)

Platform: Self-Hosted

Pros

⊕ Local data storage ensures privacy and control
⊕ No subscription fees or hidden costs
⊕ Human-readable archive structure for easy access

Cons

⊖ Requires basic technical skills to set up (Docker or pip)
⊖ Large storage footprint for extensive archives
⊖ Dynamic content may not always be fully captured