Browser-Only Document Processing: Building a Private PDF & Image Toolset

When I set out to build a document-processing toolset, I had one non-negotiable requirement: your files never leave your device. No uploads to a server. No third party ever sees your content. Everything runs locally in the browser.

This article explains how I built the Professional Toolset — a collection of more than 30 PDF and image utilities — using client-side JavaScript libraries.

Why “Browser-Only” Matters

Most online PDF tools upload your document to a server, process it, and send it back. That design has real drawbacks:

Privacy: Your confidential documents are transmitted to and stored on a third-party server, at least temporarily
Speed: Upload and download time dominates the total processing time, especially for large files
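Cost: Server-side processing has to be hosted and paid for; work done in the user’s browser adds no compute cost for the operator
Offline capability: A browser-based tool can work without any internet connection after the first page load, provided its scripts and assets have been cached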

Modern JavaScript and WebAssembly have made it practical to do substantial document processing entirely client-side.

Core Libraries

PDF.js — Rendering PDFs

PDF.js from Mozilla is the engine that powers Firefox’s built-in PDF viewer. It parses and renders PDF files in a <canvas> element. I use it to display PDF page previews and to extract page content for conversion operations.

// pdfjsLib is the global exposed by the PDF.js script; arrayBuffer holds the file's bytes.
const pdf = await pdfjsLib.getDocument({ data: arrayBuffer }).promise;
const page = await pdf.getPage(1); // page numbers start at 1
const viewport = page.getViewport({ scale: 1.5 });

// Size the canvas to the scaled page before rendering into it.
const canvas = document.createElement('canvas');
canvas.width = viewport.width;
canvas.height = viewport.height;

const ctx = canvas.getContext('2d');
await page.render({ canvasContext: ctx, viewport }).promise;
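
Extracting page content for conversion can be sketched in a similar way. The helper below is an illustration rather than the toolset's actual code: extractPageText is a hypothetical name, and it assumes a PDF.js version whose text items carry a hasEOL flag. It returns plain text only; layout and fonts are not preserved.

```javascript
// Sketch: collect the visible text of one PDF.js page.
// `page` is a PDF.js page object, as returned by pdf.getPage(n).
async function extractPageText(page) {
  const content = await page.getTextContent();
  let text = '';
  for (const item of content.items) {
    if (typeof item.str !== 'string') continue; // skip items that carry no text
    text += item.str;
    if (item.hasEOL) text += '\n'; // PDF.js marks items that end a line
  }
  return text;
}
```

Calling extractPageText(await pdf.getPage(1)) would resolve to the text of the first page, which can then be fed to a converter.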

pdf-lib — Creating and Modifying PDFs

pdf-lib is a pure-JavaScript library for creating and editing PDFs. Unlike PDF.js (which is read-only), pdf-lib lets you merge files, split pages, rotate them, add watermarks, and more.

Merging two PDFs is roughly this:

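// PDFLib is the global exposed by the pdf-lib script; files is an array of File objects.
const { PDFDocument } = PDFLib;
const mergedPdf = await PDFDocument.create();

for (const file of files) {
  const pdf = await PDFDocument.load(await file.arrayBuffer());
  // Pages must be copied into the target document before they can be added to it.
  const pages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
  pages.forEach(page => mergedPdf.addPage(page));
}

const bytes = await mergedPdf.save(); // Uint8Array with the merged PDF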
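
Splitting works the same way in reverse: copy a subset of pages into a new document. The sketch below is one possible shape, not the toolset's actual code. parsePageRanges and extractPages are hypothetical helper names, and the "1-3,5" range syntax is an assumption about what a user would type.

```javascript
// Turn a 1-based range string such as "1-3,5" into sorted, unique 0-based indices.
function parsePageRanges(spec, pageCount) {
  const indices = new Set();
  for (const part of spec.split(',')) {
    const match = part.trim().match(/^(\d+)(?:\s*-\s*(\d+))?$/);
    if (!match) throw new Error(`Invalid page range: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${part}"`);
    }
    for (let p = start; p <= end; p++) indices.add(p - 1);
  }
  return [...indices].sort((a, b) => a - b);
}

// Copy the selected pages into a fresh document.
// PDFDocument is pdf-lib's class, passed in so the sketch has no hidden globals.
async function extractPages(PDFDocument, sourceBytes, spec) {
  const source = await PDFDocument.load(sourceBytes);
  const output = await PDFDocument.create();
  const indices = parsePageRanges(spec, source.getPageCount());
  const pages = await output.copyPages(source, indices);
  pages.forEach((page) => output.addPage(page));
  return output.save(); // resolves to a Uint8Array
}
```

Validating the range before touching the PDF keeps bad input from producing a half-built document.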

Sharp-inspired Browser Image Processing

For image operations — compression, resizing, format conversion, rotation — I use the browser’s built-in Canvas API combined with createImageBitmap. There is no native browser API as full-featured as sharp (the Node.js image library), but the combination of Canvas and OffscreenCanvas covers most common tasks well.

async function resizeImage(file, maxWidth, maxHeight) {
  const bitmap = await createImageBitmap(file);
  // Fit inside the box while keeping the aspect ratio; the final 1 prevents upscaling.
  const scale = Math.min(maxWidth / bitmap.width, maxHeight / bitmap.height, 1);

  const canvas = new OffscreenCanvas(
    Math.round(bitmap.width * scale),
    Math.round(bitmap.height * scale)
  );

  const ctx = canvas.getContext('2d');
  ctx.drawImage(bitmap, 0, 0, canvas.width, canvas.height);
  bitmap.close(); // release the decoded pixels as soon as they are drawn

  return canvas.convertToBlob({ type: 'image/jpeg', quality: 0.85 });
}

OffscreenCanvas is key here: unlike a regular <canvas>, it is also available inside a Web Worker, so heavy image processing can be moved off the main thread when needed, keeping the UI responsive.
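
A minimal sketch of moving the resize into a Web Worker, assuming the browser supports OffscreenCanvas in workers. The file name resize-worker.js, the message shape, and the helper names fitScale and resizeInWorker are all hypothetical.

```javascript
// resize-worker.js (hypothetical file name): intended to run inside a Web Worker.

// Scale factor that fits the image inside the box without upscaling.
function fitScale(width, height, maxWidth, maxHeight) {
  return Math.min(maxWidth / width, maxHeight / height, 1);
}

async function resizeInWorker({ file, maxWidth, maxHeight }) {
  const bitmap = await createImageBitmap(file);
  const scale = fitScale(bitmap.width, bitmap.height, maxWidth, maxHeight);
  const canvas = new OffscreenCanvas(
    Math.max(1, Math.round(bitmap.width * scale)),
    Math.max(1, Math.round(bitmap.height * scale))
  );
  canvas.getContext('2d').drawImage(bitmap, 0, 0, canvas.width, canvas.height);
  bitmap.close(); // release the decoded pixels
  return canvas.convertToBlob({ type: 'image/jpeg', quality: 0.85 });
}

// Register the message handler only when actually running in a worker.
if (typeof WorkerGlobalScope !== 'undefined' && self instanceof WorkerGlobalScope) {
  self.onmessage = async (event) => {
    try {
      self.postMessage({ ok: true, blob: await resizeInWorker(event.data) });
    } catch (err) {
      self.postMessage({ ok: false, error: String(err) });
    }
  };
}
```

On the page, something like new Worker('resize-worker.js') followed by worker.postMessage({ file, maxWidth: 1600, maxHeight: 1600 }) would start a resize, with the result arriving in the worker's message event.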

Mammoth.js — Word to PDF

Mammoth converts .docx files to HTML in the browser. I combine it with html2canvas and jsPDF to produce a PDF output. The conversion fidelity isn’t perfect for complex formatting, but it handles the most common cases reliably.
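
The pipeline can be sketched roughly as follows. This is an illustration under assumptions, not the toolset's actual code: docxToPdfBlob and pageSlices are hypothetical names, the three libraries are passed in as parameters, and the 794px container width is an arbitrary stand-in for an A4 page on screen. Because each page is a picture of the rendered HTML, text in the resulting PDF is not selectable.

```javascript
// Split a tall rendered canvas into page-sized vertical slices (canvas pixels).
function pageSlices(canvasWidth, canvasHeight, pageWidth, pageHeight) {
  const sliceHeight = Math.max(1, Math.floor(canvasWidth * (pageHeight / pageWidth)));
  const slices = [];
  for (let y = 0; y < canvasHeight; y += sliceHeight) {
    slices.push({ y, height: Math.min(sliceHeight, canvasHeight - y) });
  }
  return slices;
}

async function docxToPdfBlob(file, { mammoth, html2canvas, jsPDF }) {
  const arrayBuffer = await file.arrayBuffer();
  const { value: html } = await mammoth.convertToHtml({ arrayBuffer });

  const container = document.createElement('div');
  container.style.width = '794px'; // assumed on-screen page width
  container.innerHTML = html; // Mammoth does not sanitise its output
  document.body.appendChild(container);
  try {
    const canvas = await html2canvas(container);
    const doc = new jsPDF({ unit: 'pt', format: 'a4' });
    const pageW = doc.internal.pageSize.getWidth();
    const pageH = doc.internal.pageSize.getHeight();
    pageSlices(canvas.width, canvas.height, pageW, pageH).forEach((slice, i) => {
      // Copy one page-sized strip of the big canvas into its own canvas.
      const part = document.createElement('canvas');
      part.width = canvas.width;
      part.height = slice.height;
      part.getContext('2d').drawImage(
        canvas, 0, slice.y, canvas.width, slice.height,
        0, 0, canvas.width, slice.height
      );
      if (i > 0) doc.addPage();
      doc.addImage(part, 'PNG', 0, 0, pageW, slice.height * (pageW / canvas.width));
    });
    return doc.output('blob');
  } finally {
    container.remove(); // always clean up the temporary DOM node
  }
}
```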

SheetJS — Excel Conversion

SheetJS reads and writes spreadsheet files in the browser. It supports .xlsx, .xls, .csv, and more. I use it for the Excel-to-PDF converter, where each sheet is rendered to an HTML table and then captured as a PDF page.
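
The first half of that conversion, turning each sheet into an HTML table, might look like the sketch below. workbookToHtmlTables is a hypothetical helper name, and XLSX (the SheetJS namespace object) is passed in as a parameter rather than read from a global.

```javascript
// Read a spreadsheet and return one HTML table string per sheet.
function workbookToHtmlTables(XLSX, arrayBuffer) {
  const workbook = XLSX.read(new Uint8Array(arrayBuffer), { type: 'array' });
  return workbook.SheetNames.map((name) => ({
    name,
    html: XLSX.utils.sheet_to_html(workbook.Sheets[name]),
  }));
}
```

Each returned html string can then be placed in the page and captured as one PDF page.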

Handling Large Files

One challenge with browser-based processing is memory. A page can use only part of the device’s RAM because browsers enforce per-tab limits, and large PDFs (50 MB and up) can hit those limits, especially once pages are decoded and rendered to canvases.

Strategies I use to mitigate this:

Process one page at a time: Rather than loading the entire PDF into a canvas upfront, I render each page individually, capture the result, and free the canvas before moving to the next
Use URL.createObjectURL instead of base64: Base64-encoded file data is ~33% larger. Object URLs reference the raw binary blob without encoding overhead
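Revoke object URLs when done: a blob stays in memory for as long as an object URL points at it, so calling URL.revokeObjectURL(url) once the URL is no longer needed lets the browser release it instead of holding it until the page is closed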
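
The page-at-a-time strategy can be sketched like this. It is a minimal illustration, not the toolset's actual code: renderPagesOneByOne is a hypothetical name, onPage is whatever the caller does with each rendered page, and createCanvas exists only so the canvas factory can be swapped out.

```javascript
// Render each page, hand the canvas to the caller, then free it before moving on.
async function renderPagesOneByOne(pdf, onPage, options = {}) {
  const scale = options.scale ?? 1.5;
  const createCanvas = options.createCanvas ?? (() => document.createElement('canvas'));

  for (let n = 1; n <= pdf.numPages; n++) {
    const page = await pdf.getPage(n);
    const viewport = page.getViewport({ scale });

    const canvas = createCanvas();
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    await page.render({ canvasContext: canvas.getContext('2d'), viewport }).promise;

    await onPage(canvas, n); // e.g. convert the canvas to a Blob and store it

    // Shrinking the canvas to 0x0 lets the browser drop its pixel buffer.
    canvas.width = 0;
    canvas.height = 0;
    page.cleanup(); // release resources PDF.js allocated for this page
  }
}
```

Only one full-size canvas is alive at any time, so peak memory depends on the largest page rather than on the page count.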

Triggering File Downloads

After processing, the result needs to be downloaded. The simplest approach is to create a temporary link and simulate a click:

function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = filename;
  a.click();
  // Revoke on a later tick so the browser has started the download first.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

This works across modern browsers without any server involvement.

Service Worker Caching for Offline Use

The toolset is part of a Progressive Web App. The service worker pre-caches all the library scripts so the tools work offline after the first visit. The CDN libraries (PDF.js, pdf-lib, etc.) are cached alongside the HTML and CSS.

This means a user who has visited the tools once can open the PDF merger on an airplane without internet access and it will still work, because the browser already has everything it needs stored locally.
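
A pre-caching service worker along these lines might look like the sketch below. The file name sw.js, the cache name, the URL list, and the cacheFirst helper are assumptions made for illustration; the real worker's contents are not shown in this article.

```javascript
// sw.js (hypothetical file name). Cache name and URL list are illustrative.
const CACHE_NAME = 'toolset-v1';
const PRECACHE_URLS = [
  '/playground/tools/',
  'https://cdn.jsdelivr.net/npm/pdf-lib/dist/pdf-lib.min.js',
];

// Cache-first lookup: answer from the cache, fall back to the network.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  return cached || fetchFn(request);
}

// Register the handlers only when running as a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  // Download everything the tools need while the user is still online.
  self.addEventListener('install', (event) => {
    event.waitUntil(
      caches.open(CACHE_NAME).then((cache) => cache.addAll(PRECACHE_URLS))
    );
  });

  // Serve cached copies first so the tools keep working offline.
  self.addEventListener('fetch', (event) => {
    event.respondWith(
      caches.open(CACHE_NAME).then((cache) =>
        cacheFirst(event.request, cache, (request) => fetch(request))
      )
    );
  });
}
```

The page would register it once with navigator.serviceWorker.register('/sw.js'), adjusting the path to wherever the file is served from.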

Subresource Integrity

All CDN-loaded scripts include Subresource Integrity (SRI) hashes:

<script src="https://cdn.jsdelivr.net/npm/pdf-lib/dist/pdf-lib.min.js"
        integrity="sha384-HASH_HERE"
        crossorigin="anonymous"></script>

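The browser verifies the script’s cryptographic hash before executing it. If the CDN delivers tampered code, the hash no longer matches and the browser refuses to run it. For the check to stay stable, the URL should pin an exact package version: an unpinned URL like the one shown starts serving different bytes when a new release is published, and the script then fails the check. This is an important safeguard for any site that loads third-party scripts.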

The Full Toolset

The toolset currently includes tools for:

PDF organisation: Merge, split, reorder pages, rotate pages
PDF optimisation: Compress, convert colour to greyscale
PDF conversion: Convert to/from Word, Excel, PowerPoint (using pptxgenjs to generate slides from PDF page images), and images
PDF editing: Add watermarks, electronic signatures, page numbers
Image processing: Compress, resize, convert formats (JPG/PNG/WebP), rotate, crop, add watermarks, blur faces, remove backgrounds
Developer utilities: JSON formatter, JWT decoder, regex tester, hash generator, UUID generator, Base64 encoder, URL encoder, timestamp converter, case converter, word counter, lorem ipsum generator, QR code generator, percentage calculator, colour converter, CSS meme generator

All tools share the same privacy guarantee: your data never leaves your browser.

What’s Next

A few tools I’d like to add:

PDF OCR: Extracting text from scanned PDFs requires running an OCR model in the browser via WebAssembly — projects like Tesseract.js make this feasible
Batch processing: Allow multiple files to be processed in sequence without user interaction for each one
PDF form filling: pdf-lib supports filling form fields, which would enable a browser-based form-fill tool

The browser is increasingly powerful enough to handle tasks that previously required a server. Client-side processing respects user privacy, reduces infrastructure costs, and can actually be faster than round-tripping to a remote server. I expect this trend to continue as WebAssembly and browser APIs mature.

Try the tools at /playground/tools/.
