
The Heart of FairScan: the image processing pipeline explained

September 30, 2025

How does FairScan work at its core? How does it transform an image captured by the phone's camera into the page of a PDF document that doesn’t just look like a photo? From the user’s point of view, it feels like a single transformation that happens in less than a second. Let’s look at the different steps involved, illustrated with a visual example.

Segmentation

The first step is image segmentation, which identifies the pixels that belong to a document. I trained a custom model specifically for images containing documents.

Here you can see the original image combined with the segmentation mask (the green overlay) produced by the model. Since the model was trained to identify only one "document" even if multiple are visible, you can see that only part of the second document in the picture is captured.
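A mask overlay like the one described above can be produced with a couple of lines of NumPy. This is a minimal sketch with placeholder data; the real mask comes from the segmentation model, and the tint color is just illustrative:

```python
import numpy as np

# Placeholder photo and segmentation mask (in the app, the mask
# comes from the custom segmentation model).
image = np.full((120, 160, 3), 90, dtype=np.uint8)
mask = np.zeros((120, 160), dtype=bool)
mask[30:90, 40:130] = True  # pretend this region is the document

# Blend a green tint over the pixels classified as "document".
overlay = image.copy()
green = np.array([0, 200, 0], dtype=np.float32)
overlay[mask] = (0.5 * image[mask] + 0.5 * green).astype(np.uint8)
```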

This is the most crucial step, the one that can feel a bit “magical.” I plan to keep improving the model so it detects documents more reliably in all kinds of conditions.

Contours

The next step is to detect the edges of the document based on the segmentation output. Edge detection is a well-known problem in image processing, and the Canny algorithm is widely used for this task.

In my first attempt at FairScan, I skipped segmentation and ran Canny directly on the captured image. That worked in some cases, but segmentation produces much better results when the background and the document have similar colors or when shadows are strong.

Quadrilateral

At this stage, we assume documents are rectangles, so we want to identify a quadrilateral. Your brain probably sees the document in the photo as a rectangle, but because of perspective distortion, it isn't one. The previous step may return several contours of arbitrary shapes. We approximate these contours as polygons using the Ramer–Douglas–Peucker algorithm, filter out those that don't have exactly 4 edges, and finally pick the one with the largest area.

Perspective correction

The next step is to correct the perspective and crop the image. This involves applying a perspective transformation based on the quadrilateral from the previous step and mapping it to a target rectangle.

The most important part here is getting the aspect ratio right so that the document doesn't look distorted. For now, I use a simple heuristic based on the average lengths of opposite edges. It gives acceptable results as long as the photo wasn't taken from too oblique an angle.

Post-processing

Maybe you’re thinking: “the image on the left has a strange color.” That really is the color from the original photo (I promise). I guess that when looking at the photo of a document, the human brain tends to interpret the main color of the page as white, especially because it contrasts with the darker background. But it’s not actually white, and you don’t want your PDF to look like that.

So the last step adjusts parameters like brightness and contrast to bring the result closer to what you expect from a digital document.

This step struggles in low-light conditions and doesn’t always correctly identify when a document should be treated as grayscale. There’s definitely room for improvement here.


That's it: these steps together form the heart of FairScan, and they all happen very quickly. In fact, the first steps already run live on the camera preview to display the quadrilateral, so the user immediately sees what FairScan detects as the document.

This pipeline combines several steps, most of them relying on well-established algorithms implemented in open-source libraries. Before starting this project, I knew nothing about image processing. It took me quite some time to tune this pipeline, but I learned a lot along the way. And I know each of these steps can still be improved to automatically produce cleaner PDFs. From the user’s perspective, it will continue to feel effortless.