
Making document detection more reliable

November 12, 2025

I released the first public version of FairScan about two months ago. It aims to be very simple, so that you can get a PDF in just a few seconds. To achieve that, the app relies heavily on automatic processing... but that's easier said than done. The very first review I got on the Play Store read something like: "It doesn't work, 'no document detected' on every page." Not exactly what I had hoped for, but a good reminder that automatic document detection isn't just one feature among others: it's the feature that can make or break the whole experience. If detection fails, the app fails.

When I released FairScan, the detection system seemed good enough, but I knew it wasn't perfect. Some cases were particularly difficult:

- photos containing more than one document;
- documents that aren't perfect quadrilaterals, because they are stapled, folded, or missing a corner;
- document types that were underrepresented in my training data, such as textbooks and magazines.

In those situations, the app often displayed "No document detected". If that happens on every page, there's only one conclusion: the app doesn't work. I had to do something.


Improving FairScan's segmentation model

In a previous article, I explained FairScan's image processing pipeline: how it turns a photo into the image that ends up in the PDF. The very first step is the segmentation model, which detects which pixels in the photo belong to the document.

That can be tricky when there's more than one document in the image. The pixels of one document look just like those of another, so the model may treat them as one big blob, either producing a strange merged shape or failing to detect any quadrilateral at all. From the start, I worked around that limitation by training my segmentation model to identify only the main document in an image. It worked in many cases, but not always.

To improve this, I tried an instance segmentation model (YOLO 11). Unlike a semantic segmentation model such as DeepLabV3+, instance segmentation can distinguish several separate documents within the same image. I knew it would be more complex to integrate, but I also knew it could handle those situations better.

[Figure: semantic vs. instance segmentation]
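To make the difference concrete, here's a rough sketch in Python. The model names, weights file, and image path are placeholders, not FairScan's actual models: a semantic model such as torchvision's DeepLabV3 outputs one class label per pixel, while an instance model such as YOLO11-seg (via the ultralytics package) outputs one mask per detected object.

```python
# Illustrative sketch only; weights and file names are placeholders.
import torch
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large
from ultralytics import YOLO

photo = torch.rand(1, 3, 512, 512)  # stand-in for a camera frame

# Semantic segmentation: every "document" pixel gets the same class label,
# so two documents lying side by side can merge into a single blob.
semantic = deeplabv3_mobilenet_v3_large(num_classes=2).eval()
with torch.no_grad():
    blob = semantic(photo)["out"].argmax(dim=1)  # (1, H, W): one shared mask

# Instance segmentation: one mask per detected document, so two documents
# stay separate even when they touch.
instance = YOLO("yolo11n-seg.pt")
results = instance("photo.jpg")
masks = results[0].masks
if masks is not None:
    print(f"{masks.data.shape[0]} separate documents detected")
```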

To train it properly, I had to revisit my dataset and annotate every secondary document visible in the images. That raised an interesting question: what counts as a "document"? If only half a document appears, does it qualify? What about a tiny corner, or a blank page? Those choices could affect the model's behavior, so I removed ambiguous images entirely. From roughly 400 images, I went down to about 300.

I trained the YOLO model on this dataset, and the results were promising, but it was clear I needed more data. So I added new types of documents that were underrepresented, such as textbooks and magazines. That took time (as I describe in another post), but with around 600 images, both the semantic and instance segmentation models improved noticeably. The new models perform better across document types and handle multiple documents more reliably. Still, the instance segmentation model wasn't consistently better at separating documents, and integrating it into the Android app would have required substantial work. For now, I've chosen to keep the simpler semantic segmentation model in production.

[Figure: segmentation model 1.4 vs 1.5]
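For reference, training a YOLO11 segmentation model on a custom dataset takes only a few lines with the ultralytics package. This is a minimal sketch: the dataset file and hyperparameters below are made up for the example, not my actual setup.

```python
# Minimal training sketch with the ultralytics package.
from ultralytics import YOLO

model = YOLO("yolo11n-seg.pt")  # pretrained segmentation checkpoint

model.train(
    data="documents.yaml",  # hypothetical: image paths + a "document" class
    epochs=100,
    imgsz=640,
)

metrics = model.val()  # evaluate mask quality on the validation split
print(metrics.seg.map)  # mask mAP
```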


Making the system more robust to non-quadrilateral documents

The segmentation model is crucial, but it's not the whole story. From its output, the app must derive a quadrilateral that can be mapped to the final rectangular image. But what if a document is stapled, folded, or missing a corner? Then the segmentation output might be a polygon with 5 sides, not 4. And if the app can't handle that, the user gets another "No document detected".
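To illustrate this step, here's a minimal OpenCV sketch of how a binary mask becomes a simplified polygon. The simplification tolerance is an arbitrary choice for the example, not the value FairScan uses.

```python
# Sketch: from a binary segmentation mask to a simplified outline.
import cv2
import numpy as np

def mask_to_polygon(mask: np.ndarray) -> np.ndarray | None:
    """Return the simplified outline of the largest blob in a binary mask."""
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    if not contours:
        return None  # nothing detected at all
    contour = max(contours, key=cv2.contourArea)
    # Simplify the contour with a tolerance proportional to its perimeter.
    epsilon = 0.02 * cv2.arcLength(contour, closed=True)
    return cv2.approxPolyDP(contour, epsilon, closed=True)

# A 4-point result maps cleanly to a rectangle; 5 or more points means the
# document isn't a clean quadrilateral and something else has to happen.
```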

I tried training a regression model to predict a quadrilateral from the segmentation mask. In theory, that could handle missing corners or imperfect masks. I extended my dataset with "expected" quadrilaterals and trained the model accordingly. The results were mixed: not bad, but not good enough. Excluding poor segmentation cases didn't help much either. After several attempts, I realized I was overcomplicating the problem and decided to move on.
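For the curious, here's the kind of regressor I mean: a tiny CNN that takes the mask as input and predicts four normalized corners. The architecture below is purely illustrative, not the one I actually trained.

```python
# Illustrative mask-to-quadrilateral regressor; architecture is an assumption.
import torch
import torch.nn as nn

class QuadRegressor(nn.Module):
    """Predicts 4 (x, y) corners, normalized to [0, 1], from a binary mask."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, 8), nn.Sigmoid())

    def forward(self, mask: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(mask)).view(-1, 4, 2)

model = QuadRegressor()
corners = model(torch.rand(1, 1, 128, 128))  # stand-in mask
print(corners.shape)  # torch.Size([1, 4, 2])
```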

[Figure: FairScan 1.4 vs 1.5 on a document missing a corner]

I switched back to more traditional image processing and designed new fallback algorithms that activate when the default detection fails. These aren't well-known techniques, just ideas of my own for the particular problem I had to solve. One is purely geometrical: when the detected polygon has more than 4 sides, it searches for 3 consecutive angles that are almost right angles and builds a quadrilateral from them (sketched below). Another, more basic one simply constructs a quadrilateral around the detected contour: it's not ideal, but less frustrating than getting an error message.
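Here's a sketch of the geometric fallback. The search for three consecutive near-right angles follows the description above; how the fourth corner is recovered (by intersecting the two outer edges) is just one reasonable construction, not necessarily what the app does.

```python
# Sketch of the geometric fallback: in a polygon with more than 4 vertices,
# find 3 consecutive near-right angles and rebuild a quadrilateral from them.
import numpy as np

def angle_at(prev: np.ndarray, pt: np.ndarray, nxt: np.ndarray) -> float:
    """Interior angle at `pt`, in degrees."""
    u, v = prev - pt, nxt - pt
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def line_intersection(p1, d1, p2, d2):
    """Intersection of the lines p1 + t*d1 and p2 + s*d2."""
    a = np.array([d1, -d2], dtype=float).T
    t, _ = np.linalg.solve(a, p2 - p1)  # raises if the lines are parallel
    return p1 + t * d1

def quad_from_right_angles(poly: np.ndarray, tol: float = 15.0):
    """poly: (N, 2) vertices with N > 4. Returns (4, 2) corners or None."""
    poly = np.asarray(poly, dtype=float)
    n = len(poly)
    angles = [angle_at(poly[i - 1], poly[i], poly[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        trio = [(i - 1) % n, i, (i + 1) % n]
        if all(abs(angles[j] - 90.0) < tol for j in trio):
            a, b, c = poly[trio[0]], poly[trio[1]], poly[trio[2]]
            before, after = poly[(i - 2) % n], poly[(i + 2) % n]
            # Fourth corner: extend the edge entering a and the edge leaving c.
            d = line_intersection(a, a - before, c, c - after)
            return np.array([a, b, c, d])
    return None  # no run of three near-right angles found
```

The more basic fallback could be as simple as something like OpenCV's cv2.minAreaRect around the detected contour, though I can only guess at the exact construction the app uses.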


The dreadful "No document detected" should now be much rarer. Over the past few weeks, I've explored several approaches: some worked, some didn't, but all contributed to a better understanding of the problem. Expanding the dataset and adding new fallback algorithms made FairScan significantly better at detecting documents of all kinds. It's still not perfect, but it's a big step closer. Try it and see for yourself.