
A data-driven approach to color detection in FairScan

December 14, 2025

FairScan's goal is to make it fast and easy for users to get a PDF. To achieve that, the app handles all image processing steps automatically. As a user, you have implicit expectations about the result. One of them is simple: if you see colors on your document, you expect a color PDF. Otherwise, you expect a grayscale one.

That sounds easy. But with FairScan 1.7, I could see for myself that the app was not doing a great job of it, and comments from users confirmed it. I needed to improve the automatic distinction between color and grayscale documents. However, if I changed anything in this processing, how could I know whether the result was actually better, or worse, than before?

Data to the rescue

There is an approach that works really well for this kind of problem: benchmarking. The idea is to use a dataset, with inputs and expected outputs, and to measure progress against it.

Getting a dataset can be a big task on its own. But I had already built a dataset to train the segmentation model, which I described in a previous post. It contains more than 600 images of over 200 different documents. It did not yet include the ground truth for this task, that is, whether each document is grayscale or in color, but that was something I could complete in less than one hour.

I also had to write a small script to compute a score for the current implementation. Again, that did not take long. Here is the initial result.

Initial score:

What matters most is that getting this score is easy and fast: one click and about one minute on my computer. I could certainly make it faster, but it is already good enough to iterate quickly.
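
To give an idea of what such a script looks like, here is a minimal sketch of a benchmark runner. It assumes the dataset is a folder of images plus a CSV file mapping each file name to a ground-truth label; the paths, the CSV format, and the detector callable are placeholders, not FairScan's actual code.

```python
import csv
from pathlib import Path
from typing import Callable

import cv2
import numpy as np

def run_benchmark(detector: Callable[[np.ndarray], bool],
                  image_dir: Path, labels_csv: Path) -> float:
    # hypothetical CSV format: one row per image, columns "file" and "label"
    with open(labels_csv, newline="") as f:
        truth = {row["file"]: row["label"] == "color"
                 for row in csv.DictReader(f)}
    correct = 0
    for name, expected in truth.items():
        image = cv2.imread(str(image_dir / name))
        correct += detector(image) == expected
    return correct / len(truth)  # accuracy over the whole dataset
```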

First improvement

The algorithm used in FairScan 1.7 to detect colors is simple. It counts the pixels whose chroma, a measure of colorfulness, exceeds a given threshold, then compares the proportion of such pixels against a second threshold.
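
In code, the idea looks roughly like the sketch below. FairScan works in the CIELAB color space, as described later in this post; the threshold values here are illustrative guesses, not the app's actual parameters.

```python
import cv2
import numpy as np

CHROMA_THRESHOLD = 12.0       # illustrative value, not FairScan's
PROPORTION_THRESHOLD = 0.01   # illustrative value, not FairScan's

def is_color_document(image_bgr: np.ndarray) -> bool:
    # OpenCV stores a and b with a 128 offset for 8-bit images;
    # chroma is the distance from the neutral gray axis.
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    chroma = np.hypot(lab[:, :, 1] - 128.0, lab[:, :, 2] - 128.0)
    colored = np.count_nonzero(chroma > CHROMA_THRESHOLD)
    return colored / chroma.size > PROPORTION_THRESHOLD
```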

I had already identified one low-hanging fruit. Instead of considering all pixels inside the quadrilateral of the detected document, the algorithm should exclude pixels that are not part of the document according to the segmentation model. This may look like a subtle change, but it improves all cases where the document appears slightly concave, which is very common, or where a corner is missing because it is torn or stapled.
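
Restricting the count to the segmentation mask is a small change to the sketch above. Here the mask is assumed to be a uint8 array of the same size as the image, nonzero on document pixels.

```python
import cv2
import numpy as np

def is_color_document(image_bgr: np.ndarray, mask: np.ndarray,
                      chroma_threshold: float = 12.0,
                      proportion_threshold: float = 0.01) -> bool:
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    chroma = np.hypot(lab[:, :, 1] - 128.0, lab[:, :, 2] - 128.0)
    document_chroma = chroma[mask > 0]  # keep document pixels only
    colored = np.count_nonzero(document_chroma > chroma_threshold)
    return colored / document_chroma.size > proportion_threshold
```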

This change was easy to implement, and the score immediately improved to 90.6%.

Overall strategy

Improving the score is a good start, but 90.6% is still far from 100%. That does not make the change useless; it simply means more work is needed. So what is the strategy for approaching the problem?

Here is the process I follow:

  1. Build a benchmark
  2. Look at individual failed cases to understand why the score is low
  3. Find a way to improve some of these cases
  4. Validate the change against the benchmark
  5. Iterate, going back to step 2

At this point, one improvement was validated. It was time to iterate.

Iterating

When looking at documents incorrectly detected as colored, it quickly became clear that many cases were related to lighting conditions. A black-and-white document photographed under dim light often appears slightly blue. With artificial light, it can appear slightly yellow or even reddish.

One possible workaround is to apply a white balance algorithm, such as the grey world assumption. I first applied it to the whole image and checked the benchmark. The score dropped. In other words, validation failed.

Looking at the new incorrect cases revealed a clear pattern. When the background was strongly colored, for example a red sofa taking most of the frame, the grey world algorithm compensated for it and made the document turn slightly blue. That was clearly not what I wanted.

The fix was to apply the white balance algorithm only to the document area. This time, the benchmark score increased to 91.7%. Validation succeeded, and the iteration could continue.
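
For reference, here is a minimal sketch of the grey world assumption restricted to the document, assuming the same segmentation mask as before: each channel is scaled so that the average color over the document pixels becomes neutral gray.

```python
import numpy as np

def grey_world_on_document(image_bgr: np.ndarray, mask: np.ndarray) -> np.ndarray:
    img = image_bgr.astype(np.float32)
    doc = img[mask > 0]           # (N, 3) document pixels only
    means = doc.mean(axis=0)      # per-channel means over the document
    gains = means.mean() / means  # scale each channel toward gray
    return np.clip(img * gains, 0, 255).astype(np.uint8)
```

Because the gains are computed from document pixels only, a strongly colored background no longer skews the correction.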

For color detection, FairScan relies on the CIELAB color space, which was designed to better match human perception than RGB. One interesting observation is that many pixels in a document are either very dark or very bright. For such pixels, humans barely perceive any color. I therefore tried filtering out pixels with very low or very high luminance, the L component in CIELAB. This resulted in a small improvement, bringing accuracy to 91.9%.
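
The filter itself is a small change to the pixel selection. In OpenCV's 8-bit LAB representation, L is scaled to the 0..255 range, so the cutoffs below are illustrative values in that range, not FairScan's actual parameters.

```python
import cv2
import numpy as np

def document_chroma_values(image_bgr: np.ndarray, mask: np.ndarray,
                           l_min: float = 30.0,
                           l_max: float = 225.0) -> np.ndarray:
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    luminance = lab[:, :, 0]
    chroma = np.hypot(lab[:, :, 1] - 128.0, lab[:, :, 2] - 128.0)
    # ignore near-black and near-white pixels, where humans barely see color
    keep = (mask > 0) & (luminance > l_min) & (luminance < l_max)
    return chroma[keep]
```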

When inspecting chroma heat maps for incorrectly processed grayscale documents, I noticed that chroma values were sometimes high near the edges of the document. This is explained by the fact that the segmentation mask is not perfect, and some background pixels are incorrectly included. Because the proportion threshold for colored pixels must be low to detect documents with only small colored areas, these few pixels can lead to false positives.

The fix here was straightforward. I removed a small number of pixels along the border of the detected document. Not too many, so that documents that are only colored on the border can still be detected correctly. With this change, the score rose to 94.3%.
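
One way to implement this is a morphological erosion of the mask; the border width here is an illustrative choice, not the value used in the app.

```python
import cv2
import numpy as np

def shrink_mask(mask: np.ndarray, border_px: int = 5) -> np.ndarray:
    # erosion removes roughly border_px pixels along the mask boundary
    kernel = np.ones((2 * border_px + 1, 2 * border_px + 1), np.uint8)
    return cv2.erode(mask, kernel)
```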

Parameters

At the end of each iteration, I adjusted the algorithm parameters to maximize the benchmark score. The main parameters are the chroma threshold and the proportion threshold for colored pixels. Changing them is like moving a slider that favors either color or grayscale detection.

A different dataset would likely lead to slightly different values. However, I believe these values would not change significantly with another dataset that has a good variety of images.
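
Since the benchmark is cheap to run, this tuning can be as simple as a small grid search. The value ranges below are placeholders, and run_benchmark stands for the scoring described earlier, here taking the two thresholds as arguments.

```python
import numpy as np

def tune_thresholds(run_benchmark) -> tuple[float, tuple[float, float]]:
    best_score, best_params = 0.0, (0.0, 0.0)
    for chroma_t in np.arange(6.0, 20.0, 1.0):
        for proportion_t in np.arange(0.002, 0.03, 0.002):
            score = run_benchmark(chroma_t, proportion_t)
            if score > best_score:
                best_score, best_params = score, (chroma_t, proportion_t)
    return best_score, best_params
```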

The end of it

After implementing several improvements, I did not reach 100% accuracy. When I look at the remaining incorrect cases, I think some of them could still be fixed, but I am running out of ideas that would improve them without lowering the overall score.

The last few percent are clearly harder to get and are probably not where I should focus right now, so I decided to stop here. I know with confidence that FairScan 1.8.0 performs better than all previous versions.


The automatic distinction between color and grayscale documents is probably the part of FairScan that fits best with a benchmark-driven approach. The expected result is binary, color or grayscale, which makes it easy to evaluate and quantify.

This kind of benchmarking is also especially useful when the goal is to match human perception. Detecting whether a document should be considered colored sounds trivial, but it is not. A benchmark helps validate, or invalidate, that a given change is a real improvement. I will definitely rely on this approach again for other steps of FairScan's automatic processing, so that users can get good-looking PDFs in most cases by doing nothing more than clicking a button.