Benford's Law - Analysis and Example


In 1938, physicist Frank Benford made a fascinating discovery that would later revolutionize fraud detection and data analysis. He noticed that in naturally occurring numbers, the first digit isn't randomly distributed as one might expect. Instead, it follows a peculiar logarithmic pattern now known as Benford's Law.

Benford's Law Formula

According to this law, in many real-world datasets the digit 1 appears as the first digit about 30.1% of the time, while 9 appears only 4.6% of the time. This counter-intuitive distribution shows that lower digits are far more likely to appear as the first digit than higher ones. The complete distribution predicts that the digits 1 through 9 appear as first digits with the following frequencies: 30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1%, and 4.6%, according to the formula:

P(d) = \log_{10}\left( 1 + \frac{1}{d} \right),

where P(d) is the probability that the leading digit is d.
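As a quick sanity check, here is a short Python sketch that evaluates the formula for each digit and reproduces the percentages listed above:

```python
import math

# Expected first-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.1%}")
# 1: 30.1%, 2: 17.6%, ..., 9: 4.6%
```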

Benford's Law Explanation

But why does this pattern emerge? The explanation lies in the way natural processes grow. Consider a city's population that grows by a fixed percentage each year. To increase from 1,000 to 2,000 residents (starting with digit 1 to starting with digit 2), the population needs to double. However, to go from 9,000 to 10,000 residents, it only needs to increase by about 11%. This means numbers tend to spend more time starting with lower digits during natural growth processes.
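A small simulation makes this concrete. The sketch below is a minimal illustration (the 7% growth rate and the number of steps are arbitrary choices on my part, not taken from the example above): it grows a value by a fixed percentage and tallies the leading digit at each step, and the lower digits end up dominating in roughly Benford proportions.

```python
from collections import Counter

# Grow a value by a fixed percentage each step and record its leading digit.
# The 7% growth rate and 10,000 steps are arbitrary illustrative choices.
value = 1.0
counts = Counter()
for _ in range(10_000):
    counts[f"{value:e}"[0]] += 1  # first char of scientific notation = leading digit
    value *= 1.07

total = sum(counts.values())
for digit in sorted(counts):
    print(f"{digit}: {counts[digit] / total:.1%}")
```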

This mathematical principle appears in an astonishing variety of datasets: stock prices, river lengths, building heights, molecular weights, and even social media statistics. However, Benford's Law only applies to natural, unmanipulated data spanning several orders of magnitude. Data that is artificially constrained or manipulated often deviates from this pattern.

This property makes Benford's Law a powerful tool for detecting potential fraud or data manipulation. When people fabricate numbers, they tend to distribute digits more uniformly, unconsciously creating patterns that deviate from Benford's Law. This principle has been successfully used in forensic accounting, election auditing, and scientific fraud detection.
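In practice, one common way to apply this is to compare a dataset's observed first-digit frequencies with the Benford expectations, for example via a chi-square statistic. The sketch below is one possible minimal implementation (the function names are mine, not taken from any particular forensic tool):

```python
import math
from collections import Counter

def first_digit(x: float) -> int:
    """Return the leading non-zero digit of a non-zero number."""
    return int(f"{abs(x):e}"[0])  # e.g. '3.140000e+02' -> 3

def benford_chi_square(values) -> float:
    """Chi-square statistic comparing observed first digits to Benford's Law."""
    digits = [first_digit(v) for v in values if v != 0]
    observed = Counter(digits)
    n = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (observed.get(d, 0) - expected) ** 2 / expected
    return stat

# A large statistic (judged against the chi-square distribution with
# 8 degrees of freedom) suggests the data deviates from Benford's Law.
```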

Benford's Law Example

Let's see Benford's Law in action using a real-world example: Adobe's 10-K for fiscal year 2024, analyzed with a Chrome extension called Benford's Law Analyzer. The extension finds all number sequences in the financial report and automatically calculates their first-digit distribution:

Benford's law for Adobe 10-K no filtering

This does not seem to follow Benford's Law at all, and given what was said above, that could set off alarm bells. Is Adobe committing financial fraud right before our eyes? If we take a closer look at the 10-K, we see that a large share of the number sequences are years such as 2023 and 2024. The Chrome extension lets us filter out number sequences, and if we filter out all years from 2016 to 2026 we instead get this distribution:

Benford's law for Adobe 10-K with years filtered out

Now the distribution looks much closer to what you would expect. It is still not perfect, but the remaining deviation may well have another benign explanation, just as the years did. The lesson is that you need to know and understand the dataset you are working with, and check whether certain number sequences occur at an unusual rate compared to the rest of the data.
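If you want to reproduce this kind of analysis without the extension, you can extract the number sequences from the report text yourself, drop year-like values, and tally the first digits. A rough Python sketch, where the regular expression and the 2016-2026 year range are my assumptions rather than the extension's actual behaviour:

```python
import re
from collections import Counter

def first_digit_distribution(text: str, exclude_years=range(2016, 2027)) -> dict:
    """Tally leading digits of number sequences in text, skipping year-like values."""
    tokens = re.findall(r"\d[\d,]*(?:\.\d+)?", text)  # e.g. '1,234.56', '2024'
    years = {str(y) for y in exclude_years}
    counts = Counter()
    for token in tokens:
        if token in years:
            continue  # drop year-like sequences
        digits = token.replace(",", "")
        if digits[0] == "0":
            continue  # skip values with a leading zero, e.g. '0.5'
        counts[digits[0]] += 1
    total = sum(counts.values()) or 1
    return {d: counts.get(d, 0) / total for d in "123456789"}

# Usage: first_digit_distribution(report_text) returns observed frequencies
# that can be compared against the Benford percentages above.
```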

This simple yet powerful principle continues to find new applications in our data-driven world. From detecting financial fraud to validating scientific research, Benford's Law is a quick and effective way to spot potential anomalies in numerical data. While deviation from Benford's Law doesn't necessarily prove manipulation - there are many legitimate reasons why data might not follow this pattern - it provides a valuable first step in identifying datasets that might deserve closer scrutiny.
