Benford's Law, also known as the First-Digit Law, is a mathematical principle that describes the frequency distribution of leading digits in many real-world sets of numerical data. Named after physicist Frank Benford, who published his observations in 1938, the law states that in many naturally occurring collections of numbers the leading digit is more likely to be small than large. This contradicts the intuitive expectation that each of the digits 1 through 9 would have an equal probability of 1/9 (approximately 11.1%) of being the first digit.
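According to Benford's Law, in sets of numbers from many real-life sources of data, the leading digit is 1 about 30.1% of the time, 2 about 17.6% of the time, and so on with decreasing frequency, down to 9, which leads only about 4.6% of the time.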
This counterintuitive distribution applies to a wide variety of data sets, including lengths of rivers, populations of cities, stock market prices, and figures in financial reports. Benford's Law has found applications in various fields, from detecting financial fraud to analyzing scientific data and election results.
The contrast between the expected uniform distribution (1/9 for each digit) and the actual distribution observed in many real-world datasets highlights the importance of this principle in identifying potentially manipulated or artificially generated data.
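To make the contrast with the uniform 1/9 concrete, here is a minimal Python sketch that tabulates the Benford probabilities. It assumes the standard closed form $P(d)=\log_{10}(1+1/d)$, which has not been derived at this point in the text; the function name `benford_probability` is purely illustrative.

```python
import math

def benford_probability(d: int) -> float:
    """Probability that the leading digit equals d, assuming P(d) = log10(1 + 1/d)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Benford probability for each possible leading digit, next to the uniform guess.
benford = {d: benford_probability(d) for d in range(1, 10)}
uniform = 1 / 9

for d, p in benford.items():
    print(f"digit {d}: Benford {p:.1%}   uniform {uniform:.1%}")
```

The nine probabilities sum to 1 because the arguments of the logarithms telescope: $\frac{2}{1}\cdot\frac{3}{2}\cdots\frac{10}{9}=10$.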
In general, any positive quantity $L$ can be written in the form
$$ L=x\times 10^k $$
where $1\le x<10$ and $k$ is an integer. Now let $P(x)$ be the probability density of $x$, so that $P(x)\,dx$ is the probability that $x$ lies between the values $x$ and $x+dx$. The normalisation condition gives
$$ \int_1^{10}P(x)\ dx=1. $$
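The decomposition $L=x\times 10^k$ used above can be sketched in a few lines of Python. This is an illustrative helper rather than part of the derivation; the names `decompose` and `leading_digit` are hypothetical, and the sketch assumes positive inputs of ordinary magnitude.

```python
import math

def decompose(L: float) -> tuple[float, int]:
    """Return (x, k) such that L = x * 10**k with 1 <= x < 10, for L > 0."""
    if L <= 0:
        raise ValueError("L must be positive")
    k = math.floor(math.log10(L))
    x = L / 10**k
    # Guard against floating-point rounding pushing x just outside [1, 10).
    if x >= 10:
        x /= 10
        k += 1
    elif x < 1:
        x *= 10
        k -= 1
    return x, k

def leading_digit(L: float) -> int:
    """First significant digit of L, i.e. the integer part of x."""
    x, _ = decompose(L)
    return int(x)

x, k = decompose(6371.0)
print(x, k, leading_digit(6371.0))
```

The leading digit depends only on $x$, which is why the argument works with a density $P(x)$ on the interval from 1 to 10 and ignores the exponent $k$.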
Let $\lambda$ be a scaling factor. For example, if $x$ is in miles, then $x\rightarrow\lambda x$ converts it to km when $\lambda\approx 1.609$. The key insight behind Benford’s law is that we require $P$ to be invariant under this change of scale. In other words, whichever units we use, the probabilities of the first digits should come out the same.
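The scale-invariance requirement can be checked numerically. The sketch below is a rough illustration under stated assumptions: the "distances" are synthetic values drawn log-uniformly over six decades (not real data), the names `first_digit` and `digit_frequencies` are hypothetical, and the Benford column uses the standard closed form $\log_{10}(1+1/d)$.

```python
import math
import random
from collections import Counter

def first_digit(value: float) -> int:
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{value:.15e}"[0])

def digit_frequencies(values: list[float]) -> dict[int, float]:
    """Relative frequency of each leading digit 1..9 in the sample."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: synthetic distances in miles, log-uniform between 1 and 10**6.
miles = [10 ** rng.uniform(0, 6) for _ in range(100_000)]

MILES_TO_KM = 1.609344  # the scaling factor lambda
km = [MILES_TO_KM * v for v in miles]

freq_miles = digit_frequencies(miles)
freq_km = digit_frequencies(km)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}: miles {freq_miles[d]:.3f}  km {freq_km[d]:.3f}  Benford {expected:.3f}")
```

Under these assumptions the two frequency columns should agree up to sampling noise, which is the behaviour the invariance condition demands of $P$.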
Mathematically, this invariance requires that
$$ \int_1^{10}P(\lambda x)d(\lambda x)=\int_1^{10}\lambda^{-1}P(x)\lambda dx=1 $$
in order to leave the probabilities normalised. From this, we can deduce
$$ P(\lambda x)=\lambda^{-1}P(x) $$
Differentiating both sides with respect to $\lambda$: