It is a commonly held belief in the Information Security community that an attacker performing reconnaissance against a set of computers can hide their own identity and location by mixing in a large number of packets with forged source addresses. Most Information Security professionals don't have a strong enough math background to realize that this technique, used simply, is flawed and thus defeatable.

Posit an attacker who generates a stream of traffic from their own source IP. To cover their tracks, for each packet in this stream they also generate a large number of similar or identical packets, each with a randomly chosen source IP address.

Here's where the math comes in. Using the linearity of expectation, we can calculate how many unique source IPs we would expect to see in a given amount of traffic if every source address were random.

Assume the attacker sends 10,000 randomly sourced packets per second, plus 1 true packet. Sampling across 30 seconds, we see 300,030 packets. If every one of them had a randomly chosen source, the probability of seeing any given IP address (assuming no pruning has occurred) would be:

$1 - (1 - 2^{-32})^{300{,}030} \approx 6.985 \times 10^{-5}$

(Think of addresses as boxes and packets as balls. This is one minus the probability that no ball lands in a given box. The probability that a single ball lands in some other box is $1 - 2^{-32}$, and that is raised to the number of balls.)

Linearity of expectation simply states that the expectation of a sum of random variables equals the sum of their expectations, whether or not the variables are independent. In this case, the expected number of unique IPs can be computed by summing, over every possible IP address, the probability that the address appears.
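The same argument can be written out with indicator variables. The symbols here are notation introduced for illustration, not taken from the original: $N = 2^{32}$ is the size of the address space, $n$ is the number of packets, $X_i$ indicates whether address $i$ appears, and $U$ is the number of unique addresses seen.

$$
\begin{aligned}
X_i &= \begin{cases} 1 & \text{if address } i \text{ appears at least once} \\ 0 & \text{otherwise} \end{cases},
\qquad E[X_i] = 1 - \left(1 - \tfrac{1}{N}\right)^{n} \\
E[U] &= E\!\left[\sum_{i=1}^{N} X_i\right] = \sum_{i=1}^{N} E[X_i]
      = N\left[1 - \left(1 - \tfrac{1}{N}\right)^{n}\right]
      \approx n - \frac{n(n-1)}{2N}
\end{aligned}
$$

The final approximation keeps the first two terms of the binomial expansion. For $n = 300{,}030$ the correction term $n(n-1)/(2N)$ is about 10.48, which is the expected number of packets "lost" to address collisions.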

$\text{Expected unique IPs} = 2^{32} \times \left[1 - (1 - 2^{-32})^{300{,}030}\right] \approx 300{,}019.52$
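The arithmetic is easy to check numerically. The following is a minimal Python sketch; the function and constant names are hypothetical, and `log1p`/`expm1` are used only to avoid rounding loss when working with a probability as small as $2^{-32}$.

```python
import math

ADDRESS_SPACE = 2 ** 32  # number of possible IPv4 source addresses


def prob_address_seen(num_packets: int, space: int = ADDRESS_SPACE) -> float:
    """Probability that one particular address appears at least once
    among num_packets uniformly random source addresses."""
    # Equivalent to 1 - (1 - 1/space) ** num_packets, but numerically stabler.
    return -math.expm1(num_packets * math.log1p(-1.0 / space))


def expected_unique(num_packets: int, space: int = ADDRESS_SPACE) -> float:
    """Expected number of distinct addresses, by linearity of expectation."""
    return space * prob_address_seen(num_packets, space)


p = prob_address_seen(300_030)
predicted = expected_unique(300_030)
print(f"per-address probability: {p:.4e}")
print(f"expected unique sources: {predicted:.2f}")
```

Note that multiplying the rounded figure $6.985 \times 10^{-5}$ by $2^{32}$ gives a slightly different result; the 300,019.52 value comes from carrying the unrounded probability through.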

You can see already that this prediction overshoots what we will actually observe, because 30 of our packets originate from a single IP address.

In fact, what we are likely to observe is about 299,990.5 unique IPs (299,989.5 unique randomly selected IPs, plus our one attacker), roughly 29 fewer than the all-random prediction.
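A quick simulation can illustrate the shortfall. This is a sketch under the scenario's assumptions (10,000 uniformly random decoy sources per second, 1 true packet per second, 30 seconds); the attacker address is a hypothetical value from the 192.0.2.0/24 documentation range, and the seed is arbitrary.

```python
import random

ADDRESS_SPACE_BITS = 32
DECOYS_PER_SECOND = 10_000
TRUE_PER_SECOND = 1
SECONDS = 30
ATTACKER_IP = 0xC0000201  # 192.0.2.1, a hypothetical attacker address


def simulate_unique_sources(seed: int = 1) -> int:
    """Count distinct source addresses in one simulated 30-second capture."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(SECONDS):
        for _ in range(DECOYS_PER_SECOND):
            seen.add(rng.getrandbits(ADDRESS_SPACE_BITS))  # forged source
        for _ in range(TRUE_PER_SECOND):
            seen.add(ATTACKER_IP)  # true source, repeated every second
    return len(seen)


total_packets = SECONDS * (DECOYS_PER_SECOND + TRUE_PER_SECOND)
space = 2 ** ADDRESS_SPACE_BITS

# What we would expect if all 300,030 sources were random.
all_random_prediction = space * (1 - (1 - 1 / space) ** total_packets)

observed = simulate_unique_sources()
print(f"predicted if all random: {all_random_prediction:.2f}")
print(f"observed unique sources: {observed}")
print(f"shortfall:               {all_random_prediction - observed:.2f}")
```

Any single run will vary by a few addresses because random collisions among the decoys fluctuate, but the observed count can never exceed 300,001 (300,000 decoys plus the attacker), so it always falls short of the all-random prediction.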

Now that the attacked party knows that there is useful, non-random data in the stream of packets, hunting and finding it is much easier.
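The text does not say how the hunt would proceed. One simple possibility, sketched here as an assumption rather than the author's method, is to count how often each source address appears: with uniformly random forging, an address appearing three or more times in this capture would be very unlikely, while the true source appears 30 times. The function name, the `min_count` threshold, and the attacker address are all hypothetical.

```python
import random
from collections import Counter


def repeated_sources(source_ips, min_count=3):
    """Return (address, count) pairs seen at least min_count times,
    most frequent first."""
    counts = Counter(source_ips)
    return [(ip, n) for ip, n in counts.most_common() if n >= min_count]


# Build a synthetic capture: 300,000 forged sources plus 30 true packets.
rng = random.Random(7)
attacker_ip = 0xC0000201  # 192.0.2.1, a hypothetical attacker address
capture = [rng.getrandbits(32) for _ in range(300_000)] + [attacker_ip] * 30
rng.shuffle(capture)

suspects = repeated_sources(capture)
for ip, n in suspects:
    print(f"{ip >> 24}.{(ip >> 16) & 255}.{(ip >> 8) & 255}.{ip & 255}: {n} packets")
```

An attacker who forges sources less naively (for example, by reusing decoy addresses) would not stand out under this simple count, which is why the flaw applies to the technique "used simply".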