From c54fc0b4f4ae7e7b5a6fd3f0bb3640f7563623be Mon Sep 17 00:00:00 2001 From: matt-wolff Date: Thu, 3 Mar 2022 17:31:34 -0800 Subject: [PATCH 1/4] Added Erlang chapter skeleton. --- bookOutline.hjson | 1 + chapters/part2/erlang/index.html | 271 +++++++++++++++++++++++++++++++ 2 files changed, 272 insertions(+) create mode 100644 chapters/part2/erlang/index.html diff --git a/bookOutline.hjson b/bookOutline.hjson index b8e52f05..cb8cd80a 100644 --- a/bookOutline.hjson +++ b/bookOutline.hjson @@ -48,6 +48,7 @@ "continuous":"Continuous Distribution", "uniform":"Uniform Distribution", "exponential":"Exponential Distribution", + "erlang":"Erlang Distribution", "normal":"Normal Distribution", "binomial_approx":"Binomial Approximation" }, diff --git a/chapters/part2/erlang/index.html b/chapters/part2/erlang/index.html new file mode 100644 index 00000000..ba489876 --- /dev/null +++ b/chapters/part2/erlang/index.html @@ -0,0 +1,271 @@ + +% rebase('templates/chapter.html', title="Poisson Distribution") + +

Poisson Distribution

+
+ +

A Poisson random variable gives the probability of a given number of events in a fixed interval of time (or space). It makes the Poisson assumption that events occur with a known constant mean rate and independently of the time since the last event. +

+ +<% + include('templates/rvCards/poisson.html') +%> + +

Poisson Intuition

+ +

In this section we show the intuition behind the Poisson derivation. It is both a great way to deeply understand the Poisson and good practice with Binomial distributions.

+ +

Let's work on the problem of predicting the chance of a given number of events occurring in a fixed time interval — the next minute. For example, imagine you are working on a ride sharing application and you care about the number of requests that come from a particular area. From historical data, you know that the average number of requests per minute is $\lambda = 5$. What is the probability of getting 1, 2, 3, etc. requests in a minute?

+ +

We could approximate a solution to this problem by using a binomial distribution! Let's say we split our minute into 60 seconds, and make each second an indicator Bernoulli variable — you either get a request or you don't. If you get a request in a second, the indicator is 1. Otherwise it is 0. Here is a visualization of our 60 binary indicators. In this example, imagine we have requests at 2.75 and 7.12 seconds. The corresponding indicator variables are the blue filled-in boxes:

+
1 minute
+
+
+ +

+ The total number of requests received over the minute can be approximated as the sum of the sixty indicator variables, which conveniently matches the description of a binomial — a sum of Bernoullis. Specifically define $X$ to be the number of requests in a minute. $X$ is a binomial with $n=60$ trials. What is the probability, $p$, of a success on a single trial? To make the expectation of $X$ equal the observed historical average $\lambda =5$ we should choose $p$ so that $\lambda = \E[X]$. + $$ + \begin{align} + \lambda &= \E[X] && \text{Expectation matches historical average} \\ + \lambda &= n \cdot p && \text{Expectation of a Binomial is } n \cdot p \\ + p &= \frac{\lambda}{n} && \text{Solving for $p$} + \end{align} + $$ + In this case since $\lambda=5$ and $n=60$, we should choose $p=5/60$ and state that $X \sim \Bin(n=60, p=5/60)$. Now that we have a form for $X$ we can answer probability questions about the number of requests by using the Binomial PMF: + + $$\p(X = x) = {n \choose x} p^x (1-p)^{n-x}$$ +

So for example:
+ $$\p(X=1) = {60 \choose 1} (5/60)^1 (55/60)^{60-1} \approx 0.0295$$ + $$\p(X=2) = {60 \choose 2} (5/60)^2 (55/60)^{60-2} \approx 0.0790$$ + $$\p(X=3) = {60 \choose 3} (5/60)^3 (55/60)^{60-3} \approx 0.1389$$ +

Great! But don't forget that this was an approximation. We didn't account for the fact that there can be more than one event in a single second. One way to mitigate this issue is to divide our minute into more fine-grained intervals (the choice to split it into 60 seconds was rather arbitrary). Instead, let's divide our minute into 600 deciseconds, again with requests at 2.75 and 7.12 seconds:
1 minute
+
+
+ +

+Now $n=600$ and $p=5/600$, so $X \sim \Bin(n=600, p=5/600)$. We can repeat our example calculations using this better approximation:
$$\p(X=1) = {600 \choose 1} (5/600)^1 (595/600)^{600-1} \approx 0.0333$$
 $$\p(X=2) = {600 \choose 2} (5/600)^2 (595/600)^{600-2} \approx 0.0837$$
 $$\p(X=3) = {600 \choose 3} (5/600)^3 (595/600)^{600-3} \approx 0.1402$$
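These binomial approximations are easy to sanity-check numerically. Here is a quick sketch in Python (our own illustration, not part of the book's site code):

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) for X ~ Bin(n, p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

lam = 5  # historical average: 5 requests per minute
for n in [60, 600]:       # number of buckets the minute is divided into
    p = lam / n           # probability of a request in any one bucket
    probs = [round(binom_pmf(x, n, p), 4) for x in [1, 2, 3]]
    print(n, probs)
```

With $n=60$ this reproduces the values 0.0295, 0.0790, 0.1389 above, and with $n=600$ it reproduces 0.0333, 0.0837, 0.1402 (up to rounding).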

+ +
+

Choose any value of $n$, the number of buckets to divide our minute into: + + + +

+ + +
+ +

The larger $n$ is, the more accurate the approximation. So what happens when $n$ is infinity? It becomes a Poisson!
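We can watch this convergence happen numerically. A short sketch (again our own, with the closed-form limit we are about to derive used only as a reference value):

```python
from math import comb, exp, factorial

lam, x = 5, 2  # rate of 5 per minute; probability of exactly 2 requests

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The claimed n -> infinity limit of the binomial approximation
poisson_limit = lam**x * exp(-lam) / factorial(x)

for n in [60, 600, 6000, 60000]:
    approx = binom_pmf(x, n, lam / n)
    print(n, round(approx, 6), round(abs(approx - poisson_limit), 6))
```

The gap between the binomial approximation and the limiting value shrinks toward zero as $n$ grows.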

+ +

Poisson, a Binomial in the limit

+ +

If we really care about making sure that we don't get two events in the same bucket, we can divide our minute into infinitely small buckets:

1 minute
+
+
+

+

+

Proof: Derivation of the Poisson

+

+ What does the PMF of $X$ look like now that we have infinite divisions of our minute? We can write the equation and think about it as $n$ goes to infinity. Recall that $p$ still equals $\lambda/n$: +

+ + $$ + \P(X=x) = \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x} + $$ +

+ While it may look intimidating, this expression simplifies nicely. This proof uses a few special limit rules that we haven't introduced in this book: +

+ +$$ +\begin{align} + \P(X=x) + &= \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x} + && \text{Start: binomial in the limit}\\ + &= \lim_{n \rightarrow \infty} + {n \choose x} \cdot + \frac{\lambda^x}{n^x} \cdot + \frac{(1-\lambda/n)^{n}}{(1-\lambda/n)^{x}} + && \text{Expanding the power terms} \\ + &= \lim_{n \rightarrow \infty} + \frac{n!}{(n-x)!x!} \cdot + \frac{\lambda^x}{n^x} \cdot + \frac{(1-\lambda/n)^{n}}{(1-\lambda/n)^{x}} + && \text{Expanding the binomial term} \\ + &= \lim_{n \rightarrow \infty} + \frac{n!}{(n-x)!x!} \cdot + \frac{\lambda^x}{n^x} \cdot + \frac{e^{-\lambda}}{(1-\lambda/n)^{x}} + && \href{http://www.sosmath.com/calculus/sequence/specialim/specialim.html}{\text{Rule }} \lim_{n \rightarrow \infty}(1-\lambda/n)^{n} = e^{-\lambda}\\ + &= \lim_{n \rightarrow \infty} + \frac{n!}{(n-x)!x!} \cdot + \frac{\lambda^x}{n^x} \cdot + \frac{e^{-\lambda}}{1} + && \href{https://www.youtube.com/watch?v=x1WBTBtfvjM}{\text{Rule }} \lim_{n \rightarrow \infty}\lambda/n= 0\\ + &= \lim_{n \rightarrow \infty} + \frac{n!}{(n-x)!} \cdot + \frac{1}{x!} \cdot + \frac{\lambda^x}{n^x} \cdot + \frac{e^{-\lambda}}{1} + && \text{Splitting first term}\\ + &= \lim_{n \rightarrow \infty} + \frac{n^x}{1} \cdot + \frac{1}{x!} \cdot + \frac{\lambda^x}{n^x} \cdot + \frac{e^{-\lambda}}{1} + && \lim_{n \rightarrow \infty }\frac{n!}{(n-x)!} = n^x\\ + &= \lim_{n \rightarrow \infty} + \frac{\lambda^x}{x!} \cdot + \frac{e^{-\lambda}}{1} + && \text{Cancel }n^x\\ + &= + \frac{\lambda^x \cdot e^{-\lambda}}{x!} + && \text{Simplify}\\ +\end{align} + $$ +
+

+ +

That is a beautiful expression! Now we can calculate the real probability of a given number of requests in a minute, if the historical average is $\lambda=5$:

+ +

+$$\p(X=1) = \frac{5^1 \cdot e^{-5}}{1!} = 0.03369$$ + $$\p(X=2) = \frac{5^2 \cdot e^{-5}}{2!}= 0.08422$$ + $$\p(X=3) = \frac{5^3 \cdot e^{-5}}{3!} = 0.14037$$ +

+ +

This is both more accurate and much easier to compute!
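The closed form really is one line of code. A quick check of the three values above (a sketch; the function name is ours):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # P(X = x) for X ~ Poi(lam)
    return lam**x * exp(-lam) / factorial(x)

print([round(poisson_pmf(x, 5), 5) for x in [1, 2, 3]])
# -> [0.03369, 0.08422, 0.14037]
```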

+ + +

Changing time frames

+ +

Say you are given a rate over one unit of time, but you want to know the rate in another unit of time. For example, you may be given the rate of hits to a website per minute, but you want to compute a probability over a 20 minute period. You would just need to multiply this rate by 20 to go from the "per 1 minute of time" rate to the "per 20 minutes of time" rate.

From bcdbc7c679018c4d1bf9df36374fab0050c39186 Mon Sep 17 00:00:00 2001
From: matt-wolff
Date: Thu, 3 Mar 2022 21:29:31 -0800
Subject: [PATCH 2/4] Erlang rvCard and description.

---
 chapters/part2/erlang/index.html |   8 +--
 templates/chapterList.html       |   1 +
 templates/rvCards/erlang.html    | 104 +++++++++++++++++++++++++++++++
 3 files changed, 109 insertions(+), 4 deletions(-)
 create mode 100644 templates/rvCards/erlang.html

diff --git a/chapters/part2/erlang/index.html b/chapters/part2/erlang/index.html
index ba489876..3a91b682 100644
--- a/chapters/part2/erlang/index.html
+++ b/chapters/part2/erlang/index.html
@@ -1,14 +1,14 @@

-% rebase('templates/chapter.html', title="Poisson Distribution")
+% rebase('templates/chapter.html', title="Erlang Distribution")

Poisson Distribution

+

Erlang Distribution


-

A Poisson random variable gives the probability of a given number of events in a fixed interval of time (or space). It makes the Poisson assumption that events occur with a known constant mean rate and independently of the time since the last event. +

An Erlang random variable measures the amount of time until the $k^{th}$ event occurs. The random variable is the summation of $k$ IID exponential random variables.

<% - include('templates/rvCards/poisson.html') + include('templates/rvCards/erlang.html') %>

Poisson Intuition

diff --git a/templates/chapterList.html b/templates/chapterList.html index a653cf1f..0eada7c5 100644 --- a/templates/chapterList.html +++ b/templates/chapterList.html @@ -47,6 +47,7 @@ Continuous Distribution Uniform Distribution Exponential Distribution +Erlang Distribution Normal Distribution Binomial Approximation diff --git a/templates/rvCards/erlang.html b/templates/rvCards/erlang.html new file mode 100644 index 00000000..f5136d9b --- /dev/null +++ b/templates/rvCards/erlang.html @@ -0,0 +1,104 @@ + + +
+

Erlang Random Variable

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Notation: + $X \sim {\rm Erlang}(k, \lambda)$
Description:
                Time until the $k^{th}$ event occurs, if (a) the events occur with a constant mean rate and (b) they occur independently of the time since the last event.
Parameters:
                $k \in \{1, 2, \dots\}$, occurrence of event

                $\lambda \in \mathbb{R}^+$, the constant average rate.
Support: + $x \in \mathbb{R}^+$
PDF equation:$$f(x) = \frac{\lambda^{k}x^{k-1}e^{- \lambda x}}{(k-1)!}$$
CDF equation:$$F(x) = 1 - \sum_{n=0}^{k-1}{\frac{1}{n!} e^{-\lambda x} (\lambda x )^{n}}$$
Expectation:$\E[X] = k/\lambda$
Variance:$\var(X) = k/\lambda^2$
PDF graph:
+
+
+Parameter $k$: +
+
+Parameter $\lambda$: +
+
+ +
+ + \ No newline at end of file From 6b5ff6bc1fd5223090c03c4b17b53cdf0752f08d Mon Sep 17 00:00:00 2001 From: matt-wolff Date: Sun, 6 Mar 2022 19:01:15 -0800 Subject: [PATCH 3/4] Finished Erlang chapter. --- chapters/part2/erlang/index.html | 282 +++---------------------------- templates/rvCards/erlang.html | 2 +- 2 files changed, 28 insertions(+), 256 deletions(-) diff --git a/chapters/part2/erlang/index.html b/chapters/part2/erlang/index.html index 3a91b682..efe0acc8 100644 --- a/chapters/part2/erlang/index.html +++ b/chapters/part2/erlang/index.html @@ -4,268 +4,40 @@

Erlang Distribution


-

An Erlang random variable measures the amount of time until the $k^{th}$ event occurs. The random variable is the summation of $k$ IID exponential random variables. -

+

+ If events occur sequentially, with the same constant mean rate of occurrence after each event,
+ an Erlang random variable measures the amount of time until the $k^{th}$ event occurs.
+ The random variable is the sum of $k$ independent and identically distributed (IID) Exponential random variables.

<% include('templates/rvCards/erlang.html') %> - -

Poisson Intuition

- -

In this section we show the intuition behind the Poisson derivation. It is both a great way to deeply understand the Poisson and good practice with Binomial distributions.

- -

Let's work on the problem of predicting the chance of a given number of events occurring in a fixed time interval — the next minute. For example, imagine you are working on a ride sharing application and you care about the number of requests that come from a particular area. From historical data, you know that the average number of requests per minute is $\lambda = 5$. What is the probability of getting 1, 2, 3, etc. requests in a minute?

- -

We could approximate a solution to this problem by using a binomial distribution! Let's say we split our minute into 60 seconds, and make each second an indicator Bernoulli variable — you either get a request or you don't. If you get a request in a second, the indicator is 1. Otherwise it is 0. Here is a visualization of our 60 binary indicators. In this example, imagine we have requests at 2.75 and 7.12 seconds. The corresponding indicator variables are the blue filled-in boxes:

-
1 minute
-
-
+

- The total number of requests received over the minute can be approximated as the sum of the sixty indicator variables, which conveniently matches the description of a binomial — a sum of Bernoullis. Specifically define $X$ to be the number of requests in a minute. $X$ is a binomial with $n=60$ trials. What is the probability, $p$, of a success on a single trial? To make the expectation of $X$ equal the observed historical average $\lambda =5$ we should choose $p$ so that $\lambda = \E[X]$. - $$ - \begin{align} - \lambda &= \E[X] && \text{Expectation matches historical average} \\ - \lambda &= n \cdot p && \text{Expectation of a Binomial is } n \cdot p \\ - p &= \frac{\lambda}{n} && \text{Solving for $p$} - \end{align} - $$ - In this case since $\lambda=5$ and $n=60$, we should choose $p=5/60$ and state that $X \sim \Bin(n=60, p=5/60)$. Now that we have a form for $X$ we can answer probability questions about the number of requests by using the Binomial PMF: - - $$\p(X = x) = {n \choose x} p^x (1-p)^{n-x}$$ -

So for example:
- $$\p(X=1) = {60 \choose 1} (5/60)^1 (55/60)^{60-1} \approx 0.0295$$ - $$\p(X=2) = {60 \choose 2} (5/60)^2 (55/60)^{60-2} \approx 0.0790$$ - $$\p(X=3) = {60 \choose 3} (5/60)^3 (55/60)^{60-3} \approx 0.1389$$ -

Great! But don't forget that this was an approximation. We didn't account for the fact that there can be more than one event in a single second. One way to mitigate this issue is to divide our minute into more fine-grained intervals (the choice to split it into 60 seconds was rather arbitrary). Instead, let's divide our minute into 600 deciseconds, again with requests at 2.75 and 7.12 seconds:
1 minute
-
-
- -

-Now $n=600$ and $p=5/600$, so $X \sim \Bin(n=600, p=5/600)$. We can repeat our example calculations using this better approximation:
-$$\p(X=1) = {600 \choose 1} (5/600)^1 (595/600)^{600-1} \approx 0.0333$$
 $$\p(X=2) = {600 \choose 2} (5/600)^2 (595/600)^{600-2} \approx 0.0837$$
 $$\p(X=3) = {600 \choose 3} (5/600)^3 (595/600)^{600-3} \approx 0.1402$$

-

Choose any value of $n$, the number of buckets to divide our minute into: - - - -

- - -
- -

The larger $n$ is, the more accurate the approximation. So what happens when $n$ is infinity? It becomes a Poisson!

- -

Poisson, a Binomial in the limit

- -

If we really care about making sure that we don't get two events in the same bucket, we can divide our minute into infinitely small buckets:

1 minute
-
-
-

-

-

Proof: Derivation of the Poisson

-

- What does the PMF of $X$ look like now that we have infinite divisions of our minute? We can write the equation and think about it as $n$ goes to infinity. Recall that $p$ still equals $\lambda/n$: -

- - $$ - \P(X=x) = \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x} - $$ -

- While it may look intimidating, this expression simplifies nicely. This proof uses a few special limit rules that we haven't introduced in this book: -

- -$$ -\begin{align} - \P(X=x) - &= \lim_{n \rightarrow \infty} {n \choose x} (\lambda / n)^x(1-\lambda/n)^{n-x} - && \text{Start: binomial in the limit}\\ - &= \lim_{n \rightarrow \infty} - {n \choose x} \cdot - \frac{\lambda^x}{n^x} \cdot - \frac{(1-\lambda/n)^{n}}{(1-\lambda/n)^{x}} - && \text{Expanding the power terms} \\ - &= \lim_{n \rightarrow \infty} - \frac{n!}{(n-x)!x!} \cdot - \frac{\lambda^x}{n^x} \cdot - \frac{(1-\lambda/n)^{n}}{(1-\lambda/n)^{x}} - && \text{Expanding the binomial term} \\ - &= \lim_{n \rightarrow \infty} - \frac{n!}{(n-x)!x!} \cdot - \frac{\lambda^x}{n^x} \cdot - \frac{e^{-\lambda}}{(1-\lambda/n)^{x}} - && \href{http://www.sosmath.com/calculus/sequence/specialim/specialim.html}{\text{Rule }} \lim_{n \rightarrow \infty}(1-\lambda/n)^{n} = e^{-\lambda}\\ - &= \lim_{n \rightarrow \infty} - \frac{n!}{(n-x)!x!} \cdot - \frac{\lambda^x}{n^x} \cdot - \frac{e^{-\lambda}}{1} - && \href{https://www.youtube.com/watch?v=x1WBTBtfvjM}{\text{Rule }} \lim_{n \rightarrow \infty}\lambda/n= 0\\ - &= \lim_{n \rightarrow \infty} - \frac{n!}{(n-x)!} \cdot - \frac{1}{x!} \cdot - \frac{\lambda^x}{n^x} \cdot - \frac{e^{-\lambda}}{1} - && \text{Splitting first term}\\ - &= \lim_{n \rightarrow \infty} - \frac{n^x}{1} \cdot - \frac{1}{x!} \cdot - \frac{\lambda^x}{n^x} \cdot - \frac{e^{-\lambda}}{1} - && \lim_{n \rightarrow \infty }\frac{n!}{(n-x)!} = n^x\\ - &= \lim_{n \rightarrow \infty} - \frac{\lambda^x}{x!} \cdot - \frac{e^{-\lambda}}{1} - && \text{Cancel }n^x\\ - &= - \frac{\lambda^x \cdot e^{-\lambda}}{x!} - && \text{Simplify}\\ -\end{align} - $$ -
+

Example: + Agner Krarup Erlang wants to reward the 10th customer who walks into his gag foam-phone store, "Phoney Foam Phones". + Agner needs 15 minutes to prepare the prize. Since a YouTube video went viral showcasing how foam-phones can be used to clean dishes, + the store has become surprisingly popular. People are walking in at a mean rate + of one person per minute. What is the probability that Erlang will have the time to prepare his prize?

- -

That is a beautiful expression! Now we can calculate the real probability of a given number of requests in a minute, if the historical average is $\lambda=5$:

- -

-$$\p(X=1) = \frac{5^1 \cdot e^{-5}}{1!} = 0.03369$$ - $$\p(X=2) = \frac{5^2 \cdot e^{-5}}{2!}= 0.08422$$ - $$\p(X=3) = \frac{5^3 \cdot e^{-5}}{3!} = 0.14037$$ -

- -

This is both more accurate and much easier to compute!

- - -

Changing time frames

- -

Say you are given a rate over one unit of time, but you want to know the rate in another unit of time. For example, you may be given the rate of hits to a website per minute, but you want to compute a probability over a 20 minute period. You would just need to multiply this rate by 20 to go from the "per 1 minute of time" rate to the "per 20 minutes of time" rate.

 Let $X$ be the number of minutes until the 10th customer walks in. Since a customer walking in is an event that
 occurs at a mean rate of once per minute, $X \sim {\rm Erlang}(k = 10, \lambda = 1)$.
 The question asks us to calculate $P(X > 15)$:

 \begin{align*}
 P(X > 15) &= 1 - P(X \leq 15) \\
 &= 1 - F_{X}(15) \\
 &= 1 - \left(1 - \sum_{n=0}^{10-1}{\frac{1}{n!} e^{-1 \cdot 15} (1 \cdot 15)^{n}}\right) \\
 &= \sum_{n=0}^{9}{\frac{1}{n!} e^{-15} (15)^{n}} \\
 &\approx 0.070
 \end{align*}

 Sorry Agner!
 \ No newline at end of file
diff --git a/templates/rvCards/erlang.html b/templates/rvCards/erlang.html
index f5136d9b..ea134f5a 100644
--- a/templates/rvCards/erlang.html
+++ b/templates/rvCards/erlang.html
@@ -17,7 +17,7 @@
 Parameters:
 $k \in \{1, 2, \dots\}$, occurrence of event
- $\lambda \in \mathbb{R}^+$, the constant average rate.
+ $\lambda \in \mathbb{R}^+$, the constant average rate

From 121803c0791ba2ad164f80a1ea548a281b5c9bb4 Mon Sep 17 00:00:00 2001
From: matt-wolff
Date: Mon, 7 Mar 2022 21:07:43 -0800
Subject: [PATCH 4/4] Fixed sentence phrasing in Erlang.

---
 chapters/part2/erlang/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chapters/part2/erlang/index.html b/chapters/part2/erlang/index.html
index efe0acc8..64b9bc0d 100644
--- a/chapters/part2/erlang/index.html
+++ b/chapters/part2/erlang/index.html
@@ -16,7 +16,7 @@

-If you set $k$ equal to 50 up above, that is the equivalent of summing 50 Exponential random variables with a mean rate of $\lambda$ together. Notice +If you set $k$ equal to 50 up above, that is the equivalent of summing together 50 Exponential random variables with a mean rate of $\lambda$. Notice how the resulting PDF resembles that of a Gaussian. We will explore why that is when we cover the Central Limit Theorem.
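The tail probability from Agner's example can be checked numerically using the CDF from the summary card. A minimal sketch (the function name is our own):

```python
from math import exp, factorial

def erlang_tail(x, k, lam):
    # P(X > x) for X ~ Erlang(k, lam): one minus the CDF from the card,
    # i.e. the sum of the first k Poisson(lam * x) terms
    return sum(exp(-lam * x) * (lam * x)**n / factorial(n) for n in range(k))

# Agner's problem: 10th customer, one arrival per minute, 15 minutes needed
print(round(erlang_tail(15, k=10, lam=1), 3))
```

This evaluates to roughly 0.070, matching the worked example above.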