Saturday, October 27, 2018

German Tank problem


I was recently watching How to estimate a population using statisticians on YouTube and it reminded me of the German tank problem. The problem has a very nice history, and I started reading about it on Wikipedia after I watched the video.

I found it very interesting that the problem can be attacked using either frequentist inference or Bayesian inference, giving different results. Estimating the size of a Population gives the proofs of the famous estimates of the number of tanks.

Let the random variable $N$ denote the unknown number of tanks, $M$ the maximum serial number observed and $K$ the sample size. Then given $M=m$ and $K=k$,

$\displaystyle N \approx \frac{k+1}{k}m-1$ by frequentist analysis and
$\displaystyle \mathbb{E}(N|M=m,K=k)=(m-1)\frac{k-1}{k-2}$ by Bayesian analysis.

Both of these estimates are based on the probability that the maximum observed serial number equals $m$ when a sample of size $K$ is drawn from $N$ tanks. That is,
$\displaystyle \mathbb{P}(M=m|N,K)=\frac{\binom{m-1}{K-1}}{\binom{N}{K}}$
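As a quick illustration (not in the original post), here is a small Python sketch of the two point estimates above, computed from an observed maximum $m$ and sample size $k$; the captured serial numbers are made up.

def frequentist_estimate(m, k):
    """N ~ (k + 1)/k * m - 1 (frequentist point estimate)."""
    return (k + 1) / k * m - 1

def bayesian_mean(m, k):
    """E(N | M = m, K = k) = (m - 1)(k - 1)/(k - 2), valid for k > 2."""
    return (m - 1) * (k - 1) / (k - 2)

serials = [19, 40, 42, 60]          # made-up captured serial numbers
m, k = max(serials), len(serials)
print(frequentist_estimate(m, k))   # 74.0
print(bayesian_mean(m, k))          # 88.5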

Now let's think differently. In the history of the German tank problem, the serial numbers of the tanks were noted after they were captured or destroyed. But what if they weren't? Assume that the serial numbers were noted without the tanks being captured/destroyed. In other words, there is a possibility of the observed numbers being repeated. Therefore,

$\displaystyle \mathbb{P}(M=m|N,K)=\frac{\binom{K+m-1}{m-1}-\binom{K+m-2}{m-2}}{\binom{K+N-1}{N-1}}=\frac{K}{m-1}\frac{\binom{K+m-2}{m-2}}{\binom{K+N-1}{N-1}}$

Surprisingly, even in this case the total likelihood function remains the same as in the original German tank problem. That is,

$\displaystyle \sum_{n}\mathcal{L}(n)=\frac{k}{k-1}$ using the sum $\displaystyle \sum_{n=m}^\infty \frac{1}{\binom{k+n-1}{n-1}}=\frac{k}{m}\frac{1}{\binom{k+m-2}{k-2}}$

Proceeding as in the wiki article, we have,

$\displaystyle N \approx \frac{k+1}{k}m-\frac{1}{m}$ by frequentist analysis and
$\displaystyle \mathbb{E}(N|M=m,K=k)=m\frac{k-1}{k-2}$ by Bayesian analysis.

Now coming back to the video I quoted at the start of this post, it is clear that the population estimation is done with the famous capture-recapture method. This is basically the setting usually described in terms of the hypergeometric distribution. The estimate of the total population in this case is given by,

$\displaystyle N \approx \frac{nK}{k}$

where $K$ is the initial sample size, $n$ is the second sample size and $k$ is the number of elements that are recaptured.

But this is based on frequentist inference. What about Bayesian inference? It turns out we would need the following sum to come up with the total likelihood function.

$\displaystyle \sum_{N \geq \text{max}(n,K)}\frac{\binom{N-K}{n-K}}{\binom{N-m}{n-m}}=\frac{n-m}{K-m-1}\frac{1}{\binom{K-m-2}{k-m-2}}$

Using this, we have $\displaystyle \sum_{n}\mathcal{L}(n)=\frac{nK}{k(k-1)}$ and therefore $\displaystyle \mathbb{E}(N)=\frac{(n-1)(K-1)}{k-2}$ as given here.
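For concreteness, here is a similar Python sketch (again, not from the original post) for the capture-recapture setting; the counts below are made up.

def lincoln_petersen(K, n, k):
    """Frequentist estimate N ~ n*K/k."""
    return n * K / k

def bayesian_mean(K, n, k):
    """E(N) = (n - 1)(K - 1)/(k - 2), valid for k > 2."""
    return (n - 1) * (K - 1) / (k - 2)

K, n, k = 50, 60, 12                # made-up: 50 marked, 60 in second sample, 12 recaptured
print(lincoln_petersen(K, n, k))    # 250.0
print(bayesian_mean(K, n, k))       # 289.1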

Now, the final part. Matt does something interesting in the video. He uses the mark-and-recapture method to estimate the number of candies in a bowl by repeated sampling. How do we go about that?

Let $n_0$ be the total number of candies in the bowl that we are trying to estimate. To this end, we capture a bunch of them each time and keep track of how many times each candy has been captured. Let's say we do this $k$ times and let $n_1,n_2,\cdots,n_k$ be the sample sizes. Let $X_j$ denote the number of candies that were captured exactly $j$ times. Now we have,

$\displaystyle \mathbb{E}(X_j)=n_0[t^j]\prod_{i=1}^k \left(1-\frac{n_i}{n_0}+ \frac{n_i}{n_0}t\right)$

Using the data given by Matt in the video, we observe that $2$ candies were captured $4$ times. Using this as an estimate for $\mathbb{E}(X_4)$, solving the resulting equation for $n_0$ and picking the maximal root, we get $n_0 \approx 164.3$, whereas the bowl originally had about $169$ candies.
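To make the procedure concrete, here is a rough Python sketch of how one might solve $\mathbb{E}(X_4)=2$ for $n_0$ using the product formula above. The sample sizes below are hypothetical placeholders, not the actual counts from the video, so the number it prints will differ from $164.3$.

import numpy as np

samples = [25, 30, 28, 22, 26, 24, 27]    # hypothetical sample sizes n_1, ..., n_k
j_times, observed = 4, 2                  # "2 candies were captured 4 times"

def expected_Xj(n0, samples, j):
    """n0 * [t^j] of prod_i (1 - n_i/n0 + (n_i/n0) t)."""
    poly = np.poly1d([1.0])
    for ni in samples:
        p = ni / n0
        poly = poly * np.poly1d([p, 1 - p])   # p*t + (1 - p), highest power first
    coeffs = poly.coeffs[::-1]                # index = power of t
    return n0 * (coeffs[j] if j < len(coeffs) else 0.0)

def f(n0):
    return expected_Xj(n0, samples, j_times) - observed

# Locate sign changes of f on a grid and pick the largest crossing (the maximal root).
grid = np.arange(max(samples) + 1, 2000, 1.0)
vals = np.array([f(n0) for n0 in grid])
roots = [grid[i] for i in range(len(vals) - 1) if vals[i] * vals[i + 1] <= 0]
print(roots[-1] if roots else "no root in range")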

The formula for $\mathbb{E}(X_j)$ above is based on frequentist inference. It's not so hard to arrive at $\mathbb{P}(X_j=m)$ with a recurrence, but finding a closed form, which I think will be needed for Bayesian inference, remains a challenge. I'll update if I'm successful on that front.


Until then,
Yours Aye
Me

Monday, April 30, 2018

The Birthday Problem


The Birthday problem is one of the classic problems in Probability theory, concerning the probability that, in a set of $n$ randomly chosen people, some pair will have the same birthday. The idea in this post is to generalize this setup and find the expected value in this generalization.

The generalization here is this: Assume that there is a very long line of people ready to enter a very large room one by one. Each person is let in and declares her birthday upon entering the room. How many people, on average, must enter in order to find $k$ people whose birthdays are within $d$ days of one another (given an $r$-day year)?

Non-Spoiler: Some readers may recognize that this is Problem 584 of Project Euler. But note that this post neither gives the answer nor the right way to solve that problem. All we are doing here is finding an approximation of the required value, which is some way off from the correct answer.

Well, when we say 'approximation', we are immediately reminded of the Poisson approximation of the Birthday problem in Wikipedia. But the point here is that most of the approximations I found on the web concern only the probability and not the expected value. A special case of this problem is solved in Strategic Practice 5 of the (amazing) Stat 110 course by the (extraordinary) Prof. Joseph Blitzstein, and I got curious as to how good the approximation really is.

Let $\mathbb{E}(k,d,r)$ be the expected number of people that have to enter the room so that, for the first time, there are $k$ people with birthdays within $d$ days of one another.

Let $X_n$ be the number of $k$-set birthday matches when $n$ people have entered the room. By the Poisson paradigm, $X_n$ can be considered to be a Poisson variable. The parameter $\lambda_n$ for this variable is given by,

$\displaystyle\lambda_n=\displaystyle\binom{n}{k}\cdot\frac{1}{(k-1)!}\left(\frac{2d+1}{r}\right)^{k-1}$  (why?)

Therefore we immediately have, $\mathbb{P}(X_n=0)=e^{-\lambda_n}$

Now by a simple trick common in evaluating expected values, we have,

$\begin{align}
\displaystyle\mathbb{E}(k,d,r)&=k+\sum_{n=k}^\infty\mathbb{P}(X_n=0)=k+\displaystyle\sum_{n=k}^\infty e^{-\lambda_n}
\end{align}$

Though we have an infinite sum, the numbers decrease rapidly because of the exponential factor. For the values given in the said Project Euler problem, this approximation gives $\mathbb{E}_{\text{Poisson}}(3,1,10)\approx 6.15092$ and $\mathbb{E}_{\text{Poisson}}(3,7,100)\approx 8.79751$

But we can take it a little further. When the parameter of a Poisson variable gets large, the Normal variable with the same mean and variance as the Poisson variable gives better accuracy. Therefore,

$\begin{align}
\displaystyle\mathbb{E}(k,d,r)&=k+\sum_{n=k}^\infty\mathbb{P}(X_n=0)\\
&=k+\displaystyle\sum_{n=k}^\infty \mathbb{P}(-1/2 \leq X_n \leq 1/2)
\end{align}$

where $X_n \sim \mathcal{N}(\lambda_n,\lambda_n)$ along with the continuity correction.

Here again, the numbers decrease fast and the sum converges in about fifty terms. Now we have $\mathbb{E}_{\text{Normal}}(3,1,10)\approx 5.68554$ and $\mathbb{E}_{\text{Normal}}(3,7,100)\approx 8.04791$

In fact for the required answer, the normal approximation gives $\mathbb{E}_{\text{Normal}}(4,7,365)\approx 33.6396$ with a relative error of just about $2.5\%$. Not bad.

These values were calculated with the following Mathematica code.

Clear["Global`*"];
(* Poisson parameter lambda_n for n people, k birthdays within d days in an r-day year *)
lambda[n_, k_, d_, r_] :=
  Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/Factorial[k - 1];
(* E(k, d, r) approximated as k + Sum of P(X_n = 0) with X_n ~ Poisson(lambda_n) *)
PoissonApprox[k_, d_, r_] :=
  k + Sum[N[Exp[-lambda[n, k, d, r]]], {n, k, 100}];
(* Same sum with X_n ~ Normal(lambda_n, lambda_n) and a continuity correction *)
NormalApprox[k_, d_, r_] :=
  k + Sum[N[
     CDF[NormalDistribution[lambda[n, k, d, r], Sqrt[lambda[n, k, d, r]]], 1/2] -
      CDF[NormalDistribution[lambda[n, k, d, r], Sqrt[lambda[n, k, d, r]]], -1/2]],
    {n, k, 100}];
PoissonApprox[4, 7, 365]
NormalApprox[4, 7, 365]

Hope you enjoyed this. See you again with a nice problem.


Until then
Yours Aye
Me

Tuesday, April 24, 2018

Probability on a Chess Match


Consider two friends $A$ and $B$ playing a series of bullet chess games. As the player who lost the most recent game always wants to continue, they come up with an idea. They'll play the first game. After the game, they'll toss a fair coin. If the coin comes up heads, they'll continue with another game; else they end the match. They'll repeat this procedure until the coin turns up tails.

The question we are interested in here is the probability of either player winning the match, given the probability $\alpha$ of $A$ winning a game, the probability $\gamma$ of $B$ winning a game and the remaining probability $\beta=1-\alpha-\gamma$ of a game ending in a draw. WLOG assume $\alpha<\gamma$.

All the equations referred to from now on in this post are from A List of Integrals and Identities.

Let $A_{n,k}$ denote the event of $A$ winning a match of $n$ games by $k$ more wins (ignoring the coin tosses). Then,

$\displaystyle\mathbb{P}(A_{n,k})=\sum_{a+b+c =n\\a-c=k}\binom{n}{a,b,c}\alpha^a\beta^b\gamma^c$

where $a$ denotes the number of games won by $A$, $c$ the number of games won by $B$ and $b$ the number of games that ended in a draw.

First, let's tackle an easy question. That is finding the probability of the match ending in a draw.

$\begin{align}
\mathbb{P}(\text{Draw})&=\displaystyle\sum_{n=1}^\infty\mathbb{P}(A_{n,0}\text{ | }n\text{ games played})\cdot\mathbb{P}(n\text{ games being played})\\
\end{align}$

As we know that the probability of the match lasting $n$ games is $2^{-n}$, using (1) we have,

$\begin{align}
\mathbb{P}(\text{Draw})&=\displaystyle\frac{\cot\phi}{\sqrt{\alpha\gamma}}-1\\
\end{align}$,   where $\phi=\cos^{-1}\displaystyle\left(\frac{2\sqrt{\alpha\gamma}}{1+\alpha+\gamma}\right)$


Similarly, we now look at the probability of $A$ not losing the match. This time we have,

$\begin{align}
\mathbb{P}(A \text{ not losing})&=\displaystyle\sum_{n=1}^\infty\sum_{k=0}^\infty\mathbb{P}(A_{n,k}\text{ | }n\text{ games played})\cdot\mathbb{P}(n\text{ games being played})\\
\end{align}$

Using (6), this reduces to,

$\begin{align}
\mathbb{P}(A \text{ not losing})&=\displaystyle\frac{\cot\phi}{\sqrt{\alpha\gamma}-\alpha(\sec\phi-\tan\phi)}-1\\
\end{align}$

In fact, instead of using a fair coin, if they decide to use a biased coin so that they continue playing with probability $q$ and end the match with probability $p=1-q$, then

$\begin{align}
\displaystyle\frac{q}{1-q}\mathbb{P}(\text{Draw})&=\frac{\frac{1}{2}\cot\phi}{\sqrt{(q\alpha)(q\gamma)}}-1\\
\end{align}$

$\displaystyle\frac{q}{1-q}\mathbb{P}(A \text{ wins the match})=\displaystyle\frac{\frac{1}{2}\cot\phi}{(q\gamma)(\sec\phi+\tan\phi)-\sqrt{(q\alpha)(q\gamma)}}$

$\displaystyle\frac{q}{1-q}\mathbb{P}(B \text{ wins the match})=\displaystyle\frac{\frac{1}{2}\cot\phi}{(q\alpha)(\sec\phi+\tan\phi)-\sqrt{(q\alpha)(q\gamma)}}$

where $\displaystyle\phi=\cos^{-1}\left(\frac{2\sqrt{(q\alpha)(q\gamma)}}{(1-q)+q\alpha+q\gamma}\right)$
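As a numerical sanity check (not part of the original post), the following Python sketch evaluates the three closed forms above for arbitrary values of $\alpha$, $\gamma$ and $q$, and compares them against a quick Monte Carlo simulation of the match.

import math, random

def match_probs(alpha, gamma, q):
    """Evaluate the three closed forms above: (P(draw), P(A wins), P(B wins))."""
    a, c = q * alpha, q * gamma
    phi = math.acos(2 * math.sqrt(a * c) / ((1 - q) + a + c))
    cot, sec, tan = 1 / math.tan(phi), 1 / math.cos(phi), math.tan(phi)
    scale = (1 - q) / q                        # undo the q/(1-q) factor on the LHS
    p_draw = scale * (0.5 * cot / math.sqrt(a * c) - 1)
    p_a = scale * 0.5 * cot / (c * (sec + tan) - math.sqrt(a * c))
    p_b = scale * 0.5 * cot / (a * (sec + tan) - math.sqrt(a * c))
    return p_draw, p_a, p_b

def simulate(alpha, gamma, q, trials=200000):
    """Monte Carlo estimate of the same three probabilities."""
    draw = a_wins = 0
    for _ in range(trials):
        score = 0
        while True:
            r = random.random()
            score += 1 if r < alpha else (-1 if r < alpha + gamma else 0)
            if random.random() >= q:           # the coin says stop
                break
        if score == 0:
            draw += 1
        elif score > 0:
            a_wins += 1
    return draw / trials, a_wins / trials, 1 - (draw + a_wins) / trials

print(match_probs(0.3, 0.4, 0.5))
print(simulate(0.3, 0.4, 0.5))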

These expressions can be written algebraically without the use of trigonometric functions, but they are clumsy and I prefer this form. Also, there are a couple of nice details hidden in these expressions. First, with the right choice of $q$, the weaker player is able to decrease his opponent's chance of winning the match, which was surprising to me.

Second, in the interesting limiting case of $q\to1$, the LHSs of the above three expressions can be seen as the expected number of games after which the match is at a drawn stage, player $A$ is in the lead, and player $B$ is in the lead, respectively.

$\displaystyle\lim_{q\to1}\displaystyle\frac{q}{1-q}\mathbb{P}(\text{Draw})=\frac{1}{|\gamma-\alpha|}-1$,

$\displaystyle\lim_{q\to1}\displaystyle\frac{q}{1-q}\mathbb{P}(A \text{ wins the match})=\frac{\alpha}{\text{max}(0,\gamma-\alpha)^2}$ and

$\displaystyle\lim_{q\to1}\displaystyle\frac{q}{1-q}\mathbb{P}(B \text{ wins the match})=\frac{\gamma}{\text{max}(0,\alpha-\gamma)^2}$

ASIDE: An interesting discussion on a related problem is given in Joint probability of geometric random variables. However, there the distributions are independent. Our problem can be stated similarly using the result that if,

$\vec{\mathbf{X}}\sim \text{Mult}(N, \vec{\mathbf{p}})$ where $N \sim \text{NBin}(m,s)$, then $X_i \sim \displaystyle\text{NBin}\left(m,\frac{s}{s+(1-s)p_i}\right)$

But the $X_i$'s are not independent (they would be had $N$ been a Poisson variable).

Even though we started with the assumption $\alpha<\gamma$, fortunately the final results hold for all values of $\alpha$ and $\gamma$. The symmetry of the formulae is really enchanting to me, and I'm greatly satisfied that I've solved this. Hope you enjoyed it.

Until then
Yours Aye
Me

A List of Integrals and Identities


A list of the integrals and identities used in solving the problem in this post.

Throughout this post we consider $0<\alpha, \gamma, r \leq 1$, and $a, b, c, k$ are non-negative integers.


(1) I'm not sure where I saw this. I'll update the reference if I find it. UPDATE: I found the reference. It's given in Advances in Combinatorial Methods and Applications to Probability and Statistics

$\displaystyle\sum_{a,b,c \geq 0 \\ a+b+c=n \\ a-c=k}\binom{n}{a,b,c}\alpha^a(1-\alpha-\gamma)^b\gamma^c=\frac{(\alpha/\gamma)^{k/2}}{\pi}\int_0^\pi  \cos k\theta\text{ } (1-\alpha-\gamma+2\sqrt{\alpha\gamma}\cos \theta)^n \, d\theta$


(2) This is given in Generality of algebra. Anyway, this is easy to prove if we recognize $r^k \cos kx=\text{Re}(r^k e^{ikx})$ and it's just a geometric series.

$1+r\cos x+r^2 \cos 2x+r^3 \cos 3x+\cdots=\displaystyle\frac{1-r\cos x}{1-2r\cos x+r^2}$


(3) This is just pattern recognition by using different values in Mathematica.

$\displaystyle\int \frac{1-r \cos x}{1-2r \cos x +r^2}\,dx=\frac{x}{2}+\tan^{-1}\left(\frac{1+r}{1-r}\tan\left(\frac{x}{2}\right)\right)+\text{constant}$

Interestingly,

$\displaystyle\frac{1}{\pi}\int_0^\pi \frac{1-r \cos x}{1-2r \cos x +r^2}\,dx=H[1-r]$

where $H[x]$ is the Heaviside Unit step function.


(4) I definitely saw this on Wikipedia. But again, I couldn't find it when I looked. UPDATE: Found this too, in List of definite integrals.

$\displaystyle\frac{1}{\pi}\int_0^\pi\frac{\cos kx}{\sec \phi-\cos x}\,dx=\frac{(\sec \phi - \tan \phi)^k}{\tan \phi}$


(5) Using $k=0$ in (4),

$\displaystyle\frac{1}{\pi}\int_0^\pi\frac{1}{\sec \phi-\cos x}\,dx=\cot\phi$


(6) Multiplying (4) by $r^k$ and summing across all $k$,

$\displaystyle\frac{1}{\pi}\int_0^\pi\frac{1+r\cos x+r^2 \cos 2x+r^3 \cos 3x+\cdots}{\sec \phi-\cos x}\,dx=\frac{\cot \phi}{1-r(\sec \phi - \tan \phi)}$
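These identities are easy to check numerically. Here is a small Python/SciPy sketch (not part of the original list) that verifies (4) and (6) for arbitrary test values of $\phi$, $r$ and $k$, using (2) to rewrite the series appearing in (6).

import math
from scipy.integrate import quad

phi, r, k = 0.7, 0.4, 3   # arbitrary test values with 0 < r < 1

# Identity (4): (1/pi) Int_0^pi cos(kx)/(sec(phi) - cos(x)) dx = (sec(phi) - tan(phi))^k / tan(phi)
lhs4 = quad(lambda x: math.cos(k * x) / (1 / math.cos(phi) - math.cos(x)), 0, math.pi)[0] / math.pi
rhs4 = (1 / math.cos(phi) - math.tan(phi)) ** k / math.tan(phi)
print(lhs4, rhs4)

# Identity (6), with the series replaced by (1 - r cos x)/(1 - 2 r cos x + r^2) via (2)
lhs6 = quad(lambda x: (1 - r * math.cos(x)) / (1 - 2 * r * math.cos(x) + r * r)
            / (1 / math.cos(phi) - math.cos(x)), 0, math.pi)[0] / math.pi
rhs6 = (1 / math.tan(phi)) / (1 - r * (1 / math.cos(phi) - math.tan(phi)))
print(lhs6, rhs6)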



Until then
Yours Aye
Me

Thursday, April 19, 2018

A mild generalization of the Ménage problem


We all know about the classic Ménage problem, which asks for the number of different ways in which it is possible to seat a set of male-female couples at a dining table so that men and women alternate and nobody sits next to his or her partner.

Here again we consider a similar setup, except each family consists of four members: a father, a mother, a son and a daughter. To be clear, we seek the number of different ways in which it is possible to seat a set of families at a dining table so that men and women alternate and no family ends up sitting together. Note that we consider a family to be sitting together only when all members of the same family sit next to each other.

The idea here is inspired by the Non-sexist solution of the menage problem. It is easy to see that there are $2(2n)!^2$ seatings in total such that the dinner table has a male-female arrangement. That is, $2$ ways to decide which seat is for men/women; $(2n)!$ ways each for men and women.

Consider $n>1$. Let $L_n$ be the answer we are trying to find for the case of $n$ families. Then, using inclusion-exclusion, we have

$L_n=\displaystyle\sum_{k=0}^n(-1)^k\binom{n}{k}w_k$

where $w_k$ denotes the number of seatings such that a given set of $k$ families are seated together (there may be other families sitting together as well, but we don't care about them). To compute $w_k$, let's define $d_k$ as the number of ways of placing $k$ non-overlapping curved dominoes, each spanning $4$ points, on a circle containing $4n$ equally spaced points.

Calculating the $d_k$'s is very easy. It is just the stars and bars problem in a circular setup. For the problem at hand, we have,

$d_k=\displaystyle\frac{4n}{4n-3k}\binom{4n-3k}{k}$

The idea for the above formula is exactly like that of this Stack Exchange question.

Now, we compute $w_k$. Note that there are $8$ ways in which a family can be seated together with men and women alternating.

Once we have chosen the $k$ families, we have $2$ ways to decide which are men's seats and which are women's. We have $d_k$ ways to place these $k$ family blocks around the circular table, and the families can be permuted among the blocks in $k!$ ways. As we have fixed the men's (and women's) seats, each of the $k$ families that are supposed to sit together can be seated in $4$ ways. The remaining $2n-2k$ men and $2n-2k$ women can be permuted in the remaining seats. In short, we have,

$w_k=2\cdot d_k \cdot 4 ^k \cdot k! \cdot (2n-2k)!^2$

Therefore,

$L_n=\displaystyle\sum_{k=0}^n(-1)^k\binom{n}{k}\cdot 2\cdot 4^k\cdot k!\cdot (2n-2k)!^2\cdot d_k$

Substituting $d_k$ and simplifying, we finally end up with,

$L_n=2\cdot n!\displaystyle\sum_{k=0}^n(-4)^k\frac{4n}{4n-3k}\binom{4n-3k}{k}\frac{(2n-2k)!^2}{(n-k)!}$

And there we have our final answer.
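For the record, here is a tiny Python sketch (not in the original post) that evaluates $L_n$ directly from this closed form for small $n$.

from math import comb, factorial

def L(n):
    """Evaluate L_n from the closed form above (n > 1)."""
    total = 0
    for k in range(n + 1):
        d_k = 4 * n * comb(4 * n - 3 * k, k) // (4 * n - 3 * k)   # circular stars-and-bars count
        total += (-4) ** k * d_k * factorial(2 * n - 2 * k) ** 2 // factorial(n - k)
    return 2 * factorial(n) * total

for n in range(2, 7):
    print(n, L(n))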

Another interesting variation in our 'generalized menage problem' is asking for the number of ways such that no two members of the same family sit together. This seems rather complicated in that it doesn't seem to admit any closed form. Of course, this is just my opinion.

Trying to find an answer to this variation, I started thinking about arranging balls of $n$ different colors with $c[i]$ balls of color $i$. It seems DP is the best approach, as discussed in AMBALLS - Editorial. But let's postpone that for a later post.

Hope you enjoyed this post. I'll write again with a nice post.


UPDATE: In the spirit of ménage numbers, if we define

$A_n=\displaystyle\sum_{k=0}^n(-4)^k\frac{4n}{4n-3k}\binom{4n-3k}{k}\frac{(2n-2k)!^2}{(n-k)!}$

then we have this monstrous recurrence for $A_n$'s.

$\begin{align}
(2 n - 3) (2 n - 5) (2 n - 7) (n - 2) (n - 3) (n - 4) (n - 5)A_n&=4 (n - 5) (n - 4) (n - 3) n (2 n - 7) (2 n - 5) (8 n^4 - 36 n^3 + 54 n^2 - 17 n - 13) A_{n - 1} \\&
 +16 (n - 5) (n - 4) (n - 3) n (2 n - 7) (16 n^5 - 144 n^4 + 472 n^3 - 736 n^2 + 505 n - 95) A_{n - 2} \\&
 -384 (n - 5) (n - 4) n (2 n - 1) (12 n^3 - 84 n^2 + 197 n - 145) A_{n - 3} \\&
 +2304 (n - 5) (n - 3) n (2 n - 3) (2 n - 1) (6 n^2 - 21 n + 10) A_{n - 4} \\&
 +27648 (n - 4) (n - 3) (n - 1) n (2 n - 5) (2 n - 3) (2 n - 1) A_{n - 5} \\&
 +96 \cdot (-4)^n \cdot (2 n - 7) (2 n - 5) (2 n - 3) (2 n - 1) (4 n - 5)
\end{align}$

Obviously, I didn't find this recurrence manually. I used the amazing RISCErgoSum package, whose HolonomicFunctions part has a Mathematica implementation of Zeilberger's algorithm for finding holonomic recurrences. And it did the job in a matter of seconds. Amazing!!

Mathematica code:
Clear["Global`*"];
<< RISC`HolonomicFunctions`
(* Annihilating operators of the summand with respect to the shifts S[k] and S[n] *)
ann = Annihilator[
   Power[-4, k] (4 n)/(4 n - 3 k) Binomial[4 n - 3 k, k] Factorial[2 n - 2 k]^2/
    Factorial[n - k], {S[k], S[n]}];
(* Eliminate k to obtain a recurrence in n for the sum over k *)
Takayama[ann, {k}]


Until then
Yours Aye
Me

Monday, February 19, 2018

Python Code - Gauss's Circle Problem

This code uses the algorithm discussed here.

Iterative version of the code - Python

from math import sqrt, ceil, floor
from timeit import default_timer

start = default_timer()

def matmult (A, B):
    rows_A = len(A)
    cols_A = len(A[0])
    rows_B = len(B)
    cols_B = len(B[0])

    if cols_A != rows_B:
      print("Cannot multiply the two matrices. Incorrect dimensions.")
      return

    # Create the result matrix
    # Dimensions would be rows_A x cols_B
    C = [[0 for row in range(cols_B)] for col in range(rows_A)]

    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                C[i][j] += A[i][k] * B[k][j]
    return C

def gauss(n):
    sn = ceil(sqrt(n))
    res = 0
    stk = [([[1, 0, 0], [0, 1, 0], [0, 0, -n]], sn, sn)]
    while stk:
        m, w, h = stk.pop()
        if w <= 0 or h <= 0:
            continue
        a, b, c, d, e, f = m[0][0], 2 * m[0][1], m[1][1], 2 * m[0][2], 2 * m[1][2], m[2][2]
        if d * d - 4 * a * f < 0 or e * e - 4 * c * f < 0:
            res += w * h
            continue
        x0, y0 = (- d + sqrt(d * d - 4 * a * f)) / 2 / a, (- e + sqrt(e * e - 4 * c * f)) / 2 / c
        xint, yint, uint, vint, flag = 0, 0, 0, 0, 0
        if floor(x0) == ceil(x0):
            xint = 1
        if floor(y0) == ceil(y0):
            yint = 1
        if w <= 1:
            res += (h - ceil(y0)) * w - yint
            continue
        if h <= 1:
            res += (w - ceil(x0)) * h - xint
            continue
        if w - x0 > 1:
            res += (w - ceil(x0)) * h - xint
            stk.append((m, ceil(x0), h))
            continue
        if h - y0 > 1:
            res += (h - ceil(y0)) * w - yint
            stk.append((m, w, ceil(y0)))
            continue
        temp = (a - b + c) * (c * d * d - b * d * e + a * e * e + b * b * f - 4 * a * c * f)
        if temp < 0:
            continue
        u = ((a - b + c) * (b * e - 2 * c * d) - (b - 2 * c) * sqrt(temp)) / (a - b + c) / (4 * a * c - b * b)
        if floor(u) == ceil(u):
            uint = 1
        if u > w:
            u, flag = w, 1
        if u < 0:
            u, flag = 0, 1
        temp = (e  + b * u) * (e + b * u) - 4 * c * (f + u * (d + a * u))
        if temp < 0:
            continue
        v = (- e - b * u + sqrt(temp)) / 2 / c
        if floor(v) == ceil(v):
            vint = 1
        if v < 0:
            v = 0
        u, v = ceil(u), ceil(v)
        if u == w and v == h:
            continue
        if uint == 1 and vint == 1 and flag == 0:
            res -= 1
        res += (w - u + h - 1 - v) * (w - u + h - v) // 2
        stk.append((matmult(matmult([[1, 0, 0], [-1, 1, 0], [h - v, v, 1]], m), [[1, -1, h - v], [0, 1, v], [0, 0, 1]]), u - (h - v), h - v))
        stk.append((matmult(matmult([[1, -1, 0], [0, 1, 0], [u, w - u, 1]], m), [[1, 0, u], [-1, 1, w - u], [0, 0, 1]]), w - u, v - (w - u)))
    res = (sn - 1) * (sn - 1) - res  # points with 1 <= x, y <= sn - 1 and x^2 + y^2 <= n
    res += sn - 1                    # points on the positive x-axis strictly inside the circle
    res *= 4                         # four-fold symmetry
    res += 1                         # the origin
    if sn * sn == n:
        res += 4                     # the four axis points lying exactly on the circle
    return res

print(gauss(10 ** 20))
print(default_timer() - start)


Recursive version of the code - Python

from math import sqrt, ceil, floor
from timeit import default_timer

start = default_timer()

def matmult (A, B):
    rows_A = len(A)
    cols_A = len(A[0])
    rows_B = len(B)
    cols_B = len(B[0])

    if cols_A != rows_B:
      print("Cannot multiply the two matrices. Incorrect dimensions.")
      return

    # Create the result matrix
    # Dimensions would be rows_A x cols_B
    C = [[0 for row in range(cols_B)] for col in range(rows_A)]

    for i in range(rows_A):
        for j in range(cols_B):
            for k in range(cols_A):
                C[i][j] += A[i][k] * B[k][j]
    return C

def poins(m, w, h):
    res = 0
    if w <= 0 or h <= 0:
        return 0
    a, b, c, d, e, f = m[0][0], 2 * m[0][1], m[1][1], 2 * m[0][2], 2 * m[1][2], m[2][2]
    if d * d - 4 * a * f < 0 or e * e - 4 * c * f < 0:
        return w * h
    x0, y0 = (- d + sqrt(d * d - 4 * a * f)) / 2 / a, (- e + sqrt(e * e - 4 * c * f)) / 2 / c
    xint, yint, uint, vint, flag = 0, 0, 0, 0, 0
    if floor(x0) == ceil(x0):
        xint = 1
    if floor(y0) == ceil(y0):
        yint = 1
    if w <= 1:
        return (h - ceil(y0)) * w - yint
    if h <= 1:
        return (w - ceil(x0)) * h - xint
    if w - x0 > 1:
        return (w - ceil(x0)) * h - xint + poins(m, ceil(x0), h)
    if h - y0 > 1:
        return (h - ceil(y0)) * w - yint + poins(m, w, ceil(y0))
    temp = (a - b + c) * (c * d * d - b * d * e + a * e * e + b * b * f - 4 * a * c * f)
    if temp < 0:
        return 0
    u = ((a - b + c) * (b * e - 2 * c * d) - (b - 2 * c) * sqrt(temp)) / (a - b + c) / (4 * a * c - b * b)
    if floor(u) == ceil(u):
        uint = 1
    if u > w:
        u, flag = w, 1
    if u < 0:
        u, flag = 0, 1
    temp = (e  + b * u) * (e + b * u) - 4 * c * (f + u * (d + a * u))
    if temp < 0:
        return 0
    v = (- e - b * u + sqrt(temp)) / 2 / c
    if floor(v) == ceil(v):
        vint = 1
    if v < 0:
        v = 0
    if uint == 1 and vint == 1 and flag == 0:
        res -= 1
    u, v = ceil(u), ceil(v)
    if u == w and v == h:
        return 0
    res += (w - u + h - 1 - v) * (w - u + h - v) // 2
    res += poins(matmult(matmult([[1, 0, 0], [-1, 1, 0], [h - v, v, 1]], m), [[1, -1, h - v], [0, 1, v], [0, 0, 1]]), u - (h - v), h - v)
    res += poins(matmult(matmult([[1, -1, 0], [0, 1, 0], [u, w - u, 1]], m), [[1, 0, u], [-1, 1, w - u], [0, 0, 1]]), w - u, v - (w - u))
    #print(m, w, h, u, v, res)
    return res

def gauss(n):
    m = ceil(sqrt(n))
    res = poins([[1, 0, 0], [0, 1, 0], [0, 0, -n]], m, m)
    res = (m - 1) * (m - 1) - res
    res += m - 1
    res *= 4
    res += 1
    if m * m == n:
        res += 4
    return res

print(gauss(10 ** 14))

print(default_timer() - start)


Recursive version of the code - Mathematica


Clear["Global`*"];
Poins[m0_, w0_, h0_] :=
  Poins[m0, w0, h0] =
   Block[{$RecursionLimit = Infinity, m = m0, w = w0, h = h0, a, b, c,
      d, e, f, temp, u, v, res = 0, trans, x0, y0, Flag = 0},
    If[Or[w <= 0, h <= 0], Return[0];];
    If[w < h,
     temp = {{0, 1, 0}, {1, 0, 0}, {0, 0, 1}};
     m = Transpose[temp].m.temp;
     temp = w; w = h; h = temp;
     ];
    a = m[[1, 1]]; b = 2 m[[1, 2]]; c = m[[2, 2]]; d = 2 m[[1, 3]];
    e = 2 m[[2, 3]]; f = m[[3, 3]];
    If[Or[d*d - 4*a*f < 0, e*e - 4*c*f < 0], Return[w h];];
    x0 = (-d + Sqrt[d*d - 4*a*f])/2/a;
    y0 = (-e + Sqrt[e*e - 4*c*f])/2/c;
    If[w <= 1, Return[(h - Ceiling[y0]) w - Boole[IntegerQ[y0]]];];
    If[h <= 1, Return[(w - Ceiling[x0]) h - Boole[IntegerQ[x0]]];];
    If[w - x0 > 1,
     Return[(w - Ceiling[x0]) h - Boole[IntegerQ[x0]] +
        Poins[m, Ceiling[x0], h]];
     ];
    If[h - y0 > 1,
     Return[(h - Ceiling[y0]) w - Boole[IntegerQ[y0]] +
        Poins[m, w, Ceiling[y0]]];
     ];
    temp = (a - b + c)*(c*d*d - b*d*e + a*e*e + b*b*f - 4*a*c*f);
    If[temp < 0, Return[0];];
    u = ((a - b + c)*(b*e - 2*c*d) - (b - 2*c)*Sqrt[temp])/(a - b +
         c)/(4*a*c - b*b);
    If[u > w, u = w; Flag = 1;];
    If[u < 0, u = 0; Flag = 1;];
    temp = (e + b*u)*(e + b*u) - 4*c*(f + u*(d + a*u));
    If[temp < 0, Return[0];];
    v = (-e - b*u + Sqrt[temp])/2/c;
    If[v < 0, v = 0;, v = Ceiling[v];];
    If[And[IntegerQ[u], IntegerQ[v], Flag == 0], res -= 1;];
    u = Ceiling[u]; v = Ceiling[v];
    If[And[u == w, v == h], Return[0];];
    res += (w - u + h - 1 - v)*(w - u + h - v)/2;
    trans = {{1, -1, h - v}, {0, 1, v}, {0, 0, 1}};
    res += Poins[Transpose[trans].m.trans, u - (h - v), h - v];
    trans = {{1, 0, u}, {-1, 1, w - u}, {0, 0, 1}};
    res += Poins[Transpose[trans].m.trans, w - u, v - (w - u)];
    res
    ];
GaussCnt[n0_] := Block[{n = n0, m, res},
   m = Ceiling[Sqrt[n]];
   res = Poins[{{1, 0, 0}, {0, 1, 0}, {0, 0, -n}}, m, m];
   res = (m - 1)*(m - 1) - res;
   res += m - 1; res *= 4; res += 1;
   If[m m == n, res += 4;];
   res
   ];
Rad = Power[10, 6];
GaussCnt[Rad*Rad] // AbsoluteTiming



Note that in all the codes, the core function takes as input the square of the radius of the circle under consideration.



Until then
Yours Aye
Me

A Sub-linear algorithm for Gauss's Circle problem


I was so proud of my previous two posts that I thought I would not have anything to share for a while. But I'm glad I found something very interesting.

I found this PDF titled A Successive approximation Algorithm for Computing the Divisor Summatory function a long time back (in OEIS, I think). It gives an $O(n^{1/3})$ algorithm for calculating the divisor summatory function. I have read it many times, both because I find it interesting and because I could not fully understand the tiny details, though I got the crux of what the algorithm is trying to do.

I was reading it again sometime last week and was on page 6. The PDF uses the properties of a hyperbola to do the summation, and the author notes "... the hyperbolic segment has the same basic shape as the full hyperbola", and then suddenly it hit me. After all, the hyperbola is a conic section, and if the idea works for a hyperbola, why not a different conic? And if that conic is an ellipse (of which the circle is a special case), then pretty much the same algorithm should solve Gauss's Circle problem. Indeed it did.

As I said earlier, I could not get all the details of the original algo and I proceeded in a more natural way. In essence, let

$C(x,y)=Ax^2+Bxy+Cy^2+Dx+Ey+F=0$ be a conic section with an increasing slope in the first quadrant bounded by the lines $x=w$ and $y=h$.

Then we will be counting the lattice points such that,

$0 \leq x < w$, $0 \leq y < h$ and $C(x,y) > 0$

Consider the first quadrant of the circle as shown in the left figure below. Now we can draw an 'almost' tangent with a slope of $-1$ as shown in the right figure below. Let $(u,v)$ be the almost tangent point (shown as a black dot).

 

Now we can simply count the number of lattice points lying inside the triangle that don't violate our conditions in $O(1)$ time. We can then make two parallelograms such that they cover our region of interest. Something like in the figure below.


Now comes the beautiful part. Let's concentrate on the parallelogram above. The reasoning will be almost similar to the other parallelogram except for a tiny detail.

We can transform the co-ordinate system now such that the lower left corner of the upper parallelogram becomes the origin. In addition to this, the parallelogram can be sheared horizontally such that the parallelogram becomes a rectangle. Simply put, we are trying to do the following:





Here the first figure, which is the upper parallelogram, becomes the second figure after shifting the origin and shearing. Note that the number of lattice points inside both figures remains the same after this operation. This fact can be understood with a bit of linear algebra (the shear is a unimodular map, so it maps lattice points to lattice points), but let's not get into the details.

Now we are in the same situation as before. We have a conic, a bounding rectangle and we have to count the points inside the rectangle and outside the conic. This essentially forms the crux of the recursion.

The almost tangent point can be found from the point at which the tangent to the conic has slope $-1$, i.e., by solving the conic equation together with the slope condition. Mathematica solves the symbolic equations in the blink of an eye.

The 'exact' point $(u_e, v_e)$ is found to be,

$u_e=\displaystyle\frac{(A-B+C)(BE-2CD)-(B-2C)\sqrt{\Delta_u}}{(A-B+C)(4AC-B^2)}$ and $v_e=\displaystyle\frac{-E-Bu+\sqrt{\Delta_v}}{2C}$

where $\Delta_u=(A - B + C) (CD^2 - BDE + AE^2 + B^2F - 4ACF)$ and $\Delta_v=(E+Bu)^2 - 4C(F + u (D + Au))$

The almost tangent point can be found by setting $u=\lceil u_e \rceil$ and $v=\lceil v_e \rceil$.

Now for the transformations. These can be achieved beautifully with matrices. A conic can be represented as a matrix in the following way.

$C=\mathbf{x}^T A_Q\mathbf{x}$

where the matrix $A_Q$ is given in Matrix representation of Conic Sections. Now the 'shifting and shearing' operations can be done using the following two matrices.

$X=\begin{pmatrix}1 & -1 & h-v\\0 & 1 & v \\0&0&1\end{pmatrix}$ and $Y=\begin{pmatrix}1 & 0 & u\\-1 & 1 & w-u \\0&0&1\end{pmatrix}$

Now the matrix $X^TA_QX$ gives the conic after we have translated the origin to $(u, v)$ and sheared it horizontally and vice versa for the vertical shearing using matrix $Y$.

Finally, though we start with a simple conic, the conics that subsequently arise in further recursion are a little tricky, and there are a number of cases that have to be dealt with carefully. A better programmer could've done it in a day or two, but it took me almost ten days to finally complete the code. You can find the recursive and the iterative versions of the Python code in this post.

Assuming no precision issues, the iterative version of the code finds $N(10^{9})=3141592653590319313$ in about 5 mins and $N(10^{10})=314159265359046103545$ in about 20 mins. The timing could be halved by taking the eight-fold symmetry of a circle but I didn't do it yet.

Hope you enjoyed it. See ya in the next post.


Until then
Yours Aye
Me

Sunday, January 21, 2018

Complicating a simple Probability Problem II


Hi all, we complicated a simple problem in Probability in the previous post. Here again, we do the same thing.

The setup is this. Consider the unit interval $(0, 1)$. We start at the origin and choose $n$ random points in the part of the interval that lies to our right. We then move to the $k$th of these points counting from the origin. We again choose $n$ points in the interval that lies to our right, and we keep moving to the right until, for a given $x$, we land within a distance $x$ of the right end point, i.e., inside the interval $(1-x, 1)$.

Let $\mathbb{E}_{n,k}(x)$ be the expected number of times we have to generate the numbers (or in other words, the moves we have to make). So, as you may have guessed, this post is about finding this number.

Okay, the procedure pretty much follows what we saw the previous time. It gets all messy and I'm not even sure how to present it. So I'm directly giving the result.

Let $P_{n,k}(z)$ denote the characteristic equation. We have,

$P_{n,k}(z)=\displaystyle\frac{z^\underline{n}}{n!}-(-1)^{n-k}\sum_{j=0}^{n-k}(-1)^j\binom{n-1-j}{k-1}\frac{z^\underline{j}}{j!}$

where $z^\underline{n}$ is the falling factorial.

Let $\eta_1,\eta_2,\cdots,\eta_n$ be the $n$ roots of the equation $P_{n,k}(z)=0$. Then we have,

$\begin{align}
\mathbb{E}_{n,k}(x)=\displaystyle\frac{H_n-\log{x}}{H_n-H_{n-k}}&+\frac{x^n}{n!}\sum_{r=1}^n\frac{x^{-\eta_r}\eta_r^\underline{n}}{\eta_rP'_{n,k}(\eta_r)}\\
&-\displaystyle\frac{x^nn^{n-1}}{n!(H_n-H_{n-k})}\sum_{r=1}^n\sum_{i=0}^{n-1}\sum_{j=0}^i\frac{(n-i)x^{-\eta_r}\eta_r^\underline{n}\eta_r^{j-1}a_{i-j}}{n^{i}P'_{n,k}(\eta_r)}
\end{align}$

where $a_m=[z^{n-m}]P_{n,k}(z)$ and $H_n$ is the $n$th Harmonic number.

Using this, we can see that

$\mathbb{E}_{n,1}(x)=1-n\log{x}$

$\mathbb{E}_{n,2}(x)=1+\displaystyle\frac{n(n-1)}{(2n-1)^2}(x^{2n-1}-1)-\frac{n(n-1)\log{x}}{2n-1}$
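As a quick check of these two special cases, here is a small Python sketch (not in the original post) that simulates the process as I've described it above and compares the empirical averages against the formulas; the values of $n$, $k$ and $x$ are arbitrary.

import math, random

def simulate(n, k, x, trials=100000):
    """Average number of moves until we land within x of the right end point."""
    total = 0
    for _ in range(trials):
        pos, moves = 0.0, 0
        while pos < 1 - x:
            pts = sorted(random.uniform(pos, 1) for _ in range(n))
            pos = pts[k - 1]                  # the k-th point counting from the origin
            moves += 1
        total += moves
    return total / trials

def exact_k1(n, x):
    return 1 - n * math.log(x)

def exact_k2(n, x):
    return 1 + n * (n - 1) / (2 * n - 1) ** 2 * (x ** (2 * n - 1) - 1) \
             - n * (n - 1) * math.log(x) / (2 * n - 1)

print(simulate(3, 1, 0.2), exact_k1(3, 0.2))
print(simulate(3, 2, 0.2), exact_k2(3, 0.2))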

Considering the number of identities involving falling factorials, there seems to be enough room for simplifying the above formula, but I am not able to see things clearly. I think adding an animation would make it easier to see what the 'procedure' means, and I am on it. I'll update it as soon as possible.


Until then
Yours Aye
Me

Friday, January 12, 2018

Complicating a simple Probability problem


This post is all about taking a simple problem in probability and complicating it.

Consider the problem of finding the expected number of uniform variables between $0$ and $1$ needed to get a sum greater than $1$. This problem is well studied and surprisingly the answer turns out to be $e$, Euler's number.

The derivation goes something like this. Starting with $0$, we keep generating uniform random variables $X \sim U(0,1)$ and adding them until we exceed $1$. Let $\mathbb{E}(x)$ be the expected number of random variables needed when we start from $x$.

It's easy to see that $\mathbb{E}(x)=0$ for $x>1$ and $\mathbb{E}(1)=1$, both by the problem definition.

Now it's easy to write the following equation.

$\mathbb{E}(x)=1+\displaystyle\int\limits_0^1\mathbb{E}(x+t)\,dt$

Now let $y=x+t$. The only thing to take care is when $t=1$, $y=1+x$. But we already know that expected value is $0$ for a value exceeding $1$. Hence, we can only consider the integral with $1$ as the upper limit.

$\mathbb{E}(x)=1+\displaystyle\int\limits_x^1\mathbb{E}(y)\,dy$

For reasons that'll be apparent later in this post, let's define a family of functions $f_k(z)$ such that $f_k^{(k)}(z)=D^k f_k(z)=\mathbb{E}(z)$ and $f_k(1)=0$, $k > 0$. Here the bracketed exponents denotes differentiation and $D$ denotes the differentiation operator.

Using this definition, the equation above becomes,

$\begin{align}
f_1'(x)&=1+\displaystyle\int\limits_x^1\,df_1\\
&=1+f_1(1)-f_1(x)\\
&=1-f_1(x)
\end{align}$

Therefore, we end up with the differential equation $f'_1(x)+f_1(x)=1$.

Solving this with Wolfram alpha, we get $f_1(x)=1+c_1e^{-x}$.

But $\mathbb{E}(x)=f_1'(x)=-c_1 e^{-x}$. Using the boundary condition, $\mathbb{E}(1)=1$, we get $\mathbb{E}(x)=e^{1-x}$ and hence $\mathbb{E}(0)=e$.             $\blacksquare$

Now we make a slight modification. Instead of just generating one random number each time, what if we generate a bunch of $n$ uniform random numbers and choose the minimum of those to be the addend. Now we seek the expected number of random numbers added. We know that the pdf of the minimum of $n$ numbers is,

$pdf(z)=n(1-z)^{n-1}$, $0 \leq z \leq 1$

Therefore the integral equation we start with becomes,

$\mathbb{E}_{n,1}(x)=1+\displaystyle\int\limits_0^1\mathbb{E}_{n,1}(x+t)n(1-t)^{n-1}\,dt$

Note that $\mathbb{E}_{n,1}(1)=1$. Using $y=x+t$,

$\displaystyle\mathbb{E}_{n,1}(x)=1+n\int\limits_x^1\mathbb{E}_{n,1}(y)(1+x-y)^{n-1}\,dy$

Using the $f$ functions, we can transform the equation to,

$f_1'(x)=1-n f_1(x)+n(n-1)\int\limits_x^1f_1(y)(1+x-y)^{n-2}\,dy$

Now this can be applied repeatedly until the exponent inside the integral becomes $0$. We'll finally end up with,

$\displaystyle \frac{f_n^{(n)}(x)}{n!}+\frac{f_n^{(n-1)}(x)}{(n-1)!}+\frac{f_n^{(n-2)}(x)}{(n-2)!}+\cdots+f_n(x)=\frac{1}{n!}$

Or using the differential operator,

$\displaystyle \left(\frac{D^n}{n!}+\frac{D^{n-1}}{(n-1)!}+\frac{D^{n-2}}{(n-2)!}+\cdots+1\right)f_n(x)=\frac{1}{n!}$

The characteristic polynomial of the differential equation is the partial exponential sum function. Let $P_{n,1}(z)$ denote this characteristic polynomial and let $\eta_1,\eta_2,\cdots,\eta_n$ be its roots. That is,

$P_{n,1}(z)=\displaystyle 1+z+\frac{z^2}{2!}+\frac{z^3}{3!}+\cdots+\frac{z^n}{n!}$

Note that $P'_{n,1}(z)=P_{n-1,1}(z)$. Also, as

$P_{n,1}(z)=\displaystyle\frac{1}{n!}(z-\eta_1)(z-\eta_2)(z-\eta_3)\cdots(z-\eta_n)$

we have, $P'_{n,1}(\eta_i)=\displaystyle\frac{1}{n!}\prod_{j \ne i,1 \leq j \leq n}(\eta_i-\eta_j)=P_{n-1,1}(\eta_i)$

Back to the differential equation, $f_n(x)=\frac{1}{n!} + c_1 e^{\eta_1x}+ c_2 e^{\eta_2x}+ c_3 e^{\eta_3x}+\cdots+ c_n e^{\eta_nx}$

$f_n^{(k)}(x)=c_1 \eta_1^ke^{\eta_1x}+ c_2 \eta_2^ke^{\eta_2x}+ c_3\eta_3^k e^{\eta_3x}+\cdots+ c_n \eta_n^ke^{\eta_nx}$, $1\leq k\leq n$

Before solving for the constants, by this time we should realize that the constant on the RHS of the starting integral equation doesn't matter, as it'll be taken care of by the boundary conditions. The other constants can be found from the conditions we enforced on the $f$ functions.

$f_n^{(k)}(1) = 0$ for $k<n$ and $f_n^{(n)}(1)=\mathbb{E}(1)=1$.

The post is getting unusually long. So I'll cut short the details. Solving for the constants using determinants, we'll have,

$\mathbb{E}_{n,1}(x)=\displaystyle\frac{1}{n!}\sum_{i=1}^n \frac{e^{-\eta_i(1-x)}\eta_i^{n-1}}{P_{n-1,1}(\eta_i)}$ and hence $\mathbb{E}_{n,1}(0)=\displaystyle\frac{1}{n!}\sum_{i=1}^n \frac{e^{-\eta_i}\eta_i^{n-1}}{P_{n-1,1}(\eta_i)}$       $\blacksquare$

Now for the final complication, what if we generate a bunch of $n$ random uniform numbers, sort them and choose the $k$th smallest number as the addend? Let $\mathbb{E}_{n,k}(x)$ be the expected value. The only thing of importance is the characteristic equation denoted as $P_{n,k}(z)$.

$P_{n,k}(z)=\displaystyle \frac{z^n}{n!}-(-1)^k \sum_{j=0}^{n-k}\binom{n-1-j}{k-1}\frac{z^j}{j!}$

That is, $P_{n,k}(D)f_n(x)=\frac{1}{n!}$

Note that still, $P'_{n,k}(z)=P_{n-1,k}(z)$ and $P'_{n,k}(\eta_i)=\displaystyle\frac{1}{n!}\prod_{j \ne i,1 \leq j \leq n}(\eta_i-\eta_j)=P_{n-1,k}(\eta_i)$

And the same analysis shows,

$\mathbb{E}_{n,k}(x)=\displaystyle\frac{1}{n!}\sum_{i=1}^n \frac{e^{-\eta_i(1-x)}\eta_i^{n-1}}{P_{n-1,k}(\eta_i)}$ and hence $\mathbb{E}_{n,k}(0)=\displaystyle\frac{1}{n!}\sum_{i=1}^n \frac{e^{-\eta_i}\eta_i^{n-1}}{P_{n-1,k}(\eta_i)}$

One of the beautiful things to note here is that the nature of the final formula gives an interpretation via a contour integral. That is,

$\mathbb{E}_{n,k}(x)=\displaystyle\frac{1}{n!}\sum_{i=1}^n \frac{e^{-\eta_i(1-x)}\eta_i^{n-1}}{P_{n-1,k}(\eta_i)}=\frac{1}{2\pi i n!}\oint_{\Gamma}\frac{e^{-z(1-x)}z^{n-1}}{P_{n,k}(z)}\,dz$

where the contour $\Gamma$ is large enough to encompass all the zeroes of $P_{n,k}(z)$.

Using these I was able to find,

$\mathbb{E}_{1,1}(0)=e$, $\mathbb{E}_{2,1}(0)=e(\cos 1 + \sin 1)$ and $\mathbb{E}_{2,2}(0)=\cosh \sqrt 2$.
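The final formula is also easy to evaluate numerically. Here is a small Python/NumPy sketch (not part of the original post) that builds $P_{n,k}$, finds its roots and evaluates $\mathbb{E}_{n,k}(0)$, which can be checked against the closed forms above.

import numpy as np
from math import comb, factorial

def P_coeffs(n, k):
    """Coefficients (highest power first) of P_{n,k}(z) = z^n/n! - (-1)^k sum_j C(n-1-j,k-1) z^j/j!."""
    c = [0.0] * (n + 1)
    c[0] = 1 / factorial(n)
    for j in range(n - k + 1):
        c[n - j] -= (-1) ** k * comb(n - 1 - j, k - 1) / factorial(j)
    return c

def E(n, k):
    """E_{n,k}(0) = (1/n!) sum_i exp(-eta_i) eta_i^(n-1) / P'_{n,k}(eta_i)."""
    coeffs = P_coeffs(n, k)
    roots = np.roots(coeffs)
    deriv = np.polyder(np.poly1d(coeffs))
    total = sum(np.exp(-r) * r ** (n - 1) / deriv(r) for r in roots)
    return (total / factorial(n)).real

print(E(1, 1))   # e = 2.71828...
print(E(2, 1))   # e (cos 1 + sin 1) = 3.75605...
print(E(2, 2))   # cosh(sqrt(2)) = 2.17818...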

The other characteristic equations do not appear to have closed form solutions. But the next simplest things are the $\mathbb{E}_{n,n}(0)$.

$\mathbb{E}_{n,n}(0)={}_0F_{n-1}(;1/n,2/n,\cdots,(n-1)/n;n!/n^n)$

where ${}_pF_q$ is the generalized hypergeometric function.

Solving the characteristic equations is the real trouble and I tried to find some simpler ways to do that without success. However I was able to establish,

$\mathbb{E}_{n,k}(0)\approx \displaystyle \frac{n+1}{k}\left(1+\frac{1}{2}\frac{k+1}{n+2}\right)$

This approximation is very good even when $n$ and $k$ are far apart.

I tried to cover the derivations too but the post is getting longer. Maybe I'll post them separately and break this into shorter ones.


Until then,
Yours Aye
Me

Sunday, January 7, 2018

A Note on Conditional Expectations


This post is about a very small problem on conditional expectations, one that got me thinking for a couple of days. Eventually I realized that I never really fully understood them.

The problem is this: Let $X, Y$ be two independent random variables uniformly distributed between $0$ and $1$. Find $\mathbb{E}(X\vert X+Y \leq 1)$.

I was discussing this with a friend and he even tried to write the Bayes formula for expectations. Only when I started thinking about this did I realize how many people, including me, don't see the absurdity of trying to find the expectation of events and the probability of variables.

For example, $\mathbb{E}(X+Y \leq 1)$ doesn't make any sense at all. Neither does $\mathbb{P}(X)$. Both of them are completely nonsensical. For the first one to make sense, it would be more appropriate to talk about $\mathbb{E}(I_{X+Y \leq 1})$, where $I$ is an indicator random variable. Similarly, $\mathbb{P}(0.25\leq X \leq 0.5)$ is a valid expression.

Let $X$ be a variable and $Z$ be an event, something like $X+Y\leq1$ or $X \geq Y$. Let $I_Z$ be the indicator random variable of the event $Z$. Now, conditioning on $Z$,

$\begin{align}\mathbb{E}(X\cdot I_z)&=\mathbb{P}(Z)\mathbb{E}(X\cdot I_z\vert Z)+\mathbb{P}(Z^c)\mathbb{E}(X\cdot I_z \vert Z^c)\\
&=\mathbb{P}(Z)\mathbb{E}(X\cdot 1\vert Z)+\mathbb{P}(Z^c)\mathbb{E}(X\cdot 0\vert Z^c)\\
&=\mathbb{P}(Z)\mathbb{E}(X\vert Z)\\
\end{align}$

On the other hand, conditioning on $X$,

$\begin{align}\mathbb{E}(X\cdot I_z)&=\mathbb{E}(\mathbb{E}(X\cdot I_z \vert X))\\
&=\mathbb{E}(X\cdot \mathbb{E}(I_z\vert X))\\
&=\mathbb{E}(X\cdot \mathbb{P}(Z\vert X))
\end{align}$

Equating the two, we finally get,

$\mathbb{E}(X\vert Z)=\displaystyle\frac{\mathbb{E}(X\cdot \mathbb{P}(Z\vert X))}{\mathbb{P}(Z)}$

Now this makes the problem tractable, as all the terms make perfect sense, provided we understand that $X$ is a variable and $Z$ is an event (which could even be something like $X=3$ in the discrete case).

An alternate way of deriving the same result will be like the following. Assume $X$ is a continuous random variable. The reasoning will be very similar if $X$ is discrete.

$\begin{align}
\mathbb{E}(X\vert Z)&=\int x f_{X\vert Z}(x)\,dx\\
&=\int x \text{ }\frac{\mathbb{P}(Z\vert X)f_{X}(x)}{\mathbb{P}(Z)}\,dx=\displaystyle\frac{1}{\mathbb{P}(Z)}\int x\text{ } \mathbb{P}(Z\vert X)f_{X}(x)\,dx\\
&=\displaystyle\frac{\mathbb{E}(X\cdot \mathbb{P}(Z\vert X))}{\mathbb{P}(Z)}
\end{align}$

However, the first way includes a lot of tiny beautiful details like the conditioning on a random variable, law of total expectation, and most importantly it gives a chance to understand that,

$\mathbb{E}(X\cdot I_z\vert I_z=1)\ne\mathbb{E}(X)$ in general. Rather when the dependence between $X$ and $Z$ is not clear or not given, it is always better to state the relation as,

$\mathbb{E}(X\cdot I_z\vert I_z=1)=\mathbb{E}(X\vert Z)$
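As a concrete check (not in the original post), here is a quick Monte Carlo sketch for the opening problem. The formula above gives $\mathbb{E}(X \vert X+Y\leq 1)=\mathbb{E}(X(1-X))/\mathbb{P}(X+Y\leq 1)=(1/6)/(1/2)=1/3$, and the simulation should land close to that.

import random

def estimate(trials=1000000):
    """Monte Carlo estimate of E(X | X + Y <= 1) for independent U(0,1) X and Y."""
    total, count = 0.0, 0
    for _ in range(trials):
        x, y = random.random(), random.random()
        if x + y <= 1:
            total += x
            count += 1
    return total / count

print(estimate())   # should be close to 1/3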

That's what I wanted to convey in this post, because I had a very difficult time figuring out how to solve that conditional expectation. Hope you enjoyed this.


Until then
Yours Aye,
Me

Thursday, January 4, 2018

Visualizing Conformal Mapping with Mathematica


Hi everyone, My first post in 2018..

Well, this post does not have any mathematics. Rather, this post is about, as the title says, visualizing conformal mapping. I recently came across an amazing Youtube Channel, 3Blue1Brown. If you haven't seen it, you should definitely check it out.

Most of the videos are pretty amazing. If you think Mathematics is beautiful with just equations, then visualizing Math is possibly the only thing better than Math itself (in my humble opinion).

Especially the video Visualizing the Riemann Zeta function is outstanding, simply the best video I've seen on YouTube, I think. Ever since I saw that, I wanted to create those animations. A simple Google search led me to the fact that Grant Sanderson, the creator of the channel, uses his own Python module manim to create these animations.

I'm just a beginner in Python, and using a specialized module is a bit beyond my current relationship with Python. So I turned to the next best thing at my disposal, Mathematica. It took me about four evenings to get everything right to create an animation and save it as a GIF using Mathematica. But now, after seeing those animations, I feel like it was really worth it.

So here's one of my first animations. The Inverse of a complex number.



Pretty neat, don't you think?

Now for the Zeta function itself. Mathematica takes about 5 mins to create the following animation.



Look at that.. So beautiful...

Moreover, looking at the two mappings we can get to see how the Zeta function has a simple pole at 1. Both graphs look almost the same at the final stage except that the Zeta function is translated by a unit distance along the real axis.

I also created one with a darker background, but it doesn't look all that nice. The different colors are not by choice. In fact, I'm not sure how to make all the curves along the real axis look the same color. I hope you'll let me know if you know how to do that.

Now for the code.

Clear["Global`*"];
f[z_] := Zeta[z];
Lim = 5;
DiscreteXValues =
 Join[Range[-Lim, -2], Table[j/10, {j, -15, 15}],
  Range[2, Lim]]; ContinuousYValues = {-Lim, Lim};
DiscreteYValues =
 Join[Range[-Lim, -2], Table[j/10, {j, -15, 15}],
  Range[2, Lim]]; ContinuousXValues = {-Lim, Lim};
RangeofthePlot = {{-Lim, Lim}, {-Lim, Lim}};
slides = Table[
    Show[ParametricPlot[
      Evaluate[
       Table[{XVaries (1 - t) + t Re[f[XVaries + I k]],
         k (1 - t) + t Im[f[XVaries + I k]]}, {k,
         DiscreteYValues}]], {XVaries, First[ContinuousXValues],
       Last[ContinuousXValues]}],
     ParametricPlot[
      Evaluate[
       Table[{k (1 - t) + t Re[f[k + I YVaries]],
         YVaries (1 - t) + t Im[f[k + I YVaries]]}, {k,
         DiscreteXValues}]], {YVaries, First[ContinuousYValues],
       Last[ContinuousYValues]}], PlotRange -> RangeofthePlot,
     AxesOrigin -> {0, 0}, GridLines -> Automatic], {t, 0, 1,
     1/40}]; // AbsoluteTiming
ListAnimate[slides, AnimationDirection -> ForwardBackward,
 AnimationRunning -> True, AnimationRate -> 8]
(*Export["ZetaFunction.gif",slides,ConversionRules\[Rule]{\
"ControlAppearance"\[Rule]None,"DisplayDurations"\[Rule]1/5}]*)
Export["ZetaFunction.gif",
 Flatten[{slides,
   Table[slides[[i]], {i, Length[slides] - 1, 2, -1}]}],
 "ControlAppearance" -> None, "DisplayDurations" -> 1/8]

After several iterations, I ended up with the above. It has options to control how many grid lines you want in each direction, and for each of them how long you wanna plot in the perpendicular direction. Play with DiscreteXValues, DiscreteYValues, ContinuousXValues and ContinuousYValues.

And before I finish, while we are on 3Blue1Brown, his video on Pi hiding in prime irregularities inspired me to find the following two 'non-obvious' results.

$\displaystyle \frac{\sqrt{2}\pi}{5}=\left\vert \frac{1}{1} + \frac{i}{2}- \frac{i}{3} - \frac{1}{4} + \frac{1}{5} + \frac{i}{6} - \frac{i}{7} - \frac{1}{8} + \cdots\right\vert$

$\displaystyle \frac{\pi}{\sqrt{10}}=\left\vert \frac{1}{1} + \frac{i}{3}- \frac{i}{7} - \frac{1}{9} + \frac{1}{11} + \frac{i}{13} - \frac{i}{17} - \frac{1}{19} + \cdots\right\vert$

Hope you enjoyed them.



Until then,
Yours Aye
Me.