Monday, February 19, 2018

A Sub-linear algorithm for Gauss's Circle problem


I was proud of my previous two posts and thought I would not have anything to share for a while. But I'm glad to say I found something very interesting.

I found this PDF, titled A Successive Approximation Algorithm for Computing the Divisor Summatory Function, a long time back (in the OEIS, I think). It gives an $O(n^{1/3})$ algorithm for calculating the divisor summatory function. I have read it many times, both because I find it interesting and because I could not fully understand the finer details, though I got the crux of what the algorithm is trying to do.
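For anyone unfamiliar, the divisor summatory function is $D(n)=\sum_{k=1}^{n}\lfloor n/k \rfloor$. Just as a point of reference (and not the paper's $O(n^{1/3})$ algorithm), a sketch of the classic $O(\sqrt{n})$ hyperbola-method baseline looks something like this (the function name is mine):

```python
def divisor_summatory(n):
    """Classic O(sqrt(n)) baseline for D(n) = sum_{k=1..n} floor(n/k).

    Uses the symmetry of the hyperbola x*y = n about x = y:
    D(n) = 2 * sum_{k <= sqrt(n)} floor(n/k) - floor(sqrt(n))^2.
    """
    root = int(n ** 0.5)
    # adjust for possible floating point error in the square root
    while (root + 1) * (root + 1) <= n:
        root += 1
    while root * root > n:
        root -= 1
    return 2 * sum(n // k for k in range(1, root + 1)) - root * root

# D(10) = 10 + 5 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 1 = 27
assert divisor_summatory(10) == 27
```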

I was reading it again some time last week and was on page 6. The PDF uses the properties of a hyperbola to do the summation, and the author notes that "... the hyperbolic segment has the same basic shape as the full hyperbola". Then it suddenly hit me: after all, the hyperbola is a conic section, and if the idea holds for a hyperbola, why not for a different conic? And if that conic is an ellipse (of which the circle is a special case), then pretty much the same algorithm should solve Gauss's circle problem. Indeed it did.

As I said earlier, I could not get all the details of the original algorithm, so I proceeded in a way that felt more natural to me. In essence, let

$C(x,y)=Ax^2+Bxy+Cy^2+Dx+Ey+F=0$ be a conic section with an increasing slope in the first quadrant bounded by the lines $x=w$ and $y=h$.

Then we will be counting the lattice points such that

$0 \leq x < w$, $0 \leq y < h$ and $C(x,y) > 0$
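To make the setup concrete, here is a naive reference count of exactly this quantity (the names are mine, and the brute force is of course only for checking small cases):

```python
def conic_value(x, y, A, B, C, D, E, F):
    """Evaluate C(x, y) = A*x^2 + B*x*y + C*y^2 + D*x + E*y + F."""
    return A*x*x + B*x*y + C*y*y + D*x + E*y + F

def count_naive(A, B, C, D, E, F, w, h):
    """Brute-force count of lattice points with 0 <= x < w, 0 <= y < h and C(x, y) > 0."""
    return sum(1 for x in range(w) for y in range(h)
               if conic_value(x, y, A, B, C, D, E, F) > 0)

# Example: lattice points of the box lying strictly outside the circle x^2 + y^2 = r^2
r = 10
print(count_naive(1, 0, 1, 0, 0, -r*r, r + 1, r + 1))
```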

Consider the first quadrant of the circle as shown in the left figure below. Now we can draw an 'almost' tangent with a slope of $-1$ as shown in the right figure below. Let $(u,v)$ be the almost tangent point (shown as a black dot).

[Figure: the first quadrant of the circle (left), and the 'almost' tangent of slope $-1$ touching it at the point $(u, v)$ (right)]

Now we can simply count the number of lattice points lying inside the triangle that satisfy our conditions in $O(1)$ time. We can then make two parallelograms such that they cover the rest of our region of interest, something like in the figure below.
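As an aside, the triangle count itself is just a triangular number. For instance, the lattice points with $x, y \ge 0$ and $x + y < n$ number $n(n+1)/2$; the exact corner and boundary handling depends on how the region is set up, but it is an $O(1)$ formula either way. A tiny sketch:

```python
def triangle_count(n):
    """Lattice points (x, y) with x >= 0, y >= 0 and x + y < n.

    For x = 0, 1, ..., n-1 there are n - x admissible values of y,
    so the count is n + (n-1) + ... + 1 = n*(n+1)//2.
    """
    return n * (n + 1) // 2

assert triangle_count(3) == 6   # (0,0) (0,1) (0,2) (1,0) (1,1) (2,0)
```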


Now comes the beautiful part. Let's concentrate on the upper parallelogram; the reasoning for the other parallelogram is almost identical except for a tiny detail.

We can now transform the coordinate system so that the lower-left corner of the upper parallelogram becomes the origin. In addition, the parallelogram can be sheared horizontally so that it becomes a rectangle. Simply put, we are trying to do the following:





Here the first figure, which is the upper parallelogram, becomes the second figure after shifting the origin and shearing. Note that the number of lattice points inside the two figures remains the same after this operation. This fact can be understood with a bit of linear algebra, but let's not get into the details.
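For the curious, the 'bit of linear algebra' is essentially this: the shear is an integer matrix with determinant $1$ (a unimodular transformation), so it maps the integer lattice bijectively onto itself, and the number of lattice points in the region cannot change. A tiny sanity check:

```python
import random

# The horizontal shear (x, y) -> (x - y, y) has matrix [[1, -1], [0, 1]] with
# determinant 1, so it is a bijection of Z^2: lattice points map to lattice points.
def shear(p):
    x, y = p
    return (x - y, y)

def unshear(p):
    x, y = p
    return (x + y, y)

for _ in range(1000):
    p = (random.randint(-100, 100), random.randint(-100, 100))
    assert unshear(shear(p)) == p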

Now we are in the same situation as before. We have a conic, a bounding rectangle and we have to count the points inside the rectangle and outside the conic. This essentially forms the crux of the recursion.
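In very rough pseudocode, the recursion has the following shape. The helper names below are placeholders for the steps described in the rest of this post, not the actual code, and the real implementation has to get the base cases and boundary conditions right:

```python
def count_outside(conic, w, h):
    """Shape of the recursion only (placeholder helpers, not the actual code):
    count lattice points in the w x h box that lie outside `conic`."""
    if w <= SMALL_SIZE or h <= SMALL_SIZE:
        # base case: the box is tiny enough to count directly by brute force
        return brute_force_count(conic, w, h)

    u, v = almost_tangent(conic)                 # slope -1 'almost' tangent point
    total = count_in_triangle(conic, u, v, w, h) # the O(1) triangle count

    # upper parallelogram: shift the origin, shear horizontally, recurse
    upper_conic, upper_w, upper_h = shift_and_shear_horizontal(conic, u, v, w, h)
    total += count_outside(upper_conic, upper_w, upper_h)

    # lower parallelogram: shift the origin, shear vertically, recurse
    lower_conic, lower_w, lower_h = shift_and_shear_vertical(conic, u, v, w, h)
    total += count_outside(lower_conic, lower_w, lower_h)

    return total
```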

The almost tangent point can be found from the point on the conic where the tangent has a slope of $-1$, i.e. by intersecting the conic with the equation obtained by setting its implicit derivative to $-1$. Mathematica solves the symbolic equations in the blink of an eye.

The 'exact' point of intersection $(u, v)$ is found to be,

$u_e=\displaystyle\frac{(A-B+C)(BE-2CD)-(B-2C)\sqrt{\Delta_u}}{(A-B+C)(4AC-B^2)}$ and $v_e=\displaystyle\frac{-E-Bu+\sqrt{\Delta_v}}{2C}$

where $\Delta_u=(A - B + C) (CD^2 - BDE + AE^2 + B^2F - 4ACF)$ and $\Delta_v=(E+Bu)^2 - 4C(F + u (D + Au))$

The almost tangent point can be found by setting $u=\lceil u_e \rceil$ and $v=\lceil v_e \rceil$.
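Transcribing the formulas above literally into code gives something like the sketch below (the function name is mine, and the floating point is only for illustration; the real computation needs to be much more careful about precision):

```python
import math

def almost_tangent_point(A, B, C, D, E, F):
    """(ceil(u_e), ceil(v_e)) from the closed forms quoted above.

    Floating point is used here purely for illustration; precision matters
    a lot in the real computation.
    """
    delta_u = (A - B + C) * (C*D*D - B*D*E + A*E*E + B*B*F - 4*A*C*F)
    u_e = ((A - B + C) * (B*E - 2*C*D) - (B - 2*C) * math.sqrt(delta_u)) \
          / ((A - B + C) * (4*A*C - B*B))
    delta_v = (E + B*u_e)**2 - 4*C*(F + u_e*(D + A*u_e))
    v_e = (-E - B*u_e + math.sqrt(delta_v)) / (2*C)
    return math.ceil(u_e), math.ceil(v_e)

# For the circle x^2 + y^2 - r^2 = 0 the slope is -1 at (r/sqrt(2), r/sqrt(2)),
# so for r = 10 this gives u_e = v_e ~ 7.07 and the rounded point (8, 8).
print(almost_tangent_point(1, 0, 1, 0, 0, -100))
```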

Now for the transformations. These can be handled beautifully with matrices. A conic can be represented as a matrix in the following way:

$C(x,y)=\mathbf{x}^T A_Q\mathbf{x}$ with $\mathbf{x}=(x, y, 1)^T$,

where the matrix $A_Q$ is given in Matrix representation of Conic Sections. Now the 'shifting and shearing' operations can be done using the following two matrices.

$X=\begin{pmatrix}1 & -1 & h-v\\0 & 1 & v \\0&0&1\end{pmatrix}$ and $Y=\begin{pmatrix}1 & 0 & u\\-1 & 1 & w-u \\0&0&1\end{pmatrix}$

Now the matrix $X^TA_QX$ gives the conic in the new coordinates after the shift and horizontal shear for the upper parallelogram, and similarly $Y^TA_QY$ handles the shift and vertical shear for the other parallelogram.
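Concretely, a sketch of these matrix operations looks something like the following (the helper names are mine, and I use exact rational arithmetic via `fractions` only to keep the snippet self-contained):

```python
from fractions import Fraction as Fr

def conic_matrix(A, B, C, D, E, F):
    """Symmetric matrix A_Q with x^T A_Q x = A x^2 + B xy + C y^2 + D x + E y + F,
    where x = (x, y, 1)^T."""
    return [[Fr(A),    Fr(B, 2), Fr(D, 2)],
            [Fr(B, 2), Fr(C),    Fr(E, 2)],
            [Fr(D, 2), Fr(E, 2), Fr(F)]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transform(A_Q, T):
    """Conic in the new coordinates: T^T A_Q T."""
    Tt = [list(row) for row in zip(*T)]   # transpose of T
    return matmul(matmul(Tt, A_Q), T)

def coefficients(A_Q):
    """Read (A, B, C, D, E, F) back from the symmetric matrix."""
    return (A_Q[0][0], 2*A_Q[0][1], A_Q[1][1], 2*A_Q[0][2], 2*A_Q[1][2], A_Q[2][2])

def X_matrix(v, h):
    """Shift + horizontal shear for the upper parallelogram, as in the post."""
    return [[1, -1, h - v], [0, 1, v], [0, 0, 1]]

def Y_matrix(u, w):
    """Shift + vertical shear for the lower parallelogram, as in the post."""
    return [[1, 0, u], [-1, 1, w - u], [0, 0, 1]]

# Example: transform the circle x^2 + y^2 - 100 = 0 with X for (u, v) = (8, 8), h = 10.
A_Q = conic_matrix(1, 0, 1, 0, 0, -100)
print(coefficients(transform(A_Q, X_matrix(8, 10))))
```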

Finally, though we start with a simple conic, the conics that subsequently arise in further recursion are a little tricky, and there are a number of cases that have to be dealt with carefully. A better programmer could've done it in a day or two, but it took me almost ten days to finally complete the code. You can find the recursive and the iterative versions of the Python code in this post.

Assuming no precision issues, the iterative version of the code finds $N(10^{9})=3141592653590319313$ in about 5 minutes and $N(10^{10})=314159265359046103545$ in about 20 minutes. The timing could be roughly halved by exploiting the eight-fold symmetry of the circle, but I haven't done that yet.

Hope you enjoyed it. See ya in the next post.


Until then
Yours Aye
Me

6 comments:

  1. This comment has been removed by the author.

  2. I stumbled upon your blog and found this algorithm very interesting. I implemented my own version in C++ and noticed that your benchmark numbers given above are incorrect. I checked that your Mathematica code posted in the other page yields exactly the same numbers as mine. The algorithm uses only integer arithmetic with 256-bit integers. N(10^9) = 3141592653589764829 and N(10^10)=314159265358978759661. It takes 26 seconds to obtain N(10^10).

    1. Thanks @Sung.. My laptop is old and maybe that caused the immense difference in runtime.. Anyway, glad you liked the post..

    2. I implemented a similar algorithm and also got the same numbers for N(10^9) and N(10^10). My implementation uses int128 and takes 4 seconds to compute N(10^12)=3141592653589793234680617.
      The complexity of my variant is O(N^2/3) and I think it's a lower bound (up to a constant) for any approach with the same idea.

    3. Thanks @Vlad Isenbaev for trying this. Glad your variant is a lot faster. Can you share the key differences in your variant? Thanks.

    4. I think there are mainly three key differences:
      1) C++ implementation with integer arithmetic (__int128_t specifically)
      2) Instead of transforming the curve itself, which is quite expensive (two 3x3 matrix products) and requires higher precision, I keep and transform the lattice itself (just a couple of vector additions)
      3) Early stop of the recursion: count the points directly when the lattice is smaller than some fixed size (128x128 was the optimal choice for my hardware)
