\documentclass[12pt]{article}
\usepackage{scribe}
\usepackage[english]{babel}

\Scribe{Purushottam Kar}
\Lecturer{Manindra Agrawal}
\LectureNumber{1 \& 2}
\LectureDate{July~30, 2009}
\LectureTitle{Integer and Modular Arithmetic}

\bibliographystyle{alpha}

\begin{document}

\MakeScribeTop

\section{Integer Arithmetic}
Efficient recipes for performing integer arithmetic are indispensable as they are widely used in several algorithms in diverse areas such as cryptology, computer graphics and other engineering areas. Hence our first object of study would be the most basic integer operations - namely addition, subtraction, multiplication and division. We will start off with algorithms that are typically referred to as ``high-school'' or ``peasant'' algorithms and move on to more efficient ones wherever scope for improvement is found.

\subsection{Integer Addition and Subtraction}
Given two $n$-bit numbers $a$ and $b$, the high-school algorithms take $\O{n}$ time to perform addition and subtraction. Now this is optimal upto constant factors since the input size itself is $\Omega(n)$. Hence these do not pose a challenge to us. Of course for someone designing a processor, these constant factors may decide whether the chip becomes a market leader or not but we shall consider the issue of integer addition and subtraction closed from now on having obtained optimal algorithms for the same (many thanks to our respective high school Math teachers for this) .

\subsection{Integer Multiplication}
Unfortunately we have less to thank our high school teachers on this matter since the high school algorithm takes $\O{n^2}$ time to multiply two $n$-bit numbers. This is too slow for most scientific calculations. A slight improvement comes due to an observation of Gauss on multiplication of complex numbers.

Gauss observed that the product $(a + bi)(c + di)$ can be calculated using just three multiplications and five additions instead of four multiplications and four additions. Let $k_1 = ac, k_2 = bd, k_3 = (a+b)(c+d)$. Then $(a + bi)(c + di) = (k_1 - k_2) + (k_3 - k_1 - k_2)i$. This was the trick employed by Karatsuba and Ofman in \cite{karatsuba} to arrive at an $\O{n^{\log_23}}$ algorithm for integer multiplication.

What is done in this algorithm is to express each $n$-bit number using two $\frac{n}{2}$-bit numbers. Thus $a = a_12^{\frac{n}{2}} + a_0$ and $b = b_12^{\frac{n}{2}} + b_0$. Hence we can use the above trick to multiply these two numbers using three $\frac{n}{2}$-bit multiplications and some constant number of $\O{n}$ operations which include adding the intermediate results, left shifts etc. So the time complexity of this algorithm is $M(n) = 3M\left( \frac{n}{2} \right) + \O{n}$ which gives $M(n) = \O{n^{\log_23}}$.

This method of breaking up numbers can be generalized to break numbers into more parts and save on some multiplications using similar tricks. This led to a family of multiplication algorithms known as Toom-Cook methods \cite{toom,cook} which give an asymptotic time complexity of $\O{n^{\frac{\log(2k-1)}{\log(k)}}}$ for any $k > 1$ and hence can give a complexity of $\O{n^{1+\eps}}$ for any $\eps > 0$.

This was improved by Sch\"onage and Strassen using Spectral techniques (more specifically Fourier Analysis) to $\O{n\log n\log\log n}$ in \cite{schonage-strassen}. After this came a long period of waiting before F\"urer and De-Kurur-Saha-Saptharishi in a series of closely spaced papers \cite{furer,de-kurur-saha-saptharishi} improved the time complexity to $\O{n\log n\ 2^{\log^* n}}$ where $\log^*$ denotes the iterated $\log$ function.

Of course one does not typically use the ``fastest'' method on all instances. As it turns out the algorithms by F\"urer and De \textit{et al} outperform the Sch\"onage-Strassen algorithm in practice only on astronomically large numbers due to the large constants involved. Thus for really huge numbers, one uses the algorithms due to F\"urer and De \textit{et al} at the top level but as the recursive calls are made to smaller numbers, one switches to the Sch\"onage-Strassen algorithm only to switch to the Karatsuba algorithm for still smaller numbers and finally to the high school algorithm for the bottom level recursive calls.

\subsection{Integer Division}
As it turns out, if asymptotic complexity is all that we are interested in then integer division has the same time complexity as integer multiplication. In fact the operations of multiplication, squaring, division, inversion and square-rooting all have the same time complexity. However in this lecture our aim is only to show that division is no more difficult than multiplication. For this we require the Newton-Raphson method for inversion.

This method is used in general to find roots of a real valued function $f$ by starting with a well-informed guess $x_0$ for the root and improving it iteratively using the update rule $x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$. The method typically has a quadratic convergence but is also known to go astray at times. But in our case we will show that it approaches its goal in a steadfast manner. We shall of course use a slightly modified version of the algorithm keeping in mind not to keep very high precision numbers. Note that all logarithms are taken to base $2$ unless otherwise mentioned.

Let us take the two $n$-bit integers to be divided as $a$ and $b$ where $b < a$. We want to find integers $d$ and $r$ such that $a = bd + r$ and $r < b$. We will always work only with positive integers for simplicity. For now let us assume that $b$ has been scaled down to the range $\left( \frac{1}{2}, 1\right)$ by dividing by an appropriate power or $2$. For this assume $b >2$. The case $b =2$ is clearly very simple to handle. From now on, till otherwise mentioned, we shall call this scaled down $n$-bit number as $b$. Our goal here is to find a good approximation of $\frac{1}{b}$. Note that $\frac{1}{b} \in (1,2)$.

Let us choose $x_0 = 1$ as our first approximation. Clearly $x_0 < \frac{1}{b}$ and $1 - bx_0 < \frac{1}{2}$. Truncate $1 - bx_0$ to the first bit after the decimal point and call this quantity $y_0$. Clearly $y_0 < \frac{1}{2}$ as well. Define $y_i = y_{i-1}^2$ and $x_{i+1} = x_i(1 + y_i)$ for all $i > 0$. Consider the following results :

\begin{lemma}
For any $i > 0$, we have the following to be true
\begin{enumerate}
\item $y_i < \frac{1}{2^{2^i}}$
\item $1 - bx_i < \frac{1}{2^{2^i}}$
\item $y_i$ is at most $2^i$ bits long
\item $x_i$ is at most $2^i$ bits long
\end{enumerate}
\end{lemma}
\begin{proof}
We present the respective proofs below. All proofs are by induction on $i$.
\begin{enumerate}
\item For $i = 0$ we have already shown the result. Suppose the result also holds upto some $i$, then we have $y_{i+1} = y_i^2 < \frac{1}{2^{2^{i+1}}}$.
\item For $i = 0$ we have already shown the result. Suppose the result also holds upto some $i$, then we have $1 - bx_{i+1} = 1 - bx_i(1 + y_i) = 1 - bx_i - bx_iy_i < \frac{1}{2^{2^i}} + \left( \frac{1}{2^{2^i}} - 1 \right)y_i < \frac{1}{2^{2^i}} + \left( \frac{1}{2^{2^i}} - 1 \right)\frac{1}{2^{2^i}} = \frac{1}{2^{2^{i+1}}}$.
\item For $i = 0$ this is true because of choice of $y_0$. Suppose the result also holds upto some $i$, then we have $y_{i+1} = y_i^2$ and hence the number of bits in $y_{i+1}$ can be atmost twice that of $y_i$. Hence the result holds.
\item For $i = 0$ this is true by choice of $x_0$. Suppose the result also holds upto some $i$, then we have $x_{i+1} = x_i(1 + y_i)$. Since $y_i < 1$ and both $x_i$ and $y_i$ are at most $2^i$ bits long, the result holds.
\end{enumerate}
\end{proof}

At this point we fix the issue of scaling $b$ to within the range $\left( \frac{1}{2},1 \right)$. Let $m > 0$ such that $\widetilde{b} = \frac{b}{2^m} \in \left[\frac{1}{2},1\right)$. Then it is clear that if $x$ is such that $1 - \widetilde{b}x < \epsilon$ then if $\widetilde{x} = \frac{x}{2^m}$ then $1 - b\widetilde{x} < \epsilon$ as well. Now we state a final lemma before moving on to prove our desired result.

\begin{lemma}
Fix $i = \log n + 1$. Then if $a = bd + r$ then $\|a\widetilde{x_i} - d| \leq 1$.
\end{lemma}
\begin{proof}
Clearly $\|a\widetilde{x_i} - d\| = \|d(b\widetilde{x_i} - 1) + r\widetilde{x_i}\| < \frac{d}{2^{2^{i}}} + \frac{r}{b} \leq \frac{d}{2^{2^{i}}} + 1 - \frac{1}{b} \leq 1$ since $r < b$, $\log b \leq n$ and $\log d \leq n$.
\end{proof}

The preceding lemmata give us an $\O{M(n)}$ algorithm for integer division where $M(n)$ is the asymptotic complexity for integer multiplication. This can be shown as follows : at the $i$-th iteration, all we do is multiply and add $2^i$-bit numbers (this includes the squaring step). This takes $M\left( \O{2^i} \right) + \O{2^i}$ time which is $\O{M\left( 2^i \right)}$ since $M\left( \O{n} \right) = \O{M(n)}$ and $M(n) = \Omega(n)$. Another outcome of $M(n) = \Omega(n)$ is that $M(a) + M(b) \leq M(a+b)$ for all $a,b \geq 1$.

In the beginning we calculate $1- bx_0$ which takes $\O{M(n)}$ time. The calculations at the end to find $d$ from $x_{\log n + 1}$ also takes $\O{M(n)}$ time. Due to some uncertainty in the value of $d$, we may have to try out a constant number of values in the vicinity of the $d$ obtained. All this will take time $\O{M(n)}$. Hence integer division is no more difficult than integer multiplication.\\

\noindent\textbf{Note}: There are some minute details of the algorithm that need to be taken care of. It can be seen that the algorithm will fail if $y_0 = 0$. However this is actually a good thing for us because it tells us that our very first approximation is as good as something we hoped to get several iterations down the line. Thus if $\frac{1}{2^{l-1}} \geq 1 - bx_0 > \frac{1}{2^l}$, then we can retain $2^{\lceil \log l \rceil}$ bits of $1 - bx_0$ and call this the $\lceil \log l \rceil$-th iteration and proceed.

\section{Modular Arithmetic}
In this section we will deal with how to do the four basic integer operations on $n$-bit numbers $a$ and $b$ \emph{modulo} another $n$-bit number $m$. It should be clear that modular addition, subtraction and multiplication can be done in $\O{M(n)}$ time. As far as division is concerned we first have to define what does it mean for $\frac{a}{b}$ to be $z \md m$.

\begin{definition}
For any three integers $a,b$ and $c, \frac{a}{b} \md m$ is said to be defined if there exists an integer $z$ such that $bz \equiv a \md m$. In such a situation we say $\frac{a}{b} \equiv z \md m$.
\end{definition}

\begin{lemma}
$\frac{a}{b} \md m$ for integers $a$ and $b$ such that $(a,b) = 1$ is defined iff $(m,b) = 1$.
\end{lemma}
\begin{proof}
$(\Rightarrow)$ Suppose $\frac{a}{b} \md m$ is defined and $\frac{a}{b} =  rm + z$ where $z \in \Z$ and $r \in \Q$ such that $br \in \Z$. Hence $a = brm + bz$. Now suppose $(m,b) > 1$. This implies from the above equation that $(a,b) > 1$ as well which is contradictory to our assumption. Hence $(m,b) = 1$.

\noindent$(\Leftarrow)$ Suppose $(m,b) = 1$, hence there exist $p,q \in \Z$ such that $pm + qb = 1$. Hence $apm + aqb = a$ which implies $b(aq) \equiv a \md m$. Hence $\frac{a}{b} \md m$ is defined.
\end{proof}

Using the above lemma we can easily find $\frac{a}{b} \md m$ in $\O{n^2}$ time. First using the extended Euclid's algorithm determine if $(m,b) = 1$. If this is indeed the case then the algorithm will also output the pair $(p,q)$ and clearly $q \equiv \frac{1}{b} \md m$. Hence $aq \equiv \frac{1}{b} \md m$. Since the Euclid's algorithm works in quadratic time and $M(n) = o(n^2)$ hence the algorithm takes only $\O{n^2}$ time.

\bibliography{ref}

\end{document}