Shor’s algorithm

Although any integer number has a unique decomposition into a product of primes, finding the prime factors is believed to be a hard problem. In fact, the security of our online transactions rests on the assumption that factoring integers with a thousand or more digits is practically impossible. This assumption was challenged in 1995 when Peter Shor proposed a polynomial-time quantum algorithm for the factoring problem. Shor’s algorithm is arguably the most dramatic example of how the paradigm of quantum computing changed our perception of which problems should be considered tractable. In this section we briefly summarize some basic facts about factoring, highlight the main ingredients of Shor’s algorithm, and illustrate how it works by using a toy factoring problem.

Complexity of factoring

Suppose our task is to factor an integer N with d decimal digits. The brute force algorithm goes through all primes p up to \sqrt{N} and checks whether p divides N . In the worst case, this would take time roughly \sqrt{N} , which is exponential in the number of digits d .  A more efficient algorithm, known as the quadratic sieve, attempts to construct integers a,b such that a^2-b^2 is a multiple of N . Once such a,b are found, one checks whether a\pm b have common factors with N .  The quadratic sieve method has asymptotic runtime exponential in \sqrt{d} . The most efficient classical factoring algorithm, known as general number field sieve, achieves an asymptotic runtime exponential in d^{1/3} .

The exponential runtime scaling limits applicability of the classical factoring algorithms to numbers with a few hundred digits; the world record is d=232 (which took roughly 2,000 CPU years!).  In contrast, Shor’s factoring algorithm has runtime polynomial in d . The version of the algorithm described below, due to Alexey Kitaev, requires roughly 10d qubits, and has runtime roughly d^3 .


Figure 1: classical vs. quantum factoring algorithms

Period finding

It has been known to mathematicians since the 1970s that factoring becomes easy if one can solve another hard problem: find a period of the modular exponential function. The period-finding problem is defined as follows. Given integers N and a , find the smallest positive integer r such that a^r-1 is a multiple of N . The number r is called the period of a modulo N . Recall that in modular arithmetic, the remainder of a division a/N is called the value of a modulo N and denoted a\pmod{N} . For example, 1=16=91 \pmod{15} . Thus, the period of a modulo N is the smallest positive integer r such that a^r=1{\pmod N} . For example, suppose N=15 and a=7 . Then


That is, 7 has period 4 modulo 15 . Note that computing the higher powers of 7 would give rise to a periodic sequence: 7^{x+4}=7^x\pmod{15} for any integer x . Thus, r=4 is the period of the modular exponential function 7^x .  In general, the period-finding problem is well-defined if N and a are co-prime (have no common factors).

From factoring to period finding

Assume for a moment that we are given a period-finding machine that takes as input co-prime integers N,a and outputs the period of a modulo N . We will now show how to use the machine to find all prime factors of N . For simplicity, assume that N has only two distinct prime factors:


First, pick a random integer a between 2 and N-1  and compute the greatest common divisor (gcd) (N,a) . This can be done very efficiently using Euclid’s algorithm. If we are lucky, N and a have some common prime factors, in which case gcd (N,a) equals p_1 or p_2 , so we are done. From now on, let us assume that gcd (N,a)=1 , that is, N and a are co-prime. Let r be the period of a modulo N computed by the machine. Repeat the above steps with different random choices of a until r is even.  It can be shown that a significant fraction of all integers a have an even period (see Table 1 for examples), so on average one needs only a few repetitions.


Table 1: period of integers a mod 15

At this point we have found some pair r,a  such that r is even, and r is the smallest integer such that a^r-1 is a multiple of N . Let us use the identity


The above shows that a^{r/2}-1 is not a multiple of N (otherwise the period of a would be r/2 ). Assume for a moment that a^{r/2}+1 is not a multiple of N . Then neither of the integers a^{r/2}\pm 1  is a multiple of N , but their product is.  This is possible only if p_1 is a prime factor of a^{r/2}-1  and p_2 is a prime factor of a^{r/2}+1 (or vice versa). We can thus find p_1 and p_2 by computing gcd (N,a^{r/2}pm 1) ; see Table 1 for examples.  In the remaining “unlucky” case, when a^{r/2}+1 is a multiple of N , we give up and try a different integer a .  For example, a=14 is the only unlucky integer in Table 1. In general, it can be shown that the unlucky integers a are not too frequent, so on average, only two calls to the period-finding machine are sufficient to factor N .

Shor’s algorithm

Let us now show that a quantum computer can efficiently simulate the period-finding machine.  As in the case of the Deutsch-Jozsa algorithm, we shall exploit quantum parallelism and constructive interference to determine whether a complicated function has a certain global property that cannot be learned by evaluating the function only at a few points. However, instead of detecting the property of being a balanced function, we seek to detect and measure the periodicity of the modular exponentiation function. The fact that interference makes it easier to measure periodicity should not come as a big surprise. After all, physicists routinely use scattering of electromagnetic waves and interference measurements to determine periodicity of physical objects such as crystal lattices. Likewise, Shor’s algorithm exploits interference to measure periodicity of arithmetic objects.

Suppose we are given co-prime integers a,N . Our goal is to compute the period of a  modulo  N , that is, the smallest positive integer r such that a^r=1\pmod{N} . The basic idea is to construct a unitary operation U_a that implements the modular multiplication function x\to ax \pmod{N} . It can be shown that eigenvalues of U_a are closely related to the period of a . Namely, each eigenvalue of U_a has a form e^{j\phi} , where \phi=2\pi k/r  for some integer k . Furthermore, as we saw in the previous section on quantum phase estimation, eigenvalues of certain unitary operations can be measured efficiently using the phase estimation algorithm. Unfortunately, inferring r from the measured eigenvalues of U_a  is only possible if the eigenvalues are measured exactly (or with an exponentially small precision). For example, factoring a 1000-digit number would require measuring the eigenvalue of U_a with a precision of 10^{-2000} . Such accuracy cannot be achieved by a direct application of the phase estimation algorithm, as this would require too large a pointer system. Here comes the main trick: we will estimate the eigenvalue of U_a by applying the phase estimation algorithm to a family of unitary operations U_b  with b=a, a^2,a^4,a^8 etc. We stop at b=a^{2^p} with 2^p\approx N^2 .

Why does it work? The first observation is that all operations U_b are integer powers of U_a . Namely, if b=a^t , then U_b=(U_a)^t . This implies that the operations U_b have the same eigenvectors. In particular, eigenvalues of the entire family U_b  can be measured simultaneously.  Second, implementing U_b is as easy as implementing U_a - one just need to precompute the powers b=a,a^2,a^4,a^8,\ldots \pmod{N} by the repeated squaring method. Finally, even if the eigenvalues of U_b are measured with a poor precision (say 10%), each squaring of a reduces the error in the estimated eigenvalue of U_a by a factor of 1/2 .  Indeed, consider an eigenvector of U_a with an eigenvalue e^{j\phi} . If b=a^2 , then the eigenvalue of U_b is e^{2j\phi} . If b=a^4 , then the eigenvalue of U_b is e^{4j\phi} , etc. Thus we can estimate \phi,2\phi,4\phi,\ldots,2^p\phi with a constant precision (say 10%). We will see that this is enough to estimate \phi with a precision roughly 2^{-p} .  For example, one can achieve a precision of 10^{-2000}  by a sequence of fewer than 10^6 measurements of U_b with an error of 10%. Furthermore, it can be shown that estimating a few randomly picked eigenvalues  \phi=2\pi k/r with a precision less than 1/N^2 is enough to determine the period r exactly (the idea is to find the best rational approximation to the estimate of k/r using continued fractions).

In order to use the phase estimation algorithm, we need to construct a quantum circuit implementing the modular multiplication operation. By analogy with classical algorithms that can link standard library functions, a quantum algorithm is allowed to call classical subroutines; for example, a subroutine for computing the modular multiplication. Importantly, before such classical subroutines are incorporated into a quantum circuit, they must be transformed into a reversible form. More precisely, a quantum algorithm can call a classical subroutine only if it is compiled into a sequence of reversible logical gates such as CNOT or Toffoli gates (in particular, the number of input and output wires in each gate must be the same). The subroutine is allowed to use a scratch memory similar to local variables used by the standard library functions. However, once the subroutine is completed, the scratch memory must be totally clean (say, all zeros). The reason is that a quantum algorithm operates on coherent superpositions of different classical states. Leaving information about the inputs or the outputs in the scratch memory could potentially destroy quantum coherence and prevent the algorithm from seeing interference between different states. Since the notion of reversible classical circuits plays an important role in Shor’s algorithm and many other quantum algorithms, below we briefly discuss methods for constructing such circuits.

Reversible classical circuits

An important insight made in 1973 by our IBM colleague Charles Bennett is that any classical computation can be transformed into a reversible form. How does it work? Suppose f(x) represents some classical computation that takes as input n bit strings x and outputs m bit strings f(x) . The first observation is that the answer f(x)  can be computed without erasing any intermediate data if we are allowed to use some extra memory. Indeed, let us write down an algorithm for computing f(x) and compile it into a sequence of elementary logical gates such as AND, OR, etc. For concreteness, assume that each gate has two input wires and one output wire.  Let L be the total number of gates. We will extend the n -bit memory storing the input x by adding L bits initialized by zeros. These extra bits will serve as a scratch memory for storing intermediate data. We will write the output of the i -th gate to the i -th bit of the scratch memory and keep the values of the input bits. Once the computation is completed, the final answer f(x) is contained in some designated output register within the scratch memory. The remaining part of the scratch memory contains some “garbage” bit string g(x) (intermediate data).  Below, we illustrate how it works for the example when f(x) computes the 3-bit majority function.


At this point, the circuit is reversible as a whole, but its individual gates are still irreversible. The next step is to transform each gate into a reversible form. Consider as an example the AND gate with input wires a,b and output wire c such that c=a\wedge b . Let us define its reversible version R-AND. One of the output wires of R-AND must carry the output bit c of the standard AND gate. To avoid losing information, R-AND must have at least two other output wires (note that in the case c=0 there are three possible input strings: ab=00,01,10 ). The simplest version of R-AND has three input wires and three output wires as shown below.


Here d is a dummy input wire, and \oplus denotes XOR operation (addition modulo two). The gate expects to receive inputs with d=0 in which case c=a\wedge b . If d=1 then the output data bit is flipped. Note that all inputs of R-AND can be computed from its outputs since d=c\oplus (a\wedge b) . Thus R-AND indeed acts reversibly (technically, R-AND realizes a permutation on the set of 3-bit strings). Note also that R-AND coincides with the Toffoli gate.

The same construction can be applied to any other gate with two input wires and one output wire. Namely, if a gate F computes some Boolean function c=F(a,b) , then its reversible version R-F would map inputs a,b,d to outputs a,b,c where c=d\oplus F(a,b) ; see below. Note that applying R-F twice implements the identity gate, that is, R-F coincides with its own inverse.


Suppose the original circuit is described by a sequence of L gates F_1,\ldots,F_L . Replace each gate  F_i  by its reversible version G_i=R - F_i constructed above.  We shall connect the dummy input wire of G_i and its output wire c to the i -th bit of the scratch memory such that the gate always receives inputs with d=0 . The new circuit has n+L input and n+L output wires and is composed from reversible 3-bit gates. The final state generated by the circuit can be written as x,g(x),f(x) , where  f(x) is the final answer stored in the output register somewhere within the scratch memory and g(x)  represents “garbage” (intermediate data). Here we assumed that the scratch memory is initially clean (all zeros). Thus we have constructed a reversible circuit that maps x,0^L to x,g(x),f(x) . The final step is to get rid of the garbage g(x) without erasing any information (which would render the circuit irreversible). A solution is to copy the answer f(x) to a clean auxiliary register of m bits and then “uncompute” f(x)  by applying the circuit backwards in time. Below we sketch how this works.


Ignoring for simplicity all auxiliary bits that are initialized and returned in the zero state, we obtained a reversible circuit on n+m bits that maps input strings x,y to output strings x,y\oplus f(x) . In the special case when the f(x) is invertible, one can use similar tricks to construct a reversible circuit that maps input strings x to output strings f(x) . In practice, one would never use the method described above, since it requires too large a scratch memory. Several optimization techniques for constructing reversible circuits have been proposed (such as uncomputing partial results more often and reusing scratch memory bits).

Quantum circuits for modular multiplication

Suppose now that  f(x)=ax\pmod{N} is the modular multiplication function. Let n be the number of binary digits in N . Using n -bit strings to represent integers modulo N , one can implement  f(x) by a classical circuit U_a composed of 3-bit reversible gates with n input and output wires, as described above. The circuit U_a may also use auxiliary bits that are initialized and returned in the 0 state. The state-of-the-art implementation would require roughly n^2 gates and roughly 2n auxiliary bits. For simplicity, below we will often ignore the auxiliary bits.  Let us convert U_a to a quantum circuit by replacing each classical gate with its quantum counterpart. This is possible because, by construction, each gate of U_a implements some permutation on the set of input bit strings 000,001,\ldots,111 . The corresponding quantum gate implements the same permutation on the set of basis states |000\rangle,|001\rangle,\ldots,|111\rangle . We obtained a quantum circuit  U_a acting on a register of n qubits that maps a basis state |x\rangle  to |f(x)\rangle . An example for f(x)=7x\pmod{15} is shown below. The period-finding algorithm requires modular multiplication circuits U_b for b=a,a^2,a^4,\ldots,a^{2^p} \pmod{N} , where 2^p\approx N^2 .


Some basis states representing integers modulo 15 :


The modular multiplication operation maps |x\rangle to |7x mod 15\rangle .  This quantum circuit implements U_7 (see Markov and Saeedi 2012).


Controlled operations and phase estimation

Let U=U_a be the modular multiplication operation. At this point, we know how to construct a quantum circuit implementing  U , as well as repeated squares of U such as U^2,U^4,U^8 etc.  We also know that eigenvalues of U reveal information about the period of a modulo N . The final step is to measure the eigenvalues. For that we need a controlled version of U . A controlled unitary operation is a quantum analog of classical conditional statements such as if-then-else.  In general, suppose U is a quantum circuit acting on n qubits. A controlled version of U  is a unitary operation acting on a larger system  control + target , where control is a single qubit and target is a register of n qubits. Controlled- U applies U to the target register if the control qubit is |1\rangle , and does nothing if the control qubit is |0\rangle .


Like their classical counterparts, controlled quantum operations are used in almost any quantum algorithm. We note that if U can be realized by a short quantum circuit, then so can controlled- U . Indeed, one can take the circuit realizing U and replace each gate by its controlled version (with the same control qubit). The main distinction from the classical if-then-else construct is that the controlled qubit can be in a superposition of state \alpha|0\rangle +\beta|1\rangle . One could say that in the quantum world, two branches of a conditional statement can be executed “at the same time”.

Consider now a special case when the target register is prepared in some state \psi , which is an eigenvector of U , that is U|\psi\rangle=e^{j\phi} |\psi\rangle . The only difference between the two branches of the controlled- U operation is the phase shift e^{j\phi} . In other words, the control qubit is mapped from \alpha|0\rangle+\beta|1\rangle to \alpha|0\rangle+e^{j\phi}\beta |1\rangle , while the target register remains in the state \psi . We can thus describe the action of controlled- U on the composite system control + target by a single-qubit phase shift gate P  acting on the control qubit.


Below we focus on what happens with the control qubit only (keeping in mind that it is part of the larger system control + target ).  We will measure the eigenvalue e^{j\phi} using a pair of phase estimation circuits, as shown below:



One can easily check that the probability of observing the measurement outcome 0  is 0.5(1+\cos{(\phi)}) for the first circuit and 0.5(1-\sin{(\phi)}) for the second circuit.  Keep in mind that P represents the controlled- U operation, so the circuit extracts information about the phase \phi by measuring interference between two branches of controlled- U , where one branch accumulates a phase factor e^{j\phi} and the other branch accumulates no phase. By repeating each circuit several times and collecting the measurement statistics, we can estimate the probabilities, which gives us an estimate of \phi . For concreteness, assume that we are willing to perform at most 100 measurements. The statistical error in our estimate of \phi is thus roughly 10%. To factor a number N with 1000 decimal digits, the phase \phi has to be estimated with a very high precision \epsilon \sim 1/N^2 \sim 10^{-2000} . To this end, we will perform the phase estimation for a family of unitary operations U^t , where  t=1,2,4,8 etc. We stop at t=2^p such that 2^p\approx 1/\epsilon . Recall that we can efficiently implement U^t for very large values of t by classically computing b=a^{t}\pmod{N} and using the identity U^t=(U_a)^t=U_b . Since all operations U^t have the same eigenvector \psi , we can do all phase estimations with the same target register (initialized in the eigenvector |\psi\rangle) .

For simplicity, let us assume that the phase estimations are performed sequentially, in which case only one control qubit is needed. The controlled- U^2 operation gives rise to a phase shift P^2 by angle 2\phi  on the control qubit; thus, we can estimate 2\phi with a precision of 10% by performing roughly 100 measurements. This gives an estimate of \phi  with 5% precision.  More precisely, since the phase \phi lives on the unit circuit, we get a pair of candidate angles \phi' and \phi''=\phi'+\pi such that one of them approximates \phi with a precision 5% and the other is very far from \phi (approximately by \pi ).  However, we have already estimated \phi itself with a 10% precision. This is enough to select one of the candidate angles \phi' and \phi'' . Applying this argument inductively several times shows that estimating \phi,2\phi,\ldots,2^p\phi with a constant precision (say, 10%) is enough to estimate  \phi with a precision roughly 2^{-p}\sim \epsilon . Overall we would need approximately M=100\log_2{(1/\epsilon)}\sim 10^6 measurements, which translates to 10^6 controlled modular multiplication operations. In general, M scales as \log{(N)}