# Shor’s algorithm¶

Although any integer number has a unique decomposition into a product of primes, finding the prime factors is believed to be a hard problem. In fact, the security of our online transactions rests on the assumption that factoring integers with a thousand or more digits is practically impossible. This assumption was challenged in 1995 when Peter Shor proposed a polynomial-time quantum algorithm for the factoring problem. Shor’s algorithm is arguably the most dramatic example of how the paradigm of quantum computing changed our perception of which problems should be considered tractable. In this section we briefly summarize some basic facts about factoring, highlight the main ingredients of Shor’s algorithm, and illustrate how it works by using a toy factoring problem.

## Complexity of factoring¶

Suppose our task is to factor an integer with decimal digits. The brute force algorithm goes through all primes up to and checks whether divides . In the worst case, this would take time roughly , which is exponential in the number of digits .  A more efficient algorithm, known as the quadratic sieve, attempts to construct integers such that is a multiple of . Once such are found, one checks whether have common factors with .  The quadratic sieve method has asymptotic runtime exponential in . The most efficient classical factoring algorithm, known as general number field sieve, achieves an asymptotic runtime exponential in .

The exponential runtime scaling limits applicability of the classical factoring algorithms to numbers with a few hundred digits; the world record is (which took roughly 2,000 CPU years!).  In contrast, Shor’s factoring algorithm has runtime polynomial in . The version of the algorithm described below, due to Alexey Kitaev, requires roughly qubits, and has runtime roughly .

Figure 1: classical vs. quantum factoring algorithms

## Period finding¶

It has been known to mathematicians since the 1970s that factoring becomes easy if one can solve another hard problem: find a period of the modular exponential function. The period-finding problem is defined as follows. Given integers and , find the smallest positive integer such that is a multiple of . The number is called the period of modulo . Recall that in modular arithmetic, the remainder of a division is called the value of modulo and denoted . For example, . Thus, the period of modulo is the smallest positive integer such that . For example, suppose and . Then

That is, has period modulo . Note that computing the higher powers of would give rise to a periodic sequence: for any integer . Thus, is the period of the modular exponential function .  In general, the period-finding problem is well-defined if and are co-prime (have no common factors).

## From factoring to period finding¶

Assume for a moment that we are given a period-finding machine that takes as input co-prime integers and outputs the period of modulo . We will now show how to use the machine to find all prime factors of . For simplicity, assume that has only two distinct prime factors:

First, pick a random integer between and  and compute the greatest common divisor (gcd) . This can be done very efficiently using Euclid’s algorithm. If we are lucky, and have some common prime factors, in which case gcd equals or , so we are done. From now on, let us assume that gcd , that is, and are co-prime. Let be the period of modulo computed by the machine. Repeat the above steps with different random choices of until is even.  It can be shown that a significant fraction of all integers have an even period (see Table 1 for examples), so on average one needs only a few repetitions.

Table 1: period of integers mod

At this point we have found some pair  such that is even, and is the smallest integer such that is a multiple of . Let us use the identity

The above shows that is not a multiple of (otherwise the period of would be ). Assume for a moment that is not a multiple of . Then neither of the integers  is a multiple of , but their product is.  This is possible only if is a prime factor of  and is a prime factor of (or vice versa). We can thus find and by computing gcd ; see Table 1 for examples.  In the remaining “unlucky” case, when is a multiple of , we give up and try a different integer .  For example, is the only unlucky integer in Table 1. In general, it can be shown that the unlucky integers are not too frequent, so on average, only two calls to the period-finding machine are sufficient to factor .

## Shor’s algorithm¶

Let us now show that a quantum computer can efficiently simulate the period-finding machine.  As in the case of the Deutsch-Jozsa algorithm, we shall exploit quantum parallelism and constructive interference to determine whether a complicated function has a certain global property that cannot be learned by evaluating the function only at a few points. However, instead of detecting the property of being a balanced function, we seek to detect and measure the periodicity of the modular exponentiation function. The fact that interference makes it easier to measure periodicity should not come as a big surprise. After all, physicists routinely use scattering of electromagnetic waves and interference measurements to determine periodicity of physical objects such as crystal lattices. Likewise, Shor’s algorithm exploits interference to measure periodicity of arithmetic objects.

Suppose we are given co-prime integers . Our goal is to compute the period of  modulo , that is, the smallest positive integer such that . The basic idea is to construct a unitary operation that implements the modular multiplication function . It can be shown that eigenvalues of are closely related to the period of . Namely, each eigenvalue of has a form , where  for some integer . Furthermore, as we saw in the previous section on quantum phase estimation, eigenvalues of certain unitary operations can be measured efficiently using the phase estimation algorithm. Unfortunately, inferring from the measured eigenvalues of  is only possible if the eigenvalues are measured exactly (or with an exponentially small precision). For example, factoring a 1000-digit number would require measuring the eigenvalue of with a precision of . Such accuracy cannot be achieved by a direct application of the phase estimation algorithm, as this would require too large a pointer system. Here comes the main trick: we will estimate the eigenvalue of by applying the phase estimation algorithm to a family of unitary operations  with etc. We stop at with .

Why does it work? The first observation is that all operations are integer powers of . Namely, if , then . This implies that the operations have the same eigenvectors. In particular, eigenvalues of the entire family  can be measured simultaneously.  Second, implementing is as easy as implementing - one just need to precompute the powers by the repeated squaring method. Finally, even if the eigenvalues of are measured with a poor precision (say 10%), each squaring of reduces the error in the estimated eigenvalue of by a factor of .  Indeed, consider an eigenvector of with an eigenvalue . If , then the eigenvalue of is . If , then the eigenvalue of is , etc. Thus we can estimate with a constant precision (say 10%). We will see that this is enough to estimate with a precision roughly .  For example, one can achieve a precision of  by a sequence of fewer than measurements of with an error of 10%. Furthermore, it can be shown that estimating a few randomly picked eigenvalues  with a precision less than is enough to determine the period exactly (the idea is to find the best rational approximation to the estimate of using continued fractions).

In order to use the phase estimation algorithm, we need to construct a quantum circuit implementing the modular multiplication operation. By analogy with classical algorithms that can link standard library functions, a quantum algorithm is allowed to call classical subroutines; for example, a subroutine for computing the modular multiplication. Importantly, before such classical subroutines are incorporated into a quantum circuit, they must be transformed into a reversible form. More precisely, a quantum algorithm can call a classical subroutine only if it is compiled into a sequence of reversible logical gates such as CNOT or Toffoli gates (in particular, the number of input and output wires in each gate must be the same). The subroutine is allowed to use a scratch memory similar to local variables used by the standard library functions. However, once the subroutine is completed, the scratch memory must be totally clean (say, all zeros). The reason is that a quantum algorithm operates on coherent superpositions of different classical states. Leaving information about the inputs or the outputs in the scratch memory could potentially destroy quantum coherence and prevent the algorithm from seeing interference between different states. Since the notion of reversible classical circuits plays an important role in Shor’s algorithm and many other quantum algorithms, below we briefly discuss methods for constructing such circuits.

## Reversible classical circuits¶

An important insight made in 1973 by our IBM colleague Charles Bennett is that any classical computation can be transformed into a reversible form. How does it work? Suppose represents some classical computation that takes as input bit strings and outputs bit strings . The first observation is that the answer  can be computed without erasing any intermediate data if we are allowed to use some extra memory. Indeed, let us write down an algorithm for computing and compile it into a sequence of elementary logical gates such as AND, OR, etc. For concreteness, assume that each gate has two input wires and one output wire.  Let be the total number of gates. We will extend the -bit memory storing the input by adding bits initialized by zeros. These extra bits will serve as a scratch memory for storing intermediate data. We will write the output of the -th gate to the -th bit of the scratch memory and keep the values of the input bits. Once the computation is completed, the final answer is contained in some designated output register within the scratch memory. The remaining part of the scratch memory contains some “garbage” bit string (intermediate data).  Below, we illustrate how it works for the example when computes the 3-bit majority function.

At this point, the circuit is reversible as a whole, but its individual gates are still irreversible. The next step is to transform each gate into a reversible form. Consider as an example the AND gate with input wires and output wire such that . Let us define its reversible version R-AND. One of the output wires of R-AND must carry the output bit of the standard AND gate. To avoid losing information, R-AND must have at least two other output wires (note that in the case there are three possible input strings: ). The simplest version of R-AND has three input wires and three output wires as shown below.

Here is a dummy input wire, and denotes XOR operation (addition modulo two). The gate expects to receive inputs with in which case . If then the output data bit is flipped. Note that all inputs of R-AND can be computed from its outputs since . Thus R-AND indeed acts reversibly (technically, R-AND realizes a permutation on the set of 3-bit strings). Note also that R-AND coincides with the Toffoli gate.

The same construction can be applied to any other gate with two input wires and one output wire. Namely, if a gate F computes some Boolean function , then its reversible version R-F would map inputs to outputs where ; see below. Note that applying R-F twice implements the identity gate, that is, R-F coincides with its own inverse.

Suppose the original circuit is described by a sequence of gates . Replace each gate  by its reversible version - constructed above.  We shall connect the dummy input wire of and its output wire to the -th bit of the scratch memory such that the gate always receives inputs with . The new circuit has input and output wires and is composed from reversible 3-bit gates. The final state generated by the circuit can be written as , where  is the final answer stored in the output register somewhere within the scratch memory and  represents “garbage” (intermediate data). Here we assumed that the scratch memory is initially clean (all zeros). Thus we have constructed a reversible circuit that maps to . The final step is to get rid of the garbage without erasing any information (which would render the circuit irreversible). A solution is to copy the answer to a clean auxiliary register of bits and then “uncompute”  by applying the circuit backwards in time. Below we sketch how this works.

Ignoring for simplicity all auxiliary bits that are initialized and returned in the zero state, we obtained a reversible circuit on bits that maps input strings to output strings . In the special case when the is invertible, one can use similar tricks to construct a reversible circuit that maps input strings to output strings . In practice, one would never use the method described above, since it requires too large a scratch memory. Several optimization techniques for constructing reversible circuits have been proposed (such as uncomputing partial results more often and reusing scratch memory bits).

## Quantum circuits for modular multiplication¶

Suppose now that  is the modular multiplication function. Let be the number of binary digits in . Using -bit strings to represent integers modulo , one can implement  by a classical circuit composed of 3-bit reversible gates with input and output wires, as described above. The circuit may also use auxiliary bits that are initialized and returned in the state. The state-of-the-art implementation would require roughly gates and roughly auxiliary bits. For simplicity, below we will often ignore the auxiliary bits.  Let us convert to a quantum circuit by replacing each classical gate with its quantum counterpart. This is possible because, by construction, each gate of implements some permutation on the set of input bit strings . The corresponding quantum gate implements the same permutation on the set of basis states . We obtained a quantum circuit  acting on a register of qubits that maps a basis state  to . An example for is shown below. The period-finding algorithm requires modular multiplication circuits for , where .

Some basis states representing integers modulo :

The modular multiplication operation maps to mod .  This quantum circuit implements (see Markov and Saeedi 2012).

## Controlled operations and phase estimation¶

Let be the modular multiplication operation. At this point, we know how to construct a quantum circuit implementing , as well as repeated squares of such as etc.  We also know that eigenvalues of reveal information about the period of modulo . The final step is to measure the eigenvalues. For that we need a controlled version of . A controlled unitary operation is a quantum analog of classical conditional statements such as if-then-else.  In general, suppose is a quantum circuit acting on qubits. A controlled version of  is a unitary operation acting on a larger system , where is a single qubit and is a register of qubits. Controlled- applies to the target register if the control qubit is , and does nothing if the control qubit is .

Like their classical counterparts, controlled quantum operations are used in almost any quantum algorithm. We note that if can be realized by a short quantum circuit, then so can controlled-. Indeed, one can take the circuit realizing and replace each gate by its controlled version (with the same control qubit). The main distinction from the classical if-then-else construct is that the controlled qubit can be in a superposition of state . One could say that in the quantum world, two branches of a conditional statement can be executed “at the same time”.

Consider now a special case when the target register is prepared in some state , which is an eigenvector of , that is . The only difference between the two branches of the controlled- operation is the phase shift . In other words, the control qubit is mapped from to , while the target register remains in the state . We can thus describe the action of controlled- on the composite system by a single-qubit phase shift gate  acting on the control qubit.

Below we focus on what happens with the control qubit only (keeping in mind that it is part of the larger system ).  We will measure the eigenvalue using a pair of phase estimation circuits, as shown below:

One can easily check that the probability of observing the measurement outcome  is for the first circuit and for the second circuit.  Keep in mind that represents the controlled- operation, so the circuit extracts information about the phase by measuring interference between two branches of controlled-, where one branch accumulates a phase factor and the other branch accumulates no phase. By repeating each circuit several times and collecting the measurement statistics, we can estimate the probabilities, which gives us an estimate of . For concreteness, assume that we are willing to perform at most 100 measurements. The statistical error in our estimate of is thus roughly 10%. To factor a number with 1000 decimal digits, the phase has to be estimated with a very high precision . To this end, we will perform the phase estimation for a family of unitary operations , where  etc. We stop at such that . Recall that we can efficiently implement for very large values of by classically computing and using the identity . Since all operations have the same eigenvector , we can do all phase estimations with the same target register (initialized in the eigenvector .

For simplicity, let us assume that the phase estimations are performed sequentially, in which case only one control qubit is needed. The controlled- operation gives rise to a phase shift by angle  on the control qubit; thus, we can estimate with a precision of 10% by performing roughly 100 measurements. This gives an estimate of  with 5% precision.  More precisely, since the phase lives on the unit circuit, we get a pair of candidate angles and such that one of them approximates with a precision 5% and the other is very far from (approximately by ).  However, we have already estimated itself with a 10% precision. This is enough to select one of the candidate angles and . Applying this argument inductively several times shows that estimating with a constant precision (say, 10%) is enough to estimate  with a precision roughly . Overall we would need approximately measurements, which translates to controlled modular multiplication operations. In general, scales as