WO2002033885A1 - Multiplication modulaire pour systeme rsa et autre chiffrement/dechiffrement asymetrique - Google Patents
Multiplication modulaire pour systeme rsa et autre chiffrement/dechiffrement asymetrique Download PDFInfo
- Publication number
- WO2002033885A1 WO2002033885A1 PCT/SE2001/002270 SE0102270W WO0233885A1 WO 2002033885 A1 WO2002033885 A1 WO 2002033885A1 SE 0102270 W SE0102270 W SE 0102270W WO 0233885 A1 WO0233885 A1 WO 0233885A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- input
- modulus
- multiplication
- input value
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
- G06F7/722—Modular multiplication
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/499—Denomination or exception handling, e.g. rounding or overflow
- G06F7/49931—Modulo N reduction of final result
Definitions
- This invention relates to a method and an apparatus to perform modular multiplication applied on asymmetric encryption/decryption, for example according to the so-called RSA (Rivest-Shaman-Adleman)-system.
- RSA Rivest-Shaman-Adleman
- a number e is chosen in such a way that e and (p-l)*(q-l) do not have any common factor larger than 1.
- the two pairs of numbers, (e,m) and (d,m), are the public and secret keys respectively for the system.
- the security of the RSA- algorithm is based on the assumption that the work required to break the system is equivalent to the work required to divide the module m in factors. Since p and q are very large numbers also m will be a large number. Today, there is no known effective algorithm to divide such large numbers in factors in a reasonable time.
- the computations that are performed in RSA-encryption are called modulus exponentiation. Modulus exponentiation means taking the power of one large positive integer number with another large positive integer and computing the result modulus a third large positive integer. This should in practice be performed through a series of modular multiplications, i.e. a modulus reduction is done after each step of the computation. This technique limits the size of the operators to the size of the modulus.
- Modulo multiplication is in general hard to perform in a time efficient manner, especially when the numbers are long, typically >512 bits. In practical implementations modulo multipliers are therefore often realised in some kind of hardware.
- Multiplication in general as well as modulo multiplication is an operation that can be performed in many different ways, although it may intuitively be believed that this is not the case.
- FPGA Field-Programmable Gate Array
- ASIC Application Specific Integrated Circuit
- a FPGA contains among other things programmable elements, so called LUTs (Look Up Table).
- LUTs Look Up Table
- the designer that is using FPGAs wants to minimise the number of used LUTs and other resources when implementing a certain function with a certain specification regarding speed and throughput of the circuit. Inversely, the designer may request minimal computation time using a specified amount of LUTs and other resources.
- RSA Rivest-Shamir-Adleman
- EP0502782 Microcircuit for the Implementation of RSA Algorithm and Ordinary and Modular Arithmetic, in Particular Exponentiation, with Large Operands; PCT WO 00/42484(A2) Acceleration and Security Enhancements for Elliptic Curve and RSA Coprocessors;
- PCT WO 00/42484(A2) Acceleration and Security Enhancements for Elliptic Curve and RSA Coprocessors;
- the prior art consists of multiplication units that are not directly usable for modulus multiplication; modulus multiplication units based on parallel subtraction and addition that requires a large silicon area and are not usable at high clock frequencies; or multiplication according to the Montgomery method which is used when the length of the modulus can be determined in the design phase.
- the general purpose of the invention is to solve the problem with inefficient modulus multiplication, i.e. the purpose is to accomplish a fast, efficient and resource efficient modulus multiplier.
- One aspect of the problem is to accomplish a modulus multiplier that, when it is implemented, will permit a minimization of the silicon area of the circuit, and at least give a smaller area than given by existing comparable multipliers.
- Other aspects of the problem is as follows: - to accomplish a modulus multiplier that can be run at the highest possible clock frequency limited by the underlying technology;
- multiplier that is flexible in such way that if several multipliers are connected together, they can in parallel process numbers belonging to different modules with different lengths.
- the invention is based on the inventors understanding about and the application of one for the purpose suitable way of performing modulus multiplication.
- the Equation 4 is reformulated to
- A*B mod(m) A*B - (int(l/m * A * B)) * m (5) where 1/m and m are considered to be constants.
- a modulus multiplier is realised that applies the invented relation in hardware based in serial arithmetic.
- the efficient mathematical algorithm renders a basis for a very area efficient realisation of a modulus multiplier, which thus enables the surface cost to be minimised.
- the efficient mathematical algorithm combined with hardware architecture according to the invention gives a time efficient modulus multiplier.
- a modulus multiplier that can be clocked with the maximum clock frequency permitted by the underlying technology.
- the simplicity of the control machine based on independence of the treated data, makes it possible to clock the modulus multiplier at higher clock frequencies thus enabling the use of the RSA-system in new demanding applications.
- the hardware structure according to the invention consists of three sequential multiplications preferably divided into two computational chains. This structure provides a modulus multiplier that permits simultaneous computation of several incoming numbers, which gives a time for the throughput that is considerably shorter than the response time.
- modulus multiplier that permits simultaneous computation of several incoming numbers, which gives a time for the throughput that is considerably shorter than the response time.
- multiplier that is scalable and flexible through linking of several multipliers.
- These multipliers can in parallel compute numbers with different lengths on the modulus.
- the length of the used modulus controls the used part of the multiplier in such way that numbers with shorter length on the modulus is computed faster.
- this gives the architecture the possibility to compute numbers with short modulus without limiting the possibility to compute numbers with long modulus.
- a method for bit-serial modulus multiplication comprising the following steps: feeding as a first input value a multiplication factor serially to a first computational chain; feeding as a second input value a parallel multiplication factor to the first computational chain; multiplying in the first computational chain all bits in the second input value with all bits in the first input value resulting in a first partial result; completing the multiplication by setting the first input value to zero and further computing a second partial result where the complete result is the second partial result followed by the first partial result.
- a further elaboration comprises the steps to: move an in the first computational chain received carry length to a second computational chain; possibly start a new computation with a second input value to the first computational chain while the previous computation is completed in the second computational chain; output serially a first intermediate result from the first computational chain simultaneously with the output of a second intermediate result from the second computational chain;
- Yet another elaboration comprises the following: three consecutive sequences of the previously described multiplier sequences; where the first multiplier sequence has a first and a second input value; - where the second multiplier sequence has a constant as first input value and the output from the first multiplier sequence as the second input value.; where the third multiplier unit has a first input value, which is a second constant, and a second input value that is the result from the second multiplier sequence; where a result of the modulus multiplication is formed as the difference between the results from the first and the second multiplier sequence.
- the input values are arranged where the first input value to the second multiplier sequence is input in a parallel form and the second input value is input in a serial form; where the first input value to the third multiplier sequence is input in a parallel form and the second input value is input in a serial form.
- the first multiplier sequence has a first and a second input value
- the second multiplier sequence has a constant as first input value and the output from the first multiplier sequence as the second input value.
- the third multiplier unit has a first input value, which is a second constant, and a second input value that is the result from the second multiplier sequence; where a result of the modulus multiplication is formed as the difference between the results from the first and the second multiplier sequence.
- first input value to the second multiplier sequence is input in a parallel form and the second input value is input in a serial form
- the first input value to the third multiplier sequence is input in a parallel form and the second input value is input in a serial form
- the memory elements included in the first computing chain are put in reset state depending on a reset-signal.
- the length of the signal paths in the serial arithmetic computing unit is limited with a delay element.
- the length of the first constant is one or more bits longer than the second constant.
- the length of the first constant is one or more bits longer than the second constant.
- the method for modulus multiplication mentioned above is usefully applied in asymmetric encryption.
- the invention is usefully implemented as an apparatus in the form of a bit serial modulus multiplier, comprising: a serial input for a multiplication factor as a first input value to a first computing chain; - a parallel input for a multiplication factor that is a second input value to the first computing chain; a serial arithmetic computing unit realised as the mentioned first computing chain arranged to multiply an actual bit in the first input value on the serial input with all bits in the second input value on the parallel input; - a serial output for a first intermediate result from the mentioned first computing chain.
- the process elements of the mentioned first computing chain comprises a multiplier in the form of an AND-gate, two delay elements and one three input full adder for each element.
- the mentioned delay elements are arranged in such a way that they are set to zero with a reset control signal.
- a first computing element in the first computing chain consists preferably of only an AND-gate and a delay element.
- the mentioned delay element is usually arranged to be set to zero with a reset control signal.
- One delay element is arranged at the en of the first computing chain.
- a second computing chain coupled to the first computing chain; a signal path to move an in the first computing chain acquired carry length to the second computing chain; a serial output for a second intermediate result from the second computing chain.
- the mentioned second computing chain comprises process elements each having two delay elements and two multiplexers.
- the mentioned two multiplexers are controlled by a common control signal. It is furthermore arranged a reset control signal to the delay elements in the first computing chain, where the mentioned reset control signal is connected with and constitutes the control signal to mentioned two multiplexers.
- a delay stage is arranged to limit the length of the signal paths in the serial arithmetic computational unit. This is arranged such that also other signal paths, like signals for computed numbers and control signals are delayed, which allows that computed data also after the delay element henceforth are synchronized.
- the first input of the second modulus multiplier is the parallel input and the second input is the serial input.
- the first input of the third modulus multiplier is the parallel input and the second input is the serial input.
- the third modulus multiplier has a first input that is communicatively coupled to the mentioned memory and a second input that is communicatively coupled to an output from the second modulus multiplier;
- the third modulus multiplier has a first input that is communicatively coupled to the mentioned memory and a second input that is communicatively coupled to an output from the second modulus multiplier; - a differentiating element communicatively coupled to an output from the first modulus multiplier and a second output that is communicatively coupled to an output from the third modulus multiplier; an output from the differentiating element.
- the first input of the second modulus multiplier is the parallel input and the second input is the serial input; the first input of the third modulus multiplier is the parallel input and the second input is the serial input.
- the modulus multiplier can be configured such that the mentioned serial input to the modulus multiplier is connected to a multiplexer; the mentioned parallel input to the modulus multiplier is connected to a parallel to serial converter; - the mentioned output from the first computing chain and the mentioned output from the second computing chain is connected to a parallel to serial converter.
- the mentioned parallel to serial converter is by a parallel port connected to a memory unit.
- the multiplier is arranged to: store a first and a second constant in a memory; store a third input value in a memory; store an encryption exponent for an asymmetric crypto system in a memory; put an intermediate result as the mentioned third input value; - update the mentioned intermediate result by performing a modulus multiplication with the intermediate result as a first input value and either the intermediate result or the mentioned third input value as the second input value, depending on the binary number sequence of the encryption exponent; repeat the previous step and successively choosing bits in the binary sequence of the encryption exponent and thus receiving modulus exponentiation; save the final intermediate result as the final result. It is then preferably arranged such that the two choices of the second input value always is done such that the computing sequence is independent of the treated data.
- Different embodiments of the apparatus can be applied in different versions for asymmetric encryption/decryption, comprising a modulus multiplier according to any of the previously mentioned aspects.
- An arrangement for modulus multiplication comprising a serial arithmetic unit (1) arranged to perform the multiplication, comprises a number of circuits (20) in the serial arithmetic unit (1) including a first and a second computational chain, which are arranged to perform at least two parallel computations simultaneously.
- the first computational chain is arranged to move an at the computation received carry length to a second computational chain and then start a new computation.
- a memory (2) coupled to the arithmetic unit (1), where input data, output data and computed intermediate results are stored and at least one unit (3) for conversion of numbers from a first format to a second format connected to the memory (2) and the arithmetic unit (1).
- throughput time is meant the average time for processing in the modulus multiplier according to the invention when it is processing a queue of operations.
- response time or latency time is meant the time from the point when a modulus multiplier according to the invention starts a computation until the modulus multiplier is finished with the computation.
- used area is meant the number of used LUTs to realise a modulus multiplier according to the invention. It should in this connection be considered that FPGAs from different manufacturers have different internal design, and thus the number of LUTs cannot always directly be compared between different manufacturers. For an ASIC implementation the area can be directly computed in square millimetres, or be given as the number of used transistors.
- a multiplier for modulus multiplication can according to prior art comprise one or more computational units and a control machine, which controls the computational units.
- a control machine which controls the computational units.
- the control machine When estimating the used area and possible processing time, usually as the maximum allowed clock frequency, it should be noted that it is usually the control machine that limits the maximum clock frequency.
- Fig 1 describes a block diagram of an embodiment for modulus multiplication according to the invention
- Fig 2 shows a detail of an embodiment
- Fig 3 shows a flow diagram for an embodiment of the method according to the invention
- Fig 4 shows a modulus multiplier according to an embodiment of the invention implemented as an exponentiation unit
- Fig 5 shows an embodiment of a modulus multiplier based on Barret's method
- Fig 6 shows a partial detail view of an embodiment of a modulus multiplier
- Fig 7 shows a multiplier comprising a chain of addition units
- Fig 8 shows a pre-serialisation unit to a multiplier.
- Fig 9 shows a pre-parallelisation unit to a multiplier
- Fig 10 shows a delay element to a multiplier
- Fig 11 shows a first embodiment of an addition unit in a multiplier
- Fig 12 shows a multiplication unit in the form of a chain of addition units according to Fig 11;
- Fig 13 shows a second embodiment of an addition unit
- Fig 14 shows a multiplication unit in the shape of a chain of addition units according to Fig 13;
- Fig 15 shows an overview of an arrangement with three parallel modulus multipliers according to the invention.
- Fig 16 shows a serial subtraction unit included in different embodiments of the invention.
- the equation 1 performs the encryption while the equation 2 performs the decryption.
- C is the encrypted message
- e and m are the public keys and d is a secret or private key.
- M, C, e, m and d are numbers with the length n.
- X ⁇ mod(m) is performed with n bits long numbers, that nowadays typically can be more than a thousand bits long.
- Equation 4 has been reformulated as follows:
- 1/m and m are considered as constants.
- encryption decryption the same constants are used in several operations. In such cases it is only needed to initially compute 1/m and consequently no divisions are computed when computing Equation (5).
- Montgomery's method can be replaced with a multiplication of two numbers each with the length of approximately n+1 bits, where the exact relation can be extracted from equation lOa+lOb.
- Fig 1 shows an embodiment of the invention, where only one multiplier is used iteratively with the aid of a memory for storage of intermediate results.
- reference numeral 1 represents a serial arithmetic unit, in which a multiplication is performed by feeding one of the multiplicands serially at X and the other multiplicand in parallel form at Y. The result is output serially at Rl (Rlow), where the least significant half of the result is fed out, and Rh (Rhigh), where the most significant half of the result is fed out.
- a memory 2 is used to store intermediate results, but also works as input and output for the operators and the result, respectively.
- At least one serial/parallel- and parallel/serial- converter 3 receives and sends out serial numbers, but feeds in and out parallel numbers to and from the memory 2.
- three converters are used 3a, 3b, 3c. If two n-bit numbers are multiplied a result that is 2n bits long is achieved. To avoid problems with the time, since the operators are read in n clock cycles while it takes 2n clock cycles to read out the result, the carry-result stored in the chain is transferred in parallel form into another parallel computing chain after n clock cycles. Because of this the least significant numbers will be read out at Rl (Rlow) and the most significant bits at Rh (Rhigh). A new computation is started after n clock cycles and the computations are overlapping.
- a multiplexer 5 (MUX) is arranged at the input to the arithmetic unit 1.
- a subtraction should be performed according to Equation (5). It is performed separately in a simple adder 4, which is either arranged separately, as in Fig 1, or is integrated into the arithmetic unit 1.
- the serial arithmetic unit 1 comprises a predefined number of circuits 20, which are shown in Fig 2, each of them performing a serial addition of X and Yj, where Yj is the number Y's bit number i.
- the arithmetic unit 1 consists of 512 circuits 20, but can of course be fewer or more depending on how long numbers that are wished to be multiplied.
- the circuit 20 comprises a full adder 21, a half adder 22, an AND-gate 23 and four D-flip flops 24, 25, 26 and 27.
- X and Yj are input values to the AND-gate 23.
- the input values to the full adder are the output values from the AND-gate 23, the previous carry- value through the D-flip flop 24 and the value of the sum from the circuit i-1 with higher bit weight through the D- flip flop 25.
- the carry values in the D-flip flops 24 and 25 are moved in parallel to the D- flip flops 26 and 27 after n clock cycles, when a new computation is started in the upper computational chain.
- the inputs to the half-adder 22 are the deposited carry-bits in the D- flip flops 26 and 27.
- D-flip flops are arranged at the outputs Rl and Rh and also the X-line after a predefined, amount of circuits 20. This in order to do a practical distribution of X to all circuits 20. An embodiment for this is shown in Fig 10. , . _ .-, .
- Equation (5) the algorithm according to the invention, see Equation (5), is performed according to the method described below: ' 1.
- the operators A, B and the constants 1/m, m are fed into the memory 2 (step 3)
- the product A*B is computed (step 32) by feeding A in serial form at X in the arithmetic unit, utilising the converter 3, and B is fed in parallel form down to Y in the serial arithmetic unit 1. After n clock cycles the least significant bits of the result Pi have been fed into the converter 3b and the carry- values will be moved in parallel into the lower computational chain, whereupon the most significant bits of the result Pi are started to be fed into the converter 3 a. The next computational algorithm is started at the same time as the carry- values are moved.
- the result Pi is stored in the memory 2 (step 33) through the converters 3a and 3b after n clock cycles further; 3. In the same way as in paragraph 2 above the product Int[P ⁇ * 1/m] (step 34) is then computed.
- the result P 2 is stored in the memory 2 (step 35);
- the product P 2 *m is computed (step 36).
- the result P 3 is stored in the memory 2 (step 37);
- FIG 4 a block-diagram over a modulus multiplier is shown as a block diagram implemented as an exponentiation unit consisting of one input unit 4001, one modulus exponentiation unit 4002 and one output unit 4003.
- Fig 5 shows schematically a block diagram of a modulus multiplier according to an embodiment of the invention based on Barrett's method.
- This embodiment comprises an A*B-multiplier 5021, an AB*(l/m)-multiplier 5022, a k*m multiplier 5023 and a subtraction unit.
- a modulus exponentiation can be broken down into squares and multiplications with algorithms according to prior art, particularly Gordon, Daniel M., "A Survey of Fast Exponentiation Methods", Centre for Communications Research, San Diego Dec 30, 1997.
- Fig 6 shows a multiplication unit according to a preferred embodiment of the invention.
- An input unit 211 delivers to the multiplication unit 5021 bits Ai to be multiplied in serial form with the least significant bit first.
- this consists of the A-number from unit 211.
- the second number which should be multiplied, is delivered to the multiplication unit as bits Bi in parallel form, shown in schematic form with an input unit 212.
- the least significant bit in the number B is to farthest right in the figure.
- the numbers are multiplied serially, one bit at a time, in the multiplication unit 41,42 where preferably a special version of the embodiment is used in the first element. If the unit 42 is used as a first multiplication element, a complete addition unit is used for adding zero to a number. This can be simplified, which is shown as a unit 41 in Fig 6.
- the unit 41 comprises in the shown embodiment an AND-gate 1601 that receives Ai and Bi as input and feeds its output to delay unit 1602 in the form of a D-flip flop.
- the output signal from the delay unit 1602 is further fed to a first input of a multiplexer 1603, whose second input is fed with a logical zero.
- the output of the multiplier is fed to another delay unit 1604 in the form of a D-flip flop.
- the bit Ai and the output from the delay units 1602 and 1604 are fed to the next series connected multiplication unit.
- a selector unit 216 which is arranged to select and j oin the numbers that are fed out from the multiplier 42, in such way that the connection to other units are not unnecessarily obstructed but instead are simplified.
- a delay element 43 consisting of one single D-flip flop 605, with the purpose of introducing a phase shift to a signal, as it will be described below, to simplify the selector unit 216.
- the usefulness and the advantage with a modulus multiplier according to the invention are especially accentuated in a simplicity in the realization, and therefor unnecessary complications in surrounding units should be avoided if possible. This is accomplished as it has been shown in the examples by making simple processing in processing steps that are interfacing with surrounding units.
- a chain of addition units which together forms a serial multiplier 42 (same reference number as in Fig 6).
- the first addition unit is simplified, as shown in Fig 3 module 41.
- a part of the product is formed by a chain of addition units that multiplies and adds a bit each in the product.
- the chain has the same amount of cells for addition, 421, as there are bits in the parallel number B (212).
- the serial number A (211 in Fig 6) is fed in serial form to the multiplier, and shall simultaneously be used in all addition units 421. Since the chain can have a thousand or more elements, the signal path for the serial number will be long, and this limits the maximum available clock frequency. Therefore special delay elements 50 are preferably introduced in the chain, as shown in figure 1, which cuts the module in segments. This means that the serial signal path only has to go to a limited number of addition units 412. For simplicity two such delay elements 50 are shown in Fig 7. The introduction of the modules 50 implies that the output signal is delayed the same amount of clock cycles as the number of elements 50 in the chain of addition units 421. This means that the response time has increased.
- the throughput time has not been affected, and therefor (depending on the usage of the module) this is possibly of less importance.
- the number of delay elements 50 should have any special relation compared to the total length of the chain. Therefore the maximum number of addition units 421 that are connected between the delay stages 50 can be chosen with respect to the required maximum clock frequency, which is a demand for such a multiplier to support.
- a special case is noted when the multiplier has to be divided, due to special circumstances, in two parts that are placed at some mutual distance, and that two or more units 50 can be placed directly after each other somewhere in the module 42 if so required.
- Fig 8 shows an embodiment of the serialisation unit 211 that has the function to deliver a number (for example A) in serial form to the actual multiplier.
- the serialisation unit 211 includes a converter where the numbers comes in on a data-bus 81 (the bits A96...A94) which are loaded into a register, here realised as delay elements in the form of D-flipflops 83 and multiplexers (MUX-2) 82.
- the register is controlled by a control signal by which an external system via the multiplexers 82 can put a number in the upper chain of D-flip flops 83.
- the number is then copied to the lower chain of multiplexers 84 and delay elements 85 controlled by a reset signal CLR 605, which gives the number in a serial form with the correct phase to the multiplier.
- a zero 86 is fed into the first multiplexer (to the left in fig 8) in the lower chain to mark the end of the serial number.
- the serial input from the serialisation unit has the reference number 88.
- Fig 9 shows an embodiment of a unit for parallelisation 212, which has the purpose to convert a number for example from a data-bus to a parallel form suitable for the multiplier. It should be noted that if the multiplier is used with minimal throughput time it is necessary to change the parallel number in synchronisation with the processing of the addition chain 42. Since delay units 50 possibly are used, there is shown a design where an external signal LOAD 75 puts in a number B40...B39 with a multiplexer 71 in an upper chain of delay elements 72. The multiplexer 71 feeds the upper number when the control signal is a logical one, otherwise the lower number is fed through.
- the reset signal CLR is the first time named 605 but after the delay elements named 632 and controls then the exchange of the parallel number in a lower chain 74 by the multiplexer 73 controlled by the mentioned signal CLR 632, 605.
- Fig 10 shows an embodiment of a delay element 50 consisting of a number of clocked (clock 624) delay elements.
- the signals that are delayed 621-626 are shown coming in from the left in Fig 10, and coming out being available on the corresponding outputs 631-636 to the right in the figure.
- the clock 624 here schematically shown, is normally not included as an explicit signal.
- FIG 11 there is schematically shown a preferred embodiment of an addition unit 421, comprising a chain of full adders, so called FA 606, a chain of half adders 610, an arrangement (604, 607) controlled by a control signal and arranged for copying the state of the FA-chain to a second chain of states (608,612).
- This arrangement has the purpose to release the FA-chain 606 for the new multiplication of a new number when the serial number Ain 601 is ended, simultaneously with the completion of the multiplication in the lower chain and output of the most significant half of the product.
- the input signal Ain 601 is the incoming binary number for one of the factors in a serial form
- the input B6 602 is the second binary number in parallel form
- the output signal Ain 621 is a serial stream of output numbers.
- a multiplication element 603 for two binary numbers, a so called AND-gate an incoming control signal 605 which makes it possible to reset (CLR) the FA-chain 605 and connections for reset of the delay elements 604,607.
- CLR reset
- the upper signal path in multiplexers (609,611) is chosen when the control signal is a logical one.
- Fig 12 there is shown a multiplication unit consisting of a first stage 41 (see also fig 6) followed by two units of the type 421 as described in Fig 11. Thereafter there are a delay unit 50 and another unit 421. To the right in the figure is shown the mentioned delay 43 (see Fig 6), here in detail in its real context.
- Fig 12 shows how the unit 50 breaks up the horizontal signal paths.
- the unit in the picture is shown with separate control functions CLR and DOWN, where CLR resets the numbers in the FA-chain and where DOWN copies a state from the upper chain to the lower chain.
- the picture shows a direct reset of all delay units in the in the FA-chain, which thereby is done in one clock cycle.
- the same cycle is used for copying to the HA-chain in a so-called synchronous reset.
- all delay elements are implemented with a connection for a synchronous reset, thus this is not a particular problem.
- the two signals DOWN and CLR can preferably be combined into one signal, which gives a significant simplification when implementing into an FPGA, due to the lower number of control signals.
- CLR possibly needs to be active during many cycles, while the DOWN signal only is active during the first CLR-cycle.
- the delay units related to unit 50 are shown with reference number 50.
- Fig 13 there is shown a multiplier unit 42 IB that is elaborated differently than 421, where the units in the embodiment form 42 IB lacks half adders. This gives however an extra signal path.
- This embodiment can, depending on the used type of FPGA or ASIC, be more advantageous than the other.
- Fig 14 there is shown, as an example of an embodiment, an implementation in a chain according to the model in the previous figure (Fig 13), with double signal paths in the lower computational chain.
- a FA-adder has been included at the end of the chain, which results in identical output data from the two alternatives.
- the delay units belonging to the unit 50 are shown with reference number 50.
- Two delay units, belonging to the last unit 421, are included and shown with reference number 421.
- a modulus multiplier arranged according to the invention corresponding to Fig 5, with three consecutive multiplication units (213,214), (217,218) and (222,223).
- the arrangement has a serial input Ain (226), for example arranged according to the serialisation unit 211 in fig 8, and a parallel input Bin, for example arranged according to the parallelisation unit 212 in Fig 9.
- the multiplication of the numbers A and B is performed in the multiplication unit 213, which gives the least significant half of the product to the switching unit 216A in serial form with the least significant bit first.
- the most significant half is computed by the multiplication unit 214, after copying with the signal 616 (see unit 421 in Fig 11).
- the (n+1) least significant bits in the product (n bits from 213 and one bit from 214) are moved by the switching unit 216A to a memory (a so-called shift register) 219.
- the n+1 most significant bits (i.e. one bit from 213 and n bits from 214) are transferred as serial input data to the multiplication unit 217.
- the parallel number in the input unit 216B is used for multiplication in unit 217.
- the switching unit 220 is set to feed the n+1 most significant bits to unit 222, and this is the number W2 in Equation (2). This gives a truncation of the n-1 least significant bits (unit 217).
- the unit 220 reads the serial input signal from unit 219, and sends it out in a serial form to the unit 224 (this is the A*B-number).
- the module 225 is arranged to form the difference between the n+1 least significant bits (n bits from 222 and one bit from 223) and the AB number, which is received from the unit 224.
- the subtraction is performed according to Equation (4).
- One output 227 is drawn where the numbers rl and r2 are fed out in serial form.
- the subtraction can be performed in unit 225, with two serial subtraction units, simultaneously with the computation of the product k*m in unit 222.
- the units 213,217 and 222 can be realised as three similar units when multiplications are performed according to the invention. Moreover, the units 214, 218 and 223 can be realised as three similar units. If then it is chosen to implement the units 216A, 220 and 225 as a combination of these units, and it is chosen to activate the function in the function units respectively (216 A, 220 and 225) according to the above description, it is found that a modulus multiplier can preferably be elaborated with three consecutive identical units.
- the functionality in the units (216A, 220, 225) is increased in such way that these three units can work as delay units 50. Since the units (216A, 220, 225) only are using a fraction of the total area, a flexible modulus multiplier can be implemented as a long chain of units according to the figure.
- a logical unit (212, 213, 214, 215, 216 A) is formed, with a length adapted to the length in bits of the processed numbers.
- the configuration of the other units can be done in the same way. This procedure implies that it is possible to flexibly divide a long chain in multiple independent units depending on the length of the processed numbers.
- the throughput time in an embodiment according to the figure, is approximately n+1 clock cycles for the multiplication A*B (result to 219). Thereafter follows approximately n+1 clock cycles for the multiplication (AB)(l/m) in unit 217. Simultaneously the number (AB) is transferred from 219 to 224. Finally there follows approximately n+1 clock cycles when the number AB from 224 is subtracted with the product from 222, and the numbers rl and r2 are formed.
- the word "approximately" in this context refers to the fact that the configuration should be according to Equations 10a, 10b and that no time for possible delay units 50 has been incorporated.
- Fig. 15 shows a bit-serial subtraction unit 70 that is included in embodiments of the invention where numbers in bit serial form are subtracted.
- the subtraction unit 70 is put into a zero state with reset control signal 702.
- There is arranged an input Z 703, another input X 704 and an output DIFF 705 in such way that DIFF Z-X.
- 708 is meant an inverter
- 707 is a delay unit
- 709 is a multiplexer
- 710 is a constant logical one
- 706 is a three input full adder.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Storage Device Security (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
- Facsimile Transmission Control (AREA)
- Complex Calculations (AREA)
Abstract
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU2002211130A AU2002211130A1 (en) | 2000-10-17 | 2001-10-17 | Modular multiplication for rsa and other assymetric encryption/decryption |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| SE0003757A SE0003757L (sv) | 2000-10-17 | 2000-10-17 | Metod och anordning vid modulomultiplikation samt användning av metoden vid asymmetrisk kryptering/dekryptering |
| SE0003757-2 | 2000-10-17 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2002033885A1 true WO2002033885A1 (fr) | 2002-04-25 |
Family
ID=20281454
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/SE2001/002270 Ceased WO2002033885A1 (fr) | 2000-10-17 | 2001-10-17 | Multiplication modulaire pour systeme rsa et autre chiffrement/dechiffrement asymetrique |
Country Status (3)
| Country | Link |
|---|---|
| AU (1) | AU2002211130A1 (fr) |
| SE (1) | SE0003757L (fr) |
| WO (1) | WO2002033885A1 (fr) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2003093970A3 (fr) * | 2002-04-29 | 2004-07-15 | Infineon Technologies Ag | Dispositif et procede pour calculer un quotient entier |
| WO2003093969A3 (fr) * | 2002-04-29 | 2004-10-14 | Infineon Technologies Ag | Dispositif et procede de calcul d'un resultat d'une multiplication modulaire |
| RU2335092C2 (ru) * | 2003-04-14 | 2008-09-27 | Квэлкомм Инкорпорейтед | Оптимизация пропускной способности проводной сотовой сети |
| US7558817B2 (en) | 2002-04-29 | 2009-07-07 | Infineon Technologies Ag | Apparatus and method for calculating a result of a modular multiplication |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4405829A (en) * | 1977-12-14 | 1983-09-20 | Massachusetts Institute Of Technology | Cryptographic communications system and method |
| EP0502782A2 (fr) * | 1991-03-04 | 1992-09-09 | FORTRESS U&T (2000) Ltd. | Microcircuit pour l'implémentation des algorithmes RSA et de l'arithmétique ordinaire et modulaire, particulièrement l'exponentation, avec des grandes opérandes |
| EP0504996A2 (fr) * | 1991-03-22 | 1992-09-23 | Koninklijke Philips Electronics N.V. | Unité arthmétique pour multiplier des nombres entiers longs modus M et convertisseur RSA pourvu d'un tel dispositif de multiplication |
| US5499299A (en) * | 1993-07-02 | 1996-03-12 | Fujitsu Limited | Modular arithmetic operation system |
-
2000
- 2000-10-17 SE SE0003757A patent/SE0003757L/ not_active IP Right Cessation
-
2001
- 2001-10-17 WO PCT/SE2001/002270 patent/WO2002033885A1/fr not_active Ceased
- 2001-10-17 AU AU2002211130A patent/AU2002211130A1/en not_active Abandoned
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4405829A (en) * | 1977-12-14 | 1983-09-20 | Massachusetts Institute Of Technology | Cryptographic communications system and method |
| EP0502782A2 (fr) * | 1991-03-04 | 1992-09-09 | FORTRESS U&T (2000) Ltd. | Microcircuit pour l'implémentation des algorithmes RSA et de l'arithmétique ordinaire et modulaire, particulièrement l'exponentation, avec des grandes opérandes |
| EP0504996A2 (fr) * | 1991-03-22 | 1992-09-23 | Koninklijke Philips Electronics N.V. | Unité arthmétique pour multiplier des nombres entiers longs modus M et convertisseur RSA pourvu d'un tel dispositif de multiplication |
| US5499299A (en) * | 1993-07-02 | 1996-03-12 | Fujitsu Limited | Modular arithmetic operation system |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2003093970A3 (fr) * | 2002-04-29 | 2004-07-15 | Infineon Technologies Ag | Dispositif et procede pour calculer un quotient entier |
| WO2003093969A3 (fr) * | 2002-04-29 | 2004-10-14 | Infineon Technologies Ag | Dispositif et procede de calcul d'un resultat d'une multiplication modulaire |
| US7558817B2 (en) | 2002-04-29 | 2009-07-07 | Infineon Technologies Ag | Apparatus and method for calculating a result of a modular multiplication |
| RU2335092C2 (ru) * | 2003-04-14 | 2008-09-27 | Квэлкомм Инкорпорейтед | Оптимизация пропускной способности проводной сотовой сети |
Also Published As
| Publication number | Publication date |
|---|---|
| SE517045C2 (sv) | 2002-04-09 |
| SE0003757L (sv) | 2002-04-09 |
| AU2002211130A1 (en) | 2002-04-29 |
| SE0003757D0 (sv) | 2000-10-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN1503937B (zh) | 扩展精度累加器 | |
| Kwon et al. | Two implementation methods of a 1024-bit RSA cryptoprocessor based on modified Montgomery algorithm | |
| JP2722413B2 (ja) | モンゴメリ法によるモジュラ乗算の実施方法 | |
| US6763365B2 (en) | Hardware implementation for modular multiplication using a plurality of almost entirely identical processor elements | |
| EP0531158A2 (fr) | Méthode et appareil pour le chiffrement et le déchiffrement de données de communication | |
| JPH08263315A (ja) | モンゴメリ法によるモジュラリダクションの実施方法 | |
| KR102496446B1 (ko) | 모듈러 연산을 위한 워드 병렬 연산 방법 | |
| US5261001A (en) | Microcircuit for the implementation of RSA algorithm and ordinary and modular arithmetic, in particular exponentiation, with large operands | |
| CN109814838B (zh) | 获取加解密运算中的中间结果组的方法、硬件装置和系统 | |
| US6978016B2 (en) | Circuits for calculating modular multiplicative inverse | |
| US5121429A (en) | Digital signal processing | |
| US7046800B1 (en) | Scalable methods and apparatus for Montgomery multiplication | |
| US6917956B2 (en) | Apparatus and method for efficient modular exponentiation | |
| US6914983B2 (en) | Method for checking modular multiplication | |
| Hossain et al. | Efficient fpga implementation of modular arithmetic for elliptic curve cryptography | |
| US6804696B2 (en) | Pipelining operations in a system for performing modular multiplication | |
| US6963977B2 (en) | Circuits and methods for modular exponentiation | |
| WO2002033885A1 (fr) | Multiplication modulaire pour systeme rsa et autre chiffrement/dechiffrement asymetrique | |
| O'Rourke et al. | Achieving NTRU with Montgomery multiplication | |
| US6963645B2 (en) | Method for implementing the chinese remainder theorem | |
| Nedjah et al. | Reconfigurable hardware implementation of Montgomery modular multiplication and parallel binary exponentiation | |
| US7607165B2 (en) | Method and apparatus for multiplication and/or modular reduction processing | |
| Wang et al. | New VLSI architectures of RSA public-key cryptosystem | |
| US5912904A (en) | Method for the production of an error correction parameter associated with the implementation of modular operations according to the Montgomery method | |
| US6230178B1 (en) | Method for the production of an error correction parameter associated with the implementation of a modular operation according to the Montgomery method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
| 122 | Ep: pct application non-entry in european phase | ||
| NENP | Non-entry into the national phase |
Ref country code: JP |