The Cortex M4, M7 MCUs and the Cortex A CPUs support the ARM DSP
instructions, and especially the umaal instruction which greatly
speed up MULADDC code. In addition the patch switched the ASM
constraints to registers instead of memory, giving the opportunity
for the compiler to load them the best way.
The speed improvement is variable depending on the crypto operation
and the CPU. Here are the results on a Cortex M4, a Cortex M7 and a
Cortex A8. All tests have been done with GCC 6.3 using -O2. RSA uses a
RSA-4096 key. ECDSA uses a secp256r1 curve EC key pair.
+--------+--------+--------+
| M4 | M7 | A8 |
+----------------+--------+--------+--------+
| ECDSA signing | +6.3% | +7.9% | +4.1% |
+----------------+--------+--------+--------+
| RSA signing | +43.7% | +68.3% | +26.3% |
+----------------+--------+--------+--------+
| RSA encryption | +3.4% | +9.7% | +3.6% |
+----------------+--------+--------+--------+
| RSA decryption | +43.0% | +67.8% | +22.8% |
+----------------+--------+--------+--------+
I ran the whole testsuite on the Cortex A8 Linux environment, and it
all passes.
Yotta is no longer supported by Mbed TLS, so has been removed. Specifically, the
following changes have been made:
* references to yotta have been removed from the main readme and build
instructions
* the yotta module directory and build script has been removed
* yotta has been removed from test scripts such as all.sh and check-names.sh
* yotta has been removed from other files that that referenced it such as the
doxyfile and the bn_mul.h header
* yotta specific configurations and references have been removed from config.h
We don't compile in the assembly code if compiler optimisations are disabled as
the number of registers used in the assembly code doesn't work with the -O0
option. Also anyone select -O0 probably doesn't want to compile in the assembly
code anyway.
This fix adds the ebx register to the clobber list for the i386 inline assembly
for the multiply helper function.
ebx was used but not listed, so when the compiler chose to also use it, ebx was
getting corrupted. I'm surprised this wasn't spotted sooner.
Fixes Github issues #1550.
On x32, pointers are only 4-bytes wide and need to be loaded using the "movl"
instruction instead of "movq" to avoid loading garbage into the register.
The MULADDC routines for x86-64 are adjusted to work on x32 as well by getting
gcc to load all the registers for us in advance (and storing them later) by
using better register constraints. The b, c, D and S constraints correspond to
the rbx, rcx, rdi and rsi registers respectively.