Integer-based mathematical subroutines which implement RISC-V M and B extension functionality on I, and Zmmul instruction sets. There is partial support for the reduced functionality E instruction set where this is easy to do and doesn't involve a performance tradeoff.
These routines are designed to be both concise and efficient, although where those two goals clash, I went with concise.
- 32-bit by 32-bit signed and unsigned division with 32-bit result and remainder.
- Fast 32-bit unsigned division by 3
- Fast 32-bit unsigned division by 5
- Fast 32-bit unsigned division by 6
- Fast 32-bit unsigned division by 7
- Fast 32-bit unsigned division by 9
- Fast 32-bit unsigned division by 10
- Fast 32-bit unsigned division by 11
- Fast 32-bit unsigned division by 12
- Fast 32-bit unsigned division by 13
- Fast 32-bit unsigned division by 100
- Fast 32-bit unsigned division by 1000
- Fast 32-bit signed division by 3
- Fast 32-bit signed division by 5
- Fast 32-bit signed division by 6
- Fast 32-bit signed division by 7
- Fast 32-bit signed division by 9
- Fast 32-bit signed division by 10
- Fast 32-bit signed division by 11
- Fast 32-bit signed division by 12
- Fast 32-bit signed division by 13
- Fast 32-bit signed division by 100
- Fast 32-bit signed division by 1000
- 64-bit by 64-bit signed and unsigned division with 64-bit result and remainder.
- Fast 64-bit unsigned division by 3
- Fast 64-bit unsigned division by 5
- Fast 64-bit unsigned division by 6
- Fast 64-bit unsigned division by 7
- Fast 64-bit unsigned division by 9
- Fast 64-bit unsigned division by 10
- Fast 64-bit unsigned division by 11
- Fast 64-bit unsigned division by 12
- Fast 64-bit unsigned division by 13
- Fast 64-bit unsigned division by 100
- Fast 64-bit unsigned division by 1000
- Fast 64-bit signed division by 3
- Fast 64-bit signed division by 5
- Fast 64-bit signed division by 6
- Fast 64-bit signed division by 7
- Fast 64-bit signed division by 9
- Fast 64-bit signed division by 10
- Fast 64-bit signed division by 11
- Fast 64-bit signed division by 12
- Fast 64-bit signed division by 13
- Fast 64-bit signed division by 100
- Fast 64-bit signed division by 1000
- abs (absolute value) of 32 or 64 bit register.
- 32-bit by 32-bit signed and unsigned multiplication with 32-bit result.
- 32-bit by 32-bit signed and unsigned multiplication with 64-bit result.
- 32-bit by 32-bit signed and unsigned multiplication with 64-bit result.
- 64-bit by 64-bit signed and unsigned multiplication with 64-bit result.
- 64-bit by 64-bit signed and unsigned multiplication with 128-bit result.
- Multiply 32-bit or 64-bit register by 3.
- Multiply 32-bit or 64-bit register by 5.
- Multiply 32-bit or 64-bit register by 6.
- Multiply 32-bit or 64-bit register by 9.
- Multiply 32-bit or 64-bit register by 10.
- Multiply 32-bit or 64-bit register by 11.
- Multiply 32-bit or 64-bit register by 12.
- Multiply 32-bit or 64-bit register by 13.
- Multiply 32-bit or 64-bit register by 100.
- Multiply 32-bit or 64-bit register by 1000.
These operations support 32-bit numbers on 32-bit architectures and 64-bit numbers on 64-bit architectures. All routines are RV32E, RV32I, and RV64I compatible.
- ASCII binary to binary.
- ASCII unsigned decimal to binary.
- ASCII signed decimal to two's complement binary.
- ASCII hexadecimal to binary.
- binary to ASCII binary.
- two's complement binary to ASCII signed decimal.
- unsigned binary to unsigned ASCII decimal.
- binary to ASCII hexadecimal.
- 32-bit integer square root on 32-bit processors.
- 64-bit integer square root on 64-bit processors.
- 32-bit GCD of two unsigned 32-bit numbers on 32-bit processors.
- 64-bit GCD of two unsigned 64-bit numbers on 64-bit processors.
- 32-bit LCM of two unsigned 32-bit numbers on 32-bit processors.
- 64-bit LCM of two unsigned 64-bit numbers on 64-bit processors.
- Count leading zeroes in 32-bit number on 32-bit processors.
- Count leading zeroes in 64-bit number on 64-bit processors.
- Count trailing zeroes in 32-bit number on 32-bit processors.
- Count trailing zeroes in 64-bit number on 64-bit processors.
clang, lld, and make are assumed. I'm currently using clang 18. The TARGET, ARCH, and ABI should be edited in the Makefile, and the CPU_BITS constant should be set to match in config.s HAS_... flags should be set for available extensions.
A static library, librvint.a, will be created to link against.
Only certain routines support RV32E. The routines that do not currently give compile errors; this will be fixed in the future. RV64E is not a focus as I am unaware of any real world implementations. The Makefile has a CH32V003 target that is an RV32EC_zicsr target.
The test programs assume Linux syscalls are available. In particular syscall 64 is write and 93 is exit.
- I use this emulator to run RV32I code: https://riscv-programming.org/ale/
- I'm running RV64 code on a scaleway elastic metal cloud instance: https://labs.scaleway.com/en/em-rv1/
Bit manipulation - bits.s:
These routines implement functionality in the ZBB ISA extension for CPUs which do not have this available. Despite being efficiently implemented (they run in O(log n) time), they are heavyweight enough that they are not useful in places like inner loops in division routines.
Count number of trailing zeroes on CPUs with no ZBB extension. O(log n). RV32E/RV32I/RV64I compatible.
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Best Case | 3 | 3 |
| Average Case | ~21 | ~25 |
| Worst Case | 23 | 27 |
a0 = number
a0 = count of trailing zeroes
Count number of leading zeroes on CPUs with no ZBB extension. O(log n). RV32E/RV32I/RV64I compatible.
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Best Case | 3 | 3 |
| Average Case | ~21 | ~25 |
| Worst Case | 23 | 27 |
a0 = number
a0 = count of leading zeroes
Division - div.s:
Signed and unsigned division for CPUs without dividers.
The initial implementation of divremu came from the division routine in the rvmon monitor. I believe this code originally came from Bruce Hout on reddit. It has been heavily extended to minimize the number of cycles by using ISA extensions and restructuring the code by selectively unrolling parts. The divrem routine is a wrapper which enables signed division.
Unsigned integer division for CPUs without M extension. RV32E/RV32I/RV64I compatible. Restoring division algorithm. Available in ROLLED (compact) and UNROLLED (faster) versions (this is selected in config.s with the DIVREMU_UNROLLED flag). Worst-case performance:
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA (Rolled) | 455 | 903 |
| Base ISA (Unrolled) | 351 | 695 |
| With Extensions (Rolled) | 391 | 775 |
| With Extensions (Unrolled) | 287 | 567 |
a0 = dividend a1 = divisor
a0 = quotient a1 = remainder
Signed integer division that rounds towards zero. RV32E/RV32I/RV64I compatible restoring division algorithm. Uses divremu, so performance depends upon whether that routine is used in ROLLED or UNROLLED form. Worst case performance:
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA (Rolled) | 496 | 944 |
| Base ISA (Unrolled) | 392 | 736 |
| With Extensions (Rolled) | 430 | 814 |
| With Extensions (Unrolled) | 326 | 606 |
a0 = dividend a1 = divisor
a0 = quotient a1 = remainder
The following division routines are branchless, straight line code that execute in a fixed number of cycles for all inputs, and thus are suitable for cryptographic applications.
Initial implementation was done from Hacker's Delight, 2nd edition. The algorithms were extended to 64 bits, and agressively optimized using the peculiar features of the RISC-V ISA to minimize the cycle count. The algorithmic approach is to use a series expansion to estimate the quotient, then an estimated remainder is calculated, and the quotient is corrected.
Unsigned integer divide by 3. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 20 | 22 |
| With Extensions | 17 | 19 |
a0 = dividend
a0 = quotient
Unsigned integer divide by 5. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 15 | 17 |
| With Extensions | 14 | 16 |
a0 = dividend
a0 = quotient
Unsigned integer divide by 6. You may be better off calling div3u and shifting the result. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 18 | 22 |
| With Extensions | 17 | 21 |
a0 = dividend
a0 = quotient
Unsigned integer divide by 7. Note that only the base ISA is used here, so there is no accelleration from ISA extensions. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 16 | 20 |
a0 = dividend
a0 = quotient
Unsigned integer divide by 9. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 15 | 17 |
| With Extensions | 14 | 16 |
a0 = dividend
a0 = quotient
Unsigned integer divide by 10. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 16 | 18 |
| With Extensions | 15 | 17 |
a0 = dividend
a0 = quotient
Unsigned integer divide by 11. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 18 | 20 |
| With Extensions | 15 | 17 |
a0 = dividend
a0 = quotient
Unsigned integer divide by 12. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 17 | 19 |
| With Extensions | 16 | 18 |
a0 = dividend
a0 = quotient
Unsigned integer divide by 13. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 18 | 20 |
| With Extensions | 16 | 18 |
a0 = dividend
a0 = quotient
Unsigned integer divide by 100. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 10 | 21 |
| With Extensions | 17 | 19 |
a0 = dividend
a0 = quotient
Unsigned integer divide by 1000. This basically divides by 10, then by 100. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 37 | 41 |
| With Extensions | 35 | 39 |
a0 = dividend
a0 = quotient
Signed integer divide by 3. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 25 | 27 |
| With Extensions | 22 | 24 |
a0 = dividend
a0 = quotient
Signed integer divide by 5. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 24 | 26 |
| With Extensions | 23 | 25 |
a0 = dividend
a0 = quotient
Signed integer divide by 6. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 27 | 31 |
| With Extensions | 26 | 30 |
a0 = dividend
a0 = quotient
Signed integer divide by 7. Note that ISA extensions are not used by this routine. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 25 | 29 |
a0 = dividend
a0 = quotient
Signed integer divide by 9. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 24 | 26 |
| With Extensions | 23 | 25 |
a0 = dividend
a0 = quotient
Signed integer divide by 10. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 25 | 27 |
| With Extensions | 24 | 26 |
a0 = dividend
a0 = quotient
Signed integer divide by 11. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 27 | 29 |
| With Extensions | 24 | 26 |
a0 = dividend
a0 = quotient
Signed integer divide by 12. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 26 | 28 |
| With Extensions | 25 | 27 |
a0 = dividend
a0 = quotient
Signed integer divide by 13. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 27 | 31 |
| With Extensions | 25 | 29 |
a0 = dividend
a0 = quotient
Signed integer divide by 100. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 24 | 26 |
| With Extensions | 22 | 24 |
a0 = dividend
a0 = quotient
Signed integer divide by 1000. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 39 | 43 |
| With Extensions | 36 | 40 |
a0 = dividend
a0 = quotient
The below remainder routines are based on Hacker's Delight 2nd Edition, but this and the other remainder routines I didn't implement are of limited utility as it's faster to calculate the quotient and calculate the remainder from that.
I could modify the above routines to also return the remainder in addition to the quotient, which seems useful, but would likely cost 3-4 instructions per routine. Not yet sure if I want to do this or not, I'm interested which approach library users would find most helpful.
These routines run in constant time and are branchless.
unsigned integer remainder after division by 3. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 21 | 23 |
| With Extensions | 19 | 21 |
a0 = dividend
a0 = quotient
signed integer remainder after division by 3. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 24 | 26 |
| With Extensions | 22 | 24 |
a0 = dividend
a0 = quotient
Algorithms: gcd.s
These routines calculate the greatest common divisor (gcd) and least common multiple (lcm) of two numbers. They can take advantage of the M and Zbb extensions, if available.
Computes greatest common divisor of two numbers, gcd(u,v). Can take advantage of M and Zbb extensions. With M extension it uses Euclid's algorithm. If M is not present, it computes the gcd via Stein's algorithm. O(log n). RV32E/RV32I/RV64I
Average Cycle Counts
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | ~500 | ~1000 |
| With Zbb | ~150 | ~300 |
| With M | ~100 | ~400 |
a0 = u
a1 = v
a0 = result
Computes the least common multiple of two numbers, lcm(u, v). Computes via lcm(u,v) = (u / gcd(u,v)) * v. Can take advantage of M and Zbb extensions. O(log n). RV32E/RV32I/RV64I
Average cycle counts
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | ~1200 | ~2500 |
| With Zbb | ~750 | ~1500 |
| With M | ~250 | ~500 |
a0 = u
a1 = v
a0 = result
I/O and Base Conversion io.s
Convert a value in a register to an ASCII hexadecimal string. RV32E/RV32I/RV64I
Cycle Counts
| Configuration | Prefix? (a2) | Cycle Count | Notes |
|---|---|---|---|
| 8-bit (Byte) | No | 31 | 2 nibbles |
| 32-bit (Word) | No | 97 | 8 nibbles |
| 32-bit (Word) | Yes (0x) | 100 | Add prefix overhead |
| 64-bit (Long) | No | 185 | 16 nibbles (RV64 only) |
a0 = number to convert to ASCII hex
a1 = number of bytes to convert (eg 1, 2, 4, 8)
a2 = 0 do not insert leading 0x, 1 insert leading 0x
a0 = address of nul (\0) terminated buffer with output
a1 = length of string
Convert a value in a register to an ASCII binary string. RV32E/RV32I/RV64I
Cycle Counts with a2=3
| Configuration | Input Width | Cycle Count |
|---|---|---|
| Base ISA | 32 | 438 |
| Base ISA | 64 | 870 |
| With Zbs | 32 | 406 |
| With Zbs | 64 | 806 |
a0 = number to convert to ascii binary
a1 = number of bytes to convert (eg 1, 2, 4, 8)
a2 = flags (0=none, 1=0b prefix, 2=spaces every 8 bits, 3=both)
a0 = address of nul (\0)-terminated buffer with output
a1 = length of string
Convert a value in a register to an ASCII unsigned decimal string. RV32E/RV32I/RV64I/RV128I This routine has an inline implemention of the div10u series expansion in it.
| Configuration | Cycles (32) | Cycles (64) | Cycles (128) |
|---|---|---|---|
| Base ISA | 230 | 590 | 1,450 |
| Base + Zba | 194 | 495 | 1,220 |
| Base + M | ~410 | ~850 | Not recommended |
a0 = unsigned number to convert to ascii decimal
a0 = address of nul (\0)-terminated buffer with output
a1 = length of string
Convert a value in a register to an ASCII signed decimal string. RV32E/RV32I/RV64I/RV128I This routine is a wrapper around to_decu.
| Configuration | Cycles (32) | Cycles (64) | Cycles (128) |
|---|---|---|---|
| Base ISA | ~245 | ~605 | ~1,465 |
| Base + Zba | ~209 | ~510 | ~1,235 |
| Base + M | ~425 | ~865 | Not recommended |
a0 = signed number to convert to ascii decimal
a0 = address of nul (\0)-terminated buffer with output
a1 = length of string
Convert an ASCII Hexadecimal string to binary. RV32E/RV32I/RV64I
Cycle count, assuming Hexadecimal string is half digits, half letters.
| Cycles (32) | Cycles (64) |
|---|---|
| 90 | 170 |
a0 = pointer to ASCII Hexadecimal string, terminated with a non-hex char.
a0 = pointer (advanced to point to non-hex char)
a1 = number
a2 = error check: 0 if no digits found, otherwise 1
Convert an ASCII binary string to binary. RV32E/RV32I/RV64I
| Cycles (32) | Cycles (64) |
|---|---|
| 331 | 651 |
a0 = pointer to ASCII binary string, terminated with a non-binary char (not 0 or 1)
a0 = pointer (advanced to point to non-binary char)
a1 = number
a2 = error check: 0 if no digits found, otherwise 1
Read an ASCII unsigned decimal string into a register. The parsing of the value stops when we read the first non-decimal character. RV32E/RV32I/RV64I
Assume 10 digits for 32 bit case, 20 digits for 64 bit case:
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | 131 | 251 |
| With Zba | 121 | 231 |
a0 = pointer to number to convert from decimal, terminated with non-decimal char.
a0 = pointer (advanced to point to non-decimal char)
a1 = number
a2 = error check: 0 if no digits found, otherwise 1
Read an ASCII signed decimal string into a register. The parsing of the value stops when we read the first non-decimal character.
Assume 10 digits for 32 bit case, 20 digits for 64 bit case:
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | ~151 | ~271 |
| With Zba | ~141 | ~251 |
a0 = pointer to number to convert from decimal, terminated with non-decimal char.
a0 = pointer (advanced to point to non-decimal char)
a1 = number
a2 = error check: 0 if no digits found, otherwise 1
Math macros mac.s
A macro to compute the absolute value of a number.
The src and dest registers can be the same if src does not need to be preserved.
| Configuration | Cycles |
|---|---|
| Base ISA | 3 |
| Zbb Extension | 2 |
abs dest src scratch
src - input register
scratch - temporary register
dest - output register
Native word length multiplication (32 on RV32I/E, 64 on RV64I). Intended for CPUs without M extension.
| Configuration | Cycles (32) | Cycles (64) | Notes |
|---|---|---|---|
| Base ISA | ~270 | ~540 | Scales via bit width |
| Zbb | ~170 | ~330 | Scales with popcount |
a0 = multiplicand
a1 = multiplier
a0 = product (low order)
Unified (RV64E, RV32I, and RV32I) Unsigned/Signed 32x32-bit to 64-bit multiply.
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | ~415 | ~280 |
| Zbb Extension | ~410 | ~165 |
a0 = op1
a1 = op2
a2 = signed_flag: 0=unsigned, 1=signed
a0 = product low word
a1 = product high word
64x64 bit to 128-bit multiplication (signed/unsigned) on RV64I CPUs.
| Configuration | Cycles |
|---|---|
| Base ISA | ~710 |
| Zbb Extension | ~660 |
a0 = 64-bit Operand 1 (multiplicand)
a1 = 64-bit Operand 2 (multiplier)
a2 = Signedness flag (0 for unsigned, non-zero for signed)
a0 = Lower 64 bits of the 128-bit product
a1 = Upper 64 bits of the 128-bit product
Multiplication Macros mul-macs.s
Macros which implement multiplication by a constant. All macros are designed such that dest and src registers can be the same, or destination and scratch registers can be the same.
Multiply a register by 3.
| Configuration | Cycles |
|---|---|
| Base ISA | 2 |
| Zba Extension | 1 |
mul3 dest src scratch0
src = source register containing number to multiply.
scratch0 = scratch register used for intermediate computation
dest = destination register containing src*3
Multiply a register by 5.
| Configuration | Cycles |
|---|---|
| Base ISA | 2 |
| Zba Extension | 1 |
mul5 dest src scratch0
src = source register containing number to multiply.
scratch0 = scratch register used for intermediate computation
dest = destination register containing src*5
Multiply a register by 6.
| Configuration | Cycles |
|---|---|
| Base ISA | 3 |
| Zba Extension | 2 |
mul6 dest src scratch0
src = source register containing number to multiply.
scratch0 = scratch register used for intermediate computation
dest = destination register containing src*6
Multiply a register by 9.
| Configuration | Cycles |
|---|---|
| Base ISA | 2 |
| Zba Extension | 1 |
mul9 dest src scratch0
src = source register containing number to multiply.
scratch0 = scratch register used for intermediate computation
dest = destination register containing src*9
Multiply a register by 10.
| Configuration | Cycles |
|---|---|
| Base ISA | 3 |
| Zba Extension | 2 |
mul10 dest src scratch0
src = source register containing number to multiply.
scratch0 = scratch register used for intermediate computation
dest = destination register containing src*10
Multiply a register by 11.
| Configuration | Cycles |
|---|---|
| Base ISA | 4 |
| Zba Extension | 2 |
mul11 dest src scratch0
src = source register containing number to multiply.
scratch0 = scratch register used for intermediate computation
dest = destination register containing src*11
Multiply a register by 12.
| Configuration | Cycles |
|---|---|
| Base ISA | 3 |
| Zba Extension | 2 |
mul12 dest src scratch0
src = source register containing number to multiply.
scratch0 = scratch register used for intermediate computation
dest = destination register containing src*12
Multiply a register by 13.
| Configuration | Cycles |
|---|---|
| Base ISA | 4 |
| Zba Extension | 2 |
mul13 dest src scratch0
src = source register containing number to multiply.
scratch0 = scratch register used for intermediate computation
dest = destination register containing src*13
Multiply a register by 100.
| Configuration | Cycles |
|---|---|
| Base ISA | 5 |
| Zba Extension | 3 |
mul100 dest src scratch0
src = source register containing number to multiply.
scratch0 = scratch register used for intermediate computation
dest = destination register containing src*100
Multiply a register by 1000.
| Configuration | Cycles |
|---|---|
| Base ISA | 5 |
| Zba Extension | 4 |
mul1000 dest src scratch0
src = source register containing number to multiply.
scratch0 = scratch register used for intermediate computation
dest = destination register containing src*1000
Square Root sqrt.s
Integer square root - floor(sqrt(n)) using a binary non-restoring algorithm. RV32E/RV32I/RV64I
| Configuration | Cycles (32) | Cycles (64) |
|---|---|---|
| Base ISA | ~130 | ~260 |
| Zbb Extension | ~105 | ~ 210 |
a0 = number to compute square root of
a0 = root
