Nimbus update: May 1

Wherever you may be, we hope you're able to see some light at the end of the tunnel. In the hopes of distracting you from the world (however briefly), here's an end of month update from us.

Security audit request for proposals

Today, we're announcing a Request for Proposals (RfP) for a new type of external security review of Nimbus.

We expect this engagement to differ from typical time-boxed security audits; we want to incentivise the security community to focus their attention on both the eth2 development process and the Nim language itself (proxied through our implementation) over an extended period of time.

The scope of this security engagement includes a broad range of components including networking, consensus logic, cryptography and user experience.

The assessment will focus on identifying vulnerabilities that can lead to the following:

  • Denial-of-service conditions
  • Remote code execution
  • Data integrity loss
  • Underflows and overflows
  • Consensus splits
  • Operations pool halt
  • Unspecified/unexpected client behaviour

The proposed timeline of this work is a span of 4 months, broken up into 4 phases. The deadline for proposals is May 24th. If you'd like to find out more, see the official request.

Schlesi testnet

As you may have seen, there's an eth2 multi-client testnet (Schlesi) up and running! So far, it contains both Prysm and Lighthouse clients.

We are ironing out the final compatibility issues to fully join this network.

Mainnet testnet configuration

We managed to compile one of our testnets (testnet1) with a mainnet configuration. While it works, we are experiencing performance problems.

As we mentioned last time, a mainnet configuration requires many more validators, larger committees, and four times more slots per epoch. So some performance problems are to be expected.

For example, upon switching to a mainnet config, we immediately hit beacon state stack overflow errors on Android (where the stack size is considerably smaller than on desktop).

Note: the beacon state exceeds 1 MB, which is the maximum stack size in restricted environments such as Android; if we place it on the stack, we get stack overflow errors.

In parallel, we're continuing to optimise the state-transition function. On this front, we've made considerable speed improvements over the last week.

These speed improvements come from reducing the number of times we call the hash-tree-root function (which Merkleizes an object and returns the root hash of the resulting tree), relying more on caching instead.

Note: there's an important tradeoff here between memory and performance. The more we cache, the better the performance, but the larger our application state. We need to be careful not to cache too much, because the application state has to stay under 500 MB for Nimbus to run comfortably on a phone or Raspberry Pi.
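To make the caching idea concrete, here's a toy sketch (in Python, not Nimbus's actual Nim code) of a recursive hash-tree-root over a binary Merkle tree, with a cache keyed on subtree contents. After one leaf changes, only the nodes on the path from that leaf to the root need to be rehashed; everything else is served from the cache, which is exactly why the cache trades memory for speed:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class MerkleCache:
    """Toy hash-tree-root with caching of subtree roots.

    Every cached root costs memory, so a real client must bound the
    cache to keep total application state small.
    """
    def __init__(self):
        self.cache = {}
        self.hash_calls = 0   # counts actual hashing work done

    def hash_tree_root(self, leaves):
        if len(leaves) == 1:
            return leaves[0]
        mid = len(leaves) // 2
        left = self.hash_tree_root(leaves[:mid])
        right = self.hash_tree_root(leaves[mid:])
        key = left + right
        if key not in self.cache:
            self.hash_calls += 1           # cache miss: hash for real
            self.cache[key] = sha256(key)
        return self.cache[key]

tree = MerkleCache()
leaves = [bytes([i]) * 32 for i in range(8)]

root1 = tree.hash_tree_root(leaves)
first_pass = tree.hash_calls               # all 7 internal nodes hashed

leaves[0] = b"\xff" * 32                    # mutate a single leaf
root2 = tree.hash_tree_root(leaves)
second_pass = tree.hash_calls - first_pass  # only the 3 nodes on the changed path
```

With 8 leaves, the first pass hashes all 7 internal nodes; after mutating one leaf, only the 3 nodes on its path to the root are rehashed.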

Compatibility with specification v0.11.1

We're now essentially compatible with v0.11.1: we've integrated the remaining EF tests, and both the state transition and SSZ parsing have been fuzzed.

We've fixed four different errors that were found by fuzzing (a couple concerning SSZ parsing related to reading variably-sized objects). There's a fifth that's still outstanding (which, as it stands, might well be blocking us from fully joining Schlesi).
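For context, variable-sized SSZ objects are serialized behind 4-byte little-endian offsets, and a decoder has to validate those offsets before trusting them — exactly the kind of logic fuzzers exercise. Here's a simplified sketch (not Nimbus's decoder) of reading an SSZ-style list of variable-sized byte strings:

```python
def decode_ssz_byte_lists(data: bytes):
    """Decode an SSZ-style list of variable-sized elements.

    Layout: a table of 4-byte little-endian offsets (relative to the
    start of the buffer), followed by the element bytes. The element
    count is inferred from the first offset. Simplified sketch;
    malformed offsets must be rejected, which is where fuzzing tends
    to find bugs in real decoders.
    """
    if not data:
        return []
    first_offset = int.from_bytes(data[0:4], "little")
    if first_offset % 4 != 0 or first_offset > len(data):
        raise ValueError("bad first offset")
    count = first_offset // 4
    offsets = [int.from_bytes(data[4 * i: 4 * i + 4], "little")
               for i in range(count)]
    offsets.append(len(data))
    elements = []
    for start, end in zip(offsets, offsets[1:]):
        # Offsets must be monotonically increasing and in bounds.
        if start > end or end > len(data):
            raise ValueError("bad offset")
        elements.append(data[start:end])
    return elements
```

For example, two elements `b"ab"` and `b"cde"` serialize as the offset table `8, 10` followed by the concatenated bytes; a first offset that isn't a multiple of 4, or offsets that run backwards or out of bounds, are rejected.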

Mobile benchmark results

We have some benchmarks in place with respect to staking, signing and validating on a phone.

Here are our results on a Huawei P20 Lite phone (2018 entry/midrange phone) with a HiSilicon Kirin 659 2360MHz (ARM v8) processor using Milagro in 64-bit mode:

Compiled with GCC
Optimization level => no optimization: false | release: true | danger: true
Using Milagro with 64-bit limbs

=================================================================================================================

Scalar multiplication G1                                    284.394 ops/s      3516247 ns/op
Scalar multiplication G2                                    102.853 ops/s      9722601 ns/op
EC add G1                                                105864.916 ops/s         9446 ns/op
EC add G2                                                 36911.265 ops/s        27092 ns/op
Pairing (Milagro builtin double pairing)                     50.957 ops/s     19624477 ns/op
Pairing (Multi-Pairing with delayed Miller and Exp)          50.068 ops/s     19972640 ns/op

⚠️ Warning: using draft v5 of IETF Hash-To-Curve (HKDF-based).
           This is an outdated draft.

Hash to G2 (Draft #5)                                       117.156 ops/s      8535621 ns/op

Signing involves both a Hash to G2 operation and a G2 scalar multiplication. Verification involves both a Hash to G2 and a Pairing.
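Assuming those two costs dominate, the single-core figures above translate into rough per-operation budgets (a back-of-the-envelope estimate, not a measured end-to-end benchmark):

```python
# ns/op figures from the Huawei P20 Lite run above
hash_to_g2 = 8_535_621      # Hash to G2 (draft #5)
scalar_mult_g2 = 9_722_601  # Scalar multiplication G2
pairing = 19_624_477        # Pairing (Milagro builtin double pairing)

sign_ns = hash_to_g2 + scalar_mult_g2   # signing ~ hash-to-G2 + G2 scalar mult
verify_ns = hash_to_g2 + pairing        # verification ~ hash-to-G2 + pairing

sign_per_sec = 1e9 / sign_ns            # roughly 55 signatures/s
verify_per_sec = 1e9 / verify_ns        # roughly 35 verifications/s
```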

In sum, we're able to do 50 pairings per second on a standard mobile phone.

Compared to state-of-the-art (MCL) signing and verification, what we're using is twice as slow on ARM. But the tradeoff here is security: we're using constant-time operations that, although slower, guarantee that not a single bit of your secret keys is leaked (without constant-time ops, an attacker can recover a key by observing how long an operation takes).
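To see why variable-time code leaks, compare a naive comparison, which returns as soon as a byte differs (so its running time reveals how long the matching prefix is), with a constant-time one that always inspects every byte. This is a toy illustration of the principle only; the BLS code applies it to field and curve arithmetic, not to byte comparisons:

```python
def naive_eq(a: bytes, b: bytes) -> bool:
    # Variable time: exits at the first mismatching byte, so an
    # attacker timing many guesses learns the secret prefix by prefix.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_eq(a: bytes, b: bytes) -> bool:
    # Constant time in the length: accumulates differences with XOR/OR
    # and never branches on secret data.
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0
```

(In practice, Python itself makes no timing guarantees; its standard library provides `hmac.compare_digest` for this purpose. Constant-time cryptographic arithmetic is done in lower-level languages.)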

See here for the gory benchmark details, and here for more on our elliptic curve arithmetic.

Discovery improvements

We're continuing to improve error handling in discovery v5. We've tested Schlesi discovery interop, which works, although we're looking into an issue with Schlesi bootstrap node ENRs containing local IP addresses.

Technical debt repayments

Lots of error handling work this week. We've been working on codebase quality so that we cover failure cases instead of just focusing on the happy path.

This has meant majorly refactoring the way we handle errors in our libraries.

Specifically, runtime exceptions have been reworked to warnings in non-essential components (such as metrics). This decreases the risk of these components disrupting core functionality.

We've also made changes to our API to make it harder to shoot yourself in the foot.

Libp2p updates

Last month marked the point where nim-libp2p started being used in nim-beacon-chain. As expected, this revealed many bugs, leaks, and general issues.

This month the focus has very much been on fixes and optimisations. We've also spent some time getting it in shape for the upcoming audit.

So no major features on this front, just a lot of refactoring.

Tutorials

We've continued making progress with our nim-libp2p tutorials.

We now have two more tutorials that build off the first in our series.

This trilogy assumes as little prior knowledge as possible, taking you from an introduction to Nim's basic syntax to defining and establishing a customised libp2p node. If you haven't had the chance yet, check it out!

We're hiring

We're looking for a distributed networking engineer to join the team. As with all our positions, this position is remote. If you think you might be interested, see here for the complete posting.


That's it from us! We hope you enjoyed this update; we'll be posting another soon. Until then, all the best from all of us here at Nimbus 💛