Optimization Tricks

Runtime overhead in Substrate corresponds to the efficiency of the underlying Rust code. Therefore, it is essential to use clean, efficient Rust patterns for performance releases. This section introduces common approaches for optimizing Rust code in general and links to resources that may guide further investigation.

This section was inspired by and pulls heavily from

Premature Optimization

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. - Page 268 of Structured Programming with goto Statements by Donald Knuth

Before worrying about performance optimizations, focus on optimizing for readability, simplicity, and maintainability. The first step when building anything is achieving basic functionality. Only after establishing a minimal viable sample is it appropriate to consider performance-based enhancements. With that said, severe inefficiency does open attack vectors for Substrate runtimes (see the next section). Moreover, the tradeoff between optimization and simplicity is not always so clear...

A common misconception is that optimized code is necessarily more complicated, and that therefore optimization always represents a trade-off. However, in practice, better factored code often runs faster and uses less memory as well. In this regard, optimization is closely related to refactoring, since in both cases we are paying into the code so that we may draw back out again later if we need to. - src

Rust API Guidelines

Also, use clippy!

Efficiency => Security in Substrate

We call an algorithm efficient if its running time is polynomial in the size of the input, and highly efficient if its running time is linear in the size of the input. It is important for all on-chain algorithms to be highly efficient, because they must scale linearly as the size of the Polkadot network grows. In contrast, off-chain algorithms are only required to be efficient. - Web3 Research

See Substrate Best Practices for more details on how efficiency influences the runtime's economic security.

Related Reading

Rust Zero-Cost Abstractions

Substrate developers should take advantage of Rust's zero cost abstractions.

Articles

Tweets

Video

Entering unsafe Waters 🏴‍☠️

Please read The Rustonomicon before experimenting with the dark magic that is unsafe

To access an element in a specific position, use the get() method. This method performs a double bound check.


# #![allow(unused_variables)]
#fn main() {
for arr in array_of_arrays {
    if let Some(elem) = arr.iter().get(1738) {
        println!("{}", elem);
    }
}
#}

The .get() call performs two checks:

  1. checks that the index will return Some(elem) or None
  2. checks that the returned element is of type Some or None

If bound checking has already been performed independently of the call, we can invoke .getunchecked() to access the element. Although this is unsafe to use, it is equivalent to C/C++ indexing, thereby improving performance when we already know the element's location.


# #![allow(unused_variables)]
#fn main() {
for arr in array_of_arrays {
    println!("{}", unsafe { arr.get_unchecked(1738) })
}
#}

NOTE: if we don't verify the input to .getunchecked(), the caller may access whatever is stored in the location even if it is a memory address outside the slice

Fearless Concurrency && Asynchrony

As a systems programming language, Rust provides significant flexibility with respect to low-level optimizations. Specifically, Rust provides fine-grain control over how you perform computation, delegate said computation to the OS's threads, and schedule state transitions within a given thread. There isn't space in this book to go into significant detail, but I'll try to provide resources/reading that have helped me get up to speed. For a high-level overview, Stjepan Glavina provides the following descriptions in Lock-free Rust: Crossbeam in 2019:

  • Rayon splits your data into distinct pieces, gives each piece to a thread to do some kind of computation on it, and finally aggregates results. Its goal is to distribute CPU-intensive tasks onto a thread pool.
  • Tokio runs tasks which sometimes need to be paused in order to wait for asynchronous events. Handling tons of such tasks is no problem. Its goal is to distribute IO-intensive tasks onto a thread pool.
  • Crossbeam is all about low-level concurrency: atomics, concurrent data structures, synchronization primitives. Same idea as the std::sync module, but bigger. Its goal is to provide tools on top of which libraries like Rayon and Tokio can be built.

To dive deeper down these 🐰 holes

Asynchrony

Are we async yet?

Conceptual

Projects

Concurrency

Conceptual

Projects