FLP Impossibility and Blockchain Consensus Protocols

Sarin Madarasmi
8 min readJun 27, 2021

--

In any distributed network of processors or nodes, failed or malicious nodes can cause the system to be unable to reach a consensus. This is known as the impossibility of consensus or the FLP (Fischer, Lynch, and Paterson) impossibility where a consensus algorithm cannot achieve all the three key properties of Safety, Liveness, and Fault Tolerance. Here, consensus refers to getting a distributed network of processors, nodes, or parties to agree on a common value.

Let’s take a look at a few blockchain protocols and to see where they fall on the impossibility triangle.

Figure 1: FLP Triangle and blockchain protocols

A Simple Consensus Algorithm

Figure 2. A 5-node distributed network, with read and writes going to the elected leader node

In this consensus, time is split into segments, each called a “term”. Each term will begin with a leader election where a node will consider itself as a candidate and send a request vote transaction to the other nodes in the network. When there is a majority vote, the candidate becomes the leader node that accepts read and write requests to the distributed network. There could be cases of split voting where more than one candidate is requesting for vote. In such cases where a leader cannot be elected from a majority of the votes, the election process will simply time out and start a new election term.

A write transaction walkthrough:

  • Leader receives a request from client
  • Leader logs to local disk to ensure durability
  • It then forwards the transaction to other nodes so all followers can do the same
  • Client will get a confirmation of the write once the leader gets majority of the follower’s confirmation
  • The transaction is committed

What has just been described is more or less the Raft protocol. While there is more to the protocol than described in the above few sentences, it’s still a rather simple consensus algorithm. Different variations of the Raft protocol are used in various production use cases such as in Consul, etcd, and InfluxDB, to name a few. In most cases, a read-only query can directly be handled by one of the follower nodes.

FLP Impossibility

In Blockchain, a consensus across the distributed network means to agree that a transaction is part of a block, which in turn is part of the blockchain.

As mentioned earlier, the FLP Impossibility states that a consensus algorithm can only have two of the three key properties:

  1. Safety (a.k.a. Finality or Agreement). A block being committed should remain part of the final blockchain that is accepted by the distributed network as the true blockchain.
  2. Liveness (a.k.a. Guaranteed Termination). The network should be able to continue to accept, process, and commit new transactions to the blockchain.
  3. Byzantine Fault Tolerance (a.k.a. Integrity). A system should be able to withstand failures derived from the Byzantine Generals’ Problem; meaning, such a system can continue to operate even if some of the nodes fail or act maliciously. In our context, a distributed network should be able to reach consensus in spite of such failures.

It is impossible to have all the three above properties in an asynchronous system without some relaxed constraints on the property. In contrast, this is achievable in a synchronous system.

We’ll take a look at a few blockchain protocols to describe how they achieve the properties in the FLP Impossibility triangle.

Availability and BFT

Proof of Work (PoW) is the first and arguably the most popular consensus protocol of Blockchain. It achieves Availability and BFT. It, however, lacks in finality. In Bitcoin, for example, it is recommended to wait for six additional blocks (or confirmations) before your transaction is considered final in the blockchain.

PoW achieves BFT through making mining an expensive and complex process. A malicious actor or node that tries to alter a block’s data will need to re-mine that block and any block that comes after it in the blockchain. Because of the expensive mining process, the malicious actor will not be able to catch up with the rest of the network in order to form the longest blockchain. There is a strong incentive for each mining node to follow the longest blockchain to increase the probability of the mined block being accepted by other nodes in the network.

PoW is available because, at any point in time, miners are trying to mine new blocks to add to the blockchain. There is no state in which miners are blocked from mining in order to add new blocks to the blockchain. The difficulty level of mining a block is adjusted periodically to ensure that a new block can be mined roughly in x time.

In PoW, it is possible for two or more nodes to mine a new block in similar amount of time. This may cause nodes in the network to try to build on top of different blocks since all of them are valid and part of the longest blockchain. This is called a fork. There can be multiple forks, and they will continue to diverge until one of the chains is able to mine a new block faster. Once mined, the block will be broadcasted to the rest of the network, incentivizing the entire network to follow that chain.

Figure 3. Forks in a Blockchain.

Both the purple and the green blockchains are valid and have equal height. Some nodes in the network may start building blocks on top of the purple chain, while others build on top of the green blockchain. Say, a new block is mined faster in the green blockchain, making it the longer chain. The purple blocks will then no longer be part of the blockchain, causing the transactions in it to be excluded from the one true blockchain accepted by the majority of nodes.

Safety and BFT

A good example of a protocol that achieves safety and BFT is Tendermint. Tendermint is a Proof of Stake (PoS) consensus algorithm where validators have a stake which denotes their voting power. It prefers Fault-tolerance and Safety over Liveness.

PoS is based on stakeholders owning sufficient blockchain tokens instead of having greater compute power as required for building a new block in PoW consensus protocols. It is worth noting that not all PoS fall into safety and BFT. ETH 2.0 (Gasper protocol), for example, prefers Liveness and BFT over Safety.

Figure 4. The Tendermint Validation Process from Tendermint Github

Tendermint achieves Safety or Finality through never forking. It has four main states: Propose, Prevote, Precommit, and Commit. Moving to the next requires two-thirds of the validators’ votes. This ensures that Tendermint can endure up to one-third of its actors being malicious, making it BFT.

Tendermint achieves Finality by only allowing the proposer chosen deterministically from the set of validators to commit a new block, after obtaining more than two-thirds of the votes. The Prevote step also ensures that honest nodes will not commit invalid states.

Let us say take a round with 100 validators. Assume that 33 nodes are offline for some reason. We will consider a situation where one of these 67 nodes is a Byzantine node, leaving us with 66 honest nodes.

At round n, 65 honest nodes broadcast a Precommit for a new proposed block. There is a Byzantine node B, however, that sends an empty vote to all 65 nodes except one Precommit to node H. With its own vote, H has 67 Precommits for the block at round n and therefore commits. However, all the other nodes only have 66 Precommits for the block, and they move to the next round. This is a breach of Safety or Finality, and one of the reasons why we need two rounds of communication: the Prevote and the Precommit! The Prevote step may result in a wrong Precommit but never a wrong Commit.

When two-thirds of the vote is not reached in either of the two steps, the system will eventually timeout and start a new round with a new block proposer. This makes Tendermint suffer from the Liveness or Guaranteed Termination property.

Safety and Liveness

In a permission-less network such as in BTC and ETH, where any node may join to become a miner, we need to achieve BFT. However, in a permissioned setting where there is a decentralized authority controlling nodes joining the network, being fully BFT may be unnecessary and causes performance and throughput impact. Participants in this setting are not completely anonymous and can be easily identified if performing malicious actions.

In a permissioned setting, a Crash Fault Tolerance (CFT) consensus protocol may be good enough in order to improve performance. A permissioned blockchain is generally useful to provide secure interactions between a group of entities that have a share a common goal but may not fully trust each other. More concrete examples of this are supply chain uses cases or sharing of data between organizations such as medical records between hospitals.

An earlier example of the simple consensus algorithm described above, Raft, does not achieve BFT for two reasons:

  1. Rigged Leader Election — A malicious node can vote more than once, vote for an outdated node, or ignore leader election term process altogether. This can cause more than one node to become a leader, resulting in undesired inconsistencies.
  2. A node can pretend to be another node.

It is worth noting that there are variations of the Raft protocol that are BFT, but they do come with performance costs.

An example of a blockchain protocol that achieves Safety and Liveness is the Hyperledger Fabric protocol. It achieves CFT instead of BFT through using a variation of Raft from the etcd library. This means that nodes may fail or crash (but not act maliciously) and the system can still achieve availability to reach consensus.

Through Raft, safety/finality is achieved by having one elected leader handle writes and replicate to followers so there won’t be a fork in the blockchain. It achieves liveness in a similar fashion of having a leader election process and identifying crashed leaders through heartbeats. A new election term occurs in such cases, where a new leader is identified and can continue the process. As long as leaders can be elected, transactions can be committed.

--

--