If you have started learning about blockchains, you have almost certainly come across the word “node.” Every dApp (decentralized application) needs to read from and write to a blockchain to serve user requests, such as querying token balances, sending transactions, or retrieving block data. It does so by connecting to the desired blockchain network, which consists of “nodes.”
Put simply, a node is a computer that runs blockchain client software and is connected to other nodes in the network. Together, these nodes create blocks, verify transactions, and store and serve valid data.
Nodes are hard to understand fully because different blockchain projects use many different terms for them. This article explores the various types of nodes and client software implementations, which serve different purposes and offer different features.
Let’s delve into their concepts, functions, and classifications!
In terms of how much block data they maintain, nodes can be broadly categorized as full nodes and light nodes.
A full node stores every transaction that has been executed on the blockchain, from the very first block, commonly referred to as the "Genesis block," to the most recent one. It also synchronizes continuously to keep itself up to date. In effect, a full node acts as a gateway: it receives the latest transactions from other nodes and passes transactions received from users on to other nodes. Because it holds all the records itself, it can verify transactions without assistance from other nodes. The trade-off is cost: downloading the entire dataset can take a considerable amount of time, and storing it occupies a large amount of disk space.
An RPC node typically refers to a full node that has the capability to respond to RPC requests.
Remote Procedure Call (RPC) is a software communication protocol that enables a program (the client) to request a service from a program running on another computer (the server), without needing to understand the technical details of the network between them. RPC allows procedures on remote systems to be called in the same way as local ones.
For instance, a dApp (decentralized application) usually needs to query or update blockchain data to provide its services. In the RPC client-server model, the dApp is the client and the RPC node acts as the server.
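To make the client-server roles concrete, here is a minimal sketch of what a dApp might send to an Ethereum-style RPC node and how it might read the reply. It assumes the node speaks JSON-RPC 2.0 and exposes the `eth_getBalance` method. The helper names `build_request` and `parse_result`, the all-zero address, and the canned response are illustrative assumptions, and no network call is made.

```python
import json
from itertools import count

_ids = count(1)  # each request gets its own id so replies can be matched


def build_request(method, params):
    """Build a JSON-RPC 2.0 request object (hypothetical helper)."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def parse_result(raw):
    """Return the `result` field of a JSON-RPC response, or raise on error."""
    response = json.loads(raw)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown RPC error"))
    return response["result"]


# Client side: the dApp asks for an account balance at the latest block.
# The all-zero address is only a placeholder.
request = build_request("eth_getBalance", ["0x" + "00" * 20, "latest"])
body = json.dumps(request)  # this string would be POSTed to the RPC node

# Server side: a canned reply standing in for what the RPC node might return.
canned = '{"jsonrpc": "2.0", "id": 1, "result": "0xde0b6b3a7640000"}'
balance_wei = int(parse_result(canned), 16)  # balances come back as hex strings
print(balance_wei)
```

In a real dApp, `body` would be sent over HTTP with a `Content-Type: application/json` header to the URL of an RPC node, and the response text would be passed to `parse_result`.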
An archive node is an enhanced full node: it stores the same data as a full node and also retains every previous state of the blockchain. While a typical full node prunes states older than the most recent 128 blocks, an archive node verifies all blocks, replays all the transactions within them, and preserves the resulting state at every block, which makes methods that query historical state possible.
Archive nodes are typically operated by entities with specific purposes that require querying arbitrary historical data. For example, they can be used to query token balances from a previous state, track specific users' activities, or replay historical transactions.
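The difference between the two node types can be sketched as a simple availability check. This is a simplified model, not any client's actual logic: it reads "the most recent 128 blocks" as the current head plus the 127 blocks before it, and the function name `can_serve_state` is hypothetical.

```python
# Number of recent blocks whose state a pruning full node keeps (figure from the text).
RECENT_STATE_WINDOW = 128


def can_serve_state(head, requested, archive):
    """Return True if a node at height `head` still holds the state at block `requested`."""
    if requested < 0 or requested > head:
        return False  # the block does not exist (yet)
    if archive:
        return True  # archive nodes keep every historical state
    # A pruning full node only keeps state for the most recent window of blocks.
    return head - requested < RECENT_STATE_WINDOW


head = 1_000_000
print(can_serve_state(head, head - 10, archive=False))      # recent block
print(can_serve_state(head, head - 50_000, archive=False))  # long-pruned block
print(can_serve_state(head, head - 50_000, archive=True))   # same block, archive node
```

A balance query against a block 50,000 blocks in the past is exactly the kind of request that only the archive node in this model can answer.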
However, operating an archive node requires even more technical expertise and higher operational costs than operating a full node. For example, an Ethereum archive node currently stores over 14TB of data, so it is essential to keep the node properly synced and to prevent data corruption, since recovering from either takes far longer than it does for a full node.
Requiring every node to maintain all data in order to participate in a blockchain network can be a major obstacle to network expansion and mass adoption. For example, if the goal is simply to process a user's request by propagating it to the network, a node that only serves as a gateway may be a better fit.
Nodes that perform this role are commonly referred to as light nodes. Light nodes hold only part of the block data, mainly block headers. As a result, they can propagate transactions but cannot fully verify them on their own, so they usually ask full nodes to verify individual transactions. This dependency is a disadvantage, but the low cost of operating a light node is a big advantage.
The concept of light nodes originates from Bitcoin, where such a node is known as an SPV node or a lightweight node. These nodes maintain only a subset of the blockchain and verify transactions using a method called simplified payment verification (SPV).
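The core of SPV is a Merkle proof: a light node that knows only the Merkle root from a block header can check that a transaction is included in that block if a full node supplies the sibling hashes along the path to the root. The sketch below illustrates the idea with double SHA-256, the hash Bitcoin uses. It is deliberately simplified: it hashes raw byte strings rather than real serialized transactions, ignores Bitcoin's byte-order conventions, and assumes a non-empty list of leaves. The function names are hypothetical.

```python
import hashlib


def dsha256(data):
    """Double SHA-256 of a byte string."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_and_proof(leaves, index):
    """Full-node side: build the Merkle root and the sibling path for leaves[index]."""
    level = [dsha256(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        sibling = index ^ 1  # neighbour in the same pair
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf, proof, root):
    """Light-node side: recompute the root from one leaf and its sibling path."""
    h = dsha256(leaf)
    for sibling, sibling_is_left in proof:
        h = dsha256(sibling + h) if sibling_is_left else dsha256(h + sibling)
    return h == root


txs = [b"tx0", b"tx1", b"tx2"]
root, proof = merkle_root_and_proof(txs, 2)
print(verify(b"tx2", proof, root))     # the transaction really in the block
print(verify(b"forged", proof, root))  # a transaction that is not in the block
```

The light node never needs the other transactions, only one hash per tree level, which is why it can get by with block headers alone.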
It's important to keep in mind that light nodes are unable to participate in the consensus process, which means that they cannot function as validator nodes. This is because they do not have full functionality and rely on full nodes for transaction processing.
With nodes built using the Cosmos SDK, you will often come across the term "pruned node." When a Cosmos SDK-based node is initialized, configuration files named "app.toml" and "config.toml" are created in the "/config/" directory. The "app.toml" file contains the min-retain-blocks and pruning options, which determine how much block data and state data, respectively, are pruned from the node. Nodes configured with such pruning conditions are called pruned nodes. Depending on the settings, a pruned node can be equivalent to either a full node or an archive node as described above.
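As a rough illustration of how these two options relate to the node types above, the sketch below classifies a pair of settings. It assumes that `pruning = "nothing"` keeps every historical state and that `min-retain-blocks = 0` keeps every block. The labels it returns are this sketch's own interpretation, not official Cosmos SDK terminology, and `classify_node` is a hypothetical helper.

```python
def classify_node(pruning, min_retain_blocks):
    """Map app.toml pruning settings to the node categories used in this article."""
    keeps_all_blocks = min_retain_blocks == 0  # assumption: 0 disables block pruning
    keeps_all_state = pruning == "nothing"     # assumption: no state is ever pruned
    if keeps_all_blocks and keeps_all_state:
        return "archive-like"
    if keeps_all_blocks:
        return "full-node-like"
    return "pruned"


# Values as they might appear in app.toml (illustrative only).
examples = [
    {"pruning": "nothing", "min-retain-blocks": 0},
    {"pruning": "default", "min-retain-blocks": 0},
    {"pruning": "everything", "min-retain-blocks": 100000},
]
for settings in examples:
    print(settings, "->", classify_node(settings["pruning"], settings["min-retain-blocks"]))
```

Exact option semantics differ between SDK versions, so the comments in the generated app.toml of the chain you are running are the place to confirm them.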
Validator nodes are full nodes that can validate new transactions submitted to a chain's mempool and take part in producing blocks. To do so, they hold a private key with which they sign the data they consider valid, so that other nodes can check who vouched for it. In return for their services, they are typically rewarded each time a new block is created, a process that Proof of Work chains call "mining."
The term and concept of a "validator node" are primarily found in consensus algorithms based on Proof of Stake. In blockchains that utilize Proof of Work, like Bitcoin, it is typically referred to as a "mining node." For further information on mining nodes, please consult this link.
Seed nodes act as "trackers" within a blockchain network, helping nodes locate and connect with other nodes that belong to it. By maintaining a list of the IP addresses of nodes operating within the network, seed nodes serve as a bridge for new nodes to connect to the rest of the network.
However, it's important to note that seed nodes, despite being called "nodes," do not necessarily need to be complete nodes. They simply need to stay connected at all times in order to perform their tracking function.
For instance, suppose a new node is seeking entry into the network. It can use seed nodes to locate other complete nodes that run the same blockchain. A seed node is a kind of "address book" that tells nodes where to go to join the network.
Therefore, when a new node wants to join a network, it connects to one of the seed nodes, which provides a list of active nodes' IP addresses. Seed nodes act as IP address locators and need not function as regular full nodes. That is, they are only used to let new nodes connect to the peer network via active nodes.
After receiving the list of nodes, the new node can begin syncing with the network.
The term "seed node" was first introduced with the Bitcoin network. Within Ethereum-compatible chains, the equivalent is usually called a "boot node," which performs the same function.
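The "address book" role described above can be sketched with a small in-memory model. This is not how any real client discovers peers: the `SeedNode` class, the `bootstrap` function, and the addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical.

```python
class SeedNode:
    """Hypothetical seed node: it stores peer addresses, not blockchain data."""

    def __init__(self):
        self.known_peers = set()

    def register(self, address):
        """Remember an active node so it can be handed out to newcomers."""
        self.known_peers.add(address)

    def get_peers(self, requester, limit=8):
        """Return up to `limit` known addresses, excluding the requester itself."""
        candidates = sorted(self.known_peers - {requester})
        return candidates[:limit]


def bootstrap(seed, my_address):
    """New-node side: ask the seed for peers, then announce ourselves to it."""
    peers = seed.get_peers(my_address)
    seed.register(my_address)
    return peers  # the new node would now connect to these and start syncing


seed = SeedNode()
for addr in ["192.0.2.10:26656", "192.0.2.11:26656", "192.0.2.12:26656"]:
    seed.register(addr)

peers = bootstrap(seed, "192.0.2.99:26656")
print(peers)
```

Note that the seed node in this model never touches block data. The actual syncing happens between the new node and the peers it was pointed to.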
There are still many node types that I haven't covered, but most of them are either outdated or relatively unimportant by today's standards, so I won't describe them here. I fully understand that the concept of nodes can still be very intimidating, and I expect this material to become outdated in a few years as well. As always, if you are studying blockchains, I recommend cross-checking multiple sources rather than trusting any single one. I hope this article has helped you understand blockchains a little better.
*
Twitter: @journeywith_eth