A Database is not a Blockchain
People often compare the blockchain with the database. That makes sense, since both store data. However, there are significant differences.
It is often said that blockchain is a slow and expensive database that, moreover, does not scale well, so there is no reason to use it for anything other than Bitcoin.
None of these claims is correct. A blockchain can be scalable, fast, and cheap data storage. Admittedly, a blockchain will never be as fast as a traditional database. However, it has a few advantages over a database, and it makes sense to know which ones.
In this article, we will discuss what exactly a blockchain is from the data perspective and then look at the most important differences between a blockchain and a database.
Chain of blocks
From the data storage point of view, a blockchain is a way to store data in blocks. It is a data structure very similar to a linked list.
A linked list is a linear data structure where each element is a separate object. Each element of the list consists of two items: the data and a reference to the next element. The last node has a reference to null. The entry point into a linked list is called the head of the list.
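The linked list described above can be sketched in a few lines of Python. This is only an illustration; the names `Node` and `LinkedList` are chosen for this example and do not come from any particular library.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class Node:
    data: Any
    next: Optional["Node"] = None  # reference to the next element; None for the last node


class LinkedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None  # entry point into the list

    def append(self, data: Any) -> None:
        """Walk to the last node and attach a new one after it."""
        new_node = Node(data)
        if self.head is None:
            self.head = new_node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = new_node

    def __iter__(self) -> Iterator[Any]:
        current = self.head
        while current is not None:
            yield current.data
            current = current.next


items = LinkedList()
for value in ("a", "b", "c"):
    items.append(value)

print(list(items))  # ['a', 'b', 'c']
```

Nothing here stops anyone from changing the data in a node that is already in the list. That is the gap a blockchain closes.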
A blockchain adds one extra feature: protection against tampering with history. The first work on a cryptographically secured chain of blocks was published in 1991 by Stuart Haber and W. Scott Stornetta. They wanted to implement a system where document timestamps could not be tampered with. In 1992, Bayer, Haber, and Stornetta incorporated Merkle trees into the design, which improved its efficiency by allowing several document certificates to be collected into one block. Notice that they did not use the term blockchain but “chain of blocks” in their work. Satoshi Nakamoto used the same term, chain of blocks, in the Bitcoin white paper.
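To make the tamper protection concrete, here is a minimal sketch of a hash-linked chain. It assumes SHA-256 and a simple JSON serialization. The names `Block`, `append_block`, `is_valid`, and `GENESIS_PREV_HASH` are hypothetical and are not taken from Bitcoin or from the Haber and Stornetta papers.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder used by the very first block


@dataclass(frozen=True)
class Block:
    index: int
    data: str
    prev_hash: str  # digest of the previous block; this replaces the plain reference

    def digest(self) -> str:
        payload = json.dumps([self.index, self.data, self.prev_hash]).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Block], data: str) -> None:
    prev_hash = chain[-1].digest() if chain else GENESIS_PREV_HASH
    chain.append(Block(len(chain), data, prev_hash))


def is_valid(chain: List[Block]) -> bool:
    """Check that every block points at the digest of its predecessor."""
    expected = GENESIS_PREV_HASH
    for block in chain:
        if block.prev_hash != expected:
            return False
        expected = block.digest()
    return True


chain: List[Block] = []
for document in ("contract v1", "contract v2", "contract v3"):
    append_block(chain, document)

# Rewrite history: replace the middle block but keep its link to block 0.
tampered = list(chain)
tampered[1] = Block(1, "forged contract", chain[1].prev_hash)

print(is_valid(chain))     # True
print(is_valid(tampered))  # False
```

The forged block changes its own digest, so the next block no longer points at it and the check fails. Hashes alone do not stop someone from recomputing every later block as well. Preventing that is the job of the consensus mechanism.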
The claim that Satoshi invented the blockchain is therefore not correct. He built on existing work.
The question is how fast you can append a new block to a blockchain. Appending itself is not what makes a blockchain slow: if the data to be added is already on the same computer, many blocks can be appended within a single second. The speed, scalability, and cost of adding a new block to the blockchain are determined by the consensus mechanism.
It is the proof-of-work (PoW) consensus that makes Bitcoin a very slow and expensive database. When proof of stake (PoS) or delegated proof of stake (DPoS) is used, a blockchain project can be faster, cheaper, and more scalable.
What is a blockchain - reloaded
Nowadays, many projects and IT giants talk about blockchain technology. What do they mean by it? As you have seen, the original meaning of the term blockchain is a cryptographically secured chain of blocks. The meaning has shifted, and the term is now used more generally for any distributed network that uses a blockchain as its data structure.
The term distributed ledger (DL) may be less confusing, since it simply expresses that there is a ledger that is distributed across many nodes and that the nodes must reach mutual consensus to change it. I personally prefer the term distributed ledger.
How a database stores data and what a user can do
A common database does not work with blocks but with tables. A table is a collection of related data held in a table format within a database. It consists of columns and rows.
In relational databases, a table is a set of data elements (values) using a model of vertical columns (identifiable by name) and horizontal rows, the cell being the unit where a row and column intersect. A table has a specified number of columns but can have any number of rows. Each row is identified by one or more values appearing in a particular column subset. A specific choice of columns that uniquely identify rows is called the primary key.
A database lets you perform four basic operations on data: create, read, update, and delete (CRUD). Current blockchains can only append a whole block (with transactions inside) at the end of the chain. Once a block has been appended, its data cannot be updated or deleted. A blockchain therefore allows only two operations: create and read.
A database allows you to constantly change and even delete data that were stored in the past. A blockchain intentionally keeps historical data unchangeable and always available. We will discuss later in more detail how the data are protected against modification.
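The difference in available operations can be illustrated with a short sketch. The first half uses Python's standard `sqlite3` module to run all four CRUD operations. The second half is a hypothetical `AppendOnlyLedger` class, invented for this example, that exposes only create and read. The table and account names are made up.

```python
import sqlite3
from typing import List, Tuple

# Database: all four CRUD operations are available.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO balances VALUES (?, ?)", ("alice", 100))               # create
conn.execute("UPDATE balances SET amount = ? WHERE account = ?", (5, "alice"))   # update
row_after_update = conn.execute(
    "SELECT amount FROM balances WHERE account = ?", ("alice",)
).fetchone()                                                                     # read
conn.execute("DELETE FROM balances WHERE account = ?", ("alice",))               # delete
rows_after_delete = conn.execute("SELECT * FROM balances").fetchall()


class AppendOnlyLedger:
    """Hypothetical ledger that offers only create (append) and read."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, int]] = []

    def append(self, account: str, delta: int) -> None:
        self._entries.append((account, delta))

    def read(self) -> Tuple[Tuple[str, int], ...]:
        return tuple(self._entries)  # a copy, so callers cannot edit history

    def balance(self, account: str) -> int:
        # The current state is derived from the full history.
        return sum(delta for name, delta in self._entries if name == account)


ledger = AppendOnlyLedger()
ledger.append("alice", 100)
ledger.append("alice", -95)  # a correction is a new entry, not an update

print(row_after_update, rows_after_delete)  # (5,) []
print(ledger.balance("alice"), len(ledger.read()))  # 5 2
```

In the database, the original balance of 100 is gone after the update, and the whole row is gone after the delete. In the ledger, both entries remain readable.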
Who controls the data
The most significant difference between a blockchain and a database lies in which operations are allowed and who is responsible for them. A related question is how reliably the remaining operations are restricted.
A database is maintained by an administrator or by a group of administrators. The administrator has the right to do anything with the data (CRUD). Administrators are often employees of a big company and must follow rules set by the owners of the company. Administrators give limited rights to users, who can create, read, modify, or delete data, depending on the implementation and the given application or service. However, even if you insert correct data into the database, the administrator can always modify or delete it. If there is a dispute about data correctness, you have little or no chance to prove that you inserted correct data. The administrator always has more rights than you.
In a blockchain, there is no administrator with modify and delete rights. Instead, there are nodes in the network that must come to a consensus on any new block that is to be appended. Once the block is appended (and confirmed), nobody can easily change the history, and you can always prove that something happened in the past.
Public and private blockchain
At this point, we need to differentiate between private (permissioned) and public blockchains. We know that it is an unpopular topic and that, for many of you, there is no such thing as a private blockchain. Still, it makes sense to speak about private blockchains. Let’s have a look at what public and private blockchains are good for.
A public blockchain is suitable wherever a large number of users who do not trust each other need to interact in a trusted way. If the group is not able or does not want to find a trusted third party, then it might consider using a blockchain. In the current financial system, the trusted third parties are banks. People mostly trust them and know what to do if something goes wrong. A blockchain can do similar work to what banks do. A public blockchain works very well for cryptocurrencies and can serve millions of users.
If there are only two companies that do not trust each other, it is often better and cheaper to find a trusted third party who will run a database for them. Can they use a database without involving a trusted third party? The question here is who would be responsible for maintaining it and who would be the administrator. Should it be someone from company A or from company B? Or both? As we have already mentioned, the administrator can do anything with the database. So a database might not always work well in this case, and using a blockchain can be a valid option.
Now imagine that there are 10 such companies searching for some level of trust. Should there also be 10 administrators, one from each company? Still, who will be responsible for the master database?
A database might work well in some cases. However, in other cases a private blockchain is a much better option. Why? Companies naturally do not trust each other. When some data are to be added to the blockchain, the agreement of all parties can be a relevant requirement. Imagine that there are 10 validating nodes in a private environment and every company owns one node. All nodes (or a significant majority of them) have to agree on adding a new block to the blockchain. In this case, no single company is able to change data stored on other nodes, so it makes no sense for it to change data on its own node. Moreover, no company is able to write data that the other companies would consider invalid. The role of the administrators is restricted to running a node.
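The 10-company scenario can be sketched as a simple vote. This is a toy illustration and not a real BFT protocol. The function name `approve_block`, the validation rules, and the two-thirds threshold are assumptions made for this example. The text above only says "all nodes or a significant majority".

```python
from fractions import Fraction
from typing import Callable, Dict

Validator = Callable[[dict], bool]  # each company's own validity check


def approve_block(block: dict,
                  validators: Dict[str, Validator],
                  required: Fraction = Fraction(2, 3)) -> bool:
    """Accept the block only if strictly more than `required` of the nodes approve."""
    votes = sum(1 for validate in validators.values() if validate(block))
    return Fraction(votes, len(validators)) > required


def honest_rule(block: dict) -> bool:
    return block["amount"] >= 0  # assumed business rule: no negative amounts


def accepts_anything(block: dict) -> bool:
    return True  # a single dishonest or careless company


validators: Dict[str, Validator] = {f"company_{i}": honest_rule for i in range(1, 10)}
validators["company_10"] = accepts_anything

print(approve_block({"amount": 50}, validators))   # True: all 10 nodes approve
print(approve_block({"amount": -50}, validators))  # False: only 1 of 10 approves
```

A single company that approves an invalid block cannot get it appended, because the other nine reject it.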
There is no need to use any token in a private blockchain. A simple consensus (based on BFT) is sufficient. A private blockchain can be very fast, and the block time can be a few seconds (or even one second). Operating costs are very low, since every company operates just one validating node.
What about security? For a few companies, a private blockchain is sufficiently secure if they interact only with each other. No other node can join the consortium, and if data transmission is encrypted, only the companies have access to the data. This setup is required when data are not supposed to be public and must be protected against misuse. A public blockchain cannot be used here. Nobody wants to let others see business secrets, which is another reason to avoid public ledgers.
A private blockchain would not naturally be considered secure if it were used by users outside the companies. The companies know each other and could potentially act dishonestly against users. Users can fully trust only a public, open network where everybody can join and participate in consensus.
As you can see, everyone can benefit from blockchain decentralization, regardless of whether they use a private or a public blockchain.
Centralized server vs. distributed network
In computer science, client-server is a software architecture model consisting of two parts, client systems and server systems, which communicate over a computer network or on the same computer. A client-server application is a distributed system made up of both client and server software. We speak of a centralized solution, with the server at the center.
A database sits on the server. If there is only a single server, it is a so-called single point of failure. Once the server is unable to operate, no client can communicate with the server, and therefore clients cannot communicate with each other either. From the data point of view, all clients must rely on the server to behave honestly and to be properly secured against attack. However, this is often not the case.
If a client requests a change that is to be written to the database, the client trusts that the data will be stored safely and will not change unexpectedly before the next session. Nowadays, networks with only a single server are rare. In most cases, there are several redundant servers in a network. If one server crashes or is temporarily unavailable, another one can take over all requests. This is possible only when data are copied between servers.
If you send a transaction or a request to a server, the data are written to only one database at that moment. After that, the data are copied to other (backup) databases. The copying often happens a bit later, so there is usually some delay during which data on the servers are inconsistent. The copying process is called data replication.
Data replication means storing data at more than one site or node. It improves the availability of data. It simply copies data from a database on one server to another server so that all users can share the same data, ideally without any inconsistency. An inconsistency can arise if a server crashes before replication completes. That is unpleasant, since you might consider a transaction confirmed, but the next time you log in to the server the transaction is gone. However, this happens quite rarely nowadays.
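The replication delay described above can be sketched with a toy model. The `Primary` class and its methods are hypothetical and do not model any specific database product. Real systems offer several replication modes, and this sketch assumes the asynchronous one.

```python
from typing import Dict, List, Tuple


class Primary:
    """Toy primary server that replicates asynchronously to a backup."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.pending: List[Tuple[str, str]] = []  # writes not yet copied to the backup

    def write(self, key: str, value: str) -> str:
        self.data[key] = value
        self.pending.append((key, value))
        return "confirmed"  # the client is told "done" before replication happens

    def replicate_to(self, backup: Dict[str, str]) -> None:
        for key, value in self.pending:
            backup[key] = value
        self.pending.clear()


primary = Primary()
backup: Dict[str, str] = {}

primary.write("tx1", "alice pays bob 10")
primary.replicate_to(backup)              # tx1 reaches the backup

status = primary.write("tx2", "bob pays carol 5")
# The primary crashes here, before the next replication run.
# Clients fail over to the backup, which never received tx2.

print(status)           # confirmed
print("tx1" in backup)  # True
print("tx2" in backup)  # False
```

The client saw "confirmed" for tx2, but after the failover the transaction no longer exists.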
It is important to understand that data replication protects data only against possible loss. It has nothing to do with protection against tampering with history or against an administrator rewriting the current state. If one server accepts changes and another does not, the data can become inconsistent.
Blockchain solves the issues described above elegantly by using decentralized consensus. Once all or a majority of the nodes in the network agree to append a new block, the data are written to many hard disks simultaneously. It does not matter if the node that proposed the new block crashes immediately after the agreement. The data are safe on the other nodes, and the crashed node can later obtain a valid version of all blocks.
Can a set of replicated databases be as secure as a blockchain? No. Data replication means that one server sends data to other servers in order to back it up. There is no consensus between the servers on a single version of the truth before the data are stored. If one server sends invalid or false data, the other servers just blindly accept and store it (although some kind of data validation can still be in play). In contrast, in a blockchain the majority of nodes must agree on a proposed block before it is stored.
Making a consensus between decentralized nodes is what makes blockchain safer in comparison with databases. Instead of one server, maintained by the administrator(s), there is a group of independent nodes that mutually agree on what to append into the blockchain.
There is a difference between decentralized and distributed networks, and it is good to understand it.
A private blockchain shared by a few entities can be considered a distributed and decentralized system from the point of view of the direct participants. If a private blockchain is used within a single company, it still has some advantages because it is a distributed system. However, it is a centralized solution, and a database might be a better option.
ICO and database
Some people wrongly assume that an ICO does not need a blockchain and that a database is a sufficiently viable solution. Let’s try it. Someone must be responsible for the database, so it could be the team that publicly offers the tokens. There will then be an administrator who can change everything in the database. He can leave the project or unexpectedly disappear, and if he holds all the necessary passwords and keys to the database, nobody else can change anything. A single database can easily crash, and at that moment all data about the coin distribution are lost forever. So there is a single point of failure. Even if the administrator is honest, he might not be experienced enough to protect the database. A hacker can attack the database and change whatever he wants. In all these cases, investors must trust the team, and they lose their money if a single member of the team decides to destroy or misuse the database.
Alternatively, the project can ask a trusted third party to keep the database. You, as an investor, must then trust that a third party selected by the project team will behave honestly and will be able to protect the records in the database. Imagine that it is a third party somewhere in Russia and you are an investor from the USA or Europe. I guess you will not trust the ICO very much. Again, there is a single point of failure.
A public blockchain has a big advantage over a database here. ICO tokens might have some value and are globally available, and it can be difficult to enforce the law in an international environment when something goes wrong. Using a public blockchain is necessary here, since the smart contract code and the blockchain ensure security, availability, immutability, and other important properties. A blockchain is usually a global network with nodes all around the world. There is no administrator, and consensus must be achieved to transfer any single coin or token. Nobody is able to create more coins. Nobody is able to make false transactions, since the private keys are in the hands of the investors.
Which blockchain properties are the most important
Let’s have a look at a few properties that cannot be replicated by a traditional database. Databases are strong tools, and you can achieve nearly everything you need with them. Still, there is something that only a blockchain can offer.
- Data immutability. A blockchain is a distributed network by nature, and data are written to many disks simultaneously once consensus is achieved. Thus, it is very difficult or impossible to change history.
- Safe data appending. As in the previous point, a new block is appended only if the majority of entities agree to it. Thus, it is not possible to insert something that would be considered invalid. The rules must be strictly followed, and several independent entities keep an eye on them.
- No administrator. There is no administrator with the right to change anything. A blockchain does not append a new block without mutual consensus. The responsibility is split among the validating nodes. A blockchain is trustless and censorship-resistant.
- No single point of failure. This is true mainly for PoS and PoW consensus. DPoS might have a problem when a few nodes become unavailable at the same time.
- Smart contracts. With smart contracts, a blockchain is able to execute an agreement in a trustless way and thus replace the traditional legal system.
Notice that all the listed properties can be summed up in a single word: decentralization.
It is all about consensus
The efficiency of a blockchain depends mainly on the consensus algorithm. It is quite clear that a solution using PoW is naturally expensive and slow. Such a blockchain is useless from a business perspective. The approximate block time is 10 minutes, and a lot of energy is consumed during the consensus. It is PoW, not the blockchain as a data structure, that makes Bitcoin such a slow and expensive database. The block time and block size are the main reasons why Bitcoin does not scale.
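The effect of block time and block size on throughput is simple arithmetic, sketched below. Only the 10-minute block time comes from the text. The 1 MB block size, the 250-byte average transaction, and the 20-second comparison value are assumptions chosen for illustration, and the function name is hypothetical.

```python
def transactions_per_second(block_size_bytes: int,
                            avg_tx_bytes: int,
                            block_time_s: float) -> float:
    """Upper bound on throughput: transactions that fit in a block / block time."""
    tx_per_block = block_size_bytes // avg_tx_bytes
    return tx_per_block / block_time_s


ASSUMED_BLOCK_SIZE = 1_000_000  # bytes; assumption, not taken from the article
ASSUMED_AVG_TX = 250            # bytes; assumption, not taken from the article

slow = transactions_per_second(ASSUMED_BLOCK_SIZE, ASSUMED_AVG_TX, 600)  # 10-minute blocks
fast = transactions_per_second(ASSUMED_BLOCK_SIZE, ASSUMED_AVG_TX, 20)   # hypothetical 20-second blocks

print(round(slow, 2))  # 6.67
print(fast)            # 200.0
```

Under these assumptions, shortening the block time from 600 to 20 seconds raises the bound by a factor of 30. The data structure is the same in both cases.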
The other extreme is a private blockchain with a limited number of nodes. Here we very often see some kind of voting-based consensus among a small set of known validators (BFT-style, comparable in spirit to DPoS). It can be very fast and cheap and, in addition, offers better privacy. That is why projects like Hyperledger or R3 Corda are so successful. In some cases, it makes sense to use them.
The biggest challenge is to make open, public, and decentralized networks efficient. This is possible with PoS consensus, since it can be cheaper and more scalable. Cardano's Ouroboros PoS has a block time of 20 seconds and consumes about as much electricity as two big houses. Moreover, IOHK, the team behind Cardano, plans to employ sharding in the future, which would be a significant scalability boost.
It is naive to think that we must stick to PoW and use it forever. People can always improve all technology that is useful for them. We can easily improve the consensus algorithm as well.
It is difficult to achieve high scalability and keep a high level of decentralization. Data must be distributed around the world, so network latency must be taken into account. Reaching global consensus takes some time. A blockchain will never be as efficient as a database, but it can offer us trust, decentralization, and protection against tampering with history. This cannot be achieved with a database.
It is also often believed that we need PoW for data security. As you have seen, security also comes from data distribution and consensus. In fact, it is the consensus that secures the data. PoW makes it difficult and expensive to create a block. However, that is also the source of its inefficiency, and it is not necessary. It is possible to achieve the same level of security with PoS. Cardano's Ouroboros PoS does not let you create a valid block unless you have been given the right to do so. An attacker can create a block, but he is not able to provide a valid proof, so the network does not accept the invalid block. Instead of PoW, Cardano PoS uses modern cryptography (Verifiable Random Function, Key Evolving Signature).
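The idea that only the elected node may produce a block can be illustrated with a toy stake-weighted lottery. This is not Ouroboros. A public SHA-256 hash stands in for the Verifiable Random Function, so everyone can compute the leader, whereas a real VRF keeps the election private until the leader publishes a proof. All names, the seed, and the stake values are made up.

```python
import hashlib
from typing import Dict


def slot_leader(seed: str, slot: int, stake: Dict[str, int]) -> str:
    """Pick one node per slot with probability proportional to its stake."""
    total = sum(stake.values())
    digest = hashlib.sha256(f"{seed}:{slot}".encode("utf-8")).digest()
    ticket = int.from_bytes(digest, "big") % total  # pseudo-random number in [0, total)
    cumulative = 0
    for node in sorted(stake):  # fixed order so every node computes the same result
        cumulative += stake[node]
        if ticket < cumulative:
            return node
    raise AssertionError("unreachable when total stake is positive")


def accept_block(seed: str, slot: int, producer: str, stake: Dict[str, int]) -> bool:
    """Other nodes accept a block only from the leader elected for that slot."""
    return producer == slot_leader(seed, slot, stake)


stake = {"alice": 60, "bob": 30, "carol": 10, "mallory": 0}
seed = "epoch-1"  # assumed shared randomness for the epoch

leader = slot_leader(seed, 7, stake)
print(accept_block(seed, 7, leader, stake))     # True
print(accept_block(seed, 7, "mallory", stake))  # False
```

A node with no stake can still assemble a block, but every other node rejects it because it was not elected for that slot. In this sketch, creating a block costs almost nothing. Security comes from the other nodes checking who was allowed to create it.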
A blockchain can be used when information such as “state X was valid for user Y at time Z” must be stored immutably. It is suitable for keeping records of ownership of valuable things. That is why digital money can be created on a blockchain. This kind of information must not be changeable by individuals, and security must be high. That is why consensus is required to change data by appending a block. Appending a block changes many X states for many users in a trustless way, in situations where trust must be established among all participants.
Typical database uses include security monitoring, alerting, statistics gathering, and authorization. Many databases provide active database features in the form of database triggers. A cloud database relies on cloud technology. Data stored in a database are often important only to a small number of people, or there is no need to build mutual trust between all participants. The security that can be achieved within database systems is sufficient. Users are fine with trusting database owners, since there are other mechanisms for resolving possible issues, for example the law.
A database is not a blockchain, but a blockchain can be considered a kind of database. The traditional database is always a centralized solution, and if you think about it, the internet is centralized as well. The internet has connected people all around the world, but only a few centralized companies like Google, Facebook, and Amazon benefit from it. All these companies use databases for their business. Blockchain has the potential to decentralize the whole internet, and we can get rid of big companies. Then it will not be the owners of centralized companies but all users who benefit from blockchain decentralization. Once projects like Cardano take over the role of the current IT giants, the world could be a better place to live, since users will effectively own the Cardano network and will be able to decide about its future.