Hyperledger Sawtooth Overview
In this recipe, we will guide you through the main steps of building a Sawtooth transaction family with a case study that implements a marketplace designed to keep track of the owner of a house. The sample application is developed with the Sawtooth Python SDK and uses the Sawtooth XO transaction family as a template.
For those who are not familiar with the Hyperledger project, the articles "Intro to Hyperledger Family and Hyperledger Blockchain Ecosystem" and "Hyperledger Design Philosophy and Framework Architecture" are strongly recommended.
To better follow and understand this recipe, it is advisable to read the articles "Essential Hyperledger Sawtooth Features for Enterprise Blockchain Developers", "Blockchain Developer Guide- How to Install and work with Hyperledger Sawtooth", and "Configuring Hyperledger Sawtooth Validator and REST API" in advance.
The main focus of this case study is on demonstrating how to develop a Sawtooth application as a proof of concept. In a real production environment, state access control, data encryption, confidentiality, and security would be customized based on the different commercial requirements.
The business cases for this sample marketplace are simple: creating records to keep track of the owner of a house, and transferring a house from one owner to another.
Transaction Family Namespace and Address Overview
In this recipe, we will cover how to design the transaction family namespace and address. After that, we will explain how Sawtooth stores the global state and accesses data from it using the namespace and address.
The namespace is a three-byte address prefix and must be calculated in the same way within a particular transaction family. The design could be one of the following:
- Map to an arbitrary constant string, such as A00000 or A00001.
- Define a short and meaningful transaction family name that is up to three characters in length and hex-encode it.
- Hash the family name and slice the first six hex characters. Hashing is a useful way to generate an address of a fixed length from family names of various lengths.
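The three namespace options above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, SHA-512 is assumed as the hash function, and the short-name variant assumes exactly three ASCII characters so that no padding rule is needed.

```python
import hashlib


def prefix_from_constant() -> str:
    # Option 1: an arbitrary constant chosen by the designer
    # (six hex characters, written in lowercase here).
    return "a00000"


def prefix_from_short_name(name: str) -> str:
    # Option 2: hex-encode a three-character name (3 bytes -> 6 hex characters).
    encoded = name.encode("ascii")
    if len(encoded) != 3:
        raise ValueError("name must be exactly three ASCII characters")
    return encoded.hex()


def prefix_from_hash(family_name: str) -> str:
    # Option 3: hash a name of any length and keep the first six hex characters.
    return hashlib.sha512(family_name.encode("utf-8")).hexdigest()[:6]
```

Whichever option is chosen, the client and the transaction processor must use the same one, since both need to arrive at the same six-character prefix.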
The address of the transaction family needs to be 32 bytes and can be constructed with either simple or complicated logic, based on the transaction family. The following list contains a few possible approaches:
- Calculate the address from a set of key attributes using the SHA-512 hash algorithm. A SHA-512 hash produces a 512-bit (64-byte) digest, that is, 128 hex characters, for a string of any length, so taking the first 64 hex characters (32 bytes) of the digest is a simple way to calculate the address directly from a string value of a set of attributes for a transaction family.
- Divide the address into segments, hash each segment, and slice the hashes to different lengths. The address could be specified as asset type plus asset ID, and constructed by hashing the asset type and the asset ID separately and slicing each hash to a fixed length so that together they compose the 32-byte address. For example, you could take 4 bytes from the hash of the asset type and the remaining 28 bytes from the hash of the asset ID.
- Directly hex-encode a name, such as the LDAP distinguished name. The address could simply be a hex-encoded LDAP distinguished name, such as uid=, organization unit (ou)=, or domain component (dc)=, with each set being of a fixed length. For example, the DC could be 4 bytes, the OU could be 4 bytes, and the UID could be the remaining 24 bytes.
The address must be deterministic: it must always be calculated in the same way for the same set of key attributes for the transaction family. It may be, however, that two different sets of key attributes result in the same address, based on your address scheme. In this case, the mechanism of serialization and deserialization should correctly handle the address collision and store and retrieve the data correctly for each set of key attributes.
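As an illustration of the segmented approach, the following sketch builds a 35-byte address from a namespace prefix, an asset type, and an asset ID. The function name, the 4-byte/28-byte split, and the use of SHA-512 for each segment are assumptions taken from the example above, not a prescribed scheme.

```python
import hashlib


def _sha512_hex(text: str) -> str:
    # Hex digest of the UTF-8 encoded text (128 hex characters).
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


def make_segmented_address(namespace: str, asset_type: str, asset_id: str) -> str:
    # 4 bytes (8 hex characters) come from the asset type hash and
    # 28 bytes (56 hex characters) come from the asset ID hash.
    type_part = _sha512_hex(asset_type)[:8]
    id_part = _sha512_hex(asset_id)[:56]
    return namespace + type_part + id_part
```

With this layout, every asset of the same type shares the same first 14 hex characters (namespace plus type segment), which keeps related entries under a common address prefix.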
For our application, the namespace and address design will be as follows:
Transaction family name = ‘mkt’
The namespace prefix is the first six hex characters of its hash:
The data address is the namespace prefix followed by the first 64 hex characters of the SHA-512 hash of the key attribute (the house APN):
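A minimal sketch of this design, modeled loosely on the XO family's addressing, could look like the following. The helper names are illustrative, and the choice of the house APN as the hashed key is an assumption based on the state definition later in this recipe.

```python
import hashlib

FAMILY_NAME = "mkt"


def _sha512_hex(text: str) -> str:
    # Hex digest of the UTF-8 encoded text (128 hex characters).
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


# Namespace prefix: the first six hex characters (3 bytes) of the family name hash.
MKT_NAMESPACE = _sha512_hex(FAMILY_NAME)[:6]


def make_mkt_address(house_apn: str) -> str:
    # Prefix plus the first 64 hex characters (32 bytes) of the key's hash,
    # giving a 70-character (35-byte) address.
    return MKT_NAMESPACE + _sha512_hex(house_apn)[:64]
```

Both the client and the transaction processor would call the same function, so the same house always maps to the same address.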
Hyperledger Sawtooth Global State Overview
Like other enterprise blockchain systems, Sawtooth distributes the ledger among the participating nodes in the network and keeps a consistent copy of the transactions on each node. In addition, it keeps track of the latest state for all transaction families in a single Radix Merkle tree, which is the global state for the blockchain on each validator node.
The global state is built as a copy-on-write Radix Merkle tree. The root hash is computed from the hashes of the child nodes, level by level, from the leaves up to the root. The root hash is saved in the block header so that validators reach consensus not only on the transactions in the block but also on the global state for the block. If the root hash is different or has been modified, the block will be invalidated.
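The way a root hash depends on every leaf, and the copy-on-write behavior, can be illustrated with a toy tree. This is only a sketch under loud assumptions: it does not reproduce Sawtooth's actual node serialization or storage format, it uses short two-byte addresses for readability, and the helper names node_hash and set_leaf are hypothetical.

```python
import hashlib


def node_hash(node) -> str:
    # A leaf is a bytes value; an inner node is a dict mapping one
    # address byte (0-255) to a child node.
    if isinstance(node, bytes):
        return hashlib.sha512(b"leaf:" + node).hexdigest()
    digest = hashlib.sha512(b"node:")
    for key in sorted(node):
        digest.update(bytes([key]))
        digest.update(node_hash(node[key]).encode("ascii"))
    return digest.hexdigest()


def set_leaf(root: dict, address: str, data: bytes) -> dict:
    # Copy-on-write: only the nodes along the address path are copied;
    # untouched subtrees are shared between the old and new versions.
    path = bytes.fromhex(address)

    def _set(node, depth):
        if depth == len(path):
            return data
        copy = dict(node) if isinstance(node, dict) else {}
        copy[path[depth]] = _set(copy.get(path[depth], {}), depth + 1)
        return copy

    return _set(root, 0)


v1 = set_leaf(set_leaf({}, "aa01", b"house-1"), "bb01", b"house-2")
v2 = set_leaf(v1, "aa02", b"house-3")
print(node_hash(v1) != node_hash(v2))  # any change to a leaf changes the root hash
print(v2[0xBB] is v1[0xBB])            # the untouched subtree is shared, not copied
```

Because each version has its own root, an older root hash still describes the older state, which is what lets a block header pin the global state as of that block.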
The latest state for an asset in each transaction family is stored in the leaf node of the Radix Merkle tree. The tree could have 35 levels and each parent node in a level could have up to 256 children nodes. The global state Merkle tree for one version looks as follows:
Hyperledger Sawtooth Namespace and Address Scheme
Data is stored in the leaf nodes of the global state. To access the data from the Merkle tree, the address is used. The address is the unique path that identifies how to reach the data from the root to the leaf node in the global state. Sawtooth defines the address with 35 bytes, represented as 70 hex-encoded characters. Each byte in the address selects the next-level node in the tree on the way to the leaf node associated with the address. A byte has 8 bits, so each node can have up to 2^8 (256) child nodes. The length of the address is 35 bytes, so it can specify up to 35 levels of depth for the global state Merkle tree.
In the 35-byte address scheme, the first 3 bytes (6 hex-encoded characters) are the namespace prefix. The remaining 32 bytes of the address, represented as 64 hex-encoded characters, can be specified in various ways, based on different business scenarios. Because the prefix is 3 bytes (24 bits) long, a Sawtooth blockchain can hold up to 2^24 (16,777,216) distinct namespaces, and therefore that many transaction families, for your enterprise. The address scheme looks as follows:
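The layout can also be expressed as a short sketch that validates an address and splits it into its namespace prefix, its family-specific remainder, and the per-level path through the tree. The function and constant names are hypothetical, and lowercase hex is assumed.

```python
NAMESPACE_HEX_LEN = 6   # 3 bytes
ADDRESS_HEX_LEN = 70    # 35 bytes

MAX_CHILDREN_PER_NODE = 2 ** 8   # one byte per level: 256 possible children
MAX_NAMESPACES = 2 ** 24         # three-byte prefix: 16,777,216 namespaces


def split_address(address: str):
    # Reject anything that is not exactly 70 lowercase hex characters.
    if len(address) != ADDRESS_HEX_LEN:
        raise ValueError("address must be 70 hex characters (35 bytes)")
    if any(c not in "0123456789abcdef" for c in address):
        raise ValueError("address must be lowercase hex")
    prefix = address[:NAMESPACE_HEX_LEN]
    remainder = address[NAMESPACE_HEX_LEN:]
    # Each byte selects one child on the way from the root to the leaf.
    path = list(bytes.fromhex(address))
    return prefix, remainder, path
```

For a valid address, the returned path always has 35 entries, one per tree level, and the first three entries correspond to the namespace prefix.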
The most important guideline for designing a namespace and address for the transaction family is that the address must be deterministic. In the transaction processor and the client application, the address calculated for the same set of key attributes must always be the same.
As long as the design complies with this rule, how to design the namespace and address for a transaction family is very flexible and completely up to the enterprise. Hashing is a common approach, but it is not obligatory. Your own addressing schema should be designed with your enterprise’s business requirements in mind.
Implementing Hyperledger Sawtooth Transaction Family
After the namespace and address scheme is defined for the transaction family, the state, transaction, and payload encoding scheme can be defined.
To define the state, you should analyze the data requirements for your organization and follow an appropriate modeling process to define the semantic data model for the system. For example, you could use entity-relationship modeling to represent conceptual data logic in your enterprise. In our example, the state is as follows:
This can be explained as follows:
- House APN (Key): The Assessor Parcel Number (APN) for a house is a unique number assigned to each parcel of land by a county tax assessor. The APN is based on formatting codes, depending on the home’s location. Local governments use APNs to identify and keep track of land ownership for property tax purposes.
- House Owner: The name of the person who currently owns the house.
Defining transactions involves analyzing all your business use cases and the attributes used to perform these business operations. In our example, transactions and their payloads are as follows:
- House APN (Key): The key attribute for the state.
- Action: This can be either the create keyword or the transfer keyword.
- House Owner: The name of the house owner.
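A transaction payload with these three attributes can be modeled and decoded as in the sketch below. It assumes the payload is a UTF-8, comma-separated string in the field order listed above; the class name MktPayload and the use of ValueError for rejected payloads are illustrative choices, not part of the Sawtooth SDK.

```python
from dataclasses import dataclass

VALID_ACTIONS = ("create", "transfer")


@dataclass(frozen=True)
class MktPayload:
    house: str   # House APN (key)
    action: str  # "create" or "transfer"
    owner: str   # House owner

    @staticmethod
    def from_bytes(payload: bytes) -> "MktPayload":
        # Decode and split; a wrong field count or invalid UTF-8 raises ValueError.
        try:
            house, action, owner = payload.decode("utf-8").split(",")
        except ValueError as err:
            raise ValueError("payload must be 'house,action,owner'") from err
        if not house:
            raise ValueError("house APN is required")
        if action not in VALID_ACTIONS:
            raise ValueError("invalid action: " + action)
        if not owner:
            raise ValueError("house owner is required")
        return MktPayload(house, action, owner)
```

For example, `MktPayload.from_bytes(b"123-456-789,create,Alice")` yields a payload whose action is `create` and whose owner is `Alice`.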
To define payload encoding schemes, you could choose from one of the following methods:
- Protobuf encoding: Protocol buffers are Google’s language- and platform-neutral mechanism for serializing data. With protobuf encoding, you define the message formats in a .proto file and compile them using the protocol buffer compiler. To find out more, you can follow the guide that can be found here: https://developers.google.com/protocol-buffers/. It is small, fast, and simple, but it is not human-readable. Like JSON, it is also supported in many languages, such as Java, Python, C++, and Go.
- Simple text encoding: This involves defining your own message format and carrying out character encoding using your own protocol with a special delimiter, or following common formats, such as .csv or base64, to represent data in an ASCII format or string. These are human-readable, simple, easy, fast, and language- and platform-neutral.
For our example, both the state and the payload use simple text encoding: the data is formatted as CSV and encoded as UTF-8. We are using the Sawtooth XO family as a template.
For hash collisions, the colliding state will be stored as the UTF-8 encoding of the string with a delimiter, |, such as entry 1|entry 2:
"|".join(sorted([",".join([house, owner])
    for house, owner in house_list.items()])).encode()
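A round trip for this state encoding might look like the sketch below. It assumes that house_list is a dictionary mapping a house APN to its owner, and that neither value contains a comma or a pipe character; the function names serialize and deserialize are illustrative.

```python
def serialize(house_list: dict) -> bytes:
    # One "house,owner" entry per house, sorted so the output is deterministic,
    # joined with "|" and encoded as UTF-8.
    entries = sorted(",".join([house, owner]) for house, owner in house_list.items())
    return "|".join(entries).encode()


def deserialize(data: bytes) -> dict:
    # Inverse of serialize; empty state data yields an empty dictionary.
    house_list = {}
    if not data:
        return house_list
    for entry in data.decode().split("|"):
        house, owner = entry.split(",")
        house_list[house] = owner
    return house_list
```

Sorting the entries matters: every validator must produce byte-for-byte identical state data for the same set of houses, otherwise the resulting state root hashes would differ.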
Putting Things Together
State, transaction, and encoding schemes play an important role when you design the transaction family for your business. In this section, we will explain how each of these works:
- Defining the state with data modeling for the transaction family: The state is the data model for your enterprise and it is the data in your system. The state is stored in the global Merkle tree and the actions performed on the state are transactions stored on the blockchain.
- Defining transactions with a unified command interface for the transaction family: Transactions are actions performed on the state and the business use cases in your enterprise. Analyzing all your business use cases and defining a unified command interface to abstract all the actions, such as basic create, read, update, delete (CRUD) actions, in a common way makes your programming model clean and structured. Transaction payloads are messages transferred over the network. The transaction payload is the communication protocol between the client and the validator node and between the validator nodes on the network.
- Defining the encoding scheme for the transaction family: The encoding scheme is the serialization and deserialization of the state and the transaction payload. Sawtooth stores arbitrary data in the leaf nodes of the global state, and uses an arbitrary message format for the payload on the network. It is up to you to choose a feasible and suitable encoding scheme for your business. Sawtooth provides an infrastructure platform and businesses can customize this based on their own needs.
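The relationship between state and transactions can be made concrete with a small sketch of a unified command interface for the two actions in this recipe. The function name apply_action and the business rules shown (rejecting a duplicate create and a transfer of an unknown house) are assumptions for illustration; a real transaction handler would also deal with addressing, encoding, and the identity of the signer.

```python
def apply_action(state: dict, house: str, action: str, owner: str) -> dict:
    # Returns a new state dictionary (house APN -> owner); the input is not modified.
    new_state = dict(state)
    if action == "create":
        if house in state:
            raise ValueError("house already exists: " + house)
        new_state[house] = owner
    elif action == "transfer":
        if house not in state:
            raise ValueError("house does not exist: " + house)
        new_state[house] = owner
    else:
        raise ValueError("unknown action: " + action)
    return new_state
```

Keeping every action behind one entry point means that adding a new business operation later only requires a new branch and a new action keyword in the payload.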
There are no requirements for encoding the transaction family, but it is a good idea to take the following guidelines into account:
- Cross-platform, cross-languages: For Sawtooth, different smart contracts and transaction families can be implemented on different hosts and with different languages. The encoding scheme should support different platforms and different languages.
- Fast and efficient processing: Serialization and deserialization are common operations, so they often have an impact on the overall network performance. A fast and efficient encoding reduces processing overhead and helps keep the network performing well.
So far, you have learned how to configure Hyperledger Sawtooth (Validator and REST API) and how to design the namespace and address for a Hyperledger Sawtooth transaction family. The next two steps are Building Transaction Handler and Processor for Hyperledger Sawtooth and Building Transaction Processor Service and Python Egg for Hyperledger Sawtooth.
This recipe is written in collaboration with Brian Wu who is a senior Hyperledger instructor at Coding Bootcamps school in Washington DC.