
What does it mean to have rights to persist private data?

This article provides a quick review of what private data collections are and explains what it actually means to have rights to persist private data. Specifically, it explores the relevance of the policy attribute found in a collection definition document and what this attribute means for the organizations in the network.

Private data collections

Organizations that are connected to a Hyperledger Fabric business network have a toolbox of privacy and confidentiality mechanisms at their disposal. Channels are the primary mechanism by which organizations share a distributed ledger and invoke chaincode to transact with other members. Within a channel, private data collections enable chaincode to control the dissemination of data by storing it in “collections” that are visible only to a subset of the channel members, thereby preserving the privacy of the data.

To leverage a private data collection in your blockchain business network, you need to build a collection definition that specifies the name of the collection along with its access and retention rules:

  • Who can persist private data
  • The minimum number of peers the data must be distributed to for an endorsement to succeed
  • The maximum number of peers the endorsing peer attempts to distribute the data to
  • How long (in terms of number of blocks) the private data is persisted in the private database
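
To make the two peer-count rules concrete, here is a minimal Python sketch. It is a toy model, not Fabric code: the function names and peer names are hypothetical, and picking the first eligible peers is an assumption made for illustration (a real peer uses its own selection logic).

```python
def dissemination_targets(eligible_peers, max_peer_count):
    """Pick at most max_peer_count eligible peers to send the private data to.

    Assumption: we simply take the first peers in the list.
    """
    return eligible_peers[:max_peer_count]


def endorsement_allowed(acknowledged_peers, required_peer_count):
    """The endorsement proceeds only if enough peers received the data."""
    return acknowledged_peers >= required_peer_count


# Hypothetical peers that belong to organizations named in the policy.
eligible = ["peer0.org2", "peer1.org2", "peer0.org3", "peer1.org3"]
targets = dissemination_targets(eligible, max_peer_count=3)
print(targets)
print(endorsement_allowed(acknowledged_peers=0, required_peer_count=0))
```

With a required count of 0, the endorsement goes ahead even if no other peer has received the data yet, which is why a higher value is usually preferred when redundancy matters.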

For the purposes of this article, we are most interested in the “who can persist private data” property. This property identifies the organizations whose peers are allowed to persist the collection data. In the collection definition document, the policy attribute declares these organizations. For example, take a look at the following collection definition document:

[
  {
    "name": "pdc_xyz",
    "policy": "OR('Org1MSP.member', 'Org2MSP.member', 'Org3MSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 1000000,
    "memberOnlyRead": true
  }
]

The above collection definition states that peer nodes that belong to organizations Org1, Org2, and Org3 can persist the data of the private collection named pdc_xyz. Now, what does this actually mean?
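
As a quick illustration of how the policy attribute names the organizations, the following Python sketch parses a collection definition and lists the MSP IDs found in each policy. The helper member_msp_ids and its regular expression are assumptions made for this example; they only handle simple 'XMSP.member' principals and are not how Fabric itself evaluates policies.

```python
import json
import re

COLLECTION_DEFINITION = """
[
  {
    "name": "pdc_xyz",
    "policy": "OR('Org1MSP.member', 'Org2MSP.member', 'Org3MSP.member')",
    "requiredPeerCount": 0,
    "maxPeerCount": 3,
    "blockToLive": 1000000,
    "memberOnlyRead": true
  }
]
"""


def member_msp_ids(policy):
    """Return the MSP IDs that appear as '<MSP ID>.member' in a policy string."""
    return re.findall(r"'([^'.]+)\.member'", policy)


collections = json.loads(COLLECTION_DEFINITION)
for collection in collections:
    print(collection["name"], member_msp_ids(collection["policy"]))
```

Running a definition through a parser like this before deploying it is also a cheap way to catch a stray quote or missing comma in the JSON.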

Example scenario: AcmeChain

Before we address the question of what it means to have rights to persist private data, let’s define a hypothetical scenario. Let’s say that you have a blockchain network, AcmeChain, where the following four organizations are participants:

  • Org1
  • Org2
  • Org3
  • Org4

Let’s also say that AcmeChain has only one channel (channel1) and that there are two private data collections in this channel, PDC1 and PDC2, each with the following values for the policy attribute:

  • PDC1 –> OR('Org1MSP.member', 'Org2MSP.member')
  • PDC2 –> OR('Org3MSP.member', 'Org4MSP.member')

The above policy attributes state that Org1 and Org2 peers can persist PDC1, while Org3 and Org4 peers can persist PDC2.

Now, as you may know, once a private data collection has been created (which happens at chaincode instantiation time), it is ready to be used from within your chaincode. To read and write private data, you use the following chaincode APIs:

  • GetPrivateData("<collection name>", "<key name>")
  • PutPrivateData("<collection name>", "<key name>", <key value>)

Let’s also assume that among the several methods in the chaincode component, the following four are present:

  • readPDC1(key) — reads, from PDC1, the value that corresponds to the key argument
  • readPDC2(key) — reads, from PDC2, the value that corresponds to the key argument
  • writePDC1(key, value) — assigns the specified value to the key argument and stores it in PDC1
  • writePDC2(key, value) — assigns the specified value to the key argument and stores it in PDC2
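
The four methods are thin wrappers around the two private data APIs. The Python sketch below shows that shape using a hypothetical in-memory stub; InMemoryStub and the snake_case function names are stand-ins invented for this example, whereas real chaincode receives its stub from Fabric and is written in a supported chaincode language.

```python
class InMemoryStub:
    """Hypothetical stand-in for the chaincode stub, backed by plain dicts."""

    def __init__(self):
        self._collections = {}

    def put_private_data(self, collection, key, value):
        # Store the value under the given collection name and key.
        self._collections.setdefault(collection, {})[key] = value

    def get_private_data(self, collection, key):
        # Return the stored value, or None if the key is absent.
        return self._collections.get(collection, {}).get(key)


def write_pdc1(stub, key, value):
    stub.put_private_data("PDC1", key, value)


def write_pdc2(stub, key, value):
    stub.put_private_data("PDC2", key, value)


def read_pdc1(stub, key):
    return stub.get_private_data("PDC1", key)


def read_pdc2(stub, key):
    return stub.get_private_data("PDC2", key)


stub = InMemoryStub()
write_pdc2(stub, "asset1", b"secret")
print(read_pdc2(stub, "asset1"))
```

Note that nothing in these wrappers mentions an organization: the collection name is the only thing the chaincode chooses.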

Given the above network and chaincode configuration, you may be thinking that a peer that belongs to Org1 is not allowed to invoke the writePDC2(key, value) method, and that any attempt to do so would fail with an unauthorized error. This reflects a misconception that we have seen quite often: that the policy field in a collection definition specifies which organizations are allowed to invoke the PutPrivateData("<collection name>", "<key name>", <key value>) method for persisting private data. However, this is not the case.

Instead, the chaincode running on any peer, regardless of which organization the peer belongs to, can invoke the PutPrivateData("<collection name>", "<key name>", <key value>) method to write private data to any data collection on the channel. For example, in AcmeChain, a peer that belongs to Org1 or Org2 can certainly invoke the writePDC2(key, value) method to write private data to PDC2 without an error being thrown. In other words, chaincode on any peer in the AcmeChain network can write to any private data collection that exists in channel1.

You might now be wondering: Doesn’t this behavior contradict the policy specified in the collection definition? Though it may seem so at first glance, it does not. The first thing to know is that chaincode never writes directly to a collection. Instead, the endorsing peer holds the data in a transient store and uses the gossip protocol to distribute it to peers of the organizations specified by the policy. When the transaction is committed, only those peers move the data into their private database. So although any of the peers that belong to Org1 or Org2 can write private data to PDC2, they cannot persist PDC2 as a side database on any of their peers.

The second thing to know is that while the write operations are never directly applied to the collections, the chaincode can issue reads against a collection. As a result, if any of the peers that belong to Org1 or Org2 were to invoke the readPDC2(key) method, this operation would fail since they cannot read from data storage they do not have.
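
The write and read behavior described above can be sketched with a small Python model of the AcmeChain peers. This is a simplification under stated assumptions: the Peer class, write_private_data, and the use of LookupError are hypothetical, gossip and commit are collapsed into one function call, and the real error a peer returns may differ.

```python
COLLECTION_MEMBERS = {
    "PDC1": {"Org1", "Org2"},
    "PDC2": {"Org3", "Org4"},
}


class Peer:
    def __init__(self, org):
        self.org = org
        self.transient = {}   # temporary copy held while the transaction is pending
        self.private_db = {}  # persisted side database, keyed by (collection, key)

    def read(self, collection, key):
        # A peer can only read private data that it actually persisted.
        if (collection, key) not in self.private_db:
            raise LookupError(f"{self.org} holds no data for {collection}/{key}")
        return self.private_db[(collection, key)]


def write_private_data(endorser, peers, collection, key, value):
    """Any peer may endorse the write; only member peers persist the data."""
    endorser.transient[(collection, key)] = value
    for peer in peers:
        if peer.org in COLLECTION_MEMBERS[collection]:
            peer.private_db[(collection, key)] = value
    # After commit, the endorser drops its temporary copy.
    endorser.transient.pop((collection, key))


peers = {org: Peer(org) for org in ("Org1", "Org2", "Org3", "Org4")}
write_private_data(peers["Org1"], peers.values(), "PDC2", "asset1", b"secret")

print(peers["Org3"].read("PDC2", "asset1"))
try:
    peers["Org1"].read("PDC2", "asset1")
except LookupError as err:
    print("read failed:", err)
```

In this model the Org1 peer successfully writes to PDC2, the Org3 and Org4 peers end up holding the value, and a later read from the Org1 peer fails because it never persisted the data.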

Key considerations

While it may feel like any organization can write anything anywhere, that is actually not the case. There are a few important aspects to consider in this model:

  • The chaincode determines which collections to use: Since the chaincode is controlled at the network level, organizations have the opportunity to review the code before installing it on their peers, so they can understand how the collections will be managed.
  • An endorsement policy of two or more organizations ensures that nobody can fake a transaction: Having two or more organizations execute the chaincode and sign the results reduces the likelihood of a concerted effort to tamper with the results.
  • It places the responsibility on the client application to invoke the chaincode from the correct organization: If the client makes a mistake, the chaincode will still ensure that the right organization receives the right data, but the private data could temporarily reside in the transient database of another organization’s peer.

It is worth mentioning that in Hyperledger Fabric v2.0 there will be a few new capabilities that further enhance private data collections:

  • memberOnlyWrite: If the behavior described in this article does not suit your business requirements, this attribute will let you change it. Setting it to true will prevent peers of organizations that are not included in the policy attribute from writing private data to the collection. In other words, if memberOnlyWrite is set to true, only peers of the organizations specified in the policy attribute will be allowed to invoke the PutPrivateData("<collection name>", "<key name>", <key value>) method.
  • Private data events: One powerful mechanism in Hyperledger Fabric is the ability to listen to events on the network so that applications can react to specific situations. Without this feature, client applications can receive block or transaction events, but those are common to every member of the network, so a client has to query the collection via the chaincode to find out about new private data. This new feature enables organizations within the policy to receive private data events so that they are notified directly.
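
The effect of memberOnlyWrite, as described in the first item, comes down to a single check. The Python sketch below models only that description; can_write and the dictionary layout are hypothetical, and in a real network the peer enforces the rule, not application code.

```python
def can_write(org, collection):
    """Return True if a peer of the given organization may write to the collection."""
    if collection.get("memberOnlyWrite", False):
        # Restricted mode: only organizations named in the policy may write.
        return org in collection["members"]
    # Default behavior described in this article: any organization may write.
    return True


pdc2_open = {"name": "PDC2", "members": {"Org3", "Org4"}}
pdc2_restricted = {"name": "PDC2", "members": {"Org3", "Org4"}, "memberOnlyWrite": True}

print(can_write("Org1", pdc2_open))
print(can_write("Org1", pdc2_restricted))
print(can_write("Org3", pdc2_restricted))
```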

Given the business needs of several organizations that are building enterprise blockchain solutions with Hyperledger Fabric, one goal of the Fabric development team’s initial design was to give chaincode the ability to write to any organization’s private data collection. For instance, this capability enables sharing an asset from one organization’s unilateral collection to another organization’s collection. The recipient organization can then compute the hash value of the shared asset, confirm that it matches the hash value on the ledger, and then proceed to a pending transaction that they may have with the other organization (sender).
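
The recipient’s hash check can be sketched in a few lines of Python. This assumes the hash recorded on the ledger is a SHA-256 digest of the private value, expressed here as a hex string; the function name and sample asset are hypothetical.

```python
import hashlib


def matches_ledger_hash(shared_value, ledger_hash_hex):
    """Compare the SHA-256 digest of the shared asset with the on-ledger hash."""
    return hashlib.sha256(shared_value).hexdigest() == ledger_hash_hex


# Hypothetical asset shared by the sender, and the hash the recipient reads from the ledger.
asset = b'{"id": "asset1", "price": 100}'
ledger_hash = hashlib.sha256(asset).hexdigest()

print(matches_ledger_hash(asset, ledger_hash))
print(matches_ledger_hash(b'{"id": "asset1", "price": 999}', ledger_hash))
```

If the comparison fails, the recipient knows the shared value is not the one the sender committed to and can decline to proceed with the pending transaction.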

So…

Give it a go! Start with the Fabric tutorial on private data collection and see how this new approach can be applied to your use case.

Ricardo Olivieri
Luc Desrosiers