We live in a multi-dimensional world, and we can easily understand information in three dimensions or fewer. We represent a point in space as a set (vector) of three values: x, y, and z. If we consider time the fourth dimension, we can stretch our understanding by animating the information in a sequence – in a video, for instance. However, information in higher dimensions quickly becomes harder for us to comprehend, and when it gets to hundreds or thousands of dimensions, it’s essentially impossible for us to absorb.
Yet, high-dimensional data is common all around us. Consider how we represent a photo in digital form. A typical 1920×1080 picture needs three values per pixel to capture its red, green, and blue components, so in raw digital form the picture holds 1920 × 1080 × 3 = 6,220,800 values. Viewed as a vector of data, this is in fact a point in a 6,220,800-dimensional space. Of course, we have no problem visualizing this vector because we know exactly how to convert it into a 2-D photo. We have, in effect, reduced the 6,220,800 dimensions down to just two. If we had received this vector of raw data without knowing that it is a digital picture, we would have to deal with this huge dimensionality in some other way. The key is to find the best way to reduce the dimensions down to three or fewer in a way that makes sense to us.
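This dual view of an image – a 2-D picture versus a single long vector – can be sketched in a few lines of NumPy (an illustration added here, not part of any particular model):

```python
import numpy as np

# A hypothetical blank 1920x1080 RGB image: height x width x 3 color channels.
image = np.zeros((1080, 1920, 3), dtype=np.uint8)

# Flattened, the same data is a single point in a 6,220,800-dimensional space.
vector = image.reshape(-1)
print(vector.shape[0])  # 6220800

# Knowing the image geometry lets us "reduce" the vector back to 2-D for viewing.
restored = vector.reshape(1080, 1920, 3)
```

Without the knowledge that the vector encodes a 1080×1920×3 image, the final `reshape` – and hence the easy visualization – would not be available to us.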
High-dimensional arrays are common in the internal implementation of neural networks, where the number of dimensions can reach the hundreds or thousands. Although neural network algorithms perform the calculations on these arrays automatically, without our intervention, we often need to understand how the model is operating, such as when its accuracy is low or when we want to tune and improve it. Visualizing these arrays is a good way to understand model operation. Since the goal of the neural network is to detect features in the input, the model is successful when it can separate these features into distinct clusters. This clustering behavior is therefore the pattern to look for when analyzing the data visually.
Fortunately, much work has been done on visualizing high-dimensional data, and we can leverage this work to deal with our arrays. Much as with the photo example above, the key is how to reduce the many dimensions down to the two or three that reveal what we are looking for: the clustering behavior. In this respect, TensorFlow provides a tool, TensorBoard, that lets you visualize high-dimensional arrays in the model using dimensionality reduction. Out of the box, TensorBoard supports two methods: principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). Note that many other methods are available.
Both PCA and t-SNE attempt to show the clustering behavior in the data, but they take different approaches.
PCA works by finding the axes along which the data varies the most; these are called the principal components. In the best case, the first three principal components capture most of the variance, while the remaining components carry little and can be ignored. In this manner, the dimensionality is effectively reduced from an arbitrary number down to three. The data is then projected onto these three axes and plotted as a 3-D graph. Note that the axes here don’t represent anything in the physical sense; they are just “angles” useful for viewing the data. The net effect is that by viewing the data from the angle with the most variability, we have a better chance of spotting any clustering in the data.
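A minimal sketch of this idea in NumPy (an illustration of PCA itself, not of TensorBoard’s internal implementation): center the data, obtain the principal components from a singular value decomposition, and project onto the first three.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 points in 50 dimensions, with most of the variance
# deliberately placed in the first 3 dimensions.
data = rng.normal(scale=0.1, size=(200, 50))
data[:, :3] += rng.normal(scale=5.0, size=(200, 3))

# PCA: center the data, then take the right singular vectors of the
# centered matrix as the principal components (axes of highest variance).
centered = data - data.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

# Project every point onto the first three principal components,
# reducing 50 dimensions down to 3 for plotting.
projected = centered @ components[:3].T
print(projected.shape)  # (200, 3)

# Fraction of the total variance captured by the first three components.
variance = singular_values ** 2
explained = variance[:3].sum() / variance.sum()
```

On this synthetic data, the first three components capture nearly all the variance, which is exactly the situation in which a 3-D PCA plot is most informative.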
In TensorBoard, you can select a tensor, and the PCA tab will display its PCA view. Typically, you would select a tensor toward the end of the network, where the features have started to emerge. You can tweak the PCA graph in a number of ways to improve the visualization, such as adding labels, coloring points, and zooming. You can also view the list of components and choose which to map to the x, y, and z axes. If the model has been successful in finding the features in the data, the first few components will likely have the most variability, and the data will show up in clusters along these components. If not, the variability will likely be spread more evenly among the components. Figure 1 shows a PCA view of the MNIST data.
Figure 1 – PCA view in TensorBoard for MNIST
The t-SNE method works by assuming that some cluster structure exists, and it aims to give each point roughly the same effective number of neighbors. It first computes, in the high-dimensional space, a probability distribution describing how likely any two points are to be neighbors. Then, in the 3-D space, it iteratively places the points so that they follow a similar neighbor distribution, minimizing the divergence (KL divergence) between the high- and low-dimensional distributions. In doing so, t-SNE tries to preserve the local neighborhood structure as the dimensionality is reduced from an arbitrary number down to three.
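The core of the method can be sketched in NumPy. This is a deliberately simplified illustration, not TensorBoard’s implementation: it uses a single fixed Gaussian bandwidth instead of the per-point bandwidth that real t-SNE calibrates from a “perplexity” setting, and plain gradient descent instead of the momentum and exaggeration tricks real implementations use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated clusters in a 20-dimensional space.
high = np.vstack([rng.normal(0.0, 0.5, (30, 20)),
                  rng.normal(4.0, 0.5, (30, 20))])
n = high.shape[0]

def pairwise_sq_dists(x):
    """Matrix of squared Euclidean distances between all rows of x."""
    s = (x ** 2).sum(axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * x @ x.T, 0.0)

# P: probability that two points are neighbors in the high-dimensional
# space (Gaussian kernel with a fixed bandwidth here for simplicity).
p = np.exp(-pairwise_sq_dists(high) / 2.0)
np.fill_diagonal(p, 0.0)
p = np.maximum(p / p.sum(), 1e-12)

def q_dist(low):
    """Q: neighbor probabilities in the low-dimensional space (Student-t kernel)."""
    num = 1.0 / (1.0 + pairwise_sq_dists(low))
    np.fill_diagonal(num, 0.0)
    return np.maximum(num / num.sum(), 1e-12), num

# Start from a small random 2-D placement and minimize KL(P || Q)
# by plain gradient descent.
low = rng.normal(0.0, 1e-2, (n, 2))
q, num = q_dist(low)
kl_start = (p * np.log(p / q)).sum()
for _ in range(600):
    # Gradient of KL(P || Q) with respect to the low-dimensional positions.
    diff = low[:, None, :] - low[None, :, :]
    grad = 4.0 * (((p - q) * num)[:, :, None] * diff).sum(axis=1)
    low -= 10.0 * grad
    q, num = q_dist(low)
kl_end = (p * np.log(p / q)).sum()
```

As the iterations proceed, the divergence between P and Q shrinks and the two clusters in the original 20-dimensional data emerge in the 2-D placement, which mirrors what you see happening live in TensorBoard’s t-SNE view.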
In TensorBoard, the algorithm begins by finding each point’s neighbors in the high-dimensional space; you then let it run for enough iterations for the clustering to emerge in the graph. Figure 2 shows a t-SNE view of the MNIST data after about 1,000 iterations.
Figure 2 – t-SNE view in TensorBoard for MNIST
Because the two methods take different approaches, neither is strictly better than the other; they are typically used in a complementary manner. Two visualization methods are likely not enough, so in the near future you can look for additional methods, or for the ability for users to add their own.
To use TensorBoard as shown in the figures above, you need to add some code to your TensorFlow model to write data to disk during training; you can then run TensorBoard on this data once training has completed. For a complete deep learning exercise that includes visualization with TensorBoard, check out the new developer code pattern “Classify art using TensorFlow model.”
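Concretely, TensorBoard’s embedding projector looks in the log directory for a `projector_config.pbtxt` file describing which saved tensors to visualize. A minimal sketch of that file follows; the field names come from TensorBoard’s `ProjectorConfig` proto, while `my_embedding` and `metadata.tsv` are placeholder names, and the exact tensor name depends on how your checkpoint was saved:

```
embeddings {
  # Name of the tensor, in the model checkpoint, that holds the data points.
  tensor_name: "my_embedding"
  # Optional TSV file providing a label for each data point.
  metadata_path: "metadata.tsv"
}
```

The `tensorboard.plugins.projector` Python module can generate this file for you from a `ProjectorConfig` object, so you rarely need to write it by hand.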