I am looking to use the Silhouette Procedure in SPSS (Analyze > Classify > Cluster Silhouettes) to evaluate various k-means clustering solutions, and I am not 100% certain how to correctly run the Silhouette Procedure via the prompt in SPSS.
In SPSS (version 24), I am assuming that I include the variable that specifies the cluster that each case was assigned to in the Cluster Number field? What should I then include in the Cluster Variables field?
Additionally, other than specifying the variables for the Next Best Cluster and Silhouette Value fields, do I need to do anything else? How do I actually get the Silhouette Coefficient for each solution?
Thank you in advance for the help!
Answer by jkpeck (6118) | Jun 14, 2017 at 01:03 PM
@tzm12
Yes, use the cluster number saved from k-means in the Cluster Number field. List all the variables used for clustering in the Cluster Variables field. Make sure that the dissimilarity measure matches what you used in k-means. The procedure displays a table of the mean, minimum, and maximum silhouette statistic by cluster, so the per-cluster means in that table are the figures to compare across solutions. You might also find the plots useful.
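For anyone who wants to see what those numbers represent, here is a minimal Python sketch of the standard silhouette calculation. It is not the SPSS procedure's own code; the function names `silhouette_values` and `summarize` are made up for illustration, and Euclidean distance is assumed. For each case it returns the silhouette value and the next best cluster, then summarizes mean, minimum, and maximum by cluster.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two cases given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_values(points, labels, dist=euclidean):
    """Return a list of (silhouette, next_best_cluster), one entry per case.

    Hypothetical helper for illustration; the cost grows with the square
    of the number of cases because every pair of cases is compared.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)

    results = []
    for i, own in enumerate(labels):
        # Mean distance from case i to every cluster (excluding itself).
        mean_dist = {}
        for lab, members in clusters.items():
            others = [j for j in members if j != i]
            if others:
                total = sum(dist(points[i], points[j]) for j in others)
                mean_dist[lab] = total / len(others)

        rivals = {lab: d for lab, d in mean_dist.items() if lab != own}
        if not rivals:
            results.append((0.0, None))  # only one cluster: undefined, use 0
            continue
        next_best = min(rivals, key=rivals.get)
        if own not in mean_dist:
            results.append((0.0, next_best))  # singleton cluster convention
            continue

        a = mean_dist[own]       # cohesion: mean distance within own cluster
        b = rivals[next_best]    # separation: mean distance to nearest other cluster
        denom = max(a, b)
        results.append(((b - a) / denom if denom > 0 else 0.0, next_best))
    return results


def summarize(labels, results):
    """Return {cluster: (mean, minimum, maximum)} of the silhouette values."""
    by_cluster = defaultdict(list)
    for lab, (s, _) in zip(labels, results):
        by_cluster[lab].append(s)
    return {lab: (sum(v) / len(v), min(v), max(v)) for lab, v in by_cluster.items()}
```

Values near 1 indicate cases that sit well inside their cluster, values near 0 indicate cases on a boundary, and negative values indicate cases that are closer on average to the next best cluster than to their own.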
Answer by g.g.g (1) | Feb 17, 2018 at 10:57 AM
@jkpeck I used a two-step cluster analysis and I would like to get a silhouette plot. However, I used the log-likelihood distance in the clustering procedure. Which distance measure should I use to build the silhouette plot? I tried the Euclidean one, but it has been running for hours. Is that normal?
Answer by jkpeck (6118) | Feb 17, 2018 at 12:45 PM
@g.g.g
This command requires the data to be held in memory and takes time that is proportional to the square of the number of cases. With larger datasets, you may want to carry this analysis out on a random sample of the data.
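To give a feel for the quadratic growth, here is a small Python sketch that counts the pairwise distances needed and draws a reproducible random sample of case IDs. The helper names `pairwise_count` and `sample_cases` and the seed value are illustrative assumptions, not part of SPSS.

```python
import random


def pairwise_count(n):
    """Number of distinct case pairs that must be compared for n cases."""
    return n * (n - 1) // 2


def sample_cases(case_ids, k, seed=12345):
    """Draw k case IDs without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(case_ids), k))


# 100,000 cases need about 5 billion distance computations,
# while a 5,000-case sample needs about 12.5 million.
full = pairwise_count(100_000)
sampled = pairwise_count(5_000)
```

Cutting the number of cases by a factor of 20 cuts the work by a factor of roughly 400, which is why sampling helps so much here.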
As for the measure, the choice doesn't affect the time much. If your clustering variables are all continuous, Euclidean is typically the best choice, but if you also have categorical variables, you might try Gower, since that treats continuous and categorical variables differently.
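As a rough illustration of how a Gower-style measure handles mixed data, here is a minimal Python sketch of the textbook definition: continuous variables contribute a range-scaled absolute difference, categorical variables contribute 0 for a match and 1 for a mismatch, and the result is the average over variables. The function name `gower_distance` and the `ranges` argument are hypothetical, and the extension's own implementation may differ in details such as weighting or missing-value handling.

```python
def gower_distance(a, b, ranges):
    """Gower-style dissimilarity between two cases with mixed variables.

    ranges[k] is the range (max - min) of variable k over the dataset if
    it is continuous, or None if variable k is categorical.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical: simple match / mismatch.
            total += 0.0 if x == y else 1.0
        elif r > 0:
            # Continuous: absolute difference scaled to [0, 1] by the range.
            total += abs(x - y) / r
        # A continuous variable with zero range contributes nothing.
    return total / len(ranges)


# Age differs by 10 on a range of 40 (0.25); region differs (1.0).
d = gower_distance((20, "north"), (30, "south"), (40, None))
```

Because every variable contributes a value between 0 and 1, no single variable dominates merely because of its measurement scale.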
Answer by jkpeck (6118) | Apr 08 at 12:32 PM
You can install the STATS CLUS SIL extension command from Extensions > Extension Hub. It takes the clustering output and produces silhouette measures and charts. It will appear on the Analyze > Classify menu. See the dialog or syntax help for details.