Using the Silhouette Procedure to evaluate k-means clustering solutions

Question by tzm12 (1) | Jun 14, 2017 at 09:36 AM | Tags: spssstatistics, spss24, clustering

I am looking to use the Silhouette Procedure in SPSS (Analyze > Classify > Cluster Silhouettes) to evaluate various k-means clustering solutions, and I am not 100% certain how to correctly run the Silhouette Procedure via the prompt in SPSS.

In SPSS (version 24), I am assuming that I include the variable that specifies the cluster that each case was assigned to in the Cluster Number field? What should I then include in the Cluster Variables field?

Additionally, other than specifying the variables for the Next Best Cluster and Silhouette Value fields, do I need to do anything else? How do I actually get the Silhouette Coefficient for each solution?

Thank you in advance for the help!

6 answers

Answer by jkpeck (6118) | Jun 14, 2017 at 01:03 PM

@tzm12
Yes, put the cluster number saved from k-means in the Cluster Number field, and list all the variables used for clustering in the Cluster Variables field. Make sure that the dissimilarity measure matches the one you used in k-means. The procedure displays a table of the mean, minimum, and maximum silhouette statistic by cluster. You might also find the plots useful.
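For readers who want to see what the statistic in that table measures, here is a minimal Python sketch of the standard silhouette definition with Euclidean distance. It is not the SPSS implementation. The function names (`silhouettes`, `summarize_by_cluster`) and the toy data are assumptions made for illustration. For each case, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster (the "next best" cluster), and the silhouette is (b - a) / max(a, b).

```python
import math
from collections import defaultdict


def silhouettes(points, labels):
    """Return (silhouette value, next best cluster) for each case."""
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    if len(members) < 2:
        raise ValueError("silhouettes need at least two clusters")

    results = []
    for i, own in enumerate(labels):
        # Mean distance from case i to the members of each cluster,
        # leaving case i itself out of its own cluster.
        mean_dist = {}
        for lab, idxs in members.items():
            others = [j for j in idxs if j != i]
            if others:
                total = sum(math.dist(points[i], points[j]) for j in others)
                mean_dist[lab] = total / len(others)
        next_best = min((lab for lab in mean_dist if lab != own),
                        key=mean_dist.get)
        b = mean_dist[next_best]
        if own not in mean_dist:
            # Singleton cluster: silhouette is 0 by convention.
            results.append((0.0, next_best))
            continue
        a = mean_dist[own]
        denom = max(a, b)
        results.append(((b - a) / denom if denom > 0 else 0.0, next_best))
    return results


def summarize_by_cluster(points, labels):
    """Mean, minimum, and maximum silhouette per cluster."""
    by_cluster = defaultdict(list)
    for (s, _), lab in zip(silhouettes(points, labels), labels):
        by_cluster[lab].append(s)
    return {lab: {"mean": sum(v) / len(v), "min": min(v), "max": max(v)}
            for lab, v in by_cluster.items()}


# Toy data (hypothetical): two well-separated groups, cluster numbers 1 and 2.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [1, 1, 1, 2, 2, 2]

summary = summarize_by_cluster(points, labels)
# Overall coefficient for the solution: the mean silhouette over all cases.
overall = sum(s for s, _ in silhouettes(points, labels)) / len(points)
print(summary)
print(overall)
```

Averaging the per-case values over all cases gives a single coefficient for a solution, so running the same computation on the saved cluster numbers from several values of k gives one number per solution to compare. Values near 1 indicate well-separated clusters, and negative values flag cases that sit closer to another cluster than to their own.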

Answer by g.g.g (1) | Feb 17, 2018 at 10:57 AM

@jkpeck I used a two-step cluster analysis and I would like to get a silhouette plot. However, I used the log-likelihood distance in the clustering procedure. Which distance measure should I use to build the silhouette plot? I tried the Euclidean one, but it has been running for hours. Is that normal?

Answer by jkpeck (6118) | Feb 17, 2018 at 12:45 PM

@g.g.g

This command requires the data in memory and takes time that is proportional to the square of the number of cases. With larger datasets, you may want to carry this analysis out on a random sample of the data.
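To make the quadratic growth concrete, here is a small Python sketch that counts the case pairs a full silhouette computation has to compare and draws a reproducible random sample of case identifiers. The dataset sizes, the seed, and the helper names (`pairwise_distance_count`, `sample_cases`) are hypothetical and not part of the SPSS command.

```python
import random


def pairwise_distance_count(n_cases):
    """Number of distinct case pairs among n_cases cases."""
    return n_cases * (n_cases - 1) // 2


def sample_cases(case_ids, sample_size, seed=12345):
    """Draw a reproducible simple random sample of case identifiers."""
    ids = list(case_ids)
    if sample_size >= len(ids):
        return ids
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(ids, sample_size)


# Hypothetical sizes: 200,000 cases in the file, 5,000 in the sample.
full = pairwise_distance_count(200_000)
sampled = pairwise_distance_count(5_000)
subset = sample_cases(range(200_000), 5_000)

print(f"{full:,} pairs in the full data vs {sampled:,} in the sample")
```

Because the pair count grows with the square of the number of cases, cutting the cases by a factor of 40 cuts the pairwise work by a factor of roughly 1,600.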

As for the measure, the choice doesn't affect the time much. If your clustering variables are all continuous, Euclidean is typically the best choice, but if you also have categorical variables, you might try Gower, since that treats continuous and categorical variables differently.
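As an illustration of how Gower treats the two kinds of variables differently, here is a minimal Python sketch of the textbook Gower dissimilarity: continuous variables contribute a range-scaled absolute difference, categorical variables contribute 0 for a match and 1 for a mismatch, and the result is the average over variables. This version has no weights or missing-value handling, and the extension's own computation may differ in detail. The function names and sample cases are assumptions made for the example.

```python
def continuous_ranges(cases, categorical):
    """Observed range (max - min) per variable; None marks a categorical one.

    `categorical` is a set of column positions holding categorical values.
    """
    ranges = []
    for k in range(len(cases[0])):
        if k in categorical:
            ranges.append(None)
        else:
            column = [case[k] for case in cases]
            ranges.append(max(column) - min(column))
    return ranges


def gower_distance(case_a, case_b, ranges):
    """Gower dissimilarity between two cases, between 0 and 1."""
    total = 0.0
    for a, b, rng in zip(case_a, case_b, ranges):
        if rng is None:
            total += 0.0 if a == b else 1.0  # categorical: simple mismatch
        elif rng > 0:
            total += abs(a - b) / rng        # continuous: range-scaled gap
        # a continuous variable with zero range contributes nothing
    return total / len(ranges)


# Hypothetical cases: age, income, and a categorical region code.
cases = [
    (20, 30000.0, "urban"),
    (40, 50000.0, "rural"),
    (60, 70000.0, "urban"),
]
ranges = continuous_ranges(cases, categorical={2})

d01 = gower_distance(cases[0], cases[1], ranges)
d02 = gower_distance(cases[0], cases[2], ranges)
print(d01, d02)
```

Scaling each continuous variable by its range keeps a variable measured in large units, such as income, from dominating the distance, which plain Euclidean distance on unstandardized data would not do.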

Answer by Jeanne123 (1) | Apr 08 at 07:33 AM

I am also using the two-cluster method in SPSS and I am wondering how I can get SPSS to tell me the Silhouette Coefficient for each solution?

Answer by jkpeck (6118) | Apr 08 at 12:32 PM

@Jeanne123

You can install the STATS CLUS SIL extension command from Extensions > Extension Hub. It takes the clustering output and produces silhouette measures and charts. It will appear on the Analyze > Classify menu. See the dialog or syntax help for details.

Comment by Jeanne123 (1) | Apr 08 at 05:59 PM

Thank you jkpeck! Found it--very helpful!
