IBM Developer Answers

Inefficient SPSS Macro

Question by cybermark (1) | Sep 20, 2016 at 08:13 AM | Tags: spss, statistics, syntax, macro

I have details relating to over 700,000 customer contacts. Our customers may use more than one address (e.g. home and business) and more than one telephone number (e.g. mobile or landline). It is also possible that there may be more than one name (e.g. mis-spelling) recorded for a particular person. I have identified 511,000 unique combinations of name / address / telephone number. Also, there are 4,000 unique names, 210 unique telephone numbers and 2,000 unique addresses.

My challenge is to identify all of the customer contacts that may be related to a single customer. I have a file which lists all 700K contacts, along with the unique identifier created for each combination of name / address / telephone number. We also know all of the names, addresses and telephone numbers associated with that contact.

In testing, and on small scale data, the following macro works well. However, it is proving impossible for the macro to manage the entire dataset as a single entity. Is there another, more efficient, way that this process could be written in SPSS?

Thank you in anticipation.

SET MITERATE 1000001.
DEFINE !Repeat_Contact().
!DO !Repeat_Contact = 1 !TO 155649.

DATASET ACTIVATE CONTACT_DATA_FILE.
DATASET COPY Nominal_DATA.
DATASET ACTIVATE Nominal_DATA.

COUNT TARGET=UNIQUE_REF TO END_1 !CONCAT ("('","URN_",!Repeat_Contact,"')").
IF (TARGET ge 1) Group_URN = !QUOTE (!CONCAT ("Group_URN_",!Repeat_Contact)).
SELECT IF Group_URN = !QUOTE (!CONCAT ("Group_URN_",!Repeat_Contact)).

AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=Group_URN /Num_Link_Contacts=N.

SELECT IF (Num_Link_Contacts ge 5).
EXECUTE.

DATASET ACTIVATE Output_TEMPLATE.
ADD FILES /FILE=* /FILE='Nominal_DATA'.
EXECUTE.

DATASET CLOSE Nominal_DATA.
!DOEND.
!ENDDEFINE.

* RUN MACRO!.
*SET MPRINT ON.
!Repeat_Contact.
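
For comparison, the grouping problem described in the question can be sketched outside the macro language. The Python below is only an illustration under stated assumptions: each contact is a hypothetical (name, address, phone) tuple, and two contacts are treated as the same customer when they share any one attribute, directly or through a chain of other contacts. The question does not define what counts as a match, so that linking rule is an assumption, as are the function name `group_contacts` and the sample rows.

```python
def group_contacts(contacts):
    """Return a list giving a group id for each contact.

    contacts: list of (name, address, phone) tuples (hypothetical layout).
    Contacts share a group id when they are connected through any chain
    of shared names, addresses, or phone numbers (assumed linking rule).
    """
    parent = list(range(len(contacts)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen = {}  # (field position, value) -> first contact index holding it
    for idx, record in enumerate(contacts):
        for field, value in enumerate(record):
            if not value:
                continue  # ignore blank attributes
            key = (field, value)
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    return [find(i) for i in range(len(contacts))]


# Made-up sample rows, for illustration only.
contacts = [
    ("A Smith", "1 High St", "555-0100"),
    ("A Smyth", "1 High St", "555-0199"),  # shares an address with row 0
    ("B Jones", "9 Low Rd", "555-0199"),   # shares a phone with row 1
    ("C Brown", "4 Mid Ln", "555-0142"),   # shares nothing
]
groups = group_contacts(contacts)
```

Each record is visited once, so the work grows with the number of contacts rather than with contacts times identifiers. With only 210 distinct telephone numbers in the data described, linking on a single shared attribute may merge too many people; requiring two shared attributes would be one variation to consider.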


Answer by jkpeck (6143) | Sep 20, 2016 at 09:14 AM

I do agree that this macro is inefficient. For starters, it is copying the whole input dataset a huge number of times. Answers to a few questions might help to craft a more efficient way.
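
To illustrate the cost of the repeated copy, here is a rough Python sketch of what the macro appears to compute, done in a single pass over the data instead of one pass per identifier. It assumes each contact row carries a list of URN strings (standing in for the variables UNIQUE_REF to END_1) and keeps only identifiers mentioned by at least five rows, mirroring the `Num_Link_Contacts ge 5` filter. The function name `link_groups` and the sample rows are hypothetical.

```python
from collections import defaultdict


def link_groups(rows, min_contacts=5):
    """Map each URN to the rows that mention it, keeping large groups only.

    rows: list of lists of URN strings, one inner list per contact
          (an assumed stand-in for the variables UNIQUE_REF to END_1).
    """
    index = defaultdict(set)
    for row_id, urns in enumerate(rows):
        for urn in urns:
            if urn:                     # skip blank cells
                index[urn].add(row_id)  # a set, so repeats within a row count once
    return {urn: sorted(ids) for urn, ids in index.items()
            if len(ids) >= min_contacts}


# Made-up sample rows; the threshold is lowered to 3 to keep the example small.
rows = [
    ["URN_1", "URN_2", ""],
    ["URN_1", "", ""],
    ["URN_2", "URN_1", ""],
]
result = link_groups(rows, min_contacts=3)
```

The input is read once and an index from identifier to rows is built along the way, rather than copying and filtering the full dataset for each of the 155,649 identifiers.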

What constitutes a match? What is the framework? Would you start with one basic dataset of customers that has all these variables and then look for matches in the contact database with some number of the attributes? One approach might be to convert the names to a canonical representation such as NYSIIS and then look for matches, or to use some string similarity measure.
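
As a rough sketch of the two ideas in the previous paragraph, the Python below canonicalises names and then compares them with a string similarity score from the standard library's `difflib`. The `canonical` function here is a deliberately simple stand-in (lowercase, punctuation removed, whitespace collapsed), not a phonetic code; a phonetic encoder could be swapped in at that point. The 0.85 threshold is an arbitrary assumption that would need tuning against real data.

```python
import re
from difflib import SequenceMatcher


def canonical(name):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def similar(a, b, threshold=0.85):
    """True when two canonicalised names score at or above the threshold.

    The threshold is an assumed value, not one taken from the thread.
    """
    score = SequenceMatcher(None, canonical(a), canonical(b)).ratio()
    return score >= threshold
```

Comparing every pair of the 4,000 unique names mentioned in the question is about eight million comparisons, which is far smaller than comparing all 700,000 contacts pairwise, so it may make sense to match on the unique names first and map the results back to contacts.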

What is the penalty for error, particularly if you miss a contact that should belong to a known customer? If this is for statistical purposes, perhaps some errors are acceptable.
