This is just intended to be a quick tip on cleaning up string fields in SPSS. Frequently, when I am parsing a field or matching string records (such as names or addresses), I don’t want extra ASCII characters besides letters and/or numbers in the field. For example, if I have a name I might want to eliminate hyphens or quotes, or if I have a field that is meant to be a house number I typically do not want any alpha characters at the end (geocoding databases will rarely be able to tell the difference between Apt’s
We can use a simple loop and the PIB variable format in SPSS to clean unwanted ASCII codes out of string fields. For instance, if I wanted to replace all the digits in a string field with nothing, I could use the code below (where OrigField is the original field containing the numbers, and CleanField is the subsequent cleaned variable).
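* Declare the cleaned field (adjust A5 to the width of OrigField) and copy the original into it.
string CleanField (A5).
compute CleanField = OrigField.
* Codes 48 to 57 are the digits 0 through 9.
loop #i = 48 to 57.
compute CleanField = REPLACE(CleanField,STRING(#i,PIB),"").
end loop.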
The DEC column in the linked ASCII table corresponds to the ASCII character code in SPSS’s PIB format. The digits 0 through 9 end up being 48 to 57 in decimal values, so I create a string corresponding to each of those characters via the string(#i,PIB) function and replace it with nothing using the REPLACE function. Looping through the values 48 to 57 gets rid of all the digits.
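The same idea covers the hyphen and quote case mentioned at the start. Below is a minimal sketch, assuming a hypothetical name field OrigName that fits in 50 characters (both the variable names and the A50 width are my own placeholders). Rather than looping over a range, it uses DO REPEAT to step through just the three codes of interest: 34 (double quote), 39 (apostrophe) and 45 (hyphen).

```spss
* OrigName and CleanName are hypothetical names, and A50 is an assumed width.
string CleanName (A50).
compute CleanName = OrigName.
* 34 is the double quote, 39 the apostrophe and 45 the hyphen.
do repeat code = 34 39 45.
compute CleanName = REPLACE(CleanName,STRING(code,PIB),"").
end repeat.
execute.
```

A short list of codes like this is easier to read than a loop with an if statement when only a handful of characters need to go.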
This extends to potentially all characters. For instance, if I want to return only capital alpha characters, I could use a loop with an if statement like the one below.
string CleanField (A5).
compute CleanField = OrigField.
* Codes 65 to 90 are the capital letters A through Z, so strip everything outside that range.
loop #i = 1 to 255.
if #i < 65 or #i > 90 CleanField = REPLACE(CleanField,STRING(#i,PIB),"").
end loop.
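There are (a lot) more than 255 characters out there (ASCII proper only defines codes 0 through 127, and the rest depends on the encoding), but looping up to 255 should suffice to clean up most string fields in English.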
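For the house number case from the introduction, the if statement can be flipped to keep only the digits. This is just a sketch under some assumptions: HouseNum, HouseClean and HouseN are hypothetical variable names, and the A10 width and F10.0 format are placeholders to adjust to your own data.

```spss
* HouseNum, HouseClean and HouseN are hypothetical names, and A10 is an assumed width.
string HouseClean (A10).
compute HouseClean = HouseNum.
* Keep only codes 48 to 57 (the digits) and strip everything else.
loop #i = 1 to 255.
if #i < 48 or #i > 57 HouseClean = REPLACE(HouseClean,STRING(#i,PIB),"").
end loop.
* Optionally convert the cleaned string to a numeric variable.
compute HouseN = NUMBER(HouseClean,F10.0).
execute.
```

With a value like 123A in HouseNum, this should leave 123 in HouseClean, since the trailing letter falls outside the 48 to 57 range.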