
Chinese Phonetic Similarity Estimator


The Chinese Phonetic Similarity Estimator provides a phonetic algorithm for indexing Chinese characters by sound. Given two Chinese words of the same length, the model computes the phonetic distance between them and also returns a few candidate words that sound close to the given word(s). The code follows the phonetic principles of Mandarin Chinese, guided by the romanization defined in ISO 7098:2015. The model is based on the DimSim model.
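To make the idea of a phonetic distance concrete, the sketch below splits each tone-numbered pinyin syllable into an initial, a final, and a tone, and adds a fixed cost for every mismatch. This is a deliberately simplified illustration, not DimSim's algorithm: DimSim learns multi-dimensional encodings for initials and finals, so its distances differ. The function names and cost values are hypothetical choices made for this example.

```python
from typing import List, Tuple

# Pinyin initials, with two-letter initials first so "zh" matches before "z".
# Treating "y" and "w" as initials is a simplification made for this sketch.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable: str) -> Tuple[str, str, int]:
    """Split a tone-numbered syllable such as 'xia2' into ('x', 'ia', 2)."""
    if syllable[-1].isdigit():
        body, tone = syllable[:-1], int(syllable[-1])
    else:
        body, tone = syllable, 5  # no tone digit: treat as neutral tone
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    return "", body, tone  # zero-initial syllable, e.g. 'ai4'


def toy_phonetic_distance(a: List[str], b: List[str],
                          initial_cost: float = 1.0,
                          final_cost: float = 1.0,
                          tone_cost: float = 0.01) -> float:
    """Sum hypothetical per-syllable mismatch costs for two equal-length words."""
    if len(a) != len(b):
        raise ValueError("words must have the same number of syllables")
    total = 0.0
    for syl_a, syl_b in zip(a, b):
        ini_a, fin_a, tone_a = split_syllable(syl_a)
        ini_b, fin_b, tone_b = split_syllable(syl_b)
        if ini_a != ini_b:
            total += initial_cost
        if fin_a != fin_b:
            total += final_cost
        if tone_a != tone_b:
            total += tone_cost
    return total


print(toy_phonetic_distance(["da4", "xia2"], ["da4", "xia1"]))  # 0.01 (tone only)
print(toy_phonetic_distance(["da4", "xia2"], ["da4", "ren2"]))  # 2.0 (initial and final)
```

Giving tones a much smaller cost than initials or finals encodes the intuition that words differing only in tone (such as 大侠 da4 xia2 and 大虾 da4 xia1) sound closer than words whose syllables change more substantially; the cURL example's response likewise reports a very small distance for that pair.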

Model Metadata

Domain: NLP
Application: Text Clustering/Phonetics
Industry: Social Media
Framework: Python
Training Data: N/A
Input Data Format: Chinese text (UTF-8 encoded)



Component                  License      Link
Model GitHub repository    Apache 2.0   LICENSE
Model Weights              N/A          N/A
Model Code (3rd party)     Apache 2.0   LICENSE
Test assets                N/A          N/A

Options available for deploying this model

This model can be deployed using the following mechanisms:

  • Run locally as a library from PyPI: follow the instructions in the model README on GitHub

  • Deploy from Docker Hub:

    docker run -it -p 5000:5000 codait/max-chinese-phonetic-similarity-estimator
  • Deploy on Red Hat OpenShift:

    Follow the instructions for the OpenShift web console or the OpenShift Container Platform CLI in this tutorial and specify codait/max-chinese-phonetic-similarity-estimator as the image name.

  • Deploy on Kubernetes:

    kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Chinese-Phonetic-Similarity-Estimator/master/max-chinese-phonetic-similarity-estimator.yaml

    A more elaborate tutorial on how to deploy this MAX model to production on IBM Cloud can be found here.

  • Locally: follow the instructions in the model README on GitHub

Example Usage

You can test or use this model in the following ways.

Test the model using cURL

Once deployed, you can test the model from the command line. For example, if the model is running locally, run the following command in a terminal:

$ curl -X POST "http://localhost:5000/model/predict?first_word=%E5%A4%A7%E8%99%BE&second_word=%E5%A4%A7%E4%BE%A0&mode=simplified&theta=1" -H "accept: application/json"

You should see a JSON response like the one below:

  {
    "status": "ok",
    "predictions": [
      {
        "distance": "0.0002380952380952381",
        "candidates": [
          ...
        ]
      }
    ]
  }

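The same request can also be made from a Python script using only the standard library. The percent-encoded values in the cURL command are simply the UTF-8 encoding of 大虾 and 大侠, which `urllib.parse.urlencode` produces automatically. This is a minimal sketch assuming the server runs at localhost:5000 as in the cURL example; the helper names (`build_predict_url`, `predict`, `parse_prediction`) are hypothetical, and the parameter names and response shape are taken from the example above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Tuple

# Assumes the Docker container from the deployment options is running locally.
BASE_URL = "http://localhost:5000/model/predict"


def build_predict_url(first_word: str, second_word: str,
                      mode: str = "simplified", theta: int = 1,
                      base_url: str = BASE_URL) -> str:
    """Build the cURL example's query string; Chinese text is UTF-8 percent-encoded."""
    query = urllib.parse.urlencode({
        "first_word": first_word,
        "second_word": second_word,
        "mode": mode,
        "theta": theta,
    })
    return f"{base_url}?{query}"


def predict(first_word: str, second_word: str, **kwargs: Any) -> Dict[str, Any]:
    """POST to the running model server and decode its JSON response."""
    request = urllib.request.Request(
        build_predict_url(first_word, second_word, **kwargs),
        method="POST",
        headers={"accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_prediction(body: Dict[str, Any]) -> Tuple[float, List[Any]]:
    """Return (distance, candidates) from the first prediction.

    The sample response encodes the distance as a string, so it is converted to float.
    """
    if body.get("status") != "ok":
        raise RuntimeError(f"unexpected status: {body.get('status')!r}")
    first = body["predictions"][0]
    return float(first["distance"]), first["candidates"]
```

With the server running, `distance, candidates = parse_prediction(predict("大虾", "大侠"))` sends the same request as the cURL command.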
Test the model through Python

Open a Python shell from the terminal:

$ python

Run the following commands in the Python shell to test the model:

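# Install the library from PyPI first: pip install dimsim
import dimsim

# Distance between two Chinese words of the same length
dist = dimsim.get_distance("大侠", "大虾")
print(dist)

dist = dimsim.get_distance("大侠", "大人")
print(dist)

# The same comparisons using tone-numbered pinyin syllables instead of characters
dist = dimsim.get_distance(['da4', 'xia2'], ['da4', 'xia1'], pinyin=True)
print(dist)

dist = dimsim.get_distance(['da4', 'xia2'], ['da4', 'ren2'], pinyin=True)
print(dist)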
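Because `get_distance` compares two words of the same length, one way to use it is to score a list of your own candidate words against a target and keep the closest ones. The helper below is hypothetical and not part of DimSim (the model's own candidate generation is separate); it takes the distance function as a parameter, so it can be tried with `dimsim.get_distance` or with any stand-in function.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def rank_by_sound(target: str,
                  words: Iterable[str],
                  distance: Callable[[str, str], float],
                  max_distance: Optional[float] = None) -> List[Tuple[str, float]]:
    """Score same-length words against `target` and return them closest first.

    `distance` is any two-argument function, for example dimsim.get_distance.
    Words whose length differs from `target` are skipped before scoring.
    """
    scored = [(word, distance(target, word))
              for word in words if len(word) == len(target)]
    if max_distance is not None:
        # Keep only words within the chosen distance threshold.
        scored = [pair for pair in scored if pair[1] <= max_distance]
    return sorted(scored, key=lambda pair: pair[1])
```

For example, `rank_by_sound("大侠", ["大虾", "大人"], dimsim.get_distance)` returns both words ordered by their DimSim distance to 大侠.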

Test the model in a serverless app

You can utilize this model in a serverless application by following the instructions in the Leverage deep learning in IBM Cloud Functions tutorial.

Resources and Contributions

If you are interested in contributing to the Model Asset Exchange project or have any queries, please follow the instructions here.