Use Watson to classify programming languages based on code

The rise of Python as the go-to programming language for data scientists has made the field one of the more monoglot developer communities, to the point where, given a snippet of non-Python code, a data scientist may ask, “What programming language am I even looking at?” Luckily, machine-learning models can be built to perform programming language detection for data scientists.

The code pattern titled Classify programming languages goes over a few such approaches, including Naive Bayes and Watson™ APIs, to classify a program by its programming language based on its text. The data set we used was built using GitHub APIs and collected from the IBM website. The resulting models are then tested for accuracy. The pattern introduces data scientists to some machine-learning and data engineering concepts without falling back on an image classification data set like MNIST yet again! By the time they finish the pattern, its Python-heavy users may even have discovered a newfound understanding of a different programming language. Check it out and give it a try.
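
To make the Naive Bayes approach a little more concrete, here is a minimal sketch of a multinomial Naive Bayes classifier over code tokens, followed by a simple accuracy check. It is not the code from the pattern itself: the class name `NaiveBayesLanguageClassifier`, the tokenizer, the `accuracy` helper, and the tiny hand-written training and held-out snippets are all assumptions made for illustration. A real data set would be far larger.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: identifier-like words or single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z_]+|[^\sA-Za-z_0-9]")


def tokenize(code):
    """Split source text into word tokens and single punctuation tokens."""
    return TOKEN_RE.findall(code)


class NaiveBayesLanguageClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()               # label -> number of snippets
        self.vocab = set()

    def fit(self, snippets, labels):
        for code, label in zip(snippets, labels):
            tokens = tokenize(code)
            self.token_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, code):
        tokens = tokenize(code)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.token_counts.items():
            # log prior + sum of smoothed log likelihoods for each token
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, snippets, labels):
    """Fraction of snippets whose predicted label matches the true label."""
    correct = sum(model.predict(s) == y for s, y in zip(snippets, labels))
    return correct / len(labels)


# Toy training data, invented for this sketch only.
train = [
    ("def add(a, b):\n    return a + b", "Python"),
    ("import os\nfor name in os.listdir('.'):\n    print(name)", "Python"),
    ("public static void main(String[] args) { System.out.println(1); }", "Java"),
    ("private int count = 0; public int getCount() { return count; }", "Java"),
    ("function add(a, b) { return a + b; }", "JavaScript"),
    ("const xs = [1, 2]; xs.forEach(function (x) { console.log(x); });", "JavaScript"),
]
clf = NaiveBayesLanguageClassifier().fit(*zip(*train))

# Held-out snippets, also invented, used to measure accuracy.
held_out = [
    ("def greet(name):\n    print(name)", "Python"),
    ("public void run() { int total = 0; }", "Java"),
    ("function square(x) { return x * x; }", "JavaScript"),
]
held_out_accuracy = accuracy(clf, *zip(*held_out))
print(held_out_accuracy)
```

The classifier treats each snippet as a bag of tokens, so keywords such as `def`, `public`, or `function` carry most of the signal. Scoring on held-out snippets rather than the training snippets is what makes the accuracy figure meaningful.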