Why machine learning is primarily written in Python
This beginner's blog looks at why machine learning is primarily written in Python.
This is the second part of a beginner-to-beginner blog series, written by a developer new to artificial intelligence and machine learning for other beginners who want to learn more about the field.
The very first line of code I ever wrote was a classic print("Hello, World!") in Python. While it was a new and difficult frontier to me then, I look back now, knowing a few more languages, and am still grateful Python was my first. It is a simple and elegant language that let me do what I wanted relatively quickly. As my interest in programming expanded to machine learning (ML), I was both surprised and pleased to realize that ML is primarily done in Python. However, the reasons behind Python’s popularity were not immediately obvious to me. What exactly about Python makes it one of the best options for building a system that learns from data and identifies existing patterns, which is the goal of machine learning?
Python is known as a high-level, extremely readable language that is especially friendly to beginners. Its syntax is consistent, which helps people learning the language both read others’ code and write their own. In fact, the “Zen of Python,” a document summarizing the core philosophies of the language, emphasizes these principles.
Why does this simplicity lend itself well to ML? Because the algorithms and calculations needed for implementation are complex, and it’s unnecessary and inefficient to add further complexities with the actual language used. The threshold for learning is lower so programmers can more quickly explore and try out their code. Furthermore, collaboration can be much easier.
The syntax has been praised not only for its simplicity but also for being “math-like.” It has been noted that the way Python expresses ideas resembles many mathematical notations, which suits the math necessary for ML.
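As a small illustration of the “math-like” claim, here is a sketch that computes a mean squared error, (1/n) times the sum of (y_i - yhat_i)^2, in plain Python, plus a list comprehension that reads much like set-builder notation. The function name and the sample numbers are made up for this example and do not come from any particular library.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences: (1/n) * sum((y_i - yhat_i)^2)."""
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n


# Reads like set-builder notation: { x^2 | x in 0..9, x is even }
even_squares = [x ** 2 for x in range(10) if x % 2 == 0]

# Invented sample values, for illustration only
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
mse = mean_squared_error(actual, predicted)

print(even_squares)
print(mse)
```

The formula and the code line up almost term by term, which is roughly what people seem to mean when they call the syntax math-like.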
Python is an open source language. One benefit of open source languages is that they can be customized according to the developer’s needs. In the world of ML, where the use cases are constantly growing and evolving, this is helpful for faster development. Beyond the flexibility open source allows, it also means the language is constantly under peer review, so bugs can be fixed faster.
Python has a wealth of documentation and a large online community. It is predicted to be the most discussed language on Stack Overflow by 2020. All programmers know that good “Googling” is half the battle of writing good code, and the amount of resources available for Python makes it much easier for developers to solve bugs and find answers.
While compelling, none of the reasons above fully explains why Python has become the language of choice for ML programming. Ruby, for example, has a lot of syntactic similarities and is arguably simpler than Python. R may be easier from the start for beginning programmers. Scala runs much faster than Python. The most significant distinction of Python really comes down to its libraries.
Libraries and packages
No other programming language comes close to Python in the number of easily imported libraries built specifically for ML.
NumPy provides fast numerical arrays, and SciPy builds on it with more advanced scientific computation. PyTorch is a deep learning platform specifically created for Python. Another library, scikit-learn, is a “Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages.” NLTK allows for linguistic data analysis and text processing. The list goes on and on.
Python is well known for how easily it lets users import the many available libraries and use them flexibly in their code. Many of these libraries are built for ML, which is a huge benefit to programmers seeking to make projects using AI and machine learning.
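To show what importing a library looks like, here is a minimal sketch that uses only modules shipped with Python (math, statistics, and collections), so it runs without installing anything. Third-party packages such as NumPy or scikit-learn are brought in with the same import statement once they are installed. The data values and variable names are invented for the example.

```python
import math                      # import a whole module
import statistics as stats       # "as" gives the module a shorter local name
from collections import Counter  # import a single name from a module

# Invented sample data, for illustration only
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

mean_height = stats.mean(heights_cm)
spread = stats.pstdev(heights_cm)  # population standard deviation

# Count how many values fall above the mean
labels = Counter("tall" if h > mean_height else "not tall" for h in heights_cm)

print(mean_height)
print(math.isclose(spread, math.sqrt(200.0)))
print(labels["tall"])
```

Three short import lines are all it takes before the library functions can be used like any other Python code.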
I should mention a few of the drawbacks, of course. The first and best known is whitespace. In Python, indentation denotes blocks of code, unlike the more explicit curly brackets used in many other languages. The strictness of Python’s whitespace rules can be quite tricky, especially for beginners. Of course, the rules can be learned, but significant whitespace is still a huge point of contention in the programming community.
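Here is a small sketch of what that strictness means in practice. It asks Python to compile two tiny programs stored as strings: one with an indented loop body and one where the indentation is missing. The helper name is_valid_python is something I made up for this example; compile() and IndentationError are built into Python.

```python
GOOD_SOURCE = (
    "total = 0\n"
    "for i in range(4):\n"
    "    total += i\n"   # indented: part of the loop body
    "print(total)\n"     # not indented: runs once, after the loop
)

BAD_SOURCE = (
    "total = 0\n"
    "for i in range(4):\n"
    "total += i\n"       # missing indentation: the loop has no body
)


def is_valid_python(source):
    """Return True if the source compiles, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")  # parses only, does not run it
        return True
    except IndentationError:
        return False


good_ok = is_valid_python(GOOD_SOURCE)
bad_ok = is_valid_python(BAD_SOURCE)
print(good_ok, bad_ok)
```

The only difference between the two programs is four spaces, yet Python refuses to run the second one at all.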
Python is also a bit slow. The primary reason given for this slowness is that Python is a dynamic language: dynamic languages tend to be slower because the code is interpreted at runtime rather than compiled to machine code ahead of time.
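The sketch below is not a benchmark. It only shows the “dynamic” part: the same function accepts integers, floats, strings, and lists, so Python has to work out what + means every time the line runs, and a mistake is only caught when the offending call actually executes. The function name double is made up for the example.

```python
def double(value):
    # No type is declared: Python looks up what "+" means for the
    # actual object each time this line runs.
    return value + value


results = [double(21), double(1.5), double("ab"), double([1, 2])]
print(results)

try:
    double(None)  # only fails when this call actually executes
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True

print(failed_at_runtime)
```

That flexibility is convenient, and the runtime lookups behind it are one commonly cited part of the cost.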
On the other hand, Donald Knuth, author of “The Art of Computer Programming,” is known for saying “Premature optimization is the root of all evil.” In essence, developers spend too much time worrying about the speed of their code when, most of the time, it is already fast enough. This quotation rings even more true for beginner programmers, who should primarily be concerned with making their code work in the first place.
Still, is it necessary to know Python?
Given these drawbacks, it is worth noting that Python is far from the only language that can be used for machine learning. R, Java™, and C++, among many others, are also used for ML. Furthermore, someone who simply wanted to learn the concepts of ML wouldn’t need to write any code at all to get a good understanding. However, to actually apply the concepts, especially as a beginner, there is little doubt that Python is the most popular choice and offers the most documentation and libraries specifically for ML/AI.
A beginner programmer seeking to work on AI technologies should learn Python. Even if they later prefer to supplement their programs with another language, some understanding of Python is vital both for understanding previous work and for getting a head start on building their own projects.