IBM Developer Blog


Look at some of the compelling reasons to adopt Python for scientific research

It might be hard to believe, but the Python programming language isn't new. It has been around longer than the Java™ language and roughly as long as HTTP. Unfortunately, one common misconception about Python persists: that Python is slow.

This misconception is rooted in how CPython works. CPython is the standard reference implementation, and you also use it when you run Python interactively. It compiles source code to bytecode and then interprets that bytecode, which is indeed slow for tight numerical loops. The Python interpreter may be slower than Fortran or C, but Python programs are not necessarily slow. Scientific computing packages such as SciPy and NumPy avoid many of the shortcomings of standard Python because their core routines are implemented in compiled C and Fortran.
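The following minimal sketch shows why NumPy sidesteps the interpreter overhead. It assumes NumPy is installed. It computes the same dot product twice: once with an explicit Python loop, where every iteration runs in the interpreter, and once with `numpy.dot`, whose inner loop runs in compiled code. The function names are illustrative, and no timings are shown because they depend on your hardware.

```python
import random
import timeit

import numpy as np


def dot_pure_python(xs, ys):
    """Dot product with an explicit loop: every multiply-add runs in the interpreter."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_numpy(xs, ys):
    """Same computation delegated to NumPy, whose inner loop runs in compiled code."""
    return float(np.dot(np.asarray(xs), np.asarray(ys)))


if __name__ == "__main__":
    n = 200_000
    xs = [random.random() for _ in range(n)]
    ys = [random.random() for _ in range(n)]
    a, b = np.array(xs), np.array(ys)  # convert once so the timing measures only the dot product
    t_py = timeit.timeit(lambda: dot_pure_python(xs, ys), number=3)
    t_np = timeit.timeit(lambda: np.dot(a, b), number=3)
    print(f"pure Python: {t_py:.4f} s   NumPy: {t_np:.4f} s")
```

Both functions return the same value up to floating-point rounding. Only the place where the loop executes differs.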

Besides, standard Python is not the only option. Python distributions bundle an interpreter with curated sets of packages; Anaconda is a well-known example for scientific work. These distributions may in fact be in more widespread use than the standard Python distribution. There are also alternative implementations of the language itself, and you can compile Python code to speed it up at run time. For example, PyPy is an implementation built around a just-in-time (JIT) compiler. For some workloads, it can produce machine code that runs as fast as, or faster than, equivalent C.

In this post, we look at some of the compelling reasons to adopt Python for scientific research. Before weighing the merits of Python, let's review the tools researchers currently prefer. In a follow-up article, Accelerating Python for scientific research, we'll cover Python performance optimization and acceleration.

Scientific computation methods

Many scientists use tools such as MATLAB or GNU Octave for modeling and scientific computation. For larger problems and more rigorous work, researchers still use Fortran and, to a lesser extent, C/C++.

This is mainly because these modeling tools have built-in extension interfaces that call directly into native Fortran code. It is also possible to use C/C++, although Fortran is often faster than C/C++ for numerical calculations. Another benefit of Fortran is how easily its programs scale to supercomputer clusters. OpenMP parallelizes them across the cores of a shared-memory machine, and MPI implementations such as Open MPI distribute them across cluster nodes.

For problems that don't need the raw performance of a high-end supercomputer, workstations are a good alternative. Workstations use high-end GPU cards or CPU coprocessors to run tasks in parallel.

Optimizing Fortran on these platforms often requires vendor-specific and platform-specific compilers. For example, you would need Intel® Fortran compilers to run programs optimally on Intel CPU clusters or for general-purpose GPU (GPGPU) computing with the Open Computing Language (OpenCL). Similarly, you would need PGI CUDA Fortran to take advantage of NVIDIA® GPUs.

Why use Python for scientific computing?

So, with so many options, why use Python for scientific computing at all? It turns out that there are many compelling reasons. Let's first look at some of Python's strengths in the scientific computing context, which underpin the rationale for using it.


Reasons to use Python for scientific computing

  • Python has strong support for scientific computing. Scientific Python distributions typically include the open source SciPy ecosystem. It comprises the SciPy library, a numerical computation package called NumPy, and multiple independent toolkits, each known as a scikit (for example, scikit-learn and scikit-image). Matplotlib, a 2D plotting library for visualization, is also part of the SciPy ecosystem. Matplotlib is much like MATLAB's plotting in functionality and usage, but it is open source.

  • Python has bridges to MATLAB and Octave. MathWorks provides the MATLAB Engine API for Python, which lets Python programs call MATLAB as a computational engine. MATLAB programs can also call Python functions, with some limitations. Packages such as Pymatbridge support both MATLAB and Octave and allow MATLAB code to run within Jupyter notebooks.

  • Python is a highly extensible language. Developers have wrapped C/C++ programs in Python for many years. The C Foreign Function Interface for Python (CFFI) lets Python code interact directly with C code, and tools such as SWIG automate the generation of wrappers. Developers can also call Fortran subroutines from Python by using F2PY, the Fortran to Python interface generator, which is now part of NumPy.

  • Python has very good input/output (I/O) options. Fortran I/O has traditionally been record-based. In contrast, Python has long supported multiple I/O options, and many additional packages cover all types of I/O formats, including real-time and streaming formats.

  • Python has strong support for task automation. Python's scripting features and many packages make it easy to automate repetitive tasks and to log data with little effort.

  • Python can use a web front end. Web frameworks such as Django and Flask make it possible to expose Python code as an API behind a web front end. This is particularly useful when a cloud-based infrastructure serves as the platform for accessing high-performance computing (HPC) back ends.
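To illustrate the extensibility point about calling C code from Python, here is a minimal sketch. It deliberately uses the standard library's `ctypes` module rather than CFFI so that nothing needs to be installed. It assumes a POSIX system (Linux or macOS) where the C math library can be located. The wrapper name `c_cos` is illustrative only.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library. The name differs by platform (for example,
# libm.so.6 on Linux). If it can't be found, find_library returns None and
# CDLL(None) falls back to symbols already loaded into the process (POSIX only).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)

# Declare the C signature, double cos(double), so ctypes converts values correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Call the C library's cos() directly from Python."""
    return libm.cos(x)


if __name__ == "__main__":
    print(c_cos(0.0))                  # 1.0
    print(c_cos(1.0), math.cos(1.0))   # the two values should agree
```

Declaring `argtypes` and `restype` matters here. Without them, ctypes assumes an `int` return type and the result would be garbage. CFFI and SWIG handle this bookkeeping for larger C APIs.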
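As a small example of the I/O bullet, the sketch below uses only the standard library. It parses CSV data and re-serializes it as JSON, a format a web service or another tool could consume. The instrument readings are hypothetical, and an in-memory buffer stands in for a file on disk. `csv.DictReader` reads row by row, so the same code works on a large file or a stream without loading it all at once.

```python
import csv
import io
import json

# Hypothetical instrument readings; io.StringIO stands in for an open file.
CSV_TEXT = """time_s,temperature_c
0.0,20.1
0.5,20.4
1.0,20.9
"""


def read_readings(text: str) -> list[dict]:
    """Parse CSV text row by row into dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def to_json(readings: list[dict]) -> str:
    """Serialize parsed readings as JSON for another tool or service."""
    return json.dumps(readings)


if __name__ == "__main__":
    rows = read_readings(CSV_TEXT)
    print(to_json(rows))
```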
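For the task-automation bullet, here is a minimal sketch of a parameter sweep that runs a computation repeatedly and logs each result with the standard `logging` module. The `simulate` function is a hypothetical stand-in for a real model. It integrates dy/dt = -y from t = 0 to t = 1 with the explicit Euler method, starting from y(0) = 1.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("sweep")


def simulate(step_size: float) -> float:
    """Hypothetical model: explicit Euler for dy/dt = -y on [0, 1], y(0) = 1."""
    y = 1.0
    for _ in range(round(1.0 / step_size)):
        y += step_size * (-y)
    return y


def run_sweep(step_sizes):
    """Run the model once per step size, logging each result as it completes."""
    results = {}
    for h in step_sizes:
        results[h] = simulate(h)
        log.info("step_size=%g y(1)=%.6f", h, results[h])
    return results


if __name__ == "__main__":
    run_sweep([0.5, 0.25, 0.1, 0.01, 0.001])
```

As the step size shrinks, the logged y(1) values approach the exact answer, exp(-1) ≈ 0.3679. Swapping `basicConfig` for a file handler would turn the same script into a persistent run log.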
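Flask and Django both support WSGI, the standard Python web-server interface defined in PEP 3333. The sketch below therefore exposes a computation as a JSON API using only the standard library's `wsgiref`. It is not Flask or Django code, and the `/?values=...` query format is an assumption made for illustration.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Minimal WSGI app: GET /?values=1,2,3 returns {"mean": 2.0} as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        values = [float(v) for v in query["values"][0].split(",")]
        body = {"mean": sum(values) / len(values)}
        status = "200 OK"
    except (KeyError, ValueError):
        # Missing parameter or a non-numeric entry.
        body = {"error": "expected ?values=comma-separated numbers"}
        status = "400 Bad Request"
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


if __name__ == "__main__":
    # Serve locally; try http://127.0.0.1:8000/?values=1,2,3 in a browser.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

In a real deployment, the body of `app` would call into an HPC back end, and a framework such as Flask would handle routing and request parsing.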

Returning to the crux of the matter, let’s focus on how these basic strengths become more pragmatic reasons to adopt Python for scientific computing. The biggest driver for using Python in scientific computing is the evolution of problem-solving approaches.

New scientific problem-solving paradigms

Over the years, the scientific problem-solving toolkit has evolved. Early approaches relied on mathematical modeling and simulation to understand the universe around us. As our understanding of the universe has improved, the body of knowledge has grown so much that models have become extremely complex and difficult to simulate. Scientists have therefore developed new approaches to solve scientific problems at scale. Let's look at a few of them.

  • The data-driven approach to scientific research: To better handle the data deluge, scientists have begun to shift to data-driven research approaches. Many classes of scientific problems are solved with statistical or Bayesian analysis tools. R has traditionally been the language of choice for this work, but Python has become the de facto programming language for data scientists.
  • The discovery-based approach to scientific research: Machine learning and deep learning look for patterns and discover correlations in data, which gives scientists a discovery-based approach to research. This approach is much favored in life sciences research. Most popular machine learning and deep learning frameworks, such as TensorFlow, PyTorch, and scikit-learn, are used primarily through Python.
  • The quantum computing approach to scientific research: The advent of quantum computing has opened entirely new approaches to problems that were previously impractical to solve, even with supercomputers. Most quantum computers are programmed at a low level in assembly-like languages such as OpenQASM. Python makes an ideal high-level wrapper and API for these languages, allowing a scientific research application to communicate with the quantum computing back end.
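As a small, self-contained illustration of the Bayesian analysis mentioned in the data-driven bullet, here is a sketch of the textbook beta-binomial conjugate update. A Beta(α, β) prior on a success probability, combined with k successes in n trials, gives a Beta(α + k, β + n − k) posterior. The function names and the 7-of-10 data are hypothetical; real analyses would typically use a dedicated statistics package.

```python
def beta_binomial_update(alpha: float, beta: float,
                         successes: int, trials: int) -> tuple[float, float]:
    """Return posterior Beta parameters after observing binomial data."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return alpha + successes, beta + (trials - successes)


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # Uniform prior Beta(1, 1); hypothetical data: 7 successes in 10 trials.
    a, b = beta_binomial_update(1, 1, successes=7, trials=10)
    print(a, b, round(beta_mean(a, b), 3))  # 8 4 0.667
```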
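To show what "Python as a high-level wrapper" over a quantum assembly language can look like, the sketch below uses a hypothetical helper, `ghz_qasm`. It generates an OpenQASM 2.0 program that entangles n qubits (a GHZ state) and measures them. Toolkits such as Qiskit do this kind of translation for you, but this function is not their API. Submitting the program to real hardware is also out of scope here.

```python
def ghz_qasm(num_qubits: int) -> str:
    """Emit an OpenQASM 2.0 program that prepares and measures an n-qubit GHZ state."""
    if num_qubits < 2:
        raise ValueError("a GHZ state needs at least 2 qubits")
    lines = [
        "OPENQASM 2.0;",
        'include "qelib1.inc";',
        f"qreg q[{num_qubits}];",
        f"creg c[{num_qubits}];",
        "h q[0];",  # put the first qubit in superposition
    ]
    # Chain CNOTs so every qubit becomes entangled with qubit 0.
    lines += [f"cx q[{i}],q[{i + 1}];" for i in range(num_qubits - 1)]
    lines += [f"measure q[{i}] -> c[{i}];" for i in range(num_qubits)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(ghz_qasm(3))
```

The research application works with an ordinary Python function call, and the low-level program text is generated for the back end.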

Python doesn’t replace Fortran

By now, you've probably realized that I am not advocating replacing Fortran or C/C++ with Python. Python's strength lies in integrating multiple approaches to problem solving. Scientists no longer rely on a single approach to solving problems. There is a widespread realization that understanding our universe requires a multidimensional approach.

The optimal approach may vary from one set of problems to another, so we need different tools for each approach. Python serves as a wonderful toolkit for integrating all these problem-solving tools in one environment. It gives scientists a powerful way to wrap special-purpose tools and make them easily accessible from a common application layer.
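One common way Python wraps a special-purpose tool is to run it as an external program and parse its output. Here is a minimal sketch using the standard `subprocess` module. The current Python interpreter stands in for a compiled Fortran or C solver so the example is self-contained. A command like `["./heat_solver", "input.nml"]` would be a hypothetical real-world equivalent.

```python
import subprocess
import sys


def run_tool(command: list[str], timeout: float = 60.0) -> str:
    """Run an external program, raise CalledProcessError on failure, return its stdout."""
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True, timeout=timeout
    )
    return completed.stdout


def parse_numbers(output: str) -> list[float]:
    """Convert whitespace-separated numeric output into Python floats."""
    return [float(token) for token in output.split()]


if __name__ == "__main__":
    # Stand-in for a compiled solver that prints its results to stdout.
    out = run_tool([sys.executable, "-c", "print(1.5, 2.5, 3.5)"])
    print(parse_numbers(out))  # [1.5, 2.5, 3.5]
```

Because `check=True` turns a nonzero exit status into an exception, failures in the wrapped tool surface in the Python layer instead of passing silently.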

Stay tuned. In a follow-up article called Accelerating Python for scientific research, I will examine how Python can use an appropriate back end, such as a CPU, GPU, or quantum processor, for acceleration.