The languages of AI

Artificial intelligence (AI) has evolved in step with the languages available for development. In the 1950s, Arthur Samuel developed a self-learning checkers program at IBM on an IBM® 701 computer using the native instructions of the machine (quite a feat given search trees and alpha-beta pruning). Today, AI is developed using various languages, from Lisp to Python to R. This article explores the languages that evolved for AI and machine learning.

The programming languages that are used to build AI and machine learning applications vary. Each application has its own constraints and requirements, and some languages are better than others in particular problem domains. Languages have also been created and have evolved based on the unique requirements of AI applications.

Before high-level languages

Early in AI’s history, the only languages that existed were the native languages of the machines themselves. These languages, called machine language or assembly language, were cumbersome to use because simple operations were the only operations that existed (for example, move a value from memory to a register, subtract the contents of a memory address from the accumulator). Likewise, the data types of the machine were the only types available, and they were restricted. However, even before high-level languages appeared, complex AI applications were being developed.

In 1956, one of the founding fathers of AI, John McCarthy, created a tree search pruning scheme called alpha-beta pruning. This work occurred at a time of considerable research activity, when many AI problems were treated as search problems. Memory and compute power were limited, but this technique allowed researchers to tackle more complex problems on early computer systems with limited resources. The alpha-beta pruning technique was applied to early applications of AI in games.

Also in 1956, Arthur Samuel developed a checkers-playing program on an IBM 701 computer using McCarthy’s alpha-beta search. However, Samuel’s game included an advantageous element: Rather than playing the checkers program himself to teach it how to play, Samuel introduced the idea of self-learning and allowed the program to play itself. Samuel developed his program in the native instruction set of the IBM 701 system, which was quite a feat given the complexity of his application and the low-level instructions at his disposal.

Eras of AI language evolution

The history of AI is full of timelines, some of which are very detailed. I reduce the recent history of AI to three segments based on the evolution of languages that occurred. These segments are the early years (1954-1973), turbulent times (1974-1993), and the modern era (1994 to the present).

Figure: Timeline showing the development of the major AI languages by date, from 1960 to 2010

The early years (1954-1973)

The early years were a time of discovery—the introduction of new machines and their capabilities and the development of high-level languages that could use the power of these machines for a broad range of applications.

In 1958, a chess-playing program called NSS (named after its authors, Newell, Shaw, and Simon) was developed for the IBM 704 computer. This program viewed chess in terms of search and was developed in Information Processing Language (IPL), also developed by the authors of NSS. IPL was the first language to be developed for the purpose of creating AI applications. IPL was a higher-level language than machine language, but only slightly. It did, however, permit developers to use the language on various computer systems.

IPL introduced numerous features that are still used today, such as lists, recursion, higher-order functions, symbols, and even generators that could map a list of elements to a function that would iterate and process the list. The first version of IPL was never implemented, but subsequent versions (2 – 6) were implemented and used on systems like the IBM 704, IBM 650, and IBM 7090, among others. Some of the other early AI applications that were developed in IPL include Logic Theorist and General Problem Solver.

Despite the success of IPL and its wide deployment on the computer architectures of the day, IPL was quickly replaced by an even higher-level language that is still in use almost 60 years later: LISP. IPL’s esoteric syntax gave way to the simpler and more scalable LISP, but IPL’s influence can be seen in its later counterpart, particularly its focus on lists as a core language feature.

LISP—the LISt Processor—was created by John McCarthy in 1958. McCarthy’s goal after the Dartmouth Summer Research Project on AI in 1956 was to develop a language for AI work that was focused on the IBM 704 platform. FORTRAN was introduced in 1957 on the same platform, and work at IBM extended FORTRAN to list processing in a language called the FORTRAN List Processing Language (FLPL). This language was used successfully for IBM’s plane geometry project, but as an extension to FORTRAN, FLPL lacked some key features.

LISP was a foundational programming language and implemented many of the core ideas in computer science, such as garbage collection, trees, dynamic typing, recursion, and higher-order functions. LISP not only represented data as lists but even defined the source code itself as lists. This feature made it possible for LISP to manipulate data as well as LISP code itself. LISP is also extensible, allowing programmers to create new syntax or even new languages (called domain-specific languages) to be embedded within LISP.
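
The idea of code represented as lists can be sketched outside LISP as well. The following Python example is a rough illustration under stated assumptions: the nested-list encoding, the `OPS` table, and the helper names `evaluate` and `swap_operator` are invented for this sketch and are not part of any LISP implementation.

```python
import operator

# Hypothetical operator table for this sketch; a real LISP offers far more.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr):
    """Evaluate a LISP-style prefix expression held in nested Python lists."""
    if not isinstance(expr, list):          # a bare number evaluates to itself
        return expr
    op, *args = expr
    values = [evaluate(arg) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPS[op](result, value)
    return result

def swap_operator(expr, old, new):
    """Treat a program as ordinary list data: replace one operator with another."""
    if not isinstance(expr, list):
        return expr
    head = new if expr[0] == old else expr[0]
    return [head] + [swap_operator(arg, old, new) for arg in expr[1:]]

program = ["*", 2, ["+", 3, 4]]                 # corresponds to (* 2 (+ 3 4))
rewritten = swap_operator(program, "+", "-")    # corresponds to (* 2 (- 3 4))
print(evaluate(program), evaluate(rewritten))
```

Because the program is just a list, the same list operations that process data can inspect and rewrite it before it runs. LISP macros build on this property.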

The following example illustrates a LISP function to compute the factorial of a number. In the snippet, note the use of recursion to calculate the factorial (calling factorial within the factorial function). This function could be invoked with (factorial 9).

(defun factorial (n)
   (if (= n 0) 1
       (* n (factorial (- n 1)))))

In 1968, Terry Winograd developed a ground-breaking program in LISP called SHRDLU that could interact with a user in natural language. The program represented a block world, and the user could interact with that world, directing the program to query and interact with the world using statements such as “pick up the red block” or “can a pyramid be supported by a block?” This demonstration of natural language understanding and planning within a simple physics-based block world created considerable optimism for AI and the LISP language.

Turbulent times (1974-1993)

The turbulent times represent a period of instability in the development and funding of AI applications. This era began with the first AI winter, in which funding disappeared because of a failure to meet expected results. In 1980, expert systems rekindled excitement for and funding of AI (as did advancements in connectionist architectures), but by 1987, despite the advancements made during this time, the AI bubble burst again, leading to the second AI winter.

LISP continued to be used in a range of applications during this time and also proliferated through various dialects. LISP lived on through Common LISP, Scheme, Clojure, and Racket. The ideas behind LISP continued to advance through these languages and others outside the functional domain. LISP continues to power the oldest computer algebra system, called Macsyma (Project MAC’s SYmbolic MAnipulator). Developed at the Massachusetts Institute of Technology’s AI group, this computer algebra environment is the grandfather of many programs, such as Mathematica and Maple.

Other languages began to appear in this time frame, not necessarily focused on AI but fueling its development. The C language was designed as a systems language for UNIX systems but quickly grew to become one of the most popular languages (along with its variants, such as C++), used for everything from systems to embedded device development.

A key language in this time was developed in France and called Prolog (Programming in Logic). This language implemented a subset of logic called Horn clauses. It allowed information to be represented by facts and rules, and queries to be executed over these relations. The following simple Prolog example illustrates the definition of a fact (Socrates is a man) and a rule that defines that if someone is a man, he is also mortal:

man( socrates ).                    % Fact: Socrates is a man.
mortal( X ) :- man( X ).            % Rule: All men are mortal.
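
To show how a query over such facts and rules might be answered, here is a small Python sketch. It is a deliberate simplification and not how a Prolog engine is implemented: it handles only one-argument predicates, has no unification, and would loop forever on cyclic rules. The `facts` and `rules` structures and the `prove` function are names assumed for this illustration.

```python
# Facts as (predicate, argument) pairs, mirroring man(socrates).
facts = {("man", "socrates")}

# Each head predicate maps to a list of alternative rule bodies; every predicate
# in a body must hold for the same argument, mirroring mortal(X) :- man(X).
rules = {"mortal": [["man"]]}

def prove(predicate, subject):
    """Return True if predicate(subject) is a stored fact or follows from a rule."""
    if (predicate, subject) in facts:
        return True
    # Work backward from the goal: try each rule whose head matches the predicate.
    for body in rules.get(predicate, []):
        if all(prove(goal, subject) for goal in body):
            return True
    return False

print(prove("mortal", "socrates"))   # query: is Socrates mortal?
print(prove("mortal", "zeus"))       # nothing is known about zeus
```

The first query succeeds because the rule reduces it to the stored fact about Socrates. The second fails because no fact or rule supports it.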

Prolog continues to find use in various areas and has many variants that incorporate features such as object orientation, the ability to compile to native machine code, and interfaces to popular languages (such as C).

One of the key applications of Prolog in this time frame was in the development of expert systems (also called production systems). These systems codified knowledge as facts and then used rules to reason over that information. The problem with these systems was that they tended to be brittle, and maintaining the knowledge within the system was cumbersome and error prone.

An example expert system was the eXpert CONfigurer (XCON), which was used to configure computing systems. XCON was developed in 1978 in OPS5 (a production system language written in LISP) that used forward-chaining for inference. By 1980, XCON was made up of 2,500 rules but was too expensive to maintain.
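
Forward chaining can be sketched in a few lines of Python. This is a minimal illustration of the inference style, not OPS5 or XCON itself: the rule format and the configuration facts are hypothetical and were made up for this example.

```python
def forward_chain(facts, rules):
    """Fire rules until no new fact can be derived, then return every known fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are known and it adds something new.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical configuration rules, not taken from XCON.
rules = [
    (frozenset({"has_disk_drive"}), "needs_disk_controller"),
    (frozenset({"needs_disk_controller", "cabinet_full"}), "needs_second_cabinet"),
]
derived = forward_chain({"has_disk_drive", "cabinet_full"}, rules)
print(sorted(derived))
```

The second rule can fire only after the first has added its conclusion, which is the data-driven behavior that distinguishes forward chaining from the goal-driven search used by Prolog.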

Prolog and LISP weren’t the only languages used to develop production systems. In 1985, the C Language Integrated Production System (CLIPS) was developed and is the most widely used system to build expert systems. CLIPS operates on a knowledge system of rules and facts but is written in C and provides an interface to C extensions for performance.

The failure of expert systems was one factor that led to the second AI winter. Their promise and lack of delivery resulted in significant reductions in funding for AI research. However, new approaches rose from this winter, such as a revival of connectionist approaches, bringing us to the modern era.

The modern era (1994 to present)

The modern era of AI brought a practical perspective to the field and clear success in the application of AI methods to real-world problems, including some problems from early in AI’s history. The languages of AI also showed an interesting trend. While new languages were applied to AI problems, the workhorses of AI (LISP and Prolog) continued to find application and success. This era also saw the revival of connectionism and new approaches to neural networks, such as deep learning.

The explosion of LISP dialects resulted in a unification of LISP into a new language called Common LISP, which had commonality with the popular dialects of the time. In 1994, Common LISP was ratified as American National Standards Institute Standard X3.226-1994.

Diverse programming languages began to appear in this time frame, some based on new ideas in computer science, others focused on key characteristics (such as being multiparadigm and easy to learn). One key language fitting this latter category is Python. Python is a general-purpose interpreted language that includes features from many languages (such as object-oriented features and functional features inspired by LISP). What makes Python useful in the development of intelligent applications is the many modules available outside the language. These modules cover machine learning (scikit-learn, NumPy), natural language and text processing (NLTK), and many neural network libraries that cover a broad range of topologies.

The R language (and the software environment in which you use it) follows the Python model. R is an open source environment for statistical programming and data mining, developed in the C language. Because a considerable amount of modern machine learning is statistical in nature, R is a useful language that has grown in popularity since its stable release in 2000. R includes a large set of libraries that cover various techniques; it also includes the ability to extend the language with new features.

The C language has continued to be relevant in this time. In 1996, IBM developed the smartest and fastest chess-playing program in the world, called Deep Blue. Deep Blue ran on a 32-node IBM RS/6000 computer running the IBM AIX® operating system and was written in C. Deep Blue was capable of evaluating 200 million positions per second. In 1997, Deep Blue became the first chess program to defeat a reigning world champion, Garry Kasparov, in a match.

IBM returned to games later in this period, but this time less structured than chess. The IBM Watson® question-and-answer system (called DeepQA) was able to answer questions posed in natural language. The IBM Watson knowledge base was filled with 200 million pages of information, including the entire Wikipedia website. To parse the questions into a form that IBM Watson could understand, the IBM team used Prolog to parse natural-language questions into new facts that could be used in the IBM Watson pipeline. In 2011, the system competed in the game Jeopardy! and defeated former winners of the game.

With a return to connectionist architectures, new applications have appeared to change the landscape of image and video processing and recognition. Deep learning (which extends neural networks into deep, layered architectures) is used to recognize objects in images or video, provide textual descriptions of images or video with natural language, and even pave the way for self-driving vehicles through road and object detection in real time. These deep learning networks tend to be so large that traditional computing architectures cannot efficiently process them. However, with the introduction of graphics processing units (GPUs), these networks can now be applied.

To use GPUs as neural network accelerators, new languages were needed to bring traditional CPUs and GPUs together. An open standard language called the Open Computing Language (OpenCL) allows C- or C++-like programs to be executed on GPUs (which consist of thousands of processing elements, simpler than traditional CPUs). OpenCL allows parallelism of operations within GPUs orchestrated by CPUs.
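
The division of labor between CPU and GPU can be pictured with a plain Python emulation. This sketch does not use OpenCL or any GPU: the `launch` function and the kernel signature are assumptions made purely to illustrate the model, in which the host starts one small function per data element.

```python
def vector_add_kernel(global_id, a, b, out):
    """Work done by one work-item: a single element, selected by its index."""
    out[global_id] = a[global_id] + b[global_id]

def launch(kernel, global_size, *args):
    """Host-side stand-in for a kernel launch: one kernel call per work-item.

    A GPU would run these calls in parallel; this sketch runs them in order.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)
launch(vector_add_kernel, len(a), a, b, out)
print(out)
```

Each kernel call touches only its own element, so the calls do not depend on one another. That independence is what lets thousands of simple processing elements work on a large array, such as the weights of a neural network, at the same time.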

Going further

The past 60 years have seen significant changes in computing architectures along with advances in AI techniques and their applications. These years have also seen an evolution of languages, each with its own features and approaches to problem solving. But today, with the introduction of big data and new processing architectures that include clustered CPUs with arrays of GPUs, the stage is set for a new set of innovations in AI and the languages that power them.