
Natural Language Processing in Artificial Intelligence: What It Is and Why It Matters


For the past few decades, the idea of Artificial Intelligence (AI) has moved from being the stuff of pure science fiction to something that seems to be inching closer to reality. To be clear, when I talk about AI, I’m talking about the kinds of sentient machines we have seen in movies, such as the replicants in Blade Runner, the robotic child in A.I. Artificial Intelligence or the anthropomorphic computer programs in The Matrix movie series.

We have yet to build something that can interact with humans at a level that passes what is known as the Turing Test, meaning it is impossible to tell if we are dealing with a human or a machine. What we have managed to develop are Natural Language Processing (NLP) models that are fed as many samples as possible and then work to autonomously produce responses. Here, I give some background on NLP, talk about where we are now and look at what steps are next for this technology.

What is Natural Language Processing?

Natural language processing is a subfield of artificial intelligence and linguistics that focuses on developing techniques for computers to process, understand, interpret and generate human languages. NLP algorithms typically use machine learning methods to identify patterns in a given set of text data.

The primary goal of NLP is to enable machines to interact with humans using natural language. This can be done by analyzing the structure and meaning behind words, such as syntax and semantics, as well as context-based approaches like discourse analysis or sentiment analysis.

Common tasks in NLP include speech recognition, text classification, topic modeling, machine translation, information extraction, document summarization, question-answering systems and dialogue agents/chatbots.
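To make "identifying patterns in a given set of text data" concrete, here is a minimal sketch of one of the tasks above, text classification, using a naive Bayes word-count model. This is only one of many possible approaches, and everything in it (the function names, the four sample sentences and their labels) is made up for illustration rather than taken from any real system.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training samples, invented for this illustration.
SAMPLES = [
    ("great game and a wonderful crowd", "positive"),
    ("loved the stadium and the food", "positive"),
    ("terrible seats and a boring game", "negative"),
    ("hated the long lines and bad food", "negative"),
]

def tokenize(text):
    """Split text into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()

def train(samples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest naive Bayes log-probability."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_samples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_samples)
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, label_counts = train(SAMPLES)
print(classify("wonderful stadium", word_counts, label_counts))
print(classify("boring seats", word_counts, label_counts))
```

With only four samples the model is a toy, but the principle is the same one the larger systems rely on: the more examples a model sees, the better its picture of which words go with which meaning.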

Natural Language Processing's Evolution and Future

It Was an Enigma Until It Wasn’t

While leading his codebreaking team during World War II to decrypt messages sent using the Dutch-engineered and German-acquired Enigma machine, Alan Turing, who knew better than anyone about the brute-force power of machines, began to wonder if machines could go beyond these narrowly defined tasks. In 1950, he published a paper titled “Computing Machinery and Intelligence” that discussed how machines could be trained to think like humans. The paper is considered one of the earliest examples of AI research.

Beyond Enigma

Many of you may have heard of or used Alphabet’s (GOOG) Google Translate service. If you haven’t, I would suggest you copy and paste the URL of this article into the web tool, translate it into another language that you or a friend or family member knows, and compare the two. While remarkable, it does occasionally fall short of a human translator, who generally has a better grasp of the meaning of certain phrases and doesn’t simply perform a dictionary word match-up. As with many evolving technologies, this is getting better over time, but the tech still suffers from the same issue its forebears had in the 1960s, namely, understanding the nuances of human interaction.

As with translation bots, chatbots also have their roots in the 1960s. In the mid-1960s, Joseph Weizenbaum created ELIZA, one of the first chatbots designed to simulate human conversation by recognizing certain keywords in user input and responding accordingly. While the technology wasn’t widely available at the time, it did mark the beginning of the development of the kind of modern-day conversational AI technology we are now familiar with, like Apple's Siri or Amazon's Alexa devices.
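The keyword-and-response idea is simple enough to sketch in a few lines. The example below is a loose illustration of the general technique, not Weizenbaum's original script: the rules, the replies and the function name are all hypothetical.

```python
import re

# Hypothetical keyword rules, invented for illustration only.
# Each rule pairs a pattern to look for with a reply template.
RULES = [
    (re.compile(r"\bmother\b|\bfather\b", re.IGNORECASE), "Tell me more about your family."),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
]
DEFAULT_REPLY = "Please go on."

def respond(user_input):
    """Return the reply for the first rule whose keyword pattern matches."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo any captured words back to the user inside the template.
            return template.format(*match.groups())
    # No keyword found, so fall back to a generic prompt.
    return DEFAULT_REPLY

print(respond("I feel tired."))
print(respond("My mother called yesterday"))
print(respond("Nice weather today"))
```

Nothing here understands the conversation. The program only spots a keyword and hands part of the user's own sentence back, which is why exchanges with this kind of bot fall apart as soon as the input strays from the rules.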

From Call and Response to Unstructured Conversations

The 1970s saw further advancements in NLP, including DARPA-funded speech recognition research that allowed users to interact with computers using voice commands rather than a keyboard or mouse. This type of interaction, like Apple’s (AAPL) speech-to-text functionality or Microsoft’s (MSFT) Dragon NaturallySpeaking software, is very useful but doesn’t really rise to the level of what I would call artificial intelligence. Don’t get me wrong, there is a lot of work that goes into the development of these services to understand each language and even the nuances of regional accents, but at the end of the day, these services are essentially lookup tables that match the recorded input with the desired text output.

Where things start to get interesting is when you get into the realm of letting the computer (algorithm or software really) figure out what it’s doing with as little direction as possible.

What is Chat(GPT)?

Almost all of our interaction with NLP technology is in the form of structured conversations – commands like “Alexa, play this week’s top ten Billboard songs” or “Hey Siri, how do I get to Yankee Stadium.” Directions to the stadium are a little more involved than turning on a light but even still, there’s a very direct request being made.

The latest iteration of the branch of AI that deals with NLP was recently opened up to the public in the form of OpenAI’s ChatGPT platform. The platform is free, although there is a link to the “Pro Plan.” I set up an account and started asking questions. What I found interesting is how this differs from the tools we already use. A search engine will point users to where it thinks the answers are. For example, if I search “Yankee Stadium” in Google, I get a list of links to the stadium’s website, Wikipedia-based information and news stories about the venue. The Wolfram Alpha website takes the task a little further. Making the same request there shows some of the same factual information, but instead of my having to follow links to get it, the site generates a single page with various data points about the stadium.

Yankee Stadium and Ernest Hemingway

When I give the ChatGPT website the prompt “Tell me about Yankee Stadium,” I get a roughly 300-word, 6-paragraph response in conversational English that includes items like the original opening date, the origin of the nickname “the house that Ruth built,” Babe Ruth’s famous called home run in the 1932 World Series, the rebuild and the current seating capacity. The result wasn’t perfect: it described Yankee Stadium as “a [sic] iconic baseball stadium” and said the old stadium was “closed in 2008 and demolished in 2010, and a new stadium was built on the same location and opened in 2009.” While it is true that the new stadium was completed before the old one was torn down, the language provided is confusing. This is a simple example of this new technology in action, but it is fascinating.

After this exercise, I asked ChatGPT to tell me about Yankee Stadium in the style of Ernest Hemingway. I will share the first few lines:

“Yankee Stadium, the house that Ruth built. It stands tall in the Bronx, a testament to America’s pastime. The roar of the crowd, the crack of the bat, the smell of hotdogs and popcorn. It’s a place of legends, where heroes are made and records are broken.”

For comparison, the initial response started this way:

“Yankee Stadium is a baseball stadium located in the Bronx, New York City. It is the home of the New York Yankees, one of the most successful and popular Major League Baseball teams.”

Looking Ahead

In case you are wondering, I did prompt ChatGPT to provide a few hundred words on the history of the development of NLP with references. While I did not end up directly quoting the response, it did give me some direction regarding some points to highlight. The reference links that were provided generally did not work. Ultimately, I ended up using the response as my own writing prompt.

We have quickly moved from the era of students (and others) using the internet to find existing papers to one of services that will produce papers from whole cloth. In fact, with enough samples, students could train a bot on their own writing style and save themselves the hassle of having to put the output in their own voice.

Final Thoughts

To be clear, this technology, while robust, does have its limitations. However, if you wanted a high-level review of a complicated topic or wanted to get a quick retelling of some point in history, this tool could be useful.

In Alan Turing’s 1950 paper, he spends some time discussing the idea of what he calls “a learning machine.” In the last pages, he outlines some approaches he might take to develop such a machine. The very last line of Mr. Turing’s paper reads, “We can only see a short distance ahead, but we can see plenty there that needs to be done.” Seventy-three years later, these words ring as true as they ever did.

The views and opinions expressed herein are the views and opinions of the author and do not necessarily reflect those of Nasdaq, Inc.


Mark Abssy

Mark Abssy is Head of Indexing at Tematica Research, focused on index and Exchange Traded Product development. He has product development and management experience with Indexes, ETFs, ETNs, Mutual Funds and listed derivatives. In his 25-year career, he has held product development and management positions at NYSE|ICE, ISE ETF Ventures, Morgan Stanley, Fidelity Investments and Loomis Sayles. He received a BSBA from Northeastern University with a focus in Finance and International Business.
