Newsletter Subject

They Couldn’t Tell It Was An AI

From

brownstoneresearch.com

Email Address

feedback@e.brownstoneresearch.com

Sent On

Tue, Jul 2, 2024 08:16 PM

Email Preheader Text

They Couldn’t Tell It Was An AI By Jeff Brown, Editor, The Bleeding Edge Having long been a gra

[The Bleeding Edge]

They Couldn't Tell It Was An AI

By Jeff Brown, Editor, The Bleeding Edge

Having long been a grand challenge of computer science, the Turing test waits patiently for a machine intelligence to pass its clever game…

Created by Alan Turing in 1950, before the integrated circuit even existed, the test was designed to determine whether a machine, an AI, could be indistinguishable from a human.

The construct of the test is simple. A human evaluator observes a conversation between two parties, one a machine (an AI) and the other a human. The conversation takes place only by text. If the evaluator can't consistently tell the difference between the human and the AI, the Turing test has been passed.

It seems straightforward, and we'd probably think it should have been passed already. But no AI has yet claimed the prize.

Or perhaps one has…

Time to Pass Turing's Test

Every couple of years, there tends to be talk that the Turing test has been passed. But then it soon appears that it wasn't.

The last time this happened was in June 2022, when an engineer at Google claimed not only that its AI had passed the test, but that it was conscious and sentient.

It was an exciting moment because large language models (LLMs) were advancing quickly. OpenAI already had GPT-3, which was far from perfect but showed great potential. And the much more secretive Google had been working on its LaMDA LLM.

It was a moment when many in the industry thought it just might be possible, which is why there was such excitement. And with only limited information from Google, there was that much more room for speculation.

After a few weeks, though, the excitement passed. It became understood that the AI was certainly intelligent and capable, but not humanlike enough to pass the Turing test. And definitely not sentient.
But given the developments of the last 12 months and the recent release of OpenAI's GPT-4o multi-modal large language model, I've been a bit surprised that the topic of the Turing test hasn't resurfaced. It feels like it's time…

Which is why I was so excited to review a [recent paper by two scientists from the University of California San Diego].

Can You ID This AI?

The title sums it up perfectly.

The test was structured with a human interrogator, who would converse with one of four possible "witnesses." There were 500 participants in the trial. The first group of interrogators unknowingly communicated with human witnesses, whose job was to try to convince them that they were human. And the remaining four groups of interrogators were randomly assigned either a human witness, GPT-4, GPT-3.5, or ELIZA, the last three being AIs.

The user interface for the test was like a familiar messaging application. Excerpts of some of the conversations are shown below…

[(click here to expand image)]

Well? How did you do? I'll give you a hint: Only one of the above chats is human. Can you guess which? (Answers at the end for readers inclined to participate.)

The results of the actual trial were pretty incredible:

GPT-4 was judged to be human by human interrogators 54% of the time…

GPT-3.5 was judged to be human 50% of the time…

And the much less sophisticated ELIZA model, which was intended only as a baseline, was judged human just 22% of the time.

Think about that. The majority of the human interrogators thought they were talking to a human… when they were actually speaking with an AI, GPT-4.

Is that it? Are we there? Not so fast…

The Big Trophy

The nuance of the test was that the conversations were limited to just five minutes. That's not a lot of time to interrogate the witness and form an opinion about whether the witness is human or machine. With that said, it was still a useful exercise.
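For readers who want to see how pass rates like those above are tallied, here is a minimal sketch in Python. The judgment records are made up for illustration; in the actual study, each record would be one interrogator's verdict after a five-minute chat.

```python
# Sketch (with hypothetical data) of tallying per-witness "judged human" rates,
# as in the UCSD-style trial described above.
from collections import defaultdict

# Each record: (witness type, did the interrogator judge the witness human?)
judgments = [
    ("GPT-4", True), ("GPT-4", True), ("GPT-4", False),
    ("GPT-3.5", True), ("GPT-3.5", False),
    ("ELIZA", True), ("ELIZA", False), ("ELIZA", False), ("ELIZA", False),
    ("Human", True), ("Human", True), ("Human", False),
]

def judged_human_rates(records):
    """Return {witness type: fraction of conversations judged human}."""
    counts = defaultdict(lambda: [0, 0])  # witness -> [judged human, total]
    for witness, judged_human in records:
        counts[witness][1] += 1
        if judged_human:
            counts[witness][0] += 1
    return {w: hits / total for w, (hits, total) in counts.items()}

for witness, rate in sorted(judged_human_rates(judgments).items()):
    print(f"{witness}: judged human {rate:.0%} of the time")
```

With the real study's 500 participants, the same tally produces the 54%, 50%, and 22% figures quoted above.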
And one that demonstrates a significant truth: We're on the cusp of radical change.

Arguably the most interesting data to come out of the research concerned humans conversing with other humans. Shown above in blue, human interrogators recognized that they were communicating with a human only 67% of the time. That means interrogators thought they were communicating with an AI, despite speaking with a human, 33% of the time. They couldn't tell a human was a human.

Sounds crazy, I know. I believe this is heavily influenced by the general awareness that AI technology has advanced so much that humanlike conversation is now expected from the leading LLMs.

The reality is that, for most of us, it would be hard to tell the difference when conversing with GPT-4, or any of the newer models available, in this kind of randomized, controlled trial.

So naturally, the real question is: What about OpenAI's GPT-4o, Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, or xAI's Grok 1.5? I'd bet anything that the test results would be even more impressive. The technology is probably good enough for longer conversations as well. They could arguably pass the Turing test.

Sure, skilled interrogators and computer scientists, given enough time, would likely be able to tell the difference. But for most people, it would be too difficult.

So why hasn't it been done yet? To put it bluntly: It's not the prize the industry is working towards. The Turing test is just a game. The industry, meanwhile, is racing towards lifelike artificial general intelligence (AGI). That's the trophy all the big players want to hoist.

Ridiculously Humanlike Speech

To that end, OpenAI has quietly started to release an alpha version of its Advanced Voice Mode. OpenAI is clearly in testing mode, as the message above explains. But the direction is clear. And it is expected to roll out Advanced Voice Mode to all users later this year.
The new improvements will result in more natural conversations with human emotion and tone. And the ability to turn on our camera and share our surroundings with the AI tells us that the model is multi-modal, capable of "seeing" and understanding the real world.

For anyone who would like to hear how incredible the AI's natural language sounds today, with emotion and tone, just [click here to hear a one-minute clip of the AI telling a story]. Better yet, the AI inserts sound effects in real time to bring the story to life.

It's nearly impossible to tell machine from human already. We don't need a Turing test to tell us that.

And based on what's happening right now, the Turing test won't need to be limited to a chat window. Before the end of the year, it will be possible to run the test using speech rather than text. Why not have the interrogator and the witness speak over the phone instead?

Emotion, tone, and speech cadence are what make us human. And rather than chasing a test held in a chat window, the industry is [manifesting AI] in a way that feels natural and comfortable to us humans. So natural, in fact, that we won't be able to tell the difference.

---------------------------------------------------------------

Results:

A: AI (GPT-4)
B: Human
C: AI (GPT-3.5)
D: AI (ELIZA)

---------------------------------------------------------------

Like what you're reading? Send your thoughts to feedback@brownstoneresearch.com.

Brownstone Research
55 NE 5th Avenue, Delray Beach, FL 33483
www.brownstoneresearch.com

To ensure our emails continue reaching your inbox, please add our email address to your address book. This editorial email containing advertisements was sent to {EMAIL} because you subscribed to this service. To stop receiving these emails, click [here]. Brownstone Research welcomes your feedback and questions. But please note: The law prohibits us from giving personalized advice.
To contact Customer Service, call toll free Domestic/International: 1-888-512-0726, Mon–Fri, 9am–7pm ET, or email us [here](mailto:memberservices@brownstoneresearch.com). © 2024 Brownstone Research. All rights reserved. Any reproduction, copying, or redistribution of our content, in whole or in part, is prohibited without written permission from Brownstone Research. [Privacy Policy]( | [Terms of Use](
