Newsletter Subject

Stumping the AI

From

bloomberg.com

Email Address

noreply@news.bloomberg.com

Sent On

Fri, Aug 16, 2024 11:05 AM

Email Preheader Text

Hi, it’s Rya in San Francisco. Are you a New York Times word game enthusiast? The Mini Crosswor

Hi, it’s Rya in San Francisco. Are you a New York Times word game enthusiast? The Mini Crossword, Wordle and the infuriating Connections puzzle have drawn many fans, and some of them tested how AI might fare. But first...

Three things you need to know today:
• Autodesk kept going with a risky sales practice after promising to stop
• China’s AMEC is suing the Pentagon as it tries to void US sanctions
• A CIA fund backed Yale scientists developing quantum error correction

AIn’t so smart

Like many people on the internet, I have a love-hate relationship with Connections. For the uninitiated, the game takes place on a 4x4 virtual grid with sixteen words placed on it. The player’s job is to group those into sets of four, with each grouping becoming progressively harder. An easy set could include synonyms for conformists (followers, lemmings, puppets and sheep), while a more challenging selection might be homophones of city names, such as deli, niece, roam and soul.

If this metropolis grouping strikes you as bizarre, you aren’t alone. The game has become notorious for its brain-bending test of abstract reasoning skills. Players have made a pastime of dunking on Connections, writing on the social media site X that the game “chose violence today,” “deserves jail time” and is conditioning people to “find patterns that aren’t there.”

But take solace that artificial intelligence bots aren’t faring much better than us. They solve the entire game only 8% of the time.

We know this because a group of students in a computer science class at Barnard College decided to test the Connections skills of chatbots. They asked the latest models from OpenAI, Alphabet Inc.’s Google, Anthropic and Meta Platforms Inc. to solve 200 games, and found their performance was worse than human novices and much worse than human experts.

It soon dawned on the students that their project wasn’t just nerdy fun. They had stumbled upon a sophisticated way to test chatbots’ reasoning abilities, which is precisely what researchers are trying to measure and companies are trying to improve.

At a recent all-hands OpenAI meeting, leadership told employees that the startup was on the cusp of its systems becoming “reasoners,” meaning they can do basic problem solving. Executives showed a demonstration of how OpenAI’s most advanced systems can answer word problems that have stumped models in the past.

While it isn’t clear if Connections was one of those word problems, the research by the Barnard students, who developed the class project with their professor into an academic paper, establishes this viral internet game as a valuable and challenging benchmark for AI’s reasoning abilities.

Connections is designed to test different types of knowledge: encyclopedic, semantic, associative and linguistic. For the 200 games, the researchers classified the types of knowledge required to solve each category so they could test how well AI can solve different types of problems. They found that while AI is good at solving some problems involving semantic knowledge, other categories are much more challenging.

For example, AI can easily group together followers, lemmings, puppets and sheep, because they share the same broad semantic meaning. However, it found associative categories harder, such as basketball, carrot, goldfish and pumpkin (things that are orange), and got stumped by categories that combine knowledge types, such as deli, niece, roam and soul, a grouping that requires both linguistic and encyclopedic knowledge.

“When it is required to think outside the box, or do any kind of divergent thinking, it struggles a lot,” said research scientist Tuhin Chakrabarty, who was a teaching assistant for the Barnard class and a co-author of the paper. The team’s findings can be used by researchers to improve specific kinds of abstract reasoning in their models, he added.

The game designers of Connections intentionally place “red herrings” or distractors on the grid to confuse players. AI often falls into the trap of these red herrings, because it leaps into solving the game step by step without considering the big picture.

“It’s not good at viewing the whole puzzle as a problem in itself, which is one of the biggest shortcomings,” said Mariam Mustafa, one of the Barnard students and a co-author of the paper.

If a grid has Monday, Tuesday, Wednesday and Thursday, the AI will likely group them together without considering that the grid also contains Morticia, Gomez and Pugsley, all Addams Family characters that could be grouped with Wednesday (the daughter in the family).

Because AI is trained to produce the most likely next word, “it will say the thing that is most obvious without exploring all 16 words,” said Chakrabarty. “It is abstract reasoning in the presence of distractors – that is super hard for humans, and for LLMs it’s even harder.”

While AI companies continue working to improve their models’ reasoning skills, the takeaway for the researchers in the current moment is clear: Even after ingesting all of this data, AI still can’t solve the puzzle that everyone loves to hate.
[Rya Jetha](mailto:rjetha1@bloomberg.net)

The big story

Google now displays convenient AI-based answers at the top of its search pages, meaning users may never click through to the websites whose data is being used to power those results.
But many site owners say they can’t afford to block Google’s AI from summarizing their content, because blocking the AI would also hamper a site’s ability to be discovered online.

Get fully charged

Chinese tech stocks rose after JD.com beat expectations and Alibaba held steady against stubbornly reluctant consumer demand.

BetMGM betting is coming to Brazil in early 2025, if a joint venture receives a license from the government this fall.

Starlink rival AST jumped more than 50% to close at a record after confirming an early September window for its inaugural commercial launch.

More from Bloomberg

Get Bloomberg Tech weeklies in your inbox:
- Cyber Bulletin for coverage of the shadow world of hackers and cyber-espionage
- Game On for reporting on the video game business
- Power On for Apple scoops, consumer tech news and more
- Screentime for a front-row seat to the collision of Hollywood and Silicon Valley
- Soundbite for reporting on podcasting, the music industry and audio trends
- Q&AI for answers to all your questions about AI

Stay updated by saving our new email address

Our email address is changing, which means you’ll be receiving this newsletter from noreply@news.bloomberg.com. Here’s how to update your contacts to ensure you continue receiving it:
- Gmail: Open an email from Bloomberg, click the three dots in the top right corner, select “Mark as important.”
- Outlook: Right-click on Bloomberg’s email address and select “Add to Outlook Contacts.”
- Apple Mail: Open the email, click on Bloomberg’s email address, and select “Add to Contacts” or “Add to VIPs.”
- Yahoo Mail: Open an email from Bloomberg, hover over the email address, click “Add to Contacts.”

Like getting this newsletter? Subscribe to Bloomberg.com for unlimited access to trusted, data-driven journalism and subscriber-only insights.

Want to sponsor this newsletter? Get in touch here.

You received this message because you are subscribed to Bloomberg's Tech Daily newsletter. If a friend forwarded you this message, sign up here to get it in your inbox.

Bloomberg L.P. 731 Lexington Avenue, New York, NY 10022

Email Content Statistics

Subject Line Length

Data show that subject lines of 6 to 10 words generate a 21 percent higher open rate.

Number of Words

The more words in the content, the more time the user will need to spend reading. Get straight to the point with catchy short phrases and interesting photos and graphics.

Number of Images

More images or large images might cause the email to load slower. Aim for a balance of words and images.

Time to Read

Longer reading time requires more attention and patience from users. Aim for short phrases and catchy keywords.

Predicted open rate

Spam Score

Spam score is determined by a large number of checks performed on the content of the email. For the best delivery results, it is advised to lower your spam score as much as possible.

Flesch reading score

Flesch reading score measures how complex a text is. The lower the score, the more difficult the text is to read. The Flesch readability score uses the average length of your sentences (measured by the number of words) and the average number of syllables per word in an equation to calculate the reading ease. Text with a very high Flesch reading ease score (about 100) is straightforward and easy to read, with short sentences and no words of more than two syllables. Usually, a reading ease score of 60-70 is considered acceptable/normal for web copy.
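
As a rough illustration of the equation described above, the sketch below computes the standard Flesch reading ease formula: 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word). The function names and the vowel-group syllable counter are assumptions made for illustration only; the tool behind these statistics may split sentences and count syllables differently, so its scores will not necessarily match.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels.

    This is a heuristic assumption, not a dictionary lookup, so words
    with silent vowels (for example "make") will be overcounted.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Return the Flesch reading ease score for a piece of English text."""
    # Treat any run of ., ! or ? as a sentence boundary.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat. The dog ran."), 2))
```

With this heuristic, a text made of short sentences and one-syllable words scores above 100, while long sentences full of polysyllabic words push the score down, which is the behavior the description above relies on.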

Technologies

What powers this email? Every email we receive is parsed to determine the sending ESP and any additional email technologies used.

Email Size (not including images)

Font Used

No. Font Name

Copyright © 2019–2024 SimilarMail.