Newsletter Subject

Stumping the AI

From

bloomberg.com

Email Address

noreply@news.bloomberg.com

Sent On

Fri, Aug 16, 2024 11:05 AM

Email Preheader Text

Hi, it’s Rya in San Francisco. Are you a New York Times word game enthusiast? The Mini Crosswor

Hi, it’s Rya in San Francisco. Are you a New York Times word game enthusiast? The Mini Crossword, Wordle and the infuriating Connections puzzle have drawn many fans, and some of them tested how AI might fare. But first...

Three things you need to know today:

• Autodesk kept going with a risky sales practice after promising to stop
• China’s AMEC is suing the Pentagon as it tries to void US sanctions
• A CIA fund backed Yale scientists developing quantum error correction

AIn’t so smart

Like many people on the internet, I have a love-hate relationship with Connections. For the uninitiated, the game takes place on a 4x4 virtual grid with sixteen words placed on it. The player’s job is to group those into sets of four, with each grouping becoming progressively harder. An easy set could include synonyms for conformists (followers, lemmings, puppets, and sheep), while a more challenging selection might be the phonological names of cities, such as deli, niece, roam and soul.

If this metropolis grouping strikes you as bizarre, you aren’t alone. The game has become notorious for its brain-bending test of abstract reasoning skills. Players have made a pastime of dunking on Connections, writing on the social media site X that the game “chose violence today,” “deserves jail time” and is conditioning people to “find patterns that aren’t there.”

But take solace that artificial intelligence bots aren’t faring much better than us. They can only solve the entire game 8% of the time.

We know this because a group of students in a computer science class at Barnard College decided to test the Connections skills of chatbots. They asked the latest models from OpenAI, Alphabet Inc.’s Google, Anthropic and Meta Platforms Inc. to solve 200 games, and found their performance was worse than human novices and much worse than human experts.

It soon dawned on the students that their project wasn’t just nerdy fun. They had stumbled upon a sophisticated way to test chatbots’ reasoning abilities, which is precisely what researchers are trying to measure and companies are trying to improve.

At a recent all-hands OpenAI meeting, leadership told employees that the startup was on the cusp of its systems becoming “reasoners,” meaning they can do basic problem solving. Executives showed a demonstration of how OpenAI’s most advanced systems can answer word problems that have stumped models in the past.

While it isn’t clear if Connections was one of those word problems, the research by the Barnard students, who developed the class project with their professor into an academic paper, establishes this viral internet game as a valuable and challenging benchmark for AI’s reasoning abilities.

Connections is designed to test different types of knowledge: encyclopedic, semantic, associative and linguistic. For the 200 games, the researchers classified the types of knowledge required to solve each category so they could test how well AI can solve different types of problems. They found that while AI is good at solving some problems involving semantic knowledge, other categories are much more challenging.

For example, AI can easily group together followers, lemmings, puppets and sheep, because they share the same broad semantic meaning. However, it found associative categories harder, such as basketball, carrot, goldfish and pumpkin (things that are orange), and got stumped by categories that combine knowledge types like deli, niece, roam and soul, which requires linguistic and encyclopedic knowledge.

“When it is required to think outside the box, or do any kind of divergent thinking, it struggles a lot,” said research scientist Tuhin Chakrabarty, who was a teaching assistant for the Barnard class and a co-author of the paper. The team’s findings can be used by researchers to improve specific kinds of abstract reasoning in their models, he added.

The game designers of Connections intentionally place “red herrings” or distractors on the grid to confuse players. AI often falls into the trap of these red herrings, because it leaps into solving the game step by step without considering the big picture.

“It's not good at viewing the whole puzzle as a problem in itself, which is one of the biggest shortcomings,” said Mariam Mustafa, one of the Barnard students and a co-author of the paper.

If a grid has Monday, Tuesday, Wednesday and Thursday, the AI will likely group them together without considering that the grid also contains Morticia, Gomez and Pugsley, all Addams family characters that could be grouped with Wednesday (the daughter in the family).

Because AI is trained to produce the most likely next word, “it will say the thing that is most obvious without exploring all 16 words,” said Chakrabarty. “It is abstract reasoning in the presence of distractors; that is super hard for humans, and for LLMs it’s even harder.”

While AI companies continue working to improve their models’ reasoning skills, the takeaway for the researchers in the current moment is clear: Even after ingesting all of this data, AI still can’t solve the puzzle that everyone loves to hate.

- [Rya Jetha](mailto:rjetha1@bloomberg.net)

The big story

Google now displays convenient AI-based answers at the top of its search pages, meaning users may never click through to the websites whose data is being used to power those results. But many site owners say they can’t afford to block Google’s AI from summarizing their content, because blocking the AI would also hamper a site’s ability to be discovered online.

Get fully charged

Chinese tech stocks rose after JD.com beat expectations and Alibaba held steady against a stubbornly reluctant consumer demand.

BetMGM betting is coming to Brazil in early 2025, if a joint venture receives a license from the government this fall.

Starlink rival AST jumped more than 50% to close at a record after confirming an early September window for inaugural commercial launch.

More from Bloomberg

Get Bloomberg Tech weeklies in your inbox:

- Cyber Bulletin for coverage of the shadow world of hackers and cyber-espionage
- Game On for reporting on the video game business
- Power On for Apple scoops, consumer tech news and more
- Screentime for a front-row seat to the collision of Hollywood and Silicon Valley
- Soundbite for reporting on podcasting, the music industry and audio trends
- Q&AI for answers to all your questions about AI

Stay updated by saving our new email address

Our email address is changing, which means you’ll be receiving this newsletter from noreply@news.bloomberg.com. Here’s how to update your contacts to ensure you continue receiving it:

- Gmail: Open an email from Bloomberg, click the three dots in the top right corner, select “Mark as important.”
- Outlook: Right-click on Bloomberg’s email address and select “Add to Outlook Contacts.”
- Apple Mail: Open the email, click on Bloomberg’s email address, and select “Add to Contacts” or “Add to VIPs.”
- Yahoo Mail: Open an email from Bloomberg, hover over the email address, click “Add to Contacts.”

Like getting this newsletter? Subscribe to Bloomberg.com for unlimited access to trusted, data-driven journalism and subscriber-only insights.

Want to sponsor this newsletter? Get in touch here.

You received this message because you are subscribed to Bloomberg's Tech Daily newsletter. If a friend forwarded you this message, sign up here to get it in your inbox.

Bloomberg L.P. 731 Lexington Avenue, New York, NY 10022
