AI tools were trained on scraped content full of toxicity and hate speech. Here's how we can fix it.
[❤️ Mozilla ❤️](
[Donate](

Hello,

ChatGPT and other generative AI tools were trained on a huge dataset full of toxic content and hate speech, according to new research by Mozilla.1 The huge dataset, totaling 9.5 million gigabytes and assembled by the small non-profit organisation Common Crawl, is the original data source for many of the large language models (LLMs) that make up the AI landscape of today's internet.

And now OpenAI, Microsoft and Google are rolling out AI tools to be used by people worldwide, built on data scraped from some of the worst parts of the internet. These tools are both biased, because they were trained on toxic content, and opaque, because we don't know exactly what content they were trained on.

Almost every other product we use or consume on a daily basis has a safety warning label or an ingredients list. As customers, why shouldn't we have the right to know what's inside the AI tools we are using?

Together, let's use our power as consumers and put pressure on OpenAI, Google, and Microsoft to tell us what's inside their AI.

[Sign Mozilla's petition and tell OpenAI, Google, and Microsoft to provide transparency about the data used to train their AI tools!](

[Sign Now →](

Common Crawl has been crawling and archiving the internet virtually undetected, and its data is now used to train AI. Today, it's the most influential nonprofit you've never heard of.

We're at an inflection point for AI, and Mozilla's investigation has uncovered structural flaws in the way Common Crawl is currently used to train AI models. The main problems are:

- Common Crawl's data represents only a fraction of the internet: it primarily captures English-language content, which means AI tools trained on it are helpful only for a narrow part of the population and have a biased perspective.
- Common Crawl's data contains hate speech and explicit content that is harmful when used to train consumer products without care.
- Common Crawl hands its dataset to companies and then walks away. That means companies like OpenAI, Google, and Microsoft are accountable for explaining how they filtered Common Crawl's data, what effect the data has on their AI products, and what measures they take to address harms from biased and explicit datasets.

When it comes to building trustworthy AI products, better is possible. We need to know the totality of how AI is trained so we understand its risks and limitations, and, most importantly, what needs to be improved to make it trustworthy and helpful for everyone on the internet. Better must start with more transparency from the big tech companies responsible for training AI models.

[Tell OpenAI, Google, and Microsoft to provide transparency about the data used to train their AI tools.](

[Sign Now →](

Thank you for all you do for the internet.

Christian Bock
Head of Supporter Engagement
Mozilla

---------------------------------------------------------------

More information:

1. Mozilla Foundation: [Training Data for the Price of a Sandwich: Common Crawl's Impact on Generative AI](. Written by Stefan Baack and Mozilla Insights. Published 6 February 2024.

Connect with us [Twitter]( [YouTube]( [Instagram](

Thanks for reading! You're receiving this email because you subscribed to Mozilla News. If you no longer want to receive our emails, we'll understand if you [unsubscribe](. You can also [update your email preferences]( at any time.

149 New Montgomery St, 4th Floor, San Francisco, CA 94105 USA

[Legal]( • [Privacy]( [Unsubscribe](