AI

AI Models Face Collapse If They Overdose On Their Own Output

According to a new study published in Nature, researchers found that training AI models on AI-generated datasets can lead to "model collapse," where models produce increasingly nonsensical outputs over generations. "In one example, a model started with a text about European architecture in the Middle Ages and ended up -- in the ninth generation -- spouting nonsense about jackrabbits," writes The Register's Lindsay Clark. From the report: [W]ork led by Ilia Shumailov, a Google DeepMind and Oxford post-doctoral researcher, found that an AI model may fail to pick up less common lines of text in its training data, which means subsequent models trained on that output cannot carry forward those nuances. Training new models on the output of earlier models in this way ends up in a recursive loop. In an accompanying article, Emily Wenger, assistant professor of electrical and computer engineering at Duke University, illustrated model collapse with the example of a system tasked with generating images of dogs. "The AI model will gravitate towards recreating the breeds of dog most common in its training data, so might over-represent the Golden Retriever compared with the Petit Basset Griffon Vendéen, given the relative prevalence of the two breeds," she said.

"If subsequent models are trained on an AI-generated data set that over-represents Golden Retrievers, the problem is compounded. With enough cycles of over-represented Golden Retriever, the model will forget that obscure dog breeds such as Petit Basset Griffon Vendéen exist and generate pictures of just Golden Retrievers. Eventually, the model will collapse, rendering it unable to generate meaningful content." While she concedes an over-representation of Golden Retrievers may be no bad thing, the process of collapse is a serious problem for meaningful representative output that includes less-common ideas and ways of writing. "This is the problem at the heart of model collapse," she said.
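Wenger's dog-breed example can be sketched as a toy simulation: repeatedly re-estimate a categorical distribution from a finite sample of its own output, and rare categories tend to drift toward extinction. This is only an illustration of the sampling effect she describes, not the paper's actual method; the breed labels, probabilities, and sample sizes below are made up.

```python
import random

def next_generation(dist, n_samples, rng):
    """One round of 'training on your own output': draw n_samples from
    the current distribution and re-estimate it from the counts.
    A category that happens to draw zero samples is gone for good,
    because later generations assign it zero probability."""
    labels = list(dist)
    weights = [dist[label] for label in labels]
    counts = dict.fromkeys(labels, 0)
    for label in rng.choices(labels, weights=weights, k=n_samples):
        counts[label] += 1
    return {label: counts[label] / n_samples for label in labels}

rng = random.Random(0)
dist = {"golden_retriever": 0.95, "petit_basset_griffon_vendeen": 0.05}
for generation in range(1, 51):
    dist = next_generation(dist, n_samples=100, rng=rng)
    if dist["petit_basset_griffon_vendeen"] == 0.0:
        print(f"rare breed extinct by generation {generation}")
        break
```

With a small sample per generation the rare breed's share performs a random walk with an absorbing barrier at zero, which is the compounding loss the article describes.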
AI

Video Game Performers Will Go On Strike Over AI Concerns (apnews.com)

An anonymous reader quotes a report from the Associated Press: Hollywood's video game performers voted to go on strike Thursday, throwing part of the entertainment industry into another work stoppage after talks for a new contract with major game studios broke down over artificial intelligence protections. The strike -- the second for video game voice actors and motion capture performers under the Screen Actors Guild-American Federation of Television and Radio Artists -- will begin at 12:01 a.m. Friday. The move comes after nearly two years of negotiations with gaming giants, including divisions of Activision, Warner Bros. and Walt Disney Co., over a new interactive media agreement.

SAG-AFTRA negotiators say gains have been made over wages and job safety in the video game contract, but that the studios will not make a deal over the regulation of generative AI. Without guardrails, game companies could train AI to replicate an actor's voice, or create a digital replica of their likeness without consent or fair compensation, the union said. Fran Drescher, the union's president, said in a prepared statement that members would not approve a contract that would allow companies to "abuse AI." "Enough is enough. When these companies get serious about offering an agreement our members can live -- and work -- with, we will be here, ready to negotiate," Drescher said. [...]

The last interactive contract, which expired in November 2022, did not provide protections around AI but secured a bonus compensation structure for voice actors and performance capture artists after an 11-month strike that began in October 2016. That work stoppage marked the first major labor action from SAG-AFTRA following the merger of Hollywood's two largest actors unions in 2012. The video game agreement covers more than 2,500 "off-camera (voiceover) performers, on-camera (motion capture, stunt) performers, stunt coordinators, singers, dancers, puppeteers, and background performers," according to the union. Amid the tense interactive negotiations, SAG-AFTRA created a separate contract in February that covered indie and lower-budget video game projects. The tiered-budget independent interactive media agreement contains some of the protections on AI that video game industry titans have rejected.
"Eighteen months of negotiations have shown us that our employers are not interested in fair, reasonable AI protections, but rather flagrant exploitation," said Interactive Media Agreement Negotiating Committee Chair Sarah Elmaleh. The studios have not commented.
AI

iFixit CEO Takes Shots At Anthropic For 'Hitting Our Servers a Million Times In 24 Hours' (pcgamer.com)

Yesterday, iFixit CEO Kyle Wiens asked AI company Anthropic why it was clogging up their server bandwidth without permission. "Do you really need to hit our servers a million times in 24 hours?" Wiens wrote on X. "You're not only taking our content without paying, you're tying up our DevOps resources. Not cool." PC Gamer's Jacob Fox reports: Assuming Wiens isn't massively exaggerating, it's no surprise that this is "tying up our DevOps resources." A million "hits" per day would do it, and would certainly be enough to justify more than a little annoyance. The thing is, putting this bandwidth chugging in context only makes it more ridiculous, which is what Wiens is getting at. It's not just that an AI company is seemingly clogging up server resources, but that it's been expressly forbidden from using the content on its servers anyway.

There should be no reason for an AI company to hit the iFixit site because its terms of service state that "copying or distributing any Content, materials or design elements on the Site for any other purpose, including training a machine learning or AI model, is strictly prohibited without the express prior written permission of iFixit." Unless it wants us to believe it's not going to use any data it scrapes for these purposes, and it's just doing it for... fun?

Well, whatever the case, iFixit's Wiens decided to have some fun with it and ask Anthropic's own AI, Claude, about the matter, saying to Anthropic, "Don't ask me, ask Claude!" It seems that Claude agrees with iFixit, because when asked what it should do if it were training a machine learning model and found the above wording in a site's terms of service, it responded, in no uncertain terms, "Do not use the content." This is, as Wiens points out, something that could be seen if one simply accessed the terms of service.

AI

OpenAI To Launch 'SearchGPT' in Challenge To Google

OpenAI is launching an online search tool in a direct challenge to Google, opening up a new front in the tech industry's race to commercialise advances in generative artificial intelligence. From a report: The experimental product, known as SearchGPT [non-paywalled], will initially only be available to a small group of users, with the San Francisco-based company opening a 10,000-person waiting list to test the service on Thursday. The product is visually distinct from ChatGPT as it goes beyond generating a single answer by offering a rail of links -- similar to a search engine -- that allows users to click through to external websites.

[...] SearchGPT will "provide up-to-date information from the web while giving you clear links to relevant sources," according to OpenAI. The new search tool will be able to access sites even if they have opted out of training OpenAI's generative AI tools, such as ChatGPT.
Google

Google DeepMind's AI Systems Can Now Solve Complex Math Problems (technologyreview.com)

Google DeepMind has announced that its AI systems, AlphaProof and AlphaGeometry 2, have achieved silver medal performance at the 2024 International Mathematical Olympiad (IMO), solving four out of six problems and scoring 28 out of 42 possible points in a significant breakthrough for AI in mathematical reasoning. This marks the first time an AI system has reached such a high level of performance in this prestigious competition, which has long been considered a benchmark for advanced mathematical reasoning capabilities in machine learning.

AlphaProof, a system that combines a pre-trained language model with reinforcement learning techniques, demonstrated its new capability by solving two algebra problems and one number theory problem, including the competition's most challenging question. Meanwhile, AlphaGeometry 2 successfully tackled a complex geometry problem, Google wrote in a blog post. The systems' solutions were formally verified and scored by prominent mathematicians, including Fields Medal winner Prof Sir Timothy Gowers and IMO Problem Selection Committee Chair Dr Joseph Myers, lending credibility to the achievement.

The development of these AI systems represents a significant step forward in bridging the gap between natural language processing and formal mathematical reasoning, the company argued. By fine-tuning a version of Google's Gemini model to translate natural language problem statements into formal mathematical language, the researchers created a vast library of formalized problems, enabling AlphaProof to train on millions of mathematical challenges across various difficulty levels and topic areas. While the systems' performance is impressive, challenges remain, particularly in the field of combinatorics where both AI models were unable to solve the given problems. Researchers at Google DeepMind continue to investigate these limitations, the company said, aiming to further improve the systems' capabilities across all areas of mathematics.
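To make the translation step concrete: formalizing a problem means restating it in a proof language whose statements a machine can verify. AlphaProof works in the Lean proof assistant; the following is an illustrative toy example only (a trivial statement, not one of the IMO problems, and lemma names may differ across Lean/Mathlib versions).

```lean
-- Natural-language statement: "the sum of two even integers is even."
-- Formalized in Lean 4; the proof exhibits the witness m + n.
theorem sum_of_evens (a b : Int)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  obtain ⟨m, hm⟩ := ha
  obtain ⟨n, hn⟩ := hb
  exact ⟨m + n, by rw [hm, hn, Int.mul_add]⟩
```

Because the proof checker accepts or rejects each candidate proof mechanically, verified proofs like this can serve as a reward signal for reinforcement learning at scale.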
AI

AI Video Generator Runway Trained On Thousands of YouTube Videos Without Permission (404media.co)

samleecole writes: A leaked document obtained by 404 Media shows a company-wide effort at generative AI company Runway, where employees collected thousands of YouTube videos and pirated content for training data for its Gen-3 Alpha model. The model -- initially codenamed Jupiter and released officially as Gen-3 -- drew widespread praise from the AI development community and technology outlets covering its launch when Runway released it in June. Last year, Runway raised $141 million from investors including Google and Nvidia, at a $1.5 billion valuation.

The spreadsheet of training data viewed by 404 Media and our testing of the model indicates that part of its training data is popular content from the YouTube channels of thousands of media and entertainment companies, including The New Yorker, VICE News, Pixar, Disney, Netflix, Sony, and many others. It also includes links to channels and individual videos belonging to popular influencers and content creators, including Casey Neistat, Sam Kolder, Benjamin Hardman, Marques Brownlee, and numerous others.

Security

Cyber Firm KnowBe4 Hired a Fake IT Worker From North Korea (cyberscoop.com)

In a blog post on Tuesday, security firm KnowBe4 revealed that a remote software engineer hire was a North Korean threat actor using a stolen identity and AI-augmented images. "Detailing a seemingly thorough interview process that included background checks, verified references and four video conference-based interviews, KnowBe4 founder and CEO Stu Sjouwerman said the worker avoided being caught by using a valid identity that was stolen from a U.S.-based individual," reports CyberScoop. "The scheme was further enhanced by the actor using a stock image augmented by artificial intelligence." From the report: An internal investigation started when KnowBe4's InfoSec Security Operations Center team detected "a series of suspicious activities" from the new hire. The remote worker was sent an Apple laptop, which was flagged by the company on July 15 when malware was loaded onto the machine. The AI-filtered photo, meanwhile, was flagged by the company's Endpoint Detection and Response software. Later that evening, the SOC team had "contained" the fake worker's systems after he stopped responding to outreach. During a roughly 25-minute period, "the attacker performed various actions to manipulate session history files, transfer potentially harmful files, and execute unauthorized software," Sjouwerman wrote in the post. "He used a [single-board computer] raspberry pi to download the malware." From there, the company shared its data and findings with the FBI and with Mandiant, the Google-owned cyber firm, and came to the conclusion that the worker was a fictional persona operating from North Korea.

KnowBe4 said the fake employee likely had his workstation connected "to an address that is basically an 'IT mule laptop farm.'" They'd then use a VPN to work the night shift from where they actually reside -- in this case, North Korea "or over the border in China." That work would take place overnight, making it appear that they're logged on during normal U.S. business hours. "The scam is that they are actually doing the work, getting paid well, and give a large amount to North Korea to fund their illegal programs," Sjouwerman wrote. "I don't have to tell you about the severe risk of this." Despite the intrusion, Sjouwerman said "no illegal access was gained, and no data was lost, compromised, or exfiltrated on any KnowBe4 systems." He chalked up the incident to a threat actor that "demonstrated a high level of sophistication in creating a believable cover identity" and identified "weaknesses in the hiring and background check processes."

AI

Open Source AI Better for US as China Will Steal Tech Anyway, Zuckerberg Argues (fb.com)

Meta CEO Mark Zuckerberg has advocated for open-source AI development, asserting it as a strategic advantage for the United States against China. In a blog post, Zuckerberg argued that closing off AI models would not effectively prevent Chinese access, given their espionage capabilities, and would instead disadvantage U.S. allies and smaller entities. He writes: Our adversaries are great at espionage, stealing models that fit on a thumb drive is relatively easy, and most tech companies are far from operating in a way that would make this more difficult. It seems most likely that a world of only closed models results in a small number of big companies plus our geopolitical adversaries having access to leading models, while startups, universities, and small businesses miss out on opportunities. Plus, constraining American innovation to closed development increases the chance that we don't lead at all. Instead, I think our best strategy is to build a robust open ecosystem and have our leading companies work closely with our government and allies to ensure they can best take advantage of the latest advances and achieve a sustainable first-mover advantage over the long term.
AI

The AI Job Interviewer Will See You Now

AI is increasingly being employed in job interviews across China and India, marking a significant shift in recruitment practices in the region. This follows a similar practice making inroads in the U.S. Rest of World adds: A 2023 survey of 1,000 human-resources workers by the U.S. firm ResumeBuilder found that 10% of companies were already using AI in the hiring process, and another 30% planned to start the following year. The research firm Gartner listed natural-language chatbots as one of 2023's key innovations for the recruiting industry, designating the technology as experimental but promising. Companies like Meituan, Siemens, and Estee Lauder are using AI-powered interviews, with platforms such as MoSeeker, Talently.ai, and Instahyre leading the charge in AI recruitment solutions.
Google

Google's Exclusive Reddit Access (404media.co)

Google is now the only search engine that can surface results from Reddit, making one of the web's most valuable repositories of user-generated content exclusive to the internet's already dominant search engine. 404 Media: If you use Bing, DuckDuckGo, Mojeek, Qwant or any other alternative search engine that doesn't rely on Google's indexing and search Reddit by using "site:reddit.com," you will not see any results from the last week.

DuckDuckGo is currently turning up seven links when searching Reddit, but provides no data on where the links go or why, instead only saying that "We would like to show you a description here but the site won't allow us." Older results will still show up, but these search engines are no longer able to "crawl" Reddit, meaning that Google is the only search engine that will turn up results from Reddit going forward. Searching for Reddit still works on Kagi, an independent, paid search engine that buys part of its search index from Google. The news shows how Google's near monopoly on search is now actively hindering other companies' ability to compete at a time when Google is facing increasing criticism over the quality of its search results.
The news follows Google signing a $60 million deal with Reddit early this year to use the social network's content to train its LLMs.
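The enforcement mechanism for this kind of crawler exclusion is ordinarily the robots exclusion protocol: a compliant crawler fetches a site's robots.txt and checks each URL against its rules before crawling. A minimal sketch using Python's standard library follows; the rules and bot name here are hypothetical examples, not Reddit's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out every compliant crawler.
# (A site granting access to specific partners would add separate
# User-agent sections, or enforce access outside robots.txt entirely.)
rules = """
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Any well-behaved bot checking before crawling gets a refusal:
print(rp.can_fetch("SomeSearchBot", "https://www.reddit.com/r/technology/"))
```

Non-compliant crawlers can simply ignore these rules, which is why robots.txt is a convention backed by norms and contracts rather than a technical barrier.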
AI

OpenAI Could Lose $5 Billion This Year

OpenAI has built one of the fastest-growing businesses in history. It may also be one of the costliest to run. The Information: The ChatGPT maker could lose as much as $5 billion this year [non-paywalled source], according to an analysis by The Information, based on previously undisclosed internal financial data and people involved in the business. [...] On the cost side, OpenAI as of March was on track to spend nearly $4 billion this year on renting Microsoft's servers to power ChatGPT and its underlying LLMs (otherwise known as inference costs), said a person with direct knowledge of the spending. In addition to running ChatGPT, OpenAI's training costs -- including paying for data -- could balloon to as much as $3 billion this year. Last year, OpenAI ramped up the training of new AI faster than it had originally planned, said a person with direct knowledge of the decision. So while the company earlier planned to spend about $800 million on such costs, it ended up spending considerably more, this person said.
AI

AI Adoption Creeps as Enterprises Wrestle With Costs and Use Cases

Global enterprises are grappling with the complexities of AI adoption, according to hundreds of top industry executives at a recent private software conference hosted by UBS. UBS adds: We heard:
1. The data points from a private GPU cloud infrastructure provider were a very bullish readthrough to GPU demand, Microsoft's AI infra capabilities and the ramp of enterprise/software demand for training and inference compute.
2. One F500 customer was at 1% Office Copilot roll-out, moving to perhaps 2% in a year as they a) fine-tune internal best practices and b) negotiate to get Microsoft much lower on price.
3. One private flagged "copilot chaos," with customers having to choose between AI copilots from seemingly every tech firm (we wonder if this creates pricing pressure and/or an evaluation slowdown).
4. Popular use cases are AI apps for internal, domain-specific tasks (simple workflow automation).
5. Little evidence of AI resulting in customer headcount cuts, but headcount reduction with 3rd-party managed services providers and (India-based) SI firms.
AI

Mark Zuckerberg Imagines Content Creators Making AI Clones of Themselves (techcrunch.com)

An anonymous reader quotes a report from TechCrunch: Content creators are busy people. Most spend more than 20 hours a week creating new content for their respective corners of the web. That doesn't leave much time for audience engagement. But Mark Zuckerberg, Meta's CEO, thinks that AI could solve this problem. In an interview with internet personality Rowan Cheung, Zuckerberg laid out his vision for a future in which creators have their own bots, of sorts, that capture their personalities and "business objectives." Creators will offload some community outreach to these bots to free up time for other, presumably more important tasks, Zuckerberg says.

"I think there's going to be a huge unlock where basically every creator can pull in all their information from social media and train these systems to reflect their values and their objectives and what they're trying to do, and then people can interact with that," Zuckerberg said. "It'll be almost like this artistic artifact that creators create that people can kind of interact with in different ways." [...] It's tough to imagine creators putting trust in the hands of flawed AI bots to interact with their fans. In the interview, Zuckerberg acknowledges that Meta has to "mitigate some of the concerns" around its use of generative AI and win users' trust over the long term. This is especially true as some of Meta's AI training practices are actively driving creators away from its platforms.

China

China Is Getting Secretive About Its Supercomputers

For decades, American and Chinese scientists collaborated on supercomputers. But Chinese scientists have become more secretive as the U.S. has tried to hinder China's technological progress, and they have stopped participating altogether in a prominent international supercomputing forum. From a report: The withdrawal marked the end of an era and created a divide that Western scientists say will slow the development of AI and other technologies as countries pursue separate projects. The new secrecy also makes it harder for the U.S. government to answer a question it deems essential to national security: Does the U.S. or China have faster supercomputers? Some academics have taken it upon themselves to hunt for clues about China's supercomputing progress, scrutinizing research papers and cornering Chinese peers at conferences.

Supercomputers have become central to the U.S.-China technological Cold War because the country with the faster supercomputers can also hold an advantage in developing nuclear weapons and other military technology. "If the other guy can use a supercomputer to simulate and develop a fighter jet or weapon 20% or even 1% better than yours in terms of range, speed and accuracy, it's going to target you first, and then it's checkmate," said Jimmy Goodrich, a senior adviser for technology analysis at Rand, a think tank. The forum that China recently stopped participating in is called the Top500, which ranks the world's 500 fastest supercomputers. While the latest ranking, released in June, says the world's three fastest computers are in the U.S., the reality is probably different.
Programming

'GitHub Is Starting To Feel Like Legacy Software' (www.mistys-internet.website)

Developer and librarian Misty De Meo, writing about her frustrating experience using GitHub: To me, one of GitHub's killer power user features is its blame view. git blame on the command line is useful but hard to read; it's not the interface I reach for every day. GitHub's web UI is not only convenient, but the ease by which I can click through to older versions of the blame view on a line-by-line basis is uniquely powerful. It's one of those features that anchors me to a product: I stopped using offline graphical git clients because it was just that much nicer.

The other day though, I tried to use the blame view on a large file and ran into an issue I don't remember seeing before: I just couldn't find the line of code I was searching for. I threw various keywords from that line into the browser's command+F search box, and nothing came up. I was stumped until, a moment later, while idly scrolling the page and running the search again, it finally found the line I was looking for. I realized what must have happened. I'd heard rumblings that GitHub's in the middle of shipping a frontend rewrite in React, and I realized this must be it. The problem wasn't that the line I wanted wasn't on the page -- it's that the whole document wasn't being rendered at once, so my browser's built-in search bar just couldn't find it. On a hunch, I tried disabling JavaScript entirely in the browser, and suddenly it started working again. GitHub is able to send a fully server-side rendered version of the page, which actually works like it should, but doesn't do so unless JavaScript is completely unavailable.

[...] The corporate branding, the new "AI-powered developer platform" slogan, makes it clear that what I think of as "GitHub" -- the traditional website, what are to me the core features -- simply isn't Microsoft's priority at this point in time. I know many talented people at GitHub who care, but the company's priorities just don't seem to value what I value about the service. This isn't an anti-AI statement so much as a recognition that the tool I still need to use every day is past its prime. Copilot isn't navigating the website for me, replacing my need to use the website as it exists today. I've had tools hit this phase of decline and turn it around, but I'm not optimistic. It's still plenty usable now, and probably will be for some years to come, but I'll want to know what other options I have now rather than when things get worse than this.

AI

AI Is Already Taking Jobs In the Video Game Industry (wired.com)

merbs writes: Video games -- and the people who make them -- are in trouble. An estimated 10,500 people in the industry were laid off in 2023 alone. This year, layoffs in the nearly $200 billion sector have only gotten worse, with studios axing what is believed to be 11,000 more, and counting. Microsoft, home of the Xbox and parent company to several studios, including Activision Blizzard, shuttered Tango Gameworks and Alpha Dog Games in May. All the while, generative AI systems built by OpenAI and its competitors have been seeping into nearly every industry, dismantling whole careers along the way.

But gaming might be the biggest industry AI stands poised to conquer. Its economic might has long since eclipsed Hollywood's, while its workforce remains mostly nonunion. A recent survey from the organizers of the Game Developers Conference found that 49 percent of the survey's more than 3,000 respondents said their workplace used AI, and four out of five said they had ethical concerns about its use. "It's here. It's definitely here, right now," says Violet, a game developer, technical artist, and a veteran of the industry who has worked on AAA games for over a decade. "I think everyone's seen it get used, and it's a matter of how and to what degree. The genie is out of the bottle, Pandora's box is opened."
The story adds: "At Activision, it was the same. 'A lot of 2D artists were laid off,' Noah says. The department was slashed. 'Remaining concept artists,' he claims, 'were then forced to use AI to aid in their work.' Employees, according to Noah, have been made to sign up for AI trainings, and its use is being promoted throughout the org."
Businesses

FTC Launches Probe Into 'Surveillance Pricing'

smooth wombat writes: The FTC has sent mandatory notices for information to eight companies it says engage in "surveillance pricing," the process by which prices are rapidly changed using AI based on data about customer behavior and characteristics. This process, the FTC claims, allows companies to charge different customers different prices for the same product.

The list includes Mastercard, JPMorgan Chase, Accenture and consulting giant McKinsey. It also includes software firm Task, which counts McDonald's and Starbucks as clients; Revionics, which works with Home Depot, Tractor Supply and grocery chain Hannaford; Bloomreach, which services FreshDirect, Total Wine and Puma; and Pros, which was named Microsoft's independent software vendor of the year this year. "Firms that harvest Americans' personal data can put people's privacy at risk," FTC Chair Lina Khan said in a news release. "Now firms could be exploiting this vast trove of personal information to charge people higher prices."
Facebook

Meta Warns EU Regulatory Efforts Risk Bloc Missing Out on AI Advances

Meta has warned that the EU's approach to regulating AI is creating the "risk" that the continent is cut off from accessing cutting-edge services, while the bloc continues its effort to rein in the power of Big Tech. From a report: Rob Sherman, the social media group's deputy privacy officer and vice-president of policy, confirmed a report that it had received a request from the EU's privacy watchdog to voluntarily pause the training of its future AI models on data in the region. He told the Financial Times this was in order to give local regulators time to "get their arms around the issue of generative AI." While the Facebook owner is adhering to the request, Sherman said such moves were leading to a "gap in the technologies that are available in Europe versus" the rest of the world. He added that, with future and more advanced AI releases, "it's likely that availability in Europe could be impacted." Sherman said: "If jurisdictions can't regulate in a way that enables us to have clarity on what's expected, then it's going to be harder for us to offer the most advanced technologies in those places ... it is a realistic outcome that we're worried about."
Iphone

Apple Moves Forward With Foldable iPhone (theinformation.com)

Apple is advancing its plans for a foldable iPhone, with potential release as early as 2026, The Information reported Tuesday. The iPhone-maker has begun engaging with Asian suppliers for component production, the report added. The proposed device is said to feature a clamshell design, reminiscent of Samsung's Galaxy Z Flip series.

The company faces considerable technical hurdles, including display crease issues and achieving optimal device thickness. Despite these challenges, the assignment of an internal codename, V68, suggests the project has progressed beyond the conceptual stage, the report added.
AI

Meta Launches Powerful Open-Source AI Model Llama 3.1

Meta has released Llama 3.1, its largest open-source AI model to date, in a move that challenges the closed approaches of competitors like OpenAI and Google. The new model, boasting 405 billion parameters, is claimed by Meta to outperform GPT-4o and Claude 3.5 Sonnet on several benchmarks, with CEO Mark Zuckerberg predicting that Meta AI will become the most widely used assistant by year-end.

Llama 3.1, which Meta says was trained using over 16,000 Nvidia H100 GPUs, is being made available to developers through partnerships with major tech companies including Microsoft, Amazon, and Google, potentially reducing deployment costs compared to proprietary alternatives. The release includes smaller versions with 70 billion and 8 billion parameters, and Meta is introducing new safety tools to help developers moderate the model's output. While Meta isn't disclosing exactly what data it used to train its models, the company confirmed it used synthetic data to enhance the model's capabilities. The company is also expanding its Meta AI assistant, powered by Llama 3.1, to support additional languages and integrate with its various platforms, including WhatsApp, Instagram, and Facebook, as well as its Quest virtual reality headset.
