AI

Study Accuses LM Arena of Helping Top AI Labs Game Its Benchmark (techcrunch.com) 10

An anonymous reader shares a report: A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of helping a select group of AI companies achieve better leaderboard scores at the expense of rivals.

According to the authors, LM Arena allowed some industry-leading AI companies like Meta, OpenAI, Google, and Amazon to privately test several variants of AI models, then not publish the scores of the lowest performers. This made it easier for these companies to achieve a top spot on the platform's leaderboard, though the opportunity was not afforded to every firm, the authors say.

"Only a handful of [companies] were told that this private testing was available, and the amount of private testing that some [companies] received is just so much more than others," said Cohere's VP of AI research and co-author of the study, Sara Hooker, in an interview with TechCrunch. "This is gamification."
Further reading: Meta Got Caught Gaming AI Benchmarks.
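
The selection effect the authors describe can be illustrated with a small simulation. This is only a sketch under made-up assumptions, not the paper's methodology: every variant is given the same true skill (1200), each leaderboard measurement adds Gaussian noise (standard deviation 15), and only the best of N privately tested variants is published. The function names and all numbers are hypothetical.

```python
import random
import statistics


def measured_score(true_skill: float, noise: float, rng: random.Random) -> float:
    """One noisy leaderboard measurement of a model with the given true skill."""
    return rng.gauss(true_skill, noise)


def published_score(n_variants: int, true_skill: float, noise: float,
                    rng: random.Random) -> float:
    """Privately test n_variants equally skilled variants; publish only the best."""
    return max(measured_score(true_skill, noise, rng) for _ in range(n_variants))


def average_published(n_variants: int, trials: int = 2000, seed: int = 0) -> float:
    """Average published score over many repetitions (assumed skill and noise)."""
    rng = random.Random(seed)
    return statistics.mean(
        published_score(n_variants, true_skill=1200.0, noise=15.0, rng=rng)
        for _ in range(trials)
    )


if __name__ == "__main__":
    # More private variants -> higher published score, with no real skill gain.
    for n in (1, 3, 10):
        print(n, round(average_published(n), 1))
```

Under these assumptions the published score rises with the number of private variants even though no variant is actually better, which is why unequal access to private testing would matter.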
AI

Duolingo Doubles Its Language Courses Thanks To AI 51

Just a day after announcing its shift to an "AI-first" strategy -- which includes phasing out contract workers in favor of automation -- Duolingo revealed it is more than doubling its course offerings by launching 148 new language courses. The Verge reports: The company said today that it's launching 148 new language courses. "This launch makes Duolingo's seven most popular non-English languages -- Spanish, French, German, Italian, Japanese, Korean, and Mandarin -- available to all 28 supported user interface (UI) languages, dramatically expanding learning options for over a billion potential learners worldwide," the company writes.

Duolingo says that building one new course historically has taken "years," but the company was able to build this new suite of courses more quickly "through advances in generative AI, shared content systems, and internal tooling." The new approach is internally called "shared content," and the company says it allows employees to make a base course and quickly customize it for "dozens" of different languages.
"Now, by using generative AI to create and validate content, we're able to focus our expertise where it's most impactful, ensuring every course meets Duolingo's rigorous quality standards," Duolingo's senior director of learning design, Jessie Becker, says in a statement.
Businesses

Microsoft Puts Brakes on AI Spending as Profit Increases 18% 7

After 10 consecutive quarters of rising AI-related investment, Microsoft has put on the brakes, spending over $1 billion less than the previous quarter. Despite the slight slowdown, Microsoft posted stronger-than-expected results with $70 billion in revenue and $25.8 billion in profit. The New York Times reports: In the first three months of 2025, Microsoft spent $21.4 billion on capital expenses, down more than $1 billion from the previous quarter. The company is still on track to spend more than $80 billion on capital expenses in the current fiscal year, which ends in June. But the pullback, though slight, is an indication that the tech industry's appetite for spending on A.I. is not limitless.

Overall, Microsoft's results showed unexpected strength in its business. Sales surpassed $70 billion, up 13 percent from the same period a year earlier. Profit rose to $25.8 billion, up 18 percent. The results far surpassed Wall Street's expectations. "Cloud and A.I. are the essential inputs for every business to expand output, reduce costs, and accelerate growth," Satya Nadella, Microsoft's chief executive, said in a statement.
Google

Google Funding Electrician Training As AI Power Crunch Intensifies 34

Google is investing in training over 100,000 new U.S. electricians through a $10 million grant, aiming to address a critical labor shortage driven by AI-fueled data center growth and rising electricity demands. Reuters reports: A lack of access to power supplies has become the biggest problem for giant technology companies racing to develop artificial intelligence in energy-intensive data centers, which are driving up U.S. electricity demand after nearly 20 years of stagnation. The situation has led President Donald Trump to declare a national energy emergency aimed at speeding up permitting for generation and transmission projects.

Google's funding, which includes a $10 million grant for electrical worker nonprofits, is the latest in a series of recent moves by giant technology companies to alleviate power project backlogs and electricity shortfalls across the United States. [...] The Google grant will be used for electrician apprenticeship programs and the training of existing workforce through organizations, including the Electrical Training Alliance, International Brotherhood of Electrical Workers and the National Electrical Contractors Association. It could increase the pipeline of electrical workers by 70% by the end of the decade, the company said.
"This initiative with Google and our partners at NECA and the Electrical Training Alliance will bring more than 100,000 sorely needed electricians into the trade to meet the demands of an AI-driven surge in data centers and power generation," said Kenneth Cooper, international president of the IBEW labor union.
Programming

Microsoft CEO Says Up To 30% of the Company's Code Was Written by AI (techcrunch.com) 149

Microsoft CEO Satya Nadella said that 20%-30% of code inside the company's repositories was "written by software" -- meaning AI -- during a fireside chat with Meta CEO Mark Zuckerberg at Meta's LlamaCon conference on Tuesday. From a report: Nadella gave the figure after Zuckerberg asked roughly how much of Microsoft's code is AI-generated today. The Microsoft CEO said the company was seeing mixed results in AI-generated code across different languages, with more progress in Python and less in C++.
Wikipedia

Wikipedia To Use AI (wikimediafoundation.org) 40

Wikipedia will employ AI to enhance the work of its editors and volunteers, it said Wednesday, also asserting that it has no plans to replace those human roles. The Wikimedia Foundation plans to implement AI specifically for automating tedious tasks, improving information discovery, facilitating translations, and supporting new volunteer onboarding, it said.
AI

Gen AI Is Not Replacing Jobs Or Hurting Wages At All, Say Economists 108

An anonymous reader quotes a report from The Register: Instead of depressing wages or taking jobs, generative AI chatbots like ChatGPT, Claude, and Gemini have had almost no wage or labor impact so far -- a finding that calls into question the huge capital expenditures required to create and run AI models. In a working paper released earlier this month, economists Anders Humlum and Emilie Vestergaard looked at the labor market impact of AI chatbots on 11 occupations, covering 25,000 workers and 7,000 workplaces in Denmark in 2023 and 2024.

Many of these occupations have been described as being vulnerable to AI: accountants, customer support specialists, financial advisors, HR professionals, IT support specialists, journalists, legal professionals, marketing professionals, office clerks, software developers, and teachers. Yet after Humlum, assistant professor of economics at the Booth School of Business, University of Chicago, and Vestergaard, a PhD student at the University of Copenhagen, analyzed the data, they found the labor and wage impact of chatbots to be minimal. "AI chatbots have had no significant impact on earnings or recorded hours in any occupation," the authors state in their paper.

The report should concern the tech industry, which has hyped AI's economic potential while plowing billions into infrastructure meant to support it. Early this year, OpenAI admitted that it loses money per query even on its most expensive enterprise SKU, while companies like Microsoft and Amazon are starting to pull back on their AI infrastructure spending in light of low business adoption past a few pilots. The problem isn't that workers are avoiding generative AI chatbots -- quite the contrary. But they simply aren't yet equating to actual economic benefits.
"The adoption of these chatbots has been remarkably fast," Humlum told The Register. "Most workers in the exposed occupations have now adopted these chatbots. Employers are also shifting gears and actively encouraging it. But then when we look at the economic outcomes, it really has not moved the needle."

Humlum said while there are gains and time savings to be had, "there's definitely a question of who they really accrue to. And some of it could be the firms -- we cannot directly look at firm profitability. Some of it could also just be that you save some time on existing tasks, but you're not really able to expand your output and therefore earn more. So it's like it saves you time writing emails. But if you cannot really take on more work or do something else that is really valuable, then that will put a damper on how much we should actually expect those time savings to affect your earning ability, your total hours, your wages."

"In terms of economic outcomes, when we're looking at hard metrics -- in the administrative labor market data on earnings, wages -- these tools have really not made a difference so far," said Humlum. "So I think that that puts in some sense an upper bound on what return we should expect from these tools, at least in the short run. My general conclusion is that any story that you want to tell about these tools being very transformative, needs to contend with the fact that at least two years after [the introduction of AI chatbots], they've not made a difference for economic outcomes."
Android

Google Play Sees 47% Decline In Apps Since Start of Last Year (techcrunch.com) 69

Google Play's app marketplace has seen a dramatic 47% drop in available apps -- from 3.4 million to 1.8 million -- since the start of 2024. An analysis by app intelligence provider Appfigures attributes the decline to stricter quality standards, expanded human reviews, and increased enforcement against low-quality and deceptive apps. TechCrunch reports: In July 2024, Google announced it would raise the minimum quality requirements for apps, which may have impacted the number of available Play Store app listings.

Instead of only banning broken apps that crashed, wouldn't install, or wouldn't run properly, the company said it would begin banning apps that demonstrated "limited functionality and content." That included static apps without app-specific features, such as text-only apps or PDF file apps. It also included apps that provided little content, like those that only offered a single wallpaper. Additionally, Google banned apps that were designed to do nothing or have no function, which may have been tests or other abandoned developer efforts.

Reached for comment, Google confirmed that its new policies were factors here, which also included an expanded set of verification requirements, required app testing for new personal developer accounts, and expanded human reviews to check for apps that try to deceive or defraud users. In addition, the company pointed to other 2024 investments in AI for threat detection, stronger privacy policies, improved developer tools, and more. As a result, Google prevented 2.36 million policy-violating apps from being published on its Play Store and banned more than 158,000 developer accounts that had attempted to publish harmful apps, it said.
TechCrunch also notes that a new trader status rule, which went into effect in the EU this February, could be another contributing factor. It requires developers to display their names and addresses in their app listings, and failure to comply would see their apps removed from EU app stores.
AI

OpenAI's o3 Model Beats Master-Level Geoguessr Player 32

In a blog post yesterday, Master I-ranked human GeoGuessr player Sam Patterson said that OpenAI's o3 model outscored him in a head-to-head match, "correctly identifying all five countries and twice landing within a few hundred meters." Geoguessing is a game -- most popularly known through the platform GeoGuessr -- where players are dropped into a random location in Google Street View and must figure out where in the world they are using only visual clues from the environment. With the release of its newest AI models, o3 and o4-mini, OpenAI now does a surprisingly good job of analyzing uploaded images to determine their locations using nothing but subtle visual clues.

"Even when I embedded fake GPS coordinates in the image EXIF, the model ignored the spoof and still pinpointed the real locations, showing its performance comes from visual reasoning and on-the-fly web sleuthing -- not hidden metadata," says Patterson. From the post: I notice that it often does a lot of unnecessary and repetitive cropping, and will sometimes spend way too much time on something unimportant. A human is very good at knowing what matters, and o3 is less knowledgeable about what things it should focus on. It got distracted by advertising multiple times. However, most of what it says about things like signs and road lines appears to be accurate, or at least close enough to truth that they meaningfully add up. Given the end result of these excellent guesses, it seems to arrive at the guesses from that information.

If it's using other information to arrive at the guess, then it's not metadata from the files, but instead web search. It seems likely that in the Austria round, the web search was meaningful, since it mentioned the website named the town itself. It appeared less meaningful in the Ireland round. It was still very capable in the rounds without search.

So to put a bow on this:
- The o3 model isn't smoke and mirrors, tricking us by only using EXIF data. It's at a comparable Geoguessr skill level to Master I or better players now (at least according to my own ~20 or so rounds of testing).
- Humans still hold a big edge in decision time -- most of my guesses were 4 min.
- Spoofing EXIF data doesn't throw off the model.

Whether you view this as dystopian or as a technological marvel -- or both -- you can't claim it's a parlor trick.
The Almighty Buck

Mastercard Gives AI Agents Ability To Shop Online for You (financialpost.com) 49

Mastercard is working with Microsoft and other leading AI companies to give AI agents the ability to shop online and make payments on behalf of consumers. From a report: Under the new program, a shopper could prompt an AI agent -- Microsoft's Copilot, for example -- to search for a pair of yellow running shoes in a particular size.

The agent would then search and offer the customer options, and then be able to make the purchase while also recommending the best way to pay, Mastercard said in a statement Tuesday.

Firefox

Firefox Finally Delivers Tab Groups Feature (mozilla.org) 47

Firefox has launched its long-awaited tab groups feature, responding to the most upvoted request in Mozilla Connect's three-year history. The feature allows users to organize tabs by name or color through a drag-and-drop interface.

Mozilla is now developing an AI-powered "smart tab groups" feature that automatically suggests organization based on open tabs. Unlike competitors, the company said, Firefox processes this data locally, keeping tab information on the user's device rather than sending it to cloud servers.
Programming

AI-Generated Code Creates Major Security Risk Through 'Package Hallucinations' (arstechnica.com) 34

A new study [PDF] reveals AI-generated code frequently references non-existent third-party libraries, creating opportunities for supply-chain attacks. Researchers analyzed 576,000 code samples from 16 popular large language models and found 19.7% of package dependencies -- 440,445 in total -- were "hallucinated."

These non-existent dependencies exacerbate dependency confusion attacks, where malicious packages with identical names to legitimate ones can infiltrate software. Open source models hallucinated at nearly 22%, compared to 5% for commercial models. "Once the attacker publishes a package under the hallucinated name, containing some malicious code, they rely on the model suggesting that name to unsuspecting users," said lead researcher Joseph Spracklen. Alarmingly, 43% of hallucinations repeated across multiple queries, making them predictable targets.
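
One way to reduce exposure to hallucinated package names is to compare every dependency an AI tool suggests against a list of packages that have already been vetted before installing anything. The sketch below is an illustration only, not something described in the study: the allowlist contents, the function names, and the example package `fast-json-utils` are all hypothetical, and a real setup would more likely draw the list from a lockfile or an internal registry mirror.

```python
import re

# Hypothetical allowlist of vetted package names (assumption for illustration).
VETTED_PACKAGES = {"requests", "numpy", "flask"}

# Leading package name on a requirements-style line, before any version specifier.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a name as PEP 503 does: lowercase, runs of -, _, . become one -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unvetted_dependencies(requirements: str, vetted: set[str]) -> list[str]:
    """Return package names from requirements text that are not on the allowlist."""
    vetted_norm = {normalize(v) for v in vetted}
    flagged = []
    for line in requirements.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = _NAME.match(line)
        if match and normalize(match.group(1)) not in vetted_norm:
            flagged.append(match.group(1))
    return flagged


if __name__ == "__main__":
    suggested = "requests==2.31.0\nfast-json-utils>=1.0  # suggested by a model\nNumPy\n"
    print(unvetted_dependencies(suggested, VETTED_PACKAGES))
```

A flagged name is not necessarily malicious or non-existent; it only means a human should check it before it is installed.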
Privacy

India Court Orders Proton Mail Block On Security Grounds (livelaw.in) 20

The Karnataka High Court on Tuesday directed India's government to block Switzerland-based email service Proton Mail, citing national security concerns and law enforcement challenges. Justice M Nagaprasanna ordered authorities to initiate proceedings under Section 69A of the Information Technology Act to ban the service, while mandating immediate blocking of "offending URLs" until final decisions are made.

The ruling followed a petition from M Moser Design Associates India, which claimed its female employees were targeted with obscene emails containing "AI-generated deepfake images" sent via Proton Mail. Petitioners argued Proton Mail operates servers outside India, making it inaccessible to law enforcement. The court noted several bomb threats to Indian schools were sent using the service, which has already been banned in Russia and Saudi Arabia. Additional Solicitor General Aravind Kamath, representing the government, said authorities would comply with the court's direction.
AI

Reddit Issuing 'Formal Legal Demands' Against Researchers Who Conducted Secret AI Experiment on Users 36

An anonymous reader shares a report: Reddit's top lawyer, Ben Lee, said the company is considering legal action against researchers from the University of Zurich who ran what he called an "improper and highly unethical experiment" by surreptitiously deploying AI chatbots in a popular debate subreddit. The University of Zurich told 404 Media that the experiment results will not be published and said the university is investigating how the research was conducted.

As we reported Monday, researchers at the University of Zurich ran an "unauthorized" and secret experiment on Reddit users in the r/changemyview subreddit in which dozens of AI bots engaged in debates with users about controversial issues. In some cases, the bots generated responses which claimed they were rape survivors, worked with trauma patients, or were Black people who were opposed to the Black Lives Matter movement. The researchers used a separate AI to mine the posting history of the people they were responding to in an attempt to determine personal details about them that they believed would make their bots more effective, such as their age, race, gender, location, and political beliefs.
AI

OpenAI-Microsoft Alliance Fractures as AI Titans Chart Separate Paths (wsj.com) 14

The once-celebrated partnership between OpenAI's Sam Altman and Microsoft's Satya Nadella is deteriorating amid fundamental disagreements over computing resources, model access, and AI capabilities, according to WSJ. The relationship that Altman once called "the best partnership in tech" has grown strained as both companies prepare for independent futures.

Tensions center on several critical areas: Microsoft's provision of computing power, OpenAI's willingness to share model access, and conflicting views on achieving humanlike intelligence. Altman has expressed confidence OpenAI can build models with humanlike intelligence soon -- a milestone Nadella publicly dismissed as "nonsensical benchmark hacking" during a February podcast.

The companies retain significant leverage over each other. Microsoft can block OpenAI's conversion to a for-profit entity, potentially costing the startup billions if not completed this year. Meanwhile, OpenAI's board can trigger contract clauses preventing Microsoft from accessing its most advanced technology.

After Altman's brief ouster in 2023 -- dubbed "the blip" within OpenAI -- Nadella pursued an "insurance policy" by hiring DeepMind co-founder Mustafa Suleyman for $650 million to develop competing models. The personal relationship has also cooled, with the executives now communicating primarily through scheduled weekly calls rather than frequent text exchanges.
AI

Duolingo Will Replace Contract Workers With AI 70

According to an email posted on Duolingo's LinkedIn, the language learning app will "gradually stop using contractors to do work that AI can handle." Co-founder and CEO Luis von Ahn also said the company will be "AI-first." The Verge reports: According to von Ahn, being "AI-first" means the company will "need to rethink much of how we work" and that "making minor tweaks to systems designed for humans won't get us there." As part of the shift, the company will roll out "a few constructive constraints," including the changes to how it works with contractors, looking for AI use in hiring and in performance reviews, and that "headcount will only be given if a team cannot automate more of their work."

von Ahn says that "Duolingo will remain a company that cares deeply about its employees" and that "this isn't about replacing Duos with AI." Instead, he says that the changes are "about removing bottlenecks" so that employees can "focus on creative work and real problems, not repetitive tasks."

"AI isn't just a productivity boost," von Ahn says. "It helps us get closer to our mission. To teach well, we need to create a massive amount of content, and doing that manually doesn't scale. One of the best decisions we made recently was replacing a slow, manual content creation process with one powered by AI. Without AI, it would take us decades to scale our content to more learners. We owe it to our learners to get them this content ASAP."
AI

OpenAI Upgrades ChatGPT Search With Shopping Features (techcrunch.com) 29

OpenAI has upgraded ChatGPT's search tool to include shopping features, allowing users to receive personalized product recommendations, view images and reviews, and access direct purchase links using natural language queries. TechCrunch reports: When ChatGPT users search for products, the chatbot will now offer a few recommendations, present images and reviews for those items, and include direct links to webpages where users can buy the products. OpenAI says users can ask hyper-specific questions in natural language and receive customized results. To start, OpenAI is experimenting with categories including fashion, beauty, home goods, and electronics. OpenAI is rolling out the feature in the default AI model for ChatGPT, GPT-4o, today for ChatGPT Pro, Plus, and Free users, as well as logged-out users around the globe.

[...] OpenAI claims its search product is growing rapidly. Users made more than a billion web searches in ChatGPT last week, the company told TechCrunch. OpenAI says it's determining ChatGPT shopping results independently, and notes that ads are not part of this upgrade to ChatGPT search. The shopping results will be based on structured metadata from third parties, such as pricing, product descriptions, and reviews, according to OpenAI. The company won't receive a kickback from purchases made through ChatGPT search. [...] Soon, OpenAI says it will integrate its memory feature with shopping for Pro and Plus users, meaning ChatGPT will reference a user's previous chats to make highly personalized product recommendations. The company previously updated ChatGPT to reference memory when making web searches broadly. However, these memory features won't be available to users in the EU, the U.K., Switzerland, Norway, Iceland, and Liechtenstein.

China

China's Huawei Develops New AI Chip, Seeking To Match Nvidia (wsj.com) 55

Huawei is gearing up to test its newest and most powerful AI processor, which the company hopes could replace some higher-end products of U.S. chip giant Nvidia. From a WSJ report: Huawei has approached some Chinese tech companies about testing the technical feasibility of the new chip, called the Ascend 910D, people familiar with the matter said. The company is slated to receive the first batch of samples of the processor as soon as late May, some of the people said.

The development is still at an early stage, and a series of tests will be needed to assess the chip's performance and get it ready for customers, the people said. Huawei hopes that the latest iteration of its Ascend AI processors will be more powerful than Nvidia's H100, a popular chip used for AI training that was released in 2022, said one of the people. Previous versions are called 910B and 910C.

AI

Unauthorized AI Bot Experiment Infiltrated Reddit To Test Persuasion Capabilities (404media.co) 82

Researchers claiming affiliation with the University of Zurich secretly deployed AI-powered bots in a popular Reddit forum to test whether AI could change users' minds on contentious topics. The unauthorized experiment, which targeted the r/changemyview subreddit, involved bots making over 1,700 comments across several months while adopting fabricated identities including a sexual assault survivor, a Black man opposing Black Lives Matter, and a domestic violence shelter worker.

The researchers "personalized" comments by analyzing users' posting histories to infer demographic information. The researchers, who remain anonymous despite inquiries, claimed their bots were "consistently well-received," garnering over 20,000 upvotes and 137 "deltas" -- awards indicating successful opinion changes. Hundreds of bot comments were deleted following the disclosure.
IBM

IBM Pledges $150 Billion US Investment (reuters.com) 42

IBM announced plans to invest $150 billion in the United States over the next five years, with more than $30 billion earmarked specifically for research and development of mainframes and quantum computing technology. The investment follows similar commitments from tech giants including Apple and Nvidia -- each pledging approximately $500 billion -- in the wake of President Trump's election and tariff threats.

"We have been focused on American jobs and manufacturing since our founding 114 years ago," said IBM CEO Arvind Krishna in a statement. The company currently manufactures its mainframe systems in upstate New York and plans to continue designing and assembling quantum computers domestically. The announcement comes amid challenging circumstances for IBM, which recently saw 15 government contracts shelved under the Trump administration's cost-cutting initiatives.

Further reading: IBM US Cuts May Run Deeper Than Feared - and the Jobs Are Heading To India;
IBM Now Has More Employees In India Than In the US (2017).
