AI

AI Can Write Code But Lacks Engineer's Instinct, OpenAI Study Finds 76

Leading AI models can fix broken code, but they're nowhere near ready to replace human software engineers, according to extensive testing [PDF] by OpenAI researchers. The company's latest study put AI models and systems through their paces on real-world programming tasks, with even the most advanced models solving only a quarter of typical engineering challenges.

The research team created a test called SWE-Lancer, drawing from 1,488 actual software fixes made to Expensify's codebase, representing $1 million worth of freelance engineering work. When faced with these everyday programming tasks, the best AI model -- Claude 3.5 Sonnet -- managed to complete just 26.2% of hands-on coding tasks and 44.9% of technical management decisions.

Though the AI systems proved adept at quickly finding relevant code sections, they stumbled when it came to understanding how different parts of software interact. The models often suggested surface-level fixes without grasping the deeper implications of their changes.

The research, to be sure, used a demanding methodology to test the models' coding abilities. Instead of relying on simplified programming puzzles, OpenAI's benchmark uses complete software engineering tasks that range from quick $50 bug fixes to complex $32,000 feature implementations. Each solution was verified through rigorous end-to-end testing that simulated real user interactions, the researchers said.
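The paper's actual test harness isn't reproduced in the article, but the payout-weighted, end-to-end verification it describes can be sketched roughly as follows. Everything here (the task structure, `simulate_user_flow`, the scoring function) is a hypothetical stand-in, not SWE-Lancer's real API:

```python
# Hypothetical sketch of end-to-end task scoring in the style the article
# describes: a candidate patch "earns" a task's dollar value only if every
# simulated user interaction succeeds against the patched application state.

def simulate_user_flow(patched_app, steps):
    """Run a scripted sequence of (action, check) pairs; True if all checks pass."""
    state = patched_app.copy()
    for action, expected in steps:
        state = action(state)
        if not expected(state):
            return False
    return True

def score_submission(tasks, patches):
    """Total dollar value of tasks whose end-to-end tests all pass."""
    earned = 0
    for task in tasks:
        patched = patches.get(task["id"])
        if patched is not None and simulate_user_flow(patched, task["steps"]):
            earned += task["payout"]
    return earned

# Toy $50 "bug fix" task: the patch must make a counter increment correctly.
tasks = [{
    "id": "bug-42",
    "payout": 50,
    "steps": [(lambda s: {**s, "count": s["count"] + 1},
               lambda s: s["count"] == 1)],
}]
print(score_submission(tasks, {"bug-42": {"count": 0}}))  # 50
```

Scoring by dollar value rather than task count is what lets the benchmark report results as a fraction of the $1 million of freelance work, rather than treating a $50 fix and a $32,000 feature as equal.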
Businesses

Mira Murati Is Launching Her OpenAI Rival: Thinking Machines Lab (theverge.com) 18

Former OpenAI CTO Mira Murati has launched Thinking Machines Lab with several leaders from OpenAI on board, including John Schulman, Barrett Zoph, and Jonathan Lachman. Their mission is "to make AI systems more widely understood, customizable, and generally capable," with a commitment to publishing technical research and code. The Verge reports: In a press release shared with The Verge, the company suggests that it's building products that help humans work with AI, rather than fully autonomous systems. "We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals," says the press release.
AI

HP To Acquire Parts of Humane, Shut Down the AI Pin 51

An anonymous reader quotes a report from Bloomberg: HP will acquire assets from Humane, the maker of a wearable Ai Pin introduced in late 2023, for $116 million. The deal will include the majority of Humane's employees in addition to its software platform and intellectual property, the company said Tuesday. It will not include Humane's Ai Pin device business, which will be wound down, an HP spokesperson said. Humane's team, including founders Imran Chaudhri and Bethany Bongiorno, will form a new division at HP to help integrate artificial intelligence into the company's personal computers, printers and connected conference rooms, said Tuan Tran, who leads HP's AI initiatives. Chaudhri and Bongiorno were design and software engineers at Apple before founding the startup. [...]

Tran said he was particularly impressed with aspects of Humane's design, such as the ability to orchestrate AI models running both on-device and in the cloud. The deal is expected to close at the end of the month, HP said. "There will be a time and place for pure AI devices," Tran said. "But there is going to be AI in all our devices -- that's how we can help our business customers be more productive."
AI

AI 'Hallucinations' in Court Papers Spell Trouble For Lawyers (reuters.com) 73

An anonymous reader shares a report: U.S. personal injury law firm Morgan & Morgan sent an urgent email this month to its more than 1,000 lawyers: Artificial intelligence can invent fake case law, and using made-up information in a court filing could get you fired. A federal judge in Wyoming had just threatened to sanction two lawyers at the firm who included fictitious case citations in a lawsuit against Walmart. One of the lawyers admitted in court filings last week that he used an AI program that "hallucinated" the cases and apologized for what he called an inadvertent mistake.

AI's penchant for generating legal fiction in case filings has led courts around the country to question or discipline lawyers in at least seven cases over the last two years, and created a new high-tech headache for litigants and judges, Reuters found. The Walmart case stands out because it involves a well-known law firm and a big corporate defendant. But examples like it have cropped up in all kinds of lawsuits since chatbots like ChatGPT ushered in the AI era, highlighting a new litigation risk.
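The failure mode described above is mechanically checkable: every citation a model produces can be extracted from a draft and validated against an authoritative database before the filing reaches a judge. The regex and the lookup set below are deliberately simplified stand-ins for a real citator service such as Westlaw or Lexis:

```python
import re

# Very simplified citation pattern: "<volume> <Reporter> <page>",
# e.g. "575 U.S. 320". Real Bluebook citation formats are far messier.
CITATION_RE = re.compile(r"\b(\d{1,4})\s+([A-Z][A-Za-z.0-9]*)\s+(\d{1,4})\b")

KNOWN_CASES = {  # hypothetical stand-in for a real citator database
    ("575", "U.S.", "320"),
}

def unverified_citations(text):
    """Return citations found in the draft that the database cannot confirm."""
    found = CITATION_RE.findall(text)
    return [c for c in found if c not in KNOWN_CASES]

draft = "See 575 U.S. 320; but compare 123 F.4d 456 (a hallucinated cite)."
print(unverified_citations(draft))  # [('123', 'F.4d', '456')]
```

A check like this doesn't prove a cited case says what the brief claims it says, but it would have flagged the fictitious citations in the Walmart filing before they were ever submitted.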

AI

27% of Job Listings For CFOs Now Mention AI (fortune.com) 20

A new report released by Cisco finds that 97% of CEOs surveyed are planning AI integration. Similarly, 92% of companies recently surveyed by McKinsey plan to invest more in generative AI over the next three years. Fortune: To that end, many companies are seeking tech-savvy finance talent, according to a new report by software company Datarails. The researchers analyzed 6,000 job listings within the CFO's office -- CFO, controller, financial planning and analysis (FP&A), and accountant -- advertised on job search websites including LinkedIn, Glassdoor, Indeed, Job2Careers, and ZipRecruiter.

Of the 1,000 job listings for CFOs in January 2025, 27% included AI in the job description. This compares with 8% of 1,000 CFO job listings mentioning AI at the same time last year. Take, for example, Peaks Healthcare Consulting, which required a CFO candidate to "continuously learn and integrate AI to improve financial processes and decision making," Datarails notes in the report. Regarding FP&A professionals, in January 2025, 35% of analyst roles mentioned AI competency as a requirement, compared to 14% in January 2024, according to the report.

AI

DeepSeek Expands Business Scope in Potential Shift Towards Monetization (scmp.com) 6

Chinese AI startup DeepSeek has updated its business registry information with key changes to personnel and operational scope, signaling a shift towards monetizing its cost-efficient-yet-powerful large language models. From a report: The Hangzhou-based firm's updated business scope includes "internet information services," according to business registry service Tianyancha. The move is the first sign of DeepSeek's desire to monetise its popular technology, according to Zhang Yi, founder and chief analyst at consultancy iiMedia.

With eyes on developing a business model, DeepSeek intends to shift away from being purely focused on research and development, Zhang added. "The move reflects that for a company like DeepSeek, which managed to accumulate technology and develop a product, monetisation is becoming a necessary next step," Zhang said. DeepSeek's previous business scope said it engages in engineering and AI software development, among others, hinting at a more research-driven approach.

AI

xAI Releases Its Latest Flagship Model, Grok 3 (x.com) 140

xAI has launched Grok 3, the latest iteration of its large language model, alongside new capabilities for its iOS and web applications. The model has been trained on approximately 200,000 GPUs in a Memphis data center, representing what CEO Elon Musk claims is a tenfold increase in computing power compared to its predecessor.

The new release introduces two specialized variants: Grok 3 Reasoning and Grok 3 mini Reasoning, designed to methodically analyze problems similar to OpenAI's o3-mini and DeepSeek's R1 models. According to xAI's benchmarks, Grok 3 outperforms GPT-4o on several technical evaluations, including AIME for mathematical reasoning and GPQA for PhD-level science problems.

A notable addition is the DeepSearch feature, which combs through web content and X posts to generate research summaries. The platform will be available through X's Premium+ subscription and a new SuperGrok tier ($30/month or $300/year), with the latter offering enhanced reasoning capabilities and unlimited image generation. To prevent knowledge extraction through model distillation -- a technique recently attributed to DeepSeek's alleged copying of OpenAI's models -- xAI has implemented measures to obscure the reasoning models' thought processes in the Grok app. The company plans to release the Grok 2 model as open source once Grok 3 achieves stability.
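Model distillation, the technique xAI is guarding against, is usually described as training a small "student" model to match a large "teacher" model's output distribution rather than raw labels; hiding a model's reasoning traces removes one rich signal a distiller could copy. A minimal pure-Python sketch of the core loss (the logits here are invented for illustration):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student's distribution q is from the teacher's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A distillation loss compares the student's distribution against the
# teacher's temperature-softened distribution; training minimizes this.
teacher_logits = [4.0, 1.0, 0.5]   # hypothetical teacher outputs
student_logits = [3.0, 1.5, 0.5]   # hypothetical student outputs
T = 2.0  # higher temperature exposes more of the teacher's relative preferences
loss = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
print(round(loss, 4))
```

If only a final answer is exposed, a would-be distiller gets one token of supervision per query; exposing full chain-of-thought traces would hand over far more of the distribution to imitate, which is presumably why xAI obscures them.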
Data Storage

NAND Flash Prices Plunge Amid Supply Glut, Factory Output Cut (theregister.com) 34

NAND flash prices are expected to slide due to oversupply, forcing memory chipmakers to cut production to match lower-than-expected orders from PC and smartphone manufacturers. From a report: The superabundance of stock is putting a financial strain on suppliers of NAND flash, according to TrendForce, which says growth rate forecasts are being revised down from 30 percent to 10-15 percent for 2025.

"NAND flash manufacturers have adopted more decisive production cuts, scaling back full-year output to curb bit supply growth. These measures are designed to swiftly alleviate market imbalances and lay the groundwork for a price recovery," TrendForce stated.

Shrish Pant, Gartner director analyst and technology product leader, expects NAND flash pricing to remain weak for the first half of 2025, though he projects higher bit shipments for SSDs in the second half due to continuing AI server demand.

"Vendors are currently working tirelessly to discipline supply, which will lead to prices recovering in the second half of 2025. Long term, AI demand will continue to drive the demand for higher-capacity/better-performance SSDs," Pant said. Commenting on the seasonal nature of the memory market, Pant told The Register: "Buying patterns will mean that NAND flash prices will remain cyclical depending on hyperscalers' buying behavior."

Businesses

The 'White Collar' Recession is Pummeling Office Workers (fortune.com) 211

White-collar workers are facing their deepest hiring slump in a decade, with one in four U.S. job losses last year hitting professional workers, according to S&P Global. A 2024 Vanguard report shows hiring for employees earning over $96,000 has fallen to its lowest level since 2014. The downturn has been particularly severe for job seekers -- 40% of applicants failed to secure even a single interview in 2024, according to a survey of 2,000 respondents by the American Staffing Association and The Harris Poll.

Technology and high interest rates appear to be driving the decline, with companies reassessing their workforce needs amid AI adoption and economic pressures. While hiring remains steady for those earning under $55,000 annually, the market continues to be especially challenging for mid-career professionals and higher earners.
AI

Reddit Mods Are Fighting To Keep AI Slop Off Subreddits (arstechnica.com) 68

Reddit moderators are struggling to police AI-generated content on the platform, according to ArsTechnica, with many expecting the challenge to intensify as the technology becomes more sophisticated. Several popular Reddit communities have implemented outright bans on AI-generated posts, citing concerns over content quality and authenticity.

The moderators of r/AskHistorians, a forum known for expert historical discussion, said that AI content "wastes our time" and could compromise the subreddit's reputation for accurate information. Moderators are currently using third-party AI detection tools, which they describe as unreliable. Many are calling on Reddit to develop its own detection system, the report said.
Privacy

Nearly 10 Years After Data and Goliath, Bruce Schneier Says: Privacy's Still Screwed (theregister.com) 57

Ten years after publishing his influential book on data privacy, security expert Bruce Schneier warns that surveillance has only intensified, with both government agencies and corporations collecting more personal information than ever before. "Nothing has changed since 2015," Schneier told The Register in an interview. "The NSA and their counterparts around the world are still engaging in bulk surveillance to the extent of their abilities."

The widespread adoption of cloud services, Internet-of-Things devices, and smartphones has made it nearly impossible for individuals to protect their privacy, said Schneier. Even Apple, which markets itself as privacy-focused, faces limitations when its Chinese business interests are at stake. While some regulation has emerged, including Europe's General Data Protection Regulation and various U.S. state laws, Schneier argues these measures fail to address the core issue of surveillance capitalism's entrenchment as a business model.

The rise of AI poses new challenges, potentially undermining recent privacy gains like end-to-end encryption. As AI assistants require cloud computing power to process personal data, users may have to surrender more information to tech companies. Despite the grim short-term outlook, Schneier remains cautiously optimistic about privacy's long-term future, predicting that current surveillance practices will eventually be viewed as unethical as sweatshops are today. However, he acknowledges this transformation could take 50 years or more.
Programming

'New Junior Developers Can't Actually Code' (nmn.gl) 220

Junior software developers' overreliance on AI coding assistants is creating knowledge gaps in fundamental programming concepts, developer Namanyay Goel argued in a post. While tools like GitHub Copilot and Claude enable faster code shipping, developers struggle to explain their code's underlying logic or handle edge cases, Goel wrote. He cites the decline of Stack Overflow, a technical forum where programmers historically found detailed explanations from experienced developers, as particularly concerning.
AI

DeepSeek Removed from South Korea App Stores Pending Privacy Review (france24.com) 3

Today Seoul's Personal Information Protection Commission "said DeepSeek would no longer be available for download until a review of its personal data collection practices was carried out," reports AFP. A number of countries have questioned DeepSeek's storage of user data, which the firm says is collected in "secure servers located in the People's Republic of China"... This month, a slew of South Korean government ministries and police said they blocked access to DeepSeek on their computers. Italy has also launched an investigation into DeepSeek's R1 model and blocked it from processing Italian users' data. Australia has banned DeepSeek from all government devices on the advice of security agencies. US lawmakers have also proposed a bill to ban DeepSeek from being used on government devices over concerns about user data security.
More details from the Associated Press: The South Korean privacy commission, which began reviewing DeepSeek's services last month, found that the company lacked transparency about third-party data transfers and potentially collected excessive personal information, said Nam Seok [director of the South Korean commission's investigation division]... A recent analysis by Wiseapp Retail found that DeepSeek was used by about 1.2 million smartphone users in South Korea during the fourth week of January, emerging as the second-most-popular AI model behind ChatGPT.
Social Networks

Are Technologies of Connection Tearing Us Apart? (lareviewofbooks.org) 88

Nicholas Carr wrote The Shallows: What the Internet Is Doing to Our Brains. But his new book looks at how social media and digital communication technologies "are changing us individually and collectively," writes the Los Angeles Review of Books.

The book's title? Superbloom: How Technologies of Connection Tear Us Apart. But if these systems are indeed tearing us apart, the reasons are neither obvious nor simple. Carr suggests that this isn't really about the evil behavior of our tech overlords but about how we have "been telling ourselves lies about communication — and about ourselves.... Well before the net came along," says Carr, "[the] evidence was telling us that flooding the public square with more information from more sources was not going to open people's minds or engender more thoughtful discussions. It wasn't even going to make people better informed...."

At root, we're the problem. Our minds don't simply distill useful knowledge from a mass of raw data. They use shortcuts, rules of thumb, heuristic hacks — which is how we were able to think fast enough to survive on the savage savanna. We pay heed, for example, to what we experience most often. "Repetition is, in the human mind, a proxy for facticity," says Carr. "What's true is what comes out of the machine most often...." Reality can't compete with the internet's steady diet of novelty and shallow, ephemeral rewards. The ease of the user interface, congenial even to babies, creates no opportunity for what writer Antón Barba-Kay calls "disciplined acculturation."

Not only are these technologies designed to leverage our foibles, but we are also changed by them, as Carr points out: "We adapt to technology's contours as we adapt to the land's and the climate's." As a result, by designing technology, we redesign ourselves. "In engineering what we pay attention to, [social media] engineers [...] how we talk, how we see other people, how we experience the world," Carr writes. We become dislocated, abstracted: the self must itself be curated in memeable form. "Looking at screens made me think in screens," writes poet Annelyse Gelman. "Looking at pixels made me think in pixels...."

That's not to say that we can't have better laws and regulations, checks and balances. One suggestion is to restore friction into these systems. One might, for instance, make it harder to unreflectively spread lies by imposing small transactional costs, as has been proposed to ease the pathologies of automated market trading. An option Carr doesn't mention is to require companies to perform safety studies on their products, as we demand of pharmaceutical companies. Such measures have already been proposed for AI. But Carr doubts that increasing friction will make much difference. And placing more controls on social media platforms raises free speech concerns... We can't change or constrain the tech, says Carr, but we can change ourselves. We can choose to reject the hyperreal for the material. We can follow Samuel Johnson's refutation of immaterialism by "kicking the stone," reminding ourselves of what is real.

AI

What If People Like AI-Generated Art Better? (christies.com) 157

Christie's auction house notes that an AI-generated "portrait" of an 18th-century French gentleman recently sold for $432,500. (One member of the Paris-based collective behind the work says "we found that portraits provided the best way to illustrate our point, which is that algorithms are able to emulate creativity.")

But the blog post from Christie's goes on to acknowledge that AI researchers "are still addressing the fundamental question of whether the images produced by their networks can be called art at all." One way to do that, surely, is to conduct a kind of visual Turing test, to show the output of the algorithms to human evaluators, flesh-and-blood discriminators, and ask if they can tell the difference.

"Yes, we have done that," says Ahmed Elgammal [director of the Art and Artificial Intelligence Lab at Rutgers University in New Jersey]. "We mixed human-generated art and art from machines, and posed questions — direct ones, such as 'Do you think this painting was produced by a machine or a human artist?' and also indirect ones such as, 'How inspiring do you find this work?' We measured the difference in responses towards the human art and the machine art, and found that there is very little difference. Actually, some people are more inspired by the art that is done by machine."
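The analysis Elgammal describes reduces to collecting ratings for both groups and measuring the gap between them. A toy sketch with made-up 1-to-5 "how inspiring" responses (the real study's data and scale may differ):

```python
from statistics import mean

def rating_gap(human_ratings, machine_ratings):
    """Mean 'how inspiring' score for each group, and the difference."""
    h, m = mean(human_ratings), mean(machine_ratings)
    return h, m, h - m

# Hypothetical responses to "How inspiring do you find this work?"
human_art = [4, 3, 5, 4, 3, 4]
machine_art = [4, 4, 3, 5, 3, 5]
h, m, gap = rating_gap(human_art, machine_art)
print(f"human={h:.2f} machine={m:.2f} gap={gap:+.2f}")  # human=3.83 machine=4.00 gap=-0.17
```

In this toy data, as in the result Elgammal reports, the machine-made art scores at least as well as the human-made art, and the gap between groups is small.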

Can such a poll constitute proof that an algorithm is capable of producing indisputable works of art? Perhaps it can — if you define a work of art as an image produced by an intelligence with an aesthetic intent. But if you define art more broadly as an attempt to say something about the wider world, to express one's own sensibilities and anxieties and feelings, then AI art must fall short, because no machine mind can have that urge — and perhaps never will.

This also raises the question of who gets credit for the resulting work: the AI, or the creator of its algorithm...

Or can the resulting work be considered a "conceptual art" collaboration — taking place between a human and an algorithm?
AI

Lawsuit Accuses Meta Of Training AI On Torrented 82TB Dataset Of Pirated Books (hothardware.com) 47

"Meta is involved in a class action lawsuit alleging copyright infringement, a claim the company disputes..." writes the tech news site Hot Hardware.

But the site adds that newly unsealed court documents "reveal that Meta allegedly used a minimum of 81.7TB of illegally torrented data sourced from shadow libraries to train its AI models." Internal emails further show that Meta employees expressed concerns about this practice. Some employees voiced strong ethical objections, with one noting that using content from sites like LibGen, known for distributing copyrighted material, would be unethical. A research engineer with Meta, Nikolay Bashlykov, also noted that "torrenting from a corporate laptop doesn't feel right," highlighting his discomfort surrounding the practice.

Additionally, the documents suggest that these concerns, including discussions about using data from LibGen, reached CEO Mark Zuckerberg, who may have ultimately approved the activity. Furthermore, the documents showed that despite these misgivings, employees discussed using VPNs to mask Meta's IP address to create anonymity, enabling them to download and share torrented data without it being easily traced back to the company's network.

Python

Are Fast Programming Languages Gaining in Popularity? (techrepublic.com) 163

In January the TIOBE Index (estimating programming language popularity) declared Python their language of the year. (Though it was already #1 in their rankings, it had shown a 9.3% increase in their ranking system, notes InfoWorld.) TIOBE CEO Paul Jansen says this reflects how easy Python is to learn, adding that "The demand for new programmers is still very high" (and that "developing applications completely in AI is not possible yet.")

In fact, on February's version of the index, the top ten looks mostly static. The only languages dropping appear to be very old languages. Over the last 12 months C and PHP have both fallen on the index — C from the #2 to the #4 spot, and PHP from #10 all the way to #14. (Also dropping is Visual Basic, which fell from #9 to #10.)

But TechRepublic cites another factor that seems to be affecting the rankings: language speed. Fast programming languages are gaining popularity, TIOBE CEO Paul Jansen said in the TIOBE Programming Community Index in February. Fast programming languages he called out include C++ [#2], Go [#8], and Rust [#13 — up from #18 a year ago].

Also, according to the updated TIOBE rankings...

- C++ held onto its place at second from the top of the leaderboard.
- Mojo and Zig are following trajectories likely to bring them into the top 50, and reached #51 and #56 respectively in February.

"Now that the world needs to crunch more and more numbers per second, and hardware is not evolving fast enough, speed of programs is getting important. Having said this, it is not surprising that the fast programming languages are gaining ground in the TIOBE index," Jansen wrote. The need for speed helped Mojo [#51] and Zig [#56] rise...

Rust reached its all-time high in the proprietary points system (1.47%), and Jansen expects Go to be a common sight in the top 10 going forward.

AI

AI Bugs Could Delay Upgrades for Both Siri and Alexa (yahoo.com) 24

Bloomberg reports that Apple's long-promised overhaul for Siri "is facing engineering problems and software bugs, threatening to postpone or limit its release, according to people with knowledge of the matter...." Last June, Apple touted three major enhancements coming to Siri:

- the ability to tap into a customer's data to better answer queries and take actions.
- a new system that would let the assistant more precisely control apps.
- the capability to see what's currently on a device's screen and use that context to better serve users....

The goal is to ultimately offer a more versatile Siri that can seamlessly tap into customers' information and communication. For instance, users will be able to ask for a file or song that they discussed with a friend over text. Siri would then automatically retrieve that item. Apple also has demonstrated the ability for Siri to quickly locate someone's driver's license number by reviewing their photos... Inside Apple, many employees testing the new Siri have found that these features don't yet work consistently...

The control enhancements — an upgraded version of something called App Intents — are central to the operation of the company's upcoming smart home hub. That product, an AI device for controlling smart home appliances and FaceTime, is slated for release later this year.

And Amazon is also struggling with an AI upgrade for its digital assistant, reports the Washington Post: The "smarter and more conversational" version of Alexa will not be available until March 31 or later, the employee said, at least a year and a half after it was initially announced in response to competition from OpenAI's ChatGPT. Internal messages seen by The Post confirmed the launch was originally scheduled for this month but was subsequently moved to the end of March... According to internal documents seen by The Post, new features of the subscriber-only, AI-powered Alexa could include the ability to adopt a personality, recall conversations, order takeout or call a taxi. Some of the new Alexa features are similar to Alexa abilities that were previously available free through partnerships with companies like Grubhub and Uber...

The AI-enhanced version of Alexa in development has been repeatedly delayed due to problems with incorrect answers, the employee working on the launch told The Post. As a popular product that is a decade old, the Alexa brand is valuable, and the company is hesitant to risk customer trust by launching a product that is not reliable, the person said.

AI

Ask Slashdot: What Would It Take For You to Trust an AI? (win.tue.nl) 179

Long-time Slashdot reader shanen has been testing AI clients. (They report that China's DeepSeek "turned out to be extremely good at explaining why I should not trust it. Every computer security problem I ever thought of or heard about and some more besides.")

Then they wondered if there's also government censorship: It's like the accountant who gets asked what 2 plus 2 is. After locking the doors and shading all the windows, the accountant whispers in your ear: "What do you want it to be...?" So let me start with some questions about DeepSeek in particular. Have you run it locally and compared the responses with the website's responses? My hypothesis is that your mileage should differ...

It's well established that DeepSeek doesn't want to talk about many "political" topics. Is that based on a distorted model of the world? Or is the censorship implemented in the query interface after the model was trained? My hypothesis is that it must have been trained with lots of data because the cost of removing all of the bad stuff would have been prohibitive... Unless perhaps another AI filtered the data first?
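shanen's second hypothesis, censorship bolted on after training, is straightforward to sketch: a wrapper that filters the model's output at the query interface, entirely independent of the model weights. Everything below (the blocklist, the refusal string, the stand-in model) is invented for illustration:

```python
BLOCKED_TOPICS = {"topic-a", "topic-b"}  # hypothetical blocklist
REFUSAL = "Sorry, that's beyond my current scope."

def base_model(prompt):
    """Stand-in for the underlying trained model: always answers."""
    return f"Here is a detailed answer about {prompt}."

def interface_filter(prompt, answer):
    """Post-hoc filter applied at the query interface, not inside the model."""
    text = (prompt + " " + answer).lower()
    if any(topic in text for topic in BLOCKED_TOPICS):
        return REFUSAL
    return answer

def serve(prompt):
    """What the hosted website would return."""
    return interface_filter(prompt, base_model(prompt))

print(serve("topic-a"))   # refusal: generated answer suppressed at the interface
print(serve("weather"))   # passes through unchanged
```

Running the weights locally would bypass `interface_filter` entirely, so local responses and hosted-website responses would diverge on blocked topics, which is exactly the comparison shanen proposes as a test of the two hypotheses.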

But their real question is: what would it take to trust an AI? "Trust" can mean different things, including data-collection policies. ("I bet most of you trust Amazon and Amazon's secret AIs more than you should..." shanen suggests.) Can you use an AI system without worrying about its data-retention policies?

And they also ask how many Slashdot readers have read Ken Thompson's "Reflections on Trusting Trust", which raises the question of whether you can ever trust code you didn't create yourself. So is there any way an AI system can assure you its answers are accurate and trustworthy, and that it's safe to use? Share your own thoughts and experiences in the comments.

What would it take for you to trust an AI?
Social Networks

Despite Plans for AI-Powered Search, Reddit's Stock Fell 14% This Week (yahoo.com) 55

"Reddit Answers" uses generative AI to answer questions using what past Redditors have posted. Announced in December, Reddit now plans to integrate it into their search results, reports TechCrunch, with Reddit's CEO saying the idea has "incredible monetization potential."

And yet Reddit's stock fell 14% this week. CNBC's headline? "Reddit shares plunge after Google algorithm change contributes to miss in user numbers." A Google search algorithm change caused some "volatility" with user growth in the fourth quarter, but the company's search-related traffic has since recovered in the first quarter, Reddit CEO Steve Huffman said in a letter to shareholders. "What happened wasn't unusual — referrals from search fluctuate from time to time, and they primarily affect logged-out users," Huffman wrote. "Our teams have navigated numerous algorithm updates and did an excellent job adapting to these latest changes effectively...." Reddit has said it is working to convince logged-out users to create accounts as logged-in users, which are more lucrative for its business.
As Yahoo Finance once pointed out, Reddit knew this day would come, acknowledging in its IPO filing that "changes in internet search engine algorithms and dynamics could have a negative impact on traffic for our website and, ultimately, our business." And in the last three months of 2024 Reddit's daily active users dropped, Yahoo Finance reported this week. But logged-in users increased by 400,000 — while logged-out users dropped by 600,000 (their first drop in almost two years).

Marketwatch notes that analyst Josh Beck sees this as a buying opportunity for Reddit's stock: Beck pointed to comments from Reddit's management regarding a sharp recovery in daily active unique users. That was likely driven by Google benefiting from deeper Reddit crawling, by the platform uncollapsing comments in search results and by a potential benefit from spam-reduction algorithm updates, according to the analyst. "While the report did not clear our anticipated bar, we walk away encouraged by international upside," he wrote.
