AI frenzy cannot withstand the harsh reality: Companies lower AI agent expectations, full automation still requires several years

Media reports indicate that companies are retreating from their fervent expectations for AI agents: although AI chat and coding tools have improved efficiency, AI agents capable of "taking over entire jobs" frequently run into setbacks in implementation. They are difficult to deploy, costly, and often produce confident yet incorrect outputs, making them hard to use in critical areas like customer service and cybersecurity. Many companies are slowing their full-automation plans, shifting towards a "human-machine collaboration" model, and viewing AI agents as long-term research and development investments that are unlikely to yield short-term results. Some tech executives predict that AI agents are still several years away from true maturity and implementation.

Media reports indicate that AI is changing the way people work through general-purpose chatbots and AI programming tools, leading to revenue growth for companies like OpenAI and Microsoft. Various companies have been attempting to delegate employee tasks to AI agents.

However, many businesses encounter difficulties when using more complex AI agents, which often "are not up to the task," forcing AI providers to personally intervene and troubleshoot issues with clients to prevent AI from "messing things up."

For example, European retailer Fnac faced challenges when using AI customer service agents. Fnac had tested models from OpenAI, Google, and other labs, but the results were unsatisfactory. Olivier Theulle, the company's Chief Digital and E-commerce Officer, told the media that reliability was the issue: when customers reported defective products, the AI asked for the product serial number but then confused it with the serial numbers of other products that differed by only one digit.

Fnac generates annual revenue of $10 billion. Theulle stated that the AI agent's performance only began to stabilize after collaborating with Israeli company AI21 Labs and receiving assistance from its engineers. Ori Goshen, co-CEO of AI21, said,

"The problem is that the model performs well out of the box on various benchmark tests, but does not perform well in real enterprise environments."

"A considerable degree of customization is required."

Some companies told the media that they could only truly benefit from AI agents after their own software engineers spent months deploying them and receiving direct technical support from AI companies. Today, tech company leaders also indicate that businesses cannot expect complex AI projects to run smoothly without "hands-on support" from AI vendors.

Venture capitalist Vinod Khosla stated in an interview in October,

"It's like saying 'we have a race car, anyone can drive it,' but ordinary people cannot unleash the full potential of a race car."

Khosla is an early investor in OpenAI and recently invested in Distyl, an AI consulting startup that sends engineers to companies like T-Mobile to help them implement AI within large organizations. Distyl is just one of many emerging companies that provide high-tech consulting services to businesses in need of support. AI developers and AI agent providers like OpenAI, Anthropic, Salesforce, and Snowflake have also begun hiring forward deployed engineers (FDEs) or launching similar consulting services, but this often increases their costs.

Another example is Cox Automotive, which provides software for car dealerships and has annual sales of $9 billion. The company previously developed an AI agent to create marketing web pages for dealerships. As one of Amazon Web Services' (AWS) largest clients in the automotive sector, it received "white-glove service." Cox's Chief Product Officer Marianne Johnson told the media that engineers from AWS and from Anthropic, which provides the AI technology behind the agent, flew to Cox's headquarters in Atlanta and worked alongside Cox's software developers for several days to build the tool together. She declined to disclose how much Cox paid AWS and Anthropic, but estimated that the tool could save millions of dollars in labor costs over the next few years, as the company no longer needs to create websites for customers manually.

"It confidently babbles nonsense"

The goal of the AI agent is to handle various tasks such as customer service issues and managing IT systems. AI and cloud service providers are betting on the revenue generated by enterprises using AI agents, using it as a justification for investing hundreds of billions of dollars in building AI data centers over the next year or two.

However, these suppliers and some client executives have stated that AI agents are too difficult to configure and that their behavior is often unpredictable. This makes them unsuitable for tasks where mistakes can have serious consequences. As a result, clients have lowered their expectations of how much work AI agents can automate and are delaying the deployment of AI agents in critical functions such as customer support and cybersecurity.

For example, IT services giant Kyndryl began testing Microsoft's Security Copilot this year, a chatbot designed to interface with enterprise IT systems and explain potential security vulnerabilities in simple English, effectively automating the work of a cybersecurity analyst. However, Scott Owenby, who is responsible for the company's internal cybersecurity, told the media that when Kyndryl employees tried to ask some basic questions, such as "Which company devices are running outdated software," the answers provided by Security Copilot were clearly incorrect. Owenby said,

"It confidently babbles nonsense, and I admire that confidence, but I can't trust its data."

Kyndryl spent about $50,000 testing Security Copilot for six months, after which they decided to stop using the software. Owenby said,

"I basically burned $50,000. That's not a lot, and if it had even a little use, we would have continued using it, but we didn't expect it to be completely unusable."

Owenby also mentioned that other AI tools perform better, such as software from Palo Alto Networks that can automatically handle repetitive and tedious cybersecurity tasks, such as investigating cases where employees log in from new locations or capture screenshots of sensitive data. This has allowed him to reduce headcount in some security teams over the past year, but he stated that staff are still needed to monitor these AI tools and that the company cannot let AI take over entirely.

"Some hype involved"

Bosch Power Tools has annual revenues exceeding $5.7 billion. Florian Haustein, the company's head of digital customer experience, told the media that the company has been testing a chatbot for over a year to answer customer questions about how to use tools and troubleshoot issues. However, Haustein stated that this chatbot still frequently provides incorrect answers, some of which could even lead to user harm. The project therefore remains in the pilot stage. He also mentioned that Bosch is testing models from several labs, including Google and OpenAI.

Haustein told the media that Bosch has seen better results with a less ambitious customer service chatbot that only answers more basic questions, such as where to buy a certain product. Bosch also uses an AI tool provided by SAP that can read customer inquiries and automatically assign them to the appropriate human staff. Haustein said,

"I think the idea of 'fully using AI for customer service' is somewhat overhyped."

"You have to ensure that the answers are nearly 100% accurate... but we still see hallucinations and incorrect answers. I think we have not yet reached the level of confidence needed for full automation."

Some technology vendors also acknowledge that AI agents are not yet mature. Amazon CEO Andy Jassy said during last Thursday's earnings call:

"At this stage, building AI agents is still more difficult than expected."

"But over time, much of the value that businesses realize from AI will come from AI agents."

Revenue from AI Agent Products Hard to Calculate

Currently, the adoption of general chatbots, programming assistants, AI search, and AI video generation tools has already helped engineering, marketing, and product management teams improve efficiency, according to business executives speaking to the media.

This has driven new revenue growth for AI vendors: according to the media's generative AI database, 20 AI-native startups led by OpenAI and Anthropic have reached annualized revenue of $23 billion from workplace use of AI, compared to nearly zero three years ago.

However, it is quite difficult to calculate the revenue generated specifically from "AI agents." At cloud companies like Google, Microsoft, and Amazon, most revenue growth comes from large AI developers like OpenAI, Anthropic, and Meta renting servers, rather than from enterprise AI applications.

Among enterprise software companies selling AI agents, the results vary. Salesforce stated earlier this year that its Agentforce product (used for automating sales emails, tracking invoices, and other tasks) has annual revenue exceeding $100 million. ServiceNow claimed that its AI software for automatically processing IT service tickets is expected to achieve $1 billion in revenue by the end of 2026. However, the revenue growth for both companies has been slower in recent quarters compared to most of 2023.

SAP has not separately disclosed AI product revenue, but CEO Christian Klein mentioned in this month's earnings call that AI is expected to bring "double-digit revenue growth" in the next two years.

Many software companies offering AI agents, including Salesforce, Snowflake, and Xero, currently do not even charge for such products, hoping to charge once customers truly recognize the value.

Paul Fipps, President of Global Customer Operations at ServiceNow, told the media that customers have recently become less excited about piloting AI features, as they have grown more realistic and started to consider which tasks AI agents can reasonably automate. Fipps said,

“In the past 12 to 18 months, due to the rapid development of generative AI, many customers actively piloted these AI capabilities, and the pendulum was pushed to one extreme.”

“Now you see the pendulum starting to swing back.”

He remains optimistic, believing that as AI agents continue to improve, companies will continue to invest heavily in the coming years.

Currently, AI agents are most successful in the field of software development. AI programming agents are becoming standard for many companies' engineering teams. However, software engineers still need to check the AI's code, as AI can make mistakes, meaning tasks cannot be fully automated yet.

“Stay Realistic”

Nikesh Arora, CEO of Palo Alto Networks, stated that companies selling AI tools must be cautious not to overpromise how much work AI can automate. He believes that achieving full automation in cybersecurity roles will still take years.

“We maintain a realistic attitude; (full automation) requires more effort, and we must be very certain that when we hand operations over to AI, the actions it takes are correct, because cybersecurity has consequences.”

Nevertheless, companies still recognize the benefits brought by AI agents, even if it requires “someone watching.” For example, Cirque du Soleil, the Canadian circus, is using an AI agent provided by SAP to track invoices from its costume and stage scenery suppliers.

When suppliers email to inquire about the status of invoices, the AI agent checks whether the invoices have been processed in the SAP system and drafts a reply email. In the past, the company had two full-time employees doing this; now, those two have been reassigned to other departments, and only one person is needed to review the AI draft before sending it out.

The tool costs less to operate than the salary of one full-time employee. Vice President Philippe Lalumière told the media:

“Sometimes the emails written by AI are not very polite, but suppliers receive replies faster and clearer, so overall satisfaction is higher. We haven't laid anyone off because of it, but the productivity improvement is obvious.”

Meanwhile, other AI agent providers also remind customers to view these tools as experimental projects rather than investments that will yield immediate returns.

Asha Sharma, President of Core AI Product Development at Microsoft, stated last week at The Information's WTF Summit:

“Think of AI agents as R&D budgets... an investment that will pay off in the next 5 to 10 years.”

“I think we are still in a very early stage... We now have millions of AI agents in production, but everyone is still figuring out how to make AI agents truly useful.”