California Management Review
California Management Review is a premier professional management journal for practitioners published at UC Berkeley Haas School of Business.
Alexander D. Hilton
Image Credit | MdRiyadull
Despite heavy investment in generative-AI upskilling, most organizations see little measurable impact on productivity or profitability. Evidence shows only a small share of enterprises achieve P&L gains, and early adoption often reduces efficiency. The core challenge is not adoption speed but the lack of engineered accountability. AI introduces three major risks such as malicious deepfakes, unintentional hallucinations, and systemic failures from insufficient testing that can trigger financial, legal, and reputational harm. Leaders must implement multi-channel verification, dual-approval workflows, adversarial testing, and a culture of procedural skepticism. Prioritizing validation over speed enables organizations to use AI safely and realize durable value.
Ankit Chopra, “Adoption of AI and Agentic Systems: Value, Challenges, and Pathways,” California Management Review, August 15, 2025.
Ravi Prakash Ranjan and Zohor Kettani, “Scenario Planning for Managing AI Disruption Risk: A 3C-AI Framework,” California Management Review, October 10, 2025.
The Global 2000 is betting billions that AI upskilling will unlock record valuations. However, empirical evidence reveals a persistent disconnect between investment levels and realized returns. Managers are being pushed to train their teams to integrate AI into daily workstreams, in some cases without a clear operational justification. But the evidence tells a sobering story. A recent Massachusetts Institute of Technology study found that only 5 % of enterprises report measurable P&L impact from gen‑AI initiatives,1 despite billions in expenditure. At the same time, surveys show that employees are adopting AI tools at increasing rates: a Federal Reserve Bank of St. Louis study found that 21.8 % of U.S. workers had used generative AI in the previous week, with 6.0 %–24.9 % of their work hours assisted by AI.2
Research in manufacturing reveals that AI adoption can initially reduce productivity: firms saw a 1.33 percentage‑point decline even after controlling for capital and IT factors, and the effect rose to on the order of 60 percentage points when correcting for selection bias. This steep short‑run decline is consistent with a J‑curve dynamic, where performance worsens initially before potential long‑term gains emerge.3 While the J‑curve suggests these early losses may be temporary, they underscore the importance of managing the integration process carefully, with a focus on implementing engineering accountability and control points to minimize the depth and duration of the productivity dip. Another study of software developers found that AI copilots can increase task completion time by 19 %, slowing rather than accelerating delivery depending on the size and maturity of the repository.4 The value proposition of AI can vary significantly and may not provide returns echoing similar concerns to the dot‑com era; the technology itself may have significant potential but is not at the point of necessarily being profitable.
AI’s market capabilities are still maturing. However, the risks AI poses to organizations are apparent beyond the question of its ability to increase productivity and return on investment. Rushing to train employees on how to adopt AI without designing organizational systems that challenge, test, and verify AI outputs leaves organizations exposed to poor quality, security breaches, and hidden costs. This mirrors the spread of false news online: without critical thinking skills, AI‑generated misinformation can seamlessly infiltrate business processes if the workforce is not trained in the essential skill of vetting AI.
With so many pilots stalling or even slowing productivity, managers have to ask: what kind of training will actually matter? The answer is not speed of adoption, rather designing and enforcing an engineered accountability framework that embeds verification, dual‑approval, and adversarial testing directly into workflows, while cultivating employee skills to critically assess and validate AI outputs. Similar to the rise of social media which demonstrated how misinformation can spread when people fail to evaluate sources; the rise of enterprise AI is exposing businesses to the same trap, but with far greater financial, legal, and operational consequences.
The real question for many organizations is not how quick to adopt AI; rather it’s whether their teams can slow down enough to identify, test, question, and validate AI before trusting it. This shift comes into sharp relief when we examine the three most immediate risk vectors: malicious external use, unintentional internal misuse, and systemic technical failures.
One of the most evident instances of AI‑related risk is the $25 million deepfake scam in Hong Kong, where a finance employee joined a video call with what appeared to be their CFO and colleagues.5 Every participant in that meeting was generated by AI. The request for funds appeared to be legitimate; it was polished, professional, and conveyed a sense of urgency. The transfer was executed, and the funds were lost.
This incident is not an isolated one. As early as 2019, criminals used voice‑cloning AI to impersonate a German CEO, tricking an employee into wiring $243 000.6 More recently, companies in the U.S. and U.K. have reported instances of fraudulent “CEO calls” where cloned voices solicit sensitive information or prompt urgent financial transfers. These occurrences all exhibit a common pattern: employees placed their trust in what they heard or saw without confirming through an alternative channel.
Training employees to visually or aurally detect deepfakes is unreliable; synthetic content can appear authentic until validated by specialized tools or protocols. As the technology advances, even experts struggle to consistently distinguish AI‑generated material without forensic assistance. Effective defenses therefore require workflow safeguards and multi‑channel verification, rather than reliance on individual perception.
To mitigate this risk, managers should implement engineered safeguards that reduce reliance on individual judgment and embed verification directly into organizational workflows.
In addition to financial fraud, deepfakes and other AI‑generated content can also:
The lesson for managers is clear: this threat is already here. Organizations that depend solely on human judgment are at risk. Only through procedural engineering such as layered verification processes and out‑of‑band controls can resilience against the malicious use of AI be achieved.
AI does not require malicious intent to inflict harm; it can inadvertently generate false yet seemingly credible outputs that integrate into workflows without scrutiny. In a corporate setting, this mirrors the same issue that facilitates the spread of misinformation on social media: content that appears accurate but lacks verification.
For instance, lawyers in the U.K. and U.S. have presented legal briefs that include fictitious case law and invented citations, depending on generative AI tools without adequate verification. A judge from the U.K. High Court observed that attorneys had referenced 18 entirely fictitious cases in a £90 million legal dispute and cautioned that utilizing AI research without independent verification poses “serious implications for the administration of justice and public confidence,” warning that attorneys could face prosecution if they fail to verify the accuracy of their research.7 In the U.S., lawyers faced sanctions and even disqualification from cases after incorporating fake citations produced by AI, with the judge stating that “fabricating legal authority demands substantially greater accountability.”8
Even reputable organizations have encountered difficulties when AI outputs were not adequately scrutinized. In Australia, Deloitte had to reimburse part of a $290 000 government contract after delivering a 237‑page report that included AI‑generated inaccuracies, fabricated quotations, and references to fictitious academic papers, which compromised both its credibility and the trust of its clients.9
In a similar vein, Air Canada lost a tribunal case when its chatbot fabricated a refund policy for bereavement fares; when a passenger depended on that information, the airline attempted to deny responsibility by claiming the chatbot was a “separate legal entity.” The court dismissed that argument and mandated compensation.10,11
Both instances underscore a common reality: AI can generate polished yet misleading outputs, and organizations that neglect to establish verification protocols, dual‑approval workflows, and accountability frameworks expose themselves to reputational harm, financial setbacks, and legal repercussions.
The consequences for businesses should be evident: when employees regard AI‑generated outputs as reliable without proper vetting, they introduce risks to quality, compliance, and reputation. If essential documents (such as legal filings, regulatory submissions, and operational reports) are based on unverified AI outputs, the organization may encounter sanctions, a loss of credibility, or operational disruptions.
Verification protocols safeguard against common misuse; however, they are inadequate by themselves. Even if individual outputs undergo review and receive approval, underlying flaws in system design may go unnoticed until an organization encounters stress, scales up, or faces a deliberate attack. At this point, the risk transitions from employee behavior to the very architecture of the AI systems. In the absence of proactive testing and adversarial assessment, organizations might only uncover vulnerabilities after they have already led to disruption or loss.
Even though AI outputs may appear accurate in daily applications, organizations encounter a significant third risk factor: systemic flaws that only become apparent under conditions of stress, scale, or adversarial challenges. Prominent companies like Microsoft have taken action by establishing an internal “AI Red Team” tasked with adopting the mindset of attackers, examining AI systems, and identifying vulnerabilities prior to these systems being delivered to customers.12
For example, Microsoft’s red‑teaming efforts on over 100 generative AI products uncovered critical insights: generative models not only exacerbate existing security threats but also create new ones, and red‑teaming should extend beyond mere model testing to encompass comprehensive system workflows, integrations, and user interactions.13
However, numerous organizations implement AI tools without equivalent protective measures. A report from the Software Engineering Institute at Carnegie Mellon University highlights that significant obstacles for enterprise AI red‑teaming include “inconsistencies in evaluation methodologies, limited threat modeling, and gaps in mitigation strategies.”14
Insufficient testing and a lack of adversarial readiness can result in:
Managers must subject AI systems to the same rigorous discipline as any other critical software deployment: conduct stress tests prior to launch, maintain continuous monitoring, and assume that adversaries will seek out vulnerabilities. Without this level of diligence, organizations risk uncovering systemic issues in the most detrimental manner during production, under duress, and with significant consequences.
Translating these risks into actionable strategies necessitates clear managerial priorities. The following focus areas provide leaders with a framework for embedding resilience into AI deployment from the very beginning:
AI risk has never been limited to the question of “will the model provide the correct answer?” It extends to “what occurs when someone attempts to manipulate it, or when it evolves beyond the conditions for which it was designed?” Managers who prioritize testing and validation today will be better positioned to prevent failures in the future.
AI is not going anywhere; it has already been integrated into the tools utilized by numerous teams, ranging from customer service chat to enterprise workflows. The discussion has shifted from whether organizations will implement AI to how they will do so. However, this is where many training strategies are currently misguided. Existing upskilling programs emphasize the rapid adoption of AI; how to enhance prompts, automate processes more efficiently, or incorporate copilots into workflows. While these skills are important, they are not the most critical.
The most critical requirement is to equip employees with the skills to systematically evaluate AI outputs. In the absence of this discipline, organizations risk perpetuating the same issues that turned social media into a hotbed for misinformation: polished, confident content disseminating unchecked due to a lack of scrutiny. In the business realm, the stakes are higher; potential regulatory penalties, damage to reputation, financial fraud, and systemic weaknesses.
Yet, research indicates that oversight and structured application can transform AI into a valuable asset. A working paper uploaded to ResearchGate found that a hybrid fraud detection system integrating AI with human oversight outperformed AI‑only or human‑only methods, yielding higher detection rates and fewer false positives.15 While preliminary, the findings align with broader guidance such as the NIST AI Risk Management Framework,16 which emphasizes embedding human oversight and validation in high‑stakes AI workflows. A similar principle applies in software development: while Becker observed that copilots could increase task completion times by 19% in mature and familiar repositories, they also found that 69 % of developers continued using Cursor after the study concluded, indicating they derived value from the tool despite measured slowdowns; while no clear learning effect was observed across the first 50 hours of use, the results are consistent with potential speedups in other contexts, such as small greenfield projects or unfamiliar codebases.17 The consistent takeaway is that benefits are not realized through automation alone, but through human‑in‑the‑loop systems that embed oversight, allow for learning curves, and emphasize disciplined use.
For engineering managers and directors, the strategic imperative is a fundamental pivot in AI implementation philosophy from a focus on adoption velocity to the engineering of accountability. This necessitates a foundational shift in training, where critical evaluation and verification are the primary objectives, not secondary components. Governance and resilience, articulated through robust acceptable use policies, dual‑approval processes, and adversarial testing, must be established as core prerequisites for AI interaction. Cultivating a culture of systematic skepticism is essential, empowering employees to treat the validation of AI outputs as a fundamental engineering responsibility. Consequently, broad‑based training on AI integration should be deferred until teams demonstrate sustained proficiency in assessing and validating AI‑generated work.
To operationalize this shift, the following actionable measures may be applicable depending on the team and context:
1. Initiate Foundational Awareness and Transparency.
2. Develop Context‑Specific Validation Frameworks.
3. Formalize Governance with Structured Oversight.
4. Create Mechanisms for Continuous Improvement.
AI may prove indispensable for many enterprises, but accountability must be engineered into its use from the start. Implementing AI without instructing employees on how to scrutinize it is akin to providing everyone with a printing press without teaching them literacy. Organizations that will succeed are those that grasp the true hierarchy of priorities: adoption can be deferred, but engineered accountability cannot.