
Upskilling to Accountability: Rethinking AI Adoption Through Resilience

Alexander D. Hilton


Prioritizing validation over speed enables organizations to use AI safely and realize durable value.

Despite heavy investment in generative-AI upskilling, most organizations see little measurable impact on productivity or profitability. Evidence shows only a small share of enterprises achieve P&L gains, and early adoption often reduces efficiency. The core challenge is not adoption speed but the lack of engineered accountability. AI introduces three major risks: malicious deepfakes, unintentional hallucinations, and systemic failures from insufficient testing, each of which can trigger financial, legal, and reputational harm. Leaders must implement multi-channel verification, dual-approval workflows, adversarial testing, and a culture of procedural skepticism. Prioritizing validation over speed enables organizations to use AI safely and realize durable value.

Related Articles

Ankit Chopra, “Adoption of AI and Agentic Systems: Value, Challenges, and Pathways,” California Management Review, August 15, 2025.

Ravi Prakash Ranjan and Zohor Kettani, “Scenario Planning for Managing AI Disruption Risk: A 3C-AI Framework,” California Management Review, October 10, 2025.


Introduction

The Global 2000 is betting billions that AI upskilling will unlock record valuations. However, empirical evidence reveals a persistent disconnect between investment levels and realized returns. Managers are being pushed to train their teams to integrate AI into daily workstreams, in some cases without a clear operational justification. But the evidence tells a sobering story. A recent Massachusetts Institute of Technology study found that only 5% of enterprises report measurable P&L impact from gen-AI initiatives,1 despite billions in expenditure. At the same time, surveys show that employees are adopting AI tools at increasing rates: a Federal Reserve Bank of St. Louis study found that 21.8% of U.S. workers had used generative AI in the previous week, with 6.0%–24.9% of their work hours assisted by AI.2

Research in manufacturing reveals that AI adoption can initially reduce productivity: firms saw a 1.33 percentage-point decline even after controlling for capital and IT factors, and the effect rose to roughly 60 percentage points when correcting for selection bias. This steep short-run decline is consistent with a J-curve dynamic, where performance worsens initially before potential long-term gains emerge.3 While the J-curve suggests these early losses may be temporary, they underscore the importance of managing the integration process carefully, with a focus on implementing engineered accountability and control points to minimize the depth and duration of the productivity dip. Another study of software developers found that AI copilots can increase task completion time by 19%, slowing rather than accelerating delivery depending on the size and maturity of the repository.4 The value proposition of AI thus varies significantly and may not yet deliver returns, echoing concerns from the dot-com era: the technology may hold significant potential without yet being reliably profitable.

AI’s market capabilities are still maturing. However, the risks AI poses to organizations are apparent beyond the question of its ability to increase productivity and return on investment. Rushing to train employees on how to adopt AI without designing organizational systems that challenge, test, and verify AI outputs leaves organizations exposed to poor quality, security breaches, and hidden costs. This mirrors the spread of false news online: without critical thinking skills, AI‑generated misinformation can seamlessly infiltrate business processes if the workforce is not trained in the essential skill of vetting AI.

With so many pilots stalling or even slowing productivity, managers have to ask: what kind of training will actually matter? The answer is not speed of adoption but the design and enforcement of an engineered accountability framework that embeds verification, dual-approval, and adversarial testing directly into workflows, while cultivating employee skills to critically assess and validate AI outputs. Just as the rise of social media demonstrated how misinformation spreads when people fail to evaluate sources, the rise of enterprise AI is exposing businesses to the same trap, but with far greater financial, legal, and operational consequences.

The real question for many organizations is not how quickly to adopt AI but whether their teams can slow down enough to identify, test, question, and validate AI before trusting it. This shift comes into sharp relief when we examine the three most immediate risk vectors: malicious external use, unintentional internal misuse, and systemic technical failures.

Malicious Use: Deepfakes and AI‑Enabled Scams

One of the most evident instances of AI‑related risk is the $25 million deepfake scam in Hong Kong, where a finance employee joined a video call with what appeared to be their CFO and colleagues.5 Every participant in that meeting was generated by AI. The request for funds appeared to be legitimate; it was polished, professional, and conveyed a sense of urgency. The transfer was executed, and the funds were lost.

This incident is not an isolated one. As early as 2019, criminals used voice-cloning AI to impersonate a German CEO, tricking an employee into wiring $243,000.6 More recently, companies in the U.S. and U.K. have reported instances of fraudulent “CEO calls” in which cloned voices solicit sensitive information or prompt urgent financial transfers. These occurrences all exhibit a common pattern: employees placed their trust in what they heard or saw without confirming through an alternative channel.

Training employees to visually or aurally detect deepfakes is unreliable; synthetic content can appear authentic until validated by specialized tools or protocols. As the technology advances, even experts struggle to consistently distinguish AI‑generated material without forensic assistance. Effective defenses therefore require workflow safeguards and multi‑channel verification, rather than reliance on individual perception.

To mitigate this risk, managers should implement engineered safeguards that reduce reliance on individual judgment and embed verification directly into organizational workflows.

  • Develop secondary verification protocols, requiring out-of-band confirmation for any high-consequence request such as financial transfers or password resets. Even if a video call or voicemail appears to be genuine, employees should be trained to verify through a known phone number, secure application, or in-person confirmation. This approach embodies the principle of “trust but verify” in the digital era.
  • Establish dual‑approval workflows to ensure no single employee can authorize substantial transactions or high‑risk system modifications independently. This practice minimizes the risk that a single point of failure, such as a deceived employee, could lead to significant losses.
  • Broaden security awareness to consider AI as a potential threat. Conventional training typically emphasizes phishing emails; however, deepfakes represent an advanced form of social engineering. Organizations should explicitly categorize deepfakes within the same spectrum, assisting employees in recognizing that attacks can manifest through video, audio, or chatbots.
  • Institute a culture of procedural skepticism. Perhaps the most important step is fostering a workplace culture where questioning unexpected or high‑pressure digital requests is encouraged, not penalized. Employees must feel secure in postponing actions until verification is completed, even if the request seems to originate from the CEO.
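The first two safeguards above can be made concrete in software. The following is a minimal sketch, not a production control system: the class and function names are hypothetical, and real implementations would live in payment or ticketing platforms. It shows how dual approval and out-of-band confirmation combine so that no single deceived employee, and no single convincing deepfake call, can release funds.

```python
from dataclasses import dataclass, field

@dataclass
class HighRiskRequest:
    """A hypothetical high-consequence request, e.g., a wire transfer."""
    description: str
    amount: float
    # Set only after the requester is confirmed via a separate, known
    # channel (e.g., a call-back to a number on file), never via the
    # channel the request arrived on.
    out_of_band_confirmed: bool = False
    approvers: set = field(default_factory=set)

def approve(request: HighRiskRequest, approver: str) -> None:
    """Record an approval; repeat approvals by one person count once."""
    request.approvers.add(approver)

def can_execute(request: HighRiskRequest) -> bool:
    """Both controls must hold: two distinct approvers AND out-of-band
    confirmation. A polished video call alone satisfies neither."""
    return request.out_of_band_confirmed and len(request.approvers) >= 2

req = HighRiskRequest("Vendor wire", 250_000.0)
approve(req, "analyst")
approve(req, "analyst")          # same person twice: still one approver
assert not can_execute(req)      # blocked despite an "urgent CFO call"
req.out_of_band_confirmed = True
approve(req, "controller")
assert can_execute(req)          # released only after layered checks
```

The design point is that the gate is procedural, not perceptual: it does not ask anyone to judge whether a voice or face is real.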

In addition to financial fraud, deepfakes and other AI‑generated content can also:

  • Harm reputations by fabricating recordings of executives making provocative statements.
  • Distort negotiations by impersonating parties in business transactions.
  • Erode trust in corporate communications if employees or customers start to doubt the authenticity of official messages.

The lesson for managers is clear: this threat is already here. Organizations that depend solely on human judgment are at risk. Only through procedural engineering such as layered verification processes and out‑of‑band controls can resilience against the malicious use of AI be achieved.

Unintentional Misuse: Hallucinations and Overconfidence in Outputs

AI does not require malicious intent to inflict harm; it can inadvertently generate false yet seemingly credible outputs that integrate into workflows without scrutiny. In a corporate setting, this mirrors the same issue that facilitates the spread of misinformation on social media: content that appears accurate but lacks verification.

For instance, lawyers in the U.K. and U.S. have submitted legal briefs that include fictitious case law and invented citations, relying on generative AI tools without adequate verification. A judge of the U.K. High Court observed that attorneys had referenced 18 entirely fictitious cases in a £90 million legal dispute and cautioned that using AI research without independent verification poses “serious implications for the administration of justice and public confidence,” warning that attorneys could face prosecution if they fail to verify the accuracy of their research.7 In the U.S., lawyers have faced sanctions and even disqualification from cases after incorporating fake citations produced by AI, with the judge stating that “fabricating legal authority demands substantially greater accountability.”8

Even reputable organizations have encountered difficulties when AI outputs were not adequately scrutinized. In Australia, Deloitte had to reimburse part of a $290,000 government contract after delivering a 237-page report that included AI-generated inaccuracies, fabricated quotations, and references to fictitious academic papers, which compromised both its credibility and the trust of its clients.9

In a similar vein, Air Canada lost a tribunal case when its chatbot fabricated a refund policy for bereavement fares; when a passenger depended on that information, the airline attempted to deny responsibility by claiming the chatbot was a “separate legal entity.” The court dismissed that argument and mandated compensation.10,11

Both instances underscore a common reality: AI can generate polished yet misleading outputs, and organizations that neglect to establish verification protocols, dual‑approval workflows, and accountability frameworks expose themselves to reputational harm, financial setbacks, and legal repercussions.

The consequences for businesses should be evident: when employees regard AI‑generated outputs as reliable without proper vetting, they introduce risks to quality, compliance, and reputation. If essential documents (such as legal filings, regulatory submissions, and operational reports) are based on unverified AI outputs, the organization may encounter sanctions, a loss of credibility, or operational disruptions.

Consequently, managerial focus must shift from the mere implementation of AI tools to ensuring the integration of robust verification processes within teams. This entails:
  • Educating teams on how to regard AI outputs as preliminary drafts that require scrutiny, rather than as completed work.
  • Instituting review protocols; for instance, prior to utilizing any AI‑generated content externally or in critical internal processes, there must be a verification of factual accuracy, citation validity, or logical coherence.
  • Acknowledging that human oversight is indispensable; it serves as the only dependable safeguard when AI generates confidently erroneous outputs.

Verification protocols safeguard against common misuse; however, they are inadequate by themselves. Even if individual outputs undergo review and receive approval, underlying flaws in system design may go unnoticed until an organization encounters stress, scales up, or faces a deliberate attack. At this point, the risk transitions from employee behavior to the very architecture of the AI systems. In the absence of proactive testing and adversarial assessment, organizations might only uncover vulnerabilities after they have already led to disruption or loss.

Systemic Failures: Lack of Testing and Red‑Teaming

Even though AI outputs may appear accurate in daily applications, organizations encounter a significant third risk factor: systemic flaws that only become apparent under conditions of stress, scale, or adversarial challenges. Prominent companies like Microsoft have taken action by establishing an internal “AI Red Team” tasked with adopting the mindset of attackers, examining AI systems, and identifying vulnerabilities prior to these systems being delivered to customers.12

For example, Microsoft’s red‑teaming efforts on over 100 generative AI products uncovered critical insights: generative models not only exacerbate existing security threats but also create new ones, and red‑teaming should extend beyond mere model testing to encompass comprehensive system workflows, integrations, and user interactions.13

However, numerous organizations implement AI tools without equivalent protective measures. A report from the Software Engineering Institute at Carnegie Mellon University highlights that significant obstacles for enterprise AI red‑teaming include “inconsistencies in evaluation methodologies, limited threat modeling, and gaps in mitigation strategies.”14

Insufficient testing and a lack of adversarial readiness can result in:

  • Unchecked bias in hiring or lending models until regulatory bodies step in.
  • Data breaches or unauthorized access when an AI system or chatbot is improperly configured or manipulated.
  • Operational breakdowns when a system behaves unpredictably in high‑pressure or unfamiliar situations.

Managers must subject AI systems to the same rigorous discipline as any other critical software deployment: conduct stress tests prior to launch, maintain continuous monitoring, and assume that adversaries will seek out vulnerabilities. Without this level of diligence, organizations risk uncovering systemic issues in the most detrimental manner during production, under duress, and with significant consequences.

Focus Areas for Embedding Resilience

Translating these risks into actionable strategies necessitates clear managerial priorities. The following focus areas provide leaders with a framework for embedding resilience into AI deployment from the very beginning:

  • Establish an AI Red Team function (internally or with partners) to design adversarial tests that replicate real-world threats (e.g., prompt injection, data poisoning, jailbreaks).
  • Integrate continuous monitoring and feedback loops such as tracking AI performance drift, unexpected outputs, and usage outside intended parameters.
  • Document testing and remediation to preserve audit trails of what was tested, what failed, and how it was fixed, which is essential for compliance and governance.
  • Ensure that AI deployment governance encompasses not only model accuracy but also workflows, data flows, vendor integrations, and user interfaces. AI risk is systemic, not just algorithmic.
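The continuous-monitoring item above can be illustrated with a small sketch. This is a hypothetical drift monitor, not a reference to any particular tool: it tracks a quality signal the organization already produces (for example, the share of AI outputs that pass human review) and flags when the recent average slips meaningfully below an established baseline, prompting the investigation the audit trail then documents.

```python
from collections import deque

class DriftMonitor:
    """Flags drift when the rolling mean of a quality metric (e.g., the
    human-review pass rate of AI outputs) falls more than `tolerance`
    below a baseline established during initial validation."""

    def __init__(self, baseline: float, window: int = 50,
                 tolerance: float = 0.05):
        self.baseline = baseline
        self.tolerance = tolerance
        self.scores = deque(maxlen=window)  # keeps only recent observations

    def record(self, score: float) -> bool:
        """Record one observation; return True if drift is detected."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data for a stable estimate yet
        mean = sum(self.scores) / len(self.scores)
        return mean < self.baseline - self.tolerance

# Usage: pass rate holds near baseline, then degrades.
monitor = DriftMonitor(baseline=0.95, window=10, tolerance=0.05)
for s in [0.96] * 10:
    assert not monitor.record(s)   # within tolerance: no alert
for s in [0.85] * 10:
    drifted = monitor.record(s)
assert drifted                     # sustained decline triggers the flag
```

A real deployment would route the flag into an incident process and log both the trigger and the remediation, which is exactly the audit trail the documentation item calls for.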

AI risk has never been limited to the question of “will the model provide the correct answer?” It extends to “what occurs when someone attempts to manipulate it, or when it evolves beyond the conditions for which it was designed?” Managers who prioritize testing and validation today will be better positioned to prevent failures in the future.

AI Is Here: But Vetting Must Come First

AI is not going anywhere; it has already been integrated into the tools utilized by numerous teams, ranging from customer service chat to enterprise workflows. The discussion has shifted from whether organizations will implement AI to how they will do so. However, this is where many training strategies are currently misguided. Existing upskilling programs emphasize the rapid adoption of AI: how to enhance prompts, automate processes more efficiently, or incorporate copilots into workflows. While these skills are important, they are not the most critical.

The most critical requirement is to equip employees with the skills to systematically evaluate AI outputs. In the absence of this discipline, organizations risk perpetuating the same issues that turned social media into a hotbed for misinformation: polished, confident content disseminating unchecked due to a lack of scrutiny. In the business realm, the stakes are higher; potential regulatory penalties, damage to reputation, financial fraud, and systemic weaknesses.

Yet, research indicates that oversight and structured application can transform AI into a valuable asset. A working paper uploaded to ResearchGate found that a hybrid fraud detection system integrating AI with human oversight outperformed AI-only or human-only methods, yielding higher detection rates and fewer false positives.15 While preliminary, the findings align with broader guidance such as the NIST AI Risk Management Framework,16 which emphasizes embedding human oversight and validation in high-stakes AI workflows. A similar principle applies in software development: while Becker et al. observed that copilots could increase task completion times by 19% in mature and familiar repositories, they also found that 69% of developers continued using Cursor after the study concluded, indicating they derived value from the tool despite measured slowdowns; while no clear learning effect was observed across the first 50 hours of use, the results are consistent with potential speedups in other contexts, such as small greenfield projects or unfamiliar codebases.17 The consistent takeaway is that benefits are not realized through automation alone, but through human-in-the-loop systems that embed oversight, allow for learning curves, and emphasize disciplined use.

For engineering managers and directors, the strategic imperative is a fundamental pivot in AI implementation philosophy from a focus on adoption velocity to the engineering of accountability. This necessitates a foundational shift in training, where critical evaluation and verification are the primary objectives, not secondary components. Governance and resilience, articulated through robust acceptable use policies, dual‑approval processes, and adversarial testing, must be established as core prerequisites for AI interaction. Cultivating a culture of systematic skepticism is essential, empowering employees to treat the validation of AI outputs as a fundamental engineering responsibility. Consequently, broad‑based training on AI integration should be deferred until teams demonstrate sustained proficiency in assessing and validating AI‑generated work.

Actionable Measures for Critical Evaluation

To operationalize this shift, the following actionable measures may be applicable depending on the team and context:

1. Initiate Foundational Awareness and Transparency.

  • Mandate explicit provenance labeling for all AI‑generated or AI‑assisted content in internal drafts, code, and analyses. Help teams understand why the labeling is necessary and valuable.    
  • Conduct training sessions to help personnel identify common AI artifacts, such as factual superficiality, hallucinated references, and logical inconsistencies.

2. Develop Context‑Specific Validation Frameworks.

  • For AI‑generated code, institute mandatory peer review and enhanced testing cycles focused on security vulnerabilities, logic errors, and performance inefficiencies.    
  • For AI‑synthesized reports or data summaries, implement a “trust but verify” protocol requiring cross‑referencing of all key claims and figures against primary, authoritative sources.    
  • For strategic or creative content, deploy formal adversarial review processes where a separate team is tasked with identifying flaws, biases, and strategic gaps.

3. Formalize Governance with Structured Oversight.

  • Institute a dual‑approval workflow for any AI output used in external communications, critical decision‑making, or high‑stakes operational processes.    
  • Develop a simple, risk‑based checklist aligned with the NIST AI RMF’s “Validate” function to assess outputs for accuracy, fairness, and safety before deployment.

4. Create Mechanisms for Continuous Improvement.

  • Establish a central register to document and analyze AI‑related errors and near‑misses, transforming them into organizational learning opportunities.    
  • Launch pilot “red teaming” exercises for critical AI‑supported workflows to proactively identify failure modes and strengthen defensive protocols.

Conclusion

AI may prove indispensable for many enterprises, but accountability must be engineered into its use from the start. Implementing AI without instructing employees on how to scrutinize it is akin to providing everyone with a printing press without teaching them literacy. Organizations that will succeed are those that grasp the true hierarchy of priorities: adoption can be deferred, but engineered accountability cannot.

References

  1. Ajay Challapally et al., The GenAI Divide: State of AI in Business 2025, Research Report (MIT NANDA, 2025).
  2. Alexander Bick et al., “The Impact of Generative AI on Work Productivity,” Federal Reserve Bank of St. Louis, February 27, 2025.
  3. Kristin Burnham, “The ‘Productivity Paradox’ of AI Adoption in Manufacturing Firms,” MIT Sloan, July 9, 2025.
  4. Joel Becker et al., “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity,” arXiv:2507.09089, preprint, arXiv, July 25, 2025.
  5. Heather Chen and Kathleen Magramo, “Finance Worker Pays out $25 Million after Video Call with Deepfake ‘Chief Financial Officer’,” CNN, February 4, 2024.
  6. Jesse Damiani, “A Voice Deepfake Was Used To Scam A CEO Out Of $243,000,” Forbes, September 3, 2019.
  7. Jill Lawless, “UK Judge Warns of Risk to Justice after Lawyers Cited Fake AI-Generated Cases in Court,” AP News, June 7, 2025.
  8. Sara Merken, “Judge Disqualifies Three Butler Snow Attorneys from Case over AI Citations,” Reuters, July 24, 2025.
  9. Rod McGuirk, “Deloitte to Partially Refund Australian Government for Report with Apparent AI-Generated Errors,” AP News, October 7, 2025.
  10. Maria Yagoda, “Airline Held Liable for Its Chatbot Giving Passenger Bad Advice - What This Means for Travellers,” February 23, 2024.
  11. Moffatt v. Air Canada, 2024 BCCRT 149 (Civil Resolution Tribunal of British Columbia 2024).
  12. Ram Shankar Siva Kumar, “Microsoft AI Red Team Building Future of Safer AI,” Microsoft Security Blog, August 7, 2023.
  13. Blake Bullwinkel and Ram Shankar Siva Kumar, “3 Takeaways from Red Teaming 100 Generative AI Products,” Microsoft Security Blog, January 13, 2025.
  14. “Generative AI Red-Teaming Can Learn Much from Cybersecurity Says SEI Study,” Carnegie Mellon University, August 28, 2025.
  15. Abigeal Fallen et al., “Exploring the Synergy Between AI and Human Oversight in Fraud Detection,” ResearchGate, January 27, 2025.
  16. Gina Raimondo and Laurie Locascio, Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1 (National Institute of Standards and Technology, 2023).
  17. Becker et al., “Measuring the Impact,” 12.
Keywords
  • Artificial intelligence
  • IT governance
  • Risk
  • Risk management
  • Technological innovation
  • Technology management


Alexander D. Hilton
Alexander D. Hilton, MBA, is a transformation and innovation advisor at a FTSE 100 company. He coaches organizations from small businesses to the Global 2000 on digital modernization, change adoption, and resilient operating models. He helps leaders translate emerging technologies and strategic investments into measurable business and financial outcomes.




California Management Review

Published at Berkeley Haas for more than sixty years, California Management Review seeks to share knowledge that challenges convention and shows a better way of doing business.

Learn more