Focus on the Human Element to Win the AI Arms Race
The United States must refine its investments to incorporate a deliberate and sustained campaign of mission engineering to accelerate and improve the delivery of trustworthy AI.
Chinese investments in artificial intelligence (AI) and autonomy are routinely cited to spur greater investment by the United States. Much attention is paid to relative government expenditures, commercial and gaming applications, and data-collection infrastructure. Proponents believe AI innovation will keep the United States ahead of its rivals. The government has listened to these arguments and taken some action. In 2018, the Department of Defense (DoD) stood up the Joint Artificial Intelligence Center and published a Data Strategy to transform the DoD with and for AI. In 2021, the National Security Commission on AI (NSCAI) detailed recommendations for winning this technology race in a 756-page tome. However, a March 2022 Government Accountability Office (GAO) report noted that the DoD still lacks a road map for implementing AI. Previous acquisition efforts and recent AI and autonomy failures show how difficult it is to move from a technology concept to a fielded capability. Government officials and contractors commonly refer to the gap between research projects or technology demonstrations and fielded military systems as the “Valley of Death.” Bridging that valley requires common ground among stakeholders about people, technology, and work throughout a project. Accordingly, the United States must refine its investments to incorporate a deliberate and sustained campaign of mission engineering to accelerate and improve the delivery of trustworthy AI.
Mission engineering is a deliberate investigation of people, technology, and work that enables good project and product management. Mission engineering teams include work domain experts, engineers, and scientists who provide progressive insights for all stakeholders throughout a development lifecycle. Unlike traditional contracting solicitations, mission engineering provides empirical descriptions of human-machine systems. These include models of workflows and workers, mockups of user interfaces, concepts of operation, and system and data architectures. These artifacts afford a low-commitment design dialogue between users, program offices, technologists, and test engineers. Most importantly, mission engineering provides proactive findings at the pace of agile software development. As the bridge between developers and users, mission engineering should be required of all defense software factories and technology acquisitions.
New technologies for low-risk work can generally be delivered with modest investments in user experience design. Sales and user feedback typically suffice for commercial product management. A “fail fast” approach can succeed because low-risk use cases provide opportunities for rapid feedback from users in realistic conditions. Military workgroups and data, however, are difficult to access and demand high security and reliability. Military operations are infrequent and extremely expensive, so they offer slower feedback loops for design. Faulty military systems undermine national security. Sustainment, interoperability, and programmatic constraints of legacy military systems further complicate technology delivery. For decades, military technology requirements have been identified and validated by work domain practitioners, with little engineering or data science input. These requirements are documented in text and rudimentary concept diagrams, which provide stale and insufficient detail for good systems engineering. Due to these complicating factors, a deliberate and sustained campaign of mission engineering is warranted.
Military capabilities are inherently human-machine systems. From basic tools and weapons to complex automation, humans and technology interact. Even the most autonomous drones are maintained, armed, and employed by humans. Goal-setting in any work environment remains an inherently human responsibility. Technology that is developed without considering these interactions wastes resources and fails to deliver military advantage.
Despite recently publicized technology successes (e.g., American Javelin missiles employed by Ukraine), U.S. technology investments often miss their intended mark. Since the mid-1990s, government software programs have produced disappointing outcomes. The Standish Group found that only 21 percent of government software-intensive projects executed between 2011 and 2015 were delivered on time, within budget, and to customer expectations. From 2001 to 2014, the DoD wasted $46 billion on weapon systems that were canceled before they were fielded. One of those, the U.S. Army’s Future Combat System (FCS), was canceled in 2009 and is widely considered a complete failure. From 2004 to 2014, the U.S. Army spent $2.7 billion on a failed intelligence support system, the Distributed Common Ground System-Army (DCGS-A). In 2007, one of the authors anticipated the system’s shortcomings through a review of the DCGS-A requirements document: the requirements “were not generated through a task analysis of users,” and the system, like its predecessor systems, would “add to the work of the analyst.”
Troublingly, commercial software development does not appear to be any better. The Wall Street Journal reported that 75 percent of venture-backed firms in the United States do not return their investors’ capital. The Consortium for Information and Software Quality estimated that the cost of poor software quality in the United States in 2020 was $2.08 trillion. This figure includes the costs of unsuccessful information technology projects, legacy systems, and operational failures. Unfortunately, these issues manifest not just in lost capital but also in lost lives. Two notable commercial examples of misguided technology development involving autonomy are the Boeing 737 Max and the Tesla “autopilot” system.
Under financial pressure to field an aircraft that could compete with the Airbus A320neo, Boeing delivered the 737 Max. The new aircraft introduced flawed automation, the Maneuvering Characteristics Augmentation System (MCAS), to make trim corrections based on data from angle-of-attack sensors on the fuselage. To avoid pilot training and airworthiness inspections that would have increased costs and delayed delivery, Boeing concealed MCAS from pilots and the Federal Aviation Administration (FAA). This design and delivery approach presumed that the sensors would always provide reliable data and that pilots would never have to intervene. Boeing’s “culture of concealment” contributed to 346 deaths and at least $20 billion in direct costs.
The Tesla “autopilot” system is a mislabeled suite of sensors and software that assists drivers under certain conditions. Since Tesla released the system in 2015, 250 people have died in accidents involving its vehicles. Thus far, “autopilot” use has been confirmed in accidents resulting in twelve deaths. Every year since 2015, Elon Musk has declared that his vehicles would demonstrate complete or full autonomy within twelve months. The warnings in the current Tesla owner’s manual tell a very different story:
“Autosteer is intended for use on controlled-access highways with a fully attentive driver. When using Autosteer, hold the steering wheel and be mindful of road conditions and surrounding traffic. Do not use Autosteer in construction zones, or in areas where bicyclists or pedestrians may be present. Never depend on Autosteer to determine an appropriate driving path. Always be prepared to take immediate action. Failure to follow these instructions could cause damage, serious injury or death.”
Boeing’s and Tesla’s failures illustrate the fatal flaw of attempting to automate out, or around, humans. The fool's errand of replacing humans with technology puts engineers in the impossible position of anticipating every possible future condition and failure mode. A corpus of human factors engineering research instead shows that humans are the source of resilience in systems when technology grants them agency. People decide how and when to trust technologies based on their understanding of a machine’s capabilities, limitations, state, and trajectory. Those trust decisions require good feedback, which is often overlooked in the design of AI and autonomy. Human-centered design is not a polish to be applied at the end of a development process. Rather, it is the result of sustained, sound engineering practices.
AI is very complicated software prone to many sources of failure. Like any information system, AI can perform poorly due to software and hardware architecture deficiencies. These bugs can reduce accuracy, speed, reliability, and security. AI can fail due to data shortcomings. Real or realistic data is often unavailable for military technology development. When it is available, data is generally poorly curated and unusable by data scientists. Previously collected data was not intended for AI development and is often corrupt, unlabeled, or incomplete. Data curation requires uniquely qualified engineers with sufficient work domain knowledge to interpret and repair data. After an AI system is placed into operation, it can fail due to discrepancies between training and production data. In other words, data used to train the AI models may be inconsistent with data that is collected or available in the real world. This failure mode contributed to the Boeing and Tesla fatal crashes.
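To make this failure mode concrete, consider a minimal, hypothetical monitoring sketch. The feature, data, and threshold below are illustrative assumptions, not drawn from any fielded program: a mission engineering team could compare the distribution of a model input as it appeared in training data against the same input as it arrives from the field, and flag drift before it silently degrades the model’s outputs.

    # Illustrative sketch: detect training/production data drift on one numeric feature.
    # The synthetic data and the 0.01 threshold are assumptions for demonstration only.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(seed=0)
    training_feature = rng.normal(loc=5.0, scale=1.0, size=10_000)   # data the model was trained on
    production_feature = rng.normal(loc=7.5, scale=1.5, size=2_000)  # data now arriving from the field

    # Two-sample Kolmogorov-Smirnov test: are the two samples drawn from the same distribution?
    result = ks_2samp(training_feature, production_feature)
    if result.pvalue < 0.01:
        print(f"Drift detected (KS statistic = {result.statistic:.3f}); model review or retraining is warranted.")
    else:
        print("Production data remains statistically consistent with training data.")

Even this trivial check requires someone who knows both the work domain (which inputs matter) and the data pipeline, which is precisely the interdisciplinary role that mission engineering fills.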
AI models are esoteric mathematical equations that present unsolved issues. They can be inefficient, requiring more processing, data storage, or power than other approaches, particularly for systems operating in austere and disconnected settings. AI models are often unexplainable and brittle under changing conditions. These shortcomings can result in outputs that are not trustworthy. Furthermore, AI models are challenging to maintain. Some person or organization has to monitor the performance of each model to determine whether and how to update it; a sketch of that burden follows below. Commercial AI systems, like search engines or content recommenders, are maintained in the background by the companies that own the models. Even these systems are coming under scrutiny as opaque and unmanageable privacy intrusions. All of this illustrates that the AI race is inherently, but not exclusively, a computer and data science problem. Accordingly, we will need a far more interdisciplinary approach to defense acquisition. Specifically, we must invest in mission engineering to anticipate and characterize the human-machine system throughout the acquisition lifecycle.
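As a rough illustration of what that maintenance burden looks like in practice, the sketch below shows the kind of ongoing performance accounting some organization must own for every fielded model. The window size and accuracy threshold are assumptions chosen for illustration, not figures drawn from any DoD system.

    # Illustrative sketch of post-fielding model monitoring; thresholds are assumptions.
    from collections import deque

    class ModelMonitor:
        """Tracks recent prediction outcomes and flags when accuracy degrades."""

        def __init__(self, window_size: int = 500, min_accuracy: float = 0.90):
            self.outcomes = deque(maxlen=window_size)  # 1 = correct prediction, 0 = incorrect
            self.min_accuracy = min_accuracy

        def record(self, prediction, ground_truth) -> None:
            """Log whether the model's prediction matched the eventual ground truth."""
            self.outcomes.append(1 if prediction == ground_truth else 0)

        def needs_review(self) -> bool:
            """Return True once a full window of evidence shows accuracy below the floor."""
            if len(self.outcomes) < self.outcomes.maxlen:
                return False  # not enough evidence yet to judge
            return (sum(self.outcomes) / len(self.outcomes)) < self.min_accuracy

Calling record() as labeled outcomes trickle in and acting when needs_review() returns true is simple code but a hard organizational question: who supplies the ground truth, who watches the monitor, and who decides to retrain, recalibrate, or retire the model.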
Like any new technology, AI and autonomy change, rather than replace, work and workers. For example, the widespread use of the internet for commercial and private purposes dramatically improved information sharing, but it also created the cybercrime and cybersecurity industries. AI promises to improve decision-making, but it demands new roles, skills, and knowledge. Government technology programs are notorious for overlooking this. Although the DoD has documented human engineering criteria in a Military Standard, that 395-page volume focuses mostly on ergonomics and prescribes no approaches to optimizing cognitive work. The well-intentioned NSCAI emphasized the importance of investments in human-AI teaming, but it is difficult to understand how to prioritize or implement this among the commission’s fifteen recommendations. Only recently did the DoD instruct its software programs to include continuous user engagement during development. These instructions should be further refined to prescribe mission engineering as the core enabler of agile development.
Technology races are not won solely on expenditures; they are won through sound engineering. As decades of human factors engineering research illustrate, technologies must be designed for humans. We must move away from under-informed and byzantine contracting and invest in mission engineering campaigns to develop reliable, useful human-machine systems. Investing 10 percent of a program’s budget in mission engineering helps ensure that the other 90 percent delivers fit, useful technology. These iterative campaigns will allow us to “learn fast” and bridge the Valley of Death.
Colonel (U.S. Army, Retired) Stoney Trent, Ph.D., is a Cognitive Engineering Research Professor at the Virginia Tech National Security Institute, where he leads research on AI assurance and human-machine teaming. Dr. Trent is a Military Intelligence and Cyber Warfare veteran with extensive experience planning and leading technology programs. While on active duty, Dr. Trent designed the Joint Artificial Intelligence Center (JAIC) and established product lines to deliver human-centered AI to improve warfighting and business functions. Dr. Trent has served in combat and stability operations in Iraq, Kosovo, Germany, and Korea, is a graduate of the Army War College, and is a former Cyber Fellow at the National Security Agency.
Lieutenant Colonel (U.S. Army, Retired) James Doty III, Ph.D., is a technical project manager, historian, and military intelligence veteran. Dr. Doty provides operational expertise for defense and intelligence technology development. As the Senior Intelligence Officer for Operations Group at the National Training Center (NTC) in Fort Irwin, CA, Dr. Doty coordinated the planning and execution of intelligence training and wargames for Army units preparing to deploy. Dr. Doty has served in combat and stability operations in Iraq, Saudi Arabia, Kosovo, and Germany, and has been awarded the Bronze Star.
Image: Flickr/U.S. Air Force.