How Accountability Practices Are Pursued by AI Engineers in the Federal Government
AI developers within the federal government, including at the GAO (office shown here), are defining accountable practices that AI engineers can employ as they work on projects. (Credit: GAO) 

By John P. Desmond, AI Trends Editor   

Two experiences of how AI developers within the federal government are pursuing AI accountability practices were outlined at the AI World Government event held virtually and in-person this week in Alexandria, Va. 

Taka Ariga, chief data scientist and director, US Government Accountability Office

Taka Ariga, chief data scientist and director at the US Government Accountability Office, described an AI accountability framework he uses within his agency and plans to make available to others.  

And Bryce Goodman, chief strategist for AI and machine learning at the Defense Innovation Unit (DIU), a unit of the Department of Defense founded to help the US military make faster use of emerging commercial technologies, described work in his unit to translate principles of AI development into terminology that an engineer can apply.

Ariga, the first chief data scientist appointed to the US Government Accountability Office and director of the GAO’s Innovation Lab, discussed an AI Accountability Framework he helped to develop by convening a forum of experts from government, industry and nonprofits, as well as federal inspector general officials and AI experts.

“We are adopting an auditor’s perspective on the AI accountability framework,” Ariga said. “GAO is in the business of verification.”  

The effort to produce a formal framework began in September 2020 and brought together a group that was 60% women, 40% of whom were underrepresented minorities, for two days of discussion. The effort was spurred by a desire to ground the AI accountability framework in the reality of an engineer’s day-to-day work. The resulting framework was first published in June as what Ariga described as “version 1.0.”

Seeking to Bring a “High-Altitude Posture” Down to Earth  

“We found the AI accountability framework had a very high-altitude posture,” Ariga said. “These are laudable ideals and aspirations, but what do they mean to the day-to-day AI practitioner? There is a gap, while we see AI proliferating across the government.”  

“We landed on a lifecycle approach,” Ariga said, one that steps through stages of design, development, deployment and continuous monitoring. The development effort stands on four “pillars” of Governance, Data, Monitoring and Performance.

Governance reviews what the organization has put in place to oversee the AI efforts. “The chief AI officer might be in place, but what does it mean? Can the person make changes? Is it multidisciplinary?”  At a system level within this pillar, the team will review individual AI models to see if they were “purposely deliberated.”  

For the Data pillar, his team will examine how the training data was evaluated, how representative it is, and whether it is functioning as intended.

For the Performance pillar, the team will consider the “societal impact” the AI system will have in deployment, including whether it risks a violation of the Civil Rights Act. “Auditors have a long-standing track record of evaluating equity. We grounded the evaluation of AI to a proven system,” Ariga said.   

Emphasizing the importance of continuous monitoring, he said, “AI is not a technology you deploy and forget. We are preparing to continually monitor for model drift and the fragility of algorithms, and we are scaling the AI appropriately.” The evaluations will determine whether the AI system continues to meet the need “or whether a sunset is more appropriate,” Ariga said.
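As a rough illustration of what monitoring for model drift can involve, here is a minimal Python sketch that compares the distribution of one numeric input feature at training time against the same feature in production, using the Population Stability Index (PSI). This is not taken from the GAO framework; the function name, the bin count, the smoothing constant and the 0.25 alert threshold (a commonly cited rule of thumb) are all assumptions made for the example.

```python
import math


def population_stability_index(baseline, current, bins=10, eps=1e-6):
    """PSI between two samples of one numeric feature (hypothetical helper)."""
    lo, hi = min(baseline), max(baseline)
    # Equal-width bins over the baseline range; fall back to 1.0 if it is flat.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    expected, actual = shares(baseline), shares(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Assumed alert threshold; a real program would set this per feature.
DRIFT_THRESHOLD = 0.25

training_values = list(range(100))
production_values = [v + 50 for v in training_values]  # simulated shift

psi = population_stability_index(training_values, production_values)
if psi > DRIFT_THRESHOLD:
    print("Input distribution has shifted; review the model.")
```

A check like this only flags that inputs have changed. Deciding whether the system still meets the need, or should be retired, remains a human judgment.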

He is part of the discussion with NIST on an overall government AI accountability framework. “We don’t want an ecosystem of confusion,” Ariga said. “We want a whole-government approach. We feel that this is a useful first step in pushing high-level ideas down to an altitude meaningful to the practitioners of AI.”  

DIU Assesses Whether Proposed Projects Meet Ethical AI Guidelines  

Bryce Goodman, chief strategist for AI and machine learning, the Defense Innovation Unit

At the DIU, Goodman is involved in a similar effort to develop guidelines for developers of AI projects within the government.   

Projects Goodman has been involved with include the implementation of AI for humanitarian assistance and disaster response, predictive maintenance, counter-disinformation, and predictive health. He heads the Responsible AI Working Group. He is a faculty member of Singularity University, has a wide range of consulting clients from inside and outside the government, and holds a PhD in AI and Philosophy from the University of Oxford.

The DOD in February 2020 adopted five areas of Ethical Principles for AI after 15 months of consulting with AI experts in commercial industry, government, academia and the American public. These areas are: Responsible, Equitable, Traceable, Reliable and Governable.

“Those are well-conceived, but it’s not obvious to an engineer how to translate them into a specific project requirement,” Goodman said in a presentation on Responsible AI Guidelines at the AI World Government event. “That’s the gap we are trying to fill.”

Before the DIU even considers a project, the team runs through the ethical principles to see if it passes muster. Not all projects do. “There needs to be an option to say the technology is not there or the problem is not compatible with AI,” he said.

All project stakeholders, including from commercial vendors and within the government, need to be able to test and validate and go beyond minimum legal requirements to meet the principles. “The law is not moving as fast as AI, which is why these principles are important,” he said.  

Also, collaboration is going on across the government to ensure values are being preserved and maintained. “Our intention with these guidelines is not to try to achieve perfection, but to avoid catastrophic consequences,” Goodman said. “It can be difficult to get a group to agree on what the best outcome is, but it’s easier to get the group to agree on what the worst-case outcome is.”  

The DIU guidelines along with case studies and supplemental materials will be published on the DIU website “soon,” Goodman said, to help others leverage the experience.  

Here are Questions DIU Asks Before Development Starts  

The first step in the guidelines is to define the task.  “That’s the single most important question,” he said. “Only if there is an advantage, should you use AI.” 

Next is a benchmark, which needs to be set up front to know if the project has delivered.   

Next, he evaluates ownership of the candidate data. “Data is critical to the AI system and is the place where a lot of problems can exist,” Goodman said. “We need a certain contract on who owns the data. If ambiguous, this can lead to problems.”

Next, Goodman’s team wants a sample of data to evaluate. Then, they need to know how and why the information was collected. “If consent was given for one purpose, we cannot use it for another purpose without re-obtaining consent,” he said.  

Next, the team asks if the responsible stakeholders are identified, such as pilots who could be affected if a component fails.   

Next, the responsible mission-holders must be identified. “We need a single individual for this,” Goodman said. “Often we have a tradeoff between the performance of an algorithm and its explainability. We might have to decide between the two. Those kinds of decisions have an ethical component and an operational component. So we need to have someone who is accountable for those decisions, which is consistent with the chain of command in the DOD.”   

Finally, the DIU team requires a process for rolling back if things go wrong. “We need to be cautious about abandoning the previous system,” he said.   

Once all these questions are answered in a satisfactory way, the team moves on to the development phase.  
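The questions above amount to a gate that a project must clear before development starts. One way an engineering team might encode such a gate is sketched below in Python. The class, field and function names are hypothetical, chosen only to mirror the questions as reported here; the DIU’s published guidelines may structure them differently.

```python
from dataclasses import dataclass, fields


@dataclass
class PreDevelopmentReview:
    """One flag per question, in the order reported (names are hypothetical)."""
    task_defined_with_ai_advantage: bool = False
    benchmark_set_up_front: bool = False
    data_ownership_agreed: bool = False
    data_sample_evaluated: bool = False
    collection_purpose_and_consent_known: bool = False
    affected_stakeholders_identified: bool = False
    single_accountable_mission_holder: bool = False
    rollback_process_defined: bool = False


def open_questions(review):
    """Return the names of the questions that are still unanswered."""
    return [f.name for f in fields(review) if not getattr(review, f.name)]


def ready_for_development(review):
    """A project proceeds only when no question remains open."""
    return not open_questions(review)


review = PreDevelopmentReview(
    task_defined_with_ai_advantage=True,
    benchmark_set_up_front=True,
)
print(ready_for_development(review))
print(open_questions(review))
```

Returning the list of open questions, rather than a bare yes or no, gives the accountable individual something concrete to act on.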

In lessons learned, Goodman said, “Metrics are key. And simply measuring accuracy might not be adequate. We need to be able to measure success.” 
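A short Python sketch shows why accuracy alone can mislead. The data are invented for illustration and the metric choice (recall) is an assumption, not something Goodman specified: when only 5 cases in 100 are positive, a model that never predicts a positive still scores high on accuracy while missing every case that matters.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of truly positive cases that the model caught."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not relevant:
        return 0.0
    return sum(p == positive for p in relevant) / len(relevant)


# Hypothetical, imbalanced labels: 5 positives out of 100 cases.
y_true = [1] * 5 + [0] * 95
# A degenerate model that always predicts the negative class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```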

Also, fit the technology to the task. “High risk applications require low-risk technology. And when potential harm is significant, we need to have high confidence in the technology,” he said.  

Another lesson learned is to set expectations with commercial vendors. “We need vendors to be transparent,” he said. “When someone says they have a proprietary algorithm they cannot tell us about, we are very wary. We view the relationship as a collaboration. It’s the only way we can ensure that the AI is developed responsibly.”

Lastly, “AI is not magic. It will not solve everything. It should only be used when necessary and only when we can prove it will provide an advantage.”  

Learn more at AI World Government, at the Government Accountability Office, at the AI Accountability Framework and at the Defense Innovation Unit site. 

Read more about this on: AI Trends