December 2024

Generative AI Agents
in Action:
Revolutionizing Software
Development Testing
Contents

Foreword
Executive Summary

A Quick Dive into Generative AI Agents
  Generative AI Agents: What & Why
  Five Levels of Competence in Generative AI Agents
  Building Generative AI Agent Systems for Business

Industrial Agentic Use Cases: From Strategy to Operation
  Prerequisites for Agentic Use Case Adoption: Strategic Perspective
  Mapping Generative AI Agents Across the Value Chain
  Transforming Robotic Process Automation (RPA) into Agentic Process Automation (APA)
  The Growing Need for AgentOps

Focusing the Lens: Generative AI Agents in Software Development
  From LLMs to LLM-based Agents in Software Development
  Navigating the Agentic Software Development Cycle: Use Cases and Trust Spectrum

Generative AI Agents in Action for Software Testing
  Enhancing Software Development Testing: Why Generative AI Agents Matter
  Generative AI Agents for Unit Test Writing & Reviewing
  Generative AI Agents for Functional/UI Tests

Retrospective & Prospective: Challenges and Opportunities for Generative AI Agents
  Insights & Reflections
  Challenges & Risks of Generative AI Agents
  The Future Unfolded: Opportunities of Generative AI Agents Today and Beyond

References
Authors
Contributors
About appliedAI Initiative GmbH
Acknowledgement

Foreword
A Few Thoughts at the End of 2024

As we approach the close of 2024, the landscape of Artificial Intelligence (AI) is undergoing a profound
transformation. Since the pivotal ChatGPT milestone two years ago, we have witnessed a rapid adoption of
Large Language Models (LLMs) in a huge variety of applications. In most cases, LLMs have been used for
multi-modal content generation, knowledge retrieval and chatbots, already generating value in various
industries and their value chains.

A common pattern across current generative AI applications is the instruction-oriented interaction between
users and AI, primarily facilitated through a chat format. Whether users formulate specific questions, need a
summary or key insights from a document, or want to note their thoughts and ideas for future reminders, the
typical usage pattern involves providing the AI with a specific prompt or instruction to elicit the desired
response.

While this interaction model has driven numerous relevant and valuable business use cases and directly
embeds the human-in-control principle to mitigate certain risks of generative AI, it does not fully utilize the
most recent capabilities of generative AI. With the advent of more powerful LLMs equipped with deep
reasoning and thinking abilities, use of tools, and the capacity to understand and synthesize multilingual and
multimodal data, generative AI is increasingly capable of solving complex problems by translating them into a
set of autonomous steps or tasks, much like humans would. We refer to these advanced systems as generative
AI agents.

Leading organizations such as Anthropic, Microsoft, NVIDIA, OpenAI, Salesforce, SAP, and others are at the
forefront of developing agents that not only follow commands but also proactively solve complex problems by
aligning with broader objectives. Although we are still in the early phases of agent development, the evolution
towards autonomous multi-agent systems is already underway. In the not-too-distant future, these agents hold
immense promise, as they begin to tackle intricate tasks that were once the exclusive domain of human
intelligence.

In fact, generative AI agents have now become a spotlighted field within AI technology. Why is this so?
Because generative AI agents represent a shift from instruction-oriented chat interactions, where humans
guide problem-solving, to task delegation and autonomous problem-solving with minimal or even no human
oversight in the future. This shift opens up vast potential for businesses, enabling task automation in
software-based virtual environments and even action planning in physical environments.

In this white paper, we explore the rise of generative AI agents, transitioning from traditional
instruction-driven interactions to innovative, goal-oriented automation. We will delve into the evolving
progress of generative AI agents, market observations, and the exciting potential of autonomous systems,
particularly in the field of software development. We invite you to reflect on the technological advancements
shaping our future and the implications of an increasingly automated world.

Bernhard Pflugfelder
Head of Generative AI, appliedAI Initiative GmbH

Dr. Paul Yu-Chun Chang
Senior AI Expert: Foundation Models - Large Language Models, appliedAI Initiative GmbH

Mingyang Ma
Principal AI Strategist & Product Manager, appliedAI Initiative GmbH



Executive Summary
Adaptability, Transformation, and Responsibility —
Getting Prepared for the Agentic Era

Adaptability
Generative AI agents are defined by their ability to interact with environments and execute tasks
autonomously, showcasing cognitive processes like reasoning and goal setting for problem solving.
• Currently, most agents operate at foundational levels of conversation, analytical capability, and autonomy, but
significant advancements toward innovation and collaboration are anticipated.
• These agents already enhance business processes across various value chains, from research to customer service, by
automating complex tasks. Future agent systems could automate entire processes rather than individual use cases.
• The transition from Robotic Process Automation (RPA) to Agentic Process Automation (APA) allows for dynamic,
goal-oriented workflows that already improve the cost and time efficiency of processes. The use of Small Language
Models (SLMs) instead of LLMs within APA is an important way to optimize costs and deploy agents on-premises
or at the edge.
• As generative AI evolves, there is an increasing focus on mitigating the known risks of LLMs. However, since these risks
cannot be completely eliminated, there remains an important need for human oversight in agentic systems.

Transformation
Generative AI agents can significantly transform entire processes rather than single use cases or tasks.
• A highly promising process showcasing this transformative potential is the software development lifecycle, particularly
through roles and tasks in planning, development, testing, review, and deployment. Notable use cases include code
generation, automated testing, and automated reviewing.
• While there is a cautious approach to their use in critical tasks like infrastructure deployment, these agents enhance
workflows by automating test creation and adapting to evolving requirements, thereby improving overall efficiency in
the software lifecycle.
• Technologies such as Retrieval-Augmented Generation (RAG) and AutoGen help reduce manual testing efforts,
allowing developers to focus on complex problem-solving. Beyond software, these agents are impacting fields like
industrial engineering and scientific research by streamlining tasks and fostering collaboration.

Responsibility
Generative AI agents present both remarkable opportunities and significant challenges.
• The creation of scalable, multi-modal agentic systems capable of integrating diverse sensory inputs and harnessing
collective human and artificial intelligence will open new frontiers for generative AI across various sectors.
• While they hold potential for enhancing efficiency, their susceptibility to adversarial attacks raises concerns about
their robustness and trustworthiness, particularly as these systems are designed to predict and perform actions.
• As AI technology advances, it is vital to anticipate potential risks and to adapt evaluation methods for real-world
applications, including methods like agent-as-a-judge solutions with human oversight.
• Addressing ethical and stability considerations and ensuring responsible use are essential to mitigating risks.
By carefully weighing both the opportunities and challenges, we can fully realize the transformative
potential of generative AI agents while safeguarding societal well-being.

“AI agents will drive business
automation and business
decision augmentation. They
will advance to specialized
assistants that will help users in
various business roles by driving
business decisions and taking
action. Ultimately, this will not
only lead to much more efficient
and guided processes, but also
transform the business
processes themselves.”
Dr. Christian Karaschewitz
AI Product Incubation Lead,
SAP Business AI – Product & Partner
Management
SAP SE


“Artificial Intelligence has
fundamentally transformed
how we interact with software,
making natural language
interfaces not just possible but
powerful. The next frontier lies
in AI agents – autonomous
systems that can take intelligent
action on our behalf. As these
agents evolve from concept to
reality, organizations and
individuals alike must actively
explore and experiment with
them to understand their
transformative potential.”
Antoine Leboyer
Managing Director SW/AI,
TUM Venture Labs

A Quick Dive into Generative AI Agents

Generative AI Agents: What & Why

A generative AI agent is an autonomous system that leverages large language models and foundation models to
independently execute complex tasks and workflows in a digital/physical environment. It perceives its surroundings,
reasons, plans, and acts over time to achieve its goals and influence future outcomes [1-4].
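
The perceive, reason, plan, and act cycle in this definition can be illustrated with a minimal sketch. It is a toy under stated assumptions: the `ThermostatAgent` class, its method names, and the temperature environment are hypothetical, and simple rules stand in for the LLM or foundation model that would drive reasoning and planning in a real agent.

```python
from dataclasses import dataclass, field


@dataclass
class ThermostatAgent:
    """Toy agent (hypothetical): rules stand in for an LLM/FM engine."""
    goal_temp: float
    history: list = field(default_factory=list)

    def perceive(self, environment: dict) -> float:
        # Perceiving: read the relevant part of the environment.
        return environment["temperature"]

    def reason(self, observation: float) -> float:
        # Reasoning: compare the observation with the goal.
        return self.goal_temp - observation

    def plan(self, gap: float) -> str:
        # Planning: choose the next action from the remaining gap.
        if gap > 0.5:
            return "heat"
        if gap < -0.5:
            return "cool"
        return "idle"

    def act(self, action: str, environment: dict) -> None:
        # Acting: change the environment and remember what was done.
        delta = {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]
        environment["temperature"] += delta
        self.history.append(action)


def run(agent: ThermostatAgent, environment: dict, max_steps: int = 10) -> dict:
    """Repeat perceive -> reason -> plan -> act until the goal is reached."""
    for _ in range(max_steps):
        observation = agent.perceive(environment)
        action = agent.plan(agent.reason(observation))
        if action == "idle":
            break
        agent.act(action, environment)
    return environment
```

The loop stops either when the plan is to do nothing or after `max_steps`, which keeps the toy agent bounded even if the goal is unreachable.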

7 Reasons to Use GenAI Agents

1. Automation and Efficiency: Automating repetitive tasks to boost productivity and allow focus on strategic work.
2. Adaptability and Flexibility: Handling flexible tasks and dynamically adapting to various scenarios.
3. Personalization and Customization: Tailoring experiences and recommendations based on user preferences to
enhance engagement.
4. In-depth Reasoning: Improving robustness in addressing ambiguous scenarios by utilizing advanced reasoning
and reflective processes.
5. Decision Support: Analyzing data to provide insights, aiding informed decision-making in complex situations.
6. Innovation Enhancement: Inspiring creativity by collecting and generating innovative insights that human
minds may overlook.
7. Simulation of Complex Systems: Optimizing real systems through simulations, such as in the case of cyber
attacks and digital twins.

[Figure: Generative AI agents (perceiving, goal setting, reasoning, planning), with an LLM/FM as engine,
interacting with a digital/physical environment. Labels: "Execution of complex tasks and workflows";
"Feedback, Experience, Interaction".]

Five Levels of Competence in Generative AI Agents

Here we outline five levels of generative AI agent competence—conversational, reasoning, autonomous, innovating,
and organizational—based on their capabilities in thinking (brain), perceiving (perception), and task execution
(action). While not exhaustive, this categorization aims to provoke inquiries about the dynamics of human-agent
interactions [5-7].

Level 1 (Conversational)
• Brain: Episodic Memory: Events; Summary & Abstraction; Semantic Memory: Knowledge
• Perception: Textual Input Encoding; Visual Input Encoding; Auditory Input Encoding; Other Sensor Input Encoding
• Action: Conversation Completion; Question Answering

Level 2 (Reasoning)
• Brain: Human-like Reasoning; Reflection & Critique; Judgement & Evaluation
• Perception: Pattern Recognition; Multi-source Input Integration
• Action: Intent Inference; Tool Selection; Analytical Problem Solving

Level 3 (Autonomous)
• Brain: Goal-setting; Planning Multistep Tasks; Decision-making
• Perception: Active Sensing/Monitoring; Goal-directed Perception; Autonomous Data Mining
• Action: Automated Tool Usage; Embodied Actions; Routing/Navigation

Level 4 (Innovating)
• Brain: Autonomous Learning; Generalization; Goal Recalibration; Idea/Design Generation
• Perception: Perceptual Learning; Perceptual Recalibration; Perceptual Anticipation
• Action: Learning/Making New Tools; Self-improving/refining; Prototyping

Level 5 (Organizational)
• Brain: Personality & Role-Playing; Team Dynamics Insight; Strategic Thinking; Coordination Planning
• Perception: Organizational Monitoring; Collective/Mutual Perception; System Failure Awareness
• Action: Multi-agent Collaboration; Conflict Resolution; Project Management; Mutual Task Delegation


Building Generative AI Agent Systems for Business

Fundamental Agentic System Design Patterns


To build generative AI agent systems for business, let us first look at three main design patterns for such systems [5].

Single Agent
Characteristics
• Versatile capabilities for various application tasks.
• High task-solving performance in diverse contexts.
Typical Scenarios
• Task-oriented: Assisting users in daily tasks (e.g., comprehension & task decomposition).
• Innovation-oriented: Autonomous exploration in scientific fields.
• Lifecycle-oriented: Continuous learning and skill development for long-term survival.

Multi-Agent
Characteristics
• Cooperative or adversarial interactions for advancement.
• Agents work together or compete to improve results.
Typical Scenarios
• Cooperative Interaction: Agents collaborate, either orderly or disorderly, toward common goals.
• Adversarial Interaction: Competitive dynamics for individual performance enhancement.

Human-Agent Interaction
Characteristics
• Human feedback enhances agent task efficiency and safety.
• Agents improve service quality for human users.
Typical Scenarios
• Instructor-Executor Paradigm: Humans give instructions; agents execute tasks.
• Equal Partnership Paradigm: Agents engage empathetically and collaborate with humans.

[Figure: Agentic system design patterns, illustrated with example dialogues.
Single Agent: task-oriented ("Find the Umbrella"), innovation-oriented ("Create a New Medicine"), and
lifecycle-oriented ("Maintain a Lifelong Survival").
Multi-Agent: cooperative interaction, ordered or disordered, among manager, designer, engineer, and tester
agents building a product; adversarial interaction between a designer and an engineer debating interface
simplicity versus performance.
Human-Agent Interaction: instructor-executor (a human instructs and gives feedback, the agent executes and
returns output) and equal partnership (the agent responds empathetically to a stressed user).]


Core Components of An Arbitrary Business Problem

• To develop generative AI agent systems for business scenarios, we pinpoint six key components illustrated
below that may exist within an arbitrary business problem that agents can address.
• In the case of customer feedback analysis, for example, the following components are critical [5-7]:
‒ Understanding & Perceiving: Gather customer feedback from various sources (surveys, reviews, social media)
to understand sentiments and trends.
‒ Reflecting & Analyzing: Analyze the feedback to identify common themes and areas for improvement. Use
natural language processing to extract insights and sentiments.
‒ Iterative Review: Continuously monitor customer reactions to implemented changes, collecting new
feedback to assess the effectiveness of actions taken.

• Understanding & Perceiving: Gather and interpret relevant data to comprehend the business environment and
specific problem context.
• Reflecting & Analyzing: Evaluate information critically, identifying patterns, trends, and insights to inform
future actions.
• Decision-making: Generate optimal decisions and strategies based on collected insights and analyzed data.
• Action Execution: Implement decisions through automated processes, ensuring efficiency and precision in
task completion.
• Iterative Review: Continuously monitor outcomes and gather feedback to refine processes and improve
decision-making over time.
• Coordination: Facilitate collaboration between various agents and stakeholders to ensure alignment and
synergy in achieving business objectives.

Example Generic Multi-Agent Framework

• With the core components of business problems in view, this section illustrates a generic multi-agent framework
based on three core generative AI agent capabilities as outlined earlier: the "brain" (memory, reasoning &
analytic, as well as learning & innovation agents), along with perception- and action-related agents. Here we
aim to broadly outline potential agents based on various previous definitions, though this graphic is by no means
exhaustive [5].
• In a multi-agent system, agents can interact in various patterns, such as in a hierarchical or decentralized
structure. Most existing frameworks typically employ a centralized communication model, where an orchestrator
sets goals, creates plans, delegates tasks to specific agents, and monitors the outcomes [5, 8-10].
• For instance, in customer feedback analysis, an orchestrator may collaborate with an autonomous data mining
agent and a reflection or evaluation agent to effectively accomplish the task.
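
The centralized pattern described above, in which an orchestrator creates a plan, delegates tasks to specific agents, and monitors outcomes, can be sketched for the customer feedback example. This is a minimal illustration under assumptions: the `Orchestrator` class, the two agent functions, the fixed plan, and the keyword list are hypothetical stand-ins for LLM- or SLM-backed components.

```python
from typing import Callable, Dict, List, Tuple

# An agent is modeled as a function from shared state to new state entries.
Agent = Callable[[dict], dict]


def data_mining_agent(state: dict) -> dict:
    """Hypothetical data mining agent: collects and normalizes raw feedback."""
    cleaned = [f.strip().lower() for f in state["raw_feedback"] if f.strip()]
    return {"feedback": cleaned}


def evaluation_agent(state: dict) -> dict:
    """Hypothetical evaluation agent: flags feedback containing negative keywords."""
    negative = {"slow", "broken", "confusing"}
    flagged = [f for f in state["feedback"] if any(w in negative for w in f.split())]
    return {"flagged": flagged}


class Orchestrator:
    """Central coordinator: plans, delegates, and logs outcomes."""

    def __init__(self, agents: Dict[str, Agent]):
        self.agents = agents
        self.log: List[Tuple[str, List[str]]] = []

    def create_plan(self, goal: str) -> List[str]:
        # Assumption: a fixed plan. A real orchestrator would derive it from the goal.
        return ["data_mining", "evaluation"]

    def run(self, goal: str, state: dict) -> dict:
        for name in self.create_plan(goal):
            result = self.agents[name](state)          # delegate the task
            state = {**state, **result}                # merge the agent's output
            self.log.append((name, sorted(result)))    # monitor what was produced
        return state
```

For example, `Orchestrator({"data_mining": data_mining_agent, "evaluation": evaluation_agent}).run("analyze customer feedback", {"raw_feedback": [" App is slow ", "Love it"]})` returns a state whose `flagged` entry contains only the normalized complaint.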
[Figure: Example generic multi-agent framework, using SLMs alongside LLMs]
• Input: Multimodal input (text, audio, image, video, 3D, pointing, gesture, posture, ...)
• Orchestrator: Goal Setting & Calibration; Strategic Planning; Managing; Coordinating; Monitoring
• Memory Agents: Short-term Working Memory; Long-term Declarative Memory - Episodic (Experiences & Events);
Long-term Declarative Memory - Semantic (Knowledge & Ontology)
• Reasoning & Analytic Agents: Human-like Reasoning; Reflection & Critique; Judgement & Evaluation
• Learning & Innovation Agents: Autonomous Learning; Generalization; Idea & Design Generation
• Perceiving Agents: Multimodal Input Encoding; Other Sensor Input Encoding; Multi-source Input Integration;
Active Sensing/Monitoring; Autonomous Data Mining; Perceptual Learning & Calibration; Perceptual Anticipation;
Collective & Mutual Perception; Organizational Monitoring; System Failure Awareness
• Acting Agents: Conversation/QA; Intent Inference; Tool Selection & Use; Problem Solving; Routing/Navigation;
Tool Learning/Making; Web Search; Coding; File Navigation; Computer Interface Control
• Output: Output Generation; Task Completion; Service Provision


Industrial Agentic Use Cases:
From Strategy to Operation
Prerequisites for Agentic Use Case Adoption: Strategic Perspective

Despite the promising capabilities of large language models and the future potential of AI agents, the successful
adoption of AI agents in organizations, as with any AI technology, depends on various critical elements. The
appliedAI AI Strategy House framework provides a clear visual and conceptual structure of these essential
elements. By systematically aligning and measuring these elements, organizations can implement and scale AI
agents more effectively and get the most out of their AI investments. The crucial prerequisites of AI agent
adoption associated with the six enabling elements in the appliedAI Strategy House are detailed below.

[Figure: AI Strategy House]
• Ambition: Future competitive advantage; Fields of action; Commitment
• AI use cases: Discovery & specification; Portfolio management; Make or buy
• Enabling factors: Organization; Expertise; Culture; Data; Technology; Ecosystem
• Execution: Research & exploration; Development & validation; Operationalization & maintenance

Organizational Commitment
• Leadership Support: Executives drive AI initiatives and secure necessary resources.
• Cross-functional Collaboration: Align goals and stimulate innovative ideas.
• Ongoing Training: Provide continuous learning opportunities to upskill employees.

Expert-empowered Workflow
• Empowerment: Design workflows that balance the autonomy of AI agents with employee acceptance,
incorporating elements such as human oversight.
• Future Role Definition: Clearly outline roles and responsibilities for AI and employees in future
AI-supported workflows.

Promoted User Adoption
• Training and Support: Provide comprehensive training and ongoing assistance for users (i.e., customers or
employees).
• Incentives: Cultivate an innovative mindset and encourage user adoption through rewards.
• User-Centric Design: Develop intuitive AI tools that meet user needs and emphasize trustworthy human-agent
interactions.

Data Quality and Access
• High-Quality Data: Ensure data in different languages and modalities is accurate, consistent, and relevant.
• Sufficient Quantity: Gather enough data to train models effectively. Also consider augmentation with
synthetic data generation.
• Seamless Integration: Connect agentic applications with data in existing systems.

AgentOps & Adaptable Tools
• Robust Evaluators: Implement validation frameworks to ensure quality. Automated evaluation methods can be
deterministic, statistical, or AI-based.
• Dynamic Monitoring: Continuously review AI performance in production for optimization.
• Adaptable Tools: Develop clear strategies for process/tool changes.

Ecosystem Support
• Collaborative Partnerships: Consider partnerships to accelerate innovation and the adoption of AI agents.
• Knowledge Exchange: Share insights and best practices to accelerate agent adoption.
• Collaborative Development: As agent technologies are still immature, share risks and ramp up investments
through joint development initiatives.
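
The three kinds of automated evaluation named under AgentOps above (deterministic, statistical, and AI-based) could look roughly like the following sketch. All function names, the JSON `status` convention, the latency threshold, and the `stub_judge` scorer are assumptions made for illustration; in practice the judge would be a call to a separate model.

```python
import json
import statistics
from typing import Callable, Sequence


def deterministic_eval(output: str) -> bool:
    """Deterministic check: the output must be a JSON object with a 'status' key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and "status" in parsed


def statistical_eval(latencies_ms: Sequence[float], threshold_ms: float = 500.0) -> bool:
    """Statistical check: the mean latency over a sample must stay under a threshold."""
    return statistics.mean(latencies_ms) <= threshold_ms


def ai_based_eval(output: str, judge: Callable[[str], float], min_score: float = 0.7) -> bool:
    """AI-based check: a judge returns a score in [0, 1] that must reach min_score."""
    return judge(output) >= min_score


def stub_judge(text: str) -> float:
    """Hypothetical stand-in for a model-based judge: rewards polite replies."""
    return 1.0 if "thank" in text.lower() else 0.0
```

Keeping the three evaluators behind the same boolean interface makes it easy to combine them, for example requiring all of them to pass before an agent's output is released without human review.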

Mapping Generative AI Agents Across the Value Chain

Here we illustrate potential use cases across different phases of a value chain as well as various corporate support
functions, aiming to inspire further ideas and innovation. While the examples are not exhaustive, they serve as a
starting point for exploring the diverse applications of value chain optimization.

Use Cases Across Different Phases of a Value Chain

• Research & Development: Discovery, Exploration & Research; Ideation & Concept Development; Design,
Prototyping, & Assessment; Planning, Proposal, & Specification Management
• Supply Chain Management & Inbound/Outbound Logistics: Supplier Management; Material & Component Sourcing;
Inventory Management; Inbound/Outbound Logistics
• Production & Operations: Process/Equipment Design & Management; Integration, Assembly & Testing;
Production, Manufacturing, & Operation; Quality Control and Management
• Customer Engagement: Marketing & Sales Strategy; Sales Channel Management; Customer Service & Support;
Customer Relationship Management

Corporate Support Functions

• Corporate IT: Cybersecurity; Technical Support; Network & IT Management; Application Development
• Corporate HR, Finance, Procurement, & Other Support Functions: Recruitment & Talent Acquisition; Training &
Development; Accounting & Planning; Tender & Bid Management

Use Cases Across Different Phases of a Value Chain

1. Cross-industry
• Planning & Proposal: Advanced requirement management using automated quality and feasibility assessment
for technical and regulatory compliance.
• Discovery, Exploration & Research: AI-assisted product knowledge retrieval, product design, development and
validation.
• Marketing Strategy: Advanced market intelligence through automated sales strategy simulation based on
market analysis and demand forecasting.
• Customer Service & Support: AI-powered assistants providing real-time support to customers based on
interactions and additional internal and external data.
• Customer Service & Support: Personalized helpdesk handling customer requests, complaints, and incidents.
• Sales Channel Management: AI-based sales agents for automating personalized customer and lead interactions.
• Sales Channel Management: Automated sales contract execution with liability checks.

2. Automotive
• Discovery, Exploration & Research: AI-assisted consumer feedback and trend analysis for product development.
• Supplier Management: AI-based simulation of supplier and real-time supply risks.
• Material & Component Sourcing: AI-assisted resilience matrix creation for critical components.
• Outbound Logistics: Distribution center location optimization based on customer density and part
availability.
• Process/Equipment Design & Management: Proposal of part manufacturability optimization solution before
production.
• Production, Manufacturing, & Operation: Smart maintenance through dynamic assistance in analysis, problem
resolution and documentation of incidents.
• Quality Control and Management: Defect detection, analysis, and prevention plan proposal, to identify/analyze
recurring defects in parts and propose revised quality checks to prevent them.
• Customer Service & Support: Predictive car service issuing personalized notifications based on model history.

3. Pharmaceuticals & Chemicals
• Discovery, Exploration & Research: New chemical compound hypothesis generation (based on existing patent,
experiment and market data).
• Discovery, Exploration & Research: Market situation survey and competitor analysis.
• Discovery, Exploration & Research: Optimized research path proposals based on historical drug discovery
efforts.
• Material & Component Sourcing: Sustainable material identification for evaluation of trade-offs in
performance and cost.
• Outbound Logistics: Temperature-controlled shipping method suggestions to maintain drug integrity during
transport.
• Process/Equipment Design & Management: Analysis of outcome of formulation changes to predict risks of
impacts on product stability and performance and propose mitigation plans.
• Production, Manufacturing, & Operation: Dynamic lab operation optimization based on lab process monitoring.

4. Semiconductor, Electronics, & Tech Products
• Supplier Management: Total cost of ownership evaluation for different suppliers.
• Inbound/Outbound Logistics: Long-term contract optimization.
• Production, Manufacturing, & Operation: Chip production process monitoring and review to generate yield
rate improvement directions.
• Quality Control and Management: Automated visual inspections with the capability to learn from previous
errors and adapt over time.
• Sales Strategy: Targeted upsell strategies based on customer behavior data gathering and analysis.
• Customer Relationship Management: Automated sentiment monitoring on social media to guide marketing
campaigns.

5. Manufacturing, Aerospace, & Machinery
• Inbound Logistics: Predicts price fluctuations for raw materials based on market trends and proposes
sourcing strategies.
• Outbound Logistics: Analysis of latest shipping regulations to streamline logistics operations across borders.
• Process/Equipment Design & Management: Fabric manufacturing process simulation to reduce waste and enhance
quality.
• Integration, Assembly & Testing: Adaptive assembly process modeling that can shift based on real-time demand.
• Integration, Assembly & Testing: Automated reliability testing to assess the reliability of critical
components after production.

6. Software, Telecommunications & E-commerce
• Ideation & Concept Development: Software architecture design suggestions based on current performance
metrics and user demands.
• Ideation & Concept Development: Analysis and proposal generation for processing Request for Proposals (RfP).
• Customer Service & Support: Personalized shopping chatbots providing customized assistance by analyzing user
profiles and browsing/purchase history.
• Customer Service & Support: Service outage monitoring and notification for impacted users.

7. Healthcare & Education
• Design, Prototyping, & Assessment: Predictive health policy impact modeling on patient care outcomes.
• Quality Control and Management: Material degradation simulations and proposal of new quality benchmarks.
• Marketing Strategy: Customized outreach and engagement campaigns based on patient demographics analysis.
• Sales Strategy: Tailored academic program offers based on market needs, aligning with job trends and
personal interests.
• Customer Service & Support: Personalized appointment planner and reminder to improve patient engagement.
• Customer Service & Support: Personalized and interactive patient education in clinical studies.

8. Retail, Food, & Beverage
• Ideation & Concept Development: Agent-assisted new beverage concept development.
• Material & Component Sourcing: Stock level and reorder schedule recommendations based on supplier delivery
time analysis.
• Inventory Management: Inventory optimization analytics based on sales patterns across regions.
• Outbound Logistics: Delivery route optimization to maximize freshness and reduce spoilage.
• Quality Control and Management: Real-time monitoring of food production processes to ensure quality and
consistency through visual cue analysis (e.g., color, texture).
• Customer Service & Support: Agent-powered virtual shopping assistants personalizing the customer journey
through interactive question answering and recommendations.

Use Cases for Corporate Support Functions

IT Service Management
• AI-driven ticket triaging and resolution suggestions
• Predictive analysis for IT incident trends
• Self-healing IT infrastructure using AI

Compliance Checks and Reviews
• Continuous compliance monitoring across multiple regulations
• Automated reporting and documentation for audits

Legal Contract Checks and Reviews
• Intelligent risk management systems
• Automated extraction and analysis of key contract terms
• Legal risk assessment using AI
• Compliance tracking and intelligent notifications

Employee Training
• Personalized learning plan formulation
• Virtual training assistants
• Gamified Learning Platform

Automated Project and Portfolio Management
• AI-driven resource allocation and optimization
• Predictive analytics for project timelines and bottlenecks
• Smart dashboard and reporting with real-time updates

Software Development and Operations
• Automated code generation and documentation
• Intelligent code review, optimization and bug fixes
• Automated functional testing

Recruiting Assistance
• Interview scheduling
• Personalized chatbot interviews
• Candidate matching

Cybersecurity Operation
• AI-driven threat detection and response
• Automated vulnerability assessment and patching
• Behavioral analysis for insider threat detection

Tender Management
• Bid/No-Bid decision support
• Competitive analysis and recommendations
• Risk assessment and mitigation strategies

Financial Analysis and Reporting
• Automated bookkeeping and assessments
• Predictive financial planning
• Personalized report generation

Workplace Productivity
• Smart Scheduling and Calendar Management
• Task Prioritization and Workflow Automation
• Knowledge Management and Information Retrieval

Transforming Robotic Process Automation (RPA) into Agentic Process Automation (APA)

Bringing Agents into RPA


When considering AI agent automation, a logical next step is to integrate AI agents into existing Robotic Process
Automation (RPA) workflows [11]. This strategy provides a straightforward yet effective means of implementing
agent-driven process automation.
Addressing Robustness Issues in RPA
By working within established problem-solving frameworks (i.e., process structures), AI agents can tackle predefined
sub-problems and tasks, which they generally handle more robustly than existing RPA methods. This approach not only
mitigates the current challenges of orchestrating and managing AI agents but also addresses a key limitation of RPA,
namely robustness, which makes it a valuable advancement.
SLMs as Cost-effective Options
In these scenarios, system designers may opt for Small Language Models (SLMs) over Large Language Models
(LLMs) to reduce costs and infrastructure needs. For instance, agentic RPA workflows could run on standard
servers with less expensive GPUs.
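
As a minimal sketch of this idea, the snippet below keeps a rigid RPA-style extraction rule as the default path and hands only the inputs the rule cannot parse to an agent. All names (`ClaimFields`, `process_claim`, `stub_agent`) and the claim text formats are hypothetical and chosen for illustration; `stub_agent` is a simple stand-in for a call to an SLM or LLM, not a real model.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ClaimFields:
    name: Optional[str]
    date: Optional[str]
    handled_by: str  # "rule" or "agent"


# Rigid RPA-style rule: matches exactly one fixed input template.
RULE = re.compile(
    r"Claimant: (?P<name>[A-Z][a-z]+ [A-Z][a-z]+); Date: (?P<date>\d{4}-\d{2}-\d{2})"
)


def extract_with_rule(text: str) -> Optional[ClaimFields]:
    """Return the extracted fields, or None if the template does not match."""
    match = RULE.search(text)
    if match is None:
        return None
    return ClaimFields(match.group("name"), match.group("date"), "rule")


def process_claim(text: str, agent: Callable[[str], dict]) -> ClaimFields:
    """Try the rule first; delegate to the agent only when the rule fails."""
    result = extract_with_rule(text)
    if result is not None:
        return result
    fields = agent(text)
    return ClaimFields(fields.get("name"), fields.get("date"), "agent")


def stub_agent(text: str) -> dict:
    """Hypothetical stand-in for an SLM/LLM call: a lenient heuristic parse."""
    date = re.search(r"\d{4}-\d{2}-\d{2}", text)
    name = re.search(r"[A-Z][a-z]+ [A-Z][a-z]+", text)
    return {
        "name": name.group(0) if name else None,
        "date": date.group(0) if date else None,
    }


structured = process_claim("Claimant: John Smith; Date: 2024-11-05", stub_agent)
free_text = process_claim(
    "Hi, this is Jane Doe. The accident happened on 2024-12-01.", stub_agent
)
```

Keeping the rule as the first path means the model is invoked only for the inputs that break the template, which is one way the cost argument for smaller models could play out in practice.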
RPA vs. APA comparison (columns: Pipeline Design; Pipeline Building; Pipeline Execution; Pipeline Monitoring, Evaluation, & Update; Pros; Cons; When To Adopt)

RPA
Using manually crafted rules to orchestrate several software tools in a fixed workflow for execution
• Pipeline Design: Static and simple step-by-step workflows
• Pipeline Building: Manually constructing via pull-and-drag
• Pipeline Execution: Rule-based data-flow and control-flow
• Pipeline Monitoring, Evaluation, & Update: Fixed to defined scenarios and unable to update instructions
• Pros: Can handle rigid tasks
• Cons: Cannot handle flexible tasks
• When To Adopt: Well-defined sub-problems and tasks

Use Case Example: Insurance Claim Processing (RPA)

Pipeline Design
• Focus: Create workflows using defined rules to handle repetitive tasks such as extraction of names and dates.
• Structure: Simple flowcharts illustrating the step-by-step process for claim data entry, verification, and updates.
Pipeline Building
• Resources Required: IT expertise to configure bots and integrate them with existing claim processing systems.
• Development Work: Through manual robotic procedure setup and review process.
Pipeline Execution
• Mechanism: Bots execute predefined tasks like claim data entry, validations, and standard communications.
• Speed: High volume processing of claims at a consistent rate once programmed.
• Consistency: Executes tasks exactly as programmed, with minimal deviation.
Pipeline Monitoring, Evaluation, & Update
• Monitoring: Rule-based monitoring through claim processing logs and alerts to identify failures or bottlenecks.
• Evaluation Frequency: Periodic reviews for performance and efficiency.
• Update Process: Manual updates required for any changes in workflows or system integrations.
Pros
• Efficiency: Greatly reduces claim processing time for routine tasks.
• Cost-Effective: Significant savings on claim processing labor for repetitive tasks.
• Scalability: Easy to scale operations up or down as needed.
Cons
• Limited Flexibility: Struggles with unstructured data and unexpected scenarios.
• Maintenance Requirement: Requires periodic manual updates to adapt to new processes.
• Lack of Insight: Doesn’t analyze data for patterns or insights beyond predefined tasks.
When To Adopt
• Indicative Scenarios: High volume, repetitive claim processing tasks with clear rules; ideal for back-office functions.
• Best Fit: Claim processes with low variance and where performance can be measured without needing complex decision-making.

Use Case Example: Insurance Claim Processing (APA)

Pipeline Design
• Focus: Design workflows that incorporate intelligent decision-making and adapt to changing scenarios and formats of claim contents.
• Structure: Adaptive flowcharts that can change based on real-time data analytics and learning from previous claims processing.
Pipeline Building
• Resources Required: Cross-functional teams including data scientists, machine learning experts, and process analysts to develop and refine algorithms for claim processing.
• Development Work: Through generalized and yet flexible agentic modules and goal-driven adaptable autonomy.
Pipeline Execution
• Mechanism: Intelligent bots assess incoming claims, make decisions based on past claim history, and take actions that can change dynamically.
• Speed: Faster decision-making, especially for complex claims, as bots learn and adapt.
• Consistency: Maintains accuracy over time by learning from feedback instead of strictly adhering to initial programming.
Pipeline Monitoring, Evaluation, & Update
• Monitoring: Human oversight together with automated monitoring through analytics tools that evaluate bot performance and decisions.
• Evaluation Frequency: Potential real-time evaluations for instant adjustments where necessary.
• Update Process: Self-updating capabilities based on learned data and analytics to continuously improve the workflow.
Pros
• Adaptability: Can handle complex, variable tasks required in the claims by evolving based on historical data.
• Enhanced Reasoning and Decision-Making: Improved accuracy and responsiveness to subtle claim scenarios.
• Customer Experience: Offers personalized services and faster claim resolution.
Cons
• Complex Implementation: Requires substantial upfront investment in technology and talent.
• Data Dependency: Performance reliant on the quality and quantity of available reference data.
• Risk of Errors: Potential risk with AI agent biases that lead to incorrect decisions.
When To Adopt
• Indicative Scenarios: Claim processes requiring adaptability and deep analysis; suited for complex claims with varied outcomes.
• Best Fit: Claim processes that are infrequent, unpredictable, and necessitate sophisticated reasoning and decision-making.

APA
Incorporating AI agents to adaptively construct and execute workflows to achieve process automation
• Pipeline Design: Dynamic and scenario-adaptive workflows
• Pipeline Building: Automatically constructing, orchestrating, and testing
• Pipeline Execution: Agent-based data-flow and control-flow
• Pipeline Monitoring, Evaluation, & Update: Adaptable to various scenarios and able to update instructions
• Pros: Can handle rigid and flexible tasks
• Cons: Monitoring and verification may be tricky
• When To Adopt: Ill-defined sub-problems and tasks

The Growing Need for AgentOps

A Call for Long-term Agent Management

AgentOps is an emerging concept focused on the operational management of generative AI agents, which
are often characterized by their ability to autonomously perform tasks, interact with users, and generate
content based on user inputs or external data sources [12].

Towards Trustworthy Agents

As generative AI agents become increasingly complex and capable, establishing clear operational
practices becomes crucial for ensuring their reliability, effectiveness, and ethical deployment [12-13].

The AgentOps lifecycle in the figure consists of seven steps, grouped into a Development Cycle and an Operational Cycle:

1. Problem Definition, Agent Design, Tool/Model/Data Engineering
Define use cases, design agent prompts and co-working architecture, and prepare relevant tools, models
(LLMs/SLMs) and datasets.

2. Experiment
Conduct experiments to train or fine-tune the underlying models of the agent, testing various tool or agent
configurations to determine the most effective setup.

3. Evaluate
Assess the agent’s performance using predefined metrics and benchmarks, possibly leveraging external
agents to evaluate its output (Agent-as-a-Judge) for consistency and relevance.

4. Integration & Deploy
Integrate the generative AI agent into the necessary systems and workflows. Deploy the agent in a
production environment where it can interact with users and perform its intended tasks.

5. Inference & Execution
Enable the agent to process inputs and generate outputs in real time. Activate its core functionalities,
allowing it to perform the tasks for which it was designed.

6. Monitoring & Logging
Continuously monitor the agent’s performance and usage, recording interactions and outputs. Implement
step-by-step trace-back mechanisms to analyze any issues or performance deviations effectively.

7. Feedback & Iterative Improvement
Collect user feedback and performance data to identify areas for enhancement. Use this information to
iteratively improve the agent's capabilities, retraining or adjusting as needed to optimize its performance
over time.

Partially inspired and adapted from: https://learn.microsoft.com/en-us/ai/playbook/solutions/generative-ai/llmops-promptflow
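
To make the Monitoring & Logging step more concrete, the following is a minimal sketch of a step-by-step trace for a single agent run, assuming each agent action can be wrapped as a plain Python callable. The class and method names (`AgentTrace`, `TraceStep`, `record`, `first_failure`) are hypothetical and do not refer to any particular AgentOps product.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class TraceStep:
    step: int
    action: str
    inputs: List[Any]
    output: Any
    ok: bool
    duration_s: float


@dataclass
class AgentTrace:
    run_id: str
    steps: List[TraceStep] = field(default_factory=list)

    def record(self, action: str, fn: Callable[..., Any], *args: Any) -> Any:
        """Run one agent action and log its inputs, output, status and duration."""
        start = time.perf_counter()
        try:
            output, ok = fn(*args), True
        except Exception as exc:  # keep the error in the trace instead of crashing
            output, ok = repr(exc), False
        duration = time.perf_counter() - start
        self.steps.append(
            TraceStep(len(self.steps) + 1, action, list(args), output, ok, duration)
        )
        return output

    def first_failure(self) -> Optional[TraceStep]:
        """Trace back to the earliest step that failed, if any."""
        return next((s for s in self.steps if not s.ok), None)

    def to_json(self) -> str:
        """Serialize the whole run for storage or later analysis."""
        return json.dumps(asdict(self), default=str)


trace = AgentTrace(run_id="demo")
total = trace.record("add", lambda a, b: a + b, 1, 2)
trace.record("divide", lambda a, b: a / b, 1, 0)  # fails and is logged
failed = trace.first_failure()
```

Recording failures as data rather than raising them lets a later review locate the first deviating step, which is the kind of trace-back the step above describes.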

Focusing the Lens:
Generative AI Agents in Software Development
From LLMs to LLM-based Agents in Software Development

With the emergence of LLMs and generative AI, their applications are being extensively investigated across
different industries. A significant area of focus is software development, where LLMs have demonstrated
impressive capabilities in tasks like code generation and test design. Despite these achievements, they also face
several limitations, particularly regarding autonomy. LLM-based agents utilize LLMs as the foundation for planning,
designing, decision-making, and executing actions during software development, thereby overcoming some of the
prior constraints. In this section, we highlight the main distinctions between these two approaches [14].

[Figure: Instruction-oriented software development using LLMs compared with goal-oriented software development using LLM-based agents. In the LLM setting, each stage is driven by a separate human instruction. In the agent setting, LLM-based agents carry out each stage under an orchestrator agent with human instruction and oversight. Stages shown: User Story Writing, Requirement Engineering; Software Design and Evaluation; Code Generation and Software Development; Software Test Generation; Software Security & Maintenance; Autonomous Learning and Decision Making.]

Navigating the Agentic Software Development Cycle: Use Cases and Trust Spectrum

Agile Software Development: agentic use cases per stage

1. Plan
1: Requirement Gathering & Analysis: Analyze stakeholder inputs and extract key requirements.
2: User Story Generation: Create detailed user stories based on high-level requirements.
3: Project Planning: Assist in timeline estimation and resource allocation.
4: Risk Assessment: Identify potential risks and suggest mitigation strategies.
5: Backlog Prioritization: Help prioritize backlog items based on various criteria like business value and effort.

2. Design
1: Architectural Suggestions: Provide recommendations for system architecture based on requirements.
2: UI/UX Design Assistance: Generate wireframes or design mockups from textual descriptions.
3: Design Document Drafting: Create comprehensive design documents outlining components, interfaces, and data flow.
4: Component Specification: Define specifications for individual system components.
5: Best Practices Guidance: Offer insights into industry best practices and standards relevant to the project.

3. Develop
1: Code Generation: Write code snippets or modules based on specifications or descriptions.
2: Documentation: Automatically generate or update code documentation and comments.
3: Code Refactoring: Suggest improvements for existing code to enhance readability and performance.
4: API Integration Assistance: Provide guidance or generate code for integrating third-party APIs.
5: Language Support: Assist developers with syntax, libraries, and frameworks in various programming languages.

4. Test
1: Test Case Generation: Create comprehensive test cases based on user stories and requirements.
2: Automated Test Scripts: Develop scripts for automated testing frameworks.
3: Bug Detection Assistance: Analyze code to identify potential bugs or vulnerabilities.
4: Test Documentation: Generate test plans, reports, and documentation.
5: Performance Testing Insights: Provide recommendations for performance testing scenarios and metrics.

5. Deploy
1: Deployment Script Generation: Create scripts for deploying applications to various environments (e.g., staging, production).
2: Configuration Management: Suggest optimal configurations based on application requirements.
3: Continuous Integration/Continuous Deployment (CI/CD) Pipelines: Assist in setting up and optimizing CI/CD workflows.
4: Environment Setup Assistance: Provide guidance on setting up development, testing, and production environments.
5: Rollback Strategy Documentation: Generate plans for rolling back deployments in case of failures.

6. Review
1: Code Review Assistance: Automatically review code for adherence to standards, best practices, and potential issues.
2: Feedback Analysis: Analyze feedback from team members to identify common themes or areas for improvement.
3: Performance Metrics Reporting: Compile and interpret performance data to assess project progress.
4: Documentation Review: Ensure that all project documentation is up-to-date and comprehensive.
5: Retrospective Facilitation: Generate questions and topics for retrospective meetings to encourage constructive discussions.

7. Launch
1: Marketing Content Creation: Generate marketing materials, blog posts, and release notes for the product launch.
2: User Onboarding Assistance: Create guides, tutorials, and FAQs to help users understand and adopt the new software.
3: Feedback Collection Tools: Design surveys or feedback forms to gather user input post-launch.
4: Launch Plan Documentation: Draft comprehensive launch plans outlining steps, responsibilities, and timelines.
5: Monitoring Setup Guidance: Provide recommendations for setting up monitoring and analytics tools to track the software’s performance post-launch.

Figure legend: Trust and Readiness for Adoption (Joint Assessment Among Selected appliedAI Partner Companies), on a scale from Weaker to Stronger.

Generative AI Agents in Action for Software Testing

Enhancing Software Development Testing: Why Generative AI Agents Matter

Background
Continuous software testing is a critical element of the software development lifecycle, especially
within agile methodologies, where testing occurs at every stage to ensure system robustness as new
code is committed to repositories like GitHub, often facilitated by tools like Jenkins for CI/CD processes.

Opportunity
There is a significant opportunity to leverage AI to automate the generation and optimization of
test cases, thereby reducing the manual workload on developers, enhancing testing efficiency, and
improving the overall quality of software products.

Challenge
Despite advancements in automation, the creation
and refinement of test cases—such as unit tests and
functional/UI tests —still require substantial human
effort, leading to inefficiencies and potential gaps in
testing coverage.

Levels of Software Testing Automation

Exploratory Tests
• Validating functionalities in
unanticipated scenarios

Functional/UI Tests
• Validating functionalities in
business scenarios

Unit Tests
• Validating individual components


Generative AI Agents for Unit Test Writing & Reviewing

Problem Statement
Writing and reviewing unit tests can be time-consuming and error-prone, often requiring deep
domain knowledge and meticulous attention to edge cases.

Current Solution
Generative AI agents can automate the generation of unit tests by analyzing code logic, identifying
critical paths, and suggesting tests for edge cases and coverage gaps. In this white paper we showcase
an innovative autonomous Pull Request Tester and Reviewer.

Methods
• Our tool integrates Retrieval-Augmented Generation (RAG) and AutoGen technologies
to automatically review GitHub pull requests. It generates summaries for the code, analyzes code
differences, and provides concise summaries of those differences. Using these summaries, the tool
conducts in-depth reviews of the pull requests, assessing code quality and functionality while also
creating effective unit tests.
• By leveraging generative AI agents to interpret requirements and code structure, the system
can generate test cases, validate them against expected behaviors, and provide feedback or
improvements to existing test suites.
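
The summarize-then-review flow described in the methods can be sketched roughly as follows. This is an illustrative outline only, not the implementation of the tool: the function names (`unified_diff`, `build_review_prompt`, `stub_summarize`) and the prompt wording are assumptions, and `stub_summarize` simply counts lines where a real system would call a language model. Only the diff computation uses an actual library, Python's standard `difflib`.

```python
import difflib
from typing import Callable, Dict, Tuple


def unified_diff(old: str, new: str, path: str) -> str:
    """Compute a unified diff between two versions of one file."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        lineterm="",
    )
    return "\n".join(lines)


def stub_summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM summary: counts lines and +/- markers."""
    lines = text.splitlines()
    added = sum(1 for l in lines if l.startswith("+") and not l.startswith("+++"))
    removed = sum(1 for l in lines if l.startswith("-") and not l.startswith("---"))
    return f"{len(lines)} lines, +{added}/-{removed}"


def build_review_prompt(
    project_context: str,
    changes: Dict[str, Tuple[str, str]],
    summarize: Callable[[str], str],
) -> str:
    """Assemble one reviewer prompt from per-file summaries and diff summaries."""
    sections = []
    for path, (old, new) in sorted(changes.items()):
        diff = unified_diff(old, new, path)
        sections.append(
            f"File: {path}\n"
            f"Original file summary: {summarize(old)}\n"
            f"Diff summary: {summarize(diff)}"
        )
    return (
        f"Project context:\n{project_context}\n\n"
        "Changes:\n" + "\n\n".join(sections) + "\n\n"
        "Review the changes and propose unit tests."
    )


old_code = "def add(a, b):\n    return a - b\n"
new_code = "def add(a, b):\n    return a + b\n"
prompt = build_review_prompt(
    "A small calculator library.", {"calc.py": (old_code, new_code)}, stub_summarize
)
```

Summarizing each diff before assembling the prompt keeps the reviewer's input short even for large pull requests, at the cost of relying on the quality of the intermediate summaries.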

[Figure: Architecture of the Pull Request Tester and Reviewer. Recoverable elements: a GitHub Repository and a Pull Request as inputs; an In-house Agent performing Recursive Summarization (index every file and directory, Code → English); for each changed file, an Original File Summary and a Code Diff that yield a Diff Summary; File Summaries and Code Snippets stored in a VectorDB and accessed via tools; a PR Reviewer whose prompt combines context about the project with a summary of all changes; and an AutoGen group in which a Developer Proxy (with docker) and a QA Tester exchange Code Snippets, Unit Tests and Execution Results before Commit & Push.]

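How AutoGen Generally Works

[Figure: Example conversation between a User Proxy Agent (uses shell with human-in-the-loop) and an Assistant Agent (LLM configured to write python code).]

1. User Proxy Agent: Plot a chart of META and TESLA stock price change YTD.
2. Assistant Agent: Execute the following code...
3. User Proxy Agent: Error package yfinance is not installed
4. Assistant Agent: Sorry! Please first pip install yfinance and then execute the code
5. User Proxy Agent: Installing... Output: a chart of price ($) by month.
6. User Proxy Agent (human feedback): No, please plot % change!
7. Assistant Agent: Got it! Here is the revised code... Output: a chart of % change by month.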

AutoGen diagram partially adapted from: https://microsoft.github.io/autogen/0.2/docs/Getting-Started/
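
The propose, execute, and revise loop shown in the AutoGen diagram can be illustrated with a small self-contained sketch. It deliberately does not use the AutoGen library: `execute`, `run_conversation` and `scripted_assistant` are hypothetical names, the assistant is a scripted stand-in for an LLM, and code is run in-process with `exec`, which is acceptable for a toy example but not a substitute for the sandboxed execution a real setup would need.

```python
import contextlib
import io
from typing import Callable, List, Tuple


def execute(code: str) -> Tuple[bool, str]:
    """Proxy role: run the proposed code and capture its output or its error."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {"__name__": "__sketch__"})  # toy executor, not a sandbox
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, buffer.getvalue()


def run_conversation(
    task: str, assistant: Callable[[str], str], max_turns: int = 3
) -> Tuple[str, List[str]]:
    """Alternate between the assistant proposing code and the proxy executing it."""
    transcript: List[str] = []
    feedback = task
    for _ in range(max_turns):
        code = assistant(feedback)
        ok, output = execute(code)
        transcript.append(output)
        if ok:
            return output, transcript
        # Feed the error back so the assistant can revise its code.
        feedback = f"Execution failed with: {output}"
    raise RuntimeError("no working code within max_turns")


def scripted_assistant(feedback: str) -> str:
    """Hypothetical stand-in for an LLM: the first attempt is buggy on purpose."""
    if "NameError" in feedback:
        return "values = [1, 2, 4]\nprint(values[-1] / values[0] - 1)"
    return "print(values[-1] / values[0] - 1)"


result, transcript = run_conversation(
    "Print the relative change of a series.", scripted_assistant
)
```

In this sketch the first proposal fails because `values` is undefined, the error text is returned to the assistant, and the second proposal succeeds, mirroring the missing-package exchange in the diagram.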


Generative AI Agents for Functional/UI Tests

Background & Methods

Problem Statement
Developing functional and UI tests is labor-intensive and requires detailed knowledge of user flows,
interface interactions, and system functionality, making it a high-cost process for the company in
terms of business value.

Current Solution
Generative AI agents can automate functional and UI test creation, execution and validation by
analyzing workflows, user stories, and interface designs, reducing manual effort while enhancing
test accuracy and consistency. Here we showcase a multi-agent autonomous UI function tester.

Methods
• We use an orchestrating agent with an AI-generated persona to automate end-to-end functional/UI
tests, in combination with an acting agent based on the Claude 3.5 model and a judgement and
evaluation agent based on a visual large language model.
• Using this multi-agent system, AI agents can simulate user interactions, generate plans and
instructions for automated functional/UI tests, and adapt these tests dynamically based on
changes in the application's UI.

Multi-agent Functional/UI Testing Workflow

[Figure: Workflow from Input through Functional/UI Test Agents to Output. Recoverable elements are listed below.]

Input
• Developer: Test Instruction & Oversight
• Evaluation: Criteria to make a "pass" or "fail" judgement

Functional/UI Test Agents
• Orchestrator: Task Delegation
• Reasoning & Analytic Agents
  • Persona Agent (GPT-4o): Role-playing in planning and coordinating testing steps
  • Judgement & Evaluation Agent (GPT-4o): Utilize visual input; judge whether system passed or failed the test
• Acting Agents
  • Computer Use Agent (Claude 3.5 Sonnet): Computer Interface Control; File Navigation; Web Search & Browsing
• Messages exchanged
  • Instructions: Available Tools; Specific Next Steps
  • Progress Report: Screenshots; Textual Descriptions

Output
• Test Completion
• Output Generation: Providing Final Screenshots and Textual Descriptions (from the Computer Screen)


Demonstration Test Cases

For demonstration purposes during the prototypical phase, we have chosen an internally developed software,
'GenAI.xy Playground,' as the target for software testing. This selection allows us to assess, based on its
current capabilities (such as reasoning, planning, and UI execution), the level of complexity (functional level)
that our designed multi-agent system can handle.
The table below presents the GUI-based functions that have been tested with the prototype as well as the
testing results. Refer to "Multi-agent Functional/UI Testing Workflow" on the previous page for a diagram of the
prototyped system and the appliedAI Initiative Youtube channel for live demo recordings.

Test case: Search bar
Goal description: Use the search bar to search for one keyword, and then open a use case from the result.
Testing result: Acting agent based on Claude 3.5 Computer Use correctly understood the visual design of a search bar.
Judgement Agent result: PASS

Test case: Favourite (bookmark) feature; Filtering (checkbox with dropdown menu)
Goal description: In the Use Case Library, use the filtering feature to select 1 industry and find one use case to add to favorites.
Testing result: Acting agent based on Claude 3.5 Computer Use was able to correctly find the filter on the UI and also understand the visual design that a heart (❤) means adding to favourite.
Judgement Agent result: PASS

Test case: Using image generation model with a prompt
Goal description: Open Image Generation in the Playground, and then generate a cute image and add it to favorite.
Testing result: Agent correctly navigated to the image generation playground and found the correct model.
Judgement Agent result: PASS

Test case: Search and apply 1 specific prompt template
Goal description: Find one user-specified prompt template in the code generation playground which can assist in writing RESTful API and test that template.
Testing result: Acting agent found the correct required template, but after reviewing the template and getting back to the main menu, it forgot the context and applied the wrong template.
Judgement Agent result: FAIL

Example Agentic Software Test Workflow for the Test Case 'Search Bar'

Step 1 Define the Test Case 'Search bar'


• The user and developer define the functional test case goal — in this example, testing the search bar
functionality — by appropriately describing the test case.


Step 2 Initialize the Persona Agent


• The Orchestrator (Persona Agent) generates three persona templates based on the given test case
definition and awaits the user's selection.

Step 3 Orchestrator Collaborates with the Acting Agent


• The Orchestrator selects the appropriate persona and begins planning and delegating tasks to the Acting Agent.
• The Acting Agent will request assistance or instructions from the Orchestrator when blocked by a subtask
and will also report major milestones to the Orchestrator, providing both text descriptions and UI screenshots.


Step 4 Orchestrator Calls the Judgement Agent for Evaluation


• Once the Orchestrator "believes" that the test has been completed by the Acting Agent, it will call the
Judgement Agent for a final evaluation.
• The Judgement Agent will analyze and evaluate, based on the final screenshot and last report, whether the
original test goal given by the user has been completed (PASS) or not (FAIL).
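
The delegation and judgement loop of Steps 3 and 4 can be outlined in a few lines. This is a simplified sketch under stated assumptions rather than the prototype's code: the `Report` type, the `run_ui_test` function and the three stub agents are hypothetical, and the stubs return canned values where the prototype would call GPT-4o and Claude 3.5 Sonnet. Persona selection from Step 2 is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Report:
    description: str   # textual progress description from the acting agent
    screenshot: bytes  # placeholder for UI screenshot data


def run_ui_test(
    goal: str,
    criteria: str,
    orchestrator: Callable[[str, List[Report]], Optional[str]],
    acting_agent: Callable[[str], Report],
    judge: Callable[[str, str, Report], bool],
    max_steps: int = 10,
) -> str:
    """Delegate steps until the orchestrator considers the test complete, then judge."""
    reports: List[Report] = []
    for _ in range(max_steps):
        instruction = orchestrator(goal, reports)
        if instruction is None:  # orchestrator "believes" the test is complete
            break
        reports.append(acting_agent(instruction))
    if not reports:
        return "FAIL"
    # The judgement is based on the final report and screenshot only.
    return "PASS" if judge(goal, criteria, reports[-1]) else "FAIL"


def stub_orchestrator(goal: str, reports: List[Report]) -> Optional[str]:
    """Hypothetical planner: walks through a fixed list of next steps."""
    steps = [
        "click the search bar",
        "type a keyword",
        "open a use case from the results",
    ]
    return steps[len(reports)] if len(reports) < len(steps) else None


def stub_acting_agent(instruction: str) -> Report:
    """Hypothetical acting agent: reports success without touching a real UI."""
    return Report(description=f"done: {instruction}", screenshot=b"")


def stub_judge(goal: str, criteria: str, report: Report) -> bool:
    """Hypothetical judge: checks the criterion against the last description."""
    return criteria in report.description


verdict = run_ui_test(
    "Search bar", "open a use case", stub_orchestrator, stub_acting_agent, stub_judge
)
```

Because the judge sees only the last report, an acting agent that loses context late in a run can still produce a FAIL even when its earlier steps were correct, which matches the failed prompt-template case in the table of demonstration results.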

Retrospective & Prospective:
Challenges and Opportunities for Generative AI Agents
Insights & Reflections

Agent Capabilities

Defining Characteristics: Generative AI agents exhibit key traits including environmental
interaction, task execution, and advanced cognitive capabilities (spanning perception,
reasoning, goal setting, and planning), allowing for adaptive responses to complex scenarios.

Current Status: Presently, agentic functionalities are primarily concentrated at level 2 (reasoning),
with emerging advancements at level 3 (autonomy), underscoring a landscape ripe for the evolution
of more sophisticated cognitive processes in future iterations.

Prospects: In the coming years, we anticipate breakthroughs in agentic innovation and organizational
capacity, leading to the deployment of multi-agent systems that facilitate enhanced communication
and orchestration among individual agents, while emphasizing the ongoing necessity for human oversight.

Industrial Agentic Use Cases

Transformative Automation Across the Value Chain: Generative AI agents offer substantial
opportunities to enhance business decision-making processes by automating and simplifying
complex tasks, enabling deep semantic understanding and driving efficiencies across diverse
processes, from research & development to customer engagement.

Holistic Value Creation: By facilitating processes such as creativity, discovery and research,
optimizing logistics in supply chain management, and streamlining operations and quality control
in production, generative AI agents contribute to holistic value creation and operational excellence.

From Exploration and Engagement to Execution and Effective Operation: Currently, the trajectory of
generative AI applications tends to prioritize the initial exploratory and final customer engagement
stages. However, there is also growing effort in the domains of supply chain management and
production operations, aimed at developing robust automation tools for critical processes in the future.

From RPA to APA

Boosting Dynamicity and Adaptability: Transitioning from Robotic Process Automation (RPA) to
Agentic Process Automation (APA) enables a shift from rigid, rule-based workflows to dynamic,
goal-oriented frameworks that adapt to varying task complexities, enhancing overall process
efficiency and effectiveness.

Augmenting Existing Workflows: By incorporating AI agents into existing RPA workflows,
organizations can leverage agentic problem-solving capabilities to tackle both predefined and
ill-defined sub-problems, thereby addressing RPA's limitations in adaptability and robustness.

Cost-Effective Deployment: Utilizing Small Language Models (SLMs) over Large Language Models
(LLMs) in agentic workflows gives organizations self-hosting options in cloud, on-premise, and
edge environments, allowing them to optimize data privacy, IT integration and infrastructure costs
while still harnessing the flexibility and intelligence of AI agents.

Software Development

From LLMs to LLM-based agents: The transition from LLMs to LLM-based agents is reshaping
software development, with current applications focusing on code generation, unit/functional
testing, and requirements engineering, while cautious adoption persists for critical tasks like
infrastructure deployment.

Prioritizing Safe AI Integration: High-value, low-cost use cases in agile development, such as
documentation and UI/UX design support, are being prioritized for AI agent integration, whereas
critical activities like deployment script generation in complex environments are viewed as risky
and less mature.

Impact and Trust Across Roles: AI agents are expected to influence various software engineering
roles. Areas such as frontend and web development and software testing are currently the most
trusted for AI automation and may see the most impact. In contrast, task planning and deployment
are less trusted. Nonetheless, the complexity of enterprise software development may pose challenges.

Software Testing

Revolutionizing Continuous Testing: Generative AI agents streamline the software development
lifecycle by automating the creation and refinement of unit, functional, and UI tests, enhancing
efficiency and ensuring robust testing coverage integral to agile methodologies.

Mitigating Human Effort in Test Generation: By leveraging advanced techniques like
Retrieval-Augmented Generation (RAG) and AutoGen, AI agents automatically review pull requests
and produce tailored test cases, significantly relieving developers from the time-consuming and
error-prone task of manual test writing.

Enhancing Test Accuracy and Adaptability: With the ability to analyze user workflows and interface
interactions, generative AI agents facilitate the dynamic creation of functional and UI tests,
ensuring comprehensive coverage while adapting to evolving application requirements and
maintaining consistency across testing efforts.

Next-Gen Potentials

Transformative Problem-Solving: Advanced generative AI agents are revolutionizing problem-solving
in diverse fields like software development and industrial engineering by automating complex tasks,
enhancing collaboration, and producing high-quality solutions through dynamic interaction and
learning from human feedback.

World Simulation Applications: Generative AI agents may assist in simulating human behavior across
gaming, societal interactions, and economic modeling, enabling realistic role-playing, engaging
dialogue, and strategic decision-making that closely mimics human responses and social dynamics.

Autonomous Scientific Innovation: Generative AI agents will drive significant advancements in
scientific research by autonomously conducting experiments, optimizing processes, and facilitating
collaborative debates, thereby enhancing the efficiency and accuracy of scientific inquiry across
various disciplines.

Challenges & Risks of Generative AI Agents

Adversarial Robustness

LLMs Under Attacks
• Large language models are susceptible to adversarial attacks, leading to erroneous responses. Relevant attack methods include dataset poisoning and prompt-specific attacks.
In Pursuit of Robustness Techniques
• Approaches such as adversarial training, data augmentation, and sample detection can enhance the robustness of LLM-driven agents; however, a complete solution continues to be elusive.
Human Oversight Required
• Introducing a human-in-the-loop framework can help oversee and improve the conduct of LLM-dependent agents, which may reduce the threats posed by adversarial attacks.

Trustworthiness

Calibration Challenges
• Language models face challenges with the so-called calibration problem, which causes them to inadequately convey the certainty of their predictions, leading to outputs that do not reflect human expectations in practical use cases.
Demand for Reliability
• There is an urgent demand for intelligent agents that are both reliable and honest. Recent studies have focused on directing models to offer reasoning and explanations to improve their credibility.
Debiasing and Fairness
• Implementing debiasing strategies and calibration methods during the training process can address fairness concerns and improve the reasoning capabilities of language models.

Misuse, Bias, & Fairness

Exploitation of LLM Agents
• Individuals with malicious intentions can exploit LLM-based agents to sway public perception, disseminate misinformation, and conduct unlawful activities.
Dangers to Security and Society
• The potential for abuse of generative AI agents presents considerable dangers to both security and social stability, which could lead to orchestrated terrorist activities and cyber threats.
Regulatory Measures for Safe Use
• To reduce these risks and promote responsible usage, it is crucial to implement strict regulatory frameworks and improve security protocols in the development and training of these agents.

Human-agent Interaction

Communication Clarity Needed
• Clear communication between humans and AI agents is essential, as misunderstandings can occur due to the intricacies of the agents' language models, potentially resulting in unintended outcomes in decision-making.
Impact of AI Reliance on Human Cognition
• As people depend more on AI agents for decision-making, there is a concern that this reliance could weaken critical thinking and problem-solving abilities, which might compromise human agency.
Building Trust and Ethics
• Establishing trust in interactions between humans and AI is crucial; users need to have confidence in the agents' abilities while ensuring that ethical standards are upheld to prevent manipulation or exploitation.

Agent Evaluation

Real-World Performance Limitations
• Existing approaches to assessing AI agents often fall short in accurately reflecting their performance in real-world scenarios, resulting in a limited understanding of their reliability.
Bias and Fairness in Evaluation
• When evaluating AI agents, it is crucial to address issues of bias and fairness; using inappropriate evaluation metrics can exacerbate undesirable behaviors, undermining the agent's acceptance in society.
Evolving Assessment Frameworks
• As both environments and tasks may change, it is essential for the evaluation of AI agents to evolve, enabling ongoing measurement of their performance while ensuring they remain aligned with user requirements and ethical standards over time.

Threat to the Well-being of the Human Race

Challenges in Managing Agents
• As AI agent technology progresses, humans may find it challenging to manage these systems, which could result in considerable risks if these agents surpass human intelligence and develop their own objectives.
Global Safeguards Imperative
• Without adequate safeguards, sophisticated AI agents could pose significant dangers to humanity, underscoring the need for regulations and a globally shared technical and ethical framework.
Economic Impact
• The advancements of AI agents may disrupt traditional job markets, necessitating workforce reskilling and adaptation to ensure that the benefits of this technology are equitably distributed across society.

References: [5,7-8]

Retrospective & Prospective:
Challenges and Opportunities for Generative AI Agents

The Future Unfolded: Opportunities of Generative AI Agents Today and Beyond

Today’s Innovations: Generative AI Agents Taking Actions

Rapid Development of Agent Ecosystems
Tech leaders (e.g., NVIDIA, OpenAI, SAP) are actively pursuing the creation and integration of generative AI agents into existing frameworks, tools, and ecosystems, laying the groundwork for an expansive agent society.

From Single-Agent to Multi-Agent Systems
Companies are gradually transitioning from single-agent approaches to orchestrated multi-agent systems, assessing how Level 2 (reasoning) and Level 3 (autonomy) agentic capabilities may be integrated to tackle complex tasks end-to-end with a high degree of autonomy.

Focus on Efficiency in Both Early- and Late-Stage Value Chains
Initial applications of generative AI agents are likely to concentrate on enhancing efficiency in use cases across both the early- and late-stage value chains of various sectors, streamlining processes, and improving productivity.

Start with Agentic Process Automation (APA)
For various business processes, a structural approach such as APA can be implemented to effectively leverage advanced agent capabilities, accommodating diverse task complexities and thereby enhancing and augmenting existing robotic process automation (RPA) workflows.

Advancements in Software Development Tools
Both generative AI code assistants and engineering agents are gaining traction, demonstrating potential in automating critical software development tasks such as code reviews and functional testing, although concerns about trust and reliability remain.

Theoretical Innovations and Cross-disciplinary Approaches
Researchers are advancing theoretical knowledge by integrating insights from fields such as cognitive science and complex systems, enhancing understanding and application of generative AI agents in various contexts.

“Agentic AI's transformative power shines with customization. By designing purpose-built agents for specific domains, we can now blend advanced reasoning and actionability with modern techniques like RAG, Knowledge Graphs, Conversational Analytics, and Intelligent Document Processing, crafting AI systems that excel at tackling complex, domain-specific challenges.”

Milos Rusic
CEO & Co-Founder
deepset


Beyond the Horizon: Generative AI Agents Shaping the Future

Enhanced Collective Intelligence and Coordination
Research will likely explore optimizing collective intelligence within AI agent networks, where multiple agents accumulate knowledge and experience from both interactions among themselves and their collaborations with humans, achieving synergies that can lead to more effective problem-solving and innovation.

Evolving System Interconnection and Complexity
By forming interconnected multi-agent systems, an "agentic galaxy," the complexity of these networks may increase significantly, fostering continuous learning and adaptability as well as allowing agents to rapidly evolve through shared insights.

Progress in Multi-Modal Environments
There will be an increased focus on generative AI agents in multimodal settings, which will integrate various sensory inputs. These versatile agents will enhance their ability to interact with the physical world, leading to more natural and effective human-robot interactions across diverse industries.

From Virtual to Physical Agents
The "ChatGPT moment" for multimodal robotic foundation models is approaching, enabling predicted actions in complex physical environments and ultimately realizing the long-term vision of human-like intelligence in robotic form.

Scalability and Resource Efficiency
The future of generative multi-agent systems will depend on developing scalable architectures that maintain efficiency as the number of agents increases, addressing computational constraints.

Extensive Applications Across Diverse Fields
Generative AI multi-agent systems are expected to expand into various industrial sectors (semiconductor, chemicals, E-commerce, healthcare, education, etc.), tackling complex problems and driving advanced computational solutions.

References
[1] T. Sumers, S. Yao, K. Narasimhan, and T. L. Griffiths, “Cognitive Architectures for Language Agents,” Sep. 05, 2023, arXiv:2309.02427. doi: 10.48550/arXiv.2309.02427.
[2] S. Franklin and A. Graesser, “Is It an agent, or just a program?: A taxonomy for autonomous agents,” in Intelligent Agents III Agent Theories, Architectures, and Languages, J. P. Müller, M. J. Wooldridge, and N. R. Jennings, Eds., Lecture Notes in Computer Science, vol. 1193. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997, pp. 21–35. doi: 10.1007/BFb0013570.
[3] L. Yee, M. Chui, and R. Roberts, “Why AI agents are the next frontier of generative AI | McKinsey,” McKinsey Quarterly. Accessed: Dec. 11, 2024. [Online]. Available: https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/why-agents-are-the-next-frontier-of-generative-ai
[4] A. Gutowska, “What Are AI Agents? | IBM,” IBM Think. Accessed: Dec. 11, 2024. [Online]. Available: https://www.ibm.com/think/topics/ai-agents
[5] Z. Xi et al., “The Rise and Potential of Large Language Model Based Agents: A Survey,” Sep. 19, 2023, arXiv:2309.07864. doi: 10.48550/arXiv.2309.07864.
[6] J. Cook, “OpenAI’s 5 Levels Of ‘Super AI’ (AGI To Outperform Human Capability),” Forbes. Accessed: Dec. 11, 2024. [Online]. Available: https://www.forbes.com/sites/jodiecook/2024/07/16/openais-5-levels-of-super-ai-agi-to-outperform-human-capability/
[7] T. Guo et al., “Large Language Model based Multi-Agents: A Survey of Progress and Challenges,” Apr. 19, 2024, arXiv:2402.01680. Accessed: Nov. 18, 2024. [Online]. Available: http://arxiv.org/abs/2402.01680
[8] X. Li, S. Wang, S. Zeng, Y. Wu, and Y. Yang, “A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges,” ResearchGate. Accessed: Nov. 18, 2024. [Online]. Available: https://www.researchgate.net/publication/384732283_A_survey_on_LLM-based_multi-agent_systems_workflow_infrastructure_and_challenges
[9] J. Liu et al., “Large Language Model-Based Agents for Software Engineering: A Survey,” Sep. 04, 2024, arXiv:2409.02977. Accessed: Nov. 18, 2024. [Online]. Available: http://arxiv.org/abs/2409.02977
[10] Y. Wang et al., “Agents in Software Engineering: Survey, Landscape, and Vision,” Sep. 23, 2024, arXiv:2409.09030. Accessed: Nov. 08, 2024. [Online]. Available: http://arxiv.org/abs/2409.09030
[11] Y. Ye et al., “ProAgent: From Robotic Process Automation to Agentic Process Automation,” Nov. 23, 2023, arXiv:2311.10751. doi: 10.48550/arXiv.2311.10751.
[12] L. Dong, Q. Lu, and L. Zhu, “AgentOps: Enabling Observability of LLM Agents,” Nov. 30, 2024, arXiv:2411.05285. doi: 10.48550/arXiv.2411.05285.
[13] M. Zhuge et al., “Agent-as-a-Judge: Evaluate Agents with Agents,” Oct. 16, 2024, arXiv:2410.10934. Accessed: Oct. 21, 2024. [Online]. Available: http://arxiv.org/abs/2410.10934
[14] H. Jin, L. Huang, H. Cai, J. Yan, B. Li, and H. Chen, “From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future,” arXiv.org. Accessed: Aug. 30, 2024. [Online]. Available: https://arxiv.org/abs/2408.02479v1

“We see Generative AI agents, embedded into advanced agentic RAGs, replicating complex human analyses and decisions. When guided by clearly defined processes, they automate tasks, increase efficiency and achieve precision, enabling solutions to challenges that were previously out of reach. This capability has the potential to strengthen the German economy by boosting growth and counteracting labour shortages.”

Lukas Wogirz
CEO & Co-Founder
databAIse



Do you want to dive deeper into LLM and RAG?
Start your journey with our white papers.

Firms that employ large language models (LLMs) can create significant value and achieve sustainable competitive advantage. However, the decision of whether to make-or-buy LLMs is a complex one and should be informed by consideration of strategic value, customization, intellectual property, security, costs, talent, legal expertise, data, and trustworthiness. It is also necessary to thoroughly evaluate available open-source and closed-source LLM options, and to understand the advantages and disadvantages of fine-tuning existing models versus pre-training models from scratch.

Our latest whitepaper on Retrieval-Augmented Generation (RAG) offers insights into the advancements and challenges of Retrieval-Augmented Generation (RAG) within the industry. It provides an analysis of industry demands, current methodologies, and the obstacles in developing and evaluating RAG. Additionally, our whitepaper aims to facilitate strategy development and knowledge exchange about practical use cases across various industrial sectors. The whitepaper is the result of extensive studies and discussions conducted with our internal teams and industry partners. It highlights RAG as a cost-effective technique that has significantly improved the trustworthiness and control of Large Language Model (LLM) applications over the past year.

Our RAG use case study on Retrieval-Augmented Generation (RAG) within the test and measurement industry highlights common challenges in the technical domain and explores effective RAG evaluation techniques. We demonstrate how Large Language Models (LLMs) can be leveraged to scale up RAG evaluation reliably, and address industry-specific challenges such as multilingual data, in-domain data, and complex tabular structures. Our vision pipeline and retrieval fine-tuning solutions have significantly improved the accuracy of RAG, proving the value of customized RAG applications for the wireless test and measurement sector.

Authors

Dr. Paul Yu-Chun Chang
Senior AI Expert: Foundation Models - Large Language Models,
appliedAI Initiative GmbH
p.chang@appliedai.de

Paul Yu-Chun Chang works as a Senior AI Expert specializing in Large Language Models at appliedAI Initiative GmbH. He has over 10 years of interdisciplinary research experience in computational linguistics, cognitive neuroscience, and AI, and more than 6 years of industrial experience in developing AI algorithms in language modeling and image analytics. Paul holds a PhD from LMU Munich, where he integrated NLP and machine learning methods to study brain language cognition.

Mingyang Ma
Principal AI Strategist & Product Manager,
appliedAI Initiative GmbH
m.ma@appliedai.de

Mingyang Ma works as Principal AI Strategist & Product Manager at appliedAI Initiative GmbH, supporting all partner companies’ decision making and technical solution identification of various AI use cases, with a particular focus on leveraging LLMs. With over 6 years of expertise in NLP, Mingyang has excelled in the realm of Conversational AI, demonstrating her proficiency in application DevOps and platform development across various processes during her tenure at BMW Group in both Germany and the USA.

Bernhard Pflugfelder
Head of Generative AI,
appliedAI Initiative GmbH
b.pflugfelder@appliedai.de

Bernhard Pflugfelder works as Head of Generative AI at appliedAI Initiative GmbH. Bernhard has 15 years of experience in the fields of Data Science, Natural Language Processing (NLP), as well as data and AI across different companies such as BMW Group and Volkswagen Group. He is renowned for his expertise in AI in general, and in NLP and generative AI in particular.



Contributors

Antoine Leboyer
Managing Director SW/AI
TUM Venture Labs
antoine.leboyer@unternehmertum.de

Antoine Leboyer is an entrepreneur and the Managing Director of SW/AI at TUM Venture Lab and a Board Member at Hyperganic. Formerly, he served as President and CEO of GSX and held board positions at Geneva Liberal Synagogue and Martello Technologies. He holds an MBA from Harvard, class of '92.

Lukas Wogirz
CEO & Co-Founder
databAIse
ljw@databai.se

Lukas Wogirz is the CEO and Co-Founder of databAIse, an AI-powered platform transforming unstructured text data into actionable insights. With a Master’s degree in Electrical Engineering and Information Technology from TUM, Lukas specializes in AI/ML, automation and deep learning. He has previously worked on advanced technologies at MOV.AI, where he developed patented algorithms for industrial automation.

Milos Rusic
CEO and Co-founder
deepset
milos.rusic@deepset.ai

Milos Rusic is the co-founder and CEO of deepset, the company behind Haystack and deepset Cloud, leading solutions for rapid custom LLM and NLP application development. Trusted by NVIDIA, Intel, Airbus, and The Economist, deepset’s tools empower enterprises to build and deploy AI solutions tailored to their unique needs and mission-critical use cases. Learn more at deepset.ai.

Dr. Christian Karaschewitz
AI Product Incubation Lead, SAP Business AI – Product & Partner Management
SAP SE
christian.karaschewitz@sap.com

Christian Karaschewitz is a product manager and innovation leader with over a decade of experience at SAP. Currently, he serves as AI Product Incubation Lead, driving cutting-edge solutions in Business AI. Previously, he led product initiatives for SAP Start-Up initiatives such as Head of Product for Ruum by SAP and Co-Founder and Head of Product of FlexPay by SAP. Beyond his work at SAP, Christian has mentored startups as a Venture Mentor at SAP.iO. He holds a Ph.D. from the University of St. Gallen and a Master’s from the University of the Arts Berlin. With a passion for innovation, he excels at delivering impactful technologies and business solutions.

Emre Demirci
Junior AI Engineering LLM
appliedAI Initiative GmbH
E.Demirci@appliedai.de

Emre Demirci is a dedicated Data Engineering Master's student at the Technical University of Munich (TUM). As a working student at appliedAI, Emre focuses on developing cutting-edge solutions involving large language models (LLMs) and knowledge graphs. Passionate about leveraging technology for impactful solutions, Emre’s work bridges the gap between AI innovation and real-world applications.

Joong-Won Seo
Junior LLM & Software Engineer
appliedAI Initiative GmbH
j.seo@appliedai.de

Joong-Won Seo is a Master’s student in Computer Science at the Technical University of Munich, specializing in deep learning, generative AI, and full-stack engineering. With extensive experience as a teaching assistant at TUM, he combines a strong research foundation with hands-on engineering skills. He applies LLMs and generative AI to develop prototypes that translate theoretical ideas into practical solutions for real-world challenges.

Ferdy Dermawan Hadiwijaya
Junior Generative AI Engineer
appliedAI Initiative GmbH
f.hadiwijaya@appliedai.de

Ferdy Dermawan Hadiwijaya is a master's student in Computer Science at the Technical University of Munich (TUM), specializing in Natural Language Processing and Generative Models. As a working student at appliedAI, he serves as a Junior GenAI Engineer, bringing three years of professional experience in Large Language Models and software development to bridge cutting-edge academic research with practical industry applications.

About appliedAI Initiative GmbH

appliedAI is Europe's largest initiative for the application of trusted AI technology. The initiative was established in 2017 by Dr. Andreas Liebl as a division of UnternehmerTUM Munich and transferred to a joint venture with Innovation Park Artificial Intelligence (IPAI) Heilbronn in 2022.

At the Munich and Heilbronn offices, more than 100 employees pursue the goal of making European businesses shapers of the AI era in order to maintain Europe's competitiveness and actively shape the future.

appliedAI holistically supports international corporations, including BMW and Siemens, as well as medium-sized companies in their AI transformation. This is accomplished through partnership-based exchange and joint knowledge building, comprehensive accelerator programs, and specific solutions and services, such as strategy consulting and use-case development.

For more information, please visit
https://www.appliedai.de/en/



Acknowledgement
We express our sincere appreciation for the excellent work carried out by the authors, reviewers, and designer of this white paper. Their great expertise, motivation, and dedication to every detail made this white paper an exceptional contribution to the AI community.

Furthermore, we extend our gratitude to all contributors for their valuable contributions throughout this joint case study. Their profound expertise and commitment have played a pivotal role in shaping the ideas, knowledge and results presented in this paper.

Our sincere thanks go to our appliedAI industry partners Atruvia, Linde, Giesecke & Devrient, Rohde & Schwarz, SAP and Siemens as well as our technology partner deepset for their openness, commitment, and contributions to this case study. We highly appreciate their willingness to share insights and results.

The collective expertise, exchange, and dedication to advancing the knowledge in generative AI and agentic applications were great inspirations throughout the process of creating this white paper.

Generative AI Agents in Action:
Revolutionizing Software
Development Testing

appliedAI Initiative GmbH

August-Everding-Straße 25
81671 München
Germany
www.appliedai.de
