Generative AI Agents
in Action:
Revolutionizing Software
Development Testing
Contents
Foreword
Executive Summary
References
Authors
Contributors
About appliedAI Initiative GmbH
Acknowledgement
Foreword
A Few Thoughts at the End of 2024
As we approach the close of 2024, the landscape of Artificial Intelligence (AI) is undergoing a profound transformation. Since the pivotal ChatGPT milestone two years ago, we have witnessed a rapid adoption of Large Language Models (LLMs) in a huge variety of applications. In most cases, LLMs have been used for multi-modal content generation, knowledge retrieval, and chatbots, already generating value in various industries and their value chains.

A common pattern across current generative AI applications is the instruction-oriented interaction between users and AI, primarily facilitated through a chat format. Whether users formulate specific questions, need a summary or key insights from a document, or want to note their thoughts and ideas for future reminders, the typical usage pattern involves providing the AI with a specific prompt or instruction to elicit the desired response.

While this interaction model has driven numerous relevant and valuable business use cases and directly embeds the human-in-control principle to mitigate certain risks of generative AI, it does not fully utilize the most recent capabilities of generative AI. With the advent of more powerful LLMs equipped with deep reasoning and thinking abilities, use of tools, and the capacity to understand and synthesize multilingual and multimodal data, generative AI is increasingly capable of solving complex problems by translating them into a set of autonomous steps or tasks, much like humans would. We refer to these advanced systems as generative AI agents.

Leading organizations such as Anthropic, Microsoft, NVIDIA, OpenAI, Salesforce, SAP, and others are at the forefront of developing agents that not only follow commands but also proactively solve complex problems by aligning with broader objectives. Although we are still in the early phases of agent development, the evolution towards autonomous multi-agent systems is already underway. In the not-too-distant future, these agents hold immense promise, as they begin to tackle intricate tasks that were once the exclusive domain of human intelligence.

In fact, generative AI agents have now become a spotlighted field within AI technology. Why is this so? Because generative AI agents represent a shift from instruction-oriented chat interactions, where humans guide problem-solving, to task delegation and autonomous problem-solving with minimal or even no human oversight in the future. This shift opens up vast potential for businesses, enabling task automation in software-based virtual environments and even action planning in physical environments.

In this white paper, we explore the rise of generative AI agents, transitioning from traditional instruction-driven interactions to innovative, goal-oriented automation. We will delve into the evolving progress of generative AI agents, market observations, and the exciting potential of autonomous systems, particularly in the field of software development. We invite you to reflect on the technological advancements shaping our future and the implications of an increasingly automated world.
Executive Summary

Adaptability
Generative AI agents are defined by their ability to interact with environments and execute tasks
autonomously, showcasing cognitive processes like reasoning and goal setting for problem solving.
• Currently, most agents operate at foundational levels of conversation, analytical capability, and autonomy, but
significant advancements toward innovation and collaboration are anticipated.
• These agents already enhance business processes across various value chains, from research to customer service, by automating complex tasks. Future agent systems can potentially automate entire processes rather than individual use cases.
• The transition from Robotic Process Automation (RPA) to Agentic Process Automation (APA) allows for dynamic, goal-oriented workflows that already improve the cost and time efficiency of processes today. The use of Small Language Models (SLMs) instead of LLMs within APA is an important approach to optimizing costs and deploying agents on-premise or at the edge.
• As generative AI evolves, there is an increasing focus on mitigating the known risks of LLMs. However, since these risks
cannot be completely eliminated, there remains an important need for human oversight in agentic systems.
Transformation
Generative AI agents are able to significantly transform entire processes rather than single use cases or tasks.
• A highly promising process showcasing this transformative potential is the software development lifecycle, particularly
through roles and tasks in planning, development, testing, review, and deployment. Notable use cases include code
generation, automated testing, and automated reviewing.
• While there is a cautious approach to their use in critical tasks like infrastructure deployment, these agents enhance
workflows by automating test creation and adapting to evolving requirements, thereby improving overall efficiency in
the software lifecycle.
• Technologies such as Retrieval-Augmented Generation (RAG) and AutoGen help reduce manual testing efforts,
allowing developers to focus on complex problem-solving. Beyond software, these agents are impacting fields like
industrial engineering and scientific research by streamlining tasks and fostering collaboration.
Responsibility
Generative AI agents present both remarkable opportunities and significant challenges.
• The creation of scalable, multi-modal agentic systems capable of integrating diverse sensory inputs and harnessing
collective human and artificial intelligence will open new frontiers for generative AI across various sectors.
• While they hold potential for enhancing efficiency, their susceptibility to adversarial attacks raises concerns about
their robustness and trustworthiness, particularly as these systems are designed to predict and perform actions.
• As AI technology advances, it is vital to anticipate potential risks and to adapt evaluation methods for real-world applications, including approaches like agent-as-a-judge solutions with human oversight.
• Addressing ethical and stability considerations and ensuring responsible use are essential to mitigating risks.
By carefully weighing both the opportunities and challenges, we can fully realize the transformative
potential of generative AI agents while safeguarding societal well-being.
“AI agents will drive business automation and business decision augmentation. They will advance to specialized assistants that will help users in various business roles by driving business decisions and taking action. Ultimately, this will not only lead to much more efficient and guided processes, but also transform the business processes themselves.”

Dr. Christian Karaschewitz
AI Product Incubation Lead, SAP Business AI – Product & Partner Management
SAP SE
A Quick Dive into Generative AI Agents
A generative AI agent is an autonomous system that leverages large language models and foundation models to
independently execute complex tasks and workflows in a digital/physical environment. It perceives its surroundings,
reasons, plans, and acts over time to achieve its goals and influence future outcomes [1-4].
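Read as a loop, the definition above amounts to repeatedly perceiving the environment, reasoning and planning with a foundation model, and acting on the result. The sketch below illustrates that loop under simplifying assumptions; llm_plan and the toy environment are hypothetical placeholders rather than a concrete model call.

```python
# Minimal sketch of the perceive-reason/plan-act loop that defines a generative
# AI agent; `llm_plan` is a hypothetical stand-in for a foundation-model call.
def llm_plan(goal: str, observation: str) -> str:
    """Hypothetical LLM call that returns the next action toward the goal."""
    return "done" if "target found" in observation else "search next folder"


def perceive(environment: list[str], step: int) -> str:
    """Read the current state of the (digital) environment."""
    return environment[step] if step < len(environment) else "target found"


def act(action: str) -> None:
    print(f"executing action: {action}")


def run_agent(goal: str, environment: list[str], max_steps: int = 5) -> None:
    for step in range(max_steps):
        observation = perceive(environment, step)   # perceive surroundings
        action = llm_plan(goal, observation)        # reason and plan
        if action == "done":                        # goal achieved
            break
        act(action)                                 # act on the environment


run_agent("locate the report", environment=["folder A", "folder B"])
```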
Here we outline five levels of generative AI agent competence—conversational, reasoning, autonomous, innovating,
and organizational—based on their capabilities in thinking (brain), perceiving (perception), and task execution
(action). While not exhaustive, this categorization aims to provoke inquiries about the dynamics of human-agent
interactions [5-7].
Capabilities across the five levels of agent competence:

Brain
• Conversational: Episodic Memory (Events), Summary & Abstraction, Semantic Memory (Knowledge)
• Reasoning: Human-like Reasoning, Reflection & Critique, Judgement & Evaluation
• Autonomous: Goal-setting, Planning of Multistep Tasks, Decision-making
• Innovating: Autonomous Learning, Generalization, Goal Recalibration, Idea/Design Generation
• Organizational: Personality & Role-Playing, Team Dynamics Insight, Strategic Thinking, Coordination Planning

Perception
• Conversational: Textual, Visual, Auditory, and Other Sensor Input Encoding
• Reasoning: Pattern Recognition, Multi-source Input Integration
• Autonomous: Active Sensing/Monitoring, Goal-directed Perception, Autonomous Data Mining
• Innovating: Perceptual Learning, Perceptual Recalibration, Perceptual Anticipation
• Organizational: Organizational Monitoring, Collective/Mutual Perception, System Failure Awareness

Action
• Conversational: Conversation Completion
• Reasoning: Intent Inference
• Autonomous: Automated Tool Usage
• Innovating: Learning/Making New Tools
• Organizational: Multi-agent Collaboration
Illustration of agent deployment scenarios: task-oriented deployment of a single agent (e.g., "Find the Umbrella"), innovation-oriented deployment (e.g., "Create a New Medicine"), and lifecycle-oriented deployment (e.g., "Maintain a Lifelong Survival"); multi-agent settings in which cooperative agents role-play as manager, designer, engineer, and tester to build a product, interacting in disordered, ordered, or adversarial patterns within an agentic system; and human-agent interaction with the human as instructor (providing instructions and feedback) and the agent as executor (producing output).
A Quick Dive into Generative AI Agents
• To develop generative AI agent systems for business scenarios, we pinpoint six key components illustrated
below that may exist within an arbitrary business problem that agents can address.
• In the case of customer feedback analysis, for example, the following components are critical [5-7]:
‒ Understanding & Perceiving: Gather customer feedback from various sources (surveys, reviews, social media)
to understand sentiments and trends.
‒ Reflecting & Analyzing: Analyze the feedback to identify common themes and areas for improvement. Use
natural language processing to extract insights and sentiments.
‒ Iterative Review: Continuously monitor customer reactions to implemented changes, collecting new
feedback to assess the effectiveness of actions taken.
• With the core components of business problems in view, this section illustrates a generic multi-agent framework
based on three core generative AI agent capabilities as outlined earlier: the "brain" (memory, reasoning &
analytic, as well as learning & innovation agents), along with perception- and action-related agents. Here we
aim to broadly outline potential agents based on various previous definitions, though this graphic is by no means
exhaustive [5].
• In a multi-agent system, agents can interact in various patterns, such as in a hierarchical or decentralized
structure. Most existing frameworks typically employ a centralized communication model, where an orchestrator
sets goals, creates plans, delegates tasks to specific agents, and monitors the outcomes [5, 8-10].
• For instance, in customer feedback analysis, an orchestrator may collaborate with an autonomous data mining agent and a reflection or evaluation agent to effectively accomplish the task, as sketched below.
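A minimal sketch of this centralized orchestration pattern for the customer feedback example is shown below. The agent classes, the static plan, and the monitoring log are illustrative placeholders and not the API of any specific framework.

```python
# Minimal sketch of a centralized orchestrator for customer feedback analysis.
# Agent classes and the plan are illustrative placeholders, not a framework API.
from dataclasses import dataclass, field


@dataclass
class DataMiningAgent:
    """Perception-side agent: gathers raw customer feedback from a source."""
    def run(self, source: str) -> list[str]:
        # In a real system this would query surveys, reviews, or social media.
        return [f"feedback item from {source}"]


@dataclass
class ReflectionAgent:
    """Reasoning-side agent: critiques intermediate results against the goal."""
    def run(self, items: list[str], goal: str) -> str:
        return f"{len(items)} items reviewed against goal: {goal}"


@dataclass
class Orchestrator:
    """Sets the goal, creates a plan, delegates tasks, and monitors outcomes."""
    goal: str
    log: list[str] = field(default_factory=list)

    def execute(self) -> str:
        plan = ["mine:surveys", "mine:reviews", "reflect"]   # simple static plan
        miner, reflector = DataMiningAgent(), ReflectionAgent()
        collected: list[str] = []
        result = ""
        for step in plan:
            if step.startswith("mine:"):
                collected += miner.run(step.split(":", 1)[1])
            else:
                result = reflector.run(collected, self.goal)
            self.log.append(f"completed {step}")             # monitoring trail
        return result


print(Orchestrator(goal="identify common feedback themes").execute())
```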
Generic multi-agent framework: an orchestrator coordinates memory agents (short-term working memory; long-term declarative memory for episodic experiences and events; long-term declarative memory for semantic knowledge and ontology), reasoning & analytic agents (human-like reasoning, reflection & critique, judgement & evaluation), and learning & innovation agents (autonomous learning, generalization, idea & design generation), together with perception- and action-related agents (task completion), turning input into output with SLMs deployed alongside LLMs.
Despite the promising capabilities of large language models and the future potential of AI agents, the successful
adoption of AI agents in organizations, as with any AI technology, depends on various critical elements. The
appliedAI AI Strategy House framework provides a clear visual and conceptual structure of these essential
elements. By systematically aligning and measuring these elements, organizations can implement and scale AI
agents more effectively, ensuring the maximization of their AI investments. The crucial prerequisites of AI agent
adoption associated with the six enabling elements in the appliedAI Strategy House are detailed below.
AI Strategy House: ambition, fields of action, and commitment, building toward future competitive advantage.

Data Quality and Access
• High-Quality Data: Ensure data in different languages and modalities is accurate, consistent, and relevant.
• Sufficient Quantity: Gather enough data to train models effectively. Also consider augmentation with synthetic data generation.
• Seamless Integration: Connect agentic applications with data in existing systems.

AgentOps & Adaptable Tools
• Robust Evaluators: Implement validation frameworks to ensure quality. Automated evaluation methods can be deterministic, statistical, or AI-based.
• Dynamic Monitoring: Continuously review AI performance in production for optimization.
• Adaptable Tools: Develop clear strategies for process/tool changes.

Ecosystem Support
• Collaborative Partnerships: Consider partnerships to accelerate innovation and the adoption of AI agents.
• Knowledge Exchange: Share insights and best practices to accelerate agent adoption.
• Collaborative Development: As agent technologies are still immature, share risks and ramp up investments through joint development initiatives.
Here we illustrate potential use cases across different phases of a value chain as well as various corporate support functions, aiming to inspire further ideas and innovation. While the examples are not exhaustive, they serve as a starting point for exploring the diverse applications of generative AI agents for value chain optimization.
Industrial Agentic Use Cases:
From Strategy to Operation
Use Cases Across Different Phases of a Value Chain

Cross-industry
• Planning & Proposal: Advanced requirement management using automated quality and feasibility assessment for technical and regulatory compliance
• Discovery, Exploration & Research: AI-assisted product knowledge retrieval, product design, development, and validation
• Marketing Strategy: Advanced market intelligence through automated sales strategy simulation based on market analysis and demand forecasting
• Customer Service & Support: AI-powered assistants providing real-time support to customers based on interactions and additional internal and external data
• Customer Service & Support: Personalized helpdesk handling customer requests, complaints, and incidents
• Sales Channel Management: AI-based sales agents for automating personalized customer and lead interactions
• Sales Channel Management: Automated sales contract execution with liability checks

Automotive
• Discovery, Exploration & Research: AI-assisted consumer feedback and trend analysis for product development
• Supplier Management: AI-based simulation of supplier and real-time supply risks
• Material & Component Sourcing: AI-assisted resilience matrix creation for critical components
• Outbound Logistics: Distribution center location optimization based on customer density and part availability
• Process/Equipment Design & Management: Proposal of part manufacturability optimization solutions before production
• Production, Manufacturing, & Operation: Smart maintenance through dynamic assistance in analysis, problem resolution, and documentation of incidents
• Quality Control and Management: Defect detection, analysis, and prevention plan proposals to identify and analyze recurring defects in parts and propose revised quality checks to prevent them
• Customer Service & Support: Predictive car service issuing personalized notifications based on model history

Pharmaceuticals & Chemicals
• Discovery, Exploration & Research: New chemical compound hypothesis generation (based on existing patent, experiment, and market data)
• Discovery, Exploration & Research: Optimized research path proposals based on historical drug discovery efforts
• Material & Component Sourcing: Sustainable material identification for evaluation of trade-offs in performance and cost
• Outbound Logistics: Temperature-controlled shipping method suggestions to maintain drug integrity during transport
• Process/Equipment Design & Management: Analysis of the outcomes of formulation changes to predict risks of impacts on product stability and performance and propose mitigation plans
• Production, Manufacturing, & Operation: Dynamic lab operation optimization based on lab process monitoring

Semiconductor, Electronics, & Tech Products
• Discovery, Exploration & Research: Market situation survey and competitor analysis
• Supplier Management: Total cost of ownership evaluation for different suppliers
• Inbound/Outbound Logistics: Long-term contract optimization
• Production, Manufacturing, & Operation: Chip production process monitoring and review to generate yield rate improvement directions
• Quality Control and Management: Automated visual inspections with the capability to learn from previous errors and adapt over time
• Sales Strategy: Targeted upsell strategies based on customer behavior data gathering and analysis
• Customer Relationship Management: Automated sentiment monitoring on social media to guide marketing campaigns

Manufacturing, Aerospace, & Machinery
• Inbound Logistics / Supplier Management: Prediction of price fluctuations for raw materials based on market trends with proposed sourcing strategies
• Outbound Logistics: Analysis of the latest shipping regulations to streamline logistics operations across borders
• Process/Equipment Design & Management: Fabric manufacturing process simulation to reduce waste and enhance quality
• Integration, Assembly & Testing: Adaptive assembly process modeling that can shift based on real-time demand
• Integration, Assembly & Testing: Automated reliability testing to assess the reliability of critical components after production

Retail, Food, & Beverage
• Ideation & Concept Development: Agent-assisted new beverage concept development
RPA: Using manually crafted rules to orchestrate several software systems in a solidified workflow for execution.
• Construction: Manually constructed via pull-and-drag.
• Workflow: Rule-based data-flow and control-flow.
• Flexibility: Static and fixed to defined, simple scenarios; unable to update task instructions.
• Task scope: Well-defined sub-problems and tasks.
• Capability: Can handle rigid tasks, but cannot handle flexible tasks.

APA: Incorporating AI agents to adaptively construct and execute workflows to achieve process automation.
• Construction: Automatically constructing, orchestrating, and testing workflows.
• Workflow: Agent-based data-flow and control-flow.
• Flexibility: Dynamic and scenario-adaptive; able to update instructions and adapt to various scenarios.
• Task scope: Ill-defined sub-problems and tasks; monitoring and verification may be tricky.
• Capability: Can handle both rigid and flexible tasks.

Use Case Example: Insurance Claim Processing

RPA approach
• Focus: Create workflows using defined rules to handle repetitive tasks such as extraction of names and dates.
• Structure: Simple flowcharts illustrating the step-by-step process for claim data entry, verification, and updates.
• Resources Required: IT expertise to configure bots and integrate them with existing claim processing systems.
• Development Work: Through manual robotic procedure setup and review processes.
• Mechanism: Bots execute predefined tasks like claim data entry, validations, and standard communications.
• Speed: High-volume processing of claims at a consistent rate once programmed.
• Consistency: Executes tasks exactly as programmed, with minimal deviation.
• Monitoring: Rule-based monitoring through claim processing logs and alerts to identify failures or bottlenecks.
• Evaluation Frequency: Periodic reviews for performance and efficiency.
• Update Process: Manual updates required for any changes in workflows or system integrations.
• Efficiency: Greatly reduces claim processing time for routine tasks.
• Cost-Effectiveness: Significant savings on claim processing labor for repetitive tasks.
• Scalability: Easy to scale operations up or down as needed.
• Limited Flexibility: Struggles with unstructured data and unexpected scenarios.
• Maintenance Requirement: Requires periodic manual updates to adapt to new processes.
• Lack of Insight: Does not analyze data for patterns or insights beyond predefined tasks.
• Indicative Scenarios: High-volume, repetitive claim processing tasks with clear rules; ideal for back-office functions.
• Best Fit: Claim processes with low variance and where performance can be measured without needing complex decision-making.

APA approach
• Focus: Design workflows that incorporate intelligent decision-making and adapt to changing scenarios and formats of claim contents.
• Structure: Adaptive flowcharts that can change based on real-time data analytics and learning from previous claims processing.
• Resources Required: Cross-functional teams including data scientists, machine learning experts, and process analysts to develop and refine algorithms for claim processing.
• Development Work: Through generalized yet flexible agentic modules and goal-driven, adaptable autonomy.
• Mechanism: Intelligent bots assess incoming claims, make decisions based on past claim history, and take actions that can change dynamically.
• Speed: Faster decision-making, especially for complex claims, as bots learn and adapt.
• Consistency: Maintains accuracy over time by learning from feedback instead of strictly adhering to initial programming.
• Monitoring: Human oversight together with automated monitoring through analytics tools that evaluate bot performance and decisions.
• Evaluation Frequency: Potential real-time evaluations for instant adjustments where necessary.
• Update Process: Self-updating capabilities based on learned data and analytics to continuously improve the workflow.
• Adaptability: Can handle complex, variable tasks required in the claims by evolving based on historical data.
• Enhanced Reasoning and Decision-Making: Improved accuracy and responsiveness to subtle claim scenarios.
• Customer Experience: Offers personalized services and faster claim resolution.
• Complex Implementation: Requires substantial upfront investment in technology and talent.
• Data Dependency: Performance reliant on the quality and quantity of available reference data.
• Risk of Errors: Potential risk of AI agent biases that lead to incorrect decisions.
• Indicative Scenarios: Claim processes requiring adaptability and deep analysis; suited for complex claims with varied outcomes.
• Best Fit: Claim processes that are infrequent, unpredictable, and necessitate sophisticated reasoning and decision-making.
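To make the contrast concrete, the minimal sketch below compares a static RPA-style extraction rule with an APA-style step in which an agent interprets a free-form claim and decides the next action. It is an illustration only: call_llm is a hypothetical stand-in for any LLM client, and the field names and prompt are assumptions rather than part of the comparison above.

```python
# Illustrative sketch contrasting an RPA-style rule and an APA-style agent step
# for insurance claim intake. `call_llm` is a hypothetical stand-in for any
# LLM client; the regex rule mirrors a fixed RPA extraction.
import re


def rpa_extract_claim_date(document: str) -> str | None:
    """RPA: a static, manually crafted rule that breaks if the format changes."""
    match = re.search(r"Claim date:\s*(\d{4}-\d{2}-\d{2})", document)
    return match.group(1) if match else None


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real client in practice."""
    return '{"claim_date": "2024-11-02", "action": "route_to_adjuster"}'


def apa_triage_claim(document: str, history: list[str]) -> str:
    """APA: the agent interprets free-form content and decides the next action,
    adapting to past claims instead of relying on fixed rules."""
    prompt = (
        "You are a claims agent. Extract the claim date and decide the next "
        f"action.\nPast similar claims: {history}\nDocument:\n{document}"
    )
    return call_llm(prompt)


doc = "Customer wrote: the incident happened on 2 Nov 2024, claim attached."
print(rpa_extract_claim_date(doc))            # None: rigid rule misses free text
print(apa_triage_claim(doc, ["water damage claim, approved"]))
```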
Development and operational cycle of generative AI agents (steps 3 to 6):

3. Evaluate: Assess the agent’s performance using predefined metrics and benchmarks, possibly leveraging external agents to evaluate its output (Agent-as-a-Judge) for consistency and relevance.

4. Integration & Deploy: Integrate the generative AI agent into the necessary systems and workflows. Deploy the agent in a production environment where it can interact with users and perform its intended tasks.

5. Inference & Execution: Enable the agent to process inputs and generate outputs in real time. Activate its core functionalities, allowing it to perform the tasks for which it was designed.

6. Monitoring & Logging: Continuously monitor the agent’s performance and usage, recording interactions and outputs. Implement step-by-step trace-back mechanisms to analyze any issues or performance deviations effectively.
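As an illustration of how step 3 (Evaluate) and step 6 (Monitoring & Logging) can be combined, the sketch below pairs an Agent-as-a-Judge style evaluation with a step-by-step trace log. judge_llm is a hypothetical placeholder for a real judge model, and the criteria, metrics, and file path are assumptions for the example.

```python
# Minimal sketch: an LLM-based judge scores agent outputs against predefined
# criteria while a trace log records every step for later trace-back.
# `judge_llm` is a hypothetical stand-in for a real judge-model call.
import json
import time


def judge_llm(task: str, output: str, criteria: list[str]) -> dict:
    """Hypothetical Agent-as-a-Judge call returning a verdict per criterion."""
    return {"verdict": "pass", "scores": {c: 1.0 for c in criteria}}


def run_with_tracing(agent_fn, task: str, criteria: list[str], trace_path: str):
    trace = []
    start = time.time()
    output = agent_fn(task)                          # inference & execution
    trace.append({"step": "inference", "task": task, "output": output,
                  "latency_s": round(time.time() - start, 3)})
    verdict = judge_llm(task, output, criteria)      # evaluate
    trace.append({"step": "evaluation", **verdict})
    with open(trace_path, "w") as f:                 # monitoring & logging
        json.dump(trace, f, indent=2)
    return output, verdict


out, verdict = run_with_tracing(
    agent_fn=lambda task: f"draft answer for: {task}",
    task="summarize yesterday's failed test runs",
    criteria=["consistency", "relevance"],
    trace_path="agent_trace.json",
)
print(verdict)
```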
Focusing the Lens:
Generative AI Agents in Software Development
From LLMs to LLM-based Agents in Software Development
With the emergence of LLMs and generative AI, their applications are being extensively investigated across
different industries. A significant area of focus is software development, where LLMs have demonstrated
impressive capabilities in tasks like code generation and test design. Despite these achievements, they also face
several limitations, particularly regarding autonomy. LLM-based agents utilize LLMs as the foundation for planning,
designing, decision-making, and executing actions during software development, thereby overcoming some of the
prior constraints. In this section, we highlight the main distinctions between these two approaches [14].
Instruction-oriented software development using LLMs: each phase of the lifecycle is driven by a separate human instruction.

1. Plan
1: Requirement Gathering & Analysis: Analyze stakeholder inputs and extract key requirements.
2: User Story Generation: Create detailed user stories based on high-level requirements.
3: Project Planning: Assist in timeline estimation and resource allocation.

2. Design
1: Architectural Suggestions: Provide recommendations for system architecture based on requirements.
2: UI/UX Design Assistance: Generate wireframes or design mockups from textual descriptions.
3: Design Document Drafting: Create comprehensive design documents outlining components, interfaces, and data flow.
4: Component Specification: Define specifications for individual system components.
5: Best Practices Guidance: Offer insights into industry best practices and standards relevant to the project.

3. Develop
2: Documentation: Automatically generate or update code documentation and comments.
3: Code Refactoring: Suggest improvements for existing code to enhance readability and performance.
4: API Integration Assistance: Provide guidance or generate code for integrating third-party APIs.
5: Language Support: Assist developers with syntax, libraries, and frameworks in various programming languages.

6. Review
2: Feedback Analysis: Analyze feedback from team members to identify common themes or areas for improvement.
3: Performance Metrics Reporting: Compile and interpret performance data to assess project progress.
4: Documentation Review: Ensure that all project documentation is up-to-date and comprehensive.
5: Retrospective Facilitation: Generate questions and topics for retrospective meetings to encourage constructive discussions.

7. Launch
1: Marketing Content Creation: Generate marketing materials, blog posts, and release notes for the product launch.
2: User Onboarding Assistance: Create guides, tutorials, and FAQs to help users understand and adopt the new software.
3: Feedback Collection Tools: Design surveys or feedback forms to gather user input post-launch.

Trust and Readiness for Adoption: Joint Assessment Among Selected appliedAI Partner Companies
Generative AI Agents in Action for Software Testing
Background
Continuous software testing is a critical element of the software development lifecycle, especially within agile methodologies, where testing occurs at every stage to ensure system robustness as new code is committed to repositories like GitHub, often facilitated by tools like Jenkins for CI/CD processes.

Challenge
Despite advancements in automation, the creation and refinement of test cases, such as unit tests and functional/UI tests, still require substantial human effort, leading to inefficiencies and potential gaps in testing coverage.

Opportunity
There is a significant opportunity to leverage AI to automate the generation and optimization of test cases, thereby reducing the manual workload on developers, enhancing testing efficiency, and improving the overall quality of software products.
• Exploratory Tests: Validating functionalities in unanticipated scenarios
• Functional/UI Tests: Validating functionalities in business scenarios
• Unit Tests: Validating individual components
Multi-agent unit testing workflow: code files and code diffs from a GitHub repository are translated into English summaries ("code → English"), including file summaries and a summary of all changes, and stored together with context about the project in a VectorDB. A prompt assembled from the relevant code snippets and this context is passed to an AutoGen-based setup in which a PR reviewer, a developer proxy, and a QA tester with Docker-based execution generate unit tests, run them, and evaluate the execution results before the tests are committed and pushed.
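A minimal sketch of such a generation loop is shown below, using the pyautogen 0.2-style API (AssistantAgent and UserProxyAgent with Docker-based code execution). Class and argument names may differ in other AutoGen versions, and the model, API key, prompt contents, and termination phrase are placeholders rather than the configuration used in the prototype.

```python
# Hedged sketch of the unit-test generation loop: a QA tester agent drafts
# pytest tests for a summarized code diff and a developer proxy executes them
# in Docker. Follows the pyautogen 0.2-style API; values are placeholders.
import autogen

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "YOUR_API_KEY"}]}

qa_tester = autogen.AssistantAgent(
    name="qa_tester",
    llm_config=llm_config,
    system_message="You write pytest unit tests for the provided code diff "
                   "and revise them until they pass.",
)

developer_proxy = autogen.UserProxyAgent(
    name="developer_proxy",
    human_input_mode="NEVER",                     # fully automated loop
    code_execution_config={"work_dir": "generated_tests", "use_docker": True},
    is_termination_msg=lambda m: "ALL TESTS PASS" in (m.get("content") or ""),
)

# The prompt would be assembled from the VectorDB: file summaries, the summary
# of all changes, and the relevant code snippets (placeholders here).
prompt = (
    "Project context: <file summaries from VectorDB>\n"
    "Summary of all changes: <code diff summary>\n"
    "Code snippet under test:\n<code snippet>\n"
    "Write pytest unit tests, run them, and reply 'ALL TESTS PASS' when done."
)

developer_proxy.initiate_chat(qa_tester, message=prompt)
```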
Generative AI Agents in Action for Software Testing
Multi-agent Functional/UI Testing Workflow: an Orchestrator delegates tasks and evaluates results against defined criteria to make a "pass" or "fail" judgement. A persona agent acting as developer (GPT-4o) role-plays in planning and coordinating the testing steps, providing instruction and oversight to the reasoning & analytic agents. A computer use agent (Claude 3.5 Sonnet) controls the computer interface via the screen, navigates files, and performs web search and browsing, providing final screenshots and textual descriptions for test completion and output generation.
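The control flow of this workflow can be sketched as follows. persona_plan, computer_use_step, and orchestrator_judge are hypothetical stand-ins for the GPT-4o planning calls, the Claude 3.5 Sonnet computer-use calls, and the orchestrator's pass/fail judgement, so this outlines the loop rather than reproducing the prototype's implementation.

```python
# Illustrative control loop for the functional/UI testing workflow described
# above; the three roles are shown as plain stub functions.
from dataclasses import dataclass


@dataclass
class StepResult:
    screenshot_path: str
    description: str


def persona_plan(test_case: str) -> list[str]:
    """Persona agent (developer role): plans and coordinates testing steps."""
    return [f"open application for: {test_case}",
            "perform the described action",
            "capture the final state"]


def computer_use_step(instruction: str) -> StepResult:
    """Computer use agent: controls the UI, returns screenshot + description."""
    return StepResult(screenshot_path="step.png",
                      description=f"executed: {instruction}")


def orchestrator_judge(results: list[StepResult]) -> str:
    """Orchestrator: applies pass/fail criteria to the collected evidence."""
    return "PASS" if all(r.description.startswith("executed") for r in results) else "FAIL"


def run_ui_test(test_case: str) -> str:
    results = [computer_use_step(step) for step in persona_plan(test_case)]
    return orchestrator_judge(results)


print(run_ui_test("Use the search bar to search for one keyword"))
```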
For demonstration purposes during the prototypical phase, we have chosen an internally developed software, 'GenAI.xy Playground,' as the target for software testing. This selection allows us to assess the level of complexity (functional level) that our designed multi-agent system can handle, based on its current capabilities (such as reasoning, planning, and UI execution).

The table below presents the GUI-based functions that have been tested with the prototype as well as the testing results. Refer to the "Multi-agent Functional/UI Testing Workflow" diagram above for an overview of the prototyped system and to the appliedAI Initiative YouTube channel for live demo recordings.
Test case: Search bar
• Goal description: Use the search bar to search for one keyword, and then open a use case from the result.
• Testing result: The acting agent based on Claude 3.5 Computer Use correctly understood the visual design of a search bar.
• Judgement agent result: PASS

Test case: Favourite (bookmark) feature and filtering (checkbox with dropdown menu)
• Goal description: In the Use Case Library, use the filtering feature to select one industry and find one use case to add to favourites.
• Testing result: The acting agent based on Claude 3.5 Computer Use was able to correctly find the filter on the UI and also understood the visual design that a ❤ heart means adding to favourites.
• Judgement agent result: PASS
Example Agentic Software Test Workflow for the Test Case 'Search Bar'
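As a complement to the table and workflow above, the sketch below shows one hypothetical way such GUI test cases could be encoded as structured input for the multi-agent system; the UITestCase fields and pass criteria are illustrative assumptions, not taken from the prototype.

```python
# Hypothetical encoding of the two GUI test cases from the table above as
# structured input for the multi-agent workflow; field names are illustrative.
from dataclasses import dataclass


@dataclass
class UITestCase:
    name: str
    goal_description: str
    pass_criteria: list[str]


test_cases = [
    UITestCase(
        name="Search bar",
        goal_description="Use the search bar to search for one keyword, and "
                         "then open a use case from the result.",
        pass_criteria=["search results are shown for the keyword",
                       "a use case page opens from the results"],
    ),
    UITestCase(
        name="Favourite (bookmark) feature and filtering",
        goal_description="In the Use Case Library, use the filtering feature "
                         "to select one industry and add one use case to favourites.",
        pass_criteria=["an industry filter is applied",
                       "the heart icon marks the use case as favourite"],
    ),
]

for tc in test_cases:
    print(tc.name, "->", len(tc.pass_criteria), "pass criteria")
```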
Retrospective & Prospective:
Challenges and Opportunities for Generative AI Agents
Insights & Reflections
Adversarial Robustness
• LLMs Under Attack: Large language models are susceptible to adversarial attacks, leading to erroneous responses. Relevant attack methods include dataset poisoning and prompt-specific attacks.
• In Pursuit of Robustness Techniques: Approaches such as adversarial training, data augmentation, and sample detection can enhance the robustness of LLM-driven agents; however, a complete solution continues to be elusive.
• Human Oversight Required: Introducing a human-in-the-loop framework can help oversee and improve the conduct of LLM-dependent agents, which may reduce the threats posed by adversarial attacks.

Trustworthiness
• Calibration Challenges: Language models face challenges with the so-called calibration problem, which causes them to inadequately convey the certainty of their predictions, leading to outputs that do not reflect human expectations in practical use cases.
• Demand for Reliability: There is an urgent demand for intelligent agents that are both reliable and honest. Recent studies have focused on directing models to offer reasoning and explanations to improve their credibility.
• Debiasing and Fairness: Implementing debiasing strategies and calibration methods during the training process can address fairness concerns and improve the reasoning capabilities of language models.

Misuse, Bias, & Fairness
• Exploitation of LLM Agents: Individuals with malicious intentions can exploit LLM-based agents to sway public perception, disseminate misinformation, and conduct unlawful activities.
• Dangers to Security and Society: The potential for abuse of generative AI agents presents considerable dangers to both security and social stability, which could lead to orchestrated terrorist activities and cyber threats.
• Regulatory Measures for Safe Use: To reduce these risks and promote responsible usage, it is crucial to implement strict regulatory frameworks and improve security protocols in the development and training of these agents.
Retrospective & Prospective:
Challenges and Opportunities for Generative AI Agents
Focus on Efficiency in Both Early- and Late-Stage Value Chains
Initial applications of generative AI agents are likely to concentrate on enhancing efficiency in use cases across both the early- and late-stage value chains of various sectors, streamlining processes, and improving productivity.

Start with Agentic Process Automation (APA)
For various business processes, a structural approach such as APA can be implemented to effectively leverage advanced agent capabilities, accommodating diverse task complexities and thereby enhancing and augmenting existing robotic process automation (RPA) workflows.
Milos Rusic
CEO & Co-Founder
deepset
References
[1] T. Sumers, S. Yao, K. Narasimhan, and T. L. Griffiths, “Cognitive Architectures for Language Agents,” Sep. 05, 2023, arXiv:
arXiv:2309.02427. doi: 10.48550/arXiv.2309.02427.
[2] S. Franklin and A. Graesser, “Is It an agent, or just a program?: A taxonomy for autonomous agents,” in Intelligent Agents III: Agent Theories, Architectures, and Languages, J. P. Müller, M. J. Wooldridge, and N. R. Jennings, Eds., Lecture Notes in Computer Science, vol. 1193. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997, pp. 21–35. doi: 10.1007/BFb0013570.
[3] L. Yee, M. Chui, and R. Roberts, “Why AI agents are the next frontier of generative AI | McKinsey,” McKinsey Quarterly.
Accessed: Dec. 11, 2024. [Online]. Available: https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/why-
agents-are-the-next-frontier-of-generative-ai
[4] A. Gutowska, “What Are AI Agents? | IBM,” IBM Think. Accessed: Dec. 11, 2024. [Online]. Available: https://www.ibm.com/
think/topics/ai-agents
[5] Z. Xi et al., “The Rise and Potential of Large Language Model Based Agents: A Survey,” Sep. 19, 2023, arXiv: arXiv:2309.07864.
doi: 10.48550/arXiv.2309.07864.
[6] J. Cook, “OpenAI’s 5 Levels Of ‘Super AI’ (AGI To Outperform Human Capability),” Forbes. Accessed: Dec. 11, 2024. [Online].
Available: https://www.forbes.com/sites/jodiecook/2024/07/16/openais-5-levels-of-super-ai-agi-to-outperform-human-
capability/
[7] T. Guo et al., “Large Language Model based Multi-Agents: A Survey of Progress and Challenges,” Apr. 19, 2024, arXiv:
arXiv:2402.01680. Accessed: Nov. 18, 2024. [Online]. Available: http://arxiv.org/abs/2402.01680
[8] X. Li, S. Wang, S. Zeng, Y. Wu, and Y. Yang, “A survey on LLM-based multi-agent systems: workflow, infrastructure,
and challenges,” ResearchGate. Accessed: Nov. 18, 2024. [Online]. Available: https://www.researchgate.net/
publication/384732283_A_survey_on_LLM-based_multi-agent_systems_workflow_infrastructure_and_challenges
[9] J. Liu et al., “Large Language Model-Based Agents for Software Engineering: A Survey,” Sep. 04, 2024, arXiv:
arXiv:2409.02977. Accessed: Nov. 18, 2024. [Online]. Available: http://arxiv.org/abs/2409.02977
[10] Y. Wang et al., “Agents in Software Engineering: Survey, Landscape, and Vision,” Sep. 23, 2024, arXiv: arXiv:2409.09030.
Accessed: Nov. 08, 2024. [Online]. Available: http://arxiv.org/abs/2409.09030
[11] Y. Ye et al., “ProAgent: From Robotic Process Automation to Agentic Process Automation,” Nov. 23, 2023, arXiv:
arXiv:2311.10751. doi: 10.48550/arXiv.2311.10751.
[12] L. Dong, Q. Lu, and L. Zhu, “AgentOps: Enabling Observability of LLM Agents,” Nov. 30, 2024, arXiv: arXiv:2411.05285. doi:
10.48550/arXiv.2411.05285.
[13] M. Zhuge et al., “Agent-as-a-Judge: Evaluate Agents with Agents,” Oct. 16, 2024, arXiv: arXiv:2410.10934. Accessed: Oct. 21,
2024. [Online]. Available: http://arxiv.org/abs/2410.10934
[14] H. Jin, L. Huang, H. Cai, J. Yan, B. Li, and H. Chen, “From LLMs to LLM-based Agents for Software Engineering: A Survey of
Current, Challenges and Future,” arXiv.org. Accessed: Aug. 30, 2024. [Online]. Available: https://arxiv.org/abs/2408.02479v1
Lukas Wogirz
CEO & Co-Founder
databAIse
Firms that employ large language models (LLMs) can create significant value and achieve sustainable competitive advantage. However, the decision of whether to make-or-buy LLMs is a complex one and should be informed by consideration of strategic value, customization, intellectual property, security, costs, talent, legal expertise, data, and trustworthiness. It is also necessary to thoroughly evaluate available open-source and closed-source LLM options, and to understand the advantages and disadvantages of fine-tuning existing models versus pre-training models from scratch.

Our latest whitepaper on Retrieval-Augmented Generation (RAG) offers insights into the advancements and challenges of RAG within the industry. It provides an analysis of industry demands, current methodologies, and the obstacles in developing and evaluating RAG. Additionally, our whitepaper aims to facilitate strategy development and knowledge exchange about practical use cases across various industrial sectors. The whitepaper is the result of extensive studies and discussions conducted with our internal teams and industry partners. It highlights RAG as a cost-effective technique that has significantly improved the trustworthiness and control of Large Language Model (LLM) applications over the past year.

Our RAG use case study on Retrieval-Augmented Generation within the test and measurement industry highlights common challenges in the technical domain and explores effective RAG evaluation techniques. We demonstrate how Large Language Models (LLMs) can be leveraged to scale up RAG evaluation reliably, and address industry-specific challenges such as multilingual data, in-domain data, and complex tabular structures. Our vision pipeline and retrieval fine-tuning solutions have significantly improved the accuracy of RAG, proving the value of customized RAG applications for the wireless test and measurement sector.
Authors
Paul Yu-Chun Chang works as a Senior AI Expert specializing in Large Language Models at appliedAI Initiative GmbH. He has over 10 years of interdisciplinary research experience in computational linguistics, cognitive neuroscience, and AI, and more than 6 years of industrial experience in developing AI algorithms in language modeling and image analytics. Paul holds a PhD from LMU Munich, where he integrated NLP and machine learning methods to study brain language cognition.

Mingyang Ma works as Principal AI Strategist & Product Manager at appliedAI Initiative GmbH, supporting all partner companies’ decision making and technical solution identification for various AI use cases, with a particular focus on leveraging LLMs. With over 6 years of expertise in NLP, Mingyang has excelled in the realm of Conversational AI, demonstrating her proficiency in application DevOps and platform development across various processes during her tenure at BMW Group in both Germany and the USA.
Bernhard Pflugfelder
Head of Generative AI,
appliedAI Initiative GmbH
b.pflugfelder@appliedai.de
Antoine Leboyer is an entrepreneur and the Managing Director of SW/AI at TUM Venture Lab and a Board Member at Hyperganic. Formerly, he served as President and CEO of GSX and held board positions at Geneva Liberal Synagogue and Martello Technologies. He holds an MBA from Harvard, class of '92.

Lukas Wogirz is the CEO and Co-Founder of databAIse, an AI-powered platform transforming unstructured text data into actionable insights. With a Master’s degree in Electrical Engineering and Information Technology from TUM, Lukas specializes in AI/ML, automation and deep learning. He has previously worked on advanced technologies at MOV.AI, where he developed patented algorithms for industrial automation.
Milos Rusic is the co-founder and CEO of deepset, the company behind Haystack and deepset Cloud, leading solutions for rapid custom LLM and NLP application development. Trusted by NVIDIA, Intel, Airbus, and The Economist, deepset’s tools empower enterprises to build and deploy AI solutions tailored to their unique needs and mission-critical use cases. Learn more at deepset.ai.

Christian Karaschewitz is a product manager and innovation leader with over a decade of experience at SAP. Currently, he serves as AI Product Incubation Lead, driving cutting-edge solutions in Business AI. Previously, he led product initiatives for SAP start-ups, serving as Head of Product for Ruum by SAP and as Co-Founder and Head of Product of FlexPay by SAP. Beyond his work at SAP, Christian has mentored startups as a Venture Mentor at SAP.iO. He holds a Ph.D. from the University of St. Gallen and a Master’s from the University of the Arts Berlin. With a passion for innovation, he excels at delivering impactful technologies and business solutions.
Emre Demirci is a dedicated Data Engineering Master's student at the Technical University of Munich (TUM). As a working student at appliedAI, Emre focuses on developing cutting-edge solutions involving large language models (LLMs) and knowledge graphs. Passionate about leveraging technology for impactful solutions, Emre’s work bridges the gap between AI innovation and real-world applications.

Joong-Won Seo is a Master’s student in Computer Science at the Technical University of Munich, specializing in deep learning, generative AI, and full-stack engineering. With extensive experience as a teaching assistant at TUM, he combines a strong research foundation with hands-on engineering skills. He applies LLMs and generative AI to develop prototypes that translate theoretical ideas into practical solutions for real-world challenges.

Ferdy Dermawan Hadiwijaya is a master's student in Computer Science at the Technical University of Munich (TUM), specializing in Natural Language Processing and Generative Models. As a working student at appliedAI, he serves as a Junior GenAI Engineer, bringing three years of professional experience in Large Language Models and software development to bridge cutting-edge academic research with practical industry applications.
About appliedAI Initiative GmbH
Generative AI Agents in Action:
Revolutionizing Software
Development Testing
August-Everding-Straße 25
81671 München
Germany
www.appliedai.de