From Requirements to Architecture: An AI-Based Journey to Semi-Automatically Generate Software Architectures
ABSTRACT
Software architectures play a vital role in fulfilling the system's quality of service. Due to time pressure, architects often model only one architecture based on their known limited domain understanding, patterns, and experience instead of thoroughly analyzing the domain, evaluating multiple candidates, and selecting the best-fitting one. Existing approaches try to generate domain models based on requirements, but still require time-consuming manual effort to achieve good results. Therefore, in this vision paper, we propose a method to generate software architecture candidates semi-automatically based on requirements using artificial intelligence techniques. We further envision an automatic evaluation and trade-off analysis of the generated architecture candidates using, e.g., the architecture trade-off analysis method combined with large language models and quantitative analyses. To evaluate this approach, we aim to analyze the quality of the generated architecture models and the efficiency and effectiveness of our proposed process by conducting qualitative studies.

CCS CONCEPTS
• Software and its engineering → Designing software; Software architectures; System description languages; • Computing methodologies → Artificial intelligence.

KEYWORDS
Requirements, Software Architecture, Architecture Evaluation, LLM

1 INTRODUCTION
Software architecture plays a vital role in the quality of every software system. However, deriving a software architecture model fitting the functional and non-functional requirements is challenging, especially for complex and unknown domains, as it requires domain knowledge and understanding. Especially in modern methods like Domain-Driven Design [6], building an elaborate domain model constitutes the basis for deriving the architecture. In addition, architecture creation is very time-consuming and often based on incomplete or contradictory requirements documents. New software architects especially lack experience and technical knowledge, but more experienced ones may focus more on concepts and technologies they are used to.

Hence, architects often design one architecture instead of multiple architecture candidates, which are evaluated against the requirements and other factors. Thus, mistakes during the design phase of the architecture can show up later, imposing high maintenance costs. Therefore, we require means to support software architects in this task.

In related work, Kof [8] tries to generate domain models as a basis for the architecture from requirements. However, his process still requires a lot of time-consuming manual work to generate a fitting domain model. Due to the rapidly growing capabilities of modern artificial intelligence (AI) techniques, e.g., natural language processing (NLP) techniques such as large language models, we believe that we can reduce the manual effort substantially by applying these techniques in a semi-automated way.

We propose a method, i.e., a process and tool, to generate architectures semi-automatically based on requirements using modern AI techniques. In our method, we first generate domain models and use-case scenarios using large language models. Then, we derive multiple software architecture candidates with their architectural decisions and evaluate them to make informed decisions about which candidate to take. We include manual iterations to improve the overall outcome. We believe that this process can potentially improve both the quality of the architecture and the creation time in most projects.

It may seem counter-intuitive to restrict the AI to the standard procedure. However, the standard procedure has the advantage that it can easily be applied semi-automatically, and an experienced architect can nudge the LLM in the right direction. Our exploratory analysis in Section 4 suggests that a fully automated solution might not perform well with current state-of-the-art LLMs. Furthermore, no data is available to train LLMs for this task, so we cannot easily improve upon the standard LLMs. Our approach has the advantage of relying on general-purpose LLMs without special training.

To evaluate our approach, we aim to answer the following research questions:

RQ1: “Can state-of-the-art NLP technology generate reproducible, correct, and elaborate domain models and use case scenarios based on requirements in natural language?”

RQ2: “Can state-of-the-art AI technology generate software architectures based on a domain model, use case scenarios, and requirements that can appropriately fulfill these requirements?”

RQ3: “Can quantitative and qualitative software architecture evaluations and trade-off analyses be automated through the use of AI?”
Eisenreich et al.
RQ4: “Does a method for semi-automatic architecture generation improve the architecture's quality while reducing the time required?”

The remainder of this paper is structured as follows: First, we outline the related work in Section 2. In Section 3, we describe our proposed approach. In Section 4, we describe an exploratory analysis we have conducted. Finally, we explain our planned evaluation in Section 5 and conclude in Section 6.

2 RELATED WORK
Souza et al. [14] reviewed the state of the art of architecture derivation from requirement specifications. They found 39 relevant studies, some of which included methods with tooling support, but all of them relied heavily on the experience and knowledge of the architect. Kof [8] took an approach towards more automated support. He developed a semi-automatic method to create domain models from requirements documents. Despite the semi-automatic approach, his method took extensive manual effort, especially for preparing the requirements documents.

Researchers have touched on automated architecture evaluation before: Bashroush et al. [2] manage to apply ATAM [7] with partial automation, which accelerates the evaluation process. They use a formal Architecture Description Language for this task. Krüger et al. [11] also try to speed up the architecture evaluation, but go down a different route: They developed a method to transition scenarios into AspectJ prototypes. With these prototypes, multiple architectures can be manually evaluated and compared during runtime. Scheerer et al. [13] use PerOpteryx, a Palladio [3] plugin, to run automated quantitative analyses on architectures. They evaluate architecture design decisions by analyzing their performance impact for predefined scenarios. Numerous works evaluate architectures based on metrics; Coulin et al. [4] provide an overview of the state of the art. Mo et al. [12] successfully find architectural flaws in industrial projects with fully automated architecture metrics. Other works try to optimize architectures: Aleti et al. [1] conducted a systematic literature review of 188 papers that evaluate architecture optimization methods. To the best of our knowledge, there is no automatic and comprehensive architecture evaluation that takes the requirements fully into account.

3 VISION FOR SEMI-AUTOMATIC ARCHITECTURE GENERATION
In the following, we propose our vision for a process and tool framework to generate software architectures semi-automatically based on requirements. Our process is envisioned to be iterative throughout a software development life-cycle. Figure 1 depicts this process and the relevant artifacts. The process consists of six steps: (1) automatically generating a domain model and use-case scenarios based on textual requirements, (2) manually refining the domain model and scenarios, (3) automatically deriving multiple software architecture candidates and the architectural decisions leading to them using the domain model, scenarios, and non-functional requirements, (4) automatically evaluating and comparing the candidates, (5) manually refining the candidates, and (6) manually selecting the best-fitting candidate. In the following, we will describe each step in more detail.

Figure 1: Overview of the proposed process: From the requirements (and, in later iterations, the current architecture with its design decisions), a domain model and scenarios are generated and semi-automatically refined; from these, architecture candidates with their design decisions are generated and semi-automatically refined; the candidates are evaluated and compared, and one is selected as the final architecture.

3.1 Generating Domain Model and Use-Case Scenarios
In the first step, we want to generate a domain model and use-case scenarios out of functional and non-functional requirements. While we only consider textual requirements in the first iteration, in later ones, we also include the current architecture and the architecture decisions leading to it. This will allow us to apply this process to incremental architecture development.

Using a large language model (LLM), the software architect processes the requirements to create an initial domain model and use-case scenarios. We deem LLMs suitable for such a task as they provide decent knowledge in plenty of domains and, thus, are universally usable in contrast to more domain-specific AI models. A first exploratory analysis with LLaMA showed promising results (cf. Section 4). To be able to fine-tune the LLM if required, we plan to use an open-source LLM for this task. Currently, we are considering LLaMA, Falcon, and Yi, but we will evaluate the state-of-the-art models for their suitability as part of our research, potentially including other models that have not been published yet. Furthermore, we will compare these LLMs' results with those obtained from the latest GPT version.

The generated domain model will omit details that are irrelevant to the architecture generation, e.g., an entity's attributes. The domain model's primary focus lies in the relations of the domain concepts to each other. With these relations, we can cut the domain into bounded contexts. We can then cut the architecture into its components along these bounded contexts. The use-case scenarios will complement the domain model with a description of the system's functional behavior.

After generating the domain model and scenarios, the architect can inspect and refine them using additional prompts. We assume that the domain model and the scenarios can be refined using the LLM, so the architect does not need to do any work directly on the models but instead instructs the tooling.

3.2 Generating Architecture Candidates and Architecture Decision Records
In the third step, we want to generate multiple architecture candidates based on the requirements, the domain model, and the use-case scenarios. These architecture candidates model the different design decisions that are viable to take when considering the requirements, the domain model, and the use-case scenarios.

We will consider various models for the architecture: First, there is the classic “4+1 Model” by Kruchten [10], where an architecture consists of multiple views: a logical view, a development view, a process view, and a physical view. With this model, we have full freedom to choose representations for these views, like PlantUML or Mermaid diagrams. Another promising option is the Palladio Component Model [3], which would allow us to run quantitative architecture evaluations. This model specifies the exact format for its different views. To some degree, we can also convert an architecture from one model into another. The exact format used will be the subject of our research.
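To make the representation question more concrete, such a view can be kept as neutral in-memory data and only rendered into a concrete textual format on demand. The following Python sketch is purely illustrative (the class and the component names are our own assumptions, not part of the proposed tooling); it renders a minimal logical view as a PlantUML component diagram:

```python
from dataclasses import dataclass, field


@dataclass
class LogicalView:
    """One view of an architecture candidate: components and the
    connectors (dependencies) between them."""
    components: list[str] = field(default_factory=list)
    connectors: list[tuple[str, str]] = field(default_factory=list)

    def to_plantuml(self) -> str:
        # Render the view as a PlantUML component diagram; spaces in
        # component names are mapped to underscores for the aliases.
        def alias(name: str) -> str:
            return name.replace(" ", "_")

        lines = ["@startuml"]
        lines += [f'component "{c}" as {alias(c)}' for c in self.components]
        lines += [f"{alias(a)} --> {alias(b)}" for a, b in self.connectors]
        lines.append("@enduml")
        return "\n".join(lines)


# Invented example candidate for an online-shop-like domain.
candidate = LogicalView(
    components=["Order Management", "Inventory", "Payment"],
    connectors=[("Order Management", "Inventory"), ("Order Management", "Payment")],
)
print(candidate.to_plantuml())
```

Keeping the views format-agnostic like this would also simplify the conversion between models mentioned above, since only the renderer has to change.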
To generate the architectures, we will employ multiple techniques and technologies: For tasks that require a deep understanding of the requirements, like cutting the architecture into components, we will employ LLMs. Where feasible, we will also consider other modern AI technologies. Some tasks are very technical and based on well-structured data, like the process view, which explains the execution path the use cases take through the components. For such tasks, we will instead apply classic algorithms, which do not have the engineering difficulties that modern AI technology has, like shaky and unpredictable results.

Furthermore, we want to extract the architectural design decisions taken during the generation and model them as architecture design decision records (ADRs) [9, 17]. They will give a quick overview of the essence of each architecture candidate. They will also give hints on potential refinements that can be made to each candidate. Moreover, they will document the architect's decisions during this process instead of implicit or unconscious decisions taken by the tooling.

3.3 Architecture Candidate Evaluation
After the generation, we want to assist the architect in selecting the best-fitting architecture candidate for the system they design. We aim to evaluate the architecture candidates and find their pros and cons. To achieve this, we want to automate established architecture evaluation techniques like ATAM [7]. In addition, we will calculate quality metrics on the generated architecture candidates. By conducting a quantitative analysis, we aim to find potential performance hotspots. The selection of the architecture evaluation methods will be part of our research: We will consider multiple evaluation techniques and assess the ability to conduct them automatically. Furthermore, we will compare the aspects of the architecture they evaluate and select the techniques such that we cover a wide range of aspects and thus provide a comprehensive architecture evaluation overall. Whether an accurate and comprehensive automatic evaluation is possible will be an outcome of our research. If this proves to be difficult, we will fall back to a semi-automated evaluation.

3.4 Architecture Candidate Selection
Finally, we present the generated architecture candidates, design decisions, and evaluations to the architect. With this information given, the architect can understand the advantages and disadvantages of each architecture candidate. The architect can factor in potential implicit requirements that are not documented. Such requirements might exist for various reasons: Maybe no one found it necessary to document them because all the people involved were aware of them. Another possibility could be that they are not strictly considered requirements, but the architect already suspects they might become such.

At this stage, the architect might find that none of the generated architecture candidates quite fits their needs. In this case, the architect can refine the architecture candidates similarly to how they refined the scenarios and the domain model: Using prompts, the architect can tell the tooling which aspects of the architecture candidate need to be changed. They might even ask for the generation of further architecture candidates that they can assess and refine as well. With all these choices available and with the evaluation assistance, the architect can select one of these candidates as the final architecture.

3.5 Further Iterations
After the architecture is selected, the development team can continue implementing it. We assume that, in practice, at some point, the requirements will change. This might either be planned, e.g., when the next iteration of an agile process begins, or unplanned, e.g., due to a misconception in the requirements. In this case, we want to support the iteration of the proposed process: This means that, as an additional requirement in the iteration, the generated architecture candidates should differ as little as possible from the current software architecture. Each change in the architecture might imply expensive changes in the source code, and we want to reduce these as much as possible. But sometimes, architectural changes are necessary to keep up with the project's quality requirements. This means that we cannot restrict the process to retaining the current architecture at all costs. Instead, the tooling needs to weigh the quality implications against the expenses of an architectural change. These decisions will – to some degree – be presented to the architects, with these decisions factored into the presented data. This will also enable this process for agile environments, which are very common in industry.

4 EXPLORATORY ANALYSIS
We conducted an exploratory analysis with chat versions of LLaMA2 70-B and GPT-3.5 to generate domain models from textual requirements. For this, we supplied 91 requirements from the MobSTr dataset [16] and asked the models to generate a PlantUML domain model. The exact procedure can be found in our GitHub repository.¹ Both models could identify concepts from the requirements. It appears that the prompts were misunderstood, though: instead of modeling the domain, the LLMs modeled the system itself.

¹ https://github.com/qw3ry/requirements2architecture/tree/exploration
Still, both LLMs showed an understanding of the requirements. While both models could generate valid PlantUML, LLaMA did not create relations between the concepts, even though it claimed it did. GPT handled this task well, though. In a small iteration, we could improve the results of both models substantially with basic prompt engineering.

With these tests, using general, unmodified LLMs, we can assume that LLMs are generally capable of understanding application domains. We must invest some work into adjusting these models, prompt engineering, and translating the LLM output into the chosen format.

5 PLANNED EVALUATION
We plan to evaluate the process as a whole with two studies, and we aim to evaluate both the technical quality of the outcome and the impact on the development process. In addition, we plan to evaluate the intermediate steps of the process thoroughly, especially regarding the technical quality of the intermediate artifacts. We hope to apply these intermediate studies' findings to improve the final evaluation process.

5.1 Evaluation Based on Reference Architectures
We want to take well-established reference architectures like the T2-Project [15], TeaStore [18], and SockShop, re-engineer their requirements, and apply our process. We will then manually apply architecture evaluation techniques to compare the reference architecture to the generated architecture candidates. Because we are not constrained to automatic evaluation techniques, we can evaluate the architectures more in-depth compared to the automatic evaluation described in Section 3.3. In addition, we will examine the generated architecture candidates and compare them to the reference architecture to detect potential flaws that the more formal architecture evaluation techniques missed. Furthermore, we will ask experts to create architectures for the evaluated requirements and then rate the generated architecture candidates for their fit to the requirements. With the architectures created by the experts, we have a baseline of what an architect in a typical project is roughly capable of.

The used LLM might have been trained on the reference architectures previously. Thus, we will carefully check whether the LLMs only reproduce the reference architectures or incorporate new ideas. If we find that this limitation applies, we will instead apply the described evaluation with a new set of requirements.

5.2 Industrial Field Study
We want to conduct an industrial field study to evaluate our process in a real-world environment. For this evaluation, we will use the requirements of a project and apply our process. We will then interview the project members on the quality of the generated architecture and the quality of the actual architecture of the project.

Secondly, we will deploy our process for some time in the project's ongoing development. Afterward, we will interview the project members again, this time focusing on the challenges of the process implementation and the advantages and disadvantages of the process. This evaluation will be very open, and we aim to find challenges in usability, practicality, and the overall hurdles of implementing this process. We want to evaluate the effectiveness and efficiency of the process.

Lastly, we want to apply the Technology Acceptance Model (TAM) [5] to evaluate the attitude towards and the intention to use this process.

6 CONCLUSION
We propose research for a new process that aids architects in their daily work. This process spans the whole creation of a new architecture and can support the adaptation of an existing architecture to new requirements. We believe this process can be the basis for a considerable step forward in the quality of industrial software architectures.

REFERENCES
[1] Aldeida Aleti, Barbora Buhnova, Lars Grunske, Anne Koziolek, and Indika Meedeniya. 2012. Software architecture optimization methods: A systematic literature review. IEEE Transactions on Software Engineering 39, 5 (2012), 658–683.
[2] Rabih Bashroush, Ivor Spence, Peter Kilpatrick, and John Brown. 2004. Towards an automated evaluation process for software architectures. (2004).
[3] Steffen Becker, Heiko Koziolek, and Ralf Reussner. 2009. The Palladio component model for model-driven performance prediction. Journal of Systems and Software 82, 1 (2009), 3–22.
[4] Théo Coulin, Maxence Detante, William Mouchère, and Fabio Petrillo. 2019. Software Architecture Metrics: a literature review. arXiv preprint arXiv:1901.09050 (2019).
[5] Fred D. Davis. 1985. A technology acceptance model for empirically testing new end-user information systems: Theory and results. Ph.D. Dissertation. Massachusetts Institute of Technology.
[6] Eric Evans. 2004. Domain-driven design: tackling complexity in the heart of software. Addison-Wesley Professional.
[7] Rick Kazman, Mark Klein, and Paul Clements. 2000. ATAM: Method for architecture evaluation. Carnegie Mellon University, Software Engineering Institute, Pittsburgh, PA.
[8] Leonid Kof. 2005. Text Analysis for Requirements Engineering. Ph.D. Dissertation. Technische Universität München.
[9] Oliver Kopp, Anita Armbruster, and Olaf Zimmermann. 2018. Markdown Architectural Decision Records: Format and Tool Support. In ZEUS. 55–62.
[10] P. B. Kruchten. 1995. The 4+1 View Model of architecture. IEEE Software 12, 6 (1995), 42–50. https://doi.org/10.1109/52.469759
[11] Ingolf H. Krüger, Gunny Lee, and Michael Meisinger. 2006. Automating software architecture exploration with M2Aspects. In Proceedings of the 2006 International Workshop on Scenarios and State Machines: Models, Algorithms, and Tools. 51–58.
[12] Ran Mo, Will Snipes, Yuanfang Cai, Srini Ramaswamy, Rick Kazman, and Martin Naedele. 2018. Experiences applying automated architecture analysis tool suites. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 779–789.
[13] Max Scheerer, Axel Busch, and Anne Koziolek. 2017. Automatic evaluation of complex design decisions in component-based software architectures. In Proceedings of the 15th ACM-IEEE International Conference on Formal Methods and Models for System Design. 67–76.
[14] Eric Souza, Ana Moreira, and Miguel Goulão. 2019. Deriving architectural models from requirements specifications: A systematic mapping study. Information and Software Technology 109 (2019), 26–39.
[15] Sandro Speth, Sarah Stieß, and Steffen Becker. 2022. A Saga Pattern Microservice Reference Architecture for an Elastic SLO Violation Analysis. In 2022 IEEE 19th International Conference on Software Architecture Companion (ICSA-C). 116–119. https://doi.org/10.1109/ICSA-C54293.2022.00029
[16] Jan-Philipp Steghöfer, Björn Koopmann, Jan Steffen Becker, Ingo Stierand, Marc Zeller, Maria Bonner, David Schmelter, and Salome Maro. 2021. The MobSTr dataset: Model-Based Safety Assurance and Traceability. https://doi.org/10.5281/zenodo.4981481
[17] U. van Heesch, P. Avgeriou, and R. Hilliard. 2012. A documentation framework for architecture decisions. Journal of Systems and Software 85, 4 (2012), 795–820. https://doi.org/10.1016/j.jss.2011.10.017
[18] Jóakim von Kistowski, Simon Eismann, Norbert Schmitt, André Bauer, Johannes Grohmann, and Samuel Kounev. 2018. TeaStore: A Micro-Service Reference Application for Benchmarking, Modeling and Resource Management Research. In Proceedings of the 26th IEEE International Symposium on the Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS ’18).