FOCUS: GUEST EDITORS' INTRODUCTION

Next-Generation Software Testing: AI-Powered Test Automation

Filippo Ricca, Università di Genova
Boni García, Universidad Carlos III de Madrid
Michel Nass, Blekinge Institute of Technology
Mark Harman, University College London

Digital Object Identifier 10.1109/MS.2025.3559194
Date of current version: 10 June 2025
SOFTWARE TESTING HAS long been a cornerstone of software development, ensuring the delivery of high-quality, reliable, and secure systems. However, as software systems grow in complexity and scale, traditional testing methods might fail to address their needs. Manual testing is often time-consuming and error-prone. Automated testing, while beneficial for efficiency, repeatability, and excellent coverage, presents challenges such as high initial costs and maintenance overhead. The promise of AI-powered test automation lies in its ability to address some of the most pressing challenges in software testing.
A Brief History of AI

AI was first formalized in the 1950s with Alan Turing's "Computing Machinery and Intelligence," which introduced the "Turing Test." Early research focused on symbolic AI, with rule-based systems implemented in languages like LISP. However, limited computational power led to the first "AI winter" in the 1970s.

The 1980s saw expert systems emerge, followed by a shift to machine learning (ML) in the 1990s. Neural networks and statistical methods gained traction, but hardware and data limitations caused a second "AI winter" in the late 1990s.

In the 21st century, AI progress has accelerated with improved computational power, large datasets, and breakthroughs in deep learning, reinforcement learning, and transformers. These advances led to generative AI (GenAI), enabling systems to create text, images, audio, and code.1 Notable models include ChatGPT, DALL-E, GitHub Copilot, and Google Gemini. GenAI relies on large language models (LLMs) such as BERT, GPT, PaLM, and LLaMA, powering diverse natural language processing (NLP) applications.
Why Software Testing Is More Important Than Ever

Software testing consists of the dynamic evaluation of a piece of software, called a system under test (SUT), through a finite set of test cases (or simply tests). Testing implies the execution of a SUT using specific input values to assess the outcome or expected behavior. We distinguish two broad categories of software testing: manual and automated. On the one hand, in manual testing, a person (such as a software tester or the final user) evaluates the SUT. On the other hand, test automation uses specific software tools and frameworks to execute against the SUT the test scripts (executable test cases written in a programming language) produced by software testers.
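To make the distinction concrete, here is a minimal test script sketch in Java with Selenium WebDriver and JUnit 5; the SUT URL and the expected title are illustrative placeholder values:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

class HomePageTest {
    @Test
    void homePageHasExpectedTitle() {
        WebDriver driver = new ChromeDriver(); // the framework drives a real browser
        try {
            driver.get("https://example.com/"); // specific input: navigate the SUT
            assertEquals("Example Domain", driver.getTitle()); // assess the outcome
        } finally {
            driver.quit(); // always release the browser
        }
    }
}

A test runner executes such scripts against the SUT and reports whether the observed outcome matches the expected one.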
With the increasing deployment of AI-generated code, software will be created faster and in hybrid construction modes, with humans and machines contributing together. Although automatic code generation has been deployed for many decades, it has previously been based largely on deterministic, rule-based approaches. Such rule-based approaches have traditionally been underpinned by strong correct-by-construction theories, based on decades of compiler theory. In contrast, the emerging LLM-based development paradigm2 is one in which LLM-generated code comes with few such guarantees. There have been attempts to impose frameworks for providing assurances for LLM-based code generation.3 However, these frameworks cannot offer the same guarantees we are accustomed to in automated code generation. Language models are inherently vulnerable to hallucinations, so we must develop software engineering processes and tools to detect and address bugs in the generated code. In this emerging paradigm, code will undoubtedly be generated far faster overall. The impact of buggy code is not diminished, however, whether it originates from human or machine-based mistakes. The greater development speed requires testing techniques that can perform with equal alacrity, which is precisely the problem addressed by automated software testing. The advent of LLM-based software engineering therefore makes software testing all the more important, if no less challenging.
AI in Software Testing

There are two primary ways (not the only ones) that AI is being used in software testing:

1. General-purpose AI for testing: This approach involves using general AI assistants that are not built specifically for testing but can help testers and developers improve their processes and workflows. These assistants can be categorized into two types: code-specific AI assistants (e.g., GitHub Copilot, CodeWhisperer), which suggest code and automate writing tests (e.g., Copilot can generate JUnit tests for a function based on comments or code context; see the sketch after this list), and general-purpose GenAI assistants (e.g., ChatGPT, Gemini, Claude), which assist with test planning, exploratory testing, bug report analysis, and generating automated test cases across various testing frameworks (e.g., you can ask ChatGPT to suggest test cases for a given user story or generate Selenium scripts).

2. AI-powered testing tools: These tools integrate AI algorithms designed to address common software testing challenges, enhancing traditional processes with models tailored for tasks like defect prediction, regression testing, and exploratory testing assistance. Examples of these tools are provided later in the appropriate section.
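To make the first category concrete, consider a small Java method; a code-specific assistant such as Copilot might propose JUnit tests along the following lines (illustrative output; actual suggestions vary with context):

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class MaxTest {
    // Method under test: returns the greater of two integers.
    static int max(int a, int b) { return a > b ? a : b; }

    @Test
    void returnsGreaterValue() { assertEquals(5, max(3, 5)); }

    @Test
    void orderDoesNotMatter() { assertEquals(5, max(5, 3)); }

    @Test
    void handlesEqualValues() { assertEquals(4, max(4, 4)); }
}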
Here’s an analogy to illustrate the difference:

1. General AI for testing: Imagine using a general-purpose language model like a chatbot to get advice on how to clean your house. It can provide helpful tips and suggestions, but it is not a tool specifically designed for cleaning itself.

2. AI-powered testing tool: Imagine a specialized robot vacuum cleaner designed specifically for cleaning carpets. It has built-in sensors and algorithms to navigate and clean effectively.
Advantages and Challenges of AI in Software Testing

The application of AI in software testing offers significant benefits, transforming traditional testing processes and improving efficiency, accuracy, and coverage. Some of the most relevant software testing facets supported by AI are the following4,5:
• Test case generation: Why? Traditional test case creation is manual, time-consuming, and often incomplete, leading to gaps in test coverage. AI can help by analyzing software requirements, code, and user behavior to generate test cases and test scripts. How far have we come? AI models can now generate test cases covering edge cases and critical paths. Tools like EvoSuite and Diffblue Cover leverage AI algorithms to automate this process. What does AI bring? AI-driven test generation improves efficiency, reduces human effort, and ensures better test coverage, especially for complex software systems.
• Test case prioritization: Why? Running all test cases is impractical, especially with large test suites. Prioritization ensures that the most critical tests run first, detecting high-risk defects early. How far have we come? AI-based tools (e.g., SBTTool and Testomat.io) use historical data, risk analysis, code changes, and other data to dynamically prioritize test cases. ML models can now predict which test cases are most likely to fail. What does AI bring? AI-driven prioritization reduces testing time while maintaining high defect detection rates, making continuous integration/continuous delivery (CI/CD) pipelines more efficient.
• Exploratory testing assistance: Why? Traditional exploratory testing relies on tester intuition and experience, making it difficult to scale and automate. AI can guide testers by identifying untested areas and generating exploratory scenarios. How far have we come? AI-driven tools (e.g., TestRigor) now suggest areas needing exploration, analyze application behavior, and propose test ideas dynamically. What does AI bring? AI enhances exploratory testing by making it simpler and data-driven, reducing the risk of missing critical defects.
• Test oracle generation: Why? Determining whether software behavior is correct (i.e., defining test oracles) is a major challenge, particularly for complex systems. AI can help by predicting expected outputs. How far have we come? AI-based test tools (e.g., TOGA and LLMs) use AI models and predictive analytics to validate test results, especially in scenarios where traditional oracles are unavailable. What does AI bring? AI-driven test oracles reduce human effort in defining expected outcomes, improving the ability to test autonomous and evolving systems.
• Predictive analysis: Why? Predicting defects before they occur helps prevent costly failures and improves software reliability. How far have we come? AI models (for example, those used in Testers.ai and Test.Predictor) now analyze code quality, historical defect data, and development patterns to predict potential problem areas. What does AI bring? Predictive AI improves software quality by enabling proactive defect prevention, reducing debugging time, and enhancing test planning.
• Visual testing: Why? Ensuring UI consistency across multiple devices and screen configurations is challenging. AI-powered computer vision can automate this process. How far have we come? AI-driven tools like Applitools and Microsoft’s Visual AI detect UI inconsistencies that human testers might miss. What does AI bring? AI automates visual verification, improving accuracy, scalability, and efficiency in UI testing.
• Self-healing test automation: Why? Automated tests break when UI or functionality changes, leading to high maintenance costs. AI enables self-healing mechanisms to adapt to these changes. How far have we come? AI-based tools such as TestRigor and Mabl can now automatically update test scripts when application changes are detected. What does AI bring? AI reduces test maintenance effort, making automated testing more resilient and cost-effective (a minimal sketch of the underlying idea appears after this list).
• Test case maintenance: Why? Over time, test suites become bloated with redundant or obsolete test cases, making maintenance a challenge. AI helps optimize test suites. How far have we come? AI tools (e.g., Testim) now detect redundant tests, identify gaps, identify and update tests affected by changes, and recommend optimizations to improve coverage. What does AI bring? AI-driven test maintenance ensures test suites remain effective, reducing execution time and improving overall software quality.
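Returning to self-healing, the core mechanism can be sketched in a few lines of Java with Selenium WebDriver: try the recorded locator first and fall back to alternatives when the UI has changed. Commercial tools use far richer signals (e.g., learned element fingerprints); the fallback strategy here is purely illustrative:

import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

class SelfHealingLocator {
    // Try the primary locator; if the element is gone (e.g., after a UI change),
    // fall back to alternative locators recorded for the same element.
    static WebElement find(WebDriver driver, By primary, List<By> fallbacks) {
        try {
            return driver.findElement(primary);
        } catch (NoSuchElementException e) {
            for (By candidate : fallbacks) {
                try {
                    // A real tool would log the "healed" locator and update the script.
                    return driver.findElement(candidate);
                } catch (NoSuchElementException ignored) {
                    // Try the next candidate.
                }
            }
            throw e; // No candidate matched: surface the original failure.
        }
    }
}

A test would then call, for example, find(driver, By.id("add-to-cart"), List.of(By.cssSelector("button.add-to-cart"), By.xpath("//button[contains(.,'Add to cart')]"))) instead of a bare driver.findElement(...).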
However, the use of AI brings not only benefits but also challenges to overcome, such as the following5,6:

• Accuracy: AI-based testing tools can produce false positives (identifying nonexistent issues) or false negatives (missing actual defects), leading to unreliable test results. In addition, AI may struggle with complex test scenarios that require a deep understanding of context or nuanced human judgment.

• Privacy concerns: Source code may contain sensitive or proprietary information, and sharing this information with AI models can lead to unintentional disclosure or violate confidentiality agreements or company policies.

• Security vulnerabilities: AI tools can be targets for cyberattacks, potentially compromising the testing process. For instance, during AI model training, attackers may use poisoned samples to degrade model performance or launch backdoor attacks to manipulate the outcomes.

• Explainability: AI-driven testing tools often function as “black boxes,” making it hard to understand, for example, why specific tests pass or fail.

• Model bias: AI models used in testing can inherit biases from training data, leading to unfair or inconsistent testing results.

• Over-reliance on AI: Developers and testers may become too dependent on AI, reducing critical thinking and problem-solving skills.
General-Purpose GenAI Assistants

The growing popularity of GenAI assistants has led to significant innovations in the field of software testing. A practical example of their use is the automatic generation of test cases or test scripts. GenAI assistants can analyze software requirements or alternative descriptions (e.g., user stories or a Gherkin specification of the test case) and automatically generate relevant tests, reducing the time and effort needed for manual test creation.
In this motivational example, we demonstrate how a GenAI assistant can generate a test case, step by step, from a simple test objective: “Add the BILLY bookshelf to the cart on the IKEA web application.” The BILLY bookshelf is an iconic bookcase sold by IKEA, known for its simple, functional, and affordable design. Launched in 1979, it has become one of the company’s best-selling products. Thanks to its versatility, it is often used for books, decorative items, and even collections.
The assistant requires four key components: an input [such as the document object model (DOM) or a screen capture], an output (the available actions, such as “click” or “type”), a short-term memory (the actions performed), and a goal (the objective of the test). Because the assistant (powered by an LLM) is unlikely to know how the IKEA web application works in advance, we cannot simply ask for the entire test sequence outright. Instead, we proceed one action at a time.
First, we include the test objective, the DOM (and optionally a screen capture), a list of possible actions, and instructions to choose the next step. Suppose that the assistant searches for “BILLY” using the site’s search bar. We execute this action, perhaps with Selenium WebDriver or Playwright, and record it as the first test step.

Next, we include this executed step in the following prompt to the assistant, ensuring that it does not repeat the same action. We continue in this iterative manner until the assistant confirms that the objective is fulfilled by successfully placing the BILLY bookshelf in the cart. At this stage, the test is complete, and the assembled test steps form our fully automated test case.
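A minimal sketch of this loop in Java follows. The askLlm and execute helpers are hypothetical placeholders: the first would call an LLM API of choice and return the next action as text, and the second would map that action onto Selenium WebDriver calls; the prompt wording and action format are likewise illustrative:

import java.util.ArrayList;
import java.util.List;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

class LlmGuidedTestGeneration {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        driver.get("https://www.ikea.com/"); // starting point in the SUT
        String goal = "Add the BILLY bookshelf to the cart";
        List<String> memory = new ArrayList<>(); // short-term memory: executed steps
        for (int step = 0; step < 20; step++) { // safety bound on test length
            String prompt = "Goal: " + goal + "\n"
                + "Current DOM: " + driver.getPageSource() + "\n"
                + "Actions already performed: " + memory + "\n"
                + "Available actions: click(<locator>), type(<locator>, <text>)\n"
                + "Reply with the single next action, or DONE if the goal is met.";
            String action = askLlm(prompt); // hypothetical call to an LLM API
            if (action.startsWith("DONE")) break; // objective fulfilled
            execute(driver, action); // hypothetical: map the action to WebDriver calls
            memory.add(action); // record the step so it is not repeated
        }
        driver.quit();
        // The accumulated entries in `memory` form the assembled test case.
    }

    static String askLlm(String prompt) { /* call an LLM of choice */ return "DONE"; }

    static void execute(WebDriver driver, String action) { /* parse and run the action */ }
}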
An alternative approach, widely used in practice by testers, is to start with a Gherkin specification of the test case, such as that shown in Figure 1, and then ask a GenAI assistant, such as OpenAI ChatGPT, to generate the corresponding Selenium WebDriver test script.7 Gherkin is a structured language used to define test cases in a human-readable format, following a given-when-then structure commonly used in behavior-driven development.

FIGURE 1. Gherkin specification for adding a BILLY bookshelf to the cart.
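A Gherkin scenario for this objective, in the style of Figure 1, might read as follows (illustrative; the exact wording of the steps in the figure may differ):

Feature: IKEA shopping cart
  Scenario: Add the BILLY bookshelf to the cart
    Given the user is on the IKEA home page
    When the user searches for "BILLY"
    And the user opens the BILLY bookshelf product page
    And the user clicks the "Add to cart" button
    Then the cart contains the BILLY bookshelf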
A possible prompt to submit to ChatGPT could be the following: “Generate a Java Selenium WebDriver test script starting from the following Gherkin specification,” inserting the Gherkin shown in Figure 1. The output of ChatGPT might look something like the code shown in Figure 2, where each Gherkin step has been automatically converted into the corresponding Selenium WebDriver command. At this point, the tester should refine the code by adjusting the locators (the mechanisms used in automation testing frameworks to identify and interact with web elements on a page) that were guessed by ChatGPT, which does not have access to the web page; adding the parts specific to the execution environment where the code will run (such as the browser driver setup); and manually adding waits at critical points in the test script. The software tester can refine the test script manually, or they can reformulate specific prompts and request that ChatGPT modify the test script accordingly.

FIGURE 2. Selenium WebDriver test script for adding a BILLY bookshelf to the cart.
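For reference, a script in the spirit of Figure 2, after the refinements just described, might look like the following Java sketch; all locators are guesses of the kind ChatGPT produces and would need to be checked against the real page:

import java.time.Duration;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

class AddBillyToCartTest {
    @Test
    void addBillyBookshelfToCart() {
        WebDriver driver = new ChromeDriver(); // environment-specific driver setup
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        try {
            // Given the user is on the IKEA home page
            driver.get("https://www.ikea.com/");
            // When the user searches for "BILLY"
            driver.findElement(By.name("q")).sendKeys("BILLY", Keys.ENTER); // guessed locator
            // And the user opens the BILLY bookshelf product page
            wait.until(ExpectedConditions.elementToBeClickable(
                By.partialLinkText("BILLY"))).click(); // guessed locator
            // And the user clicks the "Add to cart" button
            wait.until(ExpectedConditions.elementToBeClickable(
                By.cssSelector("button.add-to-cart"))).click(); // guessed locator
            // Then the cart contains the BILLY bookshelf
            wait.until(ExpectedConditions.textToBePresentInElementLocated(
                By.cssSelector(".cart"), "BILLY")); // guessed locator
        } finally {
            driver.quit();
        }
    }
}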
This second approach is based on prompt chaining, which refers to a series of interconnected prompts used to refine, enhance, or expand information. A possible prompt chain could be built to have ChatGPT fix the locators, one by one, by providing the HTML of the web page in a prompt and attempting to execute the test script, while providing any execution error messages.
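For example, one link in such a chain might be a prompt of this form (illustrative):

"The previous script fails with NoSuchElementException on By.name("q"). Here is the HTML of the page: <html>...</html>. Fix only that locator so that it matches an element that exists in this HTML, and return the updated script."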
Clearly, the automatic generation of test scripts is just one of the possible tasks that can be performed with GenAI assistants. Other possibilities that save time and money include, for example, the refactoring of test suites (for instance, by introducing a design pattern like the page object pattern, or by changing the programming language or testing framework), extending a test suite with new data covering corner cases, or even the automatic introduction of waiting commands to synchronize the execution speed of a test suite with that of the application being tested.
AI-Powered Test Automation Tools

The market for test automation tools is constantly evolving, driven by their crucial role in modern software testing. Today, these tools take advantage of various AI techniques to improve different aspects of software testing. A recent survey8 highlighted a significant increase in the availability of new tools for testers: their number has grown substantially in a short time, with the authors counting more than 100 testing tools on the market. A new phenomenon that is taking hold is the integration of GenAI mechanisms within these tools (see, for example, Testim and Functionize).
Table 1 summarizes some of the most relevant AI-powered test automation tools in the gray literature today.9 Functionize allows developers to describe a test scenario in plain English and automatically converts it into an executable test. If the application’s UI changes, Functionize’s self-healing capability ensures the test adapts without manual intervention. Applitools takes a different approach, focusing on visual validation, comparing screenshots, and flagging visual bugs while intelligently ignoring dynamic elements such as ads or timers. Mabl learns from user interactions to automatically create test scripts when navigating a web application. If the UI evolves, its self-healing feature updates the tests, and its AI-driven performance monitoring can detect anomalies, like a sudden drop in response time, before they impact users. Testim creates tests from natural language. If an element’s location changes, its self-healing locators adjust automatically, and its root cause analysis pinpoints why a test failed. Testers.ai can analyze historical data and code changes to predict where defects could occur, allowing teams to focus their efforts on high-risk areas. Finally, Appvance combines functional and performance testing to run tests that check whether a feature works and evaluate its performance under load.

TABLE 1. AI-powered test automation tools.

Tool        | Website                      | Target           | AI-Driven Features
Functionize | https://www.functionize.com/ | Web, mobile, API | Self-healing tests, test generation, visual testing
Applitools  | https://www.applitools.com/  | Web, mobile      | Visual testing
Mabl        | https://www.mabl.com/        | Web, mobile, API | Self-healing tests, test maintenance, predictive analytics
Testim      | https://www.testim.io/       | Web              | Self-healing tests, test generation, test maintenance
Testers.ai  | https://testers.ai/          | Web              | Exploratory testing, test generation
Appvance    | https://appvance.ai/         | Web, mobile      | Test generation, self-healing tests, performance testing
According to the 2024 State of Testing Report,10 the most popular ways to use AI-powered testing tools are the following. Approximately 25% of respondents utilize these tools to create test cases, while 23% focus on optimizing test cases. Furthermore, 20% leverage AI to plan the testing process. The largest group, 32%, employs AI-driven tools for multiple purposes, including generating test cases, creating test automation scripts, managing test data, and identifying bugs in test code.
Special Issue Articles

The special issue of IEEE Software on AI-Powered Test Automation, which we are managing, attracted 12 submissions covering a wide range of topics. We were supported by more than 25 reviewers, who provided constructive feedback and helped the authors improve their contributions. Each submitted manuscript was reviewed by at least two reviewers, and in most cases by three. Conflicts of interest were strictly managed throughout the review process. Ultimately, only two manuscripts were selected for inclusion in this special issue, reflecting the highly selective nature of the review process.

In the following, we summarize the key contributions of the papers included in this special issue.
The article “From Code Generation to Software Testing: AI Copilot with Context-Based Retrieval-Augmented Generation” by Yuchen Wang et al.[A1] presents Copilot for Testing, an AI-assisted automated testing system that integrates bug detection, fix suggestions, and test case generation directly within the development environment. The system extends the capabilities of LLMs through a context-based retrieval-augmented generation (RAG) approach, dynamically retrieving relevant code context to enhance prompt construction. By modeling the codebase as a graph with context embeddings that update in real time based on code changes, the system optimizes software testing efficiency, accuracy, and coverage. Their evaluation demonstrates significant improvements, including a 31.2% increase in bug detection accuracy, a 12.6% boost in critical test coverage, and a 10.5% higher user acceptance rate compared to traditional methods. The article positions AI as a transformative force in software testing, shifting from a reactive bug detection paradigm to a proactive, context-aware approach that seamlessly integrates with modern CI workflows. By leveraging past work on AI-assisted coding tools like Copilot for Xcode, the authors highlight how the same principles can be applied to software testing, ensuring more reliable and efficient development processes.
The article “Test Amplification for REST APIs: Using ‘Out-of-the-Box’ Large Language Models” by Tolgahan Bardakci et al.[A2] explores the use of LLMs such as ChatGPT 3.5, ChatGPT 4, and GitHub Copilot for amplifying Representational State Transfer (REST) application programming interface (API) test suites. Given the complexity of testing REST APIs, due to their distributed nature and the necessity of boundary value testing, the study investigates whether LLMs can enhance test coverage while maintaining readability. Using the open source PetStore application as a testbed, the authors compare different prompts and analyze their impact on the number of generated tests, API coverage, and readability. They find that providing an API specification as input significantly improves coverage, and requesting the maximum number of test cases further enhances testing effectiveness. While LLMs occasionally generate invalid or deprecated test cases, they also expose API bugs. ChatGPT 4, in particular, generates the most bug-exposing tests. The study concludes with recommendations for optimizing LLM-generated test prompts, emphasizing the balance between test strength and understandability.
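To give a flavor of the kind of boundary value test such amplification targets, the following Java sketch exercises the public Swagger Petstore with an id outside the valid range; the endpoint and the expected 404 status are assumptions about that service, not tests taken from the article:

import static org.junit.jupiter.api.Assertions.assertEquals;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.Test;

class PetstoreBoundaryTest {
    @Test
    void gettingANonexistentPetReturns404() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Boundary value: a pet id that should not exist (assumed endpoint).
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://petstore.swagger.io/v2/pet/-1"))
            .GET()
            .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        assertEquals(404, response.statusCode()); // expected: pet not found
    }
}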
Will AI Replace Software Testers?

History has shown that while technology transforms job roles, it seldom eradicates them. Instead, it redefines them, bringing new opportunities and challenges. For example, when test automation was first introduced in software testing, many feared that manual testers would become obsolete. However, that prediction did not come true.

Market analysts and scientific researchers agree that, like many other roles,11 the software tester position will be significantly reshaped by the AI revolution. While AI-powered tools are transforming the field by automating repetitive tasks and improving efficiency, human testers remain essential for their critical thinking. This is further supported by a 2023 Capgemini survey,12 which found that 61% of organizations believe that human testers will remain essential, even with AI integration. In the same direction, the available web articles in the gray literature related to AI and software testing agree that AI will not completely replace software testers, at least not in the near future.9
Software testers will not disappear, although their number and role may change,11 for several reasons. First and foremost, humans have always survived technological revolutions: consider the Industrial Revolution, when machines began replacing workers; the rise of industrial assembly lines; and, more recently, the Web revolution. Humans have always leveraged machines to their advantage to complete their tasks faster and achieve better results. Why should this time be different, and why specifically in the context of software testing? Some pessimistic predictions from a 2017 survey on AI’s impact on software testing13 have not materialized. Of 328 respondents, 65% expected AI to replace manual testers by 2025, and 59% believed test automation engineers would be replaced by 2027. When predictions repeatedly fail, the anticipated factor (AI, in this case) is likely overestimated in its ability to ensure software quality.
All the web sources analyzed agree that AI cannot fully replace human testers for several key reasons. One major limitation is exploratory testing and critical thinking. Although AI is highly effective in performing repetitive automated tests, it struggles with exploratory testing based on intuition, creativity, and adaptability. Human testers can identify unexpected bugs by thinking beyond predefined test cases. In addition, AI requires human supervision. AI-powered testing tools need proper configuration, fine-tuning, and validation. When AI misinterprets test results, only a human tester can accurately distinguish between false positives, false negatives, or truly critical issues, ensuring the reliability of the testing process. These reasons are also given by ChatGPT-4o itself in response to the question, “Why can AI not replace software testers?” ChatGPT-4o concludes with the following slogan, with which we fully agree: AI + human testers = stronger testing.
In conclusion, the authors of this editorial believe that while AI will enhance the speed and productivity of building and testing software, it will not yet be capable of autonomously developing the software and ensuring its quality.
ABOUT THE AUTHORS
FILIPPO RICCA is a full professor at the University of Genova, 16146 Genova, Italy. His research focuses on web application testing and test automation, combining tool development with empirical methods such as case studies, experiments, and surveys. He received ICSE and ICST Most Influential Paper awards for his contributions to the field. Contact him at filippo.ricca@unige.it.
BONI GARCÍA is an associate professor at Universidad Carlos III de Madrid, 28903 Madrid, Spain. His research focuses on automated software testing. He is a committer to the Selenium project and creator of tools like WebDriverManager and Selenium-Jupiter. He is also the author of two books and over 45 research papers. Contact him at boni.garcia@uc3m.es.
MICHEL NASS is a postdoctoral researcher at the Blekinge Institute of Technology, SE-371 79 Karlskrona, Sweden. His work focuses on software testing, with expertise in test automation, test management, and coaching. He holds an M.Sc. in computer science from Chalmers and a Ph.D. in software engineering from BTH. Contact him at michel.nass@inceptive.se.
MARK HARMAN is a research scientist at Meta Platforms, working on software engineering automation in the Instagram Product Performance team. He cofounded Meta’s Simulation-Based Testing team and contributed to tools like Sapienz and WW. He is also a part-time professor at University College London, WC1E 6BT London, U.K. He cofounded the field of Search-Based Software Engineering and has received the IEEE Harlan Mills Award, the ACM Outstanding Research Award, and a Royal Academy of Engineering Fellowship. Contact him at mark.harman@ucl.ac.uk.
With all of the code generated using GenAI, automated testing is more crucial than ever. AI-powered test automation is revolutionizing the software testing landscape by improving efficiency, accuracy, and adaptability. Through ML, NLP, and computer vision, AI enables advanced features such as test generation, self-healing tests, and predictive analytics, reducing manual effort and paving the way for faster feedback cycles and higher-quality software.

Future directions include enhanced self-learning capabilities, integration with CI pipelines, and explainable AI to build trust in AI-driven results. Multimodal testing, ethical AI, and human-AI collaboration will further expand the scope of AI-powered testing. However, even with advances in AI, it seems unlikely that we will achieve fully autonomous testing, where AI independently designs, executes, and optimizes tests, or self-correcting software systems, which detect and fix issues automatically. Given the complexity and unpredictability of software, full autonomy remains a distant goal, ensuring that testing will continue to require human oversight and that the role of the tester will not disappear anytime soon.14
References
1. I. Goodfellow et al., “Generative adversarial networks,” Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020, doi: 10.1145/3422622.
2. A. Fan et al., “Large language models for software engineering: Survey and open problems,” in Proc. ICSE Future Softw. Eng. (FoSE), 2023, pp. 31–53, doi: 10.1109/ICSE-FoSE59343.2023.00008.
3. N. Alshahwan, M. Harman, I. Harper, A. Marginean, S. Sengupta, and E. Wang, “Assured LLM-based software engineering (keynote paper),” in Proc. 2nd ICSE Workshop Interoperability Robustness Benchmarking Neural Softw. Eng. (InteNSE), 2024, pp. 7–12.
4. D. Amalfitano, S. Faralli, J. C. R. Hauck, S. Matalonga, and D. Distante, “Artificial intelligence applied to software testing: A tertiary study,” ACM Comput. Surv., vol. 56, no. 3, pp. 1–38, 2023, doi: 10.1145/3616372.
5. A. Aleti, “Software testing of generative AI systems: Challenges and opportunities,” in Proc. IEEE/ACM Int. Conf. Softw. Eng.: Future Softw. Eng. (ICSE-FoSE), Piscataway, NJ, USA: IEEE Press, 2023, pp. 4–14, doi: 10.1109/ICSE-FoSE59343.2023.00009.
6. Y. Wang, Y. Pan, M. Yan, Z. Su, and T. H. Luan, “A survey on ChatGPT: AI-generated contents, challenges, and solutions,” IEEE Open J. Comput. Soc., vol. 4, pp. 280–302, 2023, doi: 10.1109/OJCS.2023.3300321.
7. M. Leotta, H. Z. Yousaf, F. Ricca, and B. Garcia, “AI-generated test scripts for web E2E testing with ChatGPT and Copilot: A preliminary study,” in Proc. 28th Int. Conf. Eval. Assessment Softw. Eng., 2024, pp. 339–344.
8. F. Ricca, A. Marchetto, and A. Stocco, “A multi-year grey literature review on AI-assisted test automation,” 2024, arXiv:2408.06224.
9. F. Ricca, A. Marchetto, and A. Stocco, “AI-based test automation: A grey literature analysis,” in Proc. IEEE Int. Conf. Softw. Testing, Verification Validation Workshops (ICSTW), Piscataway, NJ, USA: IEEE Press, 2021, pp. 263–270, doi: 10.1109/ICSTW52544.2021.00051.
10. “State of Testing™ report 2024,” PractiTest, Rehovot, Israel, 2024. Accessed: Mar. 18, 2025. [Online]. Available: https://www.practitest.com/assets/pdf/stot-2024.pdf
11. E. Brynjolfsson, T. Mitchell, and D. Rock, “What can machines learn, and what does it mean for occupations and the economy?” AEA Papers Proc., vol. 108, pp. 43–47, May 2018, doi: 10.1257/pandp.20181019.
12. “World quality report 2023–2024,” Capgemini, Paris, France, 2023. Accessed: Mar. 18, 2025. [Online]. Available: https://www.capgemini.com/wp-content/uploads/2023/11/WQR_2023_FINAL_WEB_CG.pdf?utm_campaign=Software%2BTesting%2BWeekly&utm_medium=web&utm_source=Software_Testing_Weekly_204
13. T. M. King, J. Arbon, D. Santiago, D. Adamo, W. Chin, and R. Shanmugam, “AI for testing today and tomorrow: Industry perspectives,” in Proc. IEEE Int. Conf. Artif. Intell. Testing (AITest), 2019, pp. 81–88, doi: 10.1109/AITest.2019.000-3.
14. L. Layman and R. Vetter, “Generative artificial intelligence and the future of software testing,” Computer, vol. 57, no. 1, pp. 27–32, 2024, doi: 10.1109/MC.2023.3306998.
Appendix: Related Articles
A1. Y. Wang, S. Guo, and C. W. Tan, “From code generation to software testing: AI copilot with context-based retrieval-augmented generation,” IEEE Softw., vol. 42, no. 4, pp. 34–42, Jul./Aug. 2025, doi: 10.1109/MS.2025.3549628.
A2. T. Bardakci, S. Demeyer, and M. Beyazit, “Test amplification for REST APIs: Using ‘out-of-the-box’ large language models,” IEEE Softw., vol. 42, no. 4, pp. 43–49, Jul./Aug. 2025, doi: 10.1109/MS.2025.3559664.