FOCUS: GUEST EDITORS' INTRODUCTION

Next-Generation Software Testing: AI-Powered Test Automation

Filippo Ricca, Università di Genova
Boni García, Universidad Carlos III de Madrid
Michel Nass, Blekinge Institute of Technology
Mark Harman, University College London

Digital Object Identifier 10.1109/MS.2025.3559194
Date of current version: 10 June 2025
SOFTWARE TESTING HAS long been a cornerstone of software development, ensuring the delivery of high-quality, reliable, and secure systems. However, as software systems grow in complexity and scale, traditional testing methods might fail to address their needs. Manual testing is often time-consuming and error-prone. Automated testing, while beneficial for efficiency, repeatability, and excellent coverage, presents challenges such as high initial costs and maintenance overhead. The promise of AI-powered test automation lies in its ability to address some of the most pressing challenges in software testing.
A Brief History of AI

AI was first formalized in the 1950s with Alan Turing's "Computing Machinery and Intelligence," which introduced the "Turing Test." Early research focused on symbolic AI, with rule-based systems implemented in languages like LISP. However, limited computational power led to the first "AI winter" in the 1970s.

The 1980s saw expert systems emerge, followed by a shift to machine learning (ML) in the 1990s. Neural networks and statistical methods gained traction, but hardware and data limitations caused a second "AI winter" in the late 1990s.

In the 21st century, AI progress has accelerated with improved computational power, large datasets, and breakthroughs in deep learning, reinforcement learning, and transformers. These advances led to generative AI (GenAI), enabling systems to create text, images, audio, and code.1 Notable models include ChatGPT, DALL-E, GitHub Copilot, and Google Gemini. GenAI relies on large language models (LLMs) such as BERT, GPT, PaLM, and LLaMA, powering diverse natural language processing (NLP) applications.
Why Software Testing Is More Important Than Ever

Software testing consists of the dynamic evaluation of a piece of software, called a system under test (SUT), through a finite set of test cases (or simply tests). Testing implies the execution of a SUT using specific input values to assess the outcome or expected behavior. We distinguish two broad categories of software testing: manual and automated. On the one hand, in manual testing, a person (such as a software tester or the final user) evaluates the SUT. On the other hand, test automation uses specific software tools and frameworks to execute against the SUT the test scripts (executable test cases written in a programming language) produced by software testers.
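To make the distinction concrete, here is a minimal test script sketch in Java with Selenium WebDriver and JUnit 5; the SUT URL and the expected title are illustrative placeholder values:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

class HomePageTest {
    @Test
    void homePageHasExpectedTitle() {
        WebDriver driver = new ChromeDriver(); // the framework drives a real browser
        try {
            driver.get("https://example.com/"); // specific input: navigate the SUT
            assertEquals("Example Domain", driver.getTitle()); // assess the outcome
        } finally {
            driver.quit(); // always release the browser
        }
    }
}

A test runner executes such scripts against the SUT and reports whether the observed outcome matches the expected one.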
With the increasing deployment of AI-generated code, software will be created faster and in hybrid construction modes, with humans and machines contributing together. Although automatic code generation has been deployed for many decades, it has previously been based largely on deterministic, rule-based approaches. Such rule-based approaches have traditionally been underpinned by strong correct-by-construction theories, based on decades of compiler theory. In contrast, the emerging LLM-based development paradigm2 is one in which LLM-generated code comes with few such guarantees. There have been attempts to impose frameworks for providing assurances for LLM-based code generation.3 However, these frameworks cannot offer the same guarantees we are accustomed to in automated code generation. Language models are inherently vulnerable to hallucinations, so we must develop software engineering processes and tools to detect and address bugs in the generated code. In this emerging paradigm, code will undoubtedly be generated far faster overall. The impact of buggy code is not diminished, however, whether it originates from human or machine-based mistakes. The greater development speed requires testing techniques that can perform with equal alacrity, which is precisely the problem addressed by automated software testing. The advent of LLM-based software engineering therefore makes software testing all the more important, if no less challenging.
AI in Software Testing

There are two primary ways (not the only ones) that AI is being used in software testing:

1. General-purpose AI for testing: This approach involves using general AI assistants that are not built specifically for testing but can help testers and developers improve their processes and workflows. These assistants can be categorized into two types: code-specific AI assistants (e.g., GitHub Copilot, CodeWhisperer), which suggest code and automate writing tests (e.g., Copilot can generate JUnit tests for a function based on comments or code context; see the sketch after this list), and general-purpose GenAI assistants (e.g., ChatGPT, Gemini, Claude), which assist with test planning, exploratory testing, bug report analysis, and generating automated test cases across various testing frameworks (e.g., you can ask ChatGPT to suggest test cases for a given user story or generate Selenium scripts).

2. AI-powered testing tools: These tools integrate AI algorithms designed to address common software testing challenges, enhancing traditional processes with models tailored for tasks like defect prediction, regression testing, and exploratory testing assistance. Examples of these tools are provided later in the appropriate section.
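To make the first category concrete, consider a small Java method; a code-specific assistant such as Copilot might propose JUnit tests along the following lines (illustrative output; actual suggestions vary with context):

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class MaxTest {
    // Method under test: returns the greater of two integers.
    static int max(int a, int b) { return a > b ? a : b; }

    @Test
    void returnsGreaterValue() { assertEquals(5, max(3, 5)); }

    @Test
    void orderDoesNotMatter() { assertEquals(5, max(5, 3)); }

    @Test
    void handlesEqualValues() { assertEquals(4, max(4, 4)); }
}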
Here’s an analogy to illustrate the difference:

1. General AI for testing: Imagine using a general-purpose language model like a chatbot to get advice on how to clean your house. It can provide helpful tips and suggestions, but it is not a tool specifically designed for cleaning itself.

2. AI-powered testing tool: Imagine a specialized robot vacuum cleaner designed specifically for cleaning carpets. It has built-in sensors and algorithms to navigate and clean effectively.
Advantages and Challenges of AI in Software Testing

The application of AI in software testing offers significant benefits, transforming traditional testing processes and improving efficiency, accuracy, and coverage. Some of the most relevant software testing facets supported by AI are the following4,5:
• Test case generation: Why? Traditional test case creation is manual, time-consuming, and often incomplete, leading to gaps in test coverage. AI can help by analyzing software requirements, code, and user behavior to generate test cases and test scripts. How far have we come? AI models can now generate test cases covering edge cases and critical paths. Tools like EvoSuite and Diffblue Cover leverage AI algorithms to automate this process. What does AI bring? AI-driven test generation improves efficiency, reduces human effort, and ensures better test coverage, especially for complex software systems.
• Test case prioritization: Why? Running all test cases is impractical, especially with large test suites. Prioritization ensures that the most critical tests run first, detecting high-risk defects early. How far have we come? AI-based tools (e.g., SBTTool and Testomat.io) use historical data, risk analysis, code changes, and other data to dynamically prioritize test cases. ML models can now predict which test cases are most likely to fail. What does AI bring? AI-driven prioritization reduces testing time while maintaining high defect detection rates, making continuous integration/continuous delivery (CI/CD) pipelines more efficient.
• Exploratory testing assistance: Why? Traditional exploratory testing relies on tester intuition and experience, making it difficult to scale and automate. AI can guide testers by identifying untested areas and generating exploratory scenarios. How far have we come? AI-driven tools (e.g., TestRigor) now suggest areas needing exploration, analyze application behavior, and propose test ideas dynamically. What does AI bring? AI enhances exploratory testing by making it simpler and data-driven, reducing the risk of missing critical defects.
• Test oracle generation: Why? Determining whether software behavior is correct (i.e., defining test oracles) is a major challenge, particularly for complex systems. AI can help by predicting expected outputs. How far have we come? AI-based test tools (e.g., TOGA and LLMs) use AI models and predictive analytics to validate test results, especially in scenarios where traditional oracles are unavailable. What does AI bring? AI-driven test oracles reduce human effort in defining expected outcomes, improving the ability to test autonomous and evolving systems.
• Predictive analysis: Why? Predicting defects before they occur helps prevent costly failures and improves software reliability. How far have we come? AI models (for example, those used in Testers.ai and Test.Predictor) now analyze code quality, historical defect data, and development patterns to predict potential problem areas. What does AI bring? Predictive AI improves software quality by enabling proactive defect prevention, reducing debugging time, and enhancing test planning.
• Visual testing: Why? Ensuring UI consistency across multiple devices and screen configurations is challenging. AI-powered computer vision can automate this process. How far have we come? AI-driven tools like Applitools and Microsoft’s Visual AI detect UI inconsistencies that human testers might miss. What does AI bring? AI automates visual verification, improving accuracy, scalability, and efficiency in UI testing.
• Self-healing test automation: Why? Automated tests break when UI or functionality changes, leading to high maintenance costs. AI enables self-healing mechanisms to adapt to these changes. How far have we come? AI-based tools such as TestRigor and Mabl can now automatically update test scripts when application changes are detected. What does AI bring? AI reduces test maintenance effort, making automated testing more resilient and cost-effective (a minimal sketch of the underlying idea appears after this list).
• Test case maintenance: Why? Over time, test suites become bloated with redundant or obsolete test cases, making maintenance a challenge. AI helps optimize test suites. How far have we come? AI tools (e.g., Testim) now detect redundant tests, identify gaps, identify and update tests affected by changes, and recommend optimizations to improve coverage. What does AI bring? AI-driven test maintenance ensures test suites remain effective, reducing execution time and improving overall software quality.
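Returning to self-healing, the core mechanism can be sketched in a few lines of Java with Selenium WebDriver: try the recorded locator first and fall back to alternatives when the UI has changed. Commercial tools use far richer signals (e.g., learned element fingerprints); the fallback strategy here is purely illustrative:

import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

class SelfHealingLocator {
    // Try the primary locator; if the element is gone (e.g., after a UI change),
    // fall back to alternative locators recorded for the same element.
    static WebElement find(WebDriver driver, By primary, List<By> fallbacks) {
        try {
            return driver.findElement(primary);
        } catch (NoSuchElementException e) {
            for (By candidate : fallbacks) {
                try {
                    // A real tool would log the "healed" locator and update the script.
                    return driver.findElement(candidate);
                } catch (NoSuchElementException ignored) {
                    // Try the next candidate.
                }
            }
            throw e; // No candidate matched: surface the original failure.
        }
    }
}

A test would then call, for example, find(driver, By.id("add-to-cart"), List.of(By.cssSelector("button.add-to-cart"), By.xpath("//button[contains(.,'Add to cart')]"))) instead of a bare driver.findElement(...).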
However, the use of AI brings not only benefits but also challenges to overcome, such as the following5,6:

• Accuracy: AI-based testing tools can produce false positives (identifying nonexistent issues) or false negatives (missing actual defects), leading to unreliable test results. In addition, AI may struggle with complex test scenarios that require a deep understanding of context or nuanced human judgment.

• Privacy concerns: Source code may contain sensitive or proprietary information, and sharing this information with AI models can lead to unintentional disclosure or violate confidentiality agreements or company policies.

• Security vulnerabilities: AI tools can be targets for cyberattacks, potentially compromising the testing process. For instance, during AI model training, attackers may use poisoned samples to degrade model performance or launch backdoor attacks to manipulate the outcomes.

• Explainability: AI-driven testing tools often function as “black boxes,” making it hard to understand, for example, why specific tests pass or fail.

• Model bias: AI models used in testing can inherit biases from training data, leading to unfair or inconsistent testing results.

• Over-reliance on AI: Developers and testers may become too dependent on AI, reducing critical thinking and problem-solving skills.
General-Purpose GenAI Assistants

The growing popularity of GenAI assistants has led to significant innovations in the field of software testing. A practical example of their use is the automatic generation of test cases or test scripts. GenAI assistants can analyze software requirements or alternative descriptions (e.g., user stories or a Gherkin specification of the test case) and automatically generate relevant tests, reducing the time and effort needed for manual test creation.
In this motivational example, we demonstrate how a GenAI assistant can generate a test case, step by step, from a simple test objective: “Add the BILLY bookshelf to the cart on the IKEA web application.” The BILLY bookshelf is an iconic bookcase sold by IKEA, known for its simple, functional, and affordable design. Launched in 1979, it has become one of the company’s best-selling products. Thanks to its versatility, it is often used for books, decorative items, and even collections.
The assistant requires four key components: an input [such as the document object model (DOM) or a screen capture], an output (the available actions, such as “click” or “type”), a short-term memory (the actions performed), and a goal (the objective of the test). Because the assistant (powered by an LLM) is unlikely to know how the IKEA web application works in advance, we cannot simply ask for the entire test sequence outright. Instead, we proceed one action at a time.
First, we include the test objective, the DOM (and optionally a screen capture), a list of possible actions, and instructions to choose the next step. Suppose that the assistant searches for “BILLY” using the site’s search bar. We execute this action, perhaps with Selenium WebDriver or Playwright, and record it as the first test step.

Next, we include this executed step in the following prompt to the assistant, ensuring that it does not repeat the same action. We continue in this iterative manner until the assistant confirms that the objective is fulfilled by successfully placing the BILLY bookshelf in the cart. At this stage, the test is complete, and the assembled test steps form our fully automated test case.
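A minimal sketch of this loop in Java follows. The askLlm and execute helpers are hypothetical placeholders: the first would call an LLM API of choice and return the next action as text, and the second would map that action onto Selenium WebDriver calls; the prompt wording and action format are likewise illustrative:

import java.util.ArrayList;
import java.util.List;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

class LlmGuidedTestGeneration {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        driver.get("https://www.ikea.com/"); // starting point in the SUT
        String goal = "Add the BILLY bookshelf to the cart";
        List<String> memory = new ArrayList<>(); // short-term memory: executed steps
        for (int step = 0; step < 20; step++) { // safety bound on test length
            String prompt = "Goal: " + goal + "\n"
                + "Current DOM: " + driver.getPageSource() + "\n"
                + "Actions already performed: " + memory + "\n"
                + "Available actions: click(<locator>), type(<locator>, <text>)\n"
                + "Reply with the single next action, or DONE if the goal is met.";
            String action = askLlm(prompt); // hypothetical call to an LLM API
            if (action.startsWith("DONE")) break; // objective fulfilled
            execute(driver, action); // hypothetical: map the action to WebDriver calls
            memory.add(action); // record the step so it is not repeated
        }
        driver.quit();
        // The accumulated entries in `memory` form the assembled test case.
    }

    static String askLlm(String prompt) { /* call an LLM of choice */ return "DONE"; }

    static void execute(WebDriver driver, String action) { /* parse and run the action */ }
}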
An alternative approach, widely used in practice by testers, is to start with a Gherkin specification of the test case, such as that shown in Figure 1, and then ask a GenAI assistant, such as OpenAI ChatGPT, to generate the corresponding Selenium WebDriver test script.7 Gherkin is a structured language used to define test cases in a human-readable format, following a given-when-then structure commonly used in behavior-driven development.

FIGURE 1. Gherkin specification for adding a BILLY bookshelf to the cart.
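A Gherkin scenario for this objective, in the style of Figure 1, might read as follows (illustrative; the exact wording of the steps in the figure may differ):

Feature: IKEA shopping cart
  Scenario: Add the BILLY bookshelf to the cart
    Given the user is on the IKEA home page
    When the user searches for "BILLY"
    And the user opens the BILLY bookshelf product page
    And the user clicks the "Add to cart" button
    Then the cart contains the BILLY bookshelf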
A possible prompt to submit to ChatGPT could be the following: “Generate a Java Selenium WebDriver test script starting from the following Gherkin specification,” inserting the Gherkin shown in Figure 1. The output of ChatGPT might look something like the code shown in Figure 2, where each Gherkin step has been automatically converted into the corresponding Selenium WebDriver command. At this point, the tester should refine the code by adjusting the locators (the mechanisms used in automation testing frameworks to identify and interact with web elements on a page) that were guessed by ChatGPT, which does not have access to the web page; adding the parts specific to the execution environment where the code will run (such as the browser driver setup); and manually adding waits at critical points in the test script. The software tester can refine the test script manually, or they can reformulate specific prompts and request that ChatGPT modify the test script accordingly.

FIGURE 2. Selenium WebDriver test script for adding a BILLY bookshelf to the cart.
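For reference, a script in the spirit of Figure 2, after the refinements just described, might look like the following Java sketch; all locators are guesses of the kind ChatGPT produces and would need to be checked against the real page:

import java.time.Duration;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

class AddBillyToCartTest {
    @Test
    void addBillyBookshelfToCart() {
        WebDriver driver = new ChromeDriver(); // environment-specific driver setup
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        try {
            // Given the user is on the IKEA home page
            driver.get("https://www.ikea.com/");
            // When the user searches for "BILLY"
            driver.findElement(By.name("q")).sendKeys("BILLY", Keys.ENTER); // guessed locator
            // And the user opens the BILLY bookshelf product page
            wait.until(ExpectedConditions.elementToBeClickable(
                By.partialLinkText("BILLY"))).click(); // guessed locator
            // And the user clicks the "Add to cart" button
            wait.until(ExpectedConditions.elementToBeClickable(
                By.cssSelector("button.add-to-cart"))).click(); // guessed locator
            // Then the cart contains the BILLY bookshelf
            wait.until(ExpectedConditions.textToBePresentInElementLocated(
                By.cssSelector(".cart"), "BILLY")); // guessed locator
        } finally {
            driver.quit();
        }
    }
}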
This second approach is based on prompt chaining, which refers to a series of interconnected prompts used to refine, enhance, or expand information. A possible prompt chain could be built to have ChatGPT fix the locators, one by one, by providing the HTML of the web page in a prompt and attempting to execute the test script, while providing any execution error messages.
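For example, one link in such a chain might be a prompt of this form (illustrative):

"The previous script fails with NoSuchElementException on By.name("q"). Here is the HTML of the page: <html>...</html>. Fix only that locator so that it matches an element that exists in this HTML, and return the updated script."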
Clearly, the automatic generation of test scripts is just one of the possible tasks that can be performed with GenAI assistants. Other possibilities that save time and money include, for example, the refactoring of test suites (for instance, by introducing a design pattern like the page object pattern, or by changing the programming language or testing framework), extending a test suite with new data covering corner cases, or even the automatic introduction of waiting commands to synchronize the execution speed of a test suite with that of the application being tested.
AI-Powered Test Automation Tools

The market for test automation tools is constantly evolving, driven by their crucial role in modern software testing. Today, these tools take advantage of various AI techniques to improve different aspects of software testing. A recent survey8 highlighted a significant increase in the availability of new tools for testers: their number has grown substantially in a short time, with the authors counting more than 100 testing tools on the market. A new phenomenon that is taking hold is the integration of GenAI mechanisms within these tools (see, for example, Testim and Functionize).
Table 1 summarizes some of the most relevant AI-powered test automation tools in the gray literature today.9 Functionize allows developers to describe a test scenario in plain English and automatically converts it into an executable test. If the application’s UI changes, Functionize’s self-healing capability ensures the test adapts without manual intervention. Applitools takes a different approach, focusing on visual validation, comparing screenshots, and flagging visual bugs while intelligently ignoring dynamic elements such as ads or timers. Mabl learns from user interactions to automatically create test scripts when navigating a web application. If the UI evolves, its self-healing feature updates the tests, and its AI-driven performance monitoring can detect anomalies, like a sudden drop in response time, before they impact users. Testim creates tests from natural language. If an element’s location changes, its self-healing locators adjust automatically, and its root cause analysis pinpoints why a test failed. Testers.ai can analyze historical data and code changes to predict where defects could occur, allowing teams to focus their efforts on high-risk areas. Finally, Appvance combines functional and performance testing to run tests that check whether a feature works and evaluate its performance under load.

TABLE 1. AI-powered test automation tools.

Tool        | Website                      | Target           | AI-Driven Features
Functionize | https://www.functionize.com/ | Web, mobile, API | Self-healing tests, test generation, visual testing
Applitools  | https://www.applitools.com/  | Web, mobile      | Visual testing
Mabl        | https://www.mabl.com/        | Web, mobile, API | Self-healing tests, test maintenance, predictive analytics
Testim      | https://www.testim.io/       | Web              | Self-healing tests, test generation, test maintenance
Testers.ai  | https://testers.ai/          | Web              | Exploratory testing, test generation
Appvance    | https://appvance.ai/         | Web, mobile      | Test generation, self-healing tests, performance testing
According to the 2024 State of Testing Report,10 the most popular ways to use AI-powered testing tools are the following. Approximately 25% of respondents utilize these tools to create test cases, while 23% focus on optimizing test cases. Furthermore, 20% leverage AI to plan the testing process. The largest group, 32%, employs AI-driven tools for multiple purposes, including generating test cases, creating test automation scripts, managing test data, and identifying bugs in test code.
Special Issue Articles

The special issue of IEEE Software on AI-Powered Test Automation, which we are managing, attracted 12 submissions covering a wide range of topics. We were supported by more than 25 reviewers, who provided constructive feedback and helped the authors improve their contributions. Each submitted manuscript was reviewed by at least two reviewers, and in most cases by three. Conflicts of interest were strictly managed throughout the review process. Ultimately, only two manuscripts were selected for inclusion in this special issue, reflecting the highly selective nature of the review process.

In the following, we summarize the key contributions of the papers included in this special issue.
The article “From Code Generation to Software Testing: AI Copilot with Context-Based Retrieval-Augmented Generation” by Yuchen Wang et al.[A1] presents Copilot for Testing, an AI-assisted automated testing system that integrates bug detection, fix suggestions, and test case generation directly within the development environment. The system extends the capabilities of LLMs through a context-based retrieval-augmented generation (RAG) approach, dynamically retrieving relevant code context to enhance prompt construction. By modeling the codebase as a graph with context embeddings that update in real time based on code changes, the system optimizes software testing efficiency, accuracy, and coverage. Their evaluation demonstrates significant improvements, including a 31.2% increase in bug detection accuracy, a 12.6% boost in critical test coverage, and a 10.5% higher user acceptance rate compared to traditional methods. The article positions AI as a transformative force in software testing, shifting from a reactive bug detection paradigm to a proactive, context-aware approach that seamlessly integrates with modern CI workflows. By leveraging past work on AI-assisted coding tools like Copilot for Xcode, the authors highlight how the same principles can be applied to software testing, ensuring more reliable and efficient development processes.
The article “Test Amplification for REST APIs: Using ‘Out-of-the-Box’ Large Language Models” by Tolgahan Bardakci et al.[A2] explores the use of LLMs such as ChatGPT 3.5, ChatGPT 4, and GitHub Copilot for amplifying Representational State Transfer (REST) application programming interface (API) test suites. Given the complexity of testing REST APIs, due to their distributed nature and the necessity of boundary value testing, the study investigates whether LLMs can enhance test coverage while maintaining readability. Using the open source PetStore application as a testbed, the authors compare different prompts and analyze their impact on the number of generated tests, API coverage, and readability. They find that providing an API specification as input significantly improves coverage, and requesting the maximum number of test cases further enhances testing effectiveness. While LLMs occasionally generate invalid or deprecated test cases, they also expose API bugs. ChatGPT 4, in particular, generates the most bug-exposing tests. The study concludes with recommendations for optimizing LLM-generated test prompts, emphasizing the balance between test strength and understandability.
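To give a flavor of the kind of boundary value test such amplification targets, the following Java sketch exercises the public Swagger Petstore with an id outside the valid range; the endpoint and the expected 404 status are assumptions about that service, not tests taken from the article:

import static org.junit.jupiter.api.Assertions.assertEquals;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.Test;

class PetstoreBoundaryTest {
    @Test
    void gettingANonexistentPetReturns404() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Boundary value: a pet id that should not exist (assumed endpoint).
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://petstore.swagger.io/v2/pet/-1"))
            .GET()
            .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        assertEquals(404, response.statusCode()); // expected: pet not found
    }
}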
Will AI Replace Software Testers?

History has shown that while technology transforms job roles, it seldom eradicates them. Instead, it redefines them, bringing new opportunities and challenges. For example, when test automation was first introduced in software testing, many feared that manual testers would become obsolete. However, that prediction did not come true.

Market analysts and scientific researchers agree that, like many other roles,11 the software tester position will be significantly reshaped by the AI revolution. While AI-powered tools are transforming the field by automating repetitive tasks and improving efficiency, human testers remain essential for their critical thinking. This is further supported by a 2023 Capgemini survey,12 which found that 61% of organizations believe that human testers will remain essential, even with AI integration. In the same direction, the available web articles in the gray literature related to AI and software testing agree that AI will not completely replace software testers, at least not in the near future.9
Software testers will not disappear, although their number and role may change,11 for several reasons. First and foremost, humans have always survived technological revolutions: consider the Industrial Revolution, when machines began replacing workers; the rise of industrial assembly lines; and, more recently, the Web revolution. Humans have always leveraged machines to their advantage to complete their tasks faster and achieve better results. Why should this time be different, and why specifically in the context of software testing? Some pessimistic predictions from a 2017 survey on AI’s impact on software testing13 have not materialized. Of 328 respondents, 65% expected AI to replace manual testers by 2025, and 59% believed test automation engineers would be replaced by 2027. When predictions repeatedly fail, the anticipated factor (AI, in this case) is likely overestimated in its ability to ensure software quality.
All the web sources analyzed agree that AI cannot fully replace human testers for several key reasons. One major limitation is exploratory testing and critical thinking. Although AI is highly effective in performing repetitive automated tests, it struggles with exploratory testing based on intuition, creativity, and adaptability. Human testers can identify unexpected bugs by thinking beyond predefined test cases. In addition, AI requires human supervision. AI-powered testing tools need proper configuration, fine-tuning, and validation. When AI misinterprets test results, only a human tester can accurately distinguish between false positives, false negatives, or truly critical issues, ensuring the reliability of the testing process. These reasons are also given by ChatGPT-4o itself in response to the question, “Why can AI not replace software testers?” ChatGPT-4o concludes with the following slogan, with which we fully agree: AI + human testers = stronger testing.
In conclusion, the authors of this editorial believe that while AI will enhance the speed and productivity of building and testing software, it will not yet be capable of autonomously developing the software and ensuring its quality.
ABOUT THE AUTHORS
FILIPPO RICCA is a full professor at the University of Genova, 16146 Genova, Italy. His research focuses on web application testing and test automation, combining tool development with empirical methods such as case studies, experiments, and surveys. He received ICSE and ICST Most Influential Paper awards for his contributions to the field. Contact him at filippo.ricca@unige.it.
BONI GARCÍA is an associate professor at Universidad Carlos III de Madrid, 28903 Madrid, Spain. His research focuses on automated software testing. He is a committer to the Selenium project and creator of tools like WebDriverManager and Selenium-Jupiter. He is also the author of two books and over 45 research papers. Contact him at boni.garcia@uc3m.es.
MICHEL NASS is a postdoctoral researcher at the Blekinge Institute of Technology, SE-371 79 Karlskrona, Sweden. His work focuses on software testing, with expertise in test automation, test management, and coaching. He holds an M.Sc. in computer science from Chalmers and a Ph.D. in software engineering from BTH. Contact him at michel.nass@inceptive.se.
MARK HARMAN is a research scientist at Meta Platforms, working on software engineering automation in the Instagram Product Performance team. He cofounded Meta’s Simulation-Based Testing team and contributed to tools like Sapienz and WW. He is also a part-time professor at University College London, WC1E 6BT London, U.K. He cofounded the field of Search-Based Software Engineering and has received the IEEE Harlan Mills Award, the ACM Outstanding Research Award, and a Royal Academy of Engineering Fellowship. Contact him at mark.harman@ucl.ac.uk.
With all of the code generated using GenAI, automated testing is more crucial than ever. AI-powered test automation is revolutionizing the software testing landscape by improving efficiency, accuracy, and adaptability. Through ML, NLP, and computer vision, AI enables advanced features such as test generation, self-healing tests, and predictive analytics, reducing manual effort and paving the way for faster feedback cycles and higher-quality software.

Future directions include enhanced self-learning capabilities, integration with CI pipelines, and explainable AI to build trust in AI-driven results. Multimodal testing, ethical AI, and human-AI collaboration will further expand the scope of AI-powered testing. However, even with advances in AI, it seems unlikely that we will achieve fully autonomous testing, where AI independently designs, executes, and optimizes tests, or self-correcting software systems, which detect and fix issues automatically. Given the complexity and unpredictability of software, full autonomy remains a distant goal, ensuring that testing will continue to require human oversight and that the role of the tester will not disappear anytime soon.14
References
1. I. Goodfellow et al., “Generative adversarial networks,” Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020, doi: 10.1145/3422622.
2. A. Fan et al., “Large language models for software engineering: Survey and open problems,” in Proc. ICSE Future Softw. Eng. (FoSE), 2023, pp. 31–53, doi: 10.1109/ICSE-FoSE59343.2023.00008.
3. N. Alshahwan, M. Harman, I. Harper, A. Marginean, S. Sengupta, and E. Wang, “Assured LLM-based software engineering (keynote paper),” in Proc. 2nd ICSE Workshop Interoperability Robustness Benchmarking Neural Softw. Eng. (InteNSE), 2024, pp. 7–12.
4. D. Amalfitano, S. Faralli, J. C. R. Hauck, S. Matalonga, and D. Distante, “Artificial intelligence applied to software testing: A tertiary study,” ACM Comput. Surv., vol. 56, no. 3, pp. 1–38, 2023, doi: 10.1145/3616372.
5. A. Aleti, “Software testing of generative AI systems: Challenges and opportunities,” in Proc. IEEE/ACM Int. Conf. Softw. Eng.: Future Softw. Eng. (ICSE-FoSE), Piscataway, NJ, USA: IEEE Press, 2023, pp. 4–14, doi: 10.1109/ICSE-FoSE59343.2023.00009.
6. Y. Wang, Y. Pan, M. Yan, Z. Su, and T. H. Luan, “A survey on ChatGPT: AI-generated contents, challenges, and solutions,” IEEE Open J. Comput. Soc., vol. 4, pp. 280–302, 2023, doi: 10.1109/OJCS.2023.3300321.
7. M. Leotta, H. Z. Yousaf, F. Ricca, and B. Garcia, “AI-generated test scripts for web E2E testing with ChatGPT and Copilot: A preliminary study,” in Proc. 28th Int. Conf. Eval. Assessment Softw. Eng., 2024, pp. 339–344.
8. F. Ricca, A. Marchetto, and A. Stocco, “A multi-year grey literature review on AI-assisted test automation,” 2024, arXiv:2408.06224.
9. F. Ricca, A. Marchetto, and A. Stocco, “AI-based test automation: A grey literature analysis,” in Proc. IEEE Int. Conf. Softw. Testing, Verification Validation Workshops (ICSTW), Piscataway, NJ, USA: IEEE Press, 2021, pp. 263–270, doi: 10.1109/ICSTW52544.2021.00051.
10. “State of Testing™ report 2024,” PractiTest, Rehovot, Israel, 2024. Accessed: Mar. 18, 2025. [Online]. Available: https://www.practitest.com/assets/pdf/stot-2024.pdf
11. E. Brynjolfsson, T. Mitchell, and D. Rock, “What can machines learn, and what does it mean for occupations and the economy?” AEA Papers Proc., vol. 108, pp. 43–47, May 2018, doi: 10.1257/pandp.20181019.
12. “World quality report 2023–2024,” Capgemini, Paris, France, 2023. Accessed: Mar. 18, 2025. [Online]. Available: https://www.capgemini.com/wp-content/uploads/2023/11/WQR_2023_FINAL_WEB_CG.pdf?utm_campaign=Software%2BTesting%2BWeekly&utm_medium=web&utm_source=Software_Testing_Weekly_204
13. T. M. King, J. Arbon, D. Santiago, D. Adamo, W. Chin, and R. Shanmugam, “AI for testing today and tomorrow: Industry perspectives,” in Proc. IEEE Int. Conf. Artif. Intell. Testing (AITest), 2019, pp. 81–88, doi: 10.1109/AITest.2019.000-3.
14. L. Layman and R. Vetter, “Generative artificial intelligence and the future of software testing,” Computer, vol. 57, no. 1, pp. 27–32, 2024, doi: 10.1109/MC.2023.3306998.
Appendix: Related Articles
A1. Y. Wang, S. Guo, and C. W. Tan, “From code generation to software testing: AI copilot with context-based retrieval-augmented generation,” IEEE Softw., vol. 42, no. 4, pp. 34–42, Jul./Aug. 2025, doi: 10.1109/MS.2025.3549628.
A2. T. Bardakci, S. Demeyer, and M. Beyazit, “Test amplification for REST APIs: Using ‘out-of-the-box’ large language models,” IEEE Softw., vol. 42, no. 4, pp. 43–49, Jul./Aug. 2025, doi: 10.1109/MS.2025.3559664.