FOCUS: GUEST EDITORS' INTRODUCTION

Next-Generation Software Testing: AI-Powered Test Automation

Filippo Ricca, Università di Genova
Boni García, Universidad Carlos III de Madrid
Michel Nass, Blekinge Institute of Technology
Mark Harman, University College London

Digital Object Identifier 10.1109/MS.2025.3559194
Date of current version: 10 June 2025

SOFTWARE TESTING HAS long been a cornerstone of software development, ensuring the delivery of high-quality, reliable, and secure systems. However, as software systems grow in complexity and scale, traditional testing methods might fail to address their needs. Manual testing is often time-consuming and error-prone. Automated testing, while beneficial for efficiency, repeatability, and excellent coverage, presents challenges such as high initial costs and maintenance overhead. The promise of AI-powered test automation lies in its ability to address some of the most pressing challenges in software testing.

A Brief History of AI

AI was first formalized in the 1950s with Alan Turing's "Computing Machinery and Intelligence," introducing the "Turing Test." Early research focused on symbolic AI, using rule-based systems like LISP. However, limited computational power led to the first "AI winter" in the 1970s. The 1980s saw expert systems emerge, followed by a shift to machine learning (ML) in the 1990s. Neural networks and statistical methods gained traction, but hardware and data limitations caused a second "AI winter" in the late 1990s.

In the 21st century, AI progress has been accelerated by improved computational power, large datasets, and breakthroughs in deep learning, reinforcement learning, and transformers. These advances led to generative AI (GenAI), enabling systems to create text, images, audio, and code.1 Notable models include ChatGPT, DALL-E, GitHub Copilot, and Google Gemini. GenAI relies on large language models (LLMs) such as BERT, GPT, PaLM, and LLaMA, powering diverse natural language processing (NLP) applications.

Why Software Testing Is More Important Than Ever

Software testing consists of the dynamic evaluation of a piece of software, called a system under test (SUT), through a finite set of test cases (or simply tests). Testing implies the execution of a SUT using specific input values to assess the outcome or expected behavior. We distinguish two broad categories of software testing: manual and automated. On the one hand, a person (such as a software tester or the final user) evaluates the SUT in manual testing. On the other hand, test automation uses specific software tools and frameworks to execute against the SUT the test scripts (executable test cases written in a programming language) produced by software testers.
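As a minimal illustration of what such a test script looks like, the sketch below uses JUnit 5 to exercise a small ShoppingCart class; the class, its methods, and the prices are invented solely for this example and do not come from the article. The SUT is executed with specific input values, and an assertion checks the outcome against the expected behavior.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.ArrayList;
import java.util.List;

import org.junit.jupiter.api.Test;

// A deliberately tiny SUT, defined here only so the example is self-contained.
class ShoppingCart {
    private final List<Double> prices = new ArrayList<>();

    void addItem(String name, double price) {
        prices.add(price);
    }

    double total() {
        return prices.stream().mapToDouble(Double::doubleValue).sum();
    }
}

// Minimal automated test script: the SUT is executed with specific input
// values and the outcome is compared against the expected behavior.
class ShoppingCartTest {

    @Test
    void addingTwoItemsYieldsTheirTotalPrice() {
        ShoppingCart cart = new ShoppingCart();    // the SUT
        cart.addItem("BILLY bookshelf", 49.99);    // specific input values
        cart.addItem("LACK side table", 9.99);

        assertEquals(59.98, cart.total(), 0.001);  // expected outcome (the test oracle)
    }
}
```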
With the increasing deployment of AI-generated code, software will be created faster, and in hybrid construction modes, with humans and machines contributing together. Although automatic code generation has been deployed for many decades, it has previously been based largely on deterministic, rule-based approaches. Such rule-based approaches have traditionally been underpinned by strong correct-by-construction theories, based on decades of compiler theory. In contrast, the emerging LLM-based development paradigm2 is one in which LLM-generated code comes with few such guarantees. There have been attempts to impose frameworks for providing assurances for LLM-based code generation.3 However, these frameworks cannot offer the same guarantees we are accustomed to in automated code generation. Language models are inherently vulnerable to hallucinations, so we must develop software engineering processes and tools to detect and address bugs in the generated code. In this emerging paradigm, code will undoubtedly be generated far faster overall. The impact of buggy code is not diminished, however, whether it originates from human or machine-based mistakes. The greater development speed requires testing techniques that can perform with equal alacrity; precisely the problem addressed by automated software testing. The advent of LLM-based software engineering therefore makes software testing all the more important, if not less challenging.

AI in Software Testing

There are two primary ways (not the only ones) that AI is being used in software testing:

1. General-purpose AI for testing: This approach involves using general AI assistants that are not built specifically for testing but can help testers and developers improve their processes and workflows. These assistants can be categorized into two types: code-specific AI assistants (e.g., GitHub Copilot, CodeWhisperer), which suggest code and automate writing tests (e.g., Copilot can generate JUnit tests for a function based on comments or code context), and general-purpose GenAI assistants (e.g., ChatGPT, Gemini, Claude), which assist with test planning, exploratory testing, bug report analysis, and generating automated test cases across various testing frameworks (e.g., you can ask ChatGPT to suggest test cases for a given user story or generate Selenium scripts).

2. AI-powered testing tools: These tools integrate AI algorithms designed to address common software testing challenges, enhancing traditional processes with models tailored for tasks like defect prediction, regression testing, and exploratory testing assistance. Examples of these tools are provided later in the appropriate section.

Here's an analogy to illustrate the difference:

1. General AI for testing: Imagine using a general-purpose language model like a chatbot to get advice on how to clean your house. It can provide helpful tips and suggestions, but it is not a tool specifically designed for cleaning itself.

2. AI-powered testing tool: Imagine a specialized robot vacuum cleaner designed specifically for cleaning carpets. It has built-in sensors and algorithms to navigate and clean effectively.

Advantages and Challenges of AI in Software Testing

The application of AI in software testing offers significant benefits, transforming traditional testing processes and improving efficiency, accuracy, and coverage. Some of the most relevant software testing facets supported by AI are the following4,5:
• Test case generation: Why? Traditional test case creation is manual, time-consuming, and often incomplete, leading to gaps in test coverage. AI can help by analyzing software requirements, code, and user behavior to generate test cases and test scripts. How far have we come? AI models can now generate test cases covering edge cases and critical paths. Tools like EvoSuite and DiffBlue Cover leverage AI algorithms to automate this process. What does AI bring? AI-driven test generation improves efficiency, reduces human effort, and ensures better test coverage, especially for complex software systems.

• Test case prioritization: Why? Running all test cases is impractical, especially with large test suites. Prioritization ensures that the most critical tests run first, detecting high-risk defects early. How far have we come? AI-based tools (e.g., SBTTool and Testomat.io) use historical data, risk analysis, code changes, and other data to dynamically prioritize test cases. ML models can now predict which test cases are most likely to fail. What does AI bring? AI-driven prioritization reduces testing time while maintaining high defect detection rates, making continuous integration/continuous delivery (CI/CD) pipelines more efficient.

• Exploratory testing assistance: Why? Traditional exploratory testing relies on tester intuition and experience, making it difficult to scale and automate. AI can guide testers by identifying untested areas and generating exploratory scenarios. How far have we come? AI-driven tools (e.g., TestRigor) now suggest areas needing exploration, analyze application behavior, and propose test ideas dynamically. What does AI bring? AI enhances exploratory testing by making it simpler and data-driven, reducing the risk of missing critical defects.

• Test oracle generation: Why? Determining whether software behavior is correct (i.e., defining test oracles) is a major challenge, particularly for complex systems. AI can help by predicting expected outputs. How far have we come? AI-based test tools (e.g., TOGA and LLMs) use AI models and predictive analytics to validate test results, especially in scenarios where traditional oracles are unavailable. What does AI bring? AI-driven test oracles reduce human effort in defining expected outcomes, improving the ability to test autonomous and evolving systems.

• Predictive analysis: Why? Predicting defects before they occur helps prevent costly failures and improves software reliability. How far have we come? AI models (for example, those used in Testers.ai and Test.Predictor) now analyze code quality, historical defect data, and development patterns to predict potential problem areas. What does AI bring? Predictive AI improves software quality by enabling proactive defect prevention, reducing debugging time, and enhancing test planning.

• Visual testing: Why? Ensuring UI consistency across multiple devices and screen configurations is challenging. AI-powered computer vision can automate this process. How far have we come? AI-driven tools like Applitools and Microsoft's Visual AI detect UI inconsistencies that human testers might miss. What does AI bring? AI automates visual verification, improving accuracy, scalability, and efficiency in UI testing.
• Self-healing test automation: Why? Automated tests break when UI or functionality changes, leading to high maintenance costs. AI enables self-healing mechanisms to adapt to these changes. How far have we come? AI-based tools such as TestRigor and Mabl can now automatically update test scripts when application changes are detected. What does AI bring? AI reduces test maintenance effort, making automated testing more resilient and cost-effective.

• Test case maintenance: Why? Over time, test suites become bloated with redundant or obsolete test cases, making maintenance a challenge. AI helps optimize test suites. How far have we come? AI tools (e.g., Testim) now detect redundant tests, identify gaps, identify and update tests affected by changes, and recommend optimizations to improve coverage. What does AI bring? AI-driven test maintenance ensures test suites remain effective, reducing execution time and improving overall software quality.

However, the use of AI not only brings benefits; there are also challenges to overcome, such as the following5,6:

• Accuracy: AI-based testing tools can produce false positives (identifying nonexistent issues) or false negatives (missing actual defects), leading to unreliable test results. In addition, AI may struggle with complex test scenarios that require a deep understanding of context or nuanced human judgment.

• Privacy concerns: Source code may contain sensitive or proprietary information, and sharing this information with AI models can lead to unintentional disclosure or violate confidentiality agreements or company policies.

• Security vulnerabilities: AI tools can be targets for cyberattacks, potentially compromising the testing process. For instance, during AI model training, attackers may use poisoned samples to degrade model performance or launch backdoor attacks to manipulate the outcomes.

• Explainability: AI-driven testing tools often function as "black boxes," making it hard to understand, e.g., why specific tests pass or fail.

• Model bias: AI models used in testing can inherit biases from training data, leading to unfair or inconsistent testing results.

• Over-reliance on AI: Developers and testers may become too dependent on AI, reducing critical thinking and problem-solving skills.

General-Purpose GenAI Assistants

The growing popularity of GenAI assistants has led to significant innovations in the field of software testing. A practical example of their use is the automatic generation of test cases or test scripts. GenAI assistants can analyze software requirements or alternative descriptions (e.g., user stories or a Gherkin specification of the test case) and automatically generate relevant tests, reducing the time and effort needed for manual test creation.

In this motivational example, we demonstrate how a GenAI assistant can generate a test case, step by step, from a simple test objective: "Add the BILLY bookshelf to the cart on the IKEA web application." The BILLY bookshelf is an iconic bookcase sold by IKEA. It is known for its simple, functional, and affordable design. Launched in 1979, it has become one of the company's best-selling products. Thanks to its versatility, the BILLY bookshelf is often used for books, decorative items, and even collections.

The assistant requires four key components: an input [such as the document object model (DOM) or a screen capture], an output (the available actions, such as "click" or "type"), a short-term memory (the actions performed), and a goal (the objective of the test). Because the assistant (powered by an LLM) is unlikely to know how the IKEA web application works in advance, we cannot simply ask for the entire test sequence outright. Instead, we proceed one action at a time.

First, we include the test objective, the DOM (and optionally a screen capture), a list of possible actions, and instructions to choose the next step. Suppose that the assistant searches for "BILLY" using the site's search bar. We execute this action, perhaps with Selenium WebDriver or Playwright, and record it as the first test step. Next, we include this executed step in the following prompt to the assistant, ensuring that it does not repeat the same action. We continue in this iterative manner until the assistant confirms that the objective is fulfilled by successfully placing the BILLY bookshelf in the cart. At this stage, the test is complete and the assembled test steps form our fully automated test case.
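A minimal Java sketch of this one-action-at-a-time loop is shown below. Selenium WebDriver executes each chosen step; the LlmClient interface, its nextAction method, the textual action format, and the "DONE" convention are assumptions of this sketch rather than an existing API, and a real implementation would wrap whatever LLM service is actually used.

```java
import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

// Hypothetical wrapper around an LLM API: given the goal, the current DOM,
// the allowed actions, and the steps taken so far, it returns the next action
// as plain text (or "DONE" once it judges the objective fulfilled).
interface LlmClient {
    String nextAction(String goal, String dom, List<String> allowedActions, List<String> executedSteps);
}

public class LlmGuidedTestRunner {

    /** Builds a test case one action at a time, as described in the text. */
    public static List<String> generateTest(WebDriver driver, LlmClient llm, String goal) {
        List<String> allowedActions = List.of("click <css-selector>", "type <css-selector> <text>");
        List<String> executedSteps = new ArrayList<>();   // short-term memory of executed steps

        for (int step = 0; step < 20; step++) {           // safety bound on the loop length
            String dom = driver.getPageSource();          // input: the current DOM
            String action = llm.nextAction(goal, dom, allowedActions, executedSteps);
            if ("DONE".equals(action)) {
                break;                                    // the assistant declares the goal reached
            }
            execute(driver, action);                      // run the step with Selenium
            executedSteps.add(action);                    // record it so it is not repeated
        }
        return executedSteps;                             // the assembled automated test case
    }

    // Translates the textual action chosen by the LLM into Selenium calls.
    // The action format and its parsing are assumptions of this sketch.
    private static void execute(WebDriver driver, String action) {
        String[] parts = action.split("\\s+", 3);
        if (parts[0].equals("click")) {
            driver.findElement(By.cssSelector(parts[1])).click();
        } else if (parts[0].equals("type")) {
            driver.findElement(By.cssSelector(parts[1])).sendKeys(parts[2]);
        }
    }
}
```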
An alternative approach, widely used in practice by testers, is to start with a Gherkin specification of the test case, such as that shown in Figure 1, and then ask a GenAI assistant, such as OpenAI ChatGPT, to generate the corresponding Selenium WebDriver test script.7 Gherkin is a structured language used to define test cases in a human-readable format, following a given-when-then structure commonly used in behavior-driven development. A possible prompt to submit to ChatGPT could be the following: "Generate a Java Selenium WebDriver test script starting from the following Gherkin specification," and insert the Gherkin shown in Figure 1.

FIGURE 1. Gherkin specification for adding a BILLY bookshelf to the cart.

The output of ChatGPT might look something like the code shown in Figure 2, where each Gherkin step has been automatically converted into the corresponding Selenium WebDriver command (an illustrative sketch is given at the end of this subsection).

FIGURE 2. Selenium WebDriver test script for adding a BILLY bookshelf to the cart.

At this point, the tester should refine the code by adjusting the locators (a mechanism used in automation testing frameworks to identify and interact with web elements on a page) that were guessed by ChatGPT, which does not have access to the web page, adding the specific parts of the execution environment where the code will run (such as the browser driver setup), and manually adding waits at critical points in the test script. The software tester can manually refine the test script, or they can reformulate specific prompts and request ChatGPT to modify the test script accordingly. This second approach is based on the process of prompt chaining, which refers to a series of interconnected prompts used to refine, enhance, or expand information. A possible prompt chain could be built to have ChatGPT fix the locators, one by one, by providing the HTML of the web page in a prompt and attempting to execute the test script, while providing any execution error messages.
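Since Figures 1 and 2 are not reproduced in this text version, the sketch below gives an idea of the kind of script involved: each given-when-then step from the Gherkin specification appears as a comment above a plausible Selenium WebDriver command. The URL, locators, and waits are guesses of the same kind ChatGPT would make and would need to be adjusted against the real page, exactly as discussed above.

```java
import java.time.Duration;

import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class AddBillyToCartTest {

    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();   // execution environment added by the tester
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        try {
            // Given the user is on the IKEA home page
            driver.get("https://www.ikea.com/");

            // When the user searches for "BILLY"
            driver.findElement(By.cssSelector("input[type='search']"))          // guessed locator
                  .sendKeys("BILLY", Keys.ENTER);

            // And opens the BILLY bookshelf product page
            wait.until(ExpectedConditions.elementToBeClickable(
                    By.partialLinkText("BILLY"))).click();                       // guessed locator

            // And adds the product to the cart
            wait.until(ExpectedConditions.elementToBeClickable(
                    By.cssSelector("button[aria-label*='cart']"))).click();      // guessed locator

            // Then the cart contains the BILLY bookshelf
            wait.until(ExpectedConditions.textToBePresentInElementLocated(
                    By.cssSelector(".cart"), "BILLY"));                          // guessed assertion point
        } finally {
            driver.quit();
        }
    }
}
```

In a prompt chain, each guessed locator could then be corrected iteratively by feeding the relevant HTML fragment and the resulting execution error message back to the assistant.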
Clearly, the automatic generation of test scripts is just one of the possible tasks that can be performed with GenAI assistants. Other possibilities that save time and money include, for example, the refactoring of test suites (for instance, by introducing a design pattern like the page object pattern, or by changing the programming language or testing framework), extending a test suite with new data considering corner cases, or even the automatic introduction of waiting commands to synchronize the execution speed of a test suite with that of the application being tested.

AI-Powered Test Automation Tools

The market for test automation tools is constantly evolving, driven by their crucial role in modern software testing. Today, these tools take advantage of various AI techniques to improve different aspects of software testing. A recent survey8 highlighted a significant increase in the availability of new tools for testers. Their number has grown significantly in a short time, and new options have appeared on the market. In the survey, the authors counted more than 100 testing tools available. A new phenomenon that is taking hold is the integration of GenAI mechanisms within these tools (see, for example, Testim and Functionize).

Table 1 summarizes some of the most relevant AI-powered test automation tools in the gray literature today.9

TABLE 1. AI-powered test automation tools.
Tool | Website | Target | AI-Driven Features
Functionize | https://www.functionize.com/ | Web, mobile, API | Self-healing tests, test generation, visual testing
Applitools | https://www.applitools.com/ | Web, mobile | Visual testing
Mabl | https://www.mabl.com/ | Web, mobile, API | Self-healing tests, test maintenance, predictive analytics
Testim | https://www.testim.io/ | Web | Self-healing tests, test generation, test maintenance
Testers.ai | https://testers.ai/ | Web | Exploratory testing, test generation
Appvance | https://appvance.ai/ | Web, mobile | Test generation, self-healing tests, performance testing

Functionize allows developers to describe a test scenario in plain English and automatically converts it into an executable test. If the application's UI changes, Functionize's self-healing capability ensures the test adapts without manual intervention. Applitools takes a different approach, focusing on visual validation, comparing screenshots, and flagging visual bugs while intelligently ignoring dynamic elements such as ads or timers. Mabl learns from user interactions to automatically create test scripts when navigating a web application. If the UI evolves, its self-healing feature updates the tests, and its AI-driven performance monitoring can detect anomalies, like a sudden drop in response time, before they impact users. Testim creates tests from natural language. If an element's location changes, its self-healing locators adjust automatically, and its root cause analysis pinpoints why a test failed. Testers.ai can analyze historical data and code changes to predict where defects could occur, allowing teams to focus their efforts on high-risk areas. Finally, Appvance combines functional and performance testing to run tests that check whether a feature works and evaluate its performance under load.

According to the 2024 State of Testing Report,10 the most popular ways to use AI-powered testing tools are the following. Approximately 25% utilize these tools to create test cases, while 23% focus on optimizing test cases. Furthermore, 20% leverage AI to plan the testing process. The largest group, 32%, employs AI-driven tools for multiple purposes, including generating test cases, creating test automation scripts, managing test data, and identifying bugs in test code.
Special Issue Articles

The special issue of IEEE Software on AI-Powered Test Automation, which we are managing, attracted 12 submissions covering a wide range of topics. We were supported by more than 25 reviewers, who provided constructive feedback and helped the authors improve their contributions. Each submitted manuscript was reviewed by at least two reviewers, and in most cases, by three. Conflicts of interest were strictly managed throughout the review process. Ultimately, only two manuscripts were selected for inclusion in this special issue, reflecting the highly selective nature of the review process. In the following, we summarize the key contributions of the papers included in this special issue.

The article "From Code Generation to Software Testing: AI Copilot with Context-Based Retrieval-Augmented Generation" by Yuchen Wang et al.[A1] presents Copilot for Testing, an AI-assisted automated testing system that integrates bug detection, fix suggestions, and test case generation directly within the development environment. The system extends the capabilities of LLMs through a context-based retrieval-augmented generation (RAG) approach, dynamically retrieving relevant code context to enhance prompt construction (a generic sketch of this retrieval-augmented prompting idea follows these summaries). By modeling the codebase as a graph with context embeddings that update in real time based on code changes, the system optimizes software testing efficiency, accuracy, and coverage. Their evaluation demonstrates significant improvements, including a 31.2% increase in bug detection accuracy, a 12.6% boost in critical test coverage, and a 10.5% higher user acceptance rate compared to traditional methods. The article positions AI as a transformative force in software testing, shifting from a reactive bug detection paradigm to a proactive, context-aware approach that seamlessly integrates with modern CI workflows. By leveraging past work on AI-assisted coding tools like Copilot for Xcode, the authors highlight how the same principles can be applied to software testing, ensuring more reliable and efficient development processes.

The article "Test Amplification for REST APIs: Using 'Out-of-the-Box' Large Language Models" by Tolgahan Bardakci et al.[A2] explores the use of LLMs such as ChatGPT 3.5, ChatGPT 4, and GitHub Copilot for amplifying Representational State Transfer (REST) application programming interface (API) test suites. Given the complexity of testing REST APIs (due to their distributed nature and the necessity of boundary value testing), the study investigates whether LLMs can enhance test coverage while maintaining readability. Using the open source PetStore application as a testbed, the authors compare different prompts and analyze their impact on the number of generated tests, API coverage, and readability. They find that providing an API specification as input significantly improves coverage, and requesting the maximum number of test cases further enhances testing effectiveness. While LLMs occasionally generate invalid or deprecated test cases, they also expose API bugs. ChatGPT 4, in particular, generates the most bug-exposing tests. The study concludes with recommendations for optimizing LLM-generated test prompts, emphasizing the balance between test strength and understandability.
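To give a general feel for retrieval-augmented prompt construction (this is a generic, hypothetical sketch, not the system described in the Wang et al. article), code fragments can be ranked by embedding similarity to the code under test, and the best matches prepended to the test-generation prompt. The EmbeddingClient interface and the prompt wording below are assumptions made only for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical embedding service: maps a code fragment to a dense vector.
interface EmbeddingClient {
    float[] embed(String text);
}

public class RagPromptBuilder {

    /** Builds a test-generation prompt enriched with the most relevant code context. */
    public static String buildPrompt(EmbeddingClient embeddings, String codeUnderTest,
                                     List<String> codebaseFragments, int topK) {
        float[] query = embeddings.embed(codeUnderTest);

        // Retrieve the top-K fragments most similar to the code under test.
        String context = codebaseFragments.stream()
                .sorted(Comparator.comparingDouble(
                        (String fragment) -> -cosine(query, embeddings.embed(fragment))))
                .limit(topK)
                .collect(Collectors.joining("\n\n"));

        return "Relevant context:\n" + context
                + "\n\nGenerate unit tests for:\n" + codeUnderTest;
    }

    // Cosine similarity between two equal-length vectors.
    private static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-9);
    }
}
```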
Will AI Replace Software Testers?

History has shown that while technology transforms job roles, it seldom eradicates them. Instead, it redefines them, bringing new opportunities and challenges. For example, when test automation was first introduced in software testing, many feared that manual testers would become obsolete. However, that prediction did not come true.

All market analysts and scientific researchers agree that, like many other roles,11 the software tester position will be significantly reshaped by the AI revolution. While AI-powered tools are transforming the field by automating repetitive tasks and improving efficiency, human testers remain essential for their critical thinking. This is further supported by a Capgemini survey in 2023,12 which found that 61% of organizations believe that human testers will remain essential, even with AI integration. In the same direction, all available web articles in the gray literature related to AI and software testing agree that AI will not completely replace software testers, at least not in the near future.9

Software testers will not disappear, although their number and role may change,11 for several reasons. First and foremost, humans have always survived technological revolutions: consider the Industrial Revolution, when machines began replacing workers, the rise of industrial assembly lines, and, more recently, the Web revolution. Humans have always leveraged machines to their advantage to complete their tasks faster and achieve better results. Why should this time be different, and why specifically in the context of software testing? Some pessimistic predictions from a 2017 survey on AI's impact on software testing13 have not materialized. Of 328 respondents, 65% expected AI to replace manual testers by 2025, and 59% believed test automation engineers would be replaced by 2027. When predictions repeatedly fail, the anticipated factor (AI, in this case) is likely overestimated in its ability to ensure software quality.

All the web sources analyzed agree that AI cannot fully replace human testers for several key reasons. One major limitation is exploratory testing and critical thinking. Although AI is highly effective in performing repetitive automated tests, it struggles with exploratory testing based on intuition, creativity, and adaptability. Human testers can identify unexpected bugs by thinking beyond predefined test cases. In addition, AI requires human supervision. AI-powered testing tools need proper configuration, fine-tuning, and validation. When AI misinterprets test results, only a human tester can accurately distinguish between false positives, false negatives, or truly critical issues, ensuring the reliability of the testing process. These reasons are also given by ChatGPT-4o itself in response to the question, "Why can AI not replace software testers?" ChatGPT-4o concludes with the following slogan, with which we fully agree: AI + human testers = stronger testing.

In conclusion, the authors of this editorial believe that while AI will enhance the speed and productivity of building and testing software, it will not yet be capable of autonomously developing the software and ensuring its quality.
With all of the code generated using GenAI, automated testing is more crucial than ever. AI-powered test automation is revolutionizing the software testing landscape by improving efficiency, accuracy, and adaptability. Through ML, NLP, and computer vision, AI enables advanced features such as test generation, self-healing tests, or predictive analytics, reducing manual effort and paving the way for faster feedback cycles and higher-quality software.

Future directions include enhanced self-learning capabilities, integration with CI pipelines, and explainable AI to build trust in AI-driven results. Multimodal testing, ethical AI, and human-AI collaboration will further expand the scope of AI-powered testing. However, even with advances in AI, it seems unlikely that we will achieve fully autonomous testing, where AI independently designs, executes, and optimizes tests, or self-correcting software systems, which detect and fix issues automatically. Given the complexity and unpredictability of software, full autonomy remains a distant goal, ensuring that testing will continue to require human oversight and that the role of the tester does not disappear anytime soon.14

About the Authors

FILIPPO RICCA is a full professor at the University of Genova, 16146 Genova, Italy. His research focuses on web application testing and test automation, combining tool development with empirical methods such as case studies, experiments, and surveys. He received ICSE and ICST Most Influential Paper awards for his contributions to the field. Contact him at filippo.ricca@unige.it.

BONI GARCÍA is an associate professor at Universidad Carlos III de Madrid, 28903 Madrid, Spain. His research focuses on automated software testing. He is a committer to the Selenium project and creator of tools like WebDriverManager and Selenium-Jupiter. He is also the author of two books and over 45 research papers. Contact him at boni.garcia@uc3m.es.

MICHEL NASS is a postdoctoral researcher at the Blekinge Institute of Technology, SE-371 79 Karlskrona, Sweden. His work focuses on software testing, with expertise in test automation, test management, and coaching. He holds an M.Sc. in computer science from Chalmers and a Ph.D. in software engineering from BTH. Contact him at michel.nass@inceptive.se.

MARK HARMAN is a research scientist at Meta Platforms, working on software engineering automation in the Instagram Product Performance team. He cofounded Meta's Simulation-Based Testing team and contributed to tools like Sapienz and WW. He is also a part-time professor at University College London, WC1E 6BT London, U.K. He cofounded the field of Search-Based Software Engineering and has received the IEEE Harlan Mills Award, the ACM Outstanding Research Award, and a Royal Academy of Engineering Fellowship. Contact him at mark.harman@ucl.ac.uk.
References

1. I. Goodfellow et al., "Generative adversarial networks," Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020, doi: 10.1145/3422622.
2. A. Fan et al., "Large language models for software engineering: Survey and open problems," in Proc. ICSE Future Softw. Eng. (FoSE), 2023, pp. 31–53, doi: 10.1109/ICSE-FoSE59343.2023.00008.
3. N. Alshahwan, M. Harman, I. Harper, A. Marginean, S. Sengupta, and E. Wang, "Assured LLM-based software engineering (keynote paper)," in Proc. 2nd ICSE Workshop Interoperability Robustness Benchmarking Neural Softw. Eng. (InteNSE), 2024, pp. 7–12.
4. D. Amalfitano, S. Faralli, J. C. R. Hauck, S. Matalonga, and D. Distante, "Artificial intelligence applied to software testing: A tertiary study," ACM Comput. Surv., vol. 56, no. 3, pp. 1–38, 2023, doi: 10.1145/3616372.
5. A. Aleti, "Software testing of generative AI systems: Challenges and opportunities," in Proc. IEEE/ACM Int. Conf. Softw. Eng.: Future Softw. Eng. (ICSE-FoSE), Piscataway, NJ, USA: IEEE Press, 2023, pp. 4–14, doi: 10.1109/ICSE-FoSE59343.2023.00009.
6. Y. Wang, Y. Pan, M. Yan, Z. Su, and T. H. Luan, "A survey on ChatGPT: AI-generated contents, challenges, and solutions," IEEE Open J. Comput. Soc., vol. 4, pp. 280–302, 2023, doi: 10.1109/OJCS.2023.3300321.
7. M. Leotta, H. Z. Yousaf, F. Ricca, and B. García, "AI-generated test scripts for web E2E testing with ChatGPT and Copilot: A preliminary study," in Proc. 28th Int. Conf. Eval. Assessment Softw. Eng. (EASE), 2024, pp. 339–344.
8. F. Ricca, A. Marchetto, and A. Stocco, "A multi-year grey literature review on AI-assisted test automation," 2024, arXiv:2408.06224.
9. F. Ricca, A. Marchetto, and A. Stocco, "AI-based test automation: A grey literature analysis," in Proc. IEEE Int. Conf. Softw. Testing, Verification Validation Workshops (ICSTW), Piscataway, NJ, USA: IEEE Press, 2021, pp. 263–270, doi: 10.1109/ICSTW52544.2021.00051.
10. "State of Testing™ report 2024," PractiTest, Rehovot, Israel, 2024. Accessed: Mar. 18, 2025. [Online]. Available: https://www.practitest.com/assets/pdf/stot-2024.pdf
11. E. Brynjolfsson, T. Mitchell, and D. Rock, "What can machines learn, and what does it mean for occupations and the economy?" AEA Papers Proc., vol. 108, pp. 43–47, May 2018, doi: 10.1257/pandp.20181019.
12. "World quality report 2023–2024," Capgemini, Paris, France, 2023. Accessed: Mar. 18, 2025. [Online]. Available: https://www.capgemini.com/wp-content/uploads/2023/11/WQR_2023_FINAL_WEB_CG.pdf?utm_campaign=Software%2BTesting%2BWeekly&utm_medium=web&utm_source=Software_Testing_Weekly_204
13. T. M. King, J. Arbon, D. Santiago, D. Adamo, W. Chin, and R. Shanmugam, "AI for testing today and tomorrow: Industry perspectives," in Proc. IEEE Int. Conf. Artif. Intell. Testing (AITest), 2019, pp. 81–88, doi: 10.1109/AITest.2019.000-3.
14. L. Layman and R. Vetter, "Generative artificial intelligence and the future of software testing," Computer, vol. 57, no. 1, pp. 27–32, 2024, doi: 10.1109/MC.2023.3306998.

Appendix: Related Articles

A1. Y. Wang, S. Guo, and C. W. Tan, "From code generation to software testing: AI copilot with context-based retrieval-augmented generation," IEEE Softw., vol. 42, no. 4, pp. 34–42, Jul./Aug. 2025, doi: 10.1109/MS.2025.3549628.
A2. T. Bardakci, S. Demeyer, and M. Beyazit, "Test amplification for REST APIs: Using 'out-of-the-box' large language models," IEEE Softw., vol. 42, no. 4, pp. 43–49, Jul./Aug. 2025, doi: 10.1109/MS.2025.3559664.