Publication List

Conference Papers

UML is Back. Or is it? Investigating the Past, Present, and Future of UML in Open Source Software
Joseph Romeo, Marco Raglianti, Csaba Nagy and Michele Lanza
In Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE 2025). 2025.

Abstract:

Since its inception, UML, the Unified Modeling Language, has been touted as the way to go when it comes to designing and documenting software systems. While being an integral part of many university software engineering programs, UML has found little consideration among developers, especially in open source software. Reasons for this include that UML shares some shortcomings with other forms of documentation (e.g., limited availability, outdatedness, inadequate level of detail). We present a study to investigate the evolution and the current situation regarding the use of UML in open source projects. We mined and analyzed ~13k GitHub projects, developing strategies and heuristics to identify UML files through their extensions and contents, for a quantitative analysis of two decades of evolution of the usage of UML. We explored the popularity of UML, derived characteristics of projects leveraging UML, and analyzed the authors, creators and maintainers, of UML artifacts. Our study confirms that UML is indeed still under-utilized. At the same time we found evidence of a resurgence coinciding with the popularity of human-readable text-based formats, defined and used by tools like PlantUML and Mermaid. We discuss how identifying and addressing the new challenges implied by this resurgence could impact the future of UML.
BibTex:

@inproceedings{Romeo2025, author = {Romeo, Joseph and Raglianti, Marco and Nagy, Csaba and Lanza, Michele}, title = {UML is Back. Or is it? Investigating the Past, Present, and Future of UML in Open Source Software}, booktitle = {Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE 2025)}, year = {2025} }
Capturing and Understanding the Drift Between Design, Implementation, and Documentation
Joseph Romeo, Marco Raglianti, Csaba Nagy and Michele Lanza
In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (ICPC 2024), Early Research Achievements (ERA) Track. Lisbon, Portugal, pp. 382-386, apr 15-16, 2024.

Abstract:

UML artifacts constitute a key (but often neglected) asset supporting the comprehension of a system. Design documents "bind" developers in implementation phases and close the loop as documentation of the implemented system itself. Nevertheless, the intended system (design), its current version (implementation), and its documentation, naturally tend to drift apart, negatively impacting the usefulness of UML diagrams contained in such artifacts. We present a novel approach to capture and understand the Design-Implementation-Documentation (DID) drift. We connect UML references in human-readable text-based UML formats (e.g., PlantUML) to the corresponding source code entities (e.g., Java classes), implementing novel metrics to capture the UML coverage of the system. We analyze project and file coverage evolution across releases and commits, with overall, method-level, and attribute-level detailedness, showing how they support DID drift analysis. We present interesting case studies exemplifying how through Drifter, the visual exploration tool we developed to validate our approach, we identify DID drift and possible ways to tackle it in the future.
BibTex:

@inproceedings{Romeo2024, author = {Romeo, Joseph and Raglianti, Marco and Nagy, Csaba and Lanza, Michele}, title = {Capturing and Understanding the Drift Between Design, Implementation, and Documentation}, booktitle = {Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (ICPC 2024)}, year = {2024}, pages = {382-386}, doi = {10.1145/3643916.3644399} }
A Multivocal Mapping Study of MongoDB Smells
Boris Cherry, Jehan Bernard, Thomas Kintziger, Csaba Nagy, Anthony Cleve and Michele Lanza
In Proceedings of the 31st IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2024), Research Papers Track. pp. 792-803, IEEE, mar 12-15, 2024.

Abstract:

Code smells are symptoms of poor design or bad implementation choices. Their automatic detection is helpful for various reasons. For example, the detected smells can guide developers during code inspection to find the causes of maintenance problems. Many code smells have been proposed for several technologies, including database communication, such as ORM or SQL antipatterns. However, despite its popularity, no research has been conducted on MongoDB smells. We present a systematic multivocal literature mapping study, also covering "grey" literature, to build a catalog of MongoDB code smells. After evaluating 1,498 artifacts (e.g., blog posts, online articles, book chapters, scientific papers, presentation slides, and videos) from 12 search engines, we manually reviewed 174 sources and devised a catalog of 76 smells organized into 11 categories. We present the catalog of MongoDB code smells through a series of examples.
BibTex:

@inproceedings{Cherry2024a, author = {Cherry, Boris and Bernard, Jehan and Kintziger, Thomas and Nagy, Csaba and Cleve, Anthony and Lanza, Michele}, title = {A Multivocal Mapping Study of MongoDB Smells}, booktitle = {Proceedings of the 31st IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2024)}, publisher = {IEEE}, year = {2024}, pages = {792-803}, doi = {10.1109/SANER60148.2024.00086} }
SMEAGOL: A Static Code Smell Detector for MongoDB
Boris Cherry, Csaba Nagy, Michele Lanza and Anthony Cleve
In Proceedings of the 31st IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2024), Tools Demo Track. pp. 816-820, IEEE, mar 12-15, 2024.

Abstract:

MongoDB is one of the most popular NoSQL database engines. To foster scalability, it provides multiple features such as schema-less data storage or sharding. However, those new features introduce additional considerations for the maintainer to be careful, which might lead to erroneous implementation choices often referred to as code smells or antipatterns. Detecting and fixing those code smells can play a crucial role for developers in their maintenance efforts. We present SMEAGOL (SMEll and Antipattern detection for monGOdb appLications), a static analysis tool to detect MongoDB code smells in JavaScript applications. SMEAGOL relies on CodeQL and detects code smells by analyzing and extracting all the necessary information (e.g., data structure) from the database access code of the application. We demonstrate it by examining the evolution of MongoDB code smells in five popular open-source projects, showing promising results. Video link: https://youtu.be/h4Xbp9dIFt0 Repository link: https://github.com/bocherry/SMEAGOL_tool
BibTex:

@inproceedings{Cherry2024, author = {Cherry, Boris and Nagy, Csaba and Lanza, Michele and Cleve, Anthony}, title = {SMEAGOL: A Static Code Smell Detector for MongoDB}, booktitle = {Proceedings of the 31st IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2024)}, publisher = {IEEE}, year = {2024}, pages = {816-820}, doi = {10.1109/SANER60148.2024.00088} }
On the Rise of Modern Software Documentation
Marco Raglianti, Csaba Nagy, Roberto Minelli, Bin Lin and Michele Lanza
In Proceedings of the 37th European Conference on Object-Oriented Programming (ECOOP 2023), Pearl/Brave New Idea. Seattle, Washington, United States, Vol. 263, pp. 43:1-43:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, jul 17-21, 2023.

Abstract:

Classical software documentation, as it was conceived and intended decades ago, is not the only reality anymore. Official documentation from authoritative and official sources is being replaced by real-time collaborative platforms and ecosystems that have seen a surge, influenced by changes in society, technology, and best practices. These modern tools influence the way developers document the conception, design, and implementation of software. As a by-product of these shifts, developers are changing their way of communicating about software. Where once official documentation stood as the only truth about a project, we now find a multitude of volatile and heterogeneous documentation sources, forming a complex and ever-changing documentation landscape. Software projects often include a top-level README file with important information, which we leverage to identify their documentation landscape. Starting from ~12K GitHub repositories, we mine their README files to extract links to additional documentation sources. We present a qualitative analysis, revealing multiple dimensions of the documentation landscape (e.g., content type, source type), highlighting important insights. By analyzing instant messaging application links (e.g., Gitter, Slack, Discord) in the histories of README files, we show how this part of the landscape has grown and evolved in the last decade. Our findings show that modern documentation encompasses communication platforms, which are exploding in popularity. This is not a passing phenomenon: On the contrary, it entails a number of unknowns and socio-technical problems the research community is currently ill-prepared to tackle.
BibTex:

@inproceedings{Raglianti2023, author = {Raglianti, Marco and Nagy, Csaba and Minelli, Roberto and Lin, Bin and Lanza, Michele}, editor = {Ali, Karim and Salvaneschi, Guido}, title = {On the Rise of Modern Software Documentation}, booktitle = {Proceedings of the 37th European Conference on Object-Oriented Programming (ECOOP 2023)}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f\"ur Informatik}, year = {2023}, volume = {263}, pages = {43:1-43:24}, doi = {10.4230/LIPIcs.ECOOP.2023.43} }
Contribution-Based Firing of Developers?
Vincenzo Orrei, Marco Raglianti, Csaba Nagy and Michele Lanza
In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023), Ideas, Visions and Reflections Track. San Francisco, California, United States, pp. to appear, ACM, dec 3-9, 2023.

Abstract:

There has been some recent clamor about the developer layoff and turnover policies enacted by high-profile corporate executives. Precisely defining the contributions in software development has always been a thorny issue, as it is difficult to establish a developer's "performance" without recurring to guesswork, due to how software development works and how Git persists history. Taking inspiration from a seemingly informal notion, the pony factor, we present an approach to identify the key developers in a software project. We present an analysis of 1,011 GitHub repositories, providing fact-based reflections on development contributions.
BibTex:

@inproceedings{Orrei2023, author = {Orrei, Vincenzo and Raglianti, Marco and Nagy, Csaba and Lanza, Michele}, title = {Contribution-Based Firing of Developers?}, booktitle = {Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)}, publisher = {ACM}, year = {2023}, pages = {to appear}, doi = {10.1145/3611643.3613085} }
SYN: Ultra-Scale Software Evolution Comprehension
Gianlorenzo Occhipinti, Csaba Nagy, Roberto Minelli and Michele Lanza
In Proceedings of the 31st IEEE/ACM International Conference on Program Comprehension (ICPC 2023), Tool Demo Track. Melbourne, Australia, pp. 69-73, IEEE, may 15-16, 2023.

Abstract:

The comprehension of very large-scale software system evolution remains a challenging problem due to the sheer amount of time-based (i.e., a sequence of changes) data and its intrinsically complex nature (i.e., heterogeneous changes across the entire system source code). It is a necessary step for program comprehension, as systems are not simply created out of thin air in a bang, but are the sum of many changes over long periods of time, by various actors and due to various circumstances. We present SYN, a web-based tool that uses versatile visualization and data processing techniques to create scalable depictions of ultra-scale software system evolution. SYN has been successfully applied on several systems versioned on GitHub, including the nearly 20-year history of the Linux operating system, which totals more than one million commits on more than 100k evolving files. Webpage of the tool and demo video: https://syn.si.usi.ch
BibTex:

@inproceedings{Occhipinti2023, author = {Occhipinti, Gianlorenzo and Nagy, Csaba and Minelli, Roberto and Lanza, Michele}, title = {SYN: Ultra-Scale Software Evolution Comprehension}, booktitle = {Proceedings of the 31st IEEE/ACM International Conference on Program Comprehension (ICPC 2023)}, publisher = {IEEE}, year = {2023}, pages = {69-73}, doi = {10.1109/ICPC58990.2023.00020} }
DiscOrDance: Visualizing Software Developers Communities on Discord
Marco Raglianti, Csaba Nagy, Roberto Minelli and Michele Lanza
In Proceedings of the 38th International Conference on Software Maintenance and Evolution (ICSME 2022), Tool Demo Track. Limassol, Cyprus, pp. 474-478, IEEE, oct 3-7, 2022.

Abstract:

New communication platforms have emerged to support developers in finding and creating the knowledge they need for program comprehension, maintenance, and evolution. Instant messaging applications are supplanting developer mailing lists in collaborative development toolchains. These applications provide a new medium, supporting faster and richer communication (e.g., embedded previews, images, files, videos). Research so far focused on extracting information from these platforms, but there is a lack of tools to visually and interactively explore them. We present DiscOrDance, a tool for the interactive visual exploration of the complete message history of a Discord server. We show how three categories of views elicit insights on aspects of the structure, members, and software related content of a Discord server. We demonstrate use cases of DiscOrDance to support software maintenance and evolution activities on an active software developer community, the Pharo Discord server.
BibTex:

@inproceedings{Raglianti2022a, author = {Raglianti, Marco and Nagy, Csaba and Minelli, Roberto and Lanza, Michele}, title = {DiscOrDance: Visualizing Software Developers Communities on Discord}, booktitle = {Proceedings of the 38th International Conference on Software Maintenance and Evolution (ICSME 2022)}, publisher = {IEEE}, year = {2022}, pages = {474-478}, doi = {10.1109/ICSME55016.2022.00062} }
Using Discord Conversations as Program Comprehension Aid
Marco Raglianti, Csaba Nagy, Roberto Minelli and Michele Lanza
In Proceedings of the 30th International Conference on Program Comprehension (ICPC 2022), ERA Track. Virtual Event, pp. 597-601, ACM, may 16-17, 2022.

Abstract:

Modern communication platforms used in software development host daily conversations among developers and users about a wide range of topics pertaining to software systems, such as language features, APIs, code artifacts like classes and methods, design patterns, usage examples, code reviews, bug reporting and fixing. Discord servers are one of these virtual community hubs that have seen a steep rise in popularity, as coordination and aggregation means for communities of developers. Although Discord supports filter-based search functionalities, the sheer volume, velocity, and small granularity of single messages makes it very hard to find useful results, let alone complete discussions revolving around particular themes. One reason is that the concept of a discussion, which we call a conversation, does not exist as an explicit concept. We argue that extracting and analyzing such conversations can be used fruitfully to aid program comprehension. We present an approach that reconstructs the conversations that take place on a software community Discord server, focusing on software-related conversations: Our approach binds the conversations to the discussed artifacts. Leveraging our approach, we built a tool that enables the interactive exploration of the conversations' contents. We illustrate its usefulness through a number of examples that highlight how the insights obtained serve as an additional form of software documentation and program comprehension aid.
BibTex:

@inproceedings{Raglianti2022, author = {Raglianti, Marco and Nagy, Csaba and Minelli, Roberto and Lanza, Michele}, title = {Using Discord Conversations as Program Comprehension Aid}, booktitle = {Proceedings of the 30th International Conference on Program Comprehension (ICPC 2022)}, publisher = {ACM}, year = {2022}, pages = {597-601}, doi = {10.1145/3524610.3528388} }
Static Analysis of Database Accesses in MongoDB Applications
Boris Cherry, Pol Benats, Maxime Gobert, Loup Meurice, Csaba Nagy and Anthony Cleve
In Proceedings of the 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2022), ERA Track. Honolulu, HI, USA, pp. 930-934, IEEE Computer Society, mar 15-18, 2022.

Abstract:

The increasing data volume and the variety of data formats of modern data-intensive systems unveiled the boundaries of traditional relational database management systems. NoSQL technologies aim to fulfill shortcomings through numerous features such as allowing unstructured, schema-less data storage. However, new features also pose challenges to software engineering techniques that used to work well for relational databases. In this paper, we present an approach to retrieve database accesses in JavaScript applications that use MongoDB. The approach handles JavaScript's highly dynamic and typeless nature through heuristics to avoid collision with third-party libraries. The aim is to identify the part of the source code involved in the database communication as the first step towards additional static analysis approaches. We evaluated the approach on an oracle of 307 open-source projects and reached a precision of 78%. We demonstrate potential use cases of the approach through case studies on the evolution of open-source systems.
BibTex:

@inproceedings{Cherry2022, author = {Cherry, Boris and Benats, Pol and Gobert, Maxime and Meurice, Loup and Nagy, Csaba and Cleve, Anthony}, title = {Static Analysis of Database Accesses in MongoDB Applications}, booktitle = {Proceedings of the 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2022)}, publisher = {IEEE Computer Society}, year = {2022}, pages = {930-934}, doi = {10.1109/SANER53432.2022.00111} }
M3triCity: Visualizing Evolving Software & Data Cities
Susanna Ardigò, Csaba Nagy, Roberto Minelli and Michele Lanza
In Proceedings of the 44th International Conference on Software Engineering (ICSE 2022), Demo Track. Virtual, pp. 130-133, ACM, may 8-20, 2022.

Abstract:

The city metaphor for visualizing software systems in 3D has been widely explored and has led to many diverse implementations and approaches. Common among all approaches is a focus on the software artifacts, while the aspects pertaining to the data and information (stored both in databases and files) used by a system are seldom taken into account. We present M3triCity, an interactive web application whose goal is to visualize object-oriented software systems, their evolution, and the way they access data and information. We illustrate how it can be used for program comprehension and evolution analysis of data-intensive software systems. Demo video URL: https://youtu.be/uBMvZFIlWtk
BibTex:

@inproceedings{Ardigo2022, author = {Ardigò, Susanna and Nagy, Csaba and Minelli, Roberto and Lanza, Michele}, title = {M3triCity: Visualizing Evolving Software \& Data Cities}, booktitle = {Proceedings of the 44th International Conference on Software Engineering (ICSE 2022)}, publisher = {ACM}, year = {2022}, pages = {130-133}, doi = {10.1145/3510454.3516831} }
FeaRS: Recommending Complete Android Method Implementations
Fengcai Wen, Valentina Ferrari, Emad Aghajani, Csaba Nagy, Michele Lanza and Gabriele Bavota
In Proceedings of the 37th International Conference on Software Maintenance and Evolution (ICSME 2021), Tool Demo Track. Luxembourg, pp. 589-593, IEEE, sep, 2021.

Abstract:

Several techniques have been proposed in the literature to support code completion, showing excellent results in predicting the next few tokens a developer is likely to type given the current context. Only recently, approaches pushing the boundaries of code completion (e.g., by presenting entire code statements) have been proposed. In this line of research, we present FeaRS, a recommender system that, given the current code a developer is writing in the IDE, recommends the next complete method to be implemented. FeaRS has been deployed to learn "implementation patterns" (i.e., groups of methods usually implemented within the same task) by continuously mining open-source Android projects. Such knowledge is leveraged to provide method recommendations when the code written by the developer in the IDE matches an "implementation pattern". Preliminary results of FeaRS' accuracy show its potential as well as some open challenges to overcome.
BibTex:

@inproceedings{Wen2021a, author = {Wen, Fengcai and Ferrari, Valentina and Aghajani, Emad and Nagy, Csaba and Lanza, Michele and Bavota, Gabriele}, title = {FeaRS: Recommending Complete Android Method Implementations}, booktitle = {Proceedings of the 37th International Conference on Software Maintenance and Evolution (ICSME 2021)}, publisher = {IEEE}, year = {2021}, pages = {589-593}, doi = {10.1109/ICSME52107.2021.00062} }
Siri, Write the Next Method
Fengcai Wen, Emad Aghajani, Csaba Nagy, Michele Lanza and Gabriele Bavota
In Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering (ICSE 2021). Madrid, ES, pp. 138-149, IEEE, may 23-29, 2021.

Abstract:

Code completion is one of the killer features of Integrated Development Environments (IDEs), and researchers have proposed different methods to improve its accuracy. While these techniques are valuable to speed up code writing, they are limited to recommendations related to the next few tokens a developer is likely to type given the current context. In the best case, they can recommend a few APIs that a developer is likely to use next. We present FeaRS, a novel retrieval-based approach that, given the current code a developer is writing in the IDE, can recommend the next complete method (i.e., signature and method body) that the developer is likely to implement. To do this, FeaRS exploits “implementation patterns” (i.e., groups of methods usually implemented within the same task) learned by mining thousands of open source projects. We instantiated our approach to the specific context of Android apps. A large-scale empirical evaluation we performed across more than 20k apps shows encouraging preliminary results, but also highlights future challenges to overcome.
BibTex:

@inproceedings{Wen2021, author = {Wen, Fengcai and Aghajani, Emad and Nagy, Csaba and Lanza, Michele and Bavota, Gabriele}, title = {Siri, Write the Next Method}, booktitle = {Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering (ICSE 2021)}, publisher = {IEEE}, year = {2021}, pages = {138-149}, doi = {10.1109/ICSE43902.2021.00025} }
Visualizing Discord Servers
Marco Raglianti, Roberto Minelli, Csaba Nagy and Michele Lanza
In Proceedings of the 9th IEEE Working Conference on Software Visualization (VISSOFT 2021), New Ideas and Emerging Results (NIER) and Tool Demonstrations (TD) Track. Luxembourg, pp. 150-154, IEEE, sep 27-28, 2021.

Abstract:

The last decade has seen the rise of global software community platforms, such as Slack, Gitter, and Discord. They allow developers to discuss implementation issues, report bugs, and, in general, interact with one another. Such real-time communication platforms are thus slowly complementing, if not replacing, more traditional communication channels, such as development mailing lists. Apart from simple text messaging and conference calls, they allow the sharing of any type of content, such as videos, images, and source code. This is turning such platforms into precious information sources when it comes to searching for documentation and understanding design and implementation choices. However, the velocity and volatility of the contents shared and discussed on such platforms, combined with their often informal structure, makes it difficult to grasp and differentiate the relevant pieces of information. We present a visual analytics approach, supported by a tool named DiscOrDance, which provides numerous custom views to support the understanding of Discord servers in terms of their structure, contents, and community. We illustrate DiscOrDance, using as running example the public Pharo development community Discord Server, which counts to date ~180k messages shared among ~2,900 developers, spanning 5 years of history. Based on our analyses, we distill and discuss interesting insights and lessons learned.
BibTex:

@inproceedings{Raglianti2021, author = {Raglianti, Marco and Minelli, Roberto and Nagy, Csaba and Lanza, Michele}, title = {Visualizing Discord Servers}, booktitle = {Proceedings of the 9th IEEE Working Conference on Software Visualization (VISSOFT 2021)}, publisher = {IEEE}, year = {2021}, pages = {150-154}, doi = {10.1109/VISSOFT52517.2021.00029} }
Challenges and Perils of Testing Database Manipulation Code
Maxime Gobert, Csaba Nagy, Henrique Rocha, Serge Demeyer and Anthony Cleve
In Proceedings of the 33rd International Conference on Advanced Information Systems Engineering (CAiSE 2021). Melbourne, Victoria, Australia, pp. 229-245, Springer International Publishing, jun, 2021.

Abstract:

Software testing enables development teams to maintain the quality of a software system while it evolves. The database manipulation code requires special attention in this context. However, it is often neglected and suffers from software maintenance problems. In this paper, we investigate the current state-of-the-practice in testing database manipulation code. We first analyse the code of 72 projects mined from Libraries.io to get an impression of the test coverage for database code. We confirm that the database is poorly tested: 46% of the projects did not cover with tests half of their database access methods, and 33% of the projects did not cover the database code at all. To understand the difficulties in testing database code, we analysed 532 questions on StackExchange sites and deduced a taxonomy. We found that developers mostly look for insights on general best practices to test database access code. They also have more technical questions related to DB handling, mocking, parallelisation or framework/tool usage. This investigation lays the basis for future research on improving database code testing.
BibTex:

@inproceedings{Gobert2021, author = {Gobert, Maxime and Nagy, Csaba and Rocha, Henrique and Demeyer, Serge and Cleve, Anthony}, title = {Challenges and Perils of Testing Database Manipulation Code}, booktitle = {Proceedings of the 33rd International Conference on Advanced Information Systems Engineering (CAiSE 2021)}, publisher = {Springer International Publishing}, year = {2021}, pages = {229-245}, doi = {10.1007/978-3-030-79382-1_14} }
Visualizing GitHub Issues
Aron Fiechter, Roberto Minelli, Csaba Nagy and Michele Lanza
In Proceedings of the 9th IEEE Working Conference on Software Visualization (VISSOFT 2021), New Ideas and Emerging Results (NIER) and Tool Demonstrations (TD) Track. Luxembourg, pp. 155-159, IEEE, sep 27-28, 2021.

Abstract:

The rise of distributed version control systems, such as git, and platforms built on top of it, such as GitHub, has triggered a change in how software is developed. Most notably, state-of-the-art practice foresees the use of pull requests and issues, enriched by means to enable discussions among the involved people. Platforms like GitHub and GitLab have thus turned into comprehensive and cohesive modern software development environments, also offering additional mechanisms, such as code review tools and a transversal support for continuous integration and deployment. However, the plethora of concepts, mechanisms, and their interconnections are stored and presented in textual form, which makes the understanding of the underlying evolutionary processes difficult. We introduce the notion of an issue tale, a visual narrative of the events and actors revolving around any GitHub issue, and present an approach, implemented as an interactive visual analytics tool, to depict and analyze the relevant information pertaining to issue tales. We illustrate our approach and its implementation on several open-source software systems.
BibTex:

@inproceedings{Fiechter2021, author = {Fiechter, Aron and Minelli, Roberto and Nagy, Csaba and Lanza, Michele}, title = {Visualizing GitHub Issues}, booktitle = {Proceedings of the 9th IEEE Working Conference on Software Visualization (VISSOFT 2021)}, publisher = {IEEE}, year = {2021}, pages = {155-159}, doi = {10.1109/VISSOFT52517.2021.00030} }
An Empirical Study of (Multi-) Database Models in Open-Source Projects
Pol Benats, Maxime Gobert, Loup Meurice, Csaba Nagy and Anthony Cleve
In Proceedings of the 40th International Conference on Conceptual Modeling (ER 2021). St. John's, Canada, pp. 87-101, Springer, oct, 2021.

Abstract:

Managing data-intensive systems has long been recognized as an expensive and error-prone process. This is mainly due to the often implicit consistency relationships that hold between applications and their database. As new technologies emerged for specialized purposes (e.g., graph databases, document stores), the joint use of database models has also become popular. There are undeniable benefits of such multi-database models where developers combine various technologies. However, the side effects on design, querying, and maintenance are not well-known yet. In this paper, we study multi-database models in software systems by mining major open-source repositories. We consider four years of history, from 2017 to 2020, of a total number of 40,609 projects with databases. Our results confirm the emergence of hybrid data-intensive systems as we found (multi-) database models (e.g., relational and non-relational) used together in 16% of all database-dependent projects. One percent of the systems added, deleted, or changed a database during the four years. The majority (62%) of these systems had a single database before becoming hybrid, and another significant part (19%) became "mono-database" after initially using multiple databases. We examine the evolution of these systems to understand the rationale of the design choices of the developers. Our study aims to guide future research towards new challenges posed by those emerging data management architectures.
BibTex:

@inproceedings{Benats2021, author = {Benats, Pol and Gobert, Maxime and Meurice, Loup and Nagy, Csaba and Cleve, Anthony}, title = {An Empirical Study of (Multi-) Database Models in Open-Source Projects}, booktitle = {Proceedings of the 40th International Conference on Conceptual Modeling (ER 2021)}, publisher = {Springer}, year = {2021}, pages = {87-101}, doi = {10.1007/978-3-030-89022-3_8} }
Visualizing Data in Software Cities
Susanna Ardigò, Csaba Nagy, Roberto Minelli and Michele Lanza
In Proceedings of the 9th IEEE Working Conference on Software Visualization (VISSOFT 2021), New Ideas and Emerging Results (NIER) and Tool Demonstrations (TD) Track. Luxembourg, pp. 145-149, IEEE, sep 27-28, 2021.
DOI PDF

Abstract:

The city metaphor for visualizing software systems in 3D has been widely explored and it has led to many diverse implementations and approaches. However, when looking at software systems in general, and when using specifically a city approach, it is evident that something is missing: The data. Indeed, software systems are intrinsically driven by data, which is usually managed using databases or often also simply stored in files coming in a variety of formats, such as CSV, XML, and JSON. While such data files are part of a project's file system and can thus be easily retrieved, the situation is different for databases: A database is usually not contained in the file system, and its presence can only be inferred from the source code which contains the database accesses. We present an extension of the CodeCity implementation, M3tricity2, with two new contributions: First, we consider data files and use simple metrics to integrate them in the visualization seamlessly. Second, we present a novel way to add a database to the visualization by making use of the one remaining space left unused: the sky and the underground. We present our contributions and illustrate them on various software systems.
BibTex:

@inproceedings{Ardigo2021, author = {Ardigò, Susanna and Nagy, Csaba and Minelli, Roberto and Lanza, Michele}, title = {Visualizing Data in Software Cities}, booktitle = {Proceedings of the 9th IEEE Working Conference on Software Visualization (VISSOFT 2021)}, publisher = {IEEE}, year = {2021}, pages = {145-149}, doi = {10.1109/VISSOFT52517.2021.00028} }
An Empirical Study of Quick Remedy Commits
Fengcai Wen, Csaba Nagy, Michele Lanza and Gabriele Bavota
In Proceedings of the 28th IEEE/ACM International Conference on Program Comprehension (ICPC 2020). Seoul, Republic of Korea, pp. 60-71, ACM, jul 13-15, 2020.
DOI PDF

Abstract:

Software systems are continuously modified to implement new features, to fix bugs, and to improve quality attributes. Most of these activities are not atomic changes, but rather the result of several related changes affecting different parts of the code. For this reason, it may happen that developers omit some of the needed changes and, as a consequence, leave a task partially unfinished, introduce technical debt or, in the worst case scenario, inject bugs. Knowing the changes that are mistakenly omitted by developers can help in designing recommender systems able to automatically identify risky situations in which, for example, the developer is likely to be pushing an incomplete change to the software repository. We present a qualitative study investigating "quick remedy commits" performed by developers with the goal of implementing changes omitted in previous commits. With quick remedy commits we refer to commits that (i) quickly follow a commit performed by the same developer in the same repository, and (ii) aim at remedying issues introduced as the result of code changes omitted in the previous commit (e.g., fix references to code components that have been broken as a consequence of a rename refactoring). Through a manual analysis of 500 quick remedy commits, we define a taxonomy categorizing the types of changes that developers tend to omit. The defined taxonomy can guide the development of tools aimed at detecting omitted changes, and possibly autocomplete them.
BibTex:

@inproceedings{Wen2020, author = {Wen, Fengcai and Nagy, Csaba and Lanza, Michele and Bavota, Gabriele}, title = {An Empirical Study of Quick Remedy Commits}, booktitle = {Proceedings of the 28th IEEE/ACM International Conference on Program Comprehension (ICPC 2020)}, publisher = {ACM}, year = {2020}, pages = {60-71}, doi = {10.1145/3387904.3389266} }
Visualizing Evolving Software Cities
Federico Pfahler, Roberto Minelli, Csaba Nagy and Michele Lanza
In Proceedings of the 8th IEEE Working Conference on Software Visualization (VISSOFT 2020), NIER and Tool Demo Track. Adelaide, SA, Australia, pp. 22-26, IEEE, sep 28-29, 2020.
DOI PDF

Abstract:

Visualization approaches that leverage a 3D city metaphor have become popular. Numerous variations, including virtual and augmented reality have emerged. Despite its popularity, the city metaphor falls short when depicting the evolution of a system, which results in buildings and districts moving around in unpredictable ways. We present a novel approach to visualize software systems as evolving cities that treats evolution as a first-class concept. It renders with fidelity not only changes but also refactorings in a comprehensive way. To do so, we developed custom ways to traverse time. We implemented our approach in a publicly accessible web-based platform named m3triCity.
BibTex:

@inproceedings{Pfahler2020, author = {Pfahler, Federico and Minelli, Roberto and Nagy, Csaba and Lanza, Michele}, title = {Visualizing Evolving Software Cities}, booktitle = {Proceedings of the 8th IEEE Working Conference on Software Visualization (VISSOFT 2020)}, publisher = {IEEE}, year = {2020}, pages = {22-26}, doi = {10.1109/VISSOFT51673.2020.00007} }
On the Prevalence, Impact, and Evolution of SQL code smells in Data-Intensive Systems
Biruk Asmare Muse, Masud Rahman, Csaba Nagy, Anthony Cleve, Foutse Khomh and Giuliano Antoniol
In Proceedings of the 17th International Conference on Mining Software Repositories (MSR 2020). Seoul, Republic of Korea, pp. 327-338, ACM, jun 29-30, 2020.
DOI PDF

Abstract:

Code smells indicate software design problems that harm software quality. Data-intensive systems that frequently access databases often suffer from SQL code smells besides the traditional smells. While there have been extensive studies on traditional code smells, recently, there has been a growing interest in SQL code smells. In this paper, we conduct an empirical study to investigate the prevalence and evolution of SQL code smells in open source, data-intensive systems. We collected 150 projects and examined both traditional and SQL code smells in these projects. Our investigation delivers several important findings. First, SQL code smells are indeed prevalent in data-intensive software systems. Second, SQL code smells have a weak co-occurrence with traditional code smells. Third, SQL code smells have a weaker association with bugs than that of traditional code smells. Fourth, SQL code smells are more likely to be introduced at the beginning of the project lifetime and likely to be left in the code without a fix, compared to traditional code smells. Overall, our results show that SQL code smells are indeed prevalent and persistent in the studied data-intensive software systems. Developers should be aware of these smells and consider detecting and refactoring SQL code smells and traditional code smells separately, using dedicated tools.
BibTex:

@inproceedings{Muse2020, author = {Muse, Biruk Asmare and Rahman, Masud and Nagy, Csaba and Cleve, Anthony and Khomh, Foutse and Antoniol, Giuliano}, title = {On the Prevalence, Impact, and Evolution of SQL code smells in Data-Intensive Systems}, booktitle = {Proceedings of the 17th International Conference on Mining Software Repositories (MSR 2020)}, publisher = {ACM}, year = {2020}, pages = {327-338}, doi = {10.1145/3379597.3387467} }
Automated Identification of On-hold Self-admitted Technical Debt
Rungroj Maipradit, Bin Lin, Csaba Nagy, Gabriele Bavota, Michele Lanza, Hideaki Hata and Kenichi Matsumoto
In Proceedings of the 20th International Working Conference on Source Code Analysis and Manipulation (SCAM 2020). Adelaide, Australia, pp. 54-64, IEEE, sep 27-28, 2020.
DOI PDF

Abstract:

Modern software is developed under considerable time pressure, which implies that developers more often than not have to resort to compromises when it comes to code that is well written and code that just does the job. This has led over the past decades to the concept of “technical debt”, a short-term hack that potentially generates long-term maintenance problems. Self-admitted technical debt (SATD) is a particular form of technical debt: developers consciously perform the hack but also document it in the code by adding comments as a reminder (or as an admission of guilt). We focus on a specific type of SATD, namely “On-hold” SATD, in which developers document in their comments the need to halt an implementation task due to conditions outside of their scope of work (e.g., an open issue must be closed before a function can be implemented). We present an approach, based on regular expressions and machine learning, which is able to detect issues referenced in code comments, and to automatically classify the detected instances as either “On-hold” (the issue is referenced to indicate the need to wait for its resolution before completing a task), or as “cross-reference” (the issue is referenced to document the code, for example to explain the rationale behind an implementation choice). Our approach also mines the issue tracker of the projects to check if the On-hold SATD instances are “superfluous” and can be removed (i.e., the referenced issue has been closed, but the SATD is still in the code). Our evaluation confirms that our approach can indeed identify relevant instances of On-hold SATD. We illustrate its usefulness by identifying superfluous On-hold SATD instances in open source projects as confirmed by the original developers.
BibTex:

@inproceedings{Maipradit2020, author = {Maipradit, Rungroj and Lin, Bin and Nagy, Csaba and Bavota, Gabriele and Lanza, Michele and Hata, Hideaki and Matsumoto, Kenichi}, title = {Automated Identification of On-hold Self-admitted Technical Debt}, booktitle = {Proceedings of the 20th International Working Conference on Source Code Analysis and Manipulation (SCAM 2020)}, publisher = {IEEE}, year = {2020}, pages = {54-64}, doi = {10.1109/SCAM51674.2020.00011} }
Software Documentation: The Practitioners' Perspective
Emad Aghajani, Csaba Nagy, Mario Linares-Vásquez, Laura Moreno, Gabriele Bavota, Michele Lanza and David C. Shepherd
In Proceedings of the 42nd International Conference on Software Engineering (ICSE 2020). Seoul, South Korea, pp. 590-601, ACM, jun, 2020.
DOI PDF

Abstract:

In theory, (good) documentation is an invaluable asset to any software project, as it helps stakeholders to use, understand, maintain, and evolve a system. In practice, however, documentation is generally affected by numerous shortcomings and issues, such as insufficient and inadequate content and obsolete, ambiguous information. To counter this, researchers are investigating the development of advanced recommender systems that automatically suggest high-quality documentation, useful for a given task. A crucial first step is to understand what quality means for practitioners and what information is actually needed for specific tasks. We present two surveys performed with 146 practitioners to investigate (i) the documentation issues they perceive as more relevant together with solutions they apply when these issues arise; and (ii) the types of documentation considered as important given specific tasks. Our findings can help researchers in designing the next generation of documentation recommender systems.
BibTex:

@inproceedings{Aghajani2020, author = {Aghajani, Emad and Nagy, Csaba and Linares-Vásquez, Mario and Moreno, Laura and Bavota, Gabriele and Lanza, Michele and Shepherd, David C.}, title = {Software Documentation: The Practitioners' Perspective}, booktitle = {Proceedings of the 42nd International Conference on Software Engineering (ICSE 2020)}, publisher = {ACM}, year = {2020}, pages = {590-601}, doi = {10.1145/3377811.3380405} }
A Large-scale Empirical Study on Code-Comment Inconsistencies
Fengcai Wen, Csaba Nagy, Gabriele Bavota and Michele Lanza
In Proceedings of the 27th International Conference on Program Comprehension (ICPC 2019). Montreal, Quebec, Canada, pp. 53-64, IEEE, may 25-26, 2019.
DOI PDF

Abstract:

Code comments are a primary means to document source code. Keeping comments up-to-date during code change activities requires substantial time and attention. For this reason, researchers have proposed methods to detect code-comment inconsistencies (i.e., comments that are not kept in sync with the code they document) and studies have been conducted to investigate this phenomenon. However, these studies were performed at a small scale, relying on quantitative analysis, thus limiting the empirical knowledge about code-comment inconsistencies. We present the largest study to date investigating how code and comments co-evolve. The study has been performed by mining 1.3 billion AST-level changes from the complete history of 1,500 systems. Moreover, we manually analyzed 500 commits to define a taxonomy of code-comment inconsistencies fixed by developers. Our analysis discloses the extent to which different types of code changes (e.g., change of selection statements) trigger updates to the related comments, identifying cases in which code-comment inconsistencies are more likely to be introduced. The defined taxonomy categorizes the types of inconsistencies fixed by developers. Our results can guide the development of tools aimed at detecting and fixing code-comment inconsistencies.
BibTex:

@inproceedings{Wen2019, author = {Wen, Fengcai and Nagy, Csaba and Bavota, Gabriele and Lanza, Michele}, title = {A Large-scale Empirical Study on Code-Comment Inconsistencies}, booktitle = {Proceedings of the 27th International Conference on Program Comprehension (ICPC 2019)}, publisher = {IEEE}, year = {2019}, pages = {53-64}, doi = {10.1109/ICPC.2019.00019} }
On the Quality of Identifiers in Test Code
Bin Lin, Csaba Nagy, Gabriele Bavota, Andrian Marcus and Michele Lanza
In Proceedings of the 19th International Working Conference on Source Code Analysis and Manipulation (SCAM 2019). Cleveland, OH, USA, pp. 204-215, IEEE, sep, 2019.
DOI PDF

Abstract:

Meaningful, expressive identifiers in source code can enhance the readability and reduce comprehension efforts. Over the past years, researchers have devoted considerable effort to understanding and improving the naming quality of identifiers in source code. However, little attention has been given to test code, an important resource during program comprehension activities. To better grasp identifier quality in test code, we conducted a survey involving manually written and automatically generated test cases from ten open source software projects. The survey results indicate that test cases contain low quality identifiers, including the manually written ones, and that the quality of identifiers is lower in test code than in production code. We also investigated the use of three state-of-the-art rename refactoring recommenders for improving test code identifiers. The analysis highlights their limitations when applied to test code and supports mapping out a research agenda for future work in the area.
BibTex:

@inproceedings{Lin2019a, author = {Lin, Bin and Nagy, Csaba and Bavota, Gabriele and Marcus, Andrian and Lanza, Michele}, title = {On the Quality of Identifiers in Test Code}, booktitle = {Proceedings of the 19th International Working Conference on Source Code Analysis and Manipulation (SCAM 2019)}, publisher = {IEEE}, year = {2019}, pages = {204-215}, doi = {10.1109/SCAM.2019.00031} }
On the Impact of Refactoring Operations on Code Naturalness
Bin Lin, Csaba Nagy, Gabriele Bavota and Michele Lanza
In Proceedings of the 26th International Conference on Software Analysis, Evolution and Reengineering (SANER 2019), ERA Track. Hangzhou, China, pp. 594-598, IEEE, feb 24-27, 2019.
DOI PDF

Abstract:

Recent studies have demonstrated that software is natural, that is, its source code is highly repetitive and predictable like human languages. Also, previous studies suggested the existence of a relationship between code quality and its naturalness, presenting empirical evidence showing that buggy code is “less natural” than non-buggy code. We conjecture that this quality-naturalness relationship could be exploited to support refactoring activities (e.g., to locate source code areas in need of refactoring). We perform a first step in this direction by analyzing whether refactoring can improve the naturalness of code. We use state-of-the-art tools to mine a large dataset of refactoring operations performed in open source systems. Then, we investigate the impact of different types of refactoring operations on the naturalness of the impacted code. We found that (i) code refactoring does not necessarily increase the naturalness of the refactored code; and (ii) the impact on the code naturalness strongly depends on the type of refactoring operations.
BibTex:

@inproceedings{Lin2019, author = {{Lin}, Bin and {Nagy}, Csaba and {Bavota}, Gabriele and {Lanza}, Michele}, title = {On the Impact of Refactoring Operations on Code Naturalness}, booktitle = {Proceedings of the 26th International Conference on Software Analysis, Evolution and Reengineering (SANER 2019)}, publisher = {IEEE}, year = {2019}, pages = {594-598}, doi = {10.1109/SANER.2019.8667992} }
Software Documentation Issues Unveiled
Emad Aghajani, Csaba Nagy, Olga Lucero Vega-Márquez, Mario Linares-Vásquez, Laura Moreno, Gabriele Bavota and Michele Lanza
In Proceedings of the 41st International Conference on Software Engineering (ICSE 2019). Montréal, QC, Canada, pp. 1199-1210, IEEE, may 25-31, 2019.
DOI PDF

Abstract:

(Good) Software documentation provides developers and users with a description of what a software system does, how it operates, and how it should be used. For example, technical documentation (e.g., an API reference guide) aids developers during evolution/maintenance activities, while a user manual explains how users are to interact with a system. Despite its intrinsic value, the creation and the maintenance of documentation is often neglected, negatively impacting its quality and usefulness, ultimately leading to a generally unfavorable take on documentation. Previous studies investigating documentation issues have been based on surveying developers, which naturally leads to a somewhat biased view of problems affecting documentation. We present a large scale empirical study, where we mined, analyzed, and categorized 878 documentation-related artifacts stemming from four different sources, namely mailing lists, Stack Overflow discussions, issue repositories, and pull requests. The result is a detailed taxonomy of documentation issues from which we infer a series of actionable proposals both for researchers and practitioners.
BibTex:

@inproceedings{Aghajani2019, author = {Aghajani, Emad and Nagy, Csaba and Vega-Márquez, Olga Lucero and Linares-Vásquez, Mario and Moreno, Laura and Bavota, Gabriele and Lanza, Michele}, title = {Software Documentation Issues Unveiled}, booktitle = {Proceedings of the 41st International Conference on Software Engineering (ICSE 2019)}, publisher = {IEEE}, year = {2019}, pages = {1199-1210}, doi = {10.1109/ICSE.2019.00122} }
SQLInspect: A Static Analyzer to Inspect Database Usage in Java Applications
Csaba Nagy and Anthony Cleve
In Proceedings of the 40th IEEE/ACM International Conference on Software Engineering: Companion (ICSE 2018), Demonstrations Track. Gothenburg, Sweden, pp. 93-96, ACM, may, 2018.
DOI PDF

Abstract:

We present SQLInspect, a tool intended to assist developers who deal with SQL code embedded in Java applications. It is integrated into Eclipse as a plug-in that is able to extract SQL queries from Java code through static string analysis. It parses the extracted queries and performs various analyses on them. As a result, one can readily explore the source code which accesses a given part of the database, or which is responsible for the construction of a given SQL query. SQL-related metrics and common coding mistakes are also used to spot inefficiently or defectively performing SQL statements and to identify poorly designed classes, like those that construct many queries via complex control-flow paths. SQLInspect is a novel tool that relies on recent query extraction approaches. It currently supports Java applications working with JDBC and SQL code written for MySQL or Apache Impala. Check out the live demo of SQLInspect at http://perso.unamur.be/~cnagy/sqlinspect.
BibTex:

@inproceedings{Nagy2018, author = {Nagy, Csaba and Cleve, Anthony}, title = {SQLInspect: A Static Analyzer to Inspect Database Usage in Java Applications}, booktitle = {Proceedings of the 40th IEEE/ACM International Conference on Software Engineering: Companion (ICSE 2018)}, publisher = {ACM}, year = {2018}, pages = {93-96}, doi = {10.1145/3183440.3183496} }
A Large-Scale Empirical Study on Linguistic Antipatterns Affecting APIs
Emad Aghajani, Csaba Nagy, Gabriele Bavota and Michele Lanza
In Proceedings of the 34th International Conference on Software Maintenance and Evolution (ICSME 2018). Madrid, Spain, pp. 25-35, IEEE, sep 23-29, 2018.
DOI PDF

Abstract:

The concept of monolithic stand-alone software systems developed completely from scratch has become obsolete, as modern systems nowadays leverage the abundant presence of Application Programming Interfaces (APIs) developed by third parties, which leads on the one hand to accelerated development, but on the other hand introduces potentially fragile dependencies on external resources. In this context, the design of any API strongly influences how developers write code utilizing it. A wrong design decision like a poorly chosen method name can lead to a steeper learning curve, due to misunderstandings, misuse and eventually bug-prone code in the client projects using the API. It is not unfrequent to find APIs with poorly expressive or misleading names, possibly lacking appropriate documentation. Such issues can manifest in what have been defined in the literature as Linguistic Antipatterns (LAs), i.e., inconsistencies among the naming, documentation, and implementation of a code entity. While previous studies showed the relevance of LAs for software developers, their impact on (developers of) client projects using APIs affected by LAs has not been investigated. This paper fills this gap by presenting a large-scale study conducted on 1.6k releases of popular Maven libraries, 14k open-source Java projects using these libraries, and 4.4k questions related to the investigated APIs asked on Stack Overflow. In particular, we investigate whether developers of client projects have higher chances of introducing bugs when using APIs affected by LAs and if these trigger more questions on Stack Overflow as compared to non-affected APIs.
BibTex:

@inproceedings{Aghajani2018, author = {Aghajani, Emad and Nagy, Csaba and Bavota, Gabriele and Lanza, Michele}, title = {A Large-Scale Empirical Study on Linguistic Antipatterns Affecting APIs}, booktitle = {Proceedings of the 34th International Conference on Software Maintenance and Evolution (ICSME 2018)}, publisher = {IEEE}, year = {2018}, pages = {25-35}, doi = {10.1109/ICSME.2018.00012} }
A Static Code Smell Detector for SQL Queries Embedded in Java Code
Csaba Nagy and Anthony Cleve
In Proceedings of the 17th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2017), Engineering Track. Shanghai, China, pp. 147-152, IEEE, sep 17-18, 2017.
DOI PDF

Abstract:

A database plays a central role in the architecture of an information system, and the way it stores the data delimits its main features. However, it is not just the data that matters. The way it is handled, i.e., how the application communicates with the database is of critical importance too. Therefore the implementation of such a communication layer has to be reliable and efficient. SQL is a popular language to query a database, and modern technologies rely on it (or its dialects) as query strings embedded in the application code. In many languages (e.g. in Java), an embedded query is typically constructed through several string operations that obstruct developers in understanding the statement finally sent to the database. It is a potential source of fault-prone and inefficient database usage, i.e., code smells. In our paper, we present a tool for the identification of code smells in SQL queries embedded in Java code. Our tool implements a combined static analysis of the SQL statements embedded in the source code, the database schema, and the data in the database. We use a lightweight query extraction algorithm to extract SQL code from the Java code and implement smell detectors on the ASG of our fault-tolerant SQL parser. Depending on the context of the smell, its severity is also determined. Developers can examine the identified issues with the help of an Eclipse plug-in or through command line interfaces.
BibTex:

@inproceedings{Nagy2017, author = {Nagy, Csaba and Cleve, Anthony}, title = {A Static Code Smell Detector for SQL Queries Embedded in Java Code}, booktitle = {Proceedings of the 17th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2017)}, publisher = {IEEE}, year = {2017}, pages = {147-152}, doi = {10.1109/SCAM.2017.19} }
Designing and Developing Automated Refactoring Transformations: An Experience Report
Gábor Szőke, Csaba Nagy, Rudolf Ferenc and Tibor Gyimóthy
In Proceedings of the 23rd IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2016), Industry Track. Osaka, Japan, pp. 693-697, IEEE, mar 14-18, 2016.
DOI PDF

Abstract:

There are several challenges which should be kept in mind during the design and development phases of a refactoring tool, and one is that developers have several expectations that are quite hard to satisfy. In this report, we present our experiences of a two-year project where we attempted to create an automatic refactoring tool. In this project, we worked with five software development companies that wanted to improve the maintainability of their products. The project was designed to take into account the expectations of the developers of these companies and consisted of three main stages: a manual refactoring phase, a tool building phase, and an automatic refactoring phase. Throughout these stages we collected the opinions of the developers and faced several challenges on how to automate refactoring transformations, which we present and summarize.
BibTex:

@inproceedings{Szoeke2016, author = {Szőke, Gábor and Nagy, Csaba and Ferenc, Rudolf and Gyimóthy, Tibor}, title = {Designing and Developing Automated Refactoring Transformations: An Experience Report}, booktitle = {Proceedings of the 23rd IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2016)}, publisher = {IEEE}, year = {2016}, pages = {693-697}, doi = {10.1109/SANER.2016.17} }
Detecting and Preventing Program Inconsistencies Under Database Schema Evolution
Loup Meurice, Csaba Nagy and Anthony Cleve
In Proceedings of the 2016 IEEE International Conference on Software Quality, Reliability & Security (QRS 2016). Vienna, Austria, pp. 262-273, IEEE Computer Society, aug 1-3, 2016.
DOI PDF

Abstract:

Nowadays, data-intensive applications tend to access their underlying database in an increasingly dynamic way. The queries that they send to the database server are usually built at runtime, through String concatenation, or Object-Relational-Mapping (ORM) frameworks. This level of dynamicity significantly complicates the task of adapting application programs to database schema changes. Failing to correctly adapt programs to an evolving database schema results in program inconsistencies, which in turn may cause program failures. In this paper, we present a tool-supported approach, that allows developers to (1) analyze how the source code and database schema co-evolved in the past and (2) simulate a database schema change and automatically determine the set of source code locations that would be impacted by this change. The developers are then provided with recommendations about what they should modify at those source code locations in order to avoid inconsistencies. The approach has been designed to deal with Java systems that use dynamic data access frameworks such as JDBC, Hibernate and JPA. We motivate and evaluate the proposed approach, based on three real-life systems of different size and nature.
BibTex:

@inproceedings{Meurice2016a, author = {Meurice, Loup and Nagy, Csaba and Cleve, Anthony}, title = {Detecting and Preventing Program Inconsistencies Under Database Schema Evolution}, booktitle = {Proceedings of the 2016 IEEE International Conference on Software Quality, Reliability & Security (QRS 2016)}, publisher = {IEEE Computer Society}, year = {2016}, pages = {262-273}, doi = {10.1109/QRS.2016.38} }
Static Analysis of Dynamic Database Usage in Java Systems
Loup Meurice, Csaba Nagy and Anthony Cleve
In Proceedings of the 28th International Conference on Advanced Information Systems Engineering (CAiSE 2016). Ljubljana, Slovenia, pp. 491-506, Springer LNCS, jun 13-17, 2016.
DOI PDF

Abstract:

Understanding the links between application programs and their database is useful in various contexts such as migrating information systems towards a new database platform, evolving the database schema, or assessing the overall system quality. In the case of Java systems, identifying which portion of the source code accesses which portion of the database may prove challenging. Indeed, Java programs typically access their database in a dynamic way. The queries they send to the database server are built at runtime, through String concatenations, or Object-Relational Mapping frameworks like Hibernate and JPA. This paper presents a static analysis approach to program-database links recovery, specifically designed for Java systems. The approach allows developers to automatically identify the source code locations accessing given database tables and columns. It focuses on the combined analysis of JDBC, Hibernate and JPA invocations. We report on the use of our approach to analyse three real-life Java systems.
BibTex:

@inproceedings{Meurice2016, author = {Meurice, Loup and Nagy, Csaba and Cleve, Anthony}, title = {Static Analysis of Dynamic Database Usage in Java Systems}, booktitle = {Proceedings of the 28th International Conference on Advanced Information Systems Engineering (CAiSE 2016)}, publisher = {Springer LNCS}, year = {2016}, pages = {491-506}, doi = {10.1007/978-3-319-39696-5_30} }
Do Automatic Refactorings Improve Maintainability? An Industrial Case Study
Gábor Szőke, Csaba Nagy, Péter Hegedűs, Rudolf Ferenc and Tibor Gyimóthy
In Proceedings of the 31st International Conference on Software Maintenance and Evolution (ICSME 2015), Industry Track. Bremen, Germany, pp. 429-438, IEEE, sep, 2015.
DOI PDF

Abstract:

Refactoring is often treated as the main remedy against the unavoidable code erosion happening during software evolution. Studies show that refactoring is indeed an elemental part of the developers' arsenal. However, empirical studies about the impact of refactorings on software maintainability have not yet reached a consensus. Moreover, most of these empirical investigations are carried out on open-source projects where distinguishing refactoring operations from other development activities is a challenge in itself. We had a chance to work together with several software development companies in a project where they got extra budget to improve their source code by performing refactoring operations. Taking advantage of this controlled environment, we collected a large amount of data during a refactoring phase where the developers used a (semi)automatic refactoring tool. By measuring the maintainability of the involved subject systems before and after the refactorings, we got valuable insights into the effect of these refactorings on large-scale industrial projects. All but one company, who applied a special refactoring strategy, achieved a maintainability improvement at the end of the refactoring phase, but even that one company suffered from the negative impact of only one type of refactoring.
BibTex:

@inproceedings{Szoeke2015a, author = {Szőke, Gábor and Nagy, Csaba and Hegedűs, Péter and Ferenc, Rudolf and Gyimóthy, Tibor}, title = {Do Automatic Refactorings Improve Maintainability? An Industrial Case Study}, booktitle = {Proceedings of the 31st International Conference on Software Maintenance and Evolution (ICSME 2015)}, publisher = {IEEE}, year = {2015}, pages = {429-438}, doi = {10.1109/ICSM.2015.7332494} }
FaultBuster: An Automatic Code Smell Refactoring Toolset
Gábor Szőke, Csaba Nagy, Lajos Jenő Fülöp, Rudolf Ferenc and Tibor Gyimóthy
In Proceedings of the 15th International Working Conference on Source Code Analysis and Manipulation (SCAM 2015), Tool Track. Bremen, Germany, pp. 253-258, IEEE, sep 27-28, 2015.
DOI PDF

Abstract:

One solution to prevent the quality erosion of a software product is to maintain its quality by continuous refactoring. However, refactoring is not always easy. Developers need to identify the piece of code that should be improved and decide how to rewrite it. Furthermore, refactoring can also be risky; that is, the modified code needs to be re-tested, so developers can see if they broke something. Many IDEs offer a range of refactorings to support so-called automatic refactoring, but tools which are really able to automatically refactor code smells are still under research. In this paper we introduce FaultBuster, a refactoring toolset which is able to support automatic refactoring: identifying the problematic code parts via static code analysis, running automatic algorithms to fix selected code smells, and executing integrated testing tools. In the heart of the toolset lies a refactoring framework to control the analysis and the execution of automatic algorithms. FaultBuster provides IDE plugins to interact with developers via popular IDEs (Eclipse, Netbeans and IntelliJ IDEA). All the tools were developed and tested in a 2-year project with 6 software development companies where thousands of code smells were identified and fixed in 5 systems having altogether over 5 million lines of code.
BibTex:

@inproceedings{Szoeke2015, author = {Szőke, Gábor and Nagy, Csaba and Fülöp, Lajos Jenő and Ferenc, Rudolf and Gyimóthy, Tibor}, title = {FaultBuster: An Automatic Code Smell Refactoring Toolset}, booktitle = {Proceedings of the 15th International Working Conference on Source Code Analysis and Manipulation (SCAM 2015)}, publisher = {IEEE}, year = {2015}, pages = {253-258}, doi = {10.1109/SCAM.2015.7335422} }
Mining Stack Overflow for Discovering Error Patterns in SQL Queries
Csaba Nagy and Anthony Cleve
In Proceedings of the 31st International Conference on Software Maintenance and Evolution (ICSME 2015), Early Research Achievements (ERA) Track. Bremen, Germany, pp. 516-520, IEEE, sep, 2015.
DOI PDF

Abstract:

Constructing complex queries in SQL sometimes necessitates the use of language constructs and the invocation of internal functions which inexperienced developers find hard to comprehend or which are unknown to them. In the worst case, bad usage of these constructs might lead to errors, to ineffective queries, or hamper developers in their tasks. This paper presents a mining technique for Stack Overflow to identify error-prone patterns in SQL queries. Identifying such patterns can help developers to avoid the use of error-prone constructs, or if they have to use such constructs, the Stack Overflow posts can help them to properly utilize the language. Hence, our purpose is to provide the initial steps towards a recommendation system that supports developers in constructing SQL queries. Our current implementation supports the MySQL dialect, and Stack Overflow has over 300,000 questions tagged with the MySQL flag in its database. It provides a huge knowledge base where developers can ask questions about real problems. Our initial results indicate that our technique is indeed able to identify patterns among them.
BibTex:

@inproceedings{Nagy2015a, author = {Nagy, Csaba and Cleve, Anthony}, title = {Mining Stack Overflow for Discovering Error Patterns in SQL Queries}, booktitle = {Proceedings of the 31st International Conference on Software Maintenance and Evolution (ICSME 2015)}, publisher = {IEEE}, year = {2015}, pages = {516-520}, doi = {10.1109/ICSM.2015.7332505} }
Where Was This SQL Query Executed? A Static Concept Location Approach
Csaba Nagy, Loup Meurice and Anthony Cleve
In Proceedings of the 22nd IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2015), ERA Track. Montréal, Québec, Canada, pp. 580-584, IEEE Computer Society, mar 2-6, 2015.
DOI PDF

Abstract:

Concept location in software engineering is the process of identifying where a specific concept is implemented in the source code of a software system. It is a very common task performed by developers during development or maintenance, and many techniques have been studied by researchers to make it more efficient. However, most of the current techniques ignore the role of a database in the architecture of a system, which is also an important source of concepts or dependencies among them. In this paper, we present a concept location technique for data-intensive systems, as systems with at least one database server in their architecture which is intensively used by its clients. Specifically, we present a static technique for identifying the exact source code location from where a given SQL query was sent to the database. We evaluate our technique by collecting and locating SQL queries from testing scenarios of two open source Java systems under active development. With our technique, we are able to successfully identify the source of most of these queries.
BibTex:

@inproceedings{Nagy2015, author = {Nagy, Csaba and Meurice, Loup and Cleve, Anthony}, title = {Where Was This SQL Query Executed? A Static Concept Location Approach}, booktitle = {Proceedings of the 22nd IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2015)}, publisher = {IEEE Computer Society}, year = {2015}, pages = {580-584}, doi = {10.1109/SANER.2015.7081881} }
Bulk Fixing Coding Issues and Its Effects on Software Quality: Is It Worth Refactoring?
Gábor Szőke, Gábor Antal, Csaba Nagy, Rudolf Ferenc and Tibor Gyimóthy
In Proceedings of the 14th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2014). Victoria, BC, Canada, pp. 95-104, IEEE, sep 28-29, 2014.
DOI PDF

Abstract:

The quality of a software system is mostly defined by its source code. Software evolves continuously, it gets modified, enhanced, and new requirements always arise. If we do not spend time periodically on improving our source code, it becomes messy and its quality will decrease inevitably. Literature tells us that we can improve the quality of our software product by regularly refactoring it. But does refactoring really increase software quality? Can it happen that a refactoring decreases the quality? Is it possible to recognize the change in quality caused by a single refactoring operation? In our paper, we seek answers to these questions in a case study of refactoring large-scale proprietary software systems. We analyzed the source code of 5 systems, and measured the quality of several revisions for a period of time. We analyzed 2 million lines of code and identified nearly 200 refactoring commits which fixed over 500 coding issues. We found that one single refactoring only makes a small change (sometimes even decreases quality), but when we do them in blocks, we can significantly increase quality, which can result not only in the local, but also in the global improvement of the code.
BibTex:

@inproceedings{Szoeke2014a, author = {Szőke, Gábor and Antal, Gábor and Nagy, Csaba and Ferenc, Rudolf and Gyimóthy, Tibor}, title = {Bulk Fixing Coding Issues and Its Effects on Software Quality: Is It Worth Refactoring?}, booktitle = {Proceedings of the 14th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2014)}, publisher = {IEEE}, year = {2014}, pages = {95-104}, doi = {10.1109/SCAM.2014.18} }
A Case Study of Refactoring Large-Scale Industrial Systems to Efficiently Improve Source Code Quality
Gábor Szőke, Csaba Nagy, Rudolf Ferenc and Tibor Gyimóthy
In Proceedings of the 14th International Conference on Computational Science and Its Applications (ICCSA 2014). Guimarães, Portugal, Vol. 8583, pp. 524-540, Springer International Publishing, jun, 2014.
DOI PDF

Abstract:

Refactoring source code has many benefits (e.g. improving maintainability, robustness and source code quality), but it takes time away from other implementation tasks, resulting in developers neglecting refactoring steps during the development process. But what happens when they know that the quality of their source code needs to be improved and they can get the extra time and money to refactor the code? What will they do? What will they consider the most important for improving source code quality? What sort of issues will they address first or last and how will they solve them? In our paper, we look for answers to these questions in a case study of refactoring large-scale industrial systems where developers participated in a project to improve the quality of their software systems. We collected empirical data of over a thousand refactoring patches for 5 systems with over 5 million lines of code in total, and we found that developers really optimized the refactoring process to significantly improve the quality of these systems.
BibTex:

@inproceedings{Szoeke2014, author = {Szőke, Gábor and Nagy, Csaba and Ferenc, Rudolf and Gyimóthy, Tibor}, editor = {Murgante, Beniamino and Misra, Sanjay and Rocha, AnaMaria A.C. and Torre, Carmelo and Rocha, JorgeGustavo and Falcão, MariaIrene and Taniar, David and Apduhan, BernadyO. and Gervasi, Osvaldo}, title = {A Case Study of Refactoring Large-Scale Industrial Systems to Efficiently Improve Source Code Quality}, booktitle = {Proceedings of the 14th International Conference on Computational Science and Its Applications (ICCSA 2014)}, publisher = {Springer International Publishing}, year = {2014}, volume = {8583}, pages = {524-540}, doi = {10.1007/978-3-319-09156-3_37} }
A Regression Test Selection Technique for Magic Systems
Gábor Novák, Csaba Nagy and Rudolf Ferenc
In Proceedings of the 13th Symposium on Programming Languages and Software Tools (SPLST 2013). Szeged, Hungary, pp. 76-89, University of Szeged, Faculty of Science and Informatics, Institute of Informatics, aug 26-27, 2013.
PDF

Abstract:

Regression testing is an important step to make sure that after committing a change to our software we do not make unwanted changes to other, untouched features. For larger and faster evolving software, however, executing all the test cases of a regression test can easily become a tremendous process which takes too much time to thoroughly test each change separately. In our paper, we present a method to support regression testing with impact analysis based test selection. As a result, we can show a limited set of test cases that must be re-executed after a change, to test the changed part of the code and its related code elements. Our technique is implemented for a special 4th-generation language, the Magic xpa development environment. The technique was implemented in cooperation with our industrial partner, SZEGED Software Inc, who has been developing Magic applications for more than a decade.
BibTex:

@inproceedings{Novak2013, author = {Novák, Gábor and Nagy, Csaba and Ferenc, Rudolf}, editor = {Kiss, Ákos}, title = {A Regression Test Selection Technique for Magic Systems}, booktitle = {Proceedings of the 13th Symposium on Programming Languages and Software Tools (SPLST 2013)}, publisher = {University of Szeged, Faculty of Science and Informatics, Institute of Informatics}, year = {2013}, pages = {76-89} }
Static Analysis of Data-Intensive Applications
Csaba Nagy
In Proceedings of the 17th European Conference on Software Maintenance and Reengineering (CSMR 2013), PhD Symposium. Genova, Italy, pp. 435-438, IEEE, mar 5-8, 2013.
DOI PDF

Abstract:

Data-intensive systems are designed to handle data at massive scale, and over the years they might evolve into very large, complex systems. In order to support maintenance tasks of these systems, several techniques have been developed to analyze the source code of applications or to analyze the underlying databases for the purpose of reverse engineering, e.g. quality assurance or program comprehension. However, only a few techniques take into account the specialties of data-intensive systems (e.g. dependencies arising via database accesses). In this thesis we conducted research to analyze and to improve data-intensive applications via different methods based on static analysis: methods for recovering the architecture of data-intensive systems and a quality assurance methodology for applications developed in Magic 4GL. We targeted SQL, as the most widespread databases are relational databases using a certain dialect of SQL for their queries. With the proposed techniques we were able to analyze large-scale industrial projects, such as banking systems with more than 3 million lines of code, and we successfully recovered architecture maps and quality issues of these systems.
BibTex:

@inproceedings{Nagy2013, author = {Nagy, Csaba}, title = {Static Analysis of Data-Intensive Applications}, booktitle = {Proceedings of the 17th European Conference on Software Maintenance and Reengineering (CSMR 2013)}, publisher = {IEEE}, year = {2013}, pages = {435-438}, doi = {10.1109/CSMR.2013.66} }
A Methodology and Framework for Automatic Layout Independent GUI Testing of Applications Developed in Magic xpa
Dániel Fritsi, Csaba Nagy, Rudolf Ferenc and Tibor Gyimóthy
In Proceedings of the 13th International Conference on Computational Science and Its Applications (ICCSA 2013) - Part II. Ho Chi Minh City, Vietnam, pp. 513-528, Springer, jun 24-27, 2013.
DOI PDF

Abstract:

Testing an application via its Graphical User Interface (GUI) requires lots of manual work, even if some steps of GUI testing can be automated. Test automation tools are a great help for testers, particularly for regression testing. However, these tools still lack some important features and still require manual work to maintain the test cases. For instance, if the layout of a window is changed without affecting the main functionality of the application, all test cases testing the window must be re-recorded. This hard maintenance work is one of the greatest problems with the regression tests of GUI applications. In our paper we propose an approach to use the GUI information stored in the source code during automatic testing processes to create layout independent test scripts. The idea was motivated by testing an application developed in a fourth generation language, Magic. In this language the layout of the GUI elements (e.g. position and size of controls) is stored in the code and can be gathered via static code analysis. We implemented the presented approach for Magic xpa in a tool called Magic Test Automation, which is used by our industrial partner who has developed applications in Magic for more than a decade.
BibTex:

@inproceedings{Fritsi2013, author = {Fritsi, Dániel and Nagy, Csaba and Ferenc, Rudolf and Gyimóthy, Tibor}, title = {A Methodology and Framework for Automatic Layout Independent GUI Testing of Applications Developed in Magic xpa}, booktitle = {Proceedings of the 13th International Conference on Computational Science and Its Applications (ICCSA 2013) - Part II}, publisher = {Springer}, year = {2013}, pages = {513-528}, doi = {10.1007/978-3-642-39643-4_37} }
Designing and Implementing Control Flow Graph for Magic 4th Generation Language
Richárd Dévai, Judit Jász, Csaba Nagy and Rudolf Ferenc
In Proceedings of the 13th Symposium on Programming Languages and Software Tools (SPLST 2013). Szeged, Hungary, pp. 200-214, University of Szeged, Faculty of Science and Informatics, Institute of Informatics, aug 26-27, 2013.
PDF

Abstract:

A good compiler which implements many optimizations during its compilation phases must be able to perform several static analysis techniques such as control flow or data flow analysis. Besides compilers, these techniques are common for static analyzers to retrieve information from the code for, e.g., code auditing, quality assurance, or testing purposes. Implementing control flow analysis requires handling many special structures of the target language. In our paper we present our experiences in implementing control flow graph (CFG) construction for a special 4th generation language called Magic. While designing and implementing the CFG for this language we identified differences compared to 3rd generation languages because of the special programming technique of this language (e.g. data access, parallel task execution, events). Our work was motivated by our industrial partner who needed precise static analysis tools (e.g. for quality assurance or testing purposes) for this language. We believe that our experiences with Magic, as a representative of 4GLs, might be generalized to other languages too.
BibTex:

@inproceedings{Devai2013, author = {Dévai, Richárd and Jász, Judit and Nagy, Csaba and Ferenc, Rudolf}, editor = {Kiss, Ákos}, title = {Designing and Implementing Control Flow Graph for Magic 4th Generation Language}, booktitle = {Proceedings of the 13th Symposium on Programming Languages and Software Tools (SPLST 2013)}, publisher = {University of Szeged, Faculty of Science and Informatics, Institute of Informatics}, year = {2013}, pages = {200-214} }
Solutions for Reverse Engineering 4GL Applications, Recovering the Design of a Logistical Wholesale System
Csaba Nagy, László Vidács, Rudolf Ferenc, Tibor Gyimóthy, Ferenc Kocsis and István Kovács
In Proceedings of the 15th European Conference on Software Maintenance and Reengineering (CSMR 2011), EU Projects Track. Oldenburg, Germany, pp. 343-346, IEEE, mar 1-4, 2011.
DOI PDF

Abstract:

Re-engineering a legacy software system to support new, modern technologies instead of old ones is not an easy task, especially for large systems with a complex architecture. The use of reverse engineering tools is crucial for different subtasks of the full process, such as re-documenting the old code or recovering its design. There are many tools available to assist developers, but most of these tools were designed to deal with third generation languages (e.g. Java, C, C++, C#). However, many large systems are developed in higher level languages (e.g. Magic, Informix, ABAP) and current tools are not able to support all the arising problems during re-engineering systems written in fourth generation languages. In this paper we present a project whose main goal is the development of a technologically and functionally renewed medicinal wholesale system. This system is developed in Magic 4GL, and its development is based on re-engineering an old Magic (version 5) system to uniPaaS, which is the current release version of Magic. In the early phases of this project we developed a reverse engineering toolset for Magic 4GL to support reverse engineering, recovering the design of the old system, and to support some forward engineering tasks too. Here we present a report on this project that was carried out in cooperation with SZEGED Software Zrt and the Department of Software Engineering at the University of Szeged. The project was partly funded by the Economic Development Operational Programme, New Hungary Development Plan.
BibTex:

@inproceedings{Nagy2011b, author = {Nagy, Csaba and Vidács, László and Ferenc, Rudolf and Gyimóthy, Tibor and Kocsis, Ferenc and Kovács, István}, title = {Solutions for Reverse Engineering 4GL Applications, Recovering the Design of a Logistical Wholesale System}, booktitle = {Proceedings of the 15th European Conference on Software Maintenance and Reengineering (CSMR 2011)}, publisher = {IEEE}, year = {2011}, pages = {343-346}, doi = {10.1109/CSMR.2011.66} }
Complexity measures in 4GL environment
Csaba Nagy, László Vidács, Rudolf Ferenc, Tibor Gyimóthy, Ferenc Kocsis and István Kovács
In Proceedings of the 2011 International Conference on Computational Science and Its Applications (ICCSA 2011) - Volume Part V. Santander, Spain, pp. 293-309, Springer-Verlag, jun 20-23, 2011.
DOI PDF

Abstract:

Nowadays, the most popular programming languages are so-called third generation languages, such as Java, C# and C++, but higher level languages are also widely used for application development. Our work was motivated by the need for a quality assurance solution for a fourth generation language (4GL) called Magic. We realized that these very high level languages lie outside the main scope of recent static analysis techniques and researches, even though there is an increasing need for solutions in 4GL environment. During the development of our quality assurance framework we faced many challenges in adapting metrics from popular 3GLs and defining new ones in 4GL context. Here we present our results and experiments focusing on the complexity of a 4GL system. We found that popular 3GL metrics can be easily adapted based on syntactic structure of a language, however it requires more complex solutions to define complexity metrics that are closer to developers' opinion. The research was conducted in co-operation with a company where developers have been programming in Magic for more than a decade. As an outcome, the resulting metrics are used in a novel quality assurance framework based on the Columbus methodology.
BibTex:

@inproceedings{Nagy2011a, author = {Nagy, Csaba and Vidács, László and Ferenc, Rudolf and Gyimóthy, Tibor and Kocsis, Ferenc and Kovács, István}, title = {Complexity measures in 4GL environment}, booktitle = {Proceedings of the 2011 International Conference on Computational Science and Its Applications (ICCSA 2011) - Volume Part V}, publisher = {Springer-Verlag}, year = {2011}, pages = {293-309}, doi = {10.1007/978-3-642-21934-4_25} }
A true story of refactoring a large Oracle PL/SQL banking system
Csaba Nagy, Rudolf Ferenc and Tibor Bakota
In Proceedings of the 8th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2011), Industrial Track. Szeged, Hungary, sep 5-9, 2011.
PDF

Abstract:

It is common that, due to the pressure of business, banking systems evolve and grow fast, and even the slightest wrong decision may result in losing control over the codebase in the long term. Once this happens, the business will not drive developments any more, but will be constrained by maintenance preoccupations. As easy as it is to lose control, it is just as hard to regain it. Software comprehension and refactoring are the proper means for reestablishing governance over the system, but they require sophisticated tools and methods that help in analyzing, understanding and refactoring the codebase. This paper tells a true story about how control was lost and regained in the case of a real banking system written in the PL/SQL programming language.
BibTex:

@inproceedings{Nagy2011, author = {Nagy, Csaba and Ferenc, Rudolf and Bakota, Tibor}, title = {A true story of refactoring a large Oracle PL/SQL banking system}, booktitle = {Proceedings of the 8th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2011)}, year = {2011} }
A Layout Independent GUI Test Automation Tool for Applications Developed in Magic/uniPaaS
Dániel Fritsi, Csaba Nagy, Rudolf Ferenc and Tibor Gyimóthy
In Proceedings of the 12th Symposium on Programming Languages and Software Tools (SPLST 2011). Tallinn, Estonia, pp. 248-259, TTÜ Küberneetika Instituut, oct 4-7, 2011.
PDF

Abstract:

A good software development process involves thorough testing phases, that are usually expensive, but necessary to deliver a reliable and high quality product. Testing an application via its graphical user interface requires lots of manual work, even if some steps of GUI testing can be automated. Test automation tools are a great help for testers, particularly for regression tests. However these tools still lack some important features and still require manual work to maintain the test cases. For instance, if the layout of a window is changed without affecting the main functionality of the application, all test cases testing the window must be re-recorded again. This hard maintenance work is one of the greatest problems with the regression tests of GUI applications. In our paper we propose an approach to use the GUI information stored in the source code during automatic testing processes to create layout independent test scripts. With this technique, the already recorded tests scripts will be unaffected by minor changes in the GUI. It reduces the maintenance effort of very expensive regression tests where thousands of test cases have to be maintained by testing teams. The idea was motivated by testing an application developed in a fourth generation language, Magic/uniPaaS. In this language the layout of the GUI elements (structure of the window, position and size of controls, etc.) are stored in the code and it can be gathered via static code analysis. We implemented the presented approach for Magic/uniPaaS, and our Magic Test Automation tool is used by our industrial partner who has developed applications in Magic/uniPaaS for more than a decade.
BibTex:

@inproceedings{Fritsi2011, author = {Fritsi, Dániel and Nagy, Csaba and Ferenc, Rudolf and Gyimóthy, Tibor}, title = {A Layout Independent GUI Test Automation Tool for Applications Developed in Magic/uniPaaS}, booktitle = {Proceedings of the 12th Symposium on Programming Languages and Software Tools (SPLST 2011)}, publisher = {TTÜ Küberneetika Instituut}, year = {2011}, pages = {248-259} }
CIASYS--Change Impact Analysis at System Level
Gabriella Tóth, Csaba Nagy, Judit Jász, Árpád Beszédes and Lajos Fülöp
In Proceedings of the 14th European Conference on Software Maintenance and Reengineering (CSMR 2010), EU Projects Track. Madrid, Spain, pp. 198-201, IEEE Computer Society, mar 15-18, 2010.
DOI

Abstract:

The research field of change impact analysis plays an important role in software engineering theory and practice nowadays. Not only because it has many scientific challenges, but it has many industrial applications too (e.g., cost estimation, test optimization), and the current techniques are still not ready to fulfill the requirements of industry. Typically, the current solutions lack a whole-system view and give either precise results with high computation costs or less precise results with fast algorithms. For these reasons, they are not applicable to large industrial systems where both scalability and precision are very important. In this paper, we present a project whose main goal is to develop an innovative change impact analysis software suite based on recent scientific results and modern technologies. The suite will use hybrid analysis techniques to benefit from all the advantages of static and dynamic analyses. In addition, it will be able to determine the dependencies at system level of software systems with heterogeneous architecture. The software is being developed by FrontEndART Ltd. while the theoretical and technological background is provided by the Department of Software Engineering at the University of Szeged. The project is funded by the Economic Development Operational Programme, New Hungary Development Plan.
BibTex:

@inproceedings{Toth2010, author = {Tóth, Gabriella and Nagy, Csaba and Jász, Judit and Beszédes, Árpád and Fülöp, Lajos}, title = {CIASYS--Change Impact Analysis at System Level}, booktitle = {Proceedings of the 14th European Conference on Software Maintenance and Reengineering (CSMR 2010)}, publisher = {IEEE Computer Society}, year = {2010}, pages = {198-201}, doi = {10.1109/CSMR.2010.35} }
Towards a Safe Method for Computing Dependencies in Database-Intensive Systems
Csaba Nagy, János Pántos, Tamás Gergely and Árpád Beszédes
In Proceedings of the 14th European Conference on Software Maintenance and Reengineering (CSMR 2010). Madrid, Spain, pp. 166-175, IEEE, mar 15-18, 2010.
DOI PDF

Abstract:

Determining dependencies between different components of an application is useful in lots of applications (e.g., architecture reconstruction, reverse engineering, regression test case selection, change impact analysis). However, implementing automated methods to recover dependencies has many challenges, particularly in systems using databases, where dependencies may arise via database access. Furthermore, it is especially hard to find safe techniques (which do not omit any important dependency) that are applicable to large and complex systems at the same time. We propose two techniques that can cope with these problems in most situations. These methods compute dependencies between procedures or database tables, and they are based on the simultaneous static analysis of the source code, the database schema and the SQL instructions. In this paper, we quantitatively and qualitatively evaluate the methods on real-life data, and also evaluate them on some of their potential applications.
BibTex:

@inproceedings{Nagy2010a, author = {Nagy, Csaba and Pántos, János and Gergely, Tamás and Beszédes, Árpád}, title = {Towards a Safe Method for Computing Dependencies in Database-Intensive Systems}, booktitle = {Proceedings of the 14th European Conference on Software Maintenance and Reengineering (CSMR 2010)}, publisher = {IEEE}, year = {2010}, pages = {166-175}, doi = {10.1109/CSMR.2010.29} }
MAGISTER: Quality assurance of Magic applications for software developers and end users
Csaba Nagy, László Vidács, Rudolf Ferenc, Tibor Gyimóthy, Ferenc Kocsis and István Kovács
In Proceedings of the 26th IEEE International Conference on Software Maintenance (ICSM 2010), Tool Demonstration. Timisoara, Romania, pp. 1-6, IEEE, sep 12-18, 2010.
DOI PDF

Abstract:

Nowadays there are many tools and methods available for source code quality assurance based on static analysis, but most of these tools focus on traditional software development techniques with 3GL languages. Besides procedural languages, 4GL programming languages such as Magic 4GL and Progress are widely used for application development. All these languages lie outside the main scope of analysis techniques. In this paper we present MAGISTER, which is a quality assurance framework for applications being developed in Magic, a 4GL application development solution created by Magic Software Enterprises. MAGISTER extracts data using static analysis methods from applications being developed in different versions of Magic (v5-9 and uniPaaS). The extracted data (including metrics, rule violations and dependency relations) is presented to the user via a GUI so it can be queried and visualized for further analysis. It helps software developers, architects and managers through the full development cycle by performing continuous code scans and measurements.
BibTex:

@inproceedings{Nagy2010, author = {Nagy, Csaba and Vidács, László and Ferenc, Rudolf and Gyimóthy, Tibor and Kocsis, Ferenc and Kovács, István}, title = {MAGISTER: Quality assurance of Magic applications for software developers and end users}, booktitle = {Proceedings of the 26th IEEE International Conference on Software Maintenance (ICSM 2010)}, publisher = {IEEE}, year = {2010}, pages = {1-6}, doi = {10.1109/ICSM.2010.5609550} }
Static Security Analysis Based on Input-Related Software Faults
Csaba Nagy and Spiros Mancoridis
In Proceedings of the 13th European Conference on Software Maintenance and Reengineering (CSMR 2009). Fraunhofer IESE, Kaiserslautern, Germany, pp. 37-46, IEEE, mar 24-27, 2009.
DOI PDF

Abstract:

It is important to focus on security aspects during the development cycle to deliver reliable software. However, locating security faults in complex systems is difficult and there are only a few effective automatic tools available to help developers. In this paper we present an approach to help developers locate vulnerabilities by marking parts of the source code that involve user input. We focus on input-related code, since an attacker can usually take advantage of vulnerabilities by passing malformed input to the application. The main contributions of this work are two metrics to help locate faults during a code review, and algorithms to locate buffer overflow and format string vulnerabilities in C source code. We implemented our approach as a plug in to the Grammatech CodeSurfer tool. We tested and validated our technique on open source projects and we found faults in software that includes Pidgin and cyrus-imapd.
BibTex:

@inproceedings{Nagy2009, author = {Nagy, Csaba and Mancoridis, Spiros}, title = {Static Security Analysis Based on Input-Related Software Faults}, booktitle = {Proceedings of the 13th European Conference on Software Maintenance and Reengineering (CSMR 2009)}, publisher = {IEEE}, year = {2009}, pages = {37-46}, doi = {10.1109/CSMR.2009.51} }

Code factoring in GCC on different intermediate languages

Csaba Nagy, Gábor Lóki, Árpád Beszédes and Tibor Gyimóthy

In Proceedings of the 10th Symposium on Programming Languages and Software Tools (SPLST 2007). Budapest, Hungary, pp. 81-95, jun 14-16, 2007.

Abstract:

BibTex:

@inproceedings{Nagy2007b,
  author = {Nagy, Csaba and Lóki, Gábor and Beszédes, Árpád and Gyimóthy, Tibor},
  title = {Code factoring in GCC on different intermediate languages},
  booktitle = {Proceedings of the 10th Symposium on Programming Languages and Software Tools (SPLST 2007)},
  year = {2007},
  pages = {81-95}
}

Extension of GCC with a fully manageable reverse engineering front end
Csaba Nagy
In Proceedings of the 7th International Conference on Applied Informatics (ICAI 2007). Eger, Hungary, jan 28-31, 2007.
PDF

Abstract:

In the open source community one of the most popular compilers is GNU GCC. It is a very complex and robust compiler, but because of its working mechanism it has no ability for special transformations like interprocedural optimizations. A typical compiler has a three-part construction. It has a front end for analysis and for building an abstract internal representation of the program, a middle end for transformations (e.g., optimizations), and a back end for final code generation. However, there are smaller but very useful projects for only front/middle/back ends, too. It seems possible to achieve a more effective compiler by extending GCC with a front end which is capable of running special algorithms. This paper shows one solution for this extension. The described method is based on using Columbus/CAN instead of GCC's front end and GCC's back end for code generation. As Columbus has a well-structured schema for the representation of C++ sources, by this transformation we will have the ability to execute those special transformations on the code before the compiling phases. Furthermore, this technique opens the possibility to link other front ends (like EDG) with GCC to achieve a more powerful compiler, for example in code size optimizations. This approach has been tested on GCC's official Code-Size Benchmark Environment (CSiBE) as a real-world system, and for the testing different metrics have been measured on the compilation with this 'extended compiler' and with the official GCC.
BibTex:

@inproceedings{Nagy2007a, author = {Nagy, Csaba}, title = {Extension of GCC with a fully manageable reverse engineering front end}, booktitle = {Proceedings of the 7th International Conference on Applied Informatics (ICAI 2007)}, year = {2007} }

Book Chapters

Mining, Analyzing, and Evolving Data-intensive Software Ecosystems
Csaba Nagy, Michele Lanza and Anthony Cleve
In Software Ecosystems: Tooling and Analytics. Springer, sep, 2023.
DOI

Abstract:

Managing data-intensive software ecosystems has long been considered an expensive and error-prone process. This is mainly due to the often implicit consistency relationships between applications and their database(s). In addition, as new technologies emerged for specialized purposes (e.g., key-value stores, document stores, graph databases), the common use of multiple database models within the same software (eco)system has also become more popular. There are undeniable benefits of such multi-database models where developers use and combine technologies. However, the side effects on database design, querying, and maintenance are not well-known. In this chapter, we elaborate on the recent research effort devoted to mining, analyzing, and evolving data-intensive software ecosystems. We focus on methods, techniques, and tools providing developers with automated support. We cover different processes, including automatic database query extraction, bad smell detection, self-admitted technical debt analysis, and evolution history visualization.
BibTex:

@inbook{Nagy2023, author = {Nagy, Csaba and Lanza, Michele and Cleve, Anthony}, editor = {Mens, Tom and Roover, Coen De and Cleve, Anthony}, title = {Software Ecosystems: Tooling and Analytics}, chapter = {Mining, Analyzing, and Evolving Data-intensive Software Ecosystems}, publisher = {Springer}, year = {2023}, doi = {10.1007/978-3-031-36060-2_11} }
Analyzing the Evolution of Database Usage in Data-Intensive Software Systems
Loup Meurice, Mathieu Goeminne, Tom Mens, Csaba Nagy, Alexandre Decan and Anthony Cleve
In Software Technology: 10 Years of Innovation. pp. 208-240, John Wiley & Sons, 2018.
DOI PDF

Abstract:

This chapter presents the research advancements in the field of data-intensive software system evolution, 5 years after the publication of our IEEE Computer column presenting the challenges in this field. We present the state-of-the-art in this research domain, and report on research on the evolution of open source Java projects relying on relational database technologies. We empirically analyse how the use of Java database technologies evolves over time. We report on a coarse-grained source-code analysis carried out over several thousands of Java projects, and complement this by a fine-grained longitudinal analysis of the co-evolution between database schema changes and source code changes within three large Java projects. The presented results are a first step towards a recommendation system supporting developers in writing database-centered code.
BibTex:

@inbook{Meurice2018, author = {Meurice, Loup and Goeminne, Mathieu and Mens, Tom and Nagy, Csaba and Decan, Alexandre and Cleve, Anthony}, editor = {Hinchey, Mike}, title = {Software Technology: 10 Years of Innovation}, chapter = {Analyzing the Evolution of Database Usage in Data-Intensive Software Systems}, publisher = {John Wiley & Sons}, year = {2018}, pages = {208-240}, doi = {10.1002/9781119174240.ch12} }

Journal Articles

Best practices of testing database manipulation code
Maxime Gobert, Csaba Nagy, Henrique Rocha, Serge Demeyer and Anthony Cleve
Information Systems. Vol. 111, pp. 102105, Elsevier, 2023.
DOI PDF

Abstract:

Software testing enables development teams to maintain the quality of a software system while it evolves. The database manipulation code requires special attention in this context. However, it is often neglected and suffers from software maintenance problems. In this paper, we study the current state-of-the-practice in testing database manipulation code. We first analysed the tests of 72 open-source projects to gain insight into the coverage of database access code. The database was poorly tested: 46% of the projects did not cover with tests half of their database access methods, and 33% did not cover the database code at all. This poor coverage motivated us to study developers’ challenges and best practices. (i) First, we analysed 532 questions on Stack Exchange sites and deduced a taxonomy of issues. Developers mostly looked for general best practices to test database access code. Their technical questions were related to database management, mocking, parallelisation, or framework/tool usage. (ii) Next, we examined the answers to these questions. We manually labelled 598 answers to 255 questions. We distinguished 363 solutions and organised them in a taxonomy of best practices. Most of the suggestions considered the testing environment and recommended various tools or configurations. The second largest category was database management, where many addressed database initialisation and clean-up between tests. Other categories pertained to code structure or design, concepts, performance, processes, test characteristics, test code, and mocking. We illustrate the two taxonomies through intriguing examples.
BibTex:

@article{Gobert2022, author = {Gobert, Maxime and Nagy, Csaba and Rocha, Henrique and Demeyer, Serge and Cleve, Anthony}, title = {Best practices of testing database manipulation code}, journal = {Information Systems}, publisher = {Elsevier}, year = {2023}, volume = {111}, pages = {102105}, doi = {10.1016/j.is.2022.102105} }
FIXME: Synchronize with Database! An Empirical Study of Data Access Self-Admitted Technical Debt
Biruk Asmare Muse, Csaba Nagy, Anthony Cleve, Foutse Khomh and Giuliano Antoniol
Empirical Software Engineering. Vol. 27, 6, pp. 130, Springer, jul, 2022.
DOI PDF

Abstract:

Developers sometimes choose design and implementation shortcuts due to the pressure from tight release schedules. However, shortcuts introduce technical debt that increases as the software evolves. The debt needs to be repaid as fast as possible to minimize its impact on software development and software quality. Sometimes, technical debt is admitted by developers in comments and commit messages. Such debt is known as self-admitted technical debt (SATD). In data-intensive systems, where data manipulation is a critical functionality, the presence of SATD in the data access logic could seriously harm performance and maintainability. Understanding the composition and distribution of the SATDs across software systems and their evolution could provide insights into managing technical debt efficiently. We present a large-scale empirical study on the prevalence, composition, and evolution of SATD in data-intensive systems. We analyzed 83 open-source systems relying on relational databases as well as 19 systems relying on NoSQL databases. We detected SATD in source code comments obtained from different snapshots of the subject systems. To understand the evolution dynamics of SATDs, we conducted a survival analysis. Next, we performed a manual analysis of 361 sample data-access SATDs, investigating the composition of data-access SATDs and the reasons behind their introduction and removal. We identified 15 new SATD categories, out of which 11 are specific to database access operations. We found that most of the data-access SATDs are introduced in the later stages of change history rather than at the beginning. We also observed that bug fixing and refactoring are the main reasons behind the introduction of data-access SATDs.
BibTex:

@article{Muse2022, author = {Muse, Biruk Asmare and Nagy, Csaba and Cleve, Anthony and Khomh, Foutse and Antoniol, Giuliano}, title = {FIXME: Synchronize with Database! An Empirical Study of Data Access Self-Admitted Technical Debt}, journal = {Empirical Software Engineering}, publisher = {Springer}, year = {2022}, volume = {27}, number = {6}, pages = {130}, doi = {10.1007/s10664-022-10119-4} }
Quick remedy commits and their impact on mining software repositories
Fengcai Wen, Csaba Nagy, Michele Lanza and Gabriele Bavota
Empirical Software Engineering. Vol. 27, 1, pp. 14, Springer, oct 28, 2021.
DOI PDF

Abstract:

Most changes during software maintenance and evolution are not atomic changes, but rather the result of several related changes affecting different parts of the code. It may happen that developers omit needed changes, thus leaving a task partially unfinished, introducing technical debt or injecting bugs. We present a study investigating "quick remedy commits" performed by developers to implement changes omitted in previous commits. With quick remedy commits we refer to commits that (i) quickly follow a commit performed by the same developer, and (ii) aim at remedying issues introduced as the result of code changes omitted in the previous commit (e.g., fix references to code components that have been broken as a consequence of a rename refactoring) or simply improve the previously committed change (e.g., improve the name of a newly introduced variable). Through a manual analysis of 500 quick remedy commits, we define a taxonomy categorizing the types of changes that developers tend to omit. The taxonomy can (i) guide the development of tools aimed at detecting omitted changes and (ii) help researchers in identifying corner cases that must be properly handled. For example, one of the categories in our taxonomy groups the reverted commits, meaning changes that are undone in a subsequent commit. We show that not accounting for such commits when mining software repositories can undermine one's findings. In particular, our results show that considering completely reverted commits when mining software repositories accounts, on average, for 0.07 and 0.27 noisy data points when dealing with two typical MSR data collection tasks (i.e., bug-fixing commits identification and refactoring operations mining, respectively).
BibTex:

@article{Wen2021b, author = {Wen, Fengcai and Nagy, Csaba and Lanza, Michele and Bavota, Gabriele}, title = {Quick remedy commits and their impact on mining software repositories}, journal = {Empirical Software Engineering}, publisher = {Springer}, year = {2021}, volume = {27}, number = {1}, pages = {14}, doi = {10.1007/s10664-021-10051-z} }
Empirical study on refactoring large-scale industrial systems and its effects on maintainability
Gábor Szőke, Gábor Antal, Csaba Nagy, Rudolf Ferenc and Tibor Gyimóthy
Journal of Systems and Software. Vol. 129, pp. 107-126, Elsevier, jul, 2017.
DOI PDF

Abstract:

Software evolves continuously: it gets modified and enhanced, and new requirements always arise. If we do not spend time occasionally on improving our source code, its maintainability will inevitably decrease. The literature tells us that we can improve the maintainability of a software system by regularly refactoring it. But does refactoring really increase software maintainability? Can it happen that refactoring decreases the maintainability? Empirical studies show contradicting answers to these questions and there have been only a few studies which were performed in a large-scale, industrial context. In our paper, we assess these questions in an in vivo context, where we analyzed the source code and measured the maintainability of 6 large-scale, proprietary software systems in their manual refactoring phase. We analyzed 2.5 million lines of code and studied the effects on maintainability of 315 refactoring commits which fixed 1273 coding issues. We found that single refactorings make only a very small difference (sometimes even decreasing maintainability), but a whole refactoring period, in general, can significantly increase maintainability, which can result not only in the local, but also in the global improvement of the code.
BibTex:

@article{Szoeke2017, author = {Szőke, Gábor and Antal, Gábor and Nagy, Csaba and Ferenc, Rudolf and Gyimóthy, Tibor}, title = {Empirical study on refactoring large-scale industrial systems and its effects on maintainability}, journal = {Journal of Systems and Software}, publisher = {Elsevier}, year = {2017}, volume = {129}, pages = {107-126}, doi = {10.1016/j.jss.2016.08.071} }
Designing and Implementing Control Flow Graph for Magic 4th Generation Language
Richárd Dévai, Judit Jász, Csaba Nagy and Rudolf Ferenc
Acta Cybernetica. Vol. 21, 3, pp. 419-437, University of Szeged, Institute of Informatics, jan, 2014.
DOI PDF

Abstract:

A good compiler which implements many optimizations during its compilation phases must be able to perform several static analysis techniques such as control flow or data flow analysis. Besides compilers, these techniques are common for static analyzers as well to retrieve information from source code, for example for code auditing, quality assurance or testing purposes. Implementing control flow analysis requires handling many special structures of the target language. In our paper we present our experiences in implementing control flow graph (CFG) construction for a special 4th generation language called Magic. While we were designing and implementing the CFG for this language, we identified differences compared to 3rd generation languages mostly because of the unique programming technique of Magic (e.g. data access, parallel task execution, events). Our work was motivated by our industrial partner who needed precise static analysis tools (e.g. for quality assurance or testing purposes) for this language. We believe that our experiences for Magic, as a representative of 4GLs, might be generalized for other languages too.
BibTex:

@article{Devai2014, author = {Dévai, Richárd and Jász, Judit and Nagy, Csaba and Ferenc, Rudolf}, title = {Designing and Implementing Control Flow Graph for Magic 4th Generation Language}, journal = {Acta Cybernetica}, publisher = {University of Szeged, Institute of Informatics}, year = {2014}, volume = {21}, number = {3}, pages = {419-437}, doi = {10.14232/actacyb.21.3.2014.9} }
Code factoring in GCC on different intermediate languages
Csaba Nagy, Gábor Lóki, Árpád Beszédes and Tibor Gyimóthy
Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae Sectio Computatorica. Vol. 30, pp. 79-96, Loránd Eötvös University, Faculty of Informatics, Department of Computer Algebra, Budapest, 2009.
Zbl PDF

Abstract:

Today as handheld devices (smart phones, PDAs, etc.) are becoming increasingly popular, storage capacity becomes more and more important. One way to increase capacity is to optimize static executables on the device. As a result, code-size optimization is receiving more attention nowadays, and new techniques are emerging, such as code factoring, which is still under research. Although GNU GCC is the most common compiler in the open source community and has many implemented algorithms for code-size optimization, the compiler is still weak in these methods, which can be turned on using the '-Os' flag. In this article we give an overview of the implementation of different code factoring algorithms (local factoring, sequence abstraction, interprocedural abstraction) on the IPA, Tree, Tree SSA and RTL passes of GCC. The correctness of the implementation was checked, and the results were measured on different architectures with GCC's official Code-Size Benchmark Environment (CSiBE) as a real-world system. These results showed that on the ARM architecture we could achieve 61.53% maximum and 2.58% average extra code-size saving compared to the '-Os' flag of GCC.
BibTex:

@article{Nagy2009a, author = {Nagy, Csaba and Lóki, Gábor and Beszédes, Árpád and Gyimóthy, Tibor}, title = {Code factoring in GCC on different intermediate languages}, journal = {Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae Sectio Computatorica}, publisher = {Loránd Eötvös University, Faculty of Informatics, Department of Computer Algebra, Budapest}, year = {2009}, volume = {30}, pages = {79-96} }

Theses

Evaluating optimization and reverse engineering techniques on data-intensive systems

Csaba Nagy

PhD Thesis, University of Szeged, Szeged, Hungary, dec, 2013.

PDF

Abstract:

BibTex:

@phdthesis{Nagy2013a,
  author = {Nagy, Csaba},
  title = {Evaluating optimization and reverse engineering techniques on data-intensive systems},
  school = {University of Szeged},
  year = {2013}
}

Extension of GCC with a fully manageable reverse engineering front end

Csaba Nagy

Master's Thesis, University of Szeged, Szeged, Hungary, 2007.

PDF

Abstract:

BibTex:

@mastersthesis{Nagy2007,
  author = {Nagy, Csaba},
  title = {Extension of GCC with a fully manageable reverse engineering front end},
  school = {University of Szeged},
  year = {2007}
}

Miscellaneous

Static Analysis of Database Accesses in MongoDB Applications

Boris Cherry, Csaba Nagy and Anthony Cleve

In Proceedings of the 22nd Belgium-Netherlands Software Evolution Workshop (BENEVOL 2023), Presentation Abstracts. Nijmegen, The Netherlands, nov, 2023.

Abstract:

BibTex:

@inproceedings{Cherry2023_misc,
  author = {Cherry, Boris and Nagy, Csaba and Cleve, Anthony},
  title = {Static Analysis of Database Accesses in MongoDB Applications},
  booktitle = {Proceedings of the 22nd Belgium-Netherlands Software Evolution Workshop (BENEVOL 2023)},
  year = {2023}
}

Parsing and Analyzing SQL Queries in Stack Overflow Questions
Csaba Nagy and Anthony Cleve
In Preproceedings of the Eighth Seminar Series on Advanced Techniques & Tools for Software Evolution (SATToSE 2015). Mons, Belgium, jul 6-8, 2015.
PDF

Abstract:

The rapid growth and increasing popularity of Stack Overflow made it a large knowledge base of several programming topics, which also attracts researchers. To mention a few examples, they study actual trends that developers follow, design questions of Q&A systems, island parsing techniques to analyze posts, and recommendation systems, and they try to model the quality of the posts. In our paper, we introduce an approach to parse and analyze SQL queries in Stack Overflow questions with the main goal of identifying common patterns among them. Such similar structures in SQL statements can point to problematic language constructs (e.g., antipatterns) which should be avoided by developers.
BibTex:

@inproceedings{Nagy2015_misc, author = {Nagy, Csaba and Cleve, Anthony}, title = {Parsing and Analyzing SQL Queries in Stack Overflow Questions}, booktitle = {Preproceedings of the Eighth Seminar Series on Advanced Techniques & Tools for Software Evolution (SATToSE 2015)}, year = {2015} }

Adat-intenzív szoftverrendszerek

Csaba Nagy

In SZTE Talent Press Az SZTE Tehetségpont disszeminációs és tudomány-népszerűsítő magazinja. VI, University of Szeged, 2014.

PDF

Abstract:

BibTex:

@incollection{Nagy2014a_misc,
  author = {Nagy, Csaba},
  title = {Adat-intenzív szoftverrendszerek},
  booktitle = {SZTE Talent Press Az SZTE Tehetségpont disszeminációs és tudomány-népszerűsítő magazinja},
  publisher = {University of Szeged},
  year = {2014},
  number = {VI}
}

A Static Concept Location Technique for Data-Intensive Systems: "Where Was This SQL Query Executed?"
Csaba Nagy and Anthony Cleve
In Proceedings of the 13th Software Evolution in Belgium and the Netherlands Seminar (BENEVOL 2014). Amsterdam, Netherlands, Centrum Wiskunde & Informatica (CWI), nov 27-28, 2014.
PDF

Abstract:

An evolving software system is incrementally modified by its developers during the development and maintenance phases. Before the developers start working on a change, they need to identify which parts of the source code implement the feature and should be touched first during the change. In practice, what they do is a concept location task (also known as feature identification/location), which is 'the process that identifies where a software system implements a specific concept'. There are many existing approaches to support developers in concept location tasks, ranging from simple pattern matching (so-called 'grep' techniques) to more sophisticated methods like IR-based techniques or dependency analyses. However, none of the existing approaches consider the case when there is a database in the architecture, which adds further source artifacts or dependencies. Here, we investigate a concept location approach for data-intensive systems, that is, applications with at least one database server in their architecture which is intensively used by its clients. Specifically, we introduce a static technique to identify the location(s) in the source code where a given SQL query was potentially sent to the database server.
BibTex:

@inproceedings{Nagy2014_misc, author = {Nagy, Csaba and Cleve, Anthony}, title = {A Static Concept Location Technique for Data-Intensive Systems: "Where Was This SQL Query Executed?"}, booktitle = {Proceedings of the 13th Software Evolution in Belgium and the Netherlands Seminar (BENEVOL 2014)}, publisher = {Centrum Wiskunde & Informatica (CWI)}, year = {2014} }

Static Security Analysis Based on Input Related Software Faults

Csaba Nagy

In Proceedings of the Hungarian-American Scholarship Fund (HAESF) Five Year Anniversary Conference. Hungarian Academy of Sciences, Budapest, Hungary, pp. 12, HAESF, CIEE, sep 18, 2009.

Abstract:

BibTex:

@conference{Nagy2009b_misc,
  author = {Nagy, Csaba},
  title = {Static Security Analysis Based on Input Related Software Faults},
  booktitle = {Proceedings of the Hungarian-American Scholarship Fund (HAESF) Five Year Anniversary Conference},
  publisher = {HAESF, CIEE},
  year = {2009},
  pages = {12}
}

Activities

Program Committee Member at the 23rd International Conference on Mining Software Repositories (MSR 2026), Data and Tool Showcase Track, Rio de Janeiro, Brazil, April 2026
Program co-Chair at the 22nd International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2026), Montreal, Canada, July 2026
Program Committee Member at the 19th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE 2026), Research Track, Rio de Janeiro, Brazil, April 2–13, 2026
Program Committee Member at the 48th IEEE/ACM International Conference on Software Engineering (ICSE 2026), Demonstrations Track, Rio de Janeiro, Brazil, April 12–18, 2026
Program Committee Member at the 34th International Conference on Program Comprehension (ICPC 2026), Rio de Janeiro, Brazil, April 2026
Program Committee Member at the ACM International Conference on the Foundations of Software Engineering (FSE 2025), Demonstrations track, Trondheim, Norway, June 23–27, 2025
Program Committee Member at the 16th International Conference on Internetware (Internetware 2025), Trondheim, Norway, June 20–22, 2025
Publicity chair at the 21st International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2025), Trondheim, Norway, June 26, 2025
Program Committee Member at the 25th IEEE International Conference on Source Code Analysis and Manipulation (SCAM 2025), Research Track, Auckland, New Zealand
Program Committee Member at the 9th Symposium on Computer Languages, Implementations and Tools (SCLIT 2025), Research Papers, Prague, Czechia, June 6, 2025
Program Committee Member at the 41st International Conference on Software Maintenance and Evolution (ICSME 2025), Research Track, Auckland, New Zealand, September 8–13, 2025
Program Committee Member at the 47th IEEE/ACM International Conference on Software Engineering (ICSE 2025), Demonstrations Track, Ottawa, Ontario, Canada, April 30 – May 2, 2025
Program Committee Member at the 32nd International Conference on Software Analysis, Evolution, and Reengineering (SANER 2025), Tool Demo, Montréal, Québec, Canada, March 4–7, 2025
Program Committee Member at the 22nd International Conference on Mining Software Repositories (MSR 2025), Technical Papers, Ottawa, Canada, April 28–29, 2025
Program Committee Member at the 22nd International Conference on Mining Software Repositories (MSR 2025), Data and Tool Showcase Track, Ottawa, Canada, April 28–29, 2025
Program Committee Member at the 24th IEEE International Conference on Source Code Analysis and Manipulation (SCAM 2024), Research Track, Flagstaff, AZ, USA, October 7–8, 2024
Program Committee Member at the 24th IEEE International Conference on Source Code Analysis and Manipulation (SCAM 2024), New Ideas and Emerging Results (NIER) Track, Flagstaff, AZ, USA, October 7–8, 2024
Program Committee Member at the 40th IEEE International Conference on Software Maintenance and Evolution (ICSME 2024), Tool Demo Track, Flagstaff, Arizona, USA, October 6–11, 2024
Program Committee Member at the 12th IEEE Working Conference on Software Visualization (VISSOFT 2024), Flagstaff, Arizona, USA, October 6–7, 2024
Program Committee Member at the 20th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2024), Porto de Galinhas, Brazil, July 16, 2024
Publicity co-Chair at the 20th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2024), Porto de Galinhas, Brazil, July 16, 2024
Program Committee Member at the 40th IEEE International Conference on Software Maintenance and Evolution (ICSME 2024), Research Track, Flagstaff, AZ, USA, October 6–11, 2024
Program Committee Member at the 31st International Conference on Software Analysis, Evolution and Reengineering (SANER 2024), Main Track, and Short Papers and Posters Track, Rovaniemi, Finland, March 12–15, 2024
Program Committee Member at the 32nd International Conference on Program Comprehension (ICPC 2024), Research Track, and Tool Demo Track, Lisbon, Portugal, April 14–20, 2024
Program Committee Member at the 21st International Conference on Mining Software Repositories (MSR 2024), Research Track, Data and Tool Showcase Track, Lisbon, Portugal, April 15–16, 2024
Organizing Committee Member at the 3rd International Software Engineering Summer School (SIESTA 2023), Lugano, Switzerland, September 11–13, 2023
Program Committee Member at the 19th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2023), San Francisco, California, USA, December 8, 2023
Program Committee Member at the 23rd IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2023), Research Track, and New Ideas and Emerging Results (NIER) Track, Bogotá, Colombia, October 2–3, 2023
Program Committee Member at the 11th IEEE Working Conference on Software Visualization (VISSOFT 2023), Technical papers, New Ideas & Emerging Results (NIER), and Tool Demo (TD) Tracks, Bogotá, Colombia, October 1–2, 2023
Program Committee Member at the 39th IEEE International Conference on Software Maintenance and Evolution (ICSME 2023), Research Track, Bogotá, Colombia, October 1–6, 2023
Program Committee Member at the 46th IEEE/ACM International Conference on Software Engineering (ICSE 2024), Demonstrations Track, and Software Engineering in Society (SEIS) Track, Lisbon, Portugal, April 12–14, 2024
Program Committee Member at the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023), Research Papers Track, Kirchberg, Luxembourg, September 11–15, 2023
Program Committee Member at the 20th International Conference on Mining Software Repositories (MSR 2023), Technical Track, Data and Tool Showcase Track, and Registered Reports Track, Melbourne, Australia, May 15–16, 2023
Program Committee Member at the 2023 International Conference on Evaluation and Assessment in Software Engineering (EASE 2023), Research Track, Oulu, Finland, June 14–16, 2023
Program Committee Member at the 31st IEEE/ACM International Conference on Program Comprehension (ICPC 2023), Research Track, and Replications and Negative Results (RENE) Track, Melbourne, Australia, May 14–20, 2023
Program Committee Member at the 30th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2023), Early Research Achievements (ERA), and Tool Demo Tracks, Macao SAR, China, March 21–24, 2023
Program Committee Member at the 45th IEEE/ACM International Conference on Software Engineering (ICSE 2023), Demonstrations Track, Melbourne, Australia, May 14–20, 2023
Program Committee Member at the 10th IEEE Working Conference on Software Visualization (VISSOFT 2022), Technical Papers, New Ideas & Emerging Results (NIER), and Tool Demo (TD) Tracks, Limassol, Cyprus, October 2–3, 2022
Program Committee Member at the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE 2022), Research Papers, Ann Arbor, Michigan, United States, September 26 – October 1, 2022
Program Committee Member at the 15th International Conference on the Quality of Information and Communications Technology (QUATIC 2022), Quality Aspects in Software Evolution Thematic Track, Talavera de la Reina, Spain, September 12–14, 2022
Program Co-Chair at the 22nd IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2022), New Ideas and Emerging Results (NIER) Track, Limassol, Cyprus, October 3–4, 2022
Program Committee Member at the 30th IEEE/ACM International Conference on Program Comprehension (ICPC 2022), Research Track, Pittsburgh, Pennsylvania, United States, May 21–22, 2022
Program Committee Member at the 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2022), Tool Demo Track, Honolulu, Hawaii, March 15–18, 2022
Program Committee Member at the 37th International Conference on Software Maintenance and Evolution (ICSME 2021), NIER Track, Luxembourg City, Luxembourg, September 27 – October 1, 2021
Program Committee Member at the 9th IEEE Working Conference on Software Visualization (VISSOFT 2021), Luxembourg City, Luxembourg, September 27–28, 2021
Program Committee Member at the 18th IEEE/ACM International Conference on Mining Software Repositories (MSR 2021), Registered Report (RR) Track, Virtual Event, May 17–19, 2021
Program Committee Member at the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 2020), Tool Demonstration Track, Melbourne, Australia, September 21–25, 2020
Program Committee Member at the 28th IEEE/ACM International Conference on Program Comprehension (ICPC 2020), Research Track, Seoul, South Korea, May 23–24, 2020
Program Committee Member at the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2020), Tool Demo Track, London, Ontario, Canada, February 18–21, 2020
Program Committee Member at the International Conference on Technical Debt 2020 (TechDebt 2020), Tools Track, Seoul, South Korea, May 25–26, 2020
Program Committee Member at the 35th IEEE International Conference on Software Maintenance and Evolution (ICSME 2019), Artifacts Track, Cleveland, OH, USA, September 30 – October 4, 2019
Organizing Committee Member at the 2nd International Summer School on Software Engineering (SIESTA 2019), Termoli, Italy, September 3–6, 2019
Program Committee Member at the 19th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2019), Engineering Track, Cleveland, OH, USA, September 30 – October 1, 2019
Program Committee Member at the 13th IEEE International Conference on Research Challenges in Information Science (RCIS 2019), Doctoral Consortium, Brussels, Belgium, May 29–31, 2019
Program Committee Member at the 18th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2018), Engineering Track, Madrid, Spain, September 23–24, 2018
Organizing Committee Member at the 1st International Summer School on Software Engineering (SIESTA 2018), Lugano, Switzerland, September 10–12, 2018
Program Committee Member at the 16th BElgian-NEtherlands software eVOLution symposium (BENEVOL 2017), Antwerp, Belgium, December 4–5, 2017
Program Committee Member at the 33rd IEEE International Conference on Software Maintenance and Evolution (ICSME 2017), Industry Track, Shanghai, China, September 17–24, 2017
Program Committee Member at the 17th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2017), Engineering Track, Shanghai, China, September 17–18, 2017
Program Committee Member at the 25th IEEE International Conference on Program Comprehension (ICPC 2017), Tool Demo Track, Buenos Aires, Argentina, May 22–23, 2017
Program Committee Member at the 16th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2016), Engineering Track, Raleigh, North Carolina, USA, October 2–3, 2016
Program Committee Member at the 32nd IEEE International Conference on Software Maintenance and Evolution (ICSME 2016), Industry Track, Raleigh, North Carolina, USA, October 2–10, 2016
Program Committee Member at the 15th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2015), Tool Demo Track, Bremen, Germany, September 27–28, 2015
Program Committee Member at the 8th Seminar Series on Advanced Techniques & Tools for Software Evolution (SATToSE 2015), Mons, Belgium, July 6–8, 2015
Web & Publication Co-Chair at the 16th European Conference on Software Maintenance and Reengineering (CSMR 2012), Szeged, Hungary, March 27–30, 2012
Social Chair at the 8th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2011), Szeged, Hungary, September 5–9, 2011