Sultan Alhusain

Assistant Professor of Computer Science

Datasets and Resources

Software Design Pattern Datasets

Paving the way for data-driven, machine-learning-based solutions to the problem of design pattern recognition.

Defect Prediction Datasets

36 software defect datasets covering different versions of 13 open-source Java systems.

Intelligent Data-Driven Reverse Engineering of Software Design Patterns

(PhD Thesis)

Abstract: Recognising implemented instances of Design Patterns (DPs) in software design discloses and recovers a wealth of information about the intention of the original designers and the rationale for their design decisions. Because the documentation available for software systems, if any, is often poor or obsolete, recovering such information can be of great help for maintenance tasks. However, since DPs are abstractly and vaguely defined, a set of software classes with exactly the relationships expected of a DP instance may be only accidentally similar. Conversely, a set of classes whose relationships differ to some extent from those typically expected can still be a true DP instance. The deciding factor is mainly whether the set of classes is actually intended to solve the design problem addressed by the DP, which makes intent a fundamental and defining characteristic of DPs.

Discerning the intent of potential instances requires building complex models that cannot be built using only the descriptions of DPs in books and catalogues. Accordingly, a paradigm shift in DP recognition towards fully machine-learning-based approaches is required. The problem is that no accurate and sufficiently large DP datasets exist, and it is difficult to construct one manually. Moreover, there is a lack of research on the feature set that should be used in DP recognition. The main aim of this thesis is to enable the required paradigm shift by laying down an accurate, comprehensive and information-rich foundation of feature and data sets. To achieve this aim, a large set of features is developed to cover a wide range of design aspects, with particular focus on design intent. This set serves as a global feature set from which different subsets can be objectively selected for different DPs. A new and feasible approach to DP dataset construction is designed and used to construct training datasets. The feature and data sets are then used experimentally to build and train DP classifiers. The results demonstrate the accuracy and utility of the sets introduced, and show that fully machine-learning-based approaches are capable of providing appropriate and well-equipped solutions to the problem of DP recognition.