Applying Absolute Residuals as Evaluation Criterion for Estimating the Development Time of Software Projects by Means of a Neuro-Fuzzy Approach

In the software development field, practitioners expend between 30% and 40% more effort than is predicted. Accordingly, researchers have proposed new models for estimating development effort so that the estimates are closer to the actual values. In this study, an application based on a new neuro-fuzzy system (NFS) is analyzed, and its accuracy is compared to that of a statistical multiple linear regression (MLR) model. The criterion for evaluating the accuracy of estimation models has mainly been the Magnitude of Relative Error (MRE). However, it was recently found that MRE is asymmetric, and the use of Absolute Residuals (AR) has been proposed instead. Therefore, in this study the accuracy results of the NFS and the MLR were based on AR. A statistical paired t-test showed that the accuracy of the new NFS is statistically better than that of the MLR at the 99% confidence level. It can be concluded that the new NFS could be used for predicting the effort of software development projects when they have been individually developed following a disciplined process.


INTRODUCTION
A high percentage of machine learning models have been proposed based on the accuracy criterion Magnitude of Relative Error (MRE) (Wen et al., 2012). However, it was recently found that MRE is asymmetric and that the Absolute Residual (AR) should be used instead, because AR is unbiased and does not lead to asymmetry (Shepperd & MacDonell, 2012). The AR criterion has already been used in recent publications for estimating the effort (López-Martín, 2015) and schedule (duration, time) (Ferreira-Santiago et al., 2015) of software projects.
Software Engineering (SE) is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software, and it provides the fundamentals, principles and skills needed to develop and maintain high-quality software products (Abran et al., 2004). Some of the areas of SE are requirements, design, construction, testing, and management. Software Engineering Management includes the planning and measurement of SE, which involves Software Development Effort Estimation (SDEE).
The Chaos report by the Standish Group (The Standish Group, 2012), which reports on project failure in the field of information technologies, considers a project successful only if it is completed on time, within budget, and meets its requirements. Several research works on software development effort estimation have cited the Chaos report (De Araújo et al., 2012; Bonneti et al., 2012; De Araújo et al., 2012b; Lagerström et al., 2012; Moløkken-Østvold & Jørgensen, 2003). This report found that more than half of the software projects conducted worldwide between 2004 and 2012 (around 61%) were delivered late, were over budget, or were not even finished; just 39 percent were classified as successful. The main cause of these problems is a failure of the SDEE (De Araújo et al., 2009, 2011). Estimation of software development effort is the basis for project bidding, budgeting and planning. The consequences of poor budgets and plans can be dramatic: if they are too pessimistic, business opportunities can be lost, while over-optimism may be followed by significant losses.
The SDEE activity could start with a personal-level approach, beginning with the development of small-size projects. Disciplined software development at the personal level based on small-scale projects, represented by the Personal Software Process (PSP), has benefited thousands of developers in academic and industrial training courses (Rombach et al., 2008).
Two of the three most important causes of Information Technology project failure have been related to poor resource estimation (González-Carrasco et al., 2012). On average, software developers expend from 30% to 40% more effort than is estimated (Jørgensen & Shepperd, 2007). Because no single technique for estimating software development effort is best in all situations, it is important to propose new models, compare their results, and then generate more realistic estimates (Boehm & Abts, 2000).
The objective of this paper is to present a new Neuro-Fuzzy System (NFS) that achieves higher accuracy when estimating the development time of software projects, with accuracy measured using the AR and its mean (MAR).
The data set obtained from (Lopez-Martin et al., 2005), with forty-one modules developed in ten projects, was used for training and testing the models. The accuracy of the new NFS was compared to that of a Multiple Linear Regression (MLR) model.
The rest of this study is structured as follows: Section 2 presents the related work. Section 3 defines SDEE and describes the AR. Section 4 describes the software estimation techniques with which SDEE has been addressed; MLR, Fuzzy Logic (FL), Neural Networks (NN) and NFS are each briefly described in their respective subsections. Section 5 describes the data set from which the models are generated and then the generation of the MLR model and the new NFS, whereas Section 6 presents and compares the results. Finally, conclusions and future work are given in Section 7.

RELATED WORK
A systematic review of 157 studies published between 2002 and 2012 covered the application of NFS (Kar et al., 2014). This review classifies the NFS applications into ten different categories, none of which included any study regarding SDEE. Therefore, in this study an NFS is proposed as a new model, and its results are compared to those of an MLR.
Additional estimation techniques have been proposed in the area of SDEE to improve estimation accuracy (Wen et al., 2012; Jørgensen & Shepperd, 2007).
A second systematic review, completed in 2004, identified 304 SDEE studies in 76 journals (Jørgensen & Shepperd, 2007). It classified the studies according to their research topic, estimation approach, research approach, study context and data set. This review did not include any neuro-fuzzy model used for SDEE.
An NFS to estimate the development time of software projects was built by (Marza et al., 2008). Forty-one modules developed in ten software projects were used as the data set. The proposed approach was compared with FL and NN models, and results showed that the Mean Magnitude of Relative Error (MMRE) obtained with the NFS was substantially lower than the MMRE obtained with FL and NN.
In (Garcia-Diaz et al., 2015), the time estimation accuracy of a new NFS was statistically better than that obtained from a previous NFS and from statistical regression when the forty-one modules developed in ten programs were used as the data set. Results showed that the MMRE obtained with the new NFS was substantially lower than the MMRE obtained with the previous NFS and with statistical regression.
In (Lopez-Martin et al., 2005), an FL model for SDEE was proposed, and its results were compared with those of a multiple regression. Results showed that the MMRE obtained with FL was slightly higher than the MMRE obtained with multiple regression.

A. Software development effort estimation (SDEE)
SDEE has been defined, at least since 1969, as the amount of time in person-hours needed to design, code, and test a software project (Naur & Randell, 1969).
The SDEE process consists of specific activities:
1. Obtaining data from previous projects.
2. Generation of estimation models.
3. Checking and validating the models, based on accuracy.
One activity of software project planning is the estimation of the development effort, which was considered to be one of the three great challenges of computer science (Brooks, 2003). Effort estimation techniques have been proposed and researched over the last years (Wen et al., 2012; Jørgensen & Shepperd, 2007), including models based on machine learning such as FL, NN, genetic programming, and genetic algorithms. The present work uses estimations obtained with an algorithmic model based on statistics (MLR), which attempts to represent the relationship between effort and one or more characteristics of a project, and with an NFS, which is a hybrid model based on a computational technique. MLR has been the dominating technique for software estimation in recent years (Jørgensen & Shepperd, 2007).

B. Evaluation criterion
The Magnitude of Relative Error (MRE) is the most popular accuracy metric for comparing machine learning models for software effort estimation (Wen et al., 2012). However, MRE is an accuracy criterion known to lead to asymmetry (Shepperd & MacDonell, 2012). Therefore, in this research the absolute residuals are used, as suggested in (Shepperd & MacDonell, 2012).
The AR is defined as follows:

$$AR_i = \left| \text{Actual Effort}_i - \text{Estimated Effort}_i \right| \quad (1)$$

The AR value is calculated for each observation $i$ whose effort is estimated. The aggregation of ARs over multiple observations ($N$) can be achieved through its mean (MAR) as follows:

$$MAR = \frac{1}{N} \sum_{i=1}^{N} AR_i \quad (2)$$

The accuracy of an estimation technique is inversely related to the MAR (Shepperd & MacDonell, 2012). That is, a lower MAR indicates a more accurate estimate.
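The AR and MAR computations can be sketched in Python as follows, assuming that the AR is the absolute difference between actual and estimated effort and that the MAR is its arithmetic mean. The function names and the numeric values are hypothetical and are not taken from the data set of this study. MRE is included only to contrast its behavior with that of AR.

```python
from statistics import mean


def absolute_residuals(actual, estimated):
    """Return AR_i = |actual_i - estimated_i| for each observation i."""
    if len(actual) != len(estimated):
        raise ValueError("actual and estimated must have the same length")
    return [abs(a - e) for a, e in zip(actual, estimated)]


def mean_absolute_residual(actual, estimated):
    """Return the MAR, the arithmetic mean of the absolute residuals."""
    return mean(absolute_residuals(actual, estimated))


def magnitude_of_relative_error(actual, estimated):
    """Return MRE_i = |actual_i - estimated_i| / actual_i (shown for contrast)."""
    return [abs(a - e) / a for a, e in zip(actual, estimated)]


# Hypothetical development times in minutes (illustrative values only).
actual = [20.0, 20.0]
estimated = [10.0, 30.0]  # one underestimate and one overestimate, both off by 10

ars = absolute_residuals(actual, estimated)
mar = mean_absolute_residual(actual, estimated)

# An estimate of 0 underestimates by the full actual value, yet its MRE is 1;
# an overestimate can push MRE above 1 without limit.
mre_extremes = magnitude_of_relative_error([20.0, 20.0], [0.0, 100.0])
```

In this sketch the underestimate and the overestimate receive the same AR, whereas the last line shows that MRE cannot exceed 1 for a non-negative underestimate but grows without bound for an overestimate.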

A. Multiple linear regression (MLR)
The accuracy of statistical regressions has frequently been compared to that of other software estimation models (Wen et al., 2012; Jørgensen & Shepperd, 2007). The comparison against a statistical regression model is suggested because it should be built as the default model construction method (Kitchenham et al., 2007).

B. Fuzzy logic (FL)
FL was introduced by Zadeh in 1965. FL is the name given to a mathematical system developed to model the curious way in which the human brain processes and selects words (Zadeh, 1965). The main motivation behind the creation of FL was the existence of imprecision in the measurement process.
FL represents models or knowledge using IF-THEN rules of the form "if X then Y".
A fuzzy model is a modelling construct featuring two main properties (Zhiwei-Xu & Khoshgoftaar, 2004): (1) It operates at a level of linguistic terms (fuzzy sets that are sets whose elements have degrees of membership), and (2) it represents and processes uncertainty.
FL offers a particularly convenient way to generate a keen mapping between input and output spaces thanks to the natural expression of fuzzy rules (Zadeh, 2002).
In SDEE, two considerations justify the decision of implementing a fuzzy model: first, it is impossible to develop a precise mathematical model of the domain (Lewis, 2001); second, metrics only produce estimations of the real complexity.
There are two types of fuzzy inference system (FIS): Mamdani (Mamdani, 1976) and Takagi-Sugeno (Takagi & Sugeno, 1983). The FIS used in this study is the one proposed by Takagi and Sugeno (Takagi & Sugeno, 1983), since we did not find any study that used it for the SDEE of small projects. The Mamdani FIS expects the output membership functions (MF) to be fuzzy sets, whereas the Takagi-Sugeno-type system can be used to model any inference system in which the output is either linear or constant. In this research, the constant output was used.
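The rules in functional Takagi-Sugeno fuzzy systems have the form:

$$\text{If } x_1 \text{ is } A_{1j} \text{ and } \dots \text{ and } x_n \text{ is } A_{nj} \text{ then } y_j = f_j(x_1, \dots, x_n) \quad (3)$$

where $x_1, \dots, x_n$ are the input variables, $A_{ij}$ is the fuzzy set of input $i$ in rule $j$, and $f_j(\cdot)$ is a crisp function of the input variables, rather than a fuzzy proposition (Takagi & Sugeno, 1985). For a particular application, the effectiveness of the fuzzy system in most cases depends on the order of the function.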
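A minimal sketch of Takagi-Sugeno inference with constant (zero-order) rule outputs is given below. It assumes Gaussian MFs parameterized by a center and a width, the product as the AND operator, and a weighted average for the final output. The function names, the MF parameterization and all numeric values are assumptions made for illustration; they are not the parameters of the model built in this study.

```python
import math


def gaussian_mf(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy set (assumed parameterization)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def sugeno_zero_order(inputs, rules):
    """Evaluate a zero-order Takagi-Sugeno system.

    rules is a list of (antecedents, constant) pairs, where antecedents holds
    one (center, sigma) pair per input variable and constant is the crisp
    rule output f_j.
    """
    weighted_sum = 0.0
    total_strength = 0.0
    for antecedents, constant in rules:
        # Firing strength: product of the membership degrees of all inputs.
        strength = 1.0
        for x, (center, sigma) in zip(inputs, antecedents):
            strength *= gaussian_mf(x, center, sigma)
        weighted_sum += strength * constant
        total_strength += strength
    if total_strength == 0.0:
        raise ValueError("no rule fired for the given inputs")
    # Weighted average of the constant rule outputs.
    return weighted_sum / total_strength


# Two hypothetical rules over a single input variable.
rules = [
    ([(0.0, 1.0)], 10.0),  # "if x is small then y = 10"
    ([(2.0, 1.0)], 20.0),  # "if x is big then y = 20"
]
midpoint_output = sugeno_zero_order([1.0], rules)  # both rules fire equally
near_small_output = sugeno_zero_order([0.0], rules)  # first rule dominates
```

At the midpoint between the two centers both rules fire with equal strength, so the output is the average of the two constants; closer to the first center, the output moves toward the first constant.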

C. Neural networks (NN)
An NN is a massively parallel, distributed system composed of simple processing units, or artificial neurons, that are interconnected to mimic a biological NN (Haykin, 1999).
Before an NN can be used, it has to undergo training, which involves iteratively finding the appropriate weight values so that the network outputs the desired value for a given set of input values. A number of training algorithms have been developed over the years, with Backpropagation being the most widely known (Haykin, 1999). After an NN has been trained, it is convenient to validate its performance, ideally using a data set different from the one used to train it.
In this research, a combination of the Backpropagation and least mean squares training algorithms was used.
NNs are used in SDEE because of their ability to learn from previous data. In addition, they can generalize from the training data set, which enables them to produce acceptable results for previously unseen data (Su et al., 2007).
NN can model complex non-linear relationships and approximate any measurable function such that it is very useful in problems where there is a complex relationship between inputs and outputs (Aggarwal et al., 2005;Huang et al., 2007).

D. Hybrid systems
One of the most important capabilities of FL is to model the qualitative aspects of human knowledge by using simple rules. NNs have advantages such as their learning capability and high computational power. As a result, it is possible to combine the advantages of NN and FL to make a better tool.
NFS is a fuzzy system augmented by NN to enhance some characteristics like flexibility and adaptability (Nauck, 1994;Nauck et al., 1997;Saliu, 2003).
The NN research started in the 1940s, and the FL research in the 1960s, but the neuro-fuzzy research area is relatively new (Jantzen, 1998).
Neuro-fuzzy hybrid systems may be divided into two major groups (Mitra & Hayashi, 2000): fuzzy neural networks (FNN) and NFS. An FNN is an NN equipped with the capability of handling fuzzy information. An NFS is a fuzzy system combined with an NN in order to enhance certain desirable characteristics (Nauck, 1994; Nauck et al., 1997; Saliu, 2003). This research is based on the second approach.
An NFS can be viewed as a special three-layer NN (Nauck et al., 1997). The first layer represents input variables, the hidden layer represents fuzzy rules, and the third layer represents output variables.
The first integrated hybrid NFS is ANFIS; it has the lowest Root Mean Square Error (RMSE) among other NFS like the ARX model. Therefore, ANFIS was used here to implement the Takagi-Sugeno NFS. The ANFIS structure was implemented in MATLAB with (a) type: Sugeno FIS, (b) and method: prod, (c) or method: probor, (d) implication method: min, (e) aggregation method: max, and (f) defuzzification: wtaver (weighted average); its architecture is shown in Fig. 1 (Abraham, 2005).
The functioning of each layer is as follows (Abraham, 2005):
Layer-1 (input layer): No computation is done in this layer. Each node in this layer, which corresponds to one input variable, only transmits input values to the next layer directly. The link weight in layer 1 is unity.
Layer-2 (fuzzification layer): Each node in this layer corresponds to one linguistic label of one of the input variables in layer 1. In other words, the output link represents the membership value, which specifies the degree to which an input value belongs to a fuzzy set and is calculated in layer 2. A clustering algorithm decides the initial number and type of MFs to be allocated to each of the input variables. The final shapes of the MFs are fine-tuned during network learning.
Layer-3 (rule antecedent layer): A node in this layer represents the antecedent part of a rule. Usually a T-norm operator is used in this node. The output of a layer 3 node represents the firing strength of the corresponding fuzzy rule.
Layer-4 (rule strength normalization): Every node in this layer calculates the ratio of the firing strength of the i-th rule to the sum of the firing strengths of all rules.
Layer-5 (rule consequent layer): Every node $i$ in this layer has the node function

$$\bar{w}_i f_i = \bar{w}_i \left( p_i x_1 + q_i x_2 + r_i \right) \quad (4)$$

where $\bar{w}_i$ is the output of layer 4, and $\{p_i, q_i, r_i\}$ is the parameter set. A well-established way to determine the consequent parameters is to use the least mean squares algorithm.
Layer-6 (rule inference layer): The single node in this layer computes the overall output as the summation of all incoming signals.
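The six layers described above can be traced in a short forward pass. The sketch below assumes two input variables, Gaussian MFs, the product as T-norm and one rule per combination of MFs. The function names, MF parameters and consequent values are hypothetical, and no training step is shown.

```python
import math
from itertools import product


def gaussian_mf(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy set (assumed parameterization)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def anfis_forward(x1, x2, mfs1, mfs2, consequents):
    """Forward pass through the six layers for two inputs.

    mfs1 and mfs2 hold (center, sigma) pairs for each input; consequents
    holds one (p, q, r) triple per rule, in the order of product(mfs1, mfs2).
    """
    if len(consequents) != len(mfs1) * len(mfs2):
        raise ValueError("one (p, q, r) triple is needed per rule")
    # Layer 1 (input): the input values are passed on unchanged.
    # Layer 2 (fuzzification): one membership degree per linguistic label.
    mu1 = [gaussian_mf(x1, c, s) for c, s in mfs1]
    mu2 = [gaussian_mf(x2, c, s) for c, s in mfs2]
    # Layer 3 (rule antecedent): product T-norm gives the firing strengths.
    w = [a * b for a, b in product(mu1, mu2)]
    # Layer 4 (normalization): each strength divided by the sum of all.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 5 (rule consequent): normalized strength times the rule function.
    rule_outputs = [
        wb * (p * x1 + q * x2 + r) for wb, (p, q, r) in zip(w_bar, consequents)
    ]
    # Layer 6 (rule inference): sum of all incoming signals.
    return sum(rule_outputs), w_bar


# Hypothetical MFs (two per input) and identical consequents for all 4 rules.
mfs1 = [(0.0, 1.0), (5.0, 1.0)]
mfs2 = [(0.0, 2.0), (10.0, 2.0)]
consequents = [(1.0, 2.0, 3.0)] * 4
output, w_bar = anfis_forward(1.0, 2.0, mfs1, mfs2, consequents)
```

Because the normalized firing strengths sum to one, identical consequents in every rule make the output equal to that single linear function of the inputs, which gives a simple sanity check. Setting p and q to zero in every triple yields the constant-output variant.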

A. Data sample description
The comparative study carried out here was based on the empirical studies by (Lopez-Martin et al., 2005; Marza et al., 2008; Garcia-Diaz et al., 2015). The development time of forty-one modules was recorded and, for each module, the coupling (Dhama), complexity (McCabe), and lines of code metrics were registered. All programs were written in Pascal; hence, the modules are either procedures or functions.
The development time recorded for each of the forty-one modules included five phases: requirements understanding, algorithm design, coding, compiling and testing (Lopez-Martin et al., 2005).
The statistics and a brief description of each module are given in (Lopez-Martin et al., 2005).

B. Multiple linear regression
Using the data described in (Lopez-Martin et al., 2005; Marza et al., 2008; Garcia-Diaz et al., 2015), an MLR equation (Equation 5) was generated. Equation 5 describes the relationship between the dependent variable (Time) and the independent variables (McCabe complexity, Dhama coupling, and lines of code).
An acceptable value for the coefficient of determination is $r^2 \geq 0.5$ (Humphrey, 1995). In this case, the $r^2$ of Equation 5 was 0.7223. The ANOVA for this equation showed a statistically significant relationship between the variables at the 95% confidence level.
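The way an MLR model with three independent variables and its coefficient of determination can be obtained is sketched below, using ordinary least squares solved through the normal equations. The data are synthetic and noise-free, generated from hypothetical coefficients, so the fit recovers them exactly; they are not the forty-one modules of the study and the coefficients are not those of Equation 5.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Ordinary least squares; returns [intercept, b1, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


def r_squared(coeffs, rows, y):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - predict(coeffs, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic rows: (McCabe complexity, Dhama coupling, lines of code).
rows = [(1, 0.25, 10), (2, 0.50, 14), (3, 0.20, 22),
        (1, 0.10, 8), (4, 0.33, 30), (2, 0.25, 17)]
TRUE_COEFFS = [2.0, 1.5, 4.0, 0.5]  # hypothetical, used only to generate y
times = [predict(TRUE_COEFFS, r) for r in rows]

coeffs = fit_mlr(rows, times)
r2 = r_squared(coeffs, rows, times)
```

With real, noisy data the coefficient of determination falls below 1, and the threshold of 0.5 cited above would then be applied to the value returned by `r_squared`.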

C. Neuro-fuzzy system (NFS)
According to (Lopez-Martin et al., 2008), a Gaussian MF has two parameters: one of them (k) determines the shape of the curve and the other one (m) corresponds to the central position of the curve. Their scalar parameters (k, m) are defined as follows: Table 1 shows the final parameters of the MFs of the input variables. The NFS parameters are obtained by the grid partitioning method. Grid partitioning divides the data space into rectangular subspaces using an axis-parallel partition based on a predefined number of MFs and their types in each dimension (Wei et al., 2007). According to (Wei et al., 2007), grid partitioning is only suitable for cases with a small number of input variables (e.g., fewer than 6).
Fig. 2a to 2c show the three input variables with their respective small, medium, big and very big MFs, together with the parameters of each MF from Table 1.
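The effect of grid partitioning on the size of the rule base can be illustrated with a short sketch. It assumes that every combination of one MF per input variable forms one rule antecedent, which is how a full grid is usually enumerated; the rule count that follows from this assumption is not reported in the text. The names below are hypothetical.

```python
from itertools import product

# Input variables and linguistic labels as named in the text.
INPUTS = ["McCabe complexity", "Dhama coupling", "lines of code"]
LABELS = ["small", "medium", "big", "very big"]


def grid_partition_rules(inputs, labels):
    """Enumerate one rule antecedent per combination of labels (full grid)."""
    return [dict(zip(inputs, combo)) for combo in product(labels, repeat=len(inputs))]


rules = grid_partition_rules(INPUTS, LABELS)
rule_count = len(rules)  # number of labels raised to the number of inputs

# The same grid over six inputs grows to 4 ** 6 combinations.
six_input_count = len(LABELS) ** 6
```

The exponential growth in the number of combinations is consistent with the remark that grid partitioning suits only a small number of input variables.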

RESULTS
In a previous work (Garcia-Diaz et al., 2015), the Mean Magnitude of Relative Error (MMRE) was used to assess the estimation accuracy in the validation of the 41 projects; those results are shown in Table 2.
Researchers aimed at (1) determining which technique had the greatest effort estimation accuracy, or (2) proposing new or combined techniques that could provide better estimates. SDEE techniques can be classified into two general categories (López-Martín, 2015):
1) Expert judgment: This technique implies a lack of analytical argumentation and aims at deriving estimates based on the experience of experts on similar projects; this technique is based on a tacit (intuitive) quantification step.
2) Model-based technique: It is based on a deliberate (mechanical) quantification step, and it can be divided into the following two subcategories:
a) Models based on statistics: Their general form is a statistical regression model.
b)

Table 1. Parameters of the MFs of the input variables.

Table 2. MRE of each technique (MLR: Multiple Linear Regression, NFS: Neuro-Fuzzy System, MMRE: Mean Magnitude of Relative Error).

Table 3. MAR of each module (MLR: Multiple Linear Regression, NFS: Neuro-Fuzzy System, MAR: Mean of Absolute Residuals).