Evolution of the Perceptions about Tourist Destinations Affected by Risk Events Using a PANAS-tDL Deep Learning Model

The tourism industry has energized national economies by offering places, as well as related tourism experiences, products, and services. In the context of the COVID-19 pandemic, some tourist destinations were affected by the subjective perceptions of users on social networks, among which Twitter stands out. To obtain an objective perception of a tourist destination from user comments posted on Twitter, we propose a PANAS-tDL (Positive and Negative Affect Schedule - Deep Learning) model, which integrates into a single structure a neural model inspired by a Stacked Deep Learning (SDL) model and the PANAS-t methodology. For this process, a database of comments was available for four destinations (Colombia, Italy, Spain, USA) and their tourist products and services, before and during the COVID-19 pandemic throughout 2020. The proposed model made it possible to generate objective perceptions of the tourist destinations and their products and services through an automatic classification of comments into each category defined by the PANAS-t methodology (11 sentiments). The results show that users' perceptions moved towards the negative sentiment zone defined by this methodology, following the evolution of the COVID-19 pandemic worldwide throughout 2020. The proposed model also integrates an automatic natural language processing (NLP) step of normalisation, lemmatisation and tokenisation for the objective characterisation of perceptions and, owing to its capacity for adaptation and learning, it can be extended to evaluate new tourist destinations, products or services using comments from different social networks.


INTRODUCTION
Tourism is an industry that has energized national economies by generating a large number of jobs to promote places, experiences, and their tourism products and services. Global dynamics have led governments worldwide to establish favourable environments for the development of this industry; however, in the context of the COVID-19 pandemic, many tourist destinations have been significantly affected. Social networks such as Twitter, Facebook, and Instagram have played a major role in communication, because they aggregate comments, news, and opinions. However, assessing the real impact of a risk event such as COVID-19 on a tourist destination still requires the objective characterisation of perceptions from comments posted by users on social networks (Goncalves et al., 2013).
Three well-defined development trends addressing this problem can be identified in the scientific literature. The first trend focuses on the analysis of aspects related to access to tourism destinations using different data sources. In this trend, Xue and Zhang (2020) identify behavioural patterns in tourist places according to distance; Li et al. (2020) create a model to forecast how potential factors such as weather conditions, holidays, seasonality and economics affect decision-making in the tourism sector; and Sarkar et al. (2020) propose MULTITOUR, a recommendation engine that recommends multiple itineraries based on the tourist's interest in a destination or place. Additional papers show how a tourist experience can be influenced by tourist-to-tourist interaction (Lin et al., 2019), and how cultural differences can help a customer have a better experience (Jia, 2020; Piteira et al., 2018). This trend clearly shows the importance of data in decision-making, specifically to improve the experiences of tourists in a destination considering different potential factors (Chau and Yan, 2021; Zha et al., 2021). However, this trend does not allow for the creation of an objective perception of the aspects that characterise a tourist destination.
A second development trend focuses on sentiment analysis for the evaluation of destinations. In this trend, Liu et al. (2019) apply sentiment analysis to the characterization of destinations in Australia, Nie et al. (2020) present a novel model for hotel selection driven by online textual reviews from the TripAdvisor website, which specialises in tourism, and Sharma et al. (2020) show the variation in the sentiments of visitors towards the same tourist service. In this trend, another paper presents a model to rank the tourist sites of a city based on sentiments from comments posted on social networks (Bueno et al., 2019), while a final paper proposes a novel methodology to identify user polarity, which a priori may not be representative of the sentiment of a comment about a tourist attraction (Valdivia et al., 2019). This development trend clearly shows the importance of social networks and sentiment analysis in the characterization of comments about tourist services and attractions, and as a way to reduce subjectivity in decision-making in tourism (Alaei et al., 2019; Gómez et al., 2018; González-Rodríguez et al., 2016).
A third development trend groups a series of papers in which Deep Learning for sentiment analysis stands out. Within this trend, Chang et al. (2020) present a series of Deep Learning models (DLMs) that analyse hotel reviews to identify response strategies to user comments, Zhang et al. (2019) use different big data tools based on DLMs to discover tourists' behaviours and perceptions of a tourism destination, while Hu et al. (2020) explore distinct relationships between importance, performance, and the asymmetric impact of service attributes on customer satisfaction (CS). Another paper shows the use of DLMs for the monthly forecast of tourist arrivals in Macau (Law et al., 2019), while a final paper presents a machine learning approach for the identification of deceptive reviews in the hospitality sector using unique attributes and sentiment orientation (Martinez-Torres and Toral, 2019; Salazar et al., 2020). It is important to emphasize the relevance that DLMs have gained in the analysis of sentiments on social networks, specifically to identify behaviours, opinions and perceptions in order to improve services in tourist destinations and attractions (Abdi et al., 2019; Araque et al., 2017).
The scientific literature shows an absence of machine learning models that objectively characterize the perceptions of tourists about destinations, places and tourist attractions from comments on social networks, and specifically the impact that a risk event such as the COVID-19 pandemic generates. Building on the second and third development trends, and to achieve an objective perception from user comments on a social network like Twitter in the face of a risk event such as COVID-19 and its impact on different destinations and their associated services and products, a PANAS-tDL (Positive and Negative Affect Schedule - Deep Learning) model is proposed. The novel model integrates into a single structure a neural model inspired by a Stacked Deep Learning model (Fischer and Krauss, 2018; Ravi et al., 2018) and the PANAS-t methodology (Positive and Negative Affect Schedule) (Heubeck and Wilkinson, 2019). The Fully Connected Layer (FCL) was defined by a log-logistic cumulative distribution function (CDF - Softmax function) to carry out an automatic classification of comments into the Positive Affect (PA) and Negative Affect (NA) categories, as well as into the following 11 sentiment categories defined by the PANAS-t scale: Guilt, Fear, Sadness, Hostility, Shyness, Fatigue, Surprise, Joviality, Self-assurance, Attentiveness and Serenity. According to the structure of the activation function, this scale goes from Guilt (1) to Serenity (11) (Goncalves et al., 2013; Nguyen et al., 2019).
For the analysis and validation of the PANAS-tDL model, a database of 93,634 Twitter comments regarding four tourist destinations (Colombia, Italy, Spain and USA) and their associated products and services was available. The comments were selected before (46,817 - 50%) and during (46,817 - 50%) the evolution of the COVID-19 pandemic throughout 2020. Each comment is composed of 5 words (or 5 emojis) or multiples thereof (including mixes of words and emojis). Each comment was subjected to an automatic natural language processing (NLP) step of normalisation, lemmatisation and tokenisation (NLTK, 2020), as well as to an automatic characterization process according to the PANAS-t scale (Branovački et al., 2020). In a first stage (Learning Stage), the proposed PANAS-tDL model was configured using a total of 46,817 (50%) random comments, which were grouped into positive (PA) and negative (NA) categories, as well as into the 11 sentiment categories, to create objective perceptions from comments according to the PANAS-t methodology. In a second stage (Generalization Stage), the PANAS-tDL model was evaluated using the comments obtained during the COVID-19 pandemic (46,817 - 50%) in the absence of an adaptation and learning process. The results of the first stage show that the model reached compression rates above 80% on average for the configuration of stacked neuron layers using an auto-encoder strategy (Chang et al., 2020). In this same stage, the model also reached sensitivity and specificity indices above 85% on average for the classification of comments as PA or NA, while for the objective characterization of perceptions the model reached indices of agreement (IOAs) above 95% on average, taking as reference the log-logistic function that defines the FCL. The results of the second stage show that the model was able to identify the effect of a risk event (the COVID-19 pandemic), shifting the perceptions towards the NA areas that define the PANAS-t structure.
This shows in general the good behaviour of the PANAS-tDL model in the objective characterization of perceptions and of the impact that COVID-19 has on destinations and their associated products and services, based on comments posted on a social network such as Twitter.
The remainder of this paper is made up of three sections. The first section describes the general methodology for the design and development of the proposed PANAS-tDL model, the construction of the database of Twitter comments, and the metrics for the performance evaluation of the model. The second section presents the analysis and discussion of results according to the objective change in the perception of Twitter comments that the COVID-19 pandemic has caused in four tourist destinations, based on the PANAS-t methodology. The final section presents a series of conclusions and the future work that can be developed thanks to the flexible structure that defines the proposed model.

METHODOLOGY
Social networks have become reference tools for the analysis of people's perception of products and services in the economy. In tourism, Twitter is increasingly used by tourists to explore potential destinations to visit. From this information, tourists form their own opinions about customs, political stability and, in general, the security of a country. However, social networks do not by themselves allow an objective perception to be formed of the impact that a risk event (the COVID-19 pandemic) has on a tourist destination and its products and services, owing to the large amount of subjectivity contained in the comments posted there. To solve this problem, we propose the following methodology.

PANAS-x Scale
The Positive and Negative Affect Schedule (PANAS) consists of two 10-item scales that provide measures of Positive Affect (PA) and Negative Affect (NA) for a particular emotion. In this methodology, a respondent is asked to rate a particular experience (usually over the last week) on a 5-point scale per item. Each item is defined by a series of words related to a lexicon. The PANAS-x scale, in turn, not only rates a particular emotion as NA or PA, but also rates it against 11 specific sentiments: Guilt, Fear, Sadness, Hostility, Shyness, Fatigue, Surprise, Joviality, Self-Assurance, Attentiveness, and Serenity. Goncalves et al. (2013) also summarize the common words used to describe each sentiment according to the PANAS scale: the NA category groups words like nervous and scared, while the PA category groups words such as enthusiastic and excited. These words clearly describe a person's state of mind with respect to a feeling. Unlike the POMS scale (Profile of Mood States), which establishes six different dimensions of mood swings (12 sentiments), the PANAS-x methodology has exhibited lower correlation values among its 11 sentiments, which brings about a better characterization of sentiments.

Adjusting PANAS-x for Tourism (PANAS-t - Case Study)
Following the PANAS-x methodology, we created a database of 93,634 comments related to four tourist destinations (Colombia, Italy, Spain, USA) and their products and services, obtained from Twitter throughout 2020 using a web-scraping methodology. Each comment or Tweet was defined by a total of 5 words (or mixed emojis) and was classified into the Negative (NA) or Positive (PA) category according to a polarization process in the interval [-1, 1], taking as reference the 11 sentiment categories that define the PANAS-t methodology. The comments were normalized, lemmatized, and tokenized (NLP processing) to create objective perceptions (Tellez et al., 2017). The comments were also grouped into those posted before (Baseline scenario) and during (Autonomy scenario) the worldwide COVID-19 pandemic. The Baseline scenario of comments is defined as shown in Table 1, which lists the sentiments or categories (PANAS-t), the tweets for each category, and the total tweets that make up the database before the COVID-19 pandemic (Baseline case).
These comments were later grouped into those posted before (46,817 - 50%) and during (46,817 - 50%) the evolution of the COVID-19 pandemic worldwide. For the configuration of the proposed model by adaptation and learning (First Stage - Learning Stage), a total of 46,817 (50%) random comments were selected from the total, of which 32,772 (70%) were used for learning and 14,045 (30%) for validation. For the evaluation of the proposed model, all comments were also grouped into positive (PA) and negative (NA) categories, as well as into the 11 sentiment categories that define the PANAS-t methodology. In a second stage (Generalization Stage), a total of 46,817 (50%) random comments (COVID-19 context) were selected to evaluate the sensitivity of the model.
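The partition sizes quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the helper name `split_counts` is hypothetical, and rounding the learning share to the nearest integer is an assumption that happens to reproduce the reported counts.

```python
def split_counts(total, train_fraction):
    """Return (learning, validation) counts for a given learning fraction."""
    learning = round(total * train_fraction)  # assumption: round to nearest comment
    return learning, total - learning


total_comments = 93_634
baseline = total_comments // 2        # comments posted before the pandemic
covid = total_comments - baseline     # comments posted during the pandemic
learning, validation = split_counts(baseline, 0.70)

print(baseline, covid, learning, validation)
```

Under these assumptions the two halves and the 70/30 partition of the first half add up to the totals reported in the text.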

PANAS-t Deep Learning Model (PANAS-tDL)
To achieve an objective characterization of the impact that a risk event (COVID-19) has on a tourist destination and its associated products and services from comments on Twitter, we designed a model based on adaptation and learning, which integrates in a single structure the PANAS-t methodology and a neural model inspired by a Stacked Deep Learning model (González-Rodríguez et al., 2016; Zhai and Chen, 2018). In this model, the output is given by the fully connected layer (FCL), whose number of neurons equals the number of classification categories according to the PANAS-t methodology (Table 2). The connection weights represent the relationships between each layer and the layer preceding it. The input array represents a comment on Twitter, and each of its elements represents one word (or emoji) of that comment.
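To make the layered structure more concrete, the sketch below passes one encoded comment through a small stack of fully connected layers ending in 11 outputs, one per PANAS-t category. It is a minimal illustration under stated assumptions, not the authors' architecture: the layer widths, the tanh activation, the random weights, and the use of a standard exponential softmax (rather than the log-logistic function of the paper) are all assumptions, and every function name is hypothetical.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer with a tanh activation (assumed activation)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Standard exponential softmax, used here as a stand-in output function."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer mapping n_in to n_out units."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(comment_vector, layers):
    """Propagate one encoded comment through all stacked layers."""
    activation = comment_vector
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return softmax(activation)


rng = random.Random(0)
sizes = [5, 8, 8, 11]  # 5 token features in, 11 categories out; hidden widths are illustrative
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
probabilities = forward([0.1, -0.3, 0.7, 0.0, 0.5], layers)
print(len(probabilities), round(sum(probabilities), 6))
```

The only property this sketch shares with the description above by construction is that the last layer has as many neurons as there are classification categories.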
Owing to its flexibility, and in accordance with the categories defined by the PANAS-t methodology, the proposed model integrates a Softmax strategy based on a log-logistic cumulative distribution function (Borja et al., 2020; Fischer and Krauss, 2018), which is governed by two parameters: a plasticity factor and a stability factor. The log-logistic Softmax function was configured according to the structure of comments, in agreement with the PANAS-t scale shown in Table 2.
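For orientation, the textbook log-logistic CDF can be sketched as follows. The parametrisation with a scale and a shape parameter is the standard one and is an assumption here; how these two parameters correspond to the plasticity and stability factors of the paper is not recoverable from the text, so no mapping is claimed. Evaluating the function on the ranks 1 to 11 of the sentiment scale is likewise only illustrative.

```python
def log_logistic_cdf(x, scale=1.0, shape=2.0):
    """Standard log-logistic CDF: F(x) = 1 / (1 + (x / scale) ** (-shape)) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / scale) ** (-shape))


SENTIMENTS = ["Guilt", "Fear", "Sadness", "Hostility", "Shyness", "Fatigue",
              "Surprise", "Joviality", "Self-assurance", "Attentiveness", "Serenity"]

# Illustrative choice: centre the curve on the middle rank (6) of the 11-point scale.
values = [log_logistic_cdf(rank, scale=6.0, shape=3.0)
          for rank in range(1, len(SENTIMENTS) + 1)]
for name, value in zip(SENTIMENTS, values):
    print(f"{name:15s} {value:.3f}")
```

In this parametrisation the CDF equals 0.5 at `x = scale`, and a larger `shape` makes the curve steeper (slenderer) around that point.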

Metrics
The metrics that were considered for the analysis and evaluation of the proposed model are described below.

Index of compression
The Index of compression (IC) shows the capacity of a layer of neurons to compress information through an auto-encoder strategy. The IC can be defined in terms of the Index of agreement (IOA), using the standard deviation of the input vector and the standard deviation of the output vector.
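Because the exact expression used for the IC is not reproduced here, the sketch below shows Willmott's index of agreement, a common definition of the IOA, as a stand-in for comparing an input vector with its auto-encoder reconstruction. Treating this as the IOA intended by the authors is an assumption, and the function name is hypothetical.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement between two equal-length sequences.

    Returns 1.0 for perfect agreement; lower values mean poorer agreement.
    """
    mean_observed = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    denominator = sum((abs(p - mean_observed) + abs(o - mean_observed)) ** 2
                      for o, p in zip(observed, predicted))
    if denominator == 0:
        # Degenerate case: both sequences are constant and equal to the mean.
        return 1.0 if numerator == 0 else 0.0
    return 1.0 - numerator / denominator


original = [1.0, 2.0, 3.0]
print(index_of_agreement(original, [1.0, 2.0, 3.0]))  # perfect reconstruction
print(index_of_agreement(original, [2.0, 2.0, 2.0]))  # reconstruction collapsed to the mean
```

A reconstruction identical to the input scores 1.0, while one that collapses to the mean of the input scores 0.0 in this example.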

Confusion matrix
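The confusion matrix, also known as the error matrix, is used to evaluate a model on the classification of comments into the Positive (PA) and Negative (NA) categories defined by the PANAS methodology (Table 3). Two rates are obtained from this matrix. The true positive rate (TPR), or sensitivity, is $\mathrm{TPR} = \frac{TP}{TP + FN}$, and the true negative rate (TNR), or specificity, is $\mathrm{TNR} = \frac{TN}{TN + FP}$, where $TP$, $TN$, $FP$ and $FN$ are the numbers of true positives, true negatives, false positives and false negatives.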
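The two rates can be computed directly from labelled comments, as in the minimal sketch below. Treating PA as the positive class is an assumption made for illustration, the label lists are toy data rather than results from the paper, and the function names are hypothetical.

```python
def confusion_counts(true_labels, predicted_labels, positive="PA"):
    """Count true/false positives and negatives, with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == positive and prediction == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif prediction == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Toy labels for illustration only.
truth = ["PA", "PA", "PA", "NA", "NA", "NA", "NA", "PA"]
predicted = ["PA", "PA", "NA", "NA", "NA", "PA", "NA", "PA"]
tp, fp, tn, fn = confusion_counts(truth, predicted)
print(sensitivity(tp, fn), specificity(tn, fp))
```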

Radar chart
The radar chart, also known as the multidimensional classification chart, allows a model to be evaluated on the classification of comments according to the categories defined by the PANAS-t methodology (11 sentiments) and the log-logistic Softmax function (FCL: Guilt (-1) to Serenity (1)) (Table 1, Figure 1).

Relative evaluation
Let $T_e$ be the set of tweets for a particular risk event $e$ (e.g. COVID-19, natural disasters, political events, etc.) and $T_{es}$ the subset of these tweets related to sentiment $s$. $P(s \mid e)$ represents the relative occurrence of sentiment $s$ for event $e$, and can be expressed as follows (Goncalves et al., 2013): $P(s \mid e) = |T_{es}| / |T_e|$.
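As a small worked illustration, the relative occurrence of a sentiment for an event can be computed as the share of the event's tweets labelled with that sentiment. The sketch assumes that each tweet carries exactly one sentiment label, which is a simplification, and the function name and toy labels are hypothetical.

```python
def relative_occurrence(event_tweet_sentiments, sentiment):
    """Fraction of an event's tweets that are labelled with `sentiment`."""
    if not event_tweet_sentiments:
        return 0.0
    matching = sum(1 for label in event_tweet_sentiments if label == sentiment)
    return matching / len(event_tweet_sentiments)


# Toy sentiment labels for the tweets of one event (illustration only).
labels = ["Fear", "Fear", "Sadness", "Joviality"]
print(relative_occurrence(labels, "Fear"))
```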

Dimensional sentiment
Let $s$ be one of the categories that define the PANAS-t scale. The Score Function $P_s(e)$ compares the relative occurrence of sentiment $s$ during a particular risk event $e$ with its relative occurrence in the database, and takes values in the interval [−1, 1] for each sentiment. $P_s(e) = 0$ means that the event produces no increase or decrease in the sentiment in comparison with the database. $P_s(e) > 0$ represents an increase in a sentiment, while $P_s(e) < 0$ represents a decrease.
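One way to obtain a score with exactly these properties is the piecewise normalised difference sketched below, which is believed to follow the form used in the PANAS-t literature (Goncalves et al., 2013) but should be read as an assumption rather than as the expression used in this paper. Here `baseline` stands for the relative occurrence of a sentiment in the whole database and `during_event` for its relative occurrence during the risk event; both names are hypothetical.

```python
def score(baseline, during_event):
    """Normalised change of a sentiment's relative occurrence, bounded in [-1, 1].

    Positive values mean the sentiment became more frequent during the event,
    negative values mean it became less frequent, and 0 means no change.
    """
    if during_event == baseline:
        return 0.0
    if during_event > baseline:
        return (during_event - baseline) / during_event
    return -(baseline - during_event) / baseline


print(score(0.10, 0.40))  # sentiment increased during the event
print(score(0.40, 0.10))  # sentiment decreased during the event
print(score(0.25, 0.25))  # no change
```

Dividing by the larger of the two occurrences is what keeps the result within [-1, 1], reaching the bounds only when the sentiment appears or disappears entirely.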

Skewness index
The Skewness Index (SI) allows the evaluation of the flexibility of the cumulative log-logistic function (CDF) in the classification of comments according to the PANAS-tDL methodology. Statistically, the SI describes how the data are located around the mean of a probability distribution. For a cumulative distribution function, the SI can be zero (centred CDFs), negative (heavy-tailed CDFs) or positive (long-tailed CDFs) (Pena et al., 2018a, 2018b).
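Since the expression for the SI is not reproduced here, the sketch below uses the standard moment-based sample skewness (third central moment divided by the 1.5 power of the second) as a stand-in. Whether the authors use this estimator or a variant is an assumption, and the function name is hypothetical.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5 (0.0 for constant data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))   # symmetric data
print(skewness([1, 1, 1, 10]))     # long right tail
print(skewness([1, 10, 10, 10]))   # long left tail
```

Symmetric data give a value of zero, a long right tail gives a positive value, and a long left tail gives a negative value.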

Experimental validation
For the analysis and validation of the proposed PANAS-tDL model, we analyzed a total of 93,634 Twitter comments regarding four tourist destinations (Colombia, Italy, Spain and USA) and their associated products and services. The comments were selected before (46,817 - 50%) and during (46,817 - 50%) the evolution of the COVID-19 pandemic worldwide. It is important to note that each comment is made up of 5 words (or 5 emojis) or multiples thereof (including mixes of words and emojis), and that each comment was subjected to an automatic natural language processing (NLP) step of normalisation, lemmatisation and tokenisation (NLTK 3.5 documentation, 2020), in order to achieve an objective perception of each comment according to the PANAS-t methodology.
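The three preprocessing steps can be illustrated with a dependency-free sketch. It does not reproduce the NLTK pipeline cited above: the regular expressions, the tiny lemma dictionary `TOY_LEMMAS` and all function names are hypothetical stand-ins chosen so that the example runs without downloading any language resources.

```python
import re
import string

# Hypothetical toy lemma dictionary; a real pipeline would use a full lemmatiser.
TOY_LEMMAS = {"flights": "flight", "cancelled": "cancel",
              "hotels": "hotel", "closed": "close"}


def normalise(text):
    """Lower-case, drop URLs, strip @ and # markers, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[@#](\w+)", r"\1", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenise(text):
    """Split into words and single symbols, keeping emojis but dropping ASCII punctuation."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [t for t in tokens if t not in string.punctuation]


def lemmatise(tokens):
    """Map each token to its lemma when the toy dictionary knows it."""
    return [TOY_LEMMAS.get(token, token) for token in tokens]


def preprocess(text):
    return lemmatise(tokenise(normalise(text)))


tweet = "Flights to Spain CANCELLED!! \U0001F622 https://t.co/x #Hotels closed"
print(preprocess(tweet))
```

Emojis are kept as tokens in their own right, consistent with comments being made up of words, emojis or a mix of both.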
In a first stage (Learning Stage), the proposed PANAS-tDL model was configured using a total of 46,817 (50%) random comments, of which 32,772 (70%) were used for learning and 14,045 (30%) for validation. For the evaluation of the PANAS-tDL model, the comments for this stage were grouped into positive (PA) and negative (NA) categories, as well as into the 11 sentiment categories defined by the PANAS-t methodology (Branovački et al., 2020). At this stage, the PANAS-tDL model is expected to reach values above 75% on average for the PA and NA classification, taking as reference the Confusion Matrix (Eqs. 4 & 5), as well as IOAs above 90% on average for the cumulative log-logistic function (CDF) integrated into the FCL. In this same stage, the structure of the stacked layers is evaluated taking as reference the first layer of neurons, so the model is expected to reach compression rates (IC) close to 90% as a result of an auto-encoder learning strategy.
In a second stage (Generalization Stage), the PANAS-tDL model was evaluated, in the absence of an adaptation and learning process, on a total of 46,817 (50%) random comments (COVID-19 context) based on the relative evaluation metric (Eq. 6). To assess the sensitivity of the model to changes in perceptions due to the evolution of the COVID-19 pandemic worldwide, all comments were grouped by four worldwide tourist destinations (Colombia, Spain, Italy and the USA) and were selected taking the risk event as reference. In this stage, the proposed model is expected to evolve into the negative zone (NA) of the Radar Chart, mainly because of the impact of the COVID-19 pandemic (a non-negative score function). This movement makes it possible to evaluate the sensitivity of the PANAS-tDL model in characterizing objective perceptions derived from a risk event, and also to evaluate how COVID-19 has impacted tourist destinations and their associated products and services. Regarding the structure and shape of the activation function that defines the FCL, the proposed model is expected to evolve toward slenderer CDFs with long-tail structures (positive skewness), with SIs greater than those exhibited by the Baseline scenario for each tourist destination.

ANALYSIS AND DISCUSSION OF RESULTS
Figure 2 shows the normalized values for the configuration of the structure of the PANAS-tDL model: number of layers, number of neurons for the first layer (NO), Compression Index (IC Index), and limit of compression (vertical line). Figure 2 and Table 4 show the last value achieved by the PANAS-tDL model for the IC index, which was above 85%. Although the number of layers that make up the internal structure of the proposed model and the number of neurons in the first layer were increased, the IC index remained approximately unchanged beyond 8 layers and 800 neurons, as the IC curve shows (Green line - IC Index). Table 5 shows that the PANAS-tDL model reached classification percentages above 85% on average for sensitivity (Eq. 5) and specificity (Eq. 6) in the classification of comments into the PA and NA categories (Learning Stage - Confusion Matrix). This shows, at an early stage, the good behaviour of the proposed model according to the proposed Softmax function for a = 1 (Eq. 3).
Figure 2. PANAS-tDL model structure.

Figure 3 shows that the proposed PANAS-tDL model reached IOAs close to 98% on average in the classification of comments according to the PANAS-t methodology (First Stage - Phase 1). This good behaviour is evidenced by the structure and shape of the radar charts (Figure 4), as well as by the structure and shape of the Softmax function integrated into the FCL, which reached SIs close to zero on average (SI = 0.290816 - Orange line), very close to the SI that defines the Baseline scenario for a tourist destination like Spain (SI = 0.589903 - Blue line). Figure 5 shows the CDF given by the proposed PANAS-tDL model (Orange line - SI = 0.750858) in the validation phase (First Stage - Phase 2), taking as reference the CDF that represents the Baseline scenario (Blue line - SI = 0.617066462). Here, the model reached IOAs above 95% on average in the multidimensional characterization of sentiments in the absence of a learning process for a tourist destination like Spain. The SI reached by the proposed PANAS-tDL model was slightly above the reference SI for this stage, which indicates that the model tends to classify NA comments better.
[...] with regard to the Baseline scenario for Colombia (Blue line - SI = 0.4707791). The increase in SI generates a shift in the comments towards the NA zone of the radar chart, evidencing the impact of the COVID-19 pandemic on this tourist destination and its associated products and services. This behaviour was also clearly evident for Italy (Figure 7), where the COVID-19 comments were located towards the NA zone of the radar chart (Right side - Orange line), with sentiments like Fear, Sadness and Hostility standing out in the comments, which clearly shows the impact of the pandemic on this tourism destination (Gonzalez-Ruiz et al., 2019; Pena et al., 2018a, 2018b). Table 6 shows the values reached by the proposed model for P(s).
Here, it can be seen how the perception of tourists evolved towards NA comments for a destination like the USA, showing a change of perception with reference to Eqs. 6, 7 and 8. This corroborates once again the good performance of the PANAS-tDL model, in the absence of a learning process, in capturing the impact generated by a risk event such as COVID-19, making it a tool for the characterization of objective perceptions of tourist destinations, products and services based on comments obtained from a social network like Twitter.

CONCLUSIONS AND FUTURE WORK
The proposed model made it possible to objectively evaluate the impact that a risk event such as COVID-19 has on a tourist destination and its associated products and services. To achieve this characterisation, the model integrated in a single structure an automatic NLP process and a neural model with a deep learning structure for the classification of comments according to the structure defined by the PANAS-t methodology. This resulted in a general methodology for the objective evaluation of the impact that a risk event has on a tourism destination and its associated services and products, based on comments posted by users on a social network like Twitter.
The impact that COVID-19 has had on the perception of Twitter users regarding different tourist destinations can be seen through the evolution of the activation function integrated into the FCL when the model was evaluated, in the absence of an adaptation and learning process, on comments taken in the context of the risk event generated by the COVID-19 pandemic. Here, it was possible to observe how this activation function evolved towards increasingly higher SIs, with increasingly slender CDFs, and how comments were displaced towards the NA zone defined by the PANAS-t methodology.
Owing to its capacity for adaptation and learning, the proposed PANAS-tDL model can be extended to evaluate the impacts that a risk event generates on a tourist destination through the automatic characterisation of users' perceptions on different social networks, integrating not only comments but also elements such as holiday experiences, climatic conditions, economic factors and security. In this way, the PANAS-tDL model can also be extended to evaluate people's perception of new products, to identify the target audience for a tourist service, or to promote a specific tourist destination.
As future work, the authors propose to extend the PANAS-tDL model to identify sarcasm, irony and fake news in comments, in order to avoid misperceptions about tourist destinations, products and services. The authors also propose validating the model by means of a technical and financial analysis, in order to measure more clearly the impact that risk events related to political decisions, weather conditions or exchange rates can have on a tourist destination based on user comments posted on a specific social network, and integrating fuzzy logic concepts (operational risk).