Journal of Intelligence and Information Systems,
Vol. 24, No. 4, December 2018
Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM
Sungjae Cha, and Jungseok Kang
Vol. 24, No. 4, Page: 1 ~ 32
Keywords : Optimal Feature Selection, Lasso Regression, Deep Learning Time Series Algorithm, Corporate Bankruptcy, RNN, LSTM
Corporate defaults affect not only the stakeholders of bankrupt companies, including managers, employees, creditors, and investors, but also have a ripple effect on the local and national economy. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a single default prediction model rather than developing various corporate default models. As a result, even large corporations known as 'chaebol enterprises' went bankrupt. Even afterwards, analyses of past corporate defaults focused on specific variables, and when the government carried out restructuring immediately after the global financial crisis, it focused only on certain main variables such as the 'debt ratio'.
A multifaceted study of corporate default prediction models is essential to protect diverse interests and to avoid situations such as the 'Lehman Brothers case' of the global financial crisis, in which everything collapses in a single moment.
The key variables in corporate default vary over time. Deakin's (1972) study, building on the analyses of Beaver (1967, 1968) and Altman (1968), shows that the major factors affecting corporate failure have changed. Grice (2001) likewise found changes in the importance of predictive variables in Zmijewski's (1984) and Ohlson's (1980) models. However, past studies have used static models, and most do not consider changes that occur over time. Therefore, to construct consistent prediction models, it is necessary to correct for time-dependent bias with a time series analysis algorithm that reflects dynamic change.
Centering on the global financial crisis, which had a significant impact on Korea, this study uses 10 years of annual corporate data from 2000 to 2009. The data are divided into training, validation, and test sets covering 7, 2, and 1 years, respectively. To construct a bankruptcy model that remains consistent as conditions change over time, we first train a deep learning time series model on data from before the financial crisis (2000~2006). The parameters of the existing models and the deep learning time series algorithm are then tuned on validation data that include the financial crisis period (2007~2008). As a result, we construct a model that shows a pattern similar to its results on the training data and has excellent predictive power. Next, each bankruptcy prediction model is rebuilt on the combined training and validation data (2000~2008), applying the optimal parameters found during validation. Finally, the prediction models trained over these nine years are evaluated and compared on the test data (2009), demonstrating the usefulness of the corporate default prediction model based on the deep learning time series algorithm. In addition, by adding Lasso regression analysis to the existing variable selection methods (multiple discriminant analysis and the logit model), we show that the deep learning time series model based on the three resulting variable sets is useful for robust corporate default prediction.
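The year-based protocol described above (tune on 2007~2008, refit on 2000~2008, test once on 2009) can be sketched as follows. This is a minimal sketch, not the authors' code: `FirmYear`, `fit_cut_point`, and the single-ratio threshold model are hypothetical stand-ins for the RNN/LSTM and baseline models; only the year boundaries come from the text.

```python
from dataclasses import dataclass

# Year boundaries from the study design; everything else is illustrative.
TRAIN_YEARS = range(2000, 2007)  # 2000~2006: pre-crisis training data
VALID_YEARS = range(2007, 2009)  # 2007~2008: crisis-period validation data
TEST_YEARS = range(2009, 2010)   # 2009: held-out test data


@dataclass
class FirmYear:
    year: int
    ratio: float    # one hypothetical financial ratio (higher = riskier)
    defaulted: int  # 1 = default, 0 = non-default


def by_years(records, years):
    """Select firm-years by fiscal year; no shuffling across time."""
    return [r for r in records if r.year in years]


def fit_cut_point(records, offset):
    """Hypothetical toy model: cut point halfway between the class means, shifted by `offset`."""
    bad = [r.ratio for r in records if r.defaulted]
    good = [r.ratio for r in records if not r.defaulted]
    return (sum(bad) / len(bad) + sum(good) / len(good)) / 2 + offset


def accuracy(cut, records):
    """Share of firm-years where 'ratio > cut' matches the default label."""
    return sum((r.ratio > cut) == bool(r.defaulted) for r in records) / len(records)


def temporal_protocol(records, grid):
    train = by_years(records, TRAIN_YEARS)
    valid = by_years(records, VALID_YEARS)
    test = by_years(records, TEST_YEARS)
    # 1) Fit on 2000~2006; keep the hyperparameter that scores best on 2007~2008.
    best = max(grid, key=lambda h: accuracy(fit_cut_point(train, h), valid))
    # 2) Refit on the combined 2000~2008 data with that hyperparameter.
    final_cut = fit_cut_point(train + valid, best)
    # 3) Evaluate once on the untouched 2009 data.
    return best, accuracy(final_cut, test)
```

Swapping `fit_cut_point` for an RNN or LSTM trainer leaves the protocol unchanged; the essential design choice is that the test year is never seen during tuning or refitting.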
The definition of bankruptcy follows Lee (2015). Independent variables include financial information such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable groups. The performance of the multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and deep learning time series algorithms is compared.
Corporate data are subject to three limitations: nonlinear variables, multicollinearity among variables, and lack of data. The logit model handles nonlinearity, the Lasso regression model addresses multicollinearity, and the deep learning time series algorithm, combined with a variable data generation method, compensates for the lack of data.
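The variable-selection role of Lasso can be illustrated with coordinate descent on standardized features: the L1 penalty drives some coefficients exactly to zero, which is why it helps when variables are collinear. A minimal sketch, assuming a squared-error Lasso with penalty `lam` (the paper's exact loss, penalty value, and solver are not stated); `lasso_coefficients` and `selected_features` are hypothetical names.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def standardize(column):
    mean = sum(column) / len(column)
    sd = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / sd for v in column]


def lasso_coefficients(rows, y, lam, n_sweeps=100):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    rows: feature vectors (one per firm); y: target (e.g. a 0/1 default flag).
    Columns are standardized to unit variance and y is centered, so no intercept is fitted.
    """
    n, p = len(rows), len(rows[0])
    cols = [standardize([row[j] for row in rows]) for j in range(p)]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual for beta = 0
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial-residual correlation: x_j . (resid + x_j * beta_j) / n,
            # using (1/n) * x_j . x_j = 1 for a standardized column.
            rho = sum(x * r for x, r in zip(cols[j], resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - x * delta for x, r in zip(cols[j], resid)]
                beta[j] = new
    return beta


def selected_features(beta):
    """Indices of variables whose coefficients survived the L1 penalty."""
    return [j for j, b in enumerate(beta) if b != 0.0]
```

Larger `lam` values keep fewer variables; in practice `lam` would itself be tuned on the validation years.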
Big data technology, a leading technology of the future, is moving from simple human analysis to automated AI analysis and, ultimately, toward intertwined AI applications. Although research on corporate default prediction using time series algorithms is still in its early stages, deep learning algorithms are much faster than regression analysis for corporate default prediction modeling and offer greater predictive power. Under the banner of the Fourth Industrial Revolution, the current Korean government and governments overseas are working hard to integrate such systems into the everyday life of their nations and societies.
Yet deep learning time series research for the financial industry remains scarce. As an initial study on deep learning time series analysis of corporate defaults, this work is intended to serve as comparative reference data for non-specialists who begin studies combining financial data with deep learning time series algorithms.
Prediction of a hit drama with a pattern analysis on early viewing ratings
Kihwan Nam, and Nohyoon Seong
Vol. 24, No. 4, Page: 33 ~ 49
Keywords : Similarity, Viewing-time pattern, prediction of blockbuster drama, Nearest Neighbor
The success of a TV drama has a very large impact on TV ratings and on the effectiveness of channel promotion. Its cultural and business impact has also been demonstrated through the Korean Wave. Therefore, early prediction of a TV drama's blockbuster success is strategically very important for the media industry.
Previous studies have tried to predict the audience ratings and success of dramas with various methods. However, most of them made simple predictions using intuitive factors such as the lead actor and the time slot, which limits their predictive power. In this study, we propose a model that predicts the popularity of a drama by analyzing viewers' viewing patterns based on various theories. This is not only a theoretical contribution but also a practical one, since the model can be used by actual broadcasting companies.
In this study, we collected data on 280 TV mini-series dramas broadcast on terrestrial channels over the 10 years from 2003 to 2012. From these data, we selected the 45 highest- and lowest-ranked TV dramas and analyzed their viewing patterns in 11 steps.
The various assumptions and conditions for modeling are based on existing studies, the opinions of actual broadcasters, and data mining techniques. We then developed a prediction model by measuring the viewing-time distance (difference) with Euclidean and correlation methods; in this study, the sum of these distances is termed similarity. Using this similarity measure, we predicted the success of dramas from viewers' initial viewing-time pattern distribution over episodes 1~5. To check whether the model is sensitive to the measurement method, we applied various distance measures and tested the model's robustness. Once the model was established, we improved its predictive power with a grid search. Furthermore, when a new drama is broadcast, we classified viewers who watched more than 70% of the total airtime as “passionate viewers” and compared the percentage of passionate viewers between the highest- and lowest-ranked dramas to assess the likelihood that a TV mini-series becomes a blockbuster. We find that the initial viewing-time pattern is the key factor in predicting blockbuster dramas: using initial viewing-time pattern analysis, our model classified blockbuster dramas with 75.47% accuracy.
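The similarity-based prediction described above can be sketched as a k-nearest-neighbor classifier over early-episode viewing-time distributions. A minimal sketch under stated assumptions: each drama is a list of per-episode 11-step viewing-time distributions (episodes 1~5), the drama-level distance is the sum of per-episode distances, correlation distance is taken as 1 minus Pearson's r, and `predict_hit` and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson correlation between two viewing-time distributions."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb)


def drama_distance(p, q, dist):
    """Sum of per-episode distances over the early episodes (e.g. 1~5)."""
    return sum(dist(ep_p, ep_q) for ep_p, ep_q in zip(p, q))


def predict_hit(new_drama, labeled, dist=euclidean, k=1):
    """Majority label among the k labeled dramas closest to `new_drama`.

    labeled: list of (episode_distributions, label) pairs, label e.g. "hit" or "flop".
    """
    ranked = sorted(labeled, key=lambda item: drama_distance(new_drama, item[0], dist))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

A grid search of the kind mentioned above would loop over candidate `(dist, k)` pairs and keep the combination with the best accuracy on held-out dramas.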
This paper achieves a high prediction rate while proposing an audience rating method that differs from existing ones. Broadcasters currently rely heavily on a few famous actors (the so-called star system) and face more severe competition than ever because of rising program production costs, a long-term recession, and aggressive investment by comprehensive programming channels and large corporations.
All of them are in a financially difficult situation. The basic revenue model of these broadcasters is advertising, and advertising is sold with audience ratings as the basic index. Because of the nature of the product, demand in the drama market is uncertain and difficult to forecast, yet dramas contribute heavily to the financial success of a broadcaster's content. Therefore, it is important to minimize the risk of failure. By analyzing the distribution of initial viewing time, related companies can establish response strategies (programming, marketing, story changes, etc.). We also found that audience behavior is crucial to a program's success. In this paper, we define TV viewing as a measure of how enthusiastically a program is watched, and we can predict a program's success by calculating the loyalty of passionate viewers. This way of calculating loyalty can also be applied to loyalty toward various platforms, and to marketing projects such as highlights, script previews, making-of films, characters, and games.
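The passionate-viewer measure reduces to a simple share per drama. A minimal sketch, assuming each viewer's cumulative minutes watched and the drama's total airtime so far are available; the 70% threshold comes from the text, while the function names and inputs are hypothetical.

```python
def passionate_share(minutes_watched, total_airtime, threshold=0.7):
    """Fraction of viewers who watched more than `threshold` of the total airtime."""
    if not minutes_watched:
        return 0.0
    passionate = sum(1 for m in minutes_watched if m > threshold * total_airtime)
    return passionate / len(minutes_watched)


def mean_share(dramas, threshold=0.7):
    """Average passionate-viewer share over (minutes_watched, total_airtime) pairs."""
    shares = [passionate_share(m, t, threshold) for m, t in dramas]
    return sum(shares) / len(shares)
```

Computing `mean_share` separately for the highest- and lowest-ranked groups mirrors the group comparison of passionate-viewer percentages described earlier.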
Multi-Category Sentiment Analysis for Social Opinion Related to Artificial Intelligence on Social Media
Sang Won Lee, Chang Wook Choi, Dong Sung Kim, Woon Young Yeo, and Jong Woo Kim
Vol. 24, No. 4, Page: 51 ~ 66
Keywords : Artificial Intelligence, Sentiment Analysis, Social Opinion, Online News, Online comments
As AI (Artificial Intelligence) technologies have evolved swiftly, many products and services are under development in various fields to improve user experience. Alongside positive expectations for these technologies, their negative effects have also been actively discussed. For instance, autonomous vehicles based on artificial intelligence have drawn attention for increased stability, while social issues such as the trolley dilemma and system security are being debated. Therefore, major social issues concerning artificial intelligence need to be examined and analyzed for its development and societal acceptance. In this paper, we identify trending topics related to artificial intelligence over the two years from January 2016 to December 2017, including the match between Lee Sedol and AlphaGo, and then conduct a multi-category sentiment analysis of online public opinion on them. Online news, news headlines, and news comments were crawled from the largest web portal in South Korea. Considering the importance of trending topics, online public opinion on each topic was classified into seven sentiment categories (anger, dislike, fear, happiness, neutrality, sadness, and surprise) rather than only the two simple categories of positive and negative. As a result, “happiness” was the top sentiment for most events, although sentiments differed by keyword. In addition, when the research period was divided into four periods (the first and second halves of 2016 and of 2017), the sentiment of 'anger' was found to decrease over time. Based on these results, it is possible to grasp the various topics and trends currently discussed regarding artificial intelligence, and the results can be used to prepare countermeasures.
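Once each comment carries one of the seven sentiment labels, the per-topic and per-period analyses reduce to grouped label shares. A minimal sketch, assuming comments have already been classified (the paper's classifier is not reproduced here); the half-year split follows the four periods in the text, and the function names and sample comments are hypothetical.

```python
from collections import Counter, defaultdict

CATEGORIES = ("anger", "dislike", "fear", "happiness", "neutrality", "sadness", "surprise")


def half_year(date):
    """Map an ISO date string 'YYYY-MM-DD' to a period label such as '2016-H1'."""
    year, month = int(date[:4]), int(date[5:7])
    return f"{year}-H{1 if month <= 6 else 2}"


def sentiment_shares(comments, group_of):
    """comments: (date, topic, category) triples already labeled by a classifier.
    group_of: function mapping a triple to its group (period, topic, ...)."""
    counts = defaultdict(Counter)
    for comment in comments:
        category = comment[2]
        if category not in CATEGORIES:
            raise ValueError(f"unknown sentiment category: {category}")
        counts[group_of(comment)][category] += 1
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {cat: counter[cat] / total for cat in CATEGORIES}
    return shares


def top_sentiment(share):
    """Category with the largest share in one group."""
    return max(share, key=share.get)
```

Grouping by `half_year(c[0])` gives the four-period trend, while grouping by the topic field gives the per-event comparison.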
In the future, we hope to measure public opinion more precisely by incorporating the empathy level of news comments.
Development of New Variables Affecting Movie Success and Prediction of Weekly Box Office Using Them Based on Machine Learning
Junga Song, Keunho Choi, and Gunwoo Kim
Vol. 24, No. 4, Page: 67 ~ 83
Keywords : Movie, Box Office, Box Office Revenue, Box Office Factors, Prediction of Box Office, Predicting Number of Audience, Machine Learning
The Korean film industry, which grew significantly every year, finally exceeded 200 million cumulative moviegoers in 2013. However, from 2015 the industry entered a period of low growth, and in 2016 it recorded negative growth. To overcome this difficulty, stakeholders such as production companies, distribution companies, and multiplexes have tried to maximize market returns with strategies that predict market changes and respond to them immediately. Because a film is an experiential product, it is not easy to predict its box office record or initial audience before release, and after release the number of viewers fluctuates with a variety of factors. Production and distribution companies therefore try to secure a guaranteed number of screens from multiplex chains when a new film opens. However, multiplex chains tend to publish the screening schedule for only one week at a time and then decide the number of screenings for the following week based on the box office record and audience evaluations. Many previous studies have addressed box office prediction. Early studies attempted to identify factors affecting box office records; more recent studies have applied various analytic techniques to previously identified factors to improve prediction accuracy and to explain each factor's effect, rather than identifying new factors. However, most previous studies are limited in that they used the total audience from opening to the end of the run as the target variable, which makes it difficult to predict and respond to dynamically changing market demand.
Therefore, the purpose of this study is to predict the weekly audience of a newly released film so that stakeholders can respond flexibly to changes in its audience. To that end, we considered the box office factors used in previous studies and developed new factors not used before, such as the release order of movies and sales dynamics. Using these comprehensive factors, we applied machine learning methods, namely Random Forest, Multi-Layer Perceptron, Support Vector Machine, and Naive Bayes, to predict the cumulative number of viewers from the first to the third week after release. At the end of the first and second weeks, we predicted the cumulative number of viewers for the following week, and at the end of the third week, we predicted the film's total number of viewers.
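The difference between predicting next week's cumulative audience and predicting the total audience comes down to how the training targets are labeled. A minimal sketch, assuming weekly cumulative audience counts are available for each film's full run; `Example`, `build_examples`, and the feature dictionary are hypothetical, and any of the listed learners (e.g. Random Forest) would then be trained on these examples.

```python
from dataclasses import dataclass


@dataclass
class Example:
    film: str
    week: int       # prediction point: end of week 1, 2, or 3 after release
    features: dict  # hypothetical weekly features (rank, sales share, ratings, ...)
    target: int     # audience count the model must predict at this point


def build_examples(film, cumulative, weekly_features, horizon="next_week"):
    """cumulative[k]: cumulative audience at the end of week k + 1 (full run, for labeling).

    horizon="next_week": weeks 1-2 predict next week's cumulative audience, week 3 the total.
    horizon="total": every week predicts the final total audience.
    """
    if len(cumulative) < 3 or len(weekly_features) < 3:
        raise ValueError("need at least three weeks of data")
    total = cumulative[-1]
    examples = []
    for week in (1, 2, 3):
        if horizon == "next_week" and week < 3:
            target = cumulative[week]  # index `week` = end of the following week
        else:
            target = total
        examples.append(Example(film, week, weekly_features[week - 1], target))
    return examples
```

Training one model on `horizon="next_week"` examples and another on `horizon="total"` examples sets up the comparison between the two prediction targets.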
In addition, at the end of both the first and second weeks, we also predicted the total cumulative number of viewers using the same factors. As a result, in all three weeks, predicting the following week's audience was more accurate than predicting the total audience, and Random Forest achieved the highest accuracy among the machine learning methods used. This study has implications in that it 1) comprehensively considered various factors that affect box office records but were rarely addressed in previous research, such as the weekly audience ratings after release, the film's weekly rank after release, and its weekly sales share after release, and 2) tried to predict and respond to dynamically changing market demand by proposing models that predict the weekly audience of newly released films, so that stakeholders can respond flexibly to changes in the audience.
Identifying Social Relationships using Text Analysis for Social Chatbots
Jeonghun Kim, and Ohbyung Kwon
Vol. 24, No. 4, Page: 85 ~ 110
Keywords : Social Chatbot, Relationship Awareness, Text Analysis, Privacy Protection
A chatbot is an interactive assistant that uses many communication modes: voice, images, video, or text. It is an artificial intelligence-based application that responds to users' needs or solves problems through user-friendly conversation. However, current chatbots focus on understanding and performing tasks requested by the user; their ability to generate personalized conversation suitable for relationship-building is limited. Recognizing the need to build a relationship and conducting suitable conversation are more important for social chatbots, which require social skills, than for problem-solving chatbots such as intelligent personal assistants. The purpose of this study is to propose a text analysis method that evaluates the relationship between a chatbot and a user based on the content the user enters, adapted to the communication situation, so that the chatbot can conduct suitable conversations. To evaluate the performance of this method, we trained the model and verified the results using actual SNS conversation records. The results will aid in implementing social chatbots, because the method yields excellent results even when the user's private profile information is excluded for privacy reasons.
Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion
Hyunseung Choi, Mintae Kim, Wooju Kim, Dongwook Shin, and Yong Hun Lee
Vol. 24, No. 4, Page: 111 ~ 136
Keywords : Information Extraction, Question Answering System, Machine Reading Comprehension, Bi-directional LSTM-CRF, Knowledge Base
In this paper, we propose a methodology that extracts answer information for queries from various types of unstructured documents collected from multiple web sources in order to expand a knowledge base. The methodology consists of the following steps: 1) collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into “subject-predicate” form, and classify the proper documents; 2) determine whether each sentence is suitable for information extraction and derive its confidence; 3) based on the predicate feature, extract the information from the proper sentences and derive the overall confidence of the extraction result.
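The three steps, together with the suitability checks that keep the system from forcing an answer out of a document that has none, can be sketched as a confidence-gated pipeline. A minimal sketch: the thresholds, the rule that the overall confidence is the product of the step confidences, and the toy regex extractor (standing in for the BiLSTM-CRF tagger) are all assumptions, not details from the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    answer: str
    confidence: float


def extract_from_document(sentences, doc_conf, sentence_conf, extract,
                          min_doc_conf=0.5, min_sentence_conf=0.5):
    """Steps 1-3 with confidence gating; returns the best Extraction or None.

    doc_conf: classifier confidence that the document is relevant (step 1).
    sentence_conf(sentence): suitability confidence for extraction (step 2).
    extract(sentence): (answer, confidence) or None (step 3).
    """
    if doc_conf < min_doc_conf:
        return None  # skip documents unlikely to contain the answer
    best = None
    for sentence in sentences:
        s_conf = sentence_conf(sentence)
        if s_conf < min_sentence_conf:
            continue  # skip sentences judged unsuitable for extraction
        result = extract(sentence)
        if result is None:
            continue
        answer, e_conf = result
        overall = doc_conf * s_conf * e_conf  # assumption: step confidences are multiplied
        if best is None or overall > best.confidence:
            best = Extraction(answer, overall)
    return best


def year_extractor(sentence):
    """Hypothetical toy extractor: returns the first four-digit number with confidence 1.0."""
    m = re.search(r"\b(\d{4})\b", sentence)
    return (m.group(1), 1.0) if m else None
```

Raising `min_doc_conf` or `min_sentence_conf` trades recall for precision, which is the balance the suitability-prediction step is meant to control.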
To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker. The system showed higher performance than the baseline model.
The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query. With it, we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of each source's document types, and previous research is limited in that performance is poor when extracting information from document types that differ from the training data. The proposed methodology extracted information from various types of unstructured documents more effectively than the baseline model.
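In a BiLSTM-CRF tagger, the BiLSTM produces a score for each tag at each token, and the CRF layer chooses the highest-scoring tag sequence by Viterbi decoding over those emission scores plus tag-transition scores. A minimal sketch of that decoding step only, assuming a BIO-style answer tag set and hand-written scores in place of a trained model; the start/end transition scores used by many CRF implementations are omitted for brevity.

```python
def viterbi(emissions, transitions, tags):
    """Best tag sequence under a linear-chain CRF score.

    emissions[t][j]: score of tag j at token t (produced by the BiLSTM in a real model).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    n_tags = len(tags)
    score = list(emissions[0])
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag for ending at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [tags[j] for j in path]
```

With a strongly negative O → I-ANS transition, the decoder refuses to start an answer span with an inside tag even when the per-token scores alone would prefer it.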
In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study avoids unnecessary extraction attempts on documents that do not contain the answer. It is meaningful in that it provides a method that maintains precision even in an actual web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so there is no guarantee that a document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is therefore meaningful because it helps maintain information extraction performance in a real web environment.
The limitations of this study and future research directions are as follows. First, there is a problem related to data preprocessing. In this study, the units of knowledge extraction are identified through morphological analysis with the open-source KoNLPy Python package, so information extraction can fail when the morphological analysis is inaccurate. To improve information extraction results, a more advanced morphological analyzer needs to be developed.
Second, there is the problem of entity ambiguity. The information extraction system in this study cannot distinguish between different entities that share the same name. If several people with the same name appear in the news, the system may not extract information about the intended one. Future research needs measures to distinguish people with the same name.
Third, there is a problem with the evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We built the evaluation data set from 2,800 documents (400 questions × 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, and 3 Naver news) by judging whether each contains a correct answer. To ensure the external validity of the study, it is desirable to evaluate the system on more queries, but this is a costly activity that must be done manually.
Future research needs to evaluate the system for more queries. It is also necessary to develop a Korean benchmark data set of information extraction system for queries from multi-source web documents to build an environment that can evaluate the results more objectively.
Animal Infectious Diseases Prevention through Big Data and Deep Learning
Sung Hyun Kim, Joon Ki Choi, Jae Seok Kim, Ah Reum Jang, Jae Ho Lee, Kyung Jin Cha, and Sang Won Lee
Vol. 24, No. 4, Page: 137 ~ 154
Keywords : Big Data, Deep Learning, Machine Learning, Animal Infectious Diseases, Evidence-based Policy-making
Animal infectious diseases, such as avian influenza and foot-and-mouth disease, occur almost every year and cause huge economic and social damage to the country. The quarantine authorities have made various human and material efforts to prevent them, but outbreaks have continued. Avian influenza was first reported in 1878 and became a national issue because of its high lethality. Foot-and-mouth disease is internationally considered the most critical animal infectious disease. In countries where it has not spread, foot-and-mouth disease is regarded as an economic or political disease because it restricts international trade by complicating the import of processed and unprocessed livestock products, and because quarantine is costly. In a society where the whole nation is connected as a single sphere of daily life, there is no way to fully prevent the spread of infectious disease. Hence, it is necessary to detect outbreaks and take action before the disease spreads.
When a human or animal infectious disease is confirmed, an epidemiological investigation of the confirmed cases is carried out at the same time, and measures to prevent the spread of the disease are taken according to its results. The foundation of an epidemiological investigation is finding out where a person has been and whom he or she has met. From a data perspective, this can be defined as predicting the cause of an outbreak, its location, and future infections by collecting and analyzing geographic and relational data. Recently, there have been attempts to develop infectious disease prediction models using Big Data and deep learning, but model-building studies and case reports remain scarce. KT and the Ministry of Science and ICT have been carrying out big data projects since 2014, as part of national R&D projects, to analyze and predict the routes of livestock-related vehicles. To prevent animal infectious diseases, the researchers first developed a prediction model based on regression analysis of vehicle movement data. They then built a more accurate prediction model using machine learning algorithms such as Logistic Regression, Lasso, Support Vector Machine, and Random Forest. In particular, the 2017 prediction model added the risk of diffusion through facilities, and its performance was improved by tuning the modeling hyper-parameters in various ways. The confusion matrix and ROC curve show that the model constructed in 2017 is superior to the earlier machine learning model. The 2017 model differs from the 2016 model in that it uses visit information on facilities such as feed factories and slaughterhouses, as well as bird livestock information that was previously limited to chickens and ducks but now also covers geese and quail.
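The confusion matrix and ROC comparison mentioned above can be computed directly from a model's outbreak risk scores. A minimal sketch, assuming binary labels (1 = outbreak at a farm, 0 = none) and a 0.5 decision threshold; AUC is computed as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, which equals the area under the ROC curve. The function names and sample data are hypothetical.

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Counts of true/false positives/negatives when score >= threshold means 'outbreak'."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC; ties between a positive and a negative count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Comparing `roc_auc` for the 2016 and 2017 models on the same evaluation set is threshold-free, whereas the confusion matrix depends on the chosen threshold.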
In addition, in 2017, explanations of the results were added to help the authorities make decisions and to provide a basis for persuading stakeholders. This study reports an animal infectious disease prevention system built on Big Data on hazardous vehicle movements, farms, and the environment. This study is significant in that it describes how a Big Data prediction model used in the field has evolved; the model is expected to become more complete if virus types are taken into account. It will contribute to data utilization and analysis model development in related fields. We also expect that the system constructed in this study will enable more proactive and effective prevention.
Predicting Corporate Bankruptcy using Simulated Annealing-based Random Forests
Hoyeon Park, and Kyoung-jae Kim
Vol. 24, No. 4, Page: 155 ~ 170
Keywords : Simulated Annealing, Random Forests, Bankruptcy Prediction, Feature Selection, Business Analytics
Predicting a company's bankruptcy is traditionally one of the most crucial forecasting problems in business analytics. Previous studies have proposed prediction models that apply or combine statistical and machine learning techniques. In this paper, we propose a novel intelligent prediction model based on simulated annealing, one of the well-known optimization techniques. Simulated annealing is known to achieve optimization performance comparable to that of genetic algorithms. However, there has been little research applying simulated annealing to prediction and classification problems in business decision-making, so it is meaningful to confirm the usefulness of the proposed model in business analytics. In this study, we combine simulated annealing with machine learning to select the input features of the bankruptcy prediction model. Optimization and machine learning techniques are typically combined for feature selection, feature weighting, or instance selection; this study proposes a combined model for feature selection, the most studied of these. To confirm the superiority of the proposed model, we apply it to real-world financial data of Korean companies and analyze the results. The results show that the predictive accuracy of the proposed model is better than that of the naïve model. Notably, its performance is significantly better than that of traditional decision trees, random forests, artificial neural networks, SVM, and logistic regression analysis.
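The simulated-annealing feature selection can be sketched as a search over binary feature masks: flip one feature at a time, always accept improvements, and accept worse masks with probability exp(Δ/T) as the temperature T cools. A minimal sketch, assuming a geometric cooling schedule and a caller-supplied `fitness` function; in the paper's setting the fitness would be something like a classifier's validation accuracy, whereas the test uses a hypothetical toy score. `anneal_features` and its parameter values are illustrative.

```python
import math
import random


def anneal_features(n_features, fitness, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Search binary feature masks for the highest `fitness(mask)` by simulated annealing."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    cur_score = fitness(current)
    best, best_score = current[:], cur_score
    temp = t0
    for _ in range(n_iter):
        candidate = current[:]
        flip = rng.randrange(n_features)
        candidate[flip] = not candidate[flip]  # neighbor: toggle one feature in or out
        cand_score = fitness(candidate)
        delta = cand_score - cur_score
        # Always accept improvements; accept worse masks with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = current[:], cur_score
        temp *= cooling  # geometric cooling: fewer uphill escapes as the search proceeds
    return best, best_score
```

The cooling rate and iteration count trade breadth of search for run time; since each fitness call would retrain a classifier in practice, these values matter far more there than in the toy case.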
Individual Thinking Style leads its Emotional Perception: Development of Web-style Design Evaluation Model and Recommendation Algorithm Depending on Consumer Regulatory Focus
Keon-Woo Kim, and Do-Hyung Park
Vol. 24, No. 4, Page: 171 ~ 196
Keywords : Regulatory Focus, Emotion, Web-Style, Recommendation System, Thinking Style
With the development of the web, two-way communication and evaluation became possible, and marketing paradigms shifted. To meet consumer needs, web design trends continuously respond to consumer feedback. As the web becomes more important, both academics and businesses are studying consumer emotions and satisfaction on the web. However, some consumer characteristics are not well considered. Demographic characteristics such as age and sex have been studied extensively, but few studies consider psychological characteristics such as regulatory focus (i.e., emotional regulation). In this study, we analyze the effect of web style on consumer emotion. Many studies analyze the relationship between the web and regulatory focus, but most concentrate on the purpose of web use, particularly motivation and information search, rather than on web style and design. The web communicates with users through visual elements. Because the human brain is influenced by all five senses, both design factors and emotional responses are important in the web environment. Therefore, in this study, we examine the relationship between consumer emotion and satisfaction on the one hand and web style and design on the other. Previous studies have considered the effects of web layout, structure, and color on emotions. In contrast, this study excluded these web components and analyzed the relationship between consumer satisfaction and the emotional indexes of web style only. For this analysis, we conducted a survey in which 40 web-style themes were presented to 204 consumers, each of whom evaluated four themes. The emotional adjectives evaluated by consumers consisted of 18 contrasting pairs, and higher-order emotional indexes were extracted through factor analysis. The emotional indexes were ‘softness,’ ‘modernity,’ ‘clearness,’ and ‘jam.’ Hypotheses were established on the assumption that the emotional indexes have different effects on consumer satisfaction.
After the analysis, hypotheses 1, 2, and 3 were accepted and hypothesis 4 was rejected, because the effect of ‘jam’ on consumer satisfaction was negative rather than positive. Thus, emotional indexes such as ‘softness,’ ‘modernity,’ and ‘clearness’ have a positive effect on consumer satisfaction. In other words, consumers prefer emotions that are soft, emotional, natural, rounded, dynamic, modern, elaborate, unique, bright, pure, and clear. ‘Jam’ has a negative effect on consumer satisfaction, meaning that consumers prefer emotions that are empty, plain, and simple. Regulatory focus produces differences in motivation and propensity across various domains. Regulatory focus tendency is important to consider in organizational behavior and decision making, and it affects not only political, cultural, and ethical judgments and behavior but also broad psychological problems.
Emotional responses also differ by regulatory focus. Promotion-focused consumers respond more strongly to positive emotions, whereas prevention-focused consumers respond more strongly to negative emotions. Web style is a type of service, and consumer satisfaction with it is affected not only by cognitive evaluation but also by emotion. This emotional response depends on whether consumers perceive the outcome as beneficial or harmful to themselves.
Therefore, it is necessary to examine how consumers' emotional responses to web style differ according to regulatory focus, which is one of their individual characteristics.
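Whether regulatory focus changes the effect of an emotional index on satisfaction is a moderation question, which moderated multiple regression (MMR) tests by adding an interaction term. A minimal sketch, assuming \(E\) is one emotional index, \(\mathrm{RF}\) is a regulatory focus score, and \(\mathrm{SAT}\) is consumer satisfaction (illustrative symbols, not the authors' notation):

```latex
\[
\mathrm{SAT}_i = \beta_0 + \beta_1 E_i + \beta_2\,\mathrm{RF}_i + \beta_3\,(E_i \times \mathrm{RF}_i) + \varepsilon_i
\]
\[
\frac{\partial\,\mathrm{SAT}}{\partial E} = \beta_1 + \beta_3\,\mathrm{RF}
\]
```

A significant \(\beta_3\) indicates that the effect of the emotional index on satisfaction depends on regulatory focus, and its sign shows whether that effect strengthens or weakens as \(\mathrm{RF}\) increases. In practice, \(E\) and \(\mathrm{RF}\) are often mean-centered before the product term is formed.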
The moderated multiple regression (MMR) analysis supported hypothesis 5.3 and rejected hypothesis 5.4, although the result for hypothesis 5.4 pointed in the direction opposite to that hypothesized. Through this validation, we confirmed the mechanism of emotional response according to regulatory focus tendency. Based on these results, we designed the structure of a web-style recommendation system and recommendation methods based on regulatory focus. We classified consumers into three regulatory focus groups (promotion, grey, and prevention) and suggest a web-style recommendation method for each group. With further development of this study, we expect that existing regulatory focus theory can be extended beyond motivation to emotional and behavioral responses according to regulatory focus tendency. Moreover, we believe it is possible to recommend web styles according to consumers' regulatory focus and the emotions they most prefer.
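The three-group classification can be sketched as a simple rule. This is a hypothetical rule for illustration only: it assumes each consumer has a promotion score and a prevention score from a regulatory focus questionnaire, and that consumers whose two scores differ by no more than a tolerance band (`band`, set arbitrarily to 0.5) form the ‘grey’ group. The names `FocusGroup` and `classify_regulatory_focus` are hypothetical, and the abstract does not state the authors' actual cut-offs.

```python
from enum import Enum


class FocusGroup(Enum):
    """The three regulatory focus groups named in the abstract."""
    PROMOTION = "promotion"
    GREY = "grey"
    PREVENTION = "prevention"


def classify_regulatory_focus(promotion: float, prevention: float,
                              band: float = 0.5) -> FocusGroup:
    """Assign a consumer to a regulatory focus group (hypothetical rule).

    The difference between the promotion and prevention scores decides the
    group; if it lies within +/- band, no focus clearly dominates and the
    consumer is placed in the 'grey' group.
    """
    diff = promotion - prevention
    if diff > band:
        return FocusGroup.PROMOTION
    if diff < -band:
        return FocusGroup.PREVENTION
    return FocusGroup.GREY


# Usage with hypothetical 7-point scale scores: (promotion, prevention).
consumers = {"c1": (6.2, 3.1), "c2": (4.0, 4.3), "c3": (2.5, 5.8)}
groups = {cid: classify_regulatory_focus(p, v) for cid, (p, v) in consumers.items()}
print(groups)
```

A group-specific recommender could then, for example, weight positive emotional indexes more heavily for the promotion group, in line with the stronger response to positive emotions described for promotion-focused consumers; that mapping is likewise an assumption.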
Development of Music Recommendation System based on Customer Sentiment Analysis
Seungjun Lee, Bong-Goon Seo, and Do-Hyung Park
Vol. 24, No. 4, Page: 197 ~ 217
Keywords : Music Recommendation Algorithm, Recommendation System, Sentiment Analysis, Customer Sentiment, Audio Fingerprint, Recommendation algorithm, Folksonomy
Music is one of the most creative acts through which humans express sentiment in sound. Because music readily evokes listeners' sentiment and invites empathy, it can lift or dampen people's mood depending on what they are listening to. Sentiment is therefore a primary factor in searching for or recommending music. However, few music recommendation systems are based on customer sentiment. The algorithms used in previous music recommendation systems are mostly user-based, relying on, for example, users' play histories and playlists. Based on the play histories or playlists of multiple users, distances between songs were calculated with reference to basic information such as genre, singer, and beat, and similar songs were then filtered and recommended to users. However, such methods have limitations such as the filter bubble. For example, a user who listens only to rock music would rarely be recommended hip-hop or R&B songs that have a similar sentiment. In this study, we focus on the sentiment of the music itself and develop a methodology that defines a new index for music recommendation systems. Specifically, we propose the “SWEMS” index and, using this index, extract a “Sentiment Pattern” for each song used in this research. We expect that the “SWEMS” index and “Sentiment Pattern” can be used for a variety of purposes, not only in music recommendation systems but also in algorithms for building prediction models.
In this study, we develop a music recommendation system based on the emotional adjectives that describe what people generally feel when listening to music. It was therefore necessary to collect as many emotional adjectives as possible. Emotional adjectives were collected from related previous studies, and additional adjectives were collected through social metrics and qualitative interviews. In total, we collected 134 individual adjectives, which were narrowed down to a final set of 60 through several steps. Using these final adjectives as survey items, a music survey was conducted to evaluate the sentiment of each song. The surveys were completed by expert panels of people who enjoy listening to music.
All survey questions were based on emotional adjectives; no other information was collected. The songs evaluated in the previous step were divided into popular and unpopular songs, and the variables most relevant to music popularity were derived. The derived variables were regrouped through factor analysis, and a weight was assigned to each adjective belonging to a factor.
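The index construction described above (factor analysis, then weights on the adjectives within each factor) can be sketched as a per-factor weighted score. This is a minimal sketch under stated assumptions: the factor names, adjectives, weights, and 1-to-5 rating scale are hypothetical, the function name `swems_scores` is hypothetical, and we assume a factor score is the weighted average of its adjectives' ratings. The abstract does not give the authors' exact formula.

```python
from typing import Dict

# Hypothetical factor structure: each factor maps its adjectives to weights.
# The real SWEMS factors, adjectives, and weights are not given in the abstract.
FACTOR_WEIGHTS: Dict[str, Dict[str, float]] = {
    "factor_A": {"calm": 0.6, "peaceful": 0.4},
    "factor_B": {"energetic": 0.7, "exciting": 0.3},
}


def swems_scores(ratings: Dict[str, float],
                 factor_weights: Dict[str, Dict[str, float]] = FACTOR_WEIGHTS,
                 ) -> Dict[str, float]:
    """Compute one score per factor from a song's adjective ratings.

    Each factor score is the weighted average of the ratings of the adjectives
    assigned to that factor. `ratings` must contain every adjective that
    appears in `factor_weights`; a missing adjective raises KeyError.
    """
    scores: Dict[str, float] = {}
    for factor, weights in factor_weights.items():
        total_weight = sum(weights.values())
        weighted_sum = sum(w * ratings[adj] for adj, w in weights.items())
        scores[factor] = weighted_sum / total_weight
    return scores


# Usage with hypothetical 1-5 panel ratings for one song.
example_song = {"calm": 5, "peaceful": 4, "energetic": 2, "exciting": 1}
example_scores = swems_scores(example_song)
print(example_scores)
```

Dividing by the total weight keeps every factor score on the same scale as the original ratings, which makes the scores directly comparable across factors when they are later used as distance coordinates.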
We define the extracted factors as the “SWEMS” index, which expresses the sentiment of a song as numeric scores. To implement the recommendation algorithm, we applied the Case-Based Reasoning method.
We chose Case-Based Reasoning over other methods because it solves problems in a way similar to how humans do. Using the “SWEMS” index of each song, the algorithm recommends songs whose factor-level emotion values are closest in Euclidean distance to those of a given song. The “SWEMS” index can also be used to draw a “Sentiment Pattern” for each song. We found that songs evoking similar emotions show similar “Sentiment Patterns.” Through these patterns, we could also suggest new groupings of music that differ from conventional genres. This research helps quantify qualitative data, and the algorithms can be used to quantify content itself, helping users find similar content more quickly.
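The retrieval step described above can be sketched as a nearest-neighbour search over SWEMS vectors. This is a minimal sketch, assuming each song is represented by a fixed-length tuple of SWEMS factor scores; the song names, the three-factor vectors, and the function name `recommend` are hypothetical. It covers only the retrieve step of case-based reasoning, using plain Euclidean distance via `math.dist` (Python 3.8+).

```python
import math
from typing import Dict, List, Tuple

# A song's SWEMS index: one numeric score per sentiment factor.
SwemsVector = Tuple[float, ...]


def recommend(query: str, case_base: Dict[str, SwemsVector], k: int = 2) -> List[str]:
    """Return the k songs whose SWEMS vectors are closest to the query song.

    Distance is Euclidean; the query song itself is excluded, and ties in
    distance are broken alphabetically by song name.
    """
    target = case_base[query]
    ranked = sorted(
        (math.dist(target, vector), song)
        for song, vector in case_base.items()
        if song != query
    )
    return [song for _, song in ranked[:k]]


# Hypothetical case base: three SWEMS factor scores per song.
SONGS: Dict[str, SwemsVector] = {
    "song_a": (4.5, 1.2, 3.0),
    "song_b": (4.3, 1.0, 3.2),
    "song_c": (1.0, 4.8, 2.0),
    "song_d": (4.0, 1.5, 2.5),
}

print(recommend("song_a", SONGS))  # ['song_b', 'song_d']
```

If the “Sentiment Pattern” is read as a song's SWEMS scores drawn factor by factor (an assumption), then a small Euclidean distance between two vectors corresponds to two similar-looking patterns, which is consistent with the observation that songs evoking similar emotions show similar patterns. Because retrieval ignores genre labels, a rock song can surface an R&B song with a close sentiment profile.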
