Improving Performance of Recommendation Systems Using Topic Modeling
Seongi Choi (Graduate School of Business IT, Kookmin University)
Yoonjin Hyun (Graduate School of Business IT, Kookmin University)
Namgyu Kim (School of Management Information Systems, Kookmin University)
Vol. 21, No. 3, pp. 103-118
10.13088/jiis.2015.21.3.103
Keywords
Big Data Analysis, Data Mining, Recommendation Systems, Text Mining, Topic Modeling
Abstract
Recently, the development of smart devices and social media has led to the accumulation of vast amounts of information in various forms. In particular, considerable research effort is being directed toward analyzing unstructured big data to resolve various social problems. Accordingly, the focus of data-driven decision-making is moving from structured data analysis to unstructured data analysis. In the field of recommendation systems, a typical area of data-driven decision-making, the need to use unstructured data to improve system performance has also steadily increased. Approaches to improving the performance of recommendation systems fall into two categories: improving algorithms and acquiring useful, high-quality data. Traditionally, most efforts to improve recommendation system performance have taken the former approach, while the latter has attracted relatively little attention. In this sense, efforts to utilize unstructured data from various sources are timely and necessary. In particular, because users' interests are directly connected to their needs, identifying those interests through unstructured big data analysis can be a clue for improving the performance of recommendation systems. This study therefore proposes a methodology for improving recommendation systems by measuring users' interests. Specifically, it proposes a method to quantify users' interests by analyzing their internet usage patterns and to predict users' repurchases based on the discovered preferences.
There are two important modules in this study. The first module predicts the repurchase probability of each category by analyzing users' purchase history. We include the first module in our research scope in order to compare the accuracy of the traditional purchase-based prediction model with that of our new model, presented in the second module. This procedure extracts the purchase history of users.
The core of our methodology is the second module, which extracts users' interests by analyzing the news articles the users have read. The second module first constructs a correspondence matrix between topics and news articles by performing topic modeling on real-world news articles. It then analyzes users' news access patterns and constructs a correspondence matrix between articles and users. By merging the results of these two steps, we obtain a correspondence matrix between users and topics, which describes users' interests in a structured manner. Finally, using this matrix, the second module builds a model for predicting the repurchase probability of each category.
In this paper, we also provide experimental results from our performance evaluation. The data used in our experiments are outlined as follows. We acquired web transaction data for 5,000 panelists from a company that specializes in analyzing the rankings of internet sites. We first extracted from the original data 15,000 URLs of news articles published from July 2012 to June 2013 and crawled the main content of those articles. We then selected 2,615 users who had read at least one of the extracted news articles. Among these 2,615 users, 359 were target users who had purchased at least one item from our target shopping mall 'G'. In the experiments, we analyzed the purchase history and news access records of these 359 internet users. The performance evaluation showed that our prediction model, which uses both users' interests and purchase history, outperforms a prediction model that uses only users' purchase history in terms of misclassification ratio. In detail, our model outperformed the traditional one in the appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network based models were used.
Similarly, our model outperformed the traditional one in the beauty, computer, digital, fashion, food, and furniture categories when decision tree based models were used, although the improvement was very small.
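The merging step in the second module can be illustrated with a small sketch. The abstract does not say how the two correspondence matrices are combined, so the code below assumes a plain matrix product of a users-by-articles matrix and an articles-by-topics matrix, followed by row normalization. The function names (merge_user_topic, normalize_rows) and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def merge_user_topic(user_article: Matrix, article_topic: Matrix) -> Matrix:
    """Combine a users x articles matrix with an articles x topics matrix.

    Assumption: "merging" is modeled as a matrix product, so a user's score
    for a topic is the sum of that topic's weights over the articles read.
    """
    n_articles = len(article_topic)
    n_topics = len(article_topic[0])
    if any(len(row) != n_articles for row in user_article):
        raise ValueError("user_article columns must match article_topic rows")
    return [
        [sum(row[a] * article_topic[a][t] for a in range(n_articles))
         for t in range(n_topics)]
        for row in user_article
    ]


def normalize_rows(matrix: Matrix) -> Matrix:
    """Scale each row to sum to 1 so heavy and light readers are comparable."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else list(row))
    return out


# Toy data (hypothetical): 2 users, 3 articles, 2 topics.
user_article = [
    [1.0, 1.0, 0.0],  # user 0 read articles 0 and 1
    [0.0, 0.0, 1.0],  # user 1 read article 2
]
article_topic = [
    [0.5, 0.5],  # article 0 is split evenly between the two topics
    [1.0, 0.0],  # article 1 is entirely topic 0
    [0.0, 1.0],  # article 2 is entirely topic 1
]

user_topic = merge_user_topic(user_article, article_topic)
user_interest = normalize_rows(user_topic)
```

In this sketch each row of user_interest could serve as a structured feature vector for a per-category repurchase classifier, alongside purchase-history features. Whether the authors normalized the matrix, or weighted articles by access frequency, is not stated in the abstract.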
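Detailed Information in Korean (translated into English)
Improving the Performance of Recommendation Systems through Analysis of Users' Issues of Interest
Seongi Choi (최성이; Graduate School of Business IT, Kookmin University)
Yoonjin Hyun (현윤진; Graduate School of Business IT, Kookmin University)
Namgyu Kim (김남규; School of Management Information Systems, College of Business Administration, Kookmin University)
Keywords
Data Mining, Big Data Analysis, Recommendation Systems, Text Mining, Topic Analysis
Abstract
Many organizations have practiced data-driven decision-making, and structured data, including numerical data, has been widely used for this purpose. Recently, however, the development of smart devices and social media has caused vast amounts of information in various forms to be created, shared, and stored, and attention is shifting from traditional decision-making based on structured data to decision-making based on unstructured big data. In the field of recommendation systems, a representative area of data-driven decision-making, the need to use unstructured data to improve performance has also been raised steadily in recent years. In particular, because users' tendencies and preferences are directly connected to customer needs, efforts to identify users' tendencies through unstructured data analysis, and thereby to improve the accuracy of product recommendation and purchase prediction, are urgently needed. This study therefore presents a method that can ultimately improve the performance of recommendation systems by measuring users' tendencies in order to raise repurchase prediction accuracy, in particular repurchase prediction accuracy for each category. Specifically, we propose a model that analyzes users' everyday internet usage records to identify the issues covered by the news articles that customers view, quantifies customers' interest in various issues, and then uses the result to predict whether customers will repurchase in each category. Experiments on internet news viewing records and shopping mall purchase records derived from real web transactions showed that the model proposed in this study, that is, the prediction model using both customers' past purchase history and their issues of interest, was somewhat more accurate than the category repurchase prediction model using only customers' past purchase history.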
Cite this article
JIIS(APA) Style
Choi, S., Hyun, Y., & Kim, N. (2015). Improving Performance of Recommendation Systems Using Topic Modeling. Journal of Intelligence and Information Systems, 21(3), 103-118.

IEEE Style
Seongi Choi, Yoonjin Hyun, and Namgyu Kim, "Improving Performance of Recommendation Systems Using Topic Modeling", Journal of Intelligence and Information Systems, vol. 21, no. 3, pp. 103~118, 2015.

ACM Style
Choi, S., Hyun, Y., & Kim, N., 2015. Improving Performance of Recommendation Systems Using Topic Modeling. Journal of Intelligence and Information Systems. 21, 3, 103--118.
Export Formats: BibTeX, EndNote
@article{Choi:JIIS:2015:626,
author = {Choi, Seongi and Hyun, Yoonjin and Kim, Namgyu},
title = {Improving Performance of Recommendation Systems Using Topic Modeling},
journal = {Journal of Intelligence and Information Systems},
issue_date = {September 2015},
volume = {21},
number = {3},
month = Sep,
year = {2015},
issn = {2288-4866},
pages = {103--118},
url = {http://dx.doi.org/10.13088/jiis.2015.21.3.103},
doi = {10.13088/jiis.2015.21.3.103},
publisher = {Korea Intelligent Information System Society},
address = {Seoul, Republic of Korea},
keywords = {Big Data Analysis, Data Mining, Recommendation Systems, Text Mining, Topic Modeling},
}
%0 Journal Article
%1 626
%A Seongi Choi
%A Yoonjin Hyun
%A Namgyu Kim
%T Improving Performance of Recommendation Systems Using Topic Modeling
%J Journal of Intelligence and Information Systems
%@ 2288-4866
%V 21
%N 3
%P 103-118
%D 2015
%R 10.13088/jiis.2015.21.3.103
%I Korea Intelligent Information System Society