Latent topics-based product reputation mining
Sang-Min Park (Department of Software Convergence Engineering, Kunsan National University)
Byung-Won On (Department of Software Convergence Engineering, Kunsan National University)
Vol. 23, No. 2, Page: 39 ~ 70
topic model, opinion mining, text summarization, data analytics, public survey
Data-driven analytics techniques have recently been applied to public surveys. Rather than simply gathering survey results or expert opinions to gauge the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately determine customer preferences.
In existing data-based survey methods, a sentiment lexicon for a particular domain is first constructed by domain experts, who judge whether the words used frequently in the collected text documents are positive, neutral, or negative. To survey the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that express positive or negative opinions about the product. As a motivating example, given a product such as the Sonata made by Hyundai Motors, customers often want a summary of the positive points in the ‘car design’ aspect as well as the negative points in the same aspect. They also want useful information about other aspects such as ‘car quality’, ‘car performance’, and ‘car service.’ Such information enables customers to make good choices when they purchase brand-new vehicles. In addition, automobile makers can identify the preference for new models on the market and their positive and negative points, and can use the sentiment analysis to improve the weak points of those models. To produce such summaries, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores.
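The four steps of the existing lexicon-based approach, plus the top-k selection, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in the article: the tiny `LEXICON`, the `STOP_WORDS` set, and the function names are hypothetical, stemming is omitted for brevity, and review collection is replaced by an in-memory list of strings.

```python
import re

# Hypothetical toy lexicon; in the approach described above it is built by domain experts.
LEXICON = {"good": 1, "great": 1, "sleek": 1, "poor": -1, "noisy": -1, "bad": -1}
# Hypothetical stop-word list used in the pre-processing step (stemming is omitted).
STOP_WORDS = {"the", "is", "a", "an", "and", "but", "of", "to"}


def split_sentences(text):
    """Split a review into sentences on ., ! and ?."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def score(sentence):
    """Sentiment score = sum of the lexicon values of the sentence's words."""
    return sum(LEXICON.get(w, 0) for w in tokenize(sentence))


def analyze(reviews, k=1):
    """Return positive/negative ratios and the top-k sentences of each polarity."""
    sentences = [s for review in reviews for s in split_sentences(review)]
    scored = [(score(s), s) for s in sentences]
    total = len(scored)
    pos = [p for p in scored if p[0] > 0]
    neg = [p for p in scored if p[0] < 0]
    return {
        "positive_ratio": len(pos) / total if total else 0.0,
        "negative_ratio": len(neg) / total if total else 0.0,
        # Highest positive scores first; most negative scores first.
        "top_positive": [s for _, s in sorted(pos, key=lambda p: -p[0])[:k]],
        "top_negative": [s for _, s in sorted(neg, key=lambda p: p[0])[:k]],
    }


if __name__ == "__main__":
    reviews = [
        "The design is sleek and great. The engine is noisy.",
        "Service is poor. The seats are fine.",
    ]
    print(analyze(reviews))
```

Note that the ratios and top-k sentences are computed over the whole corpus, with no notion of aspect.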
However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because the sentiment analysis ignores aspects, customers and car makers receive only a summary containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) Since the same word generally has different meanings across domains, a sentiment lexicon suited to each domain needs to be constructed. Because sentiment lexicon construction is labor intensive and time consuming, an efficient way to construct the lexicon for each domain is required.
To address the above problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines the main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important positive and negative sentences are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly. Furthermore, by reinforcing topic semantics, we improve the accuracy of product reputation mining over that of the existing approach. In the experiments, we collected a large number of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed the top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
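Steps (3) and (4) of the proposed algorithm can be illustrated with the short Python sketch below. It rests on loud assumptions: the topic extraction and aspect mining of steps (1) and (2) are not implemented here, and the hypothetical `ASPECT_KEYWORDS` sets simply stand in for their output. The `SENTIMENT` lexicon, the keyword-overlap assignment rule, and all function names are likewise illustrative choices, not the article's method.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the output of topic extraction and aspect mining:
# each aspect is represented by a small set of topic words.
ASPECT_KEYWORDS = {
    "design": {"design", "exterior", "interior", "look"},
    "performance": {"engine", "acceleration", "mileage", "handling"},
    "service": {"service", "dealer", "warranty", "repair"},
}
# Hypothetical toy sentiment lexicon.
SENTIMENT = {"sleek": 1, "great": 1, "smooth": 1, "noisy": -1, "poor": -1, "slow": -1}


def words(sentence):
    """Lowercased alphabetic tokens of a sentence."""
    return re.findall(r"[a-z']+", sentence.lower())


def assign_aspect(sentence):
    """Pick the aspect whose keywords overlap most with the sentence, or None."""
    tokens = set(words(sentence))
    overlaps = {a: len(tokens & kw) for a, kw in ASPECT_KEYWORDS.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else None


def polarity(sentence):
    """Sum of lexicon values over the sentence's words."""
    return sum(SENTIMENT.get(w, 0) for w in words(sentence))


def aspect_digest(sentences, k=1):
    """Per-aspect positive/negative ratios and top-k sentences of each polarity."""
    grouped = defaultdict(list)
    for s in sentences:
        aspect = assign_aspect(s)
        if aspect is not None:  # sentences matching no aspect are skipped
            grouped[aspect].append((polarity(s), s))
    digest = {}
    for aspect, scored in grouped.items():
        pos = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
        neg = sorted((p for p in scored if p[0] < 0), key=lambda p: p[0])
        digest[aspect] = {
            "positive_ratio": len(pos) / len(scored),
            "negative_ratio": len(neg) / len(scored),
            "top_positive": [s for _, s in pos[:k]],
            "top_negative": [s for _, s in neg[:k]],
        }
    return digest


if __name__ == "__main__":
    sample = [
        "The exterior design is sleek",
        "The interior look is poor",
        "The engine is noisy",
        "Dealer service was great",
        "I bought it last year",
    ]
    for aspect, summary in aspect_digest(sample).items():
        print(aspect, summary)
```

Grouping sentences by aspect before scoring is what separates this digest from a corpus-wide summary: each aspect gets its own ratios and its own top-k sentences.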
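Latent topics-based product reputation mining (Korean title: 잠재 토픽 기반의 제품 평판 마이닝)
Sang-Min Park / 박상민 (Department of Software Convergence Engineering, College of Industry-Academia Convergence Engineering, Kunsan National University)
Byung-Won On / 온병원 (Department of Software Convergence Engineering, College of Industry-Academia Convergence Engineering, Kunsan National University)
topic model, opinion mining, text summarization, data analytics, public opinion survey
Data-driven analysis techniques have recently come into wide use in the field of public opinion surveys. To survey the preference for a recently launched product, companies need a way to collect and analyze the various kinds of data available online and accurately determine the public's preference for the product, rather than simply compiling conventional survey results or expert opinions. The main existing approaches first build a sentiment lexicon for the domain: experts compile the high-frequency words from the collected text documents and judge each as positive, negative, or neutral. To determine the preference for a particular product, review posts about the product are collected and sentences are extracted; the sentiment lexicon is used to judge each sentence as positive, negative, or neutral; and the preference is finally measured from the numbers of positive and negative sentences. The positive and negative content about the product is also summarized automatically, by computing a sentiment score for each sentence and extracting the sentences with high positive and negative scores. In this study, we propose a product reputation mining algorithm that extracts the topics hidden in documents produced by the general public, surveys the preference for a given product, and presents a summary of the positive and negative content of each topic. Unlike the existing approach, topics make it possible to build the sentiment lexicon easily and quickly, and refining the extracted topics improves the accuracy of the measured preference and of the summaries. In the experiments, we collected a large number of review posts on domestically produced cars such as the K5, SM5, and Avante, and demonstrated the usefulness of the proposed approach by measuring the positive and negative ratios of the tested cars, summarizing the positive and negative content, and conducting statistical tests.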
Cite this article
JIIS Style
Park, S.-M., and B.-W. On, "Latent topics-based product reputation mining", Journal of Intelligence and Information Systems, Vol. 23, No. 2 (2017), 39~70.

IEEE Style
Sang-Min Park, and Byung-Won On, "Latent topics-based product reputation mining", Journal of Intelligence and Information Systems, vol. 23, no. 2, pp. 39~70, 2017.

ACM Style
Park, S.-M., and On, B.-W., 2017. Latent topics-based product reputation mining. Journal of Intelligence and Information Systems. 23, 2, 39--70.
Export Formats: BibTeX, EndNote

author = {Park, Sang-Min and On, Byung-Won},
title = {Latent topics-based product reputation mining},
journal = {Journal of Intelligence and Information Systems},
issue_date = {June 2017},
volume = {23},
number = {2},
month = Jun,
year = {2017},
issn = {2288-4866},
pages = {39--70},
url = { },
doi = {},
publisher = {Korea Intelligent Information System Society},
address = {Seoul, Republic of Korea},
keywords = {topic model, opinion mining, text summarization, data analytics, public survey},
%0 Journal Article
%1 689
%A Sang-Min Park
%A Byung-Won On
%T Latent topics-based product reputation mining
%J Journal of Intelligence and Information Systems
%@ 2288-4866
%V 23
%N 2
%P 39-70
%D 2017
%I Korea Intelligent Information System Society