Twitter Issue Tracking System by Topic Modeling Techniques
Jung-hwan Bae (Dept. of Library and Information Science, Yonsei University)
Nam-gi Han (Dept. of Library and Information Science, Yonsei University)
Min Song (Dept. of Library and Information Science, Yonsei University)
Vol. 20, No. 2, Page: 109 ~ 122
10.13088/jiis.2014.20.2.109
Keywords
Social Media Mining; Text Mining; Twitter Issue; Topic Modeling; Social Network Service;
Big Data
Abstract
People now create a tremendous amount of data on Social Network Services (SNS).
In particular, the integration of SNS with mobile devices has led to the generation of massive amounts
of data, which in turn greatly influences society. This phenomenon is unprecedented, and
we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies the defining
conditions of data amount (volume), data input and output speed (velocity), and diversity of data
types (variety). Trends in issues discovered in SNS Big Data can
serve as an important new source of value creation because this information covers
society as a whole. In this study, a Twitter Issue Tracking System (TITS) is designed and
built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts
and visualizes them on the web. The proposed system provides the following four functions:
(1) Provide the topic keyword set that corresponds to the daily ranking;
(2) Visualize the daily time series graph of a topic over the course of a month;
(3) Show the importance of a topic through a treemap based on the score system and frequency;
(4) Visualize the daily time series graph of a keyword found through keyword search.
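As a rough illustration of functions (1) and (4), the sketch below counts keyword frequencies per day and derives a daily ranking and a per-keyword time series. It is a simplification under stated assumptions: the input format, the function names, and the toy data are hypothetical, and plain keyword frequency stands in for the topic scores that TITS actually ranks.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_keyword_counts(tweets):
    """Count keyword occurrences per day.

    `tweets` is an iterable of (day, keywords) pairs, where `keywords` is the
    list of terms already extracted from one tweet (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for day, keywords in tweets:
        counts[day].update(keywords)
    return counts


def keyword_series(counts, keyword):
    """Return the daily time series [(day, frequency), ...] for one keyword."""
    return [(day, counts[day][keyword]) for day in sorted(counts)]


def daily_ranking(counts, day, top_n=3):
    """Return the top_n most frequent keywords for a given day."""
    return [kw for kw, _ in counts[day].most_common(top_n)]


# Toy data, invented for illustration only.
sample = [
    (date(2013, 3, 1), ["election", "budget"]),
    (date(2013, 3, 1), ["election"]),
    (date(2013, 3, 2), ["budget", "weather"]),
]
counts = daily_keyword_counts(sample)
series = keyword_series(counts, "election")
top = daily_ranking(counts, date(2013, 3, 1), top_n=1)
```

A series of this shape, one (day, frequency) pair per day, is the kind of input a time series chart would plot.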
The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis
requires various natural language processing techniques, including stop word removal and
noun extraction, to process unrefined, unstructured data. In addition, such analysis
requires recent big data technology to process a large amount of real-time data rapidly, such as
the Hadoop distributed system or NoSQL databases, which are an alternative to relational databases. We built
TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from
a single node to thousands of machines. Furthermore, we use MongoDB, which is
classified as a NoSQL database. MongoDB is an open source, document-oriented
database that provides high performance, high availability, and automatic scaling. Unlike conventional
relational databases, MongoDB has no fixed schema or tables, and its primary goals are
data accessibility and data processing performance. In the Age of Big Data, visualization
is attractive to the Big Data community because it helps analysts examine such
data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is
designed for creating Data-Driven Documents that bind arbitrary data to the Document Object Model (DOM);
it makes interaction with data easy and is useful for managing real-time data streams with
smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and
JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using
these libraries, and it allows issues on Twitter to be detected in an easy and intuitive manner. The
proposed work demonstrates the superiority of our issue detection techniques by matching detected
issues with corresponding online news articles.
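The stop word removal step mentioned above can be sketched as follows. This is a minimal example and not the authors' pipeline: the stop word list is hypothetical, stripping URLs and @mentions is an added assumption, and noun extraction is omitted because it requires a language-specific morphological analyzer.

```python
import re

# Hypothetical stop word list; a real system would use a language-specific one.
STOP_WORDS = {"the", "a", "is", "on", "rt"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase a tweet, strip URLs and @mentions, and drop stop words."""
    # Assumption: URLs and mentions carry no topical content, so remove them.
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    # \w+ matches Unicode word characters, so non-Latin scripts are kept.
    tokens = re.findall(r"\w+", text)
    return [t for t in tokens if t not in stop_words]


tokens = preprocess("RT @user: The budget vote is on Friday http://example.com/x")
```

The resulting token lists are what a later noun extraction or topic modeling stage would consume.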
The contributions of the present study are threefold. First, we suggest an alternative approach
to real-time big data analysis, which has become an extremely important issue. Second, we apply a
topic modeling technique used in various research areas, including Library and Information
Science (LIS), to real Twitter data and thereby confirm its utility for storytelling and time series analysis. Third,
we develop a web-based system and make it available for the real-time discovery of topics.
The experiments were conducted on nearly 150 million Korean-language tweets from March 2013.
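The abstract does not state which topic model TITS uses. As one possible illustration, the sketch below implements Latent Dirichlet Allocation (LDA), a widely used topic model, with collapsed Gibbs sampling. The function name, the hyperparameter values, and the toy documents are assumptions made for this example only.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns up to 3 top words per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]                 # n(d, k)
    topic_word = [defaultdict(int) for _ in range(n_topics)]   # n(k, w)
    topic_total = [0] * n_topics                               # n(k)
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Rank words within each topic by their assignment counts.
    return [
        [w for w, c in sorted(tw.items(), key=lambda x: -x[1]) if c > 0][:3]
        for tw in topic_word
    ]


# Toy documents, invented for illustration only.
docs = [
    ["election", "vote", "president"],
    ["vote", "election", "party"],
    ["rain", "weather", "snow"],
    ["weather", "snow", "cold"],
]
topics = lda_gibbs(docs, n_topics=2)
```

Each returned list is a topic keyword set; running the sampler on each day's tweets would yield daily keyword sets of the kind that function (1) ranks.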
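Twitter Issue Tracking Using Topic Modeling
Jung-hwan Bae (Graduate School, Dept. of Library and Information Science, Yonsei University)
Nam-gi Han (Graduate School, Dept. of Library and Information Science, Yonsei University)
Min Song (Associate Professor, Dept. of Library and Information Science, Yonsei University)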
Keywords
Data Mining, Social Network Analysis, Issue Clustering, Topic Analysis
Abstract
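We currently produce vast amounts of data on Social Network Services (SNS). In particular, the combination of mobile devices and SNS generates data on a scale incomparable to the past and has a major social impact. If the issues that people talk about most can be found in this vast body of SNS data, the information can be used as an important source of new value across society. To meet this demand for SNS big data analysis, this study designed and built TITS (Twitter Issue Tracking System), a system that uses Twitter data to extract the issues that arose on Twitter and visualizes them on the web. TITS provides the following functions: 1) providing topic keyword sets by daily rank; 2) visualizing a daily time series graph of a topic over one month; 3) presenting the importance of a topic as a treemap according to score and frequency; and 4) visualizing a one-month daily time series graph of a keyword through keyword search. First, this study analyzed big data generated in real time on SNS using the open source tools Hadoop and MongoDB, which offers an important methodology at a time when real-time processing of big data is becoming increasingly important. Second, by applying topic modeling techniques, which are used not only in library and information science but also in many other research areas, to real Twitter data, we confirmed their usefulness for storytelling and time series analysis. Third, building on the experiments, we implemented a practically usable system through visualization and the construction of a web system. The study is significant in that it presents a practical method for mining the social trends generated in social media and providing meaningful information through data analysis. The experiments use approximately 150 million Korean-language tweets from March 2013 in JSON (JavaScript Object Notation) file format.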
Cite this article
JIIS Style
Bae, J.-h., N.-g. Han, and M. Song, "Twitter Issue Tracking System by Topic Modeling Techniques", Journal of Intelligence and Information Systems, Vol. 20, No. 2 (2014), 109~122.

IEEE Style
Jung-hwan Bae, Nam-gi Han, and Min Song, "Twitter Issue Tracking System by Topic Modeling Techniques", Journal of Intelligence and Information Systems, vol. 20, no. 2, pp. 109~122, 2014.

ACM Style
Bae, J.-h., Han, N.-g., and Song, M., 2014. Twitter Issue Tracking System by Topic Modeling Techniques. Journal of Intelligence and Information Systems. 20, 2, 109--122.
Export Formats: BibTeX, EndNote

@article{Bae:JIIS:2014:576,
author = {Bae, Jung-hwan and Han, Nam-gi and Song, Min},
title = {Twitter Issue Tracking System by Topic Modeling Techniques},
journal = {Journal of Intelligence and Information Systems},
issue_date = {June 2014},
volume = {20},
number = {2},
month = Jun,
year = {2014},
issn = {2288-4866},
pages = {109--122},
url = {http://dx.doi.org/10.13088/jiis.2014.20.2.109},
doi = {10.13088/jiis.2014.20.2.109},
publisher = {Korea Intelligent Information System Society},
address = {Seoul, Republic of Korea},
keywords = {Social Media Mining; Text Mining; Twitter Issue; Topic Modeling; Social Network Service; Big Data},
}
%0 Journal Article
%1 576
%A Jung-hwan Bae
%A Nam-gi Han
%A Min Song
%T Twitter Issue Tracking System by Topic Modeling Techniques
%J Journal of Intelligence and Information Systems
%@ 2288-4866
%V 20
%N 2
%P 109-122
%D 2014
%R 10.13088/jiis.2014.20.2.109
%I Korea Intelligent Information System Society