Semantic Visualization of Dynamic Topic Modeling
Jinwook Yeon (Graduate School of Business IT, Kookmin University)
Hyunkyung Boo (Graduate School of Business IT, Kookmin University)
Namgyu Kim (Graduate School of Business IT, Kookmin University)
Vol. 28, No. 1, pp. 131-154
Dynamic Topic Modeling, Word Embedding, Big Data, Visualization, Word2vec
Recently, research on unstructured data analysis has been actively conducted alongside the development of information and communication technology. In particular, topic modeling is a representative technique for discovering core topics in massive text data. In its early stages, most topic modeling research focused only on topic discovery. As the field matured, studies began to examine how topics change over time. Accordingly, interest is also increasing in dynamic topic modeling, which handles changes in the keywords that constitute each topic. Dynamic topic modeling identifies major topics in the data of the initial period and then tracks the change and flow of topics by using the topic information of each previous period to derive the topics of subsequent periods. However, the results of dynamic topic modeling are very difficult to understand and interpret. Traditional dynamic topic modeling results simply reveal changes in keywords and their rankings, and this information is insufficient to represent how the meaning of a topic has changed.
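The abstract does not name the dynamic topic model that was used. As one illustration of how a topic can carry information from one period into the next, the sketch below follows the formulation of Blei and Lafferty's dynamic topic model, in which a topic's natural parameters drift as a Gaussian random walk, beta_t ~ N(beta_{t-1}, sigma^2 I), and are mapped to a word distribution with a softmax. It simulates only that prior and performs no inference from text; the vocabulary, the starting parameters, and the helper names `evolve_topic` and `top_keywords` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Map natural parameters to a probability distribution over the vocabulary."""
    z = np.exp(x - x.max())  # subtract the max for numerical stability
    return z / z.sum()


def evolve_topic(beta0, n_periods, sigma, rng):
    """Simulate one topic's word distribution across periods.

    Each period's natural parameters are drawn around the previous period's,
    beta_t ~ N(beta_{t-1}, sigma^2 I). Returns one distribution per period.
    """
    betas = [np.asarray(beta0, dtype=float)]
    for _ in range(n_periods - 1):
        prev = betas[-1]
        betas.append(prev + rng.normal(0.0, sigma, size=prev.shape))
    return [softmax(b) for b in betas]


def top_keywords(dist, vocab, k):
    """Return the k highest-probability (keyword, probability) pairs."""
    order = np.argsort(dist)[::-1][:k]
    return [(vocab[i], float(dist[i])) for i in order]


# Hypothetical six-word vocabulary and starting parameters for one topic.
vocab = ["neural", "network", "image", "vision", "policy", "robot"]
beta0 = np.array([2.0, 1.5, 0.5, 0.2, -0.5, -1.0])
rng = np.random.default_rng(0)

periods = ["2016-2017", "2018-2019", "2020-2021"]
for period, dist in zip(periods, evolve_topic(beta0, len(periods), sigma=0.5, rng=rng)):
    print(period, top_keywords(dist, vocab, k=3))
```

Setting `sigma` to 0 freezes the topic across periods, while larger values let its top keywords and their rankings shift faster. Per-period keyword rankings of this kind are exactly the output that is hard to interpret on its own.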
Therefore, in this study, we propose a method to visualize topics by period in a way that reflects the meaning of the keywords in each topic. We also propose a method for intuitively interpreting changes in topics and the relationships among topics. The method for visualizing topics by period is as follows. In the first step, dynamic topic modeling is applied to the text data to derive the top keywords of each topic in each period and their topic weights, which are then normalized. In the second step, we obtain the vectors of the top keywords of each topic from a pre-trained word embedding model and perform dimension reduction on the extracted vectors. We then formulate a semantic vector for each topic as the weighted sum of its keyword vectors, using the topic weight of each keyword as its weight. In the third step, we visualize the semantic vector of each topic using matplotlib and analyze the relationships among topics based on the visualized result. Changes in a topic are interpreted as follows: from the dynamic topic modeling results, we identify the top 5 rising keywords and the top 5 descending keywords of each topic in each period. Whereas many existing topic visualization studies visualize the keywords of each topic, our approach differs in that it visualizes each topic itself.
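A minimal sketch of the second step, assuming NumPy. The abstract names neither the dimension reduction technique nor the embedding interface, so this example assumes PCA (computed with `numpy.linalg.svd`) and uses small hypothetical 4-dimensional vectors in place of a pre-trained Word2vec model. The function names, the keyword weights, and the choice to skip keywords missing from the embedding vocabulary are likewise assumptions. Weights are normalized to sum to 1, following the normalization step mentioned in the Korean abstract.

```python
import numpy as np


def normalize(weights):
    """Rescale one topic's keyword weights so they sum to 1."""
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def fit_pca_2d(vectors):
    """Fit a 2-D PCA with SVD; return (mean, components) for projecting vectors."""
    X = np.asarray(vectors, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:2]  # rows are the top-2 principal directions


def topic_semantic_vector(keyword_weights, embeddings, mean, components):
    """Weighted sum of a topic's dimension-reduced keyword vectors."""
    # Assumption: keywords missing from the embedding vocabulary are skipped.
    known = {w: v for w, v in keyword_weights.items() if w in embeddings}
    if not known:
        raise ValueError("no topic keyword found in the embedding vocabulary")
    weights = normalize(known)
    return sum(
        weights[w] * ((np.asarray(embeddings[w], dtype=float) - mean) @ components.T)
        for w in weights
    )


# Hypothetical 4-d vectors standing in for a pre-trained Word2vec model.
embeddings = {
    "neural": [0.9, 0.1, 0.0, 0.2],
    "network": [0.8, 0.2, 0.1, 0.1],
    "image": [0.1, 0.9, 0.3, 0.0],
    "vision": [0.2, 0.8, 0.4, 0.1],
    "policy": [0.0, 0.1, 0.9, 0.7],
}
# Hypothetical top keywords and topic weights of one topic in two periods.
topics = {
    ("T1", "2016-2017"): {"neural": 0.05, "network": 0.04, "image": 0.01},
    ("T1", "2020-2021"): {"image": 0.05, "vision": 0.04, "neural": 0.01},
}

mean, components = fit_pca_2d(list(embeddings.values()))
points = {key: topic_semantic_vector(kw, embeddings, mean, components)
          for key, kw in topics.items()}
for key, xy in points.items():
    print(key, xy.round(3))
```

Because PCA is an affine projection and the normalized weights sum to 1, reducing each keyword vector and then summing gives the same point as summing the raw vectors and then reducing, so the order of the two operations does not matter here; it would matter for a nonlinear method such as t-SNE. Fitting the projection once on keyword vectors from all topics and periods keeps every semantic vector in the same plane. The resulting 2-D points could then be drawn with matplotlib, for example with `plt.scatter` and labeled with `plt.annotate`.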
To evaluate the practical applicability of the proposed methodology, we performed an experiment on the abstracts of 1,847 artificial intelligence-related papers published in DBpia, divided into three periods (2016-2017, 2018-2019, and 2020-2021). We set the number of topics to seven based on the coherence score and used a pre-trained Word2vec word embedding model trained on Wikipedia, the Internet encyclopedia. Using the proposed methodology, we generated a semantic vector for each topic and, by reflecting the meaning of the keywords, visualized and interpreted the topics by period. Through these experiments, we confirmed that the rise and decline in the topic weight of a keyword are useful for interpreting the semantic change of the corresponding topic and for grasping the relationships among topics.
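The abstract reports identifying the top 5 rising and top 5 descending keywords of each topic between periods but does not spell out the computation. The following sketch shows one plausible reading, assuming the change is the difference in a keyword's topic weight between two consecutive periods and that a keyword missing from a period's top list has weight 0 in that period; `weight_changes` and the sample weights are hypothetical.

```python
def weight_changes(prev, curr, k=5):
    """Return the top-k rising and top-k descending keywords of one topic.

    prev, curr: dict mapping keyword -> topic weight in two consecutive periods.
    Assumption: a keyword absent from a period's top list counts as weight 0 there.
    """
    vocab = set(prev) | set(curr)
    delta = {w: curr.get(w, 0.0) - prev.get(w, 0.0) for w in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    rising = sorted((w for w in vocab if delta[w] > 0), key=lambda w: (-delta[w], w))
    descending = sorted((w for w in vocab if delta[w] < 0), key=lambda w: (delta[w], w))
    return rising[:k], descending[:k]


# Hypothetical weights of one topic in 2016-2017 and 2018-2019.
prev = {"neural": 0.06, "network": 0.04, "rule": 0.03, "image": 0.01}
curr = {"neural": 0.02, "network": 0.04, "image": 0.05, "vision": 0.03}
rising, descending = weight_changes(prev, curr)
print("rising:", rising)          # rising: ['image', 'vision']
print("descending:", descending)  # descending: ['neural', 'rule']
```

Keywords whose weight is unchanged (here `network`) appear in neither list. Ranking by absolute weight difference favors keywords that were already heavy; ranking by relative change would be an alternative, and the abstract does not say which was used.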
In this study, to overcome the limitations of dynamic topic modeling results, we used word embedding and dimension reduction techniques to visualize topics by period. The results are meaningful in that they broaden the scope of topic understanding through the visualization of dynamic topic modeling results. The study also makes an academic contribution by laying the foundation for follow-up studies that use various word embedding and dimension reduction techniques to improve the performance of the proposed methodology.
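Korean abstract (English translation)
A Semantic Visualization Methodology for Dynamic Topic Modeling
Attempts to create useful knowledge by analyzing vast amounts of text data have been increasing steadily, and in particular, research that uses topic modeling to discover issues in a wide range of fields is being actively conducted. Early topic modeling focused on discovering topics themselves, but the research trend has gradually evolved toward examining how topics change over time. In particular, interest is growing in dynamic topic modeling, which accommodates changes in the content of the topics themselves, that is, changes in the keywords that constitute each topic. However, dynamic topic modeling has limitations: its results are difficult to understand intuitively, and it cannot show how changes in keywords affect the meaning of a topic. To overcome these limitations, this paper proposes a way to intuitively interpret changes in topics and the relationships among topics by using dynamic topic modeling together with word embedding. Specifically, from the dynamic topic modeling results we derive, for each period, the top keywords of each topic and their topic weights and normalize the weights; we then extract the vector of each topic keyword from a pre-trained word embedding model and compute the weighted sum of the keyword vectors of each topic, representing the meaning of each topic as a vector. We then visualize these semantic vectors on a two-dimensional plane to represent and interpret the patterns of topic change and the relationships among topics. To evaluate the practical applicability of the proposed methodology, we conducted an experiment on 1,847 papers related to "artificial intelligence" published in DBpia from 2016 to 2021, and confirmed that the proposed methodology makes it possible to intuitively grasp how various topics change over time.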
Cite this article
APA Style
Yeon, J., Boo, H., & Kim, N. (2022). Semantic Visualization of Dynamic Topic Modeling. Journal of Intelligence and Information Systems, 28(1), 131-154.

IEEE Style
Jinwook Yeon, Hyunkyung Boo, and Namgyu Kim, "Semantic Visualization of Dynamic Topic Modeling," Journal of Intelligence and Information Systems, vol. 28, no. 1, pp. 131-154, 2022.

ACM Style
Yeon, J., Boo, H., & Kim, N., 2022. Semantic Visualization of Dynamic Topic Modeling. Journal of Intelligence and Information Systems. 28, 1, 131--154.
Export formats: BibTeX, EndNote

@article{Yeon2022,
  author = {Yeon, Jinwook and Boo, Hyunkyung and Kim, Namgyu},
  title = {Semantic Visualization of Dynamic Topic Modeling},
  journal = {Journal of Intelligence and Information Systems},
  issue_date = {March 2022},
  volume = {28},
  number = {1},
  month = mar,
  year = {2022},
  issn = {2288-4866},
  pages = {131--154},
  publisher = {Korea Intelligent Information System Society},
  address = {Seoul, Republic of Korea},
  keywords = {Dynamic Topic Modeling, Word Embedding, Big Data, Visualization, Word2vec}
}
%0 Journal Article
%1 866
%A Jinwook Yeon
%A Hyunkyung Boo
%A Namgyu Kim
%T Semantic Visualization of Dynamic Topic Modeling
%J Journal of Intelligence and Information Systems
%@ 2288-4866
%V 28
%N 1
%P 131-154
%D 2022
%I Korea Intelligent Information System Society