Journal of Intelligence and Information Systems,
Vol. 19, No. 3, September 2013
A Proposal of a Keyword Extraction System for Detecting Social Issues
Dami Jeong, Jaeseok Kim, Gi-Nam Kim, Jong-Uk Heo, Byung-Won On, and Mijung Kang
Vol. 19, No. 3, Page: 1 ~ 23
Keywords : Topic Modeling, Generative Model, Matching, Text Mining, Social Issue Keywords, Social Issue Filtering, News Articles, Time Series Keyword Visualization
To discover significant social issues such as unemployment, economic crisis, and social welfare that urgently need to be solved in modern society, researchers have usually collected opinions from professional experts and scholars through online or offline surveys. However, this method is not always effective. Because of the expense, a large number of survey replies is seldom gathered, and in some cases it is hard to find experts who deal with a specific social issue. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about the same social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles published by about 10 major domestic news outlets in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting news articles and extracting their text, (2) identifying only the news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what happened with the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability of a paragraph given a topic, relying on (i) the topic terms and (ii) their probability values. Given a set of text documents, we segment each document into paragraphs. In the meantime, using LDA, we extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to the topic that it best matches. As a result, each topic has several best matched paragraphs. Suppose, for example, that there is a topic (e.g., Unemployment Problem) and its best matched paragraph (e.g., "Up to 300 workers lost their jobs in XXX company at Seoul"). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, researchers will be able to detect social issues easily and quickly.
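The matching step described above can be illustrated with a minimal sketch. The abstract does not give the exact generative model, so the code below assumes a simple unigram model: a paragraph's log-probability under a topic is the sum of the log-probabilities of its tokens, with a small floor value for terms the topic does not contain. The function names, the floor value, the second topic ("Social Welfare"), and the toy paragraph are all hypothetical.

```python
import math

def paragraph_log_likelihood(tokens, topic, floor=1e-6):
    """Log-probability of a paragraph under a topic's term distribution.

    Assumption: unigram model; terms missing from the topic get `floor`.
    """
    return sum(math.log(topic.get(token, floor)) for token in tokens)

def best_topic(tokens, topics):
    """Return the label of the topic that gives the paragraph the highest score."""
    return max(topics, key=lambda label: paragraph_log_likelihood(tokens, topics[label]))

# Topic1 is taken from the abstract; the second topic is a made-up contrast case.
topics = {
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    "Social Welfare": {"welfare": 0.5, "pension": 0.3, "budget": 0.2},
}

paragraph = "up to 300 workers lost their jobs after the layoff raising unemployment".split()
print(best_topic(paragraph, topics))
```

Scoring every paragraph against every topic in this way and keeping the highest-scoring paragraphs per topic would give the "best matched paragraphs" the abstract refers to.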
Through this prototype system, we have detected various social issues appearing in our society, and our experimental results show the effectiveness of the proposed methods. Note that you can also use our proof-of-concept system in
The Ontology Based Movie Contents Recommendation Scheme Using Relations of Movie Metadata
Jaeyoung Kim, and Seok-Won Lee
Vol. 19, No. 3, Page: 25 ~ 44
Keywords : Recommendation System, Metadata, Ontology, Movie Contents
Access to movie content has become easier and more frequent with the advent of smart TV, IPTV, and web services that can be used to search for and watch movies. In this situation, users increasingly search for movie content that matches their preferences. However, because the amount of available movie content is so large, users need considerable effort and time to find it. Hence, there has been much research on recommending personalized items through analysis and clustering of user preferences and user profiles. In this study, we propose a recommendation system that uses an ontology-based knowledge base. Our ontology can represent not only relations between movie metadata but also relations between metadata and the user profile. The relations among metadata can show the similarity between movies. To build the knowledge base, our ontology model considers two aspects: the movie metadata model and the user model. In building the ontology-based movie metadata model, we select genre, actor/actress, keywords, and synopsis as the main metadata, since these affect which movies users choose. The user model contains the user's demographic information and the relations between the user and movie metadata. In our model, the movie ontology consists of seven concepts (Movie, Genre, Keywords, Synopsis Keywords, Character, and Person), eight attributes (title, rating, limit, description, character name, character description, person job, person name), and ten relations between concepts. For our knowledge base, we enter individual data for 14,374 movies for each concept in the content ontology model. This movie metadata knowledge base is used to search for movies related to the metadata a user is interested in, and it can also find similar movies through the relations between concepts.
We also propose an architecture for movie recommendation. The proposed architecture consists of four components. The first component searches for candidate movies based on the demographic information of the user. In this component, we assign users to groups according to their demographic information, define the rules that decide group membership, and generate the query used to search for candidate movies for each group.
The second component searches for candidate movies based on user preference. When users choose a movie, they consider metadata such as genre, actor/actress, synopsis, and keywords. Users enter their preferences, and the system searches for movies based on them. Unlike existing movie recommendation systems, the proposed system can find similar movies through the relations between concepts. Each metadata item of a recommended candidate movie has a weight that is used to decide the recommendation order.
The third component merges the results of the first and second components. In this step, we calculate the weight of each movie from the weight values of its metadata and then sort the movies by weight.
The fourth component analyzes the result of the third component and decides the level of contribution of each metadata item. We then apply the contribution weights to the metadata. Finally, we use the result of this step as the recommendation for users.
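The merging and ranking performed by the third and fourth components can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the weighting formula, so the sketch assumes that a movie's score is the sum of its metadata weights, each multiplied by an optional contribution factor. The function name, field names, movie titles, and weight values are hypothetical.

```python
from collections import defaultdict

def merge_and_rank(demographic_hits, preference_hits, contribution=None):
    """Merge two candidate sets and rank movies by total metadata weight.

    Each hits dict maps movie -> {metadata field: weight}.
    `contribution` maps a metadata field to a multiplier (default 1.0).
    """
    contribution = contribution or {}
    scores = defaultdict(float)
    for hits in (demographic_hits, preference_hits):
        for movie, metadata_weights in hits.items():
            for field, weight in metadata_weights.items():
                scores[movie] += weight * contribution.get(field, 1.0)
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda movie: (-scores[movie], movie))

demographic = {"Movie A": {"genre": 1.0}, "Movie B": {"genre": 0.5}}
preference = {"Movie B": {"actor": 1.0, "keywords": 0.5}, "Movie C": {"genre": 0.8}}

print(merge_and_rank(demographic, preference))
print(merge_and_rank(demographic, preference, contribution={"actor": 0.0}))
```

Setting a field's contribution to zero, as in the second call, shows how the fourth component's contribution weights could change the final order.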
We test the usability of the proposed scheme with a web application, which we implemented for the experiment using JSP, JavaScript, and the Protégé API. In our experiment, we collect results from 20 men and women ranging in age from 20 to 29, and we use 7,418 movies with a rating of at least 7.0. We provide the Top-5, Top-10, and Top-20 recommended movies to each user, and the user then chooses the movies of interest. The average number of movies chosen as interesting is 2.1 in Top-5, 3.35 in Top-10, and 6.35 in Top-20, which is better than the results obtained using each metadata item alone.
Content‐based Recommendation Based on Social Network for Personalized News Service
Myung-Duk Hong, Kyeong-Jin Oh, Myung-Hyun Ga, and Geun-Sik Jo
Vol. 19, No. 3, Page: 57 ~ 71
Keywords : Content-Based Recommendation System, Personalized News Service, User Profile, Social Network
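Countless people live in the world, and new news arises from their daily lives every day and every hour. This news includes both scheduled and unexpected events. Because of the enormous volume of news and the many media outlets delivering it, people spend a great deal of time consuming news content. However, most of the breaking news and real-time issues that constantly appear in the media consist of gossip articles, so it is not easy for users to select news that fits their tastes and to obtain information from it. In addition, because users' interests change over time, news delivery needs to reflect those changing interests. In this paper, we propose a content-based recommendation method and system that provides news matching user preferences based on the user's recent interests. To identify recent preferences, we dynamically generate a user profile from the information and recent posts of a user of the social network service Facebook and use it for the news service; extracting news suited to the user's preferences also requires analyzing news content. For news content analysis, we use the news categories provided by the media and generate news profiles by analyzing news broadcast scripts and extracting major keywords. Because measuring the similarity between a user profile and a news profile requires the two profiles to share the same format, the user profile is generated in the same form as the news profile. When a user accesses the system, the system measures similarity against the news profiles based on the preferences stated in the user profile and provides the news best suited to the user's preferences. It also measures the similarity between the news profile provided to the user and other news profiles, and provides related news with high similarity. To evaluate the performance of the proposed personalized news service, we compared it with the 6Sub-Vectors benchmark algorithm based on the error between user ratings of recommended news and the system's predicted values, and the experimental results demonstrated the superiority of the proposed system.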
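The profile matching described in this abstract can be sketched as follows. The abstract does not name the similarity measure, so this sketch assumes cosine similarity over keyword-weight profiles that share one format; the function names, keywords, weights, and news identifiers are hypothetical.

```python
import math

def cosine(profile_a, profile_b):
    """Cosine similarity between two keyword -> weight profiles."""
    dot = sum(weight * profile_b.get(term, 0.0) for term, weight in profile_a.items())
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(user_profile, news_profiles, top_n=2):
    """Return the ids of the news profiles most similar to the user profile."""
    ranked = sorted(
        news_profiles,
        key=lambda news_id: cosine(user_profile, news_profiles[news_id]),
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical profiles: the user profile is built in the same form as the news profiles.
user = {"baseball": 2.0, "election": 1.0}
news = {
    "n1": {"baseball": 1.0, "league": 1.0},
    "n2": {"election": 1.0, "poll": 1.0},
    "n3": {"weather": 1.0},
}

print(recommend(user, news))
```

The same `cosine` function could also compare one news profile with another, which corresponds to the related-news step in the abstract.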
The Effect of the Context Awareness Value on the Smartphone Adopterʼs Advertising Attitude
Chang-Gyu Yang, Eui-Bang Lee, and Yunchu Huang
Vol. 19, No. 3, Page: 73 ~ 91
Keywords : Smartphone Advertising, Advertising Value, Context Awareness Value, Activity, Timing, Location
The advertising market has been facing new challenges due to dramatic changes in advertising channels and the advent of innovative media such as mobile devices. Recent research on mobile devices has mainly focused on the fact that they can identify users' physical location in real time, which sheds light on how location-based technology can be used to achieve competitive advantage in the advertising market. With the introduction of the smartphone, device functionality has become much more diverse, and context awareness is one of the areas that require further study. This work analyzes the influence of the context awareness value that results from the transformation of the advertising channel in the mobile communication market, and our results reflect recent trends in the advertising market environment that were not considered in previous studies.
Many constructs have been studied intensively in the context of advertising channels in the traditional marketing environment, and entertainment, irritation, and information are considered the most widely accepted variables that have a positive relationship with advertising value. In smartphone advertising, four main dimensions of context awareness value are recognized: identification, activity, timing, and location. In this study, we assume that these four constructs have a positive relationship with context awareness value. Finally, we propose that advertising value and context awareness value positively influence smartphone advertising attitude.
A Partial Least Squares (PLS) structural model is used in our theoretical research model to test the proposed hypotheses. A survey was conducted among college students in Korea, and the reliability, convergent validity, and discriminant validity of the constructs and measurement indicators were carefully evaluated. The results show that reliability and validity are confirmed according to predefined statistical criteria. The goodness-of-fit of our research model is also supported. In summary, the results collectively suggest good measurement properties for the proposed research model.
The research outcomes are as follows. First, information has a positive impact on advertising value, while entertainment and irritation have no significant impact. Information, entertainment, and irritation together account for 38.8% of advertising value. Second, along with the change in the advertising market due to the advent of the smartphone, activity, timing, and location have a positive impact on context awareness value, while identification has no significant impact. Identification, activity, timing, and location together account for 46.3% of context awareness value. Third, advertising value and context awareness value both positively influence smartphone advertising attitude, and these two constructs explain 31.7% of the variability of smartphone advertising attitude.
The theoretical implications of our research are as follows. First, the influence of entertainment and irritation, which previous studies on advertising value regarded as crucial factors, is reduced, while the influence of information is increased. This indicates that smartphone users are unlikely to be interested in the entertaining effect of smartphone advertisements and are insensitive to the inconvenience they cause. Second, in today's ubiquitous computing environment, it is effective to provide differentiated advertising services by using smartphone users' context awareness values, such as identification, activity, timing, and location, in order to achieve competitive advantage in the advertising market. As for practical implications, enterprises should adopt a differentiation strategy and provide valuable and useful information that attracts smartphone users, since these users are sensitive to the information provided via smartphone. Enterprises should not only provide useful information but also recognize and make use of smartphone users' unique characteristics and behaviors by increasing context awareness values. In summary, our results imply that smartphone advertisements should be optimized around the information smartphone users need in order to maximize the advertising effect.
Intelligent Brand Positioning Visualization System Based on Web Search Traffic Information : Focusing on Tablet PC
Seung-Pyo Jun, and Do-Hyung Park
Vol. 19, No. 3, Page: 93 ~ 111
Keywords : Web Search Traffic, Google Insight, Brand Positioning, Social Network Analysis, Tablet PC
As the Internet and information technology (IT) continue to develop and evolve, the issue of big data has emerged at the foreground of scholarly and industrial attention. Big data is generally defined as data that exceed the range that can be collected, stored, managed, and analyzed by existing conventional information systems, and the term also refers to the new technologies designed to extract value from such data effectively. With the widespread dissemination of IT systems, continual efforts have been made in various fields of industry such as R&D, manufacturing, and finance to collect and analyze immense quantities of data in order to extract meaningful information and to use this information to solve various problems. Since IT has converged with various industries in many aspects, digital data are now being generated at a remarkably accelerating rate, while developments in state-of-the-art technology have led to continual enhancements in system performance. The types of big data that are currently receiving the most attention include information available within companies, such as information on consumer characteristics, purchase records, logistics, and logs indicating the usage of products and services by consumers, as well as information accumulated outside companies, such as the web search traffic of online users, social network information, and patent information. Among these various types of big data, web searches performed by online users constitute one of the most effective and important sources of information for marketing purposes, because consumers search for information on the internet in order to make efficient and rational choices.
Recently, Google has provided public access to its information on the web search traffic of online users through a service named Google Trends. Research that uses this web search traffic information to analyze the information search behavior of online users is now receiving much attention in academia and in industry. Studies using web search traffic information can be broadly classified into two fields. The first field consists of empirical demonstrations that show how web search information can be used to forecast social phenomena, the purchasing power of consumers, the outcomes of political elections, and so on. The other field focuses on using web search traffic information to observe consumer behavior, for example by identifying the attributes of a product that consumers regard as important or by tracking changes in consumers' expectations, but relatively little research has been completed in this field. In particular, to the best of our knowledge, hardly any brand-related studies have yet attempted to use web search traffic information to analyze the factors that influence consumers' purchasing activities.
This study aims to demonstrate that consumers' web search traffic information can be used to derive the relations among brands and the relations between an individual brand and product attributes. When consumers enter search words on the web, they may use a single keyword, but they also often enter multiple keywords to seek related information (this is referred to as simultaneous searching). A consumer performs a simultaneous search either to compare two product brands and obtain information on their similarities and differences, or to acquire more in-depth information about a specific attribute of a specific brand. Web search traffic information shows that the quantity of simultaneous searches using certain keywords increases when the relation between them is closer in the consumer's mind. It is therefore possible to derive the relations between keywords by collecting this relational data and subjecting it to network analysis. Accordingly, this study proposes a method of analyzing how brands are positioned by consumers and what relationships exist between product attributes and an individual brand, using simultaneous search traffic information. It also presents case studies demonstrating the actual application of this method, with a focus on tablets, which belong to innovative product groups.
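The idea of turning simultaneous search volumes into a keyword network can be sketched as follows. This is a minimal illustration, not the study's procedure: it assumes an undirected weighted graph in which the edge weight is the simultaneous search volume of two keywords. The function names, brand names, attribute names, and volumes are all hypothetical.

```python
from collections import defaultdict

def build_network(co_search_volume):
    """Build an undirected weighted graph from (keyword, keyword) -> volume."""
    graph = defaultdict(dict)
    for (first, second), volume in co_search_volume.items():
        graph[first][second] = volume
        graph[second][first] = volume
    return graph

def closest_neighbor(graph, keyword):
    """Keyword most often searched together with `keyword`."""
    return max(graph[keyword], key=graph[keyword].get)

def strength(graph, keyword):
    """Total simultaneous search volume involving `keyword`."""
    return sum(graph[keyword].values())

# Hypothetical simultaneous search volumes for two brands and two attributes.
volumes = {
    ("BrandA", "BrandB"): 80,
    ("BrandA", "battery"): 30,
    ("BrandB", "price"): 55,
    ("BrandA", "price"): 10,
}
network = build_network(volumes)

print(closest_neighbor(network, "BrandA"))
print(closest_neighbor(network, "price"))
print(strength(network, "BrandA"))
```

In this toy data, "price" is searched more often together with BrandB than with BrandA, which is the kind of brand-attribute relation the abstract proposes to read from the network.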
Intelligent VOC Analyzing System Using Opinion Mining
Yoosin Kim, and Seung Ryul Jeong
Vol. 19, No. 3, Page: 113 ~ 125
Keywords : VOC, Voice of Customers, Opinion Mining, Sentimental Analysis, Text Mining, Big Data
Every company wants to know its customers' requirements and makes an effort to meet them. Because of this, communication between customers and companies has become a core competency of business, and its importance is increasing continuously. There are several strategies for finding customers' needs, but VOC (voice of the customer) is one of the most powerful communication tools, and gathering VOC through channels such as telephone, post, e-mail, and websites is highly meaningful. Therefore, almost every company gathers VOC and operates a VOC system.
VOC is important not only to business organizations but also to public organizations such as governments, educational institutes, and medical centers, which should raise public service quality and customer satisfaction. Accordingly, they build systems for gathering and analyzing VOC and use them to create and upgrade products and services. In recent years, innovations in the internet and ICT have created diverse channels for collecting VOC data, such as SNS, mobile, websites, and call centers.
Although a large amount of VOC data is collected through these diverse channels, using it properly is still difficult. This is because VOC data consist of highly emotional content, expressed by voice or in informal text, and because the volume of VOC data is so large. Such unstructured big data are difficult for humans to store and analyze. Organizations therefore need a system that automatically collects, stores, classifies, and analyzes unstructured big VOC data.
This study proposes an intelligent VOC analyzing system based on opinion mining that classifies unstructured VOC data automatically and determines the polarity as well as the type of VOC. The basis of the VOC opinion analyzing system, a domain-oriented sentiment dictionary, is created, and the corresponding stages are presented in detail. To measure the effectiveness of the proposed system, an experiment is conducted with 4,300 VOC data items collected from a medical website, which are used to develop the sentiment dictionary by determining the domain-specific sentiment vocabulary and its polarity values in the medical domain.
The experiment shows that positive terms such as "칭찬 (praise), 친절함 (kindness), 감사 (thanks), 무사히 (safely), 잘해 (does well), 감동 (touched), 미소 (smile)" have high positive opinion values, and negative terms such as "퉁명 (curt), 뭡니까 (what is this?), 말하더군요 (so they said), 무시하는 (dismissive)" have strong negative opinion values. These terms are in general use, and the experimental results suggest that they indicate opinion polarity with high probability. Furthermore, the accuracy of the proposed VOC classification model was compared across thresholds, and the highest classification accuracy of 77.8% was confirmed at an opinion classification threshold of -0.50.
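A dictionary-based polarity classifier of the kind described can be sketched as follows. The abstract does not state how term values are combined, so this sketch assumes that the opinion score of a VOC item is the mean polarity of the dictionary terms it contains and that an item is negative when its score is at or below the threshold. The polarity values, the tokenized examples, and the function names are hypothetical; only the terms and the -0.50 threshold come from the abstract.

```python
def opinion_score(tokens, dictionary):
    """Mean polarity of the tokens found in the sentiment dictionary (0.0 if none)."""
    hits = [dictionary[token] for token in tokens if token in dictionary]
    return sum(hits) / len(hits) if hits else 0.0

def classify(tokens, dictionary, threshold=-0.50):
    """Label a VOC item as negative when its score is at or below the threshold."""
    return "negative" if opinion_score(tokens, dictionary) <= threshold else "positive"

# Terms are from the abstract; the polarity values are invented for illustration.
sentiment_dictionary = {"감사": 0.9, "친절함": 0.8, "퉁명": -0.9, "무시하는": -0.8}

compliment = ["간호사", "친절함", "감사"]
complaint = ["직원", "퉁명", "무시하는"]

print(classify(compliment, sentiment_dictionary))
print(classify(complaint, sentiment_dictionary))
```

Sweeping `threshold` over a range of values and measuring accuracy on labeled VOC data would be one way to arrive at a best threshold such as the one reported.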
With the proposed intelligent VOC analyzing system, the opinion class and response priority of VOC can be predicted in real time. Ultimately, the system is expected to catch customer complaints at an early stage and deal with them quickly with fewer staff operating the VOC system, which frees up human resources and time in the customer service department.
Above all, this study is a new attempt to analyze unstructured VOC data automatically using opinion mining, and it shows that the opinion value could be used as a variable to classify the positive or negative polarity of VOC. It is expected to suggest a practical framework for VOC analysis for diverse uses, and the model can serve as a real VOC analyzing system if it is implemented as one.
Despite the experimental results and expectations, this study has several limitations. First of all, the sample data were collected only from one hospital website, which means that the sentiment dictionary built from the sample data may lean too heavily toward that hospital and website. Therefore, future research should cover several channels, such as call centers and SNS, and other domains, such as government, financial companies, and educational institutes.
The Effect of Patent Citation Relationship on Business Performance : A Social Network Analysis Perspective
Jun Hyung Park, and Kee-Young Kwahk
Vol. 19, No. 3, Page: 127 ~ 139
Keywords : Social Network Analysis, Patent Citation Network, Outdegree Centrality, Betweenness Centrality, Efficiency
With the advent of the knowledge-based society, interest in intellectual property has increased. Firms have tried to achieve productive outcomes through continuous innovative activity. In particular, ICT firms, which lead the high-tech industry, have tried to manage intellectual property more systematically. Firms' interest in patents has increased as a way to manage innovative activity and knowledge property. A patent contains not only simple information but also important value as information on technology, management, and rights. Moreover, because a patent holds detailed content about technology development activity, it is regarded as valuable data. Patents, which reflect technology diffusion and research outcomes, are closely related to business performance because they are regarded as a significant indicator of a firm's level of innovation. As patent information, which represents companies' intellectual capital, has accumulated continuously, quantitative analysis has become possible. Patent data have the advantage that related industry information and standardized information can be obtained easily. Through patents, the flow of knowledge can be determined. Patent information can be analyzed at various levels, from the individual patent to the nation, and it is used to analyze technical status and effects on performance. A patent with a high citation frequency is considered to have high technological value. Patent information analysis includes both citation index analysis, which uses the number of citations, and network analysis, which uses citation relationships. Network analysis can provide information on the flows of knowledge and on technological changes, and it can show future research directions. Studies using patent citation analysis vary both academically and practically.
In citation index research, studies have been conducted to identify influential patents, and in network analysis research, studies have been conducted to trace the flows of technology in a certain industry. Social network analysis is applied not only in sociology but also in management consulting and in companies' knowledge management. In the field of network analysis, research on how a company's network position affects business performance has been conducted from various perspectives. Social network analysis can be presented in visual form, and network indicators are available through quantitative analysis. It is used to analyze outcomes in terms of network position, and it focuses largely on centrality and structural holes. Centrality indicates that actors in central positions among other actors have an advantage in exerting stronger influence on exchange relationships. Degree centrality, betweenness centrality, and closeness centrality are used for centrality analysis. Structural holes refer to empty places in a social structure and are measured by efficiency and constraint. This study analyzes firms' networks in terms of patents and examines how network characteristics influence business performance. For this purpose, seventy-four ICT companies listed in the S&P 500 are chosen as the sample. UCINET6 is used to analyze network structural characteristics such as outdegree centrality, betweenness centrality, and efficiency. Then, regression analysis is conducted to find out how these network characteristics are related to business performance. It is found that each network index has a significant impact on net income, i.e., business performance. However, efficiency is negatively associated with business performance: as efficiency increases, net income decreases.
Furthermore, in the multiple regression analysis with the three network indexes, only betweenness centrality is statistically significant. The patent citation network analysis shows the flows of knowledge between firms, and analyzing a company's structural position in the network can be expected to contribute to its management strategy.
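Of the network indexes mentioned above, outdegree centrality is the simplest to illustrate. The sketch below is not the UCINET6 procedure used in the study; it assumes a directed graph with an edge from a citing firm to each distinct firm it cites, and it uses the common normalization of dividing by n - 1, where n is the number of firms. The function name, firm names, and citation data are hypothetical.

```python
def outdegree_centrality(citations):
    """Normalized outdegree centrality for a directed citation network.

    `citations` maps a citing firm to the firms it cites.
    Self-citations and duplicate citations are ignored.
    """
    firms = set(citations)
    for cited in citations.values():
        firms.update(cited)
    n = len(firms)
    if n < 2:
        return {firm: 0.0 for firm in firms}
    return {
        firm: len(set(citations.get(firm, ())) - {firm}) / (n - 1)
        for firm in firms
    }

# Hypothetical citation data for three firms.
citations = {"FirmA": ["FirmB", "FirmC"], "FirmB": ["FirmC"], "FirmC": []}
centrality = outdegree_centrality(citations)

for firm in sorted(centrality):
    print(firm, centrality[firm])
```

Values computed this way for each firm could then serve as an explanatory variable in a regression on net income, in the spirit of the analysis described.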
Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques
Jung-hwan Bae, Ji-eun Son, and Min Song
Vol. 19, No. 3, Page: 141 ~ 156
Keywords : Social Media Mining, Twitter Trend Mining System, Topic Modeling, Network Analysis, Community Detection, Korean Presidential Election, Big Data
Social media is a representative form of Web 2.0 that changes users' information behavior by allowing them to produce their own content without any expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to share their opinions and thoughts with the masses and with acquaintances. Social media data play a significant role in the emerging Big Data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore turned their attention to discovering meaningful information in the vast amounts of data buried in social media. Social media has recently become a main focus in the fields of Information Retrieval and Text Mining, not only because it produces massive unstructured textual data in real time but also because it serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing big stream datasets from Twitter in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets which contain candidates' names and election on Twitter in Korea ( for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects social trends effectively. The system also retrieves the list of terms that co-occur with given query terms.
We compare the results of term co-occurrence retrieval using the names of the influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that distinguish each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park', 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon', and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows the topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two types of pattern, a rising tendency or a falling tendency, depending on the change in its probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare topic trends with related news articles. We find that Twitter can track an issue faster than other media such as newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are formed: Twitter users can build relationships by exchanging mentions. We visualize and analyze the mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks, which are unique to Twitter, could reveal different aspects of user behavior.
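Term co-occurrence retrieval for a query can be sketched as follows. This is a minimal illustration rather than the system's implementation: it assumes tweets are already tokenized and counts, for each other term, the number of tweets in which it appears together with the query term. The function name and the toy tweets are hypothetical.

```python
from collections import Counter

def cooccurring_terms(tweets, query, top_n=3):
    """Terms that appear in the same tweet as `query`, with tweet counts."""
    counts = Counter()
    for tokens in tweets:
        unique_terms = set(tokens)
        if query in unique_terms:
            # Count each term once per tweet, excluding the query itself.
            counts.update(unique_terms - {query})
    return counts.most_common(top_n)

# Hypothetical tokenized tweets.
tweets = [
    ["moon", "poll", "election"],
    ["moon", "poll"],
    ["park", "poll"],
    ["moon", "debate", "poll"],
]

print(cooccurring_terms(tweets, "moon", top_n=1))
```

Running the same function with each candidate's name as the query and comparing the resulting lists corresponds to the comparison described in the abstract.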
