DIGITAL LIBRARY ARCHIVE
Journal of Intelligence and Information Systems,
Vol. 24, No. 3, September 2018
|Business Application of Convolutional Neural Networks for Apparel Classification Using Runway Image
Vol. 24, No. 3, Page: 1 ~ 19
Keywords : Convolutional Neural Networks, Image Classification, Apparel, Runway, Mobility
A large amount of data is now available for research and business sectors to extract knowledge from. This data can be unstructured, such as audio, text, and image data, and can be analyzed with deep learning methods. Deep learning is now widely used for various estimation, classification, and prediction problems. In particular, the fashion business adopts deep learning techniques for apparel recognition, apparel search and retrieval engines, and automatic product recommendation. The core model of these applications is image classification using Convolutional Neural Networks (CNN). A CNN is made up of neurons that learn parameters such as weights as inputs pass through the network to the outputs. Its layer structure is well suited to image classification, as it comprises convolutional layers for generating feature maps, pooling layers for reducing the dimensionality of feature maps, and fully-connected layers for classifying the extracted features. However, most classification models have been trained on online product images, which are taken under controlled conditions, such as images of the apparel itself or of a professional model wearing it. Such images may not be effective for training a classification model when one wants to classify street fashion or walking images, which are taken in uncontrolled conditions and involve people’s movement and unexpected poses. Therefore, we propose to train the model with a runway apparel image dataset, which captures mobility. This allows the classification model to be trained with far more variable data and improves its adaptation to diverse query images. To achieve both convergence and generalization of the model, we apply Transfer Learning to our training network. As Transfer Learning in CNN is composed of pre-training and fine-tuning stages, we divide the training step into two.
First, we pre-train our architecture with a large-scale dataset, the ImageNet dataset, which consists of 1.2 million images in 1000 categories including animals, plants, activities, materials, instrumentations, scenes, and foods. We use GoogLeNet as our main architecture, as it has achieved high accuracy with efficiency in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Second, we fine-tune the network with our own runway image dataset. As we could not find any existing, publicly available runway image dataset, we collected one from Google Image Search, obtaining 2426 images of 32 major fashion brands including Anna Molinari, Balenciaga, Balmain, Brioni, Burberry, Celine, Chanel, Chloe, Christian Dior, Cividini, Dolce and Gabbana, Emilio Pucci, Ermenegildo, Fendi, Giuliana Teso, Gucci, Issey Miyake, Kenzo, Leonard, Louis Vuitton, Marc Jacobs, Marni, Max Mara, Missoni, Moschino, Ralph Lauren, Roberto Cavalli, Sonia Rykiel, Stella McCartney, Valentino, Versace, and Yves Saint Laurent. We perform 10-fold experiments to account for the random generation of training data, and our proposed model achieved an accuracy of 67.2% on the final test. To the best of our knowledge, no previous study has trained a network for apparel image classification on a runway image dataset, which distinguishes our research from previous related studies. We suggest the idea of training the model with images capturing all possible postures, which we denote as mobility, by using our own runway apparel image dataset. Moreover, by applying Transfer Learning and using the checkpoint and parameters provided by TensorFlow Slim, we reduced the time spent on training the classification model to 6 minutes per experiment. This model can be used in many business applications where the query image can be a runway image, product image, or street fashion image.
Specifically, a runway query image can be used in a mobile application service during fashion week to facilitate brand search; a street style query image can be classified during fashion editorial tasks to label the brand or style; and a website query image can be processed by an e-commerce multi-complex service to provide item information or recommend similar items.
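The three layer types named in the abstract (convolution for feature maps, pooling for dimensionality reduction, fully-connected for classification) can be illustrated with a minimal pure-Python sketch. This is not the GoogLeNet architecture used in the paper; the toy image, the edge-detecting kernel, and the dense-layer weights are all made-up values chosen only to show how data flows through the layers.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1, no-padding sliding-window product (the CNN 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def dense(features: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully-connected layer: one weighted sum per output class."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


# Toy 5x5 "image" with a vertical edge between columns 1 and 2 (assumed data).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]            # hypothetical vertical-edge detector

feature_map = conv2d(image, kernel)    # 4x4 feature map
pooled = max_pool(feature_map)         # 2x2 after pooling
flat = [v for row in pooled for v in row]

# Hypothetical two-class dense layer: class 0 looks at the left column of the
# pooled map, class 1 at the right column.
scores = dense(flat, [[1, 0, 1, 0], [0, 1, 0, 1]], [0, 0])
print(feature_map[0], pooled, scores)
```

In a real network the kernel and dense weights are learned rather than hand-set, and transfer learning reuses the learned convolutional weights while retraining only the final layers.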
|Improving the Accuracy of Document Classification by Learning Heterogeneity
Vol. 24, No. 3, Page: 21 ~ 44
Keywords : Text Mining, Text Classification, Heterogeneity Learning, Semi-Supervised Learning, Ensemble Learning
In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. These text data are produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media.
However, this enormous amount of easily obtained information lacks organization, and managing it has attracted the interest of many researchers.
This problem also requires professionals who are capable of classifying relevant information, and hence text classification was introduced. Text classification is a challenging task in modern data analysis, in which a text document must be assigned to one or more predefined categories or classes. Various techniques are available in the text classification field, such as K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network.
However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary according to the type of words used in the corpus and the type of features created for classification. Most previous attempts have proposed a new algorithm or modified an existing one, and this line of research can be said to have reached its limits for further improvement. In this study, rather than proposing a new algorithm or modifying an existing one, we focus on finding a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of the training data upon which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from them. In this study, we consider that data from different domains, that is, heterogeneous data, may have the characteristics of noise that can be utilized in the classification process.
To build a classifier, a machine learning algorithm is applied under the assumption that the characteristics of the training data and the target data are the same or very similar. However, in the case of unstructured data such as text, the features are determined by the vocabulary included in the documents. If the viewpoints of the training data and the target data differ, the features may also differ between the two. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it.
Data coming from various kinds of sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, because they were not developed to recognize different types of data representation at one time and to combine them in the same generalization. Therefore, to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. Therefore, we further propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making. In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
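The idea of keeping only confidently classified unlabeled documents can be sketched as a generic ensemble pseudo-labeling filter. This is not the RSESLA algorithm itself, whose rule-selection details are not given in the abstract; the thresholds `min_conf` and `min_agree`, the document identifiers, and the classifier outputs below are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Prediction = Tuple[str, float]  # (predicted label, classifier confidence)


def select_pseudo_label(
    votes: List[Prediction],
    min_conf: float = 0.8,    # assumed confidence threshold
    min_agree: float = 0.66,  # assumed share of the ensemble that must agree
) -> Optional[str]:
    """Return a pseudo-label only when enough confident classifiers agree."""
    confident = [label for label, conf in votes if conf >= min_conf]
    if not confident:
        return None
    label, count = Counter(confident).most_common(1)[0]
    return label if count / len(votes) >= min_agree else None


# Made-up outputs of three classifiers (the "views") on unlabeled documents.
unlabeled_votes: Dict[str, List[Prediction]] = {
    "tweet_1": [("sports", 0.95), ("sports", 0.90), ("politics", 0.55)],
    "blog_7": [("tech", 0.60), ("sports", 0.58), ("politics", 0.52)],
    "news_3": [("politics", 0.92), ("politics", 0.88), ("politics", 0.97)],
}

pseudo_labeled: Dict[str, str] = {}
for doc_id, votes in unlabeled_votes.items():
    label = select_pseudo_label(votes)
    if label is not None:           # low-confidence documents are left out
        pseudo_labeled[doc_id] = label

print(pseudo_labeled)
```

Documents that pass the filter would be added to the training set for the next round; documents such as `blog_7`, where no classifier is confident, are excluded so that they cannot degrade the classifier.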
|Online Document Mining Approach to Predicting Crowdfunding Success
Vol. 24, No. 3, Page: 45 ~ 66
Keywords : Crowdfunding, Text analysis, Classification, Online data
Crowdfunding has become more popular than angel funding for fundraising by venture companies. Identification of success factors may be useful for fundraisers and investors to make decisions related to crowdfunding projects and predict a priori whether they will be successful or not. Recent studies have suggested several numeric factors, such as project goals and the number of associated SNS, studying how these affect the success of crowdfunding campaigns. However, prediction of the success of crowdfunding campaigns via non-numeric and unstructured data is not yet possible, especially through analysis of structural characteristics of documents introducing projects in need of funding. Analysis of these documents is promising because they are open and inexpensive to obtain. We propose a novel method to predict the success of a crowdfunding project based on the introductory text. To test the performance of the proposed method, in our study, texts related to 1,980 actual crowdfunding projects were collected and empirically analyzed. From the text data set, the following details about the projects were collected: category, number of replies, funding goal, fundraising method, reward, number of SNS followers, number of images and videos, and miscellaneous numeric data. These factors were identified as significant input features to be used in classification algorithms. The results suggest that the proposed method outperforms other recently proposed, non-text-based methods in terms of accuracy, F-score, and elapsed time.
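For readers unfamiliar with the two quality metrics named above, the following minimal sketch computes accuracy and the F-score (here F1, the harmonic mean of precision and recall) for a binary success/failure prediction. The label vectors are made-up illustrations, not data from the study.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# 1 = project funded successfully, 0 = not funded (hypothetical labels).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```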
|A Study on Intelligent Value Chain Network System based on Firms’ Information
Vol. 24, No. 3, Page: 67 ~ 88
Keywords : Value Chain Network System(VCNS), Intelligent SMEs Information Support System, Inter-company Transaction Information, Product Deals Relationship Information, Similar(Competitive) Company Retrieval, Value Net Analysis, Value Chain Analysis
Until recently, as the significance of sustainable growth and competitiveness of small-and-medium sized enterprises (SMEs) has been recognized, governmental support has mainly been provided for tangible resources such as R&D, manpower, and funds. However, the inefficiency of support systems, such as underestimated or redundant support, has also been raised, because there exist conflicting policies in terms of the appropriateness, effectiveness, and efficiency of business support.
From the perspective of the government or a company, we believe that, given the limited resources of SMEs, technology development and capacity enhancement through collaboration with external sources is the basis for creating competitive advantage, and we also emphasize value creation activities toward it. This is why value chain network analysis is necessary to analyze inter-company deal relationships across a series of value chains and to visualize the results by establishing knowledge ecosystems at the corporate level. Existing systems include the Technology Opportunity Discovery (TOD) system, which provides information on relevant products or the technology status of companies with patents through retrieval by patent, product, or company name, and CRETOP and KISLINE, both of which allow users to view company (financial) information and credit information. However, no online system provides a list of similar (competitive) companies based on value chain network analysis, or information on potential clients or demanders with whom business deals may be made in the future.
Therefore, we focus on the "Value Chain Network System (VCNS)", a support partner for planning corporate business strategy developed and managed by KISTI, and investigate its embedded network-based analysis modules, the databases (D/Bs) that support them, and how to utilize the system efficiently. We further explore the function of network visualization in the intelligent value chain analysis system, which provides the core information for understanding the industrial structure and for a company's new product development.
For a company to gain competitive superiority over other companies, it is necessary to identify its competitors by the patents they hold or the products they currently produce, and searching for similar companies or competitors by type of industry is the key to securing competitiveness in the commercialization of the target company. In addition, transaction information, which records business activity between companies, plays an important role in providing information on potential customers when both parties enter similar fields together. Identifying competitors at the enterprise or industry level by using a network map based on such inter-company sales information can be implemented as a core module of value chain analysis.
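One simple way to read competitors off an inter-company transaction network is to treat two sellers as similar when they sell to largely the same buyers. The sketch below uses Jaccard similarity of buyer sets for this; it is an illustrative assumption, not the method implemented in VCNS, and the company names and the `threshold` value are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def buyers_by_seller(transactions: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (seller, buyer) transaction records into a buyer set per seller."""
    buyers: Dict[str, Set[str]] = {}
    for seller, buyer in transactions:
        buyers.setdefault(seller, set()).add(buyer)
    return buyers


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of common elements among all elements of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def competitor_candidates(
    transactions: List[Tuple[str, str]], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Seller pairs whose customer bases overlap at or above the threshold."""
    buyers = buyers_by_seller(transactions)
    pairs = []
    for x, y in combinations(sorted(buyers), 2):
        score = jaccard(buyers[x], buyers[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Made-up (seller, buyer) sales records.
transactions = [
    ("SupplierA", "Maker1"), ("SupplierA", "Maker2"),
    ("SupplierB", "Maker1"), ("SupplierB", "Maker2"),
    ("SupplierC", "Maker3"),
]
candidates = competitor_candidates(transactions)
print(candidates)
```

The same buyer sets also hint at potential customers: a buyer served by a close competitor but not yet by the target company is a natural sales lead.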
The Value Chain Network System (VCNS) combines the concepts of value chain and industrial structure analysis with the corporate information collected to date, so that it can grasp not only the market competition situation of individual companies but also the value chain relationships of a specific industry. In particular, it can be useful as an information analysis tool at the corporate level for tasks such as identifying industry structure, identifying competitor trends, analyzing competitors, locating suppliers (sellers) and demanders (buyers), tracking industry trends by item, finding promising items, finding new entrants, finding core companies and items by value chain, and identifying patents held by the corresponding companies. In addition, based on the objectivity and reliability of analysis results derived from transaction deals information and financial data, the value chain network system is expected to be utilized for various purposes such as information support for business evaluation, R&D decision support, and mid-term or short-term demand forecasting, in particular by more than 15,000 member companies in Korea and by employees in R&D service sectors, government-funded research institutes, and public organizations.
To strengthen the business competitiveness of companies, technology, patent, and market information has so far been provided mainly by government agencies and private research-and-development service companies. This service has been presented in the form of patent analysis (mainly rating and quantitative analysis) or market analysis (market prediction and demand forecasting based on market reports). However, it has been limited in resolving the lack of information, which is one of the difficulties that firms in Korea often face at the commercialization stage. In particular, it is much more difficult to obtain information about competitors and potential candidates. In this study, the real-time value chain analysis and visualization service module based on the proposed network map and the data at hand is compared with the expected market share, estimated sales volume, and contact information (which implies potential suppliers of raw materials / parts and potential demanders of complete products / modules).
In future research, we intend to investigate in depth the indices of competitive factors through the participation of research subjects, to develop new competitive indices for competitors or substitute items, and to apply additional data mining techniques and algorithms to improve the performance of VCNS.
|The Role of Open Innovation for SME’s R&D Success
Vol. 24, No. 3, Page: 89 ~ 117
Keywords : SME, R&D, Open Innovation, R&D Performance
Korean companies face intensifying competition with not only domestic but also foreign companies as globalization proceeds. In this environment, acquiring and developing core competencies is essential not only for large companies but also for Small and Medium Enterprises (SMEs). In particular, SMEs, which are short of resources in various respects such as financial resources, can innovate through effective R&D investment, and thereby secure competitiveness and survive in this environment.
Conventionally, "self-development", which uses only the internal resources of the company, has been the dominant R&D method. Recently, however, R&D through cooperation, also called "Open Innovation", has been emerging. SMEs in particular are relatively short of available internal resources. Therefore, they need to utilize technology and resources through cooperation with external companies (such as joint development or contract development) rather than self-development R&D.
In this context, we examined the effect of SMEs’ factors on sales in Korea. Specifically, the factors that SMEs hold were classified into 'Technical characteristic', 'Company competency', and 'R&D activity', and we analyzed how they influence the sales achieved as a result of R&D. The analysis was based on a two-year statistical survey conducted by the Korean government. In addition, we examined how the influence of the factors on sales differs by R&D method (Self-Development vs. Open Innovation), and also observed how the influence changes across 29 industrial categories.
The results of the study are summarized as follows. First, regression analysis shows that twelve factors of SMEs have a significant effect on sales: of the 15 factors included in the analysis, 12 were found to have a significant influence. In the technical characteristic, 'imitation period' and 'product life cycle' of the technology were confirmed. In the company competency, 'R&D led person', 'researcher number', 'intellectual property registration status', 'number of R&D attempts', and 'ratio of success to trial' were confirmed. In the R&D activity, all included factors were found to have a significant impact. Second, the influence of the factors was compared by R&D method, and a change was confirmed in four factors, which have different effects on sales depending on the R&D method. Specifically, ‘researcher number’, ‘number of R&D attempts’, ‘performance compensation system’, and ‘R&D investment’ were found to have significant moderating effects. In other words, the moderating effect of open innovation was confirmed for four factors. Third, different factors have a significant influence on sales in each industrial classification. Between one and nine factors had a significant effect on sales, depending on the industrial classification. Furthermore, different moderating effects were confirmed across industrial classifications and R&D methods, with up to eight significant moderating effects in a single industrial classification. In particular, 'R&D investment' and 'performance compensation system' were the most common moderating effects, appearing 12 times and 11 times respectively across all industrial classifications.
This study provides the following suggestions. First, SMEs need to determine the R&D method in consideration of the characteristics of the technology to be developed as well as their enterprise competency and R&D activity. In addition, they need to identify and concentrate on the factors that increase sales in R&D decisions, which depend mainly on the industry classification to which the company belongs. Second, governments that support SMEs’ R&D need to provide guidelines that fit each company's situation. For effective budget execution, support should be differentiated by considering various factors such as technology and R&D purpose. Finally, based on the results of this study, we urge a reconsideration of the effectiveness of existing SME support policies.
|Keyword-based networked knowledge map expressing content relevance between knowledge
Vol. 24, No. 3, Page: 119 ~ 134
Keywords : Knowledge map, Content relevance, Associated knowledge, Knowledge network, Keyword-based
A knowledge map, as the taxonomy used in a knowledge repository, should be structured to support and supplement the knowledge activities of users who sequentially inquire about and select knowledge for problem solving. The conventional knowledge map with a hierarchical structure has the advantage of systematically sorting out the types and status of the knowledge to be managed; however, it is not only unrelated to the knowledge user’s process of cognition and utilization, but also incapable of supporting the user's activity of querying and extracting knowledge. This study suggests a methodology for constructing a networked knowledge map that can support and reinforce referential navigation, that is, searching for and selecting knowledge that is related and chained in terms of content. Regarding a keyword as the semantic information between pieces of knowledge, the networked knowledge map in this research can be constructed by aggregating each set of knowledge links in an automated manner. Since a keyword represents the contents of a document, documents with common keywords are similar in content, and therefore the keyword-based document network plays the role of a map expressing interactions between related knowledge. To examine the feasibility of the proposed methodology, 50 research papers were randomly selected, and an example networked knowledge map expressing the content relevance between them was implemented using common keywords.
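The core construction described above, linking documents that share keywords, can be sketched in a few lines. This is a minimal illustration under the assumption that one shared keyword is enough to create a link; the paper identifiers and keyword sets are made up, and the study's actual aggregation procedure may differ.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def build_knowledge_map(doc_keywords: Dict[str, Set[str]]) -> Dict[Edge, Set[str]]:
    """Link every pair of documents that shares at least one keyword.

    The value stored on each edge is the set of shared keywords, so its size
    can serve as a simple measure of content relevance.
    """
    edges: Dict[Edge, Set[str]] = {}
    for a, b in combinations(sorted(doc_keywords), 2):
        shared = doc_keywords[a] & doc_keywords[b]
        if shared:
            edges[(a, b)] = shared
    return edges


def related(doc: str, edges: Dict[Edge, Set[str]]) -> List[str]:
    """Documents directly linked to `doc`, for referential navigation."""
    neighbours = [b if a == doc else a for a, b in edges if doc in (a, b)]
    return sorted(neighbours)


# Hypothetical documents and their author-assigned keywords.
doc_keywords = {
    "paper_1": {"knowledge map", "ontology"},
    "paper_2": {"knowledge map", "text mining"},
    "paper_3": {"text mining", "classification"},
    "paper_4": {"crowdfunding"},
}
edges = build_knowledge_map(doc_keywords)
print(edges)
print(related("paper_2", edges))
```

A user reading `paper_1` can follow the `knowledge map` link to `paper_2` and from there the `text mining` link to `paper_3`, which is the chained navigation a hierarchical taxonomy does not offer.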
|A Study on the Construal Level and Intention of Autonomous Driving Taxi According to Message Framing
Vol. 24, No. 3, Page: 135 ~ 155
Keywords : Interpretation level, Message Framing, Autonomous Driving Car, 4th industrial revolution, Intention to use
The purpose of this study is to analyze differences in intention to use according to interpretation level and message framing when the autonomous vehicle, which is emerging as a product of the 4th industrial revolution, is used as a taxi. Interpretation level refers to how a product or service is interpreted depending on whether it is assumed to happen in the near future or in the distant future. Message framing refers to forming positive or negative expressions or messages that emphasize benefits or losses. Previous studies have shown that the value of a product or service is interpreted differently according to these two concepts. This study therefore investigates whether intention to use differs when the two concepts are applied to an autonomous vehicle launched as a taxi. The results are summarized as follows. First, for message framing, two messages were constructed and compared: one explaining the gain from using the autonomous taxi and why it should be used, and one explaining the loss from not using it and how. The two message framings differed (t = 3.063), and the message type describing the benefits and reasons showed a higher intention to use. Second, regarding interpretation level, intention to use differed between the near-future and distant-future assumptions with respect to gain and loss, respectively. In summary, to increase the intention to use autonomous taxis, it is concluded that people should be given positive (gain) messages that describe what can happen in the distant future. In addition, the research method of this study can be utilized in studying intention to use other new technologies. However, this study has the following limitations. First, it assumes message framing and time frames without any user experience of autonomous taxis.
The responses may therefore differ from the actual experience of using an autonomous taxi in the future. Second, although the technology of self-driving cars continues to progress, laws and institutions must be established and the infrastructure for operating autonomous cars must be built before they can be commercialized. Considering this, the results of this study cannot fully reflect realistic conditions. However, there is a practical limit to finding users with sufficient experience of new technologies such as autonomous vehicles. In fact, even if the road infrastructure, technology, and legal framework for using autonomous cars as taxis were ready, the public might not be willing to choose an autonomous cab without sufficient knowledge of it. Therefore, the main contribution of this study is to identify, under the assumption that autonomous cars will be commercialized as taxis, how messages should be framed and delivered to most effectively encourage their use. In addition, the research methodology should be improved and future research should proceed as follows. First, most respondents in this study were students, so it is difficult to generalize the hypotheses tested here. Therefore, future studies should survey a population with a varied distribution of age, area, occupation, education level, and so on, focusing on people who could use an autonomous taxi rather than only those who can drive. Second, although it is desirable to include various message framings in the questionnaire, respondents who have been exposed to one framing may carry that learning over and respond erroneously to the next one. Therefore, the questionnaire should be designed so that each message framing is measured after a certain interval of time.
|Development of a Detection Model for the Companies Designated as Administrative Issue in KOSDAQ Market
Vol. 24, No. 3, Page: 157 ~ 176
Keywords : Administrative Issue, Logistic Regression, Decision Tree, KOSDAQ-listed Companies
The purpose of this research is to develop a detection model for companies designated as administrative issues in the KOSDAQ market using financial data. Administrative issue designation identifies companies with a high potential for delisting and, under certain restrictions of the Korean stock market, gives them time to resolve the reasons for delisting. It acts as an alarm that informs investors and market participants which companies are likely to be delisted and warns them to invest safely. Despite this importance, there are relatively few studies on administrative issue prediction models compared with the many studies on bankruptcy prediction models. Therefore, this study develops and verifies a detection model for companies designated as administrative issues using financial data of KOSDAQ companies. In this study, logistic regression and decision tree are proposed as the data mining models for detecting administrative issues.
According to the results of the analysis, the logistic regression model predicted the companies designated as administrative issue using three variables - ROE(Earnings before tax), Cash flows/Shareholder’s equity, and Asset turnover ratio, and its overall accuracy was 86% for the validation dataset. The decision tree (Classification and Regression Trees, CART) model applied the classification rules using Cash flows/Total assets and ROA(Net income), and the overall accuracy reached 87%.
Implications of the financial indicators selected in our logistic regression and decision tree models are as follows. First, ROE (Earnings before tax) in the logistic detection model shows the profit and loss of the business segments that will continue, excluding the revenue and expenses of discontinued business. Therefore, a weakening of this variable means that the competitiveness of the core business has weakened. If a large part of the profits is generated from one-off profit, it is very likely that the deterioration of business management will intensify further. As the ROE of a KOSDAQ company decreases significantly, it becomes highly likely that the company will be delisted. Second, cash flows to shareholder’s equity represents the firm's ability to generate cash flow when the financial condition of subsidiary companies is excluded. In other words, a weakening of the management capacity of the parent company, excluding the subsidiaries' competence, can be a main reason for an increased possibility of administrative issue designation. Third, a low asset turnover ratio means that current and non-current assets are used ineffectively by the corporation, or that its asset investment is excessive.
If the asset turnover ratio of a KOSDAQ-listed company decreases, it is necessary to examine corporate activities in detail from various perspectives, such as weakening sales or changes in the company's inventories. Cash flow / total assets, a variable selected by the decision tree detection model, is a key indicator of the company's cash condition and its ability to generate cash from operating activities.
Cash flow indicates whether a firm can perform its main activities (maintaining its operating ability, repaying debts, paying dividends, and making new investments) without relying on external financial resources. Therefore, if the value of this variable is negative (-), it indicates the possibility that the company has serious problems in its business activities. If the cash flow from operating activities of a company is smaller than its net profit, the net profit has not been converted into cash, which indicates a serious problem in managing the company's trade receivables and inventory assets. Therefore, it can be understood that as cash flows / total assets decreases, the probability of administrative issue designation and the probability of delisting increase.
In summary, the logistic regression-based detection model in this study was found to be affected by the company's financial activities including ROE(Earnings before tax). However, decision tree-based detection model predicts the designation based on the cash flows of the company.
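To make the logistic detection model concrete, the sketch below scores a company from the three ratios named in the abstract. The coefficients and the two example companies are entirely hypothetical, since the abstract does not report fitted values; only the direction of each effect (lower ratios raise the designation probability) follows the text.

```python
import math
from typing import Dict

# Hypothetical coefficients, for illustration only. The negative signs encode
# the abstract's reading that weaker ratios increase the designation risk.
COEF: Dict[str, float] = {
    "intercept": -1.0,
    "roe": -4.0,             # ROE (earnings before tax / equity)
    "cf_equity": -3.0,       # cash flows / shareholder's equity
    "asset_turnover": -1.5,  # sales / total assets
}


def designation_probability(roe: float, cf_equity: float, asset_turnover: float) -> float:
    """Logistic model: probability = 1 / (1 + exp(-z)), z a linear score."""
    z = (COEF["intercept"]
         + COEF["roe"] * roe
         + COEF["cf_equity"] * cf_equity
         + COEF["asset_turnover"] * asset_turnover)
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up companies: one with healthy ratios, one with deteriorating ones.
healthy = designation_probability(roe=0.15, cf_equity=0.20, asset_turnover=1.0)
distressed = designation_probability(roe=-0.30, cf_equity=-0.25, asset_turnover=0.3)
print(round(healthy, 3), round(distressed, 3))
```

A company would be flagged when its probability exceeds a chosen cut-off (0.5 is a common default); a decision tree would instead apply threshold rules on cash flows / total assets and ROA directly.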
|A Study on Public Interest-based Technology Valuation Models in Water Resources Field
Vol. 24, No. 3, Page: 177 ~ 198
Keywords : Web-based Evaluation System, Water Resource Fields, Public Interest-based Technology, Technology Valuation, Evaluation of Research and Development Performance, Technology Assessment, Cost Benefit Analysis
Recently, as water resources have come to be regarded as “public property”, it has become necessary to acquire and utilize a framework for measuring and managing the performance of water resources as economic property. To date, water technology has been evaluated through feasibility studies or technology assessment based on net present value (NPV) or benefit-to-cost (B/C) effects; however, valuation models that objectively assess the economic value of technology-based business, so as to support the diffusion of and feedback on research outcomes, have not yet been systematized. Therefore, K-water (a government-supported public company in Korea) sees the need to establish a technology valuation framework suited to the technical characteristics of the water resources fields it is in charge of, and to verify it with an example case.
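The two conventional measures mentioned above can be written down directly: NPV is the discounted sum of yearly benefits minus costs, and the B/C ratio is the present value of benefits divided by the present value of costs. The sketch below uses made-up cash flows and discount rates, and is not the K-water valuation model.

```python
from typing import Sequence


def present_value(flows: Sequence[float], rate: float) -> float:
    """Discount yearly flows to year 0; flows[0] occurs now and is undiscounted."""
    return sum(flow / (1.0 + rate) ** year for year, flow in enumerate(flows))


def npv(benefits: Sequence[float], costs: Sequence[float], rate: float) -> float:
    """Net present value: PV(benefits) - PV(costs)."""
    return present_value(benefits, rate) - present_value(costs, rate)


def bc_ratio(benefits: Sequence[float], costs: Sequence[float], rate: float) -> float:
    """Benefit-to-cost ratio: PV(benefits) / PV(costs)."""
    return present_value(benefits, rate) / present_value(costs, rate)


# Hypothetical project: an up-front investment followed by three years of
# operation (arbitrary monetary units).
benefits = [0.0, 60.0, 60.0, 60.0]
costs = [100.0, 10.0, 10.0, 10.0]

for rate in (0.0, 0.10):
    print(rate, round(npv(benefits, costs, rate), 2), round(bc_ratio(benefits, costs, rate), 3))
```

The two measures always agree on acceptability: NPV is positive exactly when the B/C ratio exceeds 1, but the ratio expresses the result per unit of cost, which helps when comparing projects of different scale.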
The K-water evaluation technology applied in this study can be used as a tool to measure and manage the value and achievements that a technology, as a public-interest good, contributes to society. By calculating the value that the subject technology contributes to society as a whole as a public resource, we use the result as basic information for publicizing performance, in terms of the effect of its benefits or the necessity of cost input, and thereby secure legitimacy for large-scale R&D spending given the characteristics of public technology. Hence, K-water, a public corporation in Korea that deals with the public good of ‘water resources’, will be able to establish a commercialization strategy for business operation and prepare a basis for calculating the performance of its R&D expenditure.
In this study, K-water developed a web-based technology valuation model for public interest-type water resources, based on a technology evaluation system suited to the characteristics of technologies in water resources fields. In particular, by utilizing the evaluation methodology of the Institute of Advanced Industrial Science and Technology (AIST) in Japan to match expense items to expense accounts based on the related benefit items, we propose the so-called ‘K-water's proprietary model’, which involves the ‘cost-benefit’ approach and FCF (Free Cash Flow). We then built a pipeline on the K-water research performance management system and verified a practical case of a technology related to "desalination".
We analyze the embedded design logic and evaluation process of the web-based valuation system, which reflects the characteristics of water resources technology, along with the reference information and database (D/B)-associated logic that each model uses to calculate public interest-based and profit-based technology values in the integrated technology management system. We also review the hybrid evaluation module, which quantifies the qualitative evaluation indices reflecting the unique characteristics of water resources, and the visualized user interface (UI) of the actual web-based evaluation, both of which are appended to the existing web-based technology valuation systems in other fields that calculate business value based on financial data.
K-water's technology valuation model distinguishes between public interest-type and profit-type water technology. First, the evaluation modules in the profit-type technology valuation model are designed around the ‘profitability of technology’. For example, K-water's technology inventory includes a number of profit-oriented technologies, such as water treatment membranes. The public interest-type technology valuation, on the other hand, is designed to evaluate public interest-oriented technology, such as dams, and reflects the characteristics of public benefits and costs.
In order to examine the appropriateness of the cost-benefit based public utility valuation model (i.e., the K-water specific technology valuation model) presented in this study, we applied it to practical cases, performing a benefit-to-cost analysis of a water resource technology with a 20-year lifetime. In the future, we will further verify the K-water public utility-based valuation model for each business model, reflecting various business environment characteristics.
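As an illustration of the generic NPV and benefit-to-cost (B/C) calculation mentioned above, the following Python sketch discounts yearly benefits and costs over a 20-year lifetime. The yearly amounts and the 5% discount rate are hypothetical, and the sketch does not reproduce K-water's proprietary model, which the abstract says also draws on the AIST methodology and FCF.

```python
def present_value(flows, rate):
    """Discount a list of yearly amounts (years 1, 2, ...) to present value."""
    return sum(x / (1.0 + rate) ** t for t, x in enumerate(flows, start=1))


def cost_benefit(benefits, costs, rate):
    """Return NPV and B/C ratio for yearly benefit and cost streams."""
    pv_benefits = present_value(benefits, rate)
    pv_costs = present_value(costs, rate)
    return {
        "npv": pv_benefits - pv_costs,          # net present value
        "bc_ratio": pv_benefits / pv_costs,     # benefit-to-cost ratio
    }


# Hypothetical streams over a 20-year technology lifetime.
years = 20
benefits = [120.0] * years   # assumed yearly public benefit
costs = [100.0] * years      # assumed yearly cost
result = cost_benefit(benefits, costs, rate=0.05)
print(result)
```

Under these assumptions a B/C ratio above 1 and a positive NPV would both indicate that discounted benefits exceed discounted costs.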
|Issue tracking and voting rate prediction for 19th Korean president election candidates
Vol. 24, No. 3, Page: 199 ~ 219
Keywords : 19th president election, comments, text mining, Big data analysis
With the everyday use of the Internet and the spread of various smart devices, users have become able to communicate in real time, and existing communication styles have changed. As the Internet changed who produces information, data became massive, giving rise to the very large body of information called big data. Big data is seen as a new opportunity to understand social issues. In particular, text mining explores patterns in unstructured text data to find meaningful information. Since text data exists in various places such as newspapers, books, and the web, it is diverse and large in volume, which makes it suitable for understanding social reality. In recent years, there have been an increasing number of attempts to analyze texts from web sources such as SNS and blogs, where the public can communicate freely. This is recognized as a useful method for grasping public opinion immediately, so it can be used for research on political, social, and cultural issues.
Text mining has received much attention as a way to investigate the public's view of candidates and to predict the voting rate in place of polling. This is because many people question the credibility of surveys. Also, people tend to refuse to respond or not to reveal their real intentions when asked to respond to a poll.
This study collected comments from the largest Internet portal site in Korea and conducted research on the 19th Korean presidential election in 2017. We collected 226,447 comments from April 29, 2017 to May 7, 2017, a period that includes the prohibition period for public opinion polls just prior to the presidential election day. We analyzed frequencies, associative emotional words, topic emotions, and candidate voting rates. Through frequency analysis, we identified the words representing the most important issues of each day. In particular, following the presidential debate, the candidate who became an issue appeared at the top of the frequency analysis. Through the analysis of associative emotional words, we were able to identify the issues most relevant to each candidate. The topic emotion analysis was used to identify each candidate's topics and to express the public's emotions about those topics. Finally, we estimated the voting rate by combining the volume of comments and the sentiment score.
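The abstract does not state how comment volume and sentiment score are combined. As one possible reading, the Python sketch below weights each candidate's comment count by a sentiment score in [0, 1] and normalizes the result; the weighting formula, function name, candidate labels, and numbers are all assumptions made for illustration, not the study's method or data.

```python
def estimate_vote_share(stats):
    """Estimate vote shares from comment volume and sentiment.

    stats maps a candidate to (comment_count, sentiment_score), where
    sentiment_score is assumed to lie in [0, 1] (share of positive tone).
    The product-and-normalize rule is an assumed formula, not the study's.
    """
    weights = {name: count * score for name, (count, score) in stats.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no positive weight to normalize")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical candidates and figures.
stats = {"A": (1000, 0.6), "B": (800, 0.5), "C": (500, 0.4)}
shares = estimate_vote_share(stats)
ranking = sorted(shares, key=shares.get, reverse=True)
print(shares)
print(ranking)
```

The estimated shares sum to one, so they can be compared directly with actual vote shares or used only for ranking candidates.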
Through the above, we explored the issues for each candidate and predicted the voting rate. The analysis showed that news comments are an effective tool for tracking the issues of presidential candidates and for predicting the voting rate. In particular, this study presented daily issues and a quantitative index of sentiment. It also predicted the voting rate for each candidate and precisely matched the ranking of the top five candidates.
Each candidate will be able to objectively grasp public opinion and reflect it in their election strategy. Candidates can use positive issues more actively in their election strategies and try to correct negative issues. In particular, candidates should be aware that their reputation can be severely damaged if they face a moral problem.
Voters can objectively look at the issues and public opinion about each candidate and make more informed decisions when voting. If they refer to the results of this study before voting, they will be able to see the opinions of the public drawn from big data and vote for a candidate from a more objective perspective.
If candidates campaign with reference to big data analysis, the public will become more active on the web, recognizing that their wants are being reflected. People can express their political views in various places on the web. This can contribute to political participation by the people.