Journal of Intelligence and Information Systems,
Vol. 25, No. 4, December 2019
The Characteristics and Performances of Manufacturing SMEs that Utilize Public Information Support Infrastructure
Keun-Hwan Kim, Taehoon Kwon, and Seung-pyo Jun
Vol. 25, No. 4, Page: 1 ~ 33
Keywords : Public Information support infrastructure, Government funded research institutes (GRIs), Small and medium-sized enterprises (SMEs), Discriminant group characteristics, Multiple mediated moderation effect
Small and medium-sized enterprises (hereinafter SMEs) are already at a competitive disadvantage compared with large companies that have more abundant resources. Manufacturing SMEs need a great deal of information for new product development to sustain growth and survive, and they seek networking to overcome their resource constraints, but their small size limits both efforts. In a new era in which connectivity increases the complexity and uncertainty of the business environment, SMEs are increasingly urged to find information and solve networking problems.
To solve these problems, government funded research institutes (GRIs) play an important role and have a duty to resolve the information asymmetry that SMEs face. The purpose of this study is to identify the differentiating characteristics of SMEs that utilize the public information support infrastructure provided by GRIs to enhance the innovation capacity of SMEs, and to examine how its use contributes to corporate performance.
We argue that an infrastructure for providing information support to SMEs is needed as part of the effort to strengthen the role of government funded institutions; in this study, we specifically identify the target of such a policy and empirically demonstrate the effects of such policy-based efforts.
Our goal is to help establish the strategies for building the information supporting infrastructure. To achieve this purpose, we first classified the characteristics of SMEs that have been found to utilize the information supporting infrastructure provided by government funded institutions. This allows us to verify whether selection bias appears in the analyzed group, which helps us clarify the interpretative limits of our study results. Next, we performed mediator and moderator effect analysis for multiple variables to analyze the process through which the use of information supporting infrastructure led to an improvement in external networking capabilities and resulted in enhancing product competitiveness. This analysis helps identify the key factors we should focus on when offering indirect support to SMEs through the information supporting infrastructure, which in turn helps us more efficiently manage research related to SME supporting policies implemented by government funded institutions.
The results of this study showed the following. First, SMEs that used the information supporting infrastructure were found to have a significant difference in size in comparison to domestic R&D SMEs, but on the other hand, there was no significant difference in the cluster analysis that considered various variables. Based on these findings, we confirmed that SMEs that use the information supporting infrastructure are superior in size, and had a relatively higher distribution of companies that transact to a greater degree with large companies, when compared to the SMEs composing the general group of SMEs.
Also, we found that the companies already receiving support from the information infrastructure include a high concentration of companies that need collaboration with government funded institutions. Second, among the SMEs that use the information supporting infrastructure, we found that increasing external networking capabilities contributed to enhancing product competitiveness. This was not a direct effect; rather, the contribution was made indirectly by increasing open marketing capabilities. In other words, it was the result of an indirect-only mediator effect. In addition, the number of times a company received additional support in this process through mentoring related to information utilization was found to have a mediated moderator effect on improving external networking capabilities and, in turn, strengthening product competitiveness.
The results of this study provide several insights that will help establish policies. The findings on KISTI's information support infrastructure may lead to the conclusion that its marketing is already well underway, but the infrastructure intentionally supports groups that are able to achieve good performance. As a result, the government should set clear priorities on whether to support underdeveloped companies or to help companies achieve better performance. Through our research, we have identified how public information infrastructure contributes to product competitiveness, and we can draw some policy implications. First, the public information support infrastructure should enhance users' ability to find and interact with the experts who provide the required information. Second, if the utilization of the public information support (online) infrastructure is effective, it is not necessary to continuously provide informational mentoring, which is a parallel offline support. Rather, offline support such as mentoring should be used as a device for monitoring abnormal symptoms. Third, SMEs need to improve their ability to utilize the infrastructure, because the effects of enhancing networking capacity and product competitiveness through public information support infrastructure appear in most types of companies rather than only in specific SMEs.
Measuring the Public Service Quality Using Process Mining: Focusing on N City's Building Licensing Complaint Service
Jung Seung Lee
Vol. 25, No. 4, Page: 35 ~ 52
Keywords : Process mining, Process map, Process pattern, Building licensing complaint service, Public service quality measure
As public services are provided in various forms, including e-government, public demand for service quality is increasing. Improving public services requires continuous measurement of their quality, but traditional surveys are costly, time-consuming, and limited. Therefore, there is a need for an analytical technique that can measure the quality of public services quickly and accurately at any time, based on the data generated by those services.
In this study, we analyzed the quality of public services from data, applying process mining techniques to civil licensing services in N city. N city's building license complaint service was chosen because it provides the data necessary for analysis and because the approach can be spread to other institutions through public service quality management.
This study conducted process mining on a total of 3678 building license complaint services in N city over two years from January 2014, and identified the process maps and the departments with high frequency and long processing time. According to the analysis results, there were cases where a department was overloaded or handled relatively few complaints at a certain point in time. In addition, there was reason to suspect that an increase in the number of complaints lengthens the time required to complete them.
According to the analysis results, the time required to complete a complaint varied from the same day to one year and 146 days. The cumulative frequency of the top four departments (the Sewage Treatment Division, the Waterworks Division, the Urban Design Division, and the Green Growth Division) exceeded 50%, and the cumulative frequency of the top nine departments exceeded 70%. The heavily involved departments were few, and the load was highly unbalanced among departments. Most complaint services followed a variety of different process patterns.
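As a minimal sketch of how cumulative department frequencies like those reported above could be computed from an event log, assume a hypothetical list with one department name per handled step. The data and the function name `cumulative_share` are illustrative and are not taken from the study.

```python
from collections import Counter


def cumulative_share(departments):
    """Return (department, cumulative share) pairs, most frequent first."""
    counts = Counter(departments)
    total = sum(counts.values())
    running = 0
    result = []
    for dept, n in counts.most_common():
        running += n
        result.append((dept, running / total))
    return result


# Hypothetical event log: one entry per step handled by a department.
log = (["Sewage"] * 4 + ["Waterworks"] * 3
       + ["Urban Design"] * 2 + ["Green Growth"] * 1)
shares = cumulative_share(log)

for dept, share in shares:
    print(f"{dept}: {share:.0%}")
```

Reading down the output shows how few departments are needed to cover a given share of the workload, which is the kind of imbalance the abstract describes.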
The analysis shows that the number of 'complement' decisions has the greatest impact on the length of a complaint. The interpretation is that a 'complement' decision requires a physical period during which the complainant supplements and resubmits the documents, which lengthens the time until the entire complaint is completed. To address this, the overall processing time can be drastically reduced if complainants prepare thoroughly before or while filing, drawing on the 'complement' decisions made on other complaints. Clarifying and disclosing the causes of and remedies for 'complement' decisions, which are among the important data in the system, helps complainants prepare in advance and gives them confidence that documents prepared according to the disclosed information will be accepted.
This makes complaint handling transparent and sufficiently predictable. Documents prepared according to pre-disclosed information are likely to be processed without problems, which not only shortens the processing period but also improves work efficiency by sparing the processor renegotiation or repeated work.
The results of this study can be used to find departments with a high burden of civil complaints at certain points in time and to manage workforce allocation between departments flexibly. In addition, the patterns of departments participating in consultation, analyzed by the characteristics of the complaints, can be used for automation or recommendation when requesting a consultation department. Furthermore, applying machine learning techniques to the various data generated during the complaint process can reveal patterns in that process. Turning these patterns into an algorithm and applying it to the system can support the automation and intelligence of civil complaint processing. This study is expected to inform future public service quality improvement through process mining analysis of civil services.
Object Tracking Based on Exactly Reweighted Online Total-Error-Rate Minimization
Se-In JANG, and Choong-Shik PARK
Vol. 25, No. 4, Page: 53 ~ 65
Keywords : Object Tracking, Online Learning, Total-Error-Rate Minimization, Exact Reweighting
Object tracking is one of the important steps in building video-based surveillance systems, and it is considered an essential task alongside object detection and recognition. To perform object tracking, various machine learning methods (e.g., least-squares, perceptron, and support vector machine) can be applied to different designs of tracking systems. In general, generative methods (e.g., principal component analysis) have been utilized for their simplicity and effectiveness. However, generative methods focus only on modeling the target object. Because of this limitation, discriminative methods (e.g., binary classification) were adopted to distinguish the target object from the background. Among the machine learning methods for binary classification, total error rate minimization is one of the successful ones. Total error rate minimization can achieve a global minimum thanks to a quadratic approximation to a step function, while other methods (e.g., support vector machine) seek local minima using nonlinear functions (e.g., hinge loss function). Due to this quadratic approximation, total error rate minimization has desirable properties for solving optimization problems in binary classification. However, total error rate minimization was formulated in a batch mode setting, which restricts it to applications that allow offline learning.
With limited computing resources, offline learning cannot handle large-scale data sets. In contrast, online learning can update its solution without storing all training samples during the learning process. As large-scale data sets become more common, online learning has become an essential property for various applications. Since object tracking needs to handle data samples in real time, online learning based total error rate minimization methods are necessary to address object tracking problems efficiently. To meet this need, an online learning based total error rate minimization method was previously developed. However, that method relies on an approximate reweighting technique. Despite the approximation, this online version of total error rate minimization achieved good performance in biometric applications. However, the method assumes that total error rate minimization is achieved only asymptotically, as the number of training samples approaches infinity.
Even under this assumption, the approximation can continuously accumulate learning errors as the number of training samples increases. For this reason, the approximated online learning solution can lead to a wrong solution, which can cause significant errors when applied to surveillance systems.
In this paper, we propose an exact reweighting technique to recursively update the solution of total error rate minimization in an online learning manner. In contrast to the approximately reweighted online total error rate minimization, this yields an exactly reweighted online total error rate minimization. The proposed exact online learning method is then applied to object tracking problems. Our object tracking system adopts particle filtering, and its observation model consists of both generative and discriminative methods to leverage the advantages of both. In our experiments, the proposed object tracking system achieves promising performance on 8 public video sequences compared with competing object tracking systems. A paired t-test is also reported to evaluate the quality of the results.
Our proposed online learning method can be extended to deep learning architectures, covering both shallow and deep networks. Moreover, online learning methods that need an exact reweighting process can use our proposed reweighting technique. In addition to object tracking, the proposed online learning method can be easily applied to object detection and recognition. Therefore, our proposed methods can contribute to the online learning community and to the object tracking, detection, and recognition communities.
Development of Customer Sentiment Pattern Map for Webtoon Content Recommendation
Jun-Sik Lee, and Park Do-Hyung
Vol. 25, No. 4, Page: 67 ~ 88
Keywords : Big Data Analytics, Consumer Sentiments, Recommendation, Sentimental Analysis, Webtoon
Webtoon is a Korean-style digital comics platform that distributes comics content produced using the characteristic elements of the Internet in a form that can be consumed online. With the recent rapid growth of the webtoon industry and the exponential increase in the supply of webtoon content, the need for effective webtoon content recommendation measures is growing. Webtoons are digital content products that combine pictorial, literary, and digital elements. Therefore, webtoons stimulate consumer sentiment by entertaining readers and leading them to engage and empathize with the situations the webtoons portray.
In this context, the sentiment that webtoons evoke in consumers can be expected to serve as an important criterion for consumers’ choice of webtoons. However, there is a lack of research on improving webtoon recommendation performance by utilizing consumer sentiment. This study aims to develop consumer sentiment pattern maps that can support effective recommendation of webtoon content, focusing on consumer sentiments that have not been fully discussed previously. For this study, metadata and consumer sentiment data were collected for 200 works serviced on the Korean webtoon platform ‘Naver Webtoon’. After excluding works that did not meet the purpose of the analysis, 488 sentiment terms were collected for 127 works. Next, similar or duplicate terms were combined or abstracted following a bottom-up approach. As a result, we built a webtoon-specific sentiment index consisting of 63 emotive adjectives. By performing exploratory factor analysis on the constructed sentiment index, we derived three important dimensions for classifying webtoon types. The exploratory factor analysis was performed through Principal Component Analysis (PCA) with varimax factor rotation. The three dimensions were named ‘Immersion’, ‘Touch’, and ‘Irritant’. Based on these dimensions, K-Means clustering was performed and the webtoons were classified into four types, named ‘Snack’, ‘Drama’, ‘Irritant’, and ‘Romance’. For each type of webtoon, we drew webtoon-sentiment 2-mode network graphs and examined the characteristics of the sentiment pattern appearing for each type.
In addition, through profiling analysis, we were able to derive meaningful strategic implications for each type of webtoon. First, the ‘Snack’ cluster is a collection of webtoons that are fast-paced and highly entertaining. Many consumers are interested in these webtoons, but they do not rate them well. Also, consumers mostly use simple expressions of sentiment when talking about these webtoons. Webtoons belonging to ‘Snack’ are expected to appeal to modern people who want to consume content easily and quickly during short travel times, such as a commute. Second, webtoons belonging to ‘Drama’ are expected to evoke realistic and everyday sentiments rather than exaggerated and light comic ones. When consumers talk online about webtoons belonging to the ‘Drama’ cluster, they express a variety of sentiments. It is appropriate to establish an OSMU (One Source Multi-Use) strategy to extend these webtoons to other content such as movies and TV series. Third, the sentiment pattern map of ‘Irritant’ shows the sentiments that discourage customer interest by stimulating discomfort. Webtoons that evoke these sentiments struggle to attract public attention. Artists should pay attention to these sentiments, which cause discomfort to consumers, when creating webtoons. Finally, webtoons belonging to ‘Romance’ do not evoke a variety of consumer sentiments, but they are interpreted as touching consumers. They are expected to be consumed as ‘healing content’ by consumers with high levels of stress or mental fatigue in their lives. The results of this study are meaningful in that they identify the applicability of consumer sentiment to the recommendation and classification of webtoons, and provide guidelines that help members of the webtoon ecosystem better understand consumers and formulate strategies.
Evaluating the Quality of Recommendation System by Using Serendipity Measure
Dorjmaa Tserendulam, and Shin Taeksoo
Vol. 25, No. 4, Page: 89 ~ 103
Keywords : Recommendation system, Serendipity measure, Unexpectedness, Relevance, Quality
Recently, various approaches to recommendation systems have been studied with a focus on recommendation quality. A recommender system basically aims to provide users with personalized recommendations of specific items. Most of these systems recommend the items judged most relevant based on similar users or similar items.
Traditionally, the evaluation of recommender system quality has focused on various predictive accuracy metrics. However, a recommender system must be not only accurate but also useful to users. User satisfaction, as an evaluation criterion, is related not only to how accurately the system recommends but also to how well it supports the user’s decision making. In particular, a highly serendipitous recommendation helps a user find a surprising and interesting item. Serendipity in this study is defined as a measure of the extent to which the recommended items are both attractive and surprising to users. Therefore, this paper proposes applying a serendipity measure to evaluate the performance of recommender systems in terms of recommendation quality. In this study, we define relevant or attractive unexpectedness as the serendipity measure for assessing recommendation systems. That is, the serendipity measure indicates how well the recommender system can find unexpected and useful items for users.
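The abstract does not state the formula it uses. As a hedged illustration of "relevant unexpectedness", one common formulation counts the recommended items that a simple baseline would not have suggested (unexpected) and that the user actually found useful (relevant), then divides by the number of recommendations. The function name `serendipity`, the baseline set, and the toy data below are assumptions made for illustration.

```python
def serendipity(recommended, expected, relevant):
    """Share of recommended items that are both unexpected and relevant.

    recommended: list of item ids produced by the system under evaluation
    expected:    set of items a simple baseline would recommend (assumption)
    relevant:    set of items the user found useful
    """
    if not recommended:
        return 0.0
    unexpected = [item for item in recommended if item not in expected]
    hits = [item for item in unexpected if item in relevant]
    return len(hits) / len(recommended)


recommended = ["a", "b", "c", "d"]
expected = {"a", "b"}   # e.g. what a popularity baseline would suggest
relevant = {"a", "c"}   # items the user actually liked
score = serendipity(recommended, expected, relevant)
print(score)
```

Here only item "c" is both outside the baseline and liked by the user, so one of the four recommendations counts as serendipitous.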
Our experimental results show that a highly serendipitous recommendation method, such as item-based collaborative filtering, performs better than other methods, such as user-based collaborative filtering, in terms of recommendation system quality.
Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity
Min Seok Lee, Seok Woo Yang, and Hong Joo Lee
Vol. 25, No. 4, Page: 105 ~ 122
Keywords : Sentence Classification, Feature Selection, Information Gain, Word Similarity, Word Embedding
Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require more computation, which can lead to high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, ranging from merely reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information.
In addition, the representation and selection of text features affect the performance of the classifier in sentence classification, which is one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words that capture semantic and syntactic information from data, are also utilized. To improve performance, recent studies have suggested methods that modify the word dictionary according to the positive and negative scores of pre-defined words.
The basic idea of this study is that similar words have similar vector representations. Once the feature selection algorithm identifies words that are not important, we assume that words similar to those words also have no impact on sentence classification. This study proposes two methods for more accurate classification, both of which selectively eliminate words under specific rules and then construct word embeddings based on Word2Vec. To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. In the first method, we eliminate words with comparatively low information gain values from the raw text and then build the word embedding. In the second method, we additionally eliminate words that are similar to the words with low information gain values and then build the word embedding. Finally, the filtered text and word embeddings are applied to two deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM.
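The two elimination steps can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: documents are treated as sets of tokens, the word vectors are hand-written 2-D stand-ins for Word2Vec vectors, and the names `information_gain`, `words_to_remove`, `ig_threshold`, and `sim_threshold` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, word):
    """Reduction in label entropy from knowing whether `word` occurs."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without = [y for d, y in zip(docs, labels) if word not in d]
    remainder = 0.0
    for subset in (with_word, without):
        if subset:
            remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def words_to_remove(docs, labels, vectors, ig_threshold, sim_threshold):
    vocab = set().union(*docs)
    # Step 1: words whose information gain falls below the threshold.
    low = {w for w in vocab
           if information_gain(docs, labels, w) < ig_threshold}
    # Step 2: remaining words whose vectors are close to a low-gain word.
    similar = {w for w in vocab - low
               if any(cosine(vectors[w], vectors[l]) >= sim_threshold
                      for l in low)}
    return low, similar


# Toy corpus: each document is a set of tokens; labels are 1/0.
docs = [{"great", "the", "book", "this"}, {"bad", "the", "book"},
        {"great", "a", "read"}, {"bad", "a", "read"}]
labels = [1, 0, 1, 0]
# Hypothetical 2-D vectors standing in for Word2Vec embeddings.
vectors = {"the": (1.0, 0.0), "a": (0.9, 0.1), "this": (0.95, 0.05),
           "book": (0.5, 0.5), "read": (0.4, 0.6),
           "great": (-0.2, 1.0), "bad": (-0.2, -1.0)}

low, similar = words_to_remove(docs, labels, vectors,
                               ig_threshold=0.1, sim_threshold=0.9)
print(sorted(low), sorted(similar))
```

In this toy corpus the first step drops words that appear equally in both classes, and the second step additionally drops "this" because its vector lies close to "the", even though its own information gain is above the threshold.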
This study uses customer reviews from Kindle, IMDB, and Yelp as datasets and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes exceeded 70% were classified as helpful reviews. Yelp, however, shows only the number of helpful votes, so we randomly sampled 100,000 reviews with more than five helpful votes from among 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters from the text, was applied to each dataset. To evaluate the proposed methods, we compared their performance with that of Word2Vec and GloVe word embeddings that used all the words.
We showed that one of the proposed methods performs better than the embeddings that use all the words. Removing unimportant words improves performance; however, removing too many words lowers it. Future research should consider diverse preprocessing methods and in-depth analysis of word co-occurrence for measuring similarity among words. Also, we applied the proposed methods only with Word2Vec. Other embedding methods such as GloVe, fastText, and ELMo can be combined with the proposed methods, making it possible to identify effective combinations of word embedding methods and elimination methods.
Development of Intelligent Job Classification System based on Job Posting on Job Sites
Jung Seung Lee
Vol. 25, No. 4, Page: 123 ~ 139
Keywords : Association rule, Frequent pattern mining, Job classification system, Software Industry
The job classification systems of major job sites differ from site to site and from the job classification system of the ‘SQF (Sectoral Qualifications Framework)’ proposed for the SW field.
Therefore, a new job classification system is needed that SW companies, SW job seekers, and job sites can all understand. The purpose of this study is to establish a standard job classification system that reflects market demand by analyzing SQF based on the job offer information of major job sites and the NCS (National Competency Standards).
For this purpose, we conduct association analysis between the occupations of major job sites, and between SQF and those occupations, to derive association rules between occupations. Using these association rules, we propose an intelligent, data-based job classification system that maps the job classification systems of major job sites to SQF. First, major job sites are selected to obtain information on the job classification system of the SW market. Then we identify ways to collect job information from each site and collect data through open APIs. Because the analysis focuses on relationships within the data, we keep only the job postings that are posted on multiple job sites at the same time and delete the rest. Next, we map the job classification systems between job sites using the association rules derived from the association analysis. We complete the mapping between these market segments, discuss it with experts, further map it to SQF, and finally propose a new job classification system.
As a result, more than 30,000 job listings were collected in XML format using the open APIs of 'WORKNET,' 'JOBKOREA,' and 'saramin', which are the main job sites in Korea. After filtering these down to about 900 job postings that were posted simultaneously on multiple job sites, 800 association rules were derived by applying the Apriori algorithm, a frequent pattern mining method. Based on the 800 association rules, the job classification systems of WORKNET, JOBKOREA, and saramin and the SQF job classification system were mapped and classified into first through fourth levels.
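To illustrate how association rules between site-specific job categories might be derived, the following is a small level-wise Apriori sketch. It assumes each posting that appears on several sites is turned into a set of category labels; the labels (prefixed `W:` and `J:`), the thresholds, and the function names are hypothetical and do not come from the study's data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets (level-wise)."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while level:
        current = {c: support(c) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current
                        for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Yield (lhs, rhs, support, confidence) for rules above the threshold."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), sup, conf))
    return out


# Hypothetical postings found on two sites, one category label per site.
transactions = [
    frozenset({"W:web-dev", "J:web-programmer"}),
    frozenset({"W:web-dev", "J:web-programmer"}),
    frozenset({"W:web-dev", "J:web-designer"}),
    frozenset({"W:dba", "J:database"}),
]
frequent = apriori(transactions, min_support=0.5)
found = rules(frequent, min_confidence=0.6)
for lhs, rhs, sup, conf in found:
    print(lhs, "->", rhs, f"support={sup:.2f} confidence={conf:.2f}")
```

A rule such as `J:web-programmer -> W:web-dev` with high confidence suggests that the two site-specific categories describe the same occupation, which is the kind of evidence a cross-site mapping can build on.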
In the new job taxonomy, the first primary classification, covering jobs related to IT consulting, computer systems, networks, and security, consists of three secondary classifications, five tertiary classifications, and five fourth classifications. The second primary classification, covering jobs related to databases and system operation, consists of three secondary classifications, three tertiary classifications, and four fourth classifications. The third primary classification, covering Web Planning, Web Programming, Web Design, and Game, consists of four secondary classifications, nine tertiary classifications, and two fourth classifications. The last primary classification, covering jobs related to ICT management and computer and communication engineering technology, consists of three secondary classifications and six tertiary classifications. In particular, the new job classification system has a relatively flexible depth of classification, unlike other existing classification systems. WORKNET divides jobs down to the third level, while JOBKOREA and saramin each divide jobs down to the second level and then subdivide them in keyword form. The newly proposed standard job classification system accepts some keyword-based jobs and treats some product names as jobs. In the classification system, some jobs stop at the second classification, while others are subdivided down to the fourth classification. This reflects the idea that not all jobs can be broken down into the same number of levels. We also combined the rules obtained from association analysis of the collected market data with experts' opinions. Therefore, the newly proposed job classification system can be regarded as a data-based intelligent job classification system that reflects market demand, unlike the existing job classification systems.
This study is meaningful in that it suggests a new job classification system that reflects market demand by mapping occupations based on data, through association analysis between occupations, rather than on the intuition of a few experts. However, this study is limited in that it cannot fully reflect market demand that changes over time, because the data were collected at a single point in time. As market demands change over time, including seasonal factors and the timing of major corporate public recruitment, continuous data monitoring and repeated experiments are needed to achieve more accurate matching. The results of this study can be used to suggest directions for improving SQF in the SW industry, and the approach is expected to be transferable to other industries once it has succeeded in the SW industry.
Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model
Ho-yeon Park, and Kyoung-jae Kim
Vol. 25, No. 4, Page: 141 ~ 154
Keywords : CNN, LSTM, Deep Learning, Integrated Model, Movie Review, Sentiment Analysis
Internet technology and social media are growing rapidly. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor from high-quality content through text data about products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined categories such as positive and negative. It has been studied in various directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because online reviews are openly available to others, they are easy to collect, and they also affect business. In marketing, real-world information from customers is gathered from websites rather than surveys. Whether the posts on a website are positive or negative is reflected in sales as customer response, so companies try to identify this information. However, many reviews on a website are not always good and are difficult to identify. Earlier studies in this research area used review data from shopping malls, whereas recent studies use data such as stock market trends, blogs, news articles, weather forecasts, IMDB, and Facebook. However, a lack of accuracy is recognized because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity of sentiment into positive and negative categories and to increase the prediction accuracy of polarity analysis using the pretrained IMDB review data set.
First, popular machine learning algorithms for text classification in sentiment analysis, such as NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boost, are adopted as comparative models.
Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider sequential data attributes. RNN handles order well because it takes the time information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also tried to determine how well and in what way these models work for sentiment analysis. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The reasons for combining these two algorithms are as follows. CNN can extract features for classification automatically by applying convolution layers and massively parallel processing, whereas LSTM is not capable of highly parallel processing. Like faucets, the input, output, and forget gates of the LSTM can be opened and closed at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem. Furthermore, when LSTM is used at CNN's pooling layer, the model has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. The combined CNN-LSTM model achieved an accuracy of 90.33%. It is slower than CNN but faster than LSTM, and it was more accurate than the other models.
In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can compensate for the weaknesses of each model, and it has the advantage of improving learning layer by layer using the end-to-end structure of LSTM. For these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
