Journal of Intelligence and Information Systems,
Vol. 17, No. 4, December 2011
Product Family Design based on Analytic Network Process
Tai-Oun Kim
Vol. 17, No. 4, Page: 1 ~ 17
Keywords : Analytic Network Process, QFD
To maintain customer satisfaction and remain productive and efficient amid today's global competition, many leading companies have adopted mass customization. Mass customization through product families and product platforms enables companies to develop new products with flexibility, efficiency, and quick responsiveness. A product family strategy based on a product platform is therefore well suited to realizing mass customization. A product family is defined as a group of related products that share common features, components, and subsystems and that satisfy a variety of market niches. The objective of this study is to propose a product family design strategy that derives priority weights among product components so as to satisfy customer requirements. Decision making for new product development requires a multiple-criteria decision-making technique with feedback, so the analytic network process (ANP) is adopted for the decision-making model and procedure. For the implementation, the netbook, a small PC that suits the product family model, is used. Following the proposed architecture, the priority weight of each component is derived for each product family. The relationship between customer requirements and product components is analyzed and evaluated using a QFD model.
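The abstract does not give the comparison matrices or the supermatrix, so the following is only a minimal sketch of the basic building block of ANP: deriving local priority weights from a pairwise comparison matrix by power iteration. The component names and the judgment values are hypothetical, and the full ANP step of combining such weights in a supermatrix with feedback is not shown.

```python
def priority_weights(pairwise, iterations=100, tol=1e-12):
    """Approximate the principal eigenvector of a positive pairwise
    comparison matrix and normalize it so the weights sum to 1."""
    n = len(pairwise)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply the matrix by the current weight vector.
        nxt = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [value / total for value in nxt]
        if max(abs(a - b) for a, b in zip(nxt, weights)) < tol:
            return nxt
        weights = nxt
    return weights


# Hypothetical judgments for three netbook components (not from the paper):
# entry [i][j] says how much more important component i is than component j.
components = ["cpu", "battery", "display"]
judgments = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = dict(zip(components, priority_weights(judgments)))
```

Because the hypothetical matrix is perfectly consistent, the resulting weights are proportional to 4 : 2 : 1.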
Semantic Search : A Survey
Jin-Soo Park, Nam-Won Kim, Min-Jung Choi, Zhe Jin, and Young-Seok Choi
Vol. 17, No. 4, Page: 19 ~ 36
Keywords : Semantic Web, Semantic Search, Query Revision
Since the ambitious declaration of the Semantic Web vision, a growing number of studies on semantic search have been conducted. However, we recognize that our community has accomplished relatively little despite those efforts. We identify two underlying problems: the lack of a shared notion of semantic search to guide current research, and the lack of a comprehensive view from which to envision future work. Based on this diagnosis, we start by defining semantic search as the process of retrieving desired information in response to a user's input using semantic technologies such as ontologies. We then propose a classification framework to give the community a better understanding of semantic search. The framework consists of input processing, target source, search methodology, results ranking, and output data type. Finally, we apply the framework to prior studies and suggest future research directions.
Automatic Generation of DB Images for Testing Enterprise Systems
Oh-Seung Kwon, and Sa-Neung Hong
Vol. 17, No. 4, Page: 37 ~ 58
Keywords : Test Automation, Test Case, Database Application System, Regression Testing, Database Pre/Post Image, Query Dependency Sequence
In general, testing DB applications is much more difficult than testing other types of software. One decisive reason is that DB states, as much as the input data, influence and determine the procedures and results of program testing. Creating and maintaining proper DB states for testing not only takes a lot of time and effort but also requires extensive IT expertise and business knowledge. Despite these difficulties, there is not enough research or tooling to help. This article reports the results of research on the automatic creation and maintenance of DB states for testing DB applications. At its core, this investigation develops an automation tool that collects relevant information from a variety of sources such as logs, schemas, tables, and messages, combines the collected information intelligently, and creates pre- and post-images of database tables suitable for application tests. The proposed procedures and tool are expected to help greatly in overcoming inefficiencies and difficulties not only in unit and integration tests but also in regression tests. Practically, the tool and procedures proposed in this research allow developers to improve their productivity by reducing the time and effort required to create and maintain appropriate DB states, and they enhance the quality of DB applications because they are conducive to a wider variety of test cases and support regression tests. Academically, this research deepens our understanding of, and introduces a new approach to, testing enterprise systems by analyzing patterns of SQL usage and defining a grammar to express and process those patterns.
CTKOS : Categorized Tag-based Knowledge Organization System
Dong-Hee Yoo, Gun-Woo Kim, Keun-Ho Choi, and Yong-Moo Suh
Vol. 17, No. 4, Page: 59 ~ 74
Keywords : Folksonomy, Tag, Web 2.0, Collective Intelligence, Knowledge Organization System
As more users willingly participate in the creation of web content, flat folksonomy using simple tags has emerged as a powerful instrument for classifying and sharing a huge amount of knowledge on the web. However, flat folksonomy has semantic problems, such as ambiguity and misunderstanding of tags. To alleviate such problems, many studies have built structured folksonomies with a hierarchical structure or relationships among tags. However, structured folksonomy also has some fundamental problems, such as new tags being limited to a pre-defined vocabulary and the time-consuming manual effort required for selecting tags. To resolve these problems, we suggest a new method of attaching a categorized tag (CT), a tag followed by its category, to web content. CTs are automatically integrated into a collaboratively-built structured folksonomy (CSF) in real time, reflecting the tag-and-category relationships chosen by the majority of users. We then developed a CT-based knowledge organization system (CTKOS), which builds the CSF to classify organizational knowledge and allows us to locate the appropriate knowledge.
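The idea of resolving tag-and-category relationships by majority can be sketched as follows. This is an illustration only: the abstract does not describe the CSF data model, so the flat tag-to-category mapping, the function name `build_csf`, and the sample tags are all assumptions.

```python
from collections import Counter, defaultdict


def build_csf(assignments):
    """Map each tag to the category most users attached to it.

    `assignments` is an iterable of (tag, category) pairs, one pair
    per categorized tag that a user attached to some content.
    """
    counts = defaultdict(Counter)
    for tag, category in assignments:
        counts[tag][category] += 1
    # most_common(1) returns the category with the highest count.
    return {tag: counter.most_common(1)[0][0] for tag, counter in counts.items()}


# Hypothetical categorized tags from several users.
csf = build_csf([
    ("jaguar", "animal"),
    ("jaguar", "car"),
    ("jaguar", "car"),
    ("python", "language"),
])
```

A real system would presumably keep the minority categories as well, since an ambiguous tag can legitimately belong to more than one category.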
The Impact of Message Characteristics on Online Viral Diffusion in Online Social Media Services : The Case of Twitter
Young-Woo Nam, In-Soo Son, and Dong-Won Lee
Vol. 17, No. 4, Page: 75 ~ 94
Keywords : Information Diffusion, Message Characteristics, Social Media, Twitter
In this paper, we explore the information diffusion mechanism in social network environments by investigating the effect of message characteristics on the volume and speed of retweeting on Twitter, a popular online social media service. To this end, we select eight main keywords that have recently been popular on online social media: 'free school meals' (무상급식), 'half-price tuition' (반값등록금), 'I Am a Singer' (나가수), 'Pyeongchang' (평창), 'Kim Yuna' (김연아), 'Park Tae-hwan' (박태환), 'iPhone' (아이폰), and 'Galaxy' (갤럭시). Each keyword represents a social aspect of Korea that has recently grabbed people's attention, such as political issues, entertainment, sports celebrities, and the latest digital products, and therefore carries distinctive message characteristics. Analyzing the frequency and velocity of retweeting for each keyword, we find that more than half of the sample messages posted on Twitter contain personal opinions about the keyword, but also that tweets containing objective messages with a hyperlink are retweeted fastest by followers. Overall, the groups of messages related to each keyword show distinctive diffusion patterns and speeds according to their message characteristics. From an academic perspective, the findings of the study broaden our theoretical knowledge of the information diffusion mechanism in online social media. For practitioners, the results provide managerial implications regarding how to strategically use online social media for marketing communications with customers.
The Adaptive Personalization Method According to Users Purchasing Index : Application to Beverage Purchasing Predictions
Yoon-Joo Park
Vol. 17, No. 4, Page: 95 ~ 108
Keywords : Personalization, Customer Segmentation, Data Sparsity, Intelligent Recommendation
This is a study of a personalization method that intelligently adapts the level of clustering according to a customer's purchasing index. In the e-biz era, many companies gather customers' demographic and transactional information such as age, gender, purchasing date, and product category. They use this information to predict customers' preferences or purchasing patterns so that they can provide more customized services. The existing Customer-Segmentation method provides customized services for each customer group. It clusters the whole customer set into groups based on similarity and builds predictive models for the resulting groups. Thus, it keeps the number of predictive models manageable and, by using the data of similar customers, provides more data for customers who do not have enough data of their own to build a good predictive model. However, this method often fails to provide highly personalized services to each customer, which is especially important for VIP customers. Furthermore, it clusters customers who already have a considerable amount of data together with customers who have only a small amount, which unnecessarily increases computational cost without significant performance improvement. The other conventional method, the 1-to-1 method, provides more customized services to each individual customer than the Customer-Segmentation method, since each predictive model is built using only the data of the individual customer. This method not only provides highly personalized services but also builds a relatively simple and less costly model for each customer. However, the 1-to-1 method does not produce a good predictive model when a customer has only a small amount of data. In other words, if a customer has too few transactional records, the performance of this method deteriorates.
To overcome the limitations of these two conventional methods, we suggest a new method, called the Intelligent Customer Segmentation method, that provides adaptive personalized services according to the customer's purchasing index. The suggested method clusters customers according to their purchasing index, so that predictions for customers who purchase less are based on data from more intensively clustered groups, while VIP customers, who already have a considerable amount of data, are clustered to a much lesser extent or not at all. The main idea is to apply a clustering technique when the number of transactional records of the target customer is less than a predefined criterion data size. To find this criterion, we suggest an algorithm called sliding window correlation analysis. The algorithm aims to find the transactional data size below which the performance of the 1-to-1 method decreases sharply due to data sparsity. After finding this criterion data size, we apply the conventional 1-to-1 method to customers who have more data than the criterion, and apply the clustering technique to customers who have less, until at least the criterion amount of data is available for model building. We apply the two conventional methods and the newly suggested method to Nielsen's beverage purchasing data to predict customers' purchasing amounts and purchasing categories. We use two data mining techniques (Support Vector Machine and Linear Regression) and two performance measures (MAE and RMSE) to predict these two dependent variables. The results show that the suggested Intelligent Customer Segmentation method can outperform the conventional 1-to-1 method in many cases and achieves the same level of performance as the Customer-Segmentation method at much lower computational cost.
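The adaptive rule described above (use only the customer's own data when it reaches the criterion size, otherwise pool data from similar customers until the criterion is reached) could be sketched as below. The function and variable names are hypothetical, the similarity ordering of neighbors is assumed to be given, and the sliding window correlation analysis that finds the criterion is not shown.

```python
def training_data(customer, histories, neighbors, criterion):
    """Return the transactions used to build the model for `customer`.

    histories: dict mapping customer id -> list of transactions
    neighbors: dict mapping customer id -> other ids, most similar first
    criterion: minimum number of transactions needed for a 1-to-1 model
    """
    pooled = list(histories[customer])
    if len(pooled) >= criterion:
        return pooled  # enough data: plain 1-to-1 model
    for other in neighbors.get(customer, []):
        pooled.extend(histories[other])  # borrow data from a similar customer
        if len(pooled) >= criterion:
            break
    return pooled


# Hypothetical purchase amounts per customer.
histories = {"vip": [5, 7, 6, 8, 9], "new": [4], "a": [3, 5], "b": [6, 2, 4]}
neighbors = {"new": ["a", "b"], "vip": ["a"]}
vip_data = training_data("vip", histories, neighbors, criterion=4)
new_data = training_data("new", histories, neighbors, criterion=4)
```

In this sketch the VIP customer is modeled on their own five records, while the new customer borrows from neighbors until at least four records are available.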
A Real-Time Stock Market Prediction Using Knowledge Accumulation
Jin-Hwa Kim, Kwang-Hun Hong, and Jin-Young Min
Vol. 17, No. 4, Page: 109 ~ 130
Keywords : Stock Market Prediction, Stream Data, Data Mining
One of the major problems in data mining is the size of the data, as most data sets are huge these days. Streams of data are normally accumulated in data storage or databases. Transactions on the Internet, on mobile devices, and in ubiquitous environments produce streams of data continuously. Some data sets are simply buried unused inside huge data storage because of their size. Other data sets are lost as soon as they are created because, for various reasons, they are not saved. How to use such large data and how to use stream data efficiently are challenging questions in the study of data mining. Stream data is a data set that is accumulated into data storage from a data source continuously. In many cases, the size of this data set grows increasingly large over time. Mining information from such massive data takes too many resources, such as storage, money, and time. These characteristics make it difficult and expensive to store all the stream data accumulated over time. On the other hand, if one uses only recent or partial data to mine information or patterns, valuable information can be lost. To avoid these problems, this study suggests a method that efficiently accumulates information or patterns in the form of a rule set over time. A rule set is mined from a data set in the stream and accumulated into a master rule set storage, which also serves as a model for real-time decision making. One of the main advantages of this method is that it takes much less storage space than the traditional method, which saves the whole data set. Another advantage is that the accumulated rule set is used as a prediction model. A prompt response to user requests is possible at any time because the rule set is always ready to be used for decisions. This makes real-time decision making possible, which is the greatest advantage of this method.
According to theories of ensemble approaches, a combination of many different models can produce a prediction model with better performance. The consolidated rule set covers the entire data set, while the traditional sampling approach covers only part of it. This study uses stock market data, which is heterogeneous because the characteristics of the data vary over time. The indexes in stock market data can fluctuate whenever there is an event influencing the stock market index. Therefore, the variance of the values of each variable is large compared with that of a homogeneous data set. Prediction with a heterogeneous data set is naturally much more difficult than with a homogeneous data set, because prediction is harder in unpredictable situations. This study tests two general mining approaches and compares their prediction performance with that of the method we suggest. The first approach induces a rule set from the recent data set to predict a new data set. The second induces a rule set, every time a new data set has to be predicted, from all the data accumulated since the beginning. We found that neither of these two performs as well as the accumulated rule set method. Furthermore, the study reports experiments with different prediction models. The first builds a prediction model only with the more important rule sets, and the second uses all the rule sets, assigning weights to the rules based on their performance. The second shows better performance than the first. The experiments also show that the suggested method can be an efficient approach for mining information and patterns from stream data. One limitation of this method is that its application here is confined to stock market data. A more dynamic real-time stream data set is desirable for the application of this method.
There is another problem in this study: as the number of rules increases over time, special rules such as redundant or conflicting rules have to be managed efficiently.
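A minimal sketch of the accumulation idea follows, assuming each mined rule is a single condition paired with a predicted direction and a performance score. This rule format, the class name `MasterRuleSet`, and the additive weighting are illustrative assumptions; the abstract does not specify how rules are represented, weighted, or how redundant and conflicting rules are resolved.

```python
from collections import defaultdict


class MasterRuleSet:
    """Accumulates rules mined from successive stream batches."""

    def __init__(self):
        # (condition, prediction) -> accumulated performance weight
        self.weights = defaultdict(float)

    def accumulate(self, rules):
        """Merge one batch of (condition, prediction, accuracy) rules."""
        for condition, prediction, accuracy in rules:
            self.weights[(condition, prediction)] += accuracy

    def predict(self, facts):
        """Weighted vote over all stored rules whose condition holds."""
        votes = defaultdict(float)
        for (condition, prediction), weight in self.weights.items():
            if condition in facts:
                votes[prediction] += weight
        return max(votes, key=votes.get) if votes else None


# Hypothetical rules mined from two consecutive batches of market data.
master = MasterRuleSet()
master.accumulate([("volume_up", "rise", 0.6), ("rate_hike", "fall", 0.7)])
master.accumulate([("volume_up", "rise", 0.5)])
forecast = master.predict({"volume_up", "rate_hike"})
```

Only the rules are stored, not the batches they were mined from, which reflects the storage advantage the abstract describes; a repeated rule simply gains weight here.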
A Comparative Study of Information Delivery Method in Networks According to Off-line Communication
Won-Kuk Park, Chan Choi, Hyun-Sil Moon, Il-Young Choi, and Jae-Kyeong Kim
Vol. 17, No. 4, Page: 131 ~ 142
Keywords : Social Network Service, Social Network Analysis, Communication
A Social Network Service is defined as a web-based service that allows individuals to construct a public or semi-public profile within a bounded system, articulate a list of other users with whom they share a connection, and traverse their lists of connections. Facebook and Twitter are representative Social Network Service sites, and they are attracting great attention worldwide. Many people use Social Network Services to build and maintain social relationships, and the number of users has recently increased dramatically. Accordingly, many organizations have become interested in Social Network Services as a means of marketing, media, and communication with their customers, because these services can offer a variety of benefits to organizations such as companies and associations. In other words, organizations can use Social Network Services to respond rapidly to various user behaviors, because the services make communication between users easier and faster. In addition, the marketing cost of a Social Network Service is lower than that of existing tools such as broadcasts, newspapers, and direct mail. Social Network Services are also growing in the marketplace, so organizations can acquire potential customers for the future. However, organizations communicate with users through Social Network Services uniformly, without considering the characteristics of the networks, even though different networks have different effects on information delivery. For example, members' cohesion in offline communication is higher than in online communication because the members of an offline community are very close; that is, the offline communication network has strong ties. Accordingly, information delivery is fast in the offline communication network.
In this study, we construct two Twitter networks with different communication characteristics. The first network is built from data based on offline relationships, such as friends, family, and seniors and juniors at school. The second network is built from randomly selected data from users who want to make friends online. Each network consists of 250 people divided into three groups. The first group is the ego, the person at the center of the network. The second group consists of the ego's followers. The last group is composed of the followers of the ego's followers. We compare the networks through social network analysis and follower reaction analysis. We examine density and centrality to analyze the characteristics of each network, and we analyze follower reactions such as replies and retweets to find differences in information delivery between the networks. Our experimental results indicate that the density and centrality of the offline communication-based network are higher than those of the online-based network. In addition, the number of replies is larger than the number of retweets in the offline communication-based network, whereas the number of retweets is larger than the number of replies in the online-based network. Through these experiments, we found that the effect of information delivery in the offline communication-based network differs from that in the online communication-based network. Therefore, organizations that want to use social networks as an effective marketing tool should configure appropriate network types in light of the characteristics of the network.
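For readers unfamiliar with the two measures, the sketch below computes density and degree centrality for a small undirected graph. The abstract does not say which centrality measure was used or whether follower links were treated as directed, so degree centrality on an undirected graph is an assumption, and the four-node network is hypothetical.

```python
def density(nodes, edges):
    """Share of possible undirected links that actually exist."""
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1))


def degree_centrality(nodes, edges):
    """Number of links per node, normalized by the maximum possible (n - 1)."""
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {node: d / (len(nodes) - 1) for node, d in degree.items()}


# Hypothetical ego network: an ego, three followers, and one tie between followers.
nodes = ["ego", "a", "b", "c"]
edges = [("ego", "a"), ("ego", "b"), ("ego", "c"), ("a", "b")]
network_density = density(nodes, edges)
centrality = degree_centrality(nodes, edges)
```

A tightly knit offline group would add more ties among the followers themselves, raising both the density and the followers' centrality scores.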
Electronic Roll Book using Electronic Bracelet.Child Safe-Guarding Device System
Seung-Jin Moon, Tae-Nam Kim, and Pan-Su Kim
Vol. 17, No. 4, Page: 143 ~ 155
Keywords : RFID, Electronic Bracelet, Electronic Roll, GPS, Sensor Networks
Electronic tagging of sexual offenders was recently introduced in order to reduce and prevent sexual offences. However, most sexual offences against children these days are committed by tagged offenders whose identities have been released. For crime prevention, we therefore need measures that minimize the harm more promptly and actively. This paper suggests a new system to relieve children's anxiety about sexual abuse and to solve the problems of the existing electronic bracelet. Existing bracelets are worn only by serious criminals and serve only risk management and positioning; they offer no way to protect children, who are the potential victims of sexual abuse, and such cases have actually occurred. We therefore suggest that students (children) also wear electronic bracelets based on LBS (Location Based Service) and USN (Ubiquitous Sensor Network) technology, which monitor and identify dangerous situations intelligently, so that sexual offences against children can be prevented beforehand and, while a crime is happening, the situation can be judged intelligently and swift action taken to minimize the harm. In addition, by checking students' attendance and position, guardians can know where their children are in real time and can protect them not only from sexual offences but also from violent crimes against children such as kidnapping. The overall system works as follows. The RFID tag for children monitors the approach of offenders. While an offender's RFID tag is approaching, it transmits the situation and position as the first warning message to the control center and the guardians.
When the offender moves far away, the system returns to monitoring mode. If the tag of the child or the offender is taken off, or if the child and the offender stay at one position for 3~5 minutes or longer, the system treats this as a dangerous situation, transmits the emergency situation and position as the second warning message to the control center and the guardians, and requests the dispatch of police to stop the crime at its initial stage. The RFID module in the offenders' electronic bracelets is an RFID tag, and the RFID module for the children is an RFID receiver (reader), so wherever the offenders are, if an offender comes within 20 m of a child, the child's RFID module transmits the situation to the control center at regular intervals through the automatic response of the receiver. For positioning, a GPS or mobile communications module (CELL module) is used outdoors, and a UWB or Wi-Fi based module is used indoors. The sensor is designed to measure position coordinates even indoors, so that the wearer's real-time situation and position can be sent to the server of the central control center. Children's position and situation can also be checked through the RFID electronic roll book system of educational institutions and the safety system installed at home. When the child goes to school, attendance is checked through the electronic roll book, and when school is over, the information is sent to the guardians. Likewise, RFID access control turnstiles installed at the apartment or the entrance of the house check the child's arrival, and the information is transmitted to the guardians.
If the student is absent or does not arrive home, the child's information is sent from the electronic roll book or the access control turnstiles to the central control center, which locates the child's electronic bracelet using the GPS or mobile communications module and then sends the information to the guardians and the teacher so that they can report to the police immediately if necessary. The central management and control system is built for monitoring dangerous situations and for checking by guardians. It stores warning and pattern data to identify areas where dangerous situations occur, which can help prioritize the introduction of crime prevention systems such as CCTV. A database stores personal data and records the frequency of first and second warnings, the terminal IDs of the specific child and offender, the position where the warning was issued, the situation (such as approach, removal of the electronic bracelet, or staying at the same position for a certain time), and so on, and these data will be used for preventing crimes. Although electronic tagging has already been introduced to prevent the recurrence of child sexual offences, the crimes continue to occur. We therefore suggest this system to prevent crimes beforehand for the sake of children's safety. If electronic bracelets are made easy to use and carry and are priced reasonably so that many children can use them, many crimes could be prevented and children could be protected easily. By preventing crimes before they happen, the system will contribute to a safer life.
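The two-stage warning logic described in this abstract can be summarized in a small decision function. This is a sketch under stated assumptions: the function name, the return labels, and the 180-second dwell limit (the lower end of the 3~5 minute range) are illustrative choices, not details given by the authors.

```python
def warning_level(distance_m, tag_removed, seconds_together,
                  range_m=20.0, dwell_limit_s=180):
    """Classify one reading from the child's RFID reader.

    distance_m: estimated distance between the child and an offender's tag
    tag_removed: True if either bracelet reports being taken off
    seconds_together: time the two tags have stayed at the same position
    """
    if tag_removed:
        return "second"  # emergency: a bracelet was taken off
    if distance_m <= range_m:
        if seconds_together >= dwell_limit_s:
            return "second"  # emergency: prolonged stay at one position
        return "first"  # offender's tag is within range
    return "monitoring"  # offender is far away


# Hypothetical readings.
levels = [
    warning_level(15.0, False, 0),
    warning_level(15.0, False, 200),
    warning_level(50.0, False, 0),
]
```

In the system described above, a "first" result would notify the control center and guardians, and a "second" result would additionally request a police dispatch.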
An Intelligent Intrusion Detection Model Based on Support Vector Machines and the Classification Threshold Optimization for Considering the Asymmetric Error Cost
Hyeon-Uk Lee, and Hyun-Chul Ahn
Vol. 17, No. 4, Page: 157 ~ 173
Keywords : Intrusion Detection System, Support Vector Machines, Asymmetric Error Cost
With the recent explosion of Internet use, malicious attacks and hacking against networked systems occur frequently. Such intrusions can cause fatal damage to government agencies, public offices, and companies that operate various systems. For these reasons, there is growing interest in and demand for intrusion detection systems (IDS), the security systems that detect, identify, and respond appropriately to unauthorized or abnormal activities. The intrusion detection models applied in conventional IDS are generally designed by modeling experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. These models perform well under normal situations. However, they perform poorly when they meet a new or unknown pattern of network attack. For this reason, several recent studies have tried to adopt various artificial intelligence techniques that can respond proactively to unknown threats. In particular, artificial neural networks (ANNs) have been widely applied in prior studies because of their superior prediction accuracy. However, ANNs have some intrinsic limitations, such as the risk of overfitting, the need for a large sample size, and the difficulty of understanding the prediction process (i.e., the black box problem). As a result, the most recent studies on IDS have started to adopt the support vector machine (SVM), a classification technique that is more stable and powerful than ANNs. SVM is known for its relatively high predictive power and generalization capability. Against this background, this study proposes a novel intelligent intrusion detection model that uses SVM as the classification model in order to improve the predictive ability of IDS. In addition, our model is designed to take the asymmetric error cost into account by optimizing the classification threshold. Generally, there are two common forms of error in intrusion detection.
The first error type is the False-Positive Error (FPE), in which normal activity is misjudged as an intrusion; this wrong judgment may result in unnecessary corrective action. The second error type is the False-Negative Error (FNE), which mainly misjudges malware as a normal program. Compared to FPE, FNE is more fatal. Thus, when considering the total cost of misclassification in IDS, it is more reasonable to assign heavier weights to FNE than to FPE. Therefore, we designed our proposed intrusion detection model to optimize the classification threshold in order to minimize the total misclassification cost. In this case, conventional SVM cannot be applied because it is designed to generate a discrete output (i.e., a class). To resolve this problem, we used the revised SVM technique proposed by Platt (2000), which is able to generate probability estimates. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 1,000 samples from them by random sampling. In addition, the SVM model was compared with logistic regression (LOGIT), decision trees (DT), and ANN to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN was run using Neuroshell 4.0. For SVM, LIBSVM v2.90, a freeware tool for training SVM classifiers, was used. Empirical results showed that our proposed SVM-based model outperformed all the other comparative models in detecting network intrusions in terms of accuracy. They also showed that our model reduced the total misclassification cost compared to the ANN-based intrusion detection model.
As a result, it is expected that the intrusion detection model proposed in this paper will not only enhance the performance of IDS but also lead to better management of FNE.
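The threshold optimization step can be illustrated with a simple grid search over candidate thresholds. This is a sketch, not the authors' procedure: the probabilities are assumed to come from a classifier that outputs probability estimates (such as an SVM with Platt scaling), and the 5:1 cost ratio, the grid of 0.01 steps, and the sample data are hypothetical.

```python
def best_threshold(probs, labels, cost_fn=5.0, cost_fp=1.0):
    """Return (threshold, total_cost) minimizing the misclassification cost.

    probs: estimated probability that each sample is an intrusion
    labels: 1 for intrusion, 0 for normal
    cost_fn / cost_fp: cost of one false negative / one false positive
    """
    def total_cost(threshold):
        cost = 0.0
        for p, y in zip(probs, labels):
            predicted = 1 if p >= threshold else 0
            if y == 1 and predicted == 0:
                cost += cost_fn  # missed intrusion
            elif y == 0 and predicted == 1:
                cost += cost_fp  # false alarm
        return cost

    candidates = [i / 100 for i in range(1, 100)]
    best = min(candidates, key=total_cost)
    return best, total_cost(best)


# Hypothetical probability estimates and true labels.
probs = [0.9, 0.4, 0.35, 0.2, 0.6]
labels = [1, 1, 0, 0, 0]
threshold, cost = best_threshold(probs, labels)
```

Because a missed intrusion is weighted more heavily than a false alarm, the search settles on a threshold below the default 0.5, accepting one false alarm rather than missing the intrusion scored at 0.4.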
