DIGITAL LIBRARY ARCHIVE
Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID
Sang-Hyun Lee (Department of Software and Computer Engineering, Ajou University)
Seong-Hun Yang (Department of Convergence Software, Myongji University)
Seung-Jin Oh (Department of Medical Information Technology Engineering, Soonchunhyang University)
Jinbeom Kang (Chief Technology Officer, Xinapse)
Vol. 28, No. 1, Page: 89 ~ 106
Keywords
Object Detection, Re-identification, Action Detection, Emotion Detection, Video Analysis
Abstract
The volume of video data collected from smartphones, CCTV cameras, dashboard cameras (black boxes), and high-definition cameras has grown rapidly in recent years, and the demand for analyzing and using that data has grown with it. Because many industries lack skilled personnel to analyze video, machine learning and artificial intelligence are actively used to assist them. As a result, demand for computer vision technologies such as object detection and tracking, action detection, emotion detection, and re-identification (Re-ID) has also risen rapidly. However, object detection and tracking face many difficulties that degrade performance, such as occlusion and an object reappearing after leaving the recorded scene. Consequently, action and emotion detection models built on object detection and tracking models also have difficulty extracting data for each object. In addition, deep learning architectures composed of multiple models suffer from performance degradation caused by bottlenecks and a lack of optimization.
In this study, we propose a video analysis system consisting of a YOLOv5-based DeepSORT object tracking model, a SlowFast-based action recognition model, a Torchreid-based Re-ID model, and AWS Rekognition, an emotion recognition service. The proposed system uses Re-ID based on single-linkage hierarchical clustering together with processing methods that maximize hardware throughput. It achieves higher accuracy than a re-identification model that uses simple metrics, delivers near real-time processing performance, and prevents tracking failures caused by occlusion or by objects leaving and re-entering the scene. By continuously linking each object's action and facial emotion detection results to the same object, the system makes efficient video analysis possible.
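The idea of grouping track fragments by single-linkage hierarchical clustering can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine distance, the threshold value, and the function names are assumptions made for illustration, since the abstract does not state the distance metric or the cut criterion. The sketch relies on the fact that cutting a single-linkage dendrogram at a distance threshold yields the connected components of the graph linking all pairs closer than that threshold.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def single_linkage_labels(
    features: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Cluster feature vectors by single linkage, cut at `threshold`.

    Two vectors share a cluster if a chain of pairwise distances, each at
    most `threshold`, connects them. A union-find structure tracks this.
    """
    parent = list(range(len(features)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if cosine_distance(features[i], features[j]) <= threshold:
                parent[find(i)] = find(j)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    root_to_label: Dict[int, int] = {}
    return [
        root_to_label.setdefault(find(i), len(root_to_label))
        for i in range(len(features))
    ]


# Hypothetical feature vectors for four track fragments; in the real system
# these would come from the Re-ID model rather than being written by hand.
fragments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.95, 0.05]]
labels = single_linkage_labels(fragments, threshold=0.05)
```

Fragments that receive the same label would be treated as the same object, so a track that restarts after occlusion can inherit the identity of its earlier fragment.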
The re-identification model extracts a feature vector from the bounding-box image of each object detected by the object tracking model in every frame, and applies single-linkage hierarchical clustering to the feature vectors extracted from past frames to identify objects whose tracking has failed. This process makes it possible to re-track the same object when tracking fails because of occlusion or because the object reappears after leaving the scene. As a result, the action and facial emotion detection results of an object that was newly registered because of a tracking failure can be linked to those of the object that appeared earlier. To improve processing performance, we also introduce the Bounding Box Queue by Object and Feature Queue methods, which reduce RAM requirements while maximizing GPU memory throughput. We further introduce the IoF (Intersection over Face) algorithm, which links facial emotions recognized through AWS Rekognition with object tracking information.
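The abstract does not define IoF, so the following is only a sketch under one plausible reading of the name: the area of overlap between a face box and a tracked body box, divided by the area of the face box. The box layout, the function names, and the 0.5 acceptance threshold are all hypothetical. Rekognition reports face boxes as ratios of the image size, so they would need converting to pixel corners before a comparison like this.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def intersection_over_face(face: Box, body: Box) -> float:
    """Overlap area of `face` and `body` divided by the face area (assumed IoF)."""
    fx1, fy1, fx2, fy2 = face
    bx1, by1, bx2, by2 = body
    inter_w = max(0.0, min(fx2, bx2) - max(fx1, bx1))
    inter_h = max(0.0, min(fy2, by2) - max(fy1, by1))
    face_area = (fx2 - fx1) * (fy2 - fy1)
    if face_area <= 0:
        return 0.0  # degenerate face box
    return inter_w * inter_h / face_area


def match_face_to_track(
    face: Box, tracks: Dict[int, Box], min_iof: float = 0.5
) -> Optional[int]:
    """Return the track ID whose body box best contains the face, or None.

    `min_iof` is a hypothetical acceptance threshold.
    """
    best_id: Optional[int] = None
    best_score = 0.0
    for track_id, body in tracks.items():
        score = intersection_over_face(face, body)
        if score > best_score:
            best_id, best_score = track_id, score
    return best_id if best_score >= min_iof else None


# Hypothetical tracked body boxes keyed by track ID, and one detected face.
tracks = {1: (0.0, 0.0, 100.0, 200.0), 2: (95.0, 0.0, 200.0, 200.0)}
face = (90.0, 10.0, 110.0, 30.0)
matched = match_face_to_track(face, tracks)
```

Normalizing by the face area rather than by the union, as IoU does, keeps the score high when a small face lies inside a much larger body box, which is the usual geometry for a face and the person it belongs to.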
The academic significance of this study is that, thanks to the proposed processing techniques, the two-stage re-identification model achieves real-time performance even in a computationally expensive environment that also runs action and facial emotion detection, without sacrificing accuracy by resorting to simple metrics. The practical implication is that industries that need action and facial emotion detection but struggle with object tracking failures can analyze video effectively with the proposed system. With its high re-tracking accuracy and processing performance, the proposed system can be used in fields such as intelligent monitoring, observation services, and behavioral or psychological analysis services, where integrating tracking information with extracted metadata creates great industrial and business value.
In future work, to measure object tracking performance more precisely, experiments should be conducted on the MOT Challenge dataset, which is used by many international conferences. We will also investigate the cases that the IoF algorithm cannot handle in order to develop a complementary algorithm. In addition, we plan further research on applying this system to datasets from various fields related to intelligent video analysis.
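Korean-language record (translated)
Video Analysis System for Detecting Actions and Facial Expressions by Object Using Hierarchical Clustering-Based Re-ID
Sang-Hyun Lee (Department of Software, College of Information Technology, Ajou University)
Seong-Hun Yang (Division of Convergence Software, College of ICT Convergence, Myongji University)
Seung-Jin Oh (Department of Medical IT Engineering, College of Medical Sciences, Soonchunhyang University)
Jinbeom Kang (Xinapse)
Keywords
Object Tracking, Re-identification, Action Recognition, Facial Expression Recognition, Video Analysis
Abstract
With the recent surge in video data, demand has also surged for computer vision technologies that can process it effectively, such as object detection and tracking, action recognition, facial expression recognition, and re-identification (Re-ID). However, object detection and tracking face many difficulties that degrade performance, such as occlusion and objects leaving and re-entering the recorded scene. As a result, action and facial expression recognition models built on object detection and tracking models also struggle to extract data for each object. In addition, deep learning architectures that combine multiple models suffer performance degradation from bottlenecks and a lack of optimization. In this study, we propose a video analysis system for action and facial expression detection that applies a re-identification (Re-ID) technique based on single-linkage hierarchical clustering and a processing technique that maximizes GPU memory throughput to a system built from a YOLOv5-based DeepSORT object tracking model, a SlowFast-based action recognition model, a Torchreid-based re-identification model, and the facial expression recognition model of AWS Rekognition. The proposed system achieves higher accuracy than a re-identification model that uses simple metrics, along with near real-time processing performance. It prevents tracking failures caused by occlusion or by objects leaving and re-entering the scene, and it continuously links the action and facial expression recognition results for each object in the video to the same object, enabling efficient video analysis.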
Cite this article
JIIS(APA) Style
Lee, S.-H., Yang, S.-H., Oh, S.-J., & Kang, J. (2022). Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID. Journal of Intelligence and Information Systems, 28(1), 89-106.

IEEE Style
Sang-Hyun Lee, Seong-Hun Yang, Seung-Jin Oh, and Jinbeom Kang, "Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID," Journal of Intelligence and Information Systems, vol. 28, no. 1, pp. 89-106, 2022.

ACM Style
Lee, S.-H., Yang, S.-H., Oh, S.-J., & Kang, J., 2022. Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID. Journal of Intelligence and Information Systems. 28, 1, 89--106.
Export Formats: BibTeX, EndNote
@article{Lee:JIIS:2022:864,
author = {Lee, Sang-Hyun and Yang, Seong-Hun and Oh, Seung-Jin and Kang, Jinbeom},
title = {Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID},
journal = {Journal of Intelligence and Information Systems},
issue_date = {March 2022},
volume = {28},
number = {1},
month = Mar,
year = {2022},
issn = {2288-4866},
pages = {89--106},
url = {},
doi = {},
publisher = {Korea Intelligent Information System Society},
address = {Seoul, Republic of Korea},
keywords = {Object Detection, Re-identification, Action Detection, Emotion Detection, Video Analysis},
}
%0 Journal Article
%1 864
%A Sang-Hyun Lee
%A Seong-Hun Yang
%A Seung-Jin Oh
%A Jinbeom Kang
%T Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID
%J Journal of Intelligence and Information Systems
%@ 2288-4866
%V 28
%N 1
%P 89-106
%D 2022
%R
%I Korea Intelligent Information System Society