Our Paper Published in Journal of Signal Processing Systems, June 2019, Volume 91, Issue 6

Bibliography Information

Yudistira, N., & Kurita, T. (2019). Deep Packet Flow: Action Recognition via Multiresolution Deep Wavelet Packet of Local Dense Optical Flows. Journal of Signal Processing Systems, 91(6), 609–625.


Action recognition with dynamic actors and scenes has been a major research topic. Recently, spatiotemporal features such as optical flow have been used to define motion representations over a sequence of time. However, to increase accuracy, a deep decomposition is needed to enrich the information available for actions that vary in location or time because of spatiotemporal dynamics. To this end, we propose an algorithm whose feature vectors are obtained by applying multi-resolution analysis of motion over time using the Haar Wavelet Packet (HWP). Its computational efficiency and robustness have made HWP popular in texture analysis, but its applicability to motion analysis has yet to be explored. To extract the representation, the sequence of each Histogram of Flow (HOF) bin is treated as a signal channel. A deep decomposition, called Packet Flow, is then applied by running the wavelet packet decomposition over many levels. This allows us to represent the motions of an action at various speeds and ranges, focusing not only on the HOF within one frame or one cuboid but also on the temporal sequence. HWP, however, is translation covariant, which hurts performance because actions occur at arbitrary times and at various sampling locations. To gain translation invariance, we pool the respective decomposition coefficients at each level. We find that, with proper packet selection, the method gives comparable results on the KTH action and Hollywood datasets with a train-test division and without localization. Even though our spatiotemporal cuboids are not sampled as densely as in the baseline method, we achieve lower complexity and comparable performance on datasets burdened by camera motion, such as UCF Sports, where motion features such as HOF do not perform well.
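The decomposition and pooling steps described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names (`haar_step`, `haar_packet`, `pooled_packet_features`) are hypothetical, the input is assumed to be the temporal sequence of a single HOF bin whose length is divisible by 2 to the power of the number of levels, and max pooling of absolute coefficients is an assumed choice, since the abstract does not say which pooling operator is used.

```python
import numpy as np


def haar_step(x):
    """One orthonormal Haar split into approximation and detail halves."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def haar_packet(signal, levels):
    """Full Haar wavelet packet tree of a 1-D signal.

    Returns a list with one entry per level; entry k holds the
    2**(k+1) subbands of that level. Unlike the plain wavelet
    transform, detail bands are split further as well.
    """
    x = np.asarray(signal, dtype=float)
    if x.ndim != 1 or len(x) % (2 ** levels) != 0:
        raise ValueError("signal must be 1-D with length divisible by 2**levels")
    tree, current = [], [x]
    for _ in range(levels):
        nxt = []
        for band in current:
            approx, detail = haar_step(band)
            nxt.extend([approx, detail])
        tree.append(nxt)
        current = nxt
    return tree


def pooled_packet_features(signal, levels):
    """Pool every subband of every level into a single value.

    Assumption: max of absolute coefficients is used as the pooling
    operator; the source only states that coefficients are pooled.
    """
    tree = haar_packet(signal, levels)
    return np.array([np.max(np.abs(band)) for level in tree for band in level])


# Hypothetical example: one HOF bin observed over 16 time steps.
hof_bin = np.sin(np.linspace(0.0, 4.0 * np.pi, 16))
features = pooled_packet_features(hof_bin, levels=2)  # 2 + 4 = 6 values
```

Pooling discards the position of each coefficient within its subband, so in this sketch the features are unchanged when the signal is circularly shifted by a multiple of `2**levels`. Other shifts are only approximately absorbed, which is one reason the choice of packets and pooling operator matters.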