Deep convolutional neural networks (CNNs) are successful at automatically extracting features for video object detection. The features extracted from shallow and deep layers of a CNN differ: shallow features carry low-level semantic information, while deep features carry high-level semantic information. In this paper, we propose an effective feature fusion method, Multi-level Feature Aggregation (MFA), which connects the output layer of each stage to the input layers of the other stages and combines the outputs of all stages at the last layer of the network. This architecture effectively combines shallow and deep features, improving both feature representation and recognition accuracy. MFA is flexible and can be trained end to end. In addition, our experiments show that MFA achieves significant accuracy on the DET and VID datasets for object detection, and our method achieves mAP on DET and VID.
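The aggregation pattern described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several simplifying assumptions. Each "stage" is a toy linear map followed by ReLU, standing in for a CNN stage. All stages preserve the feature dimension, so no resizing is needed. Each stage's output is fed to every *later* stage by elementwise summation. The final layer combines all stage outputs by concatenation. The names `make_stage` and `mfa_forward` are hypothetical.

```python
import numpy as np


def make_stage(weight):
    """Hypothetical stage: linear map + ReLU, a stand-in for a conv block."""
    def stage(x):
        return np.maximum(weight @ x, 0.0)
    return stage


def mfa_forward(x, stages):
    """Sketch of multi-level aggregation wiring (assumed, simplified).

    Each stage receives the network input plus the outputs of all earlier
    stages (summed elementwise); the final representation concatenates
    every stage output, so shallow and deep features both reach the end.
    """
    outputs = []
    for stage in stages:
        # sum([]) == 0, so the first stage simply sees the raw input.
        stage_in = x + sum(outputs)
        outputs.append(stage(stage_in))
    # Combine shallow and deep features at the last layer.
    return np.concatenate(outputs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 4
    stages = [make_stage(rng.standard_normal((dim, dim))) for _ in range(3)]
    fused = mfa_forward(rng.standard_normal(dim), stages)
    print(fused.shape)  # (12,): three stages of dimension 4 each
```

Summation keeps the sketch short. A real CNN would need to match spatial resolution and channel count between stages (for example, by pooling and 1×1 convolutions) before aggregating, and the paper may use a different fusion operator.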
MFA: Multi-level Feature Aggregation for Video Recognition Na Li, Kuangang Fan, Ouyang Qinghua, Yahui Liu