Published January 1, 2021 | Version v1
Conference paper Open

PoseTED: A Novel Regression-Based Technique for Recognizing Multiple Pose Instances

  • 1. Bahcesehir Univ, TR-34349 Besiktas, Turkey

Description

Multi-person pose estimation can be viewed as a hierarchical set prediction problem: algorithms must correctly associate body parts with each person in the image. Pose estimation methods fall into two categories: (1) heatmap-based and (2) regression-based. Heatmap-based techniques depend on various heuristic designs and are not end-to-end trainable, while regression-based methods involve fewer intermediate non-differentiable stages. This paper presents PoseTED, a novel regression-based network for multi-instance human pose recognition. It uses the well-known object detector YOLOv4 for person detection and a spatial transformer network (STN) as a cropping filter. A CNN-based backbone then extracts deep features, and an encoder-decoder transformer with positional encoding is applied for keypoint detection, which addresses the heuristic design problem of earlier regression-based techniques and increases overall performance. A prediction feed-forward network (FFN) predicts the locations of several keypoints as a group and outputs the body components. The method is evaluated on two public datasets, COCO and MPII, achieving an average precision (AP) of 73.7% on the COCO val set, 72.7% on the COCO test-dev set, and 89.7% on the MPII dataset. These results are comparable to state-of-the-art methods.
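The contrast between the two categories can be illustrated with a minimal sketch in plain Python. It is not the PoseTED implementation: the function names, the `(x_min, y_min, width, height)` box format, and the assumption that a regression head outputs coordinates normalized to the person crop are all hypothetical choices made for illustration. Heatmap decoding relies on an argmax over grid cells, which is not differentiable, whereas regression decoding is a plain affine map from the predicted coordinates to image pixels.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, width, height) in image pixels.
Box = Tuple[float, float, float, float]


def decode_heatmap(heatmap: List[List[float]], box: Box) -> Tuple[float, float]:
    """Heatmap-style decoding: pick the highest-scoring cell, map it to pixels.

    The argmax is a discrete selection and therefore not differentiable.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    best_r, best_c = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    x_min, y_min, width, height = box
    # Use the cell centre, then rescale from grid units to pixels.
    x = x_min + (best_c + 0.5) / cols * width
    y = y_min + (best_r + 0.5) / rows * height
    return x, y


def decode_regression(norm_xy: Tuple[float, float], box: Box) -> Tuple[float, float]:
    """Regression-style decoding (assumed output: (x, y) in [0, 1] within the crop).

    This is an affine map, so gradients can flow through it to the network.
    """
    x_min, y_min, width, height = box
    return x_min + norm_xy[0] * width, y_min + norm_xy[1] * height


# Usage: a 4x4 heatmap with its peak at row 1, column 2, and the
# equivalent normalized coordinate for a regression head.
person_box = (10.0, 20.0, 40.0, 80.0)
peak_map = [[0.0] * 4 for _ in range(4)]
peak_map[1][2] = 1.0
print(decode_heatmap(peak_map, person_box))            # (35.0, 50.0)
print(decode_regression((0.625, 0.375), person_box))   # (35.0, 50.0)
```

The heatmap path can only return cell centres, so its precision is limited by the grid resolution, while the regression path can represent any position inside the crop.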

Files

bib-4fdb4a2f-24fd-44f2-b3df-1cccdb928a74.txt (173 Bytes, md5:a43015e6b102d7dab31477d8406a7cc0)