STSP-Net: A Spatial-Temporal Skeletal Perception Network for Robust 3D Pose Estimation in Children's Sports


Wenyue Liu, DENISE KOH CHOON LIAN, Zhihao Zhang, Jianguo Qiu, Lili Wang

Abstract

Introduction: Pose estimation of children's sports motion has significant applications in sports training, health monitoring, and rehabilitation assessment. However, existing 3D pose estimation methods still face challenges in sports scenarios, including unstable keypoint detection, physically implausible 3D structures, and a lack of temporal consistency in motion trajectories. These issues lead to poor robustness in pose prediction under high-speed motion and occlusion.
Objectives: To develop a framework that enhances the stability, structural plausibility, and temporal consistency of 3D pose estimation in dynamic and complex children's sports scenarios, addressing the limitations of current methods.
Methods: This paper proposes STSP-Net (Spatial-Temporal Skeletal Perception Network), a 3D pose estimation framework that integrates 2D keypoint detection, skeletal structure modeling, and temporal information modeling. The Efficient Keypoint Detection Module (EPE-Module) employs a motion-region adaptive enhancement mechanism to improve keypoint detection accuracy and reduce jitter. The Graph-based Skeletal Representation Module (GSR-Module) constructs a human skeleton graph and uses a graph attention mechanism to optimize spatial relationships between joints and ensure physical plausibility. The Temporal Motion Perception Module (TMP-Module) adopts a cross-attention mechanism to capture long-term motion trends and applies global temporal constraints to enhance smoothness and consistency.
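To make the idea of attention over a skeleton graph concrete, the following is a minimal sketch, not the GSR-Module itself. It assumes a hypothetical five-joint toy skeleton (EDGES), unprojected dot-product scores, and a single attention head; the function names skeleton_adjacency and masked_graph_attention are illustrative and do not come from the paper. The only point it shows is that each joint attends to itself and to the joints it is connected to by a bone.

```python
import numpy as np

# Hypothetical toy skeleton for illustration only (not the paper's joint set):
# joint 1 is a hub connected to joints 0, 2, 3, and 4.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def skeleton_adjacency(num_joints, edges):
    """Boolean adjacency matrix with self-loops for an undirected skeleton."""
    adj = np.eye(num_joints, dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True
    return adj


def masked_graph_attention(features, adj):
    """Single-head attention restricted to skeleton neighbours.

    features: (J, D) array of per-joint features.
    adj:      (J, J) boolean adjacency matrix including self-loops.
    Returns the updated features (J, D) and the attention weights (J, J).
    """
    d = features.shape[1]
    # Scaled dot-product scores between every pair of joints.
    scores = features @ features.T / np.sqrt(d)
    # Joints that share no bone get a score of -inf, hence zero weight.
    scores = np.where(adj, scores, -np.inf)
    # Row-wise softmax; the self-loop guarantees a finite maximum per row.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ features, weights
```

A learned graph attention layer would add trainable projections and typically several heads; the masking step is the part that ties the attention pattern to the skeleton structure.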
Results: STSP-Net achieves the lowest mean per-joint position error (MPJPE) among the compared methods, 48.5 mm on Human3.6M and 49.6 mm on ChildPlay, reducing error by 2.6% and 3.1%, respectively, relative to the best baseline. It also achieves the lowest TS values, 3.3 mm/s² and 3.4 mm/s², respectively, indicating smoother motion trajectories. Furthermore, STSP-Net maintains stable pose estimation in high-speed motion and occlusion scenarios, consistently outperforming existing methods.
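For readers unfamiliar with the metrics, the sketch below shows how MPJPE is conventionally computed, together with one possible acceleration-based smoothness measure. The abstract does not define TS, so mean_acceleration is only an assumed stand-in chosen because its unit matches the reported mm/s²; the function names and the (T, J, 3) array layout are likewise assumptions.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: (T, J, 3) arrays of 3D joint positions in millimetres.
    Returns the Euclidean distance averaged over frames and joints, in mm.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def mean_acceleration(pred, fps):
    """Mean magnitude of joint acceleration in mm/s^2.

    Assumed stand-in for a smoothness metric; not the paper's TS definition.
    Uses the second-order finite difference over consecutive frames.
    """
    dt = 1.0 / fps
    accel = (pred[2:] - 2.0 * pred[1:-1] + pred[:-2]) / dt**2
    return float(np.linalg.norm(accel, axis=-1).mean())
```

Under this assumed definition, a trajectory that moves at constant velocity scores zero, and frame-to-frame jitter raises the value, which is why a lower number is read as smoother motion.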
Conclusions: STSP-Net effectively addresses the core challenges in children's sports motion pose estimation by improving keypoint detection stability, enforcing 3D skeletal consistency, and enhancing temporal smoothness. It offers a robust solution for practical applications in sports, health, and rehabilitation domains.
