3D dynamic hand gestures recognition using the Leap Motion sensor and
Convolutional Neural Networks
Andrea Ranieri - CNR-IMATI
CNNs glossary #3
- CNNs are trained in two phases:
- in the forward pass, features are extracted from the input image and the output of the network is compared to the ground truth through a loss function
- in the backward pass, neurons’ parameters (weights and biases) are adjusted through backpropagation (1989) and gradient descent
- Before ResNets, the vanishing gradient problem made deep CNNs difficult to train: without skip connections, the so-called “loss landscape” of a deep network is too chaotic for gradient descent to make progress
Credits: Visualizing the Loss Landscape of Neural Nets
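The two training phases listed above can be sketched on the smallest possible "network": a single neuron with one weight and one bias, trained with mean squared error. This is only an illustration in plain Python, not the code used in this work; the toy data, the learning rate and the function names (`forward`, `backward`, `train`) are assumptions made for the example.

```python
def forward(w, b, xs, ys):
    """Forward pass: compute predictions and compare them to the ground truth."""
    preds = [w * x + b for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)  # MSE loss
    return preds, loss


def backward(preds, xs, ys):
    """Backward pass: gradients of the MSE loss w.r.t. the weight and the bias."""
    n = len(xs)
    grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
    return grad_w, grad_b


def train(xs, ys, lr=0.05, epochs=500):
    """Alternate forward and backward passes, updating parameters by gradient descent."""
    w, b = 0.0, 0.0
    losses = []
    for _ in range(epochs):
        preds, loss = forward(w, b, xs, ys)
        grad_w, grad_b = backward(preds, xs, ys)
        w -= lr * grad_w  # gradient descent step on the weight
        b -= lr * grad_b  # gradient descent step on the bias
        losses.append(loss)
    return w, b, losses


# Toy ground truth (assumed for the example): y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, losses = train(xs, ys)
```

In a real CNN the gradients are not derived by hand: backpropagation applies the chain rule layer by layer, but the loop has the same shape.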
CNNs glossary #4
- At a higher level of abstraction, a CNN model is trained starting from:
- the network architecture (ResNet-34, ResNet-50, EfficientNet-B4, etc.)
- the dataset (typically composed of training/validation/test sets) and a set of data augmentation transformations
- the loss function (cross-entropy loss, MSE, MAE, etc.)
- the choice of hyperparameters (batch size, learning rate, number of training epochs, etc.)
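The four ingredients above can be gathered in one place, as in the following sketch. It also spells out the three loss functions named in the list. The `TrainingConfig` class, the augmentation names and every numeric value are hypothetical placeholders chosen for the example, not the settings used in this work.

```python
import math
from dataclasses import dataclass, field


def cross_entropy(probs, target_index):
    """Cross-entropy for one sample: -log of the probability given to the true class."""
    return -math.log(probs[target_index])


def mse(preds, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def mae(preds, targets):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


@dataclass
class TrainingConfig:
    """Hypothetical container for the ingredients of a training run."""
    architecture: str = "ResNet-50"
    splits: tuple = ("training", "validation", "test")
    # Augmentation names are placeholders, not the ones used in the talk.
    augmentations: list = field(default_factory=lambda: ["rotation", "zoom"])
    loss: str = "cross_entropy"
    batch_size: int = 32          # placeholder value
    learning_rate: float = 1e-3   # placeholder value
    epochs: int = 10              # placeholder value


config = TrainingConfig()
```

Cross-entropy is the usual choice for classification, while MSE and MAE are typically used for regression targets.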
Future research directions #1
- What’s the weak point of our current approach?
- There is no way to determine when a gesture starts or ends (it’s just a “single image” classifier)
- Once again, the developer has the burden of deciding when one gesture ends and another begins, when two predictions are part of the same gesture, etc.
- How do we do it?
- One prediction every two seconds (avg. duration of the gestures)
- Thresholds
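A minimal sketch of the workaround above: the classifier is queried once every two seconds, and a prediction is kept only if its top-class confidence reaches a threshold. The threshold value, the gesture labels and the function names are assumptions made for the example; the slides do not state which thresholds are actually used.

```python
PREDICTION_PERIOD_S = 2.0    # from the slide: average duration of the gestures
CONFIDENCE_THRESHOLD = 0.8   # hypothetical value, not given in the slides


def prediction_times(duration_s, period_s=PREDICTION_PERIOD_S):
    """Timestamps (in seconds) at which the classifier is queried."""
    count = int(duration_s // period_s)
    return [period_s * (i + 1) for i in range(count)]


def accept_predictions(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Keep only predictions whose top-class probability reaches the threshold."""
    accepted = []
    for timestamp, probs in predictions:
        label = max(probs, key=probs.get)  # most probable class
        if probs[label] >= threshold:
            accepted.append((timestamp, label))
    return accepted


# Hypothetical classifier outputs: (timestamp, class probabilities)
stream = [
    (2.0, {"swipe": 0.95, "tap": 0.05}),
    (4.0, {"swipe": 0.55, "tap": 0.45}),  # too uncertain, discarded
    (6.0, {"swipe": 0.10, "tap": 0.90}),
]
accepted = accept_predictions(stream)
```

This filters out uncertain predictions, but it still cannot tell where a gesture starts or ends, which is the weak point noted above.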
Future research directions #2
- Two (or three?) possible ways to make the NN deal with temporal information:
- Currently working on:
- 3D-ResNet-50-KMS
- pretrained on Kinetics-700 (K), Moments in Time (M), STAIR-Actions (S)
Future research directions #3
- Same SHREC 2020 contest dataset
- This time not just single images, but entire sequences (30 frames)
- Plain sequences with basic data augmentation
- without partial sequences, without noise
- The good news is that the network is learning
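As an illustration of moving from single images to sequences, the sketch below groups a stream of frames into fixed-length sequences of 30 frames. How the sequences are actually built from the SHREC 2020 data is not described in the slides; the non-overlapping windows, the dropping of an incomplete tail and the `make_sequences` name are all assumptions made for the example.

```python
SEQUENCE_LENGTH = 30  # from the slide: sequences of 30 frames


def make_sequences(frames, length=SEQUENCE_LENGTH):
    """Split frames into non-overlapping sequences of exactly `length` frames.

    Assumption for this sketch: an incomplete tail is dropped, so every
    returned sequence is full length.
    """
    full = len(frames) // length
    return [frames[i * length:(i + 1) * length] for i in range(full)]


# Integers stand in for frames; 75 frames yield two full sequences.
frames = list(range(75))
sequences = make_sequences(frames)
```

Each sequence would then be fed to the network as one sample, instead of one image at a time.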
Questions?