Why do we need to take care of the spine during yoga?
When strengthening the back with yoga, it's vital to carefully monitor the correct position of the back (posture). However, this is not so simple, as we cannot see ourselves from the right angle to do it.
During yoga classes, instructors, as a rule, pay great attention to teaching their students to maintain the so-called "neutral" position of the back. This means that all three physiological curves of the spine are present: cervical and lumbar lordosis, and thoracic kyphosis, which in turn allows the spine to correctly perform its shock-absorbing function.

If exercises are performed with improper posture, you can sustain significant injuries, since a large portion of the load will fall on the most vulnerable parts of the spine: the cervical and lumbar regions.

To solve any task with machine learning, we first need to state it in a measurable way, meaning that we can unambiguously map a real-world object to a measure. For spine curvature, we considered the following options: estimate the curvature of the spine as a whole (curved, neutral, or concave) or estimate each segment of the spine separately. The spine consists of several sections, and we'd like to assess each of them: the lumbar, thoracic, and cervical spine.
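To make the "measurable" framing concrete, here is a minimal sketch of how the per-segment labels could be represented in code. The class and field names are our illustrative assumptions, not Zenia's actual schema:

```python
from dataclasses import dataclass
from enum import Enum


class Curvature(Enum):
    """The three curvature states we want to distinguish."""
    CURVED = 0
    NEUTRAL = 1
    CONCAVE = 2


@dataclass
class SpineAssessment:
    """One label per spine segment, as in the per-segment formulation."""
    cervical: Curvature
    thoracic: Curvature
    lumbar: Curvature


# A single labeled frame maps to one assessment:
sample = SpineAssessment(
    cervical=Curvature.NEUTRAL,
    thoracic=Curvature.NEUTRAL,
    lumbar=Curvature.CURVED,
)
```

With this schema, the whole-spine formulation is just the special case where all three fields share one value.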
Zenia uses the phone's front camera to analyze users' movements and give feedback on the go. Our pose estimation model takes RGB images as input and works in real-time on the iPhone, starting with the previous generation SE, and will soon be available on Android. We want spine curvature data to also be available in real-time and not affect the performance of the technology too much.
Research available solutions
Of course, we want to get to work right away, but first we need to research the field. This is good practice because it allows us not only to save time but also to analyze the performance ranges and potential trade-offs of available approaches and establish a baseline. Unfortunately, digging through arXiv, Google Scholar, and GitHub turned up only papers on spine curvature estimation from X-ray images, which is an important topic for studying spine disorders but not relevant to the task at hand. This means we'll need to explore the problem ourselves and start from scratch.
Stating the machine learning problem
To solve a problem with machine learning, we need to specify the inputs and outputs for the model. Our model will take a regular image as input, but we need to choose what we want the model to predict. We can formulate the model output as one of several types of machine learning problem: classification, regression, or semantic segmentation.[1] To make the choice easier, we compare them by three criteria: difficulty of labeling; interpretability, meaning how easy it will be to use the model predictions in the analysis engine; and potential performance cost.
Based on these criteria, we've chosen classification, because it has high interpretability for our use case and is the easiest to label. If multiple people label the same image, their responses can be used to measure confidence, which helps to model uncertainty.
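The idea of deriving confidence from multiple annotators can be sketched as follows. This is our illustrative example, not Zenia's actual labeling pipeline: each image gets several votes, and the vote distribution serves both as a soft training target and as a confidence score for the majority class:

```python
from collections import Counter


def soft_label(votes, classes=("curved", "neutral", "concave")):
    """Turn several annotators' votes into a probability distribution.

    The distribution can be used directly as a soft target, and the
    share of the majority class acts as a confidence estimate.
    """
    counts = Counter(votes)
    total = len(votes)
    return {c: counts.get(c, 0) / total for c in classes}


# Three annotators labeled the same image:
dist = soft_label(["neutral", "neutral", "curved"])
label = max(dist, key=dist.get)   # majority class
confidence = dist[label]          # its share of the votes
```

Unanimous images get confidence 1.0, while disagreement lowers it, flagging ambiguous samples for review or down-weighting during training.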
Data labeling
Data labeling usually starts with us (the R&D team) labeling about 500 samples. This helps us better understand the task and uncover use cases we've missed. We also try to make the data-labeling process as efficient as possible, so we automatically selected only those poses where a person stands with their side to the camera. This was straightforward because we use a custom labeling tool: we selected the needed poses and added them to the new labeling task, spine curvature classification.
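One way to automate the side-view selection, given that a pose estimation model already produces keypoints, is a simple geometric heuristic: in a side view, the projected distance between the shoulders is small relative to torso height. The function below is a hypothetical sketch with assumed keypoint names; it is not Zenia's actual filter:

```python
def is_side_view(keypoints, max_shoulder_ratio=0.2):
    """Heuristic side-view detector over 2D pose keypoints.

    keypoints: dict mapping joint name to (x, y) in pixels.
    In a side view, the shoulders nearly overlap in the image plane,
    so shoulder width divided by torso height is small.
    """
    lsx, lsy = keypoints["l_shoulder"]
    rsx, rsy = keypoints["r_shoulder"]
    lhx, lhy = keypoints["l_hip"]

    shoulder_width = abs(lsx - rsx)
    torso_height = abs(lsy - lhy)
    if torso_height == 0:
        return False  # degenerate pose, skip it
    return shoulder_width / torso_height < max_shoulder_ratio


side_pose = {"l_shoulder": (100, 50), "r_shoulder": (105, 52), "l_hip": (100, 200)}
front_pose = {"l_shoulder": (50, 50), "r_shoulder": (150, 50), "l_hip": (50, 200)}
```

The threshold would need tuning on real data; the point is that keypoints the model already computes are enough to pre-filter candidates for labeling.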
First experiments
To check whether our assumptions are correct and conduct the first set of experiments, 500 images are enough. We took our pre-trained pose estimation model, froze its weights, and added an additional classification head. This approach adds almost no computational overhead and is suitable for most mobile devices. We also added classification handling and a new set of image augmentations to our training pipeline.
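The frozen-backbone-plus-head pattern can be sketched in PyTorch. The feature dimension, layer sizes, and the three-segments-by-three-classes output shape are our assumptions for illustration; Zenia's actual architecture is not public:

```python
import torch
import torch.nn as nn


class SpineHead(nn.Module):
    """A lightweight classification head on top of frozen backbone features.

    Predicts one of n_classes curvature states for each of n_segments
    spine sections from a single feature vector.
    """

    def __init__(self, feat_dim=256, n_segments=3, n_classes=3):
        super().__init__()
        self.n_segments = n_segments
        self.n_classes = n_classes
        self.fc = nn.Linear(feat_dim, n_segments * n_classes)

    def forward(self, feats):
        logits = self.fc(feats)
        return logits.view(-1, self.n_segments, self.n_classes)


# Freezing an existing backbone would look like:
#   for p in backbone.parameters():
#       p.requires_grad = False
# so that only the head is trained.

backbone_feats = torch.randn(4, 256)  # stand-in for frozen-backbone output
head = SpineHead()
out = head(backbone_feats)            # per-segment logits
```

Since the backbone already runs for pose estimation, the only extra cost at inference is the single linear layer, which is why such a head is nearly free on mobile.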

After debugging (which happens fast on a small dataset) and training, we test the first model on a set of videos. This lets us not worry too much about overfitting, because we don't rely on metrics to evaluate the model.

These are the results we got:
As expected, our solution works well for simple cases. However, it fails in more complex scenarios. We consider this a good result: it means the selected approach works, but we need more data.[1]

After this first validation of our assumptions, we can start the full data-labeling process and continue the experiments: for example, pass only the part of the image containing the spine to the classifier, or add attention using segmentation.[2]
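The first of these ideas, cropping the image to the spine region, can be derived from keypoints the pose model already predicts. The function below is a hypothetical sketch (keypoint names, margin, and rounding are our assumptions), not Zenia's implementation:

```python
def spine_crop(image_w, image_h, neck, hip, margin=0.25):
    """Compute a pixel crop box around the spine from two keypoints.

    neck, hip: (x, y) keypoints in pixels.
    The box spans neck-to-hip and is padded on all sides by a fraction
    of the spine length, clipped to the image bounds.
    Returns (left, top, right, bottom).
    """
    x0, x1 = sorted((neck[0], hip[0]))
    y0, y1 = sorted((neck[1], hip[1]))
    pad = (y1 - y0) * margin  # pad proportionally to spine length

    left = max(0, int(x0 - pad))
    top = max(0, int(y0 - pad))
    right = min(image_w, int(x1 + pad))
    bottom = min(image_h, int(y1 + pad))
    return left, top, right, bottom


# Example: 640x480 frame, neck at (100, 50), hip at (110, 200).
box = spine_crop(640, 480, neck=(100, 50), hip=(110, 200))
```

Feeding only this crop to the classifier removes background clutter and lets the head spend its capacity on the region that actually matters.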

Later experiments
After getting the labeled data, we retrain our classifier and test it once again.
To have another simple baseline, we trained a separate ResNet-based classifier that is not embedded into our main model. Here are the comparison metrics:
So, that's how we added a spine curvature estimation algorithm to Zenia. It's accurate and fast, and it doesn't affect performance. We hope it'll help you focus on your spine more: the condition of the back is a fundamental factor in human health, which is why we consider it essential to monitor the position of the spine during yoga with Zenia. We also want our students to be able to track their progress, which will allow them to improve posture, boost health, and, over time, improve their quality of life.
September 7, 2020

Author: Olga Samoilova
© 2020 Zenia Inc.