Yoga Pose Classification

Project Goal:

This project compared multiple methods to classify images of people performing 107 different yoga poses. Methodologies tested include a CNN, a fine tuning the VGG19 Imagenet CNN, and Ultralytics Pose Estimation feature extraction with a Neural Network classifier. The trained models were then used to create a web app with Streamlit API, which allows users to upload images of them performing yoga and provide idenfication of the pose.

Modeling Approach:

Regular CNN:
- Architecture: 5 convolutional layers with max pooling, and 2 dense layers- resulting in 812K total parameters.
- Results: 38% Test Accuracy
Fine Tuning of VGG19 Imagenet:
- Architecture: The first 20 layers of the Imagenet model were frozen, with 2 dense layers added- resulting in 46M total parameters.
- Results: 65% Test Accuracy
Pose Estimation Feature Extraction:
- Architecture: Yolo11 Pose was used to create tensors with information about where a person's body and limbs are located relative to one another. These 'pose estimation' results were then fed into a neural network with 6 hidden layers.
- Results: 30% Test Accuracy

Ultralytics provides open source models including YOLO and YOLO-based Posed Estimation models for object detection. The image above provides an example of the Ultralytics pose esitmation output.

Web App Code

Highlighted Code Sections:


# Regular CNN Architecture
model3 = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),

    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),

    Conv2D(128,(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),

    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),

    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),

    Flatten(),
    Dense(128, activation="relu"),

    Dropout(0.5),

    Dense(len(class_labels), activation="softmax")  # Output layer
])

# Compile model with adam optimizer
model3.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["accuracy"])


# VGG Imagenet transfer learning
model = applications.VGG19(weights = "imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in model.layers[:20]:
  layer.trainable = False
#Adding custom Layers
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.5)(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(num_classes, activation="softmax")(x)

# creating the final model
model_final = Model(model.input, predictions)


# Feature Extraction with Ultralytics Pose Estimation and NN Classifier
from ultralytics import YOLO
# Load YOLO Pose Estimation Model
model_yolo = YOLO("yolo11n-pose.pt")  # Update with your model file

# Set dataset path
dataset_path = "/content/yoga-poses-dataset-107/107 yoga poses/train"  # Update with your dataset path

# Get all image file paths
image_paths = glob(os.path.join(dataset_path, "**/*.png"), recursive=True)  # Adjust extension if needed

# Store extracted features and labels
keypoints_list = []
labels = []
valid_image_paths = []  # Store only images where a person is detected

print("Extracting keypoints from images...")

# Process images
for img_path in image_paths:
    # Run YOLO pose estimation
    results = model_yolo(img_path)

    # Extract keypoints (for the first detected person)
    for result in results:
        if result.keypoints is not None and len(result.keypoints.xy.cpu().numpy()) > 0:
            keypoints = result.keypoints.xy.cpu().numpy().flatten()  # Flatten keypoints array
            keypoints_list.append(keypoints)

            # Extract label from folder name (Assuming structure: dataset/class_name/image.jpg)
            label = os.path.basename(os.path.dirname(img_path))
            labels.append(label)
            valid_image_paths.append(img_path)  # Store valid image path

print(f"Total images processed: {len(image_paths)}")
print(f"Valid images with detected persons: {len(valid_image_paths)}")


# NN Classifier
num_classes = len(label_encoder.classes_)  # Number of unique yoga poses

model_nn = Sequential([
    Dense(128, input_shape=(X_train.shape[1],)),
    tf.keras.layers.LeakyReLU(alpha=0.01),
    BatchNormalization(),

    Dense(128),
    tf.keras.layers.LeakyReLU(alpha=0.01),
    BatchNormalization(),

    Dense(128),
    tf.keras.layers.LeakyReLU(alpha=0.01),
    BatchNormalization(),

    Dense(128),
    tf.keras.layers.LeakyReLU(alpha=0.01),
    BatchNormalization(),

    Dense(128),
    tf.keras.layers.LeakyReLU(alpha=0.01),
    BatchNormalization(),

    Dense(128),
    tf.keras.layers.LeakyReLU(alpha=0.01),
    BatchNormalization(),

    Dense(64),
    tf.keras.layers.LeakyReLU(alpha=0.01),
    BatchNormalization(),

    Dropout(0.6),
    Dense(num_classes, activation="softmax")  # Multi-class classification
])

# Compile Model
model_nn.compile(optimizer=Adam(learning_rate=0.001 ),
                 loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])