Model-fusion-gesture-recognition-based-on-LSTM-and-Mediapipe

This project extracts feature points with the open-source MediaPipe model, trains LSTM, CNN, and Vision Transformer (ViT) networks on them to obtain deep learning models, and finally deploys those models for use. The model weight files can be downloaded from Baidu Netdisk:

https://pan.baidu.com/s/1avCHiO_LR4AHeJfhMC2Ntw?pwd=2xjf  
Extraction code: 2xjf

1.Environment

Package Name	Version	Package Name	Version
opencv-python	4.9.0.80	timm	0.9.2
pillow	10.2.0	torchvision	0.19.1
torch	2.4.1	Python	3.11.9
mediapipe	0.10.14	CUDA	12.7
einops	0.8.0	NVIDIA-SMI	566.14
h5py	3.11.0	scikit-learn	1.4.2
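
To check whether an installed environment matches the table above, a small script along the following lines may help. It is only a sketch: the `EXPECTED` mapping and the `check_versions` helper are hypothetical names that are not part of this repository, and only the pip-installable packages from the table are listed (Python, CUDA, and the NVIDIA driver are not checked).

```python
from importlib import metadata

# Pinned versions copied from the table above (pip packages only).
EXPECTED = {
    "opencv-python": "4.9.0.80",
    "pillow": "10.2.0",
    "torch": "2.4.1",
    "mediapipe": "0.10.14",
    "einops": "0.8.0",
    "h5py": "3.11.0",
    "timm": "0.9.2",
    "torchvision": "0.19.1",
    "scikit-learn": "1.4.2",
}


def check_versions(expected):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # A local build tag after "+" (as some torch wheels carry) is ignored.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, wanted, matches)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:15s} installed={installed} expected={wanted} {status}")
```

A mismatch does not necessarily mean the project will fail to run; the table only records the versions the project was developed with.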

2.Usage

The code extracts feature points with MediaPipe and uses utils.py to package the resulting data set. CNN + LSTM and ViT + LSTM networks are then trained and tested on that data set, and the resulting model files are deployed for real-time recognition of dynamic gestures. The project provides two GUIs to simplify data handling: APP_Data_Collector.py for data collection and APP_Gesture_Recognizer.py for gesture recognition. Running APP_Redal.py opens both interfaces at once. To add a new gesture, record about 50 videos of it, each 1.5 seconds long, through the data collection interface, then extract the key point data with utils.py, and finally train a new network model with train.py.
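
As an illustration of what "feature points" can look like as model input, the sketch below flattens one frame of hand landmarks into a plain feature vector. It assumes the MediaPipe Hands solution, which reports 21 landmarks per hand, each with `x`, `y`, and `z` attributes. The helper name `landmarks_to_vector` and the wrist-relative normalization are assumptions made for this example; the actual packaging logic lives in utils.py and may differ.

```python
from types import SimpleNamespace

NUM_HAND_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand


def landmarks_to_vector(landmarks):
    """Flatten 21 (x, y, z) landmarks into 63 values relative to the wrist.

    Wrist-relative coordinates are an assumption of this sketch, chosen so
    the vector does not depend on where the hand sits in the image.
    """
    if len(landmarks) != NUM_HAND_LANDMARKS:
        raise ValueError(
            f"expected {NUM_HAND_LANDMARKS} landmarks, got {len(landmarks)}"
        )
    wrist = landmarks[0]  # index 0 is the wrist in MediaPipe Hands
    vector = []
    for point in landmarks:
        vector.extend((point.x - wrist.x, point.y - wrist.y, point.z - wrist.z))
    return vector


# Stand-in data. In practice the points would come from
# results.multi_hand_landmarks[i].landmark in the MediaPipe Hands solution API.
fake_hand = [
    SimpleNamespace(x=0.5 + 0.01 * i, y=0.5, z=0.0)
    for i in range(NUM_HAND_LANDMARKS)
]
features = landmarks_to_vector(fake_hand)
print(len(features))  # 63
```

One such vector per video frame, stacked over time, gives the kind of sequence an LSTM can consume.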

Data Collection

Click "New Class Name" to add a new gesture category. The default collection time is 1.5 seconds, so each gesture must be completed within 1.5 seconds. Repeat the collection of the same gesture 50 times. When collection is done, run utils.py to extract the key point coordinates from the gesture videos. The data is saved automatically under the embed/ folder.
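
Clips recorded for 1.5 seconds can contain slightly different numbers of frames, while sequence models are usually trained on fixed-length inputs. The sketch below shows one common way to handle this: picking frames at evenly spaced positions. The function name `resample_sequence` and the target length of 30 frames are assumptions for illustration; utils.py may use a different strategy.

```python
def resample_sequence(frames, target_len):
    """Pick target_len frames at evenly spaced positions.

    Long clips are thinned out; short clips repeat some frames.
    """
    if not frames:
        raise ValueError("cannot resample an empty clip")
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    if target_len == 1:
        return [frames[0]]
    last = len(frames) - 1
    steps = target_len - 1
    # Integer arithmetic with round-half-up, so results are deterministic.
    indices = [(i * last + steps // 2) // steps for i in range(target_len)]
    return [frames[i] for i in indices]


# A 45-frame clip (for example 1.5 s at 30 fps) reduced to 30 frames.
clip = list(range(45))
fixed = resample_sequence(clip, 30)
print(len(fixed), fixed[0], fixed[-1])  # 30 0 44

# A very short clip is stretched by repeating frames.
print(resample_sequence([0, 1, 2], 5))  # [0, 1, 1, 2, 2]
```

The first and last frames are always kept, so the start and end of the gesture survive the resampling.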

Gesture Recognition

This section covers deployment of the trained models. The project uses CNN, PointNet, Transformer, and LSTM networks to obtain three models, and the outputs of these models are fused to produce the final gesture classification result. Demonstration videos of the gesture classification have been uploaded to BiliBili, where you can watch them directly. The recognition interface offers two choices of gesture recognition model, which you can see by running the program.
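
The README does not spell out how the three model outputs are fused. One common approach is soft voting, that is, a weighted average of each model's class probabilities, and the sketch below shows that technique under this assumption. The names `softmax` and `fuse_predictions` and the example logits are hypothetical and not taken from the repository.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(per_model_logits, weights=None):
    """Soft voting: weighted average of per-model softmax probabilities.

    Returns (predicted_class_index, fused_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(per_model_logits)
    if len(weights) != len(per_model_logits):
        raise ValueError("need exactly one weight per model")
    total_weight = sum(weights)
    num_classes = len(per_model_logits[0])
    fused = [0.0] * num_classes
    for logits, weight in zip(per_model_logits, weights):
        for k, prob in enumerate(softmax(logits)):
            fused[k] += weight * prob / total_weight
    best = max(range(num_classes), key=fused.__getitem__)
    return best, fused


# Made-up logits from three models over three gesture classes.
logits = [
    [2.0, 1.0, 0.1],  # model A prefers class 0
    [0.5, 2.5, 0.3],  # model B prefers class 1
    [1.8, 1.9, 0.2],  # model C slightly prefers class 1
]
label, probs = fuse_predictions(logits)
print(label)  # 1
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh two uncertain ones; the optional `weights` argument can favor a model that scored better on the test set.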

# You can run this code to get the GUI interface
python APP_Redal.py

3.Model

The network structures of the three models implemented in this project are provided in .onnx format. You can also generate the .onnx files by running the relevant code under models/ and visualize the model structure with the Netron app.

4.Training & Testing

You can train the network model by running train.py. The test code is in the same folder. Use the plot_confusion_matrix function in that code to draw the confusion matrix and the classification scores for the test. The confusion matrices of the three models in this project are presented here for reference and to help with model selection.
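
For readers unfamiliar with confusion matrices, the sketch below computes one from true and predicted labels, along with per-class recall. It is a plain-Python illustration with made-up labels, not the repository's plot_confusion_matrix function, whose implementation is not shown here. In practice `sklearn.metrics.confusion_matrix` from scikit-learn, which is listed in the environment table, produces the same counts.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    recalls = []
    for k, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[k] / total if total else 0.0)
    return recalls


# Made-up labels for three gesture classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
# [1, 1, 0]
# [0, 2, 0]
# [1, 0, 1]
print(per_class_recall(cm))  # [0.5, 1.0, 0.5]
```

Off-diagonal entries show which gestures are confused with each other, which is often more useful for model selection than a single accuracy figure.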

5.Thanks

Thank you for reading. If you find this project useful, please consider giving it a star.

