Sign Language Recognition using Neural Networks

Mandladharanireddy
Feb 25, 2022

Mandla Dharani, Marripudi Naga Pujitha

Introduction

Sign language conversion is a long-standing computer vision problem. Several solutions have been proposed, but none has been portable enough to be used in a standalone device or application.

Sign languages are a set of languages that use predefined actions and movements to convey a message. They are used primarily by deaf people and others who have difficulty speaking. Different regions have different sign languages, such as American Sign Language and Indian Sign Language.

We plan to address the portability problem by harnessing the power of the mobile phone and recent advances in deep learning. With the advent of deep learning, end-to-end models that require only images as input are being built for a wide range of problems. Large datasets have made it possible to train these models more effectively.

Problem Statement

In this project, we aim to analyze and recognize various signs from a dynamic database.

In a dynamic database, the user can add and train on any number of inputs dynamically. With such a diverse dataset, we are able to train our system well and thus obtain good results.

We propose an end-to-end solution that requires only a 2D image as input; alternatively, users can train the model on their own dataset. Our aim is to make it easy for people to communicate using the model. Many deaf and mute people around the world use sign language, and it is our responsibility to make the world easier for them.

Algorithms

We used different machine learning techniques for the detection of sign language.

Support Vector Machines (SVM)

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression.

Logistic Regression

Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable, although many more complex extensions exist.
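The logistic function mentioned above can be sketched in a few lines. This is a minimal illustration, not the classifier used in the project: the weights, bias, and feature values are made up, and `predict_proba` is a hypothetical helper name.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical weights for a two-feature binary classifier.
weights = [0.8, -0.4]
bias = 0.1

p = predict_proba([1.0, 2.0], weights, bias)
label = 1 if p >= 0.5 else 0  # threshold the probability to get a class
```

A multi-class problem such as recognizing many signs would need one of the extensions mentioned above, for example a softmax over several outputs.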

K-nearest neighbors (KNN)

KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then voting for the most frequent label (for classification) or averaging the labels (for regression).
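The steps just described can be sketched as follows. This is a small illustration under stated assumptions: the 2-D feature vectors are toy values standing in for image features, Euclidean distance is assumed, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter

def knn_predict(query, examples, k=3):
    """Classify `query` by majority vote among its k nearest examples.

    `examples` is a list of (feature_vector, label) pairs.
    """
    # Sort all examples by their Euclidean distance to the query.
    by_distance = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    # Keep the k closest examples and vote on their labels.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy 2-D feature vectors (made up for illustration).
examples = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "t"), ((0.9, 1.1), "t"), ((1.1, 0.9), "t"),
]

print(knn_predict((0.15, 0.15), examples))  # a
```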

Convolutional Neural Networks (CNN)

A convolutional neural network (CNN) is a type of artificial neural network used in image recognition and processing that is specifically designed to process pixel data.
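To make "designed to process pixel data" concrete, here is a minimal sketch of the sliding-window operation at the heart of a convolutional layer. The image and kernel are hand-picked for illustration; in a real CNN the kernel values are learned during training.

```python
def conv2d(image, kernel):
    """2-D cross-correlation with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch whose top-left corner is (i, j).
            output[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return output

# 4x4 grayscale image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Kernel that responds where intensity increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
# feature_map is [[0, 2, 0], [0, 2, 0], [0, 2, 0]]: only the edge column fires.
```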

Architecture

Preprocessing

The mean pixel values are subtracted from all the images. Each image is then resized to 244x244, and data augmentation is applied to create more training data. Finally, the data is shuffled so that any randomly picked subset is diverse.
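The pipeline above might look roughly like the following sketch. Several details are assumptions rather than the project's actual code: the subtraction is taken to be per-image mean subtraction, the resize uses nearest-neighbor sampling, a horizontal flip stands in for whatever augmentation was really used, and all function names are hypothetical.

```python
import random

TARGET_SIZE = 244  # side length stated in the text

def subtract_mean(image):
    """Center a grayscale image by subtracting its mean pixel value."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[p - mean for p in row] for row in image]

def resize_nearest(image, size=TARGET_SIZE):
    """Resize to size x size using nearest-neighbor sampling."""
    h, w = len(image), len(image[0])
    return [
        [image[r * h // size][c * w // size] for c in range(size)]
        for r in range(size)
    ]

def horizontal_flip(image):
    """One simple augmentation (assumed): mirror the image left to right."""
    return [row[::-1] for row in image]

def build_dataset(samples, seed=0):
    """Turn (image, label) pairs into a shuffled, augmented training list."""
    dataset = []
    for image, label in samples:
        prepared = resize_nearest(subtract_mean(image))
        dataset.append((prepared, label))
        dataset.append((horizontal_flip(prepared), label))
    # Shuffle so that any randomly picked subset mixes the classes.
    random.Random(seed).shuffle(dataset)
    return dataset
```

Whether mirroring is a safe augmentation depends on the sign language, since a flipped hand can change which hand appears to be signing.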

SqueezeNet

The SqueezeNet architecture is built from layers of small convolution filters. The first level, the squeeze layer, comprises four 1x1 filters whose outputs are concatenated and passed to the next layer. Because 1x1 filters carry far fewer parameters than larger ones, and because the squeeze layer passes only a few channels onward, the number of parameters stays small. The primary objective of the SqueezeNet architecture is to reduce the number of parameters and, in turn, the size of the network.

The squeeze output is fed into the expand layer, so the number of interconnections between the squeeze layer and the expand layer is minimal. This keeps the network small. The expand layer comprises 3x3 filters along with more 1x1 filters, and their outputs are concatenated to produce the result.
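A quick parameter count shows why a squeeze layer followed by an expand layer is so much smaller than an ordinary convolution. The channel counts below are illustrative assumptions, not the configuration of the model described here, and the helper names are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one convolution layer."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def squeeze_expand_params(in_channels, squeeze, expand_1x1, expand_3x3):
    """Parameters of a 1x1 squeeze layer feeding a two-branch expand layer."""
    squeeze_layer = conv_params(in_channels, squeeze, 1)
    expand_small = conv_params(squeeze, expand_1x1, 1)  # 1x1 branch
    expand_large = conv_params(squeeze, expand_3x3, 3)  # 3x3 branch
    return squeeze_layer + expand_small + expand_large

# Illustrative channel counts (assumed for this example).
in_channels, squeeze, e1, e3 = 96, 16, 64, 64

compact = squeeze_expand_params(in_channels, squeeze, e1, e3)
# A plain 3x3 convolution with the same input and output channel counts.
plain = conv_params(in_channels, e1 + e3, 3)

print(compact, plain)  # 11920 110720
```

With these numbers the squeeze-and-expand block needs roughly nine times fewer parameters than the plain convolution, because the expensive 3x3 filters only ever see the 16 squeezed channels.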

Results

The model was trained for 50 epochs. Training and validation accuracy increase sharply until the 15th epoch and then plateau as the epochs increase. The maximum validation accuracy is 83.29%, reached at the 24th epoch, whereas the maximum training accuracy is 87.47%. The correlation between training and validation accuracy is 98.47%, which indicates that the two curves track each other closely and suggests that the model is not badly overfitting.
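For readers who want to reproduce this kind of check, the sketch below computes a Pearson correlation between two accuracy curves and finds the best validation epoch. The per-epoch values are invented for illustration and are not the training log of this model; we also assume that the reported correlation is a Pearson coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up per-epoch accuracies (NOT the article's data).
train_acc = [0.40, 0.62, 0.75, 0.82, 0.85, 0.87]
val_acc = [0.38, 0.58, 0.70, 0.78, 0.81, 0.83]

r = pearson(train_acc, val_acc)
# Epochs are numbered from 1, list indices from 0.
best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
gap = max(t - v for t, v in zip(train_acc, val_acc))
```

A high correlation only says that the curves move together; the gap between them is a useful complementary check for overfitting.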

The model gives accurate predictions in most cases, but it fails in certain situations. We observed that this happens with similar-looking letters such as ‘a’ and ‘t’, where the only difference is that ‘a’ has the thumb at the side while ‘t’ has the thumb between the index and middle fingers. An image taken under different lighting conditions, or one in which the fingers are not visible, also leads to a false prediction.

Future Scope

In the future, we will add a live video stream mode so that users can operate the software with sign language. We will use cloud storage to give our database more space. We also plan to develop a mobile application for both Android and iOS using Flutter and APIs. The model could also be integrated into our voice and face recognition applications to understand emotions more deeply through sign language.

Conclusion

Sign language recognition is a hard problem if we consider all the possible combinations of gestures that a system of this kind needs to understand and translate. Probably the best way to solve it is to divide it into simpler problems, and the system presented here is a possible solution to one of them. We observed that the model tends to confuse several signs with each other, such as U and W. However, it may not need perfect performance, since a spelling corrector or a word predictor would increase the translation accuracy.
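As a rough illustration of the spelling-corrector idea, the sketch below snaps a fingerspelled word to the closest entry in a small vocabulary using Python's standard `difflib` module. The vocabulary and the `correct_word` helper are hypothetical, and a real system would need a much larger word list or a language model.

```python
import difflib

# Hypothetical vocabulary of words the user is likely to spell.
VOCABULARY = ["hello", "help", "thank", "water", "what", "want"]

def correct_word(spelled, vocabulary=VOCABULARY, cutoff=0.6):
    """Return the closest vocabulary word, or the input if nothing is close."""
    matches = difflib.get_close_matches(spelled, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else spelled

# The recognizer reads "W" as "U", so "water" arrives as "uater".
print(correct_word("uater"))  # water
```

Because the correction works on whole words, a single confused letter such as U for W can be repaired without any change to the image model.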

The next step is to analyze the solution and study ways to improve the system. Improvements could come from collecting more high-quality data, trying more convolutional neural network architectures, or redesigning the vision system.

The model developed uses the SqueezeNet architecture, which is small enough for the complete network to be stored on a mobile device. This makes the solution more accessible to the public, which is why on-device recognition is currently preferred. In the future, better dataset preprocessing should help improve the accuracy of the model, so that lighting conditions and the distance of the hand from the camera do not affect the prediction.
