Hand Signals Recognition using Convolutional Neural Network
This weekend I managed to release yet another toy project of mine, called hsr.
In this Medium story, I want to share my experience of collecting the training data and of implementing and training the neural network model. I also highlight TensorFlow's Input Pipeline, which I use in this project.
Collecting The Training Data
Data collection process
I use the Webcam API in the Chrome browser to collect the data. The idea is to capture each frame via the webcam and draw it to an HTML canvas. Here is the gist of the HTML and JS script that I use. I just click the start button and it starts capturing by itself.
I found that downloading the captured images one by one is very time-consuming, so I use this simple script to download all of them:
// collect the URL of every captured <img> on the page
var image_els = document.querySelectorAll("img");
var image_urls = [];
for (var a = 0; a < image_els.length; a++) {
  var href = image_els[a].getAttribute("src");
  if (href) image_urls.push(href);
}
// generate an <a> per image and force a download;
// the first character of the filename encodes the class label
for (var i = 0; i < 50; i++) {
  var link = document.createElement("a");
  var filename = "5-1-" + i;
  link.setAttribute("download", filename);
  link.setAttribute("href", image_urls[i]);
  link.click();
}
Neural Network Model
The model is highly inspired by LeNet-5 (LeCun, 1998). You can read the paper.
For the input, I choose a size of 320x240. I use two convolutional layers with filter sizes of 32x32 and 16x16. Each convolutional layer is followed by a subsampling layer using a max-pooling operation. For the fully-connected part, I use a single hidden layer with 400 units. For the output layer, I use softmax with 5 units, where each unit represents one class of hand signal.
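The constants used in the graph below are not defined in the post; a plausible set of definitions, filled in from the sizes above, looks like this. The filter counts are assumptions: CONV1_FILTER_COUNT and BATCH_SIZE are pure placeholders, while CONV2_FILTER_COUNT = 6 is inferred from the flattened size used by the hidden layer.

# Assumed constants, based on the sizes described above
IMAGE_WIDTH = 320
IMAGE_HEIGHT = 240
NUM_CHANNELS = 3         # RGB, matching decode_png(channels=3) later on
CONV1_FILTER_SIZE = 32
CONV1_FILTER_COUNT = 32  # placeholder: not stated in the post
CONV2_FILTER_SIZE = 16
CONV2_FILTER_COUNT = 6   # inferred from conv_output_size = 28800 below
HIDDEN_LAYER_SIZE = 400
NUM_CLASS = 5
BATCH_SIZE = 32          # placeholder: not stated in the post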
This architecture can be easily translated into a computation graph in TensorFlow:
# Convolutional layer 1
with tf.name_scope('conv1'):
    W = tf.Variable(
        tf.truncated_normal(
            shape=(
                CONV1_FILTER_SIZE,
                CONV1_FILTER_SIZE,
                NUM_CHANNELS,
                CONV1_FILTER_COUNT),
            dtype=tf.float32,
            stddev=5e-2),
        name='weights')
    b = tf.Variable(
        tf.zeros(
            shape=(CONV1_FILTER_COUNT,),
            dtype=tf.float32),
        name='biases')
    conv = tf.nn.conv2d(
        input=images,
        filter=W,
        strides=(1, 1, 1, 1),
        padding='SAME',
        name='convolutional')
    conv_bias = tf.nn.bias_add(conv, b)
    conv_act = tf.nn.relu(
        features=conv_bias,
        name='activation')
    pool1 = tf.nn.max_pool(
        value=conv_act,
        ksize=(1, 2, 2, 1),
        strides=(1, 2, 2, 1),
        padding='SAME',
        name='subsampling')

# Convolutional layer 2
with tf.name_scope('conv2'):
    W = tf.Variable(
        tf.truncated_normal(
            shape=(
                CONV2_FILTER_SIZE,
                CONV2_FILTER_SIZE,
                CONV1_FILTER_COUNT,
                CONV2_FILTER_COUNT),
            dtype=tf.float32,
            stddev=5e-2),
        name='weights')
    b = tf.Variable(
        tf.zeros(
            shape=(CONV2_FILTER_COUNT,),
            dtype=tf.float32),
        name='biases')
    conv = tf.nn.conv2d(
        input=pool1,
        filter=W,
        strides=(1, 1, 1, 1),
        padding='SAME',
        name='convolutional')
    conv_bias = tf.nn.bias_add(conv, b)
    conv_act = tf.nn.relu(
        features=conv_bias,
        name='activation')
    pool2 = tf.nn.max_pool(
        value=conv_act,
        ksize=(1, 2, 2, 1),
        strides=(1, 2, 2, 1),
        padding='SAME',
        name='subsampling')

# Hidden layer
with tf.name_scope('hidden'):
    conv_output_size = 28800
    W = tf.Variable(
        tf.truncated_normal(
            shape=(conv_output_size, HIDDEN_LAYER_SIZE),
            dtype=tf.float32,
            stddev=5e-2),
        name='weights')
    b = tf.Variable(
        tf.zeros(
            shape=(HIDDEN_LAYER_SIZE,),
            dtype=tf.float32),
        name='biases')
    reshape = tf.reshape(
        tensor=pool2,
        shape=[BATCH_SIZE, -1])
    h1 = tf.nn.relu(
        features=tf.add(tf.matmul(reshape, W), b),
        name='activation')

# Softmax layer
with tf.name_scope('softmax'):
    W = tf.Variable(
        tf.truncated_normal(
            shape=(HIDDEN_LAYER_SIZE, NUM_CLASS),
            dtype=tf.float32,
            stddev=5e-2),
        name='weights')
    b = tf.Variable(
        tf.zeros(
            shape=(NUM_CLASS,),
            dtype=tf.float32),
        name='biases')
    logits = tf.add(tf.matmul(h1, W), b, name='logits')
You can see the full version of the graph.
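One detail worth unpacking is conv_output_size = 28800 in the hidden layer. Both max-pooling layers use 2x2 windows with stride 2, so each one halves the spatial dimensions; with the assumed CONV2_FILTER_COUNT above, a quick check recovers the constant:

# 320x240 input -> 160x120 after pool1 -> 80x60 after pool2
width = IMAGE_WIDTH // 2 // 2    # 80
height = IMAGE_HEIGHT // 2 // 2  # 60
print(width * height * CONV2_FILTER_COUNT)  # 80 * 60 * 6 = 28800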
Training The Network
For the training process, the first step is to read all the images.
def read_images(data_dir):
    pattern = os.path.join(data_dir, '*.png')
    filenames = tf.train.match_filenames_once(pattern, name='list_files')

    queue = tf.train.string_input_producer(
        filenames,
        num_epochs=NUM_EPOCHS,
        shuffle=True,
        name='queue')

    reader = tf.WholeFileReader()
    filename, content = reader.read(queue, name='read_image')
    filename = tf.Print(
        filename,
        data=[filename],
        message='loading: ')
    filename_split = tf.string_split([filename], delimiter='/')
    label_id = tf.string_to_number(
        tf.substr(filename_split.values[1], 0, 1),
        out_type=tf.int32)
    label = tf.one_hot(
        label_id - 1,
        5,
        on_value=1.0,
        off_value=0.0,
        dtype=tf.float32)

    img_tensor = tf.image.decode_png(
        content,
        dtype=tf.uint8,
        channels=3,
        name='img_decode')

    # Preprocess the image: perform random transformations
    # Random flip
    img_tensor_flip = tf.image.random_flip_left_right(img_tensor)
    # Random brightness
    img_tensor_bri = tf.image.random_brightness(
        img_tensor_flip,
        max_delta=0.2)
    # Per-image scaling
    img_tensor_std = tf.image.per_image_standardization(img_tensor_bri)

    min_after_dequeue = 1000
    capacity = min_after_dequeue + 3 * BATCH_SIZE
    example_batch, label_batch = tf.train.shuffle_batch(
        [img_tensor_std, label],
        batch_size=BATCH_SIZE,
        shapes=[(IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CHANNELS), (NUM_CLASS,)],
        capacity=capacity,
        min_after_dequeue=min_after_dequeue,
        name='train_shuffle')
    return example_batch, label_batch
read_images takes a path to a directory as its argument. It takes care of the preprocessing steps, such as randomly flipping the images and shuffling the batches. The random transformations are there to prevent the network from overfitting.
This function is an implementation of an Input Pipeline in TensorFlow. The idea is: you create a producer, in this case string_input_producer; then you create a reader for each produced string; finally, you pass the results to another queue that handles the batching process, such as tf.train.shuffle_batch. Each time you run this graph, you get a batch of data.
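To make that concrete, here is a minimal sketch of how the pipeline is consumed (assuming TensorFlow 1.x; the data directory is hypothetical). Once the queue runners are started, every session.run() on the batch tensors dequeues a fresh batch:

import tensorflow as tf

example_batch, label_batch = read_images('data/')  # hypothetical directory

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # match_filenames_once and the epoch counter are local variables
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            images, labels = sess.run([example_batch, label_batch])
            # ... run the training op on this batch here ...
    except tf.errors.OutOfRangeError:
        pass  # the producer has gone through NUM_EPOCHS epochs
    finally:
        coord.request_stop()
        coord.join(threads)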
The next step is to define a loss function and the optimizer. I use cross entropy for the loss and Adam for the optimizer.
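That part of the code is not shown here, but a minimal sketch in TensorFlow 1.x would look like this (LEARNING_RATE is an assumed hyperparameter):

# Cross-entropy loss over the one-hot labels from read_images
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        labels=label_batch,
        logits=logits),
    name='loss')
# Adam optimizer minimizing the loss
train_op = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss)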
That’s it. You can see the full implementation on my GitHub.