Pixel-wise Skin Segmentation

21 October 2017
Yehya Abouelnaga, Hesham M. Eraqi, Mohamed N. Moustafa
The American University in Cairo

Introduction

Skin segmentation proved to be a challenging problem in [1] because of the varying lighting conditions. We use a Multivariate Gaussian Naive Bayes classifier to build a pixel-wise skin segmentation model that operates in the HSV color space. Our model is very similar to [2], except that we model each class with a normal distribution instead of a histogram.

\begin{equation} \begin{split} \text{Model}(x) & = \frac{\mathbb{P}(skin \mid x)}{\mathbb{P}(skin \mid x) + \mathbb{P}(non\text{-}skin \mid x)} \\
& = \frac{\mathbb{P}(x \mid skin)}{\mathbb{P}(x \mid skin) + \mathbb{P}(x \mid non\text{-}skin)} \end{split} \end{equation}

Note that the second equality assumes $\mathbb{P}(skin) = \mathbb{P}(non\text{-}skin)$ (i.e. we don’t make any assumptions about the prevalence of skin pixels in the image).
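
The step from posteriors to likelihoods can be spelled out with Bayes' rule. The short derivation below is a sketch that assumes equal class priors, $\mathbb{P}(skin) = \mathbb{P}(non\text{-}skin)$; the evidence $\mathbb{P}(x)$ appears in every term and cancels first, and the equal priors cancel next:

\begin{equation} \begin{split} \text{Model}(x) & = \frac{\mathbb{P}(x \mid skin)\,\mathbb{P}(skin)}{\mathbb{P}(x \mid skin)\,\mathbb{P}(skin) + \mathbb{P}(x \mid non\text{-}skin)\,\mathbb{P}(non\text{-}skin)} \\
& = \frac{\mathbb{P}(x \mid skin)}{\mathbb{P}(x \mid skin) + \mathbb{P}(x \mid non\text{-}skin)} \end{split} \end{equation}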

We trained our model using the UCI Skin Segmentation dataset [3]. The parameters $\mu_{skin}$, $\Sigma_{skin}$, $\mu_{non\text{-}skin}$, and $\Sigma_{non\text{-}skin}$ are estimated from the labeled skin/non-skin values in [3]. This gives us one normal distribution for skin pixels and one for non-skin pixels. To apply the model (i.e. to skin-segment an image), we transform the RGB values of each pixel, $x$, in the input image to HSV and feed them to the model. The result is a probability heat map of skin in the image. We classify a pixel as “skin” when $\text{Model}(x)$ exceeds a chosen threshold. Then, we cluster the skin pixels into objects and remove those with a small number of pixels. In other words, we expect neither faces nor hands to consist of a very small number of pixels.
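
The estimation and scoring steps described above can be sketched in plain NumPy. This is a minimal illustration rather than the TensorFlow implementation: the sample arrays are synthetic stand-ins for the labeled data, the helper names are hypothetical, and the 0.5 threshold is an arbitrary placeholder.

```python
import numpy as np

def fit_gaussian(samples):
    """Return the mean and (biased) covariance of an (N, 3) sample array."""
    mu = samples.mean(axis=0)
    centered = samples - mu
    sigma = centered.T @ centered / samples.shape[0]
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Evaluate the multivariate normal density at each row of x."""
    d = mu.shape[0]
    centered = x - mu
    inv = np.linalg.inv(sigma)
    # Squared Mahalanobis distance of every row to the mean.
    maha = np.einsum("ij,jk,ik->i", centered, inv, centered)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(-0.5 * maha) / norm

def skin_probability(x, skin, non_skin):
    """Model(x) under equal priors: p(x|skin) / (p(x|skin) + p(x|non-skin))."""
    p_skin = gaussian_pdf(x, *skin)
    p_non = gaussian_pdf(x, *non_skin)
    return p_skin / (p_skin + p_non)

# Hypothetical, synthetic "HSV" samples standing in for the labeled data.
rng = np.random.default_rng(0)
skin_samples = rng.normal([0.05, 0.5, 0.8], 0.05, size=(500, 3))
non_skin_samples = rng.normal([0.6, 0.3, 0.4], 0.05, size=(500, 3))

skin = fit_gaussian(skin_samples)
non_skin = fit_gaussian(non_skin_samples)

# Score two pixels: one at each synthetic class center.
pixels = np.array([[0.05, 0.5, 0.8], [0.6, 0.3, 0.4]])
probs = skin_probability(pixels, skin, non_skin)
mask = probs > 0.5  # 0.5 is a placeholder threshold, not a value from the text
```

On an image, `pixels` would be the HSV values reshaped to `(height * width, 3)`, and `probs` reshaped back to `(height, width)` gives the heat map.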

Implementation

We implemented the skin segmentation in TensorFlow so that we could integrate it directly with our TensorFlow-based deep learning models (i.e. AlexNet and InceptionV3). Porting the code to TensorFlow also enabled us to train and test the skin segmentation on our GPUs.

Model Definition

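import tensorflow as tf
# Assumption: tf_dist aliases the contrib distributions module of the
# pre-1.0 TensorFlow API that this code targets (tf.sub, tf.div, .pdf).
from tensorflow.contrib import distributions as tf_dist

# Pixels to classify (one HSV triple per row) and their ground-truth labels.
X = tf.placeholder(tf.float32, shape=(None, 3), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")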

def ClassDistribution(scope):
    """Build the graph that fits one class-conditional Gaussian."""
    with tf.name_scope(scope):
        # Training samples of this class, one HSV pixel per row.
        X = tf.placeholder(tf.float32, shape=(None, 3), name="X")
        sample_size = tf.cast(tf.shape(X)[0], dtype=tf.float32, name="sample_size")
        # Maximum-likelihood estimates of the mean and covariance.
        Mu = tf.reduce_mean(X, 0, name="Mu")
        X_norm = tf.sub(X, Mu, name="X_norm")
        Sigma = tf.div(tf.matmul(tf.transpose(X_norm), X_norm), sample_size, name="Sigma")
        Dist = tf_dist.MultivariateNormalFull(Mu, Sigma, name="Dist")

        return {
            'X': X,
            'SampleSize': sample_size,
            'Mu': Mu,
            'Sigma': Sigma,
            'X_Normalized': X_norm,
            'Dist': Dist
        }

Skin = ClassDistribution("Skin")
NonSkin = ClassDistribution("NonSkin")

Pdf_Skin = Skin['Dist'].pdf(X)
Pdf_NonSkin = NonSkin['Dist'].pdf(X)
Tot_Pdf = Pdf_Skin + Pdf_NonSkin

Prob_Skin = Pdf_Skin / Tot_Pdf
Prob_NonSkin = Pdf_NonSkin / Tot_Pdf

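Pred = Prob_NonSkin # as Label(NonSkin) = 1, Label(Skin) = 0
# Pred has shape (N,) while y has shape (N, 1); flatten y so that
# tf.equal compares element-wise instead of broadcasting to (N, N).
correct_prediction = tf.equal(tf.round(Pred), tf.reshape(y, [-1]))
Accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))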
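
One practical caveat of dividing raw densities: for pixels far from both class means, both densities can underflow to zero and the ratio becomes `0 / 0`. The NumPy sketch below shows one possible alternative, which is not part of the original implementation: compute log-densities and express the equal-prior posterior as a sigmoid of their difference. The function names and parameter values are hypothetical and chosen only to trigger the underflow.

```python
import numpy as np

def log_gaussian_pdf(x, mu, sigma):
    """Log-density of a multivariate normal at each row of x."""
    d = mu.shape[0]
    centered = x - mu
    maha = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(sigma), centered)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (maha + d * np.log(2.0 * np.pi) + logdet)

def skin_probability_stable(x, skin, non_skin):
    """Equal-prior posterior computed as a sigmoid of the log-likelihood gap."""
    gap = log_gaussian_pdf(x, *non_skin) - log_gaussian_pdf(x, *skin)
    # Clip so that np.exp stays within float64 range.
    return 1.0 / (1.0 + np.exp(np.clip(gap, -700.0, 700.0)))

# Hypothetical parameters: two tight Gaussians and a pixel far from both.
skin = (np.array([0.1, 0.5, 0.8]), np.eye(3) * 1e-4)
non_skin = (np.array([0.6, 0.3, 0.4]), np.eye(3) * 1e-4)
far_pixel = np.array([[5.0, 5.0, 5.0]])

# Naive ratio: both densities underflow to 0.0, so the result is NaN.
naive_skin = np.exp(log_gaussian_pdf(far_pixel, *skin))
naive_non = np.exp(log_gaussian_pdf(far_pixel, *non_skin))
with np.errstate(invalid="ignore"):
    naive = naive_skin / (naive_skin + naive_non)

# Log-domain version: still a valid probability.
stable = skin_probability_stable(far_pixel, skin, non_skin)
```

Here the pixel is slightly closer to the skin mean, so the log-domain version returns a probability close to 1 while the naive ratio is undefined.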

Model Training

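# Assumptions: `cl` is matplotlib.colors, and X_Skin / X_NonSkin hold the
# dataset rows of each class (presumably reordered by [2,1,0] into R, G, B).
import matplotlib.colors as cl

with tf.Session() as sess: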
    Skin_Mu, Skin_Sigma, NonSkin_Mu, NonSkin_Sigma = sess.run([
        Skin['Mu'], Skin['Sigma'], NonSkin['Mu'], NonSkin['Sigma']
    ], feed_dict={
        Skin['X']: cl.rgb_to_hsv(X_Skin[:, [2,1,0]]),
        NonSkin['X']: cl.rgb_to_hsv(X_NonSkin[:, [2,1,0]])
    })
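
The `[:, [2,1,0]]` indexing above reverses the column order before the HSV conversion, which suggests the dataset rows are stored as B, G, R (an inference from the code, not something the text states). For a single sample, the same reorder-then-convert step can be sketched with the standard library. The helper name is hypothetical, and scaling the 8-bit values to [0, 1] is an assumption of this sketch.

```python
import colorsys

def bgr255_to_hsv(b, g, r):
    """Convert one 8-bit B, G, R triple to HSV with all components in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)

# A pure-red pixel stored in B, G, R order.
hsv_red = bgr255_to_hsv(0, 0, 255)  # -> (0.0, 1.0, 1.0)
```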

Model Usage

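from functools import reduce

import numpy as np
from scipy.ndimage import label, labeled_comprehension

def process_image(fltr, img, threshold):
    # Label connected regions of pixels whose skin probability exceeds the threshold.
    lbl, nlbl = label(fltr > threshold)
    lbls = np.arange(1, nlbl+1)
    # Count the non-zero probability values inside each labeled region.
    objs = labeled_comprehension(fltr, lbl, lbls, np.count_nonzero, int, 0)
    # Keep the regions that contain more than one such pixel.
    main_objs = np.arange(0, objs.shape[0])[objs>1]
    score = main_objs.sum()
    # Merge the kept regions into one mask (labels are 1-based, hence curr+1).
    top_3_lbls = reduce(lambda prev, curr: prev + (lbl == curr+1).astype(int), main_objs, np.zeros(lbl.shape))
    img_w_lbls = np.where(top_3_lbls[:,:, np.newaxis].astype(bool), img, np.zeros(img.shape)).astype(np.uint8)
    return lbl, nlbl, objs, main_objs, score, top_3_lbls, img_w_lbls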

with tf.Session() as sess:
    img = imread('/path/to/my/image.png')
    fltrs = sess.run(Prob_Skin, feed_dict={
        X: cl.rgb_to_hsv(img).reshape((-1, 3)),
        Skin['Mu']: Skin_Mu,
        Skin['Sigma']: Skin_Sigma,
        NonSkin['Mu']: NonSkin_Mu,
        NonSkin['Sigma']: NonSkin_Sigma
    }).reshape(img.shape[:2])  # one probability per pixel: (height, width)

    lbl, nlbl, objs, main_objs, score, top_3_lbls, img_w_lbls = process_image(
        fltrs, img, threshold
    )
    imsave(img_path.replace('.original.', '.segmented.'), img_w_lbls)
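
The small-object removal step can also be shown in isolation. The sketch below is a simplified variant, not the `process_image` function above: it thresholds nothing and only drops connected components smaller than a hypothetical `min_pixels` parameter, using `scipy.ndimage.label` with its default 4-connectivity.

```python
import numpy as np
from scipy import ndimage

def remove_small_objects(mask, min_pixels):
    """Keep only connected components with at least `min_pixels` pixels."""
    labels, count = ndimage.label(mask)
    # sizes[k] is the pixel count of component k; index 0 is the background.
    sizes = np.bincount(labels.ravel(), minlength=count + 1)
    keep = sizes >= min_pixels
    keep[0] = False  # never keep the background
    return keep[labels]

# A toy mask with one 4-pixel object and one isolated pixel.
mask = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
], dtype=bool)
cleaned = remove_small_objects(mask, min_pixels=2)
```

In `cleaned`, the 2x2 block survives and the isolated pixel at row 1, column 4 is removed.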

Results

The following are skin-segmented images produced by our pixel-wise skin segmentation implementation. The model is sensitive to lighting conditions and clothing colors. Adding more context-based information would probably help us improve our skin prediction accuracy.

References

  [1] Y. Abouelnaga, H. M. Eraqi, and M. N. Moustafa, “Real-time Distracted Driver Posture Classification,” arXiv preprint arXiv:1706.09498, 2017.
  [2] S. L. Phung, A. Bouzerdoum, and D. Chai, “Skin segmentation using color and edge information,” in Signal Processing and Its Applications, 2003. Proceedings. Seventh International Symposium on, 2003, vol. 1, pp. 525–528.
  [3] R. Bhatt and A. Dhall, “Skin segmentation dataset,” UCI Machine Learning Repository, 2010.