Phone Vision 18 – Erosion and Dilation

23 03 2011

Our exploration of set theory last time was a precursor to discussing dilation and erosion.  Simply put, dilation makes things bigger and erosion makes things smaller.  Seems obvious right?  Let’s see if the mathematicians can muck it up.

Translation

This is not necessarily a fundamental set operation, but it is critical for what we will be doing. Translation can be described as shifting the pixels in the image. We can move it in both the x and y directions and we will represent the amount of movement using a point. To translate the set then, we simply add that point to every element of the set. For instance, using the point (20, 20) the puzzle is moved down 20 pixels and right 20 pixels like this:

image

The light blue area marks the original location.

Dilation

Dilation is an extension of translation from a point to a set of points. If you union the sets created by each translation you end up with dilation.  In the following diagram, the black pixels represent the original puzzle piece.  The light blue halo represents several translations of the original.  Each translation is translucent so I have tried to make the image large enough so you can see the overlap.  Notice how the hole is missing?

image

The set of points that were used to create the translated sets is called the structuring element.  A common structuring element is

{ (-1,-1), (0,-1), (1,-1),(-1,0), (0,0), (1,0),(-1,1), (0,1), (1,1) }

This will expand the image by one pixel in every direction.  If we think about this as a binary image, it might be represented by this:

image

Kinda reminds me of one of our convolution kernels.  Structuring elements are often called “probes”.  This one is centered at (0,0).

Using the PixelSet we created last time, the code is pretty straightforward.

public void Dilate(PixelSet structuringElem)

{

    PixelSet dilatedSet = new PixelSet();

    foreach(int translateY in structuringElem._pixels.Keys)

    {

        foreach(int translateX in structuringElem._pixels[translateY])

        {

            PixelSet translatedSet = new PixelSet();

            foreach (int y in _pixels.Keys)

            {

                foreach(int x in _pixels[y])

                {

                    translatedSet.Add(x + translateX, y + translateY);

                }

            }

            dilatedSet.Union(translatedSet);

        }

    }

    _pixels = new Dictionary<int,List<int>>(dilatedSet._pixels);

}

Here is the puzzle piece dilated by one pixel in all directions.  The hole is gone.

image

Erosion

Erosion is similar to dilation except that our translation is subtraction instead of addition and we are finding the intersection instead of the union.  Here is an image that has been eroded by 10 pixels in all directions.  The light blue area is the original puzzle.

 

image

 

public void Erode(PixelSet structuringElem)

{

    PixelSet erodedSet = new PixelSet(this);

    foreach (int translateY in structuringElem._pixels.Keys)

    {

        foreach (int translateX in structuringElem._pixels[translateY])

        {

            PixelSet translatedSet = new PixelSet();

            foreach (int y in _pixels.Keys)

            {

                foreach (int x in _pixels[y])

                {

                    translatedSet.Add(x – translateX, y – translateY);

                }

            }

            erodedSet.Intersect(translatedSet);

        }

    }

    _pixels = new Dictionary<int,List<int>>(erodedSet._pixels);

}

 

Here is the puzzle eroded by one pixel in all directions.  That hole is a little bigger.

image

Summary

Like I said at the beginning, dilation makes things bigger and erosion makes things smaller.  Interestingly, erosion is simply the dilation of the background and dilation is the erosion of the background.

Download Code

http://cid-88e82fb27d609ced.office.live.com/embedicon.aspx/Blog%20Files/PhoneVision/PhoneVision%2018%20-%20Erosion%20and%20Dilation.zip

Up Next: Opening and Closing





Phone Vision 17–Sets

16 03 2011

Last time we were able to get a pretty decent binary image.  The question is, what do we do with it?  Before we can answer that, we need to make sure you’ve got a basic understanding of set theory.

Sets

In a binary image we can think of the objects of interest as a set of points.  Here is our binary image from last time:

image

I am going to focus on just the top puzzle piece so let’s extract it.  (The code for extracting this is not important for this lesson, but it is included in the code as GetPuzzleImage).

image

We are going to be working with this image for a while.  In this case, the black pixels will represent members of our set.

Universe

The universe is the set of all possibilities.  In the case of image processing this is typically the set of all of the pixels.  I have seen this referred to as E2 (for 2-dimensional Euclidean space).  In our case, the universe is…

image

As expected, the set of every possibility is all black.  This is important some operations rely on this set.

Empty Set

The empty set is, well, empty.

image

Since the boundary for our image is not a specified size, this is just an empty image. If the universe was a specified size then the empty set would often be represented by a completely white image.

Complement

The complement is everything in the universe that is not in the set.  The complement of our puzzle piece:

image

The puzzle piece is now a hole in the black image.

Union (⋃)

The union is the set of all elements in either of two sets.  If you have been playing along, this is very similar to the | (binary OR) operation mentioned in lesson three. Below we union the puzzle set with the set of its complement.  The output is the universe.

image

image

image

Intersection (⋂)

Finally, the intersection is the set of all shared members of two sets.  Below we intersect the puzzle set with its complement.  This is similar to the & (binary AND) operations mentioned way back in lesson 2.  Since the complement of a set  doesn’t share any elements with the original set the output is going to be the empty set.

imageimageimage

Pixel Set

In order to demonstrate these concepts (and a few in the future) I created a class called PixelSet that behaves similar to a mathematical set. 

Caution: This set is not optimized for use in actual applications.  You’ve been warned.

Elements

The elements of the set are represented by a Dictionary.

Dictionary<int, List<int>> _pixels = new Dictionary<int, List<int>>();

The key is the y-value for the pixel and the List<int> represents all x-values in that row.

 

Add

 

public void Add(int x, int y)

{

    List<int> columns = new List<int>();

    if (!_pixels.ContainsKey(y))

    {

        _pixels.Add(y, columns);

    }

    else

    {

        columns = _pixels[y];

    }

 

    if (!columns.Contains(x))

    {

        columns.Add(x);

    }

}

 

The add function is a little tricky because if the row (y-value) doesn’t have a representative in the set yet then we need to create a new list to contain them. If it does, then we need to use the existing list.

 

Remove

 

public void Remove(int x, int y)

{

    if (_pixels.ContainsKey(y))

    {

        if (_pixels[y].Contains(x))

        {

            _pixels[y].Remove(x);

            if (_pixels[y].Count == 0)

            {

                _pixels.Remove(y);

            }

        }

    }

}

 

Remove is included for completeness, but I’m not calling it from any location in the code.

 

Union

 

public void Union(PixelSet otherSet)

{

    foreach (int key in otherSet._pixels.Keys)

    {

        foreach (int col in otherSet._pixels[key])

        {

            Add(col, key);

        }

    }

}

 

A possible optimization here is to loop through the smallest set.  You would need to maintain a count, but since new members are added through Add that should be easy.

 

Intersect

 

public void Intersect(PixelSet otherSet)

{

    PixelSet intersection = new PixelSet();

    foreach (int key in otherSet._pixels.Keys)

    {

        if (_pixels.ContainsKey(key))

        {

            foreach (int col in otherSet._pixels[key])

            {

                if (_pixels[key].Contains(col))

                {

                    intersection.Add(col, key);

                }

            }

        }

    }

    _pixels = new Dictionary<int, List<int>>(intersection._pixels);

}

 

Once again you could change this to loop through the smallest set to speed this up.

 

Summary
 

I expect this will have been a review for many readers, but it’s important to have a grasp of this information before we head into the next lesson regarding erosion and dilation.  I’d also like to reiterate that this is not optimized for image processing.  My HD7 takes almost 8 times longer to execute some of these tight loops.

 

Download Code

http://cid-88e82fb27d609ced.office.live.com/embedicon.aspx/Blog%20Files/PhoneVision/PhoneVision%2017%20-%20Set%20Operations.zip

Up Next: Erosion and Dilation





Phone Vision 16 – Binary Images

9 03 2011

When we are creating a binary image we are defining objects of interest.  Defining these objects is very specific to the task at hand.  In some cases a simple threshold will work while others require a more complex approach such as color segmentation, edge detection, or something even more sinister.

Thresholding

The most common way to create a binary image from a grayscale image is to pick an intensity threshold (also known as gray-level segmentation).  Everything below the threshold becomes a pixel of interest (black).  Everything else is not (white).  Let’s explore a few methods for picking that threshold.

The Image

Interestingly, this picture is a white piece of poster board (not blue) with some dark puzzle pieces on it.  If you look closely you can see the famous HD7 pink tint.  The image on the right is simply converted to grayscale using the average of the RGB values.

imageimage

Mean Threshold

Almost every time an image processing problem arises and a simple average is a possible answer my gut tells me to go for it.  In this case, my gut is going to lead me astray, but first some code:

WriteableBitmap ToAverageBinary(WriteableBitmap grayscale)

{

    WriteableBitmap binary =

        new WriteableBitmap(grayscale.PixelWidth,

            grayscale.PixelHeight);

 

 

    int[] histogramData = new int[256];

    int maxCount = 0;

 

    //first we will determine the histogram

    //for the grayscale image

    for (int pixelIndex = 0;

        pixelIndex < grayscale.Pixels.Length;

        pixelIndex++)

    {

        byte intensity = (byte)grayscale.Pixels[pixelIndex];

 

        //simply add another to the count

        //for that intensity

        histogramData[intensity]++;

 

        if (histogramData[intensity] > maxCount)

        {

            maxCount = histogramData[intensity];

        }

    }

 

    //now we need to figure out the average intensity

    long average = 0;

    for (int intensity = 0; intensity < 256; intensity++)

    {

        average += intensity * histogramData[intensity];

    }

 

    //this is our threshold

    average /= grayscale.Pixels.Length;

 

    for (int pixelIndex = 0;

        pixelIndex < grayscale.Pixels.Length;

        pixelIndex++)

    {

        byte intensity = (byte)grayscale.Pixels[pixelIndex];

 

        //now we’re going to set the pixels

        //greater than or equal to the average

        //to white and everything else to black

        if (intensity >= average)

        {

            intensity = 255;

            unchecked { binary.Pixels[pixelIndex] = (int)0xFFFFFFFF; }

        }

        else

        {

            intensity = 0;

            unchecked { binary.Pixels[pixelIndex] = (int)0xFF000000; }

        }

    }

 

    //note that this is the ORIGINAL histogram

    //not the histogram for the binary image

    PlotHistogram(histogramData, maxCount);

 

    //this is a line to show where the

    //average is relative to the rest of

    //the histogram

    Line averageLine = new Line();

    averageLine.X1 = HistogramPlot.Width * average / 255;

    averageLine.X2 = HistogramPlot.Width * average / 255;

    averageLine.Y1 = 0;

    averageLine.Y2 = HistogramPlot.Height;

    averageLine.Stroke = new SolidColorBrush(Colors.Magenta);

    averageLine.StrokeThickness = 2;

    averageLine.StrokeDashArray = new DoubleCollection() { 5, 2.5 };

    HistogramPlot.Children.Add(averageLine);

    return binary;

}

image

 

While the puzzle pieces came through loud and clear so did a lot of garbage.  Notice how the average line (magenta) splits the peak?  This is going to lead to a large number of false black pixels.  What we want is an automated technique for shifting the threshold left.

 

Two Peak

 

If you look at the original grayscale image you’ll notice that the puzzle pieces seem to be close to the same intensity while the background is another. This is represented in the histogram by the large peak (the background) and an almost imperceptible peak toward the dark end of the spectrum (the puzzle pieces).

 

Finding the large peak is trivial.  It’s just the highest occurring intensity.  The small peak on the other hand is a little trickier.  It’s not the second most frequent intensity – that’s right next to the largest peak.  A little trick is to give a higher weight to intensities that are far from the highest peak.  To do this we multiply the intensity count at each point by the square of its distance to the peak.  That’s a mouthful, but it’s pretty easy.  Here’s the code:

 

int secondPeak = 0;

long secondPeakCount = 0;

for (int intensity = 0; intensity < 256; intensity++)

{

    long adjustedPeakCount =

        (long)(histogramData[intensity]*Math.Pow(intensity-maxPeak, 2));

    if (adjustedPeakCount > secondPeakCount)

    {

        secondPeak = intensity;

        secondPeakCount = adjustedPeakCount;

    }

}

So we calculate an adjusted count for each intensity.  By multiplying it by the square of the distance we give higher counts to those further from the first peak.  Amazingly, this works even in this case where the second peak is so small.

image

Wow!  Those results are so much better.  Notice I marked the two peaks with green and the new threshold is in magenta.

Summary

Don’t use the mean.  You might get lucky in a few cases, but in general you’ll end up in a messy situation.  That said, the two-peak solution will work well in cases where your objects are of a consistent color and that color is separate from the background.  I created this photo because I knew it would have good results.  I highly recommend you try these techniques out on your own photos so you can get a feel for when they will work and when they won’t.

We will most likely revisit different thresholding techniques for different situations when they arise, but for now we have a binary image so we’re going to see what we can do with it.

Download Code

http://cid-88e82fb27d609ced.office.live.com/embedicon.aspx/Blog%20Files/PhoneVision/PhoneVision%2016%20-%20Binary%20Images.zip

Up Next: Sets