How to Use the Excel MATCH Formula to Assign Histogram Bins

by Matthew Kuo on November 22, 2015

One of the most annoying things to have to do in Excel is writing a Nested IF statement. For those of you who haven’t done this before, a Nested IF is a long formula with multiple IF conditions chained inside one another. Its most common use is to assign values to histogram bins, that is, to designate where a value falls within a range of numbers. Below is an example of a very long Nested IF formula used to build histogram data:

MATCH Formula Histogram 01

I’ll go into detail about this process in a separate post, but just by looking at the formula, you can tell there are potential issues with using the Nested IF. Beyond being very long, complicated, and prone to mistakes, the formula is also very difficult to audit if you need to check it for errors. The colored highlighting of the referenced cells becomes almost useless because there are so many references to keep track of. Additionally, the formula does not scale well; if I wanted to add six more bins to this histogram, it would roughly double in length and complexity.
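To make the scaling problem concrete, here is a minimal Python sketch of the same one-branch-per-bin pattern. It is not the formula from the screenshot; the function name is hypothetical, and the bin edges (0, 10, 20, 30, 40, 50) are the ones used in the worked example later in this post. It also assumes the data contains no negative numbers.

```python
def assign_bin_nested_if(value):
    """Return a 1-based bin number using one branch per bin (Nested IF style)."""
    # Every additional bin needs another hand-written branch here.
    if value < 10:
        return 1
    elif value < 20:
        return 2
    elif value < 30:
        return 3
    elif value < 40:
        return 4
    elif value < 50:
        return 5
    else:
        return 6  # last bin has no maximum


print(assign_bin_nested_if(36))  # 4
```

Adding six more bins would mean six more branches, each with its own hard-coded edge, which is the same growth the Nested IF formula suffers from.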

The MATCH formula provides an alternative to the Nested IF statement.  There’s one key to making the MATCH formula work in this situation:

You need to use the MATCH formula’s approximate match option

This is really important because most of the time, when we perform a lookup with the MATCH formula, we’re looking for an exact match. Here, we’re specifically telling Excel NOT to look for an exact match.

The three key benefits of using MATCH over a Nested IF are:

  • The formula is much shorter and is therefore much easier to write
  • The formula is much easier to audit
  • The formula is scalable; if you add additional bins, the MATCH formula doesn’t get any longer or more complex

Next, we’ll go through an example where we use the MATCH formula to assign histogram bins to a small data set, in order to produce data that will populate a histogram chart. For the sake of simplicity, we’ll assume that our data consists only of positive numbers, and therefore, we won’t need to account for negative values within the histogram.

MATCH Formula Histogram 02

The MATCH Formula Approach

Step 1: Define Your Histogram Bins

The first step for building any histogram is to define your bins, which will represent the x-axis shown in the histogram example above.  This includes a number of different components:

  • Bin Number
  • Bin Minimum & Maximum
  • Bin Name
  • Histogram Count

MATCH Formula Histogram 03

Bin Number – your Bin Number is as simple as it sounds. It’s just an ascending count of all of your histogram bins. It’s important to note that, to use the MATCH function properly, this is a required component of the process. It is also required that you start numbering your bins from 1 (i.e., don’t start your count at 0).

Bin Minimum & Maximum – these values define the lower and upper bounds that a value must fall between to land in a particular bin. The distance between these values also implicitly defines your bin size. (For Bin 1, my minimum is 0, my maximum is 10, and therefore, my bin size is 10.) In general, all of your bins should be the same size, except potentially the first and last bins. Additionally, there should be no gaps between your bins; the maximum of one bin should be the minimum of the next bin.

For the MATCH formula to work, your bins must be sorted in ascending order, so start with your smallest bin first.  The last bin should not have a maximum, and in the example above, I’ve just put the “infinity” symbol.

Lastly, when defining your bins, you should also denote whether each minimum or maximum is a strict or non-strict inequality. For example, while numerically my minimum is 0 for the first bin, I want to include all values that are greater than or equal to 0 (non-strict inequality). My maximum for the same bin is 10, and I want to only include values that are less than 10 (strict inequality). Both of these are denoted in my field headings.

Bin Name – because histograms can have different definitions when it comes to the use of inequalities, it’s also a good idea to include a Bin Name to make your inclusion criteria explicit.  This will essentially be the label name along the x-axis that you show in your chart.

Histogram Count – the histogram count is just the count of values that landed in each of your bins.  This is the number that we will display in the chart.
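The bin components described above can also be sketched as a small data structure. The following Python example is only an illustration: `build_bins` and the label format are hypothetical, while the bin minimums (0, 10, 20, 30, 40, 50) come from the example in this post. It derives each maximum from the next bin's minimum, so there are no gaps, and it leaves the last bin open-ended.

```python
import math


def build_bins(minimums):
    """Build bin definitions from an ascending list of bin minimums."""
    if minimums != sorted(minimums):
        raise ValueError("bin minimums must be sorted in ascending order")
    bins = []
    for index, minimum in enumerate(minimums):
        is_last = index == len(minimums) - 1
        # Each maximum is the next bin's minimum; the last bin has no maximum.
        maximum = math.inf if is_last else minimums[index + 1]
        # Hypothetical label format that spells out the inequalities.
        name = f">= {minimum}" if is_last else f">= {minimum} and < {maximum}"
        bins.append(
            {"number": index + 1, "min": minimum, "max": maximum, "name": name}
        )
    return bins


bins = build_bins([0, 10, 20, 30, 40, 50])
for b in bins:
    print(b["number"], b["name"])
```

Bin numbers start at 1 and follow the sort order, which is what lets a position in the list of minimums double as a bin number.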

Before we move on, it’s important to note that building a histogram can be an iterative process, as you may not know what bin ranges and bin sizes make sense for your data until you’ve tried building it out. By using the MATCH formula approach instead of Nested IF, you’ll actually have a lot more flexibility in making changes after you’ve written your initial formula.

Step 2:  Load your Data Set

Load your data into a single vertical column (highlighted in yellow below) and add a field next to it for Bin Assignment.

MATCH Formula Histogram 04

Step 3: Write your Bin Assignment Formula using MATCH

The syntax for the MATCH formula is as follows:

= MATCH ( lookup_value , lookup_array , [match_type] )

Lookup Value – link to the first value of your data set.

Lookup Array – choose the array that represents your Bin Minimum

Match Type – enter 1 to have Excel perform an approximate match

MATCH Formula Histogram 05

How it Works

The fundamental purpose of the MATCH formula is to return the position of a value within an array.  Since we’ve selected the Bin Minimum column as our array, and that array has six numbers in it, we have six possible “positions” that can be returned to us (1, 2, 3, 4, 5, or 6).

Notice that these position numbers match exactly with the Bin Numbers we defined in the first step.  The way we’ve set this up, whatever value Excel returns will match exactly with the Bin Numbers we need.

Next, let’s look at the logic we employed when we told Excel to do an approximate match:

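Excel looks for the largest value within our selected array that is less than or equal to our lookup value. Because our bins are contiguous (there are no breaks between them), this logic works out to be exactly the same as looking up the bin minimum for each of our lookup values. The "or equal to" part matters: a value of exactly 30 lands in the bin whose minimum is 30, which matches the "greater than or equal to" rule we set for bin minimums.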

MATCH Formula Histogram 06

For 36, this number is 30 because among the numbers in the array (0, 10, 20, 30, 40, and 50), 30 is the largest number that is less than or equal to 36.

Once Excel has found the number to lock onto, it returns the position of that number (30) within the array we selected.

For the case of 30, that position number is 4, because it is the fourth cell down in the array we selected.

You’ve now properly assigned the number 36 to its appropriate histogram bin.
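For readers who think in code, the lookup can be modeled in a few lines of Python. This is a sketch of the behavior, not Excel's implementation: `match_approximate` is a hypothetical name, and it relies on the documented rule that MATCH with match_type 1 finds the largest value less than or equal to the lookup value in an ascending array, and returns #N/A (modeled here as an exception) when the lookup value is smaller than every entry.

```python
from bisect import bisect_right


def match_approximate(lookup_value, lookup_array):
    """Mimic MATCH(lookup_value, lookup_array, 1) for an ascending array."""
    # bisect_right counts how many entries are <= lookup_value, which is
    # exactly the 1-based position of the largest such entry.
    position = bisect_right(lookup_array, lookup_value)
    if position == 0:
        # Excel would show #N/A in this case.
        raise ValueError("lookup_value is below the smallest bin minimum")
    return position


bin_minimums = [0, 10, 20, 30, 40, 50]
print(match_approximate(36, bin_minimums))  # 4
print(match_approximate(30, bin_minimums))  # 4
```

Note that 30 itself also lands in bin 4, since a bin minimum is included in its own bin.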

Wrapping Up the Histogram Build

Take these last few steps to finish out building your histogram.

Step 4: Copy the MATCH Formula Down for Your Data Set

Remember to lock your bin minimum array reference (make it an absolute reference with $ signs) so that it doesn’t shift as the formula is copied. Then just copy the formula down for your data set.

MATCH Formula Histogram 07
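Copying the formula down amounts to applying the same lookup to every value against one fixed list of bin minimums. A minimal Python sketch of that idea follows; the data values are made up for illustration and are not the data set from the screenshots, and the constant `BIN_MINIMUMS` plays the role of the locked array reference.

```python
from bisect import bisect_right

# Fixed for every row, like a locked (absolute) array reference.
BIN_MINIMUMS = [0, 10, 20, 30, 40, 50]

# Hypothetical data set, for illustration only.
data_set = [36, 4, 52, 10, 29, 47, 18, 36]

# One bin assignment per value; only the lookup value changes from row to row.
bin_assignments = [bisect_right(BIN_MINIMUMS, value) for value in data_set]

for value, bin_number in zip(data_set, bin_assignments):
    print(value, bin_number)
```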

Step 5: Use the COUNTIF Formula to Get Your Histogram Counts

After you’ve come up with all of the histogram bins for each of your data set values, all you need to do now is count how many times each bin number has appeared.  To do that, we use the COUNTIF formula:

= COUNTIF ( range , criteria )

Range – select the entire Bin Assignment column for your data set and make sure to reference lock it

Criteria – just select the bin number associated with your current row

MATCH Formula Histogram 08

After you’ve written this formula once, copy it down to the rest of the bins.
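The counting step is a plain tally of bin numbers. As a rough Python equivalent, assuming a hypothetical list of bin assignments, `collections.Counter` does the job of COUNTIF, and looking up each bin number in turn mirrors copying the formula down the bin table.

```python
from collections import Counter

bin_numbers = [1, 2, 3, 4, 5, 6]

# Hypothetical bin assignments, for illustration only.
bin_assignments = [4, 1, 6, 2, 3, 5, 2, 4]

tally = Counter(bin_assignments)

# One count per bin; a Counter returns 0 for bins that never appear,
# just as COUNTIF returns 0 when nothing matches.
histogram_counts = {number: tally[number] for number in bin_numbers}

print(histogram_counts)
```

Iterating over `bin_numbers` rather than over the tally keeps empty bins in the output, so the histogram still shows every bin.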

Step 6: Build Your Histogram Chart

The very last step is to add your histogram chart to your data.  If you’re just trying to get a quick view of how your data is distributed, and don’t need to actually see a chart, one simple alternative is to use conditional formatting with data bars.

Select your histogram data set.  Then, within the “Home” tab of the ribbon, select:

Conditional Formatting > Data Bars > Gradient Fill

MATCH Formula Histogram 09

This will essentially produce the same visual output as a full histogram chart, except your visual will be flipped on its side.

MATCH Formula Histogram 10
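The same sideways view can be approximated outside Excel with text bars. The sketch below is an assumption-laden illustration: `render_data_bars` is a hypothetical helper, the counts are made up, and scaling every bar relative to the largest count is a simplification chosen for this example.

```python
def render_data_bars(counts, width=20):
    """Return one text line per bin, with a bar scaled to the largest count."""
    if not counts:
        return []
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        # Scale each bar relative to the largest count; avoid dividing by zero.
        bar_length = round(width * count / largest) if largest else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * bar_length} {count}")
    return lines


# Hypothetical histogram counts, for illustration only.
example_counts = {"Bin 1": 1, "Bin 2": 2, "Bin 3": 1, "Bin 4": 2}
for line in render_data_bars(example_counts, width=10):
    print(line)
```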

Conclusion

For many Excel users, writing a Nested IF formula is one of the first complex formula writing tasks that we learn to perform. While it’s still beneficial to learn this process, Nested IF formulas have serious flaws that make them difficult to use. A simpler way to get the same result as a Nested IF is to use the MATCH formula, and particularly, the MATCH formula’s approximate match option. This is one of the best uses of the MATCH formula in Excel, and using this approach will make your formula shorter, easier to audit, and scalable if you need a lot of bins.

(Note: You can also use VLOOKUP’s range lookup as a third approach)
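As a rough illustration of that third approach, here is a Python sketch of how an approximate-match VLOOKUP behaves. It assumes a table sorted ascending by its first column; `vlookup_approximate` and the "Bin N" labels are hypothetical. The matching rule is the same as MATCH's approximate match, but the result is a value from another column instead of a position.

```python
from bisect import bisect_right


def vlookup_approximate(lookup_value, table, col_index):
    """Mimic VLOOKUP(lookup_value, table, col_index, TRUE) on a sorted table."""
    first_column = [row[0] for row in table]
    # Find the last row whose first-column value is <= lookup_value.
    position = bisect_right(first_column, lookup_value)
    if position == 0:
        # Excel would show #N/A in this case.
        raise ValueError("lookup_value is below the smallest bin minimum")
    # col_index is 1-based, as in Excel.
    return table[position - 1][col_index - 1]


# Bin minimum in column 1, hypothetical bin label in column 2.
bin_table = [
    (0, "Bin 1"),
    (10, "Bin 2"),
    (20, "Bin 3"),
    (30, "Bin 4"),
    (40, "Bin 5"),
    (50, "Bin 6"),
]

print(vlookup_approximate(36, bin_table, 2))  # Bin 4
```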
