One of the most annoying things to have to do in Excel is writing a Nested IF statement. For those of you that haven’t done this before, a Nested IF is basically a long formula with multiple IF conditions that you need to account for. The most common usage of this formula is to assign values into histogram bins, or essentially, designate where a value exists within a range of numbers. Below is an example of a very long Nested IF formula used to build histogram data:
I’ll go into detail about this process in a separate post, but just by looking at the formula, you can tell there are potential issues with using the Nested IF. Beyond being very long, complicated, and prone to mistakes, the formula is also very difficult to audit if you need to check it for errors. The colored highlighting of the cells becomes almost useless because there are so many references you need to deal with. Additionally, the formula does not scale well; if I wanted to add six more bins to this histogram, it would double in terms of length and complexity.
The MATCH formula provides an alternative to the Nested IF statement. There’s one key to making the MATCH formula work in this situation:
You need to use the MATCH formula’s approximate match option
This is really important because 99% of the time we want to perform a lookup with the MATCH formula, we’re looking for an exact match. Here, we’re specifically telling Excel NOT to look for an exact match.
The three key benefits of using MATCH over a Nested IF are:
- The formula is much shorter and is therefore much easier to write
- The formula is much easier to audit
- The formula is scalable; if you add additional bins, the MATCH formula doesn’t get any longer or more complex
Next, we’ll go through an example where we use the MATCH formula to assign histogram bins to a small data set, in order to produce data that will populate a histogram chart. For the sake of simplicity, we’ll assume that our data is only comprised of positive numbers, and therefore, we won’t need to account for negative values within the histogram.
The MATCH Formula Approach
Step 1: Define Your Histogram Bins
The first step for building any histogram is to define your bins, which will represent the x-axis shown in the histogram example above. This includes a number of different components:
- Bin Number
- Bin Minimum & Maximum
- Bin Name
- Histogram Count
Bin Number – your Bin Number is as simple as it sounds. It’s just an ascending count of all of your histogram bins. It’s important to note that, to use the MATCH function properly, this is a required component of the process. It is also required that you start numbering your bins from the number 1. (ie don’t start your count at 0)
Bin Minimum & Maximum – these values define the minimum and maximum range for a value to fall into a particular bin. The distance between these values will also implicitly define your bin size. (For Bin 1, my minimum is 0, my maximum is 10, and therefore, my bin size is 10) In general, all of your bins should be the same size, except for the potentially the first and last bins. Additionally, there should be no spaces between your bins; the maximum of one bin should be the minimum of the next bin.
For the MATCH formula to work, your bins must be sorted in ascending order, so start with your smallest bin first. The last bin should not have a maximum, and in the example above, I’ve just put the “infinity” symbol.
Lastly, when defining your bins, you should also denote whether each floor or maximum is strict or non-strict inequality. For example, while numerically my minimum is 0 for the first bin, I want to include all values that are greater than or equal to 0 (non-strict inequality). My maximum for the same bin is 10, and I want to only include values that are less than 10 (strict inequality). Both of these are denoted in my field headings.
Bin Name – because histograms can have different definitions when it comes to the use of inequalities, it’s also a good idea to include a Bin Name to make your inclusion criteria explicit. This will essentially be the label name along the x-axis that you show in your chart.
Histogram Count – the histogram count is just the count of values that landed in each of your bins. This is the number that we will display in the chart.
Before we move on, it’s important to note that building a histogram can be an iterative process, as you may not know what bin ranges and bins sizes make sense for your data until you’ve tried building it out. By using the MATCH formula approach instead of Nested IF, you’ll actually have a lot more flexibility in making changes after you’ve written your initial formula.
Step 2: Load your Data Set
Load your data into a single vertical column (highlighted in yellow below) and add a field next to it for Bin Assignment.
Step 3: Write your Bin Assignment Formula using MATCH
The syntax for the MATCH formula is as follows:
= MATCH ( lookup_value , lookup_array , [match_type] )
Lookup Value – link to the first value of your data set.
Lookup Array – choose the array that represents your Bin Minimum
Match Type – enter 1 to have Excel perform an approximate match
How it Works
The fundamental purpose of the MATCH formula is to return the position of a value within an array. Since we’ve selected the Bin Minimum column as our array, and that array has six numbers in it, we have six possible “positions” that can be returned to us (1, 2, 3, 4, 5, or 6).
Notice that these position numbers match exactly with the Bin Numbers we defined in the first step. The way we’ve set this up, whatever value Excel returns will match exactly with the Bin Numbers we need.
Next, let’s look at the logic we employed when we told Excel to do an approximate match:
Excel looks for the largest value within our selected array that is still less than our lookup value. Because each of our bins are continuous (there are no breaks between them) this logic works out to be the exact same as looking for the bin minimum of each of our lookup values.
For 36, this number is 30 because among the numbers in the array (0, 10, 20, 30, 40, and 50), 30 is the largest number less than 36.
Once Excel has found the number to lock onto, it returns the position of that number (30) within the array we selected.
For the case of 30, that position number is 4, because it is the fourth cell down in the array we selected.
You’ve now properly assigned the number 36 to its appropriate histogram bin.
Wrapping Up the Histogram Build
Take these last few steps to finish out building your histogram.
Step 4: Copy the MATCH Formula Down for Your Data Set
Remember to reference lock your bin minimum array reference. Then just copy the formula down for your data set.
Step 5: Use the COUNTIF Formula to Get Your Histogram Counts
After you’ve come up with all of the histogram bins for each of your data set values, all you need to do now is count how many times each bin number has appeared. To do that, we use the COUNTIF formula:
= COUNTIF ( range , criteria )
Range – select your entire data set and make sure to reference lock it
Criteria – just select the bin number associated with your current row
After you’ve written this formula once, copy it down to the rest of the bins.
Step 6: Build Your Histogram Chart
The very last step is to add your histogram chart to your data. If you’re just trying to get a quick view of how your data is distributed, and don’t need to actually see a chart, one simple alternative is to use conditional formatting with data bars.
Select your histogram data set. Then, within the “Home” tab of the ribbon, select:
Conditional Formatting → Data Bars → Gradient Fill
This will essentially produce the same visual output as a full histogram chart, except your visual will be flipped on its side.
For many Excel users, writing a Nested IF formula is one of the first complex formula writing tasks that we learn to perform. While it’s still beneficial to learn this process, Nested IF formulas clearly have serious flaws that make them difficult to use. A simpler way to get the same benefits of a Nested IF is to use the MATCH formula, and particularly, the MATCH formula’s approximate match option. This is one of the best utilizations of the MATCH formula in Excel, and using this approach will make your formula shorter, easier to audit, and scalable if you need a lot of bin sizes.
(Note: You can also use VLOOKUP’s range lookup as a third approach)
1 thought on “How to Use the Excel MATCH Formula to Assign Histogram Bins”
It is very useful for an analyst!