Excel data manipulation using array formulas

Whether we like it or not, there’s always a process, a form, or a report that requires a little bit of manipulation in Excel. We just need to apply one simple operation to use the result elsewhere. But there’s no way to easily do that manually.

Often, the issue comes from the source data not having the same number of rows as as our target. We might need to remove one department, zero values, a specific account… So we end up using copy/paste or a pivot table because there’s no other way.

But there are ways to automate this process that can save you a lot of time.

Stating the problem

Say you have a list of accounts and balances coming from one of your systems, and no way to change how you receive this data. It includes zero values which you would like to get rid of :

If you copy/paste the data, you rely on a manual process that is tedious, repetitive, and can lead to errors.

If you use the data in another sheets through formulas, you will paste the zeroes as well.

The solution

By using the INDEX formula in Excel, you can return a specific cell in a range by its row and column. All you need to do is to come up with a way to count cells so you skip those with zeroes and keep the rest.

For readability’s sake, let’s use “Data” as the named range (by clicking Formulas > Define Name), which will have the added benefit of allowing you to grow the range without having to change the formulas referencing it.

We would like a new range in cell B2 to look something like this :

And we would also like to be able to use the formula without having to copy/paste it in all cells every time, which would defeat the purpose.

You can obtain this result by pasting this formula in cell B3

=INDEX(Data, SMALL(IF((INDEX(Data, , 2)<>0), MATCH(ROW(Data), ROW(Data)), “”), ROWS(B$3:$B3)), COLUMNS(B$3:$B3))

For column C, just change B$3:$B3 in the formula above to B$3:$C3 and paste in cell C3 to get the second column’s values.

How does this work

Let’s explore the formula in more details

As mentioned before, INDEX (Data, row , column) will return the value itself.

COLUMNS(B$3:$B3) will return the column’s number when it’s pasted horizontally and vertically, thanks to the dollar signs freezing the column and the row, hence covering our new range perfectly.

The row is defined by SMALL (array , k), which is the heart of this formula: it return’s the k-th smallest element in an array. And by defining k as ROWS(B$3:$B3) we map this to the row in our target range (the 5th smallest value will be in row 5). Here again, the dollar signs ensure the formula properly counts rows.

The array we are looking for must only include the values we want to return, which in our case are non-zero values. This is why our IF condition tests <>0 on the second column in the Data range. This is achieved by the use of INDEX(Data, , 2) which returns the whole column as an array.

If our logic test is true, we need it to return the list of rows that corresponds to it: MATCH (ROW (Data), ROW(Data)).

To understand the inner working of this formula, you can paste this in a cell =IF((INDEX(Data, , 2)<>0), MATCH(ROW(Data), ROW(Data)), “”) Since the result is an array, Excel will paste it in multiple cells and give the following result:

We can see here that for all values that failed the logic test, the row number was replaced by “”, as per our IF statement, which will work perfectly with SMALL.

The fact that the result is an array is the reason why we do not need to paste this formula in cells below.

Extending this technique

Although our example here tests zero values in column 2 through the use of INDEX(Data, , 2)<>0, much more can be done. We only need to change our condition in the first 2 cells to get the results we need.

INDEX(Data, , 2)>=0 properly returns all line with a balance greater than or equal to 0

A bit more sophisticated due to SEARCH returning some undefined value, we can use this to look for an hyphen in the  account number:


Even though the logic test is done on column 1, this doesn’t change the fundamentals of the technique.

Array formulas

This solution relies on the use of array formulas, an Excel feature that allows us to populate multiple cells by entering a formula in only one of them.

Office 365 allows you to simply enter the formulas as described in this blog, but older versions might require an extra step.

If Excel gives you an error when you try to enter one of the formula above, put { curly brackets } around it and instead of pressing enter, press CTRL + SHIFT + ENTER. This will trigger the array formula functionality in Excel, allowing you to use this solution.


Contact us at support@aptusbusiness.com.au for more information