How to use Statistical Functions in Excel 2013
Excel 2013 includes numerous Statistical Functions which are useful to statisticians and can be grouped into three general groups: Description, Prediction and Inference.
Many Statistical Functions in Excel supersede Functions from previous versions. Ranges in Excel can be defined through the Define Name command and are useful in Statistical Functions like Min and Max.
Two important sub-categories of Statistical Functions pertain to Averages and Measures of Spread. Excel allows users to calculate various types of Averages, such as Median, Mean and Mode. The Average If and Average If S Functions are designed to determine Averages subject to one or more criteria.
Watch the free video here, transcripts for the entire video follow:
Do you need to learn Microsoft Excel 2013? Get 19 hours of Microsoft Excel 2013 training – click here.
Video transcripts:
Hello again and welcome back to our course on Excel 2013 Advanced. In this section I’m going to start to look at the statistical functions in Excel 2013. I’m not going to assume that you’re a statistician and have a deep and broad knowledge of statistics, but most people have enough exposure to statistics in everyday life nowadays to understand the kinds of terms and terminology that I’m going to use.
Microsoft Excel has been used in statistics and by statisticians for quite a few years now and it’s actually a pretty strong tool. As we’ll see during this section and later on in the course not only can we do some very useful numeric work but we can also use some of the charting and graphing features of Excel 2013 to look at statistical questions and statistical issues. There are some specialist pieces of software available to do a lot of these sort of statistical number crunching that you might need but the advantage of Excel, of course, is that it does many other things as well and many more people have copies of Microsoft Excel. So if you’ve got a particular statistical problem to solve or a piece of work to do the chances are that Excel can help you to solve that problem or do that job for you; if it can’t then you probably will need to look at one of the more specialized products.
Now for the purposes of this course I’ve divided the application of the statistical functions in Excel 2013 into three groups and those three groups are: description, prediction, and inference. The first one we’re going to look at is description and by that I mean we take some statistics, in our case a set of numbers, and we look at ways of describing that set of numbers. So we’re really just looking at fact finding. We’re saying what are the features of this set of numbers that will help us, for instance, understand what the numbers tell us or maybe even compare them with other similar or related sets of numbers? I’m then going to look at prediction and in particular at regression and how we can use some existing measurements to predict what might happen in the future.
The third category in relation to using Excel 2013 in statistics is inference. By inference what we mean is the ability to infer something about a population by measuring a sample from that population. So typically we’re talking about things like doing opinion polls, maybe taking 10 or 100 or 1,000 people, getting their opinion, measuring some feature, and from that measurement inferring something about the population. Now this is an area that requires quite a bit more statistical knowledge so I’m not going to go into it in great detail but I’ll look at one or two examples of how Excel can help with inferential statistics.
Now I’m looking at the list of statistical functions in Excel Help here and one of the things you’ll notice straightaway is that it is a very, very long list of function. It’s one of the areas of Excel where the most functions are available and if you’re not a statistician many of those won’t really mean a lot to you. I’m not going to worry too much about those very detailed statistical functions, the ones that involve quite a lot of technical knowledge. I’m going to look at the more straightforward ones in this section. But I would like to point something out. If you’ve used an earlier version of Excel and if you’ve used statistical functions before many of those functions have been superseded in Excel 2010 and Excel 2013 with functions of a similar name but with extensions. For instance the quartile functions that I’ll look at a little bit later; QUARTILE.EXC and QUARTILE.INC, for example.
Many functions have extensions that may surprise you if you’ve used these before. I’m going to explain some of these in this section and the next section but just be aware that if you’ve used Excel functions before don’t assume that things are exactly the same now because quite a few things have changed for reasons that as I say I’ll explain shortly.
So let me introduce you to the data that we’re going to use in this section. I have here a section of some details of sales of policies for an insurance company. They have a number of branch offices around the United States and one of their offices is in Orlando. And in columns B and C here you can see over a period from the beginning of 2009 to the end of 2012 the number of inquiries they had from clients and the number of policies that they sold. Now typically when you have this sort of set of numbers like the policy sales here you have a mass of numbers just a sequence and you might be able to see a pattern. But what you often need are some descriptive statistics. Things like what’s the average number of policies that were sold? And the sort of way that you could use these would be maybe to look at another branch of the insurance company and compare the average number of policies sold there as well. But in the first instance what we’re going to look at is typically some of the descriptive statistics that we can produce from a set of numbers like these. I’ve got inquiry statistics and policy statistics. I’m just going to look at the policy statistics here.
Now first of all if I’m going to calculate a number of statistics for the policy sales over that four year period I’m going to be referring to this range of numbers from C4 down to C51, quite a lot. So what I’m going to do is define a name for that range of numbers. So having selected, right click, define name. I’m going to call it Policy Sales Orlando. Click on OK and now whenever I want to refer to that range I can use that name.
So let’s start with one or two really straightforward statistics. One of the very straightforward statistics would be what is the minimum value in that range? What’s the lowest number of policies that were sold? Now the function we use for that is the min function. So let’s just put that in. And all we need for the min function as an argument is the range of numbers to look at. Well of course we’ve got a name and there it is. As soon as I see that appear, Policy Sales Orlando, tick, and I see that in any one month the smallest number of sales is 57. Similarly for the maximum, I’m sure you can work this out. Now in a very simplistic way you could compare the sales at two offices by looking at what the minimum number and the maximum number of sales are.
Minimum and Maximum values can be very useful as descriptive statistics but generally speaking we want to look at two types of statistics which are of particular interest. One of them is the average and one is the measure of spread.
Now there are various types of average. Probably the best known one is the arithmetic mean which I’ll look at in just a moment. But other types of average include the median which if you put all of a set of values into sequence is the middle one or if there are an even number of values the average of the two middle values. And sometimes we’re also interested in the mode. The mode is the value that occurs most frequently. Median and mode functions are available in Excel 2013. I’ll leave you to check those out using the Help. But let’s look at the arithmetic average, the mean. So I’m going to put in here Mean and the average function is Equals average. And for that all I need to do is to put in that range again. So I’ve got my name setup and I find that for this particular branch the average number of sales per month over that period is 114.
Now very often in statistical analysis the mean is the value that you’ll be using but let’s look at one of two additional things about mean. One of the things that you can do which is very useful statistically is to use one of the more advanced average functions to be more specific and selective about the mean. Supposing I wanted to know the mean number of sales in 2012. Now I could of course define a different range and look at the mean for that but I could also use one of the average if functions.
Now the function we’re going to look at here is the average if function and the principle of the average if function is we’re going to average a range of numbers but we’re going to average that range subject to a criterion. Now we define the criterion using a range which might be the actual range. If you look at the syntax of the function here, average if, we can say subject to a criterion apply to a range, get the average of this range. Now the range that we apply the criterion to may be the same range that we’re finding the average of or it may be a different one. I’ll explain that more carefully in a moment.
There is also a function average if S, so it’s the same name but with an S on the end where you can define multiple criteria. In this case we’re just going to have one criterion and the criterion we’re going to use is that the year in question is 2012.
So let’s see how we put this together. Note that the month identifiers are in column A. So A4 to A51 give us the months and of course our data are in column C and we have the whole range defined by our name. This is how I structure this average if, Equals average if. Now then what range am I checking? Now the range I’m checking in this case is A4 to A51. I need to put that in as a range. Next what am I checking? Well if I’m looking for months in the year 2012 what I’m actually going to put as my criterion is Equals star 12 and basically what that means is the contents of a cell in the range A4 to A51 is something with 12 at the end. So that means it’s one of the months in 2012. Now what range am I averaging? Well the range I’m actually averaging is the range in column C which as I know I’ve got setup with a name, so that’s Policy Sales Orlando. And that’s basically the structure, the syntax for this average if.
Now note if the criterion actually applied to the contents of C, so for instance supposing instead of looking for months in 2012 I was looking for the average where the number of sales was say less than 100 then the criterion would apply to column C and I will have column C both here for the criterion range and for the range to average. So I could use Policy Sales Orlando in both places. And if that were the case I wouldn’t actually need to include the third argument because by default it’s the same as the first argument. But in this case the criterion applies to column A so I need to specify the range in A, A4 to A51. I want months that end in 1-2 and let’s see what I get as my average if I check there and the value is 81. So for months in 2012 the average number of sales is 81.
So that’s our first look at averages. In the next section I’m going to take a look at percentiles and then we’re going to look at measures of spread. So please join me for that.