Basic Operations
Average, Minimum, Maximum:
To calculate the average (or mean), minimum, or maximum value of a dataset, simply do the following:
Step 1) Select an empty cell.
Step 2) Type =AVERAGE( in that cell, or MIN( or MAX(.
Step 3) Highlight the cells you want to operate on.
Step 4) Close the parenthesis if necessary and press enter.
Counting Events:
Excel can count the number of times we have a given value in a cell, for instance, the number of "A"s in a column, using the formula: =COUNTIF(COLUMNS, "=A").
It also works for numbers, for instance =COUNTIF(COLUMNS, "=5"), and you can even use =COUNTIF(COLUMNS, ">5") to get the number of entries greater than 5. (replacing > with < for less than 5).
If you want to compare to a number in a cell, say A18, you can't just type =COUNTIF(COLUMNS, ">A18"), you have to end the quotation early and add an ampersand, like this: =COUNTIF(D17:XFD17,">"&A18).
Finally, to just count how many numbers there are no matter their value, use =COUNT(COLUMNS) .
Autofill:
Excel will automagically try to complete patterns.
Step 1) Select one or more cells contining the single value or pattern of values you want to extend.
Step 2) Click the small square at the bottom right of the selection, and drag the small square that appears at the bottom right of the selection.
Static References with Autofill:
Often, particularly when generating random numbers, we want to refer to one particular cell when autofilling, and don't want it to keep changing the reference.
Put a dollarsign $ between the letter and number of a reference to freeze the column, so A17 becomes A$17.
To freeze a row, but the dollarsign before the letter, so A17 becomes $A17.
Combining these will freeze both, so A17 becomes $A$17.
Fill as Many Columns as Possible:
To continue an autofill in one or more rows for as many columns as we can, we need to do the following:
Step 1) Highlight the column you want to duplicate.
Step 2) Press CTRL+SHIFT+Right Arrow Key to highlight all columns to the right of the currently selected one.
Step 3) Press CTRL+R to fill the selected rows.
This is useful for simulating many samples from the same distribution.
Random Number Generation (RNG)
Uniformly Distributed Between 0 and 1:
The only random numbers Excel provides are totally random numbers between 0 and 1. All other RNG builds on these initial steps.
Step 1) Type =RAND() into a new cell.
Step 2) Use Autofill to generate more if desired (see Basic Operations section).
Uniformly Between Two Values:
Let's say we want to simulate random values between 100 and 150 with all values equally likely to be picked.
Step 1) Calculate how long the interval is, in this case 150 - 100 = 50.
Step 2) In a new cell, multiply RAND() by the interval length, so 50 in this case.
Step 3) Add the lower end of the interval, so 100 in this case.
Step 4) Use Autofill to generate more if desired (see Basic Operations section).
Between 0 and 1 with a Specific Average:
Let's say we want to simulate values between 0 and 1 with a mean of 0.9.
Step 1) Calculate the power we need to use:
i) Calculate 1 divided by the desired mean.
ii) Subtract 1 from the result.
Step 2) In a new cell, exponentiate RAND() by the value from Step 1, in this case we get =RAND()^(1/0.9-1)"
Step 3) Use Autofill to generate more if desired (see Basic Operations section).
Step 4) Optionally, verify the process by calculating the average of the new data and checking that it's close to the desired mean.
Beta Distribution (Between 0 and 1 with a Specific Average and Variance):
Let's say we want to simulate values between 0 and 1 with a mean of 0.6 and a variance of 0.1.
Step 1) Calculate the alpha parameter using the formula =MEAN^2*( (1-MEAN)/VAR-1/MEAN) , where MEAN is the desired average, and VAR the desired variance.
Step 2) Calculate the beta parameter using the formula =ALPHA*(1/MEAN-1) , where MEAN is the desired average, and ALPHA is the result of Step 1.
Step 3) Use the formula =BETA.INV(RAND(), alpha, beta) to generate Beta random variables with the given alpha and beta parameters.
Step 3) Use Autofill to generate more if desired (see Basic Operations section as well as the Static Cells with Autofill).
Step 4) Optionally, verify the process by calculating the average and variance of the new data and checking that they are close to the desired values.
Step 1:
Step 2 not shown.
Step 3:
Scaled Beta Distribution (Between any Two Bounds with Specific Average and Variance):
This will be our general model for bounded data. Since the Beta distribution is defined between 0 and 1, we'll need to begin by scaling our data down to 0,1, and end by scaling it back up.
In this example, we will model scores of the SAT, an American college entrance exam, which are bounded between 400 and 1,600. We will give them a mean of 1,050 and a variance of 100^2 = 10,000.
Step 1) Calculate the average and variance of the data rescaled to being between 0 and 1.
step 1i) calculate the scaled average: =(MEAN-MIN)/(RANGE) , where MEAN is the desired average, MIN is the lower bound, and RANGE is the distance between the lower and upper bounds, MAX-MIN.
step 1ii) calculate the scaled variance: =VAR/RANGE^2 , where VAR is the desired variance and RANGE is the distance between the lower and upper bounds, MAX-MIN.
Step 2) Generate Beta numbers with the scaled mean and variance as described above in section Beta Distribution.
Step 3) Scale these data to your original range using the formula =RANGE*BETANUM+MIN, where BETANUM is the result of Step 2, and RANGE and MIN refer to the desired lower bound and the distance between the lower and upper bounds.
Normal Distribution:
This will be our default model and general model for unbounded data.
Step 1) Type =NORM.INV(RAND(),MEAN,STDEV) into a new cell, where MEAN is the desired average of the normal distribution, and STDEV is the desired standard deviation (NOT variance).
Step 2) Use Autofill to generate more if desired (see Basic Operations section).
Gamma Distribution:
This will be our default model and general model for positive data. To sample from a Gamma distribution, you first need to have a average and standard deviation in mind.
Step 1) Translate your mean and standard deviation into a alpha and beta parameters using the formulas ALPHA=(MEAN/STDEV)^2 and BETA=(MEAN^2/STDEV)
Step 2) Type =GAMMA.INV(RAND(),ALPHA,BETA) into a new cell, where ALPHA and BETA were calculated in the previous step.
Step 3) Use Autofill with static references to generate more if desired (see Basic Operations section).
Binomial Distribution:
Our default model for binary data.
Step 1) Type =BINOM.INV(SAMPLE_SIZE, PROP, RAND()), where SAMPLE_SIZE gives how many "trials"/the size of our sample, and PROP gives the probability of getting a "success". The result represents how many successes there were; to calculate the proportion, divide this by SAMPLE_SIZE.
Step 2) Use Autofill with static references to generate more if desired (see Basic Operations section).
Fixing RNG:
Excel by default changes the random numbers anytime anything is changed in any cell. This can potentially completely change your analysis, and require you to remake things like histograms each time. If this is frustrating, what you can do is copy and paste your random numbers in a special way that allows you to paste the actual value on screen rather than the formula.
Step 1) Click and drag to select the RNG cells you want to fix.
Step 2) Copy the selection, by (1) right clicking, then clicking "Copy".
Step 3) Right click, then click "Paste Special".
Step 4) In the window that appears, change the top selection from "All" to "Values". Then click "OK".
Statistical Analysis
Z Tests:
The Z test in Excel assumes a normal sampling distribution and assumes a > alternative distribution.
Step 1) Calculate the p value using =Z.TEST(SAMPLE, NULL_MEAN, NULL_STDEV), where SAMPLE is the sample you want to test, NULL_MEAN is the mean of your null distribution, and NULL_STDEV is the standard deviation of your null distribution. If you omit NULL_STDEV, it will be estimated from the data (as shown in the gif).
Step 2) If you instead want a < alternative, you need to calculate one minus the number we just calculated; =1-Z.TEST(SAMPLE, NULL_MEAN, NULL_STDEV).
Descriptive Statistics:
Step 1) Click the "Data" tab.
Step 2) Click the "Data Analysis" button on the far right (make sure the Analysis Toolpak is loaded).
Step 3) Click the Input Range box and highlight the data cells.
Step 4) Check the box next to Summary statistics.
Step 5) Click "OK"
A new page will open with the results.
Confidence Intervals:
Excel doesn't have built-in CI functionality, but we can make our own. We know confidence intervals are implicitly associated with two sided hypothesis tests.
To make a confidence interval, we need to have a confidence level CONF in mind.
Step 1) Determine the factor k: =NORM.INV(0.5+CONF/2,0,1). For example, if your confidence level was 0.95 (or 95%), we'd do =NORM.INV(0.5+0.95/2,0,1).
Step 2) Calculate your Standard Error. This will vary depending on what confidence interval you're building.
Step 3) Compute the upper and lower bounds via =POINT_ESTIMATE - k*SE and =POINT_ESTIMATE + k*SE, where POINT_ESTIMATE is what you're building a confidence interval on, k is from step 1, and SE is the standard error from step 2.
Two Sample Z Test:
Note: This is not applicable to paired data.
Step 1) Calculate the variances for your two samples individually.
Step 1) Click the "Data" tab.
Step 2) Click the "Data Analysis" button on the far right (make sure the Analysis Toolpak is loaded).
Step 3) Select "z-Test: Two Sample for Means".
Step 4) Highlight your two samples using "Variable 1/2 Range"
Step 5) Enter your "Hypothesized Mean Difference" (usually 0), then click OK.
Step 6) Input the variances you calculated in Step 1 (on my computer, I unfortunately have to do this manually). Then, click OK.
A new page will open with the results.
Two Sample T Test:
Step 1) Click the "Data" tab.
Step 2) Click the "Data Analysis" button on the far right (make sure the Analysis Toolpak is loaded).
Step 3) Select "t-Test: Two-Sample Assuming Unequal Variances".
Step 4) Highlight your two samples using "Variable 1/2 Range"
Step 5) Enter your "Hypothesized Mean Difference" (usually 0), then click OK.
A new page will open with the results.
Linear Regression:
Step 1) Click the "Data" tab.
Step 2) Click the "Data Analysis" button on the far right (make sure the Analysis Toolpak is loaded).
Step 3) Select "Regression".
Step 4) In the box next to "Input Y Range", highlight the variable you're trying to predict, a.k.a. the "output" or "response" variable.
Step 5) In the box next to "Input X Range", highlight one or more columns containing variables which will be used for prediction. Then click OK.
A new page will open with the results. The column under "Coefficients" gives the regression coefficients (a.k.a. "multipliers") and the intercept. The P values are given under the column "P-Value".