How to Create Sample Data Sets for Microsoft® Excel Spreadsheets
How to use Excel formulas built on the RANDBETWEEN, DATE and VLOOKUP functions to quickly create meaningful sets of data for testing spreadsheets.
Last updated on 2020-06-10 by David Wallis.
Your spreadsheets need thorough testing before they can be relied upon. Testing requires data. Those data need to be representative of the data that users will input when your spreadsheet goes live.
If your spreadsheet is to record sales of office stationery items, say, then it’s better if the data refer to “pencils”, “paper” and “erasers” than to generic lists like “Item 1”, “Item 2” and “Item 3”.
In this article are suggestions on how to produce data that your users will recognise, so that those users can be reliably engaged in testing.
Excel’s RANDBETWEEN Function
We’ll use the RANDBETWEEN function extensively in creating data. RANDBETWEEN will return a whole number randomly drawn from between whatever lower and upper whole-number values you tell it:
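As a minimal illustration, RANDBETWEEN takes the lower bound first and the upper bound second. The bounds of 1 and 100 here are assumed for the example, not taken from the original:

```excel
' Assumed bounds: 1 (lower) and 100 (upper)
=RANDBETWEEN(1, 100)
```

Both bounds are inclusive, so this can return 1, 100, or any whole number between them.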
Note that RANDBETWEEN produces a new value every time your spreadsheet recalculates, for example whenever you edit a cell. So we’ll need to suppress this behaviour once our data set is complete.
Excel’s built-in calendar treats dates as numbers. In the default 1900 date system, the number 1 stands for 1900-01-01, and the count increases by one for each day after that. So this formula would create a random number representing a date between 2019-01-01 and 2019-12-31:
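A sketch of such a formula, assuming the workbook uses Excel’s default 1900 date system, in which 2019-01-01 is day 43466 and 2019-12-31 is day 43830:

```excel
' Serial numbers assume the 1900 date system: 43466 = 2019-01-01, 43830 = 2019-12-31
=RANDBETWEEN(43466, 43830)
```

The cell shows a plain number until you apply a date format to it.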
To appreciate the numbers, put today’s date in a cell (Ctrl + ;) and then apply the General number format to that cell.
Fiddling with actual numbers is a bit medieval. Using Excel’s DATE function is much more now. DATE has this structure:
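In outline, DATE takes three whole-number arguments in the order year, month, day:

```excel
=DATE(year, month, day)
' For example, =DATE(2019, 12, 31) returns the date 2019-12-31
```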
So, for a random date in 2019:
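One way to write this is to nest two DATE calls inside RANDBETWEEN, so that Excel works out the serial numbers for you:

```excel
' Lower bound is 2019-01-01, upper bound is 2019-12-31
=RANDBETWEEN(DATE(2019, 1, 1), DATE(2019, 12, 31))
```

As with the raw serial numbers, the result needs a date format applied to the cell before it reads as a date.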
That’s the formula you copy down the column in your data table headed “Invoice Date”, “Transaction Date”, “Date of Introduction” or whatever.
Let’s assume an Item could be a supplier, customer, product, service, and so on: anything drawn from a list that you could input into Excel and use as the basis for a lookup table.
Here are the products from a company selling homes and wellbeing products for wild birds, input into Excel and provided with an ID number:
We give the cells A2 to B9 the range name nmProducts, which makes this formula easy to interpret:
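A sketch of such a lookup, assuming the IDs in A2 to A9 are the whole numbers 1 to 8 and the product names sit in column B:

```excel
' Assumes IDs 1 to 8 in column A and product names in column B of nmProducts
=VLOOKUP(RANDBETWEEN(1, 8), nmProducts, 2, FALSE)
```

RANDBETWEEN picks an ID at random, and VLOOKUP returns the entry in the second column of nmProducts on the matching row. The final argument, FALSE, asks for an exact match on the ID.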
The Data Set
Here’s an Excel spreadsheet of data created using the formulas discussed above:
Costs are an example of values that you need to appear in your data to, say, two decimal places. Since RANDBETWEEN returns whole numbers only, you use this formula to achieve values to pounds and pence, or to dollars and cents:
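A minimal sketch, assuming you want costs between 1.00 and 500.00 (the range is chosen here for illustration only): generate a whole number of pence or cents, then divide by 100.

```excel
' Assumed range: 100 to 50000 pence (or cents), giving 1.00 to 500.00
=RANDBETWEEN(100, 50000) / 100
```

Apply a currency or two-decimal number format so that a value such as 12.5 displays as 12.50.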
We may want prices for products and services:
We usually tie prices to products and services. Our range name nmProducts now extends across three columns:
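We’re now challenged to have the product and its price in each transaction respond to the same random number. If each column called RANDBETWEEN separately, the product and the price would come from different rows of the lookup table. So we generate the random number once, in a helper column headed RAND, and have both lookups refer to it.
Thus both the Product and Unit Price columns draw on the same random number generated in the RAND column.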
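The arrangement could look like the following sketch. The cell positions are assumptions: the RAND column is taken to be column C, the data to start in row 2, and nmProducts to hold IDs 1 to 8 with names in its second column and prices in its third.

```excel
' RAND column, cell C2 (assumed position): one random product ID per row
=RANDBETWEEN(1, 8)

' Product column: name from the second column of nmProducts
=VLOOKUP($C2, nmProducts, 2, FALSE)

' Unit Price column: price from the third column of nmProducts
=VLOOKUP($C2, nmProducts, 3, FALSE)
```

Because both VLOOKUP formulas read the same cell, each row’s product and price always come from the same row of the lookup table.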
Completing the Data Set
We now have techniques for producing random, yet meaningful, rows of data that could be adapted to generate data sets of any sort. It just remains to freeze the data to remove the RANDBETWEEN functions that cause the data to change each time your spreadsheet recalculates.
But before doing that, I recommend you take a copy of the worksheet for future reference, so that you don’t need to re-create the formulas each time you want a new data set.
To freeze the data, select it, copy it, and then paste it back over itself using Paste Special with the Values option. Then you can delete the RAND column.