Extracting Dates from Unstructured Text Strings in Microsoft Excel
Excel formulas and a VBA function for isolating a date from a text string that mixes the date with other text in no particular order.
Last updated on 2019-10-29 by David Wallis.
In many of the spreadsheets I’ve been asked by clients to upgrade, users have been obliged to use a single cell to hold all sorts of information that should have been distributed across a number of cells in the row. In correcting matters — adding additional columns — there’s the challenge of splitting the original input into the new cells.
Often it’s a date that needs to be isolated from the text string that it occupies. Its location in the string, and even its format, may make isolatiing the date an extra challenge, as in this selection:
- TPB/SG 14/05/2017
- 29/04/2017 AH
- SG/DP phone Jill first
- 13/04/2017 Bob Smith
- Bob Smith
- | Mae Greene
- RB toDp
- BK to PCAT
- Steve Dicks/SG
- GG +4(0)632900843
- Andy Pink & Gina Blue
- Pete 5/9
No doubt there are other strings that would be useful for testing formulas and code. Your speadsheets might contain variations that you’d need to include for testing.
Strategy and Limitations
These are the conditions I’m proposing for deciding how our formula determines there’s a date to lift out of a text string:
- Text is identified as containing a date when there’s a character sequence of ?#/?#/??## (# is a digit, ? is a digit or is missing)
- The supposed date is bounded by spaces, with the one exception of the date occupying first place in the string
- It’s too risky making the assumption that 5/9, for example, represents a date. So character sequences bounded by spaces and having one / only are not treated as dates
- If a string appears to contain more than one date, then its the first one that’s dug out in preference
- If the ?#/?#/??## sequence suggests a date but that is not an actual date, 29/02/2019, for example, then the formula ignores it.
Further down the page we’ll use VBA to lift some of these conditions, to make a function that is more general purpose that our formula.
Making a Start on dd/mm/yyyy Dates
Here we use Excel’s SEARCH function to locate the position at which a /??/ character sequence, if any, is included in the the string:
We’ve identified the three text strings that possibly contain a date, and marked up the other ones that certainly do not. Because we know that a dd/mm/yyyy contains 10 characters we can isolate them:
Thus isolated, the dates are in text form. They’re not dates that Excel would recognise as such.
Convert Date Text Strings into Excel Dates
Excel’s DATEVALUE function converts a data text into a numeric representation of a date:
Excel works dates based on its built-in calendar, which starts in the year 1900 and extends to year 9999. 01/Jan/1900 is day one, 02/Jan/1900 is day two, and so on, adding one to the number for each day further on.
The General number format is applied to Column C. That’s why Excel displays the results in cells C2, C3 and C5 as numbers. These represent the number of days elapsed since the start of Excel’s calendar.
Managing the #VALUE! Warning
We use the IFERROR function to suppless the #VALUE! formula warning (more on formula warnings in Error Warnings)
Having investigated techniques for isolating dates, we combine them into a single formula:
Changing the number format applied to Column B to Short Date to make Excel display those numbers as what we recognise as dates:
The formula works for the ##/##/#### dates in the text strings I’ve illustrated. Please test it your own strings before you come to rely upon the formula.
Also, you should test the formula, and possibly modify it, on your dates if they are not, like mine, in the dd/mm/yyyy format. Go to Window’s Settings and choose Time & Language to check your date formats:
The above formats are the ones Excel is adopting for the spreadsheets illustrated in this Excel Tip.
Other Date Formats
Our formula successfully isolates ##/##/#### sequences from the text string. But now what we need to do is extend it to accommodate other legitimate sequences. Trying it out on some of these:
We’re going to need a lot of If ... ElseIf ... Else stuff if we’re going to make the formula general purpose. Accepted, Excel 2016 can accommodate up to 32,767 characters in a cell (early versions of Excel, 255), but the task of copying this formula and adapting it for other worksheets, without fault, would make my head hurt. So, at this point, I turn to VBA.
Setting the Particulars for Our VBA Function
These are the parameters for our VBA function that is to isolate a potential date from a text string:
- The text string of potential date must contain two forward slashes (/)
- All forms ?#/?#/??## must be accommodated (# is a digit, ? is a digit or is omitted)
- If the text string contains more than one date, then the function returns the first of them.
These are circumstances that I can think of in which the code should not be allowed to extract a date:
- Strings that contain a profusion of forward slashes, for example P/BS/G 1/12/1/1914/12/2018. Though it could extract potential dates (P/BS/G 1/12/1/1914/12/2018 or P/BS/G 1/12/1/1914/12/2018), I can’t convince myself that any such subjective interpretation should be left open to the function
- The typo “O” (Oscar) instead of a zero, per 29/O4/2017 AH. Tempting, but how far should double-guessing go?
- What is almost certainly meant to be a date, but has been mis-typed, for example 29/02/2017 AH. Same reasoning as §2
Summarising the above:
- The text string must contain a sub-string of the form ?#/?#/??##
- The sub-string must be bounded by spaces, except at the beginning (a space to the right) and the end of the text string (a space to the left). Though restrictive, I’m content with this condition because in the majority of spreadsheets I’ve worked on the users have been methodical in using spaces to distinguish dates from other text.
Additional Challenges for the VBA
This is the set of text strings that I’m working with in order to demonstrate the VBA:
- TPB/SG 14/05/2018
- TPB/SG 14/05/17
- TPB/SG 1/05/2017
- TPB/SG AH14/05/2017
- 29/04/2017 AH
- 2/04/2017 AH
- 02/04/2017 AH
- TPB/SG 14/05/17AH
- TPB/SG AH1/05/2017
- TPB/SG 14/5/2017 AH
- Pete 5/9/19
- Pete 5/9 AH
- P/BS/G 1/12/1/1914/12/2018
- P/BS/G 1/12/1/19 14/12/2018
- P/BS/G 01/02/1914/12/2018
- P/BS/G 1/12/1928 14/12/2018
- P/BS/G 01/02/1914// 12/1
- P/BS/G 01/02/1914A2019
You’ll need to add patterns that I’ve not included when you try the VBA function on your strings. But what I hope you see I’m doing is provide strings that might challenge any function you might be considering for the job.
This is my current VBA function:
This is the function applied to the list of text strings:
What we see is the function returning a serial number for any text that it recognises as a date and a zero for any string that it considers does not contain a date. Now we need to turn the serial number into a recognisable date and to suppress the zero:
Column C has the Short Date format applied to its cells. Now we can replace Columns B and C with a single, new column B:
Your Support for dmw TIPS
If you’re going to make use of Tips in any of your professional or commercial work, then please consider making a payment in recognition of the time spent publishing this website.
To make a contribution by PayPal in GBP (£ sterling) —
To make a contribution by PayPal in USD ($ US) —
To say how much you want to pay and to receive
a tax invoice for a GBP (£ sterling) contribution —
Thanks, in anticipation.
If there’s enough interest and encouragement, I’ll explain how the VBA works and how you might modify it to suit your own spreadsheets.