Extract delimited data using Microsoft Excel Power Query


Microsoft Excel logo on textured paper.
Image: Renan/Adobe Stock

Quite often, we receive data in the form of characters strung together. For our purposes, we might not need the full string, but only a portion. For instance, you might receive a list of transaction numbers, which in part, contain the customer identification number. Furthermore, you need only the customer portion to create a relationship between that customer and a table that contains the customer name.

SEE: Google Workspace vs. Microsoft 365: A side-by-side analysis w/checklist (TechRepublic Premium)

In this Excel tutorial, I’ll show you how to use Power Query’s Extract and Split Column features to extract delimited strings into their components. I’m using Microsoft 365 Desktop and Power Query in Microsoft Excel. Power Query is available in older versions through Excel 10. You can download the Microsoft Excel demo file for this tutorial.

Why you should use Excel’s Power Query

You can use Excel string functions, Text to Columns or Flash Fill, but here are reasons you might not:

  • Text to Columns writes over the original data.
  • Your data might not be in Excel, as although Power Query is available in Excel, Power Query can import data from lots of sources — not just Excel.
  • The source data contains more rows that you can import into Excel.
  • You might need to use Power Query for something much more complex and extracting a subset of the original entry is just the first step.

If the data is in Excel, you might use functions or formulas, but unless you’re an expert, that will take a bit of time. Most of us can’t just rattle off the necessary syntax and get it right the first time. Power Query is quick and requires no specialized knowledge of Excel functions.

How to get the data into Power Query

We’ll be working with a simple Excel sheet with a few delimited strings in a Table named TableCustomerID. You don’t have to replace the default Table name, but meaningful names are easier to work with if you have multiple Tables.

Let’s suppose you have a list of customer identification numbers with three sections each. Furthermore, a hyphen character serves as a delimiter between the three sections (Figure A). You want to use the middle component of each string because that’s the section that actually identifies each customer. The other two components identify the region where the customer resides and a transaction number.

Figure A

Load the Excel data into Power Query.
Load the Excel data into Power Query.

The first step is to load the data into Power Query as follows:

1. Click anywhere inside the Table.

2. Click the Data tab.

3. In the Get & Transform Data group, click From Table/Range.

That’s it. The simple Table shown in Figure A is now in Power Query.

With the data in Power Query, you can start extracting sections.

How to extract delimited strings using Extract options in Power Query

There are really two ways to extract data in Power Query. We’ll begin by using Extract options, which returns a subset of the original value. To do so, click the Transform tab and then click the Extract dropdown in the Text Column group.

As you can see in Figure B, there are several options and they’re all self-explanatory for the most part. We’re going to look at the delimiter options so you can see what each one does.

Figure B

There are several Extract options.
There are several Extract options.

After clicking the header of the Customer ID field to select the column, click the Transform tab, if necessary and then click Extract in the Text Column Group. Choose the Text Before Delimiter option. In the resulting dialog, enter the hyphen character (Figure C) and click OK.

Figure C

Enter the hyphen character as the delimiter.
Enter the hyphen character as the delimiter.

As you can see in Figure D, this option returns only the first character(s) before the delimiter.

Figure D

This option returns only one character for each string.
This option returns only one character for each string.

To reclaim the original data, delete the Extracted Text Before Delimiter step in the Applied Steps pane shown in Figure E.

Figure E

Simply select it and press Delete.
Simply select it and press Delete.

Now, let’s do the same thing with the next option, Text After Delimiter. When prompted, enter the hyphen character and click OK to see the results shown in Figure F. This time, Power Query removes the first two characters, the first number and the first hyphen.

Figure F

This option returns all the characters after the first hyphen character.
This option returns all the characters after the first hyphen character.

Once again, reclaim the original data by deleting the extract step and then choose the Text Between Delimiters option. This time, Power Query prompts for supply two delimiters. In this case, they’re both the hyphen character (Figure G).

Figure G

Enter both the start and end delimiter.
Enter both the start and end delimiter.

Click OK to see the results shown in Figure H.

Figure H

The last option returns the characters between the two hyphen characters.
The last option returns the characters between the two hyphen characters.

Now we’ll look at another way to divide the three sections of each string, but we won’t extract pieces from the strings, we’ll split the strings. Reclaim the original data before you continue.

How to extract delimited strings using Split Column in Power Query

Power Query’s Split Column helps you return more than a single piece of the string. For instance, let’s suppose you want three columns of data, one for each section. To accomplish this, use Split Column as follows:

1. After selecting the column, click the Home tab.

2. In the Transform group, click Split Column.

3. Click the first option, By Delimiter.

4. In the resulting dialog, you don’t need to do much because Power Query does a good job of discerning your needs. Make sure Power Query selects Each Occurrence of the Delimiter in the Split At section (Figure I).

Figure I

Choose the option that uses all the delimiter characters.
Choose the option that uses all the delimiter characters.

5. Click OK to see the results in Figure J.

Figure J

This option separates the three sections into three columns.
This option separates the three sections into three columns.

This option separates each string into three columns using the delimiter character to determine where each section begins and ends.

As you can see, both Extract and Split Column help you separate data quickly. You’re likely to run into uses for both.



Source link