Extract values from page titles using regex

Scenario

You can use regular expressions (regex) to extract specific values from page titles in Reporting output and show each value in its own column.

In this scenario, we use the Page Supplier, Text Supplier, and Match Supplier keys together with a regular expression (regex) to extract the country name, country code, and phone code from page titles.

The following example was implemented based on a page with the following structure:

As shown in the above screenshot, page titles follow the format "Country Name (Country Code-Phone Code)". We will use regex to separate the three parts and show each one in its own column.


Result


Recipe

Ingredients

Apps

Platform: Server, Data Center

Level: Intermediate

Estimated time: 20 minutes

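Macros: Report Table, Local Reporter, Report Column, Report Info

Suppliers: Page Supplier, Text Supplier, Match Supplier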

Storage format

Macro structure

You can recreate the example in the editor view:

Steps

  1. Add a Report Table macro and, within it, add one Local Reporter macro and four Report Column macros.

  2. Set the Local Reporter macro's Key parameter to "page:children".

  3. Set the first Report Column macro's Title to "Page Title" and add a Report Info macro within it.
    Set the Report Info macro's Key to "page:title" and tick its Link to item parameter.

  4. Set the second Report Column macro's Title to "Country" and add a Report Info macro within it.
    Set the Report Info macro's Key to "page:title>match "(.*?)\\((.*?)-(.*?)\\)">group 1".

  5. Set the third Report Column macro's Title to "Country Code" and add a Report Info macro within it.
    Set the Report Info macro's Key to "page:title>match "(.*?)\\((.*?)-(.*?)\\)">group 2".

  6. Set the fourth Report Column macro's Title to "Phone Code" and add a Report Info macro within it.
    Set the Report Info macro's Key to "page:title>match "(.*?)\\((.*?)-(.*?)\\)">group 3".
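
To preview what the finished table should contain before building it in Confluence, the steps can be approximated outside the wiki. The following is only a sketch: it uses Python's re module as a stand-in for the Reporting regex engine, the child page titles are hypothetical, and it assumes the doubled backslashes in the key are an escaped form of a single backslash. Leaving the extra columns empty for a title that does not match is also an assumption, not documented Reporting behaviour.

```python
import re

# Assumed effective pattern behind the key's "(.*?)\\((.*?)-(.*?)\\)".
PATTERN = re.compile(r"(.*?)\((.*?)-(.*?)\)")

# Column titles taken from the steps above.
COLUMNS = ("Page Title", "Country", "Country Code", "Phone Code")


def build_rows(titles):
    """Build one row per page title, mirroring the four report columns."""
    rows = []
    for title in titles:
        match = PATTERN.match(title)
        # Assumption: a title that does not match leaves the extra columns empty.
        groups = match.groups() if match else ("", "", "")
        rows.append(dict(zip(COLUMNS, (title, *groups))))
    return rows


# Hypothetical child page titles in the documented format.
child_titles = ["France (FR-33)", "Japan (JP-81)"]
rows = build_rows(child_titles)

print(" | ".join(COLUMNS))
for row in rows:
    # strip() is used for display only; the stored values are left untouched.
    print(" | ".join(row[column].strip() for column in COLUMNS))
```

Each dictionary corresponds to one table row, and the keys correspond to the Report Column titles.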

Regex explanation

The regex (.*?)\\((.*?)-(.*?)\\) splits each page title into three groups:

Group 1: (.*?)

Captures everything before the opening bracket '('.

Skipped: \\(

Matches the opening bracket '(' itself, which is not part of any group.

Group 2: (.*?)

Captures everything after the opening bracket and before the first dash '-' that follows it.

Skipped: -

Matches the dash '-', which is not part of any group.

Group 3: (.*?)

Captures everything after that dash and before the closing bracket ')'.

Skipped: \\)

Matches the closing bracket ')', which is not part of any group.
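
The grouping can be checked quickly outside Confluence. The sketch below uses Python's re module as a stand-in for the Reporting regex engine (an assumption), reads the doubled backslashes as an escaped single backslash, and runs against hypothetical titles. The split_title helper is illustrative only and is not part of Reporting.

```python
import re

# Assumed effective pattern: the recipe's "\\(" and "\\)" are read as "\(" and "\)".
PATTERN = re.compile(r"(.*?)\((.*?)-(.*?)\)")


def split_title(title):
    """Return the three captured groups, or None if the title does not match."""
    match = PATTERN.match(title)
    return match.groups() if match else None


# Hypothetical titles; the second has a dash in the country name,
# the third does not follow the expected format at all.
samples = ["France (FR-33)", "Guinea-Bissau (GW-245)", "No brackets here"]
for title in samples:
    # repr() makes leading and trailing whitespace in the groups visible.
    print(repr(title), "->", repr(split_title(title)))
```

Two details are worth noting in the output. Group 1 keeps the space that precedes the opening bracket, so a value such as "France " may need trimming if it is compared or sorted. A dash inside the country name does not affect the split, because Group 1 ends at the first opening bracket and Group 2 only starts after it.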