Tableau Prep has improved considerably since it was launched. It offers many tools to pre-process and transform your data. But sometimes you need to apply a function that has no equivalent in Prep, or you want to extract data through an API. The main differences between Tableau Prep and data prep within Tableau Desktop are in the presentation and in the number of options available. For example, you can connect to 70 different data sources in Tableau Desktop, whereas the first production release of Tableau Prep connects to 28. Tableau Prep enables users to get to the analysis phase faster by helping them quickly combine, shape, and clean their data. According to the vendor, a direct and visual experience helps provide users with a deeper understanding of their data, smart features make data preparation simple, and integration with the Tableau analytical workflow allows for faster speed to insight.
Note: Starting in version 2020.4.1, you can now create and edit flows in Tableau Server and Tableau Online. The content in this topic applies to all platforms, unless specifically noted. For more information about authoring flows on the web, see Tableau Prep on the Web.
Sometimes analyzing data from a spreadsheet or crosstab format can be difficult in Tableau. Tableau prefers data to be 'tall' instead of 'wide', which means that you often have to pivot your data from columns to rows so that Tableau can evaluate it properly.
However, you may also have scenarios where your data tables are tall and narrow and are too normalized to analyze properly. For example, a sales department might track advertising spend in two columns: one called Advertising that contains rows for radio, television and print, and one for total spent. In this type of scenario, to analyze this data as separate measures, you would need to pivot that row data to columns.
But what about pivoting larger data sets or data that changes frequently over time? You can use a wildcard pattern match to search for fields that match the pattern and automatically pivot the data.
Use one of the following options when pivoting your data:
- Pivot columns to rows.
- Use wildcard search to instantly pivot fields based on a pattern match (Tableau Prep Builder version 2019.1.1 and later and on the web).
- Pivot rows to columns (Tableau Prep Builder version 2019.1.1 and later and on the web).
No matter how you pivot your fields, you can interact directly with the results and perform any additional cleaning operations to get your data looking just the way you want it. You can also use Tableau Prep's smart default naming feature to automatically rename your pivoted fields and values.
Pivot columns to rows
Use this pivot option to go from wide data to tall data. Pivot from columns to rows on one or more groups of fields. Select the fields that you want to work with and pivot the data from columns to rows.
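As a rough illustration of what this pivot does to the data, here is a minimal sketch in plain Python. It is not how Tableau Prep implements the step: the function name and sample fields are hypothetical, and the output names Pivot1 Names and Pivot1 Values are assumed defaults.

```python
def pivot_columns_to_rows(rows, fields_to_pivot,
                          names_field="Pivot1 Names",
                          values_field="Pivot1 Values"):
    """Turn each selected column into its own row (wide data to tall data)."""
    tall = []
    for row in rows:
        # Fields that are not pivoted are repeated on every output row.
        kept = {k: v for k, v in row.items() if k not in fields_to_pivot}
        for field in fields_to_pivot:
            tall.append({**kept, names_field: field, values_field: row[field]})
    return tall


# Hypothetical wide table: one column per month.
wide = [{"Product": "A", "Jan": 10, "Feb": 12}]
for record in pivot_columns_to_rows(wide, ["Jan", "Feb"]):
    print(record)
```

One wide row with two pivoted fields becomes two tall rows, each carrying the field name and its value.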
1. Connect to your data source.
2. Drag the table that you want to pivot to the Flow pane.
3. Do one of the following:
   - Tableau Prep Builder version 2019.4.2 and later and on the web: In the Profile pane, select the fields that you want to pivot, then right-click or Ctrl-click (MacOS) and select Pivot Columns to Rows from the menu. If using this option, skip to step 7.
   - All versions: Click the plus icon, and select Add Pivot from the context menu.
4. (Optional) In the Fields pane, enter a value in the Search field to search the field list for fields to pivot.
5. (Optional) Select the Automatically rename pivoted fields and values check box to enable Tableau Prep to rename the new pivoted fields using common values in the data. If no common values are found, the default name is used.
6. Select one or more fields from the left pane, and drag them to the Pivot1 Values column in the Pivoted Fields pane.
7. (Optional) In the Pivoted Fields pane, click the plus icon to add more columns to pivot on, then repeat the previous step to select more fields to pivot. Your results appear immediately in both the Pivot Results pane and the data grid.
   Note: You must select the same number of fields that you selected in step 6. For example, if you selected 3 fields to initially pivot on, then each subsequent column that you pivot on must also contain 3 fields.
8. If you didn't enable the default naming option or if Tableau Prep couldn't automatically detect a name, edit the names of the fields. You can also edit the names of the original fields in this pane to best describe the data.
9. (Optional) Rename the new Pivot step to keep track of your changes, for example 'Pivot months'.
10. To refresh your pivot data when data changes, run your flow. If new fields are added to your data source that need to be added to the pivot, manually add them to the pivot.
Example: Pivoting on multiple fields
This example shows a spreadsheet for pharmaceutical sales, taxes and totals by month and year.
By pivoting the data you can create rows for each month and year and individual columns for sales, taxes and totals so that Tableau can more easily interpret this data for analysis.
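The reshaping described in this example can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the field names, the sample values and the pivot_groups helper are hypothetical, and the check on group sizes mirrors the rule that every pivoted column must contain the same number of fields.

```python
def pivot_groups(rows, groups, labels, names_field="Month"):
    """Pivot several groups of columns to rows at the same time.

    groups maps each new column name to the source fields it collects;
    labels names the rows produced, one label per position in each group.
    """
    if any(len(fields) != len(labels) for fields in groups.values()):
        raise ValueError("every group must contain the same number of fields")
    grouped = {f for fields in groups.values() for f in fields}
    tall = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in grouped}
        for position, label in enumerate(labels):
            new_row = {**kept, names_field: label}
            for new_column, fields in groups.items():
                new_row[new_column] = row[fields[position]]
            tall.append(new_row)
    return tall


# Hypothetical spreadsheet row with sales and taxes for two months.
sheet = [{"Product": "Aspirin", "Jan Sales": 100, "Jan Tax": 8,
          "Feb Sales": 120, "Feb Tax": 10}]
groups = {"Sales": ["Jan Sales", "Feb Sales"], "Tax": ["Jan Tax", "Feb Tax"]}
for record in pivot_groups(sheet, groups, ["Jan", "Feb"]):
    print(record)
```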
Use wildcard search to pivot
If you work with large data sets or if your data frequently changes over time, starting in Tableau Prep Builder version 2019.1.1 and on the web, you can use a wildcard search when pivoting columns to rows to instantly pivot your data based on a wildcard pattern match.
If new fields are added or removed that match the pattern, Tableau Prep detects the schema change when the flow is run and the pivot results are automatically updated.
Connect to your data source.
Drag the table that you want to pivot to the Flow pane.
Click the plus icon, and select Add Pivot from the context menu.
In the Pivoted Fields pane, click the link Use wildcard search to pivot.
Enter a value or partial value that you want to search for. For example, enter Sales_ to match fields that are labeled as sales_2017, sales_2018 and sales_2019.
Do not use asterisks to match the pattern unless they are part of the field value that you are searching for. Instead click the Search Options button to select how you want to match the value. Then press Enter to apply the search and pivot the matching values.
(Optional) In the Pivoted Fields pane, click the plus icon to add more columns to pivot on, then repeat the previous step to select more fields to pivot.
If you didn't enable the default naming option or if Tableau Prep couldn't automatically detect a name, edit the names of the fields.
To refresh your pivot data when data changes, run your flow. Any new fields added to your data source that match the wildcard pattern are automatically detected and added to the pivot.
If the results aren't what you expect, try one of the following options:
Enter a different value pattern in the Search field and press Enter. The pivot will automatically refresh and show the new results.
Manually drag additional fields to the Pivot1 Values column in the Pivoted Fields pane. You can also remove fields that were added manually by dragging them off the Pivot1 Values column and dropping them in the Fields pane.
Note: Fields that were added from the wildcard search results can't be removed by dragging them off the Pivot1 Values column. Instead try using a more specific pattern to match the search results you are looking for.
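To make the pattern-matching behavior described in this section concrete, here is a minimal sketch in plain Python of selecting fields by a search value. The match_fields helper and its mode names are hypothetical stand-ins for the Search Options, and case-insensitive matching is an assumption based on the Sales_ example above.

```python
def match_fields(field_names, pattern, mode="contains"):
    """Return the field names that match pattern under the chosen mode."""
    needle = pattern.lower()
    checks = {
        "contains": lambda name: needle in name,
        "starts_with": lambda name: name.startswith(needle),
        "ends_with": lambda name: name.endswith(needle),
        "exact": lambda name: name == needle,
    }
    if mode not in checks:
        raise ValueError(f"unknown match mode: {mode}")
    return [name for name in field_names if checks[mode](name.lower())]


fields_before = ["Region", "sales_2017", "sales_2018", "sales_2019"]
fields_after = fields_before + ["sales_2020"]  # a new field appears in the source

print(match_fields(fields_before, "Sales_", "starts_with"))
print(match_fields(fields_after, "Sales_", "starts_with"))
print(match_fields(fields_after, "sales_*"))  # the asterisk is matched literally
```

Because the selection is recomputed from the pattern each time, a new field such as sales_2020 is picked up without editing the pivot, while a pattern containing an asterisk matches nothing unless a field name really contains one.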
Pivot rows to columns
In Tableau Prep Builder version 2019.1.1 and later and on the web, pivot rows to columns if your data is too normalized and you need to create new columns - going from tall data to wider data.
For example, if you have advertising costs for each month with all advertising types in one column, pivoting the data from rows to columns gives you a separate column for each advertising type, making the data easier to analyze.
You can select one field to pivot on. The field values for that field are then used to create the new columns. Then, select a field to use to populate the new columns. These field values are aggregated and you can select the type of aggregation to apply.
Because aggregation is applied, pivoting columns back to rows won't reverse this pivot action. To reverse a rows-to-columns pivot, you need to undo the action: click the Undo button on the top menu, remove the fields from the Pivoted Fields pane, or delete the pivot step.
Connect to your data source.
Drag the table that you want to pivot to the Flow pane.
Click the plus icon, and select Add Pivot from the context menu.
In the Pivoted Fields pane, select Rows to Columns from the drop-down list.
(Optional) In the Fields pane, enter a value in the Search field to search the field list for fields to pivot.
Select a field from the left pane, and drag it to the Field that will pivot rows to columns section in the Pivoted Fields pane.
Note: If the field you want to pivot on has a data type of date or datetime, you will need to change the data type to string to pivot it.
The values in this field will be used to create and name the new columns. You can change the column names in the Pivot Results pane later.
Select a field from the left pane and drag it to the Field to aggregate for new columns section in the Pivoted Fields pane. The values in this field are used to populate the new columns created from the previous step.
A default aggregation type is assigned to the field. Click the aggregation type to change it.
In the Pivot Results pane, review the results and apply any cleaning operations to the new columns that you created.
If the field being pivoted has a change in its row data, right-click or Ctrl-click (MacOS) on the Pivot step in the flow pane and select Refresh.
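The rows-to-columns pivot described in this section can be sketched in plain Python as follows. This is an illustration only: the helper and the sample data are hypothetical, and sum is used as the aggregation purely as an assumption for the example.

```python
def pivot_rows_to_columns(rows, pivot_field, value_field, aggregate=sum):
    """Create one new column per distinct value of pivot_field (tall to wide)."""
    if not rows:
        return []
    key_fields = [k for k in rows[0] if k not in (pivot_field, value_field)]
    new_columns = []  # distinct pivot values, in first-seen order
    groups = {}       # key tuple -> {new column name -> collected values}
    for row in rows:
        key = tuple(row[k] for k in key_fields)
        column = str(row[pivot_field])
        if column not in new_columns:
            new_columns.append(column)
        groups.setdefault(key, {}).setdefault(column, []).append(row[value_field])
    wide = []
    for key, collected in groups.items():
        out = dict(zip(key_fields, key))
        for column in new_columns:
            values = collected.get(column)
            # Several source rows collapse into one aggregated value.
            out[column] = aggregate(values) if values else None
        wide.append(out)
    return wide


# Hypothetical advertising data: one row per month and advertising type.
tall = [
    {"Month": "Jan", "Advertising": "Radio", "Spend": 100},
    {"Month": "Jan", "Advertising": "Radio", "Spend": 50},
    {"Month": "Jan", "Advertising": "Print", "Spend": 30},
    {"Month": "Feb", "Advertising": "Radio", "Spend": 70},
]
for record in pivot_rows_to_columns(tall, "Advertising", "Spend"):
    print(record)
```

The two January radio rows collapse into a single aggregated value, which is why pivoting the columns back to rows cannot recover the original rows.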
Disclaimer: This topic includes information about a third-party product. Please note that while we make every effort to keep references to third-party content accurate, the information we provide here might change without notice as Python changes. For the most up-to-date information, please consult the Python documentation and support.
Python is a widely used high-level programming language for general-purpose programming. By sending Python commands to an external service through Tableau Prep Builder, you can easily extend your data preparation options by performing actions like adding row numbers, ranking fields, filling down fields and performing other cleaning operations that you might otherwise do using calculated fields.
To include Python scripts in your flow, you need to configure a connection between Tableau and a TabPy server. Then you can use Python scripts to apply supported functions to data from your flow using a pandas dataframe. When you add a script step to your flow and specify the configuration details, file, and function that you want to use, data is securely passed to the TabPy server, the expressions in the script are applied, and the results are returned as a table that you can clean or output as needed.
You can run flows that include script steps in Tableau Server as long as you have configured a connection to your TabPy server. Running flows with script steps in Tableau Online isn't currently supported. To configure Tableau Server, see Configure the Tableau Python (TabPy) server for Tableau Server.
Prerequisites
To include Python scripts in your flow, complete the following setup. Creating or running flows with script steps in Tableau Online isn't currently supported.
- Download and install Python. Use the most current version of Python for Linux, Mac or Windows.
- Download and install the Tableau Python server (TabPy). Follow the installation and configuration instructions for installing TabPy. Tableau Prep Builder passes data from your flow to TabPy as the input, TabPy applies your script, and the results are returned to the flow.
- Install pandas. Run the following command:
pip3 install pandas
You must use a pandas data frame in your scripts to integrate with Tableau Prep Builder.
Configure the Tableau Python (TabPy) server for Tableau Server
Use the following instructions to configure a connection between your TabPy server and Tableau Server.
- Version 2019.3 and later: You can run published flows that include script steps in Tableau Server.
- Version 2020.4.1 and later: You can create, edit, and run flows that include script steps in Tableau Server.
- Tableau Online: Creating or running flows with script steps isn't currently supported.
- Open the TSM command line/shell.
Enter the following commands to set the host address, port values and connect timeout:
tsm security maestro-tabpy-ssl enable --connection-type {maestro-tabpy-secure/maestro-tabpy} --tabpy-host <TabPy IP address or host name> --tabpy-port <TabPy port> --tabpy-username <TabPy username> --tabpy-password <TabPy password> --tabpy-connect-timeout-ms <TabPy connect timeout>
- Select {maestro-tabpy-secure} to enable a secure connection or {maestro-tabpy} to enable an unsecured connection.
- If you select {maestro-tabpy-secure}, specify the certificate file -cf <certificate file path> in the command line.
- Specify the --tabpy-connect-timeout-ms <TabPy connect timeout> in milliseconds. For example, --tabpy-connect-timeout-ms 900000.
To disable the TabPy connection, enter the following command:
tsm security maestro-tabpy-ssl disable
Create your Python script
When you create your script, include a function that takes a pandas DataFrame (pd.DataFrame) as an argument. This is how the function receives your data from Tableau Prep Builder. The function must also return its results as a pandas DataFrame (pd.DataFrame) using supported data types.
For example, to add encoding to a set of fields in a flow, you could write the following script:
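The script itself is not reproduced in this copy of the topic. A minimal sketch of what such a script might look like is shown here, assuming hypothetical field names (Order ID, Region, Channel) and using pandas category codes as the encoding. It is an illustration, not the original example.

```python
import pandas as pd


def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Return the key field plus integer-encoded copies of two text fields."""
    result = pd.DataFrame({"Order ID": df["Order ID"]})
    for field in ("Region", "Channel"):  # hypothetical field names
        # Codes follow the sorted order of the distinct values;
        # missing values are encoded as -1.
        codes = df[field].astype("category").cat.codes
        result[field + " Encoded"] = codes.astype("int64")
    return result
```

In a flow, encode would be entered as the Function Name in the script step. Because this sketch returns fields that differ from the input, the script would also need the get_output_schema function described later in this topic.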
The following data types are supported:
Data type in Tableau Prep Builder | Data type in Python |
---|---|
String | Standard UTF-8 string |
Decimal | Double |
Int | Integer |
Bool | Boolean |
Date | String in ISO_DATE format “YYYY-MM-DD” with optional zone offset. For example, “2011-12-03” is a valid date. |
DateTime | String in ISO_DATE_TIME format “YYYY-MM-DDTHH:mm:ss” with optional zone offset. For example, “2011-12-03T10:15:30+01:00” is a valid date. |
Note: Date and DateTime must always be returned as a valid string.
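As an illustration of the note above, date and datetime values can be converted to strings in these formats with Python's standard datetime module. The helper names here are hypothetical, and dropping fractional seconds is an assumption made to match the documented format.

```python
from datetime import date, datetime, timedelta, timezone


def to_prep_date(value: date) -> str:
    """Format a date (or the date part of a datetime) as YYYY-MM-DD."""
    return date(value.year, value.month, value.day).isoformat()


def to_prep_datetime(value: datetime) -> str:
    """Format a datetime as YYYY-MM-DDTHH:mm:ss, plus the offset if it has one."""
    return value.replace(microsecond=0).isoformat()


paris_time = datetime(2011, 12, 3, 10, 15, 30, tzinfo=timezone(timedelta(hours=1)))
print(to_prep_date(date(2011, 12, 3)))  # 2011-12-03
print(to_prep_datetime(paris_time))     # 2011-12-03T10:15:30+01:00
```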
If you want to return different fields than what you input, you'll need to include a get_output_schema function in your script that defines the output and data types. Otherwise, the output will use the fields from the input data, which are taken from the step that is prior to the script step in the flow.
Use the following syntax when specifying the data types for your fields in the get_output_schema:
Function in Python | Resulting data type |
---|---|
prep_string() | String |
prep_decimal() | Decimal |
prep_int() | Integer |
prep_bool() | Boolean |
prep_date() | Date |
prep_datetime() | DateTime |
The following example shows the get_output_schema function added to the field encoding Python script:
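The example is not reproduced in this copy of the topic. A minimal sketch follows, assuming the schema is declared as a pd.DataFrame that maps each output field name to one of the prep_* functions listed above. The field names are hypothetical, and the fallback definitions of prep_int and prep_string are placeholders that only let the file load outside a flow; they are not Tableau's implementations.

```python
import pandas as pd

# Tableau Prep Builder supplies prep_int() and the other prep_* functions when
# it runs the script. These stand-ins are hypothetical placeholders, defined
# only if the real functions are absent (for example, in a local test).
try:
    prep_int
except NameError:
    def prep_int():
        return ["int"]

    def prep_string():
        return ["string"]


def encode(df):
    """Return the key field plus integer-encoded copies of two text fields."""
    result = pd.DataFrame({"Order ID": df["Order ID"]})
    for field in ("Region", "Channel"):  # hypothetical field names
        codes = df[field].astype("category").cat.codes
        result[field + " Encoded"] = codes.astype("int64")
    return result


def get_output_schema():
    """Declare the fields and data types that encode() returns."""
    return pd.DataFrame({
        "Order ID": prep_int(),
        "Region Encoded": prep_int(),
        "Channel Encoded": prep_int(),
    })
```

The field names in get_output_schema need to line up with the columns that the script function actually returns.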
Connect to your Tableau Python (TabPy) server
Important: Starting in Tableau Prep Builder version 2020.3.3, you can configure your server connection once from the top Help menu instead of setting up your connection per flow in the Script step by clicking Connect to Tableau Python (TabPy) Server and entering your connection details. You will need to reconfigure your connection using this new menu for any flows that were created in an older version of Tableau Prep Builder that you open in version 2020.3.3.
Select Help > Settings and Performance > Manage Analytics Extension Connection.
In the Select an Analytics Extension drop-down list, select Tableau Python (TabPy) Server.
Enter your credentials:
- Port 9004 is the default port for TabPy.
- If the server requires credentials, enter a username and password.
- If the server uses SSL encryption, select the Require SSL check box, then click the Custom configuration file.. link to specify a certificate for the connection.
Note: Tableau Prep Builder doesn't provide a way to test the connection. If there is a problem with the connection an error message shows.
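Because there is no built-in connection test, one rough way to check beforehand that something is listening on the TabPy port is a plain TCP connection attempt. The sketch below uses only Python's standard library; the helper name is hypothetical, and a successful connection only shows that the port is open, not that the service behind it is TabPy or that your credentials are valid.

```python
import socket


def tabpy_port_is_open(host="localhost", port=9004, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    print("TabPy port reachable:", tabpy_port_is_open())
```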
Add a script to your flow
Start your TabPy server then complete the following steps:
Note: TabPy requires tornado package version 5.1.1 to run. If you receive the error 'tornado.web' has no attribute 'asynchronous' when trying to start TabPy, run pip list from the command line to check which version of tornado is installed. If you have a different version installed, download the tornado package version 5.1.1. Then run pip uninstall tornado to uninstall your current version, and run pip install tornado==5.1.1 to install the required version.
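As an alternative to scanning the pip list output, the installed tornado version can be read from Python itself. This is a small sketch using the standard importlib.metadata module (Python 3.8 and later); the helper names are hypothetical, and it needs to run in the same Python environment that TabPy uses.

```python
from importlib import metadata

REQUIRED_TORNADO = "5.1.1"  # version named in the note above


def installed_version(package):
    """Return the installed version of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def tornado_status(required=REQUIRED_TORNADO):
    """Describe whether the installed tornado matches the required version."""
    found = installed_version("tornado")
    if found is None:
        return "tornado is not installed"
    if found == required:
        return f"tornado {found} matches the required version"
    return f"tornado {found} is installed, but version {required} is required"


if __name__ == "__main__":
    print(tornado_status())
```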
Open Tableau Prep Builder and click the Add connection button.
In web authoring, from the Home page, click Create > Flow or from the Explore page, click New > Flow. Then click Connect to Data.
From the list of connectors, select the file type or server that hosts your data. If prompted, enter the information needed to sign in and access your data.
Click the plus icon, and select Add Script from the context menu.
In the Script pane, in the Connection type section, select Tableau Python (TabPy) Server.
In the File Name section, click Browse to select your script file.
Enter the Function Name then press Enter to run your script.