Why do join types matter in a SAP HANA information view


What are we joining?

When you design an SAP HANA information view, we must also define joins between tables or attribute views. These joins can be configured in the foundation of attribute views and analytic views. They can also logically occur in analytic and calculation views.

Why do joins matter?

In some cases, joining tables can be an expensive process in the SAP HANA engines. This is especially true when the cardinality between tables is very high. For example, an order header table with 20 million rows is joined to an order line detail table with 100 million rows. Excessive joining in the model is also something that can lead to performance issues. Excessive joining is something that occurs often when we are attempting to denormalized OLTP data using SAP HANA’s information views.

What is the solution to work around join cost?

Join pruning is a process where SAP HANA eliminates both tables and joins from its execution plan based how the joins are defined in the information view and how data is queried from the information view.

For example, let’s assume that we have a customer table or attribute view configured with a left outer join to a sales transaction table in our analytic view.  If we query the information view and only select a SUM(SALES_AMOUNT) with no GROUP BY or WHERE clause, then the execution plan might skip the join process between the customer table and the sales transaction table. However, if we execute SELECT CUSTOMER_NAME, SUM(SALES_AMOUNT) then the join between the customer table and sales transaction will be required regardless of the join type used.

The process is subjective and related to how the joins are configured in our information view. It is also subjective to how the information view is queried. If every query we execute selects all attribute and measures, then join pruning would never be invoked. Fortunately, in the real world, this is not typically the case. Most queries only select a subset of data from the information view. For this reasons, we should try to design our information views to take advantage of join pruning.

What join options do I have?

There are five main types of joins to look at in this context. Inner Joins, Referential Joins, Left Outer, Right Outer and Text Joins. Below is an outline on how SAP HANA handles each join type.

INNER JOIN

Inner Joins are always evaluated in the model. No join pruning occurs because an inner join effectively tells the SAP HANA engine to always join no matter what query is executed. Records are only returned when a matching value is found in both tables. Overall they are one of the most expensive join types that can be defined in an information view. They also have the potential to remove records from the results assuming that referential integrity is poorly maintained in the source tables.

REFERENTIAL JOIN

For all intents and purposes, the referential join acts like an inner join. However, under specific circumstances, the referential join will be pruned from the execution of the information view. Assuming that there are no filters defined in an attribute view’s foundation, the SAP HANA engines will likely prune the join from the execution plan of the analytic view. This assumes that no columns are queried within the same attribute view.  However, if a column from the attribute view is included in a query, this join will act as an inner join.

Let me reiterate the exception to the pruning rule one more time. If filters are defined in the attribute view, the join will be included and enforced as an inner join even if no attribute view columns are selected. This means that records are only returned when a match is found in both the attribute view and analytic view foundation table. It also means that referential joins are not always pruned from the information view execution plan. Pay close attention to this exception because referential joins can be just as expensive as inner joins.

When a referential join is pruned we have to make sure that the referential integrity between tables is sound. If it is not, we might get varying results based on the columns selected in our queries. For example, a transaction might not have a shipped date defined. When we query for transactions with a shipped date, this transaction would be excluded. When we exclude any shipped date attributes from our query, the transaction would magically appear again because of join pruning. Based on experience, this can confuse the heck out of an end user… or a developer 🙂

LEFT OUTER

Based on several query scenarios that I tested, the left outer join is always pruned from the execution plan assuming that no columns are selected from an attribute view defined within the model. Left outer joins are often defined in instances where the analytic view foundation table is known to have no matching record within an attribute view. By design, they prevent the model from excluding transactions. This is because it will ensure that all possible transactions are returned, even if we are missing master data. Overall I would say that the left outer join has the potential to be the best performing join within our information view definition.

However, we have to make sure that it makes sense before we arbitrarily set up our models with left outer joins. We sometime need the inner join to help exclude records. For example, let’s assume that we have a date attribute view joined to a data column in our analytic view foundation table. Using an inner join we can exclude any transaction that has no defined date. There are other instances where we might find phantom transactions in our aggregates with a left outer join. For example, a transaction that has no customer, date or product. In those cases, something was likely only partially deleted from the system. Therefore, referential integrity issues can create problems.If you recall a similar issue can happen with referential joins. However, with left outer joins we will always return the transaction.

RIGHT OUTER

The right outer join is used in instances where all attribute view values need to be returned even if they have no matching transactions.  For example, I need a complete list of all customers even if they have never been issued an invoice. This would be helpful if the query were targeting both existing and potential customers.

Keep in mind that a mismatched transaction will be excluded from any analytic view’s aggregates. Based on my testing, a right outer join is always performed in the execution of the model. I was not able to devise a single scenario where the join was pruned from the execution of the model. With this in mind, it will have a high cost when used in the information view definition.

TEXT JOIN

This is effectively an inner join setup specifically to work with the SPRAS field found in SAP BusinessSuite tables. Based on my brief testing, it is always executed and never pruned from the model.

3 comments

  1. Dear Jonathan,
    Wonderful article! Very well articulated with minimum words.

    Just wondered – you said, Text Join is effectively an inner join. Is it inner or left outer join, could you please confirm? As a developer, I would expect the articles to be appearing in output with their descriptions in English. In case, English description is not maintained for an article, I would still prefer that article to appear in output with blank description rather than missing it out totally from output. Your thoughts please.

  2. Look at the options for Static Join Versus Dynamic Joins

    •In static joins, the join condition isn’t changed, irrespective of the client query.
    •In a dynamic join, if the client query to the join doesn’t request a join column, a query run time error occurs. This behavior of dynamic join is different from the static joins.
    •Dynamic joins enforces aggregation before executing the join, but for static joins the aggregation happens after the join. This means that, for dynamic joins, if a join column is not requested by the client query, its value is first aggregated, and later the join condition is executed based on columns requested in the client query.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s