Data Modeling vs Analytic Modeling in SAP HANA


I have been working with clients and helping them implement solutions on standalone SAP HANA for about four months. I have learned quite a bit from firsthand experience dealing with everything from loading data and configuring security to creating multi-dimensional models within HANA. I have also seen firsthand the potential HANA has to be a game changer in terms of data processing and analysis. In preparation for these projects I conducted an extensive amount of research to gain a deeper understanding of the engines, capabilities and technology within HANA. I am now at a point where I feel that I have done my homework and can comment on a few observations based on both my experiences and research. The first item that I believe to be an important discussion point is differentiating the terms HANA Modeling and good old-fashioned Data Modeling. With the current version of HANA there is still a difference, and I will explain those differences in this article. For the record, this article is centered on standalone SAP HANA and not BW Powered by HANA. There is a significant difference between the two solutions, and most aspects of this article will not apply to companies that plan to implement BW Powered by HANA.

Before I dig deep into the differences, let me define what I mean by Data Modeling. In general terms, Data Modeling is a mix of business processes and ETL technology that results in the transformation of normalized and disparate data into a clean and organized structure suitable for decision making. It plays a key role in an organization’s Data Governance program. In many cases the result of a Data Modeling exercise will be a data warehouse or data mart, but the primary purpose of the Data Modeling task is to provide consistent, quality access to data. Without following these processes, Business Intelligence for large and sometimes small organizations will be anything but intelligent. I have seen this process overlooked many times over the past decade and can attest that organizations that fail to implement a solid data foundation will flounder in delivering meaningful analytics. If you would like to dive deeper into the various facets of Data Modeling, there is a decent description of it available on Wikipedia. In terms of this article and SAP HANA, I am referring to using this process to transform, clean and load data into a physical star schema, with HANA columnar tables as the target.

If you have done any research or seen presentations on SAP HANA, you are likely aware that the HANA studio client tool contains a set of modeling tools and objects. HANA modeling allows you to create rich multi-dimensional analytics and views of data. The result is very similar to OLAP, but fundamentally different in terms of data storage and indexing. Most OLAP tools on the market today build a proprietary database called a cube and store it on a disk array (MOLAP). Users then access this data via OLAP tools or OLAP middleware. HANA modeling does not create persistent cubes, but rather allows developers to define metadata that produces cube-like features. HANA uses this metadata and its calculation engines to retrieve data from the base in-memory columnar tables at lightning-fast speeds. I could spend several thousand additional words describing all the reasons that HANA is revolutionary in terms of aggregating data, but there is more than enough information available on SAP SDN to explain how HANA works. For this article, I simply want to describe the differences between traditional Data Modeling and modeling analytics on SAP HANA.

If you plan to implement solutions on standalone SAP HANA, you need to understand that HANA alone does not solve problems dealing with the structure, organization or quality of your data. With that said, HANA will provide the ability to analyze, aggregate, compare, calculate and forecast extremely large data volumes at speeds unlike any database currently available. It truly is unique in how it has merged in-memory database storage, parallelization of calculations and OLAP processing into a single solution. However, HANA is not an ETL tool, and it is not a solution that allows you to forgo Data Governance and Data Modeling. To emphasize this point I want to speak technically about the types of transformations and de-normalization that cannot be achieved by HANA alone (in its current release), and then review the types that can be achieved by HANA today. Based on my experience and research there are three types of transformations that will have to be solved outside of HANA. There are more, but I will remain focused on the actual problems I have experienced.

In all of my projects to date, we selected Data Services 4.0 to load data into HANA. This turned out to be a wise decision, because it was able to help manage the complex data modeling problems that occurred due to the normalized nature of the data we loaded.

Transformation Example 1

The first problem comes in the form of normalized hierarchical data. Hierarchical data can be stored in relational tables in two extremes. The first extreme is based on a normalized approach, a technique where records are stored with a parent-child relationship. In the example below each ITEM ID is stored with its PARENT ITEM ID, ITEM TYPE and ITEM DESCRIPTION. This is not a very practical example, but given the number of NDAs I have signed to date, I want to make sure that I don’t disclose any customer data or data structures. Hopefully you can use your imagination.

| ITEM ID | PARENT ITEM ID | ITEM DESCRIPTION | ITEM TYPE |
|---------|----------------|------------------|-----------|
| 1 | | North America | Continent |
| 2 | 1 | Florida | State / Region |
| 3 | 2 | Miami | City |
| 4 | 1 | Georgia | State / Region |
| 5 | 4 | Atlanta | City |
| 6 | 1 | Texas | State / Region |
| 7 | 6 | Dallas | City |
| 8 | 3 | 4356 Palms Ave. | Address |
| 9 | 5 | 123 Main St | Address |
| 10 | 7 | 456 Cowboy Way | Address |
| 11 | 5 | 123 Jones St | Address |

This storage method reduces the overall size of the data within the table by eliminating duplicate character values. It also increases the performance of inserting new values, because you only have to perform a single row insert or update when modifying the relationships. This technique is very typical, and you will see it implemented in OLTP-optimized databases or source system applications for a variety of reasons. The other extreme of storing hierarchical data comes in a more de-normalized format. In this case the data values are stored in a way that is highly repetitive within each column, but with only one record per leaf-level member. In this example the ADDRESS ID is the primary key of the table.

| Address ID | Address | City Name | State / Region | Continent |
|------------|---------|-----------|----------------|-----------|
| 8 | 4356 Palms Ave. | Miami | Florida | North America |
| 9 | 123 Main St | Atlanta | Georgia | North America |
| 10 | 456 Cowboy Way | Dallas | Texas | North America |
| 11 | 123 Jones St | Atlanta | Georgia | North America |

If you think in terms of creating Dimensions in a star schema, the de-normalized form of this data is required to prevent rows from being repeated after joining this table to a Fact Table. If your source data is normalized, you need a way to transform it into the de-normalized form before joining it to a Fact Table. While you might be able to devise a solution using a recursive procedure, a database view, or other HANA coding, keep in mind that this type of de-normalization processing would be performed each time a dependent query is executed. Even with HANA and all its features, this is neither a scalable methodology nor a recommended approach. Given the above example, the best solution is to utilize Data Services to extract the data from the source system, transform (de-normalize) the data, and then load it into HANA. With this approach, you perform the work on a scheduled or near real-time basis, not each time the data is queried. There are other reasons that storing this type of data in its de-normalized form is advantageous on HANA. For more details you can review the SAP HANA Database developer’s guide.
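
For illustration only, if the hierarchy has a known, fixed depth, the de-normalization can be expressed as a set of self-joins. The following is a minimal SQL sketch of the idea, assuming a hypothetical ITEMS table shaped like the example above; in practice this logic would live in the Data Services job that populates the target dimension table, not in a HANA view that runs at query time.

```sql
-- Flatten a fixed-depth parent/child hierarchy (Continent > State > City > Address)
-- into one row per address. Table and column names are illustrative only.
SELECT
    addr.ITEM_ID           AS ADDRESS_ID,
    addr.ITEM_DESCRIPTION  AS ADDRESS,
    city.ITEM_DESCRIPTION  AS CITY_NAME,
    st.ITEM_DESCRIPTION    AS STATE_REGION,
    cont.ITEM_DESCRIPTION  AS CONTINENT
FROM ITEMS addr
JOIN ITEMS city ON city.ITEM_ID = addr.PARENT_ITEM_ID
JOIN ITEMS st   ON st.ITEM_ID   = city.PARENT_ITEM_ID
JOIN ITEMS cont ON cont.ITEM_ID = st.PARENT_ITEM_ID
WHERE addr.ITEM_TYPE = 'Address';
```

Real hierarchies are rarely this tidy; ragged or unbalanced hierarchies are exactly why an ETL tool with hierarchy-flattening transforms is the better home for this work.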

Transformation Example 2

Another example where Data Services was needed occurred when a reverse pivot transformation was required. With normalized data it is highly likely that you will find cases where multiple attributes of a single entity are stored as multiple records. In the example below you will see that a single order has multiple status dates:

| ORDER ID | STATUS TYPE | STATUS DATE |
|----------|-------------|-------------|
| 1 | CREATED | 1/1/2012 |
| 1 | SHIPPED | 1/4/2012 |
| 2 | CREATED | 1/5/2012 |
| 3 | CREATED | 1/5/2012 |
| 3 | SHIPPED | 1/8/2012 |
| 3 | INVOICED | 1/9/2012 |

If you join this table, in its native format, to the ORDER DETAILS table, you will again create a situation where the order amount is repeated for each status. In Data Modeling terms we can solve this problem by applying a reverse pivot transformation to the ORDER STATUS table. Once pivoted, the results will look like this:

| ORDER ID | DATE_CREATED | DATE_SHIPPED | DATE_INVOICED |
|----------|--------------|--------------|---------------|
| 1 | 1/1/2012 | 1/4/2012 | |
| 2 | 1/5/2012 | | |
| 3 | 1/5/2012 | 1/8/2012 | 1/9/2012 |

Once the dates are pivoted on the ORDER ID axis, we can be assured that there will be only one record for each ORDER ID. This is very useful in situations where you need to combine these dates with other summarized order-related metrics without affecting the overall cardinality of the results. Below is an example result set.

| ORDER ID | DATE CREATED | DATE SHIPPED | DATE INVOICED | Num Order Lines | Order Amount |
|----------|--------------|--------------|---------------|-----------------|--------------|
| 1 | 1/1/2012 | 1/4/2012 | | 2 | 2345.45 |
| 2 | 1/5/2012 | | | 6 | 3456.90 |
| 3 | 1/5/2012 | 1/8/2012 | 1/9/2012 | 7 | 4567.19 |
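
As a rough sketch of what the reverse pivot does, the same result can be expressed in plain SQL using conditional aggregation. The table and column names below are placeholders; in the projects described here this was implemented with the Data Services pivot/reverse pivot transforms rather than hand-written SQL.

```sql
-- Reverse pivot: collapse one row per (ORDER_ID, STATUS_TYPE) into one row per ORDER_ID.
-- Table and column names are illustrative only.
SELECT
    ORDER_ID,
    MAX(CASE WHEN STATUS_TYPE = 'CREATED'  THEN STATUS_DATE END) AS DATE_CREATED,
    MAX(CASE WHEN STATUS_TYPE = 'SHIPPED'  THEN STATUS_DATE END) AS DATE_SHIPPED,
    MAX(CASE WHEN STATUS_TYPE = 'INVOICED' THEN STATUS_DATE END) AS DATE_INVOICED
FROM ORDER_STATUS
GROUP BY ORDER_ID;
```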

Transformation Example 3

The final example revolves around the need to perform data quality updates or checks on data. There are no features or functions within HANA to perform data quality updates to source system data as it is loaded into HANA or queried from HANA. This can be very problematic if there are no constraints on the source system applications that populate the source data. There are several transforms in Data Services that allow developers to validate and consequently update data based on rules, master data, or other data sources. Imagine that you have an analytic that is based on city and month. If your city is derived from an address that is captured via a manual data input process and there are no validations being performed on the front end system, it is highly likely that there will be quality issues with your data. If you load the “dirty” or “trash” data into HANA, without validation, your analytic will contain multiple variations of city names. How valuable are the results of an analytic when they look like the following example?

| Month | CITY | SALES |
|-------|------|-------|
| Jan | Atlant | 345,6787 |
| Jan | Atlanta_ | 3,456 |
| Jan | antlanta | 21,456 |
| Jan | Atlanta | 1,345,678 |
| Jan | AtlantaGa | 456 |
| Jan | Atlantas | 234 |
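
To illustrate the kind of cleansing that has to happen before the data reaches HANA, here is a minimal SQL sketch that standardizes city names against a hypothetical reference table of known misspellings. In practice this would be handled by Data Services validation and lookup (or address cleansing) transforms in the load job; the table and column names below are invented for the example.

```sql
-- Standardize dirty city values against a reference table before loading into HANA.
-- STAGING_SALES and CITY_SYNONYMS are hypothetical tables used only for illustration.
SELECT
    s.SALES_MONTH,
    COALESCE(syn.CLEAN_CITY, s.CITY)  AS CITY,
    SUM(s.SALES_AMOUNT)               AS SALES
FROM STAGING_SALES s
LEFT JOIN CITY_SYNONYMS syn
       ON UPPER(TRIM(s.CITY)) = UPPER(TRIM(syn.DIRTY_CITY))
GROUP BY s.SALES_MONTH, COALESCE(syn.CLEAN_CITY, s.CITY);
```

A lookup table only catches the misspellings you already know about, which is why rule-based validation and front-end constraints matter just as much as the transform itself.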

In my opinion, just because SAP HANA is capable of performing calculations 3000 times faster than other legacy databases does not mean that you can forgo data governance processes. I have seen postings and tweets on the internet where some seem to suggest that the rules of Data Governance and Data Modeling have been replaced by the capabilities of SAP HANA. If you are of that opinion, I would have to respectfully disagree. In terms of Data Governance and Data Modeling, SAP HANA is simply a platform to store massive amounts of multi-dimensional data and subsequently provide lightning-fast access to that information. That means that other processes and tools must be leveraged in order to load quality data into HANA.

Now that I have highlighted a few examples of where SAP HANA is unable to transform data, I would like to describe a few scenarios where SAP HANA is able to de-normalize or transform data. When developing Attribute Views (Dimensions) or Analytic Views (Star-Schema Cubes) in HANA studio you are able to perform the following functions.

  • Join columnar tables on one or more columns
  • Filter columns to eliminate various results
  • Derive columns based on standard HANA SQLScript functions
  • Build Date and Time based Attribute Views based on system tables stored in HANA. This is similar to building a Date Dimension, but HANA can manage the dates.
  • Create Measures using SUM, MIN, MAX or COUNT

When developing Calculation Views (Advanced Queries) in HANA studio you are able to perform the following functions.

  • Join columnar tables, attribute views or analytic views (Data Sets)
  • Union columnar tables, attribute views or analytic views (Data Sets)
  • Aggregate Data Sets
  • Filter columns and create derived columns using projection transforms
  • Create Runtime Filters and Input Parameters
  • Use Stored Procedures to fully manipulate the dataset. However, this will slow the processing down.

You will find that some of these options are very similar to the capabilities of the BusinessObjects Universe, but they are implemented very differently. Depending on how normalized your source system data is, the above options will likely be sufficient to “model” the data for analytic purposes. However, these options will not provide data quality mechanisms or the advanced transformation capabilities discussed above.

To give you a better idea of what an Attribute View is, let me give you a few examples. If your company is selling computers, you would likely have Attribute Views that generate a list of the products and product categories. You would also likely find Attribute Views for Sales Dates, Sales Locations, Distribution Centers, Vendors and many other items. When developing Analytic Views you will find a similar set of capabilities, but the goal of the Analytic View is to define the data foundation and link it to existing Attribute Views. The data foundation is effectively the table that contains the values that are measured. Again, if your company is selling computers and storing the sales transactions in the sales_order_table, the sales_order_table would likely be your foundation table. Within the data foundation you will define private attributes and measures. The measures will represent items like “Sales Amount”, “Quantity” and “Sales Tax”. You would then join the sales_order_table, utilizing the private attributes defined in the foundation, to existing Attribute Views using defined keys. In the end you should have a star-schema logical view of the data that contains both Dimensions and Measures.

When developing Calculation Views you will find a few more options to help calculate measures that might span multiple Analytic Views. There are also options to help you forecast measures. The Calculation View is unique in that it provides support for both graphical and script-based calculations. The script-based interface allows developers to utilize ANSI-92 SQL or special “CE” functions in SAP HANA to generate views of the data. The CE functions are preferred because they are optimized for parallel execution. The graphical interface allows developers to perform common calculations, such as aggregating values from multiple Analytic Views that share common Attributes and forecasting Measures. The scripting engine allows developers to perform more complex calculations that cannot be expressed graphically.

The modeling capabilities of SAP HANA are unique in that they are directly embedded into the RDBMS. Other products separate the RDBMS and OLAP tools into different engines. For example, Microsoft SQL Server and Microsoft SSAS are separate engines. With SAP HANA, the two components are intertwined (in-memory) to allow for superior query performance.
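
To make the script-based interface a little more concrete, below is a minimal sketch of what the body of a scripted Calculation View might look like using CE functions. The schema, table and column names are placeholders, and the exact wrapper around the script depends on your HANA revision, so treat this as an illustration of the style rather than a copy-and-paste example.

```sql
-- Sketch of a scripted Calculation View body using CE functions (all names are illustrative).
BEGIN
    orders   = CE_COLUMN_TABLE("SALES"."SALES_ORDER_TABLE",
                               ["PRODUCT_ID", "SALES_AMOUNT"]);
    products = CE_COLUMN_TABLE("SALES"."PRODUCT_ATTRIBUTES",
                               ["PRODUCT_ID", "PRODUCT_CATEGORY"]);
    -- Join the foundation table to the attribute data, keeping only the needed columns.
    joined   = CE_JOIN(:orders, :products, ["PRODUCT_ID"],
                       ["PRODUCT_CATEGORY", "SALES_AMOUNT"]);
    -- Aggregate sales by product category; var_out is the result exposed by the view.
    var_out  = CE_AGGREGATION(:joined,
                              [SUM("SALES_AMOUNT") AS "TOTAL_SALES"],
                              ["PRODUCT_CATEGORY"]);
END;
```

The same result could be written in plain SQL, but expressing it through the CE operators gives the calculation engine a better chance of parallelizing the work.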

Hopefully after reading this posting you will have a better understanding of the modeling capabilities of SAP HANA. It is also my hope that you will understand the role that Data Services 4.0 plays in transforming data and the overall need for Data Modeling with SAP HANA.

Update (7/29/2013)

SAP HANA Live is a new rapid-deployment solution from SAP that incorporates the use of SLT (real-time replication) of data, pre-built multi-dimensional models and analytic content. If you are running Business Suite on SAP HANA, HANA Live can be implemented without the use of SLT (directly on the tables). In many ways this is a different approach to provisioning SAP HANA than the one described in this blog posting. First, most of the models and content are already built by SAP. Second, because an SAP system is the source, the modeling capabilities within SAP HANA are able to handle most of the generic transformations. However, I have found that you sometimes need Data Services to help supplement these projects when the organization’s requirements don’t fit the mold of the pre-delivered content. This is especially true when 3rd party data is incorporated into the models. This is not to say that SAP HANA Live does not add value, but rather a statement that not every organization is the same. I will try to create a new blog posting to discuss this in more detail.

22 comments

  1. This is great for someone looking to understand how SAP HANA can be used for non-SAP BW data. One question on something you mentioned: “For the record, this article is centered on standalone SAP HANA and not BW Powered by HANA.” Understanding that to be the context, can you still comment on what happens when a SAP BW solution moves to HANA? Do the Analytic, Attribute and Calculation Views need to be created manually, or do they get generated automatically somehow from SAP metadata?

  2. With BW Powered by HANA, you can convert DSOs and InfoCubes into “in-memory optimized” objects. BW does all the work for you, and there is no need to use the HANA studio analytic modeling. As you suggested, in BW it is just metadata accessing the HANA in-memory columnar tables on the database. If you already have a strong foothold in BW, then BW Powered by HANA is an easy upgrade path. If you are a legacy BOBJ customer (with little or no SAP ECC), then standalone HANA is a better path for implementing data marts or data models on HANA.

  3. That makes sense, thank you. And in the standalone scenario then, having implemented data models on HANA, one would expose the data model and underlying data through BusinessObjects Explorer (I am saying this having read another of your articles where you laid out the value proposition of Explorer 4.0 on top of HANA). I also wondered whether the more typical BusinessObjects semantic layer (Universe) could leverage HANA’s power, or whether that would not be a good fit.

  4. Hi Jonathan,
    It was a wonderful article to get a very good understanding of how modeling is done in HANA. This might be a silly doubt: which one is treated as the multi-dimensional view, the Calculation View or the Analytic View? And what are the differences between an InfoCube in BW and an Analytic View in HANA?

  5. Calculation Views are built on existing Analytic Views. Technically the Analytic View is the multi-dimensional model and the Calculation View is a “calculated” representation of one or more Analytic Views.

    Compared to BW, Analytic Views are similar to InfoCubes; however, there is no actual data stored in a SAP HANA Analytic View. If you have NW BW 7.3 Powered by HANA you will also find that you can set up something similar with in-memory optimized DSOs and Cubes. The main difference being that BW manages the model and you do not have to use the SAP HANA studio to develop Analytic Views.

  6. Thanks Jonathan.

    I have some more doubts regarding the architecture part of HANA. What is the difference between the undo and redo log? What are the phases of a savepoint? What function is performed in each phase of a savepoint? Please throw some light on the above questions. Thanks in advance.

  7. I’m not sure that level of detail is available but I have not experienced any issues with rolling back, restoring transaction logs or incremental backups. Based on my direct experience, it appears to work like most popular databases.

    You can read about the logging in more detail at: http://en.wikipedia.org/wiki/SAP_HANA#Logging_and_transactions

    There are a few general statements in the documentation:

    The Transaction Manager ensures ACID compliance, and MVCC helps manage record locking.

  8. Great article Jonathan and I’d really appreciate your thoughts on a data modelling aspect that you only touch on in passing. As an SAP partner we spend our time helping clients a bit further up the modelling process so to speak. By this I mean working out the specific tables and relationships that define the data in the SAP system that is to be loaded into HANA. Many of them find this a real “Discovery” challenge and yet there’s very little written about it in the Data Services, BOBJ and HANA world. How do you and your customers address this part of the process? Graham

  9. The simple and broad answer is based on the following:

    1.) Use Information Steward and Data Services to profile the data. This helps us to identify and confirm the business processes and where the data is stored. Information Steward does a good job of translating the technical names into English.

    2.) Our experience in working with SAP systems as a source over the past 15 years.

    3.) Work with the customer and develop spreadsheets that outline the model and confirm the business processes (Dims / Facts / Others). This helps us identify where the data lives in SAP and how they are using it.

    4.) Develop mapping documents to show the full lineage of the data during the modeling process. Again, we seem to identify several items in the source system when working with the business users during this process.

    There are other ways to help identify the source data and relationships, but I find that both Data Services and Information Steward do a good job of helping you start the process.

  10. Great insights Jonathan.

    I would like to get your opinion on creating a traditional data model like a star schema versus having a couple of highly denormalized tables with dimensions and facts in the same table (at the cost of a high amount of duplication). I know this is a silly question, but I was challenged by a colleague on my traditional approach. He was of the opinion to copy the tables as-is in highly denormalized form and let HANA take care of duplication and speed of retrieval. For example, for a travel-related database: TABLE1 [name, from-city, to-city, stay_from, stay_to, hotel, transaction type, (15 more attributes and a few facts) and total amount], TABLE2 [details of hotels with their standard charges, cities with lists of hotels, etc.], TABLE3 [list of agents by city with the hotels and apartments they manage, including details of facilities, etc.]. Load them as-is into HANA, assuming the data volume to be on the order of 1 million rows.

  11. There is value in the star schema approach in relation to reusability, manageability and conformity of Attribute Views across multiple Analytic Views in SAP HANA, or even in a standard star schema data model. With that said, it is apparent that the columnar storage characteristics of SAP HANA are very sensitive to high-cardinality joins between two or more tables. I don’t believe that the star schema is dead with HANA; however, we might need to adjust our thinking with really large tables on a case-by-case basis.

    We have found that using the ETL process (SAP Data Services) to “pre-join” high-cardinality tables before they are provisioned in SAP HANA can yield significant gains in query response times. A good example is Order Header and Order Detail tables merged into a large fact table. Granted, SAP HANA is fast in both normalized and denormalized situations, but the “super-table” approach can make it even faster when join cost is a factor. In some cases this means we are pushing dimensional attributes into the fact table. In other cases we are pushing header information into the detail table. This can also apply to standard dimensions as well. With that said, we need to make sure that the approach does not complicate other items such as Type II Slowly Changing Dimensions or the updating of attributes.

    I hope this answers your question.

  12. Also, it is true that SAP HANA’s compression will help reduce the storage for repeating values. Traditionally we build Dimensions to reduce the storage footprint. However, with SAP HANA, the columnar compression eliminates some of the redundancy.

  13. Jonathan,

    It is a wonderful blog discussing standalone modeling and the use of Data Services for ETL purposes.

    I have a couple of questions; please let me know when you get a chance.

    Best way of modeling: Please correct me if I am wrong, but SAP HANA Live is completely built on Calculation Views; there are no Attribute and Analytic Views. I have gotten different answers as to why there are only Calculation Views and no Analytic and Attribute Views. We are on SP7, the latest version. This is a brand new HANA on top of a non-SAP (DB2) source. What is the best way to model this scenario? Meaning, can we model everything in Calculation Views like SAP HANA Live, or do you suggest using the standard Attribute, Analytic and Calculation Views to build the data model? Is SAP moving away from AV & AT to only Calculation Views to simplify the modeling approach?

    Universe: If we are doing all the calculations in SAP HANA, like RKFs, CKFs and other calculations, what is the point in having the additional layer of a Universe, since the reporting components can access the views directly? In one of our POCs, we found that using a Universe affected performance.

    Documents on Analytic Modeling: I did not find many documents on analytic modeling specific to standalone HANA (on the web). Could you please direct me to some additional information on Studio/standalone modeling?

    Real-time reporting: Our overall objective is to meet real-time or close to real-time reporting requirements. How can Data Services help? Meaning, I can schedule the data loads every 3 or 5 minutes to pull the data from the source. If I am using Data Services, how soon can I get the data into HANA? I know it depends on the number of records, the transformations in between the systems and the network speed. Assuming that I schedule the job every 2 minutes and it takes another 5 minutes to process the Data Services job, is it fair to say that my information will be available in the BOBJ tools within 10 minutes of the creation of the records?

    Are there any new ETL capabilities included in SP7? I see some additional features included in SP7. Are some of the concepts discussed here still valid, given that in SP7 we have the star join concept?

    Thank you for the excellent information.
    Regards

    Ramakrishnan Ramanathaiah

  14. Hi Jonathan,

    A good article!

    I have a question regarding how HANA fits into a CIF landscape. My understanding from your article is that HANA replaces the need for creating physical cubes, so each HANA schema is a super-fast analytical data mart. But where is the EDW, or will there be no need for an integrated EDW any more? How about in an industry like banking, where data governance and quality are most important? A bank would like to see that the data in the management and financial reports (say, running on HANA) is consistent with other regulatory reports coming from other solutions, like SAS. The data which is fed into the SAS Banking solution should ideally come from the same EDW as what is going into HANA, am I right? So my basic question is how HANA fits into or changes traditional EDW architecture.

    Thanks

    Bing

  15. Thanks for the blog. Is data modeling in HANA a replacement for Universe design, since both do the slicing and dicing of data?
    Does HANA Trial allow IDT connectivity? I am able to create schemas and models through the Eclipse Luna HANA studio, but IDT does not seem to connect using JDBC to hanatrial.ondemand.com.
    Also, please advise on the ETL options to replicate data to the HANA cloud. Can Talend or SSIS do the trick that SAP BODS or SLT does?

  16. Hi, thanks for the informative blog. I am new to SAP BI and SAP HANA. I have a query: why is a BW InfoCube treated as a Calculation View when imported into SAP HANA, when its basic structure matches an Analytic View (one central table with dimensions on each side)?
