Technology News

latest updates from easySERVICE™

How to Address Data Warehouse Optimization Challenges

Introduction:

As data volumes grow, companies spend more and more on the data warehouse environment. The problem arises when capacity in the environment is consumed too quickly, which ultimately forces organizations into costly upgrades in storage and processing power.

Users now demand not only more complex and flexible analysis but also more timely information. Data must be available 24×7, and many business users demand that data supporting decision making be accessible within hours—in some cases, minutes or even seconds—of when an event occurs. Organizations also realize that the same data needs to be utilized by many different processes and thus many different workload profiles.

Ideally, organizations should keep detail data at the lowest form possible in a functionally neutral data model. This enables the business community to ask any number of questions from a wide range of perspectives and processes. The basic premise is that you can always aggregate detail data, but you can never decompose summary data. This does not imply that you should never use summary tables; rather, it means that you should not replace detail data with only summary data.
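
To make that premise concrete, here is a minimal sketch (the sales_detail table and its columns are hypothetical): detail rows can always be rolled up, but no query against the resulting summary can recover the individual transactions.

    -- Detail data can always be aggregated upward.
    SELECT store_id,
           CAST(sale_ts AS DATE) AS sale_date,
           SUM(sale_amount) AS daily_sales
    FROM sales_detail
    GROUP BY store_id, CAST(sale_ts AS DATE);

    -- The reverse is impossible: the daily summary alone cannot be
    -- decomposed back into the individual transactions it came from.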

Importance of Data Warehouse Optimization

If the desired analysis requires detail data, using only summary tables will fail no matter the technical justification. Implementers and users must work together to understand the business requirements and what drives them; then they need to use the least intrusive process possible to meet those requirements.

The primary reason for building summary tables, adding indexes, enforcing priority and denormalizing data is to increase performance. If you could use any amount of data to ask any question and get an instant response, you would never need to build summaries or indexes. These structures function as a workaround to other limitations.

Their very existence requires more space, more data management, and more time between the occurrence of an event and the ability to take action. The question is how to balance providing higher performance with minimizing data replication and management.

Query frequency and performance consistency are also important considerations. For example, 70% of one enterprise’s 1,000 weekly queries asked for summary-level questions against detail data. With a summary table, queries executed in approximately six seconds instead of four minutes, a difference of 2,730 minutes of processing time. Even factoring in the 105 minutes required each week to maintain the summary table, the resulting savings of 2,625 minutes per week clearly justified making a summary table.
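
For reference, the arithmetic behind those figures:

    700 summary-level queries/week × (240 s − 6 s) = 163,800 s = 2,730 minutes saved
    2,730 minutes saved − 105 minutes of weekly summary maintenance = 2,625 minutes net savings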

Over time, this company found that usage shifted from resolving many queries with summary data to matching a majority of queries against detail data. With fewer queries benefiting from a summary table, it was simply dropped without affecting other processes.

How to Optimize Performance

The process of optimizing the data management environment should not be undertaken without understanding the consequences. It is important to remember that if the technology currently in place cannot support data capture, storage and usage at the lowest level, it would be wise to seriously examine that technology’s ability to support ongoing business needs.

When optimizing performance, the starting point is having a clear grasp of two aspects of the system. First, what is the unaided performance? The database may be powerful enough that executing the query against the tables without additional indexes or denormalization gives the required response time. This raises the second aspect.

If you are going to escalate up the ladder of performance optimization, then you must understand the value of the performance being requested. What business outcome is changed because the response time goes from one minute to 10 seconds? If that question cannot be answered, then stop all your efforts until you can justify the expense of added optimizations. If it can be answered, the IT group should follow a step-by-step approach to balance cost of performance with benefit of analytic output.
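
A practical first check of that unaided performance is to look at the optimizer's plan before adding any tuning structures. A minimal sketch using the EXPLAIN facility most databases offer (exact syntax varies by platform, and the table and column names here are hypothetical):

    -- Inspect the untuned plan before building indexes or summaries.
    EXPLAIN
    SELECT customer_id, SUM(sale_amount)
    FROM sales_detail
    WHERE sale_ts >= DATE '2015-01-01'
    GROUP BY customer_id;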

The seven steps of optimization

  1. Start with a neutral data model that is normalized and not specific to any function or group
  • Provides the lowest level of detail needed to support business objectives
  • Addresses future needs via the level of detail defined/established
  • Uses views that define detail tables for security and concurrency
  • Enables access views that allow selects (read without intent or dirty reads) against tables that are being updated to avoid locking issues.
  2. Implement views to ease navigation (see the sketch following this list)
  • Helps third-party tools create better structured query language (SQL) by manipulating views rather than base tables
  • Provides the structure front-end tools need to create optimized SQL, eliminating their dependence on understanding each and every database on the market
  • Incorporates optimized join techniques in the views
  • Integrates a layer of security over the tables
  • Offers each business unit its own logical functionalization, presenting data in the way users want to see it
  • Allows data to be stored once and then converted through views, reducing data redundancy, ensuring consistency and simplifying data management
  3. Add indexes, which can range from simple techniques such as secondary indexes to complex structures such as covered, join or aggregate indexes
  • Provides a major advantage in that the system maintains them concurrently with the base data tables
  • Attaches some maintenance overhead, so understand the impact and trade-offs and quantify them in a cost versus benefit analysis
  4. Enforce prioritization using the database prioritization framework to ensure critical workloads receive resources when necessary
  • Optimizes index and view structures once the SQL has been optimized
  • Provides better use of available capacity but not more capacity
  • Offers better understanding of the workload
  • Becomes most noticeable when system is congested; it is for these congested times that you are prioritizing work
  • Allows business community to provide priority direction
  5. Proceed to rational summaries and denormalization only when the need for higher performance outweighs analysis flexibility
  • Follows a strategy of coexistence rather than replacement, retaining detail tables for extended analysis while adding denormalized structures
  • Encompasses multiple needs and can serve a variety of users and departments, justifying cost of management
  • Satisfies the greater good of the user community rather than a sole functional purpose
  • Results in data volume and volatility playing an important part in cost-benefit analysis
  6. Consider “irrational” summaries and denormalization to customize data models for specific business processes and/or reports
  • Applies when desired performance outweighs the demand for flexible analysis and a structure is needed to address a single function
  • Includes the higher cost of data management and resources when considering the cost-benefit analysis
  • Minimizes impact by maintaining the underlying detail data; when conditions change in the future, this irrational structure can be eliminated with little impact
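
As a rough illustration of steps 2, 3 and 5, the sketch below layers a navigation view, a secondary index and a coexisting summary table over the neutral detail model (all object names are hypothetical, and index and table-creation syntax varies by platform):

    -- Step 2: a navigation view that embeds the join logic and
    -- exposes only the columns users need.
    CREATE VIEW v_sales AS
    SELECT s.store_id, st.region, s.sale_ts, s.sale_amount
    FROM sales_detail s
    JOIN store_dim st ON st.store_id = s.store_id;

    -- Step 3: a secondary index on a frequent predicate column;
    -- the system maintains it concurrently with the base table.
    CREATE INDEX ix_sales_ts ON sales_detail (sale_ts);

    -- Step 5: a rational summary that coexists with the detail
    -- table rather than replacing it; because the detail rows
    -- remain, this table can be dropped later with little impact.
    CREATE TABLE sales_daily_summary AS
    SELECT store_id,
           CAST(sale_ts AS DATE) AS sale_date,
           SUM(sale_amount) AS daily_sales
    FROM sales_detail
    GROUP BY store_id, CAST(sale_ts AS DATE);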

If you still have not achieved the desired performance, there is one more step. However, it carries a relatively higher cost, so before proceeding, make sure the anticipated business value will outweigh the cost of the additional data movement and management required to keep the extracted data in agreement with the detail data.

  7. Explore, expand and export
  • Explore the business usage of the answer sets and validate that the change in business outcome will drive the expected revenue increase or cost decrease, depending on the application. Are the business users willing to stand behind the expectations, and does the enhanced performance justify the cost?
  • Expand the current platform. Is the performance requirement so critical that new capacity is warranted?
  • Export the data from the main data warehouse to an application-specific platform. In this situation, a dedicated environment tuned specifically for its application will provide much more control over the individual application. Keep in mind all of the cost of the duplicated data, added time lag to action, and the cost of a new platform and software environment that will need to be managed and supported.
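
A hedged sketch of that export option: materialize the application-specific answer set, then unload it to the dedicated platform with whatever bulk-export utility the database provides (the names below are hypothetical).

    -- Materialize the answer set destined for the dedicated platform.
    -- Remember: this copy must be refreshed to stay in agreement
    -- with the detail data it was extracted from.
    CREATE TABLE export_customer_spend AS
    SELECT customer_id, SUM(sale_amount) AS total_spend
    FROM sales_detail
    GROUP BY customer_id;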

Conclusion:

Taking these seven steps requires understanding the cost involved with each step and the benefits derived from it. It also requires making decisions that support both long- and short-term needs. In some cases you may create summary tables or denormalized data models that you will eventually drop as functions evolve over time. This is acceptable as long as eliminating the tables does not cause interruptions or massive application changes. One way to ensure this is to refrain, whenever possible, from using the summary or denormalized tables as input to downstream applications.

When applying the seven steps, perform cost-benefit analysis for each proposed step, and include physical aspects such as disk space, resources to manage the structure, and lost opportunities due to time delays to maintain the process. Improvements may be seen in:

  • Query performance and opportunity gained from faster response
  • User concurrency rates
  • Throughput
  • User satisfaction and productivity
  • Utilization of third-party tools
  • Resource consumption used to execute a query

Most data warehouse implementations focus on getting something up quickly. The longer-term goals of analysis and flexibility are overlooked in the rush to provide initial value. To reach the goal of having phase one meet immediate needs, many quickly jump to summary tables and functional models. But by taking a relatively short amount of time to consider the data warehouse’s real purpose, and building a correct foundation for it, the IT group can assure a better future for the data warehouse, and thus the entire user community.

The bottom line is that the implementers, not the user community, should decide whether to use a summary table or functional model. Conversely, the users, not the IT group, must determine which business capabilities must be delivered to drive business opportunity.

The more you move or manipulate the data into answer sets and push data to process, the more inflexible the environment becomes to meeting future needs. The better approach is to push process to data and work on simplifying the overall architecture by minimizing data movement and maximizing data reusability.

If you’d like to discuss any of the above best practices or lessons learned with us or to learn more about how we are partnering with companies just like yours to ensure the availability of mission-critical data, please contact us at (855) US STELLAR.
