As data volumes grow, companies spend more and more on the data warehouse environment. The problem arises when capacity in the environment is consumed too quickly, which ultimately forces organizations into costly upgrades in storage and processing power.
Users now demand not only more complex and flexible analysis but also more timely information. Data must be available 24×7, and many business users demand that data supporting decision making be accessible within hours—in some cases, minutes or even seconds—of when an event occurs. Organizations also realize that the same data needs to be utilized by many different processes and thus many different workload profiles.
Ideally, organizations should keep detail data at the lowest form possible in a functionally neutral data model. This enables the business community to ask any number of questions from a wide range of perspectives and processes. The basic premise is that you can always aggregate detail data, but you can never decompose summary data. This does not imply that you should never use summary tables; rather, it means that you should not replace detail data with only summary data.
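To make that premise concrete, the sketch below (using pandas and a small set of hypothetical order-line rows) rolls detail data up into a regional summary; the reverse operation is impossible, because the summary no longer contains the individual rows.

```python
import pandas as pd

# Hypothetical detail data: one row per order line (illustrative only).
detail = pd.DataFrame({
    "order_id": [1001, 1001, 1002, 1003, 1003, 1003],
    "region":   ["East", "East", "West", "East", "West", "West"],
    "quantity": [2, 1, 5, 3, 2, 4],
    "amount":   [20.0, 15.5, 42.0, 31.0, 18.0, 36.0],
})

# Detail data can always be aggregated into a summary...
summary = (
    detail.groupby("region", as_index=False)
          .agg(total_quantity=("quantity", "sum"),
               total_amount=("amount", "sum"),
               order_lines=("order_id", "count"))
)
print(summary)

# ...but the reverse is impossible: given only `summary`, the original
# six order lines (and any question that needs them) cannot be recovered.
```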
Importance of Data Warehouse Optimization
If the desired analysis requires detail data, using only summary tables will fail no matter the technical justification. Implementers and users must work together to understand the business requirements and what drives them; then they need to use the least intrusive process possible to meet those requirements.
The primary reason for building summary tables, adding indexes, enforcing priority and denormalizing data is to increase performance. If you could use any amount of data to ask any question and get an instant response, you would never need to build summaries or indexes. These structures function as a workaround to other limitations.
Their very existence requires more space, more data management, and more time between the occurrence of an event and the ability to act on it. The question is how to balance providing higher performance with minimizing data replication and management.
Query frequency and performance consistency are also important considerations. For example, 70% of one enterprise’s 1,000 weekly queries asked summary-level questions against detail data. With a summary table, those queries executed in approximately six seconds instead of four minutes, a difference of 2,730 minutes of processing time. Even factoring in the 105 minutes required each week to maintain the summary table, the resulting savings of 2,625 minutes per week clearly justified building it.
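As a quick check of those figures, the snippet below reproduces the arithmetic from the numbers quoted above: 700 summary-level queries per week, roughly four minutes against detail data versus six seconds against the summary table, and 105 minutes of weekly maintenance.

```python
# Figures quoted in the example above.
weekly_queries      = 1000
summary_share       = 0.70         # 70% of queries are summary-level
detail_minutes      = 4.0          # response time against detail data
summary_minutes     = 6 / 60       # six seconds against the summary table
maintenance_minutes = 105          # weekly cost of maintaining the summary

summary_queries = weekly_queries * summary_share                        # 700
gross_savings   = summary_queries * (detail_minutes - summary_minutes)  # 2,730
net_savings     = gross_savings - maintenance_minutes                   # 2,625

print(f"Gross weekly savings: {gross_savings:,.0f} minutes")
print(f"Net weekly savings:   {net_savings:,.0f} minutes")
```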
Over time, this company found that usage shifted from resolving many queries with summary data to matching a majority of queries against detail data. With fewer queries benefiting from a summary table, it was simply dropped without affecting other processes.
How to Optimize Performance
The process of optimizing the data management environment should not be undertaken without understanding the consequences. It is important to remember that if the technology currently in place cannot support data capture, storage and usage at the lowest level, it would be wise to seriously examine that technology’s ability to support ongoing business needs.
When optimizing performance, the starting point is having a clear grasp of two aspects of the system. First, what is the unaided performance? The database may be powerful enough that executing the query against the tables without additional indexes or denormalization gives the required response time. This raises the second aspect.
If you are going to escalate up the ladder of performance optimization, then you must understand the value of the performance being requested. What business outcome is changed because the response time goes from one minute to 10 seconds? If that question cannot be answered, then stop all your efforts until you can justify the expense of added optimizations. If it can be answered, the IT group should follow a step-by-step approach to balance cost of performance with benefit of analytic output.
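One simple way to establish that unaided baseline is to time the query directly against the base tables before adding any indexes or summary structures. The sketch below illustrates the idea against an in-memory SQLite database with a hypothetical sales table; in practice you would run the measurement against the warehouse itself under a representative workload.

```python
import sqlite3
import time

# Hypothetical setup: an in-memory database standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 20.0), ("West", 42.0), ("East", 31.0)] * 10_000,
)

# Measure the unaided response time: no extra indexes, no summary tables.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
start = time.perf_counter()
rows = conn.execute(query).fetchall()
elapsed = time.perf_counter() - start

print(f"Unaided response time: {elapsed:.3f} s for {len(rows)} summary rows")
```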
The seven steps of optimization
If you still have not achieved the desired performance, there is one more step. However, it comes at a relatively higher cost. So before proceeding, make sure the anticipated business value will outweigh the cost of the additional data movement and management required to keep the extracted data in agreement with the detail data.
Taking these seven steps requires understanding the cost involved with each step and the benefits derived from it. It also requires making decisions that support both long- and short-term needs. In some cases you may create summary tables or denormalized data models that you will eventually drop as functions evolve over time. This is acceptable as long as eliminating the tables does not cause interruptions or massive application changes. One way to ensure this is to refrain, whenever possible, from using the summary or denormalized tables as input to downstream applications.
When applying the seven steps, perform a cost-benefit analysis for each proposed step, including physical aspects such as disk space, the resources needed to manage the structure, and opportunities lost to the delays of maintaining the process (a simple sketch of such a comparison appears below). Improvements may be seen in:
Query performance and opportunity gained from faster response
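A minimal sketch of such a comparison is shown below. The figures are purely illustrative placeholders, except for the 2,625-minute savings carried over from the earlier summary-table example; a real analysis would substitute measured costs for storage, maintenance effort, and refresh latency.

```python
# Hypothetical cost-benefit comparison for one proposed optimization step.
# All figures are illustrative placeholders, not measurements.
weekly_query_minutes_saved = 2625     # from the summary-table example above
analyst_cost_per_minute    = 1.50     # assumed cost of analysts waiting on queries
weekly_benefit = weekly_query_minutes_saved * analyst_cost_per_minute

extra_storage_cost  = 200             # weekly cost of added disk space
maintenance_cost    = 400             # weekly cost of managing the structure
latency_opportunity = 300             # cost of delayed data while the structure refreshes
weekly_cost = extra_storage_cost + maintenance_cost + latency_opportunity

if weekly_benefit > weekly_cost:
    print(f"Proceed: net benefit of {weekly_benefit - weekly_cost:,.2f} per week")
else:
    print(f"Hold off: net cost of {weekly_cost - weekly_benefit:,.2f} per week")
```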
Most data warehouse implementations focus on getting something up quickly. The longer-term goals of analysis and flexibility are overlooked in the rush to provide initial value. To make phase one meet immediate needs, many quickly jump to summary tables and functional models. But by taking a relatively short amount of time to consider the data warehouse’s real purpose, and by building a correct foundation for it, the IT group can assure a better future for the data warehouse and, thus, for the entire user community.
The bottom line is that the implementers, not the user community, should decide whether to use a summary table or functional model. On the other hand, the users, not the IT group, must determine which business capabilities must be delivered to drive business opportunity.
The more you move or manipulate data into answer sets and push data to process, the less flexible the environment becomes in meeting future needs. The better approach is to push process to data and to simplify the overall architecture by minimizing data movement and maximizing data reusability.
If you’d like to discuss any of the above best practices or lessons learned with us or to learn more about how we are partnering with companies just like yours to ensure the availability of mission-critical data, please contact us at (855) US STELLAR.