Datamarts are powerful tools for corporate departments, but you need a strategy to keep them out of each other’s way.
They’re like snap-on plastic building toys: Solid. Unbreakable. You can snap a datamart onto the main latticework, and you’ve created a new and profitable corner of your world. Remove one and the rest of the structure is hardly affected. It can be whatever you want it to be. And it’s so easy, a CFO can put one together.
“The overall structure is somewhat complicated, but the end user gets a Fisher-Price toy,” says Jim Curran, senior VP of MIS at State Street Boston Corp., which performs custodial services for the world’s biggest pension funds and institutional investment portfolios.
“The users are generally not technical people, so we had to build something powerful and still avoid the usual pitfalls of datamarts,” Curran says.
The usual pitfalls of datamarts can cost a fortune. Departments pull data down from a warehouse, or in many companies from an operational database system, and slice and dice the numbers to suit their own needs. In today’s decentralized environment, such data quickly becomes entangled in a confusing web of meaning and non-meaning. Is Smith on Main Street the Smith Company or Dr. Laura Smith? Did we sell them a single shipment of white paper for $1,000 last year, or 200 separate reams for a total of $1,000? Or was it a “white paper,” as in “report?”
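The fix most shops converge on is a shared reference table that every department keys against. A minimal sketch of the idea, with invented customer IDs and figures:

```python
# Hypothetical illustration: two departments record sales to "Smith"
# differently; a shared enterprise reference table resolves who is who
# and lets the figures be reconciled instead of double-counted.

# Enterprise reference data: one ID per customer, agreed corporatewide.
customers = {
    "C-1001": "Smith Company (Main Street)",
    "C-1002": "Dr. Laura Smith",
}

# Department A logged one $1,000 shipment; Department B logged the same
# business as 200 separate $5 reams. Both key on the customer ID.
dept_a = [{"customer_id": "C-1001", "item": "white paper", "price": 1000.00, "qty": 1}]
dept_b = [{"customer_id": "C-1001", "item": "white paper", "price": 5.00, "qty": 200}]

# Because the departments share the reference key, a rollup is unambiguous.
total = sum(r["price"] * r["qty"] for r in dept_a + dept_b)
print(customers["C-1001"], total)  # 1000 + 200*5 = 2000.0
```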
It gets even more complicated when a business’ assets are as high as State Street Boston’s: $2.7 trillion in custody and $245 billion in investment portfolios. With these numbers, the metaphor of a toy may be scary at first, but datamarts that can be set up easily are just what the banker ordered.
To get all these seemingly contradictory qualities in one system, State Street in 1994 purchased the Essbase OLAP tool from Arbor Software of Sunnyvale, Calif. At the time, State Street was running off multiple mainframe databases and data was extracted and manipulated with Eicon Access and other tools. Analysts parsed the data into spreadsheet formats and then into reports. Budgeting was time-consuming and pricing of services lagged behind market information. Financial officers couldn’t go directly to their data–they had to go through a brigade of programmers, analysts, and administrators whose task was to keep the manipulated data of each department from poisoning the raw data of the corporate decision-support systems. Consequently, managers had problems forecasting profitability from month to month. They needed to get closer to their data.
Despite his title, Curran isn’t a technical person by training. He became VP of information systems when he introduced OLAP computing to State Street just two years ago. At the time he was a financial analyst for the company. But he is now keeper of the corporate datamarts for some of the most powerful users in the world.
One of them is John Spencer, controller for the Global Investment Services Group within State Street. “Clients purchase a fairly sophisticated package of services and we have to keep a close watch on the relationship between services offered and profits,” he says.
Spencer’s department uses OLAP templates to drill all the way down to client-level records. Every 30 days, he and his financial staff can see which clients are using what percentage of which services.
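The drill-down Spencer describes amounts to breaking service-level totals out into per-client shares. A sketch of the arithmetic, with client and service names invented for illustration:

```python
# Hypothetical usage records: (client, service, units of activity).
usage = [
    ("Pension Fund A", "custody",    600),
    ("Pension Fund A", "accounting", 200),
    ("Pension Fund B", "custody",    200),
]

# Summary level: total activity per service.
by_service = {}
for client, service, units in usage:
    by_service[service] = by_service.get(service, 0) + units

# Drill-down level: what percentage of each service each client uses.
for client, service, units in usage:
    pct = 100.0 * units / by_service[service]
    print(f"{client}: {pct:.0f}% of {service}")
```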
“It’s a fast car,” Spencer says. While “a few people can figure out how to use it by doing it,” Spencer says, “we still need more training and we’re in the process of taking the training.” Meanwhile, “we also can hit the database with standard Excel spreadsheets.” Other users tap into the data through Microsoft or Lotus spreadsheets or through Pilot Designer EIS, the front-end product State Street used before it began using the Essbase client.
State Street now has more than 20 Essbase databases totaling about 7GB on a twin-processor Compaq ProLiant server running Windows NT with 20GB of storage capacity.
State Street keeps the data clean by using only Essbase databases. They are fed operational data from SQL databases and from a general-ledger database built on Computer Associates’ Datacom software sitting on an IBM mainframe. Only a few connections require a human expert to scrub and prepare data. “The back end of this operation is extremely clean,” says Curran. By the end of 1997, Essbase will let users store data either in a relational datastore or in Essbase’s native OLAP store.
One of the early popular approaches to multiple datamart building was to build a huge central data warehouse–as opposed to a distributed data warehouse–and allow departments to create datamarts from that. “But then horror stories started to come out about bad data,” says Kirk Cruikshank, VP of marketing at Arbor Software. “That’s the kind of thing that made Jim Curran go for the distributed datamart model.”
What’s a datamart?
Experts would call State Street’s datamart architecture the perfect setup for keeping departments from bumping into one another as they create their own versions. In a watershed article in Datamation (“Are multiple data warehouses too much of a good thing?” April, p. 94), datamart pioneer W.H. Inmon notes that typical datamarts contain “both summarized and customized data that reflect the individual tastes and needs of the sponsoring departments.” Inmon contrasts that with the “current-level detailed data warehouse,” which is a central storage facility.
“Many sites call a small data warehouse a datamart and this is the beginning of their trouble,” says Dave Gleason, director of authorings and methods at the Information Consulting Group of Oakbrook Terrace, Ill.-based Platinum Technology. In this scenario, the user organizations often build their datamarts before the corporation has built its warehouse, a guarantee of future data clashes and errors. “The idea of building a datamart before you start a warehouse can be good only if the datamart builders commit to some baseline level of data architecture,” Gleason says.
At LTV Steel Corp. in Cleveland, multiple datamart building is architected tightly by using a single product–SAS Institute’s SAS/Warehouse Administrator–end to end. Over the past five years, data warehouse chief Robert Scharl has built 18 datamarts on a 40GB warehouse. LTV says it has realized much more than a 100% return on investment with the SAS warehousing and datamining tools. “We saved more than $16 million in just a few projects,” Scharl says. The one-product architecture has contributed to the bottom line more than anything else.
“Initially I looked at what Oracle and Sybase had to offer and I realized it would cost an extra half million dollars right from the start,” says Scharl. Much of the extra expense would have been the cost of administering an environment that, while more open than the single-product architecture, would have been more complex to manage.
Use any tool
Platinum’s Gleason says that in order to keep data clean across multiple datamarts, “what is critical is that you have a set of enterprise reference data. It follows the dimensional modeling approach: the set of codes and decodes I am going to use and the dimensional decompositions–this is called dimensional hierarchy.”
In Gleason’s theoretical environment, if you build a West Coast sales database and a Northeast sales database you can use results from both to tally sales for the whole group by applying the same definitions to terms for products and services, even if the datamarts are built on different tools and databases and have different levels of granularity.
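Gleason’s point can be made concrete. Two datamarts can differ in tools and in granularity, yet still roll up correctly, as long as both map their local records onto the same enterprise reference codes. A hedged sketch, with product codes and figures invented:

```python
# Enterprise reference data: every local product name decodes to one
# corporate code. This is the "codes and decodes" layer Gleason describes.
PRODUCT_CODES = {"widget": "P-100", "WIDGET-STD": "P-100"}

# The West Coast mart stores daily detail; the Northeast mart stores
# monthly summaries -- different granularity, same dimensional hierarchy.
west_daily = [("widget", "1997-04-01", 10), ("widget", "1997-04-02", 15)]
northeast_monthly = [("WIDGET-STD", "1997-04", 120)]

# Groupwide tally: conform each record to the enterprise code, then sum.
sales = {}
for name, _, qty in west_daily + northeast_monthly:
    code = PRODUCT_CODES[name]
    sales[code] = sales.get(code, 0) + qty

print(sales)  # {'P-100': 145}
```

Without the shared decode table, “widget” and “WIDGET-STD” would tally as two different products and the groupwide number would be wrong.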
At Ryder, the Jacksonville, Fla.-based truck rental company, data confusion has never been a problem because the underlying data warehouse is distributed. “The datamart strategy came out of our group,” says Jim Sutter, data warehouse administrator for Ryder. “We insisted on uniform naming conventions and data typing.”
Ryder also chose mission-critical business projects to lay the groundwork for the datamart strategy. “Initially we built datamarts to help follow the repair records on our trucks,” says Sutter. “It’s a lot cheaper to pull in a vehicle for routine maintenance than it would be to provide emergency roadside assistance.” Later, datamarts told Ryder where trucks were located so a dealer who was running low could quickly find the nearest rental site where trucks were sitting idle.
Ryder rents data
The heart of the Ryder datamart system is a decision-support system distributed across more than 80 AS/400s throughout North America. The initial data warehouse on which this was built was subject-oriented, fully integrated, and read-only to the end users. That was three years ago.
“We learned over time to give the end users more freedom to use the data,” Sutter says. But having an enterprise data model gives Ryder’s central IT managers control over corporatewide data.
Ryder’s main database is a DB2 common server running on UNIX. The individual datamarts are built with Visual Warehouse, IBM’s datamart software, and run with the VW Administration set.
Since Visual Warehouse has FTP capabilities, Ryder is now building metadata for the Web so that departments can create instantaneous datamarts on the corporate intranet. One result of this, Sutter hopes, is that departments will be less likely to drop below his radar and set up their own renegade architectures.
“Every once in a while someone builds something to answer a question practically on the fly,” Sutter says. “Typically, the ones built outside the rules die on their own real quickly. But we don’t worry about that.” The few that survive will show up sooner or later, but Ryder can make a copy of a renegade datamart, clean up the copy to conform to the enterprise model, and run the copy parallel to the renegade datamart.
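The copy-and-conform tactic is essentially a rename-and-reload pass against the enterprise data model. A minimal sketch of the idea, with field names and records invented:

```python
# Hypothetical enterprise naming standard: renegade field names are
# mapped onto the corporate data model before the copy runs in parallel.
ENTERPRISE_NAMES = {"trk_no": "truck_id", "loc": "rental_site", "stat": "status"}

# Records as a renegade departmental datamart might store them.
renegade_rows = [{"trk_no": 4412, "loc": "Jacksonville", "stat": "idle"}]

# Make a conformed copy: same data, enterprise-standard field names.
# The renegade mart is left untouched; the clean copy runs alongside it.
conformed = [
    {ENTERPRISE_NAMES.get(field, field): value for field, value in row.items()}
    for row in renegade_rows
]
print(conformed)
```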
Ryder’s ability to build parallel datamarts to outmaneuver any renegades is largely due to the rapid and relatively inexpensive development permitted by the IBM tool. “For $50,000 you can have a major-league datamart up and running in a month,” says George Zagelow, IBM’s business intelligence manager.
Ryder uses IBM’s DB2. Other Visual Warehouse customers deploy Oracle7, CA-Ingres, and Sybase plus IMS and assorted other flat files.
At State Street Boston, Curran has built 20 datamarts in two years. “I started with Essbase as sort of a project,” he says. “I wrote a lot of the earliest databases with no outside help. If the tools are easy to use, you can build lots of datamarts and manage them.”
And Arbor says it is going to get better soon. It refers to relational/OLAP datamart building as the third wave of data management. “It will allow those who already have made a big investment in relational data to have a foot in each camp–OLAP and relational,” says Arbor’s Cruikshank.
To the end user, this will once again appear to be as simple as playing with snap-on building blocks.
Innovative datamart strategies
Vendors and corporate customers alike have produced a range of solutions for multiple datamarts in recent months.
IBM and Sybase have teamed up to offer a Data Mall that ships on an RS/6000 system. A Data Mall is a data warehouse and datamarts that come ready, out of the box, for your data. The package provides network analysis tools in an effort to overcome performance problems inherent in multiple datamarts. Sybase Enterprise Connect middleware comes with rules for building a datamart out of the central data warehouse, thus automating the techniques perfected by organizations like Ryder.
Oracle’s Data Mart Suites, which just started to ship at press time, include a variety of tools from partner organizations, all aimed at organizing a multiple data-warehouse strategy. For instance, the Oracle package will ship with tools from Menlo Park, Calif.-based Sagent Technology that allow administrators to move data by linking a series of icons. The linked icons create data-flow plans that show which data is flowing from which source. Upon approval of the data-flow plan, the administrator can point and click at a starting icon and begin the data-moving process. This is designed to allow for rapid prototyping and deploying of databases.
Sagent’s data-flow facility is also shipping with Prism Solutions, long a presence in the high-end datamart business. Officials at Sunnyvale, Calif.-based Prism say they will use the Sagent software to provide workgroup datamarts that can be built in hours. As with SAS, Prism takes the single-product road toward datamart integration. Prism’s Scalable Data Warehousing Solutions package the software in five different ensembles, ranging from terabyte data warehousing to multimegabyte datamarts.
Informix, meanwhile, is shipping its NewEra database environment with a variety of tools for multiple datamarts, including Woodcliff, N.J.-based Computer Systems Advisors’ new Silverrun software for forward and reverse engineering of departmental warehouses and large-scale datamarts from Informix data. Users of the Silverrun workbench can model a workgroup repository from Informix data, allowing the data warehousing group to rapidly build departmental datamarts from the same central database, thus controlling business rules and data definitions.
Innovating in a manner similar to that of the IBM Visual Warehouse group, Computer Associates in April announced Workgroup OpenIngres, a DBMS that ships on NT on Intel platforms, NT on Alpha, SCO OpenServer UnixWare, and Sun Solaris for Intel. The idea is to provide a scalable database upon which a warehouse could be built at the enterprise level or a datamart at the workgroup level. Data from that workgroup datamart eventually could be blended with the enterprise system to maintain data definitions and rules. Workgroup OpenIngres ships with OpenIngres Internet Commerce Enabled (ICE) to permit distribution of datamarts across intranets and extranets.
Informatica of Menlo Park, Calif., which began shipping its PowerMart distributed datamart environment in 1996, builds the data warehouse from the variety of datamarts. Using the Informatica Metadata Exchange (MX) architecture, the IT department can integrate data from a variety of OLAP tools, including Andyne, Brio, Business Objects, Cognos, Information Advantage, Infospace, IQ Software, and Microstrategy. The Informatica approach assumes that departments will build their data warehouses independently, and thus bases the future distributed warehouse on the company’s various datamarts.
VMARK Software’s DataStage also builds its environment from disparate datamarts.
NCR, a leading outsourcer of data warehousing projects, comes into the picture mostly after the warehouse is built, insisting that IT rationalize the entire enterprise-database environment. The data warehousing model of NCR layers data types: operational source data on the upper layer, followed by data transformation, followed by a data warehouse built from the “corporate memory” (operational databases), followed by a single RDBMS that contains a single version of the truth. Against this, all disparate data warehouses are examined and catalogued so that the technology, no matter how innovative, never gets in the way of business.
According to NCR senior consultant Rob Armstrong, “An important aspect to understand about the architecture is that the information access and datamining tools need to access both datamarts and the enterprisewide data. If the tools do not have uninhibited access to the correct levels of data, then the warehouse itself is severely limited in its capacity to provide value to the end user.”