by Craig LaGrow

Java fires up its database engines

May 1, 1996

"Binding" database functionality to your Web site, with a little help from Java

While young Bob Epstein was taking notes in his computer science class at UC-Berkeley, little did he know that one day he would compete head-on for leadership in the database-software market with his own professor, Michael Stonebraker.

Epstein graduated and went on to found Sybase, a company whose database sales are exceeded only by those of Larry Ellison's Oracle.

Professor Stonebraker, after creating Ingres, one of the industry's first relational databases, today challenges the dominance of both Sybase and Oracle with his feisty new startup, Illustra.

Stonebraker envisioned an entirely new type of database architecture, one that makes absolutely no assumptions about the type of data it will contain, and Bob Epstein isn't the only one studying his former professor's latest invention. The database industry is scrambling to mimic the extensibility that Illustra (now a subsidiary of Informix) has pioneered in a revolutionary new database architecture.

For those serious developers who want to "bind" database functionality within and underneath their Web site's content, there are some powerful new tools from these three companies and others. And the good news for JavaSoft and its partners is that Java promises to play an integral role in all of them.

Relational databases are good at handling the stuff that fits neatly into spreadsheets, like short strings of text and numbers. But Web sites will soon have more complex data types like VRML, video, animation, and sound.

Professor Stonebraker's vision was to build an architecture that was infinitely extensible: one that could empower developers to create types, functions, and rules for new data types that would be too difficult to manage with conventional database tools.

This month in The Cyberstruction Zone, let's take a look under the hoods of Illustra, Sybase, and Versant to see where the most progressive of the database engine guys are headed.

A two-minute history of RDBMS on the Web

Ninety percent of all the content on the World Wide Web resides within simple, hierarchical filesystems.

Web sites today are emerging out of the "design & build" phase and into the "manage & plan" era. Integral to this migration will be the successful integration of Web-database engine technologies. (See Figure 1.)

If Web sites are going to be anything more than online marketing brochures presented in HTML format, developers need to learn about "binding," the marriage of Web content to underlying relational and object-oriented database engines.

Parallel to the evolution of Web sites from the design phase to the planning phase is the migration from a free to a commerce-driven paradigm (see Figure 2), and each step involves a further degree of sophistication in the underlying database architecture for a site.

As time unfolds, we're going to see this maturation expressed first in online catalogs, then in online publishing, and eventually in fully functional, commerce-driven site environments. (See Figure 3.)

Most of the popular relational database architectures have a 25-year-old structure that was a dramatic improvement upon the flat-file storage of simple filesystems. In its simplest form, a relational database has an architecture that looks like this:

PARSER

----------

OPTIMIZER

----------

FUNCTION MANAGER

----------

STORAGE MANAGER

----------

DISK STORAGE

What makes one RDBMS better than another is how tightly bolted together these layers are. The more efficient the design, the less "latency" there is to slow down the system.

Once you start introducing new data types into an RDBMS engine (like sound and animation), the storage manager starts working overtime to pull out the files in response to SQL queries. This increases latency dramatically, and many RDB systems break down.

Sybase's and Oracle's engines (Figures 4 and 5, respectively) have not been fundamentally rebuilt for a long time (particularly in the case of Oracle); the Informix code, on the other hand, was completely rewritten and redesigned in 1991, a significant advantage to the team of Informix programmers who are now working to integrate Illustra's Web-oriented engine.

Multi-blobs of complex objects within Web content, which relational databases don't understand, are easier to handle with a tool like Illustra that was designed from the ground up for data type extensibility. (See Figure 6.)

Only a few Web database architectures can boast a tight coupling of the relational engine to Web content: Navisoft (a subsidiary of America Online that uses an architecture based on Illustra) and Netscape's LiveWire Pro (based on Informix's architecture) are among the few examples today.

"There are no turn-key solutions out there, and I wouldn't consider LiveWire Pro to be turn-key at this juncture," said Bill Ray, director of digital media distribution at Illustra. LiveWire Pro is a development environment that is early in its evolution. Two things to keep in mind: It is bundled with Informix's Online Workgroup Server, and Netscape could've chosen Oracle or Sybase instead.

Finally, there is legitimate concern on the part of those companies who want to connect their large, back-office production databases to the World Wide Web. Security is only the first of many issues that face companies with decades' worth of proprietary data behind their corporate firewalls.

Jumpstarting Java for Web databases

Someday, a SQL server will be able to externally call a Java stored procedure, and Java-enabled browsers will have functionality that takes dynamically stored data into a new realm. Today, however, server-side Java applets (Sybase engineers call them "datalets") can call SQL server databases using Sybase's OpenClient and ObjectConnect frameworks. Take a look at Illustra's demo called "UMV" on www.illustra.com.

All the companies interviewed for this column agreed that Java will soon become the tool of choice for creating "hooks" that will allow HTML-formatted user interfaces to passively or actively capture clickstream data from online Web travelers.

While both Microsoft and Oracle have licensed Java, it's not clear whether they and others will eventually choose to implement it in their Web-database engine technologies.

The following questions remain: At what point is it safe to build a server-side commercial application with Java? And at what point will there really be portability to platforms other than Sun or Intel/Windows? When will Apple, HP, IBM, and DEC jump on board the Java train? With at least five hardware vendors designing Java chips, how will this affect portability and Java's role in speeding up queries to Web databases?

"Microsoft certainly has the ability to set standards," said Rich Mironov, director of Sybase's Internet products group. "I think they're going to twist OLE very hard, and we'll have [at least] two standards: plug-ins for Netscape (NSAPI) and OLE for Windows (ISAPI). And maybe one more. Microsoft's standards will always be there, it's just a question of whether they will dominate. But the Netscape folks have been reading the Microsoft business plan for years, and who knows where Netscape will go next?"

Most of the database vendors interviewed are concerned that premature implementations of Java could backfire.

"Until there's a stable, efficient Java compiler on more than a few platforms, our customers are not willing to gamble on untested code," said Mironov. Because of the market's pressure and need for a solution like Java, he added, he believes Java will mature in a brief 2-year cycle rather than the 10-year cycle typical of other new technology standards.

"Most of our customers are going to experiment rather than implement, except where they are doing media-oriented or client-oriented stuff where the sizzle is the important thing," said Mironov.

C was created and promoted by AT&T, which had no hardware platform to worry about. Java is being pushed by Sun, which has a lot of mixed incentives. HP and IBM might be forced to do Java, for example, but will they like it?

"Java is still only 50 percent as fast as C at best," said Mironov. "The reality is going to be limited until there are native compilers that optimize and speed things up a lot more."

Microsoft doesn't have to be the company that writes a third-party Java implementation for Windows, but it's instructive to learn from Microsoft's success in bending the SQL Access Group sufficiently so that it helped create a Microsoft-owned ODBC standard. After the standardization committees disbanded, everybody else had to follow this direction because Microsoft had put its official stamp of approval on ODBC.

The Microsoft strategy is very clear and very well established: Every year, the company takes one new product area and makes it cease to exist. The next target may be to take the DBMS market and the Web server market and make them an integral part of Microsoft's operating systems.

"The word on the street is that the next thing Microsoft will do with its OS will be to make its browser more integral," said Mironov. "You'll be able to look at all your files with the browser and it'll be more seamless whether you are actually on the Net or not."

"Compared to Java, Perl and CGI create a great big hairball of code that you have to maintain," said Bill Ray of Illustra. "They're both slow, they're insecure, they're middleware, and they're difficult to manage. You can write two stupid scripts to get data in and out, or you can write one smart one to do both."

Some drawbacks to CGI and Perl: They're less efficient; the scripts can be difficult to manage and maintain; the middleware layers are potential security holes; and there is process overhead associated with CGI programming.

Illustra's Web DataBlade eliminates the need for Perl programming. Web content and code are both stored in the database itself, eliminating the middleware level. Web DataBlade introduces an application page paradigm: a database server function, WebExplode, takes an application page as an argument, then parses and executes it. Application pages can contain HTML, SQL, and Java; and, since they are objects managed by Illustra, they can be protected, reused, and combined with other application pages in building Web-based applications.
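The application-page idea can be illustrated with a minimal Java sketch. This is not Illustra's implementation: the in-memory map standing in for the database table of pages, the `<?include name?>` tag, and the `PageStore` class are all hypothetical, chosen only to show one stored page being reused inside another and expanded into plain HTML by a single server-side function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for application pages stored in a database table.
class PageStore {
    // Hypothetical tag syntax for combining pages: <?include pageName?>
    private static final Pattern INCLUDE =
            Pattern.compile("<\\?include\\s+(\\w+)\\?>");

    private final Map<String, String> pages = new HashMap<>();

    void put(String name, String body) {
        pages.put(name, body);
    }

    // Plays the role the column assigns to WebExplode: take a stored page,
    // expand the tags inside it, and return plain HTML.
    // Note: this sketch does not guard against pages that include each other.
    String explode(String name) {
        String body = pages.get(name);
        if (body == null) {
            throw new IllegalArgumentException("no such page: " + name);
        }
        Matcher m = INCLUDE.matcher(body);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(body, last, m.start());   // text before the tag
            out.append(explode(m.group(1)));     // reuse another stored page
            last = m.end();
        }
        out.append(body.substring(last));        // text after the last tag
        return out.toString();
    }
}
```

Because the pages live in one store and are expanded by one function, a shared footer or header only has to be changed in one place, which is the reuse the column describes.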

In contrast, Sybase thinks more highly of Perl, at least for now. The first version of Sybase's web.sql will include the OpenClient tool, which gives developers secure access to the whole back-office production database. However, it is via a Perl interpreter that all this stuff is made accessible. According to Sybase engineers, the second version of web.sql will become more "Java centric."

"Would you be willing to put your [multimillion-dollar] production database on the hook with somebody's compiler that's really only had 200 person-hours of testing?" asked Mironov. "We're going to do our next version of web.sql in Java probably, and it will access Java applets and have all the appropriate underpinnings so that you can get to Javaish interfaces for object connecting on a SQL server. But we're under the classic pressure of 'get something done so that our customers can get real work done today.'"

While the Perl language is only 8-bit, and while it's not a real-time architecture, it has been in use by Web developers for a long time, and a lot of people know how to use it.

Illustra pushes the envelope

As a small start-up company trying to gain traction against Oracle, Sybase, Ingres, and Informix, Illustra can compete only by focusing narrowly and identifying those areas where it can play ball with a competitive advantage.

Illustra's architecture has sought to "have its cake and eat it too." From relational design, it borrowed functions like row and column SQL querying, commit, rollback, use, grant, and revoke; from object-oriented databases, it borrowed polymorphism, inheritance, and type extensibility. It is heading into the upper-right portion of the graph presented in Figure 7.

Before it decided to purchase Illustra, Informix had a project nicknamed the "Illustra Killer Project."

After doing the "buy vs. build" calculus, the executives at Informix decided to acquire Illustra to build upon the tool's extensible architecture. Previously, Informix had made its mark on the strength of its scalability, parallelization of queries, manageability, and performance. Today, Informix and Illustra engineers are working frantically to couple the two database engines, with a target completion date of year-end 1996.

"At Illustra, we are building on the database's inherent extensibility and providing the connectivity to the adjacent technology that you need to build these second- and third-generation sites," said Ray.

As the Web migrates from "state-less" to "state-full" applications, and from "publish & subscribe" to "client/server" architectures, developers must learn to build small modules that will surround where the Web server used to be. Figure 8 illustrates the kinds of functions and tool vendors in these two application class areas.

"We're competing in certain 'core services' areas," said Ray. "We want to give developers reusable code that provides a framework upon which we can build turn-key templates." Figure 9 illustrates the underlying architecture for Illustra's strategy.

Illustra is currently delivering three layers of technology to Web developers. The first layer is a completely extensible database engine that makes no assumptions about the types of data to be managed. The next layer is the Web DataBlade itself, which facilitates the connection between Illustra and any Web server. On top of these two layers are application templates, which are managed collections of application pages available for customization for a specific client implementation.

Illustra refers to a new breed of specialized, add-on DataBlades as tools that slice and dice a database in unique ways, similar to how class libraries operate. "You take the complexity out of the application, and let the DataBlades execute directly within the engine," said Ray. DataBlades (see Figure 10) will let developers define functions that "teach" their databases how to deal with user-defined data types.

Further, DataBlades allow end users and developers to innovate as opposed to waiting for Illustra to add support for a specific data type. Anyone can develop DataBlades. Illustra builds DataBlades that are horizontal in nature (e.g., TimeSeries for the financial markets, 2D and 3D for geospatial markets, etc.), but third parties can also provide DataBlades (e.g., PLS for text retrieval and Verity for searching).

When a developer creates a "type" in Illustra, what he's really doing is inserting a type definition into a type meta-table; defining a function works in much the same way.
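A rough Java sketch shows what "a type is just a row in a meta-table" means in practice. Everything here is an assumption made for illustration: the `TypeCatalog` class, its two maps, and the method names are hypothetical and say nothing about how Illustra actually lays out its catalog.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical catalog: one "meta-table" for types and one for functions.
class TypeCatalog {
    private final Map<String, String> typeTable = new LinkedHashMap<>();
    private final Map<String, Function<Object, Object>> functionTable =
            new LinkedHashMap<>();

    // Creating a type amounts to inserting one row into the type table.
    void createType(String name, String description) {
        if (typeTable.putIfAbsent(name, description) != null) {
            throw new IllegalStateException("type already defined: " + name);
        }
    }

    // Defining a function is similar: one row, keyed by name and argument type.
    void createFunction(String name, String argType, Function<Object, Object> body) {
        if (!typeTable.containsKey(argType)) {
            throw new IllegalArgumentException("unknown type: " + argType);
        }
        functionTable.put(name + "(" + argType + ")", body);
    }

    // The engine looks the function up in the table and runs it on the value.
    Object call(String name, String argType, Object arg) {
        Function<Object, Object> body = functionTable.get(name + "(" + argType + ")");
        if (body == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        return body.apply(arg);
    }
}
```

The design point is that the engine itself contains no knowledge of "sound" or "video"; it only knows how to look things up in its own tables, so new types need no changes to the engine.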

Illustra was the content-management system for photojournalist Rick Smolan's recent experiment, 24 Hours in Cyberspace. Illustra designed and built the digital editing system and managed all content (text, images, audio) for the site. There were three phases to Smolan's experiment:

  • Collection: e-mail submitted with self-describing data was digested, parsed, and automatically routed to six pods, which mapped to the six themes of the site. Application pages and the Illustra Rules system facilitated this process.
  • Staging of media contributions: handled through the filesystem, where the referential integrity between the links was maintained by a tool called NetObjects, and then mirrored to 18 sites throughout the world.
  • Publishing: once the content was finalized, NetObjects staged the site locally, then Sun Microsystems Integration Services mirrored the content around the world.

Sybase's Web strategy

In Sybase's Web strategy, there are three interoperable and intercommunicable "zones": the client software, the middle-tier server software, and the back-end production databases with multiple formats and standards for data sources.

"We let you keep your logic and data in the middle or the back end so that the solution is client-side independent," said Sybase's Mironov. "This method will keep proprietary data secure. If you don't plan ahead by having the right database architecture, you're hosed in this fast-moving market."

A big part of Sybase's Web strategy centers around its acquisition of PowerSoft, the leading Windows applications tools company. Its PowerBuilder product is the top client/server tool for building Sybase applications, as well as Oracle applications (where it is the market share leader). PowerSoft has also announced a C++/Java development environment called Optima++ that builds upon the concept of "components": self-describing and self-helping gadgets.

PowerBuilder's largest user base is Oracle developers, and Sybase currently has the largest installed base of gateways to DB2, RDB, and other popular data-source formats.

Sybase also purchased Visual Components Inc., creators of the Formula One spreadsheet, a "rich text" (RTF) plug-in, and a Web-oriented spell checker.

The technology from Visual Components gives Sybase a competitive advantage in the use of innovative "data windows" and graphical tools that allow data to be formatted in charts, columns, and so on. The spreadsheet application works like Excel, and the rich-text plug-in is Microsoft Word-like. Sybase plans to embed spreadsheet and word-processing client software applications in a network-savvy manner for its users.

"In a world where the network computer wins, we hope to have all the software that runs on it," said Mironov. "You could effectively run a browser application in background mode, get the data, and drop it into your application if that's what you want to do."

In the "middleware" space between the Web server and the database, Sybase plans four different initiatives:

  • web.sql: the "lightest, cheapest way to get into the game," this tool is called from one line of HTML on the server side;
  • PowerBuilder 5: a tool for building server-side, non-trivial objects for rich, complex applications that will read data, make decisions, do joins, and analyze data in a non-object-oriented manner;
  • Optima++: a tool for doing Java-equivalent applications;
  • ObjectConnect: a series of object bindings plus object/relational database functionality for Microsoft's OLE (and eventually C++ and Java).

Using Sybase's ObjectConnect, developers can make an object request, and the tool will automatically determine whether the result will come from an "object store," a "relational store," or a "directory store." If the request is going to a relational store, ObjectConnect does the mapping, goes into the relational database, and retrieves the object, freeing the developer from having to do any of the mundane, customized coding directly.
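The routing behavior described here can be sketched in a few lines of Java. This is only an illustration of the idea, not Sybase's API: the `ObjectRequestRouter` class, the `StoreKind` enum, and the in-memory maps that stand in for the stores are all hypothetical, and the directory store is left unimplemented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical router: the caller asks for an object and never names a store.
class ObjectRequestRouter {
    enum StoreKind { OBJECT, RELATIONAL, DIRECTORY }

    private final Map<String, StoreKind> placement = new HashMap<>();
    private final Map<String, String> objectStore = new HashMap<>();
    private final Map<String, String[]> relationalRows = new HashMap<>();

    void place(String className, StoreKind kind) {
        placement.put(className, kind);
    }

    void addObject(String className, String id, String object) {
        objectStore.put(className + ":" + id, object);
    }

    void addRow(String className, String id, String... columns) {
        relationalRows.put(className + ":" + id, columns);
    }

    // Decide which store holds the class, then fetch (and map, if relational).
    String request(String className, String id) {
        StoreKind kind = placement.get(className);
        if (kind == null) {
            throw new IllegalArgumentException("no store registered for " + className);
        }
        String key = className + ":" + id;
        switch (kind) {
            case OBJECT: {
                return objectStore.get(key);
            }
            case RELATIONAL: {
                // The "mapping" step: turn a row of columns into an object view.
                String[] row = relationalRows.get(key);
                return row == null ? null : className + Arrays.toString(row);
            }
            default:
                throw new UnsupportedOperationException("directory store not sketched");
        }
    }
}
```

The caller's code is identical whichever store answers, which is the extra level of indirection the tool is meant to provide.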

On the deeper, back end of Sybase's Web strategy, there is a set of projects called EnterpriseCONNECT, which offers a gateway to 35 kinds of foreign data, from DB2 on the mainframe to Oracle, Informix, Ingres, and a number of others. "We're deep into the 90th percentile on data sources," said Mironov. "We assume a heterogeneous world and that our customers are always discovering a new source of heterogeneity that they forgot to tell us about. We're proud to say that we have approximately 35 different back-end data sources. That's 30 more than everybody else."

The real key to Sybase's gamble is web.sql itself. When the web.sql interpreter links in with the http server, web.sql preprocesses the page after it discovers the "sybase type = sql" tag. What immediately comes back is text. When the text executes, it passes the result back to the Web server (which never sees any SQL) and the Web browser (which also never sees any SQL), and the result is presented to the user as a static page that is formatted with all the resulting data from the original query.
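The preprocessing step can be sketched in Java as follows. The exact tag syntax is an assumption (the column names the tag but does not show a full page), and the `SqlPagePreprocessor` class and its `runQuery` callback are hypothetical stand-ins for the interpreter and the database connection.

```java
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical preprocessor: replaces embedded SQL with rendered result rows.
class SqlPagePreprocessor {
    // Assumed tag shape: <sybase type=sql> ...query... </sybase>
    private static final Pattern SQL_BLOCK =
            Pattern.compile("<sybase type=sql>(.*?)</sybase>", Pattern.DOTALL);

    // runQuery stands in for the database call; it returns one string per row.
    // Note: this sketch does not HTML-escape the row values.
    static String render(String page, Function<String, List<String>> runQuery) {
        Matcher m = SQL_BLOCK.matcher(page);
        StringBuilder html = new StringBuilder();
        int last = 0;
        while (m.find()) {
            html.append(page, last, m.start());            // static HTML before the tag
            for (String row : runQuery.apply(m.group(1).trim())) {
                html.append("<li>").append(row).append("</li>");
            }
            last = m.end();
        }
        html.append(page.substring(last));                 // static HTML after the last tag
        return html.toString();
    }
}
```

What leaves `render` is ordinary HTML, so neither the Web server nor the browser ever handles the query text itself.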

"If a Web developer writes a page in web.sql," said Mironov, "Sybase will guarantee that it'll work on the other side of the firewall."

web.sql assumes that you have a pre-existing infrastructure in the back end that is opened up with OpenClient and OpenServer. PowerBuilder 5 assumes you're going to design an application with logic and business rules and you're going to put that in its perfect place. Finally, Sybase's ObjectConnect tool assumes that you're going to do this same thing, but that you're going to be using an object interface that gives you one more level of indirection (in case some of your data resides in an object or relational store) and you're moving it in or out.

Versant takes the high road

Visionaries describe the world's telephone systems as the "ultimate accounting systems" of the future. Not only will they move voice, data, and video; they'll also be performing the function that they do best of all: sending customers a bill and collecting the money.

What if the telecommunications industry giants could provide billing and fulfillment services to corporations in a variety of industries? Speeding through the trunks of their high-bandwidth telephone lines someday will be the digital content of next-generation TCP/IP networks that take the Internet to another level.

Doing databases at this level of complexity and volume is no simple task. Now enters a company called Versant Object Technologies.

Versant was founded in 1988 by a small group of relational database architects who foresaw the limitations of existing RDB technology when put to the test of high-bandwidth, high-capacity ATM switches. The company set out to solve this problem: database-intensive applications mapped poorly to the object-oriented programming languages of today.

Versant's founders set out to build a database that would store "objects" as opposed to a persistent storage method of like-kind elements.

"Our focus is on network management and high-demand database architectures," said Craig Russell, director of product management at Versant.

With the big telecommunications clients that Versant serves, its object-oriented database magic can be embedded directly into the switches, the routes, and the fiber lines that bind everything together. With big transportation companies, it's the concrete and the trucks. With big utilities, it's the power grid itself (the poles, the wires, the transformers, and how they interoperate with each other).

A new standard called "TMN," for Telecommunications Management Network, has been created by the International Telecommunication Union (ITU). Intended to help standardize how large companies manage their infrastructures and constantly flowing digital assets, TMN promises to help companies deliver real database-intensive services to real customers in terms of business management.

The structure of a business's data, along with the unique types of functions and procedures for managing it, is best described by the word "objects."

"We have 'business objects' that use 'service objects' that use 'operations objects' that use 'element objects,'" said Versant's Russell. "It's a completely integrated stack of object-based abstractions of a given business model, its service model, and the actual operations of its network itself."

If you combine all those things, you come up with a particularly elite, small number of large customers who are embedding Versant's object-oriented database technology in areas like billing systems, operations management, and even the embedded management of the fabric inside ATM switches themselves.

From top to bottom, Versant has invaded the telecommunications industry via its relationships with clients like AT&T, MCI, Sprint, Ericsson, Alcatel, and others.

"We tend to not compete head-on with SQL database server companies," Russell said. "After they've tried to deploy applications with relational tools, they often find it too difficult to cost/justify the continuing management of relational code to do what Versant does with object code."

Relational database tools fall apart when confronted with the three areas that Versant specializes in: complex data (multimedia and other data types), complex relationships among data sources, and distribution.

"Some of our clients have come to grips with how they've been 'force-fitting' this relational model into something that it wasn't really designed to do," Russell claimed. Versant handles complex relationships among objects in a way that facilitates rapid access from one part of the database to another. It also provides for distribution of data over local- or wide-area networks.

"More and more, we're finding some of the more demanding applications (for example, managed-care health systems) where consumers or providers have a need for object-oriented database technology," said Russell. A health service provider has three complex functions: a billing organization, a health-care provider, and a big conglomerate that performs a medical service for the consumer.

"When you peel a really big onion, it doesn't fit nicely into rows and columns," said Russell. "You've got this huge mismatch when you try to present disparate information in a relational model."

Versant's technology treats the Web as merely another medium, and there will be two ways of accessing the data: Java, and declarative definitions with HTML formatting.

"We believe that Java is the language of the Internet," said Russell. "What you really need for complex applications is intelligence in your 'network toaster.' That's where we see the role of Java; and it's ideally suited for multiple, dynamic data sources in the same screen by providing the intelligence there that says, 'Not only do I have multiple, disparate data sources, but I'm going to invoke a transaction that will combine all information from those together into a single display and help the user through the massaging of that information.'"

Russell envisions a future where consumers will "rent" a Java applet from Fidelity Investments that will perform an analysis of live stock portfolio values, and then make suggestions on actions to be taken.

Versant will be supporting not only Java, but also CGI and JDBC. (JDBC is a Java counterpart to ODBC: a wrapper for SQL that extends standards work on defining a single API for accessing a number of SQL data sources.)
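For readers who have not seen JDBC, here is a minimal sketch of a query written against the `java.sql` API as it exists in current Java releases, which may differ from the early drafts available when this column was written. The `JdbcSketch` class and `firstColumn` method are illustrative names, and the URL a caller passes in depends entirely on which vendor's driver is installed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class JdbcSketch {
    // Runs a query and returns the first column of every row as text.
    // The same code works against any SQL data source that has a JDBC
    // driver on the classpath; only the URL changes.
    static List<String> firstColumn(String url, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rows = stmt.executeQuery(sql)) {
            while (rows.next()) {
                values.add(rows.getString(1)); // JDBC columns are numbered from 1
            }
        }
        return values;
    }
}
```

If no installed driver recognizes the URL, `DriverManager.getConnection` throws a `SQLException`, so callers should be prepared to handle it.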

From Versant's perspective, a Java applet out on the Web will have an execution environment that will be very limited by current standards. "It'll be a couple megabytes of RAM, a Pentium-sized processor: what we call a 'network toaster,'" said Russell. "In such a limited execution environment, you're going to need to get your data from somewhere else."

There are four "somewhere else" sources from which the data will come on the Web:

  • HTML/http route: a URL is sent back to the Web site where data is requested, and SQL scripts pull out static data to be displayed in HTML. (Versant has a declarative method of retrieving that information, very much like the kind of script that is used to get data from an Oracle server.)
  • A Java application program on the server's back end that is tied in and invoked from a Web server's script, which will poke into a Versant database, "massage" the data, and format it in HTML output for display.
  • A JDBC interface, where your Java application will go through a package of Java code that was designed to translate the request into a JDBC call, which in turn would open a data source, retrieve the info, and send it back for display.
  • "ORB" technology for heavy-duty Web applications. It's the combination of Java IDL and NEO, called Joe, that allows Web developers to define an interface to an object (that is built in Java on the front end) and execute behavior on that Java object using a standard ORB implementation language. These kinds of apps would involve data sources on mainframes, in relational databases, or in other object-oriented databases.

Next month...

Next month in The Cyberstruction Zone, we'll take a look at how to use Java in conjunction with Moving Worlds, the next-generation implementation of VRML by a consortium of companies including Netscape, SGI, Sony, and Worlds Inc.

As Chairman of Cyberstruction Inc., Craig LaGrow serves a small number of Fortune 500 clients and advertising agencies by creating customized, turn-key WWW solutions ("Online Technology Implementation Plans") that implement advanced, emerging Web technologies. Syndicated material copyright 1996, Cyberstruction Inc.