next up previous
Next: Bibliography Up: Opportunities for Bandwidth Adaptation Previous: Generality of results

   
6 Conclusions and discussion

We characterized compound documents generated by the three most popular applications of the Microsoft Office suite: Word, PowerPoint, and Excel. Our focus was on identifying opportunities for adapting these documents to the constraints of bandwidth-limited clients. Our study encompassed over 12,500 documents, comprising over 4 GB of data, retrieved from 935 different Web sites.

We identified the following opportunities for adaptation:
1.
For large documents, images and components account for the majority of the data. Moreover, images and image components are the most common non-text data found in Office documents. These results suggest that components, and images in particular, should be the main focus of any adaptation effort. We are currently adding quality-aware transcoding and caching of images and components to Puppeteer, and we plan to measure the savings that these techniques achieve.
2.
For read-only documents, discarding the native component data results in savings of up to 35% for Word and 21% for PowerPoint.
3.
Garbage collection of OLE archives achieves savings greater than 16% for 24% of Word and 35% of PowerPoint documents.
4.
Compression achieves savings of 77% for OLE archives and 90% for XML. Moreover, once compressed, the two file formats do not differ significantly in size. Since XML formats are significantly easier to parse and manipulate than OLE archives, they are a more attractive target for adaptation.
5.
The structure of Office documents (pages, slides, and sheets) can be used to download elements on demand, reducing the time that users wait before they can start working on a document.
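As a rough illustration of how a compression figure like those in item 4 can be computed, the sketch below measures the fraction of bytes saved on a single buffer. It assumes that "savings" means one minus the ratio of compressed size to original size, and it uses Python's zlib as a stand-in, since this section does not name the compressor used in the study. The function name and the sample payload are hypothetical.

```python
import zlib


def compression_savings(data: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by compressing `data` (0.0 to 1.0).

    Assumption: savings = 1 - compressed_size / original_size.
    zlib is only a stand-in for whatever compressor is actually deployed.
    """
    if not data:
        return 0.0
    compressed = zlib.compress(data, level)
    # Incompressible input can grow slightly; report that as zero savings.
    return max(0.0, 1.0 - len(compressed) / len(data))


# Hypothetical, highly repetitive XML-like payload (not taken from the study).
sample = b"<w:p><w:r><w:t>Hello, world</w:t></w:r></w:p>\n" * 200
print(f"savings: {compression_savings(sample):.0%}")
```

Markup-heavy text such as this sample is very repetitive, which is one plausible reason why XML documents compress so well.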
Furthermore, our experience studying the Office file formats resulted in the following insights:
1.
The data suggest that users largely misunderstand the "save as" operation. The large savings that we obtain from garbage collection suggest that users do not understand the implications of fast-save mode (the default), and instead regard "save as" only as a way to create a copy of the document.
2.
The lack of built-in support for compression in OLE archives has forced designers to implement ad hoc solutions to achieve high performance. This experience suggests that a compression feature would be a desirable addition to OLE archives.
3.
OLE archive formats are likely to remain the preferred intermediate format for Office documents, while the XML-based format will likely be the format of choice for Web publishing. The XML-based format has the advantage that it can be interpreted more easily by applications other than Office (e.g., Web browsers). It is also amenable to widespread browser techniques that improve user-perceived latency, such as incremental rendering and on-demand fetching. On the other hand, Office 2000 does not currently implement incremental loading or writing of XML-based documents, so opening and storing XML-based documents incurs higher latencies than for similar OLE archive documents. Moreover, some of the Office formats do not yet have XML equivalents.
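The on-demand fetching discussed above can be sketched as a small wrapper that transfers a document element (page, slide, or sheet) only when it is first accessed. This is a minimal illustration under stated assumptions, not the mechanism used by Office or Puppeteer: the `LazyDocument` class, the `fetch` callback, and the in-memory "server" are all hypothetical.

```python
from typing import Callable, Dict


class LazyDocument:
    """Fetch document elements (pages, slides, sheets) only on first access."""

    def __init__(self, element_count: int, fetch: Callable[[int], bytes]):
        self._count = element_count
        self._fetch = fetch  # hypothetical transport callback: index -> bytes
        self._cache: Dict[int, bytes] = {}

    def element(self, index: int) -> bytes:
        """Return element `index`, fetching it the first time it is requested."""
        if not 0 <= index < self._count:
            raise IndexError(index)
        if index not in self._cache:
            self._cache[index] = self._fetch(index)
        return self._cache[index]

    def bytes_transferred(self) -> int:
        """Total size of the elements fetched so far."""
        return sum(len(body) for body in self._cache.values())


# Hypothetical in-memory "server" holding three slides of different sizes.
slides = [b"title slide", b"chart " * 1000, b"summary"]
doc = LazyDocument(len(slides), lambda i: slides[i])

doc.element(0)  # the user can start reading after one small transfer
print(doc.bytes_transferred(), "of", sum(len(s) for s in slides), "bytes fetched")
```

In this sketch the large second slide is never transferred unless the user navigates to it, which is the source of the latency reduction.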

Eyal DeLara
2000-05-16