Restructuring Legacy Software to Reduce Build Time and Improve Productivity
Large legacy C/C++ software systems typically consist of header files (.h files) and implementation files (.c files). Header files usually contain declarations of the symbols used in the implementation files. By including header files through pre-processing directives (#include statements), dependencies between files are created. Ideally, an implementation file includes only the declarations that it will use. However, a header file can be included by multiple files and as such may contain declarations and definitions that are not used by all the implementation files that include it. In such cases, false dependencies are created. Another problem is that symbols may be declared in more than one place; as systems evolve, such redundant declarations tend to become common.
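To make the problem concrete, here is a minimal, hypothetical example (the file and function names are invented for illustration). A single header is shared by two implementation files, each of which uses only one of its declarations:

    /* util.h: a shared header mixing two unrelated declarations. */
    int parse_line(const char *line);   /* used only by parser.c */
    int render_page(int page_no);       /* used only by render.c */

    /* parser.c: includes util.h but uses only parse_line(). */
    #include "util.h"
    int parse_line(const char *line) { return line != 0; }

    /* render.c: includes util.h but uses only render_page(). Any change
     * to parse_line's declaration in util.h forces render.c to recompile,
     * even though render.c never uses parse_line: a false dependency. */
    #include "util.h"
    int render_page(int page_no) { return page_no + 1; }

If parse_line were also declared in some second header included by parser.c, that would be a redundant declaration of the kind described above.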
Redundancies and false dependencies do not affect the functionality of a system, but they do affect the efficiency of the development process. The longer the build process takes, the longer developers have to wait to integrate their changes. Large software systems that contain millions of lines of code may take several hours to build. Redundancies increase the size of the code and may cause inconsistencies. A false dependency between an implementation file and its header exacerbates the problem by causing unnecessary compilation of the implementation file when an independent part of the header file has changed. This problem is particularly important in light of the popularity of the sync-and-stabilize development paradigm [1], where software systems undergo frequent, often daily, builds.
Traditional approaches to improving the efficiency of the build process focus on removing false target dependencies in makefiles. However, this does not consider the internal details of implementation files. In this project, we take a novel preprocessing approach that removes redundant declarations and false dependencies based on an analysis of the program units inside header files.
The main steps of this approach are:
1. Construct a program units graph and a file dependency graph.
2. Partition the graphs to remove redundancies and false dependencies between files (sketched in the example after this list).
3. Reorganize the files into directories to reduce coupling and improve cohesion.
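As a concrete illustration of the first two steps, the following C++ sketch assumes the analysis has already determined, for each program unit declared in a header, which implementation files actually use it (the hard-coded names and data below are invented for illustration). Units used by exactly the same set of files are grouped into one new header, so that no implementation file includes a declaration it does not use:

    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    int main() {
        // Step 1, simplified: for each program unit declared in a header,
        // the set of implementation files that use it. The real approach
        // derives this from parsing; here it is hard-coded.
        std::map<std::string, std::set<std::string>> users = {
            {"parse_line",  {"parser.c"}},
            {"render_page", {"render.c"}},
            {"log_error",   {"parser.c", "render.c"}},
        };

        // Step 2: partition units by their exact set of users. Units with
        // identical user sets can share one header that every includer
        // genuinely needs, removing false dependencies.
        std::map<std::set<std::string>, std::vector<std::string>> partition;
        for (const auto& [unit, files] : users)
            partition[files].push_back(unit);

        // Emit one new header per equivalence class.
        int n = 0;
        for (const auto& [files, units] : partition) {
            std::cout << "header_" << n++ << ".h:";
            for (const auto& u : units) std::cout << ' ' << u;
            std::cout << "  (included by:";
            for (const auto& f : files) std::cout << ' ' << f;
            std::cout << ")\n";
        }
    }

Partitioning strictly by use-set is exactly what produces the large number of small headers discussed next; step 3 counteracts this by reorganizing the generated headers into directories.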
Step 2 provides an initial restructuring of header files which eliminates redundancies and false dependencies [3], but at the cost of producing a great number of header files, which makes the system more difficult to maintain. To investigate the tradeoff between improving maintainability and improving productivity, we created a softgoal dependency graph based on the NFR framework [4, 5, 6]. We used this graph to derive an operationalization in which the number of header files is controlled by organizing the software system's files into directories and refactoring the header files into these directories.
We applied this algorithm to several case studies. In one study, we examined the effect of header restructuring on the public-domain text editor Vim (Vi IMproved) [2]. Applying this technique alone, we found that the build time with the "-O2" option turned on was reduced by 56%, roughly a 2x speedup (a 56% reduction corresponds to a speedup of 1/(1 - 0.56) ≈ 2.3x).
What Is New Since CASCON 2003
Recently we have verified that this compilation-tuning technique is orthogonal to other tuning techniques used for parallel compilation (e.g., DISTCC) and compilation caching (e.g., CCACHE, or precompiled-header options): with 8 CPUs, the speedup rocketed to 40x when the three techniques were applied together, while the net speedup of our technique on top of the other techniques is up to 8x. The major reason for the super-linear speedup is that our technique reduces the network traffic of sending the precompiled form among the machines in the compiler farm by 3x.
The previous approach, reported at CASCON 2003, was based on heavyweight fact extraction from an abstract syntax graph (see the Datrix C/C++ schema). Our recent development replaces that cumbersome fact extraction with the existing parser plus a small overhead (it is lightweight), and can generate the restructured code on the fly. The overhead of the precompilation pass that performs the restructuring is now smaller than the compilation time it saves, even when the code base is compiled only once; if the code base is to be compiled N times, the precompilation overhead is amortized over those N builds. We have tried both GCC and the Intel C/C++ compiler and verified that the precompilation results are compatible and independent of the choice of C/C++ compiler.
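The lightweight idea can be sketched roughly as follows; the recorder class and its interface below are hypothetical (the real integration hooks directly into the compiler's own name lookup). As the parser resolves each identifier in a translation unit, a hook records the header-resident declaration it refers to, and the recorded set then yields a minimal restructured header for that file:

    #include <iostream>
    #include <set>
    #include <string>

    // Hypothetical sketch: records which header declarations a
    // translation unit actually resolves while it is being parsed.
    struct UsedDeclRecorder {
        std::set<std::string> used;  // declaration text of each used symbol

        // Called once per successful name lookup into a header.
        void on_lookup(const std::string& decl) { used.insert(decl); }

        // After parsing, emit a minimal header containing only the
        // declarations that were used; compiling against it instead of
        // the original headers removes this file's false dependencies.
        void emit_minimal_header(std::ostream& os) const {
            for (const std::string& d : used) os << d << ";\n";
        }
    };

    int main() {
        UsedDeclRecorder rec;
        // What a parse of render.c might record:
        rec.on_lookup("int render_page(int page_no)");
        rec.on_lookup("void log_error(const char *msg)");
        rec.emit_minimal_header(std::cout);  // render.c's tailored header
    }

Because the recording happens during an ordinary parse, the only extra cost is this bookkeeping plus writing out the small headers, which is the "small overhead" mentioned above.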
The new algorithm is implemented using the GCC 3.4.0 parser; see the technical report. For confidentiality reasons, we cannot disclose the name of the industrial IBM software used in our study.
Resources
- Removing false code dependencies (PDF): the related results for Vim 6.1.
- "Light-weight Fine-grain C/C++ Precompilation": the related results for Vim 6.2.
References
- [1] M. A. Cusumano and R. W. Selby. How Microsoft builds software. Communications of the ACM, 40(6), June 1997.
- [2] Bram Moolenaar. Vim 6.1, http://www.vim.org.
- [3] Y. Yu, H. Dayani-Fard, and J. Mylopoulos. Removing False Code Dependencies to Speedup Software Build Processes (PDF). In Proceedings of CASCON 2003, Toronto, Canada, October 2003.
- [4] H. Dayani-Fard. Quality-based Software Release Management. PhD thesis, Queen's University, 2003.
- [5] L. Chung, B. A. Nixon, E. Yu, and J. Mylopoulos. Non-Functional Requirements in Software Engineering. Kluwer Academic Publishing, 1999.
- [6] L. Tahvildari and K. Kontogiannis. Requirements-Driven Software Re-engineering Framework. WCRE'01, 2001, pp. 71-80.