Two approaches to shared library support for uClinux/uClibc
by Rick Stevenson (July 15, 2002)
Foreword: Two companies (SnapGear and RidgeRun)
recently announced dynamic linking support for uClinux using two
different approaches. In this whitepaper, SnapGear CEO Rick Stevenson
discusses various practical issues from both technical and legal
perspectives, and compares and contrasts the two companies' differing
approaches. (Note: material in this paper is based on email discussions
between Rick Stevenson of SnapGear and Dan George of RidgeRun,
moderated by Rick Lehrbaum of LinuxDevices.com.)
What is uClinux?
uClinux is a set of patches to conventional Linux for MMUless microprocessors (e.g. ARM7, Motorola ColdFire). MMUless processors are quite common in deeply embedded systems because of their lower per-unit price, a major factor in embedded product development where component cost is crucial. Before uClinux, a variety of proprietary or homegrown executives were used, but all of them lacked the advantages of the Linux API, which provides a uniform application interface in a powerful and consistent POSIX manner.
What is the big deal about MMUless microprocessors and how does uClinux help?
Because processors without an MMU (Memory Management Unit) lack hardware memory management and protection, conventional Linux cannot run on them. uClinux allows a wealth of open source software to be ported to these microprocessors immediately and provides a high level of application stability.
Unfortunately, one of the key features lacking in uClinux was dynamic linking: all applications had to be statically linked with their libraries, which resulted in large firmware images and wasted space. The addition of dynamic linking clears a big hurdle in the race to keep code size down and increase functionality in deeply embedded devices.
What are the advantages of shared libraries and why are they controversial?
Program and data storage in an embedded device are often limited to a highly constrained combination of Flash ROM and conventional RAM, so space is always at a premium. The more memory that can be saved, the more room there is for additional application functionality and capacity. With shared libraries, only one copy of the library code needs to be present rather than an individual copy for each executable.
There are other advantages, such as version control: a single update of a shared library updates all executables simultaneously. The space savings also often pay off twice ('double dipping'): implementations that unpack a compressed firmware image from Flash into RAM for execution save space in both Flash and RAM.
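To make the space argument concrete, here is a rough back-of-the-envelope model. It is an illustration, not a measurement from either company. The model assumes that each statically linked program carries a private copy of the library code it uses, while the shared case stores one copy of the whole library. The `copies` parameter is a hypothetical knob for the 'double dipping' case, where the image is held both in Flash and, unpacked, in RAM. Compression of the Flash image is ignored, and all function names and figures are hypothetical.

```python
"""Back-of-the-envelope footprint model: static vs. shared linking."""


def static_footprint(app_sizes, lib_code_per_app):
    """Every executable embeds its own copy of the library code it uses."""
    return sum(app_sizes) + sum(lib_code_per_app)


def shared_footprint(app_sizes, shared_lib_size):
    """One copy of the whole library serves every executable."""
    return sum(app_sizes) + shared_lib_size


def savings(app_sizes, lib_code_per_app, shared_lib_size, copies=1):
    """Bytes saved by sharing. copies=2 models an image held in Flash and
    also unpacked into RAM (compression is ignored)."""
    per_copy = (static_footprint(app_sizes, lib_code_per_app)
                - shared_footprint(app_sizes, shared_lib_size))
    return per_copy * copies


if __name__ == "__main__":
    # Hypothetical figures: five 20,000-byte programs, each pulling in
    # 60,000 bytes of library code when static; whole library 200,000 bytes.
    apps = [20_000] * 5
    used = [60_000] * 5
    print(savings(apps, used, 200_000))            # -> 100000
    print(savings(apps, used, 200_000, copies=2))  # -> 200000
```

The result can be negative. If only one or two programs use a large library, storing the whole shared library can cost more than it saves, so the benefit grows with the number of programs sharing it.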
SnapGear engineers have previously contributed XIP (eXecute In Place) technology (see SnapGear Technical Bulletin #8), which went some way toward reducing memory usage by allowing the text segment to be shared by simultaneously executing images. They have also contributed advanced memory management (see SnapGear Technical Bulletin #2 and Technical Bulletin #11).
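The XIP benefit can be sketched the same way. This simplified model rests on stated assumptions. Without XIP, every running instance gets its own copy of the program text in RAM. With XIP, the text exists once (or, with the hypothetical `text_in_ram=False`, executes straight from Flash), and only per-instance data is duplicated. Stack, heap and loader overheads are folded into `data` for simplicity.

```python
"""Simplified RAM model for several running instances of one program."""


def ram_without_xip(text, data, instances):
    """Each instance loads its own copy of text plus its own data."""
    return instances * (text + data)


def ram_with_xip(text, data, instances, text_in_ram=True):
    """Text is shared by all instances (or run from Flash); data is per instance."""
    shared_text = text if text_in_ram else 0
    return shared_text + instances * data


if __name__ == "__main__":
    # Hypothetical: three instances, 120,000 bytes of text, 16,000 of data each.
    print(ram_without_xip(120_000, 16_000, 3))  # -> 408000
    print(ram_with_xip(120_000, 16_000, 3))     # -> 168000
```

With a single instance the two models agree. The saving comes entirely from the text segment not being duplicated as more copies of the same program run.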
GPL 'Tainting'
The GNU open source license agreement, the General Public License (GPL), is a double-edged gift. All code released under the GPL is free to use for both noncommercial and commercial purposes, as long as the license is preserved and the source code is provided to those who receive the application. All extensions to the code, and all additions that are linked in, are also subject to the GPL. This latter condition is often referred to as "GPL tainting": by using GPL'ed code, your own source also becomes GPL code.
There has been considerable discussion around the issue of whether code that is linked to GPL software dynamically, rather than statically, needs to be GPL, because in the former case the linking only takes place at run time rather than at compile time.
Although there is a special exception in the case of the Linux kernel (made by Linus himself), in general dynamic linking doesn't get around the linking issue, and the GNU lawyers take a dim view of anyone attempting to navigate around the spirit and letter of the GPL license agreement. In an article on the subject, Jerry Epplin says: "If, at execution time, your work is linked with a GPL work, it is a derived work. Note that it does not matter whether the linking is static or dynamic, so making use of a GPL-licensed shared library creates a work derived from the library."
On the other hand, linking to code released under the GNU Lesser General Public License (LGPL) is fine, whether dynamic or static linking is used, because the LGPL is a less restrictive license agreement. Note, however, that it takes only one introduced fragment of GPL code to render everything else GPL, even if it started as LGPL.
Two Philosophies
The RidgeRun announcement indicates that their work focused on dynamic linking of libraries so that proprietary code and drivers could be combined with LGPL/GPL software in an embedded system. RidgeRun also wanted to be able to satisfy customers migrating from a conventional Linux background. RidgeRun was driven to implement shared libraries by a customer who could not use Linux if it meant releasing, or even offering to release, various parts of their source code. While addressing this issue as the primary goal, they also believed that a strong draw to Linux would be its support for a wide range of applications. Linux is a general purpose OS, and their customers liked the idea that they wouldn't have to be hard core embedded programmers to use it. Thus, they decided to try to solve the problem in a standard 'non-embedded' Linux way.
The SnapGear announcement, on the other hand, indicated that the focus was on using shared libraries to eliminate redundant code and thereby reduce the memory footprint (both ROM and RAM) of typical embedded applications. This would appeal to typical embedded system developers.
History
Both companies saw the
need for shared library support around the same time, but were not
aware of each other until late in their projects.
RidgeRun attempted to solicit community support in October 2001, but to no avail. After much 'Googling' and talking with such 'pundits' as Erik Andersen, Phil Blundell, Ralph S., and others, they struck out on their own, since no one seemed to be working on it at that time. In hindsight, they noted that they should have expanded the search outside the ARM community and touched base with SnapGear. They hadn't really considered shared libraries a uClinux problem, but rather a uClibc/binutils problem.
Similarly, at SnapGear the development team was very focused on the Motorola ColdFire community. A SnapGear engineer, David McCullough, became aware that RidgeRun was also working on shared libraries and touched base with them to see if there were any opportunities to cooperate. As both teams had nearly finished, and had taken quite different approaches, it was agreed that there was little benefit in trying to reconcile the two projects. By the time RidgeRun heard from David, they were done with everything but the announcement, which they made shortly afterward. Both companies agreed that it was a pity they hadn't connected sooner, as they would have happily worked together.
RidgeRun's Approach
The
RidgeRun approach uses ELF format files and something very close to the
vanilla Linux approach. This provides a degree of comfort for
non-embedded developers, but does incur additional space requirements
(ELF headers, symbol table information).
RidgeRun chose ELF so
that they could leverage the existing ld.so already supplied by uClibc.
The size difference was within their budget, and had they run into a
size problem they'd have put more effort into minimizing ELF overhead.
It is actually pretty small already.
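The following toy model illustrates the trade-off described above. It is not uClibc's ld.so, and it omits relocation types, hashing and lazy binding. Under ELF-style dynamic linking, the program keeps the *names* of the library functions it imports, and the dynamic loader looks them up in the library's export table at start-up. Those names and tables are part of the extra space ELF costs. In exchange, the library can be rebuilt with a different layout without relinking the program. The function and exception names below are hypothetical.

```python
"""Toy model of ELF-style symbol resolution by name at program start-up."""


class UnresolvedSymbol(Exception):
    """Raised when the library does not export a symbol the program imports."""


def resolve_imports(imports, lib_exports, lib_base):
    """Return {name: absolute address} for every imported symbol.

    imports     -- symbol names the program needs (kept in the executable)
    lib_exports -- {name: offset} export table shipped with the library
    lib_base    -- address at which the loader placed the library
    """
    resolved = {}
    for name in imports:
        if name not in lib_exports:
            raise UnresolvedSymbol(name)
        resolved[name] = lib_base + lib_exports[name]
    return resolved
```

For example, `resolve_imports(["printf"], {"printf": 0x900}, 0x20000)` yields `{"printf": 0x20900}` regardless of where `printf` sat in an earlier build of the library.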
SnapGear's Approach
The
SnapGear approach uses the uClinux flat file format. It provides a very
space-efficient solution, but does introduce some limitations (which
SnapGear doesn't believe are serious in an embedded environment).
SnapGear
didn't have to write a new ld.so to support flat files. The
applications are statically linked against the shared libraries, so
there is no need for ld.so at all.
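By contrast, 'statically linking against a shared library' can be sketched as follows. The encoding is an assumption chosen to be consistent with the limits the comparison below lists for SnapGear's approach. It is not a description of the actual flat-format layout. Each library reference is resolved at build time to a (library ID, offset) pair and packed into one 32-bit word, with an 8-bit ID and a 24-bit offset. A 24-bit offset gives 2^24 bytes (16MB) per library, and an 8-bit ID gives 255 libraries if ID 0 is reserved for the program itself. At load time nothing is looked up by name: the loader only adds the base address of the library named by the ID. All names here are hypothetical.

```python
"""Toy model of link-time resolution to (library ID, offset) references.

Assumed encoding (illustrative, not the real on-disk format): top 8 bits
hold the library ID, low 24 bits the offset within that library.
"""

OFFSET_BITS = 24
OFFSET_MASK = (1 << OFFSET_BITS) - 1  # 0x00FFFFFF: 16MB addressable per library
ID_BITS = 32 - OFFSET_BITS
MAX_ID = (1 << ID_BITS) - 1           # 255; ID 0 assumed to be the program itself


def encode_ref(lib_id, offset):
    """Build time: pack a resolved reference into one 32-bit word."""
    if not 0 <= lib_id <= MAX_ID:
        raise ValueError(f"library ID {lib_id} out of range 0..{MAX_ID}")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError(f"offset {offset:#x} does not fit in {OFFSET_BITS} bits")
    return (lib_id << OFFSET_BITS) | offset


def relocate(word, bases):
    """Load time: no symbol lookup, just base address + offset.

    bases -- dict mapping library ID -> address where it was loaded
    """
    lib_id = (word >> OFFSET_BITS) & MAX_ID
    return bases[lib_id] + (word & OFFSET_MASK)
```

For example, `encode_ref(1, 0x1234)` gives `0x01001234`. With library 1 loaded at the hypothetical address `0x400000`, `relocate` returns `0x401234`. Because the offset is fixed at link time, moving a function inside the library changes the correct answer, which is why programs must be relinked when a library changes.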
Comparisons
The RidgeRun approach . . .
- based on ELF format; in fact, the shared libs are in standard ELF format. Programs and statically linked libs are still flat (see below).
- currently for ARM7, believed portable to other processors
- less space-efficient in some situations, although they do strip out unused symbol table entries, etc., and this could easily turn into a long debate with lots of byte counting. (The best way to resolve this question would be to pick one or more example apps and look at the approximate percentage overhead to get a rough feel for relative space efficiency.)
  Attributes that make the RidgeRun approach larger: ELF headers and symbol table information.
  Attributes that make the RidgeRun approach smaller:
  - no GOT is required. (SnapGear was already carrying around a GOT to avoid data segment size limits on the ColdFire architecture, so the additional space is insignificant for SnapGear; it may not be on ARM7.)
  - a single PIC base eliminates a PIC load in every global function
  - no library indices or corresponding tables
- requires source code changes for callouts. More specifically, libraries that invoke callbacks must take care to save the client's PIC info when the function pointer is passed into the library. RidgeRun think this problem is solvable but weren't able to put the time into it, and accepted this limitation with much trepidation.
- requires compiler patching because of problems with PIC code trying to address function pointers via r9. Binutils also needed work: changes were made in ld and ld.so to support the thunk code. Programs that make use of shared libs must be in ELF object file format; other programs may be flat. RidgeRun also fixed a number of bugs, because they were running existing blocks of code under new circumstances, such as combining command line options in new ways.
- a more flexible platform for future dynamic linking changes, with no need to keep track of global identifiers for shared libraries. Tracking such identifiers is not much of an issue for hard core embedded use but could get messy over time, although it might not become a problem within the lifetime of MMU-less processors.
- RidgeRun shared libraries run XIP.
- function call overhead to execute a thunk for exported functions. However, not all global functions in the library need a thunk, and there is no overhead when thunked functions are called from within the library.
The SnapGear approach . . .
- based on flat file format
- currently for Motorola ColdFire and 68k, highly portable to other processors
- very space-efficient
- some limitations, e.g. a maximum size of 16MB per library, a maximum of 255 shared libs linked into one app, and apps must be relinked if a library changes (the build system does this automatically). Relinking could get messy when apps come from various suppliers; again, this is not an issue for the hard core embedded case. We generate a runtime error when an app is run against a newer library. You don't need source code to relink, so third party components could be supplied as object files.
- no source code changes needed
- requires compiler changes, but no changes to ld and ld.so. The only changes required were a small modification to the compiler and changes to elf2flt.
- programs run XIP and the libraries do so as well
- on the ColdFire processor, each global library function includes at most three extra instructions: two in the prolog and one in the epilog. However, most functions don't need separate save/restore instructions (movem deals with this), so the overhead is generally one instruction. Better still, functions that don't need the data segment set up have no overhead at all.
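The runtime error mentioned above can be sketched as a simple timestamp comparison. For illustration, this sketch assumes that every program and library carries a build timestamp (the uClinux flat header has a `build_date` field that can hold one). It also assumes that the loader refuses to start a program that was linked before one of its libraries was built, because the program's link-time offsets may no longer match that library's layout. The names `check_versions` and `LibraryTooNew` are hypothetical.

```python
"""Sketch of a load-time guard against running with a newer library."""


class LibraryTooNew(Exception):
    """Raised when a library was built after the program was linked."""


def check_versions(program_built, libraries):
    """Refuse a program/library combination that may be inconsistent.

    program_built -- build timestamp of the program (any comparable number)
    libraries     -- dict mapping library ID -> build timestamp of the
                     library actually found on the target
    """
    for lib_id in sorted(libraries):
        if libraries[lib_id] > program_built:
            raise LibraryTooNew(
                f"library {lib_id} is newer than the program; relink the program"
            )
    return True
```

For example, `check_versions(1000, {1: 900, 2: 1200})` raises `LibraryTooNew` for library 2. Relinking the program after the library update gives it a newer timestamp, and the check then passes.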
Conclusions
The objectives of the two teams were different. The RidgeRun mechanism is flexible and familiar to non-embedded developers. SnapGear's primary goal, on the other hand, was to pack more features into a small footprint. It is a 'hard core' embedded solution in the same vein as our previous contributions to memory management, PIC enhancements, and eXecute In Place, which have allowed uClinux developers to cram applications into smaller memory footprints, or simply to make more efficient use of existing hardware.
RidgeRun viewed uClinux first and foremost as a Linux for MMU-less processors, and secondly as a small footprint version of Linux. If the hardware is MMU-less, it is probably memory constrained as well. Their approach took both of these considerations into account, but the priority was on Linux. The outcome was that applications do not have to be re-deployed whenever libraries are re-deployed. The cost in terms of size was small or non-existent.
SnapGear has built a distribution of kernel and tools that is available for public download from uclinux.org.
It would be worthwhile to integrate the RidgeRun changes into uClibc and binutils, if this hasn't already been done, even though SnapGear's solution meets the needs of the hard-core community.
From
a legal perspective, RidgeRun's approach is possibly best for systems
that must incorporate proprietary software components. Specifically,
the RidgeRun shared library approach supports item 6.b. of the LGPL.
Item 6.b. is the most practical option for many commercial products
because there is no requirement to publish client source code or object
files. The object files, of course, are included in the shipped
product. SnapGear's solution requires re-linking and therefore doesn't
support 6.b. The other options under Item 6 require more effort by the
manufacturer to maintain compliance with the license.
Technically, RidgeRun's approach is good for devices that might have programs from various sources (ISVs) using various libraries. The flexibility of their approach makes it easier to update pieces of the system without breaking others. SnapGear's approach requires complete re-deployment of all applications that use an updated shared library. Under many circumstances this is not a problem. For example, SnapGear's own VPN Firewall Appliances are field upgradable, and their firmware is built by SnapGear itself. Similarly, SnapGear's OEM customers have full access to SnapGear's toolchains.
The SnapGear approach may,
in the final analysis, result in a smaller memory footprint. RidgeRun
expects this difference to be small, but it may make a difference in
systems with 2MB or less of system RAM/ROM. RidgeRun's advantages may
not be as apparent in such systems.
SnapGear's approach provides a good mechanism for reducing memory requirements. RidgeRun's approach also conserves memory and, in addition, addresses legal and flexibility issues faced by manufacturers considering uClinux for commercial products, as long as a clear distinction is made between GPL and LGPL source; otherwise there is no advantage.
Developers are encouraged to review both approaches and choose what is best suited to their specific circumstances.
Rick Stevenson has been involved in the UNIX and Open Source communities for over 20 years and is an Adjunct Professor in the School of Information Technology and Electrical Engineering at the University of Queensland, a leading Australian university. Stevenson is one of the original founders of SnapGear, where he served initially as VP of Engineering and CTO, and has previously held senior roles at companies such as IBM, DASCOM, and Pyramid Technology.