Softpanorama
(slightly skeptical)
Open Source Software Educational Society |
May the
source be with you,
but remember the KISS principle ;-)
Softpanorama C Webliography
The Tao gave birth to machine language.
Machine language gave birth to the assembler.
The assembler gave birth to the compiler.
Now there are ten thousand languages.
Each language has its purpose, however humble.
Each language expresses the Yin and Yang of software.
Each language has its place within the Tao.
But do not program in COBOL if you can avoid it.
The
Tao of Programming
All through my life, I've always used the programming language
that blended best with the debugging system and operating system that I'm
using.
If I had a better debugger for language X,
and if X went well with the operating system, I would be using that.
Donald
Knuth
Note: The material on this page somewhat overlaps with the
C++ page, as I consider C++ to be mainly "a better
C" and I am deeply skeptical about the OO approach. Some materials that are
missing from this page can probably be found on the
C++ page.
C is often referred to as a ``high-level assembly language.''
That means that it is not optimal as a first programming language.
The ability to perform the low-level operations needed for systems
programming is a distinctive feature of the language. I believe
that the language was a very important invention for its time: the
first widespread machine-independent systems programming language.
As Alex Stepanov aptly noted in his
Dr. Dobb's Journal Interview:
Let's consider now why C
is a great language. It is commonly believed that C is a hack which
was successful because Unix was written in it. I disagree. Over a long
period of time computer architectures evolved, not because of some clever
people figuring how to evolve architectures---as a matter of fact, clever
people were pushing tagged architectures during that period of time---but
because of the demands of different programmers to solve real problems.
Computers that were able to deal just with numbers evolved into computers
with byte-addressable memory, flat address spaces, and pointers. This
was a natural evolution reflecting the growing set of problems that
people were solving. C, reflecting the genius of Dennis Ritchie,
provided a minimal model of the computer that had evolved over 30 years.
C was not a quick hack. As computers evolved to handle
all kinds of problems, C, being the minimal model of such a computer,
became a very powerful language to solve all kinds of problems in different
domains very effectively. This is the secret of C's portability:
it is the best representation of an abstract computer that we have.
Of course, the abstraction is done over the set of real computers, not
some imaginary computational devices. Moreover, people could understand
the machine model behind C. It is much easier for an average engineer
to understand the machine model behind C than the machine model behind
Ada or even Scheme. C succeeded because it was doing the right
thing, not because of AT&T promoting it or Unix being written with it.
Right now it has got its second life as the lower-level language
in "dual language" programming (in combination with a scripting language).
TCL+C dual-language programming techniques are especially easy to learn.
I strongly advise any serious C programmer to learn TCL. Otherwise you will
deprive yourself of a lot of important
concepts and methods of program development and will probably never be as
productive as you could be.
C is a simple and elegant language that introduced a lot
of new ideas into language design.
While borrowing features from PL/1 and BCPL, it elegantly
integrated the concept of pointers into a PL/1-style framework, provided a practical
set of high-level control structures, and introduced shortcuts for increment/decrement
style operations. As Donald Knuth
remarked:
The
way C handles pointers, for example, was a brilliant innovation; it
solved a lot of problems that we had before in data structuring and
made the programs look good afterwards. C isn't the perfect language,
no language is, but I think it has a lot of virtues, and you can avoid
the parts you don't like. I do like C as a language, especially because
it blends in with the operating system (if you're using UNIX, for example).
All through my life, I've always used the programming language that
blended best with the debugging system and operating system that I'm
using. If I had a better debugger for language X, and if X went well
with the operating system, I would be using that.
And believe me, despite the existence of C++ (and partially due
to it ;-) C will be around for a long time. As Dennis Ritchie aptly
put it in
one of his interviews:
LinuxWorld.com:
C and Unix have exhibited remarkable stability, popularity, and longevity
in the past three decades. How do you explain that unusual phenomenon?
Dennis Ritchie:
Somehow, both hit some sweet spots. The longevity is a bit remarkable
-- I began to observe a while ago that both have been around, in not
astonishingly changed form, for well more than half the lifetime of commercial
computers. This must have to do with finding the right point of abstraction
of computer hardware for implementation of the applications.
The basic Unix idea -- a hierarchical
file system with simple operations on it (create/open/read/write/delete
with I/O operations based on just descriptor/buffer/count) -- wasn't
new even in 1970, but has proved to be amazingly adaptable in many ways.
Likewise, C managed to escape its original close ties with Unix as a
useful tool for writing applications in different environments. Even
more than Unix, it is a pragmatic tool that seems to have flown at the
right height.
Both Unix and C gained from accidents
of history. We picked the very popular PDP-11 during the 1970s, then
the VAX during the early 1980s. [See Resources
for links to both.] And AT&T and Bell Labs maintained policies about
software distribution that were, in retrospect, pretty liberal. It wasn't
today's notion of open software by any means, but it was close enough
to help get both the language and the operating system accepted in many
places, including universities, the government, and in growing companies.
LinuxWorld.com:
Five or ten years from now, will C still be as popular and indispensable
as it is today, especially in system programming, networking, and embedded
systems, or will newer programming languages take its place?
Dennis Ritchie:
I really don't know the answer to this, except to observe that software
is much harder to change en masse than hardware. C++ and Java, say,
are presumably growing faster than plain C, but I bet C will still be
around. For infrastructure technology, C will be hard to displace. The
same could be said, of course, of other languages (Pascal versions,
Ada for example). But the ecological niches you mention are well occupied.
What is changing is that higher-level
languages are becoming much more important as the number of computer-involved
people increases. Things that began as neat but small tools, like Perl
or Python, say, are suddenly more central in the whole scheme of things.
The kind of programming that C provides will probably remain similar
absolutely or slowly decline in usage, but relatively, JavaScript or
its variants, or XML, will continue to become more central. For that
matter, it may be that Visual Basic is the most heavily used language
around the world. I'm not picking a winner here, but higher-level ways
of instructing machines will continue to occupy more of the center of
the stage.
But C is notoriously
difficult to learn as a first language. Other things being equal,
the best way to learn C is to learn assembly language
first, or to learn the two in parallel. This way you will probably find
C constructs, and especially pointer arithmetic, quite natural. If you
have never programmed in assembly language, you may be frustrated by the syntax
and convoluted semantics of pointer arithmetic, the treatment of array names as
pointers, and so on. The main problem here is that the language was designed
for people who were already able to program in assembler. It was
designed for the writers of an operating system (Unix), and that is noticeable.
In any case, you should understand that C was designed by accomplished system
programmers for accomplished programmers, so do not expect too much help
in finding errors either from the compiler or from the run-time system
(which is non-existent in pure C, and is not very helpful in C++). Both
space and time efficiency and closeness to machine language
constructs were necessary on the 24K PDP-11 where it was first implemented.
See The
Development of the C Language by Dennis Ritchie for more historical
information.
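To make the point about pointer arithmetic and array names concrete, here is a minimal sketch (the function names are hypothetical, chosen only for this illustration). Both functions compute the same sum: the first uses indexing, the second uses the explicit pointer arithmetic that an assembly programmer would recognize as "load, then advance the address".

```c
#include <stddef.h>

/* Sum n ints using array indexing. The parameter "a" is really a
   pointer: an array name passed to a function decays to the address
   of its first element. */
int sum_indexed(const int a[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];          /* a[i] is defined as *(a + i) */
    return total;
}

/* The same loop written with explicit pointer arithmetic. */
int sum_pointer(const int *p, size_t n)
{
    int total = 0;
    const int *end = p + n;     /* p + n advances by n * sizeof(int) bytes */
    while (p < end)
        total += *p++;          /* read the element, then step to the next */
    return total;
}
```

Calling either function with an array such as int v[] = {1, 2, 3, 4} and a length of 4 gives the same result, because v in the call expression is converted to a pointer to v[0].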
You will be much better off if you have already taken a course in
some classic programming language like Basic, Fortran or Pascal (Turbo
Pascal is just great as a first language, with Modula-2 as a logical
continuation). This way you will be able to understand the language by comparing
the C way of doing things with the Pascal way of doing things that you already know.
The problem of learning C at high schools and universities
is often complicated by teachers ;-). Many teachers forget the problems
they faced when they studied the language themselves and try to feed students
as much material as possible in the very first course. IMHO the attempts
to teach both C and C++ in a one-semester course are really pretty close to
a crime. No, it's worse than a crime -- it is a blunder ;-(. In any case,
after such a course usually more than 50% of students hate programming in
general and C in particular... Again, I would like to stress that Pascal
(in its Turbo Pascal incarnation) is a much better first language, but if
you are unfortunate enough to have C as your first language, try to slow down
and spend the first seven weeks without pointers and structures -- the language
will be much better understood if you do not rush to complex constructs
and master a reasonable subset first. Bad books
can also complicate things considerably. Please read the
[alt.comp.lang.learn.c-c++]
FAQ for some useful recommendations on how to avoid typical pitfalls
and problems in C.
Due to the presence of the preprocessor, the diagnostics of lexical
and syntax errors in C compilers are exceptionally bad, and to avoid frustration
you had better write with a minimum number of mistakes. That means having a good
textbook and consulting it often. Moreover,
that means that you need to check the program manually for typical mistakes
(like missing semicolons, "=" instead of "==" in comparisons, etc.).
You can actually save a lot of time this way.
For me, a list of "gotchas" -- errors that I had already made and
that took me considerable time to discover -- was really helpful. You
can use some of the lists available on the Web as
a starting point, but you will be much better off creating your own from
scratch. For example, due to my many years of previous experience with PL/1,
I still sometimes use "=" instead of "==" in if statements and loops. This
is an annoying error. See
The Top 10 Ways to get screwed by the C programming language.
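As an illustration of the "=" versus "==" gotcha, here is a small sketch with two hypothetical functions. The second one contains the bug on purpose; the doubled parentheses are there only because GCC with -Wall warns about the single-parenthesis form, which is in itself a good reason to compile with warnings enabled.

```c
/* Intended comparison: returns 1 only when x equals 5. */
int is_five(int x)
{
    if (x == 5)
        return 1;
    return 0;
}

/* The gotcha: "=" assigns 5 to x, and the value of the assignment
   expression (5) is nonzero, so the branch is always taken. */
int is_five_buggy(int x)
{
    if ((x = 5))    /* bug kept on purpose; should be x == 5 */
        return 1;
    return 0;
}
```

Some programmers write the constant on the left, as in if (5 == x), so that the mistyped form 5 = x fails to compile; whether that style is worth the loss in readability is a matter of taste.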
Also annoying and difficult to uncover were cases when I
forgot to place the & operator before the name of a variable and passed a
value instead of an address. The C method of representing strings as arrays of characters
that end with a null was pretty interesting in the 1970s, but lost much of its
appeal on computers with several gigabytes of memory. The necessity of having
a null at the end of the string leads to subtle errors. That's why sometimes
you will see recommendations like the one in the
SDM C Style Guide:
4.7 Standard: Explicit +1 in String Length Declaration
for \0
Character arrays used as strings (i.e.,
to hold ASCII text and terminated by a null character) should have a
defined length that explicitly includes the "+ 1" character for the
null string terminator.
#define NAME_LEN 20 + 1
char name[NAME_LEN];
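A minimal sketch of how such a declaration might be used safely. Two things here are my assumptions and not part of the style guide: the macro body is wrapped in parentheses (so that an expression like NAME_LEN * 2 expands as expected), and the helper set_name is a hypothetical function that relies on the C99 snprintf guarantee of null termination.

```c
#include <stdio.h>
#include <string.h>

/* 20 visible characters + 1 for the terminating '\0'.
   Parentheses added as a precaution (assumption, not in the guide). */
#define NAME_LEN (20 + 1)

/* Copy src into a NAME_LEN-byte buffer, truncating if needed.
   Returns the number of characters stored, not counting the terminator. */
size_t set_name(char dst[NAME_LEN], const char *src)
{
    /* snprintf writes at most NAME_LEN - 1 characters and always
       adds the terminating '\0' when the size is nonzero. */
    snprintf(dst, NAME_LEN, "%s", src);
    return strlen(dst);
}
```

With this helper a 26-character source string is cut to 20 characters, and the last byte of the buffer still holds the terminator.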
Style is important too. One needs to (no, actually one should)
use indent or another pretty-printer
-- they are really important, as they simplify catching errors
by producing an indentation pattern that visibly differs from what you
expected wherever the structure is wrong. At the same time one should not go overboard with style by enforcing
upon oneself things that make no sense at all. Although I like the idea
of using high-level control constructs whenever possible, I really hate structured
programming pundits who teach avoiding GOTO, break, continue, and global variables
no matter what -- a really religious attitude. These "structured programming
fundamentalists" are not as bad as the (now mostly extinct) verification proponents
a la Professor E.W. Dijkstra (who BTW originated the
"considered
harmful" cliche in his influential
Go To Statement
Considered Harmful paper, published in Communications of the ACM,
Vol. 11, No. 3, March 1968, pp. 147-148), but they still try to make programming
more difficult instead of trying to make it easier. As B. Kernighan
noted in his famous
Why Pascal is Not My Favorite Programming Language:
There is no 'break' statement for exiting
loops. This is consistent with the one entry-one exit philosophy espoused
by proponents of structured programming, but it does lead to nasty circumlocutions
or duplicated code, particularly when coupled with the inability to
control the order in which logical expressions are evaluated. Consider
this common situation, expressed in C or Ratfor:
    while (getnext(...)) {
        if (something)
            break
        rest of loop
    }
With no 'break' statement, the first
attempt in Pascal is
    done := false;
    while (not done) and (getnext(...)) do
        if something then
            done := true
        else begin
            rest of loop
        end
The scientific grounds for these attempts to avoid certain constructs
are completely non-existent -- so, as
Donald Knuth pointed out, feel
free to use them with no feeling of guilt if you are trying to implement a higher
level control construct that is simply not available in a given language.
For example, I like many recommendations of the
SDM C Style Guide and just ignore the half-dozen "structured programming
fundamentalism"-based recommendations, like avoiding global variables, continue
statements and GOTOs (see Donald Knuth's
famous article about this issue for more details), but your mileage may
vary.
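One common case of a higher-level construct that C lacks is a multi-level break. The sketch below (find_cell is a hypothetical function, written only for this illustration) uses goto to leave two nested loops at once; the alternative would be an extra flag variable tested in both loop conditions, much like the Pascal version quoted above.

```c
/* Find the first cell equal to target in a rows x 3 matrix.
   Returns 1 and stores the position if found, 0 otherwise. */
int find_cell(const int m[][3], int rows, int target, int *row, int *col)
{
    int r, c;

    for (r = 0; r < rows; r++)
        for (c = 0; c < 3; c++)
            if (m[r][c] == target)
                goto found;     /* leave both loops at once */
    return 0;                   /* searched everything, no match */

found:
    *row = r;
    *col = c;
    return 1;
}
```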
Paradoxically, run-time errors are easier to find,
as most implementations have pretty decent debuggers with step-by-step execution.
Borland's is probably the best, but I was really impressed by the debugger that
is built into Visual Studio 5.0. Maybe it was written by Borland people who
were bought by Microsoft just before Borland itself was bought by Inprise
;-)
But the deficiencies of C are a logical continuation of its strong
points. And there are a lot of strong points in C -- its popularity proves
that it is one of the best system programming languages around. People are
flexible enough to adapt to the language, and top programmers can produce
up to a couple of thousand lines of code over a weekend. GCC is the
main Linux compiler, and in order to use it one needs to learn C. So C is
the key to open source software. One deficiency of C that I hate is
the fact that it does not support
coroutines, but there are
libraries for GCC that can help.
It is silly to consider C to be a weaker programming language
than C++. C++ is a decent language, but it is to a certain extent overkill
and is a much more complex programming language than C. It also magnifies
problems that exist in C, making debugging even more difficult. Contrary
to what OO advocates claim, C++ is not always better than C.
Actually, in many cases one can do much better by
using a tandem of TCL + C. TCL has a very simple structure. Each line
starts with a command, such as dotask, followed by a number of arguments.
Each command is implemented as a C function. This function is responsible
for handling all the arguments. See my
TCL page. C programmers can also benefit
from learning Expect.
It is the greatest testing tool in existence. See also DejaGnu.
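The "command plus arguments, each command is a C function" structure can be sketched in plain C. To be clear, this is not the Tcl C API: the names cmd_fn, run_line, cmd_add and cmd_count are all hypothetical, and the sketch only mimics the idea of dispatching the first word of a line to a C function that handles the remaining words itself.

```c
#include <stdlib.h>
#include <string.h>

/* Each command is a C function that receives all the words of its line. */
typedef int (*cmd_fn)(int argc, char *argv[]);

struct command {
    const char *name;
    cmd_fn fn;
};

/* "add 2 3" -> 5 */
static int cmd_add(int argc, char *argv[])
{
    int sum = 0;
    for (int i = 1; i < argc; i++)
        sum += atoi(argv[i]);
    return sum;
}

/* "count a b c" -> 3 */
static int cmd_count(int argc, char *argv[])
{
    (void)argv;                 /* only the number of arguments matters */
    return argc - 1;
}

static const struct command commands[] = {
    { "add",   cmd_add },
    { "count", cmd_count },
};

/* Split a line on spaces, look up the first word and call its handler.
   Returns -1 for an empty line or an unknown command.
   Note: strtok modifies the line in place. */
int run_line(char *line)
{
    char *argv[16];
    int argc = 0;

    for (char *tok = strtok(line, " "); tok != NULL && argc < 16;
         tok = strtok(NULL, " "))
        argv[argc++] = tok;
    if (argc == 0)
        return -1;

    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(argv[0], commands[i].name) == 0)
            return commands[i].fn(argc, argv);
    return -1;
}
```

Adding a new command in this scheme means writing one more C function and one more table entry, which is roughly the division of labor that makes the scripting-language-plus-C tandem attractive.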
Important:
-
Turbo C version 2.01 is now free from Borland -- see the link to the Zip
file
here! This is a great free compiler for beginners. It
provides everything you need, with all of the tools
included in one environment. Turbo C 2.01 provides tight integration
between the editor, compiler, linker, and debugger and can be used on
any computer. 1M of memory is enough ;-) It can run on Linux
in an emulator. See also the
Borland
Community Museum (free registration required).
-
The Intel compiler is probably the best in terms of the quality
of the generated code on the Intel platform.
It is available with a non-commercial license, meaning that anyone
can download and use the full compiler for non-profit work. This is
the best optimizing compiler you can get. The installation of the Intel
compiler is far faster and easier than the installation of Visual Studio
.NET. The Intel compiler's scores are approximately
2.5 times better than those of gcc 3.2.1 for the Monte Carlo simulation, which
is a considerably larger margin than for any of the other parts of the
SciMark 2.0 benchmark. For the other parts it outperforms gcc by only small
margins of 10% or less. See
Benchmarking Intel C++ against GNU gcc on Linux.
-
The high-quality
Borland C++ Compiler 5.5 is now free too, and one can use it for
writing programs on Windows.
-
For DOS:
soon the
Watcom C/C++ and Fortran compilers will be open source and will be a much
better deal than Borland. It's another top-of-the-line C compiler -- I think
it's better than Microsoft's. See
http://www.openwatcom.org
And last but not least: sometimes you will feel a block
- you cannot do it any more, no matter what. Here are several possible
ways to overcome this condition:
- Do some intense physical activity for several hours
like running, diving, bicycling, fast swimming, etc. Then take a
shower and try it again... This is usually very helpful.
- Switch to another project for at least a couple of days...
- Go off on a tangent - sleep, read something that isn't specific
to the source of the frustration, but still connected with programming,
for example:
- Anything by Donald Knuth
- Back issues (the older the better ;-) of Byte, Dr.
Dobb's, etc.
- Go to the library and browse programming books at random for
several hours (This one is for those who like such an activity,
like me ;-)
Good luck !
Dr. Nikolai Bezroukov
Notes:
- This is a Spartan WHYFF (We Help
You For Free) site written by people for whom English
is not a native language.
Some amount of grammar and spelling errors should be
expected.
- The site contains some broken links
as it develops like a living tree...
Please try to use Google, Open directory,
etc. to find a replacement link (see
HOWTO search the WEB for details). We would appreciate it
if you could
mail us a correct link.
The available
C extensions can be classified
in several ways. This article
puts them in two broad categories:
- Functionality
extensions bring new capabilities
from GCC.
- Optimization
extensions help you generate
more efficient code.
Functionality
extensions
Let's start by exploring
some of the GCC tricks that
extend the standard C language.
Type
discovery
GCC permits the identification
of a type through the reference
to a variable. This kind of
operation permits a form of
what's commonly referred to
as generic programming.
Similar functionality can be
found in many modern programming
languages such as C++, Ada,
and the Java™ language. Linux
uses typeof to
build type-dependent operations
such as min and
max. Listing 1
shows how you can use
typeof to build a generic
macro (from ./linux/include/linux/kernel.h).
Listing 1.
Using typeof to
build a generic macro
#define min(x, y) ({                            \
        typeof(x) _min1 = (x);                  \
        typeof(y) _min2 = (y);                  \
        (void) (&_min1 == &_min2);              \
        _min1 < _min2 ? _min1 : _min2; })
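To try the macro from Listing 1 outside the kernel, a sketch like the following can be compiled with GCC (or a compatible compiler). The macro is renamed MIN_T here, a hypothetical name chosen to avoid clashing with any existing min, and it uses the __typeof__ spelling, which GCC accepts even in strict ISO modes where the plain typeof keyword is disabled. The three demo functions are likewise only illustrative.

```c
/* Type-generic minimum built from two GCC extensions: statement
   expressions ({ ... }) and __typeof__. The pointer comparison makes
   the compiler warn when the two arguments have different types. */
#define MIN_T(x, y) ({                          \
        __typeof__(x) _a = (x);                 \
        __typeof__(y) _b = (y);                 \
        (void) (&_a == &_b);                    \
        _a < _b ? _a : _b; })

int min_int_demo(int a, int b)
{
    return MIN_T(a, b);
}

double min_double_demo(double a, double b)
{
    return MIN_T(a, b);
}

/* Each argument is evaluated exactly once, unlike the classic
   ((a) < (b) ? (a) : (b)) macro, so side effects are safe. */
int min_side_effect_demo(void)
{
    int i = 1;
    int m = MIN_T(i++, 10);     /* m == 1, i incremented once to 2 */
    return m * 100 + i;
}
```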
Range extension
GCC includes support for
ranges, which can be put to
use in many areas of the C language.
One of those areas is in
case statements
within switch/case
blocks. In complex conditional
structures, you might typically
depend on cascades of
if statements to achieve
the same result that is represented
more elegantly in Listing 2
(from ./linux/drivers/scsi/sd.c).
The use of switch/case
also enables compiler optimization
by using a jump table implementation.
Listing 2.
Using ranges within case
statements
static int sd_major(int major_idx)
{
        switch (major_idx) {
        case 0:
                return SCSI_DISK0_MAJOR;
        case 1 ... 7:
                return SCSI_DISK1_MAJOR + major_idx - 1;
        case 8 ... 15:
                return SCSI_DISK8_MAJOR + major_idx - 8;
        default:
                BUG();
                return 0;       /* shut up gcc */
        }
}
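A self-contained sketch of the same case-range syntax, using a hypothetical char_class function instead of kernel constants. It assumes a compiler that supports this GCC extension and a character set (such as ASCII) in which the letters are contiguous.

```c
/* Classify a character using GCC case ranges.
   Write spaces around "..." when the bounds are integer constants,
   otherwise something like 1...7 may be lexed incorrectly. */
int char_class(int c)
{
    switch (c) {
    case '0' ... '9':
        return 1;               /* digit */
    case 'a' ... 'z':
    case 'A' ... 'Z':
        return 2;               /* letter */
    default:
        return 0;               /* anything else */
    }
}
```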
Ranges can also be used for initialization, as shown below (from
./linux/arch/cris/arch-v32/kernel/smp.c).
In this example, an array is
created of spinlock_t
with a size of LOCK_COUNT.
Each element of the array is
initialized with the value
SPIN_LOCK_UNLOCKED.
/* Vector of locks used for various atomic operations */
spinlock_t cris_atomic_locks[] = { [0 ... LOCK_COUNT - 1] = SPIN_LOCK_UNLOCKED};
Ranges also support more complex initializations. For example,
the following code specifies
initial values for sub-ranges
of an array.
int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };
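The widths initializer can be exercised with a small sketch like this one. The helper functions digits_of and widths_count are hypothetical; the table itself is the one from the example, which GCC sizes from the highest index used.

```c
/* Number of decimal digits for the values 0..100, built at compile
   time with GCC range designators. */
static const int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };

/* Look up the digit count; the caller must pass 0 <= v <= 100. */
int digits_of(int v)
{
    return widths[v];
}

/* The array has one element per index up to the largest designator. */
int widths_count(void)
{
    return (int)(sizeof widths / sizeof widths[0]);
}
```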
Zero-length arrays
In standard C, at least one
element of an array must be
defined. This requirement tends
to complicate code design. However,
GCC supports the concept of
zero-length arrays, which can
be particularly useful for structure
definitions. This concept is
similar to the flexible array
member in ISO C99, but it uses
a different syntax.
The following example declares
an array with zero members at
the end of a structure (from
./linux/drivers/ieee1394/raw1394-private.h).
This allows the element in the
structure to reference memory
that follows and is contiguous
with the structure instance.
You may find this useful in
cases where you need to have
a variable number of array members.
struct iso_block_store {
        atomic_t refcount;
        size_t data_size;
        quadlet_t data[0];
};
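How such a structure is typically allocated can be sketched as follows. This example uses the ISO C99 flexible array member spelling (data[]) rather than the GCC zero-length form, and the names struct block and block_new are hypothetical. The key step is a single malloc that covers the header plus the variable number of trailing elements.

```c
#include <stdlib.h>

struct block {
    size_t count;
    int data[];     /* C99 flexible array member; GCC also accepts data[0] */
};

/* Allocate the header and n elements in one contiguous chunk.
   Returns NULL on allocation failure; the caller frees the result. */
struct block *block_new(size_t n)
{
    struct block *b = malloc(sizeof *b + n * sizeof b->data[0]);

    if (b == NULL)
        return NULL;
    b->count = n;
    for (size_t i = 0; i < n; i++)
        b->data[i] = (int)i;    /* fill with sample values 0..n-1 */
    return b;
}
```

Because sizeof *b does not include the trailing array, the same structure type can describe blocks of any length, and one call to free releases both the header and the data.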
Determining call
address
In many instances, you may
find it useful or necessary
to determine the caller of a
given function. GCC provides
the built-in function
__builtin_return_address
for just this purpose. This
function is commonly used for
debugging, but it has many other
uses within the kernel.
As shown in the code below,
__builtin_return_address
takes an argument called
level. The argument
defines the level of the call
stack for which you want to
obtain the return address. For
example, if you specify a
level of
0, you are requesting
the return address of the current
function. If you specify a
level of
1, you are requesting
the return address of the calling
function (and so on).
void * __builtin_return_address( unsigned int level );
The local_bh_disable function in the following example
(from ./linux/kernel/softirq.c)
disables soft interrupts on
the local processor to prevent
softirqs, tasklets, and bottom
halves from running on the current
processor. The return address
is captured using __builtin_return_address
so that it can be used for later
tracing purposes.
void local_bh_disable(void)
{
        __local_bh_disable((unsigned long)__builtin_return_address(0));
}
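Outside the kernel, the built-in can be tried with a sketch as small as this. The function name caller_address is hypothetical, and the noinline attribute is my addition: it keeps the function from being inlined, so that a real call (and therefore a real return address) exists.

```c
/* Return the address this function will return to, that is, a
   location inside whichever function called it. */
__attribute__((noinline))
void *caller_address(void)
{
    return __builtin_return_address(0);
}
```

The returned pointer is mainly useful for logging or tracing; a tool such as addr2line or a debugger can map it back to a source location.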
Constant detection
GCC provides a built-in function
that you can use to determine
whether a value is a constant
at compile-time. This is valuable
information because you can
construct expressions that can
be optimized through constant
folding. The __builtin_constant_p
function is used to test for
constants.
The prototype for __builtin_constant_p
is shown below. Note that
__builtin_constant_p
cannot verify all constants,
because some are not easily
proven by GCC.
int __builtin_constant_p( exp )
Linux uses constant detection quite frequently. In the example
shown in Listing 3 (from ./linux/include/linux/log2.h),
constant detection is used to
optimize the roundup_pow_of_two
macro. If the expression can
be verified as a constant, then
a constant expression (which
is available for optimization)
is used. Otherwise, if the expression
is not a constant, another macro
function is called to round
up the value to a power of two.
Listing 3.
Constant detection to optimize
a macro function
#define roundup_pow_of_two(n)                   \
(                                               \
        __builtin_constant_p(n) ? (             \
                ((n) == 1) ? 1 :                \
                (1UL << (ilog2((n) - 1) + 1))   \
                                   ) :          \
        __roundup_pow_of_two(n)                 \
)
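The same pattern can be reproduced in user space. In this sketch all names (roundup_pow2, ROUNDUP_POW2_CONST, roundup_pow2_runtime) are hypothetical, and __builtin_clzl stands in for the kernel's ilog2. Both branches give the same answer; the only difference is that the constant branch can be folded away at compile time. The sketch assumes n is at most half the range of unsigned long.

```c
#include <limits.h>

/* Run-time fallback: double p until it reaches n. */
static unsigned long roundup_pow2_runtime(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Pure expression form that the compiler can fold when n is constant.
   __builtin_clzl counts leading zero bits; it is never evaluated for
   a zero argument here because n <= 1 is handled first. */
#define ROUNDUP_POW2_CONST(n)                                   \
    (((n) <= 1) ? 1UL :                                         \
     1UL << (sizeof(unsigned long) * CHAR_BIT -                 \
             __builtin_clzl((unsigned long)(n) - 1)))

/* Pick the foldable expression for compile-time constants and the
   function call for everything else. */
#define roundup_pow2(n)                                         \
    (__builtin_constant_p(n) ? ROUNDUP_POW2_CONST(n)            \
                             : roundup_pow2_runtime(n))
```

Which branch is taken for a variable argument depends on the optimization level, since the optimizer may prove a variable constant; that is why both branches must compute identical results.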
Function
attributes
GCC provides a variety of
function-level attributes that
allow you to provide more data
to the compiler to assist in
the optimization process. This
section describes some of these
attributes that are associated
with functionality. The next
section describes
attributes that affect optimization.
As shown in Listing 4, the
attributes are aliased by other
symbolic definitions. You can
use this as a guide to help
read the source references that
demonstrate the use of the attributes
(as defined in ./linux/include/linux/compiler-gcc3.h).
Listing 4.
Function attribute definitions
# define __inline__ __inline__ __attribute__((always_inline))
# define __deprecated __attribute__((deprecated))
# define __attribute_used__ __attribute__((__used__))
# define __attribute_const__ __attribute__((__const__))
# define __must_check __attribute__((warn_unused_result))
The definitions shown in Listing 4 reflect some of the function
attributes available in GCC.
They are also some of the most
useful function attributes in
the Linux kernel. Following
are explanations of how you
can best use these attributes:
- always_inline tells GCC to inline the specified function
regardless of whether optimization is enabled.
- deprecated tells you when a function has been deprecated
and should no longer be used. If you attempt to use a
deprecated function, you receive a warning. You can also
apply this attribute to types and variables to encourage
developers to wean themselves from those kernel assets.
- __used__ tells the compiler that this function is used
regardless of whether GCC finds instances of calls to the
function. This can be useful in cases where C functions are
called from assembly.
- __const__ tells the compiler that a particular function has
no state (that is, it uses the arguments passed in to
generate a result to return).
- warn_unused_result forces the compiler to check that all
callers check the result of the function. This ensures that
callers are properly validating the function result so that
they can handle the appropriate errors.
Following are examples of
these attributes being used in
the Linux kernel. The
deprecated example comes
from the architecture non-specific
kernel (./linux/kernel/resource.c),
and the const example
comes from the IA64 kernel source
(./linux/arch/ia64/kernel/unwind.c).
int __deprecated __check_region(struct resource
        *parent, unsigned long start, unsigned long n)

static enum unw_register_index __attribute_const__
decode_abreg(unsigned char abreg, int memory)
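The attributes described above can also be written directly, without the kernel's aliases. The following sketch uses three hypothetical functions (square, parse_digit, old_square) to show where each attribute goes; it assumes GCC or a compatible compiler.

```c
/* const: the result depends only on the arguments, and no memory is
   read or written, so repeated calls can be merged by the compiler. */
__attribute__((const))
int square(int x)
{
    return x * x;
}

/* warn_unused_result: a caller that ignores the return value gets a
   compiler warning. Returns 0 on success, -1 if c is not a digit. */
__attribute__((warn_unused_result))
int parse_digit(char c, int *out)
{
    if (c < '0' || c > '9')
        return -1;
    *out = c - '0';
    return 0;
}

/* deprecated: calls still compile, but each one produces a warning
   that points the developer to the replacement. */
__attribute__((deprecated))
int old_square(int x)
{
    return square(x);
}
```

Writing parse_digit('8', &d); as a statement on its own, or calling old_square anywhere, is enough to see the corresponding warnings.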
The new Inline module for Perl allows you to write code in other languages
(like C, Python, Tcl, or Java) and toss it into Perl scripts with wild
abandon. Unlike previous ways of interfacing C code with Perl, Inline
is very easy to use, and very much in keeping with the Perl philosophy.
One extremely useful application of Inline is to write quick wrapper
code around a C-language library to use it from Perl, thus turning Perl
into (as far as I'm concerned) the best testing platform on the
planet.
Perl has always been pathetically eclectic, but until now it hasn't
been terribly easy to make it work with other languages or with libraries
that weren't constructed specifically for it. You had to write interface
code in the XS language (or get SWIG to do that for you), build an organized
module, and generally keep track of a whole lot of details.
But now things have changed. The Inline module, written and actively
(very actively) maintained by Brian Ingerson, provides facilities
to bind other languages to Perl. In addition its sub-modules (Inline::C,
Inline::Python, Inline::Tcl, Inline::Java, Inline::Foo, etc.) allow
you to embed those languages directly in Perl files, where they
will be found, built, and dynaloaded into Perl in a completely transparent
manner. The user of your script will never know the difference, except
that the first invocation of Inline-enabled code takes a little time
to complete the compilation of the embedded code.
The world's simplest Inline::C program
Just to show you what I mean, let's look at the simplest possible
Inline program; this uses an embedded C function, but you can do substantially
the same thing with any other language that has Inline support.
Listing 1. Inline "Hello, world"
use Inline C => <<'END_C';

void greet() {
  printf("Hello, world!\n");
}
END_C

greet;
Naturally, what the code does is obvious. It defines a C-language
function to do the expected action, and then it treats it as a Perl
function thereafter. In other words, Inline does exactly what an
extension module should do. The question that may be uppermost in
your mind is, "How does it do that?". The answer is pretty much
what you'd expect: it takes your C code, builds an XS file around it
in the same way that a human extension module writer would, builds that
module, then loads it. Subsequent invocations of the code will simply
find the pre-built module already there, and load it directly.
You can even invoke Inline at runtime by using the Inline->bind
function. I don't want to do anything more than dangle that tantalizing
fact before you, because there's nothing special about it besides the
point that you can do it if you want to.
[May 13, 2008] cstring
3.4.4 by Dr Proctor
About: cstring is a small and simple platform-independent
C library for the definition and manipulation of expandable C-style
strings. Strings are represented as instances of the cstring_t structure,
and manipulated by the library's functions. Its features include selection
of different allocator pools, mapping cstring_t instances as views onto
existing memory areas, efficient work-ahead memory optimization, and
minimal link requirements.
Changes: This release incorporates support for the Safe String
library and for the Win64 platform.
Andrew Binstock and Donald Knuth converse on the success of open
source, the problem with multicore architecture, the disappointing lack
of interest in literate programming, the menace of reusable code, and
that urban legend about winning a programming contest with a single
compilation.
Andrew Binstock: You are one of the fathers of the open-source
revolution, even if you aren’t widely heralded as such. You previously
have stated that you released
TeX as open source because of the problem of proprietary
implementations at the time, and to invite corrections to the code—both
of which are key drivers for open-source projects today. Have you been
surprised by the success of open source since that time?
Donald Knuth: The success of open source code is perhaps the only
thing in the computer field that hasn’t surprised me during
the past several decades. But it still hasn’t reached its full potential;
I believe that open-source programs will
begin to be completely dominant as the economy moves more and more from
products towards services, and as more and more volunteers arise to
improve the code.
For example, open-source code can produce
thousands of binaries, tuned perfectly to the configurations of individual
users, whereas commercial software usually will exist in only a few
versions. A generic binary executable file must include
things like inefficient "sync" instructions that are totally inappropriate
for many installations; such wastage goes away when the source code
is highly configurable. This should be a huge win for open source.
Yet I think that a few programs, such as Adobe Photoshop, will always
be superior to competitors like the Gimp—for some reason, I really don’t
know why! I’m quite willing to pay good
money for really good software,
if I believe that it has been produced by the best programmers.
Remember, though, that my opinion on economic questions is highly
suspect, since I’m just an educator and scientist. I understand almost
nothing about the marketplace.
Andrew: A story states that you once entered a programming
contest at Stanford (I believe) and you submitted the winning entry,
which worked correctly after a single compilation. Is this
story true? In that vein, today’s developers frequently build programs
writing small code increments followed by immediate compilation and
the creation and running of unit tests. What are your thoughts on this
approach to software development?
Donald: The story you heard is typical of legends that are based
on only a small kernel of truth. Here’s what actually happened:
John McCarthy decided in 1971 to have a Memorial Day Programming
Race. All of the contestants except me worked at his AI Lab up in the
hills above Stanford, using the WAITS time-sharing system; I was down
on the main campus, where the only computer available to me was a mainframe
for which I had to punch cards and submit them for processing in batch
mode. I used Wirth’s
ALGOL W system (the predecessor of Pascal). My program didn’t
work the first time, but fortunately I could use Ed Satterthwaite’s
excellent offline debugging system for ALGOL W, so I needed only two
runs. Meanwhile, the folks using WAITS couldn’t get enough machine cycles
because their machine was so overloaded. (I think that the second-place
finisher, using that "modern" approach, came in about an hour after
I had submitted the winning entry with old-fangled methods.) It wasn’t
a fair contest.
As to your real question, the idea of immediate compilation and "unit
tests" appeals to me only rarely, when I’m feeling my way in a totally
unknown environment and need feedback about what works and what doesn’t.
Otherwise, lots of time is wasted on activities
that I simply never need to perform or even think about. Nothing needs
to be "mocked up."
Andrew: One of the emerging problems for developers, especially
client-side developers, is changing their thinking to write programs
in terms of threads. This concern, driven by the advent of inexpensive
multicore PCs, surely will require that many algorithms be recast for
multithreading, or at least to be thread-safe. So far, much of the work
you’ve published for Volume 4 of
The Art of Computer Programming (TAOCP) doesn’t
seem to touch on this dimension. Do you expect to enter into problems
of concurrency and parallel programming in upcoming work, especially
since it would seem to be a natural fit with the combinatorial topics
you’re currently working on?
Donald: The field of combinatorial algorithms is so vast that I’ll
be lucky to pack its sequential aspects into three or four
physical volumes, and I don’t think the sequential methods are ever
going to be unimportant. Conversely, the half-life of parallel techniques
is very short, because hardware changes rapidly and each new machine
needs a somewhat different approach. So I decided long ago to stick
to what I know best. Other people understand parallel machines much
better than I do; programmers should listen to them, not me, for guidance
on how to deal with simultaneity.
Andrew: Vendors of multicore processors have expressed frustration
at the difficulty of moving developers to this model. As a former professor,
what thoughts do you have on this transition and how to make it happen?
Is it a question of proper tools, such as better native support for
concurrency in languages, or of execution frameworks? Or are there other
solutions?
Donald: I don’t want to duck your question entirely. I might as well
flame a bit about my personal unhappiness
with the current trend toward multicore architecture.
To me, it looks more or less like the hardware designers have
run out of ideas, and that they’re trying
to pass the blame for the future demise of Moore’s Law to the software
writers by giving us machines that work faster only on a few key benchmarks!
I won’t be surprised at all if the whole multithreading idea turns out
to be a flop, worse than the "Itanium"
approach that was supposed to be so terrific—until
it turned out that the wished-for compilers were basically impossible
to write.
Let me put it this way: During the past 50 years, I’ve written well
over a thousand programs, many of which have substantial size. I can’t
think of even five of those programs that would have been enhanced
noticeably by parallelism or multithreading. Surely, for example, multiple
processors are no help to TeX.[1]
How many programmers do you know who are enthusiastic about these
promised machines of the future? I hear
almost nothing but grief from software people, although the hardware
folks in our department assure me that I’m wrong.
I know that important applications for parallelism exist—rendering
graphics, breaking codes, scanning images, simulating physical and biological
processes, etc. But all these applications require dedicated code and
special-purpose techniques, which will need to be changed substantially
every few years.
Even if I knew enough about such methods to write about them in
TAOCP, my time would be largely wasted, because soon there
would be little reason for anybody to read those parts. (Similarly,
when I prepare the third edition of
Volume 3 I plan
to rip out much of the material about how to sort on magnetic tapes.
That stuff was once one of the hottest topics in the whole software
field, but now it largely wastes paper when the book is printed.)
The machine I use today has dual processors. I get to use them both
only when I’m running two independent jobs at the same time; that’s
nice, but it happens only a few minutes every week. If I had four processors,
or eight, or more, I still wouldn’t be any better off, considering the
kind of work I do—even though I’m using my computer almost every day
during most of the day. So why should I be so happy about the future
that hardware vendors promise? They think a magic bullet will come along
to make multicores speed up my kind of work; I think it’s a pipe dream.
(No—that’s the wrong metaphor! "Pipelines" actually work for me, but
threads don’t. Maybe the word I want is "bubble.")
From the opposite point of view, I do grant that web browsing probably
will get better with multicores. I’ve been talking about my technical
work, however, not recreation. I also admit that I haven’t got many
bright ideas about what I wish hardware designers would provide instead
of multicores, now that they’ve begun to hit a wall with respect to
sequential computation. (But my
MMIX
design contains several ideas that would substantially improve the current
performance of the kinds of programs that concern me most—at the cost
of incompatibility with legacy x86 programs.)
Andrew: One of the few projects of yours that hasn’t been
embraced by a widespread community is
literate programming.
What are your thoughts about why literate programming didn’t catch on?
And is there anything you’d have done differently in retrospect regarding
literate programming?
Donald: Literate programming is a very personal thing. I think it’s
terrific, but that might well be because I’m a very strange person.
It has tens of thousands of fans, but not millions.
In my experience, software created with
literate programming has turned out to be significantly better than
software developed in more traditional ways. Yet ordinary
software is usually okay—I’d give it a grade of C (or maybe C++), but
not F; hence, the traditional methods stay with us. Since they’re understood
by a vast community of programmers, most people have no big incentive
to change, just as I’m not motivated to learn Esperanto even though
it might be preferable to English and German and French and Russian
(if everybody switched).
Jon Bentley
probably hit the nail on the head when he once was asked why literate
programming hasn’t taken the whole world by storm.
He observed that a small percentage of the
world’s population is good at programming, and a small percentage is
good at writing; apparently I am asking everybody to be in both subsets.
Yet to me, literate programming is certainly the most important thing
that came out of the TeX project. Not only has it enabled me to write
and maintain programs faster and more reliably than ever before, and
been one of my greatest sources of joy since the 1980s—it has actually
been indispensable at times. Some of my major programs, such
as the MMIX meta-simulator, could not have been written with any other
methodology that I’ve ever heard of. The complexity was simply too daunting
for my limited brain to handle; without literate programming, the whole
enterprise would have flopped miserably.
If people do discover nice ways to use the newfangled multithreaded
machines, I would expect the discovery to come from people who routinely
use literate programming. Literate programming
is what you need to rise above the ordinary level of achievement.
But I don’t believe in forcing ideas on anybody. If literate
programming isn’t your style, please forget it and do what you like.
If nobody likes it but me, let it die.
On a positive note, I’ve been pleased to discover that the conventions
of CWEB are already standard equipment within preinstalled software
such as Makefiles, when I get off-the-shelf Linux these days.
Andrew: In
Fascicle 1 of Volume 1, you reintroduced the MMIX computer,
which is the 64-bit upgrade to the venerable MIX machine comp-sci students
have come to know over many years. You previously described MMIX in
great detail in
MMIXware.
I’ve read portions of both books, but can’t tell whether the Fascicle
updates or changes anything that appeared in MMIXware, or whether it’s
a pure synopsis. Could you clarify?
Donald: Volume 1 Fascicle 1 is a programmer’s introduction, which
includes instructive exercises and such things. The MMIXware book is
a detailed reference manual, somewhat terse and dry, plus a bunch of
literate programs that describe prototype software for people to build
upon. Both books define the same computer (once the errata to MMIXware
are incorporated from my website). For most readers of TAOCP,
the first fascicle contains everything about MMIX that they’ll ever
need or want to know.
I should point out, however, that MMIX isn’t a single machine; it’s
an architecture with almost unlimited varieties of implementations,
depending on different choices of functional units, different pipeline
configurations, different approaches to multiple-instruction-issue,
different ways to do branch prediction, different cache sizes, different
strategies for cache replacement, different bus speeds, etc. Some instructions
and/or registers can be emulated with software on "cheaper" versions
of the hardware. And so on. It’s a test bed, all simulatable with my
meta-simulator, even though advanced versions would be impossible to
build effectively until another five years go by (and then we could
ask for even further advances just by advancing the meta-simulator specs
another notch).
Suppose you want to know if five separate multiplier units and/or
three-way instruction issuing would speed up a given MMIX program. Or
maybe the instruction and/or data cache could be made larger or smaller
or more associative. Just fire up the meta-simulator and see what happens.
Andrew: As I suspect you don’t use unit testing with MMIXAL,
could you step me through how you go about making sure that your code
works correctly under a wide variety of conditions and inputs? If you
have a specific work routine around verification, could you describe
it?
Donald: Most examples of machine language code in TAOCP
appear in Volumes 1-3; by the time we get to Volume 4, such low-level
detail is largely unnecessary and we can work safely at a higher level
of abstraction. Thus, I’ve needed to write only a dozen or so MMIX programs
while preparing the opening parts of Volume 4, and they’re all pretty
much toy programs—nothing substantial. For little things like that,
I just use informal verification methods, based on the theory that I’ve
written up for the book, together with the MMIXAL assembler and MMIX
simulator that are readily available on the Net (and described in full
detail in the MMIXware book).
That simulator includes debugging features like the ones I found
so useful in Ed Satterthwaite’s system for ALGOL W, mentioned earlier.
I always feel quite confident after checking a program with those tools.
Andrew: Despite its formulation many years ago, TeX is still
thriving, primarily as the foundation for
LaTeX. While
TeX has been effectively frozen at your request, are there features
that you would want to change or add to it, if you had the time and
bandwidth? If so, what are the major items you add/change?
Donald: I believe changes to TeX would cause much more harm than
good. Other people who want other features are creating their own systems,
and I’ve always encouraged further development—except that nobody should
give their program the same name as mine. I want to take permanent responsibility
for TeX and Metafont,
and for all the nitty-gritty things that affect existing documents that
rely on my work, such as the precise dimensions of characters in the
Computer Modern fonts.
Andrew: One of the little-discussed aspects of software development
is how to do design work on software in a completely new domain. You
were faced with this issue when you undertook TeX: No prior art was
available to you as source code, and it was a domain in which you weren’t
an expert. How did you approach the design, and how long did it take
before you were comfortable entering into the coding portion?
Donald: That’s another good question! I’ve discussed the answer in
great detail in Chapter 10 of my book
Literate Programming, together with Chapters 1 and 2 of my book
Digital Typography. I think that anybody who is really interested
in this topic will enjoy reading those chapters. (See also Digital
Typography Chapters 24 and 25 for the complete first and second
drafts of my initial design of TeX in 1977.)
Andrew: The books on TeX and the program itself show a clear
concern for limiting memory usage—an important problem for systems of
that era. Today, the concern for memory usage in programs has more to
do with cache sizes. As someone who has designed a processor in software,
the issues of cache-aware and
cache-oblivious
algorithms surely must have crossed your radar screen. Is
the role of processor caches on algorithm design something that you
expect to cover, even if indirectly, in your upcoming work?
Donald: I mentioned earlier that MMIX provides a test bed for many
varieties of cache. And it’s a software-implemented machine, so we can
perform experiments that will be repeatable even a hundred years from
now. Certainly the next editions of Volumes 1-3 will discuss the behavior
of various basic algorithms with respect to different cache parameters.
In Volume 4 so far, I count about a dozen references to cache memory
and cache-friendly approaches (not to mention a "memo cache," which
is a different but related idea in software).
Andrew: What set of tools do you use today for writing
TAOCP? Do you use TeX? LaTeX? CWEB? Word processor? And what
do you use for the coding?
Donald: My general working style is to write everything first with
pencil and paper, sitting beside a big wastebasket. Then I use Emacs
to enter the text into my machine, using the conventions of TeX. I use
tex, dvips, and gv to see the results, which appear on my screen almost
instantaneously these days. I check my math with Mathematica.
I program every algorithm that’s discussed (so that I can thoroughly
understand it) using CWEB, which works splendidly with the GDB debugger.
I make the illustrations with
MetaPost (or, in
rare cases, on a Mac with Adobe Photoshop or Illustrator). I have some
homemade tools, like my own spell-checker for TeX and CWEB within Emacs.
I designed my own bitmap font for use with Emacs, because I hate the
way the ASCII apostrophe and the left open quote have morphed into independent
symbols that no longer match each other visually. I have special Emacs
modes to help me classify all the tens of thousands of papers and notes
in my files, and special Emacs keyboard shortcuts that make bookwriting
a little bit like playing an organ. I prefer
rxvt to xterm for terminal
input. Since last December, I’ve been using a file backup system called
backupfs, which
meets my need beautifully to archive the daily state of every file.
According to the current directories on my machine, I’ve written
68 different CWEB programs so far this year. There were about 100 in
2007, 90 in 2006, 100 in 2005, 90 in 2004, etc. Furthermore, CWEB has
an extremely convenient "change file" mechanism, with which I can rapidly
create multiple versions and variations on a theme; so far in 2008 I’ve
made 73 variations on those 68 themes. (Some of the variations are quite
short, only a few bytes; others are 5KB or more. Some of the CWEB programs
are quite substantial, like the 55-page BDD package that I completed
in January.) Thus, you can see how important literate programming is
in my life.
I currently use Ubuntu Linux,
on a standalone laptop—it has no Internet connection. I occasionally
carry flash memory drives between this machine and the Macs that I use
for network surfing and graphics; but I trust my family jewels only
to Linux. Incidentally, with Linux I much prefer the keyboard focus
that I can get with classic
FVWM to the GNOME and
KDE environments that other people seem to like better. To each his
own.
Andrew: You state in the preface of
Fascicle 0 of Volume 4 of TAOCP that Volume 4 surely
will comprise three volumes and possibly more. It’s clear from the text
that you’re really enjoying writing on this topic. Given that, what
is your confidence in the note posted on the TAOCP website
that Volume 5 will see light of day by 2015?
Donald: If you check the Wayback Machine for previous incarnations
of that web page, you will see that the number 2015 has not been constant.
You’re certainly correct that I’m having a ball writing up this material,
because I keep running into fascinating facts that simply can’t be left
out—even though more than half of my notes don’t make the final cut.
Precise time estimates are impossible, because I can’t tell until
getting deep into each section how much of the stuff in my files is
going to be really fundamental and how much of it is going to be irrelevant
to my book or too advanced. A lot of the recent literature is academic
one-upmanship of limited interest to me; authors these days often introduce
arcane methods that outperform the simpler techniques only when the
problem size exceeds the number of protons in the universe. Such algorithms
could never be important in a real computer application. I read hundreds
of such papers to see if they might contain nuggets for programmers,
but most of them wind up getting short shrift.
From a scheduling standpoint, all I know at present is that I must
someday digest a huge amount of material that I’ve been collecting and
filing for 45 years. I gain important time by working in batch mode:
I don’t read a paper in depth until I can deal with dozens of others
on the same topic during the same week. When I finally am ready to read
what has been collected about a topic, I might find out that I can zoom
ahead because most of it is eminently forgettable for my purposes. On
the other hand, I might discover that it’s fundamental and deserves
weeks of study; then I’d have to edit my website and push that number
2015 closer to infinity.
Andrew: In late 2006, you were diagnosed with prostate cancer.
How is your health today?
Donald: Naturally, the cancer will be a serious concern. I have superb
doctors. At the moment I feel as healthy as ever, modulo being 70 years
old. Words flow freely as I write TAOCP and as I write the
literate programs that precede drafts of TAOCP. I wake up in
the morning with ideas that please me, and some of those ideas actually
please me also later in the day when I’ve entered them into my computer.
On the other hand, I willingly put myself in God’s hands with respect
to how much more I’ll be able to do before cancer or heart disease or
senility or whatever strikes. If I should unexpectedly die tomorrow,
I’ll have no reason to complain, because my life has been incredibly
blessed. Conversely, as long as I’m able to write about computer science,
I intend to do my best to organize and expound upon the tens of thousands
of technical papers that I’ve collected and made notes on since 1962.
Andrew: On your website, you mention that the
Peoples Archive
recently made a series of videos in which you reflect on your past life.
In segment 93, "Advice to Young People," you advise that people shouldn’t
do something simply because it’s trendy. As we know all too well, software
development is as subject to fads as any other discipline. Can you give
some examples that are currently in vogue, which developers shouldn’t
adopt simply because they’re currently popular or because that’s the
way they’re currently done? Would you care to identify important examples
of this outside of software development?
Donald: Hmm. That question is almost contradictory, because I’m basically
advising young people to listen to themselves rather than to others,
and I’m one of the others. Almost every biography of every person whom
you would like to emulate will say that he or she did many things against
the "conventional wisdom" of the day.
Still, I hate to duck your questions even though I also hate to offend
other people’s sensibilities—given that software methodology has always
been akin to religion. With the caveat that there’s no reason anybody
should care about the opinions of a computer scientist/mathematician
like me regarding software development, let me just say that almost
everything I’ve ever heard associated with the term "extreme
programming" sounds like exactly the wrong way to go...with one
exception. The exception is the idea of working in teams and reading
each other’s code. That idea is crucial, and it might even mask out
all the terrible aspects of extreme programming that alarm me.
I also must confess to a strong bias against the fashion for reusable
code. To me, "re-editable code" is much, much better than an untouchable
black box or toolkit. I could go on and on about this. If you’re totally
convinced that reusable code is wonderful, I probably won’t be able
to sway you anyway, but you’ll never convince me that reusable code
isn’t mostly a menace.
Here’s a question that you may well have meant to ask: Why is the
new book called Volume 4 Fascicle 0, instead of Volume 4 Fascicle 1?
The answer is that computer programmers will understand that I wasn’t
ready to begin writing Volume 4 of TAOCP at its true beginning
point, because we know that the initialization of a program can’t be
written until the program itself takes shape. So I started in 2005 with
Volume 4 Fascicle 2, after which came Fascicles 3 and 4. (Think of
Star Wars, which began with Episode 4.)
About: Xcoral is a multi-window mouse-based text editor for
Unix/X11 with syntax highlighting and auto-indentation. A built-in browser
enables you to navigate through C functions, C++ and Java classes, methods,
files, and attributes. This browser is very fast and self-updates automatically
after file modifications. An ANSI C Interpreter (Smac) is also built-in
to dynamically extend the editor's facilities (with user functions,
keybindings, modes, etc).
Changes: Bugfixes.
About: Sunifdef is a command line tool for eliminating superfluous
preprocessor clutter from C and C++ source files. It is a more powerful
successor to the FreeBSD 'unifdef' tool. Sunifdef is most useful to
developers of constantly evolving products with large code bases, where
preprocessor conditionals are used to configure the feature sets, APIs
or implementations of different releases. In these environments, the
code base steadily accumulates #ifdef-pollution as transient configuration
options become obsolete. Sunifdef can largely automate the recurrent
task of purging redundant #if logic from the code.
Changes: Six bugs are fixed in this release. Five of these
fixes tackle longstanding defects of sunifdef's parsing and evaluation
of integer constants, a niche that has received little scrutiny since
the tool branched from unifdef. This version provides robust parsing
of hex, decimal, and octal numerals and arithmetic on them. However,
sunifdef still evaluates all integer constants as ints and performs
signed integer arithmetic upon them. This falls short of emulating the
C preprocessor's arithmetic in limit cases, which is an unfixed defect.
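As an illustration of the #ifdef-pollution described above, the sketch below shows a function before and after conditionals on a retired option are purged. The symbols LEGACY_API and FEATURE_LEVEL and both function names are hypothetical, and the second function is written by hand to show the kind of result such a tool aims for; it is not actual sunifdef output.

```c
/* FEATURE_LEVEL is still a live configuration choice. LEGACY_API is a
   retired option that is never defined any more. Both are hypothetical. */
#define FEATURE_LEVEL 2

/* Before: the dead LEGACY_API branch still clutters the code. */
int api_version(void)
{
#ifdef LEGACY_API
    return 1;
#else
#  if FEATURE_LEVEL >= 2
    return 3;
#  else
    return 2;
#  endif
#endif
}

/* After: with LEGACY_API treated as permanently undefined, only the
   conditional that still carries a real decision is left. */
int api_version_purged(void)
{
#if FEATURE_LEVEL >= 2
    return 3;
#else
    return 2;
#endif
}
```

Both functions behave identically for every value of FEATURE_LEVEL; the second one just no longer asks the reader to reason about an option that cannot occur.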
About: ATF is a collection of libraries and utilities designed
to ease unattended application testing in the hands of developers and
end users of a specific piece of software. Tests can currently be written
in C/C++ or POSIX shell and, contrary to other testing frameworks, ATF
tests are installed into the system alongside any other application
files. This allows the end user to easily verify that the software behaves
correctly on her system. Furthermore, the results of the test suites
can be collected into nicely-formatted reports to simplify their visualization
and analysis.
Changes: This release adds preliminary documentation on the
C++ and shell interfaces to write tests, mainly directed to developers
wishing to adopt ATF. It adds a way to specify required architectures
and machines for given tests through the require.arch and require.machine
properties; if the platform running the tests does not fulfill the requirements,
the tests are simply skipped. It adds the ability to limit the maximum
time a test case can last through the timeout property, killing tests
that get stalled. There are many portability fixes, especially to SunOS,
and small improvements all around.
SWIG is a software development tool that connects programs written in
C and C++ with a variety of high-level programming languages. SWIG is
primarily used with common scripting languages such as Perl, PHP, Python,
Tcl/Tk, and Ruby; however, the list of supported languages also includes
non-scripting languages such as C#, Common Lisp (CLISP, Allegro CL,
UFFI), Java, Modula-3, OCAML, and R. Also several interpreted and compiled
Scheme implementations (Guile, MzScheme, Chicken) are supported. SWIG
is most commonly used to create high-level interpreted or compiled programming
environments, user interfaces, and as a tool for testing and prototyping
C/C++ software. SWIG can also export its parse tree in the form of XML
and Lisp s-expressions.
Release focus: Minor feature enhancements
Changes:
shared_ptr support was added for Java and C#. STL support for Ruby was
enhanced. Windows support for R was added. A long-standing memory leak
in the PHP module was fixed. Numerous fixes and minor enhancements were
made for Allegrocl, C#, cffi, Chicken, Guile, Java, Lua, Ocaml, Perl,
PHP, Python, Ruby, and Tcl. Warning support was improved.
Getting the output of a shell command from a C program using popen
Sometimes it's necessary to access the output of a shell command
(more than just the return value) in a C program. One way is to
redirect it to a file and then read that file. The other is to use
the popen function.
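#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *cmd = "ls -l";
    char out[256];
    FILE *fptr;
    int ret;

    /* Run the command with its standard output connected to a pipe. */
    fptr = popen(cmd, "r");
    if (fptr == NULL) {
        perror("popen");
        return EXIT_FAILURE;
    }
    /* fgets() keeps the trailing newline, so echo lines with fputs(). */
    while (fgets(out, sizeof out, fptr) != NULL)
        fputs(out, stdout);
    /* pclose() waits for the command and returns its termination status. */
    ret = pclose(fptr);
    return ret == -1 ? EXIT_FAILURE : EXIT_SUCCESS;
}
/* Note: tested with S10 gcc only. */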
Splint is a tool for statically checking C programs for security
vulnerabilities and coding mistakes. With minimal effort, Splint can
be used as a better lint. If additional effort is invested adding annotations
to programs, Splint can perform stronger checking than can be done by
any standard lint.
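The kind of annotation mentioned above can be sketched as follows. Splint annotations are written as stylized comments, so an ordinary compiler ignores them. The function names dup_string and copied_length are hypothetical, and the exact diagnostics depend on the Splint version and flags, so treat this as an illustration rather than a guaranteed transcript of what the tool reports.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Annotated for a Splint-style checker: the result may be NULL, and the
   caller owns the returned storage. To a compiler these are comments. */
/*@null@*/ /*@only@*/ char *dup_string(const char *s)
{
    size_t n;
    char *copy;

    assert(s != NULL);
    n = strlen(s) + 1;
    copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n);
    return copy;
}

/* Returns the length of a copy of s, or 0 if the copy could not be made. */
size_t copied_length(const char *s)
{
    char *copy = dup_string(s);
    size_t n;

    if (copy == NULL)   /* dropping this check is the sort of mistake the
                           null annotation is meant to expose */
        return 0;
    n = strlen(copy);
    free(copy);         /* releases the storage this caller owns */
    return n;
}
```

The annotations cost nothing at run time; their value is that the checker can compare every call site against the stated contract.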
About 10 months ago, I was writing a library. As I was writing
it, I started to look at the whole issue of notifying the caller
of errors. In typical fashion, I tried to optimize the error
handling problem rather than just do the right thing, and just
use error codes. I did a ton of research. Here is a current
list of links and articles on the subject.
Getting Started
To get you started here are some good starting points. They
both received a lot of attention on the internet.
A colorful
post by Damien Katz.
A nice
opinion piece that is pro-error codes by the famous Joel
of
Joel on Software.
Read my
original post with excellent comments by
Daniel Lyons,
Paul Clegg, and Neville of the North.
Nutshell
The default and standard way of handling errors since the
beginning is to just use error codes with some convention of
noticing them. For example, you could document the error condition
with an API and then set a global variable for the actual code.
It is up to the programmer calling the function to notice the
error and do the right thing.
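The "documented error return plus a global variable" convention is what C's errno provides. Here is a minimal sketch, assuming a hypothetical helper parse_port built on the standard strtol function: a return value of -1 signals failure, and errno carries the actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal port number. Returns the port, or
   -1 with errno set to EINVAL (not a number) or ERANGE (out of range). */
long parse_port(const char *text)
{
    char *end;
    long value;

    assert(text != NULL);
    errno = 0;                      /* strtol reports overflow only via errno */
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        errno = EINVAL;             /* no digits, or trailing junk */
        return -1;
    }
    if (errno == ERANGE || value < 0 || value > 65535) {
        errno = ERANGE;
        return -1;
    }
    return value;
}
```

Nothing forces the caller to test for -1 or to look at errno, which is exactly the weakness of this style: noticing the error is left to the programmer's discipline.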
This is the technique used by operating systems and most
libraries. Historically, these systems have never been consistent
or compatible with other conventions. The most evolved system
for this would probably be the
Microsoft COM system. All functions return an HRESULT, which
is essentially an error code.
The next system was the ‘exception-handling’ system. In this
system errors cannot be ignored. Exception handlers are declared,
optionally, at a given scope. If an exception is thrown (i.e.,
an error has occurred), handlers are searched up the stack until
a matching handler is found.
IMHO, the exception system isn’t used properly in 90% of
the cases. There is a fine balance between a soft error and
something exceptional. The syntax also tends to get in the way
for even the simplest of errors. I agree that there should be
errors that are not ignored, but there has to be a better way.
So, the old skoolers are ‘we use error codes, and we
like them, dammit’ - aka super-disciplined programming,
usually for real-time, embedded and smaller systems.
The new schoolers are ‘you have to be kidding about error codes,
use exceptions’ - aka yeah, we use exceptions, that is what
the language gives us… and btw, no, we don’t mind typing on
our keyboards a lot.
Somehow, there has to be a better way. Maybe it will be system-
or application-specific.
Moving On - Old / New Ideas
If you don’t mind it being a C++ article,
here is an amazing one from Andrei Alexandrescu and Petru
Marginean. (Andrei is widely known for his great work on Policy
Based design with C++, which is excellent.) The article is well
written and practical. In fact, the idea was so good, the language
‘D’ made it part of the language.
Here is an example:
void User::AddFriend(User& newFriend)
{
    friends_.push_back(&newFriend);
    try
    {
        pDB_->AddFriend(GetName(), newFriend.GetName());
    }
    catch (...)
    {
        friends_.pop_back();
        throw;
    }
}
10 lines, and this is for the super-simple example.
void User::AddFriend(User& newFriend)
{
    friends_.push_back(&newFriend);
    ScopeGuard guard = MakeObjGuard(friends_, &UserCont::pop_back);
    pDB_->AddFriend(GetName(), newFriend.GetName());
    guard.Dismiss();
}
In D it would look even cleaner:
void User::AddFriend(User& newFriend)
{
    friends_.push_back(&newFriend);
    scope(failure) friends_.pop_back();
    pDB_->AddFriend(GetName(), newFriend.GetName());
}
IMHO, I think exception handling will move more towards systems
like this. Higher level, simpler and cleaner.
Other interesting systems are the ones developed for Common
Lisp, Erlang, and Smalltalk. I’m sure Haskell has something
to say about this as well.
The Common Lisp and Smalltalk ones are similar. Instead of
forcing a mechanism like most exception handlers, these systems
give the exception ‘catcher’ the choice of retrying or doing
something different at the point of the exception. Very powerful.
Speaking of Smalltalk, here is an excellent
article called
Subsystem Exception Handling in Smalltalk. I highly recommend
it.
My Recommendation
If you are building a library, use error codes. Error codes
are much easier to turn into exceptions by the language wrapper
that will eventually be built on top.
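As a rough illustration of the error-code advice, here is a minimal C sketch of a library interface where every call returns a status and results travel through out-parameters. All names in it (stack_status, int_stack, stack_push, stack_pop, stack_strerror) are hypothetical, invented for this example and not taken from any real library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for an imaginary fixed-size stack library. */
typedef enum {
    STACK_OK = 0,
    STACK_ERR_FULL,
    STACK_ERR_EMPTY
} stack_status;

#define STACK_CAPACITY 4

typedef struct {
    int items[STACK_CAPACITY];
    size_t count;
} int_stack;

/* Puts the stack into a known empty state. */
void stack_init(int_stack *s)
{
    assert(s != NULL);
    s->count = 0;
}

/* Every operation returns a status; the caller decides what to do with it. */
stack_status stack_push(int_stack *s, int value)
{
    assert(s != NULL);
    if (s->count == STACK_CAPACITY)
        return STACK_ERR_FULL;
    s->items[s->count++] = value;
    return STACK_OK;
}

/* The popped value is delivered through the out-parameter. */
stack_status stack_pop(int_stack *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->count == 0)
        return STACK_ERR_EMPTY;
    *out = s->items[--s->count];
    return STACK_OK;
}

/* Maps a status to a message, so a language wrapper can build an exception. */
const char *stack_strerror(stack_status st)
{
    switch (st) {
    case STACK_OK:        return "success";
    case STACK_ERR_FULL:  return "stack is full";
    case STACK_ERR_EMPTY: return "stack is empty";
    }
    return "unknown error";
}
```

A wrapper in a higher-level language could check each returned status and raise its own exception carrying the text from stack_strerror(), which is the kind of conversion the recommendation has in mind.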
When programming, don’t get trapped into thinking about the
little picture. A lot of these errors are just pawns in the
grand scheme of assuring that you have all of your resources
in place before you begin your task at hand. If you present
your code in that manner, it will be much easier to understand
for all parties.
More Links
Error Codes vs. Exceptions by Damien Katz.
opinion piece that is pro-error codes by the famous Joel
of
Joel on Software.
Read my
original post with excellent comments by
Daniel Lyons,
Paul Clegg, and Neville of the North.
Microsoft COM
D Language - Exception Safe Programming
Subsystem Exception Handling in Smalltalk - nice section
on history as well
http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html
A nice long thread on comp.lang.c++.moderated
Slightly Wacky, But Neat
http://www.halfbakery.com/idea/C20exception20handling_20macros
http://www.nicemice.net/cexcept/
http://home.rochester.rr.com/bigbyofrocny/GEF/
http://www.on-time.com/ddj0011.htm
|
About:
Doxygen is a cross-platform, JavaDoc-like documentation
system for C++, C, Objective-C, C#, Java, IDL, Python, and
PHP. Doxygen can be used to generate an on-line class browser
(in HTML) and/or an off-line reference manual (in LaTeX
or RTF) from a set of source files. Doxygen can also be
configured to extract the code-structure from undocumented
source files. This includes dependency graphs, class diagrams
and hyperlinked source code. This type of information can
be very useful to quickly find your way in large source
distributions.
Changes:
This release fixes a number of bugs that could cause it
to crash under certain conditions or produce invalid output.
|
Make allows a programmer to easily keep track of a project by maintaining
current versions of their programs from separate sources. Make can automate
various tasks for you, not only compiling the proper branch of source code
from the project tree, but helping you automate other tasks, such as
cleaning directories, organizing output, and even debugging.
I agree with your ramblings, although by chance I happen to have
one counter-example - John Carmack of id Software. The first Quake really
was an amazing technical achievement (real-time texture-mapped 3D graphics
done in software that looked good on a Pentium 75?!?).
And if you look at the source code (which
you can download for free), it's some of the prettiest, easy-to-follow
C code I've ever seen. And aside from a few interviews,
Carmack hasn't written smack.
Error reporting in C programs
C is the most commonly used
programming language on UNIX platforms. Despite the popularity of other
languages on UNIX (such as Java™, C++, Python, or Perl), all of the
application programming interfaces (APIs) of systems have been created
for C. The standard C library, part of every C compiler suite, is the
foundation upon which UNIX standards, such as Portable Operating System
Interface (POSIX) and the Single UNIX Specification, were created.
When C and UNIX were developed in the early 1970s, the concept of
exceptions, which interrupt the flow of an application when some condition
occurs, was fairly new or non-existent. The libraries had to use other
conventions for reporting errors.
While you're poring over the C library, or almost any other UNIX
library, you'll discover two common ways of reporting failures:
- The function returns an error or success code; if it's an error
code, the code itself can be used to figure out what went wrong.
- The function returns a specific value (or range of values) to
indicate an error, and the global variable errno is set to
indicate the cause of the problem.
The errno global variable (or, more accurately, symbol,
since on systems with a thread-safe C library, errno is
actually a function or macro that ensures each thread has its own
errno) is defined in the <errno.h> system
header, along with all of its possible values defined as standard constants.
Many of the functions in the first category actually return one of
the standard errno codes, but it's impossible to tell how
a function behaves and what it returns without checking the Returns
section of the manual page. If you're lucky, the function's man page
lists all of its possible return values and what they mean in the context
of this particular function. Third party libraries often have a single
convention that's followed by all of the functions in the library but,
again, you'll have to check the library's documentation before making
any assumptions.
Let's take a quick look at some code demonstrating errno
and a couple of functions that you can use to transform that error code
into something more human-readable.
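A minimal sketch of what such code might look like, assuming a POSIX system where fopen() sets errno on failure. The helper name try_open and the message format are invented for this illustration and are not from the original article; strerror() and perror() are the standard library functions that turn the code into readable text.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Tries to open a file for reading.
 * Returns 0 on success, or the errno value after printing a readable
 * message to stderr. The function name is illustrative only.
 */
int try_open(const char *path)
{
    assert(path != NULL);

    errno = 0;                       /* clear any stale value first */
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int saved = errno;           /* save it: later calls may overwrite errno */

        /* strerror() maps the numeric code to a message string. */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(saved));

        /* perror() prints its argument, a colon, and the message for errno. */
        errno = saved;
        perror(path);

        return saved;
    }

    fclose(fp);
    return 0;
}
```

Copying errno into a local variable right after the failing call is the important habit here, since any later library call, including fprintf(), is allowed to change it.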
[Feb 14, 2006] Free Microsoft compilers
-
Get a Free Copy of Visual Studio 2005 Express Editions
Download a copy of Visual Studio 2005 Express
Editions today – easy to use tools for the hobbyist, novice and student
developer.
-
Visual C++ Toolkit 2003 The Microsoft Visual C++ Toolkit 2003 includes
the core tools developers need to compile and link C++-based applications
for Windows and the .NET Common Language Runtime – compiler, linker,
libraries, and sample code.
[Nov 9, 2005]
10 Things I Hate About (U)NIX -- a primitive and misguided view on C.
The main value of this piece is that it contains most of the typical
arguments with which people who have no clue about software engineering attack the
language. The author does not understand that for higher-level tasks
TCL or similar scripting languages should be used along with C, not instead of it.
The C language was
written to enable UNIX to be portable. It's designed
to produce good code for the PDP-11, and very closely
maps to that machine's capabilities. There's no
support for concurrency in C, for example. In a
modern language such as Erlang, primitives exist
in the language for creating different threads of
execution and sending messages between them. This
is very important today, when it's a lot cheaper
to buy two computers than one that's twice as fast.
C also lacks a number
of other features present in modern languages. The
most obvious is lack of support for strings. The
lack of bounds-testing on arrays is another example—one
responsible for a large number of security holes
in UNIX software. Another aspect of C that's responsible
for several security holes is the fact that integers
in C have a fixed size—if you try to store something
that doesn't fit, you get an overflow. Unfortunately,
this overflow isn't handled nicely. In Smalltalk,
the overflow would be caught transparently to the
developer and the integer increased in size to fit
it. In other low-level languages, the assignment
would generate an error that could be handled by
the program. In C, it's silently ignored. And how
big is the smallest value that won't fit in a C
integer? Well, that's up to the implementation.
Next, we get to the
woefully inadequate C preprocessor. The preprocessor
in C works by very simple token substitution—it
has no concept of the underlying structure of the
code. One obvious example of the limitations of
this setup is when you try adding control structures
to the language. With Smalltalk, this is trivial—blocks
of code in Smalltalk can be passed as arguments,
so any message call can be a control statement.
In LISP, the preprocessor can be used to encode
design patterns, greatly reducing the amount of
code needed. C can just about handle simple inline-function
equivalents.
The real problem
with C, however, is that it's the standard language
for UNIX systems. All system calls and common libraries
expose C functions, because C is the lowest common
denominator—and C is very low. C was designed when
the procedural paradigm was only just gaining acceptance,
when Real Programmers used assembly languages and
structured programming was something only people
in universities cared about. If you want to create
an object-oriented library on UNIX, you either expose
it in the language in which it was written—forcing
other developers to choose the same language as
you—or you write a cumbersome wrapper in C. Hardly
an ideal solution.
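The integer-overflow complaint in the quote above can be handled in plain C by checking before the arithmetic is done, since signed overflow is undefined behavior and the limits are implementation-defined constants in <limits.h>. The following is only a sketch of one common approach; the function name checked_add is an assumption made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Adds a and b and stores the sum in *result.
 * Returns false, leaving *result untouched, if the sum would not fit
 * in an int. The test runs before the addition because signed
 * overflow is undefined behavior in C.
 */
bool checked_add(int a, int b, int *result)
{
    assert(result != NULL);

    if (b > 0 && a > INT_MAX - b)
        return false;    /* the sum would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;    /* the sum would fall below INT_MIN */

    *result = a + b;
    return true;
}
```

INT_MAX and INT_MIN are whatever the implementation defines, so the same source works whether int is 16, 32, or 64 bits wide.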
Making Wrong Code Look Wrong - Joel on Software. The main problem with
C critics is not that C is perfect (it is far from perfect), but that
the critics are ignorant. Joel rehashes old C warts without real understanding
of the solutions available. For example, indent
is one of the simplest solutions to the "deceptive nesting" problem in C. BTW,
the problem was present even in languages with better, more flexible code
blocks, like PL/1. PL/1 permits a label on each closing bracket in order
to match the opening bracket; it also permits multiple block closure with
a single labeled bracket, like
a: begin;
... begin; ... begin ... end a; /* end a closes all 3 blocks */
Anyway here is his rant:
...As you get more proficient at writing
code in a particular environment, you start to learn to see other things.
Things that may be perfectly legal and perfectly OK according to the
coding convention, but which make you worry.
For example, in C:
char* dest, src;
This is legal code; it may conform
to your coding convention, and it may even be what was intended, but
when you’ve had enough experience writing C code, you’ll notice that
this declares dest
as a char
pointer while declaring
src as merely
a char,
and even if this might be what you wanted, it probably isn’t.
That code smells a little bit dirty.
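The difference between the two variables can be confirmed at compile time. This is a small sketch assuming a C11 compiler (for static_assert); the names dest2 and src2 are added here only to show the unambiguous spelling.

```c
#include <assert.h>

/* The '*' attaches to the name, not to the type: only dest is a pointer. */
char* dest, src;

/* Writing one declaration per line makes the intent unambiguous. */
char *dest2;
char *src2;

/* Compile-time checks (C11) that confirm what each declaration produced. */
static_assert(sizeof dest == sizeof(char *), "dest is a pointer to char");
static_assert(sizeof src == 1, "src is a plain char");
static_assert(sizeof src2 == sizeof(char *), "src2 is a pointer to char");
```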
Even more subtle:
if (i != 0)
    foo(i);
In this case the code is 100% correct;
it conforms to most coding conventions and there’s nothing wrong with
it, but the fact that the single-statement body of the
if statement is not
enclosed in braces may be bugging you, because you might be thinking
in the back of your head, gosh, somebody might insert another line of
code there
if (i != 0)
    bar(i);
    foo(i);
… and forget to add the braces,
and thus accidentally make foo(i) unconditional!
So when you see blocks of code that aren’t in braces, you might sense
just a tiny, wee, soupçon of uncleanliness which makes you uneasy.
OK, so far I’ve mentioned three levels
of achievement as a programmer:
- You don’t know clean from unclean.
- You have a superficial idea of cleanliness,
mostly at the level of conformance to coding conventions.
- You start to smell subtle hints
of uncleanliness beneath the surface and they bug you enough to
reach out and fix the code.
There’s an even higher level, though,
which is what I really want to talk about:
4. You deliberately architect your code
in such a way that your nose for uncleanliness makes your code more
likely to be correct.
This is the real
art: making robust code by literally inventing conventions
that make errors stand out on the screen.
So now I’ll walk you through a little
example, and then I’ll show you a general rule you can use for inventing
these code-robustness conventions, and in the end it will lead to a
defense of a certain type of Hungarian Notation, probably not the type
that makes people carsick, though, and a criticism of exceptions in
certain circumstances, though probably not the kind of circumstances
you find yourself in most of the time.
But if you’re so convinced that Hungarian
Notation is a Bad Thing and that exceptions are the best invention since
the chocolate milkshake and you don’t even want to hear any other opinions,
well, head on over to Rory’s and read the
excellent comix instead; you probably won’t be missing much here
anyway; in fact in a minute I’m going to have actual code samples which
are likely to put you to sleep even before they get a chance to make
you angry. Yep. I think the plan will be to lull you almost completely
to sleep and then to sneak the Hungarian=good, Exceptions=bad thing
on you when you’re sleepy and not really putting up much of a fight.
An Example
Right. On with the example. Let’s pretend
that you’re building some kind of a web-based application, since those
seem to be all the rage with the kids these days.
Now, there’s a security vulnerability
called the Cross Site Scripting Vulnerability, a.k.a.
XSS. I won’t go into the details here: all you have to know is that
when you build a web application you have to be careful never to repeat
back any strings that the user types into forms.
So for example if you have a web page
that says “What is your name?” with an edit box and then submitting
that page takes you to another page that says, Hello, Elmer! (assuming
the user’s name is Elmer), well, that’s a security vulnerability, because
the user could type in all kinds of weird HTML and JavaScript instead
of “Elmer” and their weird JavaScript could do narsty things, and now
those narsty things appear to come from you, so for example they can
read cookies that you put there and forward them on to Dr. Evil’s evil
site.
Let’s put it in pseudocode. Imagine that
s = Request("name")
reads input (a POST argument) from the
HTML form. If you ever write this code:
Write "Hello, " & Request("name")
your site is already vulnerable to XSS
attacks. That’s all it takes.
Instead you have to encode it before
you copy it back into the HTML. Encoding it means replacing
" with &quot;, replacing > with &gt;, and
so forth. So
Write "Hello, " & Encode(Request("name"))
is perfectly safe.
All strings that originate from the user
are unsafe. Any unsafe string must not be output without encoding
it.
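The article's Encode is pseudocode. As a hedged sketch of what such a routine could look like in C, the function below replaces the five characters that are usually treated as special in HTML text and attribute values. The name html_encode and the choice of entities are assumptions made for this example, not the article's implementation, and the sketch ignores size overflow for extremely long inputs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns a newly allocated copy of s with HTML special characters
 * replaced by entities, or NULL if allocation fails.
 * The caller must free() the result.
 */
char *html_encode(const char *s)
{
    assert(s != NULL);

    /* Worst case: every character expands to "&quot;", which is 6 bytes. */
    size_t len = strlen(s);
    char *out = malloc(len * 6 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (; *s != '\0'; s++) {
        const char *rep = NULL;
        switch (*s) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);       /* copy the entity in place of the char */
            p += n;
        } else {
            *p++ = *s;               /* ordinary character: copy unchanged */
        }
    }
    *p = '\0';
    return out;
}
```

Note that '&' itself has to be encoded, otherwise user input could smuggle in ready-made entities.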
Let’s try to come up with a coding convention
that will ensure that if you ever make this mistake, the code will just
look wrong. If wrong code, at least, looks wrong,
then it has a fighting chance of getting caught by someone working on
that code or reviewing that code.
V IDE
V IDE works with GNU g++, Borland C++ 5.5 and Java and runs on Windows
and Linux. It includes a syntax highlighting editor for C/C++, Java,
Perl, Fortran, TeX and HTML. It has a built-in code beautifier,
macro support, ctags support, project manager, integrated
support for the V applications generator and icon editor, integrated
support for the GNU gdb and Sun's jdb (for Java), etc.
Slashdot Optimizations - Programmer vs. Compiler
Re:Clear Code (Score:5, Insightful)
by Rei (128717)
on Friday February 25, @04:56PM (#11782241)
(http://www.cursor.org/)
|
An important lesson that I wish I had learned when I was younger
;) It is crazy to start optimizing before you know where your
bottlenecks are. Don't guess - run a profiler. It's not hard, and
you'll likely get some big surprises.
Another thing to remember is this: the compiler isn't stupid; don't
pretend that it is. I had senior developers at an earlier job mad
at me because I wasn't creating temporary variables for the limits
of my loop indices (on unprofiled code, nonetheless!). It took actually
digging up an article on the net to show that all modern compilers
automatically dereference any const references (be they arrays,
linked lists, const object functions, etc) before starting the loop.
Another example: function calls. I've heard some people be insistent
that the way to speed up an inner loop is to remove the code from
function calls so that you don't have function call overhead. No!
Again, compilers will do this for you. As compilers were evolving,
they added the "inline" keyword, which does this for you. Eventually,
the compilers got smart enough that they started inlining code on
their own when not specified and not inlining it when coders told
it to be inline if it would be inefficient. Due to coder pressure,
at least one compiler that I read about had an "inline damnit" (or
something to that effect) keyword to force inlining when you're
positive that you know better than the compiler ;)
Once again, the compiler isn't stupid. If an optimization seems
"obvious" to you, odds are pretty good that the compiler will take
care of it. Go for the non-obvious optimizations. Can you remove
a loop from a nested set of loops by changing how you're representing
your data? Can you replace a hack that you made with standard library
code (which tends to be optimized like crazy)? Etc. Don't start
dereferencing variables, removing the code from function calls,
or things like this. The compiler will do this for you.
If possible, work with the compiler to help it. Use "restrict".
Use "const". Give it whatever clues you can. |
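To make the "restrict" and "const" advice concrete, here is a small C99 sketch. The function name add_arrays is hypothetical; the qualifiers are the standard ones, and whether they yield any speedup depends on the compiler and target, so measure before relying on them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Adds n elements of a and b into dst.
 * 'restrict' promises the compiler that dst does not overlap the inputs,
 * and 'const' says the inputs are never written through these pointers.
 * Calling this with dst overlapping a or b is undefined behavior.
 */
void add_arrays(size_t n,
                float *restrict dst,
                const float *restrict a,
                const float *restrict b)
{
    assert(dst != NULL && a != NULL && b != NULL);

    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

With the no-overlap promise in place the compiler is free to load several elements of a and b before storing to dst, which is the kind of clue the comment recommends giving.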