Re: Operator overloading in C
- From: jacob navia <jacob@xxxxxxxxxxxxxxxx>
- Date: Wed, 05 Sep 2007 18:03:19 +0200
Here is the first part of the pdf document. I think it is easier if I post the document here in several messages.
More will follow.
New directions in C
1: Motivation
1.1: The current situation
The development of the C++ language has had an adverse effect on
the development of C. Since C++ was designed as the better C, C was
(and is) presented as the example of how not to do things, even
though both languages retain a large common base.
All development of C as an independent language has
been neglected and C has been relegated to the past.
The need for a simple and efficient language persists, however,
and C is the language of choice for many systems running today
and for many new ones. Yet programming in C is harder
than it should be because of some glaring deficiencies, such as its
buggy string library and the lack of a common library for the most
used containers like lists, stacks, and other popular data structures.
Since C++ set out to be "the better C", it is important to avoid
reintroducing the full complexity of C++ into C and to keep the
language as simple as it should be, but not simpler than the
minimum necessary to use it without much pain.
Of course the idea of improving C is doomed according to the C++
people, who will obviously say that the solution is not to
improve C but to come over to C++.
The same forecast of failure is made by some C people, who see
any change to their baby as the beginning of the end of the spirit of C.
This situation is exacerbated by the C standards committee, which
after the bad reception of the new C99 standard has gone into
deep sleep, making any changes or improvements to the standard
language impossible, since the earliest date for any standard
publication is set at 2020.
C is now effectively frozen.
1.2: The most important changes needed
This proposal seeks to address the following problems as
they appear in our day-to-day programming.
o The lack of a counted string data structure that would
give programmers an alternative to the inefficient and
extremely error-prone zero-terminated strings.
o The need to add new numerical types to the language.
o The need for an abstract containers library that would
make it easier to develop portable programs by
providing simple data structures like lists, flexible
arrays, stacks, and others, without any of the
complexity of the STL of C++.
Let's look at these points in more detail.
A counted string data structure
-------------------------------
C has traditionally defined the string data type as a
sequence of bytes ending with a binary zero byte. This
is an extremely inefficient way of storing strings since
the length of the string, one of the most frequently needed
pieces of information about it, must be recalculated at
each access. This implies in most cases a function call
to strlen(), or a loop. For long strings this
makes C less efficient than a BASIC interpreter written in C#.
This will be recognized by most C programmers, and the
existence of countless string libraries testifies to this fact.
The problem with those libraries is that they are not
standard and must be ported from one application to the
next, and from one operating system to the next.
What is needed is language support for counted strings.
Another requirement is that the syntax for accessing strings
and the whole library should be as similar to the traditional
C syntax as possible. The goal must be that code
running now can be ported with minimal effort to the
new environment, and that the user can switch back to
zero-terminated strings easily if the code must run in an
environment where the library is absent.
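As a rough illustration of the idea, here is a minimal sketch of a
counted string in standard C. The CountedString type and the cs_*
function names are invented for this example; they are not the string
library described later in the document.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length travels with the data. */
typedef struct {
    size_t      length;  /* number of bytes, excluding the terminator */
    const char *data;    /* kept zero terminated for C interoperability */
} CountedString;

/* Pays the O(n) strlen() cost exactly once, at construction. */
CountedString cs_from_cstr(const char *s)
{
    CountedString cs;
    cs.length = strlen(s);
    cs.data = s;
    return cs;
}

/* O(1): no scan for the terminating zero. */
size_t cs_length(CountedString cs)
{
    return cs.length;
}

/* Comparing the lengths first rejects most unequal strings
   without reading a single character. */
int cs_equal(CountedString a, CountedString b)
{
    return a.length == b.length && memcmp(a.data, b.data, a.length) == 0;
}
```

Because data stays zero terminated in this sketch, a CountedString can
still be handed to any function that expects a plain char pointer.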
The need for new numeric types
------------------------------
Currently, there are several proposals for new types of
numbers pending with the standards committee.
o Extension for the programming language C to support
decimal floating-point arithmetic. Document ISO/IEC TR 24732.
o Fixed point arithmetic proposal, in ISO/IEC DTR 18037.
The standard doesn't provide any means of defining extended
precision integer and floating point numbers. Bignum integer
packages abound, but since there is no language support,
their usage is very cumbersome.
All those different types of numbers and more can be
integrated into the language with a uniform method
that provides a standard way of making these changes
and doesn't divide the language into subsets, where we
would have one subset with fixed point arithmetic
syntax, another with decimal arithmetic,
and many implementations with none.
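To see why function-call syntax is cumbersome for user-defined numbers,
consider this small sketch of rational arithmetic in standard C. The
Rational type and the rat_* names are made up for this illustration and
belong to no existing package; overflow is ignored for brevity.

```c
/* Hypothetical rational number type, for illustration only. */
typedef struct { long num, den; } Rational;

static long gcd_long(long a, long b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

/* Builds a reduced fraction with a positive denominator.
   Assumes den != 0. */
Rational rat_make(long num, long den)
{
    Rational r;
    long g;
    if (den < 0) { num = -num; den = -den; }
    g = gcd_long(num, den);
    if (g == 0) g = 1;
    r.num = num / g;
    r.den = den / g;
    return r;
}

Rational rat_add(Rational a, Rational b)
{
    return rat_make(a.num * b.den + b.num * a.den, a.den * b.den);
}

Rational rat_mul(Rational a, Rational b)
{
    return rat_make(a.num * b.num, a.den * b.den);
}

/* The expression a + b * c, spelled with nested calls. */
Rational rat_a_plus_b_times_c(Rational a, Rational b, Rational c)
{
    return rat_add(a, rat_mul(b, c));
}
```

Even for three operands the nesting hides the arithmetic; with language
support the last function body could simply read a + b * c.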
The need for an abstract containers library
-------------------------------------------
In its current state, C doesn't have any language support
(in the form of a standard library) for data structures
like stacks, lists, flexible arrays and many others. What
is needed is a standard way of accessing those commonly
used data structures using a uniform syntax, which would
allow programs that use them to be portable. The situation
today is similar to the situation with character strings,
where there are a lot of libraries, all incompatible
with each other. What is needed is to keep those libraries
but to standardize a syntax that would allow user programs
to be portable.
The proposed changes
--------------------
None of the proposed changes has any impact on the existing language;
they can be used when necessary without any impact
on the performance of already existing code.
This document, then, proposes the development of several enhancements
to the language, mostly compatible with their C++ counterparts.
The main aim is to make C programming easier, more secure and
more flexible than it is now. Each addition is justified by
the improvements it brings, and possible uses and misuses
are discussed.
All these developments are implemented in the lcc-win32
compiler system. These are not just proposals: a reference
implementation exists, and it has been widely distributed for several years.
The main propositions developed here are:
o Operator overloading
o Garbage Collection
o Generic functions
o Default function arguments
o References
All these propositions aim to raise the level of abstraction
available to C programmers without unduly increasing the complexity of
the implementation. Together, these enhancements have added only about
2,000 lines of code to the original code of the lcc-win32 compiler.
This is extremely small, and shows that apparently difficult
extensions can be inserted into an existing compiler without
any code bloat.
Each enhancement can be viewed separately, but their strength
is only visible when they all work together.
The first part of this document details the specifications for
the proposed changes; the second describes applications for them
in the form of a string library and a container library.
There you see how these enhancements work, and how they could
be used to implement a good standard library for C.
Operator overloading
--------------------
Operator overloading allows users to define their own functions
for performing the basic operations of the language on user
defined data types.
Motivation
Many languages today accept operator overloading, among them
Ada, C++, C#, D, Delphi, Perl, Python, Visual Basic, Ruby,
Smalltalk, and Eiffel.
The purpose of this enhancement within the context of the
C language is to:
o Allow the user to define new types of numbers or numeric objects.
o Allow a generic access to containers by allowing the user to
define special array like access to containers using
the overloaded [ and ] operators.
1) Many applications need to define special kinds of numbers.
Rational arithmetic, big numbers, extended precision floating
point come immediately to mind, and there are surely many
others.
For instance the Technical Report 24732 of the ISO/IEC
proposes a new kind of decimal floating point, the
Technical Report DTR 18037 proposes fixed point operations, etc.
All of them propose changes to the language in the
form of new keywords. A conceptually simpler solution is
to allow a single change that would accommodate all those
needs without making C impossible to follow by adding
a new keyword for every kind of number that users may need.
2) Everybody will agree that arrays in C are
peculiar and quite difficult to use. Allowing users to
define new kinds of array access makes it possible to integrate
many needs, such as bounds checking, within the language without
adding any special new syntax. There are several proposals
about bounded strings circulating in the standardization
committee, and they propose several different enhancements
to the existing library, mainly by the addition of several
parameters to the string functions to pass the length of
the receiving strings. This is a misguided approach since
it still leaves too much work to the programmer, who must
still keep track of the size of each string used
in the program without ever making a mistake.
This is asking for trouble.
Obviously counting string lengths is better done by
machines. The length should be a quantity associated with the
string and managed at runtime by the routines using those
strings. This would be, by the way, much more efficient than
searching for the terminating zero each and every time a string
is used.
Still, the original array-like syntax must be retained
for these strings or bounded buffers/arrays. It is an intuitive
syntax, in use in almost all programming languages, and it will
allow an easier transition of existing code. Hence we need
an overloading of the index operator [ ].
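The kind of check an overloaded index operator could perform can be
sketched in standard C as an ordinary function. BoundedBuffer and
bb_get are hypothetical names chosen for this example; how the
reference implementation actually lowers buf[i] is not shown here.

```c
#include <stddef.h>

/* Hypothetical bounded buffer: the length is stored next to the bytes. */
typedef struct {
    size_t length;
    char  *data;
} BoundedBuffer;

/* One possible meaning for a read buf[i].
   Returns 0 and stores the element in *out on success,
   -1 if the index is out of range. */
int bb_get(const BoundedBuffer *buf, size_t i, char *out)
{
    if (i >= buf->length)
        return -1;      /* the machine, not the programmer, checks the bound */
    *out = buf->data[i];
    return 0;
}
```

The caller never passes the size separately, which is exactly the
bookkeeping the extra length parameters would otherwise demand.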
These are the main objectives of this syntax change. Note that
it is not in the design objectives to replace normal
procedures like string concatenation with an overloaded
add operator, nor to replace formatted output with the
shift operators. It is obvious too, that once this syntax is
in use, such bad applications can be programmed and it is
impossible to do anything about them.
Syntax:
    result-type operator symbol ( arguments )
result-type is the type of the operator result.
symbol is one of the operator symbols
(+ - * / << [ etc., explained in detail later).
An exception to the above rule are the pre-increment
and pre-decrement operators, which are written:
    result-type ++operator ( argument )
    result-type --operator ( argument )
This enhancement doesn't use any new keywords. The C99
standard explicitly forbids new keywords, and this has
been respected. It remains to be seen whether an
operator keyword is really needed. As implemented in the
reference implementation it is still possible to write:
    int operator = 67;
without any problems.
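The same declaration is already valid in standard C, where "operator"
is an ordinary identifier. The helper function below is only there to
make the snippet checkable and is not part of the proposal.

```c
/* "operator" is not a reserved word in standard C,
   so this declares an ordinary int object. */
int operator = 67;

/* Hypothetical helper that reads the variable back. */
int read_operator(void)
{
    return operator;
}
```

Note that the snippet must be compiled as C; a C++ compiler rejects it
because operator is a keyword there.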
The rules for using the operator identifier are as follows:
o It must appear at the global level.
o It must be preceded by a type name.
o It must be followed by one of the operator symbols,
and then an opening parenthesis, an argument list that
can't be empty and can't be longer than 2 arguments,
followed by a closing parenthesis.
If it is followed by a ; it is a prototype for an operator
defined elsewhere. All rules applying to prototypes apply
equally to this prototype.
If it is followed by an opening brace it is the beginning
of an operator definition. All rules that apply to function
definition apply also here.
Note that all these rules are no longer needed if the standard
accepts a new "operator" keyword.
The operators that can be overloaded are:
Symbol  Name            Prototype                                 Restrictions
------  --------------  ----------------------------------------  ------------
+       plus            Type operator+(Type arg1, Type arg2);     Arguments need not be of the same type. No pointer arguments.
-       minus           Type operator-(Type arg1, Type arg2);     No pointer arguments.
-       unary_minus     Type operator-(Type arg1);
*       multiply        Type operator*(Type arg1, Type arg2);
/       divide          Type operator/(Type arg1, Type arg2);
==      equal           int operator==(Type arg1, Type arg2);     No pointer arguments. Result type must be an integer.
!=      notEqual        int operator!=(Type arg1, Type arg2);     No pointer arguments. Result type must be an integer.
++      Post-increment  Type operator++(Type arg1);               No pointer argument.
--      Post-decrement  Type operator--(Type arg1);               No pointer argument.
++      Pre-increment   Type ++operator(Type arg1);               No pointer argument.
--      Pre-decrement   Type --operator(Type arg1);               No pointer argument.
<       less            int operator<(Type arg1, Type arg2);      No pointer arguments. Result type is int.
<=      lessequal       int operator<=(Type arg1, Type arg2);     No pointer arguments. Result type is int.
>=      greaterequal    int operator>=(Type arg1, Type arg2);     No pointer arguments. Result type is int.
!       logicalNot      int operator!(Type arg1);
~       not             Type operator~(Type arg1);
%       mod             Type operator%(Type arg1, Type arg2);
<<      leftshift       Type operator<<(Type arg1, Type arg2);
>>      rightshift      Type operator>>(Type arg1, Type arg2);
=       asgn            Type operator=(Type arg1, Type arg2);
^       xor             Type operator^(Type arg1, Type arg2);
&       and             Type operator&(Type arg1, Type arg2);
|       or              Type operator|(Type arg1, Type arg2);
[]      index           Type operator[](Type arg1, Type arg2);
[]=     indexasgn       Type operator[]=(Type arg1, Type arg2);
+=      plusasgn        Type operator+=(Type &arg1, Type arg2);
-=      minusasgn       Type operator-=(Type &arg1, Type arg2);
*=      multasgn        Type operator*=(Type &arg1, Type arg2);
/=      divasgn         Type operator/=(Type &arg1, Type arg2);
<<=     lshasgn         Type operator<<=(Type &arg1, Type arg2);
>>=     rshasgn         Type operator>>=(Type &arg1, Type arg2);
()      cast            Type operator()(Type arg1);               No pointer argument.
*       indirection     Type operator*(Type arg1);                No pointer argument.
Rules for the arguments
-----------------------
At least one of the arguments for the overloaded operators
must be a user defined type. Pointers are accepted only
when the operator has no standard C counterpart for
operations with pointers. Pointer multiplication is not
allowed in standard C, so an overloaded multiplication
operator that takes two pointers is not ambiguous.
Addition of pointer and integer is well defined in C,
so an operator add that would take a pointer and an
integer would introduce an ambiguity in the language,
and it is therefore not allowed.
The same for pointer subtraction.
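For reference, these are the built-in meanings that such overloads
would collide with. The two function names are chosen for this
example only.

```c
#include <stddef.h>

/* Built-in pointer + integer: advance by whole elements. */
int third_element(const int *p)
{
    return *(p + 2);
}

/* Built-in pointer - pointer: distance measured in elements.
   Both pointers must point into the same array. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    return last - first;
}
```

An expression such as p * q with two pointers, by contrast, is rejected
by a standard C compiler, which is why an overload for it cannot
conflict with anything.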
The result type of an operator can be any type, except
for the equality and the other comparison operators
that always return an integer.
The number of arguments is fixed for each operator
(as described in the table above). It is not possible
to change this and define ternary operators that would
make two multiplications, for instance.
When an operator needs to modify its argument (for instance
the assignment operator, or the += operator) and can't
take pointers, it should take a reference to the object
to be modified.
This ties this enhancement to some extent to another one
described further down, references. Using references is
optional, of course.
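Why an operator like += needs more than a by-value argument can be
shown in standard C, with a pointer standing in for the proposed
reference. Counter and both function names are hypothetical.

```c
/* Hypothetical user-defined numeric type. */
typedef struct { long value; } Counter;

/* By-value parameter: the callee updates a copy,
   so the caller's object is left unchanged. */
Counter add_assign_by_value(Counter lhs, long rhs)
{
    lhs.value += rhs;
    return lhs;
}

/* Pointer parameter standing in for a reference:
   the caller's object is modified in place. */
Counter add_assign_in_place(Counter *lhs, long rhs)
{
    lhs->value += rhs;
    return *lhs;
}
```

With a reference parameter, as in the prototypes of the table above,
the call site can keep the natural form c += 5 instead of passing &c.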
Overloaded operators can't have default arguments.
Name resolution
Name resolution is the process of selecting the right
operator from a list of possible candidates.
There can be only one operator that applies to a given
combination of input arguments, i.e. to a given signature.
If at the end of the name resolution procedure more than
one overloaded operator is found, a fatal diagnostic is
issued, and no object file is generated.
Step one: Compare the input arguments against the list of
overloaded functions for this operator without any
promotion at all. If at the end of this operation one
and only one match is found, return success.
Step two: Compare input arguments ignoring any signed/unsigned
differences. If in the implementation sizeof(int) == sizeof(long),
consider long and int as equivalent types. The same applies if in
the implementation sizeof(int) == sizeof(short).
Consider the enum type as equivalent to the int type.
If at the end of this operation one and only one match is
found, return success.
Step three: Compare input arguments ignoring all differences
between numeric arguments. If at the end of this operation only
one match is found, return success.
Step four: If the operation is one of the comparison
operators (equal, not equal, less, less-equal, greater, greater-equal),
invert the operation and try to find a match. If one is found, invert
the order of the arguments for less, less-equal, greater-equal and
greater, or call the not operator for equal and not equal. It is assumed that:
Assumed operator equivalences
Operator        Equivalent
equals          ! different
different       ! equals
less            invert arguments: less(a,b) is equivalent to greater(b,a)
less-equal      invert arguments: less-equal(a,b) is equivalent to greater-equal(b,a)
greater-equal   invert arguments: greater-equal(a,b) is equivalent to less-equal(b,a)
greater         invert arguments: greater(a,b) is equivalent to less(b,a)
Step five: Return failure.
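Steps one to three can be modelled with a small table-driven sketch in
standard C. Everything here is a simplification invented for the
example: the TypeKind enumeration, the single user type, the assumption
that sizeof(int) == sizeof(long), and the choice to report both "no
match" and "ambiguous" as -1. Step four is left out.

```c
#include <stddef.h>

/* Hypothetical model of argument types. */
typedef enum { T_INT, T_UINT, T_LONG, T_DOUBLE, T_USER } TypeKind;

/* Signature of one overloaded binary operator. */
typedef struct { TypeKind arg1, arg2; } Candidate;

/* Step one: exact comparison, no promotion at all. */
static TypeKind exact(TypeKind t) { return t; }

/* Step two: ignore signedness and treat long as int
   (assumes sizeof(int) == sizeof(long)). */
static TypeKind relax_integers(TypeKind t)
{
    return (t == T_UINT || t == T_LONG) ? T_INT : t;
}

/* Step three: all built-in numeric types are considered alike. */
static TypeKind relax_numerics(TypeKind t)
{
    return (t == T_USER) ? T_USER : T_INT;
}

/* Counts the candidates matching (a, b) under one relaxation
   and remembers the index of the last match in *found. */
static int count_matches(const Candidate *c, size_t n, TypeKind a, TypeKind b,
                         TypeKind (*relax)(TypeKind), int *found)
{
    int count = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        if (relax(c[i].arg1) == relax(a) && relax(c[i].arg2) == relax(b)) {
            *found = (int)i;
            count++;
        }
    }
    return count;
}

/* Returns the index of the selected candidate, or -1 when no step
   yields exactly one match. */
int resolve(const Candidate *c, size_t n, TypeKind a, TypeKind b)
{
    TypeKind (*steps[3])(TypeKind) = { exact, relax_integers, relax_numerics };
    size_t s;
    for (s = 0; s < 3; s++) {
        int found = -1;
        if (count_matches(c, n, a, b, steps[s], &found) == 1)
            return found;
    }
    return -1;
}
```

With candidates (user, int) and (user, double), a call with (user,
unsigned) finds nothing in step one and selects the first candidate in
step two.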
Differences to C++
------------------
In the C++ language, you can redefine the operators
&& (and), || (or) and , (comma). This proposal does not allow it in C.
The reasons are very simple.
In C (as in C++), logical expressions within conditional
contexts are evaluated from left to right. If, in the
context of the AND operator, the first expression
returns a FALSE value, the others will NOT be evaluated.
This means that once the truth or falsehood of an
expression has been determined, evaluation of the
expression ceases, even if some parts of the
expression haven't yet been examined.
Now, if a user wanted to redefine the operator AND or
the operator OR, the compiler would have to generate a
function call to the user-defined function, giving it
all the arguments of BOTH expressions. To make the
function call, the compiler would have to evaluate
them both, before passing them to the redefined operator &&.
Consequence: all expressions would be evaluated and
expressions that rely on the normal behavior of C
would not work.
The same reasoning can be applied to the operator OR.
It evaluates its expressions from left to right, but stops at the
first one that returns TRUE.
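The loss of short-circuit evaluation is easy to reproduce in standard
C by replacing && with an ordinary function. The names user_and, rhs
and the counter are invented for this demonstration.

```c
/* Counts how many times the right-hand operand is evaluated. */
int rhs_evaluations = 0;

int rhs(void)
{
    rhs_evaluations++;
    return 1;
}

/* Stand-in for a user-defined && : both operands arrive
   already evaluated. */
int user_and(int a, int b)
{
    return a && b;
}

/* Built-in &&: the left operand is false, so rhs() is skipped. */
int builtin_and_count(void)
{
    rhs_evaluations = 0;
    (void)(0 && rhs());
    return rhs_evaluations;
}

/* Function call: rhs() has to run before user_and() is entered. */
int overloaded_and_count(void)
{
    rhs_evaluations = 0;
    (void)user_and(0, rhs());
    return rhs_evaluations;
}
```

Code of the form p != NULL && p->field relies on the first behavior and
would dereference a null pointer under the second.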
A similar problem appears with the comma operator, which
evaluates in sequence all the expressions separated
by the comma(s), and returns as the value of the expression
the last result evaluated. When passing the arguments to
the overloaded function, however, there is no guarantee
that the order of evaluation will be from left to right.
The C standard does not specify the order for evaluating
function arguments. Therefore, this would not work.
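The difference can be sketched in standard C. The built-in comma
operator guarantees the order of its operands, while a function taking
the same two expressions as arguments only guarantees the value it
returns. The logging helpers are hypothetical.

```c
/* Records the order in which mark() is called. */
int order_log[2];
int order_len = 0;

int mark(int id)
{
    order_log[order_len++] = id;
    return id;
}

/* Stand-in for an overloaded comma: returns its second argument. */
int last_of(int a, int b)
{
    (void)a;
    return b;
}

/* Built-in comma: mark(1) is guaranteed to run before mark(2). */
int builtin_comma_in_order(void)
{
    order_len = 0;
    (void)(mark(1), mark(2));
    return order_log[0] == 1 && order_log[1] == 2;
}

/* Function call: the result is 2, but the compiler is free to
   evaluate mark(2) before mark(1). */
int call_result(void)
{
    order_len = 0;
    return last_of(mark(1), mark(2));
}
```

The contents of order_log after call_result() may differ from one
compiler to the next, which is the reason given above for not allowing
this overload.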
Another difference with C++ is that here you can redefine
the operator []=, i.e., the assignment to an array is a
different operation than the reference of an array member.
The reason is simple: the C language always distinguishes
between the operator + and the operator +=, the operator *
is different from the operator *=, etc. There is no reason
why the operator [] should be any different.
This simple fact allows you to do things that are quite
impossible for C++ programmers: You can easily distinguish
between the assignment and the reference of an array, i.e.,
you can specialize the operation for each usage.
In C++ doing this implies creating a proxy object, i.e.,
a stand-in construct that senses when the program uses it
for writing or reading and acts accordingly. This proxy must
be defined, created, etc., and it has to redefine all
operators to be able to function. In addition, this
highly complex solution is not guaranteed to work!
The proxies have subtly different behaviors in many
situations because they are not the objects they stand for.
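What separate [] and []= operators buy can be sketched in standard C
with two plain functions, one for each use. TrackedVector and the tv_*
names are hypothetical and only illustrate the split between reading
and writing.

```c
#include <stddef.h>

#define TV_CAPACITY 8

/* Hypothetical vector that remembers which slots were written. */
typedef struct {
    int    data[TV_CAPACITY];
    int    written[TV_CAPACITY];
    size_t writes;
} TrackedVector;

/* One possible meaning for a read v[i]:
   no side effects, unwritten or out-of-range slots read as 0. */
int tv_index(const TrackedVector *v, size_t i)
{
    return (i < TV_CAPACITY && v->written[i]) ? v->data[i] : 0;
}

/* One possible meaning for a write v[i] = value:
   store the value and record the write. Returns -1 if out of range. */
int tv_index_assign(TrackedVector *v, size_t i, int value)
{
    if (i >= TV_CAPACITY)
        return -1;
    v->data[i] = value;
    v->written[i] = 1;
    v->writes++;
    return 0;
}
```

Reads never touch the write counter, and no intermediate proxy object
is needed to tell the two uses apart.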