Re: ptr conversions and values



Wojtek Lerch <Wojtek_L@xxxxxxxx> wrote:
> "S.Tobias" <siXtY@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
> news:3imb1qFm4ferU1@xxxxxxxxxxxxxxxxx
>> Wojtek Lerch <Wojtek_L@xxxxxxxx> wrote:

>>> What about the difference between "the value of X is one" and "the
>>> result of converting X to int is one"? You never commented on that one...
>>
>> I didn't quite understand the purpose of that example. When I said
>> each pointer pointed to a byte, I didn't think of conversions.
>
> Well, the only way *I* can think of each pointer pointing to a byte is by
> keeping in mind that the pointer can be converted to a pointer to a
> character type. Without that conversion, a pointer that points to an N-byte
> object points to an N-byte object, not to a 1-byte object.

For me, pointers differ in their ability to point to things (alignment),
but once a pointer points to one thing, it may point to many things.
For example, suppose an implementation on which `float' and `int'
have different sizes and alignments. Once you have an address that is
suitably aligned for both types (e.g. from malloc), you can store an
integer there, keep references to it in a pointer to float throughout the
whole program, and access the value through this pointer, appropriately
converted and dereferenced. IOW a pointer doesn't know what object type
it is really pointing to.
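
A minimal sketch of that scenario, assuming a C99 compiler; the function
name is made up for illustration. The int is only ever accessed as an int;
the float pointer merely carries the address around.

```c
#include <assert.h>
#include <stdlib.h>

/* Stores an int in malloc'ed storage, keeps the address in a float*,
   and reads the int back through the pointer converted to int*.
   Returns the value read, or -1 if allocation fails. */
int roundtrip_through_float_ptr(int value)
{
    /* malloc returns storage suitably aligned for any object type,
       so the address is valid for both int and float. */
    size_t size = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    int *ip = malloc(size);
    if (ip == NULL)
        return -1;

    *ip = value;                 /* the stored object is an int          */

    float *kept = (float *)ip;   /* the reference kept "in the program"  */

    /* Never dereference 'kept' as a float here. Convert back first;
       the round trip int* -> float* -> int* yields the original pointer. */
    int *back = (int *)kept;
    assert(back == ip);

    int result = *back;
    free(back);
    return result;
}
```

Calling roundtrip_through_float_ptr(42) should give back 42: nothing in
`kept' records that the object behind it is an int.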

>> All objects consist of bytes, and we can always select its first
>> byte (we can establish it in a program by casting to (char*)).
>> The last byte (or any other byte) would not be a good choice, because
>> it wouldn't work for pointers to incomplete types (or for other
>> reasons). Note that I'm not relying on any conversion here, but rather
>> on the fact that we may access any object through a character lvalue.
>
> But accessing an object through a character value relies on pointer
> conversions. If converting to char* were defined as returning a pointer to
> the last byte of the object, we would probably consider the last byte as the
> "natural" way of thinking about what pointers point to. If the conversion
> were not defined at all, neither way would be more natural than the other.

Even without a definition for (char*) conversions[*] (some would still
have to be defined), the first byte would still be more natural (more
important) than the others, I think. I'm thinking of unions (members
aligned on the first byte); an array expression is converted to a pointer
to its first element; a struct and its first member have the same address
(i.e. no padding at the start, but there may be unspecified padding at the
other end); pointers to incomplete array types must indicate the start of
the array rather than the end (or any other part), because the same
pointer might point to different arrays in a union at the same time, both
small and very big; a similar argument applies to the generic pointer type
(void*) and other pointers to incomplete types; two pointers point to
the same object if both objects start on the same byte.
I think the core of the C language is biased towards the "first" whatever.

[*] Some of the following assertions probably can't be proved
without the fact that conversion to (char*) points to the first byte.
What I presuppose is that the conversion to (char*) is removed, while all
the conclusions resulting from it are kept.
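
Several of these "first byte" properties can be checked directly. This is
only a sketch; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct pair { int first; double second; };
union  both { char small[2]; char big[2048]; };

/* Asserts each "first byte" property in turn; returns 1 if all hold. */
int first_byte_properties_hold(void)
{
    struct pair p = { 1, 2.0 };
    union both u = { { 0 } };
    int arr[4] = { 0 };

    /* A struct and its first member share an address (no leading padding). */
    assert((void *)&p == (void *)&p.first);
    assert(offsetof(struct pair, first) == 0);

    /* Every union member starts where the union starts, small or big. */
    assert((void *)&u == (void *)u.small);
    assert((void *)&u == (void *)u.big);

    /* An array expression converts to a pointer to its first element. */
    assert(arr == &arr[0]);

    /* Conversion to char* designates the lowest addressed byte. */
    assert((char *)&p == (char *)&p.first);

    return 1;
}
```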

>
>> [snip]
>>> What do you think about the following fixed version of my interpretation:
>>>
>>> When two types A and B are said to have the same representation and
>>> alignment requirements, it means that if you have an object whose type
>>> is A and value is V, then using an lvalue with type B to read from the
>>> object produces the same result as converting V to the unqualified
>>> version of B (and, in particular, is undefined behaviour if the
>>> conversion is).
>>
>> It's all right, mostly equivalent with the previous one, representation
>> copying has been avoided.
>>
>> I didn't understand the last remark about undefined behaviour, though.
>> What were you thinking about?
>
> Consider A="struct x*" and B="struct y*". In some cases, converting a valid
> struct x pointer to a struct y pointer is undefined behaviour due to
> alignment requirements. In those cases, reinterpreting the bits of the
> struct x pointer using a struct y* lvalue should be undefined behaviour,
> too.

I just thought this requirement (for undefined behaviour) was not necessary.
I think it's enough to say that this works for all valid values and is
reciprocal.

(As for reinterpretation of `struct x*' as `struct y*': reinterpretation
need not work exactly as conversion does. When conversion yields UB,
the reinterpretation might ignore some bits (e.g. treat some least
significant bits as padding bits) and yield a valid value.)
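
To make the two operations concrete, here is a sketch that performs both
on the same pointer: a cast (conversion) and a byte copy
(reinterpretation). The names are hypothetical. On common implementations
the two results compare equal; whether the Standard requires that is
exactly what this thread is arguing about.

```c
#include <assert.h>
#include <string.h>

struct x { char c; };
struct y { double d; };
union xy { struct x x; struct y y; };

/* Returns 1 if converting a struct x* to struct y* gives the same
   pointer as reinterpreting its bytes as a struct y*. */
int reinterpretation_matches_conversion(void)
{
    static union xy storage;      /* aligned for both members */
    struct x *px = &storage.x;

    /* Conversion: operates on the pointer value. */
    struct y *converted = (struct y *)px;

    /* Reinterpretation: copies the object representation unchanged.
       All pointers to struct types share representation, hence size. */
    struct y *reinterpreted;
    assert(sizeof reinterpreted == sizeof px);
    memcpy(&reinterpreted, &px, sizeof reinterpreted);

    return converted == reinterpreted;
}
```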


>> I'll try once again.
>>
> ...
>> Both our models have reinterpretation parts. The only thing we differ
>> in is our understanding of the "equality" part. You say that it is
>> performed through a conversion of one type into the other one, and
>> performing a comparison (not necessarily through the "==" operator).
>
> No, I say that the reinterpretation is equivalent to the conversion. If the
> conversion is guaranteed to return some value, the reinterpretation must
> return the same value. If the conversion produces an unspecified value, the
> reinterpretation also produces an unspecified value. If the conversion has
> undefined behaviour, so does the reinterpretation.

I don't know what I misunderstood, but this is more or less
what I had thought.

>> I say that both values (of different types) are just compared (in
>> loose sense: mathematically or conceptually). (I haven't actually
>> formally given my definition, but I hope it can be already inferred
>> from the whole discussion.)
>
> Actually, no. I have no idea how your loose conceptual comparison is
> supposed to work in some corner cases, such as pointers to structures that
> have different alignment requirements.

Types A and B have the same representation if, for every object
representation for which reinterpretation in type A gives a valid value,
reinterpretation in type B also gives a valid value that is equivalent to
the first one, and vice versa. The equivalence criterion must be given.
[I have ignored alignment issues; they would need some special handling.]

IOW: similar values share the same sets of bit patterns. Basically this
is the same as what you write, but we differ in our understanding of
what "similar" means.

> If I take a pointer that points to a
> two-byte structure and reinterpret its bits as a pointer to a two-megabyte
> structure, does the resulting pointer point to an object?

Yes, if the object they point to is well-aligned for both types.
Consider a malloc()'ed union containing two struct members (all pointers
to structs have the same representation).

If some values valid for one type were not valid for the other, then
they would not have the same representation. However, we could speak
of the same representation for a subrange of values (that are aligned
for both types).
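
The malloc()'ed union case might look like the sketch below (hypothetical
names; the two-megabyte size is taken from the question above). Both
member pointers designate the same starting byte, and each points to an
object of its own type.

```c
#include <assert.h>
#include <stdlib.h>

struct tiny { char c[2]; };
struct huge { char c[2 * 1024 * 1024]; };
union either { struct tiny t; struct huge h; };

/* Returns 1 if pointers to the small and the large member designate
   the same address, 0 if not, and -1 on allocation failure. */
int both_pointers_share_start(void)
{
    assert(sizeof(struct tiny) < sizeof(struct huge));

    union either *u = malloc(sizeof *u);
    if (u == NULL)
        return -1;

    struct tiny *pt = &u->t;   /* points to a two-byte object      */
    struct huge *ph = &u->h;   /* points to a two-megabyte object  */

    /* Same starting byte, and the storage is aligned for both types. */
    int same = (void *)pt == (void *)ph && (void *)pt == (void *)u;

    free(u);
    return same;
}
```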

>> The implicit assumptions in your model are:
>> 1. Values of both types exist.
>> 2. There must be a (single) conversion between the types.
>
> Not really, but there must be some rules about whether the conversion is
> defined or not and what its result is, and I'd expect at least some values
> to be convertible. If you can't convert any values between a pair of types,
> then saying that the two types have the same representation doesn't seem
> very meaningful.

I strongly disagree here. Although there are no conversions between
struct types, I think it would be sensible to say that on some
implementation all initial-layout-compatible structs have the same
representation in the initial part (so as to satisfy the requirements of
common initial sequence access in 6.5.2.3; note also that I'm talking
here of partial representation, a case that neither your definition nor
mine covers).
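
The common initial sequence rule of 6.5.2.3 in a small sketch; the struct
and function names are invented for illustration. No conversion between
the struct types exists, yet the shared leading member can be inspected
through either one where the complete union type is visible.

```c
#include <assert.h>
#include <stddef.h>

struct circle { int kind; double radius; };
struct rect   { int kind; double width, height; };

union shape {
    struct circle circle;
    struct rect   rect;
};

/* Writes 'kind' through one member and reads it through the other. */
int kind_via_other_member(int kind)
{
    /* The common initial sequence (here just 'kind') is laid out alike. */
    assert(offsetof(struct circle, kind) == offsetof(struct rect, kind));

    union shape s;
    s.circle.kind = kind;
    s.circle.radius = 1.0;

    return s.rect.kind;   /* permitted: part of the common initial sequence */
}
```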

For another example, on some implementation pointers might be
implemented as integers, so we could say that the representations
of pointers and `unsigned long' are the same, while there might
not be a meaningful conversion between them.

>> 3. The results can be compared.
>
> No, that's not necessary. In fact, the main reason I like my model is
> because it doesn't involve comparing.

My bad writing. I had in mind that results of reinterpretation in type A
and reinterpretation in type B with conversion to type A are (of course)
comparable. I added this point for symmetry with my assumptions.

>> My implicit assumptions are:
>> 1. Values of both types exist, in a loose sense though (see further).
I think what I mean exactly here is _object_ values.
>> 2. (none)
>> 3. Both types are commensurate, in loose sense.
I.e. an equivalence can be drawn between values of different types.
>>
>> I'll attack 1. and 2. now.
>>
>> 6.2.5#13 says "Each complex type has the same representation
>> and alignment requirements as an array type containing exactly
>> two elements of the corresponding real type; [...]".
>>
>> 1. There are no values of an array type.
>
> Huh? But of course there are values of an array type. They can't be
> results of expressions, but that doesn't mean that they don't exist. They
> are what the bit patterns stored in array objects represent. They're
> members of some structure values. And so on.

There are object values, and rvalues. Conversion to an array type
would imply existence of array rvalues. I believe C is designed in such
a way that array rvalues are banned from the language, although it's
not said so explicitly.

For me this argument is stronger than the second one, because
if you don't have anything to convert to, you can't even invent
your own conversion; it shuts all doors.
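
For the complex-and-array case of 6.2.5#13, the only way to get from one
type to the other is to copy the bytes, since there is no conversion to
an array type. A C99 sketch with a hypothetical function name, meant for
finite inputs:

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Copies the object representation of a double complex into a
   double[2] and reports whether element 0 is the real part and
   element 1 the imaginary part. */
int complex_matches_array(double re, double im)
{
    double complex z = re + im * I;
    double parts[2];

    /* Same representation implies same size. */
    assert(sizeof z == sizeof parts);

    /* Reinterpretation: no cast between the two types is available. */
    memcpy(parts, &z, sizeof parts);

    return parts[0] == re && parts[1] == im;
}
```

Here the Standard itself supplies the equivalence: the first element
corresponds to the real part and the second to the imaginary part.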

>
>> 2. There's no defined conversion between a complex and an array type.
>
> Yes, the part about complex and array types is a problem. The informal way
> it's described in the standard is pretty much guaranteed to break any
> attempt at defining more or less formally what it means for two types to
> have the same representation, without making it an ugly special case.
>
>> (QED)

(See further down for more considerations.)

>>
>> Remarks:
> ...
>> C) My "loose attitude" is both a power and a weakness of my model.
>
> It's a power, but mainly because it's difficult to argue against vague
> handwaving.

It is a little "handwaving", because I really can't find anything
solid in the Standard or elsewhere to back up my arguments.

>"Your model" doesn't really add much to what is already
> obvious from the standard: the same bit pattern, interpreted as the two
> types, is meant to produce values that are in some vague way equivalent.
Yes. Same for yours.
> The real question is what way that is, exactly.
And this is the only point where we differ in opinion.
>Saying that it's a "loose"
> and "conceptual" equivalence is not really an answer.
It is contextual. The relation must be given each time.
For the complex-and-array example, the Std says when a complex type
and a corresponding array are to be treated as equivalent.
(See also further down.)
>
>> Particularly, I haven't proved there's at most one (unambiguous)
>> conceptual equality criterion between differing types. I hope
>> it could be defined by enumerating it for different types and type
>> categories.
>
> Good luck. There are an almost infinite number of conceivable equality
> criterions between differing types. Almost all are obviously useless or
> silly, but the simple fact that mine differs from yours is a strong hint
> that sometimes more than one may be thought of as reasonable.

Agreed. I begin to believe the Standard is lacking something.


After some thinking I must concede that you actually had a point, and
I haven't quite proved what I had wanted to.

All I have proved is that the Standard doesn't assume conversion
by default in all cases, but not that it forbids it. I've found one
counter-example to your model, and in C99 only (I hope we can assume that
C89 is meant to be upwards compatible in this case). It means your model
(your definition) is wrong _in general_ (but not necessarily in
particular cases).

As I have said above, the equivalence is contextual. I have claimed
two pointers are equivalent if they point at the same object. But
now I see that another kind of equivalence can be defined for
pointers, i.e. that two pointers are considered equivalent if converting
one to the other's type gives the other's value (this is
basically your model, but for pointers only; and possibly there are
others, too).

I give up here, I can't really find anything to prove that it's wrong.
I begin to think it's a flaw in the Standard that it doesn't specify
explicitly when two pointers are to be considered equivalent wrt sameness
of representation.


Below are a couple of my personal reasons why I don't like your
"equivalence via conversion" model.

If "same representation" meant what you said, then saying "type A
has the same representation as type B" would not directly define
the representation of type A wrt B, but would rather create a binding
between representation (of A) and conversion (A-B). Either representation
would be subordinate to conversion, or conversion to representation;
they could not be independent of each other. I see no reason
for such a dependence between concepts from two different worlds (while
"representation" works between bits and values, "conversion" works
only on values and is totally oblivious to the existence of objects).

Although C has at most one, conceptually more than one conversion
might exist between two types (examples are found in C++).
Binding the representation to a conversion would mean singling out
one of them for no obvious reason.

And it looks ugly. :-)


+++

>> [snip]
>>>> "Representation" could be understood as "all properties", including
>>>> representation and alignment.
>>>
>>> Not according to the definition...
>>
>> Right. I didn't want to change the Standard. What I meant to
>> say was that the Standard would probably achieve its goal better
>> if "the representation" were understood as, or replaced with,
>> "all properties"; or it just could say directly that expressions
>> of types A and B are interchangeable in most contexts (where A and
>> B mean void* and char*, ptr to struct/union types, etc.).
>
> But there are a lot of contexts where they're not interchangeable: you can
> dereference char* but not void*, arrays behave differently from complex
> numbers, pointers to different structures produce lvalues with different
> types when dereferenced, and so on. "Most contexts" is way too loose; you'd
> have to specify the exact list.
>
[ I actually wasn't willing to continue this little sub-discussion, but
browsing the Std I found something that seems relevant. ]
In 6.2.5#15 it says `char' shall be defined to have "the same range,
representation, and *behaviour*" as either `signed char' or
`unsigned char'. So there was a way to describe well what I was talking
about: similar words could have been used for the `void*' and `char*'
types.
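
The `char' guarantee of 6.2.5#15 is at least partly observable through
<limits.h>. A small sketch (the function name is made up) that reports
which of the two types plain `char' mirrors:

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char has the range of signed char,
   0 if it has the range of unsigned char. */
int plain_char_is_signed(void)
{
    int like_signed   = CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX;
    int like_unsigned = CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX;

    /* Exactly one of the two must hold on a conforming implementation. */
    assert(like_signed != like_unsigned);

    return like_signed;
}
```

Note that this checks only the "range" part; "representation" and
"behaviour" are additional promises that a range test cannot show.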

--
Stan Tobias
mailx `echo siXtY@xxxxxxxxxxxxxxxxxxxxxxxxxx | sed s/[[:upper:]]//g`