• 0

[C,C++] Subtle obscure differences


Question

Sometimes when I am doing low-level development I run into interesting incompatibilities between C and C++. The type of thing you may have known at one time, but that you eventually forget unless you know the specification of both languages inside-out.

 

One such case happened to me today with code similar to the following:

// foo.h:
int RAM[10000];

// foo.c:
#include "foo.h"
int main() {
    return (unsigned long long)RAM;
}

// bar.c
#include "foo.h"
int bar() {
    return (unsigned long long)RAM;
}

This particular code builds as C, but not as C++, if you link both foo.c and bar.c into the same binary. Why? Because there is a subtle difference in how C and C++ treat uninitialized global symbols (RAM[] in this case). In C, a file-scope declaration without an initializer is a tentative definition, and compilers have traditionally emitted it using common linkage, so the linker merges the copies into a single symbol (one instance of the variable). Strictly speaking this is a common extension rather than something the C standard guarantees, and GCC 10 and later default to -fno-common, so there you need -fcommon to get this behavior. However, in C++ there is no such thing as common linkage. If you define the same variable twice, regardless of the circumstances, it is seen as two separate variables that conflict. So in the latter case you will see the following error:

/tmp/ccK4Aa4B.o:(.bss+0x0): multiple definition of `RAM'
/tmp/cc8u2dAO.o:(.bss+0x0): first defined here
collect2: error: ld returned 1 exit status

Of course, you can get around the limitation in C++ by doing the following instead:

// foo.h:
extern int RAM[10000]; //modified

// foo.c:
#include "foo.h"
int RAM[10000]; //added
int main() {
    return (unsigned long long)RAM;
}

// bar.c
#include "foo.h"
int bar() {
    return (unsigned long long)RAM;
}

The interesting part is some of the implications for C programmers. Suppose, for example, you borrowed a database implementation that employed common linkage and you accidentally reused one of its global names in your own code. In this case, you would have a subtle silent bug on your hands. The variable would be shared between your code and the database and you might never know! Here's another example that shows some interesting things that occur in these cases:

// baz.c:
#include <stdio.h>
int test; //4 byte declaration.
unsigned long long qux(void);

int main(void)
{
    printf("wrong size returned in baz: %zu\n", sizeof(test)); //Oops!
    qux();
    printf("set value in baz: %d\n", test);
    return test;
}


// qux.c:
#include <stdio.h>
unsigned long long test; //8 byte declaration.
unsigned long long test;//you can redeclare without error.


unsigned long long qux(void) {
    test = 0xFF;
    printf("correct size in qux: %zu\n", sizeof(test)); //sizeof yields size_t, so %zu
    printf("set value in qux: %llu\n", test);           //unsigned long long needs %llu
    return test; //the original fell off the end without returning a value
}

Output:

wrong size returned in baz: 4
correct size in qux: 8
set value in qux: 255
set value in baz: 255

There are a few interesting things to note: (1) the variable is declared twice in qux.c without error, (2) it is declared once with a different-sized type in baz.c, and (3) it has actually been merged and is eight bytes large. Yet the wrong size will be printed in baz.c and the correct size only in qux.c. So it is silent even with incompatible types. The final thing to note is that even if you enable verbose warnings in the compiler, you still won't see this, because the compiler only ever looks at one source file at a time.
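The "redeclare without error" part can be tried in a single file. Below is a minimal sketch, assuming a C (not C++) compiler; the names `counter`, `bump_counter` and `tentative_demo` are made up for illustration. It only shows the within-one-file rule; merging across files still depends on the toolchain emitting common symbols.

```c
#include <assert.h>

/* Three tentative definitions of one object. This is legal C because
   none of them has an initializer; C++ rejects the repeats. */
unsigned long long counter;
unsigned long long counter;
unsigned long long counter;

/* Hypothetical helper: increments the single shared object. */
unsigned long long bump_counter(void) {
    return ++counter;
}

/* Resets the counter, bumps it twice, and returns the result.
   Every use of `counter` refers to the same object. */
unsigned long long tentative_demo(void) {
    counter = 0;
    bump_counter();
    bump_counter();
    assert(sizeof counter == sizeof(unsigned long long));
    return counter;
}
```

Calling `tentative_demo()` from a `main` should return 2. Giving more than one of the three lines an initializer (for example `= 0`) turns them into full definitions, and the compiler then reports a redefinition.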


  • 0

^ I have no idea what you are talking about, and I'm mostly bored

 

BUT

 

You are worthy of my respect and praise

:laugh:

that was Aheer.R.S.'s way of saying:

cool-story-bro.jpg


  • 0

^ I could have titled the thread obscure quirks between C/C++. Basically, I'm just highlighting differences in how C and C++ handle global variables in certain cases and some potentially screwy behaviors that result from C's way of doing it.


  • 0

Thanks for sharing, you reinforce my resolve never to program in C++ ever again. :)

I decided a while ago that all the time I spent learning the intricacies and incoherences of this underspecified amateurish patchwork of ideas was better spent writing actually useful code in a language that generally makes sense (which turns out to be most languages out there except for C++ and a few others). I care about performance but my own sanity comes first.

 

That said if you happen to work with a platform where only C++ makes sense (and I know there are many), then you have all my sympathy.


  • 0

Thanks for sharing, you reinforce my resolve never to program in C++ ever again. :)

I decided a while ago that all the time I spent learning the intricacies and incoherences of this underspecified amateurish patchwork of ideas was better spent writing actually useful code in a language that generally makes sense (which turns out to be most languages out there except for C++ and a few others). I care about performance but my own sanity comes first.

 

That said if you happen to work with a platform where only C++ makes sense (and I know there are many), then you have all my sympathy.

Well to be fair, I'm working in a platform where only C works normally... so even worse. But, for the moment I'm jumping to C++ for the STL (and only the STL, no actual OOP). Hence, me (re-)finding the above nonsensical differences  :)

 

Sadly, I wanted the nonsensical C behavior in this case because I wanted to save a few lines of code (i.e. not having to declare externs and such)  :laugh:


  • 0

^ I could have titled the thread obscure quirks between C/C++. Basically, I'm just highlighting differences in how C and C++ handle global variables in certain cases and some potentially screwy behaviors that result from C's way of doing it.

A6jSmoN.jpg


  • 0

lol that's mean

 

Big bully, I'm telling... :p

 

lol I'm just being funny, not trying to be offensive, and I'm pretty sure that Myles would understand that. In fact, most of the time when I'm talking to my wife about technology or some very interesting topic, like some DNS zones that were really messed up or why some exquisite update brought that particular server to its knees, she's the one in the picture saying that to me, so yeah, I understand that.


  • 0

^ I'll let you guys into a little off topic secret, my middle name is Myles. I couldn't fit my full name with the nick so I chopped off my first name, Aaron.  ;)


  • 0

^ I'll let you guys into a little off topic secret, my middle name is Myles. I couldn't fit my full name with the nick so I chopped off my first name, Aaron.  ;)

 

okay Myles.


  • 0

Sometimes when I am doing low-level development I run into interesting incompatibilities between C and C++.

Not to be pedantic, but C and C++ aren't low level. That distinction is reserved for assembler. At best they could be labelled as medium/high. After all they both have the same constructs as almost every other high level language. The only thing setting them apart is memory management, which I don't see as low level because the details are hidden behind malloc and new.

 

The interesting part is some of the implications for C programmers. Suppose for example, you borrowed a database implementation that employed common linkage & you just happened to accidentally clobber over the name in your own code. In this case, you would have a subtle silent bug on your hand. The variable would be shared between your code and the database and you might never know!

That's what static is for:

#define PRIVATE static /* no trailing semicolon, or `static` gets cut off from the declaration */
PRIVATE int module_level_global;
To restrict global variables to module level.

Globals are generally a bad idea to begin with except in specific cases. It's almost always better to have a module encapsulate it and provide functions which manipulate it.
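A minimal sketch of that encapsulation, assuming a module that tracks a hit counter; every name here (`hit_count`, `cache_reset`, `cache_record_hit`, `cache_hits`, `cache_self_check`) is hypothetical and only meant to illustrate the pattern:

```c
#include <assert.h>

/* Internal linkage: the name is not visible to other object files, so
   it cannot be merged with, or clash against, a `hit_count` elsewhere. */
static unsigned long hit_count;

/* Hypothetical accessors: the only way other modules reach the state. */
void cache_reset(void)         { hit_count = 0; }
void cache_record_hit(void)    { hit_count++; }
unsigned long cache_hits(void) { return hit_count; }

/* Tiny self-check that exercises the accessors. */
void cache_self_check(void) {
    cache_reset();
    cache_record_hit();
    assert(cache_hits() == 1);
    cache_reset();
}
```

Other files would see only the function prototypes in a header, so a second module defining its own `hit_count` gets a separate variable instead of a silently shared one.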


  • 0

Not to be pedantic, but C and C++ aren't low level. That distinction is reserved for assembler. 

The creator of the C language disagrees with you:

 

C is a relatively "low-level" language. This characterization is not pejorative; it simply means that C
deals with the same sort of objects that most computers do, namely characters, numbers, and addresses.
These may be combined and moved about with the arithmetic and logical operators implemented by real
machines.
 
C provides no operations to deal directly with composite objects such as character strings, sets, lists or 
arrays. There are no operations that manipulate an entire array or string, although structures may be 
copied as a unit. The language does not define any storage allocation facility other than static definition 
and the stack discipline provided by the local variables of functions; there is no heap or garbage 
collection. Finally, C itself provides no input/output facilities; there are no READ or WRITE statements, 
and no built-in file access methods. All of these higher-level mechanisms must be provided by explicitly 
called functions. Most C implementations have included a reasonably standard collection of such 
functions. 
 
Similarly, C offers only straightforward, single-thread control flow: tests, loops, grouping, and
subprograms, but not multiprogramming, parallel operations, synchronization, or coroutines.
 

http://net.pku.edu.cn/~course/cs101/2008/resource/The_C_Programming_Language.pdf

 


  • 0

The creator of the C language disagrees with you:

He didn't explicitly state it was low level. When he says 'relatively', he means compared to very high level languages:

C is a general-purpose programming language which features economy of expression, modern flow control
and data structures, and a rich set of operators. C is not a "very high level" language, nor a "big" one,
and is not specialized to any particular area of application.

Very high level programming languages are often domain specific, something C certainly isn't.


  • 0

Not to be pedantic, but C and C++ aren't low level. That distinction is reserved for assembler. At best they could be labelled as medium/high. After all they both have the same constructs as almost every other high level language. The only thing setting them apart is memory management, which I don't see as low level because the details are hidden behind malloc and new.

You are being very pedantic. I didn't go into it in my initial post, but the work I'm doing is related to runtime development in the context of a future exascale architecture (hence the lack of C++ support in said architecture). Runtime here means something that is running bare-bones on the system (replaces OS functionality). You can think of it as an OS that doesn't do time sharing, without preemption, without protection, and that lacks standard library support (glibc, newlib, etc.). This sort of thing means that I am working directly with hardware (simulated) and even inline assembly when needed. Strictly speaking, the development effort also hasn't precluded hacking additional architecture and other features into the simulation framework (e.g. temperature modeling, performance monitoring capabilities) or modifying the compiler and linker at times (due to various bugs in code generation because of linker script and compiler errors).

 

If you want to call that non-low level you'd be off the mark honestly. Whether C is low level really depends on what you are actually doing with it. Would anyone realistically conclude that the Linux Kernel is high-level development because the majority of the work is done in C or that LLVM/Clang is high-level development because it is done in C++? You wouldn't because architecture is involved at that point in both cases. Or to say that another way, the intersection of hardware/architecture is when something starts being low-level.

 

 

That's what static is for:

#define PRIVATE static /* no trailing semicolon, or `static` gets cut off from the declaration */
PRIVATE int module_level_global;
To restrict global variables to module level.

Globals are generally a bad idea to begin with except in specific cases. It's almost always better to have a module encapsulate it and provide functions which manipulate it.

 

Let's complicate the example I gave a bit to make it less contrived then. Suppose, you have database1.c, and database2.c. You need the same variable instance in both database1.c and database2.c. In this situation, you would actually want to employ the common linkage property of C (you could also do this in a different way to avoid it altogether).

 

Sure, globals are a bad idea in many cases, but they can be very useful in some scenarios. They do exist for a reason. And, maybe common linkage isn't best practice in terms of isolation, but merging symbols is the purpose of common linkage. Now, if you are arguing that it is useless because there are better alternatives (in terms of isolation), that's neither here nor there. The language did add the feature for a reason (probably so you could declare variables in headers and lazily avoid redeclarations -- I mean that is why I use it) and it is something that exists at the compiler level outside of C entirely (as the .comm directive). C++ chose to deviate and not allow it, likely because of the potential silent isolation problems.


  • 0

He didn't explicitly state it was low level. 

Yes he did, and I provided and highlighted the quote! He then immediately details what he means by "low-level" which is not "not being a very high-level language", and I also provided the quote and highlighted relevant statements (such that the language maps directly to the things computers work with, or doesn't provide direct operations for working with collections, threads, etc.). The statement you quote, about C not being a very high-level language, is not even part of the same section of the book, so you cannot interpret it as an explanation of what he meant by C being low-level, and exclude the explanation he provided directly after making that statement. This would simply be incorrect reading comprehension.


  • 0

I didn't go into it in my initial post, but the work I'm doing is related to runtime development in the context of a future exascale architecture (hence the lack of C++ support in said architecture). Runtime here means something that is running bare-bones on the system (replaces OS functionality). You can think of it as an OS that doesn't do time sharing, without preemption, without protection, and that lacks standard library support (glibc, newlib, etc.). This sort of thing means that I am working directly with hardware (simulated) and even inline assembly when needed.

So basically embedded development :D You're still calling into a micro kernel or whatever it's running. The C language itself isn't low level, that was my point. Assembly most definitely is because each mnemonic corresponds to a CPU instruction (usually).

That's not to disparage what you're doing. I'm a fan of C myself. I just think a clear line must be drawn between assembly and C in terms of lowlevelness. C has the same constructs as other high level languages like Java, C#, Vala, Go, Perl, etc. I just don't see why it should be labelled as low level because it uses memory addresses (pointers) and has no garbage collection.

A low level language doesn't lend itself well to general purposeness. That's why assembly is employed sparingly.

 

Let's complicate the example I gave a bit to make it less contrived then. Suppose, you have database1.c, and database2.c. You need the same variable instance in both database1.c and database2.c. In this situation, you would actually want to employ the common linkage property of C (you could also do this in a different way to avoid it altogether).

Then why use one at all? Globals lead to messy and unmaintainable code.

Instead, why not pass a value to a function in database2.c. I do that 99% of the time. Have each module manage its own internal data. It provides the same encapsulation as a class in C++ and prevents ownership issues.

 

Sure, globals are a bad idea in many cases, but they can be very useful in some scenarios. They do exist for a reason. And, maybe common linkage isn't best practice in terms of isolation, but merging symbols is the purpose of common linkage. Now, if you are arguing that it is useless because there are better alternatives (in terms of isolation), that's neither here nor there. The language did add the feature for a reason (probably so you could declare variables in headers and lazily avoid redeclarations -- I mean that is why I use it) and it is something that exists at the compiler level outside of C entirely (as the .comm directive). C++ chose to deviate and not allow it, likely because of the potential silent isolation problems.

For a start, you should be using inclusion guards in the header file so it won't get defined twice. Secondly, it makes logical sense that only one symbol exists. The compiler's associative map should see that the symbol is already defined. I wouldn't do it that way anyway. I'd create a common header file with a declaration (extern) in it.

  • 0

C has the same constructs as other high level languages like Java, C#, Vala, Go, Perl, etc.

It does not, and analogous constructs between these languages do not have the same semantics.

 

A C struct, for instance, is a memory layout: at this offset you have 4 bytes representing an integer, at this offset you have an inline array of 16x1-byte characters, etc. A Java or C# object makes no guarantee of any particular layout in memory; where or how it is represented by the machine is abstracted away from the programmer. A Java or C# object includes plenty of things that the programmer need not even know about, such as a virtual table pointer, a sync root, etc.

 

C is low-level because everything in C closely maps to machine-level representations: any particular C statement predictably generates a small number of assembly instructions. A "high-level" language is one where constructs map to more abstract, mathematical concepts and the actual machine-level translation may be extremely different.

 

C also lacks most of the constructs of languages considered "high level", such as support for object-oriented programming, generic programming, functional programming, parallel programming, etc.

 

As the K&R C book explains, this is not a pejorative thing to say: it's simply a convenient term to use to denote that C maps closely to traditional computer architectures and never strays far from them.


  • 0

Yes he did, and I provided and highlighted the quote! He then immediately details what he means by "low-level" which is not "not being a very high-level language"

Well if C is low level, then what is assembly? I think we can agree to disagree on that. I've done assembly and C and I know which one is most unlike any other high level language - hint it's not C ;)

  • 0

Let's complicate the example I gave a bit to make it less contrived then. Suppose, you have database1.c, and database2.c. You need the same variable instance in both database1.c and database2.c. In this situation, you would actually want to employ the common linkage property of C (you could also do this in a different way to avoid it altogether).

 

Sure, globals are a bad idea in many cases, but they can be very useful in some scenarios. They do exist for a reason. And, maybe common linkage isn't best practice in terms of isolation, but merging symbols is the purpose of common linkage. Now, if you are arguing that it is useless because there are better alternatives (in terms of isolation), that's neither here nor there. The language did add the feature for a reason (probably so you could declare variables in headers and lazily avoid redeclarations -- I mean that is why I use it) and it is something that exists at the compiler level outside of C entirely (as the .comm directive). C++ chose to deviate and not allow it, likely because of the potential silent isolation problems.

 

The MS linker supports COMDAT folding via the selectany directive. I thought GCC had picked that up too but I'm not sure.


  • 0

Well if C is low level, then what is assembly? I think we can agree to disagree on that. I've done assembly and C and I know which one is most unlike any other high level language - hint it's not C ;)

 

C is generally considered to be a low level language. That doesn't mean "lowest" - and clearly these terms are relative. Obviously assembly is lower level than C. But in today's spectrum of programming languages, C is very much toward the "lower" end in this regard.

 

C++ is considered a somewhat higher level language than C, but it's still a "systems language" and considerably lower-level than something like Java, C#, or JavaScript. Now, some would say that C and C++ are high level systems languages. But yeah, that's getting pretty pedantic :-)


  • 0

Then why use one at all? Globals lead to messy and unmaintainable code. 

Instead, why not pass a value to a function in database2.c. I do that 99% of the time. Have each module manage its own internal data. It provides the same encapsulation as a class in C++ and prevents ownership issues.

 

In general, that is a better model. But there are perfectly good uses for globals (particularly for something like a globally shared cache, for example). They also help save space for large constants like CLSIDs/IIDs and other GUIDs.

 

For a start, you should be using inclusion guards in the header file so it won't get defined twice. Secondly, it makes logical sense that only one symbol exists. The compiler's associative map should see that the symbol is already defined. I wouldn't do it that way anyway. I'd create a common header file with a declaration (extern) in it.

 

That's not how inclusion guards work. They (i.e. "pragma once") prevent a header from being included multiple times in a given compilation unit. But database1.c and database2.c are clearly separate compilation units. Thus why this is a linker issue, not a compiler issue. You cannot solve it via compilation directives and inclusion guards.
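To make the scope of a guard concrete, here is a minimal single-file sketch that pastes the body of a hypothetical `shared.h` twice, the way two `#include "shared.h"` lines in one .c file would. The names `SHARED_H`, `SHARED_SLOTS`, `shared_table` and `shared_slot_count` are assumptions made for illustration.

```c
#include <assert.h>

/* First copy of a hypothetical shared.h, as the preprocessor would
   paste it for the first #include "shared.h" in this .c file. */
#ifndef SHARED_H
#define SHARED_H
enum { SHARED_SLOTS = 4 };
extern int shared_table[SHARED_SLOTS];  /* declaration only */
#endif

/* Second copy, mimicking a repeated #include in the same file.
   SHARED_H is already defined, so the body is skipped; without the
   guard the repeated enum would be a compile error. */
#ifndef SHARED_H
#define SHARED_H
enum { SHARED_SLOTS = 4 };
extern int shared_table[SHARED_SLOTS];
#endif

/* The single definition, which would live in exactly one .c file. */
int shared_table[SHARED_SLOTS];

/* Number of elements in the one shared array. */
int shared_slot_count(void) {
    int n = (int)(sizeof shared_table / sizeof shared_table[0]);
    assert(n == SHARED_SLOTS);
    return n;
}
```

The macro `SHARED_H` starts out undefined again in every other .c file, so a guard cannot stop two translation units from each emitting a variable that the header defines. Keeping only the `extern` declaration in the header is what avoids the duplicate.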


  • 0

It does not, and analogous constructs between these languages do not have the same semantics.

Conditionals, Loops, Operators, Functions, standard integral datatypes, complex datatypes, and almost identical syntax. I'd say C and other high level languages are very similar.

 

A C struct, for instance, is a memory layout: at this offset you have 4 bytes representing an integer, at this offset you have an inline array of 16x1-byte characters, etc. A Java or C# object makes no guarantee of any particular layout in memory;

There's no guarantee of memory layout in C unless you pack the structs with compiler specific preprocessor macros. Regardless, C structs are similar to Classes in other languages. They are templates from which objects are created. I see them as a class precursor.

where or how it is represented by the machine is abstracted away from the programmer. A Java or C# object includes plenty of things that the programmer need not even know about, such as a virtual table pointer, a sync root, etc.

By the same token, a C programmer need not concern himself with the memory layout of a structure except in very specific circumstances. One can just assign the member variables values.

 

C is low-level because everything in C closely maps to machine-level representations; any particular C statement predictably generates a small number of assembly instructions.

Let's see what Wikipedia has to say on the subject:

In computer science, a low-level programming language is a programming language that provides little or no abstraction from a computer's instruction set architecture. Generally this refers to either machine code or assembly language. The word "low" refers to the small or nonexistent amount of abstraction between the language and machine language; because of this, low-level languages are sometimes described as being "close to the hardware."

I think the complexity of modern C compilers and how much optimisation they do attests to the fact that they are very abstract.

C also lacks most of the constructs of languages considered "high level", such as support for object-oriented programming, generic programming, functional programming, parallel programming, etc.

OO isn't a prerequisite for high level programming, but C is quite capable via libraries like GLib. Functional and parallel programming are domain specific.

As the K&R C book explains, this is not a pejorative thing to say: it's simply a convenient term to use to denote that C maps closely to traditional computer architectures and never strays far from them.

K&R's C was written decades ago. C compilers have advanced significantly since then. The fact that C can produce very fast and efficient code is more down to the fact that it's a very simple language with a small number of terse constructs. There isn't the bloat that you get from C++ or C#.

  • 0

Conditionals, Loops, Operators, Functions, standard integral datatypes, complex datatypes, and almost identical syntax. I'd say C and other high level languages are very similar.

That's like saying assembly and C++ are very similar because they both support comparisons, jumps/gotos, loops, and so on.

 

C has syntactical similarities to the plethora of C-derived high level languages. But that's a tautology.

 

There's no guarantee of memory layout in C unless you pack the structs with compiler specific preprocessor macros. Regardless, C structs are similar to Classes in other languages. They are templates from which objects are created. I see them as a class precursor.

That's not correct. Structs in C are not objects and are nothing like (non-POD) C++ classes and structs. They aren't templates for any definition of that word. They're type definitions, yes, but they map directly to structs in ASM. C structs *ARE* just a map of a memory layout. C99 explicitly requires that fields are ordered as declared (and that's always been the case as far as I know). Yes, fields are padded to maintain alignment as necessary, though you can control this. In general, C (and C++) programmers will pay attention to the layouts of their structs to ensure efficient packing (often as simple as grouping fields of same size type together).
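As a rough illustration of the ordering and padding points, here is a minimal sketch using `offsetof`. The struct names `loose` and `tight` and both helper functions are hypothetical, and the exact sizes depend on the platform's ABI, so the code only checks relationships rather than specific byte counts.

```c
#include <assert.h>
#include <stddef.h>

/* A small field, a wider one, then a small one again: the compiler
   usually has to pad after `a` and after `c` to keep `b` aligned. */
struct loose {
    char a;
    int  b;
    char c;
};

/* Same fields with the small ones grouped together. */
struct tight {
    int  b;
    char a;
    char c;
};

/* Returns 1 if members sit at increasing offsets in declaration order. */
int members_in_declared_order(void) {
    assert(offsetof(struct loose, a) == 0);  /* first member starts the struct */
    return offsetof(struct loose, a) < offsetof(struct loose, b)
        && offsetof(struct loose, b) < offsetof(struct loose, c);
}

/* Returns 1 if regrouping did not make the struct larger. On common
   ABIs it shrinks, but the standard does not promise a particular size. */
int regrouping_not_larger(void) {
    return sizeof(struct tight) <= sizeof(struct loose);
}
```

Printing `sizeof(struct loose)` and `sizeof(struct tight)` on your own target shows how much padding the reordering saves there.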

 

By the same token, a C programmer need not concern himself with the memory layout of a structure except in very specific circumstances. One can just assign the member variables values.

I think you've inverted this. In simple cases, yes, but in general a C developer needs to be conscious of the size of struct fields and the struct's layout.

 

Let's see what Wikipedia has to say on the subject:

I think the complexity of modern C compilers and how much optimisation they do attests to the fact that they are very abstract.

OO isn't a prerequisite for high level programming, but C is quite capable via libraries like GLib. Functional and parallel programming are domain specific.

K&R's C was written decades ago. C compilers have advanced significantly since then. The fact that C can produce very fast and efficient code is more down to the fact that it's a very simple language with a small number of terse constructs. There isn't the bloat that you get from C++ or C#.

The fact that libraries can make C scale and be treated as a somewhat higher level language does not change the fact that C itself is a relatively low-level one.


  • 0

In general, that is a better model. But there are perfectly good uses for globals (particularly for something like a globally shared cache, for example).

I wouldn't use them for a shared cache either. I'd create a module to manage it and create accessor functions.

 

That's not how inclusion guards work. They (i.e. "pragma once") prevent a header from being included multiple times in a given compilation unit. But database1.c and database2.c are clearly separate compilation units. Thus why this is a linker issue, not a compiler issue. You cannot solve it via compilation directives and inclusion guards.

I'm aware how inclusion guards work. I'm talking about Macro inclusion guards, not Microsoft compiler specific directives. It wouldn't solve his C++ problem, but it should be standard behaviour when including header files. I also stated earlier that the commonly included header file should declare the variable (extern), not define it. I would do this in C regardless of whether the compiler complains because it's simply good practice.

I still haven't heard a good reason for his global variable though.


This topic is now closed to further replies.