C++ criticism by other people
{}
This page is a collection of the best C++ criticism by FQA readers, copied
from e-mail messages and online discussions. If you know an interesting consequence
of C++ problems not mentioned in the FQA, please [http://yosefk.com send me e-mail]. Similarly to
[http://yosefk.com/c++fqa/web-vs-fqa.html the FQA errors] page, this one lists things
that can be proved \/ tested rather than qualitative statements. The stuff is published with
credits or anonymously, according to the choice of each author.
The issues listed here (or the FQA itself) are not supposed to be "new" in the sense that
they were never discussed in a published work
(if C++ problems were so hard to discover as to take decades, discussing them wouldn't necessarily be worth the trouble).
There's lots of (well-reasoned or entertaining or both)
C++ criticism on the web, including several pieces by celebrity programmers. However, I made
the decision not to cite famous quotes by celebrities on this site. The main reason
is that I don't want to make the FQA more convincing to the people who mostly value the credentials
of an author and ignore things like facts and reasoning. I want to work less both with C++
/and/ with these people. So I'd rather have them use C++ than convince them to switch to something else.
`<!-- h2toc -->`
`<h2>`Implicit type conversions`</h2>`
/Anonymous:/ I'd add something about the broken type system - how the following code is legal
and compiles without warnings with most compilers, for example:
@
void foo(const std::string &) {}
int main()
{
foo(false);
}
@
Another example:
@
class A { public: int a; };
class B : public A { public: int b; };
int main()
{
A * p = new B[10];
p[5].a = 1;
}
@
Why should a static type system allow this without an explicit cast?
/Yossi:/ The FQA doesn't talk much about implicit type conversions, since the FAQ doesn't. The problem
is quite important though. It wouldn't be so bad if C++ detected run time errors
(as opposed to compiling the second example to code modifying the wrong place), and\/or
if so many C++ programmers didn't think that "with C++, when it compiles and links, it will run correctly"
(I actually heard this one, and then there are many large C++ monolithic applications without unit tests
speaking for themselves).
Note that this has nothing to do with safety and C++ being a "power tool allowing you to do dangerous things"
because it's so "high-performance". This argument only makes sense for /explicit/ casts. What the code
demonstrates is unexpected interactions between pairs of different implicit conversions
(in the first example, |bool -> const char* -> std::string|, in the second - |B[] -> B* -> A*|).
By the way, the second example explains
[21.5 the FAQ's remark] about arrays being [6.15 evil] in the context of inheritance and substitutability.
The thing is that with arrays of objects, there's an implicit type conversion that allows you to violate
the substitutability principle without a compile time error. With |std::vector<B>|, there's no implicit
cast. I didn't understand the FAQ was talking about that, because most of the time, when inheritance and
polymorphism are involved, you allocate arrays of /pointers/ to objects. And in that case, there's no
difference between a |vector<B*>| and a |B**| - you'd need an explicit cast in both cases. I automatically thought
the question was the continuation of the discussion in [21.2 preceding questions]
about /why/ the compiler wouldn't do the cast implicitly, and in that context
the "arrays are evil" remark didn't quite fit. I completely forgot about the arrays-of-objects case,
which is something a newbie coming from another language
(and with C++, some people stay "newbies" for years) could very well try to use.
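A minimal sketch of why the array case goes wrong, using the same |A| and |B| as above (written as structs here). The helper names |element_offset| and |stride_mismatch| are made up for this illustration. The idea is that |B*| converts to |A*| implicitly, after which indexing uses the stride of |A|, not of |B|; pointers to the two |std::vector| types do not convert at all.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

struct A { int a; };
struct B : A { int b; };

// Byte offset of element i when the compiler indexes through a T*.
template <typename T>
constexpr std::size_t element_offset(std::size_t i) { return i * sizeof(T); }

// A B[] decays to B*, and B* converts to A* without a cast.
constexpr bool pointer_converts = std::is_convertible<B*, A*>::value;

// The two vector types are unrelated, so no implicit conversion exists.
constexpr bool vector_converts =
    std::is_convertible<std::vector<B>*, std::vector<A>*>::value;

// True when indexing element i through A* computes a different offset
// than indexing the same B array through B*.
constexpr bool stride_mismatch(std::size_t i) {
    return element_offset<A>(i) != element_offset<B>(i);
}
```

With these definitions, |stride_mismatch(5)| holds on any implementation where |B| is larger than |A|, which is why |p[5].a = 1| in the example writes to the wrong place.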
`<h2>`C++ grammar: the type name vs object name issue`</h2>`
/drorz:/ In C\/C++ you cannot split parsing into separate syntax and semantic passes.
No existing compiler does it in two separate passes.
In the example:
@
AA BB(CC);
@
The parse tree is different in the following cases:
`<ul>
<li>When AA and CC are types, BB is a function prototype.</li>
<li>When AA is a type and CC is a variable, then BB is a variable/object initialization.</li>
<li>When AA is a variable, `|AA BB(CC)|` is illegal and its parse tree is entirely different from the first two cases.</li>
</ul>`
You cannot (more precisely, no one has done it in a real C\/C++ compiler) fix a wrong parse tree in the semantic analysis pass.
Consider this example:
@
x * y(z);
@
in two different contexts:
@
int main() {
int x, y(int), z;
x * y(z);
}
@
and
@
int main() {
struct x { x(int) {} } *z;
x * y(z);
}
@
In the first case |x * y(z)| is an expression, and in the second case it is a declaration of a pointer |y|.
The parse trees for those cases are completely different.
/Yossi:/ This is the first part of the problem making the C++ grammar undecidable. The second
part of the problem is that |AA| may really be |Template<Params>::InnerDef|, and figuring
out whether |InnerDef| is a type name or an object name is equivalent to solving the halting
problem, since templates may instantiate themselves recursively and in fact represent arbitrary
recursive functions. Maybe I'll expand on this one later. In particular, it has to do with
template specializations, which are discussed in the next item.
Purists who don't like the "nearly context-free" expression in Defective C++: when you write
parsers, it /does/ make sense to discuss the "extent" of your dependence on the context.
For example, C++ inherits the type name\/object name riddle from C. But in C, you can
solve it using a single dictionary of |typedef| names. Of course, theoretically the
important part is that the C grammar is /decidable/ (though not context-free). In practice,
what matters is that it's /easy to parse/. In particular, you can use a parser generator for
context-free grammars (|yacc|\/|bison| is one mature program in this family) with the
simple "symbol table hack" described above, and get a working parser. This is what
"nearly context-free" means.
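The "symbol table hack" can be sketched in a few lines. This is an illustration only, not code from any real compiler: the names |SymbolTable|, |classify| and |parse_star_statement| are invented here, and the "parser" only decides how a statement of the form |x * y;| should be read once the first identifier has been classified.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class Token { TypeName, Identifier };
enum class Statement { Declaration, Expression };

// The single dictionary of typedef names that the parser keeps up to date
// and the lexer consults for every identifier it sees.
class SymbolTable {
public:
    void add_typedef(const std::string& name) { typedef_names_.insert(name); }

    Token classify(const std::string& name) const {
        return typedef_names_.count(name) ? Token::TypeName : Token::Identifier;
    }

private:
    std::set<std::string> typedef_names_;
};

// How "first * second;" is read depends only on what kind of token the
// first identifier turned out to be.
Statement parse_star_statement(const SymbolTable& table, const std::string& first) {
    return table.classify(first) == Token::TypeName ? Statement::Declaration
                                                    : Statement::Expression;
}

// Usage: the same text flips meaning once "x" is registered as a type name.
inline void symbol_table_example() {
    SymbolTable table;
    assert(parse_star_statement(table, "x") == Statement::Expression);
    table.add_typedef("x");
    assert(parse_star_statement(table, "x") == Statement::Declaration);
}
```

The grammar itself stays context-free; all the context lives in the one set of names, which is what keeps the C case manageable.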
I think the example is excellent since it took me quite some time to figure out what the code
means myself
(I think it's the asterisk). The |AA BB(CC);| example used in Defective C++ is simpler, but I think it doesn't convey
the point as clearly, since apparently it makes it intuitively easier to counter with something like "you can
solve the ambiguity at the semantic analysis stage". Note that you can /always/ counter with that -
for example, you can say that the "parse tree" of your language is simply the list of characters
in the file, and the rest is semantic analysis.
`<h2>`C++ grammar: type vs object and template specializations`</h2>`
/drorz:/ Consider the following example:
@
#include <cstdio>
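template<int n> struct confusing
{
static int q;
};
template<int n> int confusing<n>::q = 0; // definition of the static member, so that the program also links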
template<> struct confusing<1>
{
template<int n>
struct q
{
q(int x)
{
printf("Separated syntax and semantics.\\n");
}
operator int () { return 0; }
};
};
char x;
int main()
{
int x = confusing<sizeof(x)>::q < 3 > (2);
return 0;
}
@
If you "didn't care" about semantics during parsing, then |confusing<1>::q| is a typename,
so |confusing<1>::q<3>(2)| creates an object of type |confusing<1>::q<3>| with the argument 2.
If you "do" semantics during the syntax pass, then |confusing<4>| will be looked up,
|confusing<4>::q| is a variable. The declaration would "expand" to |int x = (confusing<4>::q < 3) > 2|.
You can see that parse trees in those cases are completely different, based on the output of the |sizeof| operator!
/Yossi:/ ...and |sizeof| depends on the platform and the implementation details of inheritance
(including multiple and virtual), virtual functions, etc. The "parser" gets closer and closer
to a full-blown compiler.
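To see both readings side by side without relying on |sizeof|, here is a reduced sketch of the example with the template argument spelled out. The function names |parsed_as_comparison| and |parsed_as_construction| and the return value 7 are made up for this illustration. A compiler may warn about the chained comparison in the first function, which is the point.

```cpp
#include <cassert>

template <int n> struct confusing {
    static int q;  // in the general case, q is a variable
};
template <int n> int confusing<n>::q = 0;

template <> struct confusing<1> {
    template <int m> struct q {  // in this specialization, q is a class template
        q(int) {}
        operator int() { return 7; }
    };
};

// q is a variable here, so this reads as (confusing<4>::q < 3) > 2.
int parsed_as_comparison() { return confusing<4>::q < 3 > (2); }

// q is a template here, so the same tokens construct a q<3> from 2,
// which then converts to int.
int parsed_as_construction() { return confusing<1>::q < 3 > (2); }

// Usage: identical token sequences after the template argument, different trees.
inline void confusing_example() {
    assert(parsed_as_comparison() == 0);    // (0 < 3) is true, true > 2 is false
    assert(parsed_as_construction() == 7);  // value comes from operator int
}
```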
The problem is the freedom that template specializations have when defining members. Now if anybody
showed me a /useful/ application of the ability to define something as an inner type in one specialization
and a static variable in some other specialization, I'd be surprised.
The [http://programming.reddit.com/info/5z7jr/comments/c02bmog reddit thread]
which this and the previous example are taken from has a detailed discussion about parsing C++.
`<h2>`printf, iostream and internationalization`</h2>`
/Alexander E. Patrakov (patrakov at ums dot usu dot ru):/ The FQA
lists [15.1 valid information] for and against the use of |<iostream>| instead of |<cstdio>|.
There is,
however, one more thing for |<cstdio>| and against |<iostream>|: the
possibility to translate program messages to a different natural
language (using, e.g., |gettext|). And here I don't mean that there is
currently no gettext equivalent for C++ iostreams, but that there is no
way to design such a thing correctly.
Translation works on phrases, not on their parts. Consider, e.g., such C
statements:
@
printf("Read %d files\\n", total);
printf("New data were found in %d files\\n", found);
@
With the standard C++ iostreams, this becomes:
@
cout << "Read " << total << " files\\n";
cout << "New data were found in " << found << " files\\n";
@
A well-designed program fetches translations from a message catalog,
Windows resource or anywhere else except its own source code. With C
and gettext message catalogs, the translator sees the whole phrases such as /"Read %d files",
"New data were found
in %d files"/, etc. If the same approach were applied to C++, the
translator would see just /"Read ", "New data were found in "/, and /"files"/ (used twice). Lack of context is the least of all worries. The
real problem is that, e.g., when translating to Russian, the two
instances of " files" have to be translated slightly differently, because
Russian has six grammatical cases and different cases are required
in the two sentences:
@
Read %d files => Прочитано %d файлов
New data were found in %d files => Новые данные были найдены в %d файлах
@
(approximately - I don't want to
overwhelm the example with the singular\/plural treatment)
Even worse, examples exist with two format substitutions where they have
to be reordered when translating. C (or, more precisely, the Single UNIX
Specification) allows such reordering with something like |printf("%2$d x
%1$d inches", width, num);| but in C++ the output order of fields is
hard-coded.
The downside is, of course, that nobody except the translator checks
the translated format string, and wrongly-copied conversion specifiers
can crash a program in the corresponding locale (and this did happen
with sed and vim in the past).
See how Trolltech
[http://doc.trolltech.com/4.3/i18n.html#use-qstring-arg-for-dynamic-text handles] the abovementioned problems in their Qt
toolkit.
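The common idea behind the reorderable |printf| specifiers and the numbered-placeholder approach is that the translator gets one whole phrase with numbered holes. A minimal sketch in plain C++ follows; |fill_placeholders| is a hypothetical helper written for this page, not Qt's or gettext's API, and it only understands single-digit placeholders from |%1| to |%9|.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replace "%1", "%2", ... in a whole-phrase pattern with the matching
// argument. The pattern is what a translator would see and may reorder.
std::string fill_placeholders(const std::string& pattern,
                              const std::vector<std::string>& args) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '%' && i + 1 < pattern.size()) {
            const char d = pattern[i + 1];
            if (d >= '1' && d <= '9' &&
                static_cast<std::size_t>(d - '1') < args.size()) {
                out += args[static_cast<std::size_t>(d - '1')];
                ++i;  // skip the digit
                continue;
            }
        }
        out += c;  // ordinary character, or a placeholder with no argument
    }
    return out;
}

// Usage: the call site stays the same, only the pattern changes the order.
inline void placeholder_example() {
    assert(fill_placeholders("%1 x %2 inches", {"3", "5"}) == "3 x 5 inches");
    assert(fill_placeholders("%2 x %1 inches", {"3", "5"}) == "5 x 3 inches");
}
```

Because the arguments are passed separately from the pattern, a translated pattern can move or repeat them without any change to the program.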
/Yossi:/ I really like this example because it can be a real eye-opener for a practical programmer,
and I wish I had heard and thought about it several years ago. Clearly the |printf| interface gets
something right that |iostream| doesn't, since it seems to save us lots of trouble. What is it that |printf|
gets right? Could it be that representing the program structure using compile time constructs incomprehensible
to any tool except for the compiler is not the way to go? Effectively the advantage of the |printf| program
is that it's easier for /other programs/ to manipulate. The idea that backfires is that program structure may be encoded
in arbitrarily complex ways and the only one who ever has to worry about it is the compiler writer.
But maybe this translation business is a singularity in the computing universe, and we
shouldn't infer general conclusions from it? Well, here's another example. Suppose you want to do real time logging. You don't have
enough time and\/or bandwidth to do the formatting at the target machine. And yet you want to log free
text, not some strict binary format with versioning schemes and fixed size limits and other headaches.
With |printf|-style interface, you can log packets of (for example) 32 bit words - size, constant format string
pointer, and the list of arguments. You can then extract the format strings from the executable file
(reading ELF or COFF files is easy - there are examples on the net of about 200 lines of C code),
and do the formatting at the host machine. Now, with |iostream|-like interface, the format string is split to many
little parts, and all kinds of types come in the middle - types of data items have to be encoded in the logged packets, too.
And you'd have to log calls to I\/O manipulators such as |hex|, |setfill|, etc.
Clearly the overhead per logged data word is going to increase significantly.
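A rough sketch of the |printf|-style logging scheme described above. Everything here is an assumption made for illustration: the |LogPacket| layout, the use of a numeric |format_id| in place of the format string's address, and a host-side formatter that only understands |%d| and |%x|. Extracting the strings from the executable is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// What the target sends: an identifier for the constant format string
// (standing in for its address in the executable) and raw argument words.
struct LogPacket {
    std::uint32_t format_id;
    std::vector<std::uint32_t> args;
};

// Host side: expand a packet using the format string recovered for its id.
// Only %d and %x are handled in this sketch; anything else is copied as is.
std::string format_on_host(const char* format, const LogPacket& packet) {
    std::string out;
    std::size_t next = 0;
    for (const char* p = format; *p != 0; ++p) {
        const bool is_spec = *p == '%' && (p[1] == 'd' || p[1] == 'x');
        if (is_spec && next < packet.args.size()) {
            char buf[16];
            if (p[1] == 'd')
                std::snprintf(buf, sizeof buf, "%d", static_cast<int>(packet.args[next]));
            else
                std::snprintf(buf, sizeof buf, "%x", static_cast<unsigned>(packet.args[next]));
            out += buf;
            ++next;
            ++p;  // skip the conversion character
        } else {
            out += *p;
        }
    }
    return out;
}

// Usage: the target only stored three words; the text is built on the host.
inline void logging_example() {
    LogPacket packet = {7, {12, 255}};
    assert(format_on_host("Read %d files, flags %x", packet) == "Read 12 files, flags ff");
}
```

The target never touches the characters of the message, which is where the savings in time and bandwidth come from.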
Think about it: how can it be that a simplistic "1 format string plus N arguments of dynamic types" interface beats
an advanced "statically dispatched polymorphic operators" interface,
and what makes it surprising to you?
`<h2>`Static binding rules`</h2>`
/Miguel Catalina:/ The following test program does not compile under gcc 4.3.{1,2}:
@
#include <cmath>
struct my_class {
my_class(int) {}
};
inline my_class operator&&(my_class,int){return my_class(0);}
int main(void)
{
double x = std::pow(1.0, 1.0);
(void) x; // to avoid unused variable warning
}
@
The error message is:
@
$ g++ -Wall -pedantic-errors simple_test.cpp
simple_test.cpp: In function ‘int main()’:
simple_test.cpp:11: error: ambiguous overload for `operator&&' in `std::__traitor<std::__is_integer<double>, std::__is_floating<double> >::__value && std::__traitor<std::__is_integer<double>, std::__is_floating<double> >::__value'
simple_test.cpp:11: note: candidates are: operator&&(bool, bool) <built-in>
simple_test.cpp:7: note: my_class operator&&(my_class, int)
@
The reason for this error has to do with this declaration in |cmath|:
@
template<typename _Tp, typename _Up>
inline
typename __gnu_cxx::__promote_2<
typename __gnu_cxx::__enable_if<__is_arithmetic<_Tp>::__value
&& __is_arithmetic<_Up>::__value,
_Tp>::__type, _Up>::__type
pow(_Tp __x, _Up __y)
{
typedef typename __gnu_cxx::__promote_2<_Tp, _Up>::__type __type;
return pow(__type(__x), __type(__y));
}
@
Tracing a few header files up brings us to |bits\/cpp_type_traits.h|:
@
template<typename _Tp>
struct __is_arithmetic
: public __traitor<__is_integer<_Tp>, __is_floating<_Tp> >
{ };
@
...and:
@
template<class _Sp, class _Tp>
struct __traitor
{
enum { __value = bool(_Sp::__value) || bool(_Tp::__value) };
typedef typename __truth_type<__value>::__type __type;
};
@
So it turns out that the operation that is giving us trouble is the &&
inside the |__enable_if| in the template declaration of |pow()|. We are
invoking && with two |enum| operands (|__is_arithmetic<T>::__value| is an
unnamed |enum|). I guess the compiler is treating unnamed |enum|s as
|int|s. So the compiler is
trying to call |operator&&(int,int)|. But there is no such operator;
there are only |operator&&(my_class,int)| and |operator&&(bool,bool)|. So
the compiler is trying to do an implicit conversion of the operands so
that they can match the available prototypes. There are implicit ways
of converting an |int| to a |my_class|, as well as converting an |int| to a
|bool|. The compiler does not know which one to use, hence the
ambiguity.
The question is: why on Earth when you are trying to invoke a
function that only deals with |double|s, do you have to deal with the
ambiguity between two available implicit conversions for types that
have nothing to do with |double|?
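A reduced sketch of the mechanism, without |cmath|. The enumerator names |first_flag| and |second_flag| are invented stand-ins for the |__value| members. As far as overload resolution goes, the user-defined operator becomes a candidate only because the operands have enumeration type. For the first operand the built-in operator is the better match (a standard conversion beats a user-defined one); for the second, the user-defined operator is better (promotion to |int| beats conversion to |bool|). Neither candidate wins on both, hence the ambiguity. Casting the operands to |bool|, as |__traitor| itself does for its own operator, leaves only the built-in operator in play.

```cpp
#include <type_traits>

struct my_class {
    my_class(int) {}
};
inline my_class operator&&(my_class, int) { return my_class(0); }

enum { first_flag = 1 };   // unnamed enums, like __value in the header
enum { second_flag = 1 };

// first_flag && second_flag would be ambiguous, as in the error message above.

// With both operands cast to bool, no operand has class or enumeration type,
// so the user-defined operator is not even considered.
constexpr bool both_set = bool(first_flag) && bool(second_flag);

// The result type confirms that the built-in operator was selected.
static_assert(
    std::is_same<decltype(bool(first_flag) && bool(second_flag)), bool>::value,
    "built-in operator&& was selected");
```

Note that this is a change the header would have to make; user code calling |std::pow| has no place to put the casts.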
/Yossi:/ Takes time to wrap one's mind around this, um, treason
(don't you just love the |public __traitor| bit? I guess a "traitor"
is something used to generate so-called "type traits", a key idiom
in the world of C++ templates arcana). Now that we (presumably) understand
the error message, how would you work around the problem? If the compiler
barfed while trying to dispatch an operator on user-defined types,
we could define the operator with exactly the prototype it would
pick as the best match (as the GNU STL implementors themselves [35.11 do]
in similar situations). But we can't define |operator&&(int,int)|. Now what?