As you already mentioned, "enabling portability across hardware that was much more diverse than today's" was a key concern in early language engineering. It was so important that the people (I guess there was no woman among them, unfortunately) who worked on standardizing ALGOL came up with a concept later called van Wijngaarden two-level grammars, to get rid of the informal and therefore ambiguous nature of programming language specifications written in natural human language. ALGOL 68 was the first language to be described using this formalism.
This formalism introduced a parametrization of the non-terminals in the BNF rules that had been used since ALGOL 60 to specify language syntax formally. The formalism rests on two principles. First, the parameters are described by a second rule system (the first being the BNF-based rule system for the syntax), hence the name: two-level grammars. Second, the principle of consistent replacement, which means that two parameters with the same name are forced to be equal across a grammar rule. Both levels together define the syntax and the semantics of a language unambiguously, provided the target language is semantically well defined (e.g. an abstract machine instruction language), because in the end a van Wijngaarden two-level grammar defines a mapping from the source language to the target language.
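To give a flavour of how consistent replacement works, here is my own rough sketch (simplified, not the exact notation of the ALGOL 68 report) of a two-level grammar for the non-context-free language a^n b^n c^n:

    Metarules (second level), defining the metanotion N as a count in unary:
      N :: i ; N i.

    Hyper-rules (first level), rule schemes in which N is a parameter:
      sentence : N as, N bs, N cs.
      i as : a.
      N i as : N as, a.
      i bs : b.
      N i bs : N bs, b.
      i cs : c.
      N i cs : N cs, c.

Consistent replacement requires all three occurrences of N in the first hyper-rule to be replaced by the same unary string, so the numbers of a's, b's and c's are forced to be equal, something plain BNF cannot express. The same trick, applied to metanotions that carry declaration tables, is how a two-level grammar pins down the context conditions and, ultimately, the mapping to the target language.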
Unfortunately, the van Wijngaarden formalism was not mechanically processable the way BNF is for syntax, meaning no evaluator could be generated from it. A later computer scientist (Watt) came up with a pragmatic, only mildly restricted formalism that is processable but has similar expressive power; he called it Extended Affix Grammars (EAG). It is a closed calculus, in contrast to the open calculus that Knuth introduced as Attribute Grammars. (Disclaimer: all this information comes from my computer science course materials; I did not attend the events, it was before my time. :-))
The programming languages department at TU Berlin had been doing research on EAG since around 1980. I took lectures there around 1995 and specified a Pascal-like language (Pascal without arrays) that compiles to Wirth's P-code; the whole compiler took perhaps 20 A4 pages. This was so impressive to me that I joined the re-implementation of the EAG compiler generator at that department in 1997. If you are interested, you can find a good, comprehensive description of the approach at https://github.com/linkrope/gamma/blob/assets/doc/epsilon-red-series.adoc (a non-professional translation from German to English).
This is also where the current re-implementation of the EAG compiler generator in the D language lives: https://github.com/linkrope/gamma. We originally implemented it in Oberon-2 for the Oberon system, and it was later translated to D.
So, even though I agree with your main statement, the EAG formalism at least offers the chance to make language specifications more accurate, formal, reproducible and verifiable when targeting a well-defined abstract machine or a well-known concrete one. Especially for very specific applications where formal verification and future portability are important and foreseeable, this may be a valuable solution.
However, the specification of the language failed to include conditional compilation, didn't mention a standard library (one for basic math and string functions must always exist), and didn't specify the hardware data types. So different compilers ended up with annoying variations, some calling a byte "BYTE" and others "CARD8", which makes code compiler-specific, an annoyance in a small community that is trying to share code.
But more disappointing was the ISO committee, which ruined the Modula-2 language. They spent so many years going back and forth with their meetings and suggestions that by the time they published their 3-inch-thick, $200 paperback ISO document, the vendors had evolved their products beyond the standard, and the committee's decisions didn't exactly match what the vendors had built, so the specification was disconnected from reality.
Committees of non-committed parties move too slowly. So really there should have been a quick summit of the teams building compilers, to iron out petty differences for the betterment of the language overall, so that the vendors could compete on features/price/performance and let the language thrive.
The spec was done in the Vienna style (VDM): very ponderous and hard to read. I think that overall the ISO spec, as thorough as it was, was a major net negative for the language.
Common Lisp is a language with several notions of equality, and the strictest, named EQ, is well defined. The types that will and won't work with EQ are specified, and there's a less strict equality named EQL that works consistently for more types, and so on with EQUAL and EQUALP. There's also = for numbers, CHAR= for characters, STRING= for strings, and so on to some degree. The important part is their well-defined nature.
Ada's another language with a proper specification. A quality such as order of evaluation in an expression isn't necessarily defined, but it can be:
A and B -- The order of evaluation is unspecified.
A and then B -- The order of evaluation is A followed by B.
Unlike the C language, Ada has very little such unspecified behaviour, none of it easily encountered. I can't, offhand, recall anything in Ada that comes close to the asinine behaviour of the C language. Instead, Ada has the concept of bounded errors, which more or less specify what can go wrong when, say, reading an uninitialized variable or raising an exception in very specific circumstances or the like.
The claim is often made that the C language has these disgusting behaviours in the name of speed, and yet it's clear that a fast, broken program is worthless. It's not a mechanism of portability at all, except for the compiler author, who may inflict the pains which he should've taken on everyone else. Ada specifies exceptions to be raised when array bounds are violated or when signed integers are made too large or too small, and this is the correct behaviour: not only can the checks be disabled for critical program regions, the machine will remove them when proven unnecessary. That is a better failure case than in the C language, where the machine can't insert these checks when needed. These operations are expensive only to anyone penny wise and pound foolish.
For many years, the presence of undefined behaviour in the specification didn't really cause problems.
This is a lie. The lack of array bounds checking alone has always been an issue, since the language's creation. It has absolutely naught to do with optimizations.
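A trivial, hand-written illustration (the exact outcome depends on the compiler and the stack layout, since the access is undefined behaviour):

    #include <stdio.h>

    int main(void)
    {
        int canary = 42;
        int buf[4];
        buf[4] = 0;                 /* one past the end: no check, no diagnostic required */
        printf("%d\n", canary);     /* may print 42, 0, or anything else at all */
        return 0;
    }

The equivalent indexing in Ada raises Constraint_Error, or is rejected outright when the compiler can see it.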
However, before multicore processors started entering the mainstream (in the years immediately after 2000) there was little need to specify these aspects at all [10], and so no programming language that I know of did so.
Ada has had a concurrency mechanism, or whatever we wish to call it, since its creation in 1983; they're called tasks. Ada 95 added protected objects as a passive form of concurrency, or whatever.
Although I've slightly lost track of the current state of such matters, I believe that there are still edge cases that mainstream language specifications do not adequately cover.
That's only because they're defined by those with no business defining a programming language.
I would like to add (for the sake of completeness :D) a curious case of a modern "language project" that's (much) more mainstream than Scheme and Standard ML and yet features a formal specification of both syntax and semantics: WebAssembly. :)
Granted, it's mostly a specification of a VM, but it does specify a textual format that's even a bit higher-level than JVM/CLR bytecode. Besides, the JMM is also specified at the level of the VM, not really "Java-the-language". Anyway, Wasm's position in between a language and a VM only makes it more curious to consider, in my view. :)
This is a very powerful tool for finding bugs in compilers, IF you can effectively generate well-defined programs automatically: millions and millions of weird test cases for little manual effort, and you can even automatically minimize the test inputs when a bug is detected. The big obstacle Csmith faced in doing this for C was generating programs that avoid all the umpteen different undefined behaviors in C.
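To illustrate why that's an obstacle (this is a hand-written toy, not a Csmith-generated case): a test program containing undefined behavior is useless for differential testing, because two conforming compilers, or two optimization levels of the same compiler, may legitimately disagree on its output.

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        int x = INT_MAX;
        /* Signed overflow is undefined behavior: an optimizing build may fold
           x + 1 > x to 1, while a build that actually wraps around prints 0.
           Neither answer points to a compiler bug. */
        printf("%d\n", x + 1 > x);
        return 0;
    }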
Re [6], you could probably remove the undefined behaviour by simply casting the pointers to integers before comparison (using uintptr_t as the destination type would be my suggestion). The standard notes that:
"Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type."
So there is still theoretically room for undefined behaviour, but really only if pointers are larger than the chosen integer type.
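Something like this is what I have in mind (just a sketch with a hypothetical helper name; the resulting order is still only implementation-defined, but it is no longer undefined):

    #include <stdint.h>

    /* Relational comparison of pointers into different objects is undefined
       behaviour; comparing their integer representations is merely
       implementation-defined. */
    static int ptr_before(const void *a, const void *b)
    {
        return (uintptr_t)a < (uintptr_t)b;
    }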