For example, specifications such as Python's often deliberately leave open the possibility of an implementation caching certain objects, since that can make programs faster (even though users can then observe seemingly surprising behaviour, such as in the example in the post you're referencing).
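To make that concrete, here's a minimal sketch of the kind of observable caching behaviour I mean, using CPython's small-integer cache (the exact results are implementation-defined, which is precisely the point -- other implementations are free to behave differently):

    a = 256
    b = int("256")   # CPython caches small ints, so this returns the cached object
    print(a is b)    # typically True on CPython

    c = 257
    d = int("257")   # 257 falls outside CPython's small-int cache
    print(c is d)    # typically False on CPython; other implementations may differ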
As an example of "impractical" - albeit at the far end of things! - one can often tell which implementation one is running on by timing parts of a program at run-time. A specification could remove that way of observing differences by mandating that every operation takes a precise amount of time to run -- but that would leave us disappointed when we buy a faster computer and our programs run at the same speed as before!
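For instance, a run-time check along these lines can hint at which implementation is executing a program (the helper name, loop size, and the reading of the result are illustrative assumptions, not a reliable fingerprinting technique):

    import time

    def time_loop(n=10_000_000):
        t0 = time.perf_counter()
        total = 0
        for i in range(n):
            total += i
        return time.perf_counter() - t0

    print(f"loop took {time_loop():.3f}s")
    # A JIT-compiling implementation such as PyPy will usually run this loop
    # far faster than CPython does, so the elapsed time alone is a strong hint
    # about which implementation is underneath.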
Finally, as we've looked more and more to programming languages to squeeze out extra performance, language implementers have increasingly started to examine language specifications with lawyer-like eyes, to find flexibility where none might have been intended. Simplifying grossly: since writing specifications (in natural or formal languages) is hard, there are nearly always unintended gaps, and clever language implementers exploit those gaps to make programs run faster while still (technically) adhering to the specification.
I think BF is a great choice for illustrating interpretation in general, but it poorly illustrates the advantages of bytecoding, as its operators are already single characters.
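To illustrate, here's a minimal sketch of a direct (non-bytecoded) BF interpreter in Python (the function names are mine, purely for illustration): each source character already names exactly one operation, so the usual "translate to compact opcodes first" step has almost nothing left to do -- the only precomputation here is the bracket-matching table.

    def run_bf(src, get_input=lambda: "\0"):
        tape = [0] * 30000
        ptr = 0
        # Precompute matching bracket positions so [ and ] can jump directly.
        jumps, stack = {}, []
        for i, ch in enumerate(src):
            if ch == "[":
                stack.append(i)
            elif ch == "]":
                j = stack.pop()
                jumps[i], jumps[j] = j, i
        pc, out = 0, []
        while pc < len(src):
            ch = src[pc]
            if ch == ">":
                ptr += 1
            elif ch == "<":
                ptr -= 1
            elif ch == "+":
                tape[ptr] = (tape[ptr] + 1) % 256
            elif ch == "-":
                tape[ptr] = (tape[ptr] - 1) % 256
            elif ch == ".":
                out.append(chr(tape[ptr]))
            elif ch == ",":
                tape[ptr] = ord(get_input()[0]) % 256
            elif ch == "[" and tape[ptr] == 0:
                pc = jumps[pc]
            elif ch == "]" and tape[ptr] != 0:
                pc = jumps[pc]
            pc += 1
        return "".join(out)

    print(run_bf("++++++++[>++++++++<-]>+."))  # prints "A"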
How to distinguish between a compiler and an interpreter? One way is to look at the time it takes. A compiler should run in time roughly proportional to the size of the source, whereas the runtime of an interpreter is determined by how much work the source program has to do for a given input. For example, suppose the source program reads a number and then has a loop whose trip count is this number. The time it takes to compile this program will be independent of the number, in contrast to the time it takes to interpret the program. The CPU, in effect, is an interpreter of machine code (although many CPUs actually translate fragments of the program from the advertised instruction set to an internal one which is more efficiently interpreted in hardware).
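Python itself makes this easy to see, since the bytecode compiler and the interpreter are separately exposed. In the sketch below (the source string and the choice of trip counts are illustrative assumptions), the compile step's cost is independent of how many iterations the loop will run, while the execution step's cost is not:

    import time

    src = "total = 0\nfor i in range(n):\n    total += i\n"

    t0 = time.perf_counter()
    code = compile(src, "<example>", "exec")   # translation: cost depends on source size
    t1 = time.perf_counter()

    for n in (10_000, 10_000_000):
        t2 = time.perf_counter()
        exec(code, {"n": n})                   # execution: cost depends on the trip count
        t3 = time.perf_counter()
        print(f"n={n}: compile {t1 - t0:.6f}s, run {t3 - t2:.6f}s")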
I spent some time agonizing over these topics for the class on VMs I taught some years ago (www.wolczko.com/CS294). I concluded that a key characteristic of interpretation is that, at some granularity (both in program space and in time), an interpreter is context-free -- it does not consider the context of the larger program or execution history when it is executing a given language construct. In contrast, a compiler usually "knows" about the context at least up to the level of the surrounding function/method, and maybe a lot more than that. Some people may disagree with that claim, but I suspect we can all agree that coming up with a crisp definition that distinguishes interpretation from compilation is surprisingly difficult!