Summary by Dan Luu on the question about whether for statically typed languages, objective advantages (like having measurably fewer bugs, or solving problems in measurably less time) can be shown.
If I think about this, authors of statically typed languages in general at their beginning might not even have claimed that they have such advantages. Originally, the objective advantage was that for computers like a PDP11 - which had initially only 4 K of memory and a 16-bit adress space - was that something like C or Pascal compilers could run on them at all, and even later C programs were much faster than Lisp programs of that time. At that time, it was also considered an attribute of the programming language whether code was compiled to machine instructions or interpreted.
Todays, with JIT compilation like in Java and the best implementation of Common Lisp like SBCL being at a stone’s throw of the performance of Java programs, this distinction is not so much relevant any more.
Further, opinions might have been biased by comparing C to memory-safe languages, in other words, when there were perceived actual productivity gains, the causes might have been confused.
The thing which seems more or less firm ground is that the less lines of code you need to write to cover a requirement, the fewer bugs it will have. So more concise/expressive languages do have an advantage.
There are people which have looked at all the program samples in the above linked benchmark game and have compared run-time performamce and size of the source code. This leads to interesting and sometimes really unintuitive insights - there are in fact large differences between code sizes for the same task between programming languages, and a couple of different languages like Scala, JavaScript, Racket(PLT Scheme) and Lua come out quite well for the ratio of size and performance.
But given all this, how can one assess productivity, or the time to get from definition of a task to a working program, at all?
And the same kind of questions arise for testing. Most people would agree nowadays that automated tests are worth their effort, that they improve quality / shorten the time to get something working / lead to fewer bugs. (A modern version of the Joel Test might have automated testing included, but, spoiler: >!Joel’s list does not contain it.!<)
Testing in small units also interacts positively with a “pure”, side-effect-free, or ‘functional’ programming style… with the caveat perhaps that this style might push complex I/O functions of a program to its periphery.
It feels more solid to have a complex program covered by tests, yes, but how can this be confirmed in an objective way? And if it can, for which kind of software is this valid? Are the same methodologies adequate for web programming as for industrial embedded devices or a text editor?


Well, you can conclude anything using your reasoning, but that does give the high degree of certainty that is sought after in the studies reviewed in the article.
Again, I’m not saying that I don’t believe static type checkers are beneficial, I’m just saying we cannot say that for sure.
It’s like saying seat belts improve crash fatality rates. The claim seems plausible and you can be a paramedic to see the effects of seat belts first-hand and form a strong opinion on the matter. But still, we need studies to inspect the impact under scrutiny. We need studies in controlled environments to control for things like driver speed and exact crash scenarios, we need open studies to confirm what we expect really is happening on a larger scale.
Same holds for static type checkers. We are paramedics, who see that we should all be wearing seat belts of type annotations. But it might be that we are some subset of programmers dealing with problems that benefit from static type checking much more than average programmer. Or there might be some other hidden variable, that we cannot see, because we only see results of code we personally write.
No I disagree. There are some things that it’s really infeasible to use the scientific method for. You simply can’t do an experiment for everything.
A good example is UBI. You can’t do a proper experiment for it because that would involve finding two similar countries and making one use UBI for at least 100 years. Totally impossible.
But that doesn’t mean you just give up and say “well then we can’t know anything at all about it”.
Or closer to programming: are comments a good idea, or should programming languages not support comments? Pretty obvious answer right? Where’s the scientific study?
Was default case fallthrough a mistake? Obviously yes. Did anyone ever do a study on it? No.
You don’t always need a scientific study to know things to a reasonable certainty and often you can’t do that.
That said I did see one really good study that shows Typescript catches about 15% of JavaScript bugs. So we don’t have nothing.
Just because we cannot prove something, doesn’t mean that we can treat strong claims the same way as proven hypnosis. If we cannot prove that UBI is overall beneficial, we just cannot believe it with the same certainty that we would if we had a bunch of studys on our side.
Look, I’m not saying that we have nothing - I’m just saying that what we have are educated guesses, not proven facts. Maybe “open question” was too strong of a term.
Yeah I agree. Scientific studies are usually a higher standard of proof. (Though they can also be wrong - remember “power poses”?) So it’s more like we’re 80% sure instead of 90%.
The problem with typing is that it does cost time to apply it - the stronger the typing, the more time. And at some point, the extra time spent to get the typing working will inhibit other things that would improve quality, such as testing, or writing detailed, clear specifications. For example, if stronger typing is really always better and Rust is better than C++ or Go, then why isn’t it better to write instead everything in Haskell, which has a stronger type system than Rust? Or F# instead of Java? And why not annotate everything that involves threads in TLA+ even if it is known to be very time-consuming?
Though I agree that there is a counter-argument which is related to the conditions of modern commercial software development: Anything that improves quality does cost a bit of time. In much of commercial software development, there is a lot of time pressure, so there is a strong incentive to cut all these corners and produce the fastest thing that somehow works.
Now, with languages like Rust, the code would not compile and work without the typing being right. So that’s kind of a protection or dam against that pressure. You can say: “Boss, alright, we need that as soon as possible - just wait a minute, I need to make it compile” And perhaps you can say: “See, our competitors use Rust and they ship faster”.
Because that’s very far from the only difference between Haskell and Rust. It’s other things that make Haskell a worse choice than Rust most of the time.
You are right in that it’s a spectrum from dynamically typed to simple static types (something like Java) to fancy static types (Haskell) then dependent types (Idris) and finally full on formal verification (Lean). And I agree that at some point it can become not worth the effort. But that point is pretty clearly after and mainstream statically typed language (Rust, Go, Typescript, Dart, Swift, Python, etc).
In those languages and time you spend adding static types is easily paid back in not writing tests, debugging, writing docs, searching code, screwing up refactoring. Static types in these languages are a time saver overall.
I really would like to believe that. And, as I said, I use Rust for larger, more complex programs with high requirements to performance.
But again, the statement in the second cited sentence - what proof does exist that this is true? I get that you can spend time and effort and get better correctness. I also get that better ergononics - like the far better error messages which the Rust compiler has compared to a C++ compiler - can possibly offset or perhaps even outweight the extra effort. And I think this probably increases with a programs size, complexity, and life time.
But to say it is always less effort if you use Rust or Idris, for any kind of program - I think that, even when subtracting the amount of becoming familiar with it, well this is a strong statement, and I’d like to see some evidence for it.
The evidence is that I have tried writing Python/JavaScript with/without type hints and the difference was so stark that there’s really no doubt in my mind.
You can say “well I don’t believe you”… in which case I’d encourage you to try it yourself (using a proper IDE and use Pyright; not Mypy)… But you can equally say “well I don’t believe you” to scientific studies so it’s not fundamentally different. There are plenty of scientific studies I don’t believe and didn’t believe (e.g. power poses).