Hi all, I wrote down some thoughts on colorless async. See below! I'm not experienced with goroutines etc.; I write from a language design perspective, so I might be totally wrong or have misconceptualized something. I'm in the process of learning to think about this stuff and would be delighted to get feedback and to think together about the topic of designing async into a language!
Colorless async seems obviously great, but what if the opposite is also true?
I read the blog post "What Color is Your Function?" on ways to integrate async into a language. It's an entertaining and convincing read by a super experienced developer. The article leads you through a chain of very obvious thoughts that you cannot imagine even wanting to contradict, ending up with the understanding that the best solution is to enable using sync and async functions interchangeably, e.g. by supporting go-routines.
I was totally convinced. But then I thought: what if you go all the way in the other direction, embracing that sync and async are fundamentally different in their semantics? What if that difference was present at the language level, instead of being hidden by magic?
This was done decades ago, with superb usability results, in the form of state machines and most notably statecharts. "Red" (sync) functions are transitions. "Blue" (async) functions are states. Thinking of them and visualizing them as distinctly different is a game-changer for the human mind: a top-down understanding can be conveyed to non-programmers and programmers alike. In the typical visualization, sync stuff is not "red functions"; it's arrows. Async stuff is not "blue functions"; it's rectangles. Arrows and rectangles compose perfectly, precisely because of their differences. It's hard to see what's gained conceptually by merging arrows and rectangles into one thing.
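To make the arrows-and-rectangles idea concrete, here is a minimal sketch of an explicit state machine in JavaScript. All state and event names are invented for illustration and not taken from any statechart library; the point is only that the sync transitions (arrows) and the places where the program waits (rectangles) are separate, visible things:

```javascript
// States ("rectangles") are where the program rests, e.g. while async
// work is in flight. Transitions ("arrows") are pure sync lookups.
const machine = {
  state: "idle",
  transitions: {
    idle:    { START: "loading" },
    loading: { DONE: "ready", FAIL: "error" },
    ready:   {},
    error:   { RETRY: "loading" },
  },
  // A sync "red" step: given an event, move along an arrow if one exists.
  send(event) {
    const next = this.transitions[this.state][event];
    if (next) this.state = next;
    return this.state;
  },
};

machine.send("START"); // -> "loading": the waiting happens *in* the state
machine.send("DONE");  // -> "ready"
```

The top-down structure (which states exist, which arrows leave each one) is readable without running anything, which is the property the statechart visualizations exploit.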
Ray Imber asked, "Isn't CPU scheduling a state machine by definition?" The point isn't that using stacks avoids state machines; it's that stacks do that work for you.
It's like automatic garbage collection. The point isn't that memory management isn't done, it's that it's done for you.
Writing your own state machines instead of using stacks is like programming with GOTOs instead of using functions. It's technically possible, and in theory may be more efficient, but there are reasons why it's no longer considered good practice.
> It's like automatic garbage collection. The point isn't that memory management isn't done, it's that it's done for you.
I don't think it's the difference between manual memory management and GC; it's the difference between reference counting and GC. Both pre-emptive multitasking and coroutines are automatic scheduling algorithms. Neither requires you to build the state machine yourself. The difference is how you influence the scheduling: pre-emptive schedulers don't take any input from your program, whereas coroutines provide "yield" points that act as hints, bringing the scheduler more in line with the needs of your program.
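A minimal sketch of the "yield points as scheduler hints" idea, using plain generators (the task and scheduler names are invented for illustration): each task marks exactly where it is willing to be paused, and the scheduler decides when each one actually resumes.

```javascript
// A cooperative task: `yield` is the explicit scheduling point.
function* task(name, log) {
  log.push(`${name}:step1`);
  yield; // "scheduler, you may switch here"
  log.push(`${name}:step2`);
}

// A trivial round-robin scheduler: resume each task up to its next yield.
function runRoundRobin(tasks) {
  const queue = [...tasks];
  while (queue.length) {
    const t = queue.shift();
    if (!t.next().done) queue.push(t); // not finished: requeue it
  }
}

const log = [];
runRoundRobin([task("A", log), task("B", log)]);
// log is ["A:step1", "B:step1", "A:step2", "B:step2"]
```

Unlike pre-emption, nothing here can be interrupted between the lines the task itself wrote; the only switch points are the yields the task chose.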
> is like programming with GOTOs instead of using functions
I don't think that's accurate at all. Are you saying all state machines can be represented as coroutines? This duality seems much more like the Church-Turing duality of lambda calculus vs. Turing machines. It's theoretically important but not practical by itself.
I'm also not convinced your argument is true if that's the case. Coroutines deal with a particular class of state machines (those dealing with scheduling), they are not appropriate for all state machines.
Are you saying that all state machines are better represented as coroutines? If that's the case, then this is very clearly subjective. That is an argument similar to the difference between functional and imperative programming. There is no evidence that one is universally better than the other. Some things are better represented functionally, and some imperatively.
Anecdotally, I dislike the GOTO argument. It misses too much nuance, in the same way that "functional programming is always better" is a useless statement. Both Knuth and Dijkstra himself walked back the "GOTO considered harmful" statements.
Donald E. Knuth:
> I believe that by presenting such a view I am not in fact disagreeing sharply with Dijkstra's ideas, since he recently wrote the following: "Please don't fall into the trap of believing that I am terribly dogmatical about [the go to statement]. I have the uncomfortable feeling that others are making a religion out of it, as if the conceptual problems of programming could be solved by a single trick, by a simple form of coding discipline!"
https://pic.plover.com/knuth-GOTO.pdf
Programming abstractions should be a ladder that you can walk both up and down depending on the engineering needs of the particular problem. Dogma has no place in engineering, imo. To be fair, we are all human, so that is more of an aspiration than the reality.
> coroutines allow us to avoid the complexity of state machines (by using stacks) without making it impossible to write correct code (as preemption does).
> ...
> This is unlike multiple threads preemptively writing on the same memory at the same time, which is an environment where even the world's top experts have been shown to be unable to write correct code.
Sorry, I don't buy it. Async lets you write more efficient code around IO scheduling. It solves a particular problem, that is completely orthogonal to correctness or memory safety. Coroutines are about "when", not "where".
Coroutines do not solve multithreaded memory safety. Async and parallelism are not the same thing. You can still have race conditions with coroutines (Go has had plenty of bugs showing this). Multiple coroutines can still be scheduled on different CPU cores and write to the same memory. You need something like the Rust borrow checker to solve that problem, not coroutines.
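A minimal sketch of how a race condition survives cooperative scheduling: a read-modify-write split across a suspension point loses an update, even with a fully deterministic, single-threaded interleaving. Generators stand in for coroutines here, and all names are invented for illustration:

```javascript
// Shared state that two "coroutines" both update.
const account = { balance: 100 };

function* deposit(amount) {
  const seen = account.balance;    // read
  yield;                           // suspension point (e.g. awaiting I/O)
  account.balance = seen + amount; // write based on a possibly stale read
}

// Interleave two deposits: both read 100 before either writes.
const a = deposit(10);
const b = deposit(20);
a.next(); // a reads 100, suspends
b.next(); // b reads 100, suspends
a.next(); // a writes 110
b.next(); // b overwrites with 120

// account.balance is now 120, not the 130 we'd expect: a lost update.
```

No pre-emption and no second thread were needed; the bug is in the logic spanning the suspension point, which is why a borrow checker or locking discipline is a separate concern from the scheduling model.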
Coroutines also have the advantage (depending on how they are implemented) of having far smaller (and extendable) stacks than preemptive threads, which allows them to scale to several orders of magnitude more concurrency than preemptive/OS threads.
This is the problem coroutines solve. Coroutines deal with optimization, not correctness.
Ray Imber, I'm saying that in almost every case in which single-stack async is used (say, in JavaScript), the proper use of coroutines (i.e. multi-stack async, with I/O wrappers and a scheduler) would result in significantly easier-to-write/read/debug/maintain code (as well as less code). The larger and more complex the app, the larger this difference will tend to be.
I think this exists already: it's called Node.js, and, afaik, it did not make large complex apps easier to write/read/debug/maintain.
From what I've seen, adding richer type systems with things like typescript or one of the myriad of other languages that use javascript as a compile target have had better success on those fronts.
Node.js has async, yield, and iterators. That's the required building blocks for coroutines. See the many coroutine libraries on npm for examples. All of this is based on CSP Theory from Hoare. Just varying levels of abstraction.
https://medium.com/@adambene/async-await-vs-coroutines-vs-promises-eaedee4e0829
Most Lisps implement coroutines the same way, btw: async, yield, and iterators.
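A minimal sketch of the generator-driver pattern those npm coroutine libraries use: a generator yields promises, and a driver resumes it with each resolved value. This is a simplification of the pattern (the function names here are invented), but it shows that yield + iterators + promises are enough to build something that reads like async/await:

```javascript
// Drive a generator to completion, feeding resolved promise values back in.
function run(genFn) {
  return new Promise((resolve, reject) => {
    const gen = genFn();
    function step(value) {
      const { value: yielded, done } = gen.next(value);
      if (done) return resolve(yielded);
      Promise.resolve(yielded).then(step, reject); // resume on resolution
    }
    step();
  });
}

// Usage: straight-line code, no .then chains, no async keyword.
run(function* () {
  const a = yield Promise.resolve(1);
  const b = yield Promise.resolve(2);
  return a + b;
}).then((sum) => console.log(sum)); // prints 3
```

This is exactly the "varying levels of abstraction" point: async/await is roughly this transform baked into the language.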
Look. You can do some cool stuff with async and coroutines. But it's just another tool in the toolbox, imo. In fact, my experience shows that async and coroutines make debugging more difficult, not less. Coroutines expose more things for the programmer to think about, not fewer.
Standard sequential architecture hides all scheduling from the programmer. The OS or runtime pre-empts your code at arbitrary points. It's invisible to you. You don't think about it. It's like Virtual Memory. Virtual Memory creates the illusion that you have infinite RAM (and the OS in the background pages memory in and out of your process without your program having any knowledge of it at all.)
But, there are downsides. A database really needs control over when things are paged in and out of memory. But I don't want all programs to do this. It makes life harder on the programmer. I want the option to pick the right tool for the job. For databases Virtual memory is the wrong tool, for some other application it's the right one.
Pre-emptive scheduling is the same idea. You write your code as if there were a single CPU core and your program were the only thing running. Then the OS will sneakily pause your program and let some other program run, you being none the wiser. This is a great abstraction!
By contrast, coroutines explicitly expose the scheduling points to the user. Preemptive scheduling has the disadvantage of only being able to proceed as fast as its slowest I/O call. If you are building a web server, preemptive scheduling is the wrong tool. You want to process a lot of I/O without having to wait for the slowest call; you really want something like coroutines.
But! Now you have many possible orders of execution that you have to keep track of when debugging. The illusion of a single linear order of execution is gone. You have added more states to your DFA, not fewer.
I have some experience in this area. I am actually a big fan of coroutines! I spent a year working with the async team to add coroutines and CSP to the Nim language. I read a lot of research on this stuff, and I have experience with the inner workings of both the language transforms and the schedulers. CSP has some cool properties. I'm not denying that. It certainly isn't some magic paradigm-shifting solution to modern software architecture, though.
Here is an example where (formal) CSP works really well. I would like to point out that it is used as an explicit tool to model state machines, explicitly modeling all the states, not hiding them:
https://www.reaktor.com/blog/why-csp-matters-ii-how-do-i-know-sync-works/
Another really cool use of CSP principles is delimited continuations, as seen in Racket: https://docs.racket-lang.org/more/index.html#(part._.Continuations)
It's a way to do AJAX style stateful request/response, using only a single handler. It combines server routing and I/O scheduling of the network call together. IMO this results in much more readable and clean code. Iirc Hacker News uses this trick, still to this day. I wish more modern languages and servers supported the technique.
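The request/response-as-continuation style can be sketched with a plain generator, where each yield "sends a page" and suspends until the next request resumes the flow. This is only a simulation (real delimited continuations, as in Racket, are more general), and all names here are invented for illustration:

```javascript
// One straight-line handler for a multi-step interaction. Each `yield`
// models sending a response; the value passed to the next `next()` models
// the data from the user's next request.
function* signupFlow() {
  const name  = yield "page: what is your name?";
  const color = yield `page: hi ${name}, favorite color?`;
  return `done: ${name} likes ${color}`;
}

const session = signupFlow();
session.next();                 // serve the first page
session.next("Ada");            // the next "request" resumes the flow
const result = session.next("green").value;
// result === "done: Ada likes green"
```

The server routing and the "where was this user in the flow?" bookkeeping collapse into ordinary local variables, which is what makes the style so readable.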
Did you read the article linked in the original post for this thread? ( https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ ) I think it explains that JS's async/await is not the same as coroutines.
I have read the article many times. Did you read my responses? Let me be more clear. JS doesn't have coroutines, but it has the building blocks to make coroutines. Ok, so that isn't at the language runtime level, but it amounts to the same thing.
From the article:
> This is where the "red functions can only be called by red functions" rule comes from. You have to closurify the entire callstack all the way back to main() or the event handler.
What's the difference between "closurify the entire callstack" and "reified callstacks"?
An implementation detail, that's what. They are semantically equivalent.
You implement a coroutine scheduler using a trampoline at the main and a set of execution stacks.
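A minimal sketch of that: a trampoline loop at "main" bouncing between generator-backed execution stacks. Each generator is a reified, resumable call stack; the names here are invented for illustration:

```javascript
// The trampoline: repeatedly resume each live "stack" until all finish.
function trampoline(coroutines) {
  const stacks = coroutines.map((gen) => ({ gen, done: false }));
  const output = [];
  while (stacks.some((s) => !s.done)) {
    for (const s of stacks) {
      if (s.done) continue;
      const r = s.gen.next();    // resume this execution stack
      if (r.done) s.done = true;
      else output.push(r.value); // collect whatever it yielded
    }
  }
  return output;
}

function* worker(name) {
  yield `${name}: started`;
  yield `${name}: finished`;
}

const order = trampoline([worker("A"), worker("B")]);
// order is ["A: started", "B: started", "A: finished", "B: finished"]
```

Whether the runtime reifies the stack for you (goroutines) or you closurify it by hand (callbacks), the shape of the machinery is this loop.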
> Go is the language that does this most beautifully, in my opinion. As soon as you do any IO operation, it just parks that goroutine and resumes any other ones that aren't blocked on IO.
You know that "goroutines" are just fancy closures, right?
When you write the Go code `read("file.txt")`, it's essentially transformed into `await read("file.txt")`.
There is a difference between what is possible and what is practical. For example, if I try to use some open source JS code, will everything be written in async/await? Or will I have to rewrite it and potentially maintain the async/await version with future updates of the module? What about all of its dependencies? Do I update and maintain all of those too? While these barriers are theoretically surmountable given unbounded time and resources, they are practically non-starters for most programmers. Coroutines would remove these barriers.
If you want an example of something closer to Go, see my original Zig example at the top of the thread. It works very similarly to Go, but without the magic that hides it from you.
From the zig docs:
> The point here is that the amain function, which is the demo of typical async/await usage, works in both an async context and a blocking context. The programmer was able to express the inherent parallelism of the logic, without resorting to function coloring.
> There is admittedly a bit of boilerplate in the example. Here's the tracking issue for that.
> Now for the related Standard Library updates:
> This introduces the concept of "IO mode" which is configurable by the Root Source File (e.g. next to `pub fn main`). Applications can put this in their root source file:
I personally think the Zig solution is much better than the Go solution.
> There is a difference between what is possible and what is practical. For example, if I try to use some open source JS code, will everything be written in async/await? Or will I have to rewrite it and potentially maintain the async/await version with future updates of the module? What about all of its dependencies? Do I update and maintain all of those too?
That's a good argument, though as much a political one as a technical one. It's more practical from a technical standpoint than you might think, though. It's possible at the compiler level to use program transforms, i.e. lisp-style macros (what Nim and Lisp do) or compiler passes (what Zig and Go do), to automate a lot of that, at least from the standpoint of the standard library. Just make the standard library I/O calls async, and automatically transform the callers into delimited continuation closures. Admittedly, that is fairly invasive and not foolproof (depending on how the caller code was written).
"It's more practical from a technical standpoint than you might think though..." If you write and maintain it, I would love to use (and advocate for) it - assuming it worked well with debuggers and didn't wreck performance. IMO this is the greatest weakness of JS at the moment.
You are right about JS. There would have to be a lot of changes there. It's most practical at the language level, where you can uniformly enforce the transforms. In terms of performance, reification of closures is something that can be heavily optimized. (I think V8 does a lot for this already, but I don't remember all the details). Debugging is another story. I have yet to see good debugger support for concurrent code in any language. That's the main reason I think concurrent code is not a silver bullet. If you had a solution for that, I would love to use and advocate for it!
You could run some "macro expander" program on top of your JS code and all dependencies. I guess during the "bundler" build step in the javascript world. I think something like this is what Vue.js does for their component language?
But at that point it's not JS anymore, is it? You are basically building a new language that compiles to JS... Are DSLs that get macro-expanded in Lisp still Lisp? :)
I've been out of the JS world a long time, looks like React already added coroutines (specialized for react components): https://blog.logrocket.com/deep-dive-react-fiber/