"Are there any languages with transactions as a first-class concept?" reddit.com/r/ProgrammingLanguages/comments/10gylhm/are_there_any_languages_with_transactions_as_a
Would be interested to hear the thoughts of folks here on this thread.
"SQL" is not the answer the poster wanted, but arguably the correct one. :D
Transactions do figure into my thoughts about language design, and OS/HCI design in general, but I haven't clarified it enough to talk much about it. I'm also very interested in the community's thoughts here.
The "predict the effect of this code without running this code" aspect of some of the examples runs afoul of the Halting Problem in the general case. đ€·ââïž
For me 2022 was: 0D, transpiler pipelines, and DaS.
Explanations below.
There is nothing "new" here. I believe that our crop of programming languages subtly discourages certain kinds of thoughts. You can do these things with our programming languages, but, you don't bother.
[I wrote this on Jan 1/2023. Then I promptly got sick and found new ways to procrastinate. I will gladly remove this if it is inappropriate or too long...]
0D is a short-hand for the phrase zero dependency. Total decoupling.
Programmers already know how to write 0D code, but, they tangle this simple concept up with other concepts and call the result "parallelism".
At a very, very basic level, you can achieve 0D by using FIFOs instead of LIFOs (queues vs stacks). LIFOs - callstacks - are good for expressing synchronous code. LIFOs are less-good for expressing asynchronous code.
Programmers often conflate nested, recursive functions with the notion of pipelines. If a component sends itself a message, the message is queued up in FIFO order and there is a "delay" before the message is processed, whereas if a component recursively calls itself, the function parameters are pushed onto a stack and the processing happens immediately, in LIFO order. This subtle difference in processing sequence manifests itself in design differences.

For example, in electronics - where all components are asynchronous by default - you often see the use of "negative feedback", say in Op-Amp designs. You rarely see this technique used in software design. In electronics, negative feedback is used by components to self-regulate, whereas in software, recursion is used as a form of divide and conquer. Feedback loops make it possible to be explicit about software design, whereas recursion hides the key element - the callstack - of the design.
EEs had this issue sussed out before the advent of the "everything must be synchronized" mentality.
All components in an electronic circuit are asynchronous by default. Synchrony is not designed-in everywhere by default; it is judiciously, explicitly designed-in on an as-needed basis, through the use of protocols. There is a reason - a subtle reason - why it is easy to draw diagrams of computer networks and not-so-easy to draw diagrams of synchronous code.
In EE designs, concurrency is so cheap that you can't help but use it. In software, concurrency implies difficulty and designers end up avoiding concurrency in their designs.
This subtle difference has a trickle-down effect on end-user code. When it is difficult to draw diagrams of programs and to snap components together, programmers tend not to provide such features to end-users. Or, when they provide such features, they implement them under duress. If DaS (Diagrams as Syntax, explained below) and snappable components were abundantly available, such features would naturally leak through to end-user apps.
0D can be implemented a lot more efficiently than by using operating system processes and IPCs. Most modern programming languages support closures (anonymous functions) and make it easy to build queue data structures. Stick one queue at the front of a closure - the "input queue" - and one queue at the tail of a closure - the "output queue" - and you get 0D. Then, you need to write a wrapper component that routes "messages" from the output queue of one closure to the input queue of another closure. Can this concept be generalized? This ain't rocket science.
When you build 0D software components, does the order-of-operation of components matter? Nope. Can a 0D component create more than one result during its operation? Yep. Can a 0D component directly refer to another 0D component? Nope. The best you can do is to compose networks of 0D components inside of routing wrappers.
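To make this concrete, here is a minimal sketch in Python - my own toy encoding, not any existing library; the names Component, step, and run are invented for illustration. Each closure gets an input FIFO and an output FIFO, and a routing wrapper moves messages between components according to an explicit wiring list.

from collections import deque

class Component:
    # A 0D component: an input FIFO, an output FIFO, and a step function.
    def __init__(self, func):
        self.inq = deque()    # input queue
        self.outq = deque()   # output queue
        self.func = func      # the closure that does the actual work

    def step(self):
        # Process one queued message, if any; results go on the output queue.
        if self.inq:
            msg = self.inq.popleft()
            for result in self.func(msg):
                self.outq.append(result)

def run(components, wiring):
    # Routing wrapper: components never refer to each other directly; the
    # wrapper moves messages along declared (sender -> receiver) wires.
    while any(c.inq for c in components):
        for c in components:
            c.step()
        for sender, receiver in wiring:
            while sender.outq:
                receiver.inq.append(sender.outq.popleft())

# A two-stage pipeline: double each number, then stringify the result.
doubler = Component(lambda n: [n * 2])
printer = Component(lambda n: ["result: %d" % n])

doubler.inq.extend([1, 2, 3])
run([doubler, printer], [(doubler, printer)])
print(list(printer.outq))   # ['result: 2', 'result: 4', 'result: 6']

The wiring list is the only place where doubler and printer are mentioned together. Neither component names the other, so either one can be cut/pasted into a different network unchanged, and a component can emit zero, one, or many results per step.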
It would be nice to build up solutions using pipelines of many little solutions and syntaxes made expressly for those solutions.
What do you need to be able to do this?
1) You need to be able to write grammars that are very, very small and that allow you to "ignore" bits of syntax that don't pertain to a problem, e.g. kind-of like REGEX, but, better.
2) Total isolation of building blocks.
Ohm-JS - a derivative of PEG (Parsing Expression Grammars) - makes it possible to write grammars that skip over uninteresting bits of text.
For example, if you want to write a quickie parser for C code, you might want to say:
... function-name (...) {...}
In Ohm-JS, you can say this, whereas in a CFG-based parser generator you need to over-specify all of the niggly bits of C syntax. In Ohm-JS, this results in a few minutes of work and only a few lines of code. The Ohm-Editor assists in developing the micro-grammar.
In YACC and CFG-based approaches, though, you're looking at a gargantuan job (days, weeks, months, ...) and you simply don't bother to write such a quickie parser. You either don't bother with the whole idea, or you use something like REGEX, which fails on a number of edge-cases for this kind of thing: REGEX can't search recursively for matching brackets; Ohm-JS can. Using REGEX, you might get away with a partial solution, or the project might grow larger as you hit unexpected speed bumps. You either persevere or you just give up.
For the record, the grammar plus the accompanying code fabricator specification for the above simple example are shown in the appendix.
When you can build totally isolated building blocks, you can draw sensible diagrams of how the building blocks should be snapped together to solve a problem.
Later, you can steal (cut/copy/paste) chunks of previous solutions and use them as building blocks for new problems.
DaS: Diagrams as Syntax.
DaS is not diagrams as an Art Form. DaS is diagrams as programming languages. For example, instead of writing {...}, you draw a rectangle.
Programming languages were created by culling the English language and by choosing only the words and phrases that could be compiled to executable code.
Can we cull diagrams in the same way to invent new programming languages?
EEs have done this and they call the resulting diagrams "schematics".
Building construction engineers have done this and call the resulting diagrams "blueprints".
"Code Libraries" look like building blocks, but, contain subtle bits of coupling that discourage building-block-iness.
For example, the very common idiom of a function call, f(x), introduces at least 3 kinds of coupling:

1) f is hard-wired into the caller's code. The calling code cannot be cut/copy/pasted into some other solution without also dragging in the called code, or futzing with the source code.

2) f(x) waits for the callee to return a value. This is also known as blocking. Function call notation works fine on paper, where functions can be evaluated instantaneously. It's different when you map function call syntax onto hardware that has propagation delays, wherein functions take finite amounts of time to "run". This subtle difference in behaviour leads to hidden gotchas. A glaring example of the impact of such a difference can be seen in the Mars Pathfinder priority-inversion incident[^pathfinder].

3) v = f(x) hard-wires a routing decision into the callee's code. The callee must direct its response back to the caller. This is called "returning a value". Again, this doesn't look like a problem when you just want to build fancier calculators, but this hard-wired routing decision discourages simple solutions to non-calculator problems, like machine control.

When you don't have complete isolation, you don't have building blocks. Imagine a LEGO® set where all the pieces are joined together with a single, long sewing thread glued to each LEGO® block. Or, you have two real-world objects, e.g. one apple and one orange. You cut the apple in half. What happens to the orange?
As humans, we are used to the idea that objects are completely isolated. Programs don't work that way. We have to stop and think hard when writing programs.
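A small contrast sketch of the three couplings, in the same hypothetical Python conventions as the sketch above:

from collections import deque

def f(x):
    return x * 2

x = 21

# Coupled: the callee's name (f), blocking, and the return route back to
# this very line are all hard-wired into one expression.
v = f(x)

# Decoupled: the component appends to its own output queue. It names no
# receiver, does not block, and may emit zero, one, or many messages.
outq = deque()
outq.append(("out", x))   # ("port-name", payload)
# A routing wrapper - not the sender - later decides where ("out", 21) goes.
print(v, list(outq))      # 42 [('out', 21)]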
If you want to play along with this experiment, the code is in github.com/guitarvydas/cfunc.
A quickie grammar that matches function declarations in a C file.
Note that this grammar is longer than a REGEX, but, is significantly shorter than a CFG specification (LR(k), YACC, etc.) for the C programming language.
Cfunctions {
  program = item+
  item =
    | comment
    | string
    | applySyntactic<FunctionDecl> -- decl
    | any -- other
  FunctionDecl = name "(" param+ ")" "{" block+ "}"
  param =
    | "(" param+ ")" -- nested
    | ~"(" ~")" any -- flat
  block =
    | "{" block+ "}" -- nested
    | ~"{" ~"}" any -- flat
  name = letter (alnum | "_")*
  comment =
    | "//" (~nl any)* nl
    | "/*" (~"*/" any)* "*/"
  string =
    | bqstring
    | dqstring
    | sqstring
  bqstring = "`" (qbq | (~"`" any))* "`"
  dqstring = "\"" (qdq | (~"\"" any))* "\""
  sqstring = "'" (qsq | (~"'" any))* "'"
  qbq = "\\" "`"
  qdq = "\\" "\""
  qsq = "\\" "'"
  nl = "\n"
  spaces += comment
}
Can this grammar be improved and optimized? Probably. But, why would you care?
You would care only if you used this code in an end-user product.
If you use this code in something like a batch-editing environment, "efficiency" takes on a different meaning. End-users don't care about the efficiency of your code editor and its Find-and-Replace function. End-users don't care how efficient your command line tools, like grep, are.
When you treat Ohm-JS + Fab as batch editors for development, only development efficiency matters.
I strongly believe that one shouldn't write code. One should write code that writes code. From this perspective, "efficiency" breaks down into 2 camps:
1) development efficiency - the time and effort it costs developers to build and debug the thing
2) end-user efficiency - how cheaply the final app runs on the end-user's hardware
Note that traditional compilers are simply apps that write code. Developers use compilers. End-users don't care if a developer created end-user app code by hand or by using a compiler. The only things that end-users care about are whether the app is cheap and runs on cheap hardware. The final app is assembler, regardless of how it was created.

Developers, on the other hand, do care about development time and effort. Hand-writing apps requires much more effort than using high-level language compilers to generate the final app code. Debugging apps is easier when using high-level languages with type-checkers. On the other hand, developers usually buy fancier hardware than that which is used by end-users. Developers can afford to burn CPU cycles on their fancy hardware to give themselves faster - and cheaper - development and debugging times.
The final step in development is that of Production Engineering an app to make it cheap-enough to sell. Up until that point, the development workflow should consist of anything that speeds up and cheapens development time, for example, dynamic language environments and REPLs. For example, Rust is a Production Engineering language and needn't be used until the last moment.
A .fab file is a specification that creates strings based on the above grammar. Fab is an experimental transpiler tool that works with Ohm-JS. It generates the JavaScript code required by Ohm-JS. This could all be done with off-the-shelf Ohm-JS; Fab simply reduces the amount of keyboarding needed to create the JavaScript "semantics" code that Ohm-JS requires. Fab is written in Ohm-JS.
Cfunctions {
  program [item+] = ‛«item»'
  item_decl [x] = ‛«x»'
  item_other [x] = ‛'
  FunctionDecl [name lp param+ rp lb block+ rb] = ‛\n«name»'
  param_nested [lp param+ rp] = ‛'
  param_flat [c] = ‛'
  block_nested [lp block+ rp] = ‛'
  block_flat [c] = ‛'
  name [letter c*] = ‛«letter»«c»'
  comment [begin cs end] = ‛'
  nl [c] = ‛«c»'
  spaces [cs] = ‛«cs»'
  bqstring [begin cs* end] = ‛'
  dqstring [begin cs* end] = ‛'
  sqstring [begin cs* end] = ‛'
  qbq [bslash c] = ‛'
  qdq [bslash c] = ‛'
  qsq [bslash c] = ‛'
}
The net effect of the spec: it prints the name of each function declaration, one per line, and erases everything else. The above was tested against grep.c from the GNU grep repo:
git clone git.savannah.gnu.org/git/grep.git
I'm playing with the design of a new tool that I call bred (bracket editor). It's like a super-simple batch editor that walks through text that contains bracketed constructs.
The full specification consists of 2 strings - a pattern and a rewrite. The above specifications might be re-expressed as:
‛«name» («params») {«block»}'
‛«name»'
which reads as: match every «name» («params») {«block»}, then rewrite each match as just «name».
Currently, my concepts have warts - what happens when a comment or a string or a character constant contains brackets, or, even worse, what happens if they contain unmatched brackets?
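For illustration only, here is how I might sketch the core of such a bracket editor in Python - bred's real syntax and behaviour are still in flux, and this toy deliberately ignores the string/comment wart. It matches «name» («params») {«block»} with proper bracket nesting and rewrites each hit to «name».

import re

NAME = re.compile(r"[A-Za-z_]\w*")

def skip_bracketed(text, i, open_ch, close_ch):
    # Return the index just past the bracketed construct starting at
    # text[i], matching nested brackets. Assumes text[i] == open_ch.
    depth = 0
    while i < len(text):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unmatched " + open_ch)

def skip_spaces(text, i):
    while i < len(text) and text[i].isspace():
        i += 1
    return i

def bred(text):
    # Emit «name» for every «name» («params») {«block»} found in text.
    names = []
    i = 0
    while i < len(text):
        m = NAME.match(text, i)
        if not m:
            i += 1
            continue
        j = skip_spaces(text, m.end())
        if j < len(text) and text[j] == "(":
            j = skip_bracketed(text, j, "(", ")")
            j = skip_spaces(text, j)
            if j < len(text) and text[j] == "{":
                j = skip_bracketed(text, j, "{", "}")
                names.append(m.group(0))
                i = j
                continue
        # NB: a string or comment containing brackets would fool this sketch
        i = m.end()
    return "\n".join(names)

src = "int main (int argc) { if (x) { y(); } }  static int helper(void){return 0;}"
print(bred(src))   # prints: main, then helper, one per line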
Nice ideas.
Re 0D, my next question is: how to decide at what granularity to stop using function calls? Or are you suggesting eliminating them entirely?
Re transpiler pipelines: I tried this for a while a few years ago. The conclusion I reached was that they were great for adding capabilities, but they can't add restrictions. In first-class languages, a lot of the value often comes from guarantees that certain events won't occur - an int won't be assigned to a string. There you need a single coherent grammar. Does this seem right?
... re: 0D ...
ideal: use both, without letting language influence your thinking
ideal: use both, but, remain aware of what each choice accomplishes
ideal: 0D to be so cheap that it could be used on every line of code
reality: 0D is entangled with Multiprocessing and the current grain size is "Process"
alternate reality: 0D can be couched in terms of closures and FIFOs, hence, grain size is "function" (where closure is roughly equivalent to function)
reality: CALL/RETURN and the callstack are hard-wired into CPUs (there used to be a time when CPUs didn't have hard-wired callstacks)
reality: 1950s IDEs for Programming were Programming Languages, but, in 2022++ IDEs include other stuff, like powerful programming editors
CALL is used for 2 reasons: (1) compacting code size, (2) DRY (Don't Repeat Yourself). There is no good reason to allow CALL/RETURN to leak into end-user code except for case (1) compacting code size [corollary: case (2) should be entirely optimized away at "compile time" and "edit time"]
x.f(x) is syntax with the meaning "mutate the global callstack and mutate the IP to point at the method function x.f" (and "return" means "put the return value in a special place, then mutate the global callstack, then mutate the IP to point at the caller's continuation code")
but, there is no popular builtin syntax for Send()ing to an output queue and passing the finalized output queue back up to the parent Container
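To see the asymmetry, here is a toy Python rendering of both meanings - my own encoding, not any real VM: CALL/RETURN spelled out as mutations of a global callstack, next to a Send() that merely appends to an output queue for a parent Container to route.

from collections import deque

callstack = []   # the global callstack that CALL/RETURN mutates

def call(func, arg):
    callstack.append("caller-continuation")   # mutate the global callstack
    result = func(arg)    # "mutate the IP": control jumps to the callee
    callstack.pop()       # unwind: control jumps back to the caller
    return result         # the "special place" for the return value

outq = deque()   # this component's output queue

def send(port, payload):
    outq.append((port, payload))   # no stack mutation, no blocking

v = call(lambda n: n + 1, 41)   # caller is blocked until the callee finishes
send("out", 42)                 # caller continues immediately; the parent
                                # Container drains outq and routes it later
print(v, list(outq))            # 43 [('out', 42)]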
... re: transpiler pipelines question ... thinking ...
... re: transpiler pipelines question, progress towards answering the question, WIP ...
... this doesn't necessarily answer the question, but might show where my thinking is going, while I try to figure out what is really being asked ...
... I think of a PL as 2 issues: (1) data (2) control flow, i.e. (1) operands and (2) syntax ...
... I am playing with Orthogonal Programming Languages, where (1) is OO, (2) is syntax; based on Cordy's Orthogonal Code Generator ideas and based on RTL and based on dataless PLs (like Holt's S/SL (used in PT Pascal, Concurrent Euclid, Turing, etc.)) ...
... I think that dataless languages boil down to 2 entities: (1) Things, (2) Lists of Things. Types are opaque and cannot be defined at the dataless-language level (Types are defined and manipulated in other layers, implemented in common PLs (e.g. Python, C, etc.))
String s
s <- "abc"
s <- 7
$defsynonym ('s', ⟹od, kind=var, type='String', key='s'⟩)
s <- "abc"
s <- 7
$defsynonym ('s', ⟹od-var, 'String', 's'⟩)
$Assign s, ⟹od-lit, 'String', 'abc'⟩
$Assign s, ⟹od-lit, 'int', 7⟩
... same as ...
$Assign ⟹od-var, 'String', 's'⟩, ⟹od-lit, 'String', 'abc'⟩
$Assign ⟹od-var, 'String', 's'⟩, ⟹od-lit, 'int', 7⟩
'String' == 'String' --> OK
'String' != 'int' --> Error
This looks like simple name equivalence. Lower layers are free to use structural equivalence instead (using names as keys to a map containing more detail for each type).
[The goal here is to think of a compiler as a string of pearls on a pipeline instead of as a honking big tree].
[od - oh-D, not zero-D, means "object descriptor"]
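A minimal Python sketch of how a lower layer might implement $defsynonym and $Assign - the od field layout follows the example above, while the function names and the dict encoding are my guesses:

def od(kind, type_name, value):
    # object descriptor: opaque type name plus a key or literal value
    return {"kind": kind, "type": type_name, "value": value}

synonyms = {}

def defsynonym(name, descriptor):
    synonyms[name] = descriptor

def assign(dest, src):
    # Resolve synonyms ('s' -> its od), then compare type names only:
    # simple name equivalence - types stay opaque at this layer.
    dest = synonyms.get(dest, dest)
    src = synonyms.get(src, src)
    if dest["type"] == src["type"]:
        print("'%s' == '%s' --> OK" % (dest["type"], src["type"]))
    else:
        print("'%s' != '%s' --> Error" % (dest["type"], src["type"]))

defsynonym("s", od("var", "String", "s"))
assign("s", od("lit", "String", "abc"))   # 'String' == 'String' --> OK
assign("s", od("lit", "int", 7))          # 'String' != 'int' --> Error

A lower layer that wants structural equivalence instead could use the type name as a key into a map holding more detail for each type, exactly as suggested above.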