Paul Tarvydas 2023-01-22 14:06:36

Summary 2022

For me 2022 was:

0D
transpiler pipelines

Explanations below.

There is nothing “new” here. I believe that our crop of programming languages subtly discourages certain kinds of thoughts. You can do these things with our programming languages, but, you don’t bother .

[I wrote this on Jan 1/2023. Then I promptly got sick and found new ways to procrastinate. I will gladly remove this if it is inappropriate or too long...]

TL;DR

0D

0D is part of traditional parallelism (zero-dependency, total decoupling)
breaking 0D away from parallelism enables other uses
0D uses FIFOs, whereas functions use LIFOs (LIFOs are used by most modern programming languages, Python, Rust, etc. and stifle possible solutions)

Transpiler Pipelines

“skip over” uninteresting bits of syntax, whereas CFG requires full language specification
leads to a different class of tools -> parser used for “quickie” matches instead of for building compilers ; different way of using parser DSLs ; like mathematical manipulation of notation
“skipping over” bits of syntax allows syntactic composition ; syntactic composition enables pipelines ;

0D

0D is a short-hand for the phrase zero dependency . Total decoupling.

Programmers already know how to write 0D code, but, they tangle this simple concept up with other concepts and call the result “parallelism”.

At a very, very basic level, you can achieve 0D by using FIFOs instead of LIFOs (queues vs stacks). LIFOs - callstacks - are good for expressing synchronous code. LIFOs are less-good for expressing asynchronous code.

Programmers often conflate nested, recursive functions with the notion of pipelines. If a component sends itself a message, the message is queued up in FIFO order and there is a “delay” before the message is processed, whereas if a component recursively calls itself, the function parameters are pushed onto a stack and the processing happens immediately, in LIFO order. This subtle difference in processing sequence manifests itself in design differences. For example, in electronics - where all components are asynchronous by default - you often see the use of “negative feedback”, say in Op-Amp designs. You rarely see this technique used in software design. In electronics, negative feedback is used by components to self-regulate, whereas in software, recursion is used as a form of divide and conquer. Feedback loops make it possible to be explicit about software design, whereas recursion hides the key element - the callstack - of the design.

EEs had this issue sussed out, before the advent of the “everything must be synchronized” mentality.

All components in an electronic circuit are asynchronous by default. Synchrony is judiciously, explicitly designed-in through the use of protocols . Synchrony is not designed-in everywhere by default and is explicitly designed in on an as needed basis. There is a reason - a subtle reason - why it is easy to draw diagrams of computer networks and not-so-easy to draw diagrams of synchronous code.

In EE designs, concurrency is so cheap that you can’t help but use it. In software, concurrency implies difficulty and designers end up avoiding concurrency in their designs.

This subtle difference has a trickle-down effect to end-user code. When it is difficult to draw diagrams of programs and to snap components together, programmers tend not to provide such features to end-users. Or, when they provide such features, they implement such features under duress. If DaS and snappable components were abundantly available, such features would naturally leak through to end-user apps.

0D can be implemented a lot more efficiently than by using operating system processes and IPCs. Most modern programming languages support closures (anonymous functions) and make it easy to build queue data structures. Stick one queue at the front of a closure - the “input queue” - and one queue at the tail of a closure - the “output queue” - and, you get 0D. Then, you need to write a wrapper component that routes “messages” from the output queue of one closure to the input queue of another closure. Can this concept be generalized? This ain’t rocket science.

When you build 0D software components, does the order-of-operation of components matter? Nope. Can a 0D component create more than one result during its operation? Yep. Can a 0D component directly refer to another 0D component? Nope. The best you can do is to compose networks of 0D components inside of routing wrappers.

Transpiler Pipelines

It would be nice to build up solutions using pipelines of many little solutions and syntaxes made expressly for those solutions.

What do you need to be able to do this?

1) You need to be able to write grammars that are very, very small and that allow you to”ignore” bits of syntax that don’t pertain to a problem, e.g. kind-of like REGEX, but, better.

2) Total isolation of building blocks.

Very Small Grammars That Ignore Uninteresting Items

Ohm-JS - a derivative of PEG (Parsing Expression Grammars) - makes it possible to write grammars that skip over uninteresting bits of text.

For example, if you want to write a quickie parser for C code, you might want to say:

... function-name (...) {...}

In Ohm-JS, you can say this, whereas in a CFG-based parser generator you need to over-specify all of the niggly bits of C syntax. In Ohm-JS, this results in a few minutes of work and only a few lines of code. The Ohm-Editor assists in developing the micro-grammar.

In YACC and CFG-based approaches, though, you’re looking at a gargantuan job (days, weeks, months, ...) and you simply don’t bother to write such a quickie parser. You either don’t bother with the whole idea, or you use something like REGEX which fails on a number of edge-cases for this kind of thing. REGEX can’t search recursively for matching brackets, Ohm-JS can. Using REGEX, you might get away with a partial solution, or, the project might grow larger as you hit unexpected speed bumps. You either persevere or you just give up.

For the record, the grammar plus the accompanying code fabricator specification for the above simple example are shown in the appendix.

DaS Comes For Free

When you can build totally isolated building blocks, you can draw sensible diagrams of how the building blocks should be snapped together to solve a problem.

Later, you can steal (cut/copy/paste) chunks of previous solutions and use them as building blocks for new problems.

DaS: Diagrams as Syntax.

DaS is not diagrams as an Art Form. DaS is diagrams as programming languages. For example, instead of writing {...} , you draw a rectangle.

Programming languages were created by culling the English language and by choosing only the words and phrases that could be compiled to executable code.

Can we cull diagrams in the same way to invent new programming languages?

EE’s have done this and they call the resulting diagrams “schematics”.

Building construction engineers have done this and call the resulting diagrams “blueprints”.

Don’t We Already Use Building Blocks?

“Code Libraries” look like building blocks, but, contain subtle bits of coupling that discourage building-block-iness.

For example, the very common idiom of a function call f(x) introduces at least 3 kinds of coupling:

The name f is hard-wired into the caller’s code. The calling code cannot be cut/copy/pasted into some other solution without also dragging in the called code, or, by futzing with the source code.
The function call f(x) waits for the callee to return a value. This is also known as blocking . Function call notation works fine on paper, where functions can be evaluated instantaneously. It’s different when you map function call syntax onto hardware that has propagation delays wherein functions take finite amounts of time to “run”. This subtle difference in behaviour leads to hidden gotchas. A glaring example of the impact of such a difference can be seen in the Mars Pathfinder disaster[^pathfinder].
The function return v = f(x) hard-wires a routing decision into the callee’s code. The callee must direct its response back to the caller. This is called “returning a value”. Again, this doesn’t look like a problem when you just want to build fancier calculators, but, this hard-wired routing decision discourages simple solutions to non-calculator problems, like machine control.

When you don’t have complete isolation, you don’t have building blocks. Imagine a LEGO® set where all the pieces are joined together with a single, long sewing thread glued to each LEGO® block. Or, you have two real-world objects, e.g. one apple and one orange. You cut the apple in half. What happens to the orange?

As humans, we are used to the idea that objects are completely isolated. Programs don’t work that way. We have to stop and think hard when writing programs.

Paul Tarvydas 2023-01-22 14:10:45

Appendix

If you want to play along with this experiment, the code is in github.com/guitarvydas/cfunc.

c.ohm

A quickie grammar that matches function declarations in a C file.

Note that this grammar is longer than a REGEX, but, is significantly shorter than a CFG specification (LR(k), YACC, etc.) for the C programming language.

Cfunctions {

  program = item+

  item =

    | comment

    | string

    | applySyntactic<FunctionDecl> -- decl

    | any -- other

  FunctionDecl = name "(" param+ ")" "{" block+ "}"



    param =

      | "(" param+ ")" -- nested

      | ~"(" ~")" any  -- flat



    block =

      | "{" block+ "}" -- nested

      | ~"{" ~"}" any  -- flat



      name = letter (alnum | "_")*

      comment =

        | "//" (~nl any)* nl

        | "/*" (~"*/" any)* "*/"

      string =

        | bqstring

        | dqstring

        | sqstring

      bqstring = "`" (qbq | (~"`" any))* "`"

      dqstring = "\"" (qdq | (~"\"" any))* "\""

      sqstring = "'" (qsq | (~"'" any))* "'"

      qbq = "\\" "`"

      qdq = "\\" "\""

      qsq = "\\" "'"

      nl = "\n"

      spaces += comment

}

Can this grammar be improved and optimized? Probably. But, why would you care?

You would care only if you used this code in an end-user product.

If you use this code in something like a batch-editing environment, “efficiency” takes on a different meaning. End-users don’t care about the efficiency of your code editor and its Find-and-Replace function. End-users don’t care how efficient your command line tools, like grep , are.

When you treat Ohm-JS + Fab as batch editors for development, then, only development efficiency matters.

I strongly believe that one shouldn’t write code. One should write code that writes code. From this perspective, “efficiency” breaks down into 2 camps:

developer efficiency
end-user efficiency.

Note that traditional compilers are simply apps that write code. Developers use compilers . End-users don’t care if a developer created end-user app code by hand or by using a compiler. The only things that end-users care about is if the app is cheap and runs on cheap hardware. The final app is assembler, regardless of how it was created. Developers, on the other hand, do care about development time and effort. Hand-writing apps requires much more effort than using high-level language compilers to generate the final app code. Debugging apps is easier when using high-level languages with type-checkers. On the other hand, developers usually buy fancier hardware than that which is used by end-users. Developers can afford to burn CPU cycles on their fancy hardware to give themselves faster - and cheaper - development and debugging times.

The final step in development is that of Production Engineering an app to make it cheap-enough to sell. Up until that point, the development workflow should consist of anything that speeds up and cheapens development time, for example, dynamic language environments and REPLs. For example, Rust is a Production Engineering language and needn’t be used until the last moment.

c.fab

A .fab file is a specification that creates strings based on the above grammar. Fab is an experimental transpiler tool that works with Ohm-JS. It generates JavaScript code required by Ohm-JS. This could all be done by off-the-shelf Ohm-JS. Fab simply reduces the amount of keyboarding needed for creating JavaScript “semantics” code required by Ohm-JS. Fab is written in Ohm-JS.

Cfunctions {

  program [item+] = ‛«item»'

  item_decl [x] =  ‛«x»'

  item_other [x] =  ‛'

  FunctionDecl [name lp param+ rp lb block+ rb] = ‛\n«name»'

    param_nested [lp param+ rp] = ‛'

    param_flat [c] = ‛'

    block_nested [lp block+ rp] = ‛'

    block_flat [c] = ‛'

      name [letter c*] = ‛«letter»«c»'

      comment [begin cs end] = ‛'

      nl [c] =  ‛«c»'

      spaces [cs] =  ‛«cs»'

      bqstring [begin cs* end] = ‛'

      dqstring [begin cs* end] = ‛'

      sqstring [begin cs* end] = ‛'

      qbq [bslash c] = ‛'

      qdq [bslash c] = ‛'

      qsq [bslash c] = ‛'

}

grep.c

The above was tested against grep.c from the Gnu grep repo.

git clone git.savannah.gnu.org/git/grep.git

Even Smaller

I’m playing with the design of a new tool that I call bred (bracket editor). It’s like a super-simple batch editor that walks through text that contains bracketed constructs.

The full specification consists of 2 strings

what to match
how to rewrite it.

The above specifications might be re-expressed as:

‛«name» («params») {«block»}'

‛«name»'

which reads as:

match, recursively, anything that looks like «name» («params») {«block»}
then, throw away everything except the name

Currently, my concepts have warts - what happens when a comment or a string or a character constant contains brackets, or, even worse, what happens if they contain unmatched brackets?

Kartik Agaram 2023-01-22 18:44:08

Nice ideas.

Re 0D, my next question is: how to decide at what granularity to stop using function calls? Or are you suggesting eliminating them entirely?

Re transpiler pipelines: I tried this for a while a few years ago. The conclusion I reached was that they were great for adding capabilities but they can't add restrictions. In first class languages often a lot of value comes from guarantees that certain events won't occur. An int won't be assigned to a string. There you need a single coherent grammar. Does this seem right?

Vijay Chakravarthy 2023-01-22 22:39:55

this talk is quite relevant —

Vijay Chakravarthy 2023-01-22 22:39:58

youtu.be/JMZLBB_BFNg

🎥 "Parser Parser Combinators for Program Transformation" by Rijnard van Tonder

Paul Tarvydas 2023-01-23 12:19:49

... re: 0D ...

ideal: use both, without letting language influence your thinking

ideal: use both, but, remain aware of what each choice accomplishes

ideal: 0D to be so cheap that it could be used on every line of code

reality: 0D is entangled with Multiprocessing and the current grain size is “Process”

alternate reality: 0D can be couched in terms of closures and FIFOs, hence, grain size is “function” (where closure is roughly equivalent to function)

reality: CALL/RETURN and the callstack are hard-wired into CPUs (there used to be a time when CPUs didn’t have hard-wired callstacks)

reality: 1950s IDEs for Programming were Programming Languages, but, in 2022++ IDEs include other stuff, like powerful programming editors

CALL is used for 2 reasons: (1) compacting code size, (2) DRY (Don’t Repeat Yourself). There is no good reason to allow CALL/RETURN to leak into end-user code except for case (1) compacting code size [corollary: case (2) should be entirely optimized away at “compile time” and “edit time”]

x.f(x) is syntax with the meaning “mutate the global callstack and mutate the IP to point at the method function x.f” (and “return” means “put the return value in a special place, then mutate the global callstack, then mutate the IP to point at the caller’s continuation code”)

but, there is no popular builtin syntax for Send()ing to an output queue and passing the finalized output queue back up to the parent Container

... re: transpiler pipelines question ... thinking ...

Paul Tarvydas 2023-01-24 10:11:31

... re: transpiler pipelines question, progress towards answering the question, WIP ...

... this doesn’t necessarily answer the question, but might show where my thinking is going, while I try to figure out what is really being asked ...

... I think of a PL as 2 issues: (1) data (2) control flow, i.e. (1) operands and (2) syntax ...

... I am playing with Orthogonal Programming Languages, where (1) is OO, (2) is syntax ; based on Cordy’s Orthogonal Code Generator ideas and based on RTL and based on dataless PLs (like Holt’s S/SL (used in PT Pascal, Concurrent Euclid, Turing, etc.)) ...

... I think that dataless languages boils down to 2 entities: (1) Things, (2) Lists of Things. Types are opaque and cannot be defined at the dataless language-level (Types are defined and manipulated in other layers, implemented in common PLs (e.g. Python, C, etc.))

Src

String s

s <- ‘abc’

s <- 7

Gather

$defsynonym (‘s’, ⟨od, kind=var, type=“String”, key=‘s’⟩)

s <- ‘abc’

s <- 7

Normalize

$defsynonym (‘s’, ⟨od-var, “String”, ‘s’⟩)

$Assign s, ⟨od-lit, “String”, ‘abc’⟩

$Assign s, ⟨od-lit, “int”, 7⟩

... same as ...

$Assign ⟨od-var, “String”, ‘s’⟩, ⟨od-lit, “String”, ‘abc’⟩

$Assign ⟨od-var, “String”, ‘s’⟩, ⟨od-lit, “int”, 7⟩

Semantic Check

“String” == “String” --> OK

“String” != “int” --> Error

This looks like simple name equivalence. Lower layers are free to use structural equivalence instead (using names as keys to a map containing more detail for each type).

[The goal here is to think of a compiler as a string of pearls on a pipeline instead of as a honking big tree].

[od - oh-D, not zero-D, means “object descriptor”]

You are viewing archived messages. Go here to search the history.

Summary 2022

TL;DR

0D

Transpiler Pipelines

0D

Transpiler Pipelines

Very Small Grammars That Ignore Uninteresting Items

DaS Comes For Free

Don’t We Already Use Building Blocks?

Appendix

c.ohm

c.fab

grep.c

Even Smaller

Src

Gather

Normalize

Semantic Check

You are viewing archived messages.
Go here to search the history.