
Paul Tarvydas 2022-12-06 21:30:32

An essay about the history of compiler-appeasement languages, meant for discussion.

[meta: I would be glad to move this to another channel or to remove it completely. One aspect of my “work” is to observe. FWIW, I share this...]

Type checkers are helpers for development, similar to parsing and syntax checkers. Type checkers are - currently - more difficult to write than syntax checkers. There was a time when syntax checking was not well understood, was difficult, and was written in an ad-hoc manner. Early FORTRAN had syntax that we would consider weird today. Early FORTRAN did not treat spaces specially and went for the shortest match (I think). For example, IF and IFX were both parsed as the beginning of an IF statement. Later, it was discovered that making spaces “special” would help in building parsers and would help in cleaning up silliness like IF vs. IFX. At the time, the character set consisted of ASCII (well, there was EBCDIC, championed by IBM, but IBM was hated even more than Microsoft and Apple are hated today, so EBCDIC was mostly avoided by non-IBMers). The fact that ASCII only has 128 characters to choose from (and some 32 “unprintables” must be subtracted from that count) led to silly decisions like denoting strings with the same beginning and ending quote (which makes parsing more difficult) and not allowing spaces to be embedded in names. With Unicode, we have many more choices, but we remain stuck with decisions made to appease 1950s hardware.

Aside: in 2022, we have hardware that can handle vector graphics and overlapping graphical elements, e.g. windows and very-small windows (“buttons”, “widgets”), but we are stuck with decisions made to appease 1950s hardware. I argue that we should be building languages based on vector graphics instead of non-overlapping characters. SVG is a simple example of something that might work on this front (rectangles, ellipses, lines, text, groups).

Aside: “declaration before use” is a result of 1950s thinking (save CPU time by making compilers 1-pass), even though, in 2022, we could easily burn CPU cycles to figure out “declaration after use”.

Aside: declaration-checking (before or after use) is only a helper for developers. The machine doesn’t care if you make a typo. “Declaration-checking” is an app to help developers stamp out simple errors (like typos). Demanding that programmers rearrange their code so that the declarations ALL come before the code is compiler-appeasement (based on 1950s hardware).

The best way to write a type checker is to use a Relational Language (like PROLOG, miniKanren, Datalog, etc., etc.). Relational languages are shining examples of languages that don’t appease compilers. In a relational language, you write relations (“truth”) and let the underlying system figure out how to implement the machinery for matching up the relations. Other technologies that bark up this same tree: declarative languages and ML. (Aside: oh my, HTML is a declarative language. But, HTML needs to lean on JavaScript to allow imperative break-outs).
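A toy sketch to make this concrete (the tiny expression language and all of the names below are invented for illustration; this is unification written by hand in Python, in the spirit of PROLOG/miniKanren, not any particular library): each syntax form only states which types must be equal, and a generic solver figures out the machinery:

# Toy sketch: type checking as relation solving. The expression language
# and all names here are invented for this example.

class Var:
    """A type variable (an unknown to be solved for)."""
    counter = 0
    def __init__(self):
        Var.counter += 1
        self.name = f"t{Var.counter}"
    def __repr__(self):
        return self.name

def walk(t, subst):
    """Chase substitution links until we reach a non-variable or a free variable."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    """Extend subst so that a and b denote the same type, or return None."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # e.g. 'int' vs 'str': no solution

def infer(expr, env, subst):
    """Infer the type of expr. Expressions are: integer literals, variable
    references (plain strings), ('lam', param, body), ('app', fn, arg)."""
    if isinstance(expr, int):
        return 'int', subst
    if isinstance(expr, str):
        return env[expr], subst
    if expr[0] == 'lam':
        _, param, body = expr
        t_param = Var()
        t_body, subst = infer(body, {**env, param: t_param}, subst)
        return ('fn', t_param, t_body), subst
    if expr[0] == 'app':
        _, fn, arg = expr
        t_fn, subst = infer(fn, env, subst)
        t_arg, subst = infer(arg, env, subst)
        t_res = Var()
        subst = unify(t_fn, ('fn', t_arg, t_res), subst)  # the relation: fn-type = arg-type -> result-type
        if subst is None:
            raise TypeError(f"cannot apply {t_fn} to {t_arg}")
        return t_res, subst
    raise ValueError(expr)

# (lambda x: x)(7) -- the solver discovers that the result type is int
t, s = infer(('app', ('lam', 'x', 'x'), 7), {}, {})
print(walk(t, s))  # => int

Note that infer never says how to compute a type; it only asserts equalities, and unify does the rest.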

There is no “ideal language”. The notation you use depends on the problem you are trying to solve. A simple example would be the idea of Spreadsheets vs. Lambda Calculus. Accountants and financial analysts like Spreadsheets. Programming rigor analysts like Lambda Calculus. Accountants would not want to use Lambda Calculus and rigor-ists would not want to use Spreadsheets. Another example, closer to my heart, is the difference between using Language Theory to generate parsers and using PEG to generate parsers. Language Theory-based parsers cannot do what PEG-based parsers can do (for example, recognize non-context-free patterns like aⁿbⁿcⁿ, or parse scannerlessly, without a separate tokenizer). Trying to force-fit language theory onto parsing has stagnated the field. Most languages look the same, with minor differences. (Aside: PEG is Parser Theory, not Language Theory, despite the superficial similarities in syntax). The fact that parsing is “difficult” has restricted programmers to using only a small number of programming languages, instead of using a zillion nano-languages and defining their own nano-languages (I call these SCNs (Solution Centric Notations)).
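To make the PEG claim concrete, here is a toy recognizer (a hand-written sketch in Python, not using any particular PEG tool) for the classic PEG that accepts aⁿbⁿcⁿ, a language that no context-free grammar can describe. The grammar is start <- &(ab !'b') 'a'* bc !. with ab <- 'a' ab? 'b' and bc <- 'b' bc? 'c', translated directly into recursive functions with PEG's committed-choice semantics:

# Toy sketch: the classic PEG for { a^n b^n c^n }, hand-coded as a
# recursive-descent recognizer. Matchers return the index after a match,
# or None on failure (no backtracking into a committed optional, as in PEGs).

def match_ab(s, i):
    """ab <- 'a' ab? 'b'"""
    if i < len(s) and s[i] == 'a':
        j = match_ab(s, i + 1)
        if j is None:          # the optional inner ab matched nothing
            j = i + 1
        if j < len(s) and s[j] == 'b':
            return j + 1
    return None

def match_bc(s, i):
    """bc <- 'b' bc? 'c'"""
    if i < len(s) and s[i] == 'b':
        j = match_bc(s, i + 1)
        if j is None:
            j = i + 1
        if j < len(s) and s[j] == 'c':
            return j + 1
    return None

def match_anbncn(s):
    """start <- &(ab !'b') 'a'* bc !. """
    j = match_ab(s, 0)                              # lookahead: a^n b^n ...
    if j is None or (j < len(s) and s[j] == 'b'):   # ... not followed by more b's
        return False
    i = 0
    while i < len(s) and s[i] == 'a':               # now consume the a's
        i += 1
    j = match_bc(s, i)                              # then b^n c^n
    return j == len(s)                              # and nothing left over

assert match_anbncn("abc") and match_anbncn("aaabbbccc")
assert not match_anbncn("aabbc") and not match_anbncn("abbcc")

A Yacc-style CFG-based generator cannot express this language at all; the PEG handles it with one lookahead predicate.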

“Dynamic” languages are “good” for fast turnaround on ideas, but are “bad” for producing end-user apps which are cost-optimized. “Static languages” are “good” for Production Engineering apps, but are “bad” for inventing new products. Trying to force-fit all use-cases into one language results in a watered-down notation which isn’t particularly good for either use-case. (Aside: at the moment, efforts to force-fit all use-cases into one language favour the Production Engineering side over the Design side of things, and this is what I call “compiler appeasement”. When programmers have to stop and rearrange their inventions to help the compiler figure out how to optimize, they are appeasing the compiler). (Aside: if Physicists ALL engaged in worshipping functional notation, we wouldn’t have Feynman Diagrams, nor Prigogine’s “Order Out of Chaos”, etc.).

Barnacles (tools bolted onto the side of a language) might be invented to help developers, e.g. type checkers and linters. Twisting a language design to appease only pre-compilation is not OK in my book. At the moment, most of our programming languages are compiler-appeasement languages and insist that developers waste time (and imagination) on dealing with compiler-appeasement and pre-compilation issues, long before the program works.

Barnacle-like pre-compilation was researched around 1980 in work like the Fraser/Davidson peephole technologies. Their register-transfer representation was called RTL and formed the basis of gcc’s intermediate representation. Cordy’s Orthogonal Code Generator is a generalization of this technique, replete with declarative syntax for portability choice-trees (MISTs) and with Data Descriptors and Condition Descriptors that improve on the virtual registers used by RTL.
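For flavour, here is a toy sketch of that style of peephole combining (the register-transfer forms and the one-entry “machine description” are invented for illustration): substitute one register transfer into the next, and keep the combined transfer only if the target machine actually has an instruction for it:

# Toy sketch of Davidson/Fraser-style peephole combining. The RTL forms and
# the "machine catalogue" below are invented for this example.

# Each register transfer is (destination, expression).
rtls = [
    ('r1', ('mem', 'x')),        # r1 := mem[x]
    ('r1', ('add', 'r1', 1)),    # r1 := r1 + 1
    (('mem', 'x'), 'r1'),        # mem[x] := r1
]

# Combined transfers the (pretend) target machine can do in one instruction.
MACHINE = {
    (('mem', 'x'), ('add', ('mem', 'x'), 1)): 'INC x',
}

def substitute(expr, reg, value):
    """Replace uses of reg inside expr with value."""
    if expr == reg:
        return value
    if isinstance(expr, tuple):
        return tuple(substitute(e, reg, value) for e in expr)
    return expr

def combine(first, second):
    """The net effect of running first then second, as one transfer."""
    dest1, src1 = first
    dest2, src2 = second
    return (dest2, substitute(src2, dest1, src1))

def peephole(rtls):
    """Fold windows of three adjacent transfers into one real opcode.
    (A real combiner also tries pairs and checks that the scratch register
    is dead afterwards; this sketch only shows the substitution idea.)"""
    out, i = [], 0
    while i < len(rtls):
        if i + 2 < len(rtls):
            merged = combine(combine(rtls[i], rtls[i + 1]), rtls[i + 2])
            if merged in MACHINE:
                out.append(MACHINE[merged])
                i += 3
                continue
        out.append(rtls[i])
        i += 1
    return out

print(peephole(rtls))   # => ['INC x']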

[This essay can be found in github.com/guitarvydas/py0d/issues/2]

Konrad Hinsen 2022-12-07 07:17:12

I tend to agree with most of this, except that I am not sure I understand your ideas about SVG. In mainstream programming, the ground-truth representation of code is text (a linear stream of characters), and the UI is a slightly rendered version of that text (syntax coloring etc.). I see how SVG could yield better UIs, but it sounds as if you want SVG to become the ground-truth representation of code. If that's the case, I don't see why.

Konrad Hinsen 2022-12-07 07:22:29

On a different topic, your essay contains the eternal "we are so stuck in the past, we need to move on" thread that comes up frequently here (and elsewhere). What I have yet to see is a realistic proposition for "moving on". If it's "start from scratch", it just won't happen. We are not stuck with 1950s hardware, we have built up an enormous IT edifice starting with 1950s hardware. We are still adding to it, and it has become infrastructure for other human endeavours. It won't go away (except maybe in a major civilization crash).

A realistic scenario for moving on must either start from the present edifice and evolve it into something better, remaining functional along the way, or build a second, disconnected edifice to which the world can slowly migrate as it matures.

Nicholas Yang 2022-12-09 02:34:23

Konrad Hinsen I really agree with this sentiment on the “we are so stuck in the past” thread. It’s almost a cliché to complain about how software development is such a mess, and it’s not necessarily the most productive mindset.

Paul Tarvydas 2022-12-09 14:01:13

Kartik Agaram Yes.

The machine does the work instead of asking programmers to rearrange their code.

Your reminder reminds me of Holt’s Data Descriptors, which reminds me of Cordy’s Orthogonal Code Generator which reminds me of RTL (which is used in gcc).

Paul Tarvydas 2022-12-12 12:25:58

Thank you all, for your comments! I find the insightful criticism to be quite helpful.

Hoping to clarify some points:

  • IMO The Ground Truth is Programming, not Code.
  • IMO, change, even big change, is possible, especially in a young discipline like this. We are witnessing big change right now - the conversion of gasoline-powered automobiles to electric-powered automobiles. The enormous infrastructure for supplying gasoline is being replaced. The figurehead for this huge change believes in “First Principles Thinking” [fs.blog/first-principles]. Counter-arguments against big change might include:
      • The fact that electric cars still have 4 wheels and still have offset steering wheels (instead of centered steering sticks as in the first automobile)
      • The fact that the change-over from gas to electric is being facilitated by the existence of hybrid cars.
      • Computer keyboards are still QWERTY.

    IMO, change is driven by perceived orders-of-magnitude improvement (in process? ... in what?). It remains incumbent on me to explain why I think that it is possible to improve on programming, noting that hardware has improved, and, noting that end-user apps have improved, while maintaining the assertion that programming languages have not improved proportionally.

  • PEG blurs the line between characters in a Scanner vs. tokens in a Parser. IMO this is important and may lead to change.
  • IMO, snipping ALL dependencies is important and may lead to change. Hence, this long response to an issue in an experimental code repo. To my eyes, even a simple f(x) causes several kinds of dependencies that are ripe for snipping.
  • SVG. This is the simplest (albeit useless) example of SVG as a “programming language” that I could imagine. There’s more, and this is but a hint. This useless example transpiles a diagram to running Python code. The live code is at drawware.

github.com/guitarvydas/drawware/blob/dev/sourcecodehelloworld.png?raw=true

<html>
  <body>

    <h1>My first SVG</h1>

    <svg>
      <g id='myfunc'>
        <rect x="20" y="0" width="280" height="130" rx="19.5" ry="19.5" fill="none" stroke="black"/>
        <text x="120" y="19">myfunc</text>

        <g>
          <rect x="60" y="40" width="200" height="60" fill="#f8cecc" stroke="#b85450"/>
          <text x="70" y="64">print ('Hello ', end='')</text>
          <text x="70" y="84">print ('World')</text>
        </g>

      </g>
    </svg>

  </body>
</html>

the Ohm-JS grammar that I used is:

CodeDrawing {
  Main = "<html>" "<body>" H1 Drawing "</body>" "</html>"
  H1 = "<h1>" stuff "</h1>"

  Drawing = "<svg" stuff ">" CodeContainer "</svg>"

  CodeContainer = "<g" FunctionName ">" Boundary Title CodeBlock "</g>"

  Boundary = "<rect" stuff "/>"
  Title = "<text" stuff ">" name "</text>"
  CodeBlock = "<g>" RedRect Text+ "</g>"
  RedRect = "<rect" (~AttrRed any)* AttrRed stuff "/>"

  FunctionName = "id=" sq name sq
  Text = "<text" stuff ">" stuff "</text>"
  AttrRed = "fill=" dq "#f8cecc" dq

  stuff = notElementChar+
  notElementChar = ~"<" ~">" ~"/>" any
  name = letter alnum*
  sq = "'"
  dq = "\""
}

And my personal notation (“Fab”) that completes the transpiler is:

CodeDrawing {
  Main [khtml kbody H1 Drawing kbodyend khtmlend] = ‛«Drawing»'
  H1 [kh1 stuff kh1end] = ‛«kh1»«stuff»«kh1end»'

  Drawing [ksvg CodeContainer ksvgend] = ‛«CodeContainer»'

  CodeContainer [kgroup FunctionName k Boundary Title CodeBlock kgroupend] = ‛def «FunctionName» ():«CodeBlock»
«FunctionName» ()'

  Boundary [krect stuff kend] = ‛«krect»«stuff»«kend»'
  Title [ktext stuff k name ktextend] = ‛«ktext»«stuff»«k»«name»«ktextend»'
  CodeBlock [kgroup RedRect Texts+ kgroupend] = ‛\n(-«Texts»-)'
  RedRect [krect cs* AttrRed stuff kend] = ‛«krect»«cs»«AttrRed»«stuff»«kend»'

  FunctionName [kid sq1 name sq2] = ‛«name»'
  Text [ktext stuff kend stuff2 ktextend] = ‛\n«stuff2»'
  AttrRed [kfill dq kred dq2] = ‛«kfill»«dq»«kred»«dq2»'

  stuff [cs+] = ‛«cs»'
  notElementChar [c] = ‛«c»'
  name [c1 cs*] = ‛«c1»«cs»'
  sq  [c] = ‛«c»'
  dq  [c] = ‛«c»'
}

The generated Python code (from the diagram) is:

def myfunc ():

  print ('Hello ', end='')
  print ('World')
myfunc ()

FYI - Fab transpiles to JavaScript code that can be used in conjunction with Ohm-JS. I got tired of writing JavaScript, so I built an SCN (nano-DSL) for myself.

I’ve chosen not to include the generated JavaScript code in this thread, but have pushed a working transpiler to the github listed above.

[pardon the cave-man HTML, but I don’t know enough about building web pages, and, clearly, need help]

31:50~ “In a ‘real’ Computer Science, the best languages of an era should serve as ‘assembly code’ for the next generation of expression.”

youtube.com/watch?v=fhOHn9TClXY&t=859s

Alan Kay

Nicholas Yang 2022-12-09 02:35:44

Hey all, this blog post has been kicking around in my drafts for a while. I finally decided to complete it. It’s about thinking about compilation, basically the strategies and techniques that I’ve learned in my short time writing compilers. It’s what I wish I had as a resource when I started. github.com/NicholasLYang/uptointerpretation/blob/master/content/posts/how-to-think-about-compiling.md

Andrew F 2022-12-09 03:39:31

You lost me at "nice, elegant language like Java". ;) Just kidding, this seems like a nice survey of things to keep in mind. I liked the reminder in the IR section that an IR didn't necessarily have to be "closer" to the output language in an obvious way. I generally like this level of abstraction and detail, and I wish more writing took this approach.

Nicholas Yang 2022-12-09 04:36:41

Haha I don’t even remember if i was being serious when I wrote that. Thanks for the feedback!

Ivan Reese 2022-12-09 03:00:45

Future of Coding • Episode 60

Bret Victor • Magic Ink

🔗 futureofcoding.org/episodes/060

Hey, y'all ever hear of this guy? He posts some wild stuff. Feels like it might be relevant to the folks here. Maybe a little fringe. For instance, he thinks that software could be — get this — better! You might be surprised to learn that I also think software could be better. Radical idea, yes, but it feels like it's finally time for us to take the idea seriously.

Next month, we're reading Peter Naur's Programming as Theory Building, with a little bit of Gilbert Ryle's The Concept of Mind as background.

Personal Dynamic Media 2022-12-10 06:08:39

Thank you so much for making this episode! I love Bret Victor's presentations and most of his essays, but I've never been able to get into Magic Ink. I'm not sure why, but I've always struggled both to follow that essay and to stay interested while reading it, so it's great to get your distillation of its contents.

Several of the ideas that you discuss, such as showing the distributions of reviews with tick marks under the stars, remind me of Edward Tufte's wonderful work on presenting information graphically.

I love Robot Odyssey, and also Rocky's Boots, a similar game, also from The Learning Company, that dealt with solving puzzles by creating digital logic circuits.

For a good time, you can play both of them in your browser at archive.org.

archive.org/details/d098s1-robot-odyssey

archive.org/details/RockysBoots41

Personal Dynamic Media 2022-12-10 16:25:11

It also occurs to me that part of why modern software, especially web sites, doesn't minimize interaction is because the authors are not actually focused on creating the best experience for the user.

Facebook, YouTube, Google, and Amazon are all worse for users than they were ten years ago because they are trying to maximize advertising profits, which means maximizing "engagement," which means maximizing interaction.

Their incentive isn't to create the best experience possible, but rather to create the worst experience that users will tolerate.

This article from the Onion is focused on Facebook, but I think it sums up the whole problem nicely.

theonion.com/facebook-we-will-make-our-product-worse-you-will-be-1819575256

William Taysom 2022-12-12 05:52:26

Here's to Robot Odyssey and the Witness!

The best views of data tend to combine perspectives, like a "three-quarter view" (en.wikipedia.org/wiki/Portrait_painting#three-quarter_view) does for the face. It lets you see the interaction of properties rather than individual properties in isolation.