Alex Cruise 2022-06-20 17:45:58

There’s a recurring theme in my career, going back to my very first job, that goes under the mental headline of “wide logic” problems… That’s where you have a somewhat narrow stream (conceptually, not necessarily streaming per se) of data, and you need to evaluate a large number of functions against each item--at least a handful, often dozens, sometimes hundreds! The functions are basically always provided by the user, although normally in a declarative, non-Turing-complete DSL. I’m curious whether others have noticed this pattern, and whether you know of any more mainstream labels for it?

The closest I’ve found is papers that reference publish/subscribe and indexing techniques for complex boolean expression trees, e.g.:
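Abstracting away from any particular system, the shape of the problem might be sketched like this (a minimal illustration; all record fields and rule names are hypothetical):

```python
# "Wide logic": a narrow stream of records, a wide set of user-supplied
# predicate functions, every predicate evaluated against every record.
from typing import Any, Callable

Record = dict[str, Any]
Predicate = Callable[[Record], bool]

def evaluate_stream(records: list[Record],
                    rules: dict[str, Predicate]) -> dict[str, list[Record]]:
    """Single pass over the data; every rule sees every record."""
    matches: dict[str, list[Record]] = {name: [] for name in rules}
    for record in records:
        for name, pred in rules.items():
            if pred(record):
                matches[name].append(record)
    return matches

rules = {
    "big_order":  lambda r: r["amount"] > 1000,
    "west_coast": lambda r: r["region"] == "west",
}
records = [{"amount": 1500, "region": "west"},
           {"amount": 200,  "region": "east"}]
result = evaluate_stream(records, rules)
```

The indexing papers are about avoiding exactly this naive inner loop when the rule set gets large.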

Alex Cruise 2022-06-20 17:48:24

Some stories…

Alex Cruise 2022-06-20 18:01:54

First job: a (snail mail) direct marketing agency. We would run dozens of queries/reports to analyze the performance of mailings/campaigns, breaking down response rates, average donation amounts, etc.

The functions were often “symbolic” (i.e. calculated) fields in the 4GL; they were written in a Turing-complete language, but often by relatively unskilled programmers. I built a system that would traverse the analysis/mailing list once, evaluate all the expressions once per record, and aggregate them in memory over multiple group-bys.
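That single-traversal, multi-group-by structure might look roughly like this (a sketch with hypothetical field names, not the original 4GL system):

```python
# Traverse the mailing list once, evaluate each calculated expression
# once per record, and feed the results into several group-by
# aggregations held in memory.
from collections import defaultdict

def analyze(records, expressions, group_bys):
    # stats[group_by_field][group_value][expr_name] -> [count, total]
    stats = {g: defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
             for g in group_bys}
    for rec in records:
        # Evaluate every expression exactly once for this record.
        values = {name: fn(rec) for name, fn in expressions.items()}
        for g in group_bys:
            bucket = stats[g][rec[g]]
            for name, v in values.items():
                bucket[name][0] += 1
                bucket[name][1] += v
    return stats

records = [{"campaign": "A", "state": "WA", "donation": 25.0},
           {"campaign": "A", "state": "OR", "donation": 10.0},
           {"campaign": "B", "state": "WA", "donation": 40.0}]
expressions = {"donation": lambda r: r["donation"]}
stats = analyze(records, expressions, ["campaign", "state"])
count, total = stats["campaign"]["A"]["donation"]  # per-campaign rollup
```

One pass over the data serves every report, instead of one query per report.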

Alex Cruise 2022-06-20 18:17:09

Second, third and fourth jobs didn’t really have this feature…

Alex Cruise 2022-06-20 18:20:42

Fifth job was layer7.com (an API gateway appliance): every message needed to be resolved to one of potentially hundreds of policies, but the resolution logic was pretty simple, and once resolved, the policy didn’t tend to be too branchy.

Chris Granger 2022-06-20 18:23:05

Sounds like streaming analytics to me.

Alex Cruise 2022-06-20 18:23:36

Sixth job was a SaaS app for monitoring compliance rules at brokerages… There were hundreds of handcrafted rules that needed to be evaluated against every Order, Trade and Account record after a nightly ETL job. Before I got involved, every rule was run one at a time, sequentially, and some customers’ nightly jobs were starting to run up against the clock.

Alex Cruise 2022-06-20 18:24:12

Definitely streaming analytics, but from what I’ve seen so far the support for this pattern isn’t great anywhere

Chris Granger 2022-06-20 18:24:50

Differential dataflow should handle it well I would think.

Alex Cruise 2022-06-20 18:26:19

I’ve definitely been keeping a close eye on materialize.io 🙂 … IIUC they don’t do a great job with datasets that are significantly larger than main memory

Alex Cruise 2022-06-20 18:30:01

At the brokerage monitoring job, the hack I came up with was to stipulate that every rule had to start with a predicate (in an AND position) that was constrained to be mechanically translatable to SQL; all other predicates were MVEL, and less constrained. All the rules that shared a primary predicate would be run as a single query as the first filter, and the MVEL expressions for the sub-rules would be run in memory instead of pushed down.
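A toy version of that grouping hack (all names hypothetical, and `eval` over a dict standing in for the real SQL pushdown):

```python
# Rules that share the same SQL-translatable primary predicate are
# fetched with one query; the cheaper in-memory sub-predicates then
# run per rule over the candidates.
from collections import defaultdict

def run_rules(rules, run_sql):
    """rules: list of (rule_name, sql_where_clause, in_memory_predicate).
    run_sql: callable mapping a WHERE clause to matching records."""
    by_primary = defaultdict(list)
    for name, where, pred in rules:
        by_primary[where].append((name, pred))
    hits = defaultdict(list)
    for where, subrules in by_primary.items():
        candidates = run_sql(where)        # one query per shared predicate
        for rec in candidates:
            for name, pred in subrules:    # MVEL-ish part, in memory
                if pred(rec):
                    hits[name].append(rec)
    return hits

orders = [{"qty": 5000, "symbol": "XYZ"},
          {"qty": 10,   "symbol": "ABC"}]

def fake_sql(where):
    # Stand-in for the database: evaluate the WHERE clause per record.
    return [o for o in orders if eval(where, {}, o)]

rules = [("big_xyz", "qty > 100", lambda o: o["symbol"] == "XYZ"),
         ("big_any", "qty > 100", lambda o: True)]
hits = run_rules(rules, fake_sql)
```

Two rules, one shared primary predicate, one query.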

Chris Granger 2022-06-20 18:35:22

ah yeah, if you need something out of core, I don't think DD (which is lower level than the materialize offering) will give you that, though it may be something you could add. Relational.ai is probably almost exactly what you want (very powerful variant of datalog that can handle very large datasets and does incremental evaluation), but they're a bit early yet.

Alex Cruise 2022-06-20 18:37:12

Lately I’ve been more focused on teasing out the 3-4 very different cost models for the sub-expressions that queries are composed of:

  • pure functions that don’t even look at data or the environment; that can be evaluated whenever
  • functions that need to look at the record, or cheap environment stuff like the current time of day
  • stateful functions, usually aggregates but sometimes more complex
  • things that need to do IO: lookups, external service calls, etc. Caching becomes mission-critical
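One way to make those tiers explicit in a (hypothetical) rule engine: tag each sub-expression with its cost class, so the planner can constant-fold tier 0, run tiers 1–2 inline, and push tier 3 behind a cache — and order predicates cheapest-first.

```python
# Four cost tiers for sub-expressions, cheapest to most expensive.
from dataclasses import dataclass
from enum import IntEnum

class Cost(IntEnum):
    PURE = 0        # no data, no environment: evaluate once, anywhere
    PER_RECORD = 1  # reads the record or cheap env (e.g. current time)
    STATEFUL = 2    # aggregates and other running state
    IO = 3          # lookups, external calls: caching is mission-critical

@dataclass
class Expr:
    name: str
    cost: Cost

def plan(exprs):
    """Order sub-expressions cheapest-first, so expensive ones run last
    (and only if the cheap predicates haven't already rejected the record)."""
    return sorted(exprs, key=lambda e: e.cost)

ordered = plan([Expr("account_lookup", Cost.IO),
                Expr("is_buy", Cost.PER_RECORD),
                Expr("threshold_const", Cost.PURE)])
```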

Tom Larkworthy 2022-06-20 21:51:33

yeah I have seen this a lot too, e.g. achievements on fitness trackers (tell me when a user has done 10,000 steps, across a 10K-customer base).

The specification is declarative but the data arrives incrementally. Because of the huge maintenance burden of upgrading state in an incremental streaming solution, I tend to recommend just recomputing from scratch every X minutes and throwing horizontal compute at it (e.g. BigQuery). The system stays stateless, and since you always need the from-scratch computation to recover from disaster anyway, it’s the natural thing to build first.
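The stateless recompute is trivial to state, which is much of its appeal — something like this sketch (hypothetical schema, using the fitness-tracker example):

```python
# Every X minutes: recompute all achievements from the full event
# history, instead of maintaining incremental per-user state.
from collections import defaultdict

def recompute_achievements(step_events, threshold=10_000):
    """step_events: iterable of (user_id, step_count) records.
    Returns the set of users past the threshold, computed from scratch."""
    totals = defaultdict(int)
    for user, steps in step_events:
        totals[user] += steps
    return {u for u, total in totals.items() if total >= threshold}

events = [("alice", 6000), ("bob", 3000), ("alice", 5000)]
achieved = recompute_achievements(events)
```

No state survives between runs, so there is nothing to migrate when the rule definitions change.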

Tom Larkworthy 2022-06-20 21:58:43

I see this as a recurring theme of incremental computation, where I see analogues with differential equations.

declarative is to imperative as y = f(x) is to dy/dx = f(x, y)

Jack Rusher 2022-06-21 11:39:35

These are all Complex Event Processing (CEP) use cases. We built a system at a startup called Aleri in the mid-00s that handled this sort of thing with continuous queries implemented over a dataflow engine. We sold the company to Sybase, which sold to SAP, which -- so far as I know -- still sells the product today.

Personal Dynamic Media 2022-06-20 18:16:06

Is it possible that in our efforts to find alternatives to imperative programming, we have failed to promulgate knowledge about how to program imperatively?

Back in the day, folks like Dijkstra, Hoare, Wirth, Knuth, and Naur did a lot of work on figuring out how to write imperative programs that did what they were intended to do.

However, nowadays, I get the impression that much of the energy spent on making programs better goes into alternative ways of structuring programs: object-oriented design, distributed and/or parallel and/or concurrent, event-driven, reactive, etc. Yet most of these design disciplines still involve executing chunks of imperative code; they just offer new and different ways of deciding which imperative code runs when.

This may be a dull and boring idea, but is it possible that part of what we need in order to improve software is wider distribution and study of the old ways of writing correct imperative programs, so that more of the little chunks of imperative code that get executed during an object oriented or event driven program will do what they are supposed to do?

Alex Cruise 2022-06-20 18:31:42

the old ways of writing correct imperative programs

Not sure how well this ever worked 😉

Kartik Agaram 2022-06-20 21:29:12

I notice you didn't mention functional in your list. Was that deliberate? 🙂 I don't think it involves chunks of imperative code, or orchestration of when each step runs.

Dijkstra's A Discipline of Programming is one of the top 10 books on my bookshelf, and highly recommended no matter what paradigm you program in.

Personal Dynamic Media 2022-06-20 21:34:32

Yes, I left out purely functional programming and logic programming on purpose, both because they are not especially popular and because it takes some twisted thinking to fit them into the viewpoint I am taking.

However, I would argue that mostly functional programming, like in Scheme, ML, or Clojure would still fit into this worldview of programs still containing small chunks of imperative code that need to be written correctly.

Ryo Hirayama 2022-06-21 00:54:46

Rust does exactly this. To write imperative programs correctly today, we need Rust or something better.

Jan Ruzicka 2022-06-21 05:53:01

I’m not sure I agree that “bits of imperative code always get run” is the right framing (I mean: obviously they do get run; but is this a good viewpoint?).

However, what certainly happens in all paradigms (even functional and logic) is that in between the “cybernetical parts” (the parts that do stuff: function applications, rules, statements, message sends, …) you get to play with lexical variables containing state. These aren’t easily analyzed, even when the state is “immutable” (immutability doesn’t imply discoverability or predictability).

So what is needed, in my view, is better handling of lexical variables (what they contain w.r.t. cybernetic elements of the program) and their evolution through control flow (imperative control flow, but also repeated message receives / function applications). One can imagine that a better understanding of imperative programs might lead to this handling of variables and control flow, since both are present in almost all paradigms in the form they had in imperative languages (an exception I can think of is constraint-solving).

Yusuke Shinyama 2022-06-21 22:55:14

I don't think the imperative style is necessarily inherent or ideal for computer programs. It was rather born out of the limitations of early computer architectures (single ALU, single memory bus, everything has to be synchronized, etc.). When multiple operations are happening in multiple places, imperative programming is no longer suitable. Another thing to consider is that modern software has to handle thousands of different conditions (hardware configurations, user preferences, access controls, GUI events, network inputs, and concurrency!). In a purely imperative style, the if-thens blow up. We somehow needed to "abstract away" those conditionals. Programming styles don't evolve in a vacuum; they constantly react to ever-changing needs from external forces like these.

William Taysom 2022-06-23 04:41:19

Think about it this way: though an imperative, interactive, step-by-step process may be the domain of a program, that doesn't mean a language with imperative semantics is going to be a good fit for the domain. Often the process semantics and the imperative semantics have a big mismatch; take event-driven UIs as an example.

Orion Reed 2022-06-21 14:12:53

Does anyone know of FOC orgs (businesses and/or communities) that are run cooperatively/democratically? What are your thoughts on the intersection of democratic self-management and FOC projects? There’s often talk of “democratising” computing, and to me this often feels half-baked when the control is ultimately in private/undemocratic hands, even if the technology has emancipatory potential.

Ivan Reese 2022-06-21 14:16:22

By "orgs" do you mean just businesses, or also communities (like this one)?

Orion Reed 2022-06-21 14:27:01

Either! But I had businesses in mind. Edited the post to clarify

Srini K 2022-06-21 14:28:22

Small Giants covers a lot of companies that are closer to this flavor, but none are FoC companies per se 🤔

amazon.com/Small-Giants-Companies-Instead-10th-Anniversary/dp/014310960X

Konrad Hinsen 2022-06-21 15:03:53

There are small-scale open source projects in scientific software that are run more or less democratically, if you count only regular contributors as "citizens". NumPy (numpy.org) comes to mind as an example. Larger projects invariably have a more oligarchic style, and none I am aware of make any attempt at integrating "mere" users into the decision making processes.

Srini K 2022-06-21 15:12:14

I’m involved with a project within the Apache Software Foundation and it has a cooperative style apache.org/foundation/governance/pmcs.html

People can become a committer or PMC member through sustained contributions

Daniel Krasner 2022-06-21 19:03:59

I think informal.systems is trying to do something along those lines