Outline of a Swift-like domain specific language in Haskell


about

I worked for a while on Swift, "a system for the rapid and reliable specification, execution, and management of large-scale science and engineering workflows." Swift workflows are driven by a programming language that is (sometimes) called SwiftScript.

A recurring question amongst the researchers and developers was whether it was right to build a programming language from the ground up for this, rather than producing a library or an embedded language. The historical reason for the current design is that SwiftScript evolved from a much simpler language, VDL (the Virtual Data Language), which was a graph specification language for DAGs (directed acyclic graphs) of tasks. Over time, this evolved into the much richer SwiftScript, without it clearly being intended at the start to be a general-purpose, Turing-complete language. Whilst this is a legitimate justification for SwiftScript being as it is now, it does not answer the question; even to attempt an answer, I think it would be necessary to go some way towards prototyping an embedded language or library.

As I have a hobbyist interest in functional programming, and found working on Swift very interesting, I sometimes muse over what such a prototype would look like in Haskell. This page is a repository for such musings.

Features and goals

What features does SwiftScript provide by being its own language? What features would we hope to get from SwiftScript embedded inside another language? What are the general goals of Swift?

General goals

Benefits of being own language

Benefits of being embedded

Implementation Ideas

I think the implementation can be divided into two broadly orthogonal pieces, which can be combined to provide much of the desired functionality: 1. ordering execution as tasks finish asynchronously elsewhere; and 2. managing data out-of-core.
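A minimal sketch of how the two pieces might stay orthogonal, assuming a write-once `MVar` as the future and a typed file path as the out-of-core handle. `Future`, `File`, `store`, `load` and `produce` are hypothetical names, and the forked thread only stands in for a task running on some remote resource.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)

-- Piece 1 (hypothetical): a write-once future, used only to order execution.
newtype Future a = Future (MVar a)

newFuture :: IO (Future a)
newFuture = Future <$> newEmptyMVar

fulfil :: Future a -> a -> IO ()
fulfil (Future m) = putMVar m

await :: Future a -> IO a
await (Future m) = readMVar m

-- Piece 2 (hypothetical): a typed handle to data kept out-of-core in a file.
-- The phantom type parameter records what the file is meant to contain.
newtype File a = File FilePath

store :: Show a => FilePath -> a -> IO (File a)
store path x = writeFile path (show x) >> return (File path)

load :: Read a => File a -> IO a
load (File path) = do
  s <- readFile path
  length s `seq` return (read s)  -- force the whole file so the handle closes

-- Combining the pieces: a task runs "elsewhere" (here, a forked thread) and
-- materialises a future that holds only a file handle, not the data itself.
produce :: Show a => FilePath -> a -> IO (Future (File a))
produce path x = do
  out <- newFuture
  _ <- forkIO (store path x >>= fulfil out)
  return out

main :: IO ()
main = do
  ff <- produce "squares.txt" [n * n | n <- [1 .. 5 :: Int]]
  file <- await ff  -- ordering: wait until the producing task has finished
  xs <- load file   -- data: only now bring the contents in-core
  print (sum xs :: Int)
```

Neither piece mentions the other: `Future` knows nothing about files and `File` knows nothing about asynchrony, so either could be replaced independently.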

Asynchronous execution

Make something like a future. This will likely be a monad, although in many cases it can be used as an applicative functor.

When can values in this monad be converted into normal values? Only at the end of the entire execution of a program?
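One way to give a future both interfaces is sketched below, assuming a future is simply an `IO` action that blocks until its value exists. `Future`, `task` and `runFuture` are hypothetical names; `runFuture` illustrates one possible answer to the question above, in which values only leave the monad when the whole workflow is run. Exception handling is omitted.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Hypothetical future: an IO action that blocks until its value exists.
newtype Future a = Future (IO a)

-- | In this sketch the only way out of the monad: run the whole workflow,
-- at the end of the program.
runFuture :: Future a -> IO a
runFuture (Future io) = io

-- | Hypothetical stand-in for a task executed elsewhere: wait, then yield x.
task :: Int -> a -> Future a
task delayMicros x = Future (threadDelay delayMicros >> return x)

instance Functor Future where
  fmap f (Future io) = Future (fmap f io)

instance Applicative Future where
  pure = Future . return
  -- Both sides run at once: the function side in a new thread, the argument
  -- side in the current one. (No exception handling in this sketch.)
  Future iof <*> Future iox = Future $ do
    mf <- newEmptyMVar
    _ <- forkIO (iof >>= putMVar mf)
    x <- iox
    f <- takeMVar mf
    return (f x)

instance Monad Future where
  -- The continuation needs the first result, so it has to wait for it.
  Future io >>= k = Future (io >>= runFuture . k)

main :: IO ()
main = do
  -- Applicative: the two tasks overlap, so this should take about one delay.
  s <- runFuture ((+) <$> task 100000 (1 :: Int) <*> task 100000 2)
  print s
  -- Monad: the second task's input depends on the first task's result.
  t <- runFuture (task 100000 (3 :: Int) >>= \n -> task 100000 (n * 10))
  print t
```

Independent applicative arguments run concurrently, while `>>=` must wait because the next step depends on the previous result. The cost is that `<*>` and `ap` no longer order their effects the same way; the async package's `Concurrently` type, built on a similar idea, takes the more cautious route of providing `Applicative` without `Monad`.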

Functions that process normal values can be lifted into the monad: a function will be executed when all of its input parameters are materialised, and the future it represents will be materialised at that point. This keeps an implicit assumption from Swift that locally executed stuff (in the Swift-as-DSL case, in-core rather than out-of-core code) is cheap to execute. This may not always be true, though, and perhaps Concurrent or Parallel Haskell could be used to handle that.
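The push-based lifting described above can be sketched with write-once `MVar` cells, assuming one lightweight thread per lifted call is acceptable (in keeping with the assumption that local code is cheap). `Future`, `materialise`, `await` and `liftF2` are hypothetical names. The `$!` makes the lifted function run as soon as its inputs arrive, rather than whenever someone later reads the result.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)

-- | Hypothetical write-once future cell.
newtype Future a = Future (MVar a)

newFuture :: IO (Future a)
newFuture = Future <$> newEmptyMVar

-- | Materialise a future. A second write would block, since putMVar does.
materialise :: Future a -> a -> IO ()
materialise (Future m) = putMVar m

-- | Wait for a future; readMVar leaves the value in place for other readers.
await :: Future a -> IO a
await (Future m) = readMVar m

-- | Lift a pure two-argument function. A thread waits until both inputs are
-- materialised, evaluates the function in-core (to WHNF, via $!), and then
-- materialises the result future.
liftF2 :: (a -> b -> c) -> Future a -> Future b -> IO (Future c)
liftF2 f fa fb = do
  out <- newFuture
  _ <- forkIO $ do
    a <- await fa
    b <- await fb
    materialise out $! f a b
  return out

main :: IO ()
main = do
  x <- newFuture
  y <- newFuture
  z <- liftF2 (+) x y  -- nothing can run yet
  materialise y (2 :: Int)
  _ <- forkIO (threadDelay 100000 >> materialise x 3)  -- x arrives later
  await z >>= print    -- prints 5 once x has been materialised
```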

Even where the applicative interface is enough, the code needs extra operators (`<$>` and `<*>`) in place of plain function application, which adds to the ugliness. To address this, perhaps Haskell syntax could be extended so that when a value of an applicative functor type is used, normal function application syntax can be used instead of the applicative function application operators. This should work for any applicative functor. It does, however, bring more ambiguity into type inference. This is the only proposed Haskell language extension.
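To show the syntactic overhead, here is the same small workflow written with plain values and then over an arbitrary applicative functor. `analyse` and `combine` are hypothetical in-core steps, and `Maybe` and lists stand in for a future type only because they are in the Prelude.

```haskell
import Control.Applicative (liftA2)

-- Hypothetical in-core steps of a workflow.
analyse :: Int -> Int
analyse = (* 2)

combine :: Int -> Int -> Int
combine = (+)

-- With plain values the workflow reads like ordinary code.
plain :: Int -> Int -> Int
plain a b = combine (analyse a) (analyse b)

-- The same workflow over any applicative functor (a future type, say)
-- needs <$> and <*> at each application.
lifted :: Applicative f => f Int -> f Int -> f Int
lifted a b = combine <$> (analyse <$> a) <*> (analyse <$> b)

-- liftA2 hides some of the operators, but still differs from plain application.
lifted' :: Applicative f => f Int -> f Int -> f Int
lifted' a b = liftA2 combine (fmap analyse a) (fmap analyse b)

main :: IO ()
main = do
  print (plain 1 2)                 -- 6
  print (lifted (Just 1) (Just 2))  -- Just 6
  print (lifted' [1, 2] [10])       -- [22,24]
```

The proposed extension would let `lifted` be written with exactly the body of `plain`, leaving the type checker to insert `<$>` and `<*>`; that is not valid Haskell today. GHC's `ApplicativeDo` extension goes part of the way, but only for `do` blocks.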

Related to futures will be the tasks which generate/materialise the values. In general there will be many of these, and not all can be executed at once, so a scheduler should be implemented (it could be naive in the base implementation, as long as it's clear how to add a new one). This scheduler can be made aware of the dependencies between futures. It could perhaps pass them on to an underlying execution system when that system can accept dependency information, rather than handling them all in-core as the present Swift engine does. Where several execution systems can each handle dependencies, only the dependencies passing between execution engines would need to be handled by Swift.
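A naive scheduler along these lines is sketched below, assuming tasks are plain `IO` actions and that a counting semaphore (`QSem` from base) is an acceptable stand-in for limited execution resources. `Scheduler`, `naiveScheduler` and `submit` are hypothetical names; making `Scheduler` a record of one function is just one way of keeping it clear how another policy could be plugged in. Dependencies are resolved in-core here, as the present Swift engine does.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)

-- | Write-once future, materialised by the task that produces it.
newtype Future a = Future (MVar a)

await :: Future a -> IO a
await (Future m) = readMVar m

-- | A scheduler decides when a ready task may run. A different policy is
-- added by building another value of this type.
newtype Scheduler = Scheduler { runTask :: IO () -> IO () }

-- | Naive policy: at most n tasks run at once.
naiveScheduler :: Int -> IO Scheduler
naiveScheduler n = do
  sem <- newQSem n
  return (Scheduler (bracket_ (waitQSem sem) (signalQSem sem)))

-- | Submit a task with its dependencies. The task waits for every dependency
-- to be materialised before asking the scheduler for a slot, so a task that
-- is still blocked never occupies one.
submit :: Scheduler -> [Future d] -> ([d] -> IO a) -> IO (Future a)
submit sched deps run = do
  out <- newEmptyMVar
  _ <- forkIO $ do
    inputs <- mapM await deps
    runTask sched (run inputs >>= putMVar out)
  return (Future out)

main :: IO ()
main = do
  sched <- naiveScheduler 2
  -- Hypothetical stand-in for a task executed elsewhere.
  let work x = threadDelay 50000 >> return x
  parts <- mapM (\i -> submit sched [] (\_ -> work i)) [1 .. 4 :: Int]
  total <- submit sched parts (return . sum)
  await total >>= print  -- 10
```

With a limit of 2, the four independent tasks should run in roughly two rounds before the dependent `sum` task is admitted.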

TODO

Things that may or may not be related to implementation as an EDSL: performance characterisation (the swift-plot-log stuff and related); debugging (of a parallel distributed system...)

Vague future ideas for Swift that might be sketched out here: streaming (use the same syntax for datasets, but have them always growing (e.g. over years), and so evaluate things partially...)