Outline of a Swift-like domain-specific language in Haskell


about

I worked for a while on Swift, "a system for the rapid and reliable specification, execution, and management of large-scale science and engineering workflows." Swift workflows are driven by a programming language that is (sometimes) called SwiftScript.

A recurring question amongst the researchers and developers was whether building a programming language from the ground up was the right thing to do, rather than producing a library or embedded language. The historical reason for the standalone language is that SwiftScript evolved from a much simpler language, VDL (the Virtual Data Language), which was a graph specification language for DAGs (directed acyclic graphs) of tasks. Over time, this evolved into the much richer SwiftScript, without it clearly being intended at the start to be a general-purpose, Turing-complete language. Whilst this is a legitimate justification for SwiftScript being as it is now, it does not answer the question; to even attempt an answer, I think it would be necessary to go some way towards prototyping an embedded language or library.

As I have a hobbyist interest in functional programming, and found working on Swift very interesting, I sometimes muse over what such a prototype would look like in Haskell. This page is a repository for such musings.

Features and goals

What features does SwiftScript provide by being its own language? What features would we hope to get from SwiftScript inside another language? And what are the general goals of Swift?

General goals

To provide straightforward script-like access to large amounts of compute power and data.

Benefits of being its own language

Syntax can be made easier on the user than when embedding in another language, where there may be awkward boilerplate: 1. Futures in Java, for example, would appear as explicit wrapper objects, whereas a custom syntax can hide them entirely. 2. Shell commands can be specified in a more shell-like syntax that differs from the rest of the language.
Massive parallelism in the runtime may be more easily accommodated, as threads can be lighter weight (one million files to be processed = one million threads).
Computation in Swift is partly ordered by the completion of tasks (which often run on other systems), rather than entirely chosen by the runtime system. Construction of composite data structures (arrays and structs) happens asynchronously, and it is desirable to operate on partially constructed data structures where possible.
Provenance - automatic recording of computations performed. When I worked on Swift, I found it hard to get concrete user experience with provenance to get a feel for what is really useful there, and so what this should really look like.
File mapping - data stored outside of the runtime system, in files, can be addressed in a very similar way to data stored inside the runtime system (for example, integers), and movement of that data to execution sites is automatic. Such data can also have structure exposed at the language level (the classic example being MRI data consisting of a .hdr and a .img file, referenced as a single composite item in the language, like a C structure).
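As a rough illustration of how the MRI example might look when embedded in Haskell, here is a minimal sketch. All names in it (Mapped, Volume, mapVolume, volumeFiles) are hypothetical and are not taken from Swift; the "mapper" simply derives both file names from a common prefix, which is only one of many possible mapping schemes.

```haskell
-- Hypothetical names throughout; none of this is Swift's actual API.

-- A value that lives in a file outside the runtime, tagged with the
-- kind of data the file is meant to hold.
newtype Mapped a = Mapped FilePath
  deriving (Show, Eq)

-- Phantom tags for the two halves of an MRI volume.
data Header
data Image

-- A composite item: one logical volume backed by two files.
data Volume = Volume
  { volHeader :: Mapped Header
  , volImage  :: Mapped Image
  } deriving (Show, Eq)

-- A very simple mapper: derive both file names from a common prefix.
mapVolume :: FilePath -> Volume
mapVolume prefix = Volume
  { volHeader = Mapped (prefix ++ ".hdr")
  , volImage  = Mapped (prefix ++ ".img")
  }

-- All files that would need to be staged to an execution site
-- before a task could consume this volume.
volumeFiles :: Volume -> [FilePath]
volumeFiles (Volume (Mapped h) (Mapped i)) = [h, i]
```

The phantom type parameter lets the type checker distinguish a header file from an image file even though both are just paths at runtime, which is roughly the kind of structure SwiftScript exposes at the language level.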

Benefits of being embedded

Access to native libraries (such as file access and specialised algorithms). In SwiftScript, there is even vagueness surrounding how numbers should behave (for example, when floats and ints are automatically cast to each other), which would likely be solved by embedding, as that behaviour would come from the hosting language.

Implementation ideas

I think there can be two broadly orthogonal pieces of implementation, which can be combined to provide much of the desired functionality: 1. Ordering of execution as tasks finish asynchronously elsewhere. 2. Managing data out-of-core.

Asynchronous execution

Make something like a future. This will likely be a monad, although in many cases it can be used as an applicative functor.

When can values in this monad be converted into normal values? Only at the end of the entire execution of a program?

Functions that process normal values can be lifted into the monad: a lifted function will be executed when all of its input parameters are materialised, and the future it represents will be materialised at that point. This keeps an implicit assumption from Swift that locally executed code (in the Swift-as-DSL case, in-core rather than out-of-core code) is cheap to execute. This may not always be true, though, and perhaps Concurrent or Parallel Haskell could be used to handle that.
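To make the idea of a future and of lifting concrete, here is a minimal sketch using only the base library. The names Future, spawn, await and demo are my own for illustration. It assumes a future is just "an action that blocks until the value is there", backed by an MVar; lifted functions are re-run each time the result is awaited, which matches the assumption above that local code is cheap. Failure of a task is not handled at all.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)

-- Hypothetical sketch: a future is an action that blocks until its
-- value has been materialised.
newtype Future a = Future { await :: IO a }

instance Functor Future where
  fmap f (Future w) = Future (fmap f w)

instance Applicative Future where
  pure = Future . pure
  -- Wait for the function, then for the argument. Both underlying tasks
  -- are already running, so the total wait is that of the slower one.
  Future wf <*> Future wa = Future (wf <*> wa)

instance Monad Future where
  -- The continuation may pick a different future depending on the value.
  Future w >>= k = Future (w >>= await . k)

-- Start a task (standing in for remote execution) in its own lightweight
-- thread and return a future for its result. If the task throws, the
-- cell is never filled; a real implementation would have to handle that.
spawn :: IO a -> IO (Future a)
spawn task = do
  cell <- newEmptyMVar
  _ <- forkIO (task >>= putMVar cell)
  pure (Future (readMVar cell))

-- Two slow tasks combined with an ordinary function lifted into Future.
demo :: IO Int
demo = do
  x <- spawn (threadDelay 20000 >> pure 2)
  y <- spawn (threadDelay 10000 >> pure 40)
  await ((+) <$> x <*> y)
```

In this sketch, await is the only way to get a normal value back out, and it lives in IO, so a program can choose to call it once at the very end or at intermediate points.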

Wherever the applicative functor interface is used, there will be extra syntax (the <$> and <*> operators), which adds to the ugliness. To address this, perhaps Haskell syntax could be extended so that, when a value of an applicative functor type is used, normal function application syntax can be used instead of the applicative application operators. This should work for any applicative functor, though it brings more ambiguity into type inference. This is the only proposed Haskell language extension.
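For comparison, this is what the extra syntax looks like in today's Haskell. Maybe is used here purely as a stand-in for a future type, and scale is a made-up example function; the point is only the difference in how the call is written.

```haskell
import Control.Applicative (liftA2)

-- An ordinary function on ordinary values (a made-up example).
scale :: Int -> Int -> Int
scale factor x = factor * x

-- Ordinary function application.
plain :: Int
plain = scale 2 21

-- The same call with both arguments inside an applicative functor.
-- Maybe stands in here for a future type.
lifted :: Maybe Int
lifted = scale <$> Just 2 <*> Just 21

-- An equivalent spelling using liftA2 instead of the operators.
lifted' :: Maybe Int
lifted' = liftA2 scale (Just 2) (Just 21)
```

The proposed extension would amount to letting the body of lifted be written as in plain, with the compiler inserting the operators based on the types.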

Related to futures are the tasks which generate (materialise) the values. There will be many of these, in general, and not all can be executed at once, so some scheduler should be implemented. It could be naive in the base implementation, as long as it is clear how to add a new one. This scheduler can be made aware of the dependencies between futures. It could perhaps pass them on to an underlying execution system, when that system can accept dependency information, rather than handling them all in-core as the present Swift engine does. Where several execution systems can handle dependencies, only those dependencies that cross between execution systems would need to be handled by Swift.
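A naive, dependency-aware scheduler with a pluggable policy could be sketched as below. This is a pure model only: it works out which tasks would be started in each round and does not execute anything. The names Task, Policy, atMost, schedule and example are hypothetical, and treating everything chosen in a round as finishing together is a simplifying assumption.

```haskell
import Data.List (partition)

-- Hypothetical task description: a name plus the names of the tasks
-- whose outputs it consumes.
data Task = Task
  { taskName :: String
  , taskDeps :: [String]
  } deriving (Show, Eq)

-- A scheduling policy picks which of the ready tasks to start now.
-- Swapping in a different scheduler means supplying a different Policy.
type Policy = [Task] -> [Task]

-- Naive policy: start at most n ready tasks, in the order given.
atMost :: Int -> Policy
atMost = take

-- Repeatedly run one round: find the tasks whose dependencies are all
-- done, let the policy choose some of them, and treat the chosen ones as
-- finished. Returns the task names started in each round, in order.
-- Stops early if nothing can be started (a cycle, a missing dependency,
-- or a policy that chooses nothing).
schedule :: Policy -> [Task] -> [[String]]
schedule policy = go []
  where
    go _ [] = []
    go done pending =
      let (ready, blocked) = partition (all (`elem` done) . taskDeps) pending
          chosen           = policy ready
          names            = map taskName chosen
          rest             = filter (`notElem` chosen) ready ++ blocked
      in if null chosen
           then []
           else names : go (names ++ done) rest

-- A small diamond-shaped example workflow.
example :: [Task]
example =
  [ Task "a" []
  , Task "b" []
  , Task "c" ["a", "b"]
  , Task "d" ["c"]
  ]
```

With this model, schedule (atMost 2) example evaluates to [["a","b"],["c"],["d"]], while schedule (atMost 1) example serialises everything into four rounds. Handing dependencies to an underlying execution system would correspond to passing the taskDeps information along rather than resolving it here.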

TODO

Things that may or may not be related to implementation as an EDSL: performance characterisation is desirable (the swift-plot-log stuff and related); debugging (of a parallel distributed system...).

Vague future ideas for Swift that might be sketched out here: streaming (use the same syntax for datasets, but have them always growing (e.g. over years), and so evaluate things partially...).