This implements an object model where potentially big or mutable objects are stored in files to make them shareable across processes, plus a library of pure functions (in the mathematical sense, implemented as individual scripts) for working with them.
This is not really meant for production use, but for learning/illustration. It is way slower and more obfuscated than other functional language implementations. It aims to be correct / precise / safe and efficient except for a big constant factor (in time and space, compared to other functional language implementations), though. (But it isn't finished, see Todo.)
This (currently) uses Bash. Most functions / commands are implemented as bash scripts though (as opposed to bash functions sourced as a library), thus could be reimplemented in C instead and be used by any other shell, too. (Alternatively, they could all (in addition?) be offered as bash functions in a source'able file to reduce the fork/exec overhead; although see 'Tail-call optimization' for a problem that would arise with a pure source'ing approach.)
-
export PATH="<path-to>/functional-shell/system:$PATH" -
set MEM to a place to put the object storage, e.g.
export MEM=~/.functional-shell -
only once to create the object storage:
functional-shell-init -
only once after booting:
nohup id-server & disown(or run in a separate terminal to see error messages as they turn up)
-
Test the system:
test/str test/call test/lists test/boxes -
read the code of the scripts in the test directory (and the text below), do your own experiments...
All objects are passed as shortish strings that bundle both the type and then either the object value directly ('immediate object') or a reference, which is used to identify the filesystem item that stores the object contents.
All mutable data types (currently: string, pair) are allocated and passed by reference.
Currently implemented types:
type kind desc constructor accessors
------- --------- -------------------- ----------- -------------------------
number immediate stringified number number number-value
string allocated utf8-encoded string string string-value, string-path
pair allocated pair of two objects cons car, cdr
null immediate the empty list null or nil -
box allocated a mutable box box unbox, box_set
Each type has a type predicate which is named after the type with a
trailing 'P'. (E.g. pairP "$v" will be true iff $v refers to a pair
object.)
'Memory' is allocated by getting an id from the id-server and using that as the name for either the directory (records like pair) or file (single-field objects like string) to hold the data. The files of the former contain objects, the files of the latter binary data.
The constructor and accessor functions form an abstraction layer: memory allocation should only be done by the constructors, and accesses by the accessor functions, both of which hide the low-level details from the user.
There's no way to do optimized tail-calls to bash functions. Thus programs that need optimized tail-calls would either have to use trampolines, or more idiomatically, call the target 'function' as an external program using 'exec' (which is more uniform, too, but of course slower.)
For an example, see system/list_ref.
The Bash language does not offer closures, lexical variables, or local or anonymous functions. It does offer anonymous code as first-class values though (any string can be called as code reference (function or executable name), or using 'eval' even complete scripts, although functional-shell doesn't make use of the latter).
Local functions (be it named or anonymous ones) in a functional language can refer to the lexical context; since there is no mechanism in bash to capture that context transparently, there's no way around doing that explicitely instead.
So, the solution offered here is the following:
-
There's a
closuredata type, withcodeandenvfields.codeis a reference to a bash function or executable,envcan be anything (it's basically like thevoid*argument passed to callbacks in C),listcan be used to bundle multiple values. -
There's also a
codedata type, that only contains code and no env; its purpose is (together with theclosuretype) to offer a super type (a "callable") that generic functions can simplycall. (Example:mapusage in test/lists.) -
So, to create a closure instance, run e.g.:
cl=$(closure function_or_code_name "$(list "$val1" "$val2" "$val3")") -
to call it, run e.g.:
call "$cl" "$val4" "$val5"This has the same effect as running:
function_or_code_name "$(list "$val1" "$val2" "$val3")" "$val4" "$val5"
Some of the languages that support first-class functions (and hence closures), and also allow mutations of variable bindings, for example Perl or Scheme, will make the mutations visible to all closures referring to the same bindings. Others, like Python, do not share the mutations between closures. Yet others (like ML or Haskell) don't offer mutation of bindings at all; ML offers boxes.
There's no way to share variable mutations in Bash (again, they aren't even lexical variables, thus the life time of a variable binding can be shorter than the life time of one of our closures.) As a solution for cases where the algorithm asks for mutable bindings, a box data type is offered here. Instead of mutating variables directly, use as follows:
b=$(box "$val1")
set_box "$b", "$val2"
unbox "$b" # prints $val2
See test/boxes for an example involving closures.
Problems compared to some other "real" programming languages (like Scheme or other Lisps):
-
Bash offers no safe way around using double quotes around all variable references (except in assignment context (is this actually correct?)).
IFS=is not enough in some cases (with empty strings as values). This entails a permanent risk of omission with possible implications for safety/security.Although, when using tagged values throughout the whole application (eliminating the empty string case) ditching the quotes might be safe again. (Find out and report back.)
-
The only exception mechanism that bash offers is
set -euo pipefail, but this only works in the top level contexts of files and function bodies, not in nested expressions. This means that for exception propagation to work, one cannot use subexpressions, and instead has to unnest them and assign temporary variables (basically static single assignment form). Worse,localinhibits the exception propagation, too (but only from failing subcommands, not failing variable accesses, i.e.set -uenforces termination whereasset -edoesn't; crazy). All of this means that for example a function that can be written in Scheme like(define (map fn l) (if (null? l) '() (cons (fn (car l)) (map fn (cdr l)))))becomes
map () { set -euo pipefail # ^ XX: necessary in every function? It was in some cases, # I have not properly tracked this down though. local fn="$1" # XX: is it a good idea to rely on set -u working local l="$2" # through `local` in the future? if nullP "$l"; then null else local a a=$(car "$l") local v v=$(call "$fn" "$a") local r r=$(cdr "$l") local rr rr=$(map "$fn" "$r") cons "$v" "$rr" fi }(Ok, the
rintermediate variable could have been saved since there's no possibility forcdrto be failing here. But, rather be consistent than smart.)Note that a style like
local a=$(car "$l") || diedoes not help either:
localstill clobbers the exception here. Bash is just that bad. (Of course this has nothing to do with functional-shell or functional programming; it's just that if you have an expectation of safety, you don't want programs to continue after running into unhandled errors.) -
The above actually clobbers exceptions happening in
nullP. This could be remedied by changing predicates to return booleans via stdout instead of their exit code, which would also have the advantage of being consistent with how values are returned otherwise. It will turn the above into:map () { set -euo pipefail local fn="$1" local l="$2" local t t=$(nullP "$l") if is_true "$t"; then ...
-
implement filter: this is a good exercise
-
implement delay, force, lazy list (stream) library
-
implement garbage collection (this will make the syntax even uglier since it requires registration of all bindings, which will be instructive for understanding but very painful; perhaps a memory-scanning approach can be used instead that linearly scans process RAM by way of ptrace or /proc/$pid/mem for reference strings, that should be neat if there's no superfluous retention of such strings by bash or malloc, but that latter point may be an issue; this would have an issue with references that have been written to pipes but not yet read (and thus are present in a kernel buffer only), but echo and read wrappers could be used to add/remove these values to/from the root set explicitely.)