From 5b90b4e4cb555e8df4ac06927efe32cf2c2b0bc6 Mon Sep 17 00:00:00 2001 From: DavePearce Date: Wed, 15 Jun 2022 17:34:51 +1200 Subject: [PATCH 1/2] Initial rectype RFC --- text/0000-rectype.md | 93 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 text/0000-rectype.md diff --git a/text/0000-rectype.md b/text/0000-rectype.md new file mode 100644 index 0000000..1ba363f --- /dev/null +++ b/text/0000-rectype.md @@ -0,0 +1,93 @@ +- Feature Name: `rectype` +- Start Date: `15-06-22` +- RFC PR: (leave this empty) + +# Summary + +Recursive types can be defined in Whiley using the `type` keyword. +Whilst this is syntactically neat, it does not give any indication to +the compiler has to how best to *implement* them. This RFC proposes a +requirement that recursive types are defined using `rectype` instead +of `type`. + +# Motivation + +Recursive types are easily defined in Whiley using `type` as follows: + +```Whiley +type List is null|{int data, List next} +``` + +This type is _recursive_ because it is defined in terms of itself. As +such, any implementation on a low-level platform will have to +implement the type using a reference. We say that the recursive type +must be _broken_ into its flat and reference components. To +understand this better, lets imagine one "low-level" encoding of the +above: + +``` +typedef List = null | &{int data, List next} +``` + +Here, a variable of type `List` is a union of `null` and a reference. +On a 64-bit architecture, for example, this requires `65` bits to +encode (ignoring trickery with word aligned pointers). An alternative +encoding is this: + +``` +typedef List = &(null | {int data, List next}) +``` + +In this case, a variable of type `List` is just a reference and, +hence, fits into `64` bits on a 64-bit architecture. + +The point here is not that one representation is better than another. +Only that there are options here and the final decision is currently +left to the compiler. We can take this a step further as follows: + +```Whiley +type Message is {int header}|{int header, Payload data} +type Payload is {bytes[] bytes, null|Message sub} +``` + +Whilst this example is contrived, it illustrates the issue: where do +we break the recursive type `Message`? There are various places it +could be broken. For example, `Payload` could be implemented as a +reference; or, `Payload` could be implemented as a flat structure. +Leaving the compiler to make this decision is somewhat risky. For +example, passing recursive types across FFI boundaries would require a +deep understanding of the algorithm. Furthermore, it might be subject +to change as the compiler evolves. Therefore, instead, _requiring_ +programmer direction on the breakpoint is the intent of this proposal. + +# Technical Details + +This should provide a detailed discussion regarding the technical +aspects of the proposal. This should clearly identify what exactly is +being changed (e.g. what the new syntax is, what the new feature does, +etc). This should also detail what parts of the compiler need to be +changed in order for the change to be implemented. + +# Terminology + +In changing the language, it is important to develop a suitable +"vernacular" for talking about the new aspects of the language. For +example, if a new piece of syntax is proposed, then a standard way of +referring to this should be proposed. Likewise, if a new phase of the +compiler is introduced, then suitable terminology for discussing the +various aspects of this should be developed. + +# Drawbacks and Limitations + +This should identify any potential drawbacks or limitations of the +proposed change. For example, if the proposed change would break +existing code, this should be clearly identified. Likewise, if the +change could impact upon other proposed changes, this should be +identified. + +# Unresolved Issues + +List any currently unresolved aspects of the change proposal. These +will need to be adequately resolved before the RFC is accepted. +Identifying these issues provides a way for the author(s) of the RFC +to leverage the community in finding appropriate solutions. From a2a732c79b55ce62387719aebcc07ea8af3ceaf0 Mon Sep 17 00:00:00 2001 From: DavePearce Date: Thu, 16 Jun 2022 12:16:55 +1200 Subject: [PATCH 2/2] Completed draft! --- text/0000-rectype.md | 144 +++++++++++++++++++++++++++++++------------ 1 file changed, 104 insertions(+), 40 deletions(-) diff --git a/text/0000-rectype.md b/text/0000-rectype.md index 1ba363f..064ce3d 100644 --- a/text/0000-rectype.md +++ b/text/0000-rectype.md @@ -5,33 +5,31 @@ # Summary Recursive types can be defined in Whiley using the `type` keyword. -Whilst this is syntactically neat, it does not give any indication to -the compiler has to how best to *implement* them. This RFC proposes a -requirement that recursive types are defined using `rectype` instead -of `type`. +This offers syntactic simplicity but offers no hints as to how they +should be *implemented*. This RFC proposes a requirement that +recursive types be defined using `rectype` instead of `type`. # Motivation -Recursive types are easily defined in Whiley using `type` as follows: +Recursive types are currently defined in Whiley using `type` as follows: ```Whiley type List is null|{int data, List next} ``` -This type is _recursive_ because it is defined in terms of itself. As -such, any implementation on a low-level platform will have to -implement the type using a reference. We say that the recursive type -must be _broken_ into its flat and reference components. To -understand this better, lets imagine one "low-level" encoding of the -above: +This type is _recursive_ because it is defined in terms of itself. +Any "low level" translation will need use a reference somewhere to +implement this. We say that the recursive type must be _broken_ into +its flat and reference components. To understand this better, imagine +an encoding of the above in an imaginary "low-level" language: ``` typedef List = null | &{int data, List next} ``` -Here, a variable of type `List` is a union of `null` and a reference. -On a 64-bit architecture, for example, this requires `65` bits to -encode (ignoring trickery with word aligned pointers). An alternative +A variable of type `List` is a union of `null` and a reference. On a +64-bit architecture, for example, this requires `65` bits to encode +(ignoring trickery with word aligned pointers). An alternative encoding is this: ``` @@ -50,44 +48,110 @@ type Message is {int header}|{int header, Payload data} type Payload is {bytes[] bytes, null|Message sub} ``` -Whilst this example is contrived, it illustrates the issue: where do -we break the recursive type `Message`? There are various places it -could be broken. For example, `Payload` could be implemented as a +Whilst this example is contrived, it illustrates the issue: _where do +we break the recursive type here?_ There are various places it could +be broken. For example, `Payload` could be implemented as a reference; or, `Payload` could be implemented as a flat structure. Leaving the compiler to make this decision is somewhat risky. For example, passing recursive types across FFI boundaries would require a -deep understanding of the algorithm. Furthermore, it might be subject -to change as the compiler evolves. Therefore, instead, _requiring_ -programmer direction on the breakpoint is the intent of this proposal. +deep understanding of the decision process used by the compiler. +Furthermore, it might be subject to change as the compiler evolves. +Therefore, instead, _requiring_ programmer direction on the breakpoint +is the intent of this proposal. # Technical Details -This should provide a detailed discussion regarding the technical -aspects of the proposal. This should clearly identify what exactly is -being changed (e.g. what the new syntax is, what the new feature does, -etc). This should also detail what parts of the compiler need to be -changed in order for the change to be implemented. +This document proposes that every recursive type should be +_explicitly_ broken by the programmer using a `rectype` definition. +In particular, failure to do this should generate a compile-time +error. Intuitively the intention is that, at a low level, a `rectype` +is implemented as reference. + +## Overview + +Consider again our `LinkedList` example. Under this proposal, our +previous definition of `LinkedList` would generate a compile-time +error. However, we can update it to use `rectype` as follows: + +```Whiley +rectype List is null|{int data, List next} +``` + +This now compiles, and explicitly demarks `LinkedList` as a recursive +type. Furthermore, the expected translation of this into our +low-level language would be: + +``` +typedef List = &(null | {int data, List next}) +``` + +The key is that there is _exactly one expected translation_. If we +wanted to get something close to the other translation of `LinkedList` +shown above, we would have to write the code differently like this: + +```Whiley +rectype Link is {int data, List next} +type List is null|Link +``` + +In this case, `Link` is implemented as a reference whilst `List` is +implemented inline. + +## Nesting + +Whilst it is a requirement that any recursive type is broken at least +once, it is permitted to have multiple breaks. For example, the +following illustrates: + +```Whiley +rectype Link is {int data, List next} +rectype List is null|Link +``` + +In this case, `List` has two breaks and now corresponds with this +low-level representation: + +``` +typedef Link = &{int data, List next} +typedef List = &(null | Link) +``` + +Here we see that both `List` and `Link` are translated as references. +Whilst this is unlikely to be an efficient implementation, the point +is that the programmer has control. The reason for allowing this is +that we imagine situations where it is desirable or even necessary. +For example, a `Stmt` type which is itself recursive (e.g. through +compound statements such as `WhileStmt`) and contains potentially +recursive expressions (of which some, such as lambdas, might contain +statements). + +## Non-recursive Recursive Types + +This proposal does not require that a `rectype` is _actually_ +recursive. For example, the following should compile without error: + +```Whiley +rectype Point is {int x, int y} +``` + +The key, however, is that this _would still be translated as a +reference_. The reasoning here is that there is no need to prohibit +non-recursive `rectype`s (and potentially they might be useful in some +situations). # Terminology -In changing the language, it is important to develop a suitable -"vernacular" for talking about the new aspects of the language. For -example, if a new piece of syntax is proposed, then a standard way of -referring to this should be proposed. Likewise, if a new phase of the -compiler is introduced, then suitable terminology for discussing the -various aspects of this should be developed. + * **Recursive Type**. A _recursive type_ is one declared with `rectype`. + + * **Flat Type**. A _flat (or inline) type_ is one declared with `type`. # Drawbacks and Limitations -This should identify any potential drawbacks or limitations of the -proposed change. For example, if the proposed change would break -existing code, this should be clearly identified. Likewise, if the -change could impact upon other proposed changes, this should be -identified. +N/A # Unresolved Issues -List any currently unresolved aspects of the change proposal. These -will need to be adequately resolved before the RFC is accepted. -Identifying these issues provides a way for the author(s) of the RFC -to leverage the community in finding appropriate solutions. + * **Checking**. Its not immediately clear how best to implement + the compiler check that recursive types are broken using + `rectype`. This requires some kind of depth-first search to + identify cycles in the type graph.