* Semantics - Synopsis should be defined in a simple mathematical model helps us reason abt correctness, execution time, memory usage etc. 4 widely used approaches. | Model | Description | Suitable for | Example | |--------------+-----------------------------+---------------+---------| | Operational | how to execute | Imperative | | | | a statement in terms of an | Declarative | OZ | | | *abstract machine* | Logical | | |--------------+-----------------------------+---------------+---------| | Axiomatic | Statement = *relation* | statement | Hoare | | | b/w input and output states | sequence | Logic | | | | | | | | relation is a logical | stateful | | | | | models | | |--------------+-----------------------------+---------------+---------| | Denotational | statement = *function* over | Declarative | | | | an abstract domain | | | |--------------+-----------------------------+---------------+---------| | Logical | statement = *model* for | | | | | logical theorems | Declarative | | | | | Relational | | |--------------+-----------------------------+---------------+---------| Two step approach: (1) Translate every program into an equivalent one in a simple core language called the *kernel language*. (2) Define /operational semantics/ for the kernel language. Other approaches: Translator Approach (1) Foundational Calculus approach: translate into a mathematical calculus. Example, the lambda calculus, first-order logic, or pi-calculus. (2) Translation directly into instructions on an abstract machine, or a virtual machine. Interpreter Approach o Language Semantics is defined by an interpreter. o Linguistic Abstraction == extending the interpreter. o Interpreter is a program in L1 that accepts programs in L2 and executes them. o Metacircular Evaluator : L1 = L2 + Self-contained implementation of linguistic abstractions - does not preserve execution-time complexity + basic concepts interact inside the interpreter. ------------------------------------------------------------------------ * Single-Assignment Store set of variables that are initially uninitialized. can be assigned to one value They are called *declarative variables* (dataflow variables). A declarative variable is indistinguishable from its value. Value Store: persistent mapping of variables to values. + We can compute with partial values. e.g. Bind an unbound argument as the output of a procedure ** Value Creation (Binding) x = 314 (1) Construct value in the store (2) Bind x to this value. If x is already bound, '=' tests whether the operands are compatible. Incompatible operands cause an exception. ** Variable Identifier textual name that refers to a store entity from outside the store. Environment { -> x} A variable identifier can refer to a *value* or another *bound variable*. Following the links of bound variables is called *dereferencing*. This is invisible to the programmer. ** Partial Values Data structure that may contain unbound variables. ** Variable-to-variable binding X = Y => forms an equivalence class in the store X = [1 2 3] {x1, x2} We can even have circular chain of references: X = [1 2 X] ** Use before binding Ways of handling: (1) No error in execution: Garbage Output. e.g. C/C++ (2) No error in execution, initialized to default value. (3) Exception in Runtime : Java (4) Compiler Error (5) Wait until possibly another path binds the variable. (3) and (4) are good strategies for sequential programs. (5) is a good strategy for concurrent programs. Could it be bad for sequential programs? ------------------------------------------------------------------------ * Kernel Language ** Backus Naur Form (Above Kernel Language) ** Language Let and denote any variable identifier, any statement, any value, and some pattern. | Statement | Description | |------------------------+---------------------------| | s::= | | | skip | empty statement | |------------------------+---------------------------| | 1 2 | Statement sequence | |------------------------+---------------------------| | local in end | Variable creation | |------------------------+---------------------------| | 1 = 2 | Variable-Variable binding | |------------------------+---------------------------| | = | Value creation | |------------------------+---------------------------| | if | Conditional | | then 1 | | | else 2 end | | |------------------------+---------------------------| | case | Pattern matching | | of then 1 | | | else 2 end | | |------------------------+---------------------------| | { 1 ... m} | Procedure application | |------------------------+---------------------------| The following are the value expressions in the language (the above.) | Nonterminal | Expression | Comment | |---------------------+----------------------------+------------------------| | | OR | Note that procedures | | | OR | are values | | | | | |---------------------+----------------------------+------------------------| | | OR | Integers are | | | | arbitrarily large | | | | Floats are of | | | | arbitrary precision | |---------------------+----------------------------+------------------------| | , | OR | nil, 'fine day' | | | (1: 1 | are patterns | | | ... | | | | n: n) | | |---------------------+----------------------------+------------------------| | | proc{$ 1 ... m} | Procedure values are | | | end | called *closures* | |---------------------+----------------------------+------------------------| | | OR | | |---------------------+----------------------------+------------------------| | | OR OR | integers are default | | | | features if labels are | | | | omitted. | |---------------------+----------------------------+------------------------| | | true OR false | | |---------------------+----------------------------+------------------------| A type is a value together with operations defined on it. The system has a well-defined set of types called the basic types. For example, the following is a fragment of the hierarchy of the basic types. This hierarchy is ordered by set inclusion (for example, every list is a tuple) Value | +-------------------------------------+... | | | Number Record Procedure | | +-------+ Tuple | | | Int Float +--------------------+... | | | Char Literal List | | +-----------+... String | | Bool Atom Abstract Data Types: User-defined types which are not basic. Oz is dynamically typed. Type errors in the (basic) declarative model cause immediate termination. If we extend the model with exceptions, type errors can be handled within the system. Notes on basic types: Numbers: Unary minus is written with a tilde prefix: e.g. ~10 Strings: A string "ex" is a list of character codes [101 121] Procedures: = proc {$ 1 ... m} end has the syntactic shortcut proc { 1 ... m} end This has two operations which are distinct in the elaborate version: (1) creating the procedure value (2) Binding it to the identifier Operations (on basic types) | Operation | Description | Argument Type | |-----------------+------------------+----------------| | A==B | | Value | | A\=B | Not Equals | " | | {IsProcedure B} | | " | | A==B | !!!! | " | | A>B | | " | | A+B | | Number | | A-B | | " | | A*B | | " | | A div B | Integer Division | Int | | A mod B | Modulo | Int | | A/B | Integer Division | Float | | {Arity R} | Arity | Record | | {Label R} | | " | | R.F | Field selection | " | |-----------------+------------------+----------------| ------------------------------------------------------------------ * Rationale Can the kernel language be further reduced while keeping the expressivity unchanged? Why are some of the above features considered basic? ** Records basic way to structure data. easy to compose, and decompose. decompose using patterns several utility methods: to find arity, to select a field etc. Building block for (1) Object-Oriented Programming (see PlanePoint.oz) (2) Component-based Programming ** Procedures Why not objects? or functions? Functions in Oz return a value. Procedures can define entities that are not necessarily like mathematical functions. ----------------------------------------------------------------- * Kernel Language Semantics Consists in evaluating functions over partial values. Operational Semantics with respect to an abstract machine. ** Basic Concepts *** Simple execution -------------------------- local A B in : create 2 variables in the store, make A, B point to them A = 11 : Bind A to 11. B = A + A : Add A to itself, bind the result to B end ------------------------- *** Variable identifiers and Static Scoping -------------------------- local X in X = 1 local X in X = 2 {Browse X} : value of an identifier is determined by end its innermost local declaration. {Browse X} Can be determined from the *text* of end the program. -------------------------- How else can this be done? What could be dynamic scoping? *** Procedures Parameters are passed by reference. Does not return a value. Produces effects by binding its unbound arguments. -------------------------- proc {Max X Y ?Z} ? is an annotation for o/p [No effect] if X>Y then Z=X else Z=Y end : Z is unbound when input, bound inside end the procedure. -------------------------- -------------------------- proc {LB X ?Z} if X>Y then Z=X else Z=Y end : Y is a free variable. end -------------------------- Free variables take the values they have in the *text*, when the procedure is *defined* (and not the value it has when it is invoked). -------------------------- local Y in Y=0 proc {LB X ?Z} if X>Y then Z=X else Z=Y end : Y is 0, and not 15 (Static scoping) end local Y = 15 Z in {LB 5 Z} end end -------------------------- When could dynamic scoping be useful? e.g. locally configurable parameters deeply passed-on arguments. e.g. fn(Z,X1) : X1's value is used here ... f2(Y,X1) f1(X,X1) : X1's value is determined here /can be simplified with dynamic scoping When could dynamic scoping be harmful? Harder to reason about programs from the text! Does not support *modular* reasoning. A procedure that was correct may behave incorrectly in other contexts. *** Procedural Abstraction Any statement can be made into the body of a procedure. The variables used in the statement can be made either into formal parameters of the procedure, or into free variables. The free variables are scoped statically. *** Dataflow behavior In a single assignment store, variables can be unbound. When a statement needs a variable which is as yet unbound, it waits till the object is bound. (This could happen in a different thread of execution.) This is called dataflow behavior. Dataflow behavior enables concurrency Can be implemented in the abstract machine in a simple way. ** Abstract Machine *** Definitions A *single assignment store* \s: a set of variables partitioned into (1) Equal but unbound variables (e.g. after a variable-to-variable assignment) (2) Variables bound to a a value - number, record or procedure. e.g. {x1, x2=x3, x1=a|x2} A stored variable bound to a value is indistiguishable from that value. An *environment* E: mapping from variable identifiers to entities in the single assignment store. e.g. E is { X->a, Y-y} A *semantic statement* is a pair (,E) where is a statement E is an environment Semantic statement /relates/ the statement to what it references (many-to-many) e.g. (local X in X=10 end, {X->x}) An *execution state* is a pair (ST, \s) where ST is a semantic stack: a stack of semantic statements \s is a single-assignment store. | Semantic Stack | Single-Assignment Store | |-------------------------------+-------------------------| | (local X in X=10 end, {X->x}) | x->10 | |-------------------------------+-------------------------| A *computation* is a sequence of execution states starting from the initial. A single transition in a computation is called a *computation step*. A computation step is atomic, there are no visible intermediate states. Questions: (A) Why is an environment associated with every statement? Why not just have a global store? (B) Why do we separate the identifiers and the value store variables? *** Program Execution A program is a statement . The initial execution state s | (, O) | O | |----------+---| o At each step, the first element of ST is popped and execution proceeds according to the form of the statement. o The final execution state, if the program terminates, is one in which the semantic state is empty. A semantic stack can be in + Runnable State + Terminated State + Suspended state. *** Calculating with Environments E(): entity associated with variable in the store. Adjunction: E+{->x} overrides any existing mapping of e.g. in lexical scoping, not reassignment. Restriction: defines a subdomain of an existing one. e.g. E|_{1, ..., n} used in defining closures. ** Nonsuspendable statements +----------------------------------------------------------------- |(skip, E) | execution is complete after popping this | | | statement from the semantic stack. | +-------------+--------------------------------------------------+ | | | |(1 2, | (1) Push (2,E) on to the stack. | | | (2) Push (1,E) on to the stack. | | | Q: Why doesn't the environment change? | +-------------+--------------------------------------------------+ | | | |(local | (1) Create a new variable x in the store | |in end,E)| (2) E' = E + { -> x} | | | (3) Push (,E') | +-------------|--------------------------------------------------+ |(1=2,E)| Bind E(1) to E(2) in the store. | +-------------|--------------------------------------------------+ |(=,E) | (1) Create a new variable x in the store. | | | (2) Construct the value represented by | | | (3) Let refer to it. | | | (4) All identifiers in are replaced by | | | contents as given by E. | | | (5) Bind E() and in the store. | +-------------|--------------------------------------------------+ N.B. Procedure values are also called *closures*. The procedure body can have free variable identifiers. formal parameters external references The procedure value is a pair (proc {$ 1 ... m} end, CE) where CE is E|_{1 ... k} is the *contextual environment*. -------------------------------------------------------------------- ** Suspendable statements. If a variable is unbound, the execution is suspended until *activation* : E() must be determined. In the declarative sequential model, if an execution is suspended, it will never continue. |----------------------------------+---------------------------------------| | (if then 1 else 2 end, | If E() is determined | | E) | . If E() is not a boolean | | | . raise an error condition | | | . If E() is true | | | . push(1, E) | | | Else | | | . Suspend execution | |----------------------------------+---------------------------------------| | ({ 1 ... n}, | If E() is determined | | E) | . If E() not a procedure value | | | . or does not have n arguments | | | . raise an error condition | | | . If E()::(proc {$ 1 ... n} | | | . end, CE) | | | . push(, CE+{1 -> 1, | | | . ..., n -> n}) | | | Else | | | . Suspend execution | |----------------------------------+---------------------------------------| | (case of (1:1 | If E() is determined | | ... n = n) | . If the label of E() is and | | then 1 else 2 end, E) | . its arity is [1 ... n] | | | . push(1, E+{1 -> E().f1, | | | . ..., n = E().fn}) | | | . Else | | | . push(2, E) | |----------------------------------+---------------------------------------| * Execution Examples ** Variable Identifiers and Static Scoping -------------------------- local X in ------+ X=1 | local X in --+ X=2 | | {Browse X} 1 | end --+ | {Browse Y} 2 | end ------+ -------------------------- Sequence of Execution states. 1. (s,O) O 2. After executing the local statement, and binding X=1 (1 2 , {X-> {x->1} 3. After executing the sequential composition (1, {X->x}) {x->1} (2, {X->x}) 4. Executing local statement in 1 (X=2 {Browse X}, {X->x'}) {x', x->1} (2, {X->x}) 5. After the binding X=2 ({Browse X}, {X->x'}) {x'->2, ({Browse X}, {X->x}) x->1} ** Procedure Definition and Calls First, a procedure with no external references in its body. -------------------------- 1. local Copy in 2. local B in 3 local A in 4. Copy = proc {$ X ?Y} 5. Y=X 6. end 7. B = 1 8. {Copy B A} 9. end 10. end 11. end --------------------------- 1. The initial execution state is (1-12:: O) O 2. After executing the three local declarations, (4-6,7,8::{Copy->c,B->b,A->a}) {c,b,a} 3. After executing the two bindings, (8::{Copy->c, B->b, A->a}) {c->proc...end, b->1, a} 4. After executing the procedure application, (5-5::{Copy->c, B->b, A->a}) {c->proc...end, b->1, a} The environment is unchanged since there are no external references in the body of the procedure. 5. After executing the assignment, the statement stack is empty, and a is bound to 1. Now, we consider a procedure with an external reference. -------------------------- 1. local Y in 2. Y=2 3. local CopyConstant in 4. local A in 5. CopyConstant = proc {$ B ?A} 6. A=Y 7. end 8. {CopyConstant 1 A} 9. end 10. end 11. end --------------------------- The procedure value in this case is (proc {$ B A} A = Y end, {Y->y}) ====== where the store contains y->2 Procedure application has the following change: the new environment is computed starting from the contextual environment. (6-6::{Y->y, B->b, A->a}) {CopyConstant->proc...end, a, b->1, y->2} After the statement is executed, the stack is empty, with a bound to 2. * Memory Management A running program needs information only in the semantic stack and the part of the store reachable from it. We need to keep only the semantic stack and the reachable store. (*active memory*) +----------+ Dealloc +-------------+ | |--------->| | | Active |<---------| Free | | | Alloc | | +----------+ +-------------+ | Execution ^ V | +----------+ | | | | | Inactive |------------------+ | | Garbage Collection +----------+ If inactive memory is not reclaimed (automatically by the garbage collector, or by the programmer), there are two errors that might result. (1) Dangling Reference: When a target is replaced even when it is reachable. (There are references to the freed block. (2) Memory Leak: When memory is not reclaimed even when it isn't reachable. Is being unreachable the same as "going out of scope"? Automatic Variables. ** Garbage Collection Starting with Lisp, many lanugages do automatic garbage collection. An easy way: counting references. What is the problem? A typical garbage collector has two phases. (1) Determine the active area (Mark) (2) Compact the active area. (Sweep) A garbage collector may run in batch mode, or "real-time" interleaved mode. Can there be memory leaks in a garbage-collected language? ** An Example -------------------------- 1. proc {Loop4 I} 2. if I == 10 then skip 3. else 4. {Browse I} 5. {Loop4 I+1} 6. end 7.end 8.{Loop4 0} -------------------------- Execution States: 1. (8, E0) \s After executing the if statement 2. (4, {I->i_0}) {i_0 = 0} union \s (5, {I->i_0}) 3. (5, {I->i_0}) {i_0 = 0} union \s 4. (4, {I->i_1}) {i_1 = 1, i_0 = 0} union \s (5, {I->i_1}) 5. (5, {I->i_1}) {i_1 = 1, i_0 = 0} union \s 6. (4, {I->i_2}) {i_2=2, i_1 = 1, i_0 = 0} union \s (5, {I->i_2}) 7. (5, {I->i_2}) {i_2=2, i_1 = 1, i_0 = 0} union \s ** Managing External References When Mozart system needs references to (1) External Resources. e.g. File, GUI When such an entity becomes unreachable, then cleanup is done ---Paste--- through *finalization*. (2) When an external resource needs a Mozart resource. It is not easy to know this in general. So keep a proxy that is active as long as needed (run it in a thread). When the external resource does not need the resource any longer, then it notifies the proxy. e.g. Database servers. ** Mozart Garbage Collector Copy Dual-Space: Partition the memory into equal halves. When out of memory, copy only the active area into the other half. This automatically compacts the active area. + Running time is proportional to the size of the active area, not the size of the memory. - Long-lived data, like system symbol tables collected repeatedly - Half the memory is wasted. Remedies Generational Garbage Collector ** Current: ?? * Building a Practical language. ** Syntactic Sugar *** Declarations and Bindings (1) We allow nested partial values. For example, employee(age:24 salary:Base+0.2*Base) (2) Implicit Variable Initialization. e.g. local X=10 in {Browse X} end In general, local = in end The variables in are implicitly declared. The variables in are not. e.g. local tree(node:X left:L right:Y)=T in skip end is equivalent to local X L R in T=tree(key:X left:L right:R) skip end *** Expressions synctactic sugar for a sequence of operations that return a value. How do you perform X*X in the kernel language? *** Statements as expressions (1) Nesting Markers: One way to turn a statement into an expression Nesting Marker '$' Expression's value: what is at the position of $. e.g. If {Plus 1 2 X} does X=1+2 Then Y={Plus 1 2 $} does Y=X=1+2 (2) Each statement can be changed into an expression. (Used especially in functions) *** Generalized If-Then-Else and Case statements (1) Generalized If Syntax: if then [elseif then ]* [else ] end Why did we not insist that boolean expressions be used as conditions? (2) Generalized Case Syntax: case of [andthen ] then ['[]' [andthen ] then ]* [else ] end where := OR OR OR OR OR unit OR true OR false OR