WARNING: This text has been OCRd from the original paper and so will contain many typographical errors. It should be useful for searching, however. [Ash, August 2002].

Formal specification from an observation-oriented perspective

M. Beynon, J. Rungrattanaubol and J. Sinclair
Department of Computer Science, University of Warwick, Coventry, CV4 7AL, UK

Abstract: A formal specification of an algorithm is a very rich mathematical abstraction. In general, it not only specifies an input-output relation, but also - at some level of abstraction - constrains the states and transitions associated with computing this relation. This paper explores the relationship between a formal specification of an algorithm and the many different ways in which the associated states and transitions can be embodied in physical objects and agency. It illustrates the application of principles, tools and techniques that have been developed in the Empirical Modelling Project at Warwick and considers how such an approach can be used in conjunction with a formal specification for exploration and interpretation of a subject area. As a specific example, we consider how Empirical Modelling can be helpful in gaining an understanding of a formal development of a heapsort algorithm.

1 Introduction

One current theme of research in theoretical computer science is the way in which different formal (and semi-formal) approaches to system development may be combined to provide a more coherent and complete picture of the system under consideration (see, for example, [8, 9]). The work of this paper is related to this theme, but broadens the scope by questioning how two very different modelling paradigms may be viewed in relation to each other, and how they can complement each other when used together. The first approach considered is that of formal development.
Formal techniques for the development of both software and hardware have evolved over the past 25 years, giving rise to a wealth of different notations and approaches. Such techniques have been used in many areas of industry and, although research continues with particular emphasis on usability and scalability, the principles behind them are well understood. Formal approaches have in common a precise, unambiguous syntax with a clearly-defined semantics enabling verification of key properties and of refinement. As an example, we consider a heapsort algorithm derived from a pre/post-condition specification using weakest precondition techniques [5]. It is the formal aspect of the approach which is of importance here rather than the specific notation.

The second approach we consider is that of Empirical Modelling [11] developed over the past 10 years by the Empirical Modelling Research Group in the Computer Science department at the University of Warwick. Whereas a formal specification fixes the important features of the system under development, an Empirical Modelling (EM) approach allows exploration of the state and the effects of dependencies between observables. In this sense it is closer to the requirements-gathering end of the development life-cycle. However, it incorporates tools to enable this exploration which perform a visualisation role and may be seen as closer to an environment for rapid prototyping or programming. EM is based on a rather different conceptual framework from formal approaches. It is this important underlying difference which is introduced and explored in this paper.

The aim of our work is to examine, with reference to a particular case study, the fundamental differences between the two approaches and the ways in which they may be used together to provide observational motivation for the formal descriptions we develop.
This investigation will be carried out from a pedagogical perspective; that is, we concentrate on how the approach can help a student explore and gain an understanding of the basic components and relationships which interact to achieve a required goal. The subject in this case is taken to be a heapsort algorithm, where understanding of such concepts as "is a heap" is needed to master the overall approach. We were able to use existing EM tools with the addition of predicates from the formal development to facilitate this exploration. The two approaches differ fundamentally in their methodology. We use the example to emphasise this and to illustrate the EM philosophy.

In the following section the more widely known formal approach is used to introduce the case study. We consider what constitutes a heapsort and how we can recognise and characterise a heapsort activity. Questions also arise concerning how a given specification is interpreted. This leads to a description of the EM approach and the features of using it to explore the heapsort activity. Next, we describe how this model can be extended to incorporate the formal properties, and consider the benefits of exposing students to both mathematical description and interactive exploration. Finally, we discuss what has been achieved by this work and consider some directions for future research.

2 A formal approach to heapsort

What is heapsort? If we are thinking of a formal description and development of the algorithm we might well start by describing the specification the procedure is required to meet and the data structures it uses. So, if a(1 ... N) is an array of length N with an ordering relation on its elements, we can give a specification of a sorting process in terms of pre- and post-conditions as follows:

    PRE:  $N > 1$
    POST: $(\forall i, j \bullet 1 \le i < j \le N \Rightarrow a(i) \le a(j)) \land a = perm(a_0)$

That is, our formal description views a heapsort as a process which establishes a "sorted" predicate, with the predicate perm defined to ensure that the final content of the array is a permutation of the original. If it is heapsort in particular that we are interested in, then we also need to introduce the concept of a heap:

    $heap(lo, hi) \Leftrightarrow (\forall i, j \bullet (lo \le i < j \le hi \land \exists k \bullet k > 0 \land i = j \;\mathrm{div}\; 2^k) \Rightarrow a(i) \ge a(j))$

The algorithm we have in mind should maintain a heap structure of the elements to be sorted and proceed towards its goal by increasing the number of elements in the sorted segment of the array. When we develop such an algorithm we draw on both our knowledge of the strategy we intend to pursue (e.g. "lengthen the sorted segment of array whilst maintaining heap in unsorted portion") and also on the conditions for correctness in the formal system (e.g. "the invariant of the loop together with the negation of the guard must imply the postcondition"). It is natural to introduce pictures of examples of heaps to explain what is intended. This is also true for describing the workings of the algorithm itself.

[Figure: the array a, with index marks 1, n+1 and N, divided into an unsorted segment on the left and a sorted segment on the right; all LHS elements ≤ all RHS elements.]

A decision on the representation has been made at this stage, the plan being to store the sorted portion in the higher indexed part of the array. Thus, right from the start, we are rejecting some possible implementations of heapsort and homing in on a particular approach. The unsorted portion will start as the whole of the array and decrease from the upper indices. A variable, n, is introduced to indicate the highest unsorted position so far (N initially).
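The predicates of the specification above can be read as executable checks. The following Python sketch is an illustration only, not part of the formal development: the function names `is_sorted`, `is_perm` and `heap` are chosen here, and the array a(1 ... N) is assumed to be held in a 0-based Python list, so that a(i) is `a[i - 1]`.

```python
from collections import Counter


def is_sorted(a):
    """First conjunct of POST: a(i) <= a(j) whenever 1 <= i < j <= N.

    Comparing neighbours is enough, because <= is transitive.
    """
    return all(a[k] <= a[k + 1] for k in range(len(a) - 1))


def is_perm(a, a0):
    """Second conjunct of POST: a is a permutation of the original a0."""
    return Counter(a) == Counter(a0)


def heap(a, lo, hi):
    """heap(lo, hi), using 1-based indices lo..hi into the list a.

    For every j in lo..hi, each proper ancestor i = j div 2^k (k > 0)
    that also lies in lo..hi must satisfy a(i) >= a(j).
    """
    for j in range(lo, hi + 1):
        i = j // 2
        while i >= lo:          # ancestors below lo fall outside the segment
            if a[i - 1] < a[j - 1]:
                return False
            i //= 2
    return True
```

For instance, `heap([9, 5, 8, 1, 2, 7, 3], 1, 7)` holds, whereas `heap([1, 5, 8], 1, 3)` does not; restricting the segment to `heap([1, 5, 8], 2, 3)` makes the predicate hold again, because the offending root lies outside the segment.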
Using a weakest precondition development, the task can be broken down, introducing intermediate goals which fit the algorithm we intend to develop. For example, the plan could be represented as follows.

    {PRE}
    n := N;
    "establish heap(1, n)"        Loop variant is n.
    {heap(1, n)}                  Loop invariant is Q, defined:
    do n ≠ 0 →
        {Q}
        n := n - 1; (0).

In deciding whether the tree is a heap, it is further necessary to consider whether the heap condition is satisfied at each node - that is to say, is the value associated with a particular node at least as great as that associated with each of its children. To model observation of this nature via dependencies requires observables to represent the index and value of each node, to record the order relation that pertains on each edge of the tree, and to register whether the heap condition holds at each node.

Figure 1: An EM construal for heapsort

The index and value of a node are defined by explicit values, whilst the order relations and heap conditions have values that depend on these. For instance, for the node with index i, the heap condition would be defined by:

    HC[i] = (val[i] >= val[2*i]) and (val[i] >= val[2*i+1])

(subject to a suitable convention to deal with nodes with fewer than 2 children). Likewise, an order relation for the edge that joins the nodes indexed by i and 2*i is defined by:

    OR[i, 2*i] = if (val[i] > val[2*i]) then 1 else (if (val[i] < val[2*i]) then (-1) else 0).

In our EM modelling environment, additional dependencies can readily be introduced to establish suitable visual conventions for representing these abstract conditions.
For instance, the label of a node and the edges between nodes can be coloured so as to reflect whether or not the heap condition is satisfied at a node, and to reflect the nature of the order relation associated with an edge:

    colour-of-label-at-node[i] = if HC[i] then black else white
    colour-of-edge[i, 2*i] = if (OR[i, 2*i] == 1) then black else white

This allows the user to experiment with the assignment of values to nodes, and register visually the status of just those observables that are significant in understanding the heap concept. For instance, Figure 1 represents a heap if and only if all the nodes of the tree are coloured black. Such a condition can be independently monitored by attaching another high-level observable, defined by

    is-heap = HC[1] and HC[2] and HC[3] and ... and HC[7]

The computer model developed in this way serves a similar function to the animation that a lecturer might conduct on a blackboard when explaining the basic heap concept. For instance, it can be used to demonstrate how the heap condition is affected by changing the value at a node, or exchanging the values at adjacent nodes.

In giving an account of heapsort, more is required. The definition of the heap condition has to be refined to take account of restricting the heap to a segment of the array. For this purpose, the indices that define the endpoints of this segment are new observables to be referred to as first and last, and a new observable in-heap[i] is introduced to determine whether each index i lies within the segment. The definition of the heap condition at node i can then be interactively revised to the form:

    HC[i] = not in-heap[i] or
            ( in-heap[i]
              and ((not in-heap[2*i]) or (in-heap[2*i] and OR[i, 2*i] >= 0))
              and ((not in-heap[2*i+1]) or (in-heap[2*i+1] and OR[i, 2*i+1] >= 0)) ).

6 The semantics of EM computer-based construals

The above discussion illustrates how the development of a construal proceeds in an exploratory manner.
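As a concrete reading of the definitions built up in the preceding discussion, the following Python sketch recomputes the dependent observables from the explicit ones. It is not written in the notation of the EM tools. The function names, the 1-based list `val` with an unused slot 0, and the non-strict reading of the heap condition (a parent at least as great as each child in the segment) are assumptions made for illustration.

```python
def in_heap(i, first, last):
    """in-heap[i]: does index i lie within the segment first..last?"""
    return first <= i <= last


def order_relation(val, i, j):
    """OR[i, j]: 1 if val[i] > val[j], -1 if val[i] < val[j], else 0."""
    return (val[i] > val[j]) - (val[i] < val[j])


def heap_condition(val, i, first, last):
    """HC[i], restricted to the segment first..last.

    Nodes outside the segment satisfy the condition vacuously; a node
    inside it must not be smaller than any child inside the segment.
    Assumes last does not exceed the number of nodes.
    """
    if not in_heap(i, first, last):
        return True
    for child in (2 * i, 2 * i + 1):
        if in_heap(child, first, last) and order_relation(val, i, child) < 0:
            return False
    return True


def is_heap(val, first, last):
    """is-heap: the conjunction of HC[i] over every node of the tree."""
    return all(heap_condition(val, i, first, last)
               for i in range(1, len(val)))


def label_colour(val, i, first, last):
    """Visual convention: a node label is black exactly when HC[i] holds."""
    return "black" if heap_condition(val, i, first, last) else "white"
```

With `val = [None, 9, 5, 8, 1, 2, 7, 3]`, `is_heap(val, 1, 7)` holds. Redefining `val[2] = 0` turns the label at node 2 white, and narrowing the segment to `last = 3` turns it black again, which is the kind of experiment the interactive model is meant to support.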
EM tools give computer support to this activity, enabling incremental and interactive extension, refinement and revision of a computer model. The semantic framework for this modelling activity is radically different from conventional computer programming. The key feature is that what is being construed (to be termed the referent of the construal) is itself subject to clarification and modification during the model-building. Such fluidity and negotiation of meaning is possible because the modelling involves open-ended experimental interaction with the environment of the referent. For instance, in construing heapsort, the modelling activity has to embrace interactions associated with issues such as "what is a heap?" that are pertinent but not specific to heapsort. The aim of this section is to examine the semantics of EM construals more closely. For more details of the practical tools that can be used to construct construals such as Figure 1, see [2].

Figure 2: Empirical Modelling for computer-based construals

Key concepts in using EM principles to construct construals are depicted in Figure 2. The concepts that pertain to the referent, and to the external semantics of the computer model are displayed on the right of the diagram. The way in which these concepts are represented in and through the construction of the computer model is indicated on the left. In the above discussion of construing heapsort, the computer model is the construal, and the referent is heapsort. A relevant situation might be observation of a heapsort expert in action. The diagram is to be interpreted in the implicit context of the modeller's exploratory interaction with the computer model and its referent. The aim of this interaction is to create a model embodying relationships between observables, dependencies and agents congruent to those that the modeller projects onto the referent.
The computer model provides perceptible counterparts for relationships that typically cannot be directly observed in the referent. The use of humanoid icons to depict agents is not intended to exclude impersonal or inanimate forms of agency, but to stress a key principle of EM. All agency is construed as similar to human agency. All state-changing agents are construed as operating through changing observables and, in their turn, responding to changes of observables.

The current state of the referent, as construed by the modeller, is determined by the current values of the observables and the dependencies that hold between them. Each observable is represented by a variable in the computer model. To each variable there is an attached definition that resembles the definition of a spreadsheet cell in character. This definition may either associate an explicit value with a variable, or express the way in which its value is functionally dependent on the values of other variables. Taken in conjunction, the variable definitions in the computer model make up what we shall call a definitive (for definition-based) script.

It is significant that the definitions attached to variables are not fixed or subject to variation within a preconceived circumscribed framework. The values and dependencies exhibited by the computer model are subject to change in many different ways. Such changes are always driven by its role as a construal, but can have all kinds of semantic significance. For instance, redefining a variable may reflect a change of state in the referent, or a correction to an observation; introducing a new dependency may correspond to a new insight on the part of the modeller, or a development in the situation. (A useful comparison can be made here with a spreadsheet, whose possible evolution in development and use is similarly guided by its external semantics, so that its potentially meaningful states cannot be preconceived.)
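The spreadsheet-like character of a definitive script can be illustrated with a minimal Python sketch. This is a toy written for this discussion, not the implementation used by the EM tools: the class `Script`, its methods and the variable names `val1`, `val2` and `hc1` are all hypothetical, values are re-evaluated on demand, and no attempt is made to detect cyclic definitions.

```python
class Script:
    """A toy definitive script: each variable has one current definition."""

    def __init__(self):
        self.defs = {}

    def define(self, name, definition):
        """(Re)define a variable.

        A definition is either an explicit value or a function of the
        script that expresses a dependency on other variables.
        """
        self.defs[name] = definition

    def value(self, name):
        """Evaluate a variable from the definitions currently in force."""
        d = self.defs[name]
        return d(self) if callable(d) else d


s = Script()
# The order of definitions is immaterial: the dependency is given first.
s.define("hc1", lambda s: s.value("val1") >= s.value("val2"))
s.define("val1", 9)
s.define("val2", 5)
before = s.value("hc1")

s.define("val2", 12)       # redefinition: a change of state in the referent
after = s.value("hc1")

# Introducing a different dependency: a new insight on the modeller's part.
s.define("hc1", lambda s: s.value("val1") > 0)
revised = s.value("hc1")
```

Here `before` is true and `after` is false, without any explicit instruction to update `hc1`; the final redefinition replaces the dependency itself rather than a value, a kind of change that lies outside any preconceived range of states.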
As befits its open-ended exploratory role, the computer model is associated with the uncharted space of possible configurations of values and dependencies that can be associated with a definitive script. Despite its openness, this characterisation is precise in much the same sense that the concept of mainland Britain represents the land - most of which I have never visited - which I can in principle reach on foot. As a pedestrian explorer, I cannot specify in advance what land can be reached. In clarifying my referent, I may need to negotiate interpretations: is an island reachable by low-tide, or on an inland lake part of mainland Britain? How can I be absolutely sure that the Isle of Wight will never be accessible by foot? Is the American Embassy part of the British mainland?

As Figure 2 indicates, my perception of a situation is represented both by the space of conceivable states of a definitive script, and by the state-changing agents that I construe to operate in that space. Agent action is associated with particular privileges to redefine variables. In Figure 2, possible actions of agents A and B are represented by the redefinitions and corresponding transitions in state space labelled by a and b respectively. Figure 2 depicts a and b as non-interfering actions that can be performed simultaneously to achieve the same state transition as would result from performing them in either order. This is represented in the computer model by performing redefinitions of a and b in parallel.

The openness of a heapsort construal is respected in its implementation using EM tools. There is no single way in which the computer model can be extended and applied. In using definitive scripts to represent state, the ordering of definitions is immaterial. This means that the same script can be organised for presentation in different ways, and assembled in different orders.
Two distinct purposes for a heapsort model derived from the experimental environment for studying the heap concept introduced above are discussed elsewhere [2, 3]. Two models chosen from these sources to illustrate subtleties associated with construing an activity as heapsort will now be briefly outlined. The definitive script outlined in the previous section captures the way in which the values at the nodes of the tree, the order relations on the edges, and the heap conditions at the nodes depend upon the values in the array. By interacting with such a script, manipulating the values of first and last, and making appropriate sequences of exchanges, a user can manually simulate heapsort. The visualisation in the model is such that the choice of nodes at which to perform an exchange can be inferred from the colours encircling the nodes. This means that the user can learn to carry out heapsort without explicitly consulting the values at nodes, following a recipe based only upon the colour conventions used in their visualisation. Consideration of this model exposes some of the subtle issues attached to construing an activity as heapsort. A user who learned the colour conventions to be followed in a recipe for heapsort could not necessarily be deemed to be performing heapsort. Possible experiments to test understanding could easily be applied by adapting the heapsort model. Suppressing the colour coding on the visualisation, or removing the dependency between the visualisation and the true values at the tree nodes would both offer relevant insight. The use of EM tools also makes it possible to introduce automatic agents into models. In [2], a number of possible scenarios are described, in which different degrees of automatic support for heapsorting are offered, ranging from completely manual to completely automatic execution. All these models can be derived interactively from a single model simply by introducing an appropriate file of definitions and automatic agents. 
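One way to picture an automatic agent of the kind just mentioned is as a procedure attached to a tree node and triggered when the heap condition at that node fails. The Python sketch below is an assumption-laden illustration rather than the models of [2]: the names `primed`, `fire`, `settle` and `mimic_heapsort` are invented here, the agent is assumed to exchange the node's value with that of its greater child, and the exchange of the root with the last element is assumed to accompany each change of `last`.

```python
def children(i, last):
    """Indices of the children of node i that lie within 1..last."""
    return [c for c in (2 * i, 2 * i + 1) if c <= last]


def primed(val, first, last):
    """Nodes in first..last whose heap condition is currently violated."""
    return [i for i in range(first, last + 1)
            if any(val[c] > val[i] for c in children(i, last))]


def fire(val, i, last):
    """Agent at node i: exchange its value with that of its greater child."""
    c = max(children(i, last), key=lambda k: val[k])
    val[i], val[c] = val[c], val[i]


def settle(val, first, last):
    """Invoke primed agents, in no special order, until none is primed."""
    p = primed(val, first, last)
    while p:
        fire(val, p[0], last)
        p = primed(val, first, last)


def mimic_heapsort(values):
    """Sort by moving `last` and letting the node agents repair the heap."""
    val = [None] + list(values)      # 1-based; slot 0 is unused
    last = len(values)
    settle(val, 1, last)
    while last > 1:
        val[1], val[last] = val[last], val[1]   # largest value leaves the heap
        last -= 1
        settle(val, 1, last)
    return val[1:]
```

In this sketch control passes to whichever node is currently in violation, so the sequence of exchanges is driven by the values at the nodes rather than by a preconceived control procedure.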
An automatic agent is represented by a triggered procedure for redefinition. A useful mechanism that is exploited in all the automated models attaches such an agent to each node of the tree. When the heap condition at this node is violated, the corresponding agent is primed to exchange the value at this node for a value attached to one of its child nodes, whichever is the greater. An interesting feature of this approach is that the heapsort process can be mimicked merely by manipulating the first and last indices according to the prescription of heapsort, followed by invocation of any primed agent attached to a node.

Despite appearances, this process differs from authentic heapsort in a significant way. In effect, the transfer of control from node to node is always driven by the nodes at which the heap condition is currently violated. This does not accord with the formal specification, where - for the most part - transfer of control entails no reference to the values attached to nodes. In this case, the need to construe the activity as differing from heapsort is disclosed by intervening during the execution of the algorithm. A conventional heapsort does not repair violations of the heap condition except in contexts that are preconceived in designing the control procedure. Our unconventional algorithm in some circumstances can.

Figure 1 is an extension of the heapsort model that includes observation that is associated with a formal specification of heapsort. The concept behind this extension is that the formal specification supplies an abstract trace of the heapsorting process as it might be observed by a mathematician. The lower component of Figure 1 takes on a different form according to which phase of the heapsort algorithm is currently being inspected, and the values of invariants and variants are monitored as the algorithm is executed.
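Monitoring of this kind can be sketched in Python by evaluating an invariant and a variant after each step of a conventional heapsort. The sketch rests on stated assumptions: the definition of the invariant Q is not available in this text, so the predicate used here (heap(1, n), a sorted upper segment, every lower element no greater than every upper element, and a permutation of the original) is a reconstruction from the stated strategy; the names `observed_heapsort` and `tamper`, and the loop guard `n > 1`, are likewise illustrative.

```python
from collections import Counter


def heap(a, lo, hi):
    """heap(lo, hi) on 1-based indices: each parent in lo..hi >= its child."""
    return all(a[j // 2 - 1] >= a[j - 1] for j in range(2 * lo, hi + 1))


def invariant(a, a0, n):
    """Assumed Q: heap(1, n), a(n+1..N) sorted, LHS <= RHS, a = perm(a0)."""
    left, right = a[:n], a[n:]
    return (heap(a, 1, n)
            and all(x <= y for x, y in zip(right, right[1:]))
            and (not left or not right or max(left) <= min(right))
            and Counter(a) == Counter(a0))


def sift_down(a, i, n):
    """Restore heap(1, n), given that only node i may violate it."""
    while 2 * i <= n:
        c = 2 * i
        if c + 1 <= n and a[c] > a[c - 1]:   # a[c] holds a(c+1)
            c += 1                           # pick the greater child
        if a[i - 1] >= a[c - 1]:
            break
        a[i - 1], a[c - 1] = a[c - 1], a[i - 1]
        i = c


def observed_heapsort(values, tamper=None):
    """Heapsort that records (variant n, truth of Q) after each step.

    `tamper`, if given, is called as tamper(a, n) after each step to
    model an intervention that departs from the algorithm.
    """
    a, a0 = list(values), list(values)
    n = len(a)
    for i in range(n // 2, 0, -1):           # "establish heap(1, n)"
        sift_down(a, i, n)
    trace = [(n, invariant(a, a0, n))]
    while n > 1:
        a[0], a[n - 1] = a[n - 1], a[0]      # move the maximum to a(n)
        n -= 1
        sift_down(a, 1, n)
        if tamper is not None:
            tamper(a, n)
        trace.append((n, invariant(a, a0, n)))
    return a, trace
```

Run without interference, every entry of the trace reports the invariant as true and the variant decreases at each step; passing a `tamper` function that overwrites a value makes the transgression visible in the trace instead of being silently ignored.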
There are two complementary motivations for such observation: the formal specification can be used to confirm that the heapsorting process is indeed being correctly followed, or the heapsorting process may serve as a worked example for the purpose of checking the accuracy of the formal specification. In Figure 1, the invariants and variants of the specification are treated as observables in their own right, and linked to the more primitive observables attached to the heapsort model. Strictly speaking, this mode of observation of the heapsort model is only appropriate in a restricted context for use, since it presumes that heapsorting activity is in progress, and makes references to observables concerned with control issues. For instance, each invariant is expressed as a predicate whose truth value is dependent on the current state, as determined by the present status of both data and control.

As the above discussion has indicated, there are many modes of interaction with the heapsort model. Most of these operate outside the context of a particular heapsorting process. When monitoring the invariants of the formal specification in interaction with the model, the user has complete discretion over whether this interaction respects the heapsorting process. This is a crucial distinction between our computer-based construal and a conventional animation of heapsort. The function of a construal can only be served by a model that can be tested beyond the limits of any preconceived and circumscribed range of interactions. If our formal specification is flawed, it is still important that it can be incorporated in the model. If the heapsort process is not correctly followed, there must be scope to reflect this deviation. More generally, a complete understanding of the heapsort process - if indeed there is such a thing - stems from insight into the way in which the process relies upon its context.
In developing this insight, it is valuable - if not essential - to have scope for experimental interaction. Setting algorithm specification and design in an experimental setting is a powerful way to explore and develop new functionality. In experimenting with the model in Figure 1, we can "follow the right steps in the algorithm" and "check that the invariants are respected", or "deliberately depart from the algorithm" and "check that this transgression is reflected in the specification". It is appropriate to annotate these phrases with quotation marks because they may reflect the intentions of the human interpreter rather than the true status of the model: there may be unrecognised anomalies in either the specification or the construal.

Observation of invariants and variants attached to the formal specification can also be used in a constructive way to counteract the effects of random changes to the values to be sorted during the heapsort process. To demonstrate, we have created a variant of the heapsort model in which such changes prompt the model to determine the optimal point to which the heapsort process has to be rewound. In this way, observation of the formal specification is used as a powerful form of meta-control that would normally entail intelligent action on the part of a user.

8 Conclusion

Incorporating a formal specification into an EM environment allows us to interpret the significance of the formal statements and to explore the concepts behind them. One aspect of the EM approach is the visualisation it provides but, as emphasised above, an observation-oriented viewpoint offers more than this. Fundamental to the approach is its support for user interaction and experimentation which is crucial for gaining an understanding of the abstract concepts in a formal specification.
To explore a particular algorithm effectively, the openness of interaction associated with EM may need to be restricted in certain ways (as when tracing the steps of the algorithm) but the user is also free to step outside such constraints for a wider exploration of the subject. Reliance on empirical evidence does not give certainty since, although experience can contribute to understanding, it can also be misleading. It is essential to introduce formal specification to guard against this. Combining the two helps us to formulate theories which can be verified.

Our work emphasises the different and complementary nature of the two approaches, but also reveals some similarity. Both a formal specification of heapsort and a construal of a particular instance of heapsort refer to abstract features of a physical process that are independent of any particular realisation.

Although the approach here has been to explore an existing specification from an observation-oriented perspective, it would also be possible to start with an EM investigation to explore the requirements and clarify ideas, using this to inform the construction of the formal specification. Safety-critical areas, such as railway operation, have been successfully modelled both empirically [12] and formally [10]. A combined approach may be beneficial here, with EM helping to resolve conflicting requirements and perhaps suggesting approaches which might not otherwise have been considered. The further development of existing EM tools to support such usage will be a theme of future work. The distributed variants of our EM tools are of particular interest in this connection.

References

1. J. Barwise and J. Etchemendy. The language of first-order logic (Third Edition). Center for the Study of Language and Information, Stanford, 1992.
2. W. M. Beynon. Modelling state in mind and machine. Research Report 337, Dept. of Computer Science, Univ. of Warwick, 1998.
3. M. Beynon, J. Rungrattanaubol, P. H. Sun, A. Wright. Explanatory Models for Open-Ended Human-Computer Interaction. Research Report 346, Dept. of Computer Science, Univ. of Warwick, 1998.
4. B. Cantwell-Smith. The Origin of Objects. The MIT Press, 1996.
5. E. W. Dijkstra. A Discipline of Programming. Prentice Hall, 1976.
6. D. Gooding. Experiment and the Making of Meaning. Kluwer Academic Pubs., 1990.
7. M. Loomes and R. Vinter. Formal Methods: No Cure for Faulty Reasoning. Technical Report 265, School of Information Sciences, Univ. of Hertfordshire, September 1996.
8. C. A. R. Hoare and He Jifeng. Unifying theories of programming. Prentice Hall, 1998.
9. Proceedings of Integrated Formal Methods. Springer, York UK, June 1999.
10. FMERail website, http://www.ifad.dk/Projects/fmerail.htm
11. The Empirical Modelling website, http://www.dcs.warwick.ac.uk/modelling
12. M. Beynon. Empirical Modelling and the Foundations of Artificial Intelligence. Proceedings of CMAA Workshop, Lecture Notes in AI, Springer-Verlag, 1999.