Monday, August 29, 2011

Smallworld Technical Paper No. 11 - Use of an Integrated CASE Tool for GIS Customization

by Gillian Kendrick and Peter Batty

Abstract

The implementation of large corporate GIS systems places heavy demands on the customisability of GIS products. This paper examines the use of an integrated CASE tool in GIS customisation. The authors start by discussing some of the common problems which designers of GIS systems face in developing new applications. The paper describes the or features which a CASE tool should include in order to address these issues. The similarities between the technology which can be used to underpin both a CASE tool and a GIS will be mentioned.

The paper describes the practical use of a CASE tool in application development. It also discusses the management of multi-user application development and the benefits gained from utilising a CASE tool in the development process. It is shown that the use of an integrated tool facilitates the adoption of a new incremental design methodology.

Introduction

One of the greatest costs in large GIS implementations is that of customising the basic system to meet an organisation's specific requirements. There are many aspects to this customisation. The user interface of the system may be tuned to speed up data capture or else to streamline the execution of particular queries. The way in which object's spatial attributes are displayed must be specified. However, one of the most significant parts of any customisation involves the design and maintenance of the application's data model.

Data modelling is a complex task for most computer applications, but this is particularly true in GIS for several reasons. The first of these is the large number of classes of objects which are involved in these systems. Each implementation will typically involve hundreds of different object classes. The second factor is the number and variety of relationships between these objects. Relationships are of three main types: aggregation and association, e.g. a building is 'part of' a school; spatial, e.g. a house is 'near to' a lake; and topological, e.g. a valve is 'connected to' a pipe. Of these types, only the first may be explicitly represented in terms of traditional relational joins. The others will be derived indirectly from spatial attributes of the objects or else through the interaction between these spatial attributes.

For many organisations, the GIS will only form a part of the corporate computing system. There will be existing data bases holding data such as customer information. Some of this data may be relevant to the GIS, and therefore part of the customisation will involve integrating these existing data models with that of the GIS (Bundock & Theriault 1992). This integration will again increase the size and complexity of the overall design.

Another aspect of GIS implementations, which has an impact on the maintenance of the data model, is that user requirements change. GIS projects typically have quite long life cycles, and the technology is relatively new to most users. This means that at the outset of the project they may not realise the full capabilities of the system. As new applications are developed and added to the system, changes are required in the data model.

Since GIS projects involve capturing large amounts of data, it is critical that these changes to the data model can be made without losing or compromising any data which is already stored in the system.

With most traditional GIS software, data model evolution is a difficult issue. This has meant that a long period of time has been spent at the beginning of the project on requirements analysis and data model design to try to make sure that it is exactly right, which of course is rarely achieved. This has had to happen before any other work could begin, meaning that there has been a very long delay for users between the purchasing of a GIS and the start of productive use of the customised system.

This paper describes how an integrated CASE tool can be used to address all of these data modelling issues. This in turn radically changes the way in which one can approach the problem of customising a GIS, using an interactive design methodology.

What is a CASE Tool?

The acronym CASE stands for Computer Aided Software Engineering. It is used to describe a variety of computer-based tools which can be used to assist in the design and development of computer programs. Such tools have been developed for various purposes including the analysis and documentation of procedures and data flows, and the design and documentation of a data model. It is the latter function which we consider in this paper as this is the most appropriate kind of tool to help with GIS customisation.

A CASE tool which is to be used for data model design must display a graphical representation of the model. This is usually in the form of an entity-relationship diagram which shows the entities, or classes of objects, and the relationships between them. Such a facility gives the designer a good picture of a large design. The tool allows plots of the diagram to be made. These can be included as part of the documentation of the overall design.

The tool allows the designer to interact with the spatial representations of object classes and relationships. A graphical user interface (GUI) provides an environment in which the attributes and behaviour of the objects and relationships can be edited and queried. The tool produces automatic documentation on selected parts of the design.

One of the ways in which a CASE tool can reduce the time taken to implement a design is by automatically generating the code which creates the data model. This facility has two main benefits. Firstly the designer no longer has to worry about implementation details; he can concentrate on the task of data modelling. Secondly, the automatic code generation speeds up the process of creating data mode]s, and avoids programming errors.

The tool should also provide facilities which enforce the correctness of the data model. It should validate that the design is consistent. Checks made by the tool at the design stage will reduce the amount of time spent in tracing bugs by the designers at a later date. This will again speed up the delivery of finished applications.

This section has described general requirements for a CASE tool which is to be used for data model design. The next two sections describe in more detail some of the requirements for CASE tools which are to be used in GIS. The first section covers general requirements for tools for GIS data model design. The second considers the extra requirements for an integrated tool and covers some of the benefits provided by such a tool.

CASE Tools for GIS

Multi-User Design

It was mentioned in the introduction that one of the reasons for the complexity of GIS data models was that they involved a large number of classes of objects. It will usually be the case that more than one designer will be involved in the development of these objects, their relationships and their behaviour. This means that it is important that the tool used to develop the data model is multi-user.

Each designer should be able to work on additions or modifications to the data model without being affected by changes which others are making to different parts of the design. These changes will also be made over a long time scale - maybe hours, days or weeks. The designer may wish to try out several alternative versions of the data model in order to settle on a best design. When a new part of the model has been completed or re-engineered, the tool must offer support for 'merging' the new work in with the rest of the design.

The requirements of CASE tool database technology in this area, involving long transactions and version management, are very similar to those of GIS. (Easterfield, et al. 1990).

Modelling of Spatial Data and Topological Relationships

GIS data models are special because they require the user to model spatial attributes of objects. New data types such as point, area, chain, raster and grid are supported. The tool also models the topological relationships which will exist between some objects. For example a valve location 'connects to' a pipe centreline, a land parcel 'shares' its boundary with another land parcel.

These aspects of GIS data models should be supported in a CASE tool which is to be used in GIS customisation.

Integrated CASE Tools for GIS

For high level conceptual design, it is not necessary to have a close coupling between the CASE tool and the DBMS used to implement the system. However, if the CASE tool is to be used for the physical database design and to generate code, then a much closer link is required between the tool, the DBMS, and the application development environment used to implement the actual system. The requirements for the tool if it is to achieve this higher level of integration can be separated into two parts.

First, the CASE tool must support all of the data modelling concepts supported by the GIS. These include things such as the datatypes, relationships and other system specific concepts such as triggers, validators, and enumerators. For object oriented systems, the CASE tool should also manage the behaviour for the object classes (Booch 1991). In particular for GIS the CASE tool needs to manage spatial information and topological relationships. The CASE tool may extend to covering aspects of the user interface of the application, such as which fields are visible to the user and what sort of interface is used to edit objects and their individual fields.

Second, the CASE tool must generate code in an appropriate format which implements the data model which has been designed. in the case of the work reported here, the tool produces a script in the language Smallworld Magik.

These requirements are to a certain extent independent of each other. If the first one is met, but not the second, then it is at least possible to use the CASE tool to do the logical design for the system, but the code to implement this then has to be written manually. on the other hand, it is possible to have a CASE tool which supports a subset of the concepts supported by the DBMS (with GIS, for example, a CASE tool which supports alphanumeric datatypes but not spatial datatypes). This could be made to output code in an appropriate format for the DBMS, but further development work would then be required in the DBMS environment (outside the CASE tool) to incorporate any of these additional concepts in the application. In this situation it is likely to be much more difficult to usefully maintain the data model using the CASE tool after its initial creation, since it does not know about any of the changes which have been made to the data model within the DBMS environment. Clearly, the CASE tool is much more useful if it meets both of these requirements in full.

A CASE tool which can generate the data model for a GIS application automatically is a very useful part of a development tool kit. There are two more facilities however which are necessary in order to make the tool truly useful for corporate GIS design.

Integration of Existing Data Models in the GIS Data Model

Something that was mentioned in the introduction to this paper is that a common requirement in GIS implementations is that existing (legacy) databases, and therefore data models, should be integrated into the system. It will often be the case that there are relationships between objects which have been created as part of the GIS and those which belong to these other databases.

As an example consider a large utility company that has a customer information database. It is undesirable to move this data to the GIS as many other applications will already be designed to run on it. Part of the requirement of the GIS application is to associate spatial information with those customer records.

To facilitate this integration, the CASE tool can 'reverse engineer' the data models from these existing databases. The objects can then be integrated into the whole design within the domain of one tool.

Maintenance of the data model

GIS applications involve large amounts of data, and as mentioned before, they are also particularly prone to having the user requirements change during the life cycle of the project. If a change in the user requirements means that the data model must be modified, then it is essential that facilities are provided to 'forward engineer' the populated database to be consistent with the new datamodel.

A common feature of CASE tools is that they generate the code which creates a design. It is less common to generate the code which will 'forward engineer' an existing design when the data model definition held in the tool has been changed. A CASE tool for GIS should offer support to evolve a database from one schema to another.

Features of an Integrated CASE Tool for GIS

Based on the previous sections, we can summarise that an integrated CASE tool should have the following features:

  • Support all of the data modelling concepts which are supported by the GIS DBMS
  • Generate the code which will implement the data model in the GIS DBMS 'reverse engineer' existing data models so that their schema can be integrated into the overall design for the system
  • 'forward engineer' populated databases when the design of the objects and relationships is updated.

Such a tool provides significant benefits in the speed of GIS customisation and also in the maintenance of the applications throughout their life cycles.

Similarities Between a GIS Application and a CASE Tool

Several of the capabilities provided by a CASE tool are similar to those which are found in a GIS.

Firstly, in both systems, the user creates objects which combine both spatial and alpha-numeric attributes. In the GIS, these objects are things such as houses, pipes or rivers. The alpha-numeric attributes of a house are things such as its address and owner, and its spatial attributes are its position and footprint. In the CASE tool, the objects which the user works with are the definitions of the classes of objects which are to be used in the GIS application. For example, a CASE tool object has an alphanumeric attribute which holds the name of an object class, this could be 'House'. The spatial attributes of these objects are used to position them in the entity-relationship diagrams. Similarly, the CASE tool stores relationship objects, their attributes would include the type of the relationship, e.g. 'part of' or 'connected to'.

In both the CASE tool and the GIS, users can query and edit the properties of the objects through a GUI. This interface provides facilities for plotting and reporting.

One of the areas in which a GIS and CASE tool are most similar is in their need to provide support for multi-user working in a long transaction database. This should not be surprising as both systems are used as design tools as well as information systems.

The Smallworld CASE Tool

After a consideration of the benefits of using an integrated CASE tool, and a survey of the tools currently available on the market, Smallworld chose to implement its own. Because of the similarities outlined above between GIS and CASE, it was possible to implement the tool as a specific GIS application with its own data model.

Working with the CASE Tool

The tool is activated during a normal GIS session. The designer can have both the GIS application and the CASE tool running at the same time.

[Figure not yet available]

Figure 1. Integration of GIS and CASE in one environment

Application Development with the CASE Tool

Smallworld GIS contains a powerful data dictionary which supports many advanced data modelling features. These include the definition of new data types, triggers and validators. The data dictionary is capable of storing the definition of object classes, their attributes, behaviour and also the details of the topological and associative relationships between them. The data dictionary can manage many different versions of a data model and allows the designer to look at these different versions within one GIS session.

The CASE tool operates directly on the GIS application's data dictionary. If the GIS application involves integration with external databases, the data models held in these databases are reverse engineered into the tool. Once present in the tool, the user can add spatial attributes to the objects, and can also define relationships between them and the objects which live in the main GIS database. The user can also add behaviour to these external objects.

The CASE tool supports a steady progression from a logical to a physical definition of the required data model. It does not demand that the data model is completely defined but will advise on areas which are not yet complete. Facilities are provided which allow the designers to document the model extensively as they are working.

Developing the Data Model

As already mentioned, the CASE tool is a multi-user facility. We will now describe how many users can work on the same model.

[Figure not yet available]

Figure 2. Version management of both schema and GIS data

In figure 2, the boxes to the left of the dotted line represent the different alternatives in the CASE tool's database. Beneath the Top alternative there are two others where the designers Phil and Betty can work independently. The user Phil also has another two sub alternatives where he can try out different designs. The area to the right of the dotted line shows the alternative tree in the main GIS application. Beneath the Top, there are two alternatives Live Data and Test. The alternatives beneath the Live Data alternative are those where the GIS application is being used. The GIS applications may include data capture, analysis of existing data and design. The part of the tree under the 'Test' alternative is used by CASE designers to try out their developments. When Phil decides that he has completed a part of a design in the alternative 'Design 1', he can make the CASE tool 'apply' the new data model to the alternative 'Test 1'. At this point, he can immediately start testing the new data model in the GIS application. If after testing, the new part of the design is found to be a success, the changes he has made can be 'posted' up to the Top of the CASE database. If the design was unsatisfactory, he can proceed to improve it and then 'apply' the new changes.

When changes have been made to the design which have been thoroughly tested they can be applied to the Top alternative of the GIS application's database. That alternative will have its data model forward engineered'. Data model changes then spread through the alternative tree as users at lower alternatives request to see them.

[Figure not yet available]

Figure 3. Part of a GIS application's data model

The use of an integrated CASE tool in GIS allows designers to develop data models iteratively. There no longer needs to be such a great reliance on getting the design right first time. This in turn means that the end users can rapidly be presented with a prototype system which allows them to more easily understand what the system can offer. They are therefore able to offer much more feedback in the design of the system, leading to the development of systems which are much better suited to their requirements.

Data Model Re-use

Another feature of the CASE tool is that it facilitates the re-use of pieces of design. It can archive complete sections of data model in a format which can be read in by another tool. This has enabled the setting up of a library of designs. This greatly speeds up the development of new applications as they rarely have to be designed from scratch. An existing 'template' data model can be loaded in to an applications CASE tool and this can be thought of as the first prototype.

Conclusions

Since Smallworld commenced using an integrated CASE tool for application development the following benefits from this approach have been found:

  • Customisation of the system has been made more accessible. The tool has reduced the size of the knowledge barrier which designers have to overcome, as they no longer have to be trained in how to talk to the underlying DBMS directly. They can instead spend more of their time with design considerations. This means that more people are able to customise the system with less initial training
  • The tool has enabled the creation of applications which better meet the customer's requirements through the interactive development approach. Users are given the chance to interact with a version of the application at an early stage. They can then refine their requirements, the developers can evolve the design and the tool will evolve the database.
  • Designs are more easily re-used. The archiving facility in the Case tool allows parts of designs to be stored in a form where they can be loaded in to another tool. This means that a library of commonly used parts of designs can be set up. When a new application is being developed, much of the initial data model can be created from these standard components. This has two main benefits: firstly the development time for new applications is greatly reduced; secondly, the quality of developed applications is improved. Time invested in improving the quality of the basic data model components leads to improvements in the overall system quality.
  • Multi-user working is easier to manage. It is always a difficult problem when many designers are working on the same project to avoid duplication of effort, and conflicts between their different designs. The CASE tool aids in these problems by incorporating a number of validation and completeness checks which will prevent the design from becoming inconsistent. The users can develop parts of the data model independently in separate alternatives. When they incorporate these into the total design by merging' their changes, the DBMS automatically spots any conflicts in the design which the designer has introduced and allows him/her to reconcile them.
  • An Incremental development methodology means that designers can get production systems in place much more quickly.

References

Booch, G. (1991) Object Oriented Design with Applications. Benjamin/Cummings, Redwood City California, 1991

Bundock, M. & Theriault, D. (1992): Integration Of Case Technology into the GIS Environment, in AGI92 Conference Papers, Birmingham, November 1992

Easterfield, M., Newell, D.G. & Theriault, D. (1990): Version Management in GIS - Applications and Techniques, EGIS 90 Conference Proceedings, Amsterdam, April 1990.

No comments: