Course description

The course introduce the paradigms of distributed systems, for evaluating its scopes and limitations, The course also familiarize current design trend and philosophy seen from software, hardware and applications, and review the problems of linking geographically distributed computers through a communication network


Course Objectives:

At the end of the course candidates should be able to:

  1. Understand the concepts, and design and implementation of distributed systems.
  2. Understand the reasons for, possibilities, advantages and disadvantages of deploying distributed technologies in a business context
  3. Understand the various architecture models and middleware of distributed systems.
  4. Choose a model or a middlware for a particular situation by comparing the attributes of each type in a critical way for a range of typical applications.
  5. Design and implement distributed applications using a middleware.
  6. Understand the reasons for, possibilities, advantages and disadvantages of deploying distributed technologies in a business context.


Course Contents:

  • Characterization of distributed systems.
  • Basic Distributed systems Architecture: Hardware and Software Systems.
  • File Service: File Service components, design issues, interfaces, implementation techniques and a network file system case study.
  • Distributed Operating Systems: The Kernel, Naming and Protection, communication and invocation.
  • Coordination and concurrency: Synchronizing physical clocks, logical time, logical clocks, distributed coordination, and a distributed operating system case study.
  • Distributed Data: Client/Server Communication (Remote Procedure Call), Transaction, Fault Tolerance and Locking

Required Readings

  • Coulouris, G et al (2001) : Distributed systems: Concepts and
    Design, 3rd edition, Addison -Wesley,

Recommended Readings

  • Andrew S. T. et al (2006) : Distributed Systems: Principles and
    Paradigms (2nd Edition),
    Prentice Hall
  • Sukumar G. (2006) : Distributed Systems: An Algorithmic
    Approach (Computer and Information Sciences), Chapman & Hall/CRC
  • Kenneth P. B. (2005) : Reliable Distributed Systems: Technologies,
    Web Services, and Applications, Springer
  • Tamer M. O. et al (1999) : Principles of Distributed Database Systems,
    Prentice Hall
  • Deitel et al, (2001) : XML: How to program, Prentice Hall,

 Distributed system in one lesson


A distributed system  is the one in which hardware and software components located at remote networked computers, coordinate and communicate their  actions only by passing messages. Sharing of resources is the main motivation of distributed systems. Resources my be managed by servers and accessed by clients or they may be encapsulated as objects and accessed by client objects.

A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages.

Distributed systems are groups of networked computers, which have the same goal for their work

a distributed system is a system in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages

Characteristics of DS


A distributed system has the following characteristics:

  • It consists of several independent computers connected through communication network
  • The computers  communicate with each other by exchanging messages oover a communication network
  • Each computer has its own memory,clock and runs its own operating system
  • Each computer has its own resources, called local resources
  • Remomte rosources are accessed through the network

Key characteristics of distributed systems

  • Resource sharing.
  • Openess.
  • Concurrency.
  • Scalability.
  • Fault Tolerance.
  • Transparency.

Examples of distributed systems

  • Intranets, Internet, WWW, email, ...
  • DNS (Domain Name Server) - Hierarchical distributed database
  • Distributed supercomputers, Grid/Cloud computing
  • Electronic banking
  • Airline reservation systems
  • Peer-to-peer networks
  • Sensor networks
  • Mobile and Pervasive Computing

Challenges of distributed systems

  • Heterogeneity
  • Transparency
  • Openness
  • Concurrency
  • Security
  • Scalability
  • Failure handling


Distributed Computing Architecture

  • Client Server Architecture
  • Layered Architecture
  • Peer to peer architecture


Physical model: Capture hardware composition in terms of computers and their interconnecting networks. A physical model is a representation of the underlying hardware elements of a distributed system that abstracts from specific details of the computer and networking technologies employed. Hardware and software components located at networked computers communicate and coordinate their actions only by passing messages. Very simple physical model of a distributed system
Architectural model: Describes a systems in terms of computational and communication tasks performed by computational elements. An architectural model of a distributed system simplifies and abstracts the functions of the individual components of a distributed system. It deals with the organization of components across the network of computers, and their interrelationship, i.e., how these components communicate with each other. Architectural elements:What are the entities that are communicating in the distributed system? How do they communicate, or, more specifically, what communication paradigm is used? What (potentially changing) roles and responsibilities do they have in the overall architecture? How are they mapped on the physical distributed infrastructure (what is their placement)?
Communication paradigms
Roles and responsibilities

System-oriented perspective

In distributed systems the entities that communicate are typically processes. Exceptions:In primitive environments such as sensor networks, operating systems does not provide any abstractions, therefore nodes communicate; In most environments processes are supplemented by threads, so threads are more the endpoints of communications;

Programming perspective: Objects - Computation consists of a number of interacting objects representing units of decomposition for the problem domain. Objects are accessed via interfaces; Components - Resemble objects in that they offer problem-oriented abstractions, also accessed via interfaces. Specify not only their interfaces but also the assumptions they make in terms of other components/interfaces that must be present for a component to fulfil its function;  Web services: Software application which is identified via URI. Supports direct interactions with other software agents

Types of communication paradigms: Interprocess communication; Remote invocation; Indirect communication

Inter-process communication: Low-level support for communication between processes in distributed systems including message parsing-primitives; Direct access to the API offered by Internet protocols (socket programming) and support for multicast communication.

Remote invocation: Covering a range of techniques based on a two-way exchange between communicating entities Resulting in the calling of a remote operation, procedure or method; Request-reply protocols:- more a pattern imposed on an underlying message-parsing service to support client-server computing; Remote procedure calls:- procedures in processes on remote computers can be called as if they are procedures in the local address space; Remote method invocation:- a calling object can invoke a method in a remote object

Direct communication: communication represent a two way relationship between sender and receiver; send explicitly directing messages/invocations to the associated receivers; receivers are aware of the senders; and both roles must exist at the same time

Indirect Communication: Sender do not need to know who they are sending to (space uncoupling); senders and receivers do not need to exist at the same time (time uncoupling).

Indirect communication
Group communication: Delivery of messages to a set of recipients; Abstraction of a group which is represented in the system by a group identifier; Recipients elect to receive message send to a group by joining a group. Two kinds of group communication: Broadcast (message sent to everyone); and Multicast (message sent to specific group). Used for: Replication of services; Replication of data; Service discovery; and Event notification
Publish-subscribe-systems: A large number of producers (publisher) distribute information items of interest (events) to a similarly large number of consumers (subscribers); Intermediary service (e.g. Distributed Event-Based System) needed for routing information to consumers. In Publish-subscribe-systems: Communication through propagation of events; Generally associated with publish/subscribe systems; Sender process publishes events; and Receiver process subscribes to events and receives only the ones it is interested in
Message queues: Message queues offer a point-to-point service whereby producer processes can send messages to a specified queue and consumer processes can receive messages from the queue or being notified

Architectural styles: client-server and peer-to-peer

Fundamental issue with client-server: Client-server offers a direct, relatively simple approach to the sharing of data and other resources. But it scales poorly. The centralization of service provision and management implied by placing a service at a single address does not scale well beyond the capacity of the computer that hosts the service and the bandwidth of its connections. Even though, there a several variations of the client-server architecture to respond to this problem but none of them really solve it. There is a need to distribute shared resources much more widely in order to share the computing and communication loads amongst a much larger number of computers and network links.

Peer-to-peer application: Is composed of a large number of peer processes running on separate computers; All processes have client and server roles-servent. Patterns of communication between them depends entirely on application requirements. Storage, processing and communication  loads for accessing objects are distributed across computers and network links. Each object is replicated in several computers to further distribute the load and to provide resilience in the event of disconnection of individual computers. Need to place and retrieve individual computers is more complex then in client-server architecture

Proxy server and caches
A cache is a store of recently used data objects that is closer to the objects themselves. Caches might be co-located with each client or may be located in a proxy server that can be shared by several clients.

Process: A new object is received at a computer > it is added to the cache store, replacing some existing objects if necessary. Object is needed by the client process > caching service checks the cache for an up-to-date copy. If copy is not available this copy is fetched

Mobile code: A typical well-known and widely-used example for mobile code are applets.

Mobile agents
A mobile agent is a running program (both code and data) that travels from one computer to another in a network carrying out a task on someone’s behalf, e.g. collecting information. Benefits agents provide for creating distributed systems (Lange & Oshima, 1999): They reduce the network load; They overcome network latency; They encapsulate protocols; They execute asynchronously and autonomously; They adapt dynamically; They are naturally heterogeneous; They are robust and fault-tolerant.

Fundamental model: Abstract perspective in order to study the individual issues of a distributed system Three models are introduced: interaction model, failure model, and the security model
Interaction model
Failure model
Security model

Interaction model
Performance of communication channels
Latency: Delay between the start of a message’s transmission from one process and the beginning of its receipt by another. It includes: Time taken for the first string of bits transmitted through a network to reach its destination; Delay in accessing the network; Time taken by the operating system communication services at both the sending and the receiving processes Bandwidth: Total amount of information that can be transmitted over a computer network in a given time. Jitter: Variation in the time taken to deliver a series of messages

Two variants of the interaction model
Synchronous distributed systems
The following bounds are defined: The time to execute each step of a process has known lower and upper bounds. Each message transmitted over a channel is received within a known bounded time. Each process has a local clock whose drift rate from real time has known bound.

Asynchronous distributed system: There are no bounds on: Process execution speed; Message transmission delays; Clock drift rate

Failure model
Introducing the failure model. The failure model defines ways in which failure may occur in order to provide an understanding of the effects of failure. Taxonomy of failures of processes and communication channels (Hadzilacos & Toueg, 1994): Omission failures; Arbitrary failures; Timing failures

Omission failures Class of failure     Affects     Description
Fail-stop     Process     Process halts and remains halted. Other processes may detect the state.
Crash     Process     Process halts and remains halted. Other processes may not be able detect this state.
Omission     Channel     A message inserted in an outgoing message buffer never arrives at the other end’s incoming message buffer.
Send-omission     Process     A process completes a send operation but the message is not put in its outgoing message buffer.
Receive-omission     Process     A message is put in a process’s incoming message buffer but that process does not receive it.

Arbitrary failures
Often called Byzantine failure. This is the worst possible failure semantics, in which any type of error may occur. Example of an arbitrary failure of a process
A process arbitrarily omits intended processes steps or takes unintended processing steps Example of an arbitrary failure of a communication channel. Message content may be corrupted, nonexistent messages may be delivered or real messages may be delivered more than once. Solutions: checksum to detect corrupted messages and message sequence numbers to detect nonexistent and duplicated messages

Timing failures

Class of Failure     Affects     Description
Clock     Process     Process’s local clock exceeds the bounds on its rate of drift from real time.
Performance     Process     Process exceeds the bounds on the interval between two steps.
Performance     Channel     A message’s transmission takes longer than the stated bound.

Introducing the security model
The security of a distributed system can be archived by securing the processes and the channels used for their interactions and by protecting the objects that they encapsulate against unauthorized access.

Protecting objects
Access rights are used to specify who (Principal) is allowed to perform the operations of an object, e.g., who is allowed to read and right this state

Securing processes and their interactions
Threats to processes: Without reliable knowledge a server can not tell the principal’s identity behind an invocation; The same applies to a client who receives the result from an invocation but it is not sure if this is from the intended server

Threats to communication channels: An ‘enemy’ can copy, alter, or inject messages as they travel across the network -> Threat to privacy and integrity of information; Another attack is saving copies of messages and reply them later, e.g. an invocation message requesting transferring a sum of money from one bank account to another

Defeating security threads: Cryptography and shared secrets:Example: Pair of processes shares a secret and nobody other know this; By exchanging a message the pair of processes includes information that proves the senders knowledge of this secret; Cryptography is based on an encryption algorithm that uses secret keys Authentication; Providing the identities supplied by their senders; Basic technique: include in a message an encrypted portion that contains enough of the contents of the message to guarantee its authentication Encryption and authentication are used to build secure channels on top of existing communication services.

Three generations of distributed systems

1. Early distributed systems
Emerged in the late 1970s and early 1980s because of the usage of local area networking technologies. System typically consisted of 10 to 100 nodes connected by a LAN, with limited Internet connectivity and supported services (e.g., shared local printer, file servers)
2. Internet-scale distributed systems. Emerged in the 1990s because of the growth of the Internet: Incorporates a large number of nodes, across organizations; Increasing heterogeneity; Increasing emphasis on open standards and services and associated middleware such as CORBA and Web Services
3. Contemporary distributed systems: Emergence of mobile computing leads to nodes that are location-independent, Need to added capabilities such as service discovery and support for spontaneous interoperation, Emergence of cloud computing and ubiquitous computing, Cloud computing: move from autonomous nodes performing a given role to pools of nodes that together provide a given service, Ubiquitous computing has led to move from discrete nodes to architectures where computers are embedded in everyday objects and the surrounding environment (washing machines)


  1. Properties and design issues of distributed systems are captured and discussed through the use of descriptive models. Define “Physical Model”, “Architectural Model” and “Fundamental Model”.
  2. What are the entities that are communicating in the distributed system from a system-oriented perspective and a programming-oriented perspective?
  3. How do the distributed entities communicate, or, more specifically, what communication paradigm is used? Explain inter-process communication, direct remote invocation and indirect communication.
  4. What (potentially changing) roles and responsibilities do distributed entities have in the overall architecture? Distinguish client-server from peer-to-peer architecture style.
  5. How are distributed entities mapped on the physical distributed infrastructure (what is their placement)?
    What is the purpose of a middleware in a vertically distributed system?
  6. What is a multi-tier architecture? Describe the communication in a typical 3-tier architecture.
    What are the benefits of horizontal distribution?
  7. Interaction model: considers the structure and sequencing of the communication between the elements of the system.
  8. What are differences of a synchronous distributed system and asynchronous distributed system in the interaction model?
  9. Failure model: considers the ways in which a system may fail to operate correctly. Discriminate timing failures, omission failures and arbitrary failures (Byzantine failures).
  10. Security model: considers how the system is protected against attempts to interfere with its correct operation or to steal its data.


Distributed Computing

A brief introduction to distributed systems