Saturday, October 13, 2012

Connection vs. connectionless vs. protocol vs. service vs. ...

I was searching for an example of a connection oriented unreliable service and an associated (or implementing) protocol when I stumbled on a post, written by someone who claims to be a CCIE, that doesn't distinguish between the terms service and protocol, or at least the post was written in such a way that no distinction is made. There are pages on Wikipedia that explain those terms, but I felt compelled to write my own post about them and to make the distinction, and their characteristics, clear. Also, because connection oriented service is frequently associated with TCP, and connectionless service with UDP, the characteristics of those protocols are often attributed to connection oriented and connectionless services in general. This is wrong, and I'll also explain what exactly is wrong and why.

Network layers and service

To understand the difference between a service and a protocol, you have to know that network functionality is divided into independent layers stacked on top of one another, so that every layer has one layer above it and one below it. This is, obviously, true for all layers except the first and the last one. The division is necessary because even an apparently simple operation, such as opening a Web page, is actually very complex and involves a lot of functionality, at the bottom of which is the problem of sending and receiving bits over wireless, copper and/or fiber links. Those areas alone are so complex that people specialize not only in, e.g., wireless communications, but in even narrower parts of it, like antenna design. Anyway, the main purpose of each layer is to encapsulate some functionality and provide a service to the higher layer (note that I'm referring to the layer immediately above) without the higher layer being aware of what's happening in the layer below, or even knowing how many layers there are below it. This is the same principle used in software design, where applications are divided into modules to make them manageable. The process of using layers and services is iterative (or recursive, depending on how you look at it), meaning that a layer that offers a service to the higher layer at the same time uses the service of a lower layer to accomplish its goals. Again, the first and the last layer are somewhat special, but I won't go into that.

Now we come to the important fact that each layer provides a service to the higher layer and uses services of the lower layer. So, a service is just that: some functionality offered to a higher layer, where the higher layer doesn't know how the service is implemented. Note that the layer that offers a service is also called the service provider, while the one that uses a service of a lower layer is called the service user.
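To make the provider/user relationship a bit more concrete, here is a minimal Python sketch. The names `Service`, `Layer` and `Wire`, and the bracketed per-layer headers, are hypothetical and made up for illustration; no real stack looks like this. Each `Layer` object is a service provider for whoever calls its `send` and, at the same time, a service user of the object below it, while the caller at the top has no idea how many layers are underneath.

```python
from abc import ABC, abstractmethod
from typing import List


class Service(ABC):
    """What a layer offers to the layer above: the user sees only this interface."""

    @abstractmethod
    def send(self, data: bytes) -> None: ...


class Wire(Service):
    """Hypothetical bottom layer: 'transmits' by appending to a list."""

    def __init__(self) -> None:
        self.transmitted: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.transmitted.append(data)


class Layer(Service):
    """An entity that is a service provider upward and a service user downward."""

    def __init__(self, tag: bytes, lower: Service) -> None:
        self.tag = tag        # hypothetical per-layer header
        self.lower = lower    # all this entity knows about the layer below

    def send(self, data: bytes) -> None:
        # Add this layer's header, then use the lower layer's service.
        self.lower.send(self.tag + data)


wire = Wire()
stack = Layer(b"[L4]", Layer(b"[L3]", Layer(b"[L2]", wire)))
stack.send(b"hello")
print(wire.transmitted)  # [b'[L2][L3][L4]hello']
```

The top-level caller only ever sees `send`; replacing the bottom `Wire` with something else would not change a line of the layers above it, which is exactly the point of a service.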

Actually, this is enough knowledge of layering to understand the distinction between a service and a protocol, but for completeness I'll mention a few more things about layering. First, there is an (almost) infinite number of ways layering could be done, not only with respect to the exact number of layers, but also with respect to the specific functionality placed in each layer. The most popular layering model is the ISO/OSI Reference Model, which has exactly 7 layers, each with prescribed functionality. It is called a reference model because it is used almost exclusively as a reference for all other possible models. In other words, concrete networks like, e.g., the Internet, or even Ethernet, have a different number of layers and/or different functionality in their layers, so they are frequently mapped onto the ISO/OSI Reference Model for the purpose of discussion and better understanding.

One final, very important thing. Layers don't implement functionality; they are an abstract concept and don't exist as something material. What implements functionality and offers services are different entities that logically belong to a certain layer. In other words, there are software and hardware modules, written by programmers or designed by hardware engineers, that exist in computers, implement some functionality, and are connected to other software/hardware modules that use them or that they use. By looking at how those modules are connected and what they do, we say that they belong to a certain network layer. So when I say that a layer implements something, what I actually mean is that some entity in that layer implements it; likewise, when I say that a layer uses a service of a lower layer, what is actually meant is that an entity in that layer uses a service of an entity in the lower layer. There is a bit of ambiguity in such statements, but they are easier to write, and I think that with this clarification they aren't too confusing.


The fact that each layer has the services of the lower layer at its disposal, and doesn't know how the lower layer works (nor does the lower layer know how the higher one works), basically means that communication takes place between the same layers on different machines (or within the same machine, which is actually a special case). So, to establish communication I, as an entity in, say, the 3rd layer, communicate with an entity in the 3rd layer on another machine, and we exchange information in order to allow communication between users in the 4th layer. The same goes for the 4th and other layers, too. But now we have a problem. Namely, the communicating entity in one layer on one machine is programmed by one company (say Microsoft), while the entity in the same layer on another machine is programmed by someone implementing the analogous entity in Linux. Clearly, the two programmers probably don't know each other, and possibly never will. So, how do we make sure that their software will work, i.e. that the two entities can talk to each other? The answer is: by defining a protocol. Human language is actually a protocol, albeit a very complex and ambiguous one. Nevertheless, if two secretaries don't speak the same language and don't share the set of concepts the language refers to, they will never be able to pass messages between their bosses (users)!

So, a protocol allows two (or more) entities within a layer to exchange information, establish communication and transfer data between their users. A protocol thus implements a service, or, more precisely, is used to implement a service! More about that later. A protocol includes the following elements:
  1. Data units exchanged, called Protocol Data Units, or PDUs. For every data unit exchanged, the format has to be rigorously defined!
  2. Behavior, usually defined and implemented using state machines. Behavior is how an entity responds to information it receives from its peer (the other entity) and from its users, and also what it expects from the lower layer and how it uses it.
Note that each entity actually communicates with three other entities. The first is the user in the higher layer (the service user), the second is the entity in the lower layer whose services are used to transfer data (the service provider), and finally there is the peer with whom communication is established.
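Both protocol elements, and the three parties an entity talks to, can be shown in a small Python sketch. The PDU layout (a 1-byte type, a 2-byte sequence number, a 2-byte payload length) and the `DATA`/`ACK` types are a hypothetical toy protocol, not any real standard. `encode`/`decode` pin down the format exactly, while `Sender` is a two-state machine: `user_send` is an event from the service user, `lower_send` is the lower layer's service, and `pdu_received` handles PDUs from the peer (which, in reality, also arrive through the lower layer).

```python
import struct
from enum import Enum
from typing import Callable, Tuple

# Hypothetical PDU format: type (1 byte), sequence number (2 bytes),
# payload length (2 bytes), all in network byte order, followed by the payload.
HEADER = struct.Struct("!BHH")
DATA, ACK = 1, 2


def encode(pdu_type: int, seq: int, payload: bytes = b"") -> bytes:
    return HEADER.pack(pdu_type, seq, len(payload)) + payload


def decode(raw: bytes) -> Tuple[int, int, bytes]:
    pdu_type, seq, length = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + length]
    return pdu_type, seq, payload


class State(Enum):
    READY = "ready"          # may accept data from the service user
    WAIT_ACK = "wait_ack"    # one PDU outstanding, waiting for the peer


class Sender:
    """Behavior as a tiny state machine reacting to its user and its peer."""

    def __init__(self, lower_send: Callable[[bytes], None]) -> None:
        self.lower_send = lower_send  # service provided by the lower layer
        self.state = State.READY
        self.seq = 0

    def user_send(self, payload: bytes) -> None:   # event from the service user
        if self.state is not State.READY:
            raise RuntimeError("previous PDU not yet acknowledged")
        self.lower_send(encode(DATA, self.seq, payload))
        self.state = State.WAIT_ACK

    def pdu_received(self, raw: bytes) -> None:    # event from the peer
        pdu_type, seq, _ = decode(raw)
        if self.state is State.WAIT_ACK and pdu_type == ACK and seq == self.seq:
            self.seq = (self.seq + 1) % 65536      # sequence number fits in 2 bytes
            self.state = State.READY


sent = []
s = Sender(sent.append)
s.user_send(b"hi")
print(decode(sent[0]))       # (1, 0, b'hi')
s.pdu_received(encode(ACK, 0))
print(s.state.name, s.seq)   # READY 1
```

Because the format is fixed byte by byte (network byte order via the `!` prefix of `struct`), two entities written by people who never met can still interoperate, as long as both follow the same definition.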

Connection oriented and connectionless services

We saw in the section Network layers and service what a service is. Now we can say that there are two primary types of service. The first one is modeled on how the telephone works and is called connection oriented service, while the second one is modeled on how the post office works and is called connectionless service. It is interesting to note that connectionless is actually the older of the two: the telegraph system is connectionless and was in use before the telephone was invented. Still, connection oriented service became dominant and, before the advent of digital computers, was basically the only type in use.

The key difference between the two is that an entity using a connection oriented service of a lower layer entity first has to establish a connection, i.e. to say with whom it is going to communicate on the other end, without transferring any data yet. This is called the connection establishment phase. Also, when the user has finished the data transfer, or communication, it has to explicitly tear down the communication channel with its peer entity on the other end. This is called the connection teardown phase. In between those two phases, data is transferred. Because of this, the identifier (i.e. address) of the other end is transferred only once, during the connection establishment phase.
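A connection oriented service can be sketched as an interface with three operations, one per phase. In the minimal Python sketch below, `ConnectionOrientedService`, its methods and the integer connection identifiers it returns are all hypothetical; the thing to notice is that the peer's address is passed only to `connect`, and every later `send` refers to the connection instead.

```python
from typing import Dict, List, Tuple


class ConnectionOrientedService:
    """Hypothetical provider: the address is given once, at connection establishment."""

    def __init__(self) -> None:
        self.next_id = 1
        self.connections: Dict[int, str] = {}      # connection id -> peer address
        self.delivered: List[Tuple[str, bytes]] = []

    def connect(self, peer: str) -> int:           # connection establishment phase
        conn_id = self.next_id
        self.next_id += 1
        self.connections[conn_id] = peer
        return conn_id

    def send(self, conn_id: int, data: bytes) -> None:  # data transfer phase
        peer = self.connections[conn_id]            # KeyError if not connected
        self.delivered.append((peer, data))

    def disconnect(self, conn_id: int) -> None:     # connection teardown phase
        del self.connections[conn_id]


svc = ConnectionOrientedService()
c = svc.connect("alice")          # the only time the address is given
svc.send(c, b"hello")
svc.send(c, b"how are you?")
svc.disconnect(c)
print(svc.delivered)  # [('alice', b'hello'), ('alice', b'how are you?')]
```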

If you think a bit about this, you'll immediately see the similarity between a telephone call and this service. In a telephone call you first establish a connection by dialing your peer's number, then you talk (i.e. transfer data), and finally you hang up. Also, during a telephone call you are the user and the telephone company offers you a service in which you don't know what's happening within the telephone system. You only know, and care, that you have established a communication channel with your peer entity, i.e. the person on the other end. Now, maybe your spouse asked you to call your friends and invite them to dinner. In that case, your spouse is your user and you are providing a service to him/her.

On the other hand, connectionless service has only a data transfer phase, i.e. there is neither connection establishment nor teardown; you just send data. Obviously, when sending data you have to tell your service provider to whom the data should be sent, and that has to be done each time you send something. Again, we said that the post office works that way: each letter carries an address, and all the letters you've sent are mutually independent!
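For contrast, a connectionless service needs just one operation. In this hypothetical Python sketch (`ConnectionlessService` is made up for illustration), the destination is an argument of every `send`, and no state links one call to the next, just like individual letters.

```python
from typing import List, Tuple


class ConnectionlessService:
    """Hypothetical provider: every send carries the destination address."""

    def __init__(self) -> None:
        self.delivered: List[Tuple[str, bytes]] = []

    def send(self, destination: str, data: bytes) -> None:
        # No setup and no teardown: each call is independent of all the others.
        self.delivered.append((destination, data))


svc = ConnectionlessService()
svc.send("alice", b"hello")
svc.send("bob", b"hi bob")          # the next message may go somewhere else entirely
svc.send("alice", b"how are you?")
print(svc.delivered)
# [('alice', b'hello'), ('bob', b'hi bob'), ('alice', b'how are you?')]
```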

Relation between connectedness and protocol

Note again that, while talking about types of service, we didn't once talk about how things work, only about how they appear to work. And that's the main point. Namely, a service is one thing and a protocol is another; a service can be connection oriented or connectionless, but a protocol is, well, just a protocol. The terms connection oriented protocol and connectionless protocol are used extensively in the literature, but this connectedness attribute actually belongs to the service the protocol implements, not to the protocol itself.

Let us take, as an example, the Internet protocols IP, TCP and UDP. TCP and UDP are transport layer protocols (meaning they are part of the transport layer in the ISO/OSI RM). IP, on the other hand, is a network layer protocol, used for communication between network layer entities. In networking texts, entities are almost always named after the protocol they use, so we have the TCP entity, which uses the TCP protocol to communicate with other TCP entities, called simply TCP, and the UDP entity, which uses the UDP protocol to communicate with other UDP entities, called simply UDP. The same goes for the IP protocol/entity. This might sometimes be ambiguous, but it should be clear from the context whether the authors are talking about entities or protocols.

Let's start with IP. IP offers a connectionless service to its service users, in our case TCP and UDP, and uses a connectionless service from its service provider in the lower layer. This means that each IP protocol data unit (called a datagram, packet or IP packet) carries a destination address along with the data, and two IP entities don't have to establish a connection in order to communicate. Actually, there is no way a connection could be established with the IP protocol. The reason IP expects only a connectionless service from the lower layers is that connectionless service is the least common denominator, i.e. it expects the least from the network, and that's one reason why IP is a connectionless protocol. If the underlying network is connection oriented, as e.g. ATM is, then it only has to expose a connectionless service that will be used by IP. And if implementing this service requires establishing and tearing down a connection for each packet, then so be it. It will work, though not particularly efficiently.
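That last point, a connectionless service built on top of a connection oriented network, can be sketched in Python as follows. `VirtualCircuitNetwork` is a hypothetical stand-in for a connection oriented network such as ATM, not a model of real ATM signaling, and `ConnectionlessAdapter` (also hypothetical) naively opens and closes a circuit for every packet, i.e. the inefficient but working strategy just described.

```python
from typing import Dict, List, Tuple


class VirtualCircuitNetwork:
    """Hypothetical connection oriented lower layer (think of an ATM-like network)."""

    def __init__(self) -> None:
        self.open: Dict[int, str] = {}              # circuit id -> peer address
        self.next_vc = 1
        self.setups = 0                             # how many circuits were set up
        self.delivered: List[Tuple[str, bytes]] = []

    def connect(self, peer: str) -> int:
        vc = self.next_vc
        self.next_vc += 1
        self.open[vc] = peer
        self.setups += 1
        return vc

    def send(self, vc: int, data: bytes) -> None:
        self.delivered.append((self.open[vc], data))

    def disconnect(self, vc: int) -> None:
        del self.open[vc]


class ConnectionlessAdapter:
    """Exposes a connectionless service on top of the connection oriented one."""

    def __init__(self, network: VirtualCircuitNetwork) -> None:
        self.network = network

    def send(self, destination: str, packet: bytes) -> None:
        # Naive strategy: set up and tear down a circuit for every single packet.
        vc = self.network.connect(destination)
        self.network.send(vc, packet)
        self.network.disconnect(vc)


net = VirtualCircuitNetwork()
ip_below = ConnectionlessAdapter(net)
for i in range(3):
    ip_below.send("10.0.0.2", b"packet %d" % i)
print(net.setups, len(net.open))  # 3 0
```

A less wasteful adapter would keep circuits open and reuse them for later packets to the same destination; the service offered upward would still be connectionless.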

The next entity is TCP. It offers a connection oriented service to its users and uses a connectionless service from its service provider, the IP entity. But TCP's service is more than that; it is also reliable (more about that in the next section). Now, take note that TCP communicates using IP (more precisely, it uses the services provided by the IP entity), and those services are connectionless! So, TCP offers a connection oriented service on top of a connectionless service. This is actually quite hard to achieve, because the datagrams TCP relies on, including those used to open and close a connection, can be lost, duplicated or delivered out of order, and the protocol has to cope with all of that.
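To get a feel for how connection state can live entirely at the two ends while the network below stays connectionless, here is a Python sketch loosely modeled on TCP's three-way handshake (SYN, SYN+ACK, ACK, with each side announcing an initial sequence number). The header dictionaries, the state names as strings and the `DatagramNetwork` and `Endpoint` classes are simplifications invented for illustration, not TCP's actual segment format or implementation. The sketch also assumes that no datagram is lost, duplicated or reordered; dealing with exactly those cases is what makes the real thing hard.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Datagram = Tuple[str, str, dict]   # (source, destination, hypothetical header fields)


class DatagramNetwork:
    """Hypothetical connectionless lower layer: independent, addressed datagrams."""

    def __init__(self) -> None:
        self.queue: Deque[Datagram] = deque()

    def send(self, src: str, dst: str, header: dict) -> None:
        self.queue.append((src, dst, header))


class Endpoint:
    """Connection state is kept only at the two ends; the network knows nothing of it."""

    def __init__(self, addr: str, net: DatagramNetwork, isn: int) -> None:
        self.addr, self.net, self.isn = addr, net, isn   # isn: initial sequence number
        self.state = "CLOSED"
        self.peer_isn: Optional[int] = None

    def open(self, peer: str) -> None:                  # active open: send SYN
        self.net.send(self.addr, peer, {"SYN": True, "seq": self.isn})
        self.state = "SYN_SENT"

    def listen(self) -> None:                           # passive open
        self.state = "LISTEN"

    def receive(self, src: str, h: dict) -> None:
        if self.state == "LISTEN" and h.get("SYN"):
            self.peer_isn = h["seq"]                     # answer with SYN+ACK
            self.net.send(self.addr, src, {"SYN": True, "seq": self.isn, "ack": h["seq"] + 1})
            self.state = "SYN_RECEIVED"
        elif self.state == "SYN_SENT" and h.get("SYN") and h.get("ack") == self.isn + 1:
            self.peer_isn = h["seq"]                     # acknowledge the peer's SYN
            self.net.send(self.addr, src, {"ack": h["seq"] + 1})
            self.state = "ESTABLISHED"
        elif self.state == "SYN_RECEIVED" and h.get("ack") == self.isn + 1:
            self.state = "ESTABLISHED"


net = DatagramNetwork()
hosts = {"A": Endpoint("A", net, isn=100), "B": Endpoint("B", net, isn=300)}
hosts["B"].listen()
hosts["A"].open("B")
while net.queue:                    # deliver datagrams in order, none lost (an assumption)
    src, dst, header = net.queue.popleft()
    hosts[dst].receive(src, header)
print(hosts["A"].state, hosts["B"].state)  # ESTABLISHED ESTABLISHED
```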

Finally, there is the UDP entity, which offers a connectionless service to its users and uses a connectionless service from its service provider, the IP entity. UDP is actually a very thin layer in terms of functionality, because it adds almost nothing to what IP already provides: essentially port numbers, so that data can be delivered to the right application, and a checksum (optional over IPv4). In a way, it only relays data.
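How thin UDP is shows up nicely in its header, which has just four 16-bit fields: source port, destination port, length (header plus data, in bytes) and checksum. The Python sketch below builds such a header with `struct` and uses the destination port to hand the payload to the right receiver. The `udp_encapsulate`/`udp_demultiplex` helpers and the `sockets` dictionary are hypothetical, and the checksum is simply left at zero ("not computed") instead of being calculated, which would also involve a pseudo-header built from the IP addresses.

```python
import struct
from typing import Callable, Dict

UDP_HEADER = struct.Struct("!HHHH")  # source port, destination port, length, checksum


def udp_encapsulate(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = UDP_HEADER.size + len(payload)   # 8-byte header plus data
    checksum = 0                              # 0 means "no checksum"; permitted over IPv4
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def udp_demultiplex(segment: bytes, sockets: Dict[int, Callable[[bytes], None]]) -> None:
    _src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(segment)
    handler = sockets.get(dst_port)
    if handler is not None:                   # no listener: silently dropped in this sketch
        handler(segment[UDP_HEADER.size:length])


received = []
segment = udp_encapsulate(5000, 53, b"query")
udp_demultiplex(segment, {53: received.append})
print(len(segment), received)  # 13 [b'query']
```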

Note that what each entity offers to its users (i.e. service) doesn't necessarily correspond to what it gets from its service provider.

Relation between connectedness and QoS

OK, the final thing to discuss is reliability, or more generally the Quality of Service (QoS) offered by a service. As I said in the introduction, because connection oriented service is mostly associated with TCP, the characteristics of TCP get attributed to connection oriented service. The same goes for UDP and connectionless service. But that's not true. The connectedness of a service and the guarantees it provides for certain parameters of communication (called QoS) have nothing to do with each other. It is perfectly feasible to have a connection oriented unreliable service, just as it is to have a connectionless reliable service.
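As an illustration that the two properties are independent, here is a minimal Python sketch of a service that is both connectionless and reliable: the destination goes with every `send` (connectionless), yet the provider retransmits until delivery succeeds or it gives up (reliable). `LossyDatagramChannel` and `ReliableConnectionless` are hypothetical; the channel drops a fixed, predetermined set of transmissions so that the run is deterministic, and a successful `send` on the channel stands in for a returned acknowledgment (lost acknowledgments are not modeled).

```python
from typing import List, Set, Tuple


class LossyDatagramChannel:
    """Hypothetical unreliable lower service: drops the transmissions whose number is listed."""

    def __init__(self, drop: Set[int]) -> None:
        self.drop, self.count = drop, 0
        self.delivered: List[Tuple[str, bytes]] = []

    def send(self, dst: str, data: bytes) -> bool:
        self.count += 1
        if self.count in self.drop:
            return False           # lost: the sender only sees that no ack came back
        self.delivered.append((dst, data))
        return True                # stands in for an acknowledgment from the receiver


class ReliableConnectionless:
    """Connectionless (address on every send) yet reliable (retransmit until acked)."""

    def __init__(self, channel: LossyDatagramChannel, max_tries: int = 5) -> None:
        self.channel, self.max_tries = channel, max_tries

    def send(self, dst: str, data: bytes) -> int:
        for attempt in range(1, self.max_tries + 1):
            if self.channel.send(dst, data):
                return attempt     # number of transmissions it took
        raise TimeoutError(f"{dst} unreachable after {self.max_tries} tries")


svc = ReliableConnectionless(LossyDatagramChannel(drop={1, 2}))
print(svc.send("alice", b"hello"))   # 3: two losses, delivered on the third try
print(svc.send("bob", b"hi"))        # 1
```

Turning this around, a connection oriented service that simply forwards data without any retransmission would be connection oriented and unreliable; nothing about connection setup forces reliability.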

Now, reliability is a somewhat vague term here. In the case of TCP it means that the service guarantees that all the data that was sent will arrive, in the order it was sent, without duplicates. If it cannot fulfill those requirements, the connection will be terminated with an appropriate error indication. Note that fulfilling those guarantees is part of the protocol's operation, and there are different mechanisms to achieve it, like sequence numbers, acknowledgments, timeouts, retransmissions, etc. Note also one more important thing: there is no guarantee that there will be no errors in the stream, i.e. that no bit will be accidentally flipped. TCP does carry a 16-bit checksum, which catches many such errors, but it is a weak check and some corrupted data can pass it undetected. And if you think that errors can appear only while data travels through the network, think about bugs in software and their possible consequences for data...
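The sequence-number part of that guarantee (in order, no duplicates) can be sketched on the receiving side in a few lines of Python. `InOrderReceiver` is hypothetical, and it numbers whole segments rather than bytes as TCP does; out-of-order segments are held back, duplicates are discarded, and the returned value plays the role of a cumulative acknowledgment, i.e. the next sequence number the receiver expects.

```python
from typing import Dict, List


class InOrderReceiver:
    """Delivers segments to the user exactly once and in order, using sequence numbers.

    Sequence numbers here count segments, not bytes as in TCP: a simplification.
    """

    def __init__(self) -> None:
        self.expected = 0                    # next sequence number owed to the user
        self.buffer: Dict[int, bytes] = {}   # out-of-order segments held back
        self.delivered: List[bytes] = []

    def receive(self, seq: int, data: bytes) -> int:
        if seq >= self.expected:             # anything older is a duplicate: ignore it
            self.buffer.setdefault(seq, data)
        while self.expected in self.buffer:  # release every contiguous segment
            self.delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return self.expected                 # cumulative acknowledgment for the sender


rx = InOrderReceiver()
for seq, data in [(1, b"B"), (0, b"A"), (0, b"A"), (3, b"D"), (2, b"C")]:
    ack = rx.receive(seq, data)
print(b"".join(rx.delivered), ack)  # b'ABCD' 4
```

Note that nothing here looks at the content of a segment, so a corrupted payload with a valid sequence number would be delivered as-is; ordering and duplicate removal say nothing about bit errors.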

Anyway, connectedness and QoS are separate things that can be combined in different ways.

Croatian terminology

This is actually a note for Croatian readers. When I was deciding whether to write this post, I wasn't sure whether to write it in Croatian or English. In the end I decided to write it in English (obviously), but one of the reasons I considered Croatian is the terminology. I insist on using Croatian translations where they are available, and I don't like it when someone in Croatia speaks half in Croatian and half in English. Even worse is when someone writes half in English and half in Croatian. OK, some mixing is acceptable (especially in spoken language), but there are quite good translations and I don't see why it would be necessary to use the English equivalents in conversation.

So, I refer Croatian readers to a dictionary with all the translations.


About Me

scientist, consultant, security specialist, networking guy, system administrator, philosopher ;)
