This mini tutorial is an introduction for developers who want a better understanding of how to use the .NET Socket API. It is written for those without prior knowledge of the subject and is, hopefully, intelligible to the layman. I first deal with the fundamental concepts, before easing into a practical example of how to implement a very simple last.fm client written in C#.
Introduction
The Berkeley Sockets Interface is currently the de facto standard network Application Programming Interface (API) for inter-process and network-bound communication between computers. The name derives from the API's origins in version 4.2 of the Berkeley Software Distribution (BSD) of the UNIX operating system.
The Microsoft technical specification which defines how Windows software accesses and exposes information on the Internet is called the Windows Sockets API, or Winsock for short (documentation here). Version 1 of the specification, introduced in 1992, was based upon the paradigm of the socket popularised by the Berkeley Sockets API. Winsock 2 (the current version) is a very close feature-for-feature implementation of the entire Berkeley Sockets API. The .NET implementation of Winsock, encapsulated by the System.Net.Sockets namespace, provides a fully managed interface to Winsock for .NET developers. (The implementation is in fact mostly a wrapper for the Win32 implementation.)
The Socket
The socket is a network communications endpoint. Nowadays, almost all communication across the Internet occurs from one socket on one computer to another socket on another computer. Sockets, however, are not physical things within a computer, nor do they form part of the hardware abstraction layer (HAL) to represent something physical in software (a common mistake among beginners). The socket is a purely virtual concept, defined in software. Essentially, it is a programming interface used to open communication pathways between two or more processes (applications), which are usually (but not necessarily) running on different computers. The name comes from an analogy: the network data connection is the wire, and it is plugged into a socket much as a cable is plugged into a power socket. To perform network-based communication from an application, you must do so (either directly or indirectly) through a socket.
Sockets come in two primary flavors. An active socket is connected to a remote active socket via an open data connection. Closing the connection destroys the active sockets at each endpoint. A passive socket is not connected, but rather awaits an incoming connection, which will spawn a new active socket once a connection is established.
At any point in time, a typical Internet connected PC may have 10-50 sockets, of which a handful are active. It is possible to list these active sockets, along with the applications which own them, via the netstat tool:
To run the netstat tool on Windows:
- open a Command Prompt as an administrator (the -b switch, which lists the owning executable, needs elevated rights)
- enter "netstat -bn"
- press Enter
C:\Windows\system32>netstat -bn
Active Connections
Proto Local Address Foreign Address
TCP 192.168.123.121:53872 212.58.251.86:554
[wmplayer.exe]
TCP 192.168.123.121:56059 66.249.93.111:993
[wlmail.exe]
TCP 192.168.123.121:56089 216.34.181.45:80
[firefox.exe]
TCP 192.168.123.121:56090 74.125.242.25:80
[firefox.exe]
Here you can see 4 active sockets on my computer, all connected to other computers on the Internet. The details aren't important at this point, but you can see that:
- wmplayer.exe (my media player) is listening to a BBC radio station
- wlmail.exe (my mail client) is checking for new mail at Gmail
- firefox.exe (my web browser) is loading two webpages simultaneously
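A similar list can be produced from managed code. The following is a minimal sketch, assuming the System.Net.NetworkInformation namespace is available on your platform; the class name ConnectionLister and the column widths are my own choices for illustration. Unlike netstat -b, this API does not report which executable owns each connection.

```csharp
using System;
using System.Net.NetworkInformation;

public static class ConnectionLister
{
    // Formats one connection roughly the way netstat prints it:
    // protocol, local endpoint, remote endpoint, then the TCP state.
    public static string Describe(TcpConnectionInformation c)
    {
        return string.Format("TCP  {0,-24} {1,-24} {2}",
            c.LocalEndPoint, c.RemoteEndPoint, c.State);
    }

    public static void Main()
    {
        IPGlobalProperties props = IPGlobalProperties.GetIPGlobalProperties();

        // Each entry corresponds to one active socket on this machine.
        foreach (TcpConnectionInformation c in props.GetActiveTcpConnections())
        {
            Console.WriteLine(Describe(c));
        }
    }
}
```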
Protocols
The information communicated between two sockets is a sequence of bytes constructed and interpreted by applications. In the context of computer networks these byte sequences are typically called packets, by analogy with a parcel in the mail: the packet carries information about where it has come from, and the address of the recipient. Furthermore, the packet contains a payload, which is the actual data used by the recipient.
While travelling through the Internet, the packet is handled by several routers spread across the globe, each doing a job similar to that of a parcel distribution warehouse in a postal service. Each router uses the information on the packet to forward it on to either the next router or the packet's final recipient.
The exact structure of each packet is determined by the packet's protocol; for example, where the destination address is located within the packet, the exact format of the address, and how big the payload is, are all determined or indicated by the packet's protocol. A protocol is usually designed to solve a specific problem. For example, the File Transfer Protocol (FTP) is used for the specific task of managing the transfer of files and folders across the Internet.
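To make the idea of a protocol-defined layout concrete, here is a small sketch that reads a few fields from an IPv4 header held in a byte array. The field offsets are those of the standard IPv4 header; the class name Ipv4HeaderSketch, the helper names and the hand-built sample packet are hypothetical, and the sample leaves the checksum at zero, so it is not a valid packet to put on a wire.

```csharp
using System;
using System.Net;

public static class Ipv4HeaderSketch
{
    // Header length in bytes: the low 4 bits of byte 0 count 32-bit words.
    public static int HeaderLength(byte[] packet)
    {
        return (packet[0] & 0x0F) * 4;
    }

    // Total length (header plus payload), stored big-endian in bytes 2 and 3.
    public static int TotalLength(byte[] packet)
    {
        return (packet[2] << 8) | packet[3];
    }

    // The source address occupies bytes 12-15, the destination bytes 16-19.
    public static IPAddress Source(byte[] packet) { return ReadAddress(packet, 12); }
    public static IPAddress Destination(byte[] packet) { return ReadAddress(packet, 16); }

    private static IPAddress ReadAddress(byte[] packet, int offset)
    {
        byte[] raw = new byte[4];
        Array.Copy(packet, offset, raw, 0, 4);
        return new IPAddress(raw);
    }

    // A hand-built sample: a 20-byte header followed by a 4-byte payload.
    public static byte[] SamplePacket()
    {
        return new byte[] {
            0x45, 0x00, 0x00, 0x18,   // version 4, header of 5 words; total length 24
            0x00, 0x00, 0x00, 0x00,   // identification, flags, fragment offset
            0x40, 0x06, 0x00, 0x00,   // TTL 64, protocol 6 (TCP), checksum left at zero
            192, 168, 123, 121,       // source address
            216, 34, 181, 45,         // destination address
            1, 2, 3, 4                // payload
        };
    }

    public static void Main()
    {
        byte[] packet = SamplePacket();
        int payload = TotalLength(packet) - HeaderLength(packet);
        Console.WriteLine("{0} -> {1}, header {2} bytes, payload {3} bytes",
            Source(packet), Destination(packet), HeaderLength(packet), payload);
    }
}
```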
The TCP/IP protocol suite
Implementing a useful network requires that a large number of different problems be solved. To keep things manageable and modular, different protocols are designed to solve different sets of problems. The TCP/IP protocol suite is one such collection of solutions, and is the collection of protocols which, essentially, make the Internet work. The suite is organised into layers, and two of them matter here: the network layer and, above that, the transport layer. Each TCP/IP packet requires one protocol from each of these layers.
Transport Layer
This layer is responsible for ensuring reliable delivery of data from end to end and for other connection management activities. There are two protocols available in this layer.
- Transmission Control Protocol (TCP)
- User Datagram Protocol (UDP)
Network Layer
Sometimes called the Internet layer, the protocols in this layer are responsible for sending and receiving data on the Internet and for deciding how packets of data get from one network to another (called routing). The Internet Protocol is the principal network-layer protocol in the TCP/IP protocol suite.
- Internet Protocol (IP)
The Internet Protocol provides a datagram service: every packet is handled and delivered by the network independently (like the parcels in a postal service). IP also uniquely identifies each host on a network with an address, called an IP address. IP, unfortunately, cannot guarantee that a packet will arrive at its destination, nor can it provide assurance that packets will be received in the same order they are sent. TCP is designed to solve these problems by ensuring the quality of service (QoS) of a network connection. TCP is used in combination with IP to detect and recover from packets that are received out of order, that are duplicated, or that don't arrive at all. With this inherent QoS built into TCP, a reliable, continuous stream of bytes can be established between two hosts. This stream transcends the limitations of packets, so that applications don't have to deal with these network-related problems. This way, an application can read from a network socket almost exactly as it would read from a file on a disk.
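The file-like nature of a TCP stream can be sketched with the same StreamReader class used for reading text files. This is an illustration under my own assumptions rather than part of the last.fm example: it uses the TcpListener and TcpClient helper classes, talks to itself over the loopback address on a port chosen by the operating system, and the names StreamLikeAFile and RoundTrip are hypothetical.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class StreamLikeAFile
{
    // Sends text over a loopback TCP connection and reads it back
    // with a StreamReader, exactly as one would read a text file.
    public static string RoundTrip(string text)
    {
        // Port 0 asks the operating system to pick any free port.
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;
        try
        {
            using (TcpClient sender = new TcpClient())
            {
                sender.Connect(IPAddress.Loopback, port);
                using (TcpClient receiver = listener.AcceptTcpClient())
                {
                    byte[] data = Encoding.UTF8.GetBytes(text);
                    sender.GetStream().Write(data, 0, data.Length);

                    // Tell the receiver that no more data will follow,
                    // which is what lets ReadToEnd see the end of the stream.
                    sender.Client.Shutdown(SocketShutdown.Send);

                    using (StreamReader reader =
                        new StreamReader(receiver.GetStream(), Encoding.UTF8))
                    {
                        return reader.ReadToEnd();
                    }
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("hello over TCP"));
    }
}
```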
Ports
To summarise so far: IP is responsible for carrying packets to a particular computer, which is identified by a unique IP address. Once a packet of information is received over the Internet via the network layer, the transport layer accepts the packet through a TCP or UDP port.
Ports offer a simple mechanism which enables multiple applications on a computer to communicate simultaneously with multiple peers. Without ports directing traffic to the applications it was intended for, an application could not reliably tell which packets it should consume and which it should leave for other applications.
An easy way to understand ports is to imagine your computer is a factory building and the ports are the pigeon-holes of a message box in that building. The postman (who represents IP) knows how to deliver mail to your postal address (your IP address). There are, however, 100 different people who work at the factory (each representing an application on the computer), and each has their own pigeon-hole so they know which messages are for them. Unfortunately the postman (who works on the network layer) doesn't know how to use the message box system (which is on the transport layer), and so he hands the packets to a person at the factory whose job it is to sort each packet into the correct pigeon-hole. This person represents TCP (or sometimes UDP).
Port numbers are 16-bit values, so there are 65,535 usable TCP ports (1 to 65535; port 0 is reserved) and another 65,535 UDP ports for each IP address. If a program uses TCP to send and receive packets over a network, it binds itself to a TCP port; if it uses UDP, it uses a UDP port. By default, once an application binds itself to a particular port, that port cannot be used by any other application until it is released.
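The exclusive nature of a bound port can be observed directly. The sketch below is my own illustration (PortInUseSketch and BindTwice are hypothetical names): it binds one socket to a free loopback port, then tries to bind a second socket to the same port and reports the resulting error. With default socket options I would expect the second bind to fail with AddressAlreadyInUse.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class PortInUseSketch
{
    // Returns the error reported when a second socket tries to bind
    // to a port that another socket is already listening on.
    public static SocketError BindTwice()
    {
        using (Socket first = new Socket(
            AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        using (Socket second = new Socket(
            AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            // Port 0 asks the operating system for any free port.
            first.Bind(new IPEndPoint(IPAddress.Loopback, 0));
            first.Listen(1);
            EndPoint taken = first.LocalEndPoint;

            try
            {
                second.Bind(taken);
                return SocketError.Success;
            }
            catch (SocketException ex)
            {
                return ex.SocketErrorCode;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine("Second bind result: " + BindTwice());
    }
}
```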
The operating system is responsible for managing TCP/UDP ports. On Windows, Winsock is responsible for managing these ports.
When an application wants to start waiting for incoming connections on a port, it first asks Winsock to create a socket handle. Winsock creates the socket and hands control of it to the application. From this point the application is said to 'own' the socket. Then, the application binds the socket to a specified port. The socket is then free to begin listening for incoming connections. (Back to the factory analogy: it's as if the employee tells the mail sorter that he wants his mail delivered to, say, pigeon-hole no. 15. In essence he binds himself to the pigeon-hole.)
A port is not a socket (a common confusion), though there is a close relationship between the two. A socket is associated with a port, though this is potentially a many-to-one relationship. Each port can have a single passive socket bound to it, awaiting incoming connections, and multiple active sockets, each corresponding to an open connection on the port. It's as if the factory worker is waiting for new messages to arrive (he represents the passive socket), and when a message arrives from a new sender, he initiates a correspondence (a connection) with them by delegating someone else (an active socket) to actually read the packet and respond to the sender if necessary. This leaves the factory worker free to receive new packets.
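The many-to-one relationship between sockets and a port can be sketched as follows. This is an illustration only, with hypothetical names (SharedPortSketch, ActiveSocketsShareListeningPort): one passive socket listens on a loopback port picked by the operating system, several clients connect to it, and the local port of every active socket returned by Accept is compared with the listening port.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

public static class SharedPortSketch
{
    // Connects clientCount clients to one passive socket and checks that
    // every active socket it spawns uses the same local port.
    public static bool ActiveSocketsShareListeningPort(int clientCount)
    {
        List<Socket> open = new List<Socket>();
        using (Socket passive = new Socket(
            AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            passive.Bind(new IPEndPoint(IPAddress.Loopback, 0));
            passive.Listen(clientCount);
            IPEndPoint listenPoint = (IPEndPoint)passive.LocalEndPoint;

            try
            {
                bool allShare = true;
                for (int i = 0; i < clientCount; i++)
                {
                    Socket client = new Socket(
                        AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
                    open.Add(client);
                    client.Connect(listenPoint);

                    // Accept hands back a new active socket for this connection;
                    // the passive socket carries on listening.
                    Socket active = passive.Accept();
                    open.Add(active);

                    int activePort = ((IPEndPoint)active.LocalEndPoint).Port;
                    Console.WriteLine("listening on {0}, active socket on {1}",
                        listenPoint.Port, activePort);
                    allShare &= activePort == listenPoint.Port;
                }
                return allShare;
            }
            finally
            {
                foreach (Socket s in open) s.Close();
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(ActiveSocketsShareListeningPort(3));
    }
}
```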
Putting it all together: creating sockets
The Berkeley Sockets API models network access after file access; an application uses a socket handle for network communication as it would a file handle for disk I/O. The two sockets at either end of a connection define the endpoints of a virtual pipe between the two applications, and once a connection is established, each application can treat its socket much like any other I/O stream.
To create a socket, we need to provide the system with a minimum of three pieces of information.
1. The Socket address family
The address family defines the addressing scheme that a socket will use to resolve a given address. The .NET implementation of the Berkeley Sockets API supports 29 different schemes (defined in the System.Net.Sockets.AddressFamily enumeration), but by far the most commonly used is InterNetwork, which supports the very popular version 4 Internet Protocol (IPv4) addressing scheme, the one used in the TCP/IP protocol suite.
2. The Socket type
The socket type identifies the basic properties of socket communications. The .NET framework supports 5 such socket types, each defined in the SocketType enumeration. The first two are the most important. The Stream SocketType is required for TCP, and Dgram for UDP communications.
Stream
Supports reliable, two-way, connection-based stream of bytes without the duplication of data and without preservation of packet boundaries. A Socket of this type communicates with a single peer and requires a remote host connection before communication can begin. Stream uses the Transmission Control Protocol (TCP) ProtocolType and the InterNetwork AddressFamily.
Dgram
Supports datagrams, which are connectionless, unreliable messages of a fixed (typically small) maximum length. Messages might be lost or duplicated and might arrive out of order. A Socket of type Dgram requires no connection prior to sending and receiving data, can communicate with multiple peers, and is therefore compatible with multicasting. Dgram uses the User Datagram Protocol (Udp) as well as the InterNetwork AddressFamily, and is most commonly used for time-sensitive or bandwidth-restricted scenarios where dropped packets are preferable to delayed packets. Media communications such as live video streams, VoIP (Internet telephony) and online games commonly use this socket type.
Raw
Supports access to the underlying transport protocol. Using the SocketType Raw, you can communicate using protocols like Internet Control Message Protocol (Icmp) and Internet Group Management Protocol (Igmp). Your application must provide a complete IP header when sending. Received datagrams return with the IP header and options intact.
Rdm
Supports connectionless, message-oriented, reliably delivered messages, and preserves message boundaries in data. Rdm (Reliably Delivered Messages) messages arrive unduplicated and in order. Furthermore, the sender is notified if messages are lost. If you initialise a Socket using Rdm, you do not require a remote host connection before sending and receiving data. With Rdm, you can communicate with multiple peers.
Seqpacket
Provides connection-oriented and reliable two-way transfer of ordered byte streams across a network. Seqpacket does not duplicate data, and it preserves boundaries within the data stream. A Socket of type Seqpacket communicates with a single peer and requires a remote host connection before communication can begin. Seqpacket semantics are supported only for the NetBIOS protocol.
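To contrast the Dgram type with Stream, here is a minimal sketch of a connectionless exchange: no Listen, Connect or Accept call is made, and the datagram is simply addressed to the receiver's endpoint. The names DatagramSketch and SendDatagram, the 1024-byte buffer and the 2-second timeout are my own assumptions; the timeout is there because a datagram can, in principle, be lost.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class DatagramSketch
{
    // Sends one UDP datagram over the loopback address and returns
    // the text that the receiving socket read.
    public static string SendDatagram(string text)
    {
        using (Socket receiver = new Socket(
            AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp))
        using (Socket sender = new Socket(
            AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp))
        {
            // The receiver still needs a port so the datagram can be addressed to it.
            receiver.Bind(new IPEndPoint(IPAddress.Loopback, 0));
            receiver.ReceiveTimeout = 2000; // milliseconds; gives up if nothing arrives

            byte[] data = Encoding.UTF8.GetBytes(text);
            sender.SendTo(data, receiver.LocalEndPoint);

            byte[] buffer = new byte[1024];
            EndPoint from = new IPEndPoint(IPAddress.Any, 0);

            // ReceiveFrom returns one whole datagram and fills in who sent it.
            int count = receiver.ReceiveFrom(buffer, ref from);
            return Encoding.UTF8.GetString(buffer, 0, count);
        }
    }

    public static void Main()
    {
        Console.WriteLine(SendDatagram("ping"));
    }
}
```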
3. The socket protocol
For a given socket domain and type, there may be one or several protocols that implement the desired behaviour. SocketType will sometimes implicitly indicate which ProtocolType will be used within an AddressFamily. For example, when the SocketType is Dgram, the ProtocolType is always Udp. Equally, when the SocketType is Stream, the ProtocolType is always Tcp. If you try to create a Socket with an incompatible combination, the Socket constructor will throw a SocketException.
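The effect of an incompatible combination can be checked with a few lines of code. In this sketch, CombinationSketch and CanCreate are hypothetical names; the helper simply reports whether the Socket constructor accepted a given SocketType and ProtocolType pairing for the InterNetwork address family.

```csharp
using System;
using System.Net.Sockets;

public static class CombinationSketch
{
    // True if a socket can be created with this pairing,
    // false if the constructor throws a SocketException.
    public static bool CanCreate(SocketType type, ProtocolType protocol)
    {
        try
        {
            using (new Socket(AddressFamily.InterNetwork, type, protocol))
            {
                return true;
            }
        }
        catch (SocketException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Stream + Tcp: " + CanCreate(SocketType.Stream, ProtocolType.Tcp));
        Console.WriteLine("Dgram + Udp:  " + CanCreate(SocketType.Dgram, ProtocolType.Udp));
        Console.WriteLine("Stream + Udp: " + CanCreate(SocketType.Stream, ProtocolType.Udp));
    }
}
```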
TCP also implements flow control. Unlike other protocols such as UDP, where the danger of filling the data-receiving buffer is very real, TCP automatically signals the sending host to suspend transmission temporarily when the application reading the stream is falling behind, and to resume sending the data when the reader is ready to begin receiving again.
Sockets in C#
To conclude this tutorial, I will demonstrate in C# how to use the .NET implementation of the Berkeley Sockets API to create a console-based last.fm client. For those who don't have the WMP last.fm plug-in installed (or don't even know what last.fm is), I've also included a simple dummy plug-in in the download which will allow you to send customised messages to the client.
So, the first thing we want to do is create a TCP/IP socket. The 3 pieces of required information to create our socket (the address family, the socket type, and the transport protocol) are specified in the Socket class constructor.
Socket sock = new Socket(
    AddressFamily.InterNetwork,
    SocketType.Stream,
    ProtocolType.Tcp);
Creating an endpoint
Now that we have our socket, we need to bind it to a TCP port on the computer. To do this, we specify an IP endpoint. The endpoint consists of an IP address and a port number. The IP address we'll be using is called the loopback address. This special address is the one every computer uses to refer to itself; that is, any data sent to the loopback address is received by the same computer that sent it. This address is 127.0.0.1 for IPv4, and is defined as follows:
IPAddress ipa = new IPAddress(
new byte[] { 127, 0, 0, 1 });
The port we will be binding to is 33367. This is chosen by the last.fm WMP plug-in as the default port it will attempt to send messages to. Now we combine the IP address and the port number to create the endpoint.
IPEndPoint ipe = new IPEndPoint(
ipa, 33367);
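As a quick check of the endpoint just built, the following sketch shows two other ways of obtaining the loopback address (the IPAddress.Loopback field and IPAddress.Parse) and prints the finished endpoint. EndPointSketch and LastFmEndPoint are hypothetical names used only for this illustration.

```csharp
using System;
using System.Net;

public static class EndPointSketch
{
    // Builds the same endpoint as above, using the predefined loopback address.
    public static IPEndPoint LastFmEndPoint()
    {
        return new IPEndPoint(IPAddress.Loopback, 33367);
    }

    public static void Main()
    {
        IPAddress fromBytes = new IPAddress(new byte[] { 127, 0, 0, 1 });
        IPAddress parsed = IPAddress.Parse("127.0.0.1");

        // All three forms describe the same address.
        Console.WriteLine(fromBytes.Equals(IPAddress.Loopback)); // True
        Console.WriteLine(parsed.Equals(IPAddress.Loopback));    // True

        // An endpoint prints as address:port.
        Console.WriteLine(LastFmEndPoint());                     // 127.0.0.1:33367

        // The largest port number an endpoint will accept.
        Console.WriteLine(IPEndPoint.MaxPort);                   // 65535
    }
}
```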
We can now bind the socket to the port, to create a passive socket capable of listening for connections.
sock.Bind(ipe);
To begin listening for incoming connections, we call the Listen method, specifying the maximum size of the backlog; that is, the maximum number of outstanding connection requests to be queued whilst the application processes the current message (or does anything else for that matter). If our application is too busy to accept new connections and the backlog reaches its maximum size, new TCP/IP connections are refused.
sock.Listen(10);
Establishing a connection
Now that we're listening for new incoming connections on the socket, we need to accept each new connection in turn. We call the Accept method, which blocks until we receive a new connection. When this happens, the passive socket spawns a new active socket to interact with the remote host, and returns it to us. All further communication is done through this new socket, which we name 'connection'.
Socket connection = sock.Accept();
Now we are ready to receive data from the remote host via this active socket. As I said earlier, reading data from a TCP stream is very much like reading data from a file. We define a buffer (an array of bytes), and the socket's Receive operation fills this buffer with data. If there are more bytes than the buffer can hold (we determine this using the socket's Available property), the buffer is resupplied to the socket to be filled with the remaining unread bytes. We repeat this until all the waiting data has been processed (more info on using buffers to read streams can be found here). Meanwhile, we read the contents of the buffer after each receive operation and build up the message into a result string. We know beforehand that the messages the WMP plugin sends are encoded in UTF-8.
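// the buffer that each Receive call fills
// (256 bytes is a size chosen here for illustration)
byte[] bytesReceived = new byte[256];
// the number of bytes the last Receive call wrote
int bytes;
string result = String.Empty;
do {
    // read the next chunk of received
    // data into our buffer
    bytes = connection.Receive(
        bytesReceived,
        bytesReceived.Length,
        SocketFlags.None);
    // decode only the bytes that were
    // just received
    result += Encoding.UTF8.GetString(
        bytesReceived,
        0,
        bytes);
    // keep reading while there's data
    // waiting on the socket
} while (connection.Available > 0);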
Finally, we output the entire result to the console, close the connection, and begin waiting for the next incoming connection.
// output the result to the console
Console.WriteLine(result);
// we've finished reading from
// the socket.
connection.Close();
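The steps above can be combined into a single loop that handles one connection after another. The following is a sketch under stated assumptions, not the code from the download: ListenerSketch, ReadMessage and Serve are hypothetical names, and instead of testing the Available property it reads until Receive returns 0, which assumes the sender closes its connection after each message. Whether the real plug-in does so is not confirmed here. Collecting the raw bytes first and decoding once also avoids splitting a multi-byte UTF-8 character across two reads.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class ListenerSketch
{
    // Reads everything one client sends until it closes its side
    // of the connection (assumption: one message per connection).
    public static string ReadMessage(Socket connection)
    {
        byte[] buffer = new byte[256];
        MemoryStream received = new MemoryStream();
        int count;

        // Receive returns 0 once the remote host has finished sending.
        while ((count = connection.Receive(buffer, buffer.Length, SocketFlags.None)) > 0)
        {
            received.Write(buffer, 0, count);
        }
        return Encoding.UTF8.GetString(received.ToArray());
    }

    // Accepts up to maxConnections connections in turn,
    // writing each message to the supplied writer.
    public static void Serve(Socket listener, int maxConnections, TextWriter output)
    {
        for (int i = 0; i < maxConnections; i++)
        {
            using (Socket connection = listener.Accept())
            {
                output.WriteLine(ReadMessage(connection));
            }
        }
    }

    public static void Main()
    {
        using (Socket sock = new Socket(
            AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            sock.Bind(new IPEndPoint(IPAddress.Loopback, 33367));
            sock.Listen(10);

            // Effectively runs until the process is stopped.
            Serve(sock, int.MaxValue, Console.Out);
        }
    }
}
```

Passing the listening socket and the writer into Serve keeps the loop independent of the port and the console, which makes it easy to exercise with a test sender on any free port.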
That's just about it! Here's a sample output I produced after listening to some tracks, then pausing and resuming WMP playback.
START c=wmp&
a=R.E.M.&
t=Radio Song&
b=Out of Time&
m=&
l=255&
p=E:\Music\Library\R.E.M\Out of Time\02 Radio Song.mp3
START c=wmp&
a=R.E.M.&
t=Losing My Religion&
b=Out of Time&
m=&
l=268&
p=E:\Music\Library\R.E.M\Out of Time\03 Losing My Religion.mp3
START c=wmp&
a=R.E.M.&
t=Shiny Happy People&
b=Out of Time&
m=&
l=226&
p=E:\Music\Library\R.E.M\Out of Time\07 Shiny Happy People.mp3
PAUSE c=wmp
RESUME c=wmp
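Judging only by the sample output above, each message appears to consist of a command word followed by key=value pairs separated by '&'. The sketch below splits a message along those lines. It is based on my reading of the sample rather than on any published specification, so the format, the class name PlayerMessage and both method names are assumptions, and no attempt is made to handle values that themselves contain '&'.

```csharp
using System;
using System.Collections.Generic;

public static class PlayerMessage
{
    // The command is everything before the first space, e.g. START or PAUSE.
    public static string Command(string message)
    {
        string trimmed = message.Trim();
        int space = trimmed.IndexOf(' ');
        return space < 0 ? trimmed : trimmed.Substring(0, space);
    }

    // The remainder is split on '&' into key=value pairs.
    public static Dictionary<string, string> Fields(string message)
    {
        Dictionary<string, string> fields = new Dictionary<string, string>();
        string trimmed = message.Trim();
        int space = trimmed.IndexOf(' ');
        if (space < 0) return fields;

        foreach (string pair in trimmed.Substring(space + 1).Split('&'))
        {
            // Split on the first '=' only, so a value may contain '='.
            int eq = pair.IndexOf('=');
            if (eq < 0) continue;
            fields[pair.Substring(0, eq).Trim()] = pair.Substring(eq + 1).Trim();
        }
        return fields;
    }

    public static void Main()
    {
        string message = "START c=wmp&a=R.E.M.&t=Radio Song&b=Out of Time&m=&l=255";
        Console.WriteLine(Command(message));
        foreach (KeyValuePair<string, string> field in Fields(message))
        {
            Console.WriteLine("{0} = {1}", field.Key, field.Value);
        }
    }
}
```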
Download
Download and install the last.fm client and WMP plugin (optional)
Download the example Last.fm client project
Download the dummy Last.fm message sender project