Preface

Oh, hello.  Here lies a collection of articles, narratives and ponderings on computery things, finely blended with my portfolio of works and experiments in U.I. design, infographics, and software development.  Bon appétit.

All posts tagged '.net'

3 posts

This mini tutorial is intended as an introduction for developers wanting to gain a better understanding of how to leverage the .NET Socket API.  It is written for those without prior knowledge of the subject matter, and is hopefully intelligible to the layman.  I first deal with the fundamental concepts, before easing into a practical example of how to implement a very simple last.fm client written in C#.

Introduction

The Berkeley Sockets Interface is currently the de facto standard network Application Programming Interface (API) for inter-process and network-bound communication between computers.  The name derives from the API's origins in version 4.2 of the Berkeley Software Distribution (BSD) of the UNIX operating system.

The Microsoft technical specification which defines how Windows software accesses and exposes information on the Internet is called the Windows Sockets API, or Winsock for short.  Version 1 of the specification, introduced in 1992, was based upon the paradigm of the socket popularised by the Berkeley Sockets API.  Winsock 2 (the current version) is a very close feature-for-feature implementation of the entire Berkeley Sockets API.  The System.Net.Sockets namespace gives .NET developers a fully managed interface to Winsock (the implementation is in fact mostly a wrapper for the Win32 implementation).

The Socket

The socket is a network communications endpoint.  Nowadays, almost all communication across the Internet occurs from one socket on one computer to another socket on another computer.  Sockets, however, are not physical things within a computer, nor do they form part of the hardware abstraction layer (HAL) to represent something physical in software (a common mistake among beginners).  The socket is a completely virtual conceptualisation, defined in software.  Essentially, it is a programming interface that is used to open communication pathways between two or more processes (applications) which are usually (but not necessarily) running on different computers.  The analogy is to a wire (the network data connection) being plugged into a power socket.  To perform network-based communications from an application, you must do so (either directly or indirectly) through a socket.

Sockets come in two primary flavors. An active socket is connected to a remote active socket via an open data connection. Closing the connection destroys the active sockets at each endpoint. A passive socket is not connected, but rather awaits an incoming connection, which will spawn a new active socket once a connection is established.

At any point in time, a typical Internet connected PC may have 10-50 sockets, of which a handful are active.  It is possible to list these active sockets, along with the applications which own them, via the netstat tool:

To run the netstat tool on Windows:

  1. Press Start > Run
  2. Enter "cmd" to open a command prompt (the -b option needs a prompt started as administrator)
  3. Type "netstat -bn" and press Enter

C:\Windows\system32>netstat -bn

Active Connections

  Proto  Local Address          Foreign Address
  TCP    192.168.123.121:53872  212.58.251.86:554
 [wmplayer.exe]
  TCP    192.168.123.121:56059  66.249.93.111:993
 [wlmail.exe]
  TCP    192.168.123.121:56089  216.34.181.45:80
 [firefox.exe]
  TCP    192.168.123.121:56090  74.125.242.25:80
 [firefox.exe]

Here you can see 4 active sockets on my computer, all connected to other computers on the Internet.  The details aren't important at this point, but you can see that:

  1. wmplayer.exe (my media player) is listening to a BBC radio station
  2. wlmail.exe (my mail client) is checking for new mail at Gmail
  3. firefox.exe (my web browser) is loading two webpages simultaneously

Protocols

The information communicated between two sockets is a sequence of bytes constructed and interpreted by applications.  In the context of computer networks these byte sequences are typically called packets, by analogy with a parcel in the mail: the packet carries information about where it has come from and the address of the recipient.  Furthermore, the packet contains a payload, which is the actual data used by the recipient.

While travelling through the Internet, the packet is handled by several routers spread across the globe, each doing a job similar to that of a postal service's parcel distribution warehouse: it uses the addressing information on the packet to forward it either to the next router or to the packet's final recipient.

The exact structure of each packet is determined by the packet's protocol; for example, where the destination address is located within the packet, the exact format of the address, and how big the payload is, are all determined or indicated by the packet's protocol.  A protocol is usually designed to solve a specific problem.  For example, the File Transfer Protocol (FTP) is used for the specific task of managing the transfer of files and folders across the Internet.
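To make the idea of a protocol fixing a packet's layout more concrete, here is a minimal sketch of a made-up packet format.  The ToyPacket class, its field order and its two-byte length field are all invented for illustration and do not correspond to any real protocol.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical toy protocol, invented for illustration only:
//   bytes 0-3   source address
//   bytes 4-7   destination address
//   bytes 8-9   payload length (big-endian)
//   bytes 10..  payload
public sealed class ToyPacket
{
    public byte[] Source;       // 4 bytes
    public byte[] Destination;  // 4 bytes
    public byte[] Payload;      // up to 65,535 bytes

    // Serialise the packet into the byte layout described above.
    public byte[] ToBytes()
    {
        if (Source.Length != 4 || Destination.Length != 4)
            throw new ArgumentException("Addresses must be 4 bytes long.");
        if (Payload.Length > ushort.MaxValue)
            throw new ArgumentException("Payload too large for a 2-byte length.");

        using (var ms = new MemoryStream())
        {
            ms.Write(Source, 0, 4);
            ms.Write(Destination, 0, 4);
            ms.WriteByte((byte)(Payload.Length >> 8));
            ms.WriteByte((byte)(Payload.Length & 0xFF));
            ms.Write(Payload, 0, Payload.Length);
            return ms.ToArray();
        }
    }

    // Rebuild a packet from raw bytes by applying the same layout rules.
    public static ToyPacket FromBytes(byte[] data)
    {
        if (data.Length < 10)
            throw new ArgumentException("Packet shorter than its header.");
        int length = (data[8] << 8) | data[9];
        if (data.Length < 10 + length)
            throw new ArgumentException("Payload is truncated.");

        var packet = new ToyPacket
        {
            Source = new byte[4],
            Destination = new byte[4],
            Payload = new byte[length]
        };
        Array.Copy(data, 0, packet.Source, 0, 4);
        Array.Copy(data, 4, packet.Destination, 0, 4);
        Array.Copy(data, 10, packet.Payload, 0, length);
        return packet;
    }
}

public static class ToyPacketDemo
{
    public static void Main()
    {
        var sent = new ToyPacket
        {
            Source = new byte[] { 192, 168, 123, 121 },
            Destination = new byte[] { 216, 34, 181, 45 },
            Payload = Encoding.ASCII.GetBytes("hello")
        };

        byte[] wire = sent.ToBytes();
        ToyPacket received = ToyPacket.FromBytes(wire);

        Console.WriteLine("{0} bytes on the wire, payload: {1}",
            wire.Length, Encoding.ASCII.GetString(received.Payload));
    }
}
```

Both ends must agree on the layout in advance; that agreement is all a protocol really is.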

The TCP/IP protocol suite

Implementing a useful network requires that a large number of different problems be solved.  To keep things manageable and modular, different protocols are designed to solve different sets of problems.  The TCP/IP protocol suite is one such collection of solutions, and is the collection of protocols which, essentially, make the Internet work.  The suite is organised in layers, two of which concern us here: the network layer and, above it, the transport layer.  Each TCP/IP packet uses one protocol from each of these layers.

Transport Layer

This layer is responsible for delivering data from end to end, between applications, and for other connection management activities.  There are two main protocols in this layer, of which only TCP guarantees reliable delivery.

  1. Transmission Control Protocol (TCP)
  2. User Datagram Protocol (UDP)

Network Layer

Sometimes called the Internet layer, this layer contains the protocols responsible for sending and receiving data on the Internet and for deciding how packets of data get from one network to another (called routing).  The Internet Protocol is the principal network-layer protocol in the TCP/IP protocol suite.

  1. Internet Protocol (IP)

The Internet Protocol provides a datagram service: every packet is handled and delivered by the network independently (like the parcels in a postal service).  IP also uniquely identifies each host on a network with an address, called an IP address.  IP, unfortunately, cannot guarantee that a packet will arrive at its destination, nor can it provide assurance that packets will be received in the same order they are sent.  TCP is designed to solve these problems by ensuring the quality of service (QoS) of a network connection.  TCP is used in combination with IP to detect and recover from packets that are received out of order, that are duplicated, or that don't arrive at all.  With this inherent QoS built into TCP, a reliable, continuous stream of bytes can be established between two hosts.  This stream transcends the limitations of packets, so that applications don't have to deal with these network-related problems.  This way, a socket can read from a network almost exactly like reading from a file on a disk.
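The "like reading from a file" claim can be illustrated with a small sketch.  It sends two lines of text over a TCP connection on the local machine and reads them back with a StreamReader, the same class normally used on a FileStream.  The class and method names (StreamLikeAFile, RoundTrip) are made up for this example, and both ends of the connection live in one process purely to keep it self-contained.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class StreamLikeAFile
{
    // Sends two lines over a loopback TCP connection and reads them
    // back line by line, as one would from a text file.
    public static string[] RoundTrip()
    {
        // Port 0 asks the operating system for any free port.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        try
        {
            using (var client = new TcpClient())
            {
                client.Connect(IPAddress.Loopback, port);

                using (TcpClient server = listener.AcceptTcpClient())
                {
                    // Sending side: write raw bytes into the stream.
                    byte[] data = Encoding.ASCII.GetBytes(
                        "first line\r\nsecond line\r\n");
                    client.GetStream().Write(data, 0, data.Length);

                    // Tell the other end that no more data will follow.
                    client.Client.Shutdown(SocketShutdown.Send);

                    // Receiving side: no packets in sight, just a
                    // stream of text.
                    using (var reader = new StreamReader(
                        server.GetStream(), Encoding.ASCII))
                    {
                        string first = reader.ReadLine();
                        string second = reader.ReadLine();
                        return new[] { first, second };
                    }
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }

    public static void Main()
    {
        foreach (string line in RoundTrip())
        {
            Console.WriteLine(line);
        }
    }
}
```

How the bytes were split into packets on the way is invisible to the reader; TCP reassembles them into one ordered stream before the application sees anything.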

Ports

To summarise so far: The IP protocol is responsible for carrying packets to a particular computer, each computer being identified by a unique IP address.  Once a packet of information is received over the Internet via the network layer, the transport layer accepts the packet through a TCP or UDP port.

Ports offer a simple mechanism which enables multiple applications on a computer to simultaneously communicate with multiple peers.  Without ports directing traffic to the applications it was intended for, an application could not reliably tell which packets were meant for its own consumption and which ones it should leave to other applications.

An easy way to understand ports is to imagine your computer is a factory building and the ports are the pigeon-holes of a message box in that building.  The postman (who represents the IP protocol) knows how to send mail to your postal address (using your IP address).  There are 100 different people who work at the factory, however (each representing an application on the computer), and they each have their own pigeon-hole so they know which messages are for them.  Unfortunately the postman (who works on the network layer) doesn't know how to use the message box system (which is on the transport layer), and so the postman hands the packets to a person at the factory whose job it is to sort each packet into the correct pigeon-hole.  This person represents the TCP (or sometimes UDP) protocol.

There are 65,535 usable TCP ports (numbered 1 to 65,535) and another 65,535 UDP ports for each IP address on a computer.  If a program uses the TCP protocol to send and receive packets over a network then it will bind itself to a TCP port.  If it uses the UDP protocol to send and receive packets, it uses a UDP port.  Once an application binds itself to a particular port, that port cannot normally be used by any other application until the port is released.
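The one-owner-per-port rule can be observed with a short sketch: bind one socket to a port, then try to bind a second socket to the same address and port.  The names PortInUseDemo and SecondBindFails are made up for this example, and it assumes default socket options (address reuse has not been switched on).

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class PortInUseDemo
{
    // Returns true if a second bind to an already-bound port is
    // rejected with an "address already in use" error.
    public static bool SecondBindFails()
    {
        using (var first = new Socket(
            AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        using (var second = new Socket(
            AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
        {
            // Port 0 asks the operating system for any free port.
            first.Bind(new IPEndPoint(IPAddress.Loopback, 0));
            first.Listen(1);
            int port = ((IPEndPoint)first.LocalEndPoint).Port;

            try
            {
                // Same address, same port: the port is already owned.
                second.Bind(new IPEndPoint(IPAddress.Loopback, port));
                return false;
            }
            catch (SocketException ex)
            {
                return ex.SocketErrorCode == SocketError.AddressAlreadyInUse;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine("Second bind rejected: {0}", SecondBindFails());
    }
}
```

Once the first socket is closed the port is released, and another application is free to bind to it.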

The operating system is responsible for managing TCP/UDP ports.  On Windows, Winsock is responsible for managing these ports.

When an application wants to attach to a port and start waiting for incoming connections, it first asks Winsock to create a socket handle.  Winsock creates the socket and hands control of it to the application.  From this point the application is said to 'own' the socket.  Then, the application binds the socket to a specified port.  The socket is then free to begin listening for incoming connections (back to the factory analogy, it's as if the employee tells the mail sorter that he wants his mail delivered to, say, pigeon-hole no. 15; in essence he binds himself to the pigeon-hole).

A port is not a socket (a common confusion), though there is a close relationship between the two.  A socket is associated with a port, though this is potentially a many-to-one relationship.  Each port can have a single passive socket bound to it, awaiting incoming connections, and multiple active sockets, each corresponding to an open connection on the port.  It's as if the factory worker is waiting for new messages to arrive (he represents the passive socket), and when one message arrives from a new sender, he initiates a correspondence (a connection) with them by delegating someone else (an active socket) to actually read the packet and respond to the sender if necessary.  This leaves the factory worker free to receive new packets.
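The create, bind, listen and accept steps, and the many-to-one relationship between sockets and a port, can be sketched with the Socket class.  The names below (PassiveAndActiveDemo, AcceptedLocalPorts) are made up for this example, and the two "remote" clients run in the same process purely to keep it self-contained.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class PassiveAndActiveDemo
{
    private static Socket NewTcpSocket()
    {
        return new Socket(
            AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
    }

    // Accepts two connections on one listening port and returns the
    // local port number of each resulting active socket.
    public static int[] AcceptedLocalPorts(out int listeningPort)
    {
        using (Socket passive = NewTcpSocket())
        using (Socket clientA = NewTcpSocket())
        using (Socket clientB = NewTcpSocket())
        {
            // 1. Bind the passive socket to a port (0 = any free port).
            passive.Bind(new IPEndPoint(IPAddress.Loopback, 0));

            // 2. Start listening; up to 10 connections may queue.
            passive.Listen(10);
            listeningPort = ((IPEndPoint)passive.LocalEndPoint).Port;

            // Two peers connect to the same port.
            clientA.Connect(IPAddress.Loopback, listeningPort);
            clientB.Connect(IPAddress.Loopback, listeningPort);

            // 3. Each Accept() spawns a new active socket, leaving the
            //    passive socket free to receive further connections.
            using (Socket activeA = passive.Accept())
            using (Socket activeB = passive.Accept())
            {
                return new[]
                {
                    ((IPEndPoint)activeA.LocalEndPoint).Port,
                    ((IPEndPoint)activeB.LocalEndPoint).Port
                };
            }
        }
    }

    public static void Main()
    {
        int listeningPort;
        int[] ports = AcceptedLocalPorts(out listeningPort);
        Console.WriteLine("Listening port: {0}", listeningPort);
        Console.WriteLine("Active sockets use ports: {0} and {1}",
            ports[0], ports[1]);
    }
}
```

Both active sockets report the listening port as their local port; what tells the two connections apart is the address and port of the peer at the other end.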

read on...

Author: Ray
Published: Mon. 2 March, 2009 03:15
Comments: 16 (open)
Filed under: Programming

Amongst several additions I made to the BlogEngine platform (the ASP.NET blogging application which runs this weblog) was the archive metadata which details the active time of the blog; that is, the span of time between the first and last post publication dates.  This feature is achieved with the TimeSpanArticulator, a simple static class which converts a TimeSpan to plain English.  In the true spirit of openness, I’ve released this utility under the LGPL, and here’s how you use it.
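To give a flavour of what converting a TimeSpan to English involves, here is a rough sketch.  It is not the TimeSpanArticulator itself: the class and method names (TimeSpanWords, Describe) and the choice of units are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, written for illustration only.
public static class TimeSpanWords
{
    // Describes a TimeSpan using days, hours, minutes and seconds,
    // skipping any unit whose value is zero.
    public static string Describe(TimeSpan span)
    {
        span = span.Duration(); // ignore the sign

        var parts = new List<string>();
        AddPart(parts, span.Days, "day");
        AddPart(parts, span.Hours, "hour");
        AddPart(parts, span.Minutes, "minute");
        AddPart(parts, span.Seconds, "second");

        if (parts.Count == 0) return "0 seconds";
        if (parts.Count == 1) return parts[0];

        // Join all but the last part with commas, then add "and".
        string head = String.Join(
            ", ", parts.GetRange(0, parts.Count - 1).ToArray());
        return head + " and " + parts[parts.Count - 1];
    }

    private static void AddPart(List<string> parts, int value, string unit)
    {
        if (value == 0) return;
        parts.Add(value + " " + unit + (value == 1 ? "" : "s"));
    }

    public static void Main()
    {
        // prints "2 days, 1 hour and 30 seconds"
        Console.WriteLine(Describe(new TimeSpan(2, 1, 0, 30)));
    }
}
```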

read on...

Author: Ray
Published: Tue. 27 January, 2009 12:17
Comments: 1 (closed)
Filed under: Programming

The LineMappingStreamReader is a TextReader which can be used to map the line breaks along a given Stream.  Using this single class it is possible to calculate two-dimensional coordinates (line and column values) for a given offset within a stream, and vice-versa.

Background

During a recent .NET project I was confronted with the common problem of manipulating large swaths of XML in a performance-constrained setting.  The details aren’t important, but in summary the system had to read (potentially quite large amounts of) in-memory XML mark-up at ~2000 ms intervals, so the solution necessitated something fast and lightweight.  Such constraints typically rule out XmlDocument and XDocument, which are far too slow and heavy.  Furthermore, I only required read access to the XML.  XPathDocument was designed to be a fast, lightweight alternative to XmlDocument which provides read-only access to an in-memory XML document; however, it still doesn’t cut the mustard performance-wise.  Instead, I went with the trusty XmlTextReader.

The plug-in I was writing is responsible for parsing a volatile code-DOM which contained many snippets of XML, and sanitizing this XML to make it conform to a 3rd party schema to be used in a ‘live preview’ (again, a 3rd party component).  This basically involved removing certain elements and attributes which the schema didn’t support.  To achieve this, the exact position of each offending node had to be discovered so that it could be removed, and for that we can use IXmlLineInfo, an interface which XmlTextReader implements.

As the XmlReader progresses along a given stream, IXmlLineInfo can be used to find the line and column of any node upon which the XmlReader is currently positioned (figure A).  This proves to be inadequate, however, because to perform a simple extraction and concatenation on a data stream (much like what String.Remove(...) does) you need to know the exact position within the stream at which the extraction is to take place.  IXmlLineInfo only provides us with a coordinate (x,y) in the two-dimensional space of a multi-line document, and not the offset within the stream at which the coordinate occurs.  Surely this was a design oversight on the .NET architects’ part?  Enter the LineMappingStreamReader.
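The underlying idea, converting between a (line, column) coordinate and a flat offset, can be sketched on a plain string.  This is not the LineMappingStreamReader: the LineIndex class below and its members are hypothetical names chosen for this illustration, and it works on a string held in memory rather than on a stream.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, written for illustration only.
public sealed class LineIndex
{
    // Offset at which each line starts; line 0 always starts at 0.
    private readonly List<int> lineStarts = new List<int> { 0 };
    private readonly int length;

    public LineIndex(string text)
    {
        length = text.Length;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\r')
            {
                // Treat "\r\n" as a single line break.
                if (i + 1 < text.Length && text[i + 1] == '\n') i++;
                lineStarts.Add(i + 1);
            }
            else if (text[i] == '\n')
            {
                lineStarts.Add(i + 1);
            }
        }
    }

    public int LineCount
    {
        get { return lineStarts.Count; }
    }

    // Zero-based (line, column) to zero-based offset.
    public int PositionToOffset(int line, int column)
    {
        if (line < 0 || line >= lineStarts.Count)
            throw new ArgumentOutOfRangeException("line");
        int offset = lineStarts[line] + column;
        if (column < 0 || offset > length)
            throw new ArgumentOutOfRangeException("column");
        return offset;
    }

    // Zero-based offset to zero-based (line, column).
    public void OffsetToPosition(int offset, out int line, out int column)
    {
        if (offset < 0 || offset > length)
            throw new ArgumentOutOfRangeException("offset");

        // Find the last line that starts at or before the offset.
        int index = lineStarts.BinarySearch(offset);
        if (index < 0) index = ~index - 1;

        line = index;
        column = offset - lineStarts[index];
    }
}

public static class LineIndexDemo
{
    public static void Main()
    {
        string text = "The\rquick brown fox jumps\r\nover the lazy dog";
        var index = new LineIndex(text);

        int offset = index.PositionToOffset(1, 6);
        Console.WriteLine("Line 1, column 6 is offset {0}: '{1}'",
            offset, text[offset]);
    }
}
```

Recording only the start offset of each line keeps the map small, and the binary search makes the reverse lookup cheap even for long documents.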

Usage

The LineMappingStreamReader can be used on any stream (be it in-memory or from a file, for example) and on any type of textual data.  The following very simple example shows how it can be used to map the lines in a string.  Note that because LineMappingStreamReader extends StreamReader and not the less versatile StringReader, we first have to convert the string into a MemoryStream so it can be read.

using System;
using System.IO;
using System.Text;

static string testStr =
	"The\r" + 
	"quick brown fox jumps\r\n" + 
	"over the lazy dog";

static void TestA()
{
	// Create a LineMappingStreamReader
	// to read the test string

	MemoryStream ms = new MemoryStream(
		Encoding.ASCII.GetBytes(testStr));

	LineMappingStreamReader lmsr =
		new LineMappingStreamReader(
			ms, Encoding.ASCII, false, 0x256);

	// read the entire string
	lmsr.ReadToEnd();

	// print all the recorded lines, 
	// including the last logical line.

	for (int i = 0; i <= lmsr.LinesRead; i++)
	{
		Console.WriteLine(
			String.Format(
				"Line {0}: {1}", i,
				lmsr.GetLengthAtLine(i)));
	}

	lmsr.Close();
}

Output

 
   Line 0: 4
   Line 1: 24
   Line 2: 18

Example 2

To demonstrate the LineMappingStreamReader in a more useful context, the following example shows a LineMappingStreamReader being used by an XmlReader to read XML from a file, whilst outputting the starting position of each ‘plant’ element.

using System;
using System.Drawing; // Point (assumed to be System.Drawing.Point)
using System.IO;
using System.Text;
using System.Xml;

static void TestB()
{
	// open the file
	FileStream fs = new FileStream(
		"catalog.xml", FileMode.Open);
	
	// create a new reader for the file
	LineMappingStreamReader lmsr = 
		new LineMappingStreamReader(
			fs, Encoding.UTF8, false, 0x1000);
	
	// create an XmlReader to read
	// the file as XML
	XmlTextReader reader = new XmlTextReader(lmsr);
	
	// process each node
	while (reader.Read())
	{
		if(reader.NodeType == XmlNodeType.Element &&
		   String.Equals(
				reader.Name, 
				"plant", 
				StringComparison.IgnoreCase))
		{
			// IXmlLineInfo values are 1-based, so subtract 1
			// to get a zero-based (column, line) coordinate
			int offset = lmsr.PositionToOffset(
				new Point(reader.LinePosition - 1, 
						  reader.LineNumber - 1));

			Console.WriteLine(String.Format(
				"Plant element found at line:{0} column:{1} offset:{2}",
				reader.LineNumber, reader.LinePosition, offset));
		}
	}

	// closing the XmlTextReader also closes the underlying reader
	reader.Close();
}

Remarks & Disclaimer

I’ve included two files for download here.  One is the LineMappingStreamReader class, and the other is a sample console project which includes the previous examples.  This version of the class has been updated with non-localised exception messages so it’s easy to distribute.  The class has also been fully implemented to extend all of StreamReader’s operations (originally only the Read(...) method was implemented because of time constraints), and consequently some of these, namely ReadLine() and ReadToEnd(), haven’t been unit tested.

LineMappingStreamReader.zip
LineMappingStreamReader.txt

Author: Ray
Published: Fri. 26 December, 2008 19:42
Comments: 3 (open)
Filed under: Programming