A Line Mapping StreamReader

A utility to map lines in a character stream

The LineMappingStreamReader is a StreamReader (and therefore a TextReader) which records the positions of the line breaks in a Stream as it reads. With this single class it is possible to convert an offset within the stream into two-dimensional coordinates (line and column values), and vice versa.
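The class's own source isn't reproduced in this article, so what follows is a minimal, hypothetical sketch of the underlying idea rather than its actual code. It records the offset at which each line starts, then converts in either direction. It assumes the whole text is already in memory as a string and that "\r", "\n" and "\r\n" each count as one line break. LineMap and its members are illustrative names, not part of the LineMappingStreamReader's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, NOT the LineMappingStreamReader's API:
// remembers the zero-based offset at which every line starts.
public sealed class LineMap
{
    // line 0 always starts at offset 0
    private readonly List<int> lineStarts = new List<int> { 0 };

    public LineMap(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // treat "\r\n" as a single line break
            if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n')
                i++;

            // the next line starts just after the break
            if (c == '\r' || c == '\n')
                lineStarts.Add(i + 1);
        }
    }

    public int LineCount
    {
        get { return lineStarts.Count; }
    }

    // zero-based (line, column) -> zero-based offset
    public int ToOffset(int line, int column)
    {
        return lineStarts[line] + column;
    }

    // zero-based offset -> zero-based (line, column)
    public void ToPosition(int offset, out int line, out int column)
    {
        // on a miss, BinarySearch returns the bitwise complement of
        // the index of the next larger line start
        int index = lineStarts.BinarySearch(offset);
        line = index >= 0 ? index : ~index - 1;
        column = offset - lineStarts[line];
    }
}
```

Line starts are recorded in ascending order. As a result, position-to-offset is a single lookup and offset-to-position is a binary search. A streaming reader would build the same table incrementally as characters are read, rather than from a complete string.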

Background

During a recent .NET project I was confronted with the common problem of manipulating large swaths of XML in a performance-constrained setting. The details aren’t important; in summary, the system had to read potentially large amounts of in-memory XML markup at roughly 2000 ms intervals, so the solution had to be fast and lightweight. Such constraints typically rule out XmlDocument and XDocument, which build a full in-memory tree and are too slow and heavy for the job. Furthermore, I only needed read access to the XML. XPathDocument was designed as a fast, lightweight alternative to XmlDocument that provides read-only access to an in-memory XML document; performance-wise, however, it still didn’t cut the mustard. Instead, I went with the trusty XmlTextReader.

The plug-in I was writing was responsible for parsing a volatile code-DOM containing many snippets of XML. It also had to sanitize that XML so that it conformed to a 3rd-party schema used by a ‘live preview’ (again, a 3rd-party component). This basically involved removing the elements and attributes which the schema didn’t support. To achieve this, the exact position of each offending node had to be discovered so that it could be removed. The natural starting point is IXmlLineInfo, an interface which XmlTextReader implements.
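For reference, here is a minimal sketch of reading node coordinates through IXmlLineInfo. It uses only XmlTextReader and IXmlLineInfo from System.Xml. The LineInfoDemo class and its ElementPositions method are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class LineInfoDemo
{
    // Returns "name@line:column" for every element in the XML,
    // using the coordinates reported through IXmlLineInfo.
    public static List<string> ElementPositions(string xml)
    {
        var result = new List<string>();

        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // XmlTextReader implements IXmlLineInfo
            IXmlLineInfo info = reader;

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    result.Add(string.Format("{0}@{1}:{2}",
                        reader.Name, info.LineNumber, info.LinePosition));
                }
            }
        }

        return result;
    }
}
```

LineNumber and LinePosition are both 1-based. When the reader might not supply line information at all, HasLineInfo() can be checked first.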

As the XmlReader progresses along a given stream, IXmlLineInfo can be used to find the line and column of whichever node the reader is currently positioned on (figure A). This proves to be inadequate, however. A simple extraction and concatenation on a data stream (much like what String.Remove(...) does) requires the exact offset within the stream at which the extraction is to take place. IXmlLineInfo only provides a (line, column) coordinate in the two-dimensional space of a multi-line document, not the offset within the stream at which that coordinate occurs. Surely this was a design oversight on the .NET architects’ part? Enter the LineMappingStreamReader.
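To make the gap concrete, here is a naive sketch of the missing step, assuming the whole document is held in a string. It walks the text counting line breaks ("\r", "\n" or "\r\n") until the requested line is reached, then adds the column. NaiveOffsets and ToOffset are hypothetical names. Rescanning from the start on every call can become costly on large documents, which is where recording line positions once pays off.

```csharp
using System;

public static class NaiveOffsets
{
    // Hypothetical helper: converts a 1-based (line, column) pair,
    // as reported by IXmlLineInfo, into a zero-based offset by
    // scanning from the start of the text on every call.
    public static int ToOffset(string text, int lineNumber, int linePosition)
    {
        int line = 1;
        int i = 0;

        while (line < lineNumber)
        {
            if (i >= text.Length)
                throw new ArgumentOutOfRangeException("lineNumber");

            char c = text[i++];

            // skip the '\n' of a "\r\n" pair so it counts once
            if (c == '\r' && i < text.Length && text[i] == '\n')
                i++;

            if (c == '\r' || c == '\n')
                line++;
        }

        // i is now the offset of the first character of the line
        return i + linePosition - 1;
    }
}
```

For example, ToOffset("ab\r\ncd\nef", 2, 1) returns 4. String.Remove(4, 2) on that string then cuts out "cd", leaving "ab\r\n\nef".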

Usage

The LineMappingStreamReader can be used on any stream (in-memory or from a file, for example) and on any type of textual data. The following very simple example shows how it can be used to map the lines in a string. Note that because LineMappingStreamReader extends StreamReader rather than the less versatile StringReader, the string first has to be converted into a MemoryStream so it can be read.

using System;
using System.IO;
using System.Text;

static string testStr =
	"The\r" +
	"quick brown fox jumps\r\n" +
	"over the lazy dog";

static void TestA()
{
	// Create a LineMappingStreamReader to read the test
	// string. StreamReader works on a Stream, so the
	// string is first encoded into a MemoryStream.
	MemoryStream ms = new MemoryStream(
		Encoding.ASCII.GetBytes(testStr));

	using (LineMappingStreamReader lmsr =
		new LineMappingStreamReader(
			ms, Encoding.ASCII, false, 0x256))
	{
		// read the entire string so every line is mapped
		lmsr.ReadToEnd();

		// print the length of every recorded line,
		// including the last logical line, which has
		// no terminating line break
		for (int i = 0; i <= lmsr.LinesRead; i++)
		{
			Console.WriteLine("Line {0}: {1}",
				i, lmsr.GetLengthAtLine(i));
		}
	}
}

Output

   Line 0: 4
   Line 1: 24
   Line 2: 18

Example 2

To demonstrate the LineMappingStreamReader in a more useful context, the following example shows an XmlTextReader reading XML from a file through a LineMappingStreamReader. For each ‘plant’ element, it outputs the line, column and stream offset.

using System;
using System.Drawing;   // Point (assumed to be the type PositionToOffset takes)
using System.IO;
using System.Text;
using System.Xml;

static void TestB()
{
	// open the file and wrap it in a LineMappingStreamReader
	// so its line breaks are mapped as the XML is read
	using (FileStream fs = new FileStream(
		"catalog.xml", FileMode.Open, FileAccess.Read))
	using (LineMappingStreamReader lmsr =
		new LineMappingStreamReader(
			fs, Encoding.UTF8, false, 0x1000))
	// create an XmlReader to read the file as XML
	using (XmlTextReader reader = new XmlTextReader(lmsr))
	{
		// process each node
		while (reader.Read())
		{
			if (reader.NodeType == XmlNodeType.Element &&
				String.Equals(reader.Name, "plant",
					StringComparison.OrdinalIgnoreCase))
			{
				// IXmlLineInfo values are 1-based; subtract 1
				// to get the zero-based (column, line) Point
				int offset = lmsr.PositionToOffset(
					new Point(reader.LinePosition - 1,
							  reader.LineNumber - 1));

				Console.WriteLine(
					"Plant element found at line:{0} column:{1} offset:{2}",
					reader.LineNumber, reader.LinePosition, offset);
			}
		}
	}
}

Remarks & Disclaimer

I’ve included two files for download here: one is the LineMappingStreamReader class, and the other is a sample console project which includes the previous examples. This version of the class has been updated with non-localised exception messages so it’s easy to distribute. The class has also been fully implemented to cover all of StreamReader’s operations (originally only the Read(...) method was implemented because of time constraints). Consequently, some of these, namely ReadLine() and ReadToEnd(), haven’t been unit tested.

LineMappingStreamReader.zip
LineMappingStreamReader.txt

Author: Ray
Published: Fri. 26 December, 2008 19:42
Filed under: Programming