The LineMappingStreamReader is a StreamReader (and therefore a TextReader) that records the positions of the line breaks in a given Stream as it reads it. With this single class it is possible to calculate two-dimensional coordinates (line and column values) for a given offset within the stream, and vice versa.
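The underlying idea can be illustrated without any reader at all: record the offset at which each line starts, and converting between (line, column) and offset becomes a lookup in one direction and a binary search in the other. The sketch below is a hypothetical, string-based `LineMap` class with zero-based lines and columns. It is not the LineMappingStreamReader API, only an illustration of the mapping it describes. It treats "\r\n", a lone "\r" and a lone "\n" each as a single line break.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration (not the LineMappingStreamReader API):
// records the offset at which each line starts. Lines and columns are zero-based.
public sealed class LineMap
{
    // Line 0 always starts at offset 0.
    private readonly List<int> _lineStarts = new List<int> { 0 };

    public LineMap(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\r')
            {
                // Treat "\r\n" as one line break: skip the '\n' as well.
                if (i + 1 < text.Length && text[i + 1] == '\n')
                    i++;
                _lineStarts.Add(i + 1);
            }
            else if (c == '\n')
            {
                _lineStarts.Add(i + 1);
            }
        }
    }

    public int LineCount => _lineStarts.Count;

    // (line, column) -> offset: a direct lookup of the line's start.
    public int ToOffset(int line, int column)
    {
        if (line < 0 || line >= _lineStarts.Count)
            throw new ArgumentOutOfRangeException(nameof(line));
        return _lineStarts[line] + column;
    }

    // offset -> (line, column): binary search over the sorted line starts.
    public (int Line, int Column) ToPosition(int offset)
    {
        if (offset < 0)
            throw new ArgumentOutOfRangeException(nameof(offset));
        int index = _lineStarts.BinarySearch(offset);
        // On a miss, BinarySearch returns the bitwise complement of the index
        // of the next larger line start; the offset lies on the line before it.
        int line = index >= 0 ? index : ~index - 1;
        return (line, offset - _lineStarts[line]);
    }
}
```

With the string "The\r" + "quick brown fox jumps\r\n" + "over the lazy dog", the line starts are 0, 4 and 27, so `ToOffset(2, 5)` returns 32 and `ToPosition(32)` returns (2, 5).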
Background
During a recent .NET project I was confronted with the common problem of manipulating large swaths of XML in a performance-constrained setting. The details aren’t important; in summary, the system had to read (potentially quite large amounts of) in-memory XML mark-up at roughly 2000 ms intervals, so the solution had to be fast and lightweight. Such constraints typically rule out XmlDocument and XDocument, which build a complete, editable in-memory tree of the document and are far too slow and heavy for this. Furthermore, I only required read access to the XML. XPathDocument was designed to be a fast, lightweight alternative to XmlDocument that provides read-only access to an in-memory XML document; however, it still doesn’t cut the mustard performance-wise. Instead, I went with the trusty XmlTextReader, which provides fast, non-cached, forward-only access to XML.
The plug-in I was writing was responsible for parsing a volatile code-DOM that contained many snippets of XML, and for sanitizing this XML so that it conformed to a third-party schema used by a ‘live preview’ (again, a third-party component). This basically involved removing the elements and attributes that the schema didn’t support. To remove a node, its exact position first had to be discovered. To do this, we can use IXmlLineInfo, an interface that XmlTextReader implements.
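Once the unwanted nodes are known as character offsets into the text, the removal itself is simple. The sketch below assumes each unwanted element or attribute has already been located as a (Start, Length) span of characters and that the spans do not overlap; the `SpanRemover` name is hypothetical. Removing the spans from the highest offset down means that deleting one span never shifts the offsets of the spans still waiting to be removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: removes non-overlapping (Start, Length) character spans.
public static class SpanRemover
{
    public static string RemoveSpans(string text, IEnumerable<(int Start, int Length)> spans)
    {
        var sb = new StringBuilder(text);
        // Work from the end of the text backwards so earlier offsets stay valid.
        foreach (var span in spans.OrderByDescending(s => s.Start))
        {
            sb.Remove(span.Start, span.Length);
        }
        return sb.ToString();
    }
}
```

For example, in `<a><b x="1"/><c/></a>` the element `<b x="1"/>` occupies offsets 3 to 12 and `<c/>` occupies 13 to 16, so removing the spans (3, 10) and (13, 4) leaves `<a></a>`, while removing only the attribute span (5, 6) leaves `<a><b/><c/></a>`.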
As the XmlReader progresses through a given stream, IXmlLineInfo can be used to find the line and column (both 1-based) of the node on which the XmlReader is currently positioned (figure A). This proves to be inadequate, however: to perform a simple extraction and concatenation on a data stream, much like what String.Remove(...) does, you need to know the exact offset within the stream at which the extraction is to take place. IXmlLineInfo only provides a coordinate (x, y) in the two-dimensional space of a multi-line document, not the offset within the stream at which that coordinate occurs. Surely this was a design oversight on the .NET architects’ part? Enter the LineMappingStreamReader.
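To make the gap concrete, the sketch below reads XML from a string with XmlTextReader, takes each element’s coordinates from IXmlLineInfo, and converts them into the character offset of the opening '<' using a simple table of line-start offsets. It assumes the whole document is available as a string (which is exactly what a streaming setting does not give you), and the `ElementLocator` and `FindElementOffsets` names are hypothetical. Two details matter: IXmlLineInfo values are 1-based, and for an element LinePosition points at the first character of the element name, just after the '<'.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: finds the offset of the '<' opening each element named `name`.
public static class ElementLocator
{
    public static List<int> FindElementOffsets(string xml, string name)
    {
        // Offset at which each line starts; "\r\n", "\r" and "\n" each end a line.
        var lineStarts = new List<int> { 0 };
        for (int i = 0; i < xml.Length; i++)
        {
            if (xml[i] == '\r' && i + 1 < xml.Length && xml[i + 1] == '\n')
                i++; // step onto the '\n' of a "\r\n" pair
            if (xml[i] == '\r' || xml[i] == '\n')
                lineStarts.Add(i + 1);
        }

        var offsets = new List<int>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            IXmlLineInfo info = reader; // XmlTextReader implements IXmlLineInfo
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    // Convert 1-based (line, position) to an offset, then step
                    // back one character from the name to the '<'.
                    int lineStart = lineStarts[info.LineNumber - 1];
                    offsets.Add(lineStart + (info.LinePosition - 1) - 1);
                }
            }
        }
        return offsets;
    }
}
```

Called with `"<catalog>\r\n  <plant>a</plant>\n<plant/>\r\n</catalog>"` and `"plant"`, this should report offsets 13 and 30, the positions of the two `<plant` tags.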
Usage
The LineMappingStreamReader can be used on any stream (in-memory or from a file, for example) and on any type of textual data. The following very simple example shows how it can be used to map the lines in a string. Note that because LineMappingStreamReader extends StreamReader rather than the less versatile StringReader, we first have to convert the string into a MemoryStream so it can be read.
using System;
using System.IO;
using System.Text;

static string testStr =
    "The\r" +
    "quick brown fox jumps\r\n" +
    "over the lazy dog";

static void TestA()
{
    // Create a LineMappingStreamReader
    // to read the test string
    MemoryStream ms = new MemoryStream(
        Encoding.ASCII.GetBytes(testStr));
    LineMappingStreamReader lmsr =
        new LineMappingStreamReader(
            ms, Encoding.ASCII, false, 0x256);
    // read the entire string
    lmsr.ReadToEnd();
    // print all the recorded lines,
    // including the last logical line.
    for (int i = 0; i <= lmsr.LinesRead; i++)
    {
        Console.WriteLine(
            "Line {0}: {1}", i,
            lmsr.GetLengthAtLine(i));
    }
    lmsr.Close();
}
Output
Line 0: 4
Line 1: 24
Line 2: 18
Example 2
To demonstrate the LineMappingStreamReader in a more useful context, the following example shows a LineMappingStreamReader being used by an XmlTextReader to read XML from a file, whilst outputting the position of each ‘plant’ element.
using System;
using System.Drawing;
using System.IO;
using System.Text;
using System.Xml;

static void TestB()
{
    // open the file for reading
    FileStream fs = new FileStream(
        "catalog.xml", FileMode.Open, FileAccess.Read);
    // create a new reader for the file
    LineMappingStreamReader lmsr =
        new LineMappingStreamReader(
            fs, Encoding.UTF8, false, 0x1000);
    // create an XmlReader to read
    // the file as XML
    XmlTextReader reader = new XmlTextReader(lmsr);
    // process each node
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element &&
            String.Equals(
                reader.Name,
                "plant",
                StringComparison.OrdinalIgnoreCase))
        {
            // LineNumber and LinePosition are 1-based, so subtract 1 from
            // each to build the zero-based Point(column, line). For an
            // element, LinePosition points at the first character of the
            // name, just after the '<'.
            int offset = lmsr.PositionToOffset(
                new Point(reader.LinePosition - 1,
                          reader.LineNumber - 1));
            Console.WriteLine(
                "Plant element found at line:{0} column:{1} offset:{2}",
                reader.LineNumber, reader.LinePosition, offset);
        }
    }
    reader.Close();
}
Remarks & Disclaimer
I’ve included two files for download here: one is the LineMappingStreamReader class, and the other is a sample console project that includes the previous examples. This version of the class has been updated with non-localised exception messages so it’s easy to distribute. The class has also been fully implemented to extend all of StreamReader’s operations (originally only the Read(...) method was implemented because of time constraints); consequently, some of these operations, namely ReadLine() and ReadToEnd(), haven’t been unit tested.
LineMappingStreamReader.zip
LineMappingStreamReader.txt
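As a rough illustration of the Read(...)-only approach mentioned in the remarks, and not the downloadable class itself, the sketch below subclasses StreamReader, overrides Read(char[], int, int), and records where each line starts as characters pass through it. The `LineTrackingReader` name and its members are hypothetical. It tracks only characters consumed through that one overload (Read(), ReadLine() and ReadToEnd() are not overridden), its offsets are character offsets into the decoded text rather than byte offsets into the stream, and it remembers a trailing '\r' between calls so that a "\r\n" pair split across two reads is still treated as one line break.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical sketch: only Read(char[], int, int) is tracked.
// Offsets are character offsets into the decoded text, not byte offsets.
public class LineTrackingReader : StreamReader
{
    private readonly List<int> _lineStarts = new List<int> { 0 };
    private int _charsRead;  // characters returned by Read so far
    private bool _lastWasCr; // previous character was '\r' (possibly in an earlier chunk)

    public LineTrackingReader(Stream stream, Encoding encoding)
        : base(stream, encoding) { }

    // Offset at which each line seen so far starts.
    public IReadOnlyList<int> LineStarts => _lineStarts;

    // Zero-based (line, column) -> character offset.
    public int PositionToOffset(int line, int column) => _lineStarts[line] + column;

    public override int Read(char[] buffer, int index, int count)
    {
        int n = base.Read(buffer, index, count);
        for (int i = 0; i < n; i++)
        {
            char c = buffer[index + i];
            int next = _charsRead + i + 1; // offset just past this character
            if (c == '\n' && _lastWasCr)
            {
                // "\r\n": the '\r' already opened a new line, so move that
                // line's start past the '\n'.
                _lineStarts[_lineStarts.Count - 1] = next;
            }
            else if (c == '\r' || c == '\n')
            {
                _lineStarts.Add(next);
            }
            _lastWasCr = c == '\r';
        }
        _charsRead += n;
        return n;
    }
}
```

Reading the first example’s test string through this class with a two-character buffer (which splits the "\r\n" pair across two reads) records line starts at 0, 4 and 27.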