text file is loaded using another text editor, such as Vim, the text is displayed as shown in Figure 10-4. As you can see, Vim has loaded the text file without any formatting errors.

■Note Vim is available from http://vim.org. It is a vi-derived editor that can be used on Windows systems.

Figure 10-4. Vim loads the text file in a nicely formatted display.

The real pressing problem lies in the structure of the data, which is illustrated in Figure 10-5. Here, the data has new formatting, with extra columns, and the first column is not always in the proper data format. To make matters worse, the badly formatted data contains repeated information. The challenge of the application is to read the stream and fix all of these problems. This requires a thorough understanding of string processing and the different ways that text can be stored, as discussed in Chapter 3.

When you are processing data streams, you need to be aware of the format of the data stream. In this example, we are processing ASCII text, and thus will be manipulating bits according to the rules of the ASCII lookup table.

Whitespace characters are special characters in the text lookup table. They are associated with numbers, but they are represented as an action that the user can see rather than as a visible glyph. For example, the character between single quotation marks (' ') is a space, the character \t is a tab, and the character \n is a newline.

The reason Notepad does not format the lottery text file nicely (Figure 10-3) is the whitespace character used to indicate a newline. In Figure 10-6, the highlighted buffer entry 0A is the hexadecimal value of the character that indicates a linefeed, or newline, in the lottery text file.

CHAPTER 10 ■ LEARNING ABOUT PERSISTENCE

Figure 10-5. Structural problems of this data stream

Figure 10-6. Newline character used in lotto.txt

Figure 10-7 shows a file created by Notepad. Notepad expects not a single
whitespace character, but two whitespace characters to indicate a newline: 0D (carriage return) and 0A (linefeed).

Figure 10-7. Newline characters used by Notepad

Deciphering the Format

The echo has served its purpose of providing a way to develop an application in a top-down manner. The next step is to remove the echo code and start writing the code that will fix the data stream.

Fixing the data stream is not a trivial undertaking, because you are yet again faced with a state problem. You don't want to fix one part of the stream, only to end up with a problem in another part of the stream. Thus, you need to fix the stream incrementally and make sure at each step that the change has no ramifications elsewhere.

The first step is to break the data stream into individual fields (each value in a column is a field in this case). In Figure 10-5, the data stream had two parts, where the upper part seemed to have a single space between the numbers and the lower part had the amount of space necessary to align the numbers. The difference between the upper and lower parts is the whitespace characters used. So, the first step will be to clean up the whitespace.

The following is the code that reads the buffer, splits it up, and reassembles the content into a new buffer. The code is intermediate code that adds special bracket markers to indicate what the text contains.

    Imports System.IO
    Imports System.Text

    ' TODO: Fix up this class
    Public Class LottoTicketProcessor : Implements IProcessor
        Public Function Process(ByVal input As String) As String Implements IProcessor.Process
            Dim reader As TextReader = New StringReader(input)
            Dim retval As New StringBuilder()
            ' Peek() returns -1 when there are no more characters to read
            Do While reader.Peek() <> -1
                ' Split the current line into fields on spaces and tabs
                Dim splitUpText As String() = _
                    reader.ReadLine().Split(New Char() {" "c, ControlChars.Tab})
                Dim c1 As Integer
                ' Wrap each field in bracket markers so its boundaries are visible
                For c1 = 0 To splitUpText.Length - 1
                    retval.Append(("(" & splitUpText(c1) & ")"))
                Next
                retval.Append(ControlChars.NewLine)
            Loop
            Return retval.ToString()
        End Function
    End Class

In the implementation of Process(), the text is parsed line by line, and each line is then split into its individual fields. You could write the parsing routines yourself, but to parse a buffer line by line, it is more efficient to use StringReader. The StringReader instance is created with the string to parse and is assigned to a variable of type TextReader, the abstract base class of StringReader.

As each line of text is parsed, the most efficient approach to building a buffer is to use StringBuilder. You could keep appending data to the string, but if you do that too often, the application's performance will suffer. The String type is an immutable type, which means that once an object is initialized, you cannot change the state of the object. The advantage of immutable types is that they can speed up your application, because code can assume that once an object has been assigned, it will never change. The downside is that once an object is assigned, to modify the object state even slightly, you must instantiate a new object, which would be the case if we used the = and