Subheader

Programming discussions since 1842

Monday, August 8, 2011

Trigram string searching

String searching and sorting is another one of my unexplainable favorite programming pasttimes. I was reading the interesting but tragically short article on the subject on wikipedia when I stumbled upon a special technique for fuzzy string searching known as Trigram string searching. What's that? Well, this is what the (entire) article had to say:
Trigram search is a powerful method of searching for text when the exact syntax or spelling of the target object is not precisely known. It finds objects which match the maximum number of three-character strings in the entered search terms, i.e. near matches. A threshold can be specified as a cutoff point, after which a result is no longer regarded as a match.
Bit short for such an important technique. This is the algorithm that manages to successfully turn up the search results for "Programming" when you spelled it "Porgraming".

The lack of a real implementation to go with the article irritated me. "That can't require very much coding, can it?"

Nope, It didn't!

Saturday, August 6, 2011

Challenge: Read a file of unknown size

I'm not sure this can be considered a "real world programming problem" but it is a "fun" and important challenge for any would be programmer and a good measurement of whether you have a good grasp of memory management.

The task is as follows:
You have a text file containing an unknown number of rows of plain text. You are to parse this file and read each row into a string in a two-dimensional array (aka a string matrix). At your disposal you have a statically typed language, like C. Each row may be of varying length.

As for the restrictions, you may only traverse the file once. You may not count the number of rows manually. You may not use a linked-list with strings.

Are you up for it? Grab this textfile and go!

The solution is below the jump. But you solved it yourself, right?