Tuesday, January 31, 2012

Regular Expressions vs. Legacy String Functions

Back in the days of Visual Basic 6, when you wanted to pull some information out of a long string, you were (almost) required to write a function that parsed the string section by section using switches or If blocks. In today's world, regular expressions lighten the load quite a bit. As I've been developing apps over the years, I've learned to love regular expressions and figured they were worth mentioning in a post.

Regular expressions are basically patterns, backed by their own parsing engine, for pulling text-based information out of a string. For example, say I want to pull a certain value out of a nasty-looking string, and I know that the string will be in a certain "format". By this I mean that some parts of the string don't change, and the parts that do change are the ones I want to use for something else. Instead of cracking my knuckles and writing a loop to sift through the string character by character, I'm able to write a single expression to pull out "matches" one by one.

Consider the following connection string:

            Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\data.xls;Extended Properties="Excel 8.0;HDR=YES";

From this string, I want to be able to pull out the value of 'Data Source' (in this case c:\data.xls) and the value of 'HDR' (i.e., YES or NO). Back in the day, I would write a few lines of code to Split() the string into pieces, then split those pieces into shorter pieces and save them into variables. Something like this:

Dim filePath As String = ""
Dim hasHeader As Boolean = False

Dim sections() As String
sections = Split(connectionString, ";")

For Each section As String In sections
  ' Split on the first "=" only, so a value containing "=" stays intact
  Dim pieces() As String
  pieces = Split(section, "=", 2)

  ' Skip sections without a value (e.g. the empty one after the last ";")
  If pieces.Length < 2 Then Continue For

  If pieces(0) = "Data Source" Then
    filePath = pieces(1)
  ElseIf pieces(0) = "HDR" Then
    ' The quoted Extended Properties value was already broken apart by the
    ' Split on ";", so HDR shows up as its own section with a trailing quote
    hasHeader = (pieces(1).Trim(""""c) = "YES")
  End If
Next

Look at all of that code, just to get two values from a single string! I used to write code like this in VB6 all the time. There are lots of assumptions in that code and it is not optimized at all, let alone understandable at first sight. There are many chances for errors to creep into that code as well... Not fun.

Regular Expressions to the Rescue
When .NET was introduced, there was this new (to me) concept of regular expressions that allowed pieces to be picked out of a string as needed using a "pattern". This is very common in Unix tools and many programming languages, but it was new to me at the time and it intimidated me. It was one more thing I had to learn. I used to approach it with dread and felt like I had to relearn the syntax every time I used it.

Today, I actually think in regular expressions a lot of the time. There are many special characters that have a particular meaning to the regular expression engine. For example, a "^" means "beginning of string" and a "$" means "end of string". A "." means any character. A "+" means "one or more times". A "*" means "zero or more times". When you put these special characters together, you can do some very powerful things. There is all kinds of documentation on the internet to help you understand this language if you are interested. If you are new to computer programming, then I recommend that you learn this sooner rather than later. It will make your life much easier.
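
To make those special characters concrete, here is a small sketch. It is written in Python rather than VB.NET purely for brevity, and the sample text and variable names are made up for illustration; the special characters themselves behave the same way in the .NET engine.

```python
import re

text = "HDR=YES"

# "^" anchors the match at the beginning of the string.
starts_with_hdr = re.search(r"^HDR", text) is not None

# "$" anchors the match at the end of the string.
ends_with_yes = re.search(r"YES$", text) is not None

# "." matches any single character (except a newline by default).
any_char = re.search(r"H.R", text).group()

# "+" means one or more of the preceding item.
one_or_more = re.search(r"[A-Z]+", text).group()

# "*" means zero or more, so it is allowed to match nothing at all.
zero_or_more = re.search(r"[0-9]*", text).group()

print(starts_with_hdr, ends_with_yes, any_char, one_or_more, repr(zero_or_more))
```

Note that the last search succeeds with an empty match, because "zero or more digits" is satisfied right at the start of the string.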

To get the values from the above connection string using regular expressions, it requires a very simple regular expression pattern:


Once I've defined this pattern, I can use it to pull out the "filename" and "hasheader" values very quickly and efficiently. While this example is borderline elementary, consider pulling values out of a 20 KB XML file or a huge HTML string that you grabbed from a web page. Better yet, consider the power that it offers you when parsing a 5 MB log file for information.
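
As a rough sketch of what such a pattern could look like (in Python for brevity): the group names "filename" and "hasheader" follow the text above, but the pattern itself and the variable names are assumptions for illustration, not necessarily the exact expression I used. In .NET the named groups would be written as (?<filename>...) instead of (?P<filename>...).

```python
import re

connection_string = (
    r"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\data.xls;"
    r'Extended Properties="Excel 8.0;HDR=YES";'
)

# Named groups capture the two values; everything else is literal text.
#   [^;]+  one or more characters that are not a semicolon (the file path)
#   .*?    as few characters as possible until "HDR=" shows up
pattern = re.compile(
    r"Data Source=(?P<filename>[^;]+);.*?HDR=(?P<hasheader>YES|NO)"
)

match = pattern.search(connection_string)
if match:
    file_path = match.group("filename")
    has_header = match.group("hasheader") == "YES"
    print(file_path, has_header)
```

One expression replaces the whole loop, and the quoted Extended Properties value needs no special handling because the pattern simply skips ahead to "HDR=".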

This might be over many people's heads and it might be common knowledge for others. For me, it was way over my head for a few years. However, after using it so much and relying on it, it's become common knowledge; it is baked into my daily routines now. Regular expressions are very common inside Vim and many Unix command line programs. For example, if I were to paste the connection string into Vim, I could place my cursor at the beginning of the connection string, type "d/Data Source" and press Enter, and everything from "Provider" up to (but not including) "Data Source" is removed.
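
The same "delete everything up to the match" idea can be sketched in code. This is a minimal Python illustration, with variable names that are made up for the example:

```python
import re

line = r"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\data.xls;"

# Find the first occurrence of the pattern and keep everything from there on,
# which is roughly what deleting "up to the match" does in Vim.
match = re.search(r"Data Source", line)
remainder = line[match.start():] if match else line

print(remainder)
```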

Here is a killer utility application that you should put on your thumb drive and use if you need to parse a large text file for values: http://regexlab.codeplex.com/

It's free, and I've found it to be very powerful when constructing complex regular expressions in my daily programming tasks.

Regular expressions. Learn them. Use them.

Thursday, January 19, 2012

Chesterville, ON

I'm out of town on business. I'm currently in the little town of Chesterville in Ontario, Canada. It's a beautiful place and I'm loving it. There's lots of agriculture and farms here, but it's all under about 4 inches of snow right now. As you might imagine, the weather is a bit cold... OK it's really cold. There is snow and ice everywhere and the wind is pretty brisk at times. Luckily I'm indoors most of the time and don't have to deal with it. The apartment that I'm staying in is upstairs from the office that I'm working in. Therefore, the only time I go outside is when I want to run and when I go out for lunch/dinner.

Here's where Chesterville is located:

I landed early Tuesday morning and have been treated like a prince since my arrival. My host, Birket, is wonderful and has taken me to lots of fun events. As soon as I landed, he took me to a great restaurant for lunch where he knew the owners, and we got top-notch service with some amazing food. I was a special guest at his rotary club meeting. I've gotten to enjoy the company of some pretty amazing locals. I even got to experience Canadian pub life (hockey et al) two nights in a row. Last night I enjoyed a Mexican cooking class at his pub and that was extremely fun. I've enjoyed some fine cuisine at various local restaurants. Everybody here has been so hospitable.

The scenery up here is amazing as well. Yesterday I took a 4.5 mile run along the frozen highway and it was a beautiful run. Kind of technical due to all the ice and frozen slush, but beautiful nonetheless. I took off out the door around 4:15 pm and the sun was setting on the horizon. It looked like the entire western sky was on fire. Unfortunately I had no camera, so you'll just have to take my word for it. The weather was "butt-cold" and I had a frozen ice-stache on my upper lip to go with my red nose and red cheeks. I think some of the drivers thought I was crazy, but I really enjoyed the run. I do plan to get in some miles tomorrow before my flight home if time permits... We'll see.

Tonight I went to a nice restaurant with Raymond (an old friend from my Virginia trip last spring) and we had a great evening together. When I got back to the apartment, I had a lot of time to myself, so I dialed up Jennifer on Skype and we've been talking for the last hour or so. We caught up on the week's events. Apparently we had a showing on our house today. I liked the way that worked out because I didn't have to be there to do any of the cleaning. :) We're pretty sure it will fall through, and our contract with our realtor is up in the next week or two. Looks like we ain't goin nowhere. No worries. We'll make it work.

So I have a flight back to the States tomorrow and I'm pretty excited to get back to a normal schedule. The last month or so has been absolutely crazy. Lots of holidays followed by lots of travel. My goal is to get back to the basics and catch up on some projects I've been putting off too long (talking about you, Pop!). :)

Here's what I'm looking forward to this weekend:

Jennifer and Lizzie <3