Monday, December 21, 2009

Crazy Ideas I Think Up At My Job

Normally I discuss politics and political ideas here.  But today I want to talk about something completely different.  I doubt any regular readers will enjoy or understand it all that much, however, because it has to do with my current profession.

I am a software developer by trade.  I write computer code and design software applications for the people who pay me.  I love what I do, provided it is challenging and allows me to engage my problem-solving skills.  I guess it’s one of the few problems that I can solve in my life…

Anyway, when I’m at work, I sometimes hit upon really good ideas.  Usually these involve framework items, that is, code that can be used across multiple programs.  This Friday, I hit upon something that may be incredibly useful for people: an encoding-specific string class for the Microsoft .NET Framework.

A string in computer programming is simply a sequence of characters.  The .NET Framework provides a really good class definition for string objects.  However, when you are dealing with text in programming, you have more than one encoding to choose from.  A text encoding defines which numbers (and ultimately which bytes) correspond to which characters.  The .NET Framework, by default, stores strings as Unicode text in the UTF-16 encoding.  A single 16-bit UTF-16 code unit can represent 65,536 values, which covers most written languages; characters outside that range (including some of the rarer Chinese and other East Asian characters) are not left out, but each one takes a pair of code units, called a surrogate pair.

But what if a programmer needs to work with text in a different encoding?  There are other encodings out there, including language-specific ones that have smaller character sets and thus take up less space (UTF-16 takes at least two bytes per character, while language-specific encodings usually take one byte unless the character set is larger than 256 characters).
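To make the size difference concrete, here is a small sketch that counts how many bytes the same text takes in a few of the encodings that ship with the Framework.  The class and method names (EncodingSizes, ByteCount) are made up for illustration; only the System.Text.Encoding calls are real.

```csharp
using System;
using System.Text;

public static class EncodingSizes
{
    // Hypothetical helper: number of bytes the given encoding needs for the text.
    public static int ByteCount(Encoding encoding, string text)
    {
        return encoding.GetBytes(text).Length;
    }

    public static void Main()
    {
        string text = "hello";
        Console.WriteLine(ByteCount(Encoding.ASCII, text));   // 5
        Console.WriteLine(ByteCount(Encoding.Unicode, text)); // 10 (UTF-16, two bytes per character here)
        Console.WriteLine(ByteCount(Encoding.UTF8, text));    // 5
    }
}
```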

So I came up with the idea for a string type that uses generics.  (Don’t worry if you don’t understand any of this; I’m glad you’ve read this far.)  Basically, the class definition would look like this:

public sealed class String<TEncoding> where TEncoding : Encoding
{
    private byte[] _text;
    private TEncoding _encoding;

    //class implementation
}

While there is going to be a lot more to it, those two fields are all the state the class needs.  I’ve set up the constructors to either accept a specific Encoding object or infer it from the actual TEncoding type.  I plan on adding as much functionality as appropriate to this class so that it mimics the System.String class in the .NET Framework.  I will probably ignore the culture-specific stuff and support only ordinal comparisons, because it really isn’t practical to do otherwise.
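The two constructor styles could be sketched roughly as follows.  This is only one way to do it, and it makes an assumption the class definition above does not: a new() constraint on TEncoding, so that the encoding can be created from the type alone.  The ByteLength property and the ConstructorDemo class are hypothetical names added for the example.

```csharp
using System;
using System.Text;

public sealed class String<TEncoding> where TEncoding : Encoding, new()
{
    private readonly byte[] _text;
    private readonly TEncoding _encoding;

    // Infers the encoding by creating a default TEncoding instance
    // (assumption: TEncoding has a public parameterless constructor).
    public String(string value) : this(value, new TEncoding()) { }

    // Accepts a caller-supplied, possibly specially configured, encoding object.
    public String(string value, TEncoding encoding)
    {
        if (value == null) throw new ArgumentNullException("value");
        if (encoding == null) throw new ArgumentNullException("encoding");
        _encoding = encoding;
        _text = encoding.GetBytes(value);
    }

    // Hypothetical property: size of the encoded text in bytes.
    public int ByteLength { get { return _text.Length; } }
}

public static class ConstructorDemo
{
    public static void Main()
    {
        var ascii = new String<ASCIIEncoding>("hello");
        // Big-endian UTF-16 without a byte order mark.
        var utf16 = new String<UnicodeEncoding>("hello", new UnicodeEncoding(true, false));
        Console.WriteLine(ascii.ByteLength); // 5
        Console.WriteLine(utf16.ByteLength); // 10
    }
}
```

The new() constraint works for concrete classes such as ASCIIEncoding, UTF8Encoding, and UnicodeEncoding, but it rules out using the abstract Encoding type itself as the type argument.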

I’ve worked out the constructors and some of the properties.  I next decided to do the System.Object overrides, so I made the ToString() method return the System.String converted using the _encoding object.  The GetHashCode() method was a little tricky, but I finally decided to use ToString().GetHashCode() because I wanted to allow value-based comparisons between System.String and this class.  I may change that in the future, though, and add in my own hash function, probably based on what is done with anonymous types.  It all depends on how I develop the comparisons.  Because System.String has only one encoding type, I may just write my own hash, since you will be able to convert to System.String anyway.
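A minimal sketch of those overrides might look like this.  The Equals override is an assumption added so the example is complete (it is not described above), and the new() constraint and the OverridesDemo class are likewise illustrative.

```csharp
using System;
using System.Text;

public sealed class String<TEncoding> where TEncoding : Encoding, new()
{
    private readonly byte[] _text;
    private readonly TEncoding _encoding;

    public String(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        _encoding = new TEncoding();
        _text = _encoding.GetBytes(value);
    }

    // Decodes the stored bytes back into a System.String.
    public override string ToString()
    {
        return _encoding.GetString(_text);
    }

    // Hashing the decoded text gives the same hash code as the
    // equivalent System.String, regardless of TEncoding.
    public override int GetHashCode()
    {
        return ToString().GetHashCode();
    }

    // Assumption: ordinal, value-based equality on the decoded text.
    public override bool Equals(object obj)
    {
        var other = obj as String<TEncoding>;
        return other != null
            && string.Equals(ToString(), other.ToString(), StringComparison.Ordinal);
    }
}

public static class OverridesDemo
{
    public static void Main()
    {
        var a = new String<UTF8Encoding>("hello");
        var b = new String<UTF8Encoding>("hello");
        Console.WriteLine(a.ToString());                             // hello
        Console.WriteLine(a.GetHashCode() == "hello".GetHashCode()); // True
        Console.WriteLine(a.Equals(b));                              // True
    }
}
```

Decoding on every hash or comparison is simple but not cheap; a custom hash over the raw bytes would be faster, at the cost of no longer matching the hash of the equivalent System.String.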

Next I plan on working on operators and static members.  Concat and implicit/explicit conversions will be thrown in so people don’t have to use the constructor every time they create a new instance.  The last thing will most likely be the various member methods that transform the string and return a new string object as a result, like Substring and all that.
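As a rough sketch of where that could go, the conversions, Concat, and Substring might be written along these lines.  Which conversion is implicit and which is explicit is a guess, as are the + operator, the new() constraint, and the OperatorsDemo class; the members simply decode, delegate to System.String, and re-encode, which a real implementation would likely replace with direct byte manipulation.

```csharp
using System;
using System.Text;

public sealed class String<TEncoding> where TEncoding : Encoding, new()
{
    private readonly byte[] _text;
    private readonly TEncoding _encoding;

    public String(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        _encoding = new TEncoding();
        _text = _encoding.GetBytes(value);
    }

    public override string ToString()
    {
        return _encoding.GetString(_text);
    }

    // Assumption: System.String converts implicitly, so no constructor call is needed.
    public static implicit operator String<TEncoding>(string value)
    {
        return new String<TEncoding>(value);
    }

    // Assumption: converting back to System.String requires a cast.
    public static explicit operator string(String<TEncoding> value)
    {
        return value.ToString();
    }

    // Joins two strings by decoding both, concatenating, and re-encoding.
    public static String<TEncoding> Concat(String<TEncoding> a, String<TEncoding> b)
    {
        return new String<TEncoding>(a.ToString() + b.ToString());
    }

    public static String<TEncoding> operator +(String<TEncoding> a, String<TEncoding> b)
    {
        return Concat(a, b);
    }

    // Returns a new instance; the original is left unchanged.
    public String<TEncoding> Substring(int startIndex, int length)
    {
        return new String<TEncoding>(ToString().Substring(startIndex, length));
    }
}

public static class OperatorsDemo
{
    public static void Main()
    {
        String<ASCIIEncoding> greeting = "hello, ";
        String<ASCIIEncoding> name = "world";
        String<ASCIIEncoding> joined = greeting + name;
        Console.WriteLine((string)joined);                   // hello, world
        Console.WriteLine(joined.Substring(0, 5).ToString()); // hello
    }
}
```

One caveat with the implicit conversion: for an encoding with a small character set, such as ASCII, characters it cannot represent are silently replaced during encoding, so an explicit conversion may be the safer choice there.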

It really is a shame that Microsoft didn’t implement generics with their first release of the .NET Framework.  I think much of what is in the Framework today would never have been needed, and much of the design would have been very different.  I’m sure the string type they implemented would have had some similarity to what I’m doing.  In the end, the reserved word in C# (string) could simply have mapped to String<UnicodeEncoding>, and much of the same implementation would have been used.

Of course, that also leaves the possibility of implementing some kind of struct for Char where you pass in the encoding type.  I may design something like that in the future, but for now I’ll just do the String.  If I figure out something for Char, I may implement the above class a little differently then.