David R. Heffelfinger

  Ensode Technology, LLC

 

How do kids these days get started in programming?


Back in the '80s, when I was growing up, personal computers came with a BASIC interpreter that you could use to write your own software. In fact, end users were expected to write their own applications.

My very first personal computer was an Atari 800. I was in my early teens when I got it; it was a hand-me-down from my uncle, who had gotten himself a shiny new IBM PC.

 

During that time, computer magazines came with games and applications in source code form that you had to type into your computer in order to "install" them. A lot of us didn't know exactly what all these lines of code meant, but we wanted the game or application, so we typed away. Unfortunately, typos were an issue, since we were blindly copying what seemed like Greek into our BASIC prompt. Fortunately, BASIC was interpreted, so it would catch syntax errors immediately; many times, however, the syntax was correct but there was still a typo in the line, and the program would not run as expected. It could be frustrating at times, but it was very satisfying to finally get the code to work exactly right. You could also experiment and make little changes here and there to see how they changed the behavior of the software. I remember eagerly waiting for the next issue of A.N.A.L.O.G. magazine to arrive in the mail every month to see what goodies it would bring.

It is worth mentioning that at this time there wasn't yet a dominant architecture for personal computers. Some of us had Ataris (8-bit and/or ST), others had Commodores (PET, Commodore 64 or 128, Amiga), others had IBM PCs, and other architectures existed as well. What all of these architectures had in common was that they came with a BASIC interpreter. In fact, in most cases the machine would boot directly into a BASIC prompt. The BASIC dialects on these machines were not 100% compatible with one another, since vendors modified them to highlight specific features of their own products, but in general your BASIC skills carried over across architectures.

I remember being amazed at the wonderful things you could make these machines do. It got me really motivated to learn to write my own software, instead of simply typing in code listings from magazines. A lot of us software developers from that era got our start that way; at the time, the barrier to entry for software development was very low. I derived a lot of satisfaction from creating software, and I would proudly show my creations to my friends and relatives. All of this motivated me to pursue a career in software development, and it is why I majored in computer science when I went to college.

Sometime in the '90s, most of these architectures disappeared, and the one true personal computer platform emerged: the IBM PC, or what we simply call a PC today. Just like all the platforms of the time, the IBM PC came with a BASIC interpreter, but unlike on the others, BASIC wasn't built into the operating system; it was something you had to look for if you wanted to use it. When the PC became the de facto standard, the focus on end users as programmers started to decline. Magazines stopped including BASIC listings for you to type in. Eventually, PCs stopped coming with a BASIC interpreter altogether. From then on, if you wanted to develop software, you had to install a compiler or interpreter yourself, which, sadly, is still the case today.

So I wonder, how do new generations of software developers get their start? It is not as easy to "get your feet wet" these days as it was back in the day. I wonder whether they pick computer science without knowing exactly what they are getting into. It's a shame that software development is not as accessible as it once was.


 
 
 
 

OpenOffice.org Document Version Control With Mercurial


I've always wanted to put my documentation under version control, just like I do with my source code. However, word processor files are binary, so they are not well suited for version control (track changes aside). They can, of course, be committed, but since they are binary they can't be diffed very easily.

Standard OpenDocument Text files (.odt, the default format for OpenOffice.org Writer since version 2) are nothing but zipped XML files. I searched around for an easy, automated way to unzip and zip them "on the fly" as necessary, thinking that I could put the "raw" XML files under version control. However, I couldn't find anything that would help in that regard, and manually zipping and unzipping files seemed like more trouble than it's worth.
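For illustration, the unzip/zip step itself is small. The following is a minimal Python sketch using only the standard library's zipfile module; explode_odt and repack_odt are hypothetical helper names, and the one ODF-specific assumption is the packaging convention that the mimetype entry is stored first and uncompressed.

```python
import zipfile
from pathlib import Path


def explode_odt(odt_path, out_dir):
    """Unzip an OpenDocument file into out_dir (content.xml, styles.xml, ...)."""
    with zipfile.ZipFile(odt_path) as odt:
        odt.extractall(out_dir)


def repack_odt(src_dir, odt_path):
    """Zip an exploded OpenDocument directory back into a single file."""
    src = Path(src_dir)
    with zipfile.ZipFile(odt_path, "w", compression=zipfile.ZIP_DEFLATED) as odt:
        # ODF packages expect the mimetype entry first, stored without compression.
        odt.write(str(src / "mimetype"), "mimetype", compress_type=zipfile.ZIP_STORED)
        # Everything else is added (compressed) in a stable, sorted order.
        for path in sorted(src.rglob("*")):
            rel = path.relative_to(src).as_posix()
            if path.is_file() and rel != "mimetype":
                odt.write(str(path), rel)
```

In principle, explode_odt("report.odt", "report") would give a directory of XML files that could be committed, and repack_odt("report", "report.odt") would rebuild a file for Writer to open. Wiring this into commits and checkouts automatically is the part that is not shown here.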

OpenOffice.org's word processor, Writer, allows us to save in text-based formats such as DocBook XML, Microsoft Word 2003 XML, and OpenDocument Text Flat XML (.fodt). I figured I could use one of these formats internally; since they are text based, they would be "diffable" by Mercurial (or any other version control tool). Then, when I needed to distribute the document, I could export it to Word format, PDF or what have you.

I hadn't had the opportunity to work with DocBook before, and I admit I've been kind of curious about it, so I tried this option first. Unfortunately, it turned out I couldn't use this format, since I frequently work with Word templates (even though I work with OpenOffice.org, Word templates work fine in Writer), and it doesn't seem like DocBook supports them.

I then turned my attention to the OpenDocument Flat XML (.fodt) format. This format works with Word templates, and it is saved as a plain text (XML) file, so it looked like the perfect solution. To test it out, I created a simple document, saved it in OpenDocument Flat XML format, and committed it to a Mercurial repository. I then made a simple change to this document and ran hg diff on it.

To my dismay, this very simple change (I just added a new paragraph with a single sentence in it) resulted in quite a number of diffs between the two versions. Apparently this format contains a bunch of metadata, such as the creation time, the creator, the time the file was saved, etc. This metadata was creating a number of diffs that were irrelevant to the task at hand, which was to find out what change I had actually made to the file.
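One way to see how much of that noise is metadata is to strip it before diffing. The following Python sketch is not part of Mercurial or OpenOffice.org; normalize_fodt and fodt_diff are hypothetical helpers, and the sketch assumes the irrelevant churn lives in the top-level office:meta (creator, dates) and office:settings (view state) elements of the flat file.

```python
import difflib
import io
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
# Assumption: these top-level parts of a .fodt only carry noise for diffing
# purposes (office:meta holds creator/dates, office:settings holds view state).
NOISY_PARTS = ("meta", "settings")


def normalize_fodt(data: bytes) -> list[str]:
    """Return a flat ODF document, minus its noisy parts, as pretty-printed lines."""
    # Reuse the document's own prefixes (office:, text:, ...) when re-serializing.
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=("start-ns",)):
        if prefix:
            ET.register_namespace(prefix, uri)
    root = ET.fromstring(data)
    for name in NOISY_PARTS:
        for part in root.findall(f"{{{OFFICE_NS}}}{name}"):
            root.remove(part)
    # Re-indent so that elements land on separate lines (ET.indent needs Python 3.9+).
    ET.indent(root)
    return ET.tostring(root, encoding="unicode").splitlines()


def fodt_diff(old: bytes, new: bytes) -> str:
    """Unified diff of two .fodt versions after the noisy parts are stripped."""
    return "\n".join(difflib.unified_diff(
        normalize_fodt(old), normalize_fodt(new),
        fromfile="old.fodt", tofile="new.fodt", lineterm=""))
```

With this normalization, two saves that differ only in their timestamps produce an empty diff, while a newly added paragraph still shows up as a single added text:p line.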

At this point, I considered using the oodiff trick described on the "Handling OpenDocument Files" page of the Mercurial site; however, this trick seemed to me more like a hack than a proper solution. With this approach, files are checked in as binary; then, when diffing, a tool called odt2txt converts each version of the document to plain text "on the fly", and the plain text versions are diffed. The problem with this approach is that the files are still committed to version control as binary, and most version control tools are not very efficient at storing binary files.
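The idea behind the oodiff trick, converting each version to plain text and diffing the text, can be sketched in a few lines of Python. This sketch does not call odt2txt; it reads content.xml straight out of the zip and keeps only paragraph and heading text, so its output is only a rough approximation of what odt2txt produces. The names odt_to_lines and text_diff are hypothetical.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Paragraphs and headings carry the document's visible text.
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def odt_to_lines(path):
    """Extract one line of plain text per paragraph/heading from an .odt file."""
    with zipfile.ZipFile(path) as odt:
        root = ET.fromstring(odt.read("content.xml"))
    # itertext() also picks up text inside nested spans, links, etc.
    return ["".join(el.itertext()) for el in root.iter() if el.tag in BLOCK_TAGS]


def text_diff(old_path, new_path):
    """Diff two .odt files as plain text instead of as binary blobs."""
    return "\n".join(difflib.unified_diff(
        odt_to_lines(old_path), odt_to_lines(new_path),
        fromfile=old_path, tofile=new_path, lineterm=""))
```

Calling print(text_diff("old.odt", "new.odt")) prints a unified diff of the text only. Because of the simple extraction, repeated spaces (encoded as <text:s/>) and tabs are not reproduced, and paragraphs nested inside other paragraphs, such as footnotes, would be listed twice.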

At this point I started using the above trick; however, I recently found the color extension for Mercurial, which color-codes diffs. After I installed this extension, I gave the .fodt format another try, and I started to notice patterns of what to look for in the diffs. For example, paragraph text is enclosed in <text:p> elements, which makes it easy to find text changes. Images are stored inside <draw:image> elements, which makes it straightforward to see if an image was added, deleted or moved. Tables use the <table:table>, <table:table-column>, <table:table-row> and <table:table-cell> elements, making them fairly easy to identify. This seemed like a good solution; however, after a while I noticed that sometimes a simple change in the document (for example, adding a heading somewhere in the middle) produced a bunch of diffs again. For example, lines that were now farther down in the document were reported as deleted from one place and added in another, which is inaccurate.
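Since those element names are so regular, they can also be counted mechanically. The Python sketch below uses the full ODF element names (for instance table:table-row and table:table-cell); count_elements and structural_changes are hypothetical helpers that report which kinds of elements were added or removed between two .fodt versions. Because it only compares counts, content that merely shifted further down the file does not register as a change, but it also cannot say where a change happened.

```python
import xml.etree.ElementTree as ET

NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
}
# Element types worth watching when eyeballing a .fodt diff.
WATCHED = ("text:p", "text:h", "draw:image",
           "table:table", "table:table-row", "table:table-cell")


def count_elements(data: bytes) -> dict:
    """Count how many of each watched element a flat ODF document contains."""
    root = ET.fromstring(data)
    return {name: len(root.findall(f".//{name}", NS)) for name in WATCHED}


def structural_changes(old: bytes, new: bytes) -> dict:
    """Report only the element types whose count changed between two versions."""
    before, after = count_elements(old), count_elements(new)
    return {name: after[name] - before[name]
            for name in WATCHED if after[name] != before[name]}
```

A positive number means elements of that type were added, a negative one means some were removed; for example, adding a heading and a one-cell table would report text:h, table:table, table:table-row and table:table-cell each going up by one.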

For now, I have gone back to the oodiff trick. Even though it bothers me a bit that I am checking binary files into the repository, this approach results in sane diffs that actually allow me to track what was changed in the document.


 
 
 
 

Excluding directories from zip files on Linux


I frequently have to turn in source code to one of my customers in zip files (neither fancy nor sophisticated, but that's life).

Lately, I've been working on a project that uses good old plain Ant build files. I load this project into NetBeans as a free-form project so that I have a decent working environment. NetBeans, of course, creates its own folders and files so that it can open the project. I am also using Mercurial for version control, which creates an .hg folder that I don't want to distribute.

I wanted to zip up the code while excluding the directories and files that were not meant to be distributed (.hg and the NetBeans-specific files and folders). I'm on Linux, so I usually use File Roller, a graphical archive management tool for the GNOME desktop, to create my zip files. File Roller is very easy to use: just right-click the directory to be archived and select "create archive".

Unfortunately, there is no easy way to exclude files or directories from the zip file. I thought I could zip up the whole thing and then delete the unwanted files and directories. This worked fine for files, but for directories it deleted the files inside the directory while leaving the directory itself in the zip file.

Obviously File Roller wasn't meeting my needs here; it was time to go to the good old command line. Most Linux distributions come with a command line zip utility, appropriately named "zip". I read the man page and found a way to tell zip to exclude files and directories from the created archive: all that needs to be done is to use the -x switch and list the files and directories to be excluded, separated by spaces. For example:

zip -r filename.zip directoryname/* -x directoryname/.hg\* directoryname/nbproject\* directoryname/catalog.xml

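The above command does exactly what I needed, which is to create a zip file without the Mercurial and NetBeans specific files and directories. The backslashes before the * wildcards keep the shell from expanding them, so zip does the pattern matching itself and also excludes everything inside those directories. Of course, any file or directory name can be passed as a parameter to the -x switch.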
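For comparison, here is a rough equivalent written with Python's standard zipfile, os and fnmatch modules instead of the zip utility. It is only a sketch: zip_excluding is a hypothetical helper, and patterns are matched against paths relative to the parent of the directory being archived, mirroring the directoryname/... patterns in the command above.

```python
import fnmatch
import os
import zipfile


def _is_excluded(rel_path, patterns):
    # fnmatch's "*" also matches "/", much like zip's own -x wildcard matching.
    return any(fnmatch.fnmatchcase(rel_path, p) for p in patterns)


def zip_excluding(src_dir, zip_path, patterns):
    """Recursively zip src_dir, skipping any path that matches one of patterns.

    Paths are matched relative to src_dir's parent, e.g. "directoryname/.hg*".
    """
    base = os.path.dirname(os.path.abspath(src_dir))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(src_dir):
            rel_dir = os.path.relpath(dirpath, base).replace(os.sep, "/")
            # Prune excluded directories in place so os.walk never descends into them.
            dirnames[:] = [d for d in dirnames
                           if not _is_excluded(f"{rel_dir}/{d}", patterns)]
            for name in filenames:
                rel = f"{rel_dir}/{name}"
                if not _is_excluded(rel, patterns):
                    zf.write(os.path.join(dirpath, name), rel)
```

Usage would be zip_excluding("directoryname", "filename.zip", ["directoryname/.hg*", "directoryname/nbproject*", "directoryname/catalog.xml"]). Unlike the shell glob directoryname/*, os.walk also visits hidden files, so something like .hgignore would need its own pattern; the sketch also stores only files, not directory entries.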

 
 
 
 
 


 
© David R. Heffelfinger