I hate HTML. I know that is a little unfair; after all, so much of my life revolves around it. Like everyone in the tech industry, I have a browser open on my computer almost all the time, and without it I’d lose my major source of news, entertainment, and communication. But I still hate dealing with it. Perhaps if HTML had been more like XHTML from the start (you know what I mean: the tight syntax, all tags closed, etc.), or if XHTML’s other problems (validation, compatibility with existing HTML infrastructure, browser support, etc.) were solved, I wouldn’t be writing this.

The worst thing, though, is having to hand-edit HTML that has been generated by other software, be it an Office document or a page created by web page authoring software. This was the situation I found myself in recently when I was asked to do some simple web page maintenance for a local volunteer organization.

The site was hosted by Yahoo! Japan and was originally authored with Japanese web site software from a major unnamed corporation, which I will just refer to as HAL. I was not given the original project files from which I could make changes and regenerate the HTML, so I was going to have to edit the HTML directly. The Yahoo! Japan portal provides web-based maintenance tools for the sites it hosts. Through a browser you can see a list of the directories and files in your site, back up and rename files, edit and save HTML files in an editor control, and upload new files. These are crude tools, but for the changes I had to make, they should have worked, and I didn’t want to manually download and upload each page.

I pulled up the first page in the editor control and started editing. The file was many times longer than needed, filled with obscure metadata and other unintelligible nonsense. The page switched between English and Japanese frequently, often many times in one sentence, and each change in language required that section to be enclosed in yet another <span> tag with the proper lang attribute. In many cases, a single word was nested in two or three identical <span> tags. Most lines were about 80 characters wide, but there were also 500+ character lines just to make things interesting. (I hate HTML.) Anyway, without too much trouble, I was able to find the content and make the changes. It wasn’t hard, just tedious.

I was now ready to view the result. I hit save to close the editor control and return to the file list. I switched to another browser tab that had the page loaded and hit F5 to refresh the display. Every bit of Japanese text was total garbage. Argh! I assumed I had inadvertently changed something that ruined the display of the entire page, and that I’d have to try again and be more careful. So I opened the page in the editor control again and, much to my chagrin, all the Japanese text in the file itself was now garbage as well. The whole page was ruined. It hadn’t been just a bug in my edits causing the display to go bad. (Another lesson in why you should back up your content before you start editing. Fortunately I had. Whew!)

I was stuck. A little experimentation quickly revealed that just loading the file in the Yahoo! Japan editor control and immediately saving it still ruined the Japanese. Not very impressive considering this was a site specifically for hosting and maintaining Japanese websites! (To be fair, this did not happen to every page on the site. Most I could edit just fine.) The problem was that the Yahoo! Japan on-line editor control did not resave the file with the correct encoding. It could read the file, display the file, and edit the file, but not save the file correctly. It was actually rather pitiful. Would it have worked had I been running on a native Japanese version of Windows XP and IE? Who knows? I would hope so, but that wouldn’t help me. I had the same problem in IE and Firefox. Could I have changed my browser settings to fix the problem? I don’t know, but I’ve never had a problem with any other Japanese page.
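The failure described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the page is taken to be stored as Shift-JIS, and the editor is assumed to have written UTF-8 on save. The encoding the editor really used is not known, and the sample string and the `round_trip` helper are hypothetical.

```python
# Illustrative sketch only. Assumptions (not from the original story):
# the page was stored as Shift-JIS and the editor re-saved it as UTF-8.

original_text = "日本語のページ"  # hypothetical sample content

# Bytes as the authoring tool would have written them (assumed Shift-JIS).
stored_bytes = original_text.encode("shift_jis")


def round_trip(data: bytes, read_as: str, write_as: str) -> bytes:
    """Decode with one encoding and re-encode with another (a load-then-save)."""
    text = data.decode(read_as, errors="replace")
    return text.encode(write_as, errors="replace")


# Reading and writing with the same encoding leaves the bytes untouched.
good = round_trip(stored_bytes, "shift_jis", "shift_jis")

# Reading correctly but writing a different encoding changes the bytes,
# even though nothing was edited.
bad = round_trip(stored_bytes, "shift_jis", "utf-8")

# A browser that still treats the page as Shift-JIS now shows garbage.
garbled = bad.decode("shift_jis", errors="replace")

print(good == stored_bytes)      # True
print(bad == stored_bytes)       # False
print(garbled == original_text)  # False
```

Under these assumptions the editor can read, display, and edit the text without trouble, and the damage only appears once the file has been written back, which matches what a plain load-and-save did to the page.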

I was about to resign myself to recreating the whole page from scratch and uploading it. Then I remembered that SlickEdit handles different encodings. So I pulled up the backup copy in the Yahoo! Japan editor control, opened a new blank HTML page in SlickEdit, and selected the Japanese (Shift-JIS) encoding. After a simple copy and paste into SlickEdit, I had the original contents in a real editor (syntax highlighting, completions, HTML beautifier, parenthesis matching, etc.) that could preserve the encoding. In a lot less time, thanks to the features in SlickEdit, I had the changes in again, saved properly as Japanese (Shift-JIS), and uploaded back to Yahoo! Japan.

Thank you, SlickEdit.

(But I still hate software-generated HTML.)