Blogging With Google Docs, Part 2

January 31, 2017

Recently, I posted a blog about using Google Docs for writing and posting blog entries on themnmoores.net by hand editing the HTML generated by Google Docs. It would be helpful to refer back to that blog post for some background on the process I am trying to automate.

http://themnmoores.net/RichBlogs/BloggingUsingGoogleDocsDec242016/BloggingUsingGoogleDocs.html

This post covers the tweaks, additions, and improvements I discovered while creating Javascript code to automate the process of modifying the HTML generated by Google Docs using the “download as web page” file menu option. The HTML and Javascript code for doing this are located at:

http://themnmoores.net/FormatGoogleDocHtml/formatGoogleDocHtml.html

http://themnmoores.net/FormatGoogleDocHtml/formatGoogleDocHtmlFuncs.js

The first issue to solve was finding a Javascript framework that would allow for reading ZIP archives. In past work I have had good success using jszip.js.

https://stuk.github.io/jszip/

It had been a while since I had used jszip, and in version 3.0 the framework changed to using promises for asynchronous operations. Since I do not do much Javascript development and do not really track new developments in Javascript, promises were something new to me. Diving into promises reminded me of doing asynchronous operations in iPhone development, and they seem like a good way forward for Javascript. In implementing my read operation using jszip I ran into an issue: I could open the zip archive and get archive information but could not always extract a file. I figured I must be doing something wrong (which I was) but could not figure it out. Since I was hitting a bit of a dead end, I figured a different framework might be worth a try, so I tried zip.js.

https://gildas-lormeau.github.io/zip.js/

I was able to get things working with zip.js fairly quickly, but soon noticed that I could not debug using the Chrome developer tools on my local machine. This was due to how zip.js was loading dependent Javascript code from external files for compression/decompression. Debugging was possible if I loaded all the files to the website, but I felt this was not a good long term solution. So I abandoned zip.js and went back to jszip.js. The diversion allowed me to realize that I was trying to access a var that had gone out of scope, which explained the flaky behavior of sometimes having data and sometimes not. You would think after 35 years of coding I would not make such a mistake, but it seems to be one I fall into from time to time. Here is the Javascript code for reading the HTML file from the zip file Google Docs generates.

   var htmlZipFile = new JSZip();

   htmlZipFile.loadAsync(evt.target.result)
   .then(function success(zip) {
     console.log('Read zip file');
     var filesStr = "HTML Files (should only be one):<br><br> ";
     // Match only names ending in ".html"; the dot is escaped so it is literal
     var fileNames = zip.file(/\.html$/);
     for (var file = 0 ; file < fileNames.length ; file++)
     {
       filesStr += fileNames[file].name + "<br>";
     }
     document.getElementById('archiveFileContents').innerHTML = filesStr;

     if (fileNames.length != 1)
     {
       filesStr += "<br><b>Invalid Google Docs Save As HTML ZIP archive!!!!</b>";
       document.getElementById('archiveFileContents').innerHTML = filesStr;
       return;
     }

     // Read the HTML file as a string, logging progress as it is extracted
     zip.file(fileNames[0].name).async('string', function (meta) {console.log("Generating the content, we are at " + meta.percent.toFixed(2) + " %");})
     .then(function success(content){


Once the HTML file contents have been read as a string we can do the manipulations needed to the HTML data and then write it back into the zip archive.

First, I add some header information and import my Javascript file of formatting utilities (http://themnmoores.net/formatting.js).

        content = '<!doctype html>\n' + content;
        content = content.replace('<html><head>',
            '<html>\n\n<head>\n<script src="../../../formatting.js"></script>\n\n');
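One thing to keep in mind with this style of replacement is that replace() with a string pattern returns the input unchanged when the marker is not found, so a change in the Google Docs output would go unnoticed. The sketch below shows one way to guard against that. The helper name replaceRequired is hypothetical and is not part of formatGoogleDocHtmlFuncs.js.

```javascript
// Hypothetical helper (not in the original script): replace the first
// occurrence of marker, or throw if the marker is not present.
function replaceRequired(content, marker, replacement) {
  if (content.indexOf(marker) === -1) {
    throw new Error('Marker not found: ' + marker);
  }
  // A function replacement keeps any "$" in the new text literal.
  return content.replace(marker, function () { return replacement; });
}

// Made-up sample input standing in for the exported HTML.
var sample = '<html><head><title>Doc</title></head><body></body></html>';
var updated = replaceRequired(sample, '<html><head>',
    '<html>\n\n<head>\n<script src="../../../formatting.js"></script>\n\n');
console.log(updated);
```

Whether to throw or just log a warning is a matter of taste; for a utility run by hand a few times a week, either would do.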

The <head> tag information is modified to include Google Analytics and to add <meta> information for a description and keywords.

        var keywords = '';
        var beginOfKeywords = content.indexOf('>keywords:');
        if (beginOfKeywords != -1)
        {
          // Skip past ">keywords: " (11 characters) to the start of the value
          keywords = content.substring(beginOfKeywords+11,
                 content.indexOf('</span>', beginOfKeywords));
        }

        var description = '';
        var beginOfDescription = content.indexOf('>description:');
        if (beginOfDescription != -1)
        {
          // Skip past ">description: " (14 characters) to the start of the value
          description = content.substring(beginOfDescription+14,
                 content.indexOf('</span>', beginOfDescription));
        }

        content = content.replace('</style>',
            '</style>\n\n<script>addGoogleAnalytics();</script>\n\n<meta name="description" content="' + description + '">\n<meta name="keywords" content="' + keywords + '">\n\n');

At first I had been manually adding the keywords and description to the converted HTML file, but after forgetting to do so many times I figured I should automate it. As you can see at the bottom of this post, I have added two lines that are parsed out by the code above to add the description and keywords <meta> tags.

keywords: Blogging,Google Docs,Javascript

description: Blogging with Google Docs Part 2, Automation
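The two lookups follow the same pattern, so they could share one function. This is a minimal sketch, assuming each line ends up inside its own <span> in the exported HTML. The helper name extractTaggedLine and the sample markup are made up for illustration, and trim() is used in place of the fixed +11 and +14 offsets.

```javascript
// Hypothetical helper: return the text that follows ">label:" up to the
// closing </span>, or '' when the label is not present.
function extractTaggedLine(content, label) {
  var marker = '>' + label + ':';
  var begin = content.indexOf(marker);
  if (begin === -1) {
    return '';
  }
  var end = content.indexOf('</span>', begin);
  if (end === -1) {
    return '';
  }
  // trim() drops the space after the colon, so no fixed offset is needed.
  return content.substring(begin + marker.length, end).trim();
}

// Assumed shape of the exported markup; real class names will differ.
var sampleHtml =
    '<p class="c1"><span class="c0">keywords: Blogging,Google Docs,Javascript</span></p>' +
    '<p class="c1"><span class="c0">description: Blogging with Google Docs Part 2, Automation</span></p>';

var keywords = extractTaggedLine(sampleHtml, 'keywords');
var description = extractTaggedLine(sampleHtml, 'description');
console.log('<meta name="description" content="' + description + '">');
console.log('<meta name="keywords" content="' + keywords + '">');
```

Note that neither this sketch nor the code above escapes double quotes in the extracted text, so a description containing a quote character would break the content attribute.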

In the <body> tag, before the first <p> tag, we add styling so the document matches the themnmoores.net website color and header, and we add navigation buttons for the website using calls to Javascript in the formatting.js file.

        content = content.replace('<p class=',
            '\n\n<script>setBodyBackgroundFormatting();</script>\n<div id="headerTopBar"></div>\n<script>commonPageHeaderBar("","../../../");</script>\n<script>commonNavivationButtons("../../../","");</script>\n\n<p class=');
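This works because replace() with a string pattern changes only the first match, so the header lands before the first paragraph and nowhere else. The short example below demonstrates that behavior on made-up markup, with a shortened replacement string.

```javascript
// Made-up body with two paragraphs that both start with '<p class='.
var body = '<body><p class="c1">First</p><p class="c1">Second</p></body>';

// With a string pattern, replace() changes only the first match.
var once = body.replace('<p class=', '<div id="headerTopBar"></div><p class=');
console.log(once);

// Counting occurrences confirms the header was inserted a single time.
var headerCount = once.split('<div id="headerTopBar"></div>').length - 1;
console.log(headerCount);
```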

To position the Google Doc HTML content properly on the website, the <body> padding is adjusted. I had been using 1 inch margins (the default in Google Docs) and have switched to ½ inch margins, which meant I needed two possible replacement operations.

       

        // Adjust the positioning of the content, 72pt is 1 inch and 36pt is 1/2 inch
        content = content.replace('padding:72pt 72pt 72pt 72pt',
            'padding:200px 72pt 72pt 300px');
        content = content.replace('padding:36pt 36pt 36pt 36pt',
            'padding:200px 36pt 36pt 300px');
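If the margins change again, a third replacement would be needed. As a possible alternative, the two cases could be folded into one regular expression that accepts any uniform whole-number point value. This is only a sketch; adjustBodyPadding is a hypothetical name, and the pattern would need extending if Google Docs ever writes fractional values.

```javascript
// Hypothetical helper: rewrite a uniform "padding:Npt Npt Npt Npt" rule so the
// top becomes 200px and the left becomes 300px, keeping the right and bottom.
function adjustBodyPadding(content) {
  // \1 requires all four values to be the same number, as in the two cases
  // handled by the original replacements (72pt and 36pt).
  return content.replace(/padding:(\d+)pt \1pt \1pt \1pt/,
      'padding:200px $1pt $1pt 300px');
}

console.log(adjustBodyPadding('body{padding:72pt 72pt 72pt 72pt}'));
console.log(adjustBodyPadding('body{padding:36pt 36pt 36pt 36pt}'));
```

Padding rules whose four values differ are left untouched, which matches the behavior of the two literal replacements.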

At the end of the <body> tag we add a Javascript call to add commenting to the page using HTML Comment Box. Adding comments turned out to be pretty easy: you just sign up on https://www.htmlcommentbox.com/ and insert the code provided to add a comment box. This is what the Javascript function addHTMLCommentBox does. Just beware, it will not display when you are running from your local machine.

One last thing I had to deal with was cleaning up how Google Docs outputs links in the HTML code. Google assumes that you would want to redirect through their servers as you can see below.

<a class="c7" href="https://gildas-lormeau.github.io/zip.js/"c0 c1">

This causes a redirect message and sometimes a protection message to appear. The following code removes the extra Google stuff from each <a> tag in the HTML file.

        content = content.replace('</body>',
            '\n\n<script>addHTMLCommentBox();</script>\n\n</body>');

       while(content.indexOf('') != -1)
       {
         content = content.replace('', '');
       }      
       while(content.indexOf('"c0">                                         endOfTrailerStuff);
         content = content.replace(stringToReplace, '');
       }
       while (content.indexOf('?vertscrollspan=') != -1)
       {
         content = content.replace('?vertscrollspan=', '?vertscrollspan=');
       }

As you can see the <a> tag is cleaned up nicely:

<a class="c7" href="https://gildas-lormeau.github.io/zip.js/">https://gildas-lormeau.github.io/zip.js/</a>
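For comparison, the same cleanup can be sketched with a single regular expression. This is not the code used on the site. It assumes the exported links have the form href="https://www.google.com/url?q=REAL_URL&amp;sa=D&amp;..." and that parts of the real URL may be percent-encoded; both are assumptions about the Google Docs output, and unwrapGoogleRedirects is a hypothetical name.

```javascript
// Assumption: exported links look like
//   href="https://www.google.com/url?q=REAL_URL&amp;sa=D&amp;..."
// This format is assumed here, not taken from the original script.
function unwrapGoogleRedirects(content) {
  // Capture the real URL up to the first "&", then skip to the closing quote.
  var pattern = /href="https:\/\/www\.google\.com\/url\?q=([^"&]*)[^"]*"/g;
  return content.replace(pattern, function (match, target) {
    var decoded;
    try {
      // Undo percent-encoding such as %3F for "?" and %3D for "=".
      decoded = decodeURIComponent(target);
    } catch (e) {
      // Leave the URL as-is if it is not valid percent-encoding.
      decoded = target;
    }
    return 'href="' + decoded + '"';
  });
}

// Made-up sample link; the ust and usg values are placeholders.
var link = '<a class="c7" href="https://www.google.com/url?q=' +
    'https://gildas-lormeau.github.io/zip.js/&amp;sa=D&amp;ust=1&amp;usg=x">' +
    'https://gildas-lormeau.github.io/zip.js/</a>';
console.log(unwrapGoogleRedirects(link));
```

Links that do not go through the redirect are not matched by the pattern and pass through unchanged.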

Once the HTML content has been modified it is written back to the zip archive replacing the existing file. The modified zip archive is saved to the computer (downloaded) using FileSaver.js (https://github.com/eligrey/FileSaver.js/), which is a nice solution.

        zip.file(fileNames[0].name, content);
        zip.generateAsync({type:"blob"})
        .then(function success(zippedFile) {
          // Save to file
          var outputFile = fileNames[0].name.replace('.html', '_new.zip');
          saveAs(zippedFile, outputFile);
        },
        function error(e) {
          document.getElementById('archiveFileContents').innerHTML =
              '<br><br><b>ERROR creating new zip file: </b>' + e;
        });

I will be the first to admit that the Javascript code is not highly optimized and is a brute force approach to solving the problem. After some years in the programming business I figured out that brute force approaches are fast to implement, work, and are easier to follow and maintain. It is amazing how often such code does not need to be optimized for performance and simply does the job needed. Since this is a utility that is run a few times a week and executes in well under 1 second, there is no need to worry about optimizations.

Copyright 2017, Richard J. Moore

keywords: Blogging,Google Docs,Javascript,

description: Blogging with Google Docs Part 2, Automation