Stoelzel Software Technologie SST
         
         
Please note: this application is still a prototype!
The HTMLStripper is currently still in an early stage of development. Version 0.4 is a preliminary and largely rudimentary implementation, not a fully developed and tested product.
Although it is comparatively stable, many of its features are not fully functional, don't always function as expected, or have yet to be implemented.
Nonetheless, it can be used.
       
HTMLStripper Version 0.4

Preliminary User Guide
       
  This documentation is preliminary in nature. It applies to the prototype, version 0.4 of the application currently named HTMLStripper, the user interface and functionality of which are likely to be subject to numerous changes in the prototype versions leading up to the first, fully implemented, version 1.0.
It therefore assumes that the user is at least familiar with the basics of using Microsoft Windows and computer applications. Furthermore, as the guide does not provide information on how to integrate the generated data into HTML/XML or Microsoft Compiled Help (.CHM) projects, the user should be proficient in the use of various other applications in order to make full use of the described features.
Nonetheless, the guide does cover the most relevant aspects of using the application. Specifically, these are:
Acquiring and providing the source file(s)
Using the Integrated Browser to Acquire and Save the Source Code
Converting the Source File into an ANSI or Unicode Encoded File
Opening the ANSI or Unicode encoded Source File in the HTMLStripper
Deciding on and Creating a List of Words to Suppress/Ignore
Processing the Source Code / Stripping the Source Code of its Tags
Running the Auto Find and Add Assignment
Running the Find and Add Anchors Assignment
Adding Keywords to the Keywords List
Selecting and Editing Words and/or Phrases to Add to the Keywords List
Editing Keywords in the Keywords Tree or List View
Modifying the Links to the Keywords
Deleting Text, Words, and Keywords
Exporting the Keywords List
Saving the Created and/or Processed Files
Opening/Loading Files
Creating a Compiled Help, .hhk Index File, Step by Step
Using the HTMLStripper with Unformatted Text Files
Using the HTMLStripper for Proofreading
Basic Procedure(s)
Acquiring and providing the source file(s)
Converting the source file into an ANSI or 16-bit, Unicode encoded file, if necessary
Opening the ANSI or Unicode encoded source file in the SST HTMLStripper application
Deciding on and creating a list of words to suppress/ignore
Stripping the source file of its tags
Selecting and/or editing the words and/or phrases to add to the keywords list
Adding words, phrases, etc, from the word list(s) and stripped contents to the keywords list
Exporting the keywords as a keywords meta tag or Microsoft compiled help, .hhk, keywords, index file
Saving the created and/or processed files
Acquiring and Providing the Source File(s)
Performing the following three steps is only necessary if you can't access the source code files on a local or mapped network drive (i.e. if they can only be accessed by means of an Internet browser, FTP program, etc.).
1.  Open the URL of the page you want to create a keyword index or keyword meta tag for. Although this can be done in the integrated browser, the pages of many websites on the Internet are not displayed correctly in it.
This is not an inadequacy or bug of the HTMLStripper prototype! It can be attributed exclusively to the fact that many web developers obviously don't deem it necessary to implement their websites for anything but the latest browser generation.
But, performing this step in the integrated browser is merely a question of comfort, not so much of functionality. Opening the page in the browser you normally use will, in most cases, do just as well.
2.  Once it has been fully loaded, save the page to a file on your hard disk (or some other storage medium that will make the HTML/XML source code file available on your computer).
However, irrespective of the browser you use, we recommend saving the file encoded in either the ANSI or the Windows Unicode character set.
Although the prototype appears to work faultlessly with European languages encoded in UTF-8, this recommendation also applies to pages that contain characters not found in the English alphabet (e.g. German umlauts, French accents, etc.).
It is also imperative that the page be saved as an HTML (or XML) file, and therefore as a plain text file (even if the file extension/suffix is not .txt!).
This should be borne in mind because some browsers save web pages in their entirety (i.e. including graphics and everything else) in a compound file that can only be reliably opened in the browser with which it was created. If you're uncertain whether this is the case, we recommend verifying that it is not by opening the saved file in a text editor (e.g. Windows Notepad). If the file contains readable text together with a lot of undecipherable symbols, it's more than likely that it's a compound file (for example, a .mht file). In such cases, simply save the file again in another format or using a different browser.
If you decide to use the integrated browser, you should also be aware of the fact that, unlike the integrated browser in version 0.3 of the HTMLStripper, the graphics and other files (e.g. scripts, style sheets, etc.) referenced in/by the source code are now saved to disk together with it. Therefore, when the source code file is saved to a local location in HTMLStripper version 0.4, the file ought to be displayed correctly, not only in the integrated browser, but in all other browsers as well.
3.  Repeat the just described steps for all files you want to process or include in your keyword index.
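The manual check described in step 2 (is the saved file plain HTML/XML text, or a compound file such as .mht?) can also be approximated with a short script. The following is only a minimal sketch in Python. The function name is hypothetical and the heuristics are assumptions: it presumes that a compound .mht file starts with MIME header lines and that other binary containers contain NUL bytes without a UTF-16 byte order mark.

```python
from pathlib import Path


def looks_like_plain_html(path, sample_size=4096):
    """Heuristically decide whether a saved page is plain HTML/XML text.

    Assumptions (not taken from the HTMLStripper): an .mht archive starts
    with MIME header lines, and a binary container holds NUL bytes that
    are not part of a UTF-16 text stream.
    """
    data = Path(path).read_bytes()[:sample_size]
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        # UTF-16 text with a byte order mark ("Unicode" in Notepad terms).
        text = data.decode("utf-16", errors="replace")
    else:
        if b"\x00" in data:
            return False  # NUL bytes without a UTF-16 BOM: probably binary
        text = data.decode("utf-8-sig", errors="replace")
    head = text.lstrip().lower()
    if head.startswith(("mime-version:", "from:")):
        return False  # typical first header lines of an .mht archive
    return "<" in head and ">" in head
```

A result of False only means that the file deserves a closer look in a text editor; the sketch does not replace that inspection.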
Using the Integrated Browser to Acquire and Save the Source Code
These steps only need to be performed if you want to use the Integrated Browser, don't have access to the HTML (or XML) source code on a local (or LAN) drive, and have not already performed the steps described under Acquiring and Providing the Source File(s) in another application.
Furthermore, using the Integrated Browser of the HTMLStripper prototype has both advantages and drawbacks. These are:
•  The page may not be displayed correctly (see remark, above).
•  When saving a page in the Integrated Browser, external files (e.g. graphics, videos, scripts, style sheets, etc.) are not saved with it, resulting in the page not being displayed correctly if subsequently opened in a browser.
•  You don't have to switch between applications.
•  If the page is displayed correctly in the Integrated Browser, it can be saved directly in the required format, making converting the file into an ANSI or Unicode encoded file superfluous.
Should you decide to use the integrated browser (in spite of its current shortcomings), proceed as follows:
1.  Click on the "Browser" button-style tab (below the combo boxes and Button Tool Bar) to switch to the Browser Tab Sheet.
2.  Enter the Internet address (aka URL) of the page to open in the HTMLStripper's Source File(s) Combo Box.
3.  Either select/click the Open URL Menu Item in the View Menu of the Main Menu or click on the Open URL Button in the Button Tool Bar.
4.  Wait for the page to be loaded in the Integrated Browser.
5.  Open the File Menu in the Main Menu.
6.  Select/click the File Menu's "Save as ..." Menu item. This will open the Save HTML Document Dialog.
7.  In the Language or Encoding Drop-Down List *1, at the very bottom of the Save HTML Document Dialog, select the character set and/or language in which the text that is displayed to users was authored. If the Language Drop-Down List provides multiple choices, you can select either an item with the supplement "(Windows)" or "(ISO)", or the "Unicode" item. Alternatively, if the page's text is displayed correctly with the preselected item, and this has one of the required supplements (e.g. "(Windows)"), you can simply continue with the next step.
8.  In the Save as Type Drop-Down List, which is located immediately below the File Name Combo Box, select either the "HTML File (*.htm, *.html)" or the "Text File (*.txt)" item. Both file types can subsequently be opened/loaded as source files and processed by the HTMLStripper. However, if the latter (i.e. the "Text File (*.txt)") is chosen, most browsers, including the Integrated Browser, will not display formatted output. In other words, the source code will be displayed as in the HTML Source File Viewer/Editor or any other editor.
9.  Save the file to disk through a click on the Save Button.
Converting the Source File into an ANSI or Unicode Encoded File
Unfortunately, to produce correct output, the current prototype (version 0.4) may still require the input source code to be manually converted into an ANSI or Unicode encoded file.
Whether or not this is necessary can depend on various factors, individually or in combination, such as the language(s) in which the text exposed to the user was authored, the operating system, its version, edition, and user interface (UI) language, and the Internet (or other) application (e.g. browser, e-mail client, etc.) used to create or save the source file.
However, in the event that a conversion is unavoidable, the text editor (Notepad) that ships with all Microsoft Windows operating systems as of Windows Vista can, in most cases, be used to perform the conversion. Here is how it's done.
1.  Open the source file in Windows Notepad.
2.  Open the Notepad's File Menu and select/click the Save As Menu Item.
3.  In the Save As Dialog, enter a slightly different name for the file in the dialog's File Name Combo Box. For example, you could simply append a capital A or U to the name (not to the extension/suffix) part of the file name, to characterize it as the ANSI or Unicode (encoded) version of the original file.
4.  Depending on the Windows version you are using, select either the "ANSI", the "Unicode", or the "UTF-16 LE" item in the Encoding Drop-Down List, at the very bottom of the dialog. The "ANSI" item saves the file as an ANSI encoded file; the other two items ("Unicode" and "UTF-16 LE") save it in the required Unicode format.
5.  Close the dialog by means of a click on its Save Button.
6.  Open the file you have just created and saved in either another instance of Notepad or the Internet application for which it was developed and verify that the contents are displayed correctly.
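For larger numbers of files, the same conversion can be scripted instead of being performed in Notepad. The following is only a minimal sketch in Python. The function name is hypothetical, and it rests on stated assumptions: the source file is UTF-8 (with or without a byte order mark), "ANSI" is taken to mean Windows code page 1252 (Western European), and "Unicode" is taken to mean UTF-16 little endian with a byte order mark. On systems with a different ANSI code page, the code page name would have to be adapted.

```python
from pathlib import Path


def convert_encoding(src, dst, source_encoding="utf-8-sig", target="unicode"):
    """Re-save a text file as "ANSI" or "Unicode" in the Notepad sense.

    Assumptions: "ansi" means code page 1252, "unicode" means UTF-16
    little endian preceded by a byte order mark.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    if target == "ansi":
        # Raises UnicodeEncodeError if a character has no cp1252 equivalent,
        # which is preferable to silently writing a damaged file.
        data = text.encode("cp1252")
    elif target == "unicode":
        data = b"\xff\xfe" + text.encode("utf-16-le")
    else:
        raise ValueError("target must be 'ansi' or 'unicode'")
    Path(dst).write_bytes(data)
    return dst
```

As in step 3 above, the sketch writes to a separate destination file, so the original source file is left untouched. Step 6 (verifying the result) still applies.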
Opening the ANSI or Unicode encoded Source File in the HTMLStripper Application
Once you have the source code in one of the two currently fully supported formats (i.e. either an ANSI or a Windows Unicode encoded HTML or XML file), you can load it into the HTMLStripper's Source Code Viewer/Editor.
1.  To open the dialog with which you can select the HTML (or XML) source file, you can use either the "Select Source File(s) ..." menu item in the File Menu of the Main Menu or the corresponding button in the (Button) Tool Bar. A click on either one will open the Open File Dialog.
2.  Using the controls at the top of the Open File Dialog, select and open the folder in which the source file(s) you want to process is/are located.
3.  Select the file you want to process in the Open File Dialog's List View.
4.  Close the dialog by means of its Open Button.
5.  Verify that the source code is displayed in the viewer/editor on the HTMLStripper's HTML Source File Tab Sheet.
Deciding on and Creating a List of Words to Suppress/Ignore
Normally, opening/loading and/or editing the list of words to suppress/ignore would be the next step, prior to reducing the source code to the text(s) exposed to the user.
However, initially you are unlikely to know (exactly) which words were used and how often they occur in the texts you are processing, nor how relevant (or irrelevant) they may be as keywords in a keyword meta tag or an index. Furthermore, adding the words to ignore, word by word, would be tedious, to say the least. It is far easier to copy and add all the words used in the texts from the generated lists of distinct words, and then remove those you consider relevant, or at least, potentially, not irrelevant.
Nonetheless, so that you can see the effect the list of words to suppress/ignore has on the resulting list of distinct words, you might want to suppress words such as "menu", "cookie", or any other common words that you can think of and that are sure to occur in the texts of your first source files. To do this,
1.  Switch to the "Suppress(ed) Words" tab sheet by clicking on the "Suppress(ed) Words" button-style tab, immediately below the combo boxes and button toolbar in the upper part of the main window.
2.  Add each word you want to suppress/ignore in a separate, new line.
3.  When you're done, save the list by selecting/clicking on the "Save as ..." menu item in the File Menu of the Main Menu or by pressing the Save As Button in the Button Tool Bar.
When you process the first source file(s), the words you have just entered in the Suppress(ed) Words Editor will not appear in the list of words on the Distinct Words Tab Sheet (in other words, they will have been suppressed). Saving the list also results in this list (of words to suppress) being loaded automatically when you run the HTMLStripper.
To use a different list with other source files, simply load the desired list by
1.  switching to the "Suppress(ed) Words" Tab Sheet, as already described,
2.  selecting/clicking the "Open ..." menu item in the File Menu of the Main Menu to display the Open File Dialog,
3.  using the controls at the top of the Open File Dialog to open the folder in which the suppress(ed) words list text file is located,
4.  selecting the suppress(ed) words list file you want to use, in the Open File Dialog's List View,
5.  and closing the dialog by means of its Open Button.
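The effect of a suppress list on the list of distinct words can be illustrated with a small script. The following is only a minimal sketch in Python, not the HTMLStripper's implementation. The function names are hypothetical, and the case-insensitive comparison and the definition of a "word" are assumptions, because this guide does not document the prototype's matching rules.

```python
import re
from collections import Counter


def load_suppressed_words(lines):
    """Read a suppress list with one word per line, ignoring blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def distinct_words(text, suppressed=frozenset()):
    """Count each distinct word in text, skipping the suppressed ones.

    Assumption: a word is a run of letters, optionally joined by
    apostrophes or hyphens, compared case-insensitively.
    """
    words = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())
    return Counter(w for w in words if w not in suppressed)
```

For example, with a suppress list containing "menu" and "cookie", calling distinct_words on the stripped text of a page returns the remaining words together with how often each one occurs, which mirrors the purpose of the Distinct Words Tab Sheet.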
Processing the Source Code / Stripping the Source Code of its Tags
Processing the source code (stripping it of its tags and generating the list of distinct words, inducing the HTMLStripper to find keywords already selected in other files, or locating and adding the anchors closest to a keyword) is probably the easiest part of the whole procedure.
However, with the sole exception of the last task (i.e. automatically locating and adding anchors to the links to the keywords, see Running the Find and Add Anchors Assignment, below), the first step is always to strip the source code of its tags, that is, to reduce the formatted text to the text that can be read by a human.
All you have to do, once the source file has been loaded into the Source Code Editor, is
1.  Switch to the Stripped Contents or Distinct Words List Tab Sheet.
2.  If it is not already preceded by a check mark or bullet, check the "Strip" menu item in the "Select" sub-menu of the "Assignment" menu item of the main menu. To do so, simply open the Assignment Menu and click on the "Strip" item in the "Select" sub-menu.
3.  Select/click Run, in the Assignment Menu of the Main Menu or click on the corresponding button in the (Button) Tool Bar.
4.  Wait for the stripped contents to be displayed in the Stripped Contents Editor, or in the lists on the Distinct Words List Tab Sheet.
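What "stripping the source code of its tags" means can be illustrated outside the application as well. The following is only a minimal sketch in Python, based on the standard library's html.parser module. It is not the HTMLStripper's implementation: the class and function names are hypothetical, and the decision to drop the contents of script and style elements is an assumption.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the text a reader would see, dropping all tags."""

    SKIPPED = {"script", "style"}  # their contents are code, not readable text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            # Collapse line breaks and runs of spaces inside the text.
            self._chunks.append(" ".join(data.split()))

    def text(self):
        return " ".join(self._chunks)


def strip_tags(source):
    """Return the human-readable text of an HTML source string."""
    parser = TagStripper()
    parser.feed(source)
    parser.close()
    return parser.text()
```

Character references such as &amp;amp; are converted to the characters they stand for, so the result resembles what a browser displays rather than what the source code contains.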
Running the Auto Find and Add Assignment
If you already have a list consisting of several keywords and you would like your index to include a reference to these keywords in other files of your project, you do not have to manually add a link to these keywords for every file. In such cases it suffices that, after having stripped the file of its tags, you run the "Auto Find and Add" assignment. This is achieved by
1.  Switching to the Auto Find and Add Tab Sheet.
2.  Opening the Select sub-menu in the Main Menu's Assignment Menu.
3.  Selecting/clicking the "Auto Find and Add" menu item in the Assignment menu's Select sub-menu. This ought to result in the "Auto Find and Add" menu item being preceded by a check mark or bullet.
4.  Running the Auto Find and Add Assignment either by selecting the Run menu item in the Assignment Menu or the corresponding button in the Button Tool Bar.
5.  Waiting for the HTMLStripper to finish searching the currently open file. This is indicated by a message being displayed on the Auto Find and Add Tab Sheet, in which the HTMLStripper notifies you of how many keywords of your keywords list it has found in the current file.
6.  By removing or adding the check marks before a keyword or phrase in the Auto Find and Add List View, you can determine for which of the found keywords a link will be added to the keywords list. By default, all found keywords are checked and will therefore be added when the "Add Source File as Link" menu item is selected/clicked.
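The idea behind the Auto Find and Add assignment (searching the stripped text of the current file for keywords that are already in the list) can be sketched as follows. This is only a minimal illustration in Python with a hypothetical function name. Case-insensitive, whole-word matching is an assumption, because this guide does not document how the prototype compares keywords.

```python
import re


def find_keywords(stripped_text, keywords):
    """Return the keywords (or phrases) that occur in the stripped text.

    Assumption: matching is case-insensitive and on whole words only.
    """
    found = []
    for keyword in keywords:
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, stripped_text, flags=re.IGNORECASE):
            found.append(keyword)
    return found
```

In the terms of step 6, every keyword returned by this function would initially be checked, and removing a check mark would correspond to dropping that keyword from the returned list before any links are added.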
Running the Find and Add Anchors Assignment
Like the entries in the index of a book, the references to a keyword in the list generated by the HTMLStripper point to a specific page. In our case, however, the page is not on paper but is typically an HTML formatted text file. The links in the .hhk (or .csv) file exported by the HTMLStripper will then open the pages thus indexed/referenced.
But, there is a marked difference between a book's index and one such as for HTML pages. Primarily, this difference is that typically a book's page contains a more or less constant number of words. Although the exact number depends on the language, the nature of the book, as well as several other factors, the average English language novel, for example, normally has no more than 600 words per page. Thus, if a specific page is specified in an index, the reader has to glance over no more than these 600 words, in order to find the text containing the one word he was looking for.
HTML/XML formatted pages, on the other hand, may contain several thousand or even tens of thousands of words. For this reason, the HTML specification includes what are termed "anchors". These anchors are normally invisible markers, which mark a specific section, paragraph, sentence, or even word in the text. They make it possible to specify hyperlinks, which will not only open the respective page, but also bring into view the text (or other element) marked by such an anchor.
However, to specify such an anchor, it is necessary for the creator/author of the index to know the "name" of the anchor (if any) which is nearest to the keyword for which a link is to be added. Unless the list of keywords is authored together with the actual text, this would require manually locating the keyword and the anchor closest to it in the formatted text (i.e. the source code). This would have to be repeated for each keyword on every referenced/indexed page. Needless to say, for an index consisting of several hundred keywords, which are located on several hundred or even thousands of pages, this would be an incredible amount of work. This is why we have automated this too.
Thus, once you've completed your list of keywords and phrases and don't intend to add any more pages to your index, you can induce the HTMLStripper to add the nearest anchor to each keyword in your list, on all pages, in one single assignment.
1.  Open the Main Menu's Assignment Menu.
2.  In the Assignment Menu's "Select" sub-menu, check the "Find and Add Anchors" menu item. This is achieved simply by clicking on it.
3.  In most cases the HTMLStripper is able to locate and open the required source code file fully automatically. However, depending on which other files were opened since the link to this file was added to the list, the HTMLStripper may open a dialog and you may be required to select/specify some files manually, using this dialog. In such cases, simply navigate to the folder in which the required file is located and select it, using the provided dialog's controls.
Note, the dialog may be displayed several times, once for each source file it cannot find.
4.  Close the provided dialog by means of its OK Button.
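How an anchor close to a keyword can be located in the source code may be easier to follow with an example. The following is only a minimal sketch in Python, not the HTMLStripper's algorithm. The names are hypothetical, and the sketch makes simplifying assumptions: "nearest" is interpreted as the last anchor that precedes the first occurrence of the keyword, an anchor is either an a element with a name attribute or any element with an id attribute, and a keyword that is split across several tags is not found.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Remember the last anchor seen before a keyword's first occurrence."""

    def __init__(self, keyword):
        super().__init__(convert_charrefs=True)
        self.keyword = keyword.lower()
        self.last_anchor = None
        self.result = None
        self.found = False

    def handle_starttag(self, tag, attrs):
        if self.found:
            return
        attributes = dict(attrs)
        # Assumption: <a name="..."> or any element with an id is an anchor.
        name = attributes.get("id") or (attributes.get("name") if tag == "a" else None)
        if name:
            self.last_anchor = name

    def handle_data(self, data):
        if not self.found and self.keyword in data.lower():
            self.found = True
            self.result = self.last_anchor


def link_with_anchor(source, file_name, keyword):
    """Return 'file#anchor' for the keyword, the bare file name if no
    anchor precedes it, or None if the keyword does not occur at all."""
    finder = AnchorFinder(keyword)
    finder.feed(source)
    finder.close()
    if not finder.found:
        return None
    return f"{file_name}#{finder.result}" if finder.result else file_name
```

A link of the form "page.htm#anchorname" is what makes a browser or help viewer open the page and scroll the marked text into view.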
Adding Keywords to the Keywords List
The current prototype of the HTMLStripper provides five simple methods by which keywords (and/or phrases) can be added to the list of keywords in the Keywords List View (which, as you'll already have guessed, is located on the Keywords Tab Sheet).
The first two methods are ideal if you have a lot of unusual, individual words, for example if you're a software developer who is documenting types, variables, classes, and functions, all of which normally consist of one word only.
The third method provides greater flexibility in that phrases can also be added as keywords, directly from the text on the Stripped Contents Tab Sheet. The fourth method is particularly useful in combination with the third method, in that it (somewhat) simplifies creating the inverted forms of keyword phrases. The fifth method is essentially the basis of the fourth method and is the simplest of all.
Common to all methods is that you can add words and/or phrases as often as you like. Only the first time you add a particular word or phrase will it be added to the Keywords List View. However, if you retain your keyword list, process a second source file and add a word (or phrase) that is already in the list, the name of the source file and its title will be added to the respective columns of the existing keyword. That is, a link to the second source file will be added to the keyword's links list.
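The behaviour just described (a keyword is added only once, while each further source file contributes an additional link) corresponds to a simple mapping from keywords to lists of links. The following is only a minimal sketch in Python of such a structure. The function name and the representation of a link as a (source file, title) pair are assumptions, as is the exact, case-sensitive comparison of keywords.

```python
def add_keyword(keywords, phrase, source_file, title):
    """Add a keyword, or add a further link if the keyword already exists.

    keywords maps each keyword to a list of (source file, title) links.
    Returns True only if the keyword itself was new.
    """
    phrase = " ".join(phrase.split())  # reduce a multi-line selection to one line
    is_new = phrase not in keywords
    links = keywords.setdefault(phrase, [])
    link = (source_file, title)
    if link not in links:
        links.append(link)
    return is_new
```

Adding "system level applications" from a first file creates the keyword with one link; adding the same phrase from a second file leaves a single keyword that now carries two links.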
Method 1 (Selecting a range of text in the Distinct Words Editor)
1.  In the Distinct Words Editor on the left-hand side of the Distinct Words Tab Sheet select the range of text that contains the words you want to add to your list of keywords.
Text can be selected in the Distinct Words Editor as in any other editor or word processor, either by positioning the caret in it and dragging it over the text while keeping the left mouse button pressed or by means of the keyboard's shift and arrow keys.
Alternatively, you can delete all those words you don't want as keywords and select/click the Select All menu item in either the Edit Menu of the Main Menu or in the Distinct Words Tab Sheet's Context Menu.
2.  Select/click either the Add Words to Keywords Menu Item in the Edit Menu of the Main Menu or in the Distinct Words Tab Sheet's Context Menu. This will add all words that occur in the selected text range as individual keywords to the Keywords List on the Keywords Tab Sheet.
Method 2 (Selecting Words in the Distinct Words List)
1.  In the Distinct Words List on the right-hand side of the Distinct Words Tab Sheet select the words that you want to add to your list of keywords.
Individual words in the Distinct Words List are selected, simply by clicking on them while the cursor is positioned over them.
Multiple, separated, words can be selected by holding down the Control (Ctrl) Key on the keyboard and clicking on the individual words you want to select.
A range of adjacent words can be selected by pressing the shift key on the keyboard and clicking on the first (or last) word and, while keeping the shift key pressed, moving the cursor to the last (or first) word of the range and clicking on that word.
2.  As in Method 1, above, selecting/clicking either the Add Words to Keywords Menu Item in the Edit Menu of the Main Menu or in the Distinct Words Tab Sheet's Context Menu will add the selected words to the Keywords List on the Keywords Tab Sheet.
Selecting and Editing Words and/or Phrases to Add to the Keywords List
Apart from adding words from the list of distinct words, you can also select and add both individual words and phrases from the Stripped Contents as keywords.
Method 3 (Selecting a text segment in the Stripped Contents Editor)
1.  Through a click of the mouse, position the cursor/caret in the Stripped Contents Editor and mark/select the word or phrase you would like to add to the list of keywords. Note, should the word or phrase extend over several lines, the HTMLStripper will reduce the selected text range to a single line, prior to adding it to the Keyword List.
2.  Open either the Edit Menu in the Main Menu or the Stripped Contents Editor's Context Menu.
3.  In whichever of the two menus you have opened, select/click the "Add the Selection as Keyword" menu item.
Method 4 (Selecting a text segment in the Stripped Contents Editor and Editing it)
This method was primarily devised to make it a little easier to create inverted (keyword) phrases and add them together with the original phrase in more or less one step.
1.  As in Step 1 of Method 3, position the cursor/caret in the Stripped Contents Editor and select the word or phrase you want to add to the Keywords List.
As you may have noticed, when selecting text in the Stripped Contents Editor, the selected text is replicated in the Keyword Combo Box, which, by default, is situated immediately above the Tab Sheets. This has three advantages when adding a phrase as a keyword:
If you have selected a phrase that extends over several lines, it is reduced to a single line in the Keyword Combo Box.
If you want to rephrase (e.g. invert) the phrase, the cursor is closer to the combo box, which already contains the text you will need to modify.
You don't have to open a menu and select the appropriate menu item to add the selected (or modified) text as a keyword. It can be added, simply by pressing the Add/Apply Button, to the right of the Keyword Combo Box.
2.  Press the Add/Apply Button (located on the right-hand side of the Keyword Combo Box), to add the unmodified phrase (you have marked/selected in the Stripped Contents Editor) to the Keywords List.
3.  Position the cursor in the Keyword Combo Box and make the desired changes to the word or phrase.
For example, you might want to invert the original phrase you have just added in steps 1 and 2. If this phrase were "system level applications", all you would have to do is cut "system level" out of the text in the Keyword Combo Box and, together with a comma, append it to the sole remaining word "applications". The resulting inverted phrase in the Keyword Combo Box would then be "applications, system level".
In version 0.4, these types of modifications can be achieved simply by selecting/highlighting that part of the phrase which should appear at the end and either pressing the "Cut and append" Button in the Button Tool Bar or pressing the Ctrl, Alt, and X keys on the keyboard.
In other words, to produce the same result as in the above example, all you have to do now is select/highlight the two words "system level" of the whole phrase in the Keyword Combo Box and either simultaneously press the Ctrl, Alt, and X keys on the keyboard or press the "Cut and append" Button in the Button Tool Bar. This will result in the two words being moved from the beginning to the end of the phrase and a comma being inserted after the word "applications".
4.  Press the Add/Apply Button again. This will add the inverted or otherwise modified phrase (or word) to the Keywords List as well.
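The "cut and append" operation used in step 3 amounts to a small string transformation. The following is only a minimal sketch in Python that reproduces the example from the guide. The function name is hypothetical, and the behaviour for cases the guide does not describe (a selection that is not found, or a selection that covers the whole phrase) is an assumption.

```python
def cut_and_append(phrase, selection):
    """Move the selected part of a phrase to its end, after a comma."""
    head, found, tail = phrase.partition(selection)
    if not found:
        return phrase  # assumption: nothing to move, leave the phrase as is
    remainder = " ".join((head + " " + tail).split())
    if not remainder:
        return phrase  # assumption: the whole phrase was selected
    return f"{remainder}, {selection.strip()}"
```

Calling cut_and_append("system level applications", "system level") yields "applications, system level", the inverted phrase from the example above.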
Method 5 (Entering the Keyword Directly in the Keyword Combo Box or Distinct Words List)
Because it is the only method that does not require processing a source file first and can even be applied without one altogether, this method is well suited to creating template keyword lists and files. This is useful if you have numerous "keyword projects" in which you want to include certain words and/or phrases in all or a group of meta tags and/or index files.
1.  Enter the word or phrase you want to add to the keywords template list in the Keyword Combo Box.
2.  Press the Add/Apply Button to add the word or phrase to the Keywords List on the Keywords Tab Sheet.
3.  Repeat steps 1 and 2 as often as necessary to complete your template.
4.  Save the Keywords List as described under ...
Alternatively, if you have a large number of words and/or phrases you want to include per default in your index projects, you can achieve the same result by
1.  switching to the Distinct Words Tab Sheet,
2.  positioning the cursor/caret in the Distinct Words Editor on the left-hand side of the Distinct Words Tab Sheet,
3.  selecting and deleting any remnants of previously generated lists,
4.  entering the words and/or phrases, one word or phrase per line, in the Distinct Words Editor,
5.  selecting the entire text in the Distinct Words Editor,
6.  opening the Main Menu's Edit Menu or the Distinct Words Editor's Context Menu and selecting/clicking "Add Words to Keywords",
7.  and saving the keywords list to a file, as described under Saving the Created and/or Processed Files, below.
Editing Keywords in the Keywords Tree or List View
After you have added a keyword and possibly several links to it as well, it may prove necessary to modify its spelling. Of course, you can delete the keyword and add it again, but this might also necessitate opening the numerous files in which this keyword occurs and adding the links to them again. So that this does not become necessary, the keyword can be modified in place, in both controls on the Keywords Tab Sheet. This is achieved by essentially the same method as in identical controls in other applications, for example, when renaming a file in Windows File Explorer.
Although it is also possible in the Tree View, it is best performed in the Keyword List View. That is, in the list on the right-hand side of the Keywords Tab Sheet.
However, in this version (0.4) we have not yet added functionality which prevents deleting the keyword text completely. Doing so may irrecoverably corrupt the keyword file. This feature should therefore be used with great care.
1.  Select the keyword you want to edit by positioning the cursor over the keyword's text itself and clicking/pressing the left mouse button once.
2.  Press the left button on your mouse while the cursor is over the keyword itself, holding down the button a fraction longer than you would normally. In most cases, after a short time span, this will display an edit field, in which the keyword text is marked/highlighted, ready for editing. If an edit field is not displayed within a tolerable time span, check if the Keyword List View is not marked as being write protected/read-only and repeat the process.
3.  Edit the keyword's text as you would in any other edit field or editor.
4.  Once you're satisfied with your modifications, close the in-place edit field through a click of the left mouse button elsewhere or by pressing Return on your keyboard.
Modifying the Links to the Keywords
A feature which we have finally introduced in version 0.4 is a subsidiary window (aka a "tool window") with which the paths and links of the individual keywords can be edited/modified. All that is required to do so is
1.  Open the Link Target Details (Tool) Window by selecting a keyword in either the Keyword Tree or List View and selecting/clicking the "View Link Target Details" menu item in the Main Menu's View Menu. This will result in the Link Target Details (Tool) Window displaying the properties of the first link target or, if a specific link target was selected in the Keywords Tree View, the properties of that link target.
2.  Position the cursor in the edit field of the combo box of the property you want to modify. For example to alter the path and/or the file name of a source file in which this keyword occurs, you would have to position the cursor in the edit field of the uppermost combo box. To specify or alter the relative path to the source and target file, position the cursor in the edit field of the combo box above which it says Link Path, and so on.
3.  Once you've modified one or more link target properties, the Apply Button of the Link Target Details (Tool) Window ought to switch from being disabled to enabled. To apply the changes you've made, press the Link Target Details (Tool) Window's Apply Button.
4.  To view or edit the other keywords' link target properties, in sequence, one after the other, you can switch to the next or previous link target, by clicking on the Link Target Details Window's Previous Link or Next Link Button. To jump to a particular keyword's first link target, simply click on the respective keyword item in the Keyword List View, on the Keyword Tab Sheet of the Main Window.
5.  Once you no longer require the Link Target Details (Tool) Window, close/hide it by clicking on its close button in the upper right-hand corner or by simultaneously pressing the Control, Shift, and d keys on your keyboard.
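The relative path expected in the Link Path Combo Box can also be worked out outside the application. The following Python sketch is only an illustration and not part of the HTMLStripper; the function name link_path and the example folders are assumptions made for this example.

```python
from pathlib import PureWindowsPath


def link_path(project_root: str, source_file: str) -> str:
    """Return source_file relative to project_root, using Windows path rules.

    Raises ValueError if source_file is not located below project_root.
    """
    root = PureWindowsPath(project_root)
    target = PureWindowsPath(source_file)
    return str(target.relative_to(root))


if __name__ == "__main__":
    # Hypothetical help project folder and source file, for illustration only.
    print(link_path(r"C:\HelpProject", r"C:\HelpProject\html\Tools.htm"))
```

PureWindowsPath is used so that the sketch applies Windows path rules even when it is run on another operating system.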
Deleting Text, Words, and Keywords
With the exception of the elements displayed in the Integrated Browser, the list on the right-hand side of the Distinct Words List Tab Sheet, and the list of found keywords (if any) on the Auto Find and Add Tab Sheet, it is possible to delete all texts, phrases, and words, including the keywords on the Keywords Tab Sheet, down to individual characters/letters. However, in version 0.4, the methods by which this is (and can be) achieved still differ slightly on some tab sheets.
The following table summarizes the issue.
Control: check marks (Yes) shown in the columns Delete Key on Keyboard, Delete Menu Item in Main Menu, Delete Menu Item in Context Menu, Delete Button in Button Tool Bar
Source File(s) Combo Box: Yes, Yes
Keyword Combo Box: Yes, Yes, Yes, Yes
HTML Source Code Viewer/Editor: Yes, Yes, Yes, Yes
Integrated Browser: none
Stripped Contents Editor: Yes, Yes, Yes, Yes
Distinct Words Editor: Yes, Yes, Yes, Yes
Distinct Words List View: none
Keywords Tree View: Yes, Yes, Yes
Keywords List View: Yes, Yes, Yes
Auto Find and Add List View: none
Suppressed Words Editor: Yes, Yes, Yes, Yes
Log: Yes, Yes, Yes, Yes
Exporting the Keywords List
Once you're done adding words, phrases, and references/links to the Keywords List, you can export it as an HTML keywords meta tag, comma separated values, and/or a Microsoft Compiled Help .hhk file. To do any or all three,
1.  Switch to the Keywords Tab Sheet by clicking on the Keyword Tab.
2.  Open the File Menu in the Main Menu.
3.  Open the "Export as" menu item's sub-menu (by moving the cursor over the "Export as" menu item).
4.  Select/click the Comma Separated Values, Keywords Meta Tag, or Compiled Help Index File menu item to save the keywords as a comma separated values, keywords meta tag, or Compiled Help (.hhk) index file, respectively. All three menu items will open the Save As Dialog with the appropriate file type preselected in the Save as Type Drop-Down List.
5.  In the Save As Dialog, use the controls at the top of the dialog to navigate to the folder in which you want to save the respective file.
6.  Enter the file name under which you would like to save the exported file in the File Name Combo Box. The File Name Combo Box is located in the lower half of the dialog between the dialog's File List View (above it) and the Save as Type Drop-Down List (below it). Alternatively you can simply retain the file name we've suggested and which is already entered in the File Name Combo Box.
7.  Close the dialog by means of its Save Button.
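To give an impression of what the three export formats contain, the following Python sketch builds each of them from a small keyword list. It is an illustration under assumptions, not the HTMLStripper's own export code: the function names are hypothetical, the exact layout of the meta tag and the comma separated values written by the application may differ, and the sitemap entry simply follows the example listing shown later in this guide.

```python
import csv
import html
import io


def as_meta_tag(keywords):
    """Build an HTML keywords meta tag from a list of keyword strings."""
    content = html.escape(", ".join(keywords), quote=True)
    return f'<meta name="keywords" content="{content}">'


def as_csv(keywords):
    """Build one line of comma separated values.

    The csv module quotes keywords that themselves contain a comma.
    """
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(keywords)
    return buffer.getvalue()


def as_hhk_entry(keyword, targets):
    """Build one sitemap entry; targets is a list of (title, file) pairs."""
    lines = [
        '<li><object type="text/sitemap">',
        f'<param name="Name" value="{html.escape(keyword, quote=True)}">',
    ]
    for title, local in targets:
        lines.append(f'<param name="Name" value="{html.escape(title, quote=True)}">')
        lines.append(f'<param name="Local" value="{html.escape(local, quote=True)}">')
    lines.append("</object>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(as_meta_tag(["Classes", "TAboutBox", "TForm1"]))
    print(as_csv(["Classes", "native code, Intel CPU"]), end="")
    print(as_hhk_entry("native code, Intel CPU", [("Tools", "Tools.htm")]))
```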
Saving the Created and/or Processed Files
With the exception of the list on the Auto Find and Add Tab Sheet, the loaded and generated files can be saved individually and/or collectively.
Unlike the keywords in the Auto Find and Add List, which cannot be saved at all, the pages/files in the Integrated Browser can be saved, but only individually, on a per page/file basis. Exported files can likewise be saved only individually, on a per file basis.
Saving the Page(s)/File(s) Displayed in the Integrated Browser
Note, the first three steps only have to be performed if the page has not already been loaded or you have not visited it during your current HTMLStripper session.
1.  Select the URL or path and name of the page to save by entering the full URL in the Source File(s) Combo Box.
2.  Press/click on the Open URL Button in the Button Tool Bar.
3.  Wait for the page to be displayed in the Integrated Browser.
4.  Should you have switched to another tab sheet, return to/reopen the Browser Tab Sheet.
5.  Open the File Menu in the Main Menu.
6.  In the File Menu, select/click the "Save as ..." Menu item to open the Save HTML Document Dialog.
7.  In the Language (or Encoding) Drop-Down List*1 of the Save HTML Document Dialog, select the character set/language in which the text exposed to users was authored.
8.  In the Save as Type Drop-Down List, select the file type as which you want to save the source code of the page in the Integrated Browser.
9.  Close the Save HTML Document Dialog by means of its Save Button.
 
Saving Files Individually
The files the HTMLStripper generates and/or the modifications you make to the files on the Stripped Contents, Distinct Words List, Keywords, Suppress(ed) Words, and Log tab sheets can be saved on a per tab sheet basis by
1.  switching to the tab sheet on which the file you want to save is located,
2.  opening the Main Menu's File Menu, and
3.  selecting/clicking the Save or Save As Menu Item.
If you have not previously saved a particular tab sheet's file(s), the Save Menu Item will automatically open the Save As Dialog, in which you can select and open the folder in which you want to save the file, specify a file name, and select the file type in which you want it to be saved. Otherwise, the previously saved file's contents will be replaced by the current contents.
The Save As Menu Item always opens the Save As Dialog, irrespective of whether the file has already been saved or not.
Saving Files Collectively
When processing multiple source files, it would be a nuisance to have to save the file on each tab sheet individually. Furthermore, doing so would be error prone, in that saving the contents of a tab sheet could easily be forgotten. This can be avoided by saving the files collectively.
Saving the files collectively is essentially equivalent to switching to each tab sheet and saving it individually, except that this is performed automatically. To save the files collectively:
1.  Open the File Menu in the Main Menu.
2.  Select/click the Save All Menu Item.
3.  If the Save As Dialog is opened because a file needs to be saved for the first time and you do not want to skip saving it, proceed as if you were saving the file individually. Otherwise (i.e. if you don't want to save the file), simply close the dialog by means of its Cancel Button. The HTMLStripper will then continue to save the remaining files (if any).
Opening/Loading Files
In this provisional version of the HTMLStripper, it is only possible to open/load files into viewers/editors on the following tab sheets:
HTML Source File
Browser
Stripped Contents
Keywords
Suppress(ed) Words
It is not yet possible to load the other files produced by HTMLStripper into the appropriate viewers/editors on the other tab sheets (e.g. the Distinct Words, Log, etc.).
Furthermore, when loading files into the Stripped Contents and Suppress(ed) Words Editors, these have to be ANSI encoded files. Only the Source Code Viewer/Editor on the HTML Source Code Tab Sheet and the Integrated Browser are currently capable of handling and displaying Unicode files as well. Attempting to load Unicode encoded files into the viewers/editors on the Stripped Contents and/or Suppress(ed) Words tab sheets will result in the files not being displayed correctly. Editing and saving these files will also fail and/or may lead to data corruption.
Nonetheless, within the limitations imposed by the current functionality, files can be loaded into the respective viewers/editors, much like they are saved individually. That is, by
1.  switching to the tab sheet on which the viewer/editor, in which the file should be opened, is located,
2.  opening the File Menu in the Main Menu,
3.  selecting/clicking the Open Menu Item in the File Menu to display the Open Dialog,
4.  selecting the file to open in the Open Dialog's File List View (located in the middle of the dialog) or entering its name in the File Name Combo Box, in the lower half of the dialog, above the File of Type Drop-Down List, and
5.  closing the Open Dialog by means of a click on the Open Button.
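Because the Stripped Contents and Suppress(ed) Words Editors currently require ANSI encoded files, it can be helpful to check a file's encoding before loading it. The following Python sketch is an illustration only: it recognizes Unicode files solely by their byte order mark (so UTF-8 files without one are not detected), and it assumes that "ANSI" means Windows code page 1252, which depends on the Windows language settings. The function names are hypothetical.

```python
import codecs

# Byte order marks, checked in this order. UTF-32 comes first because its
# little-endian mark begins with the UTF-16 little-endian mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_unicode(data: bytes):
    """Return a codec name if data starts with a byte order mark, else None."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return None


def to_ansi(data: bytes, ansi_codec: str = "cp1252") -> bytes:
    """Re-encode BOM-marked Unicode data as ANSI; return other data unchanged.

    Characters that do not exist in the ANSI code page are replaced by '?'.
    """
    codec = detect_unicode(data)
    if codec is None:
        return data  # no byte order mark: assumed to be ANSI already
    return data.decode(codec).encode(ansi_codec, errors="replace")
```

A file could then be converted with, for example, `to_ansi(open(path, "rb").read())`, writing the result to a new file so that the original remains untouched.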
Creating a Compiled Help, .hhk Index File, Step by Step
1.  Open the following three pages on our website and save each page as a separate, ideally complete, HTML page to an easily accessible location on your computer.
ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents
(SST) ShlWAPIFunctionInfo Classes
Tools
2.  Open the Open File Dialog to select the first source file to process, by selecting/clicking on the "Select Source File(s) ..." menu item in the File Menu of the Main Menu.
3.  In the Open File Dialog, select the HTML file that is the first page you have saved in step 1, by selecting it in the dialog's Files List View. This should be the file to which you saved the "ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents" page.
4.  Close the Open File Dialog by means of its Open Button.
5.  Wait for the source code to be displayed on the HTML Source File Tab Sheet.
6.  Switch to the Distinct Words List Tab Sheet.
7.  Process the file by opening the Assignment Menu in the Main Menu and selecting/clicking the Run Menu Item.
8.  Wait for the word lists to be displayed in both panes of the Distinct Words List Tab Sheet.
9.  In the left pane on the Distinct Words List Tab Sheet, select the entire text, from the word C_ONLINEHELPURL on downward.
10.  Open the Edit Menu in the Main Menu.
11.  Select/click the Add Words to Keywords Menu Item.
12.  Switch to the Keywords Tab Sheet to view and/or save the just added keywords.
13.  Save the contents of the Keywords List as described under Dummy Title
14.  Open the source code of the second page you saved in Step 1 (the source file of the page titled "(SST) ShlWAPIFunctionInfo Classes").
15.  Once it has been fully loaded, press the Run Button in the Button Tool Bar and wait for the word lists to be displayed on the Distinct Words List Tab Sheet.
16.  In the pane on the right-hand side of the Distinct Words List Tab Sheet, select the words, "Classes", "TAboutBox", and "TForm1".
17.  Add the selected words to the Keywords List, by selecting/clicking on the "Add Words to Keywords" menu item in the Distinct Words List's Context Menu.
18.  Switch to the Auto Find and Add Tab Sheet.
19.  Open the Assignment Menu in the Main Menu.
20.  Select/click the Auto Find and Add Menu Item in the submenu opened by the Assignment Menu's "Select" menu item.
21.  Select/click the "Run" menu item of the Assignment Menu, press the F5 key on the keyboard, or click the Run Button in the Button Tool Bar. This will automatically search the currently open source file for all the keywords you have, up to now, added to the keyword list on the Keyword Tab Sheet.
If you want to process more source files, don't forget to select the "Strip" menu item immediately above the "Auto Find and Add" menu item, every time after you've run "Auto Find and Add".
22.  Open the Auto Find and Add List's context menu through a click of the right mouse button.
23.  Select/click the Add Source File as Link Menu Item.
24.  Switch to the Keywords Tab Sheet to view and save the modifications made to the Keywords List.
In this case, the modifications will be less obvious than the previous two additions, because the last action will have primarily added information to the list's second and third columns.
25.  Using the same procedure as before, load the third source file (the HTML source file of the page with the title "Tools").
26.  Switch to the Stripped Contents Tab Sheet.
27.  Strip the source file of its formatting tags by either of the two already described methods or simply by pressing the function key F5 on your keyboard.
28.  In the Stripped Contents Editor, mark/select the phrase "Intel CPU native code".
29.  In the Edit Menu of the Main Menu select/click the Add Selection as Keyword Menu Item.
30.  Position the cursor/caret before the word "Intel" in the Keyword Combo Box.
31.  Mark/select the words "Intel CPU".
32.  Select/click "Cut and Append" in the Main Menu's Edit Menu, simultaneously press the Ctrl, Alt, and x keys on your keyboard, or click the corresponding button in the Button Tool Bar. This will automatically cut the selected words and move them to the end of the text in the Keyword Combo Box. It will also automatically insert a comma (",") and a blank space before them. The resulting phrase in the Keyword Combo Box will now read "native code, Intel CPU".
33.  Press the Add/Apply Button to the right of the Keyword Combo Box.
34.  In the submenu opened by selecting/clicking the "Select" menu item of the Main Menu's Assignment menu, check the "Find and Add Anchors" menu item.
Note that, to keep this example simple, we have specified only three source files, and none of these source files contains any anchors (i.e. <a name="ExampleAnchor"></a> tags). Therefore, running the Find and Add Anchors Assignment will not find and add any anchors to the keyword link targets in this example as is. However, you can easily add some anchor tags to the downloaded source files yourself and then run the Find and Add Anchors Assignment.
35.  Run the assignment by clicking on the Run Assignment Button in the Button Tool Bar.
36.  Switch to the Log Tab Sheet. The log entries made are currently the only means to determine how far the assignment has progressed.
37.  Return to the Keywords Tab Sheet, once all source files referenced in your keywords file have been processed.
38.  Open the first keyword in the Link Target Details (Tool) Window and, in the Link Target Details Window's Link Path Combo Box, enter the relative path from the root folder of your (Microsoft Compiled) help project to the source/target file of the link. In most cases, this relative path is part of the fully qualified path to the source and target file, which is/should be the path shown in the Link Target Details Window's "Source File Path Combo Box".
Note that executing this step is absolutely necessary, as otherwise the index of the compiled help will not find (and consequently will not open) the page on which the keyword is located. Unfortunately, in the current version of the HTMLStripper, this has to be performed for every link target in your keywords list.
39.  If you want to provide an alternative Uniform Resource Locator (URL), you can do so by entering it in the combo box at the very bottom of the Link Target Details Window.
40.  Don't forget to apply the changes you have made to the link target, by clicking the Link Target Details Window's Apply Button.
41.  Repeat Steps 38 through and including 40 for all keywords and link targets in your list.
42.  While remaining on the Keywords Tab Sheet, open the Main Menu's File Menu and in it select/click the Save Menu Item, or click on the corresponding button in the Button Tool Bar, to save the changes you have made to the Keywords File in the previous steps.
43.  Without switching to another tab sheet, reopen the File Menu in the Main Menu.
44.  Select/click the Compiled Help Index File in the sub-menu of the "Export as" menu item.
45.  In the Save As Dialog that ought to have been opened in the previous step, select the folder under which you would like to save the Compiled Help, .hhk, index file.
46.  Enter a file name for the file in the Save As Dialog's File Name Combo Box.
47.  Close the dialog by clicking on its Save Button.
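Before running the Find and Add Anchors Assignment selected in step 34, you may want to check whether a source file contains any anchors at all. The following Python sketch lists the names of all <a name="..."> anchors in a piece of HTML source. It is merely a way to inspect a file yourself and makes no claim about how the HTMLStripper's assignment is implemented; the class and function names are assumptions.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Collect the values of the name attributes of <a> tags."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        if tag == "a":
            for key, value in attrs:
                if key == "name" and value:
                    self.anchors.append(value)


def find_anchors(source: str):
    """Return the anchor names found in source, in document order."""
    finder = AnchorFinder()
    finder.feed(source)
    finder.close()
    return finder.anchors


if __name__ == "__main__":
    sample = '<p><a name="ExampleAnchor"></a>Text <a href="Tools.htm">Tools</a></p>'
    print(find_anchors(sample))
```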
As the exported file (in spite of its file extension) is a plain text file, you can open, view, and edit it in any editor. To acquaint yourself with the format, we recommend doing so. The HTML code, below, shows what it should look like when opened in one. However, if it is at your disposal, we also recommend opening it in the Microsoft (Compiled) Help Workshop.
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<meta name="generator" content="SST HTMLStripper Version 0.4">
</head>
<body>
<ul>
<li><object type="text/sitemap">
<param name="Name" value="C_ONLINEHELPURL">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="C_SHLWAPIDLLNAME">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Classes">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Constants">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Contents">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Developer">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Functions">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Intel CPU native code">
<param name="Name" value="Tools">
<param name="Local" value="Tools.htm">
</object>
<li><object type="text/sitemap">
<param name="Name" value="IsValidHandle">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="July">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="last">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="mail">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="native code, Intel CPU">
<param name="Name" value="Tools">
<param name="Local" value="Tools.htm">
</object>
<li><object type="text/sitemap">
<param name="Name" value="PSSTWinResLanguageId">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Reference">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="ShlWAPIFunctionInfo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="ShlWAPIFunctionInfoAbout">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="ShlWAPIFunctionInfoMain01">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Software">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Table">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TAboutBox">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TDllGetVersionProc">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TDllVersionInfo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TDllVersionInfo2">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TForm1">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTAdvancedComboBox">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTAdvancedListView">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTAdvancedMemo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTBasicTextSearchOptions">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTCharSetType">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTDllVerInfo">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="TSSTWinResLanguageId">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Types">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Units">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="updated">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
<li><object type="text/sitemap">
<param name="Name" value="Version">
<param name="Name" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents">
<param name="Local" value="ShlWAPIFunctionInfo Version 1.0 Developer Reference, Table of Contents.html">
<param name="Name" value="(SST) ShlWAPIFunctionInfo Classes">
<param name="Local" value="(SST) ShlWAPIFunctionInfo Classes.html">
</object>
</ul>
</body>
</html>
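If you would rather inspect an exported index file programmatically than in an editor, the listing above can be read back with a few lines of Python. The sketch below assumes the layout shown in the listing, i.e. that the first "Name" parameter of each sitemap object is the keyword and that every following "Name"/"Local" pair describes one link target. The class and function names are hypothetical, and the sketch is not part of the HTMLStripper.

```python
from html.parser import HTMLParser


class HhkReader(HTMLParser):
    """Collect keyword -> [(page title, local file)] from sitemap objects."""

    def __init__(self):
        super().__init__()
        self.index = {}
        self._inside = False
        self._keyword = None
        self._title = None
        self._targets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "object" and attributes.get("type") == "text/sitemap":
            self._inside = True
            self._keyword, self._title, self._targets = None, None, []
        elif tag == "param" and self._inside:
            name = (attributes.get("name") or "").lower()
            value = attributes.get("value") or ""
            if name == "name" and self._keyword is None:
                self._keyword = value  # first Name is the keyword itself
            elif name == "name":
                self._title = value    # later Names are page titles
            elif name == "local":
                self._targets.append((self._title, value))
                self._title = None

    def handle_endtag(self, tag):
        if tag == "object" and self._inside:
            if self._keyword is not None:
                self.index.setdefault(self._keyword, []).extend(self._targets)
            self._inside = False


def read_hhk(text: str):
    """Return a dictionary mapping each keyword to its link targets."""
    reader = HhkReader()
    reader.feed(text)
    reader.close()
    return reader.index
```

Calling read_hhk with the contents of the exported file returns, for instance, the single target ("Tools", "Tools.htm") for the keyword "native code, Intel CPU".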
Using the HTMLStripper with Unformatted Text Files
Although we generally don't process files other than HTML/XML files (in other words, formatted texts), the HTMLStripper can also be used to generate word lists from plain, unformatted texts, such as the "ReadMe.txt" file that is part of the setup package. Just as with HTML/XML files, it will generate the two word lists, which can subsequently be used to create a keyword list from these files. Unfortunately, the word count feature does not always work entirely reliably when used with such files if they also contain HTML/XML tags (i.e. if the file is a sort of scratch-pad file).
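For readers who want a rough idea of what such word lists contain, the following Python sketch derives an alphabetically sorted list of distinct words and a count per word from a plain text. What exactly the HTMLStripper puts into its two lists, and how it splits a text into words, is not specified here; the splitting rule (runs of letters, digits, and underscores) and the function name are assumptions made for this illustration.

```python
import re
from collections import Counter


def distinct_words(text: str):
    """Return (alphabetically sorted distinct words, occurrences per word)."""
    counts = Counter(re.findall(r"\w+", text))
    # Sort without regard to case; the word itself breaks ties.
    ordered = sorted(counts, key=lambda word: (word.casefold(), word))
    return ordered, counts


if __name__ == "__main__":
    words, counts = distinct_words("Read me first. Read the ReadMe file.")
    for word in words:
        print(word, counts[word])
```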
Using the HTMLStripper for Proofreading
Although we had already integrated a spell-checker into the HTMLStripper, we were forced to remove it because it only supported a single language and it was clear that further languages would not be made available. Nevertheless, we have not given up on the idea and are looking into alternative solutions.
However, even without an integrated spell-checker, the HTMLStripper has two features that simplify proofreading considerably:
1.  The Stripped Contents Editor
2.  The word lists on the Distinct Words List Tab Sheet.
Because all hidden, on-demand/triggered texts, such as the cookie settings, are exposed and distracting elements, such as background graphics and advertising images, are removed, detecting grammatical and/or usage errors on an HTML page is easier in the Stripped Contents Editor. Typographical errors, on the other hand, can be identified more easily in the alphabetically sorted word lists on the Distinct Words Tab Sheet. The following two scenarios illustrate this.
a. Scenario 1 (Context Required)
Consider the following two sentences:
Eve gave Adam an appeal from the tree of forbidden fruit. To prevent being banned from paradise, Adam requested a hearing before the court of apples.
Unless the sentences are from a satirical or nonsensical version of the Bible, the words "appeal" and "apples" were either misspelled or used in the wrong context, in spite of being spelled correctly when examined individually (i.e. out of context).
As detecting these types of errors may require the context of the entire page's text, they can be identified far more easily in the Stripped Contents Editor than in the word lists on the Distinct Words Tab Sheet.
b. Scenario 2 (Simple "Typos")
Consider the following abridged list of words:
accompanying
accounts
actual
acutal
all
also
...
Prerequisites
preserving
Previous
properies
Properties
purposes
...
Even when going over this list only once, you will probably have noticed the two typographical errors ("acutal" and "properies"). They stand out quite distinctly because a word seems to occur twice. In fact, this example is taken from lists generated from two of our website's pages.
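The observation that a misspelled word tends to sit right next to its correct spelling in a sorted list can also be exploited mechanically. The Python sketch below sorts a word list and reports neighbouring words that are nearly, but not exactly, identical. It is not a feature of the HTMLStripper; the function name, the use of difflib's similarity ratio, and the threshold of 0.8 are assumptions chosen for this illustration, and a human still has to decide which of the two spellings is the wrong one.

```python
from difflib import SequenceMatcher


def suspicious_pairs(words, threshold=0.8):
    """Return neighbouring words of the sorted list that look almost alike."""
    ordered = sorted(set(words), key=lambda word: (word.casefold(), word))
    pairs = []
    for first, second in zip(ordered, ordered[1:]):
        a, b = first.casefold(), second.casefold()
        # Words that differ only in case are not reported.
        if a != b and SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    word_list = [
        "accompanying", "accounts", "actual", "acutal", "all", "also",
        "Prerequisites", "preserving", "Previous", "properies",
        "Properties", "purposes",
    ]
    for pair in suspicious_pairs(word_list):
        print(pair)
```

Applied to the abridged list above, the sketch reports the pairs "actual"/"acutal" and "properies"/"Properties".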
Summary
Both features can be used individually or in combination with one another, but still require a human to detect errors in the text or words.
We have adopted the following proofreading method for our pages' texts.
1.  Going down the list of words on the Distinct Words Tab Sheet and, if we detect any "unusual" words and/or spellings,
2.  reading either the relevant parts very closely or reviewing the page's entire text in the Stripped Contents Editor and/or on the correctly displayed page in a browser.
Footnotes
*1 Depending on the Windows version and edition, the caption that precedes the drop-down list can vary. Whereas it might read "Language" under one Windows version, it may read "Encoding" under another. Furthermore, the captions we have used to name the drop-down list are those of English Windows editions. However, all captions on the Save HTML Document Dialog are always those of the primary user interface (UI) language under which the HTMLStripper is running.




Document/Contents version 1.00
Page/URI last updated on 23.12.2024
 
Copyright © Stoelzel Software Technologie (SST) 2010 - 2017
Suggestions and comments mail to:
webmaster@stoelzelsoftwaretech.com